derp/docs/AUDIO.md
user 6083de13f9
feat: playlist shuffle, lazy resolution, TTS ducking, kept repair
Music:
- #random URL fragment shuffles playlist tracks before enqueuing
- Lazy playlist resolution: first 10 tracks resolve immediately,
  remaining are fetched in a background task
- !kept repair re-downloads kept tracks with missing local files
- !kept shows [MISSING] marker for tracks without local files
- TTS ducking: music ducks when merlin speaks via voice peer,
  smooth restore after TTS finishes

Performance (from profiling):
- Connection pool: preload_content=True for SOCKS connection reuse
- Pool tuning: 30 pools / 8 connections (up from 20/4)
- _PooledResponse wrapper for stdlib-compatible read interface
- Iterative _extract_videos (replace 51K-deep recursion with stack)
- proxy=False for local SearXNG

Voice + multi-bot:
- Per-bot voice config lookup ([<username>.voice] in TOML)
- Mute detection: skip duck silence when all users muted
- Autoplay shuffle deck (no repeats until full cycle)
- Seek clamp to track duration (prevent seek-past-end stall)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 16:21:47 +01:00


Audio Engine -- Issues, Fixes, and Consolidation Notes

Technical reference for the Mumble audio pipeline: known issues, applied fixes, architectural decisions, and areas for future work.

Architecture Overview

yt-dlp -> ffmpeg (decode to s16le 48kHz mono) -> PCM frames (20ms)
    -> volume ramp/scale -> pymumble sound_output -> Opus encode -> Mumble

Key components:

| File | Role |
|---|---|
| src/derp/mumble.py | stream_audio() -- PCM feed loop, volume ramp, seek |
| plugins/music.py | Queue, play loop, fade orchestration, duck monitor |

Volume control layers (evaluated per-frame, highest priority first)

  1. fade_vol -- active during fade-out (skip/stop/pause); set to 0 as target
  2. duck_vol -- voice-activated ducking; snap to floor, linear restore
  3. volume -- user-set level (0-100)

The play loop passes a lambda to stream_audio:

volume=lambda: (
    ps["fade_vol"]   if ps["fade_vol"] is not None else
    ps["duck_vol"]   if ps["duck_vol"] is not None else
    ps["volume"]
) / 100.0

Per-frame volume ramping

stream_audio never jumps to the target volume. Each 20ms frame is ramped from _cur_vol toward the target by at most one step, where the step size comes from:

  • _max_step = 0.005 (~4s full ramp) -- ceiling for normal changes
  • fade_in_step -- computed from fade-in duration (default 5s)
  • fade_step -- override from plugin (fade-out on skip/stop/pause)

When abs(diff) < 0.0001, flat scaling is used (avoids ramp artifacts on steady-state frames). Otherwise, _scale_pcm_ramp() linearly interpolates across all 960 samples in the frame.


Issues and Fixes

1. Alpine ffmpeg lacks librubberband

Symptom: 13 of 15 voice audition samples failed because the rubberband audio filter was unavailable in ffmpeg.

Root cause: Alpine's ffmpeg package is compiled without --enable-librubberband.

Fix: Added the rubberband CLI package to the Containerfile and created _split_fx() in plugins/voice.py to parse FX chains. Pitch-shifting goes through the rubberband CLI binary, and the remaining filters (bass, echo) go through ffmpeg, giving a two-stage pipeline.

Files: Containerfile, plugins/voice.py
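The FX-chain format is not spelled out here, so this two-stage split is only a sketch built on a stated assumption. It assumes the chain is an ffmpeg-style filter string (comma-separated filters, colon-separated options), with pitch written as ffmpeg's rubberband=pitch=<ratio>. split_fx and build_pipeline are hypothetical helpers, not the real _split_fx. The pitch ratio is passed to the rubberband CLI's -f (frequency multiplier) option, which expects WAV-style input files. Everything else goes to ffmpeg -af. The commands are only built here, not run. Simple splitting on commas does not handle escaped commas inside filter arguments.

```python
from typing import List, Optional, Tuple


def split_fx(chain: str) -> Tuple[Optional[float], str]:
    """Separate a rubberband pitch ratio from the rest of an ffmpeg filter chain."""
    pitch: Optional[float] = None
    rest: List[str] = []
    for part in (p.strip() for p in chain.split(",")):
        if not part:
            continue
        name, _, args = part.partition("=")
        if name == "rubberband":
            for kv in args.split(":"):
                key, _, value = kv.partition("=")
                if key == "pitch":
                    pitch = float(value)
        else:
            rest.append(part)  # bass, echo, ... stay with ffmpeg
    return pitch, ",".join(rest)


def build_pipeline(src: str, dst: str, chain: str, tmp: str) -> List[List[str]]:
    """Stage 1: rubberband CLI for pitch (if any). Stage 2: ffmpeg for the rest."""
    pitch, rest = split_fx(chain)
    cmds: List[List[str]] = []
    stage_in = src
    if pitch is not None:
        cmds.append(["rubberband", "-f", str(pitch), src, tmp])
        stage_in = tmp
    ffmpeg = ["ffmpeg", "-y", "-i", stage_in]
    if rest:
        ffmpeg += ["-af", rest]
    ffmpeg.append(dst)
    cmds.append(ffmpeg)
    return cmds
```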


2. Self-ducking between bots

Symptom: derp's music volume dropped when merlin spoke (TTS).

Root cause: merlin's TTS output triggered _on_sound_received, which updated the shared registry._voice_ts timestamp. derp's duck monitor saw recent voice activity and ducked.

Fix: _on_sound_received checks registry._bots and returns early for any bot username -- no timestamp update, no listener dispatch.

def _on_sound_received(self, user, sound_chunk) -> None:
    name = user["name"] if isinstance(user, dict) else None
    bots = getattr(self.registry, "_bots", {})
    if name and name in bots:
        return  # ignore audio from bots entirely

Files: src/derp/mumble.py


3. Click/pop on skip/stop (fade-out cancellation)

Symptom: Audible glitch at the end of fade-out when skipping or stopping a track.

Root cause: _fade_and_cancel() fades volume to 0 over ~3s, then calls task.cancel(). In stream_audio, CancelledError triggers clear_buffer(), which drops any frames still queued in pymumble's output -- including frames that were encoded at non-zero amplitude a few frames earlier. The sudden buffer wipe produces a click.

Fix (two-part):

  1. Plugin side (music.py): Added a 150ms post-fade drain before the cancel, giving pymumble time to flush the remaining silent frames.

  2. Engine side (mumble.py): CancelledError handler only calls clear_buffer() if _cur_vol > 0.01. When a fade-out has already driven volume to ~0, the remaining buffer frames are silent and clearing them is unnecessary.

# mumble.py -- CancelledError handler
if _cur_vol > 0.01:
    self._mumble.sound_output.clear_buffer()
# music.py -- _fade_and_cancel()
await asyncio.sleep(duration)
await asyncio.sleep(0.15)  # drain window
task.cancel()

Files: src/derp/mumble.py, plugins/music.py
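Both halves of the fix can be reproduced in a small asyncio model. This is a simplified sketch: feed stands in for stream_audio and fade_and_cancel stands in for _fade_and_cancel. FakeSoundOutput is a hypothetical double that exposes only the two pymumble SoundOutput methods involved (add_sound, clear_buffer). For brevity, the fade sets the current volume to 0 up front instead of ramping it per frame.

```python
import asyncio
import contextlib

SILENT_FRAME = b"\x00\x00" * 960  # 20 ms of s16le silence


class FakeSoundOutput:
    """Hypothetical stand-in for pymumble's sound_output."""

    def __init__(self) -> None:
        self.frames = []
        self.clears = 0

    def add_sound(self, pcm: bytes) -> None:
        self.frames.append(pcm)

    def clear_buffer(self) -> None:
        self.frames.clear()
        self.clears += 1


async def feed(output: FakeSoundOutput, state: dict) -> None:
    """Simplified feed loop: push one frame every 20 ms until cancelled."""
    try:
        while True:
            output.add_sound(SILENT_FRAME)
            await asyncio.sleep(0.02)
    except asyncio.CancelledError:
        # Engine side: only wipe the buffer if it may still hold audible frames.
        if state["cur_vol"] > 0.01:
            output.clear_buffer()
        raise


async def fade_and_cancel(task: asyncio.Task, state: dict, duration: float,
                          drain: float = 0.15) -> None:
    """Plugin side: fade out, wait for the drain window, then cancel."""
    state["cur_vol"] = 0.0  # simplification of the per-frame ramp reaching zero
    await asyncio.sleep(duration)
    await asyncio.sleep(drain)  # let already-queued silent frames go out
    task.cancel()


async def demo(use_fade: bool) -> int:
    """Return how many times the buffer was cleared."""
    out, state = FakeSoundOutput(), {"cur_vol": 0.5}
    task = asyncio.create_task(feed(out, state))
    await asyncio.sleep(0.05)
    if use_fade:
        await fade_and_cancel(task, state, duration=0.05)
    else:
        task.cancel()  # hard cancel at non-zero volume
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return out.clears
```

A cancel after the fade leaves the buffer alone. A hard cancel at non-zero volume still clears it, which is the case where dropping queued audio is wanted.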


4. Fade-out math

How it works: _fade_and_cancel(duration=3.0) computes the per-frame step from the current effective volume:

cur_vol = (duck_vol or volume) / 100.0
n_frames = duration / 0.02  # 150 frames for 3s
step = cur_vol / n_frames

The play loop sets ps["fade_vol"] = 0 (the target) and ps["fade_step"] = step (the rate). stream_audio ramps _cur_vol toward 0 at step per frame. At 50% volume: step = 0.0033, reaching zero in exactly 150 frames (3.0s).

Note: fade_vol is set to 0 immediately, making the volume lambda return 0 as the target. The ramp code smoothly transitions -- there is no abrupt jump because _cur_vol tracks actual output level, not the target.
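The fade-out arithmetic can be checked directly. This sketch reuses the formula above, with hypothetical function names. One deliberate difference: it picks the starting level with an explicit None check, as the volume lambda does, rather than `or`. frames_to_silence counts frames with a ramp that clamps at 0. Floating-point rounding can add one trailing frame, so the count is 150 or 151.

```python
from typing import Optional

FRAME_SEC = 0.02  # one 20 ms PCM frame


def fade_step(duck_vol: Optional[float], volume: float, duration: float = 3.0) -> float:
    """Per-frame decrement that takes the current level to 0 over `duration` seconds."""
    cur_vol = (duck_vol if duck_vol is not None else volume) / 100.0
    n_frames = duration / FRAME_SEC  # 150 frames for 3 s
    return cur_vol / n_frames


def frames_to_silence(cur: float, step: float) -> int:
    """Frames needed for a clamped linear ramp from `cur` down to 0."""
    n = 0
    while cur > 0.0:
        cur = max(0.0, cur - step)
        n += 1
    return n
```

At 50% volume, fade_step(None, 50) is about 0.0033, and frames_to_silence(0.5, fade_step(None, 50)) lands at roughly 150 frames (3.0s).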


5. Self-mute lifecycle

Requirement: merlin mutes on connect, unmutes only when emitting audio (TTS), re-mutes after a delay.

Implementation:

connect -> mute()
stream_audio start -> cancel pending mute task, unmute()
stream_audio finally -> spawn _delayed_mute(3.0)

The 3-second delay prevents rapid mute/unmute flicker on back-to-back TTS. The mute task is cancelled if new audio starts before it fires.

Config: self_mute = true in [[mumble.extra]]

Files: src/derp/mumble.py
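The connect/start/finally sequence above can be sketched as a small state holder. SelfMuter is hypothetical. In the bot, the mute and unmute callables would presumably wrap pymumble's users.myself mute/unmute calls. The key point is that a new stream start cancels any pending delayed mute, so back-to-back TTS never flickers.

```python
import asyncio
from typing import Callable, Optional


class SelfMuter:
    """Hypothetical helper: mute on connect, unmute while emitting audio,
    re-mute `delay` seconds after the stream ends."""

    def __init__(self, mute: Callable[[], None], unmute: Callable[[], None],
                 delay: float = 3.0) -> None:
        self._mute = mute
        self._unmute = unmute
        self.delay = delay
        self._pending: Optional[asyncio.Task] = None

    def _cancel_pending(self) -> None:
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()
        self._pending = None

    def on_connect(self) -> None:
        self._mute()

    def on_stream_start(self) -> None:
        # New audio wins over a pending re-mute: cancel it, then unmute.
        self._cancel_pending()
        self._unmute()

    def on_stream_end(self) -> None:
        # Must be called from inside the running event loop.
        self._cancel_pending()
        self._pending = asyncio.get_running_loop().create_task(self._delayed_mute())

    async def _delayed_mute(self) -> None:
        await asyncio.sleep(self.delay)
        self._mute()


async def demo() -> list:
    log = []
    m = SelfMuter(lambda: log.append("mute"), lambda: log.append("unmute"), delay=0.05)
    m.on_connect()
    m.on_stream_start()
    m.on_stream_end()
    await asyncio.sleep(0.01)
    m.on_stream_start()  # back-to-back TTS: the pending mute is cancelled
    m.on_stream_end()
    await asyncio.sleep(0.1)
    return log
```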


6. Self-deafen on connect

Requirement: merlin deafens on connect (no audio reception needed).

Implementation: self_deaf = true config flag, calls self._mumble.users.myself.deafen() in _on_connected.

Files: src/derp/mumble.py, config/derp.toml


Pause/Resume

Design

!pause toggles between paused and playing states:

Pause: Captures current track + elapsed position + monotonic timestamp. Fades out, cancels play loop. Queue is preserved.

Unpause: Re-inserts the track at the queue front and starts the play loop with a seek. Three special behaviors:

  1. Rewind: 3s rewind on unpause for continuity, applied only if paused >= 3s. This acts as an anti-flood guard: rapid toggling doesn't compound the rewind.

  2. Stale stream: If paused > 45s, cached stream files (in data/music/cache/) are deleted so the play loop re-downloads. Kept files (data/music/) are never deleted. Stream URLs from YouTube et al. expire within minutes.

  3. Fade-in: Unpause always uses fade_in=True (5s ramp from 0).

State cleanup: !stop clears ps["paused"]. The play loop's finally block skips _cleanup_track when paused (preserves the file).
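The unpause decisions reduce to a pure function of the saved position and the pause duration. plan_unpause and ResumePlan are hypothetical names. The 3s and 45s thresholds come from the text above. Pause timestamps are assumed to come from a monotonic clock. A caller acting on drop_cache would delete only files under data/music/cache/ and never kept files.

```python
from dataclasses import dataclass

REWIND_SEC = 3.0        # rewind on unpause, only after a pause of at least this long
STALE_AFTER_SEC = 45.0  # beyond this, cached stream files are treated as stale


@dataclass
class ResumePlan:
    seek: float          # position to restart from, in seconds
    drop_cache: bool     # delete the cached stream file so it is re-downloaded
    fade_in: bool = True  # unpause always fades in


def plan_unpause(elapsed: float, paused_at: float, now: float) -> ResumePlan:
    """Decide seek position and cache handling for !pause -> unpause."""
    paused_for = now - paused_at
    seek = elapsed
    if paused_for >= REWIND_SEC:
        # Anti-flood: quick toggles never rewind, so rewinds cannot stack.
        seek = max(0.0, elapsed - REWIND_SEC)
    return ResumePlan(seek=seek, drop_cache=paused_for > STALE_AFTER_SEC)
```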


Autoplay

Design

When autoplay = true (config), the play loop stays alive after the queue empties:

  1. Waits for silence (duck_silence threshold, default 15s)
  2. Picks one random kept track
  3. Plays it
  4. On completion, loops back to step 1

This replaces the previous bulk-queue approach (shuffle all kept tracks at once). Benefits: no large upfront queue, silence-aware gaps between tracks, indefinite looping.
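The four-step cycle can be sketched as an async loop with the waiting and playback steps injected. autoplay_loop is hypothetical. wait_for_silence stands in for whatever waits out the duck_silence threshold. max_tracks exists only so the loop can be exercised in a test.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, Sequence


async def autoplay_loop(
    kept: Sequence[str],
    wait_for_silence: Callable[[], Awaitable[None]],
    play: Callable[[str], Awaitable[None]],
    rng: Optional[random.Random] = None,
    max_tracks: Optional[int] = None,
) -> int:
    """Play one random kept track per cycle, waiting for silence first.
    Returns the number of tracks played."""
    rng = rng or random.Random()
    played = 0
    while kept and (max_tracks is None or played < max_tracks):
        await wait_for_silence()  # 1. wait for a quiet channel
        track = rng.choice(kept)  # 2. pick one random kept track
        await play(track)         # 3. play it to completion
        played += 1               # 4. loop back to step 1
    return played


async def demo() -> list:
    events = []

    async def wait() -> None:
        events.append("wait")

    async def play(track: str) -> None:
        events.append(track)

    await autoplay_loop(["a.opus"], wait, play, rng=random.Random(0), max_tracks=2)
    return events
```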

Resume persistence

A background task saves track URL + elapsed position to the state DB every 10 seconds during playback:

async def _periodic_save():
    while True:
        await asyncio.sleep(10)
        el = cur_seek + progress[0] * 0.02
        if el > 1.0:
            _save_resume(bot, track, el)

On hard kill: resumes from at most ~10s behind. On normal track completion: _clear_resume() wipes the state.


Voice Ducking

Flow

voice detected -> duck_vol = floor (instant)
silence > duck_silence -> linear restore over duck_restore seconds

The duck monitor runs as a background task alongside the play loop. It updates ps["duck_vol"] which the volume lambda reads per-frame.

Restore ramp

Restoration is linear from floor to user volume. The per-frame ramp in stream_audio further smooths each 1-second update from the monitor, eliminating audible steps.
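The monitor's output can be written as a pure function of the time since the last voice, evaluated once per tick. duck_volume and duck_monitor are hypothetical. Volumes use the 0-100 scale of ps["volume"]. last_voice_ts is assumed to return a time.monotonic()-compatible timestamp, such as the one kept in registry._voice_ts. Returning None hands control back to the user-volume layer of the volume lambda.

```python
import asyncio
import time
from typing import Callable, Optional


def duck_volume(now: float, last_voice: float, volume: float, floor: float,
                duck_silence: float, duck_restore: float) -> Optional[float]:
    """duck_vol for one monitor tick, or None once fully restored."""
    floor = min(floor, volume)  # never "duck" above the user's own level
    quiet = now - last_voice
    if quiet < duck_silence:
        return floor  # voice heard recently: hold at the floor
    t = (quiet - duck_silence) / duck_restore if duck_restore > 0 else 1.0
    if t >= 1.0:
        return None  # restore finished: the volume layer takes over
    return floor + (volume - floor) * t  # linear restore


async def duck_monitor(ps: dict, last_voice_ts: Callable[[], float], floor: float,
                       duck_silence: float, duck_restore: float,
                       tick: float = 1.0) -> None:
    """Background task: refresh ps["duck_vol"] once per tick."""
    while True:
        ps["duck_vol"] = duck_volume(time.monotonic(), last_voice_ts(), ps["volume"],
                                     floor, duck_silence, duck_restore)
        await asyncio.sleep(tick)
```

Each 1-second update produces a small step in duck_vol, which the per-frame ramp in stream_audio then smooths.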

Bot audio isolation

Bot usernames (from registry._bots) are excluded from _on_sound_received entirely -- no timestamp update, no listener dispatch. This prevents self-ducking between derp and merlin.


Seek (in-stream pipeline swap)

Design

Seek rebuilds the ffmpeg pipeline at the new position without cancelling the play loop task. This avoids the overhead of re-downloading.

  1. Set _seek_fading = True, _seek_fade_out = 10 (0.2s ramp-down)
  2. Continue reading frames, scaling by decreasing ratio
  3. At fade-out = 0: kill ffmpeg, clear buffer, spawn new pipeline
  4. 0.5s fade-in on the new pipeline
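The frame-counting seek can be laid out as a schedule of gains. seek_schedule is hypothetical, and it assumes the 0.5s fade-in is also a linear, frame-counted ramp (25 frames of 20ms). The text above does not say how the fade-in is implemented, so treat that part as an assumption.

```python
from typing import Iterator, Optional, Tuple

FADE_OUT_FRAMES = 10  # 10 x 20 ms = 0.2 s ramp-down on the old pipeline
FADE_IN_FRAMES = 25   # 25 x 20 ms = 0.5 s ramp-up on the new pipeline


def seek_schedule(fade_out: int = FADE_OUT_FRAMES,
                  fade_in: int = FADE_IN_FRAMES) -> Iterator[Tuple[str, Optional[float]]]:
    """Yield (source, gain) steps for one in-stream seek."""
    for remaining in range(fade_out - 1, -1, -1):
        yield ("old", remaining / fade_out)  # keep reading, decreasing ratio
    yield ("swap", None)  # counter hit 0: kill ffmpeg, clear buffer, respawn at new position
    for i in range(1, fade_in + 1):
        yield ("new", i / fade_in)  # fade in frames from the new ffmpeg
```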

Consolidation note

Seek fade-out (10 frames / 0.2s) is much shorter than skip/stop fade-out (3s). This is intentional -- seek should feel responsive. The mechanisms are separate: seek uses frame-counting in stream_audio, skip/stop uses _fade_and_cancel in the plugin.


Consolidation Opportunities

Volume control unification

Three volume layers (fade_vol, duck_vol, volume) are evaluated per frame in a lambda. This works, but the priority logic is implicit. A future refactor could use a single effective_volume() method that explicitly resolves priority and makes the per-frame cost clearer.
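One possible shape for that refactor is sketched below. VolumeState and effective_volume() are hypothetical and do not exist in the codebase. The sketch keeps the 0-100 storage scale and the explicit None checks of the current lambda, so a fade target of 0 still overrides an active duck.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeState:
    volume: float = 50.0              # user-set level, 0-100
    duck_vol: Optional[float] = None  # set by the duck monitor
    fade_vol: Optional[float] = None  # set during skip/stop/pause fade-out

    def effective_volume(self) -> float:
        """Resolve layers explicitly (fade > duck > user) and return 0..1."""
        for layer in (self.fade_vol, self.duck_vol):
            if layer is not None:
                return layer / 100.0
        return self.volume / 100.0
```

The play loop could then pass the bound method itself, e.g. stream_audio(..., volume=state.effective_volume), in place of the lambda.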

Fade-out ownership

Skip/stop/pause all route through _fade_and_cancel() -- good. But the fade target is communicated indirectly via ps["fade_vol"] = 0 and ps["fade_step"], read by a lambda in the play loop, evaluated in stream_audio. A more explicit signal (e.g. an asyncio.Event or a dedicated fade state machine in stream_audio) could simplify reasoning about timing.

Buffer drain timing

The 150ms post-fade drain is empirical. A more robust approach would be to query sound_output.get_buffer_size() and wait for it to drop below a threshold before cancelling. This would adapt to varying network conditions and pymumble buffer sizes.
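An adaptive drain could poll the buffer with a timeout as a fallback. wait_for_drain is hypothetical. It assumes sound_output.get_buffer_size() returns the amount of unsent audio in seconds. The threshold, timeout and poll interval are illustrative values, not tuned ones. ScriptedOutput is a test double only.

```python
import asyncio


async def wait_for_drain(sound_output, threshold: float = 0.02,
                         timeout: float = 1.0, poll: float = 0.01) -> bool:
    """Wait until less than `threshold` seconds of audio remain queued.
    Returns False if `timeout` expires first."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while sound_output.get_buffer_size() >= threshold:
        if loop.time() >= deadline:
            return False  # give up; the caller cancels anyway
        await asyncio.sleep(poll)
    return True


class ScriptedOutput:
    """Test double: reports scripted buffer sizes, repeating the last one."""

    def __init__(self, sizes) -> None:
        self._sizes = list(sizes)

    def get_buffer_size(self) -> float:
        if len(self._sizes) > 1:
            return self._sizes.pop(0)
        return self._sizes[0]
```

In _fade_and_cancel this would replace the fixed 150ms sleep, and the timeout would bound the worst case.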

Track duration

Duration is probed via ffprobe after download (blocking, run in executor). For kept tracks, it's stored in state metadata. This is duplicated -- kept track metadata already has duration from _fetch_metadata (yt-dlp). The ffprobe path is the fallback for non-kept tracks. Could unify by always probing locally.
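A local probe of the kind described could run ffprobe in the default executor so the event loop is not blocked. ffprobe_duration and probe_duration are hypothetical names. The ffprobe flags are standard ones that print only the container duration. The runner is injectable so the parsing can be tested without ffprobe installed. A missing binary, a failure, or a non-numeric output such as "N/A" yields None.

```python
import asyncio
import subprocess
from typing import Optional


def ffprobe_duration(path: str, run=subprocess.run) -> Optional[float]:
    """Return the container duration in seconds, or None if it can't be read."""
    cmd = ["ffprobe", "-v", "error", "-show_entries", "format=duration",
           "-of", "default=noprint_wrappers=1:nokey=1", path]
    try:
        res = run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if res.returncode != 0:
        return None
    try:
        return float(res.stdout.strip())
    except ValueError:
        return None


async def probe_duration(path: str, run=subprocess.run) -> Optional[float]:
    """Run the blocking probe in the default thread-pool executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, ffprobe_duration, path, run)
```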

Periodic resume save interval

Currently 10s fixed. Could be adaptive -- save more frequently near the start of a track (where losing position is more noticeable) and less frequently later. Marginal benefit vs. complexity though.