# Audio Engine -- Issues, Fixes, and Consolidation Notes

Technical reference for the Mumble audio pipeline: known issues, applied fixes, architectural decisions, and areas for future work.

## Architecture Overview

```
yt-dlp -> ffmpeg (decode to s16le 48kHz mono) -> PCM frames (20ms) -> volume ramp/scale -> pymumble sound_output -> Opus encode -> Mumble
```

Key components:

| File | Role |
|------|------|
| `src/derp/mumble.py` | `stream_audio()` -- PCM feed loop, volume ramp, seek |
| `plugins/music.py` | Queue, play loop, fade orchestration, duck monitor |

### Volume control layers (evaluated per-frame, highest priority first)

1. **fade_vol** -- active during fade-out (skip/stop/pause); set to 0 as target
2. **duck_vol** -- voice-activated ducking; snap to floor, linear restore
3. **volume** -- user-set level (0-100)

The play loop passes a lambda to `stream_audio`:

```python
volume=lambda: (
    ps["fade_vol"] if ps["fade_vol"] is not None else
    ps["duck_vol"] if ps["duck_vol"] is not None else
    ps["volume"]
) / 100.0
```

### Per-frame volume ramping

`stream_audio` never jumps to the target volume. Each 20ms frame is ramped from `_cur_vol` toward `target` by at most `step`:

- **_max_step** = 0.005 (~4s full ramp) -- ceiling for normal changes
- **fade_in_step** -- computed from fade-in duration (default 5s)
- **fade_step** -- override from plugin (fade-out on skip/stop/pause)

When `abs(diff) < 0.0001`, flat scaling is used (avoids ramp artifacts on steady-state frames). Otherwise, `_scale_pcm_ramp()` linearly interpolates across all 960 samples in the frame.

---

## Issues and Fixes

### 1. Alpine ffmpeg lacks librubberband

**Symptom:** 13/15 voice audition samples failed. `rubberband` audio filter unavailable in ffmpeg.

**Root cause:** Alpine's ffmpeg package is compiled without `--enable-librubberband`.

**Fix:** Added `rubberband` CLI package to `Containerfile`.
Created `_split_fx()` in `plugins/voice.py` to parse FX chains: pitch-shifting goes through the `rubberband` CLI binary, remaining filters (bass, echo) through ffmpeg. Two-stage pipeline.

**Files:** `Containerfile`, `plugins/voice.py`

---

### 2. Self-ducking between bots

**Symptom:** derp's music volume dropped when merlin spoke (TTS).

**Root cause:** merlin's TTS output triggered `_on_sound_received`, which updated the shared `registry._voice_ts` timestamp. derp's duck monitor saw recent voice activity and ducked.

**Fix:** `_on_sound_received` checks `registry._bots` and returns early for any bot username -- no timestamp update, no listener dispatch.

```python
def _on_sound_received(self, user, sound_chunk) -> None:
    name = user["name"] if isinstance(user, dict) else None
    bots = getattr(self.registry, "_bots", {})
    if name and name in bots:
        return  # ignore audio from bots entirely
```

**Files:** `src/derp/mumble.py`

---

### 3. Click/pop on skip/stop (fade-out cancellation)

**Symptom:** Audible glitch at the end of fade-out when skipping or stopping a track.

**Root cause:** `_fade_and_cancel()` fades volume to 0 over ~3s, then calls `task.cancel()`. In `stream_audio`, `CancelledError` triggers `clear_buffer()`, which drops any frames still queued in pymumble's output -- including frames that were encoded at non-zero amplitude a few frames earlier. The sudden buffer wipe produces a click.

**Fix (two-part):**

1. **Plugin side** (`music.py`): Added 150ms post-fade drain before cancel, giving pymumble time to flush remaining silent frames.
2. **Engine side** (`mumble.py`): `CancelledError` handler only calls `clear_buffer()` if `_cur_vol > 0.01`. When a fade-out has already driven volume to ~0, the remaining buffer frames are silent and clearing them is unnecessary.
```python
# mumble.py -- CancelledError handler
if _cur_vol > 0.01:
    self._mumble.sound_output.clear_buffer()
```

```python
# music.py -- _fade_and_cancel()
await asyncio.sleep(duration)
await asyncio.sleep(0.15)  # drain window
task.cancel()
```

**Files:** `src/derp/mumble.py`, `plugins/music.py`

---

### 4. Fade-out math

**How it works:** `_fade_and_cancel(duration=3.0)` computes the per-frame step from the current effective volume:

```python
cur_vol = (duck_vol or volume) / 100.0
n_frames = duration / 0.02  # 150 frames for 3s
step = cur_vol / n_frames
```

The play loop sets `ps["fade_vol"] = 0` (the target) and `ps["fade_step"] = step` (the rate). `stream_audio` ramps `_cur_vol` toward 0 at `step` per frame. At 50% volume: step = 0.0033, reaching zero in exactly 150 frames (3.0s).

**Note:** `fade_vol` is set to 0 immediately, making the volume lambda return 0 as the target. The ramp code smoothly transitions -- there is no abrupt jump because `_cur_vol` tracks actual output level, not the target.

---

### 5. Self-mute lifecycle

**Requirement:** merlin mutes on connect, unmutes only when emitting audio (TTS), re-mutes after a delay.

**Implementation:**

```
connect              -> mute()
stream_audio start   -> cancel pending mute task, unmute()
stream_audio finally -> spawn _delayed_mute(3.0)
```

The 3-second delay prevents rapid mute/unmute flicker on back-to-back TTS. The mute task is cancelled if new audio starts before it fires.

**Config:** `self_mute = true` in `[[mumble.extra]]`

**Files:** `src/derp/mumble.py`

---

### 6. Self-deafen on connect

**Requirement:** merlin deafens on connect (no audio reception needed).

**Implementation:** `self_deaf = true` config flag, calls `self._mumble.users.myself.deafen()` in `_on_connected`.

**Files:** `src/derp/mumble.py`, `config/derp.toml`

---

## Pause/Resume

### Design

`!pause` toggles between paused and playing states:

**Pause:** Captures current track + elapsed position + monotonic timestamp. Fades out, cancels play loop.
Queue is preserved.

**Unpause:** Re-inserts track at queue front, starts play loop with seek. Three special behaviors:

1. **Rewind:** 3s rewind on unpause for continuity. Applied only if paused >= 3s, as an anti-flood guard: rapid toggling doesn't compound the rewind.
2. **Stale stream:** If paused > 45s, cached stream files (in `data/music/cache/`) are deleted so the play loop re-downloads. Kept files (`data/music/`) are never deleted. Stream URLs from YouTube et al. expire within minutes.
3. **Fade-in:** Unpause always uses `fade_in=True` (5s ramp from 0).

**State cleanup:** `!stop` clears `ps["paused"]`. The play loop's `finally` block skips `_cleanup_track` when paused (preserves the file).

---

## Autoplay

### Design

When `autoplay = true` (config), the play loop stays alive after the queue empties:

1. Waits for silence (duck_silence threshold, default 15s)
2. Picks one random kept track
3. Plays it
4. On completion, loops back to step 1

This replaces the previous bulk-queue approach (shuffle all kept tracks at once). Benefits: no large upfront queue, silence-aware gaps between tracks, indefinite looping.

### Resume persistence

A background task saves track URL + elapsed position to the state DB every 10 seconds during playback:

```python
async def _periodic_save():
    while True:
        await asyncio.sleep(10)
        # elapsed = seek offset + frames played * 20ms per frame
        el = cur_seek + progress[0] * 0.02
        if el > 1.0:
            _save_resume(bot, track, el)
```

On hard kill: resumes from at most ~10s behind. On normal track completion: `_clear_resume()` wipes the state.

---

## Voice Ducking

### Flow

```
voice detected -> duck_vol = floor (instant)
silence > duck_silence -> linear restore over duck_restore seconds
```

The duck monitor runs as a background task alongside the play loop. It updates `ps["duck_vol"]`, which the volume lambda reads per-frame.

### Restore ramp

Restoration is linear from floor to user volume. The per-frame ramp in `stream_audio` further smooths each 1-second update from the monitor, eliminating audible steps.
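
The interaction between the monitor's coarse linear restore and the per-frame ramp can be sketched as below. This is a minimal simulation, not the code in `mumble.py` or `music.py`: the names `restore_target`, `ramp_step`, and `simulate` are hypothetical, and the floor (20), user volume (50), and restore duration (30s) are illustrative values rather than documented defaults. Only the 20ms frame size, the 1-second monitor update, and the 0.005 step ceiling come from the notes above.

```python
FRAMES_PER_TICK = 50  # 1-second monitor update / 20 ms frames
MAX_STEP = 0.005      # per-frame ceiling, matching _max_step above


def restore_target(floor: float, user_vol: float,
                   elapsed: float, duck_restore: float) -> float:
    """Linear restore (0-100 scale), `elapsed` seconds into the restore."""
    if duck_restore <= 0 or elapsed >= duck_restore:
        return user_vol
    frac = max(0.0, elapsed / duck_restore)
    return floor + (user_vol - floor) * frac


def ramp_step(cur: float, target: float, max_step: float = MAX_STEP) -> float:
    """Move `cur` toward `target` by at most `max_step` (one frame)."""
    diff = target - cur
    if abs(diff) <= max_step:
        return target
    return cur + max_step if diff > 0 else cur - max_step


def simulate(floor: float = 20.0, user_vol: float = 50.0,
             duck_restore: float = 30.0) -> list[float]:
    """Return the per-frame output level (0.0-1.0) across a full restore."""
    cur = floor / 100.0
    levels = [cur]
    for second in range(int(duck_restore) + 1):
        # The monitor publishes a new target once per second...
        target = restore_target(floor, user_vol, float(second), duck_restore) / 100.0
        # ...and the feed loop approaches it frame by frame.
        for _ in range(FRAMES_PER_TICK):
            cur = ramp_step(cur, target)
            levels.append(cur)
    return levels


levels = simulate()
largest_jump = max(abs(b - a) for a, b in zip(levels, levels[1:]))
```

With these illustrative numbers the monitor raises the target by one volume point per second, and the per-frame clamp spreads each update over several frames, so no single frame changes level by more than the step ceiling.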
### Bot audio isolation

Bot usernames (from `registry._bots`) are excluded from `_on_sound_received` entirely -- no timestamp update, no listener dispatch. This prevents self-ducking between derp and merlin.

---

## Seek (in-stream pipeline swap)

### Design

Seek rebuilds the ffmpeg pipeline at the new position without cancelling the play loop task. This avoids the overhead of re-downloading.

1. Set `_seek_fading = True`, `_seek_fade_out = 10` (0.2s ramp-down)
2. Continue reading frames, scaling by decreasing ratio
3. At fade-out = 0: kill ffmpeg, clear buffer, spawn new pipeline
4. 0.5s fade-in on the new pipeline

### Consolidation note

Seek fade-out (10 frames / 0.2s) is much shorter than skip/stop fade-out (3s). This is intentional -- seek should feel responsive. The mechanisms are separate: seek uses frame-counting in `stream_audio`, skip/stop uses `_fade_and_cancel` in the plugin.

---

## Consolidation Opportunities

### Volume control unification

Three volume layers (fade_vol, duck_vol, volume) evaluated in a lambda per-frame. Works but the priority logic is implicit. A future refactor could use a single `effective_volume()` method that explicitly resolves priority and makes the per-frame cost clearer.

### Fade-out ownership

Skip/stop/pause all route through `_fade_and_cancel()` -- good. But the fade target is communicated indirectly via `ps["fade_vol"] = 0` and `ps["fade_step"]`, read by a lambda in the play loop, evaluated in `stream_audio`. A more explicit signal (e.g. an asyncio.Event or a dedicated fade state machine in `stream_audio`) could simplify reasoning about timing.

### Buffer drain timing

The 150ms post-fade drain is empirical. A more robust approach would be to query `sound_output.get_buffer_size()` and wait for it to drop below a threshold before cancelling. This would adapt to varying network conditions and pymumble buffer sizes.

### Track duration

Duration is probed via `ffprobe` after download (blocking, run in executor).
For kept tracks, it's stored in state metadata. This is duplicated -- kept track metadata already has duration from `_fetch_metadata` (yt-dlp). The `ffprobe` path is the fallback for non-kept tracks. Could unify by always probing locally.

### Periodic resume save interval

Currently 10s fixed. Could be adaptive -- save more frequently near the start of a track (where losing position is more noticeable) and less frequently later. Marginal benefit vs. complexity though.
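
For reference, the adaptive variant could be as small as the sketch below. It is only an illustration of the idea, not existing code: `save_interval` and `save_times` are hypothetical names, and the 60s early window and 2s early interval are assumed values chosen for the example. Only the 10s late interval matches the current fixed behavior.

```python
def save_interval(elapsed: float,
                  early_window: float = 60.0,
                  early_interval: float = 2.0,
                  late_interval: float = 10.0) -> float:
    """Seconds to wait before the next resume save.

    Saves often during the first `early_window` seconds of a track,
    then falls back to the current fixed 10 s cadence.
    """
    return early_interval if elapsed < early_window else late_interval


def save_times(duration: float) -> list[float]:
    """List the elapsed positions at which a track of `duration` seconds is saved."""
    t = 0.0
    times = []
    while True:
        t += save_interval(t)
        if t >= duration:
            break  # track finished; normal completion clears resume state
        times.append(t)
    return times


schedule = save_times(90.0)
```

In `_periodic_save`, the fixed `asyncio.sleep(10)` would become a sleep of `save_interval(el)`. For a 90-second track this roughly quadruples the number of writes, which is the complexity cost weighed above.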