feat: playlist shuffle, lazy resolution, TTS ducking, kept repair

Music:
- #random URL fragment shuffles playlist tracks before enqueuing
- Lazy playlist resolution: first 10 tracks resolve immediately,
  remaining are fetched in a background task
- !kept repair re-downloads kept tracks with missing local files
- !kept shows [MISSING] marker for tracks without local files
- TTS ducking: music ducks when merlin speaks via voice peer,
  smooth restore after TTS finishes

Performance (from profiling):
- Connection pool: preload_content=True for SOCKS connection reuse
- Pool tuning: 30 pools / 8 connections (up from 20/4)
- _PooledResponse wrapper for stdlib-compatible read interface
- Iterative _extract_videos (replace 51K-deep recursion with stack)
- proxy=False for local SearXNG

Voice + multi-bot:
- Per-bot voice config lookup ([<username>.voice] in TOML)
- Mute detection: skip duck silence when all users muted
- Autoplay shuffle deck (no repeats until full cycle)
- Seek clamp to track duration (prevent seek-past-end stall)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 16:21:47 +01:00
parent 6d6b957557
commit 6083de13f9
17 changed files with 1706 additions and 118 deletions
--- a/docs/AUDIO.md
+++ b/docs/AUDIO.md
@@ -0,0 +1,333 @@
+# Audio Engine -- Issues, Fixes, and Consolidation Notes
+
+Technical reference for the Mumble audio pipeline: known issues,
+applied fixes, architectural decisions, and areas for future work.
+
+## Architecture Overview
+
+```
+yt-dlp -> ffmpeg (decode to s16le 48kHz mono) -> PCM frames (20ms)
+    -> volume ramp/scale -> pymumble sound_output -> Opus encode -> Mumble
+```
+
+Key components:
+
+| File | Role |
+|------|------|
+| `src/derp/mumble.py` | `stream_audio()` -- PCM feed loop, volume ramp, seek |
+| `plugins/music.py` | Queue, play loop, fade orchestration, duck monitor |
+
+### Volume control layers (evaluated per-frame, highest priority first)
+
+1. **fade_vol** -- active during fade-out (skip/stop/pause); set to 0 as target
+2. **duck_vol** -- voice-activated ducking; snap to floor, linear restore
+3. **volume** -- user-set level (0-100)
+
+The play loop passes a lambda to `stream_audio`:
+
+```python
+volume=lambda: (
+    ps["fade_vol"]   if ps["fade_vol"] is not None else
+    ps["duck_vol"]   if ps["duck_vol"] is not None else
+    ps["volume"]
+) / 100.0
+```
+
+### Per-frame volume ramping
+
+`stream_audio` never jumps to the target volume. Each 20ms frame is
+ramped from `_cur_vol` toward `target` by at most `step`:
+
+- **_max_step** = 0.005 (~4s full ramp) -- ceiling for normal changes
+- **fade_in_step** -- computed from fade-in duration (default 5s)
+- **fade_step** -- override from plugin (fade-out on skip/stop/pause)
+
+When `abs(diff) < 0.0001`, flat scaling is used (avoids ramp artifacts
+on steady-state frames). Otherwise, `_scale_pcm_ramp()` linearly
+interpolates across all 960 samples in the frame.
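
The ramp described above can be sketched roughly as follows. This is an illustration, not the code in `src/derp/mumble.py`: `next_volume` and `scale_pcm_ramp` are hypothetical stand-ins for the step clamp and for `_scale_pcm_ramp()`, and the sketch assumes a little-endian host so that native 16-bit samples match s16le.

```python
import array

FRAME_SAMPLES = 960  # 20 ms of 48 kHz mono audio


def next_volume(cur: float, target: float, step: float) -> float:
    """Move cur toward target by at most step (one frame's worth)."""
    diff = target - cur
    if abs(diff) <= step:
        return target
    return cur + step if diff > 0 else cur - step


def scale_pcm_ramp(pcm: bytes, vol_start: float, vol_end: float) -> bytes:
    """Scale 16-bit samples with a gain interpolated linearly across the frame."""
    samples = array.array("h", pcm)  # native byte order; assumed little-endian
    n = len(samples)
    for i in range(n):
        gain = vol_start + (vol_end - vol_start) * (i / n)
        scaled = int(samples[i] * gain)
        samples[i] = max(-32768, min(32767, scaled))  # clip to the int16 range
    return samples.tobytes()
```

Per frame, the feed loop would compute `new = next_volume(cur, target, step)`, scale the frame from `cur` to `new`, then carry `new` into the next frame.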
+
+---
+
+## Issues and Fixes
+
+### 1. Alpine ffmpeg lacks librubberband
+
+**Symptom:** 13/15 voice audition samples failed. `rubberband` audio
+filter unavailable in ffmpeg.
+
+**Root cause:** Alpine's ffmpeg package is compiled without
+`--enable-librubberband`.
+
+**Fix:** Added `rubberband` CLI package to `Containerfile`. Created
+`_split_fx()` in `plugins/voice.py` to parse FX chains: pitch-shifting
+goes through the `rubberband` CLI binary, remaining filters (bass, echo)
+through ffmpeg. Two-stage pipeline.
+
+**Files:** `Containerfile`, `plugins/voice.py`
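
A minimal sketch of the splitting idea, assuming the FX chain is a comma-separated, ffmpeg-style filter string. The real chain format and the signature of `_split_fx()` are not shown in this document, so `split_fx` and the example chain are hypothetical.

```python
def split_fx(chain: str) -> tuple[list[str], list[str]]:
    """Partition a filter chain into rubberband entries and everything else.

    Simplification: splits on every comma, so filter arguments that
    contain escaped commas are not handled.
    """
    rubber: list[str] = []
    rest: list[str] = []
    for part in (p.strip() for p in chain.split(",")):
        if not part:
            continue
        if part.startswith("rubberband"):
            rubber.append(part)  # stage 1: handled by the rubberband CLI
        else:
            rest.append(part)    # stage 2: passed to ffmpeg as-is
    return rubber, rest
```

For example, `split_fx("rubberband=pitch=0.8,bass=g=6")` separates the pitch entry from the `bass` filter, so the two stages can be run one after the other.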
+
+---
+
+### 2. Self-ducking between bots
+
+**Symptom:** derp's music volume dropped when merlin spoke (TTS).
+
+**Root cause:** merlin's TTS output triggered `_on_sound_received`,
+which updated the shared `registry._voice_ts` timestamp. derp's duck
+monitor saw recent voice activity and ducked.
+
+**Fix:** `_on_sound_received` checks `registry._bots` and returns early
+for any bot username -- no timestamp update, no listener dispatch.
+
+```python
+def _on_sound_received(self, user, sound_chunk) -> None:
+    name = user["name"] if isinstance(user, dict) else None
+    bots = getattr(self.registry, "_bots", {})
+    if name and name in bots:
+        return  # ignore audio from bots entirely
+```
+
+**Files:** `src/derp/mumble.py`
+
+---
+
+### 3. Click/pop on skip/stop (fade-out cancellation)
+
+**Symptom:** Audible glitch at the end of fade-out when skipping or
+stopping a track.
+
+**Root cause:** `_fade_and_cancel()` fades volume to 0 over ~3s, then
+calls `task.cancel()`. In `stream_audio`, `CancelledError` triggers
+`clear_buffer()`, which drops any frames still queued in pymumble's
+output -- including frames that were encoded at non-zero amplitude a
+few frames earlier. The sudden buffer wipe produces a click.
+
+**Fix (two-part):**
+
+1. **Plugin side** (`music.py`): Added 150ms post-fade drain before
+   cancel, giving pymumble time to flush remaining silent frames.
+
+2. **Engine side** (`mumble.py`): `CancelledError` handler only calls
+   `clear_buffer()` if `_cur_vol > 0.01`. When a fade-out has already
+   driven volume to ~0, the remaining buffer frames are silent and
+   clearing them is unnecessary.
+
+```python
+# mumble.py -- CancelledError handler
+if _cur_vol > 0.01:
+    self._mumble.sound_output.clear_buffer()
+```
+
+```python
+# music.py -- _fade_and_cancel()
+await asyncio.sleep(duration)
+await asyncio.sleep(0.15)  # drain window
+task.cancel()
+```
+
+**Files:** `src/derp/mumble.py`, `plugins/music.py`
+
+---
+
+### 4. Fade-out math
+
+**How it works:** `_fade_and_cancel(duration=3.0)` computes the
+per-frame step from the current effective volume:
+
+```python
+# explicit None check so a duck_vol of 0 is not mistaken for "unset"
+cur_vol = (duck_vol if duck_vol is not None else volume) / 100.0
+n_frames = duration / 0.02  # 150 frames for 3s
+step = cur_vol / n_frames
+```
+
+The play loop sets `ps["fade_vol"] = 0` (the target) and
+`ps["fade_step"] = step` (the rate). `stream_audio` ramps `_cur_vol`
+toward 0 at `step` per frame. At 50% volume: step = 0.0033, reaching
+zero in exactly 150 frames (3.0s).
+
+**Note:** `fade_vol` is set to 0 immediately, making the volume lambda
+return 0 as the target. The ramp code smoothly transitions -- there is
+no abrupt jump because `_cur_vol` tracks actual output level, not the
+target.
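
The arithmetic above can be checked with a short simulation. The helper names are illustrative only. The small epsilon absorbs floating-point residue left over from repeated subtraction.

```python
FRAME_SEC = 0.02  # one 20 ms frame


def fade_step(cur_vol: float, duration: float) -> float:
    """Per-frame decrement that reaches zero after `duration` seconds."""
    n_frames = duration / FRAME_SEC
    return cur_vol / n_frames


def frames_to_silence(cur_vol: float, step: float, eps: float = 1e-9) -> int:
    """Count frames until the ramped volume is effectively zero."""
    frames = 0
    while cur_vol > eps:
        cur_vol = max(0.0, cur_vol - step)
        frames += 1
    return frames


step = fade_step(0.5, 3.0)            # 50% volume, 3 s fade
frames = frames_to_silence(0.5, step)
```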
+
+---
+
+### 5. Self-mute lifecycle
+
+**Requirement:** merlin mutes on connect, unmutes only when emitting
+audio (TTS), re-mutes after a delay.
+
+**Implementation:**
+
+```
+connect -> mute()
+stream_audio start -> cancel pending mute task, unmute()
+stream_audio finally -> spawn _delayed_mute(3.0)
+```
+
+The 3-second delay prevents rapid mute/unmute flicker on back-to-back
+TTS. The mute task is cancelled if new audio starts before it fires.
+
+**Config:** `self_mute = true` in `[[mumble.extra]]`
+
+**Files:** `src/derp/mumble.py`
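
The lifecycle can be sketched with a cancellable asyncio task. `MuteController` is a hypothetical stand-in: it flips a boolean rather than calling into pymumble, and it is not the class used in `src/derp/mumble.py`.

```python
import asyncio


class MuteController:
    """Tracks mute state with a delayed re-mute that new audio can cancel."""

    def __init__(self, delay: float = 3.0) -> None:
        self.delay = delay
        self.muted = True  # muted on connect
        self._task: asyncio.Task | None = None

    def on_audio_start(self) -> None:
        # Cancel any pending re-mute, then unmute for playback.
        if self._task is not None:
            self._task.cancel()
            self._task = None
        self.muted = False

    def on_audio_end(self) -> None:
        # Schedule the re-mute; must be called from a running event loop.
        self._task = asyncio.get_running_loop().create_task(self._delayed_mute())

    async def _delayed_mute(self) -> None:
        await asyncio.sleep(self.delay)
        self.muted = True
```

Back-to-back TTS calls `on_audio_start()` before the delay elapses, so the pending task is cancelled and the bot stays unmuted.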
+
+---
+
+### 6. Self-deafen on connect
+
+**Requirement:** merlin deafens on connect (no audio reception needed).
+
+**Implementation:** `self_deaf = true` config flag, calls
+`self._mumble.users.myself.deafen()` in `_on_connected`.
+
+**Files:** `src/derp/mumble.py`, `config/derp.toml`
+
+---
+
+## Pause/Resume
+
+### Design
+
+`!pause` toggles between paused and playing states:
+
+**Pause:** Captures current track + elapsed position + monotonic
+timestamp. Fades out, cancels play loop. Queue is preserved.
+
+**Unpause:** Re-inserts track at queue front, starts play loop with
+seek. Three special behaviors:
+
+1. **Rewind:** 3s rewind on unpause for continuity. Applied only if
+   paused >= 3s, as a flood guard: rapid toggling doesn't compound
+   the rewind.
+
+2. **Stale stream:** Stream URLs from YouTube et al. expire within
+   minutes, so if paused > 45s, cached stream files (in
+   `data/music/cache/`) are deleted and the play loop re-downloads.
+   Kept files (`data/music/`) are never deleted.
+
+3. **Fade-in:** Unpause always uses `fade_in=True` (5s ramp from 0).
+
+**State cleanup:** `!stop` clears `ps["paused"]`. The play loop's
+`finally` block skips `_cleanup_track` when paused (preserves the file).
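
The unpause decisions reduce to two thresholds. A minimal sketch, assuming elapsed position and pause length are both available in seconds. `plan_unpause` and the constant names are hypothetical.

```python
REWIND_SEC = 3.0        # rewind applied on unpause
STALE_AFTER_SEC = 45.0  # pause length after which cached streams are dropped


def plan_unpause(elapsed: float, paused_for: float) -> tuple[float, bool]:
    """Return (seek position, whether cached stream files should be deleted)."""
    # Rewind only after a real pause, so rapid toggling cannot walk backwards.
    rewind = REWIND_SEC if paused_for >= REWIND_SEC else 0.0
    seek = max(0.0, elapsed - rewind)
    stale = paused_for > STALE_AFTER_SEC
    return seek, stale
```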
+
+---
+
+## Autoplay
+
+### Design
+
+When `autoplay = true` (config), the play loop stays alive after the
+queue empties:
+
+1. Waits for silence (duck_silence threshold, default 15s)
+2. Picks one random kept track
+3. Plays it
+4. On completion, loops back to step 1
+
+This replaces the previous bulk-queue approach (shuffle all kept tracks
+at once). Benefits: no large upfront queue, silence-aware gaps between
+tracks, indefinite looping.
+
+### Resume persistence
+
+A background task saves track URL + elapsed position to the state DB
+every 10 seconds during playback:
+
+```python
+async def _periodic_save():
+    while True:
+        await asyncio.sleep(10)
+        el = cur_seek + progress[0] * 0.02
+        if el > 1.0:
+            _save_resume(bot, track, el)
+```
+
+On hard kill: resumes from at most ~10s behind. On normal track
+completion: `_clear_resume()` wipes the state.
+
+---
+
+## Voice Ducking
+
+### Flow
+
+```
+voice detected -> duck_vol = floor (instant)
+silence > duck_silence -> linear restore over duck_restore seconds
+```
+
+The duck monitor runs as a background task alongside the play loop.
+It updates `ps["duck_vol"]` which the volume lambda reads per-frame.
+
+### Restore ramp
+
+Restoration is linear from floor to user volume. The per-frame ramp in
+`stream_audio` further smooths each 1-second update from the monitor,
+eliminating audible steps.
+
+### Bot audio isolation
+
+Bot usernames (from `registry._bots`) are excluded from
+`_on_sound_received` entirely -- no timestamp update, no listener
+dispatch. This prevents self-ducking between derp and merlin.
+
+---
+
+## Seek (in-stream pipeline swap)
+
+### Design
+
+Seek rebuilds the ffmpeg pipeline at the new position without cancelling
+the play loop task. This avoids the overhead of re-downloading.
+
+1. Set `_seek_fading = True`, `_seek_fade_out = 10` (0.2s ramp-down)
+2. Continue reading frames, scaling by decreasing ratio
+3. At fade-out = 0: kill ffmpeg, clear buffer, spawn new pipeline
+4. 0.5s fade-in on the new pipeline
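
The document does not give the exact curve for the 10-frame ramp-down in steps 1-3. Assuming a linear one, the per-frame scale factors could look like this (`seek_fade_ratios` is a hypothetical helper):

```python
SEEK_FADE_FRAMES = 10  # 10 frames x 20 ms = 0.2 s


def seek_fade_ratios(n: int = SEEK_FADE_FRAMES) -> list[float]:
    """Linear ramp-down: one scale factor per remaining frame, ending at 0."""
    return [(n - 1 - i) / n for i in range(n)]


ratios = seek_fade_ratios()
```

Each frame read during the fade is multiplied by the next ratio. When the list is exhausted, the old ffmpeg process can be replaced.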
+
+### Consolidation note
+
+Seek fade-out (10 frames / 0.2s) is much shorter than skip/stop
+fade-out (3s). This is intentional -- seek should feel responsive.
+The mechanisms are separate: seek uses frame-counting in
+`stream_audio`, skip/stop uses `_fade_and_cancel` in the plugin.
+
+---
+
+## Consolidation Opportunities
+
+### Volume control unification
+
+Three volume layers (fade_vol, duck_vol, volume) evaluated in a lambda
+per-frame. Works but the priority logic is implicit. A future refactor
+could use a single `effective_volume()` method that explicitly resolves
+priority and makes the per-frame cost clearer.
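
One possible shape for that helper, written as a plain function over the `ps` dict. This is a sketch of the suggested refactor, not existing code.

```python
def effective_volume(ps: dict) -> float:
    """Resolve the per-frame target volume (0.0-1.0) by explicit priority."""
    # Highest priority first: fade-out target, then ducking, then user level.
    for key in ("fade_vol", "duck_vol"):
        value = ps.get(key)
        if value is not None:  # 0 is a valid target, so compare against None
            return value / 100.0
    return ps["volume"] / 100.0
```

The play loop could then pass `volume=lambda: effective_volume(ps)` and keep the priority order in one documented place.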
+
+### Fade-out ownership
+
+Skip/stop/pause all route through `_fade_and_cancel()` -- good. But the
+fade target is communicated indirectly via `ps["fade_vol"] = 0` and
+`ps["fade_step"]`, read by a lambda in the play loop, evaluated in
+`stream_audio`. A more explicit signal (e.g. an asyncio.Event or a
+dedicated fade state machine in `stream_audio`) could simplify reasoning
+about timing.
+
+### Buffer drain timing
+
+The 150ms post-fade drain is empirical. A more robust approach would be
+to query `sound_output.get_buffer_size()` and wait for it to drop below
+a threshold before cancelling. This would adapt to varying network
+conditions and pymumble buffer sizes.
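
A hedged sketch of that approach, assuming `get_buffer_size()` reports the queued audio in seconds. The threshold, timeout and poll interval are illustrative values, and `wait_for_drain` is a hypothetical helper.

```python
import asyncio


async def wait_for_drain(
    sound_output,
    threshold: float = 0.02,  # seconds of audio still queued (assumed unit)
    timeout: float = 0.5,
    poll: float = 0.01,
) -> bool:
    """Poll the output buffer until it is nearly empty or the timeout passes."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if sound_output.get_buffer_size() <= threshold:
            return True   # drained: safe to cancel the play task
        await asyncio.sleep(poll)
    return False          # timed out: caller decides whether to cancel anyway
```

In `_fade_and_cancel()`, this would replace the fixed `asyncio.sleep(0.15)`, with the timeout acting as an upper bound on the wait.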
+
+### Track duration
+
+Duration is probed via `ffprobe` after download (blocking, run in
+executor). For kept tracks, it's stored in state metadata. This is
+duplicated -- kept track metadata already has duration from
+`_fetch_metadata` (yt-dlp). The `ffprobe` path is the fallback for
+non-kept tracks. Could unify by always probing locally.
+
+### Periodic resume save interval
+
+Currently 10s fixed. Could be adaptive -- save more frequently near
+the start of a track (where losing position is more noticeable) and
+less frequently later. Marginal benefit vs. complexity though.