derp/docs/AUDIO.md
user 6083de13f9
feat: playlist shuffle, lazy resolution, TTS ducking, kept repair
Music:
- #random URL fragment shuffles playlist tracks before enqueuing
- Lazy playlist resolution: first 10 tracks resolve immediately,
  remaining are fetched in a background task
- !kept repair re-downloads kept tracks with missing local files
- !kept shows [MISSING] marker for tracks without local files
- TTS ducking: music ducks when merlin speaks via voice peer,
  smooth restore after TTS finishes

Performance (from profiling):
- Connection pool: preload_content=True for SOCKS connection reuse
- Pool tuning: 30 pools / 8 connections (up from 20/4)
- _PooledResponse wrapper for stdlib-compatible read interface
- Iterative _extract_videos (replace 51K-deep recursion with stack)
- proxy=False for local SearXNG

Voice + multi-bot:
- Per-bot voice config lookup ([<username>.voice] in TOML)
- Mute detection: skip duck silence when all users muted
- Autoplay shuffle deck (no repeats until full cycle)
- Seek clamp to track duration (prevent seek-past-end stall)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 16:21:47 +01:00


# Audio Engine -- Issues, Fixes, and Consolidation Notes
Technical reference for the Mumble audio pipeline: known issues,
applied fixes, architectural decisions, and areas for future work.
## Architecture Overview
```
yt-dlp -> ffmpeg (decode to s16le 48kHz mono) -> PCM frames (20ms)
-> volume ramp/scale -> pymumble sound_output -> Opus encode -> Mumble
```
Key components:
| File | Role |
|------|------|
| `src/derp/mumble.py` | `stream_audio()` -- PCM feed loop, volume ramp, seek |
| `plugins/music.py` | Queue, play loop, fade orchestration, duck monitor |
### Volume control layers (evaluated per-frame, highest priority first)
1. **fade_vol** -- active during fade-out (skip/stop/pause); set to 0 as target
2. **duck_vol** -- voice-activated ducking; snap to floor, linear restore
3. **volume** -- user-set level (0-100)

The play loop passes a lambda to `stream_audio`:
```python
volume=lambda: (
    ps["fade_vol"] if ps["fade_vol"] is not None else
    ps["duck_vol"] if ps["duck_vol"] is not None else
    ps["volume"]
) / 100.0
```
### Per-frame volume ramping
`stream_audio` never jumps to the target volume. Each 20ms frame is
ramped from `_cur_vol` toward `target` by at most `step`:
- **_max_step** = 0.005 (~4s full ramp) -- ceiling for normal changes
- **fade_in_step** -- computed from fade-in duration (default 5s)
- **fade_step** -- override from plugin (fade-out on skip/stop/pause)

When `abs(diff) < 0.0001`, flat scaling is used (avoids ramp artifacts
on steady-state frames). Otherwise, `_scale_pcm_ramp()` linearly
interpolates across all 960 samples in the frame.
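
The ramp described above can be sketched as follows. This is a minimal illustration, not the code in `mumble.py`: `next_volume` and `scale_pcm_ramp` are hypothetical names, and the sketch assumes the gain is interpolated from the previous frame's volume to the new one across the s16le samples of a single frame.

```python
import struct


def next_volume(cur: float, target: float, step: float) -> float:
    """Move cur toward target by at most step (one call per 20 ms frame)."""
    diff = target - cur
    if abs(diff) <= step:
        return target
    return cur + step if diff > 0 else cur - step


def scale_pcm_ramp(pcm: bytes, vol_start: float, vol_end: float) -> bytes:
    """Scale s16le samples with a gain that moves linearly across the frame."""
    n = len(pcm) // 2
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    span = vol_end - vol_start
    out = []
    for i, sample in enumerate(samples):
        gain = vol_start + span * i / n
        # Clamp to the signed 16-bit range in case gain exceeds 1.0.
        out.append(max(-32768, min(32767, int(sample * gain))))
    return struct.pack(f"<{n}h", *out)
```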

---

## Issues and Fixes
### 1. Alpine ffmpeg lacks librubberband
**Symptom:** 13 of 15 voice audition samples failed because the
`rubberband` audio filter was unavailable in ffmpeg.

**Root cause:** Alpine's ffmpeg package is compiled without
`--enable-librubberband`.

**Fix:** Added the `rubberband` CLI package to `Containerfile` and created
`_split_fx()` in `plugins/voice.py` to parse FX chains into a two-stage
pipeline: pitch-shifting goes through the `rubberband` CLI binary, and
the remaining filters (bass, echo) go through ffmpeg.

**Files:** `Containerfile`, `plugins/voice.py`
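
The split itself might look roughly like the sketch below. The real `_split_fx()` is not shown in this document, so everything here is an assumption: the function name `split_fx`, the comma-separated ffmpeg-style chain syntax, and the convention that pitch-shifting entries are named `rubberband`. The naive comma split would also need extra care for filters whose arguments contain escaped commas.

```python
def split_fx(chain: str) -> tuple[list[str], list[str]]:
    """Partition a filter chain into (rubberband stage, ffmpeg stage).

    Hypothetical helper: assumes entries are separated by commas and that
    each entry starts with the filter name followed by '='.
    """
    rubberband: list[str] = []
    ffmpeg: list[str] = []
    for item in (part.strip() for part in chain.split(",")):
        if not item:
            continue
        name = item.split("=", 1)[0]
        (rubberband if name == "rubberband" else ffmpeg).append(item)
    return rubberband, ffmpeg
```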

---

### 2. Self-ducking between bots
**Symptom:** derp's music volume dropped when merlin spoke (TTS).

**Root cause:** merlin's TTS output triggered `_on_sound_received`,
which updated the shared `registry._voice_ts` timestamp. derp's duck
monitor saw recent voice activity and ducked.

**Fix:** `_on_sound_received` checks `registry._bots` and returns early
for any bot username -- no timestamp update, no listener dispatch.
```python
def _on_sound_received(self, user, sound_chunk) -> None:
    name = user["name"] if isinstance(user, dict) else None
    bots = getattr(self.registry, "_bots", {})
    if name and name in bots:
        return  # ignore audio from bots entirely
```
**Files:** `src/derp/mumble.py`

---

### 3. Click/pop on skip/stop (fade-out cancellation)
**Symptom:** Audible glitch at the end of fade-out when skipping or
stopping a track.

**Root cause:** `_fade_and_cancel()` fades volume to 0 over ~3s, then
calls `task.cancel()`. In `stream_audio`, `CancelledError` triggers
`clear_buffer()`, which drops any frames still queued in pymumble's
output -- including frames that were encoded at non-zero amplitude a
few frames earlier. The sudden buffer wipe produces a click.

**Fix (two-part):**

1. **Plugin side** (`music.py`): Added 150ms post-fade drain before
   cancel, giving pymumble time to flush remaining silent frames.
2. **Engine side** (`mumble.py`): `CancelledError` handler only calls
   `clear_buffer()` if `_cur_vol > 0.01`. When a fade-out has already
   driven volume to ~0, the remaining buffer frames are silent and
   clearing them is unnecessary.

```python
# mumble.py -- CancelledError handler
if _cur_vol > 0.01:
    self._mumble.sound_output.clear_buffer()
```
```python
# music.py -- _fade_and_cancel()
await asyncio.sleep(duration)
await asyncio.sleep(0.15) # drain window
task.cancel()
```
**Files:** `src/derp/mumble.py`, `plugins/music.py`

---

### 4. Fade-out math
**How it works:** `_fade_and_cancel(duration=3.0)` computes the
per-frame step from the current effective volume:
```python
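# Effective volume as a 0.0-1.0 fraction. Note: `or` also falls back to
# `volume` when duck_vol is 0, not only when it is None.
cur_vol = (duck_vol or volume) / 100.0
n_frames = duration / 0.02  # 20ms frames: 150 frames for 3s
step = cur_vol / n_frames   # per-frame decrement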
```
The play loop sets `ps["fade_vol"] = 0` (the target) and
`ps["fade_step"] = step` (the rate). `stream_audio` ramps `_cur_vol`
toward 0 at `step` per frame. At 50% volume: step = 0.0033, reaching
zero in exactly 150 frames (3.0s).

**Note:** `fade_vol` is set to 0 immediately, making the volume lambda
return 0 as the target. The ramp code still transitions smoothly -- there
is no abrupt jump because `_cur_vol` tracks actual output level, not the
target.
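
The arithmetic can be checked with a small simulation. This is a sketch under stated assumptions: `fade_step` and `frames_to_silence` are hypothetical helpers, and the frame count is rounded to an integer to sidestep floating-point noise in `duration / 0.02`. Because the step is proportional to the starting volume, the fade takes the same number of frames regardless of how loud playback was.

```python
FRAME_S = 0.02  # one 20 ms frame


def fade_step(volume_pct: float, duration: float) -> float:
    """Per-frame decrement that reaches zero after `duration` seconds."""
    n_frames = round(duration / FRAME_S)
    return (volume_pct / 100.0) / n_frames


def frames_to_silence(volume_pct: float, duration: float) -> int:
    """Count frames until a ramp starting at volume_pct reaches ~0."""
    step = fade_step(volume_pct, duration)
    cur = volume_pct / 100.0
    frames = 0
    while cur > 1e-9:  # tolerance absorbs floating-point residue
        cur = max(0.0, cur - step)
        frames += 1
    return frames
```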

---

### 5. Self-mute lifecycle
**Requirement:** merlin mutes on connect, unmutes only when emitting
audio (TTS), and re-mutes after a delay.

**Implementation:**
```
connect -> mute()
stream_audio start -> cancel pending mute task, unmute()
stream_audio finally -> spawn _delayed_mute(3.0)
```
The 3-second delay prevents rapid mute/unmute flicker on back-to-back
TTS. The mute task is cancelled if new audio starts before it fires.

**Config:** `self_mute = true` in `[[mumble.extra]]`

**Files:** `src/derp/mumble.py`
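
A minimal asyncio sketch of this lifecycle is shown below. It is illustrative only: `MuteLifecycle` and its method names are hypothetical, and a boolean flag stands in for the real mute/unmute calls on the Mumble connection. The `demo()` coroutine uses a shortened delay to show that back-to-back audio cancels the pending mute.

```python
import asyncio
from typing import Optional


class MuteLifecycle:
    """Tracks mute state; a flag stands in for the real mute()/unmute()."""

    def __init__(self, delay: float = 3.0) -> None:
        self.delay = delay
        self.muted = True  # muted on connect
        self._task: Optional[asyncio.Task] = None

    def on_audio_start(self) -> None:
        # Cancel any pending re-mute, then unmute for playback.
        if self._task is not None:
            self._task.cancel()
            self._task = None
        self.muted = False

    def on_audio_end(self) -> None:
        # Re-mute later so back-to-back TTS does not flicker.
        self._task = asyncio.create_task(self._delayed_mute())

    async def _delayed_mute(self) -> None:
        await asyncio.sleep(self.delay)
        self.muted = True


async def demo() -> tuple[bool, bool]:
    m = MuteLifecycle(delay=0.05)
    m.on_audio_start()
    m.on_audio_end()
    m.on_audio_start()  # new audio arrives before the delayed mute fires
    await asyncio.sleep(0.1)
    still_unmuted = not m.muted
    m.on_audio_end()
    await asyncio.sleep(0.1)
    return still_unmuted, m.muted
```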

---

### 6. Self-deafen on connect
**Requirement:** merlin deafens on connect (no audio reception needed).

**Implementation:** When the `self_deaf = true` config flag is set,
`_on_connected` calls `self._mumble.users.myself.deafen()`.

**Files:** `src/derp/mumble.py`, `config/derp.toml`

---

## Pause/Resume
### Design
`!pause` toggles between paused and playing states.

**Pause:** Captures the current track, elapsed position, and a monotonic
timestamp. Fades out and cancels the play loop. The queue is preserved.

**Unpause:** Re-inserts the track at the queue front and starts the play
loop with a seek. Three special behaviors apply:

1. **Rewind:** Playback rewinds 3s on unpause for continuity, but only
   if the pause lasted >= 3s. This acts as an anti-flood guard: rapid
   toggling doesn't compound the rewind.
2. **Stale stream:** If paused > 45s, cached stream files (in
   `data/music/cache/`) are deleted so the play loop re-downloads.
   Kept files (`data/music/`) are never deleted. Stream URLs from
   YouTube et al. expire within minutes.
3. **Fade-in:** Unpause always uses `fade_in=True` (5s ramp from 0).

**State cleanup:** `!stop` clears `ps["paused"]`. The play loop's
`finally` block skips `_cleanup_track` when paused (preserves the file).
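
The rewind and stale-stream rules reduce to a small pure function. The sketch below is an assumption-laden illustration rather than the plugin's code: `unpause_plan` is a hypothetical name, `paused_for` is taken to be the difference between two monotonic timestamps, and clamping the seek position at 0 is an added safeguard not stated above.

```python
REWIND_S = 3.0   # rewind applied on unpause
STALE_S = 45.0   # pause length after which cached streams are refreshed


def unpause_plan(elapsed: float, paused_for: float) -> tuple[float, bool]:
    """Return (seek position in seconds, whether to drop the cached stream)."""
    # Short pauses skip the rewind so rapid toggling cannot walk backwards.
    rewind = REWIND_S if paused_for >= REWIND_S else 0.0
    seek = max(0.0, elapsed - rewind)
    stale = paused_for > STALE_S
    return seek, stale
```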

---

## Autoplay
### Design
When `autoplay = true` (config), the play loop stays alive after the
queue empties:
1. Waits for silence (duck_silence threshold, default 15s)
2. Picks one random kept track
3. Plays it
4. On completion, loops back to step 1

This replaces the previous bulk-queue approach (shuffle all kept tracks
at once). Benefits: no large upfront queue, silence-aware gaps between
tracks, indefinite looping.
### Resume persistence
A background task saves track URL + elapsed position to the state DB
every 10 seconds during playback:
```python
async def _periodic_save():
    while True:
        await asyncio.sleep(10)
        # Elapsed seconds = seek offset + frames played * 20ms per frame.
        el = cur_seek + progress[0] * 0.02
        if el > 1.0:
            _save_resume(bot, track, el)
```
On hard kill: resumes from at most ~10s behind. On normal track
completion: `_clear_resume()` wipes the state.

---

## Voice Ducking
### Flow
```
voice detected -> duck_vol = floor (instant)
silence > duck_silence -> linear restore over duck_restore seconds
```
The duck monitor runs as a background task alongside the play loop.
It updates `ps["duck_vol"]` which the volume lambda reads per-frame.
### Restore ramp
Restoration is linear from floor to user volume. The per-frame ramp in
`stream_audio` further smooths each 1-second update from the monitor,
eliminating audible steps.
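
One way to express the monitor's restore calculation is sketched below. The function name `duck_level` and its parameters are hypothetical, and returning `None` once the restore completes is an assumption chosen to match the volume lambda, which falls through to the user volume when `duck_vol` is `None`.

```python
from typing import Optional


def duck_level(
    floor: float,
    volume: float,
    silent_for: float,
    duck_silence: float,
    duck_restore: float,
) -> Optional[float]:
    """Return duck_vol (0-100), or None once the restore has finished."""
    if silent_for <= duck_silence:
        return floor  # voice was heard recently: stay at the floor
    progress = (silent_for - duck_silence) / duck_restore
    if progress >= 1.0:
        return None  # fully restored: user volume takes over
    return floor + (volume - floor) * progress
```

With a floor of 10, a user volume of 50, `duck_silence=15`, and `duck_restore=30`, the level is halfway back (30) after 30 seconds of silence.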
### Bot audio isolation
Bot usernames (from `registry._bots`) are excluded from
`_on_sound_received` entirely -- no timestamp update, no listener
dispatch. This prevents self-ducking between derp and merlin.

---

## Seek (in-stream pipeline swap)
### Design
Seek rebuilds the ffmpeg pipeline at the new position without cancelling
the play loop task. This avoids the overhead of re-downloading.
1. Set `_seek_fading = True`, `_seek_fade_out = 10` (0.2s ramp-down)
2. Continue reading frames, scaling by decreasing ratio
3. At fade-out = 0: kill ffmpeg, clear buffer, spawn new pipeline
4. 0.5s fade-in on the new pipeline
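
The "decreasing ratio" in step 2 can be pictured as a per-frame gain sequence. The exact ratio used in `stream_audio` is not shown here, so this sketch assumes a plain linear countdown; `seek_fade_gains` is a hypothetical helper.

```python
def seek_fade_gains(frames: int = 10) -> list[float]:
    """Gain applied to each successive frame during the seek fade-out."""
    # Counts down from (frames - 1) / frames to 0.0, one value per 20 ms frame.
    return [remaining / frames for remaining in range(frames - 1, -1, -1)]
```

With the default of 10 frames, the gains fall from 0.9 to 0.0 over 0.2s, after which the pipeline swap happens at silence.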
### Consolidation note
Seek fade-out (10 frames / 0.2s) is much shorter than skip/stop
fade-out (3s). This is intentional -- seek should feel responsive.
The mechanisms are separate: seek uses frame-counting in
`stream_audio`, skip/stop uses `_fade_and_cancel` in the plugin.

---

## Consolidation Opportunities
### Volume control unification
Three volume layers (fade_vol, duck_vol, volume) evaluated in a lambda
per-frame. Works but the priority logic is implicit. A future refactor
could use a single `effective_volume()` method that explicitly resolves
priority and makes the per-frame cost clearer.
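
A possible shape for such a helper is sketched below. `effective_volume` does not exist in the codebase as far as this document shows; the sketch assumes the same `ps` dictionary keys used by the lambda and keeps the `is not None` checks so that a value of 0 still wins.

```python
from typing import Mapping, Optional


def effective_volume(ps: Mapping[str, Optional[float]]) -> float:
    """Resolve the target volume (0.0-1.0), highest-priority layer first."""
    for key in ("fade_vol", "duck_vol"):
        value = ps[key]
        if value is not None:  # 0 is a valid target, so test for None
            return value / 100.0
    return ps["volume"] / 100.0
```

The play loop could then pass `volume=lambda: effective_volume(ps)` and keep the priority order in one named place.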
### Fade-out ownership
Skip/stop/pause all route through `_fade_and_cancel()` -- good. But the
fade target is communicated indirectly via `ps["fade_vol"] = 0` and
`ps["fade_step"]`, read by a lambda in the play loop, evaluated in
`stream_audio`. A more explicit signal (e.g. an asyncio.Event or a
dedicated fade state machine in `stream_audio`) could simplify reasoning
about timing.
### Buffer drain timing
The 150ms post-fade drain is empirical. A more robust approach would be
to query `sound_output.get_buffer_size()` and wait for it to drop below
a threshold before cancelling. This would adapt to varying network
conditions and pymumble buffer sizes.
### Track duration
Duration is probed via `ffprobe` after download (blocking, so it runs in
an executor). Kept tracks also store a duration in state metadata,
obtained from `_fetch_metadata` (yt-dlp), so the two sources overlap;
the `ffprobe` path is the fallback for non-kept tracks. Always probing
locally would unify the two paths.
### Periodic resume save interval
Currently 10s fixed. Could be adaptive -- save more frequently near
the start of a track (where losing position is more noticeable) and
less frequently later. Marginal benefit vs. complexity though.