Audio Engine -- Issues, Fixes, and Consolidation Notes
Technical reference for the Mumble audio pipeline: known issues, applied fixes, architectural decisions, and areas for future work.
Architecture Overview
```
yt-dlp -> ffmpeg (decode to s16le 48kHz mono) -> PCM frames (20ms)
       -> volume ramp/scale -> pymumble sound_output -> Opus encode -> Mumble
```
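To make the frame size concrete: at 48 kHz mono s16le, one 20ms frame is 960 samples * 2 bytes = 1920 bytes. The sketch below chunks a decoded byte stream into frames of that size. The helper name pcm_frames and the zero-padding of the final short frame are illustrative assumptions, not necessarily what stream_audio does. The input would typically be ffmpeg's stdout from a command along the lines of `ffmpeg -i <input> -f s16le -ar 48000 -ac 1 pipe:1`.

```python
from typing import BinaryIO, Iterator

SAMPLE_RATE = 48_000    # Hz, as decoded by ffmpeg
CHANNELS = 1            # mono
BYTES_PER_SAMPLE = 2    # s16le
FRAME_MS = 20
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000             # 960
FRAME_BYTES = FRAME_SAMPLES * CHANNELS * BYTES_PER_SAMPLE  # 1920


def pcm_frames(stream: BinaryIO) -> Iterator[bytes]:
    """Yield fixed-size 20 ms PCM frames from a buffered binary stream.

    Hypothetical helper. Assumes a buffered reader (e.g. subprocess stdout
    with default buffering), whose read(n) only returns short data at EOF.
    """
    while True:
        chunk = stream.read(FRAME_BYTES)
        if not chunk:
            return  # EOF
        if len(chunk) < FRAME_BYTES:
            # Pad the final partial frame with silence so the encoder
            # always receives a full 20 ms frame.
            chunk += b"\x00" * (FRAME_BYTES - len(chunk))
        yield chunk
```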
Key components:
| File | Role |
|---|---|
| src/derp/mumble.py | stream_audio() -- PCM feed loop, volume ramp, seek |
| plugins/music.py | Queue, play loop, fade orchestration, duck monitor |
Volume control layers (evaluated per-frame, highest priority first)
- fade_vol -- active during fade-out (skip/stop/pause); set to 0 as the target
- duck_vol -- voice-activated ducking; snap to floor, linear restore
- volume -- user-set level (0-100)
The play loop passes a lambda to stream_audio:
```python
volume=lambda: (
    ps["fade_vol"] if ps["fade_vol"] is not None else
    ps["duck_vol"] if ps["duck_vol"] is not None else
    ps["volume"]
) / 100.0
```
Per-frame volume ramping
stream_audio never jumps to the target volume. Each 20ms frame is
ramped from _cur_vol toward target by at most step:
- _max_step = 0.005 (~4s full ramp) -- ceiling for normal changes
- fade_in_step -- computed from fade-in duration (default 5s)
- fade_step -- override from plugin (fade-out on skip/stop/pause)
When abs(diff) < 0.0001, flat scaling is used (avoids ramp artifacts
on steady-state frames). Otherwise, _scale_pcm_ramp() linearly
interpolates across all 960 samples in the frame.
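The ramp rules above can be sketched as follows. This is a minimal illustration, not the actual stream_audio code. The function names mirror _scale_pcm_ramp but are stand-ins. The choice to snap to the target on steady-state frames, and the use of the native-byte-order `array("h")` (s16le on little-endian hosts), are assumptions.

```python
import array
from typing import Tuple

FRAME_SAMPLES = 960   # 20 ms at 48 kHz mono
MAX_STEP = 0.005      # per-frame ceiling: 0 -> 1 takes 200 frames (~4 s)
FLAT_EPS = 0.0001     # below this difference, use flat scaling


def _clip16(x: float) -> int:
    """Clamp to the signed 16-bit sample range."""
    return max(-32768, min(32767, int(x)))


def scale_pcm(pcm: bytes, vol: float) -> bytes:
    """Flat scaling: every sample gets the same gain."""
    samples = array.array("h", pcm)  # native byte order (s16le on little-endian)
    for i, s in enumerate(samples):
        samples[i] = _clip16(s * vol)
    return samples.tobytes()


def scale_pcm_ramp(pcm: bytes, start: float, end: float) -> bytes:
    """Linearly interpolate the gain from start to end across the frame."""
    samples = array.array("h", pcm)
    n = len(samples)
    for i, s in enumerate(samples):
        gain = start + (end - start) * ((i + 1) / n)  # last sample lands on end
        samples[i] = _clip16(s * gain)
    return samples.tobytes()


def ramp_frame(pcm: bytes, cur_vol: float, target: float,
               step: float = MAX_STEP) -> Tuple[bytes, float]:
    """Move cur_vol toward target by at most step; return (frame, new_vol)."""
    diff = target - cur_vol
    if abs(diff) < FLAT_EPS:
        return scale_pcm(pcm, target), target  # steady state: flat scaling
    new_vol = cur_vol + max(-step, min(step, diff))
    return scale_pcm_ramp(pcm, cur_vol, new_vol), new_vol
```

Passing a larger step (e.g. a fade-out step from the plugin) simply raises the per-frame ceiling. The interpolation stays the same.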
Issues and Fixes
1. Alpine ffmpeg lacks librubberband
Symptom: 13/15 voice audition samples failed. rubberband audio
filter unavailable in ffmpeg.
Root cause: Alpine's ffmpeg package is compiled without
--enable-librubberband.
Fix: Added the rubberband CLI package to the Containerfile and created
_split_fx() in plugins/voice.py to parse FX chains. Pitch-shifting
goes through the rubberband CLI binary, and the remaining filters (bass, echo)
go through ffmpeg, forming a two-stage pipeline.
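A minimal sketch of the splitting step follows. It assumes FX chains are written as ffmpeg-style filter strings, with pitch expressed through the rubberband filter's `pitch` option (a frequency ratio), and with filters separated by unescaped commas. split_fx here is a hypothetical stand-in for _split_fx, whose real input format is not shown in this document.

```python
from typing import List, Optional, Tuple


def split_fx(chain: str) -> Tuple[Optional[float], str]:
    """Split an ffmpeg-style filter chain into (pitch ratio, remaining filters).

    Hypothetical sketch: assumes pitch appears as rubberband=pitch=<ratio>
    and that filters are separated by unescaped commas.
    """
    pitch: Optional[float] = None
    rest: List[str] = []
    for flt in chain.split(","):
        flt = flt.strip()
        if not flt:
            continue
        name, _, args = flt.partition("=")
        if name == "rubberband":
            # Stage 1: handled by the rubberband CLI, not ffmpeg.
            for opt in args.split(":"):
                key, _, value = opt.partition("=")
                if key == "pitch":
                    pitch = float(value)
        else:
            # Stage 2: everything else stays in the ffmpeg filter chain.
            rest.append(flt)
    return pitch, ",".join(rest)
```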
Files: Containerfile, plugins/voice.py
2. Self-ducking between bots
Symptom: derp's music volume dropped when merlin spoke (TTS).
Root cause: merlin's TTS output triggered _on_sound_received,
which updated the shared registry._voice_ts timestamp. derp's duck
monitor saw recent voice activity and ducked.
Fix: _on_sound_received checks registry._bots and returns early
for any bot username -- no timestamp update, no listener dispatch.
```python
def _on_sound_received(self, user, sound_chunk) -> None:
    name = user["name"] if isinstance(user, dict) else None
    bots = getattr(self.registry, "_bots", {})
    if name and name in bots:
        return  # ignore audio from bots entirely
    # ... otherwise update registry._voice_ts and dispatch to listeners
```
Files: src/derp/mumble.py
3. Click/pop on skip/stop (fade-out cancellation)
Symptom: Audible glitch at the end of fade-out when skipping or stopping a track.
Root cause: _fade_and_cancel() fades volume to 0 over ~3s, then
calls task.cancel(). In stream_audio, CancelledError triggers
clear_buffer(), which drops any frames still queued in pymumble's
output -- including frames that were encoded at non-zero amplitude a
few frames earlier. The sudden buffer wipe produces a click.
Fix (two-part):
- Plugin side (music.py): Added a 150ms post-fade drain before cancel, giving pymumble time to flush the remaining silent frames.
- Engine side (mumble.py): The CancelledError handler only calls clear_buffer() if _cur_vol > 0.01. When a fade-out has already driven volume to ~0, the remaining buffer frames are silent and clearing them is unnecessary.
```python
# mumble.py -- CancelledError handler in stream_audio
if _cur_vol > 0.01:
    self._mumble.sound_output.clear_buffer()

# music.py -- _fade_and_cancel()
await asyncio.sleep(duration)
await asyncio.sleep(0.15)  # drain window
task.cancel()
```
Files: src/derp/mumble.py, plugins/music.py
4. Fade-out math
How it works: _fade_and_cancel(duration=3.0) computes the
per-frame step from the current effective volume:
```python
cur_vol = (duck_vol or volume) / 100.0
n_frames = duration / 0.02  # 150 frames for 3s
step = cur_vol / n_frames
```
The play loop sets ps["fade_vol"] = 0 (the target) and
ps["fade_step"] = step (the rate). stream_audio ramps _cur_vol
toward 0 at step per frame. At 50% volume, step = 0.5 / 150 ≈ 0.0033, reaching
zero in exactly 150 frames (3.0s).
Note: fade_vol is set to 0 immediately, making the volume lambda
return 0 as the target. The ramp code smoothly transitions -- there is
no abrupt jump because _cur_vol tracks actual output level, not the
target.
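The arithmetic can be checked with a short simulation. Because the step is derived from the current volume, the fade takes the same 150 frames at any starting level. The function names are hypothetical. The stop condition reuses the 0.0001 flat-scaling cutoff from the per-frame ramp, on the assumption that a ramp counts as finished once it reaches steady state.

```python
FRAME_SEC = 0.02   # one 20 ms frame
FLAT_EPS = 0.0001  # steady-state threshold from the per-frame ramp


def fade_step(volume_pct: float, duration: float = 3.0) -> float:
    """Per-frame step that takes the current volume to 0 in `duration` seconds."""
    n_frames = duration / FRAME_SEC  # 150 frames for 3 s
    return (volume_pct / 100.0) / n_frames


def frames_to_silence(cur_vol: float, step: float) -> int:
    """Count frames until the ramp toward 0 reaches steady state."""
    frames = 0
    while abs(0.0 - cur_vol) >= FLAT_EPS:
        cur_vol = max(0.0, cur_vol - step)
        frames += 1
    return frames
```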
5. Self-mute lifecycle
Requirement: merlin mutes on connect, unmutes only when emitting audio (TTS), re-mutes after a delay.
Implementation:
```
connect               -> mute()
stream_audio start    -> cancel pending mute task, unmute()
stream_audio finally  -> spawn _delayed_mute(3.0)
```
The 3-second delay prevents rapid mute/unmute flicker on back-to-back TTS. The mute task is cancelled if new audio starts before it fires.
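The lifecycle can be sketched as a small asyncio controller. SelfMuteController and its method names are hypothetical. mute and unmute are injected callables; with pymumble they might wrap users.myself.mute() and users.myself.unmute(), but that wiring is an assumption. The key property is that a new audio start cancels the pending delayed mute.

```python
import asyncio
from typing import Callable, Optional


class SelfMuteController:
    """Hypothetical sketch of the connect/unmute/delayed-mute lifecycle."""

    def __init__(self, mute: Callable[[], None], unmute: Callable[[], None],
                 delay: float = 3.0) -> None:
        self._mute = mute
        self._unmute = unmute
        self._delay = delay
        self._pending: Optional[asyncio.Task] = None

    def _cancel_pending(self) -> None:
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()
        self._pending = None

    def on_connect(self) -> None:
        self._mute()

    def on_audio_start(self) -> None:
        # New audio before the delayed mute fired: cancel it, then unmute.
        self._cancel_pending()
        self._unmute()

    def on_audio_end(self) -> None:
        # Called from stream_audio's finally block (inside the event loop).
        self._cancel_pending()
        self._pending = asyncio.create_task(self._delayed_mute())

    async def _delayed_mute(self) -> None:
        await asyncio.sleep(self._delay)
        self._mute()
```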
Config: self_mute = true in [[mumble.extra]]
Files: src/derp/mumble.py
6. Self-deafen on connect
Requirement: merlin deafens on connect (no audio reception needed).
Implementation: self_deaf = true config flag, calls
self._mumble.users.myself.deafen() in _on_connected.
Files: src/derp/mumble.py, config/derp.toml
Pause/Resume
Design
!pause toggles between paused and playing states:
Pause: Captures current track + elapsed position + monotonic timestamp. Fades out, cancels play loop. Queue is preserved.
Unpause: Re-inserts the track at the queue front and starts the play loop with a seek. Three special behaviors:
- Rewind: 3s rewind on unpause for continuity. This is applied only if paused >= 3s, as an anti-flood measure, so rapid toggling doesn't compound the rewind.
- Stale stream: If paused > 45s, cached stream files (in data/music/cache/) are deleted so the play loop re-downloads. Kept files (data/music/) are never deleted. Stream URLs from YouTube et al. expire within minutes.
- Fade-in: Unpause always uses fade_in=True (5s ramp from 0).
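The two timing rules can be expressed as a pure function of the captured state. unpause_plan is a hypothetical helper. It assumes the pause timestamp comes from time.monotonic(), as the Pause description states, and that the rewind is clamped at the start of the track.

```python
import time
from typing import Optional, Tuple

REWIND_SEC = 3.0  # rewind applied on unpause
STALE_SEC = 45.0  # beyond this pause length, cached stream files are dropped


def unpause_plan(elapsed: float, paused_at: float,
                 now: Optional[float] = None) -> Tuple[float, bool]:
    """Return (seek_position, cache_is_stale) for resuming a paused track.

    paused_at and now are time.monotonic() readings.
    """
    if now is None:
        now = time.monotonic()
    paused_for = now - paused_at
    seek = elapsed
    if paused_for >= REWIND_SEC:
        # Only rewind after a real pause so rapid toggling can't compound it.
        seek = max(0.0, elapsed - REWIND_SEC)
    return seek, paused_for > STALE_SEC
```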
State cleanup: !stop clears ps["paused"]. The play loop's
finally block skips _cleanup_track when paused (preserves the file).
Autoplay
Design
When autoplay = true (config), the play loop stays alive after the
queue empties:
- Waits for silence (duck_silence threshold, default 15s)
- Picks one random kept track
- Plays it
- On completion, loops back to step 1
This replaces the previous bulk-queue approach (shuffle all kept tracks at once). Benefits: no large upfront queue, silence-aware gaps between tracks, indefinite looping.
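A minimal sketch of the loop follows, assuming the last voice activity is exposed as a time.monotonic() timestamp and that playing a track is an awaitable that completes when the track ends. autoplay_loop, its parameters, and the max_tracks test hook are hypothetical.

```python
import asyncio
import random
import time
from typing import Awaitable, Callable, List, Optional


async def autoplay_loop(
    kept_tracks: Callable[[], List[str]],
    play: Callable[[str], Awaitable[None]],
    last_voice_ts: Callable[[], float],
    duck_silence: float = 15.0,
    poll: float = 1.0,
    max_tracks: Optional[int] = None,  # test hook; None loops indefinitely
) -> None:
    played = 0
    while max_tracks is None or played < max_tracks:
        # 1. Wait until nobody has spoken for duck_silence seconds.
        while time.monotonic() - last_voice_ts() < duck_silence:
            await asyncio.sleep(poll)
        # 2. Pick one random kept track (retry later if there are none).
        tracks = kept_tracks()
        if not tracks:
            await asyncio.sleep(poll)
            continue
        # 3. Play it to completion, then loop back to step 1.
        await play(random.choice(tracks))
        played += 1
```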
Resume persistence
A background task saves track URL + elapsed position to the state DB every 10 seconds during playback:
```python
async def _periodic_save():
    while True:
        await asyncio.sleep(10)
        # progress[0] counts 20ms frames streamed since cur_seek
        el = cur_seek + progress[0] * 0.02
        if el > 1.0:
            _save_resume(bot, track, el)
```
After a hard kill, playback resumes at most ~10s behind. On normal track
completion, _clear_resume() wipes the state.
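The saver depends on closure state from the play loop. The self-contained sketch below shows one way it might be wired around playback. play_with_resume, save, and clear are hypothetical stand-ins for the play loop, _save_resume, and _clear_resume. It also illustrates the asymmetry described above: a cancelled stream re-raises out of the try block before clear() runs, so only a normal completion wipes the saved position.

```python
import asyncio
from typing import Awaitable, Callable, List

FRAME_SEC = 0.02


async def play_with_resume(
    stream: Callable[[List[int]], Awaitable[None]],
    save: Callable[[float], None],
    clear: Callable[[], None],
    cur_seek: float = 0.0,
    interval: float = 10.0,
) -> None:
    """Run a stream with a periodic resume checkpoint (hypothetical sketch)."""
    progress = [0]  # frames streamed since cur_seek, updated by `stream`

    async def periodic_save() -> None:
        while True:
            await asyncio.sleep(interval)
            el = cur_seek + progress[0] * FRAME_SEC
            if el > 1.0:
                save(el)

    saver = asyncio.create_task(periodic_save())
    try:
        await stream(progress)
    finally:
        saver.cancel()  # stop checkpointing on completion, skip, or error
    # Only reached on normal completion; cancellation re-raises above.
    clear()
```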
Voice Ducking
Flow
```
voice detected          -> duck_vol = floor (instant)
silence > duck_silence  -> linear restore over duck_restore seconds
```
The duck monitor runs as a background task alongside the play loop.
It updates ps["duck_vol"] which the volume lambda reads per-frame.
Restore ramp
Restoration is linear from floor to user volume. The per-frame ramp in
stream_audio further smooths each 1-second update from the monitor,
eliminating audible steps.
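The value the monitor writes on each tick can be modeled as a pure function of elapsed silence. duck_level is hypothetical. Returning None once the restore completes is an assumption about how the duck is released; it lets the volume lambda fall through to the user volume.

```python
from typing import Optional


def duck_level(now: float, last_voice: float, volume: float, floor: float,
               duck_silence: float, duck_restore: float) -> Optional[float]:
    """Return the duck volume (0-100) for the current monitor tick.

    None means "not ducking", so the volume lambda uses ps["volume"].
    """
    quiet = now - last_voice
    if quiet < duck_silence:
        return floor  # voice still recent: hold at floor
    t = (quiet - duck_silence) / duck_restore
    if t >= 1.0:
        return None  # restore finished
    return floor + (volume - floor) * t  # linear restore
```

Between 1-second ticks, the per-frame ramp in stream_audio interpolates, so a 60-point restore over 10s arrives as many small per-frame steps rather than ten audible jumps.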
Bot audio isolation
Bot usernames (from registry._bots) are excluded from
_on_sound_received entirely -- no timestamp update, no listener
dispatch. This prevents self-ducking between derp and merlin.
Seek (in-stream pipeline swap)
Design
Seek rebuilds the ffmpeg pipeline at the new position without cancelling the play loop task. This avoids the overhead of re-downloading.
- Set _seek_fading = True, _seek_fade_out = 10 (0.2s ramp-down)
- Continue reading frames, scaling by a decreasing ratio
- At fade-out = 0: kill ffmpeg, clear buffer, spawn new pipeline
- 0.5s fade-in on the new pipeline
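Two pieces of the swap are sketched below: building the replacement ffmpeg command with an input-side seek, and the gain applied while the fade-out counter runs down. The exact command the engine uses is not shown in this document. The flags here are standard ffmpeg options matching the documented s16le 48kHz mono output. The ratio formula assumes the counter is decremented before each frame is scaled. Both helper names are hypothetical.

```python
from typing import List

SEEK_FADE_FRAMES = 10  # 10 * 20 ms = 0.2 s ramp-down


def ffmpeg_decode_cmd(path: str, seek: float = 0.0) -> List[str]:
    """Build an ffmpeg command that decodes to 48 kHz mono s16le on stdout."""
    cmd = ["ffmpeg", "-hide_banner", "-loglevel", "error"]
    if seek > 0:
        cmd += ["-ss", f"{seek:.3f}"]  # input-side seek: placed before -i
    cmd += ["-i", path, "-f", "s16le", "-ar", "48000", "-ac", "1", "pipe:1"]
    return cmd


def seek_fade_ratios(frames: int = SEEK_FADE_FRAMES) -> List[float]:
    """Gain for each frame read while the fade-out counter counts down to 0."""
    return [remaining / frames for remaining in range(frames - 1, -1, -1)]
```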
Consolidation note
Seek fade-out (10 frames / 0.2s) is much shorter than skip/stop
fade-out (3s). This is intentional -- seek should feel responsive.
The mechanisms are separate: seek uses frame-counting in
stream_audio, skip/stop uses _fade_and_cancel in the plugin.
Consolidation Opportunities
Volume control unification
Three volume layers (fade_vol, duck_vol, volume) are evaluated in a lambda
per-frame. This works, but the priority logic is implicit. A future refactor
could use a single effective_volume() method that explicitly resolves
priority and makes the per-frame cost clearer.
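One possible shape for that refactor is sketched below: a small typed state object whose method states the priority order outright. PlayState is hypothetical; the real play state is a plain dict (ps).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayState:
    """Hypothetical typed replacement for the ps dict's volume fields."""
    volume: int                       # user level, 0-100
    duck_vol: Optional[float] = None  # set by the duck monitor
    fade_vol: Optional[float] = None  # set during fade-out

    def effective_volume(self) -> float:
        """Resolve priority explicitly: fade > duck > user volume, as 0.0-1.0."""
        if self.fade_vol is not None:
            return self.fade_vol / 100.0
        if self.duck_vol is not None:
            return self.duck_vol / 100.0
        return self.volume / 100.0
```

Because stream_audio already takes a callable, the bound method state.effective_volume could be passed directly in place of the lambda.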
Fade-out ownership
Skip/stop/pause all route through _fade_and_cancel() -- good. But the
fade target is communicated indirectly via ps["fade_vol"] = 0 and
ps["fade_step"], read by a lambda in the play loop, evaluated in
stream_audio. A more explicit signal (e.g. an asyncio.Event or a
dedicated fade state machine in stream_audio) could simplify reasoning
about timing.
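A sketch of the asyncio.Event variant follows. The plugin requests a fade and awaits `finished` instead of sleeping for a precomputed duration, and the engine advances the fade once per frame and signals when it reaches zero. FadeControl, request(), and advance() are hypothetical names, not existing code.

```python
import asyncio
from typing import Optional


class FadeControl:
    """Hypothetical explicit fade signal shared by plugin and engine."""

    def __init__(self) -> None:
        self.step: Optional[float] = None
        self.finished = asyncio.Event()  # create inside the running loop

    def request(self, step: float) -> None:
        """Plugin side: start a fade-out at `step` per frame."""
        self.step = step
        self.finished.clear()

    def advance(self, cur_vol: float) -> float:
        """Engine side: called once per frame; returns the new volume."""
        if self.step is None:
            return cur_vol
        cur_vol = max(0.0, cur_vol - self.step)
        if cur_vol == 0.0:
            self.finished.set()  # wake whoever is waiting to cancel
        return cur_vol
```

The plugin side would then read `fade.request(step); await fade.finished.wait(); task.cancel()`, so cancellation follows the actual ramp rather than a wall-clock estimate.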
Buffer drain timing
The 150ms post-fade drain is empirical. A more robust approach would be
to query sound_output.get_buffer_size() and wait for it to drop below
a threshold before cancelling. This would adapt to varying network
conditions and pymumble buffer sizes.
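A minimal polling sketch of that approach is below. It assumes get_buffer_size() reports the unsent buffer length in seconds, so the default threshold of 0.02 corresponds to one frame. The helper name, threshold, timeout, and poll interval are hypothetical. The timeout keeps a stalled connection from blocking skip/stop indefinitely.

```python
import asyncio
from typing import Callable


async def wait_for_drain(
    get_buffer_size: Callable[[], float],
    threshold: float = 0.02,
    timeout: float = 1.0,
    poll: float = 0.01,
) -> bool:
    """Poll until the output buffer drops to `threshold`, or give up.

    Returns True if drained, False on timeout (the caller can cancel anyway).
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while get_buffer_size() > threshold:
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(poll)
    return True
```

In _fade_and_cancel, this would replace the fixed 0.15s sleep, e.g. `await wait_for_drain(self._mumble.sound_output.get_buffer_size)` followed by task.cancel().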
Track duration
Duration is probed via ffprobe after download (blocking, run in
executor). For kept tracks, it's stored in state metadata. This is
duplicated -- kept track metadata already has duration from
_fetch_metadata (yt-dlp). The ffprobe path is the fallback for
non-kept tracks. The two could be unified by always probing locally.
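For reference, a local probe that runs in an executor might look like the sketch below. The ffprobe flags are standard options that print only the container duration. The helper names are hypothetical, and returning None for "N/A" or a failed probe is an assumption about how callers would handle unknown durations.

```python
import asyncio
import subprocess
from typing import List, Optional


def ffprobe_cmd(path: str) -> List[str]:
    """Print only the container duration, in seconds, on stdout."""
    return ["ffprobe", "-v", "error", "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1", path]


def parse_duration(output: str) -> Optional[float]:
    """ffprobe prints a float, or N/A when the duration is unknown."""
    try:
        return float(output.strip())
    except ValueError:
        return None


async def probe_duration(path: str) -> Optional[float]:
    """Run the blocking ffprobe call in the default executor."""
    loop = asyncio.get_running_loop()
    proc = await loop.run_in_executor(
        None,
        lambda: subprocess.run(ffprobe_cmd(path), capture_output=True, text=True),
    )
    if proc.returncode != 0:
        return None
    return parse_duration(proc.stdout)
```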
Periodic resume save interval
Currently a fixed 10s. Could be adaptive: save more frequently near the start of a track (where losing position is more noticeable) and less frequently later. The benefit is marginal relative to the added complexity, though.