# Audio Engine -- Issues, Fixes, and Consolidation Notes

Technical reference for the Mumble audio pipeline: known issues,
applied fixes, architectural decisions, and areas for future work.

## Architecture Overview

```
yt-dlp -> ffmpeg (decode to s16le 48kHz mono) -> PCM frames (20ms)
       -> volume ramp/scale -> pymumble sound_output -> Opus encode -> Mumble
```

Key components:

| File | Role |
|------|------|
| `src/derp/mumble.py` | `stream_audio()` -- PCM feed loop, volume ramp, seek |
| `plugins/music.py` | Queue, play loop, fade orchestration, duck monitor |

### Volume control layers (evaluated per-frame, highest priority first)

1. **fade_vol** -- active during fade-out (skip/stop/pause); set to 0 as target
2. **duck_vol** -- voice-activated ducking; snap to floor, linear restore
3. **volume** -- user-set level (0-100)

The play loop passes a lambda to `stream_audio`:

```python
volume=lambda: (
    ps["fade_vol"] if ps["fade_vol"] is not None else
    ps["duck_vol"] if ps["duck_vol"] is not None else
    ps["volume"]
) / 100.0
```

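
The priority chain can be exercised in isolation. The following is a minimal sketch, assuming `ps` is a plain dict with the three keys used above; `effective_volume` is a hypothetical helper name, not a function taken from the codebase. It also shows why the checks use `is not None` rather than truthiness: a `fade_vol` of 0 must still win.

```python
def effective_volume(ps: dict) -> float:
    """Resolve the active layer (fade > duck > user) to a 0.0-1.0 gain."""
    for key in ("fade_vol", "duck_vol"):
        # 0 is a valid level, so test against None instead of truthiness.
        if ps.get(key) is not None:
            return ps[key] / 100.0
    return ps["volume"] / 100.0


ps = {"fade_vol": None, "duck_vol": None, "volume": 50}
print(effective_volume(ps))  # user level only
ps["duck_vol"] = 20
print(effective_volume(ps))  # ducking overrides the user level
ps["fade_vol"] = 0
print(effective_volume(ps))  # fade-out target overrides everything
```
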
### Per-frame volume ramping

`stream_audio` never jumps to the target volume. Each 20ms frame is
ramped from `_cur_vol` toward `target` by at most `step`:

- **_max_step** = 0.005 (~4s full ramp) -- ceiling for normal changes
- **fade_in_step** -- computed from fade-in duration (default 5s)
- **fade_step** -- override from plugin (fade-out on skip/stop/pause)

When `abs(diff) < 0.0001`, flat scaling is used (avoids ramp artifacts
on steady-state frames). Otherwise, `_scale_pcm_ramp()` linearly
interpolates across all 960 samples in the frame.

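
As an illustration of the two pieces described above, here is a rough sketch of a step-limited volume update and a linear in-frame ramp over s16le mono PCM. The names `next_volume` and `scale_pcm_ramp` are hypothetical, and the real `_scale_pcm_ramp()` may differ in detail. One assumption to flag: here the gain reaches `vol_end` on the last sample of the frame.

```python
import struct


def next_volume(cur: float, target: float, step: float) -> float:
    """Move cur toward target by at most step per frame."""
    diff = target - cur
    if abs(diff) <= step:
        return target
    return cur + step if diff > 0 else cur - step


def scale_pcm_ramp(pcm: bytes, vol_start: float, vol_end: float) -> bytes:
    """Scale s16le mono PCM with a gain that moves linearly across the frame."""
    n = len(pcm) // 2                      # two bytes per 16-bit sample
    if n == 0:
        return pcm
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    delta = (vol_end - vol_start) / n      # gain change per sample
    out = []
    for i, sample in enumerate(samples):
        gain = vol_start + delta * (i + 1)
        value = int(sample * gain)
        out.append(max(-32768, min(32767, value)))  # clamp to int16 range
    return struct.pack(f"<{n}h", *out)


# One 20ms frame at 48kHz mono is 960 samples (1920 bytes).
frame = struct.pack("<960h", *([1000] * 960))
cur = 0.5
new = next_volume(cur, 0.0, 0.005)
ramped = scale_pcm_ramp(frame, cur, new)
print(len(ramped), new)
```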
---

## Issues and Fixes

### 1. Alpine ffmpeg lacks librubberband

**Symptom:** 13/15 voice audition samples failed. `rubberband` audio
filter unavailable in ffmpeg.

**Root cause:** Alpine's ffmpeg package is compiled without
`--enable-librubberband`.

**Fix:** Added `rubberband` CLI package to `Containerfile`. Created
`_split_fx()` in `plugins/voice.py` to parse FX chains: pitch-shifting
goes through the `rubberband` CLI binary, remaining filters (bass, echo)
through ffmpeg. Two-stage pipeline.

**Files:** `Containerfile`, `plugins/voice.py`

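
The split itself might look like the sketch below. This is not the actual `_split_fx()`: it assumes the FX chain is a comma-separated ffmpeg-style filter string in which pitch shifting appears as a `rubberband=...` entry, and the function name `split_fx` and the sample chain are illustrative only. It only separates the two groups; building the `rubberband` CLI invocation is left out.

```python
def split_fx(chain: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated filter chain into (rubberband, other) filters."""
    pitch: list[str] = []
    rest: list[str] = []
    for part in chain.split(","):
        part = part.strip()
        if not part:
            continue
        name = part.split("=", 1)[0]       # filter name precedes the first '='
        (pitch if name == "rubberband" else rest).append(part)
    return pitch, rest


pitch, rest = split_fx("rubberband=pitch=1.2,bass=g=6,aecho=0.8:0.9:60:0.4")
print(pitch)  # stage 1: handled by the rubberband CLI
print(rest)   # stage 2: passed to ffmpeg as the remaining filter chain
```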
---

### 2. Self-ducking between bots

**Symptom:** derp's music volume dropped when merlin spoke (TTS).

**Root cause:** merlin's TTS output triggered `_on_sound_received`,
which updated the shared `registry._voice_ts` timestamp. derp's duck
monitor saw recent voice activity and ducked.

**Fix:** `_on_sound_received` checks `registry._bots` and returns early
for any bot username -- no timestamp update, no listener dispatch.

```python
def _on_sound_received(self, user, sound_chunk) -> None:
    name = user["name"] if isinstance(user, dict) else None
    bots = getattr(self.registry, "_bots", {})
    if name and name in bots:
        return  # ignore audio from bots entirely
```

**Files:** `src/derp/mumble.py`

---

### 3. Click/pop on skip/stop (fade-out cancellation)

**Symptom:** Audible glitch at the end of fade-out when skipping or
stopping a track.

**Root cause:** `_fade_and_cancel()` fades volume to 0 over ~3s, then
calls `task.cancel()`. In `stream_audio`, `CancelledError` triggers
`clear_buffer()`, which drops any frames still queued in pymumble's
output -- including frames that were encoded at non-zero amplitude a
few frames earlier. The sudden buffer wipe produces a click.

**Fix (two-part):**

1. **Plugin side** (`music.py`): Added 150ms post-fade drain before
   cancel, giving pymumble time to flush remaining silent frames.

2. **Engine side** (`mumble.py`): `CancelledError` handler only calls
   `clear_buffer()` if `_cur_vol > 0.01`. When a fade-out has already
   driven volume to ~0, the remaining buffer frames are silent and
   clearing them is unnecessary.

```python
# mumble.py -- CancelledError handler
if _cur_vol > 0.01:
    self._mumble.sound_output.clear_buffer()
```

```python
# music.py -- _fade_and_cancel()
await asyncio.sleep(duration)
await asyncio.sleep(0.15)  # drain window
task.cancel()
```

**Files:** `src/derp/mumble.py`, `plugins/music.py`

---

### 4. Fade-out math

**How it works:** `_fade_and_cancel(duration=3.0)` computes the
per-frame step from the current effective volume:

```python
# Use "is not None" so a duck level of 0 is not mistaken for "no ducking".
cur_vol = (duck_vol if duck_vol is not None else volume) / 100.0
n_frames = duration / 0.02          # 150 frames for 3s
step = cur_vol / n_frames
```

The play loop sets `ps["fade_vol"] = 0` (the target) and
`ps["fade_step"] = step` (the rate). `stream_audio` ramps `_cur_vol`
toward 0 at `step` per frame. At 50% volume: step = 0.5 / 150 ~= 0.0033,
reaching zero in 150 frames (3.0s).

**Note:** `fade_vol` is set to 0 immediately, making the volume lambda
return 0 as the target. The ramp code smoothly transitions -- there is
no abrupt jump because `_cur_vol` tracks actual output level, not the
target.

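
The arithmetic above can be checked with a short worked example. This is only a sketch: `fade_step` and `FRAME_SEC` are hypothetical names, and the inputs mirror the 50% volume case from the text plus a ducked case with an assumed level of 20.

```python
FRAME_SEC = 0.02  # one 20ms PCM frame


def fade_step(volume: int, duck_vol=None, duration: float = 3.0) -> float:
    """Per-frame decrement that takes the current level to zero in `duration` s."""
    level = duck_vol if duck_vol is not None else volume
    n_frames = duration / FRAME_SEC
    return (level / 100.0) / n_frames


# 50% user volume, no ducking: 0.5 spread over 150 frames.
print(round(fade_step(50), 4))
# While ducked to 20%, the same 3s fade needs a smaller step.
print(round(fade_step(50, duck_vol=20), 4))
```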
---

### 5. Self-mute lifecycle

**Requirement:** merlin mutes on connect, unmutes only when emitting
audio (TTS), re-mutes after a delay.

**Implementation:**

```
connect              -> mute()
stream_audio start   -> cancel pending mute task, unmute()
stream_audio finally -> spawn _delayed_mute(3.0)
```

The 3-second delay prevents rapid mute/unmute flicker on back-to-back
TTS. The mute task is cancelled if new audio starts before it fires.

**Config:** `self_mute = true` in `[[mumble.extra]]`

**Files:** `src/derp/mumble.py`

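
The cancel-and-reschedule pattern can be sketched with plain asyncio. `MuteController` below is a hypothetical stand-in that tracks a boolean instead of calling pymumble, and the demo uses a 0.05s delay in place of the 3s one so it finishes quickly.

```python
import asyncio


class MuteController:
    """Hypothetical stand-in for the bot's self-mute handling."""

    def __init__(self, delay: float = 3.0) -> None:
        self.delay = delay
        self.muted = True              # muted on connect
        self._mute_task = None

    async def _delayed_mute(self) -> None:
        await asyncio.sleep(self.delay)
        self.muted = True

    def audio_started(self) -> None:
        # New audio: drop any pending re-mute, then unmute.
        if self._mute_task is not None:
            self._mute_task.cancel()
            self._mute_task = None
        self.muted = False

    def audio_finished(self) -> None:
        # Schedule the re-mute instead of muting immediately.
        self._mute_task = asyncio.create_task(self._delayed_mute())


async def demo() -> list:
    ctl = MuteController(delay=0.05)
    states = []
    ctl.audio_started()
    ctl.audio_finished()
    await asyncio.sleep(0.01)
    ctl.audio_started()                # second TTS arrives before the delay ends
    await asyncio.sleep(0.1)
    states.append(ctl.muted)           # pending mute was cancelled
    ctl.audio_finished()
    await asyncio.sleep(0.1)
    states.append(ctl.muted)           # re-muted once the delay elapsed
    return states


print(asyncio.run(demo()))
```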
---

### 6. Self-deafen on connect

**Requirement:** merlin deafens on connect (no audio reception needed).

**Implementation:** `self_deaf = true` config flag, calls
`self._mumble.users.myself.deafen()` in `_on_connected`.

**Files:** `src/derp/mumble.py`, `config/derp.toml`

---

## Pause/Resume

### Design

`!pause` toggles between paused and playing states:

**Pause:** Captures current track + elapsed position + monotonic
timestamp. Fades out, cancels play loop. Queue is preserved.

**Unpause:** Re-inserts track at queue front, starts play loop with
seek. Three special behaviors:

1. **Rewind:** 3s rewind on unpause for continuity, applied only if
   paused >= 3s. This acts as an anti-flood guard: rapid toggling
   doesn't compound the rewind.

2. **Stale stream:** If paused > 45s, cached stream files (in
   `data/music/cache/`) are deleted so the play loop re-downloads.
   Kept files (`data/music/`) are never deleted. Stream URLs from
   YouTube et al. expire within minutes.

3. **Fade-in:** Unpause always uses `fade_in=True` (5s ramp from 0).

**State cleanup:** `!stop` clears `ps["paused"]`. The play loop's
`finally` block skips `_cleanup_track` when paused (preserves the file).

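
The rewind and stale-stream decisions reduce to two comparisons against the pause duration. A minimal sketch, using the thresholds from the text; `plan_unpause` and the constant names are hypothetical, and clamping the seek position at 0 is an assumption rather than documented behavior.

```python
REWIND_SEC = 3.0   # rewind applied on unpause
STALE_SEC = 45.0   # pause length after which the cached stream is dropped


def plan_unpause(elapsed: float, paused_at: float, now: float) -> tuple[float, bool]:
    """Return (seek position in seconds, whether to delete the cached stream)."""
    paused_for = now - paused_at       # both values from a monotonic clock
    seek = elapsed
    if paused_for >= REWIND_SEC:
        seek = max(0.0, elapsed - REWIND_SEC)
    return seek, paused_for > STALE_SEC


print(plan_unpause(120.0, paused_at=100.0, now=101.0))  # quick toggle: no rewind
print(plan_unpause(120.0, paused_at=100.0, now=110.0))  # normal pause: rewind 3s
print(plan_unpause(120.0, paused_at=100.0, now=160.0))  # long pause: also stale
```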
---

## Autoplay

### Design

When `autoplay = true` (config), the play loop stays alive after the
queue empties:

1. Waits for silence (duck_silence threshold, default 15s)
2. Picks one random kept track
3. Plays it
4. On completion, loops back to step 1

This replaces the previous bulk-queue approach (shuffle all kept tracks
at once). Benefits: no large upfront queue, silence-aware gaps between
tracks, indefinite looping.

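
The four steps above can be sketched as a small loop. This is an illustration only: `autoplay_loop`, `wait_for_silence`, and `play` are hypothetical names injected as parameters, and the `rounds` bound exists so the demo terminates, whereas the real loop runs indefinitely.

```python
import asyncio
import random


async def autoplay_loop(kept, wait_for_silence, play, rounds: int) -> None:
    """Play `rounds` random kept tracks, waiting for silence before each one."""
    for _ in range(rounds):
        await wait_for_silence()       # step 1: gap until the channel is quiet
        track = random.choice(kept)    # step 2: one track, picked at play time
        await play(track)              # step 3: returns when the track ends


async def demo() -> list:
    played = []

    async def wait_for_silence() -> None:
        await asyncio.sleep(0)         # stand-in for the silence check

    async def play(track: str) -> None:
        played.append(track)

    await autoplay_loop(["a.opus", "b.opus"], wait_for_silence, play, rounds=3)
    return played


played = asyncio.run(demo())
print(len(played))
```

Picking a single track per iteration keeps the queue empty between tracks, which is what lets the silence check run before every selection.
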
### Resume persistence

A background task saves track URL + elapsed position to the state DB
every 10 seconds during playback:

```python
async def _periodic_save():
    while True:
        await asyncio.sleep(10)
        # elapsed = seek offset + frames played * 20ms per frame
        el = cur_seek + progress[0] * 0.02
        if el > 1.0:  # skip saving in the first second of a track
            _save_resume(bot, track, el)
```

On hard kill: resumes from at most ~10s behind. On normal track
completion: `_clear_resume()` wipes the state.

---

## Voice Ducking

### Flow

```
voice detected         -> duck_vol = floor (instant)
silence > duck_silence -> linear restore over duck_restore seconds
```

The duck monitor runs as a background task alongside the play loop.
It updates `ps["duck_vol"]` which the volume lambda reads per-frame.

### Restore ramp

Restoration is linear from floor to user volume. The per-frame ramp in
`stream_audio` further smooths each 1-second update from the monitor,
eliminating audible steps.

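
One way to express the duck level as a function of silence duration is sketched below. `duck_level` is a hypothetical helper, and the floor of 10 and restore time of 10s in the demo are made-up values (only the 15s `duck_silence` default comes from this document). Returning `None` once restored is an assumption that matches the `is not None` check in the volume lambda.

```python
def duck_level(silent_for, volume, floor, duck_silence, duck_restore):
    """Return duck_vol for a given silence duration, or None once restored."""
    if silent_for <= duck_silence:
        return floor                       # voice too recent: hold at floor
    frac = (silent_for - duck_silence) / duck_restore
    if frac >= 1.0:
        return None                        # fully restored: layer inactive
    return floor + (volume - floor) * frac # linear restore toward user volume


# Monitor ticks once per second; stream_audio smooths between ticks.
for t in (5, 15, 20, 24, 25):
    print(t, duck_level(t, volume=50, floor=10, duck_silence=15, duck_restore=10))
```
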
### Bot audio isolation

Bot usernames (from `registry._bots`) are excluded from
`_on_sound_received` entirely -- no timestamp update, no listener
dispatch. This prevents self-ducking between derp and merlin.

---

## Seek (in-stream pipeline swap)

### Design

Seek rebuilds the ffmpeg pipeline at the new position without cancelling
the play loop task. This avoids the overhead of re-downloading.

1. Set `_seek_fading = True`, `_seek_fade_out = 10` (0.2s ramp-down)
2. Continue reading frames, scaling by decreasing ratio
3. At fade-out = 0: kill ffmpeg, clear buffer, spawn new pipeline
4. 0.5s fade-in on the new pipeline

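
The "decreasing ratio" in step 2 could be computed per frame as in the sketch below. This is an assumption about the shape of the ramp (linear in the remaining frame count), and `seek_fade_gains` and `SEEK_FADE_FRAMES` are hypothetical names.

```python
SEEK_FADE_FRAMES = 10  # 10 frames * 20ms = 0.2s


def seek_fade_gains(base_vol: float, frames: int = SEEK_FADE_FRAMES) -> list[float]:
    """Per-frame gain while the old pipeline fades out before the swap."""
    # The remaining-frame counter runs frames-1 .. 0, so the last frame is silent.
    return [base_vol * remaining / frames for remaining in range(frames - 1, -1, -1)]


gains = seek_fade_gains(0.5)
print(len(gains), gains[-1])
```

Ending on a gain of 0 means the buffer clear in step 3 only discards silent audio.
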
### Consolidation note

Seek fade-out (10 frames / 0.2s) is much shorter than skip/stop
fade-out (3s). This is intentional -- seek should feel responsive.
The mechanisms are separate: seek uses frame-counting in
`stream_audio`, skip/stop uses `_fade_and_cancel` in the plugin.

---

## Consolidation Opportunities

### Volume control unification

Three volume layers (fade_vol, duck_vol, volume) are evaluated in a
lambda per-frame. This works, but the priority logic is implicit. A
future refactor could use a single `effective_volume()` method that
explicitly resolves priority and makes the per-frame cost clearer.

### Fade-out ownership

Skip/stop/pause all route through `_fade_and_cancel()` -- good. But the
fade target is communicated indirectly via `ps["fade_vol"] = 0` and
`ps["fade_step"]`, read by a lambda in the play loop, evaluated in
`stream_audio`. A more explicit signal (e.g. an asyncio.Event or a
dedicated fade state machine in `stream_audio`) could simplify reasoning
about timing.

### Buffer drain timing

The 150ms post-fade drain is empirical. A more robust approach would be
to query `sound_output.get_buffer_size()` and wait for it to drop below
a threshold before cancelling. This would adapt to varying network
conditions and pymumble buffer sizes.

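
A polling drain along those lines might look like the following sketch. It assumes `get_buffer_size()` reports the unsent buffer in seconds; `drain_output` and its threshold, timeout, and poll values are hypothetical and would need tuning. The timeout keeps a stalled connection from blocking the cancel indefinitely.

```python
import asyncio
import time


async def drain_output(sound_output, threshold: float = 0.02,
                       timeout: float = 0.5, poll: float = 0.01) -> bool:
    """Wait until the output buffer is nearly empty; False if the timeout hit."""
    deadline = time.monotonic() + timeout
    while sound_output.get_buffer_size() > threshold:
        if time.monotonic() >= deadline:
            return False               # give up and let the caller cancel anyway
        await asyncio.sleep(poll)
    return True


class FakeOutput:
    """Test double: the buffer shrinks by 20ms on every query."""

    def __init__(self, size: float) -> None:
        self.size = size

    def get_buffer_size(self) -> float:
        self.size = max(0.0, self.size - 0.02)
        return self.size


print(asyncio.run(drain_output(FakeOutput(0.1))))
```
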
### Track duration

Duration is probed via `ffprobe` after download (blocking, run in
executor). For kept tracks, it's stored in state metadata. This is
duplicated -- kept track metadata already has duration from
`_fetch_metadata` (yt-dlp). The `ffprobe` path is the fallback for
non-kept tracks. Could unify by always probing locally.

### Periodic resume save interval

Currently 10s fixed. Could be adaptive -- save more frequently near
the start of a track (where losing position is more noticeable) and
less frequently later. Marginal benefit vs. complexity though.