feat: playlist shuffle, lazy resolution, TTS ducking, kept repair
Some checks failed
Music:
- #random URL fragment shuffles playlist tracks before enqueuing
- Lazy playlist resolution: first 10 tracks resolve immediately,
  remaining are fetched in a background task
- !kept repair re-downloads kept tracks with missing local files
- !kept shows [MISSING] marker for tracks without local files
- TTS ducking: music ducks when merlin speaks via voice peer,
  smooth restore after TTS finishes

Performance (from profiling):
- Connection pool: preload_content=True for SOCKS connection reuse
- Pool tuning: 30 pools / 8 connections (up from 20/4)
- _PooledResponse wrapper for stdlib-compatible read interface
- Iterative _extract_videos (replace 51K-deep recursion with stack)
- proxy=False for local SearXNG

Voice + multi-bot:
- Per-bot voice config lookup ([<username>.voice] in TOML)
- Mute detection: skip duck silence when all users muted
- Autoplay shuffle deck (no repeats until full cycle)
- Seek clamp to track duration (prevent seek-past-end stall)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
333
docs/AUDIO.md
Normal file
@@ -0,0 +1,333 @@
# Audio Engine -- Issues, Fixes, and Consolidation Notes

Technical reference for the Mumble audio pipeline: known issues,
applied fixes, architectural decisions, and areas for future work.

## Architecture Overview

```
yt-dlp -> ffmpeg (decode to s16le 48kHz mono) -> PCM frames (20ms)
       -> volume ramp/scale -> pymumble sound_output -> Opus encode -> Mumble
```

Key components:

| File | Role |
|------|------|
| `src/derp/mumble.py` | `stream_audio()` -- PCM feed loop, volume ramp, seek |
| `plugins/music.py` | Queue, play loop, fade orchestration, duck monitor |

### Volume control layers (evaluated per-frame, highest priority first)

1. **fade_vol** -- active during fade-out (skip/stop/pause); set to 0 as target
2. **duck_vol** -- voice-activated ducking; snap to floor, linear restore
3. **volume** -- user-set level (0-100)

The play loop passes a lambda to `stream_audio`:

```python
volume=lambda: (
    ps["fade_vol"] if ps["fade_vol"] is not None else
    ps["duck_vol"] if ps["duck_vol"] is not None else
    ps["volume"]
) / 100.0
```

### Per-frame volume ramping

`stream_audio` never jumps to the target volume. Each 20ms frame is
ramped from `_cur_vol` toward `target` by at most `step`:

- **_max_step** = 0.005 (~4s full ramp) -- ceiling for normal changes
- **fade_in_step** -- computed from fade-in duration (default 5s)
- **fade_step** -- override from plugin (fade-out on skip/stop/pause)

When `abs(diff) < 0.0001`, flat scaling is used (avoids ramp artifacts
on steady-state frames). Otherwise, `_scale_pcm_ramp()` linearly
interpolates across all 960 samples in the frame.

---

## Issues and Fixes

### 1. Alpine ffmpeg lacks librubberband

**Symptom:** 13/15 voice audition samples failed. `rubberband` audio
filter unavailable in ffmpeg.

**Root cause:** Alpine's ffmpeg package is compiled without
`--enable-librubberband`.

**Fix:** Added `rubberband` CLI package to `Containerfile`. Created
`_split_fx()` in `plugins/voice.py` to parse FX chains: pitch-shifting
goes through the `rubberband` CLI binary, remaining filters (bass, echo)
through ffmpeg. Two-stage pipeline.

**Files:** `Containerfile`, `plugins/voice.py`

---

### 2. Self-ducking between bots

**Symptom:** derp's music volume dropped when merlin spoke (TTS).

**Root cause:** merlin's TTS output triggered `_on_sound_received`,
which updated the shared `registry._voice_ts` timestamp. derp's duck
monitor saw recent voice activity and ducked.

**Fix:** `_on_sound_received` checks `registry._bots` and returns early
for any bot username -- no timestamp update, no listener dispatch.

```python
def _on_sound_received(self, user, sound_chunk) -> None:
    name = user["name"] if isinstance(user, dict) else None
    bots = getattr(self.registry, "_bots", {})
    if name and name in bots:
        return  # ignore audio from bots entirely
```

**Files:** `src/derp/mumble.py`

---

### 3. Click/pop on skip/stop (fade-out cancellation)

**Symptom:** Audible glitch at the end of fade-out when skipping or
stopping a track.

**Root cause:** `_fade_and_cancel()` fades volume to 0 over ~3s, then
calls `task.cancel()`. In `stream_audio`, `CancelledError` triggers
`clear_buffer()`, which drops any frames still queued in pymumble's
output -- including frames that were encoded at non-zero amplitude a
few frames earlier. The sudden buffer wipe produces a click.

**Fix (two-part):**

1. **Plugin side** (`music.py`): Added 150ms post-fade drain before
   cancel, giving pymumble time to flush remaining silent frames.

2. **Engine side** (`mumble.py`): `CancelledError` handler only calls
   `clear_buffer()` if `_cur_vol > 0.01`. When a fade-out has already
   driven volume to ~0, the remaining buffer frames are silent and
   clearing them is unnecessary.

```python
# mumble.py -- CancelledError handler
if _cur_vol > 0.01:
    self._mumble.sound_output.clear_buffer()
```

```python
# music.py -- _fade_and_cancel()
await asyncio.sleep(duration)
await asyncio.sleep(0.15)  # drain window
task.cancel()
```

**Files:** `src/derp/mumble.py`, `plugins/music.py`

---

### 4. Fade-out math

**How it works:** `_fade_and_cancel(duration=3.0)` computes the
per-frame step from the current effective volume:

```python
cur_vol = (duck_vol or volume) / 100.0
n_frames = duration / 0.02    # 150 frames for 3s
step = cur_vol / n_frames
```

The play loop sets `ps["fade_vol"] = 0` (the target) and
`ps["fade_step"] = step` (the rate). `stream_audio` ramps `_cur_vol`
toward 0 at `step` per frame. At 50% volume: step = 0.0033, reaching
zero in exactly 150 frames (3.0s).

**Note:** `fade_vol` is set to 0 immediately, making the volume lambda
return 0 as the target. The ramp code smoothly transitions -- there is
no abrupt jump because `_cur_vol` tracks actual output level, not the
target.
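
The arithmetic above can be checked with a small simulation. This is a sketch: `fade_step` and `frames_to_silence` are illustrative helpers, not functions from the codebase, and the 0.0001 epsilon is borrowed from the flat-scaling threshold described earlier.

```python
FRAME_SECONDS = 0.02  # one 20ms PCM frame


def fade_step(cur_vol: float, duration: float) -> float:
    """Per-frame decrement that reaches zero after `duration` seconds."""
    n_frames = duration / FRAME_SECONDS
    return cur_vol / n_frames


def frames_to_silence(cur_vol: float, step: float, epsilon: float = 0.0001) -> int:
    """Count the frames needed to ramp cur_vol down to (effectively) zero."""
    frames = 0
    while cur_vol > epsilon:
        cur_vol = max(0.0, cur_vol - step)
        frames += 1
    return frames


step = fade_step(0.5, 3.0)            # roughly 0.0033 at 50% volume
frames = frames_to_silence(0.5, step)
seconds = frames * FRAME_SECONDS      # fade length in seconds
```

Because the step is derived from the current volume, the fade takes the same wall-clock time whether playback is at 10% or 100%.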

---

### 5. Self-mute lifecycle

**Requirement:** merlin mutes on connect, unmutes only when emitting
audio (TTS), re-mutes after a delay.

**Implementation:**

```
connect              -> mute()
stream_audio start   -> cancel pending mute task, unmute()
stream_audio finally -> spawn _delayed_mute(3.0)
```

The 3-second delay prevents rapid mute/unmute flicker on back-to-back
TTS. The mute task is cancelled if new audio starts before it fires.

**Config:** `self_mute = true` in `[[mumble.extra]]`

**Files:** `src/derp/mumble.py`
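
The lifecycle above can be sketched with plain asyncio. `MuteController` and its method names are hypothetical, and the mute/unmute callables stand in for whatever the Mumble client exposes; only the cancel-then-unmute and delayed re-mute pattern is taken from the description.

```python
import asyncio
from typing import Callable, Optional


class MuteController:
    """Hypothetical helper: mute on connect, unmute for audio, re-mute later."""

    def __init__(self, mute: Callable[[], None], unmute: Callable[[], None],
                 delay: float = 3.0) -> None:
        self._mute = mute
        self._unmute = unmute
        self._delay = delay
        self._pending: Optional[asyncio.Task] = None

    def on_connect(self) -> None:
        self._mute()

    def on_audio_start(self) -> None:
        # New audio arrived: drop any scheduled re-mute before unmuting.
        if self._pending is not None:
            self._pending.cancel()
            self._pending = None
        self._unmute()

    def on_audio_end(self) -> None:
        # Called from the stream's `finally`; must run inside an event loop.
        self._pending = asyncio.create_task(self._delayed_mute())

    async def _delayed_mute(self) -> None:
        await asyncio.sleep(self._delay)
        self._mute()


async def demo() -> list:
    events = []
    ctl = MuteController(lambda: events.append("mute"),
                         lambda: events.append("unmute"), delay=0.05)
    ctl.on_connect()
    ctl.on_audio_start()
    ctl.on_audio_end()
    await asyncio.sleep(0.01)   # second TTS starts before the delay elapses
    ctl.on_audio_start()
    ctl.on_audio_end()
    await asyncio.sleep(0.1)    # let the final delayed mute fire
    return events
```

In the demo, back-to-back audio produces a single re-mute at the end instead of a mute/unmute flicker between the two clips.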

---

### 6. Self-deafen on connect

**Requirement:** merlin deafens on connect (no audio reception needed).

**Implementation:** `self_deaf = true` config flag, calls
`self._mumble.users.myself.deafen()` in `_on_connected`.

**Files:** `src/derp/mumble.py`, `config/derp.toml`

---

## Pause/Resume

### Design

`!pause` toggles between paused and playing states:

**Pause:** Captures current track + elapsed position + monotonic
timestamp. Fades out, cancels play loop. Queue is preserved.

**Unpause:** Re-inserts track at queue front, starts play loop with
seek. Three special behaviors:

1. **Rewind:** 3s rewind on unpause for continuity. Applied only if the
   pause lasted >= 3s, as an anti-flood guard: rapid toggling doesn't
   compound the rewind.

2. **Stale stream:** If paused > 45s, cached stream files (in
   `data/music/cache/`) are deleted so the play loop re-downloads.
   Kept files (`data/music/`) are never deleted. Stream URLs from
   YouTube et al. expire within minutes.

3. **Fade-in:** Unpause always uses `fade_in=True` (5s ramp from 0).

**State cleanup:** `!stop` clears `ps["paused"]`. The play loop's
`finally` block skips `_cleanup_track` when paused (preserves the file).
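
The unpause rules can be condensed into one decision function. This is a sketch: `ResumePlan` and `plan_unpause` are hypothetical names, and the constants simply mirror the 3s and 45s thresholds stated above.

```python
from dataclasses import dataclass

REWIND_SECONDS = 3.0        # rewind applied on unpause
STALE_AFTER_SECONDS = 45.0  # cached stream files are dropped past this


@dataclass
class ResumePlan:
    seek: float            # position to resume from, in seconds
    refetch: bool          # True if the cached stream should be re-downloaded
    fade_in: bool = True   # unpause always fades in


def plan_unpause(elapsed: float, paused_for: float) -> ResumePlan:
    """Decide where and how to resume after a pause of `paused_for` seconds."""
    # Short pauses skip the rewind so rapid toggling cannot walk backwards.
    rewind = REWIND_SECONDS if paused_for >= REWIND_SECONDS else 0.0
    return ResumePlan(
        seek=max(0.0, elapsed - rewind),
        refetch=paused_for > STALE_AFTER_SECONDS,
    )
```

For example, `plan_unpause(60.0, 10.0)` resumes at 57s without re-downloading, while `plan_unpause(60.0, 1.0)` resumes at 60s. The `refetch` flag applies only to cached streams; kept files would be left alone.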

---

## Autoplay

### Design

When `autoplay = true` (config), the play loop stays alive after the
queue empties:

1. Waits for silence (duck_silence threshold, default 15s)
2. Picks one random kept track
3. Plays it
4. On completion, loops back to step 1

This replaces the previous bulk-queue approach (shuffle all kept tracks
at once). Benefits: no large upfront queue, silence-aware gaps between
tracks, indefinite looping.
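
The commit message mentions an autoplay shuffle deck (no repeats until a full cycle). One way to get that behavior while still picking a single track per iteration is sketched below; `ShuffleDeck` is a hypothetical class and not necessarily how `plugins/music.py` does it.

```python
import random
from typing import Optional, Sequence


class ShuffleDeck:
    """Draw items in random order; reshuffle only after every item was drawn."""

    def __init__(self, items: Sequence[str],
                 rng: Optional[random.Random] = None) -> None:
        self._items = list(items)
        self._rng = rng or random.Random()
        self._deck: list = []

    def draw(self) -> Optional[str]:
        if not self._items:
            return None  # nothing kept, nothing to autoplay
        if not self._deck:
            # Deck exhausted: start a new cycle with a fresh shuffle.
            self._deck = self._items[:]
            self._rng.shuffle(self._deck)
        return self._deck.pop()


deck = ShuffleDeck(["a.opus", "b.opus", "c.opus"], random.Random(0))
first_cycle = [deck.draw() for _ in range(3)]  # each track exactly once
```

A track can still repeat across a cycle boundary (last of one cycle, first of the next); avoiding that would need an extra check when reshuffling.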

### Resume persistence

A background task saves track URL + elapsed position to the state DB
every 10 seconds during playback:

```python
async def _periodic_save():
    while True:
        await asyncio.sleep(10)
        el = cur_seek + progress[0] * 0.02
        if el > 1.0:
            _save_resume(bot, track, el)
```

On hard kill: resumes from at most ~10s behind. On normal track
completion: `_clear_resume()` wipes the state.

---

## Voice Ducking

### Flow

```
voice detected -> duck_vol = floor (instant)
silence > duck_silence -> linear restore over duck_restore seconds
```

The duck monitor runs as a background task alongside the play loop.
It updates `ps["duck_vol"]` which the volume lambda reads per-frame.

### Restore ramp

Restoration is linear from floor to user volume. The per-frame ramp in
`stream_audio` further smooths each 1-second update from the monitor,
eliminating audible steps.
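
A sketch of the value the duck monitor might compute on each update. `duck_level` is a hypothetical function, the 30s `duck_restore` default is an assumed placeholder, and returning `None` once restored is an assumption chosen so the volume lambda falls through to `ps["volume"]`.

```python
from typing import Optional


def duck_level(floor: float, volume: float, silent_for: float,
               duck_silence: float = 15.0,
               duck_restore: float = 30.0) -> Optional[float]:
    """Return the duck_vol to apply (0-100 scale), or None when fully restored."""
    if silent_for <= duck_silence:
        return floor  # voice is recent: hold at the floor
    progress = (silent_for - duck_silence) / duck_restore
    if progress >= 1.0:
        return None   # restore finished: stop overriding the user volume
    # Linear interpolation from the floor up to the user-set volume.
    return floor + (volume - floor) * progress
```

With a floor of 10 and a user volume of 50, the level halfway through the restore window is 30. The monitor only updates once per second, so the per-frame ramp in `stream_audio` is what turns these coarse steps into a smooth rise.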

### Bot audio isolation

Bot usernames (from `registry._bots`) are excluded from
`_on_sound_received` entirely -- no timestamp update, no listener
dispatch. This prevents self-ducking between derp and merlin.

---

## Seek (in-stream pipeline swap)

### Design

Seek rebuilds the ffmpeg pipeline at the new position without cancelling
the play loop task. This avoids the overhead of re-downloading.

1. Set `_seek_fading = True`, `_seek_fade_out = 10` (0.2s ramp-down)
2. Continue reading frames, scaling by decreasing ratio
3. At fade-out = 0: kill ffmpeg, clear buffer, spawn new pipeline
4. 0.5s fade-in on the new pipeline
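
The frame-counted fade-out in steps 1-3 can be sketched as a list of per-frame gain ratios. The exact ratio formula used in `stream_audio` is not shown in this document, so the linear countdown below is an assumption. `clamp_seek` illustrates the seek clamp mentioned in the commit message and is likewise a hypothetical helper.

```python
from typing import List, Optional

SEEK_FADE_FRAMES = 10  # 10 frames x 20ms = 0.2s ramp-down


def seek_fade_ratios(frames: int = SEEK_FADE_FRAMES) -> List[float]:
    """Gain multiplier for each remaining frame, counting down to silence."""
    return [remaining / frames for remaining in range(frames - 1, -1, -1)]


def clamp_seek(position: float, duration: Optional[float]) -> float:
    """Keep a requested seek position inside the track (assumed behavior)."""
    if duration is None or duration <= 0:
        return max(0.0, position)  # unknown duration: only clamp at zero
    return max(0.0, min(position, duration))


ratios = seek_fade_ratios()  # 0.9, 0.8, ... down to 0.0
```

Once the last ratio (0.0) has been applied, the old pipeline's output is already silent, so killing ffmpeg and clearing the buffer at that point should not produce a click.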

### Consolidation note

Seek fade-out (10 frames / 0.2s) is much shorter than skip/stop
fade-out (3s). This is intentional -- seek should feel responsive.
The mechanisms are separate: seek uses frame-counting in
`stream_audio`, skip/stop uses `_fade_and_cancel` in the plugin.

---

## Consolidation Opportunities

### Volume control unification

Three volume layers (`fade_vol`, `duck_vol`, `volume`) are evaluated in
a lambda per-frame. This works, but the priority logic is implicit. A
future refactor could use a single `effective_volume()` method that
explicitly resolves priority and makes the per-frame cost clearer.
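
One possible shape for such a helper, written here as a free function over the play-state dict. This is a sketch of the proposed refactor, not existing code; the key names come from the lambda shown earlier.

```python
from typing import Mapping, Optional

# Highest priority first; the user-set volume is the fallback.
_OVERRIDE_KEYS = ("fade_vol", "duck_vol")


def effective_volume(ps: Mapping[str, Optional[float]]) -> float:
    """Resolve the target gain (0.0-1.0) from the play-state dict."""
    for key in _OVERRIDE_KEYS:
        level = ps.get(key)
        # Compare against None, not truthiness: a fade target of 0 must win.
        if level is not None:
            return level / 100.0
    return ps["volume"] / 100.0
```

The `is not None` check matters because `fade_vol = 0` is the normal fade-out target; a truthiness test would silently skip it and fall through to the duck or user volume.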

### Fade-out ownership

Skip/stop/pause all route through `_fade_and_cancel()` -- good. But the
fade target is communicated indirectly via `ps["fade_vol"] = 0` and
`ps["fade_step"]`, read by a lambda in the play loop, evaluated in
`stream_audio`. A more explicit signal (e.g. an asyncio.Event or a
dedicated fade state machine in `stream_audio`) could simplify reasoning
about timing.

### Buffer drain timing

The 150ms post-fade drain is empirical. A more robust approach would be
to query `sound_output.get_buffer_size()` and wait for it to drop below
a threshold before cancelling. This would adapt to varying network
conditions and pymumble buffer sizes.
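
A sketch of that adaptive drain. `wait_for_drain` is a hypothetical helper that takes the size query as a callable, assumed here to report queued audio in seconds; the threshold, timeout and poll values are placeholders, not tuned numbers.

```python
import asyncio
from typing import Callable


async def wait_for_drain(get_buffer_size: Callable[[], float],
                         threshold: float = 0.02,
                         timeout: float = 0.5,
                         poll: float = 0.01) -> bool:
    """Wait until the output buffer is nearly empty.

    Returns True if the buffer drained below `threshold`, False if
    `timeout` seconds passed first (caller may then cancel anyway).
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while get_buffer_size() > threshold:
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(poll)
    return True
```

In `_fade_and_cancel()` this would replace the fixed `asyncio.sleep(0.15)`, for instance `await wait_for_drain(sound_output.get_buffer_size)`, with the timeout ensuring a stuck buffer cannot block skip/stop indefinitely.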

### Track duration

Duration is probed via `ffprobe` after download (blocking, run in
executor). For kept tracks, it's stored in state metadata. This is
duplicated -- kept track metadata already has duration from
`_fetch_metadata` (yt-dlp). The `ffprobe` path is the fallback for
non-kept tracks. Could unify by always probing locally.

### Periodic resume save interval

Currently 10s fixed. Could be adaptive -- save more frequently near
the start of a track (where losing position is more noticeable) and
less frequently later. Marginal benefit vs. complexity though.