diff --git a/docs/USAGE.md b/docs/USAGE.md
index 9f536c5..c14c86d 100644
--- a/docs/USAGE.md
+++ b/docs/USAGE.md
@@ -1677,3 +1677,40 @@ file (natural dedup).
 - Use `!kept` to list preserved files and their sizes
 - Use `!kept clear` to delete all preserved files
 - On cancel/error, files are not deleted (needed for `!resume`)
+
+### Voice STT/TTS
+
+Transcribe voice from Mumble users via Whisper STT and speak text aloud
+via Piper TTS. Requires local Whisper and Piper services.
+
+```
+!listen [on|off]    Toggle voice-to-text transcription (admin)
+!listen             Show current listen status
+!say                Speak text aloud via TTS (max 500 chars)
+```
+
+STT behavior:
+
+- When enabled, the bot buffers incoming voice PCM per user
+- After a configurable silence gap (default 1.5s), the buffer is
+  transcribed via Whisper and posted as an action message
+- Utterances shorter than 0.5s are discarded (noise filter)
+- Utterances are capped at 30s to bound memory and latency
+- Transcription results are posted as: `* derp heard Alice say: hello`
+- The listener survives reconnects when `!listen` is on
+
+TTS behavior:
+
+- `!say` fetches WAV from Piper and plays it via `stream_audio()`
+- Piper outputs 22050Hz WAV; ffmpeg resamples to 48kHz automatically
+- TTS shares the audio output with music playback
+- Text is limited to 500 characters
+
+Configuration (optional):
+
+```toml
+[voice]
+whisper_url = "http://192.168.122.1:8080/inference"
+piper_url = "http://192.168.122.1:5000/"
+silence_gap = 1.5
+```
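
Reviewer note: the STT buffering rules documented in this hunk (silence gap, 0.5s minimum, 30s cap) can be sketched as below. This is only an illustration under stated assumptions, not the bot's actual code: the class name `UtteranceBuffer`, its methods, and the 48 kHz 16-bit mono PCM format are all hypothetical. One such buffer would presumably be kept per user.

```python
from dataclasses import dataclass, field
from typing import Optional

SAMPLE_RATE = 48000    # assumption: voice PCM arrives at 48 kHz
BYTES_PER_SAMPLE = 2   # assumption: 16-bit mono samples


@dataclass
class UtteranceBuffer:
    """Hypothetical per-user buffer applying the documented STT rules."""

    silence_gap: float = 1.5    # seconds of silence that end an utterance
    min_duration: float = 0.5   # shorter utterances are dropped as noise
    max_duration: float = 30.0  # cap that bounds memory and latency
    _pcm: bytearray = field(default_factory=bytearray)
    _last_ts: float = 0.0

    def duration(self) -> float:
        """Length of the buffered audio in seconds."""
        return len(self._pcm) / (SAMPLE_RATE * BYTES_PER_SAMPLE)

    def feed(self, pcm: bytes, now: float) -> Optional[bytes]:
        """Append a PCM chunk; flush early once the cap is reached."""
        self._pcm.extend(pcm)
        self._last_ts = now
        if self.duration() >= self.max_duration:
            return self._take()
        return None

    def poll(self, now: float) -> Optional[bytes]:
        """Return the finished utterance once the silence gap has elapsed."""
        if not self._pcm or now - self._last_ts < self.silence_gap:
            return None
        return self._take()

    def _take(self) -> Optional[bytes]:
        """Empty the buffer; return its audio unless it is too short."""
        data = bytes(self._pcm)
        long_enough = self.duration() >= self.min_duration
        self._pcm.clear()
        return data if long_enough else None
```

In this sketch, a caller would invoke `feed()` for every incoming voice packet and `poll()` on a timer, sending any returned bytes to the Whisper endpoint. Timestamps are passed in explicitly so the logic can be exercised without a real clock.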