2026-05-17 17:54:08 -07:00
# rvoice — push-to-talk dictation for remote rclaude sessions

`/voice` in Claude Code opens the mic on **whichever host the claude binary is
running on**. When you're connected to apricot over ssh through `cc` / `rclaude resume`,
that's apricot, which has no mic. `rvoice` fills the gap.

It records audio locally on macOS, transcribes it via the **LAN speech-synthesis
service on apricot** (Whisper, GPU-accelerated, no API keys, no network
egress beyond the local LAN), and injects the transcript into the active
remote tmux session via `tmux send-keys` over ssh. The target session is
auto-detected from the focused iTerm2 tab title (set by the canonical
session-tools `tmux.conf` to `<host> · <session>`).
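
The title-based detection described above can be sketched as a small parser. This is only an illustration under stated assumptions: `parse_tab_title` is a hypothetical helper name, the title is assumed to arrive as a plain string, and the real `rvoice` may well resolve the target differently.

```sh
# Split an iTerm2 tab title of the form "<host> · <session>" into its parts.
# parse_tab_title is a hypothetical helper, not taken from rvoice itself.
parse_tab_title() {
  title=$1
  sep=' · '
  case $title in
    *"$sep"*) ;;            # separator present, keep going
    *) return 1 ;;          # not an rclaude-style title
  esac
  host=${title%%"$sep"*}    # text before the first separator
  session=${title#*"$sep"}  # text after the first separator
  # Reject titles where either side of the separator is empty.
  [ -n "$host" ] && [ -n "$session" ] || return 1
  printf '%s\n%s\n' "$host" "$session"
}
```

Calling `parse_tab_title 'apricot · claude-demo'` (a made-up session name) prints the host and the session on separate lines. A title without the separator makes the function return non-zero, which would correspond to a "no target" condition.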

## Architecture

```
[ Right ⌥ down ] ──Hammerspoon──▶ rvoice start ──▶ ffmpeg → recording.wav
[ Right ⌥ up   ] ──Hammerspoon──▶ rvoice stop
                                       │
                                       ▼
                  POST WAV → http://apricot.lan:8000/stt/transcribe
                  (faster-whisper on GPU, ~base model)
                                       │
                                       ▼
                  iTerm2 active tab title → "apricot · claude-…"
                                       │
                                       ▼
                  ssh apricot tmux send-keys -t claude-… -l "<text>"
```
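
The last box of the diagram hides one subtlety: ssh joins its arguments into a single string for the remote shell, so a transcript containing spaces or quotes has to be quoted before it leaves the Mac. The sketch below shows one way that step might look. `shell_quote` and `inject` are hypothetical names, and the Enter keypress for `RVOICE_AUTOSEND=1` is an assumption about how autosend could be implemented.

```sh
# Quote a string so a remote POSIX shell reads it back as one literal word.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Deliver a transcript to a tmux session on a remote host.
# The session name and the text are quoted locally because ssh hands the
# whole command line to the remote shell for a second round of parsing.
inject() {
  host=$1 session=$2 text=$3
  ssh "$host" "tmux send-keys -t $(shell_quote "$session") -l $(shell_quote "$text")"
  # Assumption: autosend is a separate, non-literal Enter keypress.
  if [ "${RVOICE_AUTOSEND:-0}" = 1 ]; then
    ssh "$host" "tmux send-keys -t $(shell_quote "$session") Enter"
  fi
}
```

The `-l` flag tells tmux to treat the text literally instead of looking up key names, which is why the optional Enter is sent as its own command without `-l`.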

## Files

| Path                       | Role                                               |
|----------------------------|----------------------------------------------------|
| `bin/rvoice`               | CLI: `start`/`stop`/`cancel`/`target`/`log`        |
| `hammerspoon/rvoice.lua`   | Right-⌥ hold detector → calls `rvoice`             |
| `~/.config/rvoice/config`  | Sourced at startup; overrides STT URL, model, etc. |
| `$TMPDIR/rvoice/`          | Per-recording state (pid, wav, log)                |
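
Because `start` and `stop` are separate invocations, the state directory is what lets the second one find the recorder launched by the first. A minimal sketch of that handshake follows. The file names `pid` and `log` and the helper names are assumptions, since the table only says that a pid, a wav, and a log live there.

```sh
# Directory that holds the state of the recording in progress.
state_dir() {
  base=${TMPDIR:-/tmp}
  printf '%s/rvoice\n' "${base%/}"   # macOS TMPDIR ends in a slash
}

# Launch a recorder command in the background and remember its pid.
# Usage: rec_start <command> [args...]
rec_start() {
  dir=$(state_dir)
  mkdir -p "$dir"
  "$@" >>"$dir/log" 2>&1 &
  echo "$!" >"$dir/pid"
}

# Signal the recorder named in the pid file, then clear the state.
# Returns non-zero when no recording is in progress.
rec_stop() {
  dir=$(state_dir)
  [ -f "$dir/pid" ] || return 1
  pid=$(cat "$dir/pid")
  kill "$pid" 2>/dev/null || true    # SIGTERM; ffmpeg finalizes the file on it
  rm -f "$dir/pid"
}
```

A recording could then be started with something like `rec_start ffmpeg -f avfoundation -i ":0" -ac 1 -ar 16000 "$(state_dir)/recording.wav"`, reusing the flags from the smoke test in the Install section.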

## Install

Prerequisites: `ffmpeg`, `jq`, `curl` (all installable with `brew install`), Hammerspoon
(`brew install --cask hammerspoon`), and the LAN speech-synthesis service
running on apricot (already deployed at `apricot.lan:8000`, exposes
`/stt/transcribe`). No API keys, no cloud round-trip.

```sh
# 1. Symlink rvoice (already done if you ran install.sh)
ln -sfn ~/Code/@scripts/session-tools/bin/rvoice ~/.local/bin/rvoice

# 2. (Optional) override defaults in ~/.config/rvoice/config; see the
#    "Config" section below. The default is to POST to apricot.lan:8000 and
#    use the `base` Whisper model.

# 3. Wire up Hammerspoon
mkdir -p ~/.hammerspoon
ln -sfn ~/Code/@scripts/session-tools/hammerspoon/rvoice.lua ~/.hammerspoon/rvoice.lua
# Append the require line only once, so re-running this step is harmless.
grep -qxF 'require("rvoice")' ~/.hammerspoon/init.lua 2>/dev/null ||
  echo 'require("rvoice")' >> ~/.hammerspoon/init.lua
open /Applications/Hammerspoon.app

# 4. From Hammerspoon's menu bar → Reload Config.
#    Grant Accessibility + Microphone permission when macOS prompts.

# 5. Smoke-test the STT endpoint without Hammerspoon:
ffmpeg -f avfoundation -i ":0" -ac 1 -ar 16000 -t 5 /tmp/me.wav
curl -F "audio=@/tmp/me.wav" -F "model=base" -F "language=en" -F "task=transcribe" \
     http://apricot.lan:8000/stt/transcribe | jq .text
```
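
The smoke test in step 5 can be wrapped in a function that picks up the `RVOICE_*` settings. This is a sketch, not the code in `bin/rvoice`: `transcribe` is a hypothetical name, the form fields simply mirror the `curl` call above, and leaving out `language` when `RVOICE_LANG` is empty is an assumption about how auto-detect is requested.

```sh
# POST a WAV file to the STT endpoint and print the server's JSON reply.
# transcribe is a hypothetical wrapper; field names mirror the smoke test.
transcribe() {
  wav=$1
  url=${RVOICE_STT_URL:-http://apricot.lan:8000}/stt/transcribe
  # Unset falls back to "en"; an explicitly empty value means auto-detect.
  lang=${RVOICE_LANG-en}
  # Fields that are always sent.
  set -- -sS --fail -F "audio=@$wav" -F "model=${RVOICE_MODEL:-base}" -F "task=transcribe"
  # Assumption: omitting the language field lets the server auto-detect.
  if [ -n "$lang" ]; then
    set -- "$@" -F "language=$lang"
  fi
  curl "$@" "$url"
}
```

With `jq` installed, `transcribe /tmp/me.wav | jq -r .text` would print just the transcript.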

## Usage

From any iTerm2 tab that's attached to a remote claude session via `cc` or
`rclaude resume`:

1. **Hold Right ⌥** → "listening…" notification, Tink sound
2. **Speak**
3. **Release** → recording stops, the transcript is typed into your claude prompt,
   Pop sound on success / Funk sound on error
4. **Hit Enter** when you're ready (review first), or set `RVOICE_AUTOSEND=1`
   to skip the manual confirmation

## Config (`~/.config/rvoice/config`)

Plain shell fragment sourced at startup. Defaults shown.

```sh
export RVOICE_STT_URL=http://apricot.lan:8000   # speech-synthesis service
export RVOICE_MODEL=base                        # tiny|base|small|medium|large-v2|large-v3
export RVOICE_LANG=en                           # omit/empty = auto-detect
export RVOICE_AUTOSEND=0                        # 1 = press Enter after inject
export RVOICE_MIN_MS=200                        # ignore taps shorter than this (debounce)
export RVOICE_MAX_S=60                          # hard cap on a single recording
export RVOICE_HOST=apricot.lan                  # force target host (overrides iTerm2 detection)
export RVOICE_SESSION=claude-natalie-…          # force target tmux session
```
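
Since the config file is sourced as shell, an `export` inside it would normally overwrite a value already set in the environment. The sketch below shows one way the three layers (built-in default, config file, environment) might be ordered so that the environment wins. `load_config` is a hypothetical function, and only three of the settings are handled to keep it short.

```sh
# Load settings with the precedence: environment > config file > default.
# load_config is a hypothetical sketch, not the loader rvoice ships with.
load_config() {
  cfg=${1:-"${HOME}/.config/rvoice/config"}
  # Snapshot the settings that are already present in the environment.
  for name in RVOICE_STT_URL RVOICE_MODEL RVOICE_AUTOSEND; do
    eval "was_set_$name=\${$name+yes}; saved_$name=\${$name-}"
  done
  # Source the file if it exists; it may overwrite any of the settings.
  if [ -r "$cfg" ]; then . "$cfg"; fi
  # Restore the snapshot so environment values beat the file.
  for name in RVOICE_STT_URL RVOICE_MODEL RVOICE_AUTOSEND; do
    eval "if [ \"\$was_set_$name\" = yes ]; then $name=\$saved_$name; fi"
  done
  # Fill in defaults for anything still unset or empty.
  : "${RVOICE_STT_URL:=http://apricot.lan:8000}"
  : "${RVOICE_MODEL:=base}"
  : "${RVOICE_AUTOSEND:=0}"
}
```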

Override any of these per-invocation: `RVOICE_MODEL=small rvoice stop`.

**Model trade-offs** (on apricot's GPU; latencies are rough):
- `tiny.en` / `base`: sub-second, fine for short prompts
- `small`: ~1s, noticeable quality bump
- `medium` / `large-v3`: 2-4s, near-perfect, worth it for paragraphs

## Subcommands

```sh
rvoice start    # begin recording (Hammerspoon calls this on key-down)
rvoice stop     # stop, transcribe, inject (called on key-up)
rvoice cancel   # stop without transcribing (called on quick-tap abort)
rvoice target   # debug: echo the host+session rvoice WOULD inject into
rvoice log      # tail -50 of the action log
```
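
The choice between `stop` and `cancel` on key-up comes down to how long the key was held. The sketch below expresses that decision in shell for illustration only. The real check probably lives in `hammerspoon/rvoice.lua`, and `release_action` is a hypothetical name.

```sh
# Decide which subcommand a key release should trigger.
# held_ms is how long Right ⌥ was down, in milliseconds.
release_action() {
  held_ms=$1
  min_ms=${RVOICE_MIN_MS:-200}
  if [ "$held_ms" -lt "$min_ms" ]; then
    echo cancel   # quick tap: discard the recording
  else
    echo stop     # real hold: transcribe and inject
  fi
}
```

A caller could then run `rvoice "$(release_action "$held_ms")"` once it knows the hold duration.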

## Troubleshooting

- **"STT request failed"**: apricot's speech service isn't reachable. Check
  `curl http://apricot.lan:8000/health` and
  `ssh apricot.lan systemctl --user status` for the relevant unit. Most
  likely you're off the LAN/VPN.
- **"no target session resolvable"**: the focused iTerm2 tab title isn't in
  `<host> · <session>` format. Either (a) you're not in an rclaude/ssh
  session, or (b) the remote tmux config didn't get the title-setting fragment.
  `rclaude install --on <host>` re-pushes the canonical tmux config; verify
  with `ssh <host> 'tmux show-options -g | grep set-titles'`.
- **Hammerspoon doesn't see Right ⌥**: System Settings → Privacy &
  Security → Accessibility → enable Hammerspoon. Also enable Microphone for the
  recording step. Restart Hammerspoon after granting.
- **Transcription returns empty / nonsense**: bump the model to `RVOICE_MODEL=small`
  or `medium`. The default `base` trades accuracy for sub-second latency. To list
  the available models: `curl http://apricot.lan:8000/stt/models`.
- **Injection types into the wrong session**: `rvoice target` shows what it
  will hit. If that's wrong, set `RVOICE_HOST` / `RVOICE_SESSION` in config to pin
  the target.
- **Latency feels high**: the first call after the service has been idle warms
  the model on apricot's GPU (1-2s one-time). Subsequent calls are sub-second
  for `base`. Switch to `tiny.en` for the lowest-latency tier.

## Why this architecture (vs. /voice over ssh)

`/voice` is a feature of the `claude` binary itself; it opens the mic via
the OS audio API on whichever host it runs on. ssh has no audio channel and
doesn't forward CoreAudio events. The only ways to make `/voice` work over a
remote rclaude session would be:

1. **Run claude locally** (lose apricot's compute / project files / LAN
   services, so not viable for our workflow)
2. **Forward audio via PulseAudio** (brittle on macOS, breaks on every
   claude release)
3. **Reproduce /voice's behavior with our own pieces** ← this is rvoice

`rvoice` keeps the mic and the hotkey on the Mac, runs transcription on
apricot's own LAN-resident speech-synthesis service (GPU Whisper, zero
local model RAM, no cloud egress), and uses tmux's existing send-keys
protocol to deliver text. Every layer is well understood and stable.