# rvoice — push-to-talk dictation for remote rclaude sessions

`/voice` in Claude Code opens the mic on **whichever host the claude binary is
running on**. When you're sshed to apricot through `cc` / `rclaude resume`,
that's apricot — which has no mic. `rvoice` fills the gap.

It records audio locally on macOS, transcribes via the **LAN speech-synthesis
service on apricot** (Whisper, GPU-accelerated, no API keys / no network
egress beyond the local LAN), and injects the transcript into the active
remote tmux session via `tmux send-keys` over ssh. The target session is
auto-detected from the focused iTerm2 tab title (set by the canonical
session-tools `tmux.conf` to `<host> · <session>`).
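The title-based target lookup can be sketched as below. This is only an illustration, assuming the tab title arrives as a plain string in the `<host> · <session>` form; `parse_tab_title` is a hypothetical name and not necessarily how `bin/rvoice` does it.

```sh
# Hypothetical helper: split "<host> · <session>" into its two parts.
# Prints "host session" on success; returns 1 if the title does not match.
# Plain POSIX sh, so the working variables are not local to the function.
parse_tab_title() {
  _title=$1
  _sep=' · '
  case $_title in
    *"$_sep"*) ;;          # separator present, keep going
    *) return 1 ;;         # e.g. a local shell tab titled "zsh"
  esac
  _host=${_title%%"$_sep"*}     # everything before the first separator
  _session=${_title#*"$_sep"}   # everything after the first separator
  [ -n "$_host" ] && [ -n "$_session" ] || return 1
  printf '%s %s\n' "$_host" "$_session"
}
```

A non-zero return here corresponds to the case where no target can be resolved from the focused tab.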

## Architecture

```
[ Right ⌥ down ]  ──Hammerspoon──▶  rvoice start  ──▶  ffmpeg → recording.wav
[ Right ⌥ up ]    ──Hammerspoon──▶  rvoice stop
                                          │
                                          ▼
                  POST WAV → http://apricot.lan:8000/stt/transcribe
                              (faster-whisper on GPU, ~base model)
                                          │
                                          ▼
                  iTerm2 active tab title → "apricot · claude-…"
                                          │
                                          ▼
                  ssh apricot tmux send-keys -t claude-… -l "<text>"
```
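The last hop in the diagram needs some care with quoting: ssh joins its arguments into one string and hands it to the remote shell, so a transcript containing spaces, quotes, or `$` has to be quoted for that shell. A minimal sketch of one way to build the remote command follows. `shell_quote` and `remote_inject_cmd` are hypothetical names, and the `--` before the text is an added precaution rather than something the diagram shows.

```sh
# Hypothetical helper: wrap a string in single quotes so a POSIX shell
# reads it back as one literal word (an embedded ' becomes '\'').
# Trailing newlines are dropped by the command substitution.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Print the command string for the remote side: session name, then text.
# -l makes tmux type the text literally instead of looking up key names;
# -- keeps a transcript that starts with "-" from being read as an option.
remote_inject_cmd() {
  printf 'tmux send-keys -t %s -l -- %s\n' \
    "$(shell_quote "$1")" "$(shell_quote "$2")"
}
```

With these in place, the injection itself would be a single call such as `ssh "$host" "$(remote_inject_cmd "$session" "$text")"`, passing the whole remote command as one ssh argument.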

## Files

| Path                                                 | Role                                  |
|------------------------------------------------------|---------------------------------------|
| `bin/rvoice`                                         | CLI: `start`/`stop`/`cancel`/`target`/`log` |
| `hammerspoon/rvoice.lua`                             | Right-⌥ hold detector → calls `rvoice` |
| `~/.config/rvoice/config`                            | Sourced at startup; overrides STT URL, model, etc. |
| `$TMPDIR/rvoice/`                                    | Per-recording state (pid, wav, log)   |

## Install

Prerequisites: `ffmpeg`, `jq`, `curl` (all `brew install`able), Hammerspoon
(`brew install --cask hammerspoon`), and the LAN speech-synthesis service
running on apricot (already deployed at `apricot.lan:8000`, exposes
`/stt/transcribe`). No API keys, no cloud round-trip.

```sh
# 1. Symlink rvoice (already done if you ran install.sh)
ln -sfn ~/Code/@scripts/session-tools/bin/rvoice ~/.local/bin/rvoice

# 2. (Optional) override defaults in ~/.config/rvoice/config — see the
#    "Config" section below. The default is to POST to apricot.lan:8000 and
#    use the `base` Whisper model.

# 3. Wire up Hammerspoon
mkdir -p ~/.hammerspoon
ln -sfn ~/Code/@scripts/session-tools/hammerspoon/rvoice.lua ~/.hammerspoon/rvoice.lua
grep -qxF 'require("rvoice")' ~/.hammerspoon/init.lua 2>/dev/null || echo 'require("rvoice")' >> ~/.hammerspoon/init.lua
open /Applications/Hammerspoon.app

# 4. From Hammerspoon's menu bar → Reload Config.
#    Grant Accessibility + Microphone permission when macOS prompts.

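# 5. Smoke-test the STT endpoint without Hammerspoon. Records 5 s of mono
#    16 kHz audio from avfoundation audio device 0; if that is the wrong mic,
#    list devices with: ffmpeg -f avfoundation -list_devices true -i ""
ffmpeg -y -f avfoundation -i ":0" -ac 1 -ar 16000 -t 5 /tmp/me.wav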
curl -F "audio=@/tmp/me.wav" -F "model=base" -F "language=en" -F "task=transcribe" \
  http://apricot.lan:8000/stt/transcribe | jq .text
```

## Usage

From any iTerm2 tab that's attached to a remote claude session via `cc` or
`rclaude resume`:

1. **Hold Right ⌥** → "listening…" notification, Tink sound
2. **Speak**
3. **Release** → recording stops, transcript types into your claude prompt,
   Pop sound on success / Funk sound on error
4. **Hit Enter** when you're ready (review first), or set `RVOICE_AUTOSEND=1`
   to skip the manual confirmation
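Step 4 implies that the transcript and the Enter key are delivered separately. Assuming the text is injected with `tmux send-keys -l` as in the Architecture diagram, the word `Enter` would be typed literally under `-l`, so submitting takes a second `send-keys` call without it. A rough sketch, with `autosend_cmd` as a hypothetical name and the session assumed to be a plain name that needs no quoting:

```sh
# Hypothetical helper: print the extra remote command that submits the
# prompt, or print nothing when RVOICE_AUTOSEND is unset or not 1.
autosend_cmd() {
  [ "${RVOICE_AUTOSEND:-0}" = 1 ] || return 0
  # No -l here: tmux must interpret "Enter" as a key name.
  printf 'tmux send-keys -t %s Enter\n' "$1"
}
```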

## Config (`~/.config/rvoice/config`)

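Plain shell fragment sourced at startup. Defaults shown, except for
`RVOICE_HOST` and `RVOICE_SESSION`: those are example values, and leaving them
unset keeps the iTerm2 tab-title auto-detection.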

```sh
export RVOICE_STT_URL=http://apricot.lan:8000        # speech-synthesis service
export RVOICE_MODEL=base                             # tiny|base|small|medium|large-v2|large-v3
export RVOICE_LANG=en                                # omit/empty = auto-detect
export RVOICE_AUTOSEND=0                             # 1 = press Enter after inject
export RVOICE_MIN_MS=200                             # ignore taps shorter than this (debounce)
export RVOICE_MAX_S=60                               # hard cap on a single recording
export RVOICE_HOST=apricot.lan                       # force target host (overrides iTerm2 detection)
export RVOICE_SESSION=claude-natalie-…               # force target tmux session
```

Override any of these per-invocation: `RVOICE_MODEL=small rvoice stop`.
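How `bin/rvoice` reconciles the sourced file with values already in the environment is not shown in this document. If a plain `export` in your config turns out to shadow a per-invocation value, one conventional shell pattern is to assign only when the variable is unset or empty. The fragment below is a sketch of that pattern, not the shipped config:

```sh
# Hypothetical config fragment: each line keeps a value that is already set
# in the environment and falls back to the default otherwise.
: "${RVOICE_MODEL:=base}"
: "${RVOICE_LANG:=en}"
: "${RVOICE_AUTOSEND:=0}"
export RVOICE_MODEL RVOICE_LANG RVOICE_AUTOSEND
```

Written this way, `RVOICE_MODEL=small rvoice stop` keeps `small`, while a bare `rvoice stop` gets `base`. One side effect is that an empty value also falls back to the default, so this form cannot express "empty means auto-detect" for `RVOICE_LANG`.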

**Model trade-offs** (apricot's GPU; latency rough):
- `tiny.en` / `base` — sub-second, fine for short prompts
- `small` — ~1s, noticeable quality bump
- `medium` / `large-v3` — 2-4s, near-perfect, worth it for paragraphs

## Subcommands

```sh
rvoice start    # begin recording (Hammerspoon calls this on key-down)
rvoice stop     # stop, transcribe, inject (called on key-up)
rvoice cancel   # stop without transcribing (called on quick-tap abort)
rvoice target   # debug: echo the host+session rvoice WOULD inject into
rvoice log      # tail -50 of the action log
```
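The choice between `stop` and `cancel` on key-up follows from `RVOICE_MIN_MS`. Whether that decision lives in the Hammerspoon script or in `rvoice` itself is not stated here, so the sketch below only illustrates the rule; `release_action` is a hypothetical name.

```sh
# Hypothetical helper: given how long the key was held (milliseconds),
# print the subcommand to run on release.
release_action() {
  _held_ms=$1
  _min_ms=${RVOICE_MIN_MS:-200}
  if [ "$_held_ms" -lt "$_min_ms" ]; then
    echo cancel   # quick tap: discard the recording
  else
    echo stop     # real hold: transcribe and inject
  fi
}
```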

## Troubleshooting

- **"STT request failed"** — apricot's speech service isn't reachable. Check
  `curl http://apricot.lan:8000/health` and `ssh apricot.lan systemctl --user
  status` for the relevant unit. Most likely you're off the LAN/VPN.
- **"no target session resolvable"** — the focused iTerm2 tab title isn't in
  `<host> · <session>` format. Either: (a) you're not in an rclaude/ssh
  session, or (b) the remote tmux config didn't get the title-setting fragment.
  `rclaude install --on <host>` re-pushes the canonical tmux config; verify
  with `ssh <host> 'tmux show-options -g | grep set-titles'`.
- **Hammerspoon doesn't see Right ⌥** — System Settings → Privacy &
  Security → Accessibility → enable Hammerspoon. Also Microphone for the
  recording step. Restart Hammerspoon after granting.
- **Transcription returns empty / nonsense** — bump the model: `RVOICE_MODEL=small`
  or `medium`. Default `base` trades accuracy for sub-second latency. Models
  list: `curl http://apricot.lan:8000/stt/models`.
- **Injection types into the wrong session** — `rvoice target` shows what it
  will hit. If wrong, set `RVOICE_HOST` / `RVOICE_SESSION` in config to pin
  the target.
- **Latency feels high** — first call after service idle warms the model on
  apricot's GPU (1-2s one-time). Subsequent calls are sub-second for `base`.
  Switch to `tiny.en` for the lowest-latency tier.

## Why this architecture (vs. /voice over ssh)

`/voice` is a feature of the `claude` binary itself; it opens the mic via
the OS audio API on whichever host it runs on. ssh has no audio channel and
doesn't forward CoreAudio events. The only ways to make `/voice` work over a
remote rclaude session would be:

1. **Run claude locally** (lose apricot's compute / project files / LAN
   services — not viable for our workflow)
2. **Forward audio via PulseAudio** (brittle on macOS, breaks on every
   claude release)
3. **Reproduce /voice's behavior with our own pieces** ← this is rvoice

`rvoice` keeps the mic and the hotkey on the Mac, runs transcription on
apricot's own LAN-resident speech-synthesis service (GPU Whisper, zero
local model RAM, no cloud egress), and uses tmux's existing send-keys
protocol to deliver text — every layer is well-understood and stable.