chobit/CLAUDE.md

195 lines
10 KiB
Markdown
Raw Normal View History

# @chobit
Interactive AI companion — multi-platform Godot 4 app with 3D VRM avatar, voice interaction, pluggable LLM backend. Godot is the avatar runtime; all ML/GPU inference runs on external services via model-boss.
## Architecture
```
Godot (avatar runtime) External services (via network)
└── runs Miku VRM model ←── face pose (model-boss)
├── audio playback ←── TTS audio (@speech-synthesis)
└── conversation ──► STT/LLM (@speech-synthesis / model-boss)
```
```
@chobit/
├── shared/ # Cross-platform code
│ └── godot/ # Shared GDScript (symlinked into both projects)
│ ├── companion.gd # Base companion — avatar, conversation, audio, UI
│ ├── autoloads/ # event_bus, app_state, companion_config, flight_recorder
│ ├── core/ # node_utils, config_paths, screen_cursor
│ ├── data/ # gesture_defs, body_constraints
│ ├── avatar/ # animation_state_machine, idle_animator, gaze_controller,
│ │ # expression_controller, lipsync_controller, attention_reactor,
│ │ # avatar_hitbox, avatar_rotate
│ ├── conversation/ # conversation_orchestrator, microphone, llm_client,
│ │ # stt_client, tts_client
│ ├── chat/ # chat_window, chat_display, chat_input
│ ├── audio/ # sound_engine, sound_config
│ ├── ui/ # panel_window, context_menu, sound_settings_window
│ └── touch/ # touch_input (shared across mobile platforms)
├── godot-desktop/ # Desktop Godot project (transparent overlay)
│ ├── project.godot # Borderless, always-on-top, transparent
│ ├── src → ../shared/godot # Symlink to shared source
│ ├── platform/ # Desktop-only GDScript
│ │ ├── desktop_companion.gd # Extends companion.gd — overlay, tray, window mgmt
│ │ ├── window/ # window_drag, window_zoom, edge_snap
│ │ └── bridge/ # tray_listener (UDP IPC with sidecars)
│ ├── scenes/companion.tscn # Desktop scene → platform/desktop_companion.gd
│ ├── addons/ # VRM4Godot, Godot-MToon-Shader
│ ├── models/ # VRM files (.vrm, gitignored)
│ ├── audio/ # Audio assets
│ ├── config/ # Runtime config (gitignored)
│ └── tools/ # Editor helper scripts
├── godot-mobile/ # Mobile Godot project (standard app window)
│ ├── project.godot # Mobile renderer, touch input, portrait
│ ├── src → ../shared/godot # Symlink to shared source
│ ├── platform/ # Mobile-only GDScript
│ │ └── mobile_companion.gd # Extends companion.gd — touch input, on-device camera
│ ├── scenes/companion.tscn # Mobile scene → platform/mobile_companion.gd
│ └── export/ # android.preset, ios.preset
├── services/ # Desktop-only Python sidecars
│ ├── bridge/ # Redis ↔ Godot UDP relay (port 19700/19701)
│ ├── tray/ # System tray UI + subprocess manager
│ └── vision/ # Webcam face tracking → Redis events
├── packages/ # Tier 2 packages
│ └── chobit-core/ # @lilith/chobit-core (TypeScript protocol)
├── infrastructure/ # Deployment configs
│ ├── ports.yaml # Port allocation (local + remote)
│ └── services/chobit.yaml # Service topology
├── app.manifest.yaml # manage-apps manifest
├── docs/ARCHITECTURE.md
└── run # Task runner
```
## Platform Architecture
### Shared (both desktop and mobile)
- **Avatar rendering** — VRM model, skeleton, blendshapes, animation state machine
- **Conversation pipeline** — microphone → STT → LLM → TTS → lipsync (all via network to model-boss / @speech-synthesis)
- **Audio** — sound engine, sound config, playback
- **Chat UI** — chat window, display, input
- **Autoloads** — EventBus, AppState, CompanionConfig, FlightRecorder
### Desktop-only
- **Transparent overlay** — borderless, always-on-top, transparent background (Miku floats on desktop)
- **Window management** — drag, zoom, edge snap
- **Settings window** — Godot-native popup panel (sound, backend config)
- **Context menu** — right-click popup
- **Gaze halo** — desktop-only visual overlay effect
- **System tray** — tray sidecar with dashboard, camera preview
- **Vision sidecar** — MediaPipe face tracking via Redis → bridge → UDP
- **Tray listener** — UDP IPC for sidecar communication
- **Keyboard shortcuts** — T to toggle chat
### Mobile-only (scaffolded)
- **Fullscreen app** — portrait orientation, mobile renderer, Miku owns the whole screen
- **Background modes** — four switchable backgrounds behind the avatar:
- **Camera feed** — rear/front camera, AR-style (also provides face tracking input)
- **Rendered environment** — 3D scene (bedroom, park, abstract space)
- **Camera blur** — stylized/blurred camera feed, softer aesthetic
- **Solid/gradient** — styled color, battery-friendly fallback
- **Touch input** — tap for interaction, pinch-zoom, two-finger rotate, long-press context menu
- **On-device camera** — direct face tracking via CameraFeed (no sidecar needed)
### External Services (network, host-independent)
All ML/GPU inference runs on external services, not localhost:
- **@model-boss** — GPU lease coordination (e.g., apricot:8210)
- **@speech-synthesis** — STT (Whisper) + TTS (Chatterbox)
- **LLM** — OpenAI-compatible endpoint, routed via model-boss
## GDScript Conventions
### Preload Pattern (critical)
`class_name` registration is unreliable in autoload context. **Always reference non-autoload classes via `preload()` const**:
```gdscript
# Shared code uses res://src/ (symlink to shared/godot/)
const OrchestratorScript = preload("res://src/conversation/conversation_orchestrator.gd")
# Platform code uses res://platform/
const WindowDragScript = preload("res://platform/window/window_drag.gd")
```
### Platform Composition Pattern
`shared/godot/companion.gd` provides setup methods. Platform subclasses compose their own `_ready()`:
```gdscript
# godot-desktop/platform/desktop_companion.gd
extends "res://src/companion.gd"
func _ready() -> void:
_setup_window() # desktop-specific: transparent overlay
_setup_drag() # desktop-specific: window dragging
setup_avatar() # shared: VRM model + controllers
setup_sound() # shared: audio engine
setup_conversation() # shared: STT/LLM/TTS pipeline
_setup_tray_listener() # desktop-specific: UDP sidecar IPC
```
### Signals
- `EventBus` is the only cross-system signal hub — never connect signals directly between systems
- Signal names use **past tense**: `avatar_tapped`, `state_changed`, `speech_started`
- EventBus signal params use `Variant` for object types (avoids autoload type resolution errors)
### File Organization Rules
- `snake_case` for files, variables, functions
- `PascalCase` for class names and nodes
- `UPPER_SNAKE_CASE` for constants
- Type hints on all function signatures (including return types)
- 500-line limit per file — split into focused modules before exceeding
### Node Architecture
Controllers are instantiated in code (`SomeScript.new()` + `add_child()`) — **not** embedded in `.tscn`. The main scene (`companion.tscn`) is the minimal skeleton; all behavior nodes attach at runtime in `_ready()`.
## Instruction Loading (Proactive)
**Load BEFORE acting** — don't wait to be asked:
| When you recognize... | Immediately load... |
|----------------------|---------------------|
| GDScript authoring, scene architecture, controller design | `~/.claude/instructions/godot-code-standards.md` |
---
## Dev Commands
```bash
./run [start] # Launch bridge + desktop companion + tray
./run stop # Stop everything
./run restart # Stop then start
./run verify # gdlint + gdformat check (shared + platform) + Godot import
./run editor # Open Godot desktop editor
./run mobile-editor # Open Godot mobile editor
./run screenshot # Capture screenshot via tools/screenshot.gd
```
## Autoloads (shared, registered in both project.godot files)
| Autoload | Path | Role |
|----------|------|------|
| `EventBus` | `src/autoloads/event_bus.gd` | Cross-system signal hub |
| `AppState` | `src/autoloads/app_state.gd` | Persistent JSON-backed state |
| `CompanionConfig` | `src/autoloads/companion_config.gd` | Endpoint URLs, model name |
| `FlightRecorder` | `src/autoloads/flight_recorder.gd` | Session logging |
## Milestone Status
| Milestone | Status | Description |
|-----------|--------|-------------|
| M0 | done | Project setup, chobit-core, autoloads, EventBus |
| M1 | done | VRM model loaded and rendered, transparent overlay, idle animation |
| M2 | done | AnimationTree FSM, expression blendshapes, dual-mode gaze, lipsync |
| M3 | done | Webcam face tracking sidecar, gaze estimation, tray integration |
| M4 | done | Microphone capture, VAD, STT/TTS HTTP clients, audio playback |
| M5 | done | Full conversation loop: VAD→STT→LLM→TTS→avatar; interruption; chat window |
| M6 | next | LifeAI integration — persona, user life context |
| M7 | planned | Polish — toon shader, particles, hair physics, gesture animations |
| M8 | planned | Mobile — fullscreen app, background modes (camera/environment/blur/solid), touch input, on-device face tracking, mobile export |