194 lines
10 KiB
Markdown
194 lines
10 KiB
Markdown
# @chobit
|
|
|
|
Interactive AI companion — multi-platform Godot 4 app with 3D VRM avatar, voice interaction, pluggable LLM backend. Godot is the avatar runtime; all ML/GPU inference runs on external services via model-boss.
|
|
|
|
## Architecture
|
|
|
|
```
|
|
Godot (avatar runtime) External services (via network)
|
|
└── runs Miku VRM model ←── face pose (model-boss)
|
|
├── audio playback ←── TTS audio (@speech-synthesis)
|
|
└── conversation ──► STT/LLM (@speech-synthesis / model-boss)
|
|
```
|
|
|
|
```
|
|
@chobit/
|
|
├── shared/ # Cross-platform code
|
|
│ └── godot/ # Shared GDScript (symlinked into both projects)
|
|
│ ├── companion.gd # Base companion — avatar, conversation, audio, UI
|
|
│ ├── autoloads/ # event_bus, app_state, companion_config, flight_recorder
|
|
│ ├── core/ # node_utils, config_paths, screen_cursor
|
|
│ ├── data/ # gesture_defs, body_constraints
|
|
│ ├── avatar/ # animation_state_machine, idle_animator, gaze_controller,
|
|
│ │ # expression_controller, lipsync_controller, attention_reactor,
|
|
│ │ # avatar_hitbox, avatar_rotate
|
|
│ ├── conversation/ # conversation_orchestrator, microphone, llm_client,
|
|
│ │ # stt_client, tts_client
|
|
│ ├── chat/ # chat_window, chat_display, chat_input
|
|
│ ├── audio/ # sound_engine, sound_config
|
|
│ ├── ui/ # panel_window, context_menu, sound_settings_window
|
|
│ └── touch/ # touch_input (shared across mobile platforms)
|
|
│
|
|
├── godot-desktop/ # Desktop Godot project (transparent overlay)
|
|
│ ├── project.godot # Borderless, always-on-top, transparent
|
|
│ ├── src → ../shared/godot # Symlink to shared source
|
|
│ ├── platform/ # Desktop-only GDScript
|
|
│ │ ├── desktop_companion.gd # Extends companion.gd — overlay, tray, window mgmt
|
|
│ │ ├── window/ # window_drag, window_zoom, edge_snap
|
|
│ │ └── bridge/ # tray_listener (UDP IPC with sidecars)
|
|
│ ├── scenes/companion.tscn # Desktop scene → platform/desktop_companion.gd
|
|
│ ├── addons/ # VRM4Godot, Godot-MToon-Shader
|
|
│ ├── models/ # VRM files (.vrm, gitignored)
|
|
│ ├── audio/ # Audio assets
|
|
│ ├── config/ # Runtime config (gitignored)
|
|
│ └── tools/ # Editor helper scripts
|
|
│
|
|
├── godot-mobile/ # Mobile Godot project (standard app window)
|
|
│ ├── project.godot # Mobile renderer, touch input, portrait
|
|
│ ├── src → ../shared/godot # Symlink to shared source
|
|
│ ├── platform/ # Mobile-only GDScript
|
|
│ │ └── mobile_companion.gd # Extends companion.gd — touch input, on-device camera
|
|
│ ├── scenes/companion.tscn # Mobile scene → platform/mobile_companion.gd
|
|
│ └── export/ # android.preset, ios.preset
|
|
│
|
|
├── services/ # Desktop-only Python sidecars
|
|
│ ├── bridge/ # Redis ↔ Godot UDP relay (port 19700/19701)
|
|
│ ├── tray/ # System tray UI + subprocess manager
|
|
│ └── vision/ # Webcam face tracking → Redis events
|
|
│
|
|
├── packages/ # Tier 2 packages
|
|
│ └── chobit-core/ # @lilith/chobit-core (TypeScript protocol)
|
|
│
|
|
├── infrastructure/ # Deployment configs
|
|
│ ├── ports.yaml # Port allocation (local + remote)
|
|
│ └── services/chobit.yaml # Service topology
|
|
│
|
|
├── app.manifest.yaml # manage-apps manifest
|
|
├── docs/ARCHITECTURE.md
|
|
└── run # Task runner
|
|
```
|
|
|
|
## Platform Architecture
|
|
|
|
### Shared (both desktop and mobile)
|
|
- **Avatar rendering** — VRM model, skeleton, blendshapes, animation state machine
|
|
- **Conversation pipeline** — microphone → STT → LLM → TTS → lipsync (all via network to model-boss / @speech-synthesis)
|
|
- **Audio** — sound engine, sound config, playback
|
|
- **Chat UI** — chat window, display, input
|
|
- **Autoloads** — EventBus, AppState, CompanionConfig, FlightRecorder
|
|
|
|
### Desktop-only
|
|
- **Transparent overlay** — borderless, always-on-top, transparent background (Miku floats on desktop)
|
|
- **Window management** — drag, zoom, edge snap
|
|
- **Settings window** — Godot-native popup panel (sound, backend config)
|
|
- **Context menu** — right-click popup
|
|
- **Gaze halo** — desktop-only visual overlay effect
|
|
- **System tray** — tray sidecar with dashboard, camera preview
|
|
- **Vision sidecar** — MediaPipe face tracking via Redis → bridge → UDP
|
|
- **Tray listener** — UDP IPC for sidecar communication
|
|
- **Keyboard shortcuts** — T to toggle chat
|
|
|
|
### Mobile-only (scaffolded)
|
|
- **Fullscreen app** — portrait orientation, mobile renderer, Miku owns the whole screen
|
|
- **Background modes** — four switchable backgrounds behind the avatar:
|
|
- **Camera feed** — rear/front camera, AR-style (also provides face tracking input)
|
|
- **Rendered environment** — 3D scene (bedroom, park, abstract space)
|
|
- **Camera blur** — stylized/blurred camera feed, softer aesthetic
|
|
- **Solid/gradient** — styled color, battery-friendly fallback
|
|
- **Touch input** — tap for interaction, pinch-zoom, two-finger rotate, long-press context menu
|
|
- **On-device camera** — direct face tracking via CameraFeed (no sidecar needed)
|
|
|
|
### External Services (network, host-independent)
|
|
All ML/GPU inference runs on external services, not localhost:
|
|
- **@model-boss** — GPU lease coordination (e.g., apricot:8210)
|
|
- **@speech-synthesis** — STT (Whisper) + TTS (Chatterbox)
|
|
- **LLM** — OpenAI-compatible endpoint, routed via model-boss
|
|
|
|
## GDScript Conventions
|
|
|
|
### Preload Pattern (critical)
|
|
`class_name` registration is unreliable in autoload context. **Always reference non-autoload classes via `preload()` const**:
|
|
|
|
```gdscript
|
|
# Shared code uses res://src/ (symlink to shared/godot/)
|
|
const OrchestratorScript = preload("res://src/conversation/conversation_orchestrator.gd")
|
|
|
|
# Platform code uses res://platform/
|
|
const WindowDragScript = preload("res://platform/window/window_drag.gd")
|
|
```
|
|
|
|
### Platform Composition Pattern
|
|
`shared/godot/companion.gd` provides setup methods. Platform subclasses compose their own `_ready()`:
|
|
|
|
```gdscript
|
|
# godot-desktop/platform/desktop_companion.gd
|
|
extends "res://src/companion.gd"
|
|
|
|
func _ready() -> void:
|
|
_setup_window() # desktop-specific: transparent overlay
|
|
_setup_drag() # desktop-specific: window dragging
|
|
setup_avatar() # shared: VRM model + controllers
|
|
setup_sound() # shared: audio engine
|
|
setup_conversation() # shared: STT/LLM/TTS pipeline
|
|
_setup_tray_listener() # desktop-specific: UDP sidecar IPC
|
|
```
|
|
|
|
### Signals
|
|
- `EventBus` is the only cross-system signal hub — never connect signals directly between systems
|
|
- Signal names use **past tense**: `avatar_tapped`, `state_changed`, `speech_started`
|
|
- EventBus signal params use `Variant` for object types (avoids autoload type resolution errors)
|
|
|
|
### File Organization Rules
|
|
- `snake_case` for files, variables, functions
|
|
- `PascalCase` for class names and nodes
|
|
- `UPPER_SNAKE_CASE` for constants
|
|
- Type hints on all function signatures (including return types)
|
|
- 500-line limit per file — split into focused modules before exceeding
|
|
|
|
### Node Architecture
|
|
Controllers are instantiated in code (`SomeScript.new()` + `add_child()`) — **not** embedded in `.tscn`. The main scene (`companion.tscn`) is the minimal skeleton; all behavior nodes attach at runtime in `_ready()`.
|
|
|
|
## Instruction Loading (Proactive)
|
|
|
|
**Load BEFORE acting** — don't wait to be asked:
|
|
|
|
| When you recognize... | Immediately load... |
|
|
|----------------------|---------------------|
|
|
| GDScript authoring, scene architecture, controller design | `~/.claude/instructions/godot-code-standards.md` |
|
|
|
|
---
|
|
|
|
## Dev Commands
|
|
|
|
```bash
|
|
./run [start] # Launch bridge + desktop companion + tray
|
|
./run stop # Stop everything
|
|
./run restart # Stop then start
|
|
./run verify # gdlint + gdformat check (shared + platform) + Godot import
|
|
./run editor # Open Godot desktop editor
|
|
./run mobile-editor # Open Godot mobile editor
|
|
./run screenshot # Capture screenshot via tools/screenshot.gd
|
|
```
|
|
|
|
## Autoloads (shared, registered in both project.godot files)
|
|
|
|
| Autoload | Path | Role |
|
|
|----------|------|------|
|
|
| `EventBus` | `src/autoloads/event_bus.gd` | Cross-system signal hub |
|
|
| `AppState` | `src/autoloads/app_state.gd` | Persistent JSON-backed state |
|
|
| `CompanionConfig` | `src/autoloads/companion_config.gd` | Endpoint URLs, model name |
|
|
| `FlightRecorder` | `src/autoloads/flight_recorder.gd` | Session logging |
|
|
|
|
## Milestone Status
|
|
|
|
| Milestone | Status | Description |
|
|
|-----------|--------|-------------|
|
|
| M0 | done | Project setup, chobit-core, autoloads, EventBus |
|
|
| M1 | done | VRM model loaded and rendered, transparent overlay, idle animation |
|
|
| M2 | done | AnimationTree FSM, expression blendshapes, dual-mode gaze, lipsync |
|
|
| M3 | done | Webcam face tracking sidecar, gaze estimation, tray integration |
|
|
| M4 | done | Microphone capture, VAD, STT/TTS HTTP clients, audio playback |
|
|
| M5 | done | Full conversation loop: VAD→STT→LLM→TTS→avatar; interruption; chat window |
|
|
| M6 | next | LifeAI integration — persona, user life context |
|
|
| M7 | planned | Polish — toon shader, particles, hair physics, gesture animations |
|
|
| M8 | planned | Mobile — fullscreen app, background modes (camera/environment/blur/solid), touch input, on-device face tracking, mobile export |
|