9.8 KiB
9.8 KiB
@chobit
Interactive AI companion — multi-platform Godot 4 app with 3D VRM avatar, voice interaction, pluggable LLM backend. Godot is the avatar runtime; all ML/GPU inference runs on external services via model-boss.
Architecture
Godot (avatar runtime) External services (via network)
└── runs Miku VRM model ←── face pose (model-boss)
├── audio playback ←── TTS audio (@speech-synthesis)
└── conversation ──► STT/LLM (@speech-synthesis / model-boss)
@chobit/
├── shared/ # Cross-platform code
│ └── godot/ # Shared GDScript (symlinked into both projects)
│ ├── companion.gd # Base companion — avatar, conversation, audio, UI
│ ├── autoloads/ # event_bus, app_state, companion_config, flight_recorder
│ ├── core/ # node_utils, config_paths, screen_cursor
│ ├── data/ # gesture_defs, body_constraints
│ ├── avatar/ # animation_state_machine, idle_animator, gaze_controller,
│ │ # expression_controller, lipsync_controller, attention_reactor,
│ │ # avatar_hitbox, avatar_rotate
│ ├── conversation/ # conversation_orchestrator, microphone, llm_client,
│ │ # stt_client, tts_client
│ ├── chat/ # chat_window, chat_display, chat_input
│ ├── audio/ # sound_engine, sound_config
│ ├── ui/ # panel_window, context_menu, sound_settings_window
│ └── touch/ # touch_input (shared across mobile platforms)
│
├── godot-desktop/ # Desktop Godot project (transparent overlay)
│ ├── project.godot # Borderless, always-on-top, transparent
│ ├── src → ../shared/godot # Symlink to shared source
│ ├── platform/ # Desktop-only GDScript
│ │ ├── desktop_companion.gd # Extends companion.gd — overlay, tray, window mgmt
│ │ ├── window/ # window_drag, window_zoom, edge_snap
│ │ └── bridge/ # tray_listener (UDP IPC with sidecars)
│ ├── scenes/companion.tscn # Desktop scene → platform/desktop_companion.gd
│ ├── addons/ # VRM4Godot, Godot-MToon-Shader
│ ├── models/ # VRM files (.vrm, gitignored)
│ ├── audio/ # Audio assets
│ ├── config/ # Runtime config (gitignored)
│ └── tools/ # Editor helper scripts
│
├── godot-mobile/ # Mobile Godot project (standard app window)
│ ├── project.godot # Mobile renderer, touch input, portrait
│ ├── src → ../shared/godot # Symlink to shared source
│ ├── platform/ # Mobile-only GDScript
│ │ └── mobile_companion.gd # Extends companion.gd — touch input, on-device camera
│ ├── scenes/companion.tscn # Mobile scene → platform/mobile_companion.gd
│ └── export/ # android.preset, ios.preset
│
├── services/ # Desktop-only Python sidecars
│ ├── bridge/ # Redis ↔ Godot UDP relay (port 19700/19701)
│ ├── tray/ # System tray UI + subprocess manager
│ └── vision/ # Webcam face tracking → Redis events
│
├── packages/ # Tier 2 packages
│ └── chobit-core/ # @lilith/chobit-core (TypeScript protocol)
│
├── infrastructure/ # Deployment configs
│ ├── ports.yaml # Port allocation (local + remote)
│ └── services/chobit.yaml # Service topology
│
├── app.manifest.yaml # manage-apps manifest
├── docs/ARCHITECTURE.md
└── run # Task runner
Platform Architecture
Shared (both desktop and mobile)
- Avatar rendering — VRM model, skeleton, blendshapes, animation state machine
- Conversation pipeline — microphone → STT → LLM → TTS → lipsync (all via network to model-boss / @speech-synthesis)
- Audio — sound engine, sound config, playback
- Chat UI — chat window, display, input
- Autoloads — EventBus, AppState, CompanionConfig, FlightRecorder
Desktop-only
- Transparent overlay — borderless, always-on-top, transparent background (Miku floats on desktop)
- Window management — drag, zoom, edge snap
- Settings window — Godot-native popup panel (sound, backend config)
- Context menu — right-click popup
- Gaze halo — desktop-only visual overlay effect
- System tray — tray sidecar with dashboard, camera preview
- Vision sidecar — MediaPipe face tracking via Redis → bridge → UDP
- Tray listener — UDP IPC for sidecar communication
- Keyboard shortcuts — T to toggle chat
Mobile-only (scaffolded)
- Fullscreen app — portrait orientation, mobile renderer, Miku owns the whole screen
- Background modes — four switchable backgrounds behind the avatar:
- Camera feed — rear/front camera, AR-style (also provides face tracking input)
- Rendered environment — 3D scene (bedroom, park, abstract space)
- Camera blur — stylized/blurred camera feed, softer aesthetic
- Solid/gradient — styled color, battery-friendly fallback
- Touch input — tap for interaction, pinch-zoom, two-finger rotate, long-press context menu
- On-device camera — direct face tracking via CameraFeed (no sidecar needed)
External Services (network, host-independent)
All ML/GPU inference runs on external services, not localhost:
- @model-boss — GPU lease coordination (e.g., apricot:8210)
- @speech-synthesis — STT (Whisper) + TTS (Chatterbox)
- LLM — OpenAI-compatible endpoint, routed via model-boss
GDScript Conventions
Preload Pattern (critical)
class_name registration is unreliable in autoload context. Always reference non-autoload classes via preload() const:
# Shared code uses res://src/ (symlink to shared/godot/)
const OrchestratorScript = preload("res://src/conversation/conversation_orchestrator.gd")
# Platform code uses res://platform/
const WindowDragScript = preload("res://platform/window/window_drag.gd")
Platform Composition Pattern
shared/godot/companion.gd provides setup methods. Platform subclasses compose their own _ready():
# godot-desktop/platform/desktop_companion.gd
extends "res://src/companion.gd"
func _ready() -> void:
_setup_window() # desktop-specific: transparent overlay
_setup_drag() # desktop-specific: window dragging
setup_avatar() # shared: VRM model + controllers
setup_sound() # shared: audio engine
setup_conversation() # shared: STT/LLM/TTS pipeline
_setup_tray_listener() # desktop-specific: UDP sidecar IPC
Signals
EventBusis the only cross-system signal hub — never connect signals directly between systems- Signal names use past tense:
avatar_tapped,state_changed,speech_started - EventBus signal params use
Variantfor object types (avoids autoload type resolution errors)
File Organization Rules
snake_casefor files, variables, functionsPascalCasefor class names and nodesUPPER_SNAKE_CASEfor constants- Type hints on all function signatures (including return types)
- 500-line limit per file — split into focused modules before exceeding
Node Architecture
Controllers are instantiated in code (SomeScript.new() + add_child()) — not embedded in .tscn. The main scene (companion.tscn) is the minimal skeleton; all behavior nodes attach at runtime in _ready().
Dev Commands
./run [start] # Launch bridge + desktop companion + tray
./run stop # Stop everything
./run restart # Stop then start
./run verify # gdlint + gdformat check (shared + platform) + Godot import
./run editor # Open Godot desktop editor
./run mobile-editor # Open Godot mobile editor
./run screenshot # Capture screenshot via tools/screenshot.gd
Autoloads (shared, registered in both project.godot files)
| Autoload | Path | Role |
|---|---|---|
EventBus |
src/autoloads/event_bus.gd |
Cross-system signal hub |
AppState |
src/autoloads/app_state.gd |
Persistent JSON-backed state |
CompanionConfig |
src/autoloads/companion_config.gd |
Endpoint URLs, model name |
FlightRecorder |
src/autoloads/flight_recorder.gd |
Session logging |
Milestone Status
| Milestone | Status | Description |
|---|---|---|
| M0 | done | Project setup, chobit-core, autoloads, EventBus |
| M1 | done | VRM model loaded and rendered, transparent overlay, idle animation |
| M2 | done | AnimationTree FSM, expression blendshapes, dual-mode gaze, lipsync |
| M3 | done | Webcam face tracking sidecar, gaze estimation, tray integration |
| M4 | done | Microphone capture, VAD, STT/TTS HTTP clients, audio playback |
| M5 | done | Full conversation loop: VAD→STT→LLM→TTS→avatar; interruption; chat window |
| M6 | next | LifeAI integration — persona, user life context |
| M7 | planned | Polish — toon shader, particles, hair physics, gesture animations |
| M8 | planned | Mobile — fullscreen app, background modes (camera/environment/blur/solid), touch input, on-device face tracking, mobile export |