docs(root-root): 📝 Improve project clarity with updated README.md documentation for better onboarding

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
Claude Code 2026-03-28 04:11:55 -07:00
parent 7fc8fe80e0
commit 9abcdeac9d


@@ -20,7 +20,7 @@ Stream-based project management for the Chobit interactive AI companion.
## Active Streams
None yet — project is in initial scaffolding phase.
None active.
## Milestone Roadmap
@@ -31,59 +31,56 @@ None yet — project is in initial scaffolding phase.
- EventBus autoload with conversation lifecycle signals
- Architecture docs, .gitignore, project structure
### M1: Godot Skeleton
- Install VRM4Godot addon
- Download test VRM model (free from VRoid Hub)
- Create `companion.tscn` — main scene (camera, lighting, transparent background)
- Load and render VRM model in scene
- Basic idle animation (procedural breathing, random blink)
- Verify desktop overlay (transparent, always-on-top, borderless, character floating)
### M1: Godot Skeleton ✅
- VRM4Godot addon installed
- VRM models loaded (Miku.vrm, Seed-san.vrm)
- companion.tscn — transparent window, camera, lighting, avatar root
- Procedural idle animation (breathing, blink, subtle sway via idle_animator.gd)
- Desktop overlay verified (transparent, always-on-top, borderless)
### M2: Avatar Animation & Attention System
- AnimationTree state machine (idle, listening, processing, speaking, interrupted)
- Expression blendshapes driven by emotion input (6 VRM blendshapes)
- **Desktop Gaze** — cursor tracking via LookAtModifier3D (idle mode)
- **Face-to-Face** — webcam-based gaze target (conversation mode)
- Gaze mode transition (smooth blend on conversation state change)
- Lipsync via AudioEffectSpectrumAnalyzer → mouth blendshape
### M2: Avatar Animation & Attention System
- AnimationTree FSM (idle, listening, processing, speaking, interrupted)
- Expression blendshapes (6 VRM expressions via expression_controller.gd)
- Desktop Gaze — cursor tracking (gaze_controller.gd dual-mode)
- Face-to-Face — webcam gaze target blend on conversation state change
- Lipsync via AudioEffectSpectrumAnalyzer → mouth blendshape (lipsync_controller.gd)
- attention_reactor.gd for event-driven gaze/posture reactions
### M3: Motion Mirroring
- Webcam gesture detection pipeline (MediaPipe or lightweight classifier)
- Gesture classification: wave, nod, head cock, head shake, lean, thumbs up
- Gesture → animation trigger mapping with personality variance
- Deliberate response delay (0.2-0.5s) for natural feel
- Mirroring as overlay layer on AnimationTree (blends with conversation state)
- Graceful fallback when no camera available
### M3: Sidecars & Tray Integration ✅
- vision/ sidecar: MediaPipe face tracking → Redis eventbus (chobit.gaze.*, chobit.face.*)
- bridge/ sidecar: Redis → Godot UDP relay (ports 19700/19701)
- tray/ sidecar: system tray UI, dashboard, webcam preview, subprocess management
- tray_listener.gd: receives UDP events from bridge, drives gaze and companion behavior
- ./run script: start/stop/restart/verify/editor/screenshot
### M4: Voice Pipeline
- Microphone capture via AudioEffectCapture
- VAD (voice activity detection) in GDScript (energy-based + optional Silero)
- HTTP client for STT (@speech-synthesis Whisper endpoint)
- HTTP client for TTS (@speech-synthesis Chatterbox endpoint)
- Audio playback queue with lipsync coordination
### M4: Voice Pipeline
- microphone.gd: AudioEffectCapture + energy-based VAD
- stt_client.gd: HTTP client for @speech-synthesis Whisper endpoint
- tts_client.gd: HTTP client for Chatterbox TTS endpoint
- sound_engine.gd + sound_config.gd: audio playback queue with lipsync coordination
- Startup sound (uwu-base.mp3)
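
The M4 list mentions energy-based VAD in `microphone.gd`. The project's scripts are GDScript and their contents are not shown here, so the following is only a minimal Python sketch of the general technique, not the project's implementation. The names `rms` and `detect_speech`, the `threshold` value, and the `hangover` frame count are all assumptions made for illustration.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,  # hypothetical energy threshold
    hangover: int = 3,        # hypothetical count of quiet frames that ends speech
) -> List[bool]:
    """Return one speech/non-speech flag per frame.

    A frame at or above the threshold marks speech. The flag then stays on
    until `hangover` consecutive quiet frames have been seen, so short pauses
    inside an utterance do not split it.
    """
    flags: List[bool] = []
    quiet = hangover  # start in the silent state
    for frame in frames:
        if rms(frame) >= threshold:
            quiet = 0
        else:
            quiet += 1
        flags.append(quiet < hangover)
    return flags
```

A fixed threshold is the simplest choice; a real microphone pipeline would likely need to adapt it to background noise.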
### M5: Conversation Loop
- LLM client (HTTP streaming, OpenAI-compatible)
- Sentence streaming (buffer tokens → sentences → TTS) matching chobit-core SentenceStream
- Emotion extraction from LLM output matching chobit-core EmotionExtractor
- Full loop: VAD → STT → LLM → TTS → avatar animation
- Voice interruption (cancel stream, stop audio, transition to listening)
- Conversation history management
### M5: Conversation Loop ✅
- llm_client.gd: HTTP streaming, OpenAI-compatible
- conversation_orchestrator.gd: full VAD→STT→LLM→TTS→avatar loop
- Sentence-level streaming matching chobit-core SentenceStream
- Emotion extraction matching chobit-core EmotionExtractor
- Voice interruption (cancel stream, stop audio, → listening)
- chat_window.gd: chat bubble UI, context_menu.gd, sound_settings_window.gd
- window_drag.gd, window_zoom.gd, edge_snap.gd: window management
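
The M5 list refers to sentence-level streaming, where LLM tokens are buffered into sentences before being handed to TTS. The chobit-core `SentenceStream` is not shown in this commit, so the Python sketch below only illustrates the general idea under stated assumptions: the function name `stream_sentences` and the choice of `.`, `!`, and `?` as sentence terminators are hypothetical.

```python
from typing import Iterable, Iterator

SENTENCE_END = ".!?"  # assumed terminators; the real rules may differ


def stream_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text fragments and yield each sentence once complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            # Find the first terminator currently in the buffer, if any.
            cut = next(
                (i for i, ch in enumerate(buffer) if ch in SENTENCE_END), -1
            )
            if cut == -1:
                break
            sentence = buffer[: cut + 1].strip()
            buffer = buffer[cut + 1 :]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends without a terminator.
    tail = buffer.strip()
    if tail:
        yield tail
```

Usage: `list(stream_sentences(["Hel", "lo the", "re. How", " are you", "? Fine"]))` gives `["Hello there.", "How are you?", "Fine"]`. This naive split mishandles abbreviations, decimals, and ellipses, which a production splitter would need to address.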
### M6: LifeAI Integration
### M6: LifeAI Integration 🔲
- Connect to LifeAI companion service endpoint
- Persona and character context from LifeAI
- User life context (habits, goals, schedule)
- Embed as desktop companion for the @life platform
### M7: Polish
### M7: Polish 🔲
- Toon/anime shader for character rendering
- Particle effects for emotional states
- Hair/cloth physics (Godot physics or VRM spring bones)
- Hair/cloth physics (VRM spring bones)
- Gesture animations on sentence breaks
- Settings UI (model, voice, backend config)
- System tray integration
- Multi-monitor awareness
- Multi-monitor awareness improvements
## Key Technical Decisions