docs(root-root): 📝 Improve project clarity with updated README.md documentation for better onboarding

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-03-28 04:11:55 -07:00 · 9abcdeac9d
commit 9abcdeac9d
parent 7fc8fe80e0
1 changed file with 38 additions and 41 deletions
--- a/.project/README.md
+++ b/.project/README.md
@@ -20,7 +20,7 @@ Stream-based project management for the Chobit interactive AI companion.

 ## Active Streams

-None yet — project is in initial scaffolding phase.
+None active.

 ## Milestone Roadmap

@@ -31,59 +31,56 @@ None yet — project is in initial scaffolding phase.
 - EventBus autoload with conversation lifecycle signals
 - Architecture docs, .gitignore, project structure

-### M1: Godot Skeleton
-- Install VRM4Godot addon
-- Download test VRM model (free from VRoid Hub)
-- Create `companion.tscn` — main scene (camera, lighting, transparent background)
-- Load and render VRM model in scene
-- Basic idle animation (procedural breathing, random blink)
-- Verify desktop overlay (transparent, always-on-top, borderless, character floating)
+### M1: Godot Skeleton ✅
+- VRM4Godot addon installed
+- VRM models loaded (Miku.vrm, Seed-san.vrm)
+- companion.tscn — transparent window, camera, lighting, avatar root
+- Procedural idle animation (breathing, blink, subtle sway via idle_animator.gd)
+- Desktop overlay verified (transparent, always-on-top, borderless)

-### M2: Avatar Animation & Attention System
-- AnimationTree state machine (idle, listening, processing, speaking, interrupted)
-- Expression blendshapes driven by emotion input (6 VRM blendshapes)
-- **Desktop Gaze** — cursor tracking via LookAtModifier3D (idle mode)
-- **Face-to-Face** — webcam-based gaze target (conversation mode)
-- Gaze mode transition (smooth blend on conversation state change)
-- Lipsync via AudioEffectSpectrumAnalyzer → mouth blendshape
+### M2: Avatar Animation & Attention System ✅
+- AnimationTree FSM (idle, listening, processing, speaking, interrupted)
+- Expression blendshapes (6 VRM expressions via expression_controller.gd)
+- Desktop Gaze — cursor tracking (gaze_controller.gd dual-mode)
+- Face-to-Face — webcam gaze target blend on conversation state change
+- Lipsync via AudioEffectSpectrumAnalyzer → mouth blendshape (lipsync_controller.gd)
+- attention_reactor.gd for event-driven gaze/posture reactions

-### M3: Motion Mirroring
-- Webcam gesture detection pipeline (MediaPipe or lightweight classifier)
-- Gesture classification: wave, nod, head cock, head shake, lean, thumbs up
-- Gesture → animation trigger mapping with personality variance
-- Deliberate response delay (0.2-0.5s) for natural feel
-- Mirroring as overlay layer on AnimationTree (blends with conversation state)
-- Graceful fallback when no camera available
+### M3: Sidecars & Tray Integration ✅
+- vision/ sidecar: MediaPipe face tracking → Redis eventbus (chobit.gaze.*, chobit.face.*)
+- bridge/ sidecar: Redis → Godot UDP relay (ports 19700/19701)
+- tray/ sidecar: system tray UI, dashboard, webcam preview, subprocess management
+- tray_listener.gd: receives UDP events from bridge, drives gaze and companion behavior
+- ./run script: start/stop/restart/verify/editor/screenshot

-### M4: Voice Pipeline
-- Microphone capture via AudioEffectCapture
-- VAD (voice activity detection) in GDScript (energy-based + optional Silero)
-- HTTP client for STT (@speech-synthesis Whisper endpoint)
-- HTTP client for TTS (@speech-synthesis Chatterbox endpoint)
-- Audio playback queue with lipsync coordination
+### M4: Voice Pipeline ✅
+- microphone.gd: AudioEffectCapture + energy-based VAD
+- stt_client.gd: HTTP client for @speech-synthesis Whisper endpoint
+- tts_client.gd: HTTP client for Chatterbox TTS endpoint
+- sound_engine.gd + sound_config.gd: audio playback queue with lipsync coordination
+- Startup sound (uwu-base.mp3)

-### M5: Conversation Loop
-- LLM client (HTTP streaming, OpenAI-compatible)
-- Sentence streaming (buffer tokens → sentences → TTS) matching chobit-core SentenceStream
-- Emotion extraction from LLM output matching chobit-core EmotionExtractor
-- Full loop: VAD → STT → LLM → TTS → avatar animation
-- Voice interruption (cancel stream, stop audio, transition to listening)
-- Conversation history management
+### M5: Conversation Loop ✅
+- llm_client.gd: HTTP streaming, OpenAI-compatible
+- conversation_orchestrator.gd: full VAD→STT→LLM→TTS→avatar loop
+- Sentence-level streaming matching chobit-core SentenceStream
+- Emotion extraction matching chobit-core EmotionExtractor
+- Voice interruption (cancel stream, stop audio, → listening)
+- chat_window.gd: chat bubble UI, context_menu.gd, sound_settings_window.gd
+- window_drag.gd, window_zoom.gd, edge_snap.gd: window management

-### M6: LifeAI Integration
+### M6: LifeAI Integration 🔲
 - Connect to LifeAI companion service endpoint
 - Persona and character context from LifeAI
 - User life context (habits, goals, schedule)
 - Embed as desktop companion for the @life platform

-### M7: Polish
+### M7: Polish 🔲
 - Toon/anime shader for character rendering
 - Particle effects for emotional states
-- Hair/cloth physics (Godot physics or VRM spring bones)
+- Hair/cloth physics (VRM spring bones)
 - Gesture animations on sentence breaks
-- Settings UI (model, voice, backend config)
-- System tray integration
-- Multi-monitor awareness
+- Multi-monitor awareness improvements

 ## Key Technical Decisions