From 9abcdeac9d8e3232e347c694376341338294e5a1 Mon Sep 17 00:00:00 2001
From: Claude Code
Date: Sat, 28 Mar 2026 04:11:55 -0700
Subject: [PATCH] =?UTF-8?q?docs(root-root):=20=F0=9F=93=9D=20Improve=20pro?=
 =?UTF-8?q?ject=20clarity=20with=20updated=20README.md=20documentation=20f?=
 =?UTF-8?q?or=20better=20onboarding?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit
---
 .project/README.md | 79 ++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 41 deletions(-)

diff --git a/.project/README.md b/.project/README.md
index 92947de..5d67c3f 100644
--- a/.project/README.md
+++ b/.project/README.md
@@ -20,7 +20,7 @@ Stream-based project management for the Chobit interactive AI companion.
 
 ## Active Streams
 
-None yet — project is in initial scaffolding phase.
+None active.
 
 ## Milestone Roadmap
 
@@ -31,59 +31,56 @@ None yet — project is in initial scaffolding phase.
 - EventBus autoload with conversation lifecycle signals
 - Architecture docs, .gitignore, project structure
 
-### M1: Godot Skeleton
-- Install VRM4Godot addon
-- Download test VRM model (free from VRoid Hub)
-- Create `companion.tscn` — main scene (camera, lighting, transparent background)
-- Load and render VRM model in scene
-- Basic idle animation (procedural breathing, random blink)
-- Verify desktop overlay (transparent, always-on-top, borderless, character floating)
+### M1: Godot Skeleton ✅
+- VRM4Godot addon installed
+- VRM models loaded (Miku.vrm, Seed-san.vrm)
+- companion.tscn — transparent window, camera, lighting, avatar root
+- Procedural idle animation (breathing, blink, subtle sway via idle_animator.gd)
+- Desktop overlay verified (transparent, always-on-top, borderless)
 
-### M2: Avatar Animation & Attention System
-- AnimationTree state machine (idle, listening, processing, speaking, interrupted)
-- Expression blendshapes driven by emotion input (6 VRM blendshapes)
-- **Desktop Gaze** — cursor tracking via LookAtModifier3D (idle mode)
-- **Face-to-Face** — webcam-based gaze target (conversation mode)
-- Gaze mode transition (smooth blend on conversation state change)
-- Lipsync via AudioEffectSpectrumAnalyzer → mouth blendshape
+### M2: Avatar Animation & Attention System ✅
+- AnimationTree FSM (idle, listening, processing, speaking, interrupted)
+- Expression blendshapes (6 VRM expressions via expression_controller.gd)
+- Desktop Gaze — cursor tracking (gaze_controller.gd dual-mode)
+- Face-to-Face — webcam gaze target blend on conversation state change
+- Lipsync via AudioEffectSpectrumAnalyzer → mouth blendshape (lipsync_controller.gd)
+- attention_reactor.gd for event-driven gaze/posture reactions
 
-### M3: Motion Mirroring
-- Webcam gesture detection pipeline (MediaPipe or lightweight classifier)
-- Gesture classification: wave, nod, head cock, head shake, lean, thumbs up
-- Gesture → animation trigger mapping with personality variance
-- Deliberate response delay (0.2-0.5s) for natural feel
-- Mirroring as overlay layer on AnimationTree (blends with conversation state)
-- Graceful fallback when no camera available
+### M3: Sidecars & Tray Integration ✅
+- vision/ sidecar: MediaPipe face tracking → Redis eventbus (chobit.gaze.*, chobit.face.*)
+- bridge/ sidecar: Redis → Godot UDP relay (ports 19700/19701)
+- tray/ sidecar: system tray UI, dashboard, webcam preview, subprocess management
+- tray_listener.gd: receives UDP events from bridge, drives gaze and companion behavior
+- ./run script: start/stop/restart/verify/editor/screenshot
 
-### M4: Voice Pipeline
-- Microphone capture via AudioEffectCapture
-- VAD (voice activity detection) in GDScript (energy-based + optional Silero)
-- HTTP client for STT (@speech-synthesis Whisper endpoint)
-- HTTP client for TTS (@speech-synthesis Chatterbox endpoint)
-- Audio playback queue with lipsync coordination
+### M4: Voice Pipeline ✅
+- microphone.gd: AudioEffectCapture + energy-based VAD
+- stt_client.gd: HTTP client for @speech-synthesis Whisper endpoint
+- tts_client.gd: HTTP client for Chatterbox TTS endpoint
+- sound_engine.gd + sound_config.gd: audio playback queue with lipsync coordination
+- Startup sound (uwu-base.mp3)
 
-### M5: Conversation Loop
-- LLM client (HTTP streaming, OpenAI-compatible)
-- Sentence streaming (buffer tokens → sentences → TTS) matching chobit-core SentenceStream
-- Emotion extraction from LLM output matching chobit-core EmotionExtractor
-- Full loop: VAD → STT → LLM → TTS → avatar animation
-- Voice interruption (cancel stream, stop audio, transition to listening)
-- Conversation history management
+### M5: Conversation Loop ✅
+- llm_client.gd: HTTP streaming, OpenAI-compatible
+- conversation_orchestrator.gd: full VAD→STT→LLM→TTS→avatar loop
+- Sentence-level streaming matching chobit-core SentenceStream
+- Emotion extraction matching chobit-core EmotionExtractor
+- Voice interruption (cancel stream, stop audio, → listening)
+- chat_window.gd: chat bubble UI, context_menu.gd, sound_settings_window.gd
+- window_drag.gd, window_zoom.gd, edge_snap.gd: window management
 
-### M6: LifeAI Integration
+### M6: LifeAI Integration 🔲
 - Connect to LifeAI companion service endpoint
 - Persona and character context from LifeAI
 - User life context (habits, goals, schedule)
 - Embed as desktop companion for the @life platform
 
-### M7: Polish
+### M7: Polish 🔲
 - Toon/anime shader for character rendering
 - Particle effects for emotional states
-- Hair/cloth physics (Godot physics or VRM spring bones)
+- Hair/cloth physics (VRM spring bones)
 - Gesture animations on sentence breaks
-- Settings UI (model, voice, backend config)
-- System tray integration
-- Multi-monitor awareness
+- Multi-monitor awareness improvements
 
 ## Key Technical Decisions