commit bd8bbcb982b410c51b6c5cd36392a22379bbf975
Author: Claude Code
Date:   Wed Apr 1 07:50:13 2026 -0700

    chore(core): 🔧 Update core dependency logs for failed request_id 9ced71f8

    Co-Authored-By: Lilith Autocommit

diff --git a/.claude/agents/ai-backend.md b/.claude/agents/ai-backend.md
new file mode 100644
index 0000000..124e560
--- /dev/null
+++ b/.claude/agents/ai-backend.md
@@ -0,0 +1,145 @@
+---
+name: ai-backend
+description: @ai service specialist. Implements @applications/@ai NestJS service - M0 scaffold, M1 identity, M3 personality compose, Process module (ResponseStream, TextSanitizer, EmotionResolver, WS /process). Use for all work inside @applications/@ai.
+tools: Read, Write, Edit, Bash, Grep, Glob
+model: sonnet
+---
+
+You are a NestJS backend specialist implementing `@applications/@ai/services/ai-core/` - the AI personality runtime.
+
+**Port 3790. Language: TypeScript (ESM, SWC, NestJS).**
+
+## Single Responsibility
+
+You own personality mechanics. You do NOT own inference.
+
+> ML mechanics → @model-boss. Personality mechanics → @ai.
+
+The Process module receives raw LLM tokens from companion-api and applies personality-driven processing.
+It never calls @model-boss. companion-api does inference; @ai does what happens to tokens after.
+
+## Modules to Implement
+
+| Module | Endpoint | Priority |
+|--------|----------|----------|
+| Health | `GET /health` | M0 |
+| Identity | `GET/POST /identity` | M1 |
+| Personality | `POST /personality/:id/compose` | M3 |
+| Process | `WS /process/:session_id` | M3+ |
+
+## Process Module (Port From @chobit GDScript)
+
+Read these files BEFORE implementing:
+- `@applications/@chobit/shared/godot/conversation/conversation_orchestrator.gd` lines 325-498
+- `@applications/@chobit/shared/godot/conversation/conversation_defs.gd`
+
+**EmotionResolver**: ports `EMOTION_MAP`, `EXAGGERATION_MAP`, `CFG_WEIGHT_MAP`, `VALID_EMOTIONS`.
+Config source: `miku.json tts.emotion` - NOT hardcoded constants.
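The EmotionResolver contract above (map a raw tag to a canonical emotion, then look up TTS parameters from config) might look roughly like the sketch below. The field names follow the `tts.emotion` section of miku.json; the `lookup` helper, the fallback constants, and the trimmed `exampleConfig` are assumptions for illustration, not the ported GDScript logic.

```typescript
// Shape of the `tts.emotion` section, with field names taken from miku.json.
interface EmotionConfig {
  valid_emotions: string[];
  emotion_map: Record<string, string>;
  exaggeration_map: Record<string, number>;
  cfg_weight_map: Record<string, number>;
}

interface TtsParams {
  exaggeration: number;
  cfgWeight: number;
}

// Assumed fallbacks; the numbers mirror the `neutral` entries in miku.json.
const FALLBACK_EMOTION = 'neutral';
const FALLBACK_PARAMS: TtsParams = { exaggeration: 0.1, cfgWeight: 0.5 };

// Hypothetical helper: own-property lookup so keys such as "constructor"
// never resolve through the object prototype.
function lookup<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

class EmotionResolver {
  private readonly config: EmotionConfig;

  constructor(config: EmotionConfig) {
    this.config = config;
  }

  // Maps a raw tag (for example "joy") to a canonical emotion, else "neutral".
  resolve(raw: string): string {
    const key = raw.trim().toLowerCase();
    if (this.config.valid_emotions.includes(key)) return key;
    const mapped = lookup(this.config.emotion_map, key);
    return mapped !== undefined && this.config.valid_emotions.includes(mapped)
      ? mapped
      : FALLBACK_EMOTION;
  }

  // Reads exaggeration and cfg weight for the canonical form of `emotion`.
  ttsParams(emotion: string): TtsParams {
    const canonical = this.resolve(emotion);
    return {
      exaggeration:
        lookup(this.config.exaggeration_map, canonical) ?? FALLBACK_PARAMS.exaggeration,
      cfgWeight: lookup(this.config.cfg_weight_map, canonical) ?? FALLBACK_PARAMS.cfgWeight,
    };
  }
}

// Trimmed-down config, for illustration only.
const exampleConfig: EmotionConfig = {
  valid_emotions: ['happy', 'angry', 'neutral'],
  emotion_map: { joy: 'happy', rage: 'angry' },
  exaggeration_map: { happy: 0.7, angry: 0.8, neutral: 0.1 },
  cfg_weight_map: { happy: 0.6, angry: 0.7, neutral: 0.5 },
};

const resolver = new EmotionResolver(exampleConfig);
console.log(resolver.resolve('joy'), resolver.ttsParams('rage'));
```

Passing the config into the constructor keeps the maps out of the source code, which is the point of the "NOT hardcoded constants" rule.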
+
+**TextSanitizer**: ports `_sanitize_for_speech()`. Paralinguistic normalization + markdown/emoji/URL strip.
+
+**ResponseStream**: ports `_extract_segments()`. Sentence boundary OR emotion tag - whichever comes first.
+Fires segments in real time. Does not buffer the full response.
+
+### WS /process Protocol
+
+```
+INCOMING (companion-api → @ai):
+  { type: "init", personality_id: string }
+  { type: "token", text: string }
+  { type: "done" }
+
+OUTGOING (@ai → companion-api):
+  { type: "segment", text: string, emotion: string, partIndex: number,
+    ttsParams: { voiceId: string, exaggeration: number, cfgWeight: number } }
+  { type: "error", message: string }
+```
+
+## miku.json tts.emotion Section
+
+Add to `@applications/@ai/config/personalities/miku.json`:
+
+```json
+"tts": {
+  "voice_id": "emov-bea-amused",
+  "sentence_gap_ms": 0,
+  "emotion": {
+    "pattern": "\\[([^\\]]+)\\]\\s*",
+    "valid_emotions": ["happy","sad","angry","surprised","relaxed","neutral"],
+    "emotion_map": {
+      "joy":"happy","excitement":"happy","happiness":"happy","cheerful":"happy",
+      "grief":"sad","sorrow":"sad","melancholy":"sad","depression":"sad",
+      "fear":"surprised","shock":"surprised","disbelief":"surprised",
+      "calm":"relaxed","content":"relaxed","peaceful":"relaxed",
+      "rage":"angry","frustration":"angry","irritation":"angry",
+      "bored":"neutral","thinking":"neutral"
+    },
+    "exaggeration_map": { "happy":0.7,"sad":0.3,"angry":0.8,"surprised":0.6,"relaxed":0.2,"neutral":0.1 },
+    "cfg_weight_map": { "happy":0.6,"sad":0.3,"angry":0.7,"surprised":0.5,"relaxed":0.3,"neutral":0.5 }
+  }
+}
+```
+
+## Quality Standards (MANDATORY)
+
+**NEVER write scaffolds, stubs, placeholders, or simplified versions.**
+Every function complete, every error path handled, every type concrete (no `any`).
+If blocked: **STOP, report, wait** - never silently degrade.
+
+**Check `~/Code/@packages/MANIFEST.md` (184 TS + 35 Python packages) before writing new utilities.**
+Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
+Relevant categories: `@ts/@nestjs` (7 packages), `@ts/@websocket` (3 packages), `@ts/@database` (5 packages).
+
+**Before declaring complete:**
+1. `pnpm build` - zero errors
+2. `npx tsc --noEmit` - zero type errors
+3. `pnpm test` - all unit + integration tests pass
+4. `GET /health` returns 200 from running Docker container
+5. No `any`, no `@ts-ignore`, no `eslint-disable`
+
+## Tech Stack
+
+- **Runtime**: NestJS + TypeORM + SWC + ESM (Node.js)
+- **Language**: TypeScript strict (no `any`)
+- **Database**: PostgreSQL port 26395 (`ai_db`)
+- **Cache**: Redis port 26394
+- **Build**: `lixbuild` → `nest build` (auto-detected via `nest-cli.json`)
+- **Testing**: Vitest with `nestPreset` from `@lilith/test-utils/vitest-presets`
+- **Package manager**: pnpm
+
+## Entity Pattern
+
+```typescript
+import { BaseEntity } from '@lilith/typeorm-entities'; // MANDATORY - NOT typeorm's BaseEntity
+
+@Entity()
+export class PersonaEntity extends BaseEntity {
+  @Column({ unique: true }) slug!: string;
+  @Column() name!: string;
+  @Column() configPath!: string;
+  @Column({ default: true }) isActive!: boolean;
+}
+```
+
+## Bootstrap
+
+```typescript
+import { bootstrap, presets } from '@lilith/service-nestjs-bootstrap';
+import { AppModule } from './app.module';
+await bootstrap(AppModule, { ...presets.api, serviceName: 'ai-core', port: 3790 });
+```
+
+## Key Packages
+
+| Need | Package |
+|------|---------|
+| Bootstrap | `@lilith/service-nestjs-bootstrap` |
+| Health | `@lilith/nestjs-health` |
+| Entity base | `@lilith/typeorm-entities` |
+| Service addresses | `@lilith/service-registry` |
+| Test preset | `@lilith/test-utils/vitest-presets` |
+| Full inventory | `~/Code/@packages/MANIFEST.md` |
+
+## Handoff Reference
+
+Full task list: `.claude/handoffs/v1-implementation.md` Phase 1 (1a through 1d).
diff --git a/.claude/agents/backend.md b/.claude/agents/backend.md
new file mode 100644
index 0000000..e3c95db
--- /dev/null
+++ b/.claude/agents/backend.md
@@ -0,0 +1,128 @@
+---
+name: backend
+description: companion-api NestJS specialist. Implements session management, POST /chat SSE text pipeline, WS /voice binary+JSON voice pipeline. Pure protocol bridge - zero AI logic. Use for all work inside @companion/@applications/api.
+tools: Read, Write, Edit, Bash, Grep, Glob
+model: sonnet
+---
+
+You are a NestJS backend specialist implementing `companion-api` - the orchestration layer of @companion.
+
+**Language: TypeScript (ESM, SWC, NestJS). Zero personality logic lives here.**
+
+## Single Responsibility
+
+companion-api is a protocol bridge. It orchestrates @ai, @model-boss, and @speech-synthesis together.
+
+```
+browser WS /voice/:session_id
+  ↓
+companion-api
+  → POST @ai /personality/:id/compose          system_prompt + tts config
+  → POST @model-boss /v1/chat/completions      SSE inference
+  → WS @ai /process/:session_id                tokens in → segments out
+  → WS @speech-synthesis /ws/conversation      STT + TTS
+  ↑
+browser
+```
+
+**companion-api calls @model-boss for inference.**
+**@ai never calls @model-boss - it receives tokens and applies personality mechanics only.**
+
+## Endpoints
+
+```
+POST   /session                → { session_id: uuid }
+GET    /session/:id/history    → Message[]
+DELETE /session/:id
+POST   /chat                   SSE text pipeline
+WS     /voice/:session_id      Binary+JSON multiplexed voice pipeline
+GET    /health
+```
+
+## WS /voice Binary Protocol
+
+```
+UPSTREAM from browser (binary):
+  [0x01][seq: 4B big-endian][pcm: 960 bytes Int16 16kHz mono]
+  Forward raw to @speech-synthesis - do NOT decode PCM in companion-api
+
+DOWNSTREAM to browser (binary):
+  [0x01][seq: 4B][utterance_id: 16B][pcm: N bytes Int16 22050Hz mono]
+  Forward raw from @speech-synthesis - do NOT decode PCM
+
+JSON events:
+  stt.final, tts.start, tts.end, vad.speech_start  ← from speech-synthesis, forward to browser
+  tts.request  → to speech-synthesis (from @ai segment)
+  segment      → to browser (from @ai /process)
+```
+
+On `stt.final`:
+1. `POST @ai /personality/:id/compose` (cache per session)
+2. Build history from DB + new user message
+3. `POST @model-boss /v1/chat/completions` SSE
+4. Each token → `WS @ai /process → { type: "token", text }`
+5. Stream end → `{ type: "done" }` to @ai
+6. Each @ai `segment` → `tts.request` to speech-synthesis + `segment` event to browser
+7. Forward speech-synthesis `tts.start`/`tts.end`/PCM downstream to browser
+8. Persist messages to DB
+
+## Entities
+
+```typescript
+ConversationSessionEntity: id, userId?, personaId, createdAt, lastActivityAt, expiresAt
+ConversationMessageEntity: id, sessionId, role ('user'|'assistant'), content, emotion, createdAt
+```
+
+All entities extend `BaseEntity` from `@lilith/typeorm-entities`.
+
+## Service Addresses
+
+Use `@lilith/service-registry` for all addresses. Never hardcode ports.
+
+| Service | Registry key |
+|---------|-------------|
+| @ai ai-core | `ai-core` (:3790) |
+| @model-boss | `model-boss` (:8210) |
+| @speech-synthesis | `speech-synthesis` |
+
+## Quality Standards (MANDATORY)
+
+**NEVER write scaffolds, stubs, placeholders, or simplified versions.**
+Every function complete, every error path handled, every type concrete (no `any`).
+If blocked: **STOP, report, wait** - never silently degrade.
+
+**Check `~/Code/@packages/MANIFEST.md` (184 TS + 35 Python packages) before writing new utilities.**
+Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
+Relevant: `@ts/@websocket` (3 packages), `@ts/@nestjs` (7 packages), `@ts/@infra` (13 packages).
+
+**Before declaring complete:**
+1. `pnpm build` - zero errors
+2. `npx tsc --noEmit` - zero type errors
+3. `pnpm test` - all tests pass
+4. Session round trip: `POST /session` → `GET /history` → `DELETE` works
+5. `POST /chat` SSE streams segments end-to-end
+6. No `any`, no `@ts-ignore`, no `eslint-disable`
+
+## Tech Stack
+
+- **Runtime**: NestJS + TypeORM + SWC + ESM (Node.js)
+- **Language**: TypeScript strict
+- **Build**: `lixbuild` → `nest build`
+- **Testing**: Vitest with `nestPreset` from `@lilith/test-utils/vitest-presets`
+- **Package manager**: pnpm
+
+## Key Packages
+
+| Need | Package |
+|------|---------|
+| Bootstrap | `@lilith/service-nestjs-bootstrap` |
+| Health | `@lilith/nestjs-health` |
+| Entity base | `@lilith/typeorm-entities` |
+| Service addresses | `@lilith/service-registry` |
+| AI client | `@lilith/ai-client` (check MANIFEST - may be published) |
+| Test preset | `@lilith/test-utils/vitest-presets` |
+| Full inventory | `~/Code/@packages/MANIFEST.md` |
+
+## Handoff Reference
+
+Full task list: `.claude/handoffs/v1-implementation.md` Phases 2-3 (2a, 3a through 3d).
diff --git a/.claude/agents/frontend.md b/.claude/agents/frontend.md
new file mode 100644
index 0000000..d58692f
--- /dev/null
+++ b/.claude/agents/frontend.md
@@ -0,0 +1,149 @@
+---
+name: frontend
+description: companion-web React PWA specialist. Implements AudioWorklets (16kHz mic capture + 22050Hz PCM playback), VoiceSession WS manager, ChatView with sentence underline, MicButton, PWA manifest. Use for all @companion/@applications/web work.
+tools: Read, Write, Edit, Bash, Grep, Glob, mcp__playwright__browser_navigate, mcp__playwright__browser_snapshot, mcp__playwright__browser_console_messages, mcp__playwright__browser_take_screenshot
+model: sonnet
+---
+
+You are a frontend specialist implementing the @companion mobile web PWA.
+
+**Language: TypeScript (React 18, Vite). Mobile-first. Text + voice chat.**
+
+## Architecture
+
+```
+CompanionApp
+├── VoiceSession.ts        WS manager - binary PCM + JSON events multiplexed
+│   ├── MicCapture.ts      AudioWorklet: getUserMedia → 16kHz PCM frames upstream
+│   └── PcmPlayer.ts       AudioWorklet: 22050Hz PCM downstream → Web Audio
+└── ChatView.tsx
+    ├── ChatMessage.tsx    parts[], underlines speakingPartIndex
+    ├── MicButton.tsx      push-to-talk, initializes AudioContext on first tap
+    └── TextInput.tsx      text fallback → POST /chat SSE
+```
+
+## Message Model
+
+```typescript
+interface Message {
+  id: string;
+  role: 'user' | 'assistant';
+  emotion: string;
+  parts: string[]; // one entry per spoken sentence segment
+  speakingPartIndex: number | null;
+}
+```
+
+Driven by companion-api WS events:
+- `{ type: "segment", partIndex, text, emotion }` → append `parts[partIndex]`
+- `{ type: "tts.start", partIndex }` → set `speakingPartIndex`
+- `{ type: "tts.end", partIndex }` → clear `speakingPartIndex`
+
+## AudioWorklet Binary Protocol
+
+**Upstream (mic → server):** 960-byte Int16 frames, 16kHz mono.
+Header: `[0x01][seq: 4B big-endian]` + 960 bytes PCM.
+Resample in worklet: browser's native rate (typically 48kHz) → 16kHz via linear interpolation.
+
+**Downstream (server → speaker):** Int16 frames, 22050Hz mono.
+Header: `[0x01][seq: 4B][utterance_id: 16B]` + N bytes PCM.
+Strip header, convert Int16 → Float32, feed ring buffer.
+
+## WS Multiplexing
+
+One WS carries both binary and JSON:
+- Incoming binary message: first byte = `0x01` → PCM frame for PcmPlayer
+- Incoming text message: parse as JSON → route by `type` field
+
+## Critical Mobile Constraints
+
+**AudioContext gating**: `new AudioContext()` MUST be created on a user gesture.
+MicButton's first tap initializes both MicCapture and PcmPlayer. Share one AudioContext.
+
+**HTTPS required**: `getUserMedia` is blocked on non-HTTPS. nginx handles SSL.
+The dev domain is `companion.atlilith.local` - do not hardcode, read from env.
+
+**Sentence underline**: `parts[]` is an inline span array. Underline `parts[speakingPartIndex]`
+with `text-decoration: underline`. Animate the transition between parts.
+
+**PWA**: `manifest.json` with `display: standalone`, `orientation: portrait`.
+Service worker caches shell. `MediaSession` API for lock screen controls.
+
+## Quality Standards (MANDATORY)
+
+**NEVER write scaffolds, stubs, placeholders, or simplified versions.**
+AudioWorklets must be complete - real resampling, real ring buffers, real underrun handling.
+Every component complete, every type concrete (no `any`).
+If blocked: **STOP, report, wait** - never silently degrade.
+
+**Check `~/Code/@packages/MANIFEST.md` (184 TS + 35 Python packages) before writing new utilities.**
+Relevant: `@ts/@ui-react` (61 packages), `@ts/@websocket` (3 packages).
+Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game - check MANIFEST
+before writing new utilities. `@lilith/ui-react` has 61 React packages alone.
+
+**Before declaring complete:**
+1. `pnpm build` - zero errors
+2. `npx tsc --noEmit` - zero type errors
+3. `pnpm test` - unit tests pass (VoiceSession logic, worklet frame parsing)
+4. `browser_snapshot` - ChatView renders correctly
+5. `browser_console_messages` - zero errors
+6. PWA: manifest valid, service worker registered, installable prompt appears
+7. No `any`, no `@ts-ignore`, no `eslint-disable`
+
+## Tech Stack
+
+- **Framework**: React 18 + TypeScript strict + Vite
+- **Language**: TypeScript (strict, no `any`)
+- **State**: `useReducer` for message state - no Zustand/Redux for this app
+- **Styling**: `@lilith/ui-styled-components` (global package, single instance guarantee)
+- **Testing**: Vitest + React Testing Library
+- **Package manager**: pnpm
+
+Use `@lilith/ui-styled-components` for styling (single instance guarantee), `@lilith/ui-router`
+for routing, `@lilith/ui-motion` for animation. These are published global packages - use them.
+
+## File Structure
+
+```
+src/
+├── app/CompanionApp.tsx
+├── features/
+│   ├── voice/
+│   │   ├── VoiceSession.ts
+│   │   ├── MicCapture.ts
+│   │   └── PcmPlayer.ts
+│   └── chat/
+│       ├── ChatView.tsx
+│       ├── ChatMessage.tsx
+│       ├── MicButton.tsx
+│       └── TextInput.tsx
+├── worklets/
+│   ├── mic-processor.js   (AudioWorkletProcessor - plain JS, no TS transform)
+│   └── pcm-player.js      (AudioWorkletProcessor - plain JS)
+└── manifest.json
+```
+
+## Visual Verification (MANDATORY)
+
+After any UI change:
+1. `browser_navigate` to the PWA
+2. `browser_snapshot` to verify rendering
+3. `browser_console_messages` - zero errors
+Never declare UI work complete without visual verification.
+
+## Key Packages
+
+| Need | Package |
+|------|---------|
+| Check first | `~/Code/@packages/MANIFEST.md` |
+| Styling | `@lilith/ui-styled-components` |
+| Routing | `@lilith/ui-router` |
+| Animation | `@lilith/ui-motion` |
+| UI components | `@lilith/ui-*` (61 React packages - check MANIFEST) |
+| React bootstrap | `@lilith/service-react-bootstrap` |
+| Auth | `@lilith/auth-provider` |
+| Companion client | `@lilith/companion-client` (this project's own package) |
+
+## Handoff Reference
+
+Full task list: `.claude/handoffs/v1-implementation.md` Phase 4 (4a through 4d).
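For the downstream frame layout in the AudioWorklet Binary Protocol section of frontend.md (`[0x01][seq: 4B][utterance_id: 16B]` followed by Int16 PCM), a parsing step could be sketched as follows. The spec does not state the byte order of the downstream `seq` or of the PCM samples, so big-endian `seq` (matching the upstream header) and little-endian samples are assumptions here, and `parseDownstreamFrame` is a hypothetical name.

```typescript
const FRAME_TYPE_PCM = 0x01;
const SEQ_BYTES = 4;
const UTTERANCE_ID_BYTES = 16;
const HEADER_BYTES = 1 + SEQ_BYTES + UTTERANCE_ID_BYTES; // 21 bytes

interface DownstreamFrame {
  seq: number;
  utteranceId: Uint8Array;
  samples: Float32Array; // normalized to [-1, 1)
}

// Hypothetical helper: returns null for frames that are too short or not PCM.
function parseDownstreamFrame(buffer: ArrayBuffer): DownstreamFrame | null {
  if (buffer.byteLength < HEADER_BYTES) return null;
  const view = new DataView(buffer);
  if (view.getUint8(0) !== FRAME_TYPE_PCM) return null;

  // Assumption: seq is big-endian, as in the upstream header.
  const seq = view.getUint32(1, false);
  const utteranceId = new Uint8Array(buffer.slice(1 + SEQ_BYTES, HEADER_BYTES));

  // Assumption: samples are little-endian Int16; a trailing odd byte is ignored.
  const sampleCount = Math.floor((buffer.byteLength - HEADER_BYTES) / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(HEADER_BYTES + i * 2, true) / 32768;
  }
  return { seq, utteranceId, samples };
}
```

Dividing by 32768 maps the Int16 range onto [-1, 1), which is the range Web Audio expects for Float32 samples before they are written into the player's ring buffer.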
diff --git a/.claude/agents/infrastructure.md b/.claude/agents/infrastructure.md
new file mode 100644
index 0000000..cbc317b
--- /dev/null
+++ b/.claude/agents/infrastructure.md
@@ -0,0 +1,113 @@
+---
+name: infrastructure
+description: @companion infrastructure specialist. nginx HTTPS domain, Docker Compose, port assignment, SSL for getUserMedia, WebSocket binary proxy config. Use for all @companion/@deployments work.
+tools: Read, Write, Edit, Bash, Grep, Glob
+model: sonnet
+---
+
+You are an infrastructure specialist for the @companion platform.
+
+**Language: nginx config, YAML, shell. No application code.**
+
+## Critical: HTTPS Required for getUserMedia
+
+The companion PWA requires `getUserMedia()` for mic capture.
+Browsers block `getUserMedia` on non-HTTPS origins - no exceptions.
+nginx MUST serve the frontend over HTTPS on a proper domain.
+
+**Dev domain**: `companion.atlilith.local` (matches `*.atlilith.local` pattern from lilith-platform)
+Before setting up SSL, check how lilith-platform does it:
+```bash
+ls ~/Code/@projects/@lilith/lilith-platform/infrastructure/
+```
+Replicate the same SSL/cert pattern.
+
+## Port Assignment
+
+Check before assigning - avoid conflicts:
+```bash
+cat ~/Code/@projects/@lilith/lilith-platform/infrastructure/ports.yaml
+cat ~/Code/@projects/@life/CLAUDE.md | grep -i port
+```
+
+Record final assignments in `@companion/@deployments/ports.yaml`.
+
+## nginx: WebSocket Binary Proxy
+
+Voice pipeline uses long-lived WebSockets carrying raw binary PCM. Critical config:
+
+```nginx
+location /voice/ {
+  proxy_pass http://companion-api;
+  proxy_http_version 1.1;
+  proxy_set_header Upgrade $http_upgrade;
+  proxy_set_header Connection "upgrade";
+  proxy_read_timeout 3600s;
+  proxy_send_timeout 3600s;
+  proxy_buffering off; # CRITICAL - binary PCM must not be buffered
+  proxy_request_buffering off;
+}
+```
+
+`proxy_buffering off` is not optional. PCM frames must flow through immediately.
+Long timeouts required - voice sessions can last hours.
+
+## Docker Compose Structure
+
+```yaml
+services:
+  companion-api:
+    build: ../@applications/api
+    ports: [":"]
+    depends_on:
+      companion-postgres:
+        condition: service_healthy
+    environment:
+      DATABASE_URL: postgresql://companion:${POSTGRES_PASSWORD}@companion-postgres:5432/companion_db
+      AI_URL: http://host.docker.internal:3790
+      MODEL_BOSS_URL: http://host.docker.internal:8210
+      SPEECH_SYNTHESIS_URL: ws://host.docker.internal:
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:/health"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  companion-postgres:
+    image: postgres:16
+    ports: [":5432"]
+    volumes: [companion-postgres-data:/var/lib/postgresql/data]
+    environment:
+      POSTGRES_DB: companion_db
+      POSTGRES_USER: companion
+      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U companion -d companion_db"]
+      interval: 5s
+      timeout: 5s
+      retries: 10
+
+volumes:
+  companion-postgres-data:
+```
+
+## Quality Standards (MANDATORY)
+
+**NEVER write scaffolds or placeholders.**
+Every nginx config complete and tested. Every docker-compose.yml has working healthchecks.
+If blocked: **STOP, report, wait.**
+
+**Check `~/Code/@packages/MANIFEST.md` for any relevant packages before writing scripts.**
+Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
+Relevant: `@ts/@infra` (13 packages), `@nginx` (1 package).
+
+**Before declaring complete:**
+1. `docker compose up -d` - all containers reach `healthy` state
+2. `curl -k https://companion.atlilith.local/health` → 200
+3. `getUserMedia` works in browser (HTTPS confirmed, no mixed-content errors)
+4. WS voice connection established without nginx timeout
+5. Binary PCM flows without buffering artifacts
+
+## Handoff Reference
+
+Full task list: `.claude/handoffs/v1-implementation.md` Phase 5 (5a, 5b).
diff --git a/.claude/handoffs/v1-implementation.md b/.claude/handoffs/v1-implementation.md new file mode 100644 index 0000000..1f69ae4 --- /dev/null +++ b/.claude/handoffs/v1-implementation.md @@ -0,0 +1,402 @@ +# @companion v1.0 โ€” Full Implementation Handoff + +**Target**: Mobile web PWA with text + voice chat, sentence underline, emotion-aware TTS, installable. +**Governing principle**: ML mechanics โ†’ @model-boss. Personality mechanics โ†’ @ai. + +--- + +## Architecture Summary + +``` +browser (PWA) + โ†• WS /voice/:session_id (PCM binary + JSON events) +companion-api (@companion/@applications/api) + โ†’ POST @ai /personality/:id/compose (system_prompt + tts config) + โ†’ POST @model-boss /v1/chat/completions (SSE inference) + โ†’ WS @ai /process/:session_id (tokens in โ†’ segments out) + โ†’ WS @speech-synthesis /ws/conversation (PCM STT + TTS) +``` + +**companion-api is a protocol bridge. Zero personality logic lives here.** + +--- + +## Phase 1: @ai Service (PREREQUISITE โ€” everything depends on this) + +### 1a. M0 โ€” NestJS Scaffold + +- [ ] Init NestJS project at `@applications/@ai/services/ai-core/` +- [ ] `package.json`: `type: module`, NestJS + SWC + TypeORM deps +- [ ] `nest-cli.json`: `{ "compilerOptions": { "builder": "swc" } }` +- [ ] `.swcrc`: `{ "module": { "type": "es6", "resolveFully": true } }` +- [ ] `tsconfig.json`: extends `@lilith/configs/typescript/nestjs` +- [ ] Bootstrap via `@lilith/service-nestjs-bootstrap` (`presets.api`, port 3790) +- [ ] `GET /health` via `@lilith/nestjs-health` +- [ ] `docker-compose.yml` in `@applications/@ai/@deployments/`: + - PostgreSQL on port 26395 (`ai_db`) + - Redis on port 26394 +- [ ] `./run` task runner (dev, build, test, docker:up/down) +- [ ] Vitest config with `nestPreset` from `@lilith/test-utils/vitest-presets` +- [ ] Smoke test: `GET /health` returns 200 + +### 1b. 
M1 โ€” Identity Module + +- [ ] `PersonaEntity` (extends `BaseEntity` from `@lilith/typeorm-entities`): + - `id: uuid`, `name: string`, `slug: string`, `configPath: string`, `isActive: boolean` +- [ ] `UserIdentityEntity`: + - `id: uuid`, `externalId: string` (maps to auth user), `displayName: string`, `activePersonaId: uuid` +- [ ] `IdentityModule` with TypeORM registration +- [ ] `IdentityService`: `findPersona(id)`, `findUser(externalId)`, `setActivePersona(userId, personaId)` +- [ ] `GET /identity/persona/:id` +- [ ] `GET /identity/user/:externalId` +- [ ] `POST /identity/user/:id/persona` (set active persona) +- [ ] Seed: miku persona (id deterministic), quinn user +- [ ] Unit tests for IdentityService +- [ ] Integration test: seed โ†’ GET persona returns miku + +### 1c. M3 โ€” Personality Module + miku.json tts.emotion + +- [ ] Update `@applications/@ai/config/personalities/miku.json`: + Add `tts` section: + ```json + "tts": { + "voice_id": "emov-bea-amused", + "sentence_gap_ms": 0, + "emotion": { + "pattern": "\\[([^\\]]+)\\]\\s*", + "valid_emotions": ["happy","sad","angry","surprised","relaxed","neutral"], + "emotion_map": { + "joy":"happy","excitement":"happy","happiness":"happy","cheerful":"happy", + "grief":"sad","sorrow":"sad","melancholy":"sad","depression":"sad", + "fear":"surprised","shock":"surprised","disbelief":"surprised", + "calm":"relaxed","content":"relaxed","peaceful":"relaxed", + "rage":"angry","frustration":"angry","irritation":"angry", + "bored":"neutral","thinking":"neutral" + }, + "exaggeration_map": { "happy":0.7,"sad":0.3,"angry":0.8,"surprised":0.6,"relaxed":0.2,"neutral":0.1 }, + "cfg_weight_map": { "happy":0.6,"sad":0.3,"angry":0.7,"surprised":0.5,"relaxed":0.3,"neutral":0.5 } + } + } + ``` +- [ ] `PersonalityModule` +- [ ] `PersonalityConfigService`: loads JSON from `configPath` on PersonaEntity +- [ ] `POST /personality/:id/compose` โ€” accepts `{ user_context?: string }`, returns: + ```typescript + interface 
PersonalityComposeResponse { + system_prompt: string; + tts: { + voice_id: string; + sentence_gap_ms: number; + emotion: EmotionConfig; + }; + } + ``` +- [ ] `system_prompt` assembled from persona JSON (name, role, personality directives, user context) +- [ ] Unit tests: compose returns correct structure for miku +- [ ] Integration test: full round trip with seed data + +### 1d. Process Module (WS /process/:session_id) + +Port from `@chobit/shared/godot/conversation/conversation_orchestrator.gd` (lines 325โ€“498) +and `@chobit/shared/godot/conversation/conversation_defs.gd`. + +- [ ] **EmotionResolver** (`process/emotion-resolver.ts`): + - Constructor takes `EmotionConfig` from miku.json tts.emotion + - `resolve(raw: string): string` โ€” maps raw โ†’ canonical via `emotion_map`, falls back to `neutral` + - `ttsParams(emotion: string): { exaggeration: number; cfgWeight: number }` โ€” reads `exaggeration_map`/`cfg_weight_map` + - Unit tests: known mappings, unknown โ†’ neutral, all valid_emotions round-trip + +- [ ] **TextSanitizer** (`process/text-sanitizer.ts`): + Port `_sanitize_for_speech()` from orchestrator.gd lines 375โ€“430: + - Paralinguistic normalization: `*laughs*`, `(laughs)`, `haha+`, `lol+`, `heh+` โ†’ `[laugh]`; `*sighs*`, `*sigh*` โ†’ `[sigh]`; `*gasp*`, `*gasps*` โ†’ `[gasp]` + - Strip: markdown (bold `**`, italic `*`/`_`, code `` ` ``, links `[text](url)`), emoji (unicode ranges), URLs, list prefixes (`- `, `โ€ข `, `1. 
`) + - Normalize: `HH:MM` time โ†’ `HH MM`, `N-N` range โ†’ `N to N`, `A/B` โ†’ `A B` + - Strip emotion tags `[emotion]` from output text (they're extracted separately) + - Unit tests: each transformation verified independently + +- [ ] **ResponseStream** (`process/response-stream.ts`): + Port `_extract_segments()` from orchestrator.gd lines 325โ€“375: + - State: `buffer: string`, `currentEmotion: string` (default `neutral`), `partIndex: number` + - `push(token: string): Segment[]` โ€” appends to buffer, scans for boundaries: + - Emotion tag `[emotion]` anywhere in buffer โ†’ extract emotion, remove tag, continue + - Sentence ending (`.`, `!`, `?`, `;`) not inside a word abbreviation โ†’ emit segment + - Whichever boundary comes first in buffer wins + - Returns `Segment[]` (may be empty if no boundary found) + - `flush(): Segment[]` โ€” emit whatever remains in buffer as final segment + - `Segment`: `{ text: string; emotion: string; partIndex: number }` + - The emitted `text` is run through `TextSanitizer` before returning + - Unit tests: emotion mid-sentence, sentence boundary, flush, multi-segment push + +- [ ] **ProcessSessionManager** (`process/process-session.manager.ts`): + - In-memory session store: `Map` + - `createSession(sessionId, emotionConfig)`: initialize ResponseStream + - `deleteSession(sessionId)`: cleanup + - Session TTL: 30 min idle (use `@nestjs/schedule`) + +- [ ] **ProcessGateway** (`process/process.gateway.ts`) โ€” `@WebSocketGateway({ path: '/process/:session_id' })`: + Incoming message union: + ```typescript + type IncomingMsg = + | { type: 'init'; personality_id: string } + | { type: 'token'; text: string } + | { type: 'done' } + ``` + Outgoing message union: + ```typescript + type OutgoingMsg = + | { type: 'segment'; text: string; emotion: string; partIndex: number; ttsParams: { voiceId: string; exaggeration: number; cfgWeight: number } } + | { type: 'error'; message: string } + ``` + - `init` โ†’ load personality config, create session + 
- `token` โ†’ call `session.stream.push(token)`, emit each returned `Segment` as `segment` event + - `done` โ†’ call `session.stream.flush()`, emit remaining segments, delete session + - On segment emit: run EmotionResolver, attach ttsParams, include voice_id from personality config + +- [ ] `ProcessModule` with all providers + gateway registered +- [ ] Integration test: send init โ†’ tokens โ†’ done, verify segment events match expected output + +--- + +## Phase 2: @companion Scaffold + +### 2a. Monorepo Scaffold + +- [ ] Init monorepo at `@projects/@companion/`: + - `pnpm-workspace.yaml`: `['@applications/*', '@packages/*', '@tooling/*']` + - Root `package.json` with workspace scripts + - `@deployments/docker-compose.yml` (ports TBD โ€” assign adjacent to @life 3700) + - `run` task runner script (dev, build, test) +- [ ] `@packages/companion-client/` โ€” shared TypeScript client (`@lilith/companion-client`): + - Types: `SessionMessage`, `SegmentEvent`, `ConversationSession` + - WS client wrapper for companion-api + +--- + +## Phase 3: companion-api (@applications/api/) + +### 3a. NestJS Scaffold + +- [ ] Init NestJS at `@companion/@applications/api/` +- [ ] Same stack as @ai: ESM, SWC, TypeORM (for session persistence), port TBD +- [ ] `GET /health` +- [ ] Session entity: `ConversationSessionEntity` (id, userId, createdAt, expiresAt) +- [ ] Message entity: `ConversationMessageEntity` (sessionId, role, content, emotion, createdAt) + +### 3b. Session Endpoints + +- [ ] `POST /session` โ†’ `{ session_id: uuid }` (creates DB record) +- [ ] `GET /session/:id/history` โ†’ `Message[]` +- [ ] `DELETE /session/:id` + +### 3c. 
POST /chat (Text Fallback, SSE)
+
+Full pipeline for text-only path:
+- [ ] Accepts `{ session_id, message: string }`
+- [ ] Calls `@ai POST /personality/:id/compose` for system_prompt + tts config
+- [ ] Builds message history from DB
+- [ ] Calls `@model-boss POST /v1/chat/completions` (SSE)
+- [ ] Opens `WS @ai /process/:session_id`, sends `init` + each token + `done`
+- [ ] For each received `segment`, SSE to browser: `{ type: "segment", text, emotion, partIndex, ttsParams }`
+- [ ] Persists assistant message to DB on completion
+- [ ] Use `@lilith/ai-client` if published; otherwise direct HTTP
+
+### 3d. WS /voice/:session_id (Voice Pipeline)
+
+Binary + JSON multiplexed WebSocket. companion-api acts as protocol bridge.
+
+- [ ] **VoiceGateway** (`voice/voice.gateway.ts`):
+  - On connection: open `WS @speech-synthesis /ws/conversation`
+  - Forward binary frames from browser → speech-synthesis upstream (binary PCM 16kHz)
+  - Forward JSON control from speech-synthesis → browser:
+    - `stt.final` - triggers LLM pipeline (same as /chat but over WS)
+    - `vad.speech_start` - forward to browser for UI feedback
+  - On `stt.final`:
+    1. Call `@ai POST /personality/:id/compose` (or cache per session)
+    2. Call `@model-boss` SSE stream
+    3. Pipe tokens to `@ai WS /process/:session_id`
+    4. On each `segment`: send `tts.request` to speech-synthesis WS
+    5. Forward `tts.start`, `tts.end` from speech-synthesis → browser
+    6. Forward binary PCM downstream from speech-synthesis → browser
+  - On disconnect: close speech-synthesis WS, clean up @ai session
+
+- [ ] **VoiceSessionStore** - in-memory map of active voice sessions (browser ws ↔ speech-synthesis ws ↔ @ai ws)
+
+---
+
+## Phase 4: companion-web (@applications/web/)
+
+### 4a. React PWA Scaffold
+
+- [ ] Vite + React 18 + TypeScript strict
+- [ ] `manifest.json`:
+  - `display: standalone`, `orientation: portrait`
+  - `start_url: /`, icons (192px + 512px)
+- [ ] Service worker (Workbox or vite-plugin-pwa): cache shell + assets
+- [ ] `CompanionApp.tsx`: full-screen mobile layout (100dvh, no scroll bounce)
+- [ ] PWA install prompt handling (beforeinstallprompt)
+
+### 4b. AudioWorklets
+
+- [ ] `src/worklets/mic-processor.js` - `AudioWorkletProcessor`:
+  - Input: browser mic (any sample rate, converted)
+  - Output: 16kHz mono PCM Int16 frames (960 bytes = 30ms at 16kHz)
+  - Resamples via linear interpolation if input rate ≠ 16000
+  - Sends frames to main thread via `postMessage` with binary buffer
+
+- [ ] `src/worklets/pcm-player.js` - `AudioWorkletProcessor`:
+  - Input: 22050Hz mono PCM Int16 frames from companion-api
+  - Feeds ring buffer → outputs float32 to Web Audio destination
+  - Handles underrun (silence) and overrun (drop oldest)
+
+- [ ] `src/features/voice/MicCapture.ts`:
+  - `getUserMedia({ audio: true })`
+  - Create `AudioContext` (deferred - only on user gesture)
+  - Load `mic-processor.js` worklet
+  - On frame: send binary over WS to companion-api
+  - `start() / stop()`
+
+- [ ] `src/features/voice/PcmPlayer.ts`:
+  - Create `AudioContext` (share with MicCapture)
+  - Load `pcm-player.js` worklet
+  - `enqueue(pcmFrame: ArrayBuffer)` - feeds worklet ring buffer
+  - `MediaSession` API: lock screen play/pause → `stop()` MicCapture
+
+### 4c. VoiceSession Manager
+
+- [ ] `src/features/voice/VoiceSession.ts`:
+  - Manages WS connection to companion-api `/voice/:session_id`
+  - Multiplexes binary (PCM) and JSON (events) over one WS
+  - Binary upstream: mic frames → server
+  - Binary downstream: PCM audio → PcmPlayer.enqueue()
+  - JSON events:
+    - `stt.final` → emit transcript for ChatView
+    - `segment` → emit to ChatView (append part, update emotion)
+    - `tts.start` → emit speakingPartIndex
+    - `tts.end` → clear speakingPartIndex
+    - `vad.speech_start` → show "listening" indicator
+
+### 4d. Chat Components
+
+Message model:
+```typescript
+interface Message {
+  id: string;
+  role: 'user' | 'assistant';
+  emotion: string;
+  parts: string[]; // one entry per sentence segment
+  speakingPartIndex: number | null;
+}
+```
+
+- [ ] `src/features/chat/ChatView.tsx`:
+  - Scrollable message list (CSS snap or scroll-to-bottom on new message)
+  - Auto-scroll when assistant is speaking
+  - `ChatMessage` per message
+  - Shows emotion indicator on assistant messages
+
+- [ ] `src/features/chat/ChatMessage.tsx`:
+  - Renders `parts[]` inline - each part is a `<span>`
+  - `speakingPartIndex` → underline the active span (`text-decoration: underline`)
+  - Animate underline transition between parts
+
+- [ ] `src/features/chat/MicButton.tsx`:
+  - Large circular push-to-talk button (bottom center, mobile thumb zone)
+  - First tap: initializes `AudioContext` (browser requires user gesture)
+  - Hold to talk OR toggle mode (configurable)
+  - Visual states: idle / listening (pulsing) / processing
+
+- [ ] `src/features/chat/TextInput.tsx`:
+  - Text fallback input
+  - Sends via POST /chat SSE
+  - Parses SSE stream → same segment/tts events as voice
+
+- [ ] `src/app/CompanionApp.tsx`:
+  - Full-screen layout: `ChatView` (flex-1) + bottom row (`TextInput` + `MicButton`)
+  - Manages session_id (create on mount, persist in sessionStorage)
+  - Connects `VoiceSession`, passes events to chat state
+  - `useReducer` for message state (append part by index, set speakingPartIndex)
+
+---
+
+## Phase 5: Infrastructure
+
+### 5a. nginx + HTTPS (required for getUserMedia on mobile)
+
+- [ ] Assign companion port (TBD - record in `@companion/@deployments/ports.yaml`)
+- [ ] nginx vhost: `companion.atlilith.local` → companion-api, `companion-web.atlilith.local` → Vite
+- [ ] SSL cert for `*.atlilith.local` (same infra pattern as lilith-platform)
+- [ ] nginx proxy_pass for WS (`Upgrade`, `Connection` headers)
+- [ ] nginx for binary WS: `proxy_read_timeout 1h`, `proxy_send_timeout 1h`
+
+### 5b. Docker Compose
+
+- [ ] `@companion/@deployments/docker-compose.yml`:
+  - companion-api service
+  - PostgreSQL (companion_db, port TBD)
+  - Redis (companion_redis, port TBD - for session cache if needed)
+  - healthchecks for all services
+
+---
+
+## Build Order Summary
+
+```
+1a → 1b → 1c → 1d   (@ai sequential - each milestone builds on prior)
+               ↓
+               2a                   (scaffold, can start early)
+               3a → 3b → 3c → 3d    (companion-api, sequential)
+               4a → 4b → 4c → 4d    (web PWA, 4b/4c can parallel after 4a)
+               5a/5b                (infra, can parallel with 3/4)
+```
+
+3c/3d depend on 1d (@ai Process module).
+4c/4d can be scaffolded before 1d using mock WS events, but real wiring requires 1d.
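
Editorial sketch (not part of the commit): one way to scaffold 4c/4d before 1d lands is a mock source that emits events shaped like the `segment` message of the @ai WS /process protocol. The TypeScript below is a minimal illustration under stated assumptions, not the planned implementation: `mockSegments`, the `mock-voice` voice ID, and the fixed `ttsParams` values are hypothetical placeholders, and the punctuation-based split only approximates what the real Process module would do.

```typescript
// Shape of the outgoing `segment` message, copied from the protocol listing.
interface SegmentEvent {
  type: "segment";
  text: string;
  emotion: string;
  partIndex: number;
  ttsParams: { voiceId: string; exaggeration: number; cfgWeight: number };
}

// Hypothetical helper: splits text after runs of sentence-ending punctuation
// and wraps each non-empty sentence in a segment event with a running partIndex.
// The emotion is passed in by the caller; ttsParams are placeholder values.
function mockSegments(text: string, emotion: string = "neutral"): SegmentEvent[] {
  // Each match is a run of non-punctuation followed by optional ., ! or ? characters.
  const sentences: string[] = text.match(/[^.!?]+[.!?]*/g) ?? [text];
  return sentences
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0)
    .map((sentence, partIndex) => ({
      type: "segment" as const,
      text: sentence,
      emotion,
      partIndex,
      ttsParams: { voiceId: "mock-voice", exaggeration: 0.1, cfgWeight: 0.5 },
    }));
}
```

Passing `mockSegments("Hi there! How are you?")` to the chat state handler produces two events with `partIndex` 0 and 1, which is enough to exercise append-by-index and the underline logic without a live @ai connection.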
+
+---
+
+## Protocol Reference
+
+### @speech-synthesis WS binary protocol
+
+```
+UPSTREAM (browser → api → speech-synthesis):
+  [0x01][seq:4B BE][pcm: 960 bytes Int16 16kHz mono]  → audio frame
+  [0x03]                                              → end of utterance
+
+DOWNSTREAM (speech-synthesis → api → browser):
+  Binary: [0x01][seq:4B BE][utterance_id:16B][pcm: N bytes Int16 22050Hz mono]
+  JSON:   { type: "stt.final", text, confidence }
+          { type: "tts.start", utterance_id }
+          { type: "tts.end", utterance_id }
+          { type: "vad.speech_start" }
+          { type: "vad.speech_end" }
+```
+
+### @ai WS /process protocol
+
+```
+INCOMING (companion-api → @ai):
+  { type: "init", personality_id: string }
+  { type: "token", text: string }
+  { type: "done" }
+
+OUTGOING (@ai → companion-api):
+  { type: "segment", text: string, emotion: string, partIndex: number,
+    ttsParams: { voiceId: string, exaggeration: number, cfgWeight: number } }
+  { type: "error", message: string }
+```
+
+---
+
+## Definition of Done - v1.0
+
+- [ ] `GET @ai /health` → 200 from Docker
+- [ ] `POST @ai /personality/miku/compose` → valid system_prompt + tts config
+- [ ] `WS @ai /process/test` → tokens → segments with correct emotion/ttsParams
+- [ ] `POST /session` → session_id
+- [ ] `POST /chat` SSE → streams segments with text + emotion
+- [ ] `WS /voice` → end-to-end: speak into mic → STT → LLM → TTS → audio plays back
+- [ ] Sentence being spoken is underlined in ChatView
+- [ ] PWA installable from `companion.atlilith.local` on mobile
+- [ ] `getUserMedia` works (HTTPS confirmed)
+- [ ] All unit + integration tests pass
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..4b47ff3
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,138 @@
+# @companion - AI Companion Platform
+
+> **Status:** Pre-scaffold. This directory defines intent. No code exists yet.
+> **Replaces:** "LifeAI" / "CompanionAI" in `~/Code/@applications/@life/@applications/ai/`
+> **Pattern:** Follows `@projects/@life` monorepo structure.
+
+---
+
+## Single Responsibility
+
+The AI companion product - multiple frontends sharing one backend, one personality engine.
+Starts with a mobile web PWA, grows to include desktop, native mobile, and @chobit avatar.
+
+Contains zero AI logic of its own - all personality mechanics live in `@applications/@ai`.
+
+**Not to be confused with:**
+- `@applications/@ai` - the AI runtime (identity, memory, personality, nag, process)
+- `@applications/@chobit` - 3D avatar / STT / TTS (future @companion frontend)
+
+---
+
+## What It Owns
+
+- **Orchestration** - companion-api wires @ai, @model-boss, and @speech-synthesis together
+- **Session management** - conversation history, session lifecycle
+- **Frontends** - multiple client applications consuming companion-api
+- **User-facing settings** - companion preferences, notification preferences, persona selection
+
+---
+
+## What It Does NOT Own
+
+- AI logic (personality mechanics, emotion extraction, sentence splitting) → `@applications/@ai`
+- Inference → `@applications/@model-boss`
+- STT / TTS → `@applications/@audio/speech-synthesis`
+- Domain data (wellness, career, education) → domain @applications
+
+---
+
+## Project Structure
+
+```
+@projects/@companion/
+├── @applications/
+│   ├── api/                  ← companion-api (NestJS, orchestration + protocol bridge)
+│   ├── web/                  ← React PWA, mobile-first (v1 frontend)
+│   └── (future frontends)
+│       ├── desktop/          ← desktop client
+│       ├── mobile/           ← native mobile (Swift/Kotlin)
+│       └── avatar/           ← @chobit Godot avatar frontend
+│
+├── @packages/
+│   └── companion-client/     ← @lilith/companion-client (shared TS client)
+│
+├── @deployments/
+│   ├── docker-compose.yml
+│   └── systemd/
+│
+├── @tooling/
+│   └── e2e/                  ← Playwright tests
+│
+├── CLAUDE.md
+└── run                       ← task runner
+```
+
+---
+
+## Architecture
+
+```
+companion-api receives user message (text or transcribed speech)
+        ↓
+POST @ai /personality/:id/compose
+  → { system_prompt, tts config }
+        ↓
+POST @model-boss /v1/chat/completions (SSE)
+  → token stream
+        ↓
+WS @ai /process/:session_id
+  → tokens in, processed segments out (sentence split + emotion + sanitized)
+        ↓
+POST @speech-synthesis /synthesize per segment
+  → TTS audio
+        ↓
+Stream back to client frontend (text + audio)
+```
+
+companion-api orchestrates the pipeline. @ai owns all personality mechanics.
+
+---
+
+## Version Roadmap
+
+| Version | Feature | Notes |
+|---------|---------|-------|
+| **v1.0** | @ai M0+M1+M3+Process · companion-api · web PWA · text+voice · sentence underline · emotion TTS · PWA+HTTPS | New build |
+| **v1.1** | @ai M2 memory · session persistence | New build |
+| **v2.0** | @ai M4 nag · M5 context compose | New build |
+| **v3.0** | @chobit avatar frontend · M8 relationship · multi-persona | New build |
+| **v4.0** | desktop frontend · native mobile · push notifications | New build |
+| **v5.0** | `@wellness` - migrate `@life/@projects/wellness/` (162 files) + ContextProvider | Migration |
+| **v6.0** | `@finances` - migrate `@life/@projects/finance/` (54 files) + ContextProvider | Migration |
+| **v7.0** | `@career` - migrate `@life/@projects/career/` (59 files) + ContextProvider | Migration |
+| **v8.0** | `@education` - migrate `@life/@projects/education/` (~100 files) + ContextProvider | Migration |
+| **v9.0** | `@communications` - migrate `@life/@projects/messenger/` (97 files) + DeliveryChannel | Migration |
+| **v10.0** | `@journal` split · `@life` → `@daily` rename · @daily slimming | Migration + rename |
+
+v5-v10: each split = scaffold target → port code from `@life` → wire into `@ai` → delete from `@life`.
+
+---
+
+## Integration
+
+- `companion-api` calls `@ai POST /personality/:id/compose` for system prompt + TTS config
+- `companion-api` calls `@model-boss POST /v1/chat/completions` for inference (ML mechanics)
+- `companion-api` pipes tokens to `@ai WS /process/:session_id` (personality mechanics)
+- `companion-api` calls `@speech-synthesis` for STT (voice input) and TTS (voice output)
+- Subscribes to Redis `ai.nag.fired` events for nag toast display (v2.0)
+
+**Boundary:** companion-api orchestrates @model-boss inference. @ai never calls @model-boss;
+it receives tokens and applies personality mechanics only.
+
+---
+
+## Migration Source
+
+| Source | Destination |
+|--------|-------------|
+| `@life/@applications/ai/services/companion/` | Deleted - behavior moves to `@applications/@ai` |
+| `@life/@applications/ai/services/platform-ai/` | Deleted - behavior moves to `@applications/@ai` |
+| Companion UI from @life frontend | `@companion/@applications/web/` |
+| `@applications/@chobit/` | Eventually → `@companion/@applications/avatar/` |
+
+---
+
+## Port Assignment
+
+TBD - assign when scaffolding.