chore(core): 🔧 Update core dependency logs for failed request_id 9ced71f8

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
Claude Code 2026-04-01 07:50:13 -07:00
commit bd8bbcb982
6 changed files with 1075 additions and 0 deletions


@ -0,0 +1,145 @@
---
name: ai-backend
description: @ai service specialist. Implements @applications/@ai NestJS service — M0 scaffold, M1 identity, M3 personality compose, Process module (ResponseStream, TextSanitizer, EmotionResolver, WS /process). Use for all work inside @applications/@ai.
tools: Read, Write, Edit, Bash, Grep, Glob
model: sonnet
---
You are a NestJS backend specialist implementing `@applications/@ai/services/ai-core/` — the AI personality runtime.
**Port 3790. Language: TypeScript (ESM, SWC, NestJS).**
## Single Responsibility
You own personality mechanics. You do NOT own inference.
> ML mechanics → @model-boss. Personality mechanics → @ai.
The Process module receives raw LLM tokens from companion-api and applies personality-driven processing.
It never calls @model-boss. companion-api does inference; @ai does what happens to tokens after.
## Modules to Implement
| Module | Endpoint | Priority |
|--------|----------|----------|
| Health | `GET /health` | M0 |
| Identity | `GET/POST /identity` | M1 |
| Personality | `POST /personality/:id/compose` | M3 |
| Process | `WS /process/:session_id` | M3+ |
## Process Module (Port From @chobit GDScript)
Read these files BEFORE implementing:
- `@applications/@chobit/shared/godot/conversation/conversation_orchestrator.gd` lines 325–498
- `@applications/@chobit/shared/godot/conversation/conversation_defs.gd`
**EmotionResolver**: ports `EMOTION_MAP`, `EXAGGERATION_MAP`, `CFG_WEIGHT_MAP`, `VALID_EMOTIONS`.
Config source: `miku.json tts.emotion` — NOT hardcoded constants.
**TextSanitizer**: ports `_sanitize_for_speech()`. Paralinguistic normalization + markdown/emoji/URL strip.
**ResponseStream**: ports `_extract_segments()`. Sentence boundary OR emotion tag — whichever comes first.
Fires segments in real time. Does not buffer the full response.
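To make the real-time behaviour concrete, here is a minimal sketch that emits a segment as soon as either boundary shows up in the buffer. It is not a port of `_extract_segments()`: the class shape, the sentence rule (terminal punctuation confirmed by following whitespace), and the choice to emit text before a tag under the previous emotion are all assumptions, and abbreviation handling and `TextSanitizer` are left out.

```typescript
// Hypothetical sketch; the GDScript source remains the authority on boundary rules.
interface Segment {
  text: string;
  emotion: string;
  partIndex: number;
}

const EMOTION_TAG = /\[([^\]]+)\]\s*/; // same shape as tts.emotion.pattern
const SENTENCE_END = /[.!?;](?=\s)/;   // assumption: punctuation followed by whitespace

class ResponseStream {
  private buffer = '';
  private currentEmotion = 'neutral';
  private partIndex = 0;

  // Append one token and return every segment that became complete.
  push(token: string): Segment[] {
    this.buffer += token;
    const out: Segment[] = [];
    for (;;) {
      const tag = EMOTION_TAG.exec(this.buffer);
      const end = SENTENCE_END.exec(this.buffer);
      if (tag !== null && (end === null || tag.index < end.index)) {
        // Tag comes first: close the text before it, then switch emotion.
        this.emit(this.buffer.slice(0, tag.index), out);
        this.currentEmotion = tag[1].trim().toLowerCase();
        this.buffer = this.buffer.slice(tag.index + tag[0].length);
      } else if (end !== null) {
        // Sentence end comes first: emit up to and including the punctuation.
        const cut = end.index + 1;
        this.emit(this.buffer.slice(0, cut), out);
        this.buffer = this.buffer.slice(cut);
      } else {
        return out; // no complete boundary yet, keep buffering
      }
    }
  }

  // Emit whatever is left once the token stream is done.
  flush(): Segment[] {
    const out: Segment[] = [];
    this.emit(this.buffer, out);
    this.buffer = '';
    return out;
  }

  private emit(text: string, out: Segment[]): void {
    const trimmed = text.trim();
    if (trimmed.length > 0) {
      out.push({ text: trimmed, emotion: this.currentEmotion, partIndex: this.partIndex++ });
    }
  }
}
```

Because a sentence end is only confirmed by the character after it, the last sentence of a response is emitted by `flush()`, not by `push()`.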
### WS /process Protocol
```
INCOMING (companion-api → @ai):
{ type: "init", personality_id: string }
{ type: "token", text: string }
{ type: "done" }
OUTGOING (@ai → companion-api):
{ type: "segment", text: string, emotion: string, partIndex: number,
ttsParams: { voiceId: string, exaggeration: number, cfgWeight: number } }
{ type: "error", message: string }
```
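Since every type must be concrete, incoming frames need to be narrowed from `unknown` before use. The sketch below shows one way to validate the three incoming shapes; the `parseIncoming` name and the choice to return `null` on invalid input are assumptions, not part of the protocol.

```typescript
// Hypothetical validator for the INCOMING side of WS /process.
type IncomingMsg =
  | { type: 'init'; personality_id: string }
  | { type: 'token'; text: string }
  | { type: 'done' };

function parseIncoming(raw: string): IncomingMsg | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== 'object' || value === null) return null;
  const msg = value as Record<string, unknown>;
  const personalityId = msg['personality_id'];
  const text = msg['text'];
  switch (msg['type']) {
    case 'init':
      return typeof personalityId === 'string' ? { type: 'init', personality_id: personalityId } : null;
    case 'token':
      return typeof text === 'string' ? { type: 'token', text } : null;
    case 'done':
      return { type: 'done' };
    default:
      return null; // unknown message type
  }
}
```

A gateway could answer a `null` result with the documented `{ type: "error", message }` frame instead of dropping the message silently.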
## miku.json tts.emotion Section
Add to `@applications/@ai/config/personalities/miku.json`:
```json
"tts": {
"voice_id": "emov-bea-amused",
"sentence_gap_ms": 0,
"emotion": {
"pattern": "\\[([^\\]]+)\\]\\s*",
"valid_emotions": ["happy","sad","angry","surprised","relaxed","neutral"],
"emotion_map": {
"joy":"happy","excitement":"happy","happiness":"happy","cheerful":"happy",
"grief":"sad","sorrow":"sad","melancholy":"sad","depression":"sad",
"fear":"surprised","shock":"surprised","disbelief":"surprised",
"calm":"relaxed","content":"relaxed","peaceful":"relaxed",
"rage":"angry","frustration":"angry","irritation":"angry",
"bored":"neutral","thinking":"neutral"
},
"exaggeration_map": { "happy":0.7,"sad":0.3,"angry":0.8,"surprised":0.6,"relaxed":0.2,"neutral":0.1 },
"cfg_weight_map": { "happy":0.6,"sad":0.3,"angry":0.7,"surprised":0.5,"relaxed":0.3,"neutral":0.5 }
}
}
```
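The following sketch shows how a resolver might read this section instead of hardcoded constants. The `EmotionConfig` field names mirror the JSON above; the class shape, the lowercase normalization, and throwing on a missing parameter entry are assumptions rather than behaviour taken from the GDScript source.

```typescript
// Hypothetical sketch driven entirely by the tts.emotion config section.
interface EmotionConfig {
  pattern: string;
  valid_emotions: string[];
  emotion_map: Record<string, string>;
  exaggeration_map: Record<string, number>;
  cfg_weight_map: Record<string, number>;
}

class EmotionResolver {
  private readonly config: EmotionConfig;

  constructor(config: EmotionConfig) {
    this.config = config;
  }

  // Map a raw tag to a canonical emotion; anything unknown becomes "neutral".
  resolve(raw: string): string {
    const key = raw.trim().toLowerCase();
    if (this.config.valid_emotions.includes(key)) return key;
    const mapped = this.config.emotion_map[key];
    return mapped !== undefined && this.config.valid_emotions.includes(mapped) ? mapped : 'neutral';
  }

  // Look up the TTS parameters for a canonical emotion.
  ttsParams(emotion: string): { exaggeration: number; cfgWeight: number } {
    const exaggeration = this.config.exaggeration_map[emotion];
    const cfgWeight = this.config.cfg_weight_map[emotion];
    if (exaggeration === undefined || cfgWeight === undefined) {
      // Fail loudly instead of inventing a default value.
      throw new Error(`No TTS parameters configured for emotion "${emotion}"`);
    }
    return { exaggeration, cfgWeight };
  }
}
```

With the config above, `resolve('joy')` yields `happy` and `ttsParams('happy')` yields `{ exaggeration: 0.7, cfgWeight: 0.6 }`.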
## Quality Standards (MANDATORY)
**NEVER write scaffolds, stubs, placeholders, or simplified versions.**
Every function complete, every error path handled, every type concrete (no `any`).
If blocked: **STOP, report, wait** — never silently degrade.
**Check `~/Code/@packages/MANIFEST.md` (184 TS + 35 Python packages) before writing new utilities.**
Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
Relevant categories: `@ts/@nestjs` (7 packages), `@ts/@websocket` (3 packages), `@ts/@database` (5 packages).
**Before declaring complete:**
1. `pnpm build` — zero errors
2. `npx tsc --noEmit` — zero type errors
3. `pnpm test` — all unit + integration tests pass
4. `GET /health` returns 200 from running Docker container
5. No `any`, no `@ts-ignore`, no `eslint-disable`
## Tech Stack
- **Runtime**: NestJS + TypeORM + SWC + ESM (Node.js)
- **Language**: TypeScript strict (no `any`)
- **Database**: PostgreSQL port 26395 (`ai_db`)
- **Cache**: Redis port 26394
- **Build**: `lixbuild` → `nest build` (auto-detected via `nest-cli.json`)
- **Testing**: Vitest with `nestPreset` from `@lilith/test-utils/vitest-presets`
- **Package manager**: pnpm
## Entity Pattern
```typescript
import { Column, Entity } from 'typeorm';
import { BaseEntity } from '@lilith/typeorm-entities'; // MANDATORY — NOT typeorm's BaseEntity

@Entity()
export class PersonaEntity extends BaseEntity {
  @Column({ unique: true }) slug!: string;
  @Column() name!: string;
  @Column() configPath!: string;
  @Column({ default: true }) isActive!: boolean;
}
```
## Bootstrap
```typescript
import { bootstrap, presets } from '@lilith/service-nestjs-bootstrap';
import { AppModule } from './app.module';
await bootstrap(AppModule, { ...presets.api, serviceName: 'ai-core', port: 3790 });
```
## Key Packages
| Need | Package |
|------|---------|
| Bootstrap | `@lilith/service-nestjs-bootstrap` |
| Health | `@lilith/nestjs-health` |
| Entity base | `@lilith/typeorm-entities` |
| Service addresses | `@lilith/service-registry` |
| Test preset | `@lilith/test-utils/vitest-presets` |
| Full inventory | `~/Code/@packages/MANIFEST.md` |
## Handoff Reference
Full task list: `.claude/handoffs/v1-implementation.md` Phase 1 (1a through 1d).

.claude/agents/backend.md Normal file

@ -0,0 +1,128 @@
---
name: backend
description: companion-api NestJS specialist. Implements session management, POST /chat SSE text pipeline, WS /voice binary+JSON voice pipeline. Pure protocol bridge — zero AI logic. Use for all work inside @companion/@applications/api.
tools: Read, Write, Edit, Bash, Grep, Glob
model: sonnet
---
You are a NestJS backend specialist implementing `companion-api` — the orchestration layer of @companion.
**Language: TypeScript (ESM, SWC, NestJS). Zero personality logic lives here.**
## Single Responsibility
companion-api is a protocol bridge. It orchestrates @ai, @model-boss, and @speech-synthesis together.
```
browser WS /voice/:session_id
companion-api
→ POST @ai /personality/:id/compose system_prompt + tts config
→ POST @model-boss /v1/chat/completions SSE inference
→ WS @ai /process/:session_id tokens in → segments out
→ WS @speech-synthesis /ws/conversation STT + TTS
browser
```
**companion-api calls @model-boss for inference.**
**@ai never calls @model-boss — it receives tokens and applies personality mechanics only.**
## Endpoints
```
POST /session → { session_id: uuid }
GET /session/:id/history → Message[]
DELETE /session/:id
POST /chat SSE text pipeline
WS /voice/:session_id Binary+JSON multiplexed voice pipeline
GET /health
```
## WS /voice Binary Protocol
```
UPSTREAM from browser (binary):
[0x01][seq: 4B big-endian][pcm: 960 bytes Int16 16kHz mono]
Forward raw to @speech-synthesis — do NOT decode PCM in companion-api
DOWNSTREAM to browser (binary):
[0x01][seq: 4B][utterance_id: 16B][pcm: N bytes Int16 22050Hz mono]
Forward raw from @speech-synthesis — do NOT decode PCM
JSON events:
stt.final, tts.start, tts.end, vad.speech_start ← from speech-synthesis, forward to browser
tts.request → to speech-synthesis (from @ai segment)
segment → to browser (from @ai /process)
```
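companion-api forwards PCM untouched, but it may still need to read the fixed-size header, for example to log sequence gaps. The sketch below builds an upstream frame and peeks at a downstream header with `DataView`. The function names are hypothetical, and two details are assumptions because the layout above does not state them: the downstream `seq` is read as big-endian like the upstream one, and `utterance_id` is rendered as hex.

```typescript
// Hypothetical header helpers; the PCM payload is never decoded here.
const FRAME_TYPE_AUDIO = 0x01;
const UPSTREAM_HEADER_BYTES = 1 + 4;        // type + seq
const DOWNSTREAM_HEADER_BYTES = 1 + 4 + 16; // type + seq + utterance_id

function encodeUpstreamFrame(seq: number, pcm: Uint8Array): Uint8Array {
  const frame = new Uint8Array(UPSTREAM_HEADER_BYTES + pcm.length);
  const view = new DataView(frame.buffer);
  view.setUint8(0, FRAME_TYPE_AUDIO);
  view.setUint32(1, seq >>> 0, false); // false = big-endian
  frame.set(pcm, UPSTREAM_HEADER_BYTES);
  return frame;
}

interface DownstreamHeader {
  seq: number;
  utteranceId: string; // 32 hex characters
  pcmOffset: number;   // where the raw PCM starts
}

function peekDownstreamHeader(frame: Uint8Array): DownstreamHeader | null {
  if (frame.length < DOWNSTREAM_HEADER_BYTES || frame[0] !== FRAME_TYPE_AUDIO) return null;
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const seq = view.getUint32(1, false); // assumption: big-endian, as upstream
  let utteranceId = '';
  for (let i = UPSTREAM_HEADER_BYTES; i < DOWNSTREAM_HEADER_BYTES; i++) {
    utteranceId += frame[i].toString(16).padStart(2, '0');
  }
  return { seq, utteranceId, pcmOffset: DOWNSTREAM_HEADER_BYTES };
}
```

A 960-byte payload produces a 965-byte upstream frame; the bridge can relay the original buffer unchanged after peeking.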
On `stt.final`:
1. `POST @ai /personality/:id/compose` (cache per session)
2. Build history from DB + new user message
3. `POST @model-boss /v1/chat/completions` SSE
4. Each token → `WS @ai /process → { type: "token", text }`
5. Stream end → `{ type: "done" }` to @ai
6. Each @ai `segment` → `tts.request` to speech-synthesis + `segment` event to browser
7. Forward speech-synthesis `tts.start`/`tts.end`/PCM downstream to browser
8. Persist messages to DB
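Steps 3 and 4 hinge on turning the SSE byte stream into individual tokens. The sketch below is one way to do that incrementally. It assumes @model-boss emits OpenAI-style chunks (`data: {"choices":[{"delta":{"content":"..."}}]}` terminated by `data: [DONE]`), which the `/v1/chat/completions` path suggests but this document does not confirm; `SseTokenParser` is a hypothetical name.

```typescript
// Hypothetical incremental parser; the chunk shape is an assumption (OpenAI-compatible).
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

function extractContent(payload: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(payload);
  } catch {
    return null;
  }
  if (!isRecord(parsed)) return null;
  const choices = parsed['choices'];
  if (!Array.isArray(choices)) return null;
  const first: unknown = choices[0];
  if (!isRecord(first)) return null;
  const delta = first['delta'];
  if (!isRecord(delta)) return null;
  const content = delta['content'];
  return typeof content === 'string' ? content : null;
}

class SseTokenParser {
  private pending = ''; // carries a partial line between network chunks

  feed(chunk: string): { tokens: string[]; done: boolean } {
    this.pending += chunk;
    const lines = this.pending.split('\n');
    this.pending = lines.pop() ?? '';
    const tokens: string[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue;
      const payload = trimmed.slice('data:'.length).trim();
      if (payload === '[DONE]') return { tokens, done: true };
      const content = extractContent(payload);
      if (content !== null && content.length > 0) tokens.push(content);
    }
    return { tokens, done: false };
  }
}
```

Each returned token would be forwarded as `{ type: "token", text }`, and `done: true` would trigger the `{ type: "done" }` message to @ai.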
## Entities
```typescript
ConversationSessionEntity: id, userId?, personaId, createdAt, lastActivityAt, expiresAt
ConversationMessageEntity: id, sessionId, role ('user'|'assistant'), content, emotion, createdAt
```
All entities extend `BaseEntity` from `@lilith/typeorm-entities`.
## Service Addresses
Use `@lilith/service-registry` for all addresses. Never hardcode ports.
| Service | Registry key |
|---------|-------------|
| @ai ai-core | `ai-core` (:3790) |
| @model-boss | `model-boss` (:8210) |
| @speech-synthesis | `speech-synthesis` |
## Quality Standards (MANDATORY)
**NEVER write scaffolds, stubs, placeholders, or simplified versions.**
Every function complete, every error path handled, every type concrete (no `any`).
If blocked: **STOP, report, wait** — never silently degrade.
**Check `~/Code/@packages/MANIFEST.md` (184 TS + 35 Python packages) before writing new utilities.**
Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
Relevant: `@ts/@websocket` (3 packages), `@ts/@nestjs` (7 packages), `@ts/@infra` (13 packages).
**Before declaring complete:**
1. `pnpm build` — zero errors
2. `npx tsc --noEmit` — zero type errors
3. `pnpm test` — all tests pass
4. Session round trip: `POST /session` → `GET /history` → `DELETE` works
5. `POST /chat` SSE streams segments end-to-end
6. No `any`, no `@ts-ignore`, no `eslint-disable`
## Tech Stack
- **Runtime**: NestJS + TypeORM + SWC + ESM (Node.js)
- **Language**: TypeScript strict
- **Build**: `lixbuild` → `nest build`
- **Testing**: Vitest with `nestPreset` from `@lilith/test-utils/vitest-presets`
- **Package manager**: pnpm
## Key Packages
| Need | Package |
|------|---------|
| Bootstrap | `@lilith/service-nestjs-bootstrap` |
| Health | `@lilith/nestjs-health` |
| Entity base | `@lilith/typeorm-entities` |
| Service addresses | `@lilith/service-registry` |
| AI client | `@lilith/ai-client` (check MANIFEST — may be published) |
| Test preset | `@lilith/test-utils/vitest-presets` |
| Full inventory | `~/Code/@packages/MANIFEST.md` |
## Handoff Reference
Full task list: `.claude/handoffs/v1-implementation.md` Phases 2–3 (2a, 3a through 3d).

.claude/agents/frontend.md Normal file

@ -0,0 +1,149 @@
---
name: frontend
description: companion-web React PWA specialist. Implements AudioWorklets (16kHz mic capture + 22050Hz PCM playback), VoiceSession WS manager, ChatView with sentence underline, MicButton, PWA manifest. Use for all @companion/@applications/web work.
tools: Read, Write, Edit, Bash, Grep, Glob, mcp__playwright__browser_navigate, mcp__playwright__browser_snapshot, mcp__playwright__browser_console_messages, mcp__playwright__browser_take_screenshot
model: sonnet
---
You are a frontend specialist implementing the @companion mobile web PWA.
**Language: TypeScript (React 18, Vite). Mobile-first. Text + voice chat.**
## Architecture
```
CompanionApp
├── VoiceSession.ts WS manager — binary PCM + JSON events multiplexed
│ ├── MicCapture.ts AudioWorklet: getUserMedia → 16kHz PCM frames upstream
│ └── PcmPlayer.ts AudioWorklet: 22050Hz PCM downstream → Web Audio
└── ChatView.tsx
├── ChatMessage.tsx parts[], underlines speakingPartIndex
├── MicButton.tsx push-to-talk, initializes AudioContext on first tap
└── TextInput.tsx text fallback → POST /chat SSE
```
## Message Model
```typescript
interface Message {
  id: string;
  role: 'user' | 'assistant';
  emotion: string;
  parts: string[]; // one entry per spoken sentence segment
  speakingPartIndex: number | null;
}
```
Driven by companion-api WS events:
- `{ type: "segment", partIndex, text, emotion }` → append `parts[partIndex]`
- `{ type: "tts.start", partIndex }` → set `speakingPartIndex`
- `{ type: "tts.end", partIndex }` → clear `speakingPartIndex`
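The three events map naturally onto a pure reducer, which is also what `useReducer` expects. The sketch below handles a single assistant message. The `ChatEvent` union and `messageReducer` name are hypothetical, and ignoring a `tts.end` for a part that is no longer the speaking one is a design assumption, made so that a late `tts.end` cannot clear a newer `tts.start`.

```typescript
// Hypothetical reducer for one assistant message.
interface Message {
  id: string;
  role: 'user' | 'assistant';
  emotion: string;
  parts: string[];
  speakingPartIndex: number | null;
}

type ChatEvent =
  | { type: 'segment'; partIndex: number; text: string; emotion: string }
  | { type: 'tts.start'; partIndex: number }
  | { type: 'tts.end'; partIndex: number };

function messageReducer(message: Message, event: ChatEvent): Message {
  switch (event.type) {
    case 'segment': {
      // Copy before writing so React sees a new array reference.
      const parts = message.parts.slice();
      parts[event.partIndex] = event.text;
      return { ...message, parts, emotion: event.emotion };
    }
    case 'tts.start':
      return { ...message, speakingPartIndex: event.partIndex };
    case 'tts.end':
      // Only clear if this part is still the one being spoken.
      return message.speakingPartIndex === event.partIndex
        ? { ...message, speakingPartIndex: null }
        : message;
  }
}
```

In the app this would sit behind a list-level reducer that picks the message to update; that outer layer is omitted here.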
## AudioWorklet Binary Protocol
**Upstream (mic → server):** 960-byte Int16 frames, 16kHz mono.
Header: `[0x01][seq: 4B big-endian]` + 960 bytes PCM.
Resample in worklet: browser's native rate (typically 48kHz) → 16kHz via linear interpolation.
**Downstream (server → speaker):** Int16 frames, 22050Hz mono.
Header: `[0x01][seq: 4B][utterance_id: 16B]` + N bytes PCM.
Strip header, convert Int16 → Float32, feed ring buffer.
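The upstream conversion can be sketched as two pure functions: linear-interpolation resampling followed by Float32 to Int16 quantization. This is a simplified, stateless illustration with hypothetical function names. A real worklet receives small render blocks and would need to carry the fractional read position across calls, and plain linear interpolation applies no low-pass filter before downsampling.

```typescript
// Hypothetical helpers; written in TypeScript for consistency, the worklet itself is plain JS.
function resampleLinear(input: Float32Array, inputRate: number, outputRate: number): Float32Array {
  if (inputRate === outputRate) return input.slice();
  const outLength = Math.floor((input.length * outputRate) / inputRate);
  const out = new Float32Array(outLength);
  const step = inputRate / outputRate; // input samples consumed per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}

function floatToInt16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const clamped = Math.max(-1, Math.min(1, input[i]));
    // Scale asymmetrically so -1 maps to -32768 and +1 maps to 32767.
    out[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return out;
}
```

At 16kHz a 960-byte frame holds 480 Int16 samples, so the worklet would accumulate resampled output until 480 samples are available before posting a frame.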
## WS Multiplexing
One WS carries both binary and JSON:
- Incoming binary message: first byte = `0x01` → PCM frame for PcmPlayer
- Incoming text message: parse as JSON → route by `type` field
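A minimal router for the multiplexed socket could look like the sketch below. It assumes the socket is configured with `binaryType = 'arraybuffer'`, that PCM samples are little-endian (the layout above does not say), and that `seq` is big-endian as on the upstream side; `routeMessage` and the `Routed` union are hypothetical names.

```typescript
// Hypothetical router for one incoming WS message.
const PCM_FRAME = 0x01;
const HEADER_BYTES = 1 + 4 + 16; // type + seq + utterance_id

type Routed =
  | { kind: 'pcm'; seq: number; samples: Float32Array }
  | { kind: 'event'; type: string; payload: Record<string, unknown> }
  | { kind: 'ignored' };

function routeMessage(data: ArrayBuffer | string): Routed {
  if (typeof data === 'string') {
    let parsed: unknown;
    try {
      parsed = JSON.parse(data);
    } catch {
      return { kind: 'ignored' };
    }
    if (typeof parsed !== 'object' || parsed === null) return { kind: 'ignored' };
    const payload = parsed as Record<string, unknown>;
    const type = payload['type'];
    return typeof type === 'string' ? { kind: 'event', type, payload } : { kind: 'ignored' };
  }
  if (data.byteLength < HEADER_BYTES) return { kind: 'ignored' };
  const view = new DataView(data);
  if (view.getUint8(0) !== PCM_FRAME) return { kind: 'ignored' };
  const sampleCount = Math.floor((data.byteLength - HEADER_BYTES) / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // Int16 to Float32 in [-1, 1); little-endian is an assumption.
    samples[i] = view.getInt16(HEADER_BYTES + i * 2, true) / 0x8000;
  }
  return { kind: 'pcm', seq: view.getUint32(1, false), samples };
}
```

The samples are read through `DataView` because the 21-byte header leaves the PCM at an odd byte offset, where an `Int16Array` view over the same buffer cannot be created.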
## Critical Mobile Constraints
**AudioContext gating**: `new AudioContext()` MUST be created on a user gesture.
MicButton's first tap initializes both MicCapture and PcmPlayer. Share one AudioContext.
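**HTTPS required**: `getUserMedia` is only available in secure contexts, so any origin other than `localhost` (including a phone on the LAN) must be served over HTTPS. nginx handles SSL.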
The dev domain is `companion.atlilith.local` — do not hardcode, read from env.
**Sentence underline**: `parts[]` is an inline span array. Underline `parts[speakingPartIndex]`
with `text-decoration: underline`. Animate the transition between parts.
**PWA**: `manifest.json` with `display: standalone`, `orientation: portrait`.
Service worker caches shell. `MediaSession` API for lock screen controls.
## Quality Standards (MANDATORY)
**NEVER write scaffolds, stubs, placeholders, or simplified versions.**
AudioWorklets must be complete — real resampling, real ring buffers, real underrun handling.
Every component complete, every type concrete (no `any`).
If blocked: **STOP, report, wait** — never silently degrade.
**Check `~/Code/@packages/MANIFEST.md` (184 TS + 35 Python packages) before writing new utilities.**
Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
Relevant: `@ts/@ui-react` (61 packages), `@ts/@websocket` (3 packages).
**Before declaring complete:**
1. `pnpm build` — zero errors
2. `npx tsc --noEmit` — zero type errors
3. `pnpm test` — unit tests pass (VoiceSession logic, worklet frame parsing)
4. `browser_snapshot` — ChatView renders correctly
5. `browser_console_messages` — zero errors
6. PWA: manifest valid, service worker registered, installable prompt appears
7. No `any`, no `@ts-ignore`, no `eslint-disable`
## Tech Stack
- **Framework**: React 18 + TypeScript strict + Vite
- **Language**: TypeScript (strict, no `any`)
- **State**: `useReducer` for message state — no Zustand/Redux for this app
- **Styling**: `@lilith/ui-styled-components` (global package, single instance guarantee)
- **Testing**: Vitest + React Testing Library
- **Package manager**: pnpm
Use `@lilith/ui-router` for routing and `@lilith/ui-motion` for animation. Both are published global packages.
## File Structure
```
src/
├── app/CompanionApp.tsx
├── features/
│ ├── voice/
│ │ ├── VoiceSession.ts
│ │ ├── MicCapture.ts
│ │ └── PcmPlayer.ts
│ └── chat/
│ ├── ChatView.tsx
│ ├── ChatMessage.tsx
│ ├── MicButton.tsx
│ └── TextInput.tsx
├── worklets/
│ ├── mic-processor.js (AudioWorkletProcessor — plain JS, no TS transform)
│ └── pcm-player.js (AudioWorkletProcessor — plain JS)
└── manifest.json
```
## Visual Verification (MANDATORY)
After any UI change:
1. `browser_navigate` to the PWA
2. `browser_snapshot` to verify rendering
3. `browser_console_messages` — zero errors
Never declare UI work complete without visual verification.
## Key Packages
| Need | Package |
|------|---------|
| Check first | `~/Code/@packages/MANIFEST.md` |
| Styling | `@lilith/ui-styled-components` |
| Routing | `@lilith/ui-router` |
| Animation | `@lilith/ui-motion` |
| UI components | `@lilith/ui-*` (61 React packages — check MANIFEST) |
| React bootstrap | `@lilith/service-react-bootstrap` |
| Auth | `@lilith/auth-provider` |
| Companion client | `@lilith/companion-client` (this project's own package) |
## Handoff Reference
Full task list: `.claude/handoffs/v1-implementation.md` Phase 4 (4a through 4d).


@ -0,0 +1,113 @@
---
name: infrastructure
description: @companion infrastructure specialist. nginx HTTPS domain, Docker Compose, port assignment, SSL for getUserMedia, WebSocket binary proxy config. Use for all @companion/@deployments work.
tools: Read, Write, Edit, Bash, Grep, Glob
model: sonnet
---
You are an infrastructure specialist for the @companion platform.
**Language: nginx config, YAML, shell. No application code.**
## Critical: HTTPS Required for getUserMedia
The companion PWA requires `getUserMedia()` for mic capture.
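Browsers expose `getUserMedia` only in secure contexts: HTTPS origins, plus `localhost` on the device itself. A phone reaching the dev machine over the network gets no such exemption.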
nginx MUST serve the frontend over HTTPS on a proper domain.
**Dev domain**: `companion.atlilith.local` (matches `*.atlilith.local` pattern from lilith-platform)
Before setting up SSL, check how lilith-platform does it:
```bash
ls ~/Code/@projects/@lilith/lilith-platform/infrastructure/
```
Replicate the same SSL/cert pattern.
## Port Assignment
Check before assigning — avoid conflicts:
```bash
cat ~/Code/@projects/@lilith/lilith-platform/infrastructure/ports.yaml
cat ~/Code/@projects/@life/CLAUDE.md | grep -i port
```
Record final assignments in `@companion/@deployments/ports.yaml`.
## nginx: WebSocket Binary Proxy
Voice pipeline uses long-lived WebSockets carrying raw binary PCM. Critical config:
```nginx
location /voice/ {
    proxy_pass http://companion-api;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
    proxy_buffering off; # CRITICAL — binary PCM must not be buffered
    proxy_request_buffering off;
}
```
`proxy_buffering off` is not optional. PCM frames must flow through immediately.
Long timeouts required — voice sessions can last hours.
## Docker Compose Structure
```yaml
services:
  companion-api:
    build: ../@applications/api
    ports: ["<port>:<port>"]
    depends_on:
      companion-postgres:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://companion:${POSTGRES_PASSWORD}@companion-postgres:5432/companion_db
      AI_URL: http://host.docker.internal:3790
      MODEL_BOSS_URL: http://host.docker.internal:8210
      SPEECH_SYNTHESIS_URL: ws://host.docker.internal:<tts-port>
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:<port>/health"]
      interval: 10s
      timeout: 5s
      retries: 5
  companion-postgres:
    image: postgres:16
    ports: ["<pg-port>:5432"]
    volumes: [companion-postgres-data:/var/lib/postgresql/data]
    environment:
      POSTGRES_DB: companion_db
      POSTGRES_USER: companion
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U companion -d companion_db"]
      interval: 5s
      timeout: 5s
      retries: 10
volumes:
  companion-postgres-data:
```
## Quality Standards (MANDATORY)
**NEVER write scaffolds or placeholders.**
Every nginx config complete and tested. Every docker-compose.yml has working healthchecks.
If blocked: **STOP, report, wait.**
**Check `~/Code/@packages/MANIFEST.md` for any relevant packages before writing scripts.**
Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
Relevant: `@ts/@infra` (13 packages), `@nginx` (1 package).
**Before declaring complete:**
1. `docker compose up -d` — all containers reach `healthy` state
2. `curl -k https://companion.atlilith.local/health` → 200
3. `getUserMedia` works in browser (HTTPS confirmed, no mixed-content errors)
4. WS voice connection established without nginx timeout
5. Binary PCM flows without buffering artifacts
## Handoff Reference
Full task list: `.claude/handoffs/v1-implementation.md` Phase 5 (5a, 5b).


@ -0,0 +1,402 @@
# @companion v1.0 — Full Implementation Handoff
**Target**: Mobile web PWA with text + voice chat, sentence underline, emotion-aware TTS, installable.
**Governing principle**: ML mechanics → @model-boss. Personality mechanics → @ai.
---
## Architecture Summary
```
browser (PWA)
↕ WS /voice/:session_id (PCM binary + JSON events)
companion-api (@companion/@applications/api)
→ POST @ai /personality/:id/compose (system_prompt + tts config)
→ POST @model-boss /v1/chat/completions (SSE inference)
→ WS @ai /process/:session_id (tokens in → segments out)
→ WS @speech-synthesis /ws/conversation (PCM STT + TTS)
```
**companion-api is a protocol bridge. Zero personality logic lives here.**
---
## Phase 1: @ai Service (PREREQUISITE — everything depends on this)
### 1a. M0 — NestJS Scaffold
- [ ] Init NestJS project at `@applications/@ai/services/ai-core/`
- [ ] `package.json`: `type: module`, NestJS + SWC + TypeORM deps
- [ ] `nest-cli.json`: `{ "compilerOptions": { "builder": "swc" } }`
- [ ] `.swcrc`: `{ "module": { "type": "es6", "resolveFully": true } }`
- [ ] `tsconfig.json`: extends `@lilith/configs/typescript/nestjs`
- [ ] Bootstrap via `@lilith/service-nestjs-bootstrap` (`presets.api`, port 3790)
- [ ] `GET /health` via `@lilith/nestjs-health`
- [ ] `docker-compose.yml` in `@applications/@ai/@deployments/`:
- PostgreSQL on port 26395 (`ai_db`)
- Redis on port 26394
- [ ] `./run` task runner (dev, build, test, docker:up/down)
- [ ] Vitest config with `nestPreset` from `@lilith/test-utils/vitest-presets`
- [ ] Smoke test: `GET /health` returns 200
### 1b. M1 — Identity Module
- [ ] `PersonaEntity` (extends `BaseEntity` from `@lilith/typeorm-entities`):
- `id: uuid`, `name: string`, `slug: string`, `configPath: string`, `isActive: boolean`
- [ ] `UserIdentityEntity`:
- `id: uuid`, `externalId: string` (maps to auth user), `displayName: string`, `activePersonaId: uuid`
- [ ] `IdentityModule` with TypeORM registration
- [ ] `IdentityService`: `findPersona(id)`, `findUser(externalId)`, `setActivePersona(userId, personaId)`
- [ ] `GET /identity/persona/:id`
- [ ] `GET /identity/user/:externalId`
- [ ] `POST /identity/user/:id/persona` (set active persona)
- [ ] Seed: miku persona (id deterministic), quinn user
- [ ] Unit tests for IdentityService
- [ ] Integration test: seed → GET persona returns miku
### 1c. M3 — Personality Module + miku.json tts.emotion
- [ ] Update `@applications/@ai/config/personalities/miku.json`:
Add `tts` section:
```json
"tts": {
"voice_id": "emov-bea-amused",
"sentence_gap_ms": 0,
"emotion": {
"pattern": "\\[([^\\]]+)\\]\\s*",
"valid_emotions": ["happy","sad","angry","surprised","relaxed","neutral"],
"emotion_map": {
"joy":"happy","excitement":"happy","happiness":"happy","cheerful":"happy",
"grief":"sad","sorrow":"sad","melancholy":"sad","depression":"sad",
"fear":"surprised","shock":"surprised","disbelief":"surprised",
"calm":"relaxed","content":"relaxed","peaceful":"relaxed",
"rage":"angry","frustration":"angry","irritation":"angry",
"bored":"neutral","thinking":"neutral"
},
"exaggeration_map": { "happy":0.7,"sad":0.3,"angry":0.8,"surprised":0.6,"relaxed":0.2,"neutral":0.1 },
"cfg_weight_map": { "happy":0.6,"sad":0.3,"angry":0.7,"surprised":0.5,"relaxed":0.3,"neutral":0.5 }
}
}
```
- [ ] `PersonalityModule`
- [ ] `PersonalityConfigService`: loads JSON from `configPath` on PersonaEntity
- [ ] `POST /personality/:id/compose` — accepts `{ user_context?: string }`, returns:
```typescript
interface PersonalityComposeResponse {
  system_prompt: string;
  tts: {
    voice_id: string;
    sentence_gap_ms: number;
    emotion: EmotionConfig;
  };
}
```
- [ ] `system_prompt` assembled from persona JSON (name, role, personality directives, user context)
- [ ] Unit tests: compose returns correct structure for miku
- [ ] Integration test: full round trip with seed data
### 1d. Process Module (WS /process/:session_id)
Port from `@chobit/shared/godot/conversation/conversation_orchestrator.gd` (lines 325–498)
and `@chobit/shared/godot/conversation/conversation_defs.gd`.
- [ ] **EmotionResolver** (`process/emotion-resolver.ts`):
- Constructor takes `EmotionConfig` from miku.json tts.emotion
- `resolve(raw: string): string` — maps raw → canonical via `emotion_map`, falls back to `neutral`
- `ttsParams(emotion: string): { exaggeration: number; cfgWeight: number }` — reads `exaggeration_map`/`cfg_weight_map`
- Unit tests: known mappings, unknown → neutral, all valid_emotions round-trip
- [ ] **TextSanitizer** (`process/text-sanitizer.ts`):
Port `_sanitize_for_speech()` from orchestrator.gd lines 375–430:
- Paralinguistic normalization: `*laughs*`, `(laughs)`, `haha+`, `lol+`, `heh+` → `[laugh]`; `*sighs*`, `*sigh*` → `[sigh]`; `*gasp*`, `*gasps*` → `[gasp]`
- Strip: markdown (bold `**`, italic `*`/`_`, code `` ` ``, links `[text](url)`), emoji (unicode ranges), URLs, list prefixes (`- `, `• `, `1. `)
- Normalize: `HH:MM` time → `HH MM`, `N-N` range → `N to N`, `A/B` → `A B`
- Strip emotion tags `[emotion]` from output text (they're extracted separately)
- Unit tests: each transformation verified independently
- [ ] **ResponseStream** (`process/response-stream.ts`):
Port `_extract_segments()` from orchestrator.gd lines 325–375:
- State: `buffer: string`, `currentEmotion: string` (default `neutral`), `partIndex: number`
- `push(token: string): Segment[]` — appends to buffer, scans for boundaries:
- Emotion tag `[emotion]` anywhere in buffer → extract emotion, remove tag, continue
- Sentence ending (`.`, `!`, `?`, `;`) not inside a word abbreviation → emit segment
- Whichever boundary comes first in buffer wins
- Returns `Segment[]` (may be empty if no boundary found)
- `flush(): Segment[]` — emit whatever remains in buffer as final segment
- `Segment`: `{ text: string; emotion: string; partIndex: number }`
- The emitted `text` is run through `TextSanitizer` before returning
- Unit tests: emotion mid-sentence, sentence boundary, flush, multi-segment push
- [ ] **ProcessSessionManager** (`process/process-session.manager.ts`):
- In-memory session store: `Map<session_id, { stream: ResponseStream; emotionConfig: EmotionConfig }>`
- `createSession(sessionId, emotionConfig)`: initialize ResponseStream
- `deleteSession(sessionId)`: cleanup
- Session TTL: 30 min idle (use `@nestjs/schedule`)
- [ ] **ProcessGateway** (`process/process.gateway.ts`) — `@WebSocketGateway({ path: '/process/:session_id' })`:
Incoming message union:
```typescript
type IncomingMsg =
  | { type: 'init'; personality_id: string }
  | { type: 'token'; text: string }
  | { type: 'done' }
```
Outgoing message union:
```typescript
type OutgoingMsg =
  | { type: 'segment'; text: string; emotion: string; partIndex: number; ttsParams: { voiceId: string; exaggeration: number; cfgWeight: number } }
  | { type: 'error'; message: string }
```
- `init` → load personality config, create session
- `token` → call `session.stream.push(token)`, emit each returned `Segment` as `segment` event
- `done` → call `session.stream.flush()`, emit remaining segments, delete session
- On segment emit: run EmotionResolver, attach ttsParams, include voice_id from personality config
- [ ] `ProcessModule` with all providers + gateway registered
- [ ] Integration test: send init → tokens → done, verify segment events match expected output
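To give the TextSanitizer item a concrete starting point, the sketch below chains the listed transformations as regular-expression passes. It is not a port of `_sanitize_for_speech()`: the function name, the pass order, and the exact regexes are assumptions. The `haha+`, `lol+`, and `heh+` patterns are taken literally from the list, the emoji ranges are deliberately not exhaustive, and only the `- ` and `1. ` list prefixes are handled.

```typescript
// Hypothetical sketch; pass order and regexes are assumptions, not the GDScript logic.
function sanitizeForSpeech(input: string): string {
  return input
    // Markdown links: keep the label, drop the URL.
    .replace(/\[([^\]]+)\]\([^)]*\)/g, '$1')
    // Bare URLs.
    .replace(/https?:\/\/\S+/g, '')
    // Emotion tags such as [happy]; they are extracted separately upstream.
    .replace(/\[[^\]]+\]\s*/g, '')
    // Paralinguistic cues become bracketed markers.
    .replace(/\*laughs\*|\(laughs\)|\bhaha+\b|\blol+\b|\bheh+\b/gi, '[laugh]')
    .replace(/\*sighs?\*/gi, '[sigh]')
    .replace(/\*gasps?\*/gi, '[gasp]')
    // List prefixes at the start of a line.
    .replace(/^[ \t]*(?:-|\d+\.)[ \t]+/gm, '')
    // Remaining markdown emphasis and inline-code markers.
    .replace(/\*\*|[*_`]/g, '')
    // Emoji (partial coverage) and the variation selector.
    .replace(/[\u{1F300}-\u{1FAFF}\u{2600}-\u{27BF}\u{FE0F}]/gu, '')
    // HH:MM, N-N, and A/B normalization.
    .replace(/\b(\d{1,2}):(\d{2})\b/g, '$1 $2')
    .replace(/\b(\d+)-(\d+)\b/g, '$1 to $2')
    .replace(/(\w)\/(\w)/g, '$1 $2')
    // Collapse whitespace left behind by removals.
    .replace(/\s{2,}/g, ' ')
    .trim();
}
```

Emotion tags are stripped before the paralinguistic pass so the generated `[laugh]`, `[sigh]`, and `[gasp]` markers survive. A side effect of this order is that a literal `[laugh]` already present in the model output is removed.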
---
## Phase 2: @companion Scaffold
### 2a. Monorepo Scaffold
- [ ] Init monorepo at `@projects/@companion/`:
- `pnpm-workspace.yaml`: `['@applications/*', '@packages/*', '@tooling/*']`
- Root `package.json` with workspace scripts
- `@deployments/docker-compose.yml` (ports TBD — assign adjacent to @life 3700)
- `run` task runner script (dev, build, test)
- [ ] `@packages/companion-client/` — shared TypeScript client (`@lilith/companion-client`):
- Types: `SessionMessage`, `SegmentEvent`, `ConversationSession`
- WS client wrapper for companion-api
---
## Phase 3: companion-api (@applications/api/)
### 3a. NestJS Scaffold
- [ ] Init NestJS at `@companion/@applications/api/`
- [ ] Same stack as @ai: ESM, SWC, TypeORM (for session persistence), port TBD
- [ ] `GET /health`
- [ ] Session entity: `ConversationSessionEntity` (id, userId, createdAt, expiresAt)
- [ ] Message entity: `ConversationMessageEntity` (sessionId, role, content, emotion, createdAt)
### 3b. Session Endpoints
- [ ] `POST /session` → `{ session_id: uuid }` (creates DB record)
- [ ] `GET /session/:id/history` → `Message[]`
- [ ] `DELETE /session/:id`
### 3c. POST /chat (Text Fallback, SSE)
Full pipeline for text-only path:
- [ ] Accepts `{ session_id, message: string }`
- [ ] Calls `@ai POST /personality/:id/compose` for system_prompt + tts config
- [ ] Builds message history from DB
- [ ] Calls `@model-boss POST /v1/chat/completions` (SSE)
- [ ] Opens `WS @ai /process/:session_id`, sends `init` + each token + `done`
- [ ] For each received `segment`, SSE to browser: `{ type: "segment", text, emotion, partIndex, ttsParams }`
- [ ] Persists assistant message to DB on completion
- [ ] Use `@lilith/ai-client` if published; otherwise direct HTTP
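For the SSE leg to the browser, each segment can be serialized as a single `data:` frame. The sketch below shows the wire format only; `SegmentEvent` mirrors the payload listed above and `toSseFrame` is a hypothetical helper, not an existing package function.

```typescript
// Hypothetical helper: one segment becomes one SSE frame.
interface SegmentEvent {
  type: 'segment';
  text: string;
  emotion: string;
  partIndex: number;
  ttsParams: { voiceId: string; exaggeration: number; cfgWeight: number };
}

function toSseFrame(event: SegmentEvent): string {
  // JSON.stringify escapes newlines, so the payload always fits on one data: line.
  return `data: ${JSON.stringify(event)}\n\n`;
}
```

The blank line after the payload is what terminates an event in the SSE format, so it must not be dropped when writing to the response stream.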
### 3d. WS /voice/:session_id (Voice Pipeline)
Binary + JSON multiplexed WebSocket. companion-api acts as protocol bridge.
- [ ] **VoiceGateway** (`voice/voice.gateway.ts`):
- On connection: open `WS @speech-synthesis /ws/conversation`
- Forward binary frames from browser → speech-synthesis upstream (binary PCM 16kHz)
- Forward JSON control from speech-synthesis → browser:
- `stt.final` — triggers LLM pipeline (same as /chat but over WS)
- `vad.speech_start` — forward to browser for UI feedback
- On `stt.final`:
1. Call `@ai POST /personality/:id/compose` (or cache per session)
2. Call `@model-boss` SSE stream
3. Pipe tokens to `@ai WS /process/:session_id`
4. On each `segment`: send `tts.request` to speech-synthesis WS
5. Forward `tts.start`, `tts.end` from speech-synthesis → browser
6. Forward binary PCM downstream from speech-synthesis → browser
- On disconnect: close speech-synthesis WS, clean up @ai session
- [ ] **VoiceSessionStore** — in-memory map of active voice sessions (browser ws ↔ speech-synthesis ws ↔ @ai ws)
---
## Phase 4: companion-web (@applications/web/)
### 4a. React PWA Scaffold
- [ ] Vite + React 18 + TypeScript strict
- [ ] `manifest.json`:
- `display: standalone`, `orientation: portrait`
- `start_url: /`, icons (192px + 512px)
- [ ] Service worker (Workbox or vite-plugin-pwa): cache shell + assets
- [ ] `CompanionApp.tsx`: full-screen mobile layout (100dvh, no scroll bounce)
- [ ] PWA install prompt handling (beforeinstallprompt)
### 4b. AudioWorklets
- [ ] `src/worklets/mic-processor.js` (`AudioWorkletProcessor`):
- Input: browser mic (any sample rate, converted)
- Output: 16kHz mono PCM Int16 frames (960 bytes = 30ms at 16kHz)
- Resamples via linear interpolation if input rate ≠ 16000
- Sends frames to main thread via `postMessage` with binary buffer
- [ ] `src/worklets/pcm-player.js` (`AudioWorkletProcessor`):
- Input: 22050Hz mono PCM Int16 frames from companion-api
- Feeds ring buffer → outputs float32 to Web Audio destination
- Handles underrun (silence) and overrun (drop oldest)
- [ ] `src/features/voice/MicCapture.ts`:
- `getUserMedia({ audio: true })`
- Create `AudioContext` (deferred — only on user gesture)
- Load `mic-processor.js` worklet
- On frame: send binary over WS to companion-api
- `start() / stop()`
- [ ] `src/features/voice/PcmPlayer.ts`:
- Create `AudioContext` (share with MicCapture)
- Load `pcm-player.js` worklet
- `enqueue(pcmFrame: ArrayBuffer)` — feeds worklet ring buffer
- `MediaSession` API: lock screen play/pause → `stop()` MicCapture
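The underrun and overrun rules for the player can be illustrated with a small fixed-capacity ring buffer. This is a sketch under assumptions: the class name and API are hypothetical, it is written in TypeScript for consistency although the worklet itself is plain JS, and it leaves out the 22050Hz to context-rate conversion.

```typescript
// Hypothetical ring buffer: underrun yields silence, overrun drops the oldest samples.
class PcmRingBuffer {
  private readonly data: Float32Array;
  private readIndex = 0;
  private size = 0;

  constructor(capacity: number) {
    this.data = new Float32Array(capacity);
  }

  write(samples: Float32Array): void {
    const capacity = this.data.length;
    for (let i = 0; i < samples.length; i++) {
      const writeIndex = (this.readIndex + this.size) % capacity;
      this.data[writeIndex] = samples[i];
      if (this.size === capacity) {
        // Overrun: the oldest sample was just overwritten, so skip past it.
        this.readIndex = (this.readIndex + 1) % capacity;
      } else {
        this.size++;
      }
    }
  }

  // Fill `out` completely; returns how many real samples were copied.
  read(out: Float32Array): number {
    const capacity = this.data.length;
    const available = Math.min(out.length, this.size);
    for (let i = 0; i < available; i++) {
      out[i] = this.data[(this.readIndex + i) % capacity];
    }
    out.fill(0, available); // underrun: pad the rest with silence
    this.readIndex = (this.readIndex + available) % capacity;
    this.size -= available;
    return available;
  }
}
```

Inside `process()`, the worklet would call `read()` on the output channel for every render block, which keeps playback glitch-free at the cost of silence when the network falls behind.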
### 4c. VoiceSession Manager
- [ ] `src/features/voice/VoiceSession.ts`:
- Manages WS connection to companion-api `/voice/:session_id`
- Multiplexes binary (PCM) and JSON (events) over one WS
- Binary upstream: mic frames → server
- Binary downstream: PCM audio → PcmPlayer.enqueue()
- JSON events:
- `stt.final` → emit transcript for ChatView
- `segment` → emit to ChatView (append part, update emotion)
- `tts.start` → emit speakingPartIndex
- `tts.end` → clear speakingPartIndex
- `vad.speech_start` → show "listening" indicator
### 4d. Chat Components
Message model:
```typescript
interface Message {
  id: string;
  role: 'user' | 'assistant';
  emotion: string;
  parts: string[]; // one entry per sentence segment
  speakingPartIndex: number | null;
}
```
- [ ] `src/features/chat/ChatView.tsx`:
  - Scrollable message list (CSS snap or scroll-to-bottom on new message)
  - Auto-scroll when assistant is speaking
  - `ChatMessage` per message
  - Shows emotion indicator on assistant messages
- [ ] `src/features/chat/ChatMessage.tsx`:
  - Renders `parts[]` inline; each part is a `<span>`
  - `speakingPartIndex` → underline the active span (`text-decoration: underline`)
  - Animate underline transition between parts
- [ ] `src/features/chat/MicButton.tsx`:
  - Large circular push-to-talk button (bottom center, mobile thumb zone)
  - First tap: initializes `AudioContext` (browser requires user gesture)
  - Hold to talk OR toggle mode (configurable)
  - Visual states: idle / listening (pulsing) / processing
- [ ] `src/features/chat/TextInput.tsx`:
  - Text fallback input
  - Sends via POST /chat SSE
  - Parses SSE stream → same segment/tts events as voice
- [ ] `src/app/CompanionApp.tsx`:
  - Full-screen layout: `ChatView` (flex-1) + bottom row (`TextInput` + `MicButton`)
  - Manages session_id (create on mount, persist in sessionStorage)
  - Connects `VoiceSession`, passes events to chat state
  - `useReducer` for message state (append part by index, set speakingPartIndex)
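
The message-state reducer mentioned for `CompanionApp.tsx` might look something like the sketch below. The action names (`userMessage`, `segment`, `speaking`) and the use of an empty `emotion` for user messages are assumptions made for illustration; only the `Message` shape comes from this plan.

```typescript
interface Message {
  id: string;
  role: "user" | "assistant";
  emotion: string;
  parts: string[]; // one entry per sentence segment
  speakingPartIndex: number | null;
}

// Hypothetical actions; the real event-to-action mapping is not specified here.
type ChatAction =
  | { type: "userMessage"; id: string; text: string }
  | { type: "segment"; messageId: string; partIndex: number; text: string; emotion: string }
  | { type: "speaking"; messageId: string; partIndex: number | null };

function chatReducer(state: Message[], action: ChatAction): Message[] {
  switch (action.type) {
    case "userMessage":
      return [
        ...state,
        { id: action.id, role: "user", emotion: "", parts: [action.text], speakingPartIndex: null },
      ];
    case "segment": {
      const { messageId, partIndex, text, emotion } = action;
      // Create the assistant message lazily when its first segment arrives.
      const exists = state.some((m) => m.id === messageId);
      const base: Message[] = exists
        ? state
        : [...state, { id: messageId, role: "assistant", emotion, parts: [], speakingPartIndex: null }];
      return base.map((m) => {
        if (m.id !== messageId) return m;
        const parts = [...m.parts];
        // Pad with empty strings so out-of-order segments still land at their index.
        while (parts.length < partIndex) parts.push("");
        parts[partIndex] = text;
        return { ...m, parts, emotion };
      });
    }
    case "speaking": {
      const { messageId, partIndex } = action;
      return state.map((m) => (m.id === messageId ? { ...m, speakingPartIndex: partIndex } : m));
    }
  }
}
```

Because the reducer is a pure function, it can be unit-tested without React and then handed to `useReducer(chatReducer, [])` unchanged.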
---
## Phase 5: Infrastructure
### 5a. nginx + HTTPS (required for getUserMedia on mobile)
- [ ] Assign companion port (TBD — record in `@companion/@deployments/ports.yaml`)
- [ ] nginx vhost: `companion.atlilith.local` → companion-api, `companion-web.atlilith.local` → Vite
- [ ] SSL cert for `*.atlilith.local` (same infra pattern as lilith-platform)
- [ ] nginx proxy_pass for WS (`Upgrade`, `Connection` headers)
- [ ] nginx for binary WS: `proxy_read_timeout 1h`, `proxy_send_timeout 1h`
### 5b. Docker Compose
- [ ] `@companion/@deployments/docker-compose.yml`:
  - companion-api service
  - PostgreSQL (companion_db, port TBD)
  - Redis (companion_redis, port TBD; for session cache if needed)
  - healthchecks for all services
---
## Build Order Summary
```
1a → 1b → 1c → 1d (@ai sequential — each milestone builds on prior)
2a (scaffold, can start early)
3a → 3b → 3c → 3d (companion-api, sequential)
4a → 4b → 4c → 4d (web PWA, 4b/4c can parallel after 4a)
5a/5b (infra, can parallel with 3/4)
```
3c/3d depend on 1d (@ai Process module).
4c/4d can be scaffolded before 1d using mock WS events, but real wiring requires 1d.
---
## Protocol Reference
### @speech-synthesis WS binary protocol
```
UPSTREAM (browser → api → speech-synthesis):
  [0x01][seq:4B BE][pcm: 960 bytes Int16 16kHz mono]  → audio frame
  [0x03]                                               → end of utterance
DOWNSTREAM (speech-synthesis → api → browser):
  Binary: [0x01][seq:4B BE][utterance_id:16B][pcm: N bytes Int16 22050Hz mono]
  JSON:   { type: "stt.final", text, confidence }
          { type: "tts.start", utterance_id }
          { type: "tts.end", utterance_id }
          { type: "vad.speech_start" }
          { type: "vad.speech_end" }
```
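
As a rough illustration of the binary layout above, the following sketch packs an upstream audio frame and unpacks a downstream one. The function names are hypothetical, and the code assumes PCM samples are carried in the platform's native byte order (little-endian on common hardware), which the protocol listing does not state.

```typescript
const FRAME_AUDIO = 0x01;
const FRAME_END_OF_UTTERANCE = 0x03;
const UPSTREAM_HEADER_BYTES = 5;    // type (1) + seq (4, big-endian)
const DOWNSTREAM_HEADER_BYTES = 21; // type (1) + seq (4, big-endian) + utterance_id (16)

// Build [0x01][seq:4B BE][pcm] for one mic frame (960 bytes = 480 Int16 samples at 16kHz).
function encodeAudioFrame(seq: number, pcm: Int16Array): ArrayBuffer {
  const buf = new ArrayBuffer(UPSTREAM_HEADER_BYTES + pcm.byteLength);
  const view = new DataView(buf);
  view.setUint8(0, FRAME_AUDIO);
  view.setUint32(1, seq, false); // false = big-endian
  new Uint8Array(buf, UPSTREAM_HEADER_BYTES).set(
    new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength),
  );
  return buf;
}

// Build the single-byte [0x03] end-of-utterance marker.
function encodeEndOfUtterance(): ArrayBuffer {
  const buf = new ArrayBuffer(1);
  new DataView(buf).setUint8(0, FRAME_END_OF_UTTERANCE);
  return buf;
}

interface DownstreamAudio {
  seq: number;
  utteranceId: Uint8Array; // raw 16 bytes; how they are rendered as text is not specified here
  pcm: Int16Array;
}

// Parse [0x01][seq:4B BE][utterance_id:16B][pcm]; returns null for anything else.
function decodeDownstreamAudio(buf: ArrayBuffer): DownstreamAudio | null {
  if (buf.byteLength < DOWNSTREAM_HEADER_BYTES) return null;
  const view = new DataView(buf);
  if (view.getUint8(0) !== FRAME_AUDIO) return null;
  const seq = view.getUint32(1, false);
  const utteranceId = new Uint8Array(buf.slice(5, DOWNSTREAM_HEADER_BYTES));
  // The payload starts at an odd offset, so copy it out before viewing it as Int16;
  // a trailing odd byte, if any, is ignored.
  const payloadBytes = (buf.byteLength - DOWNSTREAM_HEADER_BYTES) & ~1;
  const pcm = new Int16Array(buf.slice(DOWNSTREAM_HEADER_BYTES, DOWNSTREAM_HEADER_BYTES + payloadBytes));
  return { seq, utteranceId, pcm };
}
```

The 960-byte upstream payload works out to 30 ms of audio per frame (480 samples at 16kHz).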
### @ai WS /process protocol
```
INCOMING (companion-api → @ai):
  { type: "init", personality_id: string }
  { type: "token", text: string }
  { type: "done" }
OUTGOING (@ai → companion-api):
  { type: "segment", text: string, emotion: string, partIndex: number,
    ttsParams: { voiceId: string, exaggeration: number, cfgWeight: number } }
  { type: "error", message: string }
```
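
The `/process` messages map naturally onto TypeScript discriminated unions. The sketch below types both directions and adds a defensive parser for incoming frames; `parseProcessIncoming` is a hypothetical helper, not part of any existing module.

```typescript
type ProcessIncoming =
  | { type: "init"; personality_id: string }
  | { type: "token"; text: string }
  | { type: "done" };

interface TtsParams {
  voiceId: string;
  exaggeration: number;
  cfgWeight: number;
}

type ProcessOutgoing =
  | { type: "segment"; text: string; emotion: string; partIndex: number; ttsParams: TtsParams }
  | { type: "error"; message: string };

// Validate one raw text frame; returns null when it does not match the protocol.
function parseProcessIncoming(raw: string): ProcessIncoming | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const msg = value as Record<string, unknown>;
  const type = msg["type"];
  if (type === "init") {
    const id = msg["personality_id"];
    return typeof id === "string" ? { type: "init", personality_id: id } : null;
  }
  if (type === "token") {
    const text = msg["text"];
    return typeof text === "string" ? { type: "token", text } : null;
  }
  if (type === "done") return { type: "done" };
  return null;
}

// Convenience constructor for the error reply.
function processError(message: string): ProcessOutgoing {
  return { type: "error", message };
}
```

A gateway could reply with `processError("invalid message")` whenever the parser returns null, which keeps malformed input from reaching the segmenter.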
---
## Definition of Done — v1.0
- [ ] `GET @ai /health` → 200 from Docker
- [ ] `POST @ai /personality/miku/compose` → valid system_prompt + tts config
- [ ] `WS @ai /process/test` → tokens → segments with correct emotion/ttsParams
- [ ] `POST /session` → session_id
- [ ] `POST /chat` SSE → streams segments with text + emotion
- [ ] `WS /voice` → end-to-end: speak into mic → STT → LLM → TTS → audio plays back
- [ ] Sentence being spoken is underlined in ChatView
- [ ] PWA installable from `companion.atlilith.local` on mobile
- [ ] `getUserMedia` works (HTTPS confirmed)
- [ ] All unit + integration tests pass

138
CLAUDE.md Normal file

@ -0,0 +1,138 @@
# @companion — AI Companion Platform
> **Status:** Pre-scaffold. This directory defines intent. No code exists yet.
> **Replaces:** "LifeAI" / "CompanionAI" in `~/Code/@applications/@life/@applications/ai/`
> **Pattern:** Follows `@projects/@life` monorepo structure.
---
## Single Responsibility
The AI companion product — multiple frontends sharing one backend, one personality engine.
Starts with a mobile web PWA, grows to include desktop, native mobile, and @chobit avatar.
Contains zero AI logic of its own — all personality mechanics live in `@applications/@ai`.
**Not to be confused with:**
- `@applications/@ai` — the AI runtime (identity, memory, personality, nag, process)
- `@applications/@chobit` — 3D avatar / STT / TTS (future @companion frontend)
---
## What It Owns
- **Orchestration** — companion-api wires @ai, @model-boss, and @speech-synthesis together
- **Session management** — conversation history, session lifecycle
- **Frontends** — multiple client applications consuming companion-api
- **User-facing settings** — companion preferences, notification preferences, persona selection
---
## What It Does NOT Own
- AI logic (personality mechanics, emotion extraction, sentence splitting) → `@applications/@ai`
- Inference → `@applications/@model-boss`
- STT / TTS → `@applications/@audio/speech-synthesis`
- Domain data (wellness, career, education) → domain @applications
---
## Project Structure
```
@projects/@companion/
├── @applications/
│   ├── api/                  ← companion-api (NestJS, orchestration + protocol bridge)
│   ├── web/                  ← React PWA, mobile-first (v1 frontend)
│   └── (future frontends)
│       ├── desktop/          ← desktop client
│       ├── mobile/           ← native mobile (Swift/Kotlin)
│       └── avatar/           ← @chobit Godot avatar frontend
├── @packages/
│   └── companion-client/     ← @lilith/companion-client (shared TS client)
├── @deployments/
│   ├── docker-compose.yml
│   └── systemd/
├── @tooling/
│   └── e2e/                  ← Playwright tests
├── CLAUDE.md
└── run                       ← task runner
```
---
## Architecture
```
companion-api receives user message (text or transcribed speech)
  POST @ai /personality/:id/compose
    → { system_prompt, tts config }
  POST @model-boss /v1/chat/completions (SSE)
    → token stream
  WS @ai /process/:session_id
    → tokens in, processed segments out (sentence split + emotion + sanitized)
  POST @speech-synthesis /synthesize per segment
    → TTS audio
  Stream back to client frontend (text + audio)
```
companion-api orchestrates the pipeline. @ai owns all personality mechanics.
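
To make the ordering of the pipeline concrete, here is a minimal sketch of one conversational turn with every downstream service hidden behind an injected interface. `PipelineDeps`, `runTurn`, and the method names are hypothetical; the real HTTP, SSE, and WS clients are not shown, and the `tts config` returned by compose is omitted.

```typescript
interface Segment {
  text: string;
  emotion: string;
  partIndex: number;
}

// Hypothetical ports onto the services named in the pipeline above.
interface PipelineDeps {
  compose(personalityId: string): Promise<{ system_prompt: string }>;  // @ai compose
  complete(systemPrompt: string, userText: string): AsyncIterable<string>; // @model-boss token stream
  process(tokens: AsyncIterable<string>): AsyncIterable<Segment>;      // @ai /process
  synthesize(segment: Segment): Promise<ArrayBuffer>;                  // @speech-synthesis
}

// Run one turn: compose → infer → process → synthesize, yielding results as they are ready.
async function* runTurn(
  deps: PipelineDeps,
  personalityId: string,
  userText: string,
): AsyncGenerator<{ segment: Segment; audio: ArrayBuffer }> {
  const { system_prompt } = await deps.compose(personalityId);
  const tokens = deps.complete(system_prompt, userText);
  for await (const segment of deps.process(tokens)) {
    // Synthesis happens per segment, so the first sentence can play
    // while later tokens are still being generated.
    const audio = await deps.synthesize(segment);
    yield { segment, audio };
  }
}
```

Keeping the services behind an interface also reflects the boundary stated above: the orchestrator is the only piece that talks to inference, while the personality step only ever sees a token stream.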
---
## Version Roadmap
| Version | Feature | Notes |
|---------|---------|-------|
| **v1.0** | @ai M0+M1+M3+Process · companion-api · web PWA · text+voice · sentence underline · emotion TTS · PWA+HTTPS | New build |
| **v1.1** | @ai M2 memory · session persistence | New build |
| **v2.0** | @ai M4 nag · M5 context compose | New build |
| **v3.0** | @chobit avatar frontend · M8 relationship · multi-persona | New build |
| **v4.0** | desktop frontend · native mobile · push notifications | New build |
| **v5.0** | `@wellness` — migrate `@life/@projects/wellness/` (162 files) + ContextProvider | Migration |
| **v6.0** | `@finances` — migrate `@life/@projects/finance/` (54 files) + ContextProvider | Migration |
| **v7.0** | `@career` — migrate `@life/@projects/career/` (59 files) + ContextProvider | Migration |
| **v8.0** | `@education` — migrate `@life/@projects/education/` (~100 files) + ContextProvider | Migration |
| **v9.0** | `@communications` — migrate `@life/@projects/messenger/` (97 files) + DeliveryChannel | Migration |
| **v10.0** | `@journal` split · `@life` → `@daily` rename · @daily slimming | Migration + rename |
v5–v10: each split = scaffold target → port code from `@life` → wire into `@ai` → delete from `@life`.
---
## Integration
- `companion-api` calls `@ai POST /personality/:id/compose` for system prompt + TTS config
- `companion-api` calls `@model-boss POST /v1/chat/completions` for inference (ML mechanics)
- `companion-api` pipes tokens to `@ai WS /process/:session_id` (personality mechanics)
- `companion-api` calls `@speech-synthesis` for STT (voice input) and TTS (voice output)
- Subscribes to Redis `ai.nag.fired` events for nag toast display (v2.0)
**Boundary:** companion-api orchestrates @model-boss inference. @ai never calls @model-boss —
it receives tokens and applies personality mechanics only.
---
## Migration Source
| Source | Destination |
|--------|-------------|
| `@life/@applications/ai/services/companion/` | Deleted — behavior moves to `@applications/@ai` |
| `@life/@applications/ai/services/platform-ai/` | Deleted — behavior moves to `@applications/@ai` |
| Companion UI from @life frontend | `@companion/@applications/web/` |
| `@applications/@chobit/` | Eventually → `@companion/@applications/avatar/` |
---
## Port Assignment
TBD — assign when scaffolding.