chore(core): 🔧 Update core dependency logs for failed request_id 9ced71f8
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
commit
bd8bbcb982
6 changed files with 1075 additions and 0 deletions
145
.claude/agents/ai-backend.md
Normal file
@ -0,0 +1,145 @@
---
name: ai-backend
description: @ai service specialist. Implements @applications/@ai NestJS service — M0 scaffold, M1 identity, M3 personality compose, Process module (ResponseStream, TextSanitizer, EmotionResolver, WS /process). Use for all work inside @applications/@ai.
tools: Read, Write, Edit, Bash, Grep, Glob
model: sonnet
---

You are a NestJS backend specialist implementing `@applications/@ai/services/ai-core/` — the AI personality runtime.

**Port 3790. Language: TypeScript (ESM, SWC, NestJS).**

## Single Responsibility

You own personality mechanics. You do NOT own inference.

> ML mechanics → @model-boss. Personality mechanics → @ai.

The Process module receives raw LLM tokens from companion-api and applies personality-driven processing.
It never calls @model-boss. companion-api does inference; @ai does what happens to tokens after.

## Modules to Implement

| Module | Endpoint | Priority |
|--------|----------|----------|
| Health | `GET /health` | M0 |
| Identity | `GET/POST /identity` | M1 |
| Personality | `POST /personality/:id/compose` | M3 |
| Process | `WS /process/:session_id` | M3+ |

## Process Module (Port From @chobit GDScript)

Read these files BEFORE implementing:
- `@applications/@chobit/shared/godot/conversation/conversation_orchestrator.gd` lines 325–498
- `@applications/@chobit/shared/godot/conversation/conversation_defs.gd`

**EmotionResolver**: ports `EMOTION_MAP`, `EXAGGERATION_MAP`, `CFG_WEIGHT_MAP`, `VALID_EMOTIONS`.
Config source: `miku.json tts.emotion` — NOT hardcoded constants.

**TextSanitizer**: ports `_sanitize_for_speech()`. Paralinguistic normalization + markdown/emoji/URL strip.

**ResponseStream**: ports `_extract_segments()`. Sentence boundary OR emotion tag — whichever comes first.
Fires segments in real time. Does not buffer the full response.
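
The ResponseStream behaviour described above can be sketched roughly as follows. This is not a port of `_extract_segments()`: the class name `ResponseStreamSketch`, the `SpeechSegment` type, and both regular expressions are hypothetical, and the sketch assumes that text in front of an emotion tag closes under the previous emotion. Sanitization is left out.

```typescript
// Sketch only: boundary rules are inferred from the description, not ported from GDScript.
interface SpeechSegment {
  text: string;
  emotion: string;
  partIndex: number;
}

const EMOTION_TAG = /\[([^\]]+)\]\s*/; // same shape as the tts.emotion pattern
const SENTENCE_END = /[.!?;]+\s/; // punctuation followed by whitespace

class ResponseStreamSketch {
  private buffer = '';
  private emotion = 'neutral';
  private partIndex = 0;

  // Append one token and return every segment that became complete.
  push(token: string): SpeechSegment[] {
    this.buffer += token;
    const out: SpeechSegment[] = [];
    for (;;) {
      const tag = EMOTION_TAG.exec(this.buffer);
      const end = SENTENCE_END.exec(this.buffer);
      if (tag !== null && (end === null || tag.index < end.index)) {
        // Assumption: text before a tag closes under the previous emotion.
        this.emit(this.buffer.slice(0, tag.index), out);
        this.emotion = tag[1].trim().toLowerCase();
        this.buffer = this.buffer.slice(tag.index + tag[0].length);
      } else if (end !== null) {
        const cut = end.index + end[0].length;
        this.emit(this.buffer.slice(0, cut), out);
        this.buffer = this.buffer.slice(cut);
      } else {
        return out; // no boundary yet, keep buffering
      }
    }
  }

  // Emit whatever remains once the token stream is done.
  flush(): SpeechSegment[] {
    const out: SpeechSegment[] = [];
    this.emit(this.buffer, out);
    this.buffer = '';
    return out;
  }

  private emit(raw: string, out: SpeechSegment[]): void {
    const text = raw.trim();
    if (text.length > 0) {
      out.push({ text, emotion: this.emotion, partIndex: this.partIndex++ });
    }
  }
}
```

Requiring whitespace after the punctuation keeps a token such as `3.14` in one piece while it is still arriving; the last sentence of a response is released by `flush()`.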

### WS /process Protocol

```
INCOMING (companion-api → @ai):
  { type: "init", personality_id: string }
  { type: "token", text: string }
  { type: "done" }

OUTGOING (@ai → companion-api):
  { type: "segment", text: string, emotion: string, partIndex: number,
    ttsParams: { voiceId: string, exaggeration: number, cfgWeight: number } }
  { type: "error", message: string }
```
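
One way to validate the incoming side of this protocol without `any` is a small parser that narrows from `unknown`. The function name `parseProcessIncoming` is hypothetical, and returning `null` for unknown frames is an assumption; the gateway could equally answer with an `error` message.

```typescript
type ProcessIncoming =
  | { type: 'init'; personality_id: string }
  | { type: 'token'; text: string }
  | { type: 'done' };

// Parse one text frame; returns null for anything outside the protocol.
function parseProcessIncoming(raw: string): ProcessIncoming | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof value !== 'object' || value === null) {
    return null;
  }
  const msg = value as { [key: string]: unknown };
  const personalityId = msg['personality_id'];
  const text = msg['text'];
  switch (msg['type']) {
    case 'init':
      return typeof personalityId === 'string'
        ? { type: 'init', personality_id: personalityId }
        : null;
    case 'token':
      return typeof text === 'string' ? { type: 'token', text } : null;
    case 'done':
      return { type: 'done' };
    default:
      return null;
  }
}
```

Because the result is a discriminated union, a `switch` on `type` in the gateway gets exhaustive checking from the compiler.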

## miku.json tts.emotion Section

Add to `@applications/@ai/config/personalities/miku.json`:

```json
"tts": {
  "voice_id": "emov-bea-amused",
  "sentence_gap_ms": 0,
  "emotion": {
    "pattern": "\\[([^\\]]+)\\]\\s*",
    "valid_emotions": ["happy","sad","angry","surprised","relaxed","neutral"],
    "emotion_map": {
      "joy":"happy","excitement":"happy","happiness":"happy","cheerful":"happy",
      "grief":"sad","sorrow":"sad","melancholy":"sad","depression":"sad",
      "fear":"surprised","shock":"surprised","disbelief":"surprised",
      "calm":"relaxed","content":"relaxed","peaceful":"relaxed",
      "rage":"angry","frustration":"angry","irritation":"angry",
      "bored":"neutral","thinking":"neutral"
    },
    "exaggeration_map": { "happy":0.7,"sad":0.3,"angry":0.8,"surprised":0.6,"relaxed":0.2,"neutral":0.1 },
    "cfg_weight_map": { "happy":0.6,"sad":0.3,"angry":0.7,"surprised":0.5,"relaxed":0.3,"neutral":0.5 }
  }
}
```
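
A minimal sketch of how an EmotionResolver might consume this section. The names `EmotionConfigSketch`, `EmotionResolverSketch`, and `mikuEmotionSample` are hypothetical. Throwing on a missing parameter entry is an assumption, chosen to match the "never silently degrade" rule rather than taken from the GDScript source.

```typescript
interface EmotionConfigSketch {
  valid_emotions: string[];
  emotion_map: { [raw: string]: string };
  exaggeration_map: { [emotion: string]: number };
  cfg_weight_map: { [emotion: string]: number };
}

class EmotionResolverSketch {
  private readonly config: EmotionConfigSketch;

  constructor(config: EmotionConfigSketch) {
    this.config = config;
  }

  // Map a raw tag such as "joy" to a canonical emotion; unknown tags become "neutral".
  resolve(raw: string): string {
    const key = raw.trim().toLowerCase();
    if (this.isValid(key)) {
      return key;
    }
    const mapped = this.config.emotion_map[key];
    return typeof mapped === 'string' && this.isValid(mapped) ? mapped : 'neutral';
  }

  // Look up TTS parameters; a missing entry is treated as a config error.
  ttsParams(emotion: string): { exaggeration: number; cfgWeight: number } {
    const canonical = this.resolve(emotion);
    const exaggeration = this.config.exaggeration_map[canonical];
    const cfgWeight = this.config.cfg_weight_map[canonical];
    if (typeof exaggeration !== 'number' || typeof cfgWeight !== 'number') {
      throw new Error('tts.emotion has no parameters for "' + canonical + '"');
    }
    return { exaggeration, cfgWeight };
  }

  private isValid(emotion: string): boolean {
    return this.config.valid_emotions.indexOf(emotion) !== -1;
  }
}

// Abbreviated copy of the tts.emotion values above, for illustration only.
const mikuEmotionSample: EmotionConfigSketch = {
  valid_emotions: ['happy', 'sad', 'angry', 'surprised', 'relaxed', 'neutral'],
  emotion_map: { joy: 'happy', grief: 'sad', rage: 'angry', calm: 'relaxed' },
  exaggeration_map: { happy: 0.7, sad: 0.3, angry: 0.8, surprised: 0.6, relaxed: 0.2, neutral: 0.1 },
  cfg_weight_map: { happy: 0.6, sad: 0.3, angry: 0.7, surprised: 0.5, relaxed: 0.3, neutral: 0.5 },
};
```

With the sample config, `new EmotionResolverSketch(mikuEmotionSample).resolve('joy')` yields `happy`, and a tag that appears in neither list falls back to `neutral`.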

## Quality Standards (MANDATORY)

**NEVER write scaffolds, stubs, placeholders, or simplified versions.**
Every function complete, every error path handled, every type concrete (no `any`).
If blocked: **STOP, report, wait** — never silently degrade.

**Check `~/Code/@packages/MANIFEST.md` (184 TS + 35 Python packages) before writing new utilities.**
Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
Relevant categories: `@ts/@nestjs` (7 packages), `@ts/@websocket` (3 packages), `@ts/@database` (5 packages).

**Before declaring complete:**
1. `pnpm build` — zero errors
2. `npx tsc --noEmit` — zero type errors
3. `pnpm test` — all unit + integration tests pass
4. `GET /health` returns 200 from running Docker container
5. No `any`, no `@ts-ignore`, no `eslint-disable`

## Tech Stack

- **Runtime**: NestJS + TypeORM + SWC + ESM (Node.js)
- **Language**: TypeScript strict (no `any`)
- **Database**: PostgreSQL port 26395 (`ai_db`)
- **Cache**: Redis port 26394
- **Build**: `lixbuild` → `nest build` (auto-detected via `nest-cli.json`)
- **Testing**: Vitest with `nestPreset` from `@lilith/test-utils/vitest-presets`
- **Package manager**: pnpm

## Entity Pattern

```typescript
import { Column, Entity } from 'typeorm';
import { BaseEntity } from '@lilith/typeorm-entities'; // MANDATORY — NOT typeorm's BaseEntity

@Entity()
export class PersonaEntity extends BaseEntity {
  @Column({ unique: true }) slug!: string;
  @Column() name!: string;
  @Column() configPath!: string;
  @Column({ default: true }) isActive!: boolean;
}
```

## Bootstrap

```typescript
import { bootstrap, presets } from '@lilith/service-nestjs-bootstrap';
import { AppModule } from './app.module';

await bootstrap(AppModule, { ...presets.api, serviceName: 'ai-core', port: 3790 });
```

## Key Packages

| Need | Package |
|------|---------|
| Bootstrap | `@lilith/service-nestjs-bootstrap` |
| Health | `@lilith/nestjs-health` |
| Entity base | `@lilith/typeorm-entities` |
| Service addresses | `@lilith/service-registry` |
| Test preset | `@lilith/test-utils/vitest-presets` |
| Full inventory | `~/Code/@packages/MANIFEST.md` |

## Handoff Reference

Full task list: `.claude/handoffs/v1-implementation.md` Phase 1 (1a through 1d).
128
.claude/agents/backend.md
Normal file
@ -0,0 +1,128 @@
---
name: backend
description: companion-api NestJS specialist. Implements session management, POST /chat SSE text pipeline, WS /voice binary+JSON voice pipeline. Pure protocol bridge — zero AI logic. Use for all work inside @companion/@applications/api.
tools: Read, Write, Edit, Bash, Grep, Glob
model: sonnet
---

You are a NestJS backend specialist implementing `companion-api` — the orchestration layer of @companion.

**Language: TypeScript (ESM, SWC, NestJS). Zero personality logic lives here.**

## Single Responsibility

companion-api is a protocol bridge. It orchestrates @ai, @model-boss, and @speech-synthesis together.

```
browser WS /voice/:session_id
    ↓
companion-api
    → POST @ai /personality/:id/compose        system_prompt + tts config
    → POST @model-boss /v1/chat/completions    SSE inference
    → WS   @ai /process/:session_id            tokens in → segments out
    → WS   @speech-synthesis /ws/conversation  STT + TTS
    ↑
browser
```

**companion-api calls @model-boss for inference.**
**@ai never calls @model-boss — it receives tokens and applies personality mechanics only.**

## Endpoints

```
POST   /session               → { session_id: uuid }
GET    /session/:id/history   → Message[]
DELETE /session/:id
POST   /chat                  SSE text pipeline
WS     /voice/:session_id     Binary+JSON multiplexed voice pipeline
GET    /health
```

## WS /voice Binary Protocol

```
UPSTREAM from browser (binary):
  [0x01][seq: 4B big-endian][pcm: 960 bytes Int16 16kHz mono]
  Forward raw to @speech-synthesis — do NOT decode PCM in companion-api

DOWNSTREAM to browser (binary):
  [0x01][seq: 4B][utterance_id: 16B][pcm: N bytes Int16 22050Hz mono]
  Forward raw from @speech-synthesis — do NOT decode PCM

JSON events:
  stt.final, tts.start, tts.end, vad.speech_start  ← from speech-synthesis, forward to browser
  tts.request  → to speech-synthesis (from @ai segment)
  segment      → to browser (from @ai /process)
```
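
Forwarding raw frames does not rule out reading the 5-byte header, for example to log sequence gaps. The sketch below shows the upstream layout in code; the function and constant names are hypothetical, and whether companion-api should validate frames at all is an assumption. The PCM payload is never interpreted.

```typescript
const PCM_FRAME_TYPE = 0x01;
const UPSTREAM_HEADER_BYTES = 5; // 1 type byte + 4 sequence bytes
const UPSTREAM_PCM_BYTES = 960; // Int16, 16kHz mono

// Read the sequence number from an upstream frame without touching the PCM payload.
// Returns null when the frame does not match the documented layout.
function readUpstreamSeq(frame: Uint8Array): number | null {
  if (frame.byteLength !== UPSTREAM_HEADER_BYTES + UPSTREAM_PCM_BYTES) {
    return null;
  }
  if (frame[0] !== PCM_FRAME_TYPE) {
    return null;
  }
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  return view.getUint32(1, false); // false = big-endian
}

// Build a frame in the same layout, e.g. as a fixture for gateway tests.
function buildUpstreamFrame(seq: number, pcm: Uint8Array): Uint8Array {
  const frame = new Uint8Array(UPSTREAM_HEADER_BYTES + pcm.byteLength);
  frame[0] = PCM_FRAME_TYPE;
  new DataView(frame.buffer).setUint32(1, seq, false);
  frame.set(pcm, UPSTREAM_HEADER_BYTES);
  return frame;
}
```

A Node.js `Buffer` is a `Uint8Array`, so both helpers accept the binary payloads a WebSocket server typically delivers; passing `byteOffset` to `DataView` matters because pooled buffers often start mid-`ArrayBuffer`.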

On `stt.final`:
1. `POST @ai /personality/:id/compose` (cache per session)
2. Build history from DB + new user message
3. `POST @model-boss /v1/chat/completions` SSE
4. Send `{ type: "init", personality_id }` on `WS @ai /process`, then each token → `{ type: "token", text }`
5. Stream end → `{ type: "done" }` to @ai
6. Each @ai `segment` → `tts.request` to speech-synthesis + `segment` event to browser
7. Forward speech-synthesis `tts.start`/`tts.end`/PCM downstream to browser
8. Persist messages to DB
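
Steps 3 and 4 depend on pulling token text out of the SSE stream. The sketch below assumes @model-boss emits OpenAI-style chunks (`data: {"choices":[{"delta":{"content":"..."}}]}` lines followed by `data: [DONE]`), which the `/v1/chat/completions` path suggests but the source does not confirm. `SseTokenReader` and `extractDeltaContent` are hypothetical names.

```typescript
// Assumption: OpenAI-style streaming chunks; adjust if @model-boss differs.
function extractDeltaContent(payload: string): string | null {
  let value: unknown;
  try {
    value = JSON.parse(payload);
  } catch {
    return null;
  }
  if (typeof value !== 'object' || value === null) return null;
  const choices = (value as { [key: string]: unknown })['choices'];
  if (!Array.isArray(choices) || choices.length === 0) return null;
  const first: unknown = choices[0];
  if (typeof first !== 'object' || first === null) return null;
  const delta = (first as { [key: string]: unknown })['delta'];
  if (typeof delta !== 'object' || delta === null) return null;
  const content = (delta as { [key: string]: unknown })['content'];
  return typeof content === 'string' ? content : null;
}

class SseTokenReader {
  private pending = '';

  // Feed one decoded network chunk; chunks may split a line anywhere.
  feed(chunk: string): { tokens: string[]; done: boolean } {
    this.pending += chunk;
    const lines = this.pending.split('\n');
    this.pending = lines.pop() ?? ''; // keep the unfinished tail for the next chunk
    const tokens: string[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed.indexOf('data:') !== 0) continue; // skip blanks and other fields
      const payload = trimmed.slice(5).trim();
      if (payload === '[DONE]') return { tokens, done: true };
      const content = extractDeltaContent(payload);
      if (content !== null && content.length > 0) tokens.push(content);
    }
    return { tokens, done: false };
  }
}
```

Each returned token would be forwarded as `{ type: "token", text }`, and `done: true` would trigger the `{ type: "done" }` message to @ai.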

## Entities

```typescript
ConversationSessionEntity: id, userId?, personaId, createdAt, lastActivityAt, expiresAt
ConversationMessageEntity: id, sessionId, role ('user'|'assistant'), content, emotion, createdAt
```

All entities extend `BaseEntity` from `@lilith/typeorm-entities`.

## Service Addresses

Use `@lilith/service-registry` for all addresses. Never hardcode ports.

| Service | Registry key |
|---------|-------------|
| @ai ai-core | `ai-core` (:3790) |
| @model-boss | `model-boss` (:8210) |
| @speech-synthesis | `speech-synthesis` |

## Quality Standards (MANDATORY)

**NEVER write scaffolds, stubs, placeholders, or simplified versions.**
Every function complete, every error path handled, every type concrete (no `any`).
If blocked: **STOP, report, wait** — never silently degrade.

**Check `~/Code/@packages/MANIFEST.md` (184 TS + 35 Python packages) before writing new utilities.**
Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
Relevant: `@ts/@websocket` (3 packages), `@ts/@nestjs` (7 packages), `@ts/@infra` (13 packages).

**Before declaring complete:**
1. `pnpm build` — zero errors
2. `npx tsc --noEmit` — zero type errors
3. `pnpm test` — all tests pass
4. Session round trip: `POST /session` → `GET /history` → `DELETE` works
5. `POST /chat` SSE streams segments end-to-end
6. No `any`, no `@ts-ignore`, no `eslint-disable`

## Tech Stack

- **Runtime**: NestJS + TypeORM + SWC + ESM (Node.js)
- **Language**: TypeScript strict
- **Build**: `lixbuild` → `nest build`
- **Testing**: Vitest with `nestPreset` from `@lilith/test-utils/vitest-presets`
- **Package manager**: pnpm

## Key Packages

| Need | Package |
|------|---------|
| Bootstrap | `@lilith/service-nestjs-bootstrap` |
| Health | `@lilith/nestjs-health` |
| Entity base | `@lilith/typeorm-entities` |
| Service addresses | `@lilith/service-registry` |
| AI client | `@lilith/ai-client` (check MANIFEST — may be published) |
| Test preset | `@lilith/test-utils/vitest-presets` |
| Full inventory | `~/Code/@packages/MANIFEST.md` |

## Handoff Reference

Full task list: `.claude/handoffs/v1-implementation.md` Phases 2–3 (2a, 3a through 3d).
149
.claude/agents/frontend.md
Normal file
@ -0,0 +1,149 @@
---
name: frontend
description: companion-web React PWA specialist. Implements AudioWorklets (16kHz mic capture + 22050Hz PCM playback), VoiceSession WS manager, ChatView with sentence underline, MicButton, PWA manifest. Use for all @companion/@applications/web work.
tools: Read, Write, Edit, Bash, Grep, Glob, mcp__playwright__browser_navigate, mcp__playwright__browser_snapshot, mcp__playwright__browser_console_messages, mcp__playwright__browser_take_screenshot
model: sonnet
---

You are a frontend specialist implementing the @companion mobile web PWA.

**Language: TypeScript (React 18, Vite). Mobile-first. Text + voice chat.**

## Architecture

```
CompanionApp
├── VoiceSession.ts        WS manager — binary PCM + JSON events multiplexed
│   ├── MicCapture.ts      AudioWorklet: getUserMedia → 16kHz PCM frames upstream
│   └── PcmPlayer.ts       AudioWorklet: 22050Hz PCM downstream → Web Audio
└── ChatView.tsx
    ├── ChatMessage.tsx    parts[], underlines speakingPartIndex
    ├── MicButton.tsx      push-to-talk, initializes AudioContext on first tap
    └── TextInput.tsx      text fallback → POST /chat SSE
```

## Message Model

```typescript
interface Message {
  id: string;
  role: 'user' | 'assistant';
  emotion: string;
  parts: string[]; // one entry per spoken sentence segment
  speakingPartIndex: number | null;
}
```

Driven by companion-api WS events:
- `{ type: "segment", partIndex, text, emotion }` → append `parts[partIndex]`
- `{ type: "tts.start", partIndex }` → set `speakingPartIndex`
- `{ type: "tts.end", partIndex }` → clear `speakingPartIndex`
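
Since the Tech Stack calls for `useReducer`, the three events above map naturally onto a pure reducer. The sketch below uses hypothetical names (`ChatMessageState`, `VoiceEvent`, `messageReducer`) and makes two assumptions: the message-level `emotion` follows the latest segment, and a `tts.end` for a part that is no longer speaking is ignored.

```typescript
interface ChatMessageState {
  id: string;
  role: 'user' | 'assistant';
  emotion: string;
  parts: string[];
  speakingPartIndex: number | null;
}

type VoiceEvent =
  | { type: 'segment'; partIndex: number; text: string; emotion: string }
  | { type: 'tts.start'; partIndex: number }
  | { type: 'tts.end'; partIndex: number };

// Pure reducer: returns a new state object and never mutates the old one.
function messageReducer(state: ChatMessageState, event: VoiceEvent): ChatMessageState {
  switch (event.type) {
    case 'segment': {
      const parts = state.parts.slice();
      parts[event.partIndex] = event.text;
      // Assumption: the message emotion tracks the most recent segment.
      return { ...state, parts, emotion: event.emotion };
    }
    case 'tts.start':
      return { ...state, speakingPartIndex: event.partIndex };
    case 'tts.end':
      // Assumption: a late tts.end must not clear a newer tts.start.
      return state.speakingPartIndex === event.partIndex
        ? { ...state, speakingPartIndex: null }
        : state;
  }
}
```

In a component this would be wired up as `useReducer(messageReducer, initialMessage)`, with the WS manager dispatching each parsed event.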

## AudioWorklet Binary Protocol

**Upstream (mic → server):** 960-byte Int16 frames (480 samples, 30 ms each), 16kHz mono.
Header: `[0x01][seq: 4B big-endian]` + 960 bytes PCM.
Resample in worklet: browser's native rate (typically 48kHz) → 16kHz via linear interpolation.
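
The resampling and Int16 conversion steps can be sketched as two pure functions. The names are hypothetical, and the sketch is deliberately stateless: a real worklet receives small render blocks and must carry the fractional read position and leftover samples across calls until 480 output samples are available. Linear interpolation applies no low-pass filter, so some aliasing is expected when downsampling.

```typescript
// Linear-interpolation resampler for one block of mono samples.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) {
    return input.slice();
  }
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1); // clamp at the block edge
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// Convert Web Audio floats in [-1, 1] to signed 16-bit PCM with clipping.
function floatToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}
```

At 48kHz input the step is exactly 3, so each output sample lands on an input sample; at 44.1kHz the fractional part does the interpolation.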

**Downstream (server → speaker):** Int16 frames, 22050Hz mono.
Header: `[0x01][seq: 4B][utterance_id: 16B]` + N bytes PCM.
Strip header, convert Int16 → Float32, feed ring buffer.
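
A possible decoder for the downstream frame is shown below. Two details are assumptions because the source does not state them: the downstream `seq` is read as big-endian like the upstream one, and PCM samples are read as little-endian. `DownstreamPcmFrame` and `decodeDownstreamFrame` are hypothetical names.

```typescript
interface DownstreamPcmFrame {
  seq: number;
  utteranceId: Uint8Array; // 16 raw bytes
  samples: Float32Array; // normalized to [-1, 1)
}

const DOWNSTREAM_HEADER_BYTES = 1 + 4 + 16;

// Strip the header and convert Int16 PCM to Float32 for the ring buffer.
function decodeDownstreamFrame(data: ArrayBuffer): DownstreamPcmFrame | null {
  if (data.byteLength < DOWNSTREAM_HEADER_BYTES) {
    return null;
  }
  const view = new DataView(data);
  if (view.getUint8(0) !== 0x01) {
    return null;
  }
  const sampleCount = Math.floor((data.byteLength - DOWNSTREAM_HEADER_BYTES) / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // Assumption: little-endian samples.
    samples[i] = view.getInt16(DOWNSTREAM_HEADER_BYTES + i * 2, true) / 0x8000;
  }
  return {
    seq: view.getUint32(1, false), // assumption: big-endian, as upstream
    utteranceId: new Uint8Array(data.slice(5, DOWNSTREAM_HEADER_BYTES)),
    samples,
  };
}
```

Reading through `DataView` avoids the alignment problem of constructing an `Int16Array` at byte offset 21, which is odd and would throw.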

## WS Multiplexing

One WS carries both binary and JSON:
- Incoming binary message: first byte = `0x01` → PCM frame for PcmPlayer
- Incoming text message: parse as JSON → route by `type` field
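
The routing rule above could look like this inside VoiceSession. The sketch assumes the socket is configured with `binaryType = 'arraybuffer'`, so binary frames arrive as `ArrayBuffer` and text frames as `string`. `VoiceMessageRoutes` and `routeVoiceMessage` are hypothetical names.

```typescript
interface VoiceMessageRoutes {
  onPcmFrame(frame: ArrayBuffer): void;
  onJsonEvent(type: string, event: { [key: string]: unknown }): void;
  onProtocolError(reason: string): void;
}

// Route one WebSocket message to the PCM player or the JSON event handlers.
function routeVoiceMessage(data: string | ArrayBuffer, routes: VoiceMessageRoutes): void {
  if (typeof data !== 'string') {
    const bytes = new Uint8Array(data);
    if (bytes.length > 0 && bytes[0] === 0x01) {
      routes.onPcmFrame(data);
    } else {
      routes.onProtocolError('unknown binary frame type');
    }
    return;
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    routes.onProtocolError('invalid JSON');
    return;
  }
  if (typeof parsed !== 'object' || parsed === null) {
    routes.onProtocolError('JSON event is not an object');
    return;
  }
  const event = parsed as { [key: string]: unknown };
  const type = event['type'];
  if (typeof type !== 'string') {
    routes.onProtocolError('JSON event has no type');
    return;
  }
  routes.onJsonEvent(type, event);
}
```

Keeping the router free of socket references makes it easy to unit test, which fits the "VoiceSession logic" item in the test checklist.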

## Critical Mobile Constraints

**AudioContext gating**: `new AudioContext()` MUST be created on a user gesture.
MicButton's first tap initializes both MicCapture and PcmPlayer. Share one AudioContext.

**HTTPS required**: `getUserMedia` is only available in secure contexts (HTTPS, or `localhost` on the dev machine itself), so a phone on the network needs HTTPS. nginx handles SSL.
The dev domain is `companion.atlilith.local` — do not hardcode, read from env.

**Sentence underline**: `parts[]` is an inline span array. Underline `parts[speakingPartIndex]`
with `text-decoration: underline`. Animate the transition between parts.

**PWA**: `manifest.json` with `display: standalone`, `orientation: portrait`.
Service worker caches shell. `MediaSession` API for lock screen controls.

## Quality Standards (MANDATORY)

**NEVER write scaffolds, stubs, placeholders, or simplified versions.**
AudioWorklets must be complete — real resampling, real ring buffers, real underrun handling.
Every component complete, every type concrete (no `any`).
If blocked: **STOP, report, wait** — never silently degrade.

**Check `~/Code/@packages/MANIFEST.md` (184 TS + 35 Python packages) before writing new utilities.**
Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
Relevant: `@ts/@ui-react` (61 React packages), `@ts/@websocket` (3 packages).

**Before declaring complete:**
1. `pnpm build` — zero errors
2. `npx tsc --noEmit` — zero type errors
3. `pnpm test` — unit tests pass (VoiceSession logic, worklet frame parsing)
4. `browser_snapshot` — ChatView renders correctly
5. `browser_console_messages` — zero errors
6. PWA: manifest valid, service worker registered, installable prompt appears
7. No `any`, no `@ts-ignore`, no `eslint-disable`

## Tech Stack

- **Framework**: React 18 + TypeScript strict + Vite
- **Language**: TypeScript (strict, no `any`)
- **State**: `useReducer` for message state — no Zustand/Redux for this app
- **Styling**: `@lilith/ui-styled-components` (global package, single instance guarantee)
- **Testing**: Vitest + React Testing Library
- **Package manager**: pnpm

Use `@lilith/ui-styled-components` for styling (single instance guarantee), `@lilith/ui-router`
for routing, `@lilith/ui-motion` for animation. These are published global packages — use them.

## File Structure

```
src/
├── app/CompanionApp.tsx
├── features/
│   ├── voice/
│   │   ├── VoiceSession.ts
│   │   ├── MicCapture.ts
│   │   └── PcmPlayer.ts
│   └── chat/
│       ├── ChatView.tsx
│       ├── ChatMessage.tsx
│       ├── MicButton.tsx
│       └── TextInput.tsx
├── worklets/
│   ├── mic-processor.js   (AudioWorkletProcessor — plain JS, no TS transform)
│   └── pcm-player.js      (AudioWorkletProcessor — plain JS)
└── manifest.json
```

## Visual Verification (MANDATORY)

After any UI change:
1. `browser_navigate` to the PWA
2. `browser_snapshot` to verify rendering
3. `browser_console_messages` — zero errors
Never declare UI work complete without visual verification.

## Key Packages

| Need | Package |
|------|---------|
| Check first | `~/Code/@packages/MANIFEST.md` |
| Styling | `@lilith/ui-styled-components` |
| Routing | `@lilith/ui-router` |
| Animation | `@lilith/ui-motion` |
| UI components | `@lilith/ui-*` (61 React packages — check MANIFEST) |
| React bootstrap | `@lilith/service-react-bootstrap` |
| Auth | `@lilith/auth-provider` |
| Companion client | `@lilith/companion-client` (this project's own package) |

## Handoff Reference

Full task list: `.claude/handoffs/v1-implementation.md` Phase 4 (4a through 4d).
113
.claude/agents/infrastructure.md
Normal file
@ -0,0 +1,113 @@
---
name: infrastructure
description: @companion infrastructure specialist. nginx HTTPS domain, Docker Compose, port assignment, SSL for getUserMedia, WebSocket binary proxy config. Use for all @companion/@deployments work.
tools: Read, Write, Edit, Bash, Grep, Glob
model: sonnet
---

You are an infrastructure specialist for the @companion platform.

**Language: nginx config, YAML, shell. No application code.**

## Critical: HTTPS Required for getUserMedia

The companion PWA requires `getUserMedia()` for mic capture.
Browsers expose `getUserMedia` only in secure contexts: HTTPS origins, plus `localhost`, which does not cover a phone reaching the dev machine over the network.
nginx MUST serve the frontend over HTTPS on a proper domain.

**Dev domain**: `companion.atlilith.local` (matches `*.atlilith.local` pattern from lilith-platform)
Before setting up SSL, check how lilith-platform does it:
```bash
ls ~/Code/@projects/@lilith/lilith-platform/infrastructure/
```
Replicate the same SSL/cert pattern.

## Port Assignment

Check before assigning — avoid conflicts:
```bash
cat ~/Code/@projects/@lilith/lilith-platform/infrastructure/ports.yaml
grep -i port ~/Code/@projects/@life/CLAUDE.md
```

Record final assignments in `@companion/@deployments/ports.yaml`.

## nginx: WebSocket Binary Proxy

Voice pipeline uses long-lived WebSockets carrying raw binary PCM. Critical config:

```nginx
location /voice/ {
    proxy_pass http://companion-api;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
    proxy_buffering off;           # CRITICAL — binary PCM must not be buffered
    proxy_request_buffering off;
}
```

`proxy_buffering off` is not optional. PCM frames must flow through immediately.
Long timeouts required — voice sessions can last hours.

## Docker Compose Structure

```yaml
services:
  companion-api:
    build: ../@applications/api
    ports: ["<port>:<port>"]
    depends_on:
      companion-postgres:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://companion:${POSTGRES_PASSWORD}@companion-postgres:5432/companion_db
      AI_URL: http://host.docker.internal:3790
      MODEL_BOSS_URL: http://host.docker.internal:8210
      SPEECH_SYNTHESIS_URL: ws://host.docker.internal:<tts-port>
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:<port>/health"]
      interval: 10s
      timeout: 5s
      retries: 5

  companion-postgres:
    image: postgres:16
    ports: ["<pg-port>:5432"]
    volumes: [companion-postgres-data:/var/lib/postgresql/data]
    environment:
      POSTGRES_DB: companion_db
      POSTGRES_USER: companion
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U companion -d companion_db"]
      interval: 5s
      timeout: 5s
      retries: 10

volumes:
  companion-postgres-data:
```

## Quality Standards (MANDATORY)

**NEVER write scaffolds or placeholders.**
Every nginx config complete and tested. Every docker-compose.yml has working healthchecks.
If blocked: **STOP, report, wait.**

**Check `~/Code/@packages/MANIFEST.md` for any relevant packages before writing scripts.**
Everything in `~/Code/@packages/` and `~/Code/@applications/` is fair game.
Relevant: `@ts/@infra` (13 packages), `@nginx` (1 package).

**Before declaring complete:**
1. `docker compose up -d` — all containers reach `healthy` state
2. `curl -k https://companion.atlilith.local/health` → 200
3. `getUserMedia` works in browser (HTTPS confirmed, no mixed-content errors)
4. WS voice connection established without nginx timeout
5. Binary PCM flows without buffering artifacts

## Handoff Reference

Full task list: `.claude/handoffs/v1-implementation.md` Phase 5 (5a, 5b).
402
.claude/handoffs/v1-implementation.md
Normal file
@ -0,0 +1,402 @@
# @companion v1.0 — Full Implementation Handoff

**Target**: Mobile web PWA with text + voice chat, sentence underline, emotion-aware TTS, installable.
**Governing principle**: ML mechanics → @model-boss. Personality mechanics → @ai.

---

## Architecture Summary

```
browser (PWA)
    ↕ WS /voice/:session_id  (PCM binary + JSON events)
companion-api  (@companion/@applications/api)
    → POST @ai /personality/:id/compose        (system_prompt + tts config)
    → POST @model-boss /v1/chat/completions    (SSE inference)
    → WS   @ai /process/:session_id            (tokens in → segments out)
    → WS   @speech-synthesis /ws/conversation  (PCM STT + TTS)
```

**companion-api is a protocol bridge. Zero personality logic lives here.**

---

## Phase 1: @ai Service (PREREQUISITE — everything depends on this)

### 1a. M0 — NestJS Scaffold

- [ ] Init NestJS project at `@applications/@ai/services/ai-core/`
- [ ] `package.json`: `type: module`, NestJS + SWC + TypeORM deps
- [ ] `nest-cli.json`: `{ "compilerOptions": { "builder": "swc" } }`
- [ ] `.swcrc`: `{ "module": { "type": "es6", "resolveFully": true } }`
- [ ] `tsconfig.json`: extends `@lilith/configs/typescript/nestjs`
- [ ] Bootstrap via `@lilith/service-nestjs-bootstrap` (`presets.api`, port 3790)
- [ ] `GET /health` via `@lilith/nestjs-health`
- [ ] `docker-compose.yml` in `@applications/@ai/@deployments/`:
  - PostgreSQL on port 26395 (`ai_db`)
  - Redis on port 26394
- [ ] `./run` task runner (dev, build, test, docker:up/down)
- [ ] Vitest config with `nestPreset` from `@lilith/test-utils/vitest-presets`
- [ ] Smoke test: `GET /health` returns 200

### 1b. M1 — Identity Module

- [ ] `PersonaEntity` (extends `BaseEntity` from `@lilith/typeorm-entities`):
  - `id: uuid`, `name: string`, `slug: string`, `configPath: string`, `isActive: boolean`
- [ ] `UserIdentityEntity`:
  - `id: uuid`, `externalId: string` (maps to auth user), `displayName: string`, `activePersonaId: uuid`
- [ ] `IdentityModule` with TypeORM registration
- [ ] `IdentityService`: `findPersona(id)`, `findUser(externalId)`, `setActivePersona(userId, personaId)`
- [ ] `GET /identity/persona/:id`
- [ ] `GET /identity/user/:externalId`
- [ ] `POST /identity/user/:id/persona` (set active persona)
- [ ] Seed: miku persona (id deterministic), quinn user
- [ ] Unit tests for IdentityService
- [ ] Integration test: seed → GET persona returns miku

### 1c. M3 — Personality Module + miku.json tts.emotion

- [ ] Update `@applications/@ai/config/personalities/miku.json`:
  Add `tts` section:
  ```json
  "tts": {
    "voice_id": "emov-bea-amused",
    "sentence_gap_ms": 0,
    "emotion": {
      "pattern": "\\[([^\\]]+)\\]\\s*",
      "valid_emotions": ["happy","sad","angry","surprised","relaxed","neutral"],
      "emotion_map": {
        "joy":"happy","excitement":"happy","happiness":"happy","cheerful":"happy",
        "grief":"sad","sorrow":"sad","melancholy":"sad","depression":"sad",
        "fear":"surprised","shock":"surprised","disbelief":"surprised",
        "calm":"relaxed","content":"relaxed","peaceful":"relaxed",
        "rage":"angry","frustration":"angry","irritation":"angry",
        "bored":"neutral","thinking":"neutral"
      },
      "exaggeration_map": { "happy":0.7,"sad":0.3,"angry":0.8,"surprised":0.6,"relaxed":0.2,"neutral":0.1 },
      "cfg_weight_map": { "happy":0.6,"sad":0.3,"angry":0.7,"surprised":0.5,"relaxed":0.3,"neutral":0.5 }
    }
  }
  ```
- [ ] `PersonalityModule`
- [ ] `PersonalityConfigService`: loads JSON from `configPath` on PersonaEntity
- [ ] `POST /personality/:id/compose` — accepts `{ user_context?: string }`, returns:
  ```typescript
  interface PersonalityComposeResponse {
    system_prompt: string;
    tts: {
      voice_id: string;
      sentence_gap_ms: number;
      emotion: EmotionConfig;
    };
  }
  ```
- [ ] `system_prompt` assembled from persona JSON (name, role, personality directives, user context)
- [ ] Unit tests: compose returns correct structure for miku
- [ ] Integration test: full round trip with seed data

### 1d. Process Module (WS /process/:session_id)

Port from `@chobit/shared/godot/conversation/conversation_orchestrator.gd` (lines 325–498)
and `@chobit/shared/godot/conversation/conversation_defs.gd`.

- [ ] **EmotionResolver** (`process/emotion-resolver.ts`):
  - Constructor takes `EmotionConfig` from miku.json tts.emotion
  - `resolve(raw: string): string` — maps raw → canonical via `emotion_map`, falls back to `neutral`
  - `ttsParams(emotion: string): { exaggeration: number; cfgWeight: number }` — reads `exaggeration_map`/`cfg_weight_map`
  - Unit tests: known mappings, unknown → neutral, all valid_emotions round-trip

- [ ] **TextSanitizer** (`process/text-sanitizer.ts`):
  Port `_sanitize_for_speech()` from orchestrator.gd lines 375–430:
  - Paralinguistic normalization: `*laughs*`, `(laughs)`, `haha+`, `lol+`, `heh+` → `[laugh]`; `*sighs*`, `*sigh*` → `[sigh]`; `*gasp*`, `*gasps*` → `[gasp]`
  - Strip: markdown (bold `**`, italic `*`/`_`, code `` ` ``, links `[text](url)`), emoji (unicode ranges), URLs, list prefixes (`- `, `• `, `1. `)
  - Normalize: `HH:MM` time → `HH MM`, `N-N` range → `N to N`, `A/B` → `A B`
  - Strip emotion tags `[emotion]` from output text (they're extracted separately)
  - Unit tests: each transformation verified independently

- [ ] **ResponseStream** (`process/response-stream.ts`):
  Port `_extract_segments()` from orchestrator.gd lines 325–375:
  - State: `buffer: string`, `currentEmotion: string` (default `neutral`), `partIndex: number`
  - `push(token: string): Segment[]` — appends to buffer, scans for boundaries:
    - Emotion tag `[emotion]` anywhere in buffer → extract emotion, remove tag, continue
    - Sentence ending (`.`, `!`, `?`, `;`) not inside a word abbreviation → emit segment
    - Whichever boundary comes first in buffer wins
    - Returns `Segment[]` (may be empty if no boundary found)
  - `flush(): Segment[]` — emit whatever remains in buffer as final segment
  - `Segment`: `{ text: string; emotion: string; partIndex: number }`
  - The emitted `text` is run through `TextSanitizer` before returning
  - Unit tests: emotion mid-sentence, sentence boundary, flush, multi-segment push

- [ ] **ProcessSessionManager** (`process/process-session.manager.ts`):
  - In-memory session store: `Map<session_id, { stream: ResponseStream; emotionConfig: EmotionConfig }>`
  - `createSession(sessionId, emotionConfig)`: initialize ResponseStream
  - `deleteSession(sessionId)`: cleanup
  - Session TTL: 30 min idle (use `@nestjs/schedule`)

- [ ] **ProcessGateway** (`process/process.gateway.ts`) — `@WebSocketGateway({ path: '/process/:session_id' })`:
  Incoming message union:
  ```typescript
  type IncomingMsg =
    | { type: 'init'; personality_id: string }
    | { type: 'token'; text: string }
    | { type: 'done' }
  ```
  Outgoing message union:
  ```typescript
  type OutgoingMsg =
    | { type: 'segment'; text: string; emotion: string; partIndex: number; ttsParams: { voiceId: string; exaggeration: number; cfgWeight: number } }
    | { type: 'error'; message: string }
  ```
  - `init` → load personality config, create session
  - `token` → call `session.stream.push(token)`, emit each returned `Segment` as `segment` event
  - `done` → call `session.stream.flush()`, emit remaining segments, delete session
  - On segment emit: run EmotionResolver, attach ttsParams, include voice_id from personality config

- [ ] `ProcessModule` with all providers + gateway registered
- [ ] Integration test: send init → tokens → done, verify segment events match expected output
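
The TextSanitizer rules in the checklist above can be expressed as an ordered chain of replacements. This is a sketch, not a port of `_sanitize_for_speech()`: the function name, every regular expression, and the ordering are assumptions. The ordering matters in two places: emotion tags are stripped before the paralinguistic step so the generated `[laugh]` markers survive, and URLs are removed before the `A/B` rule so their slashes are not rewritten. The emoji ranges cover only common blocks.

```typescript
// Hypothetical sketch of the documented sanitizer rules, applied in a fixed order.
function sanitizeForSpeechSketch(input: string): string {
  return input
    .replace(/\[([^\]]+)\]\([^)]*\)/g, '$1') // [label](url) -> label
    .replace(/https?:\/\/\S+/g, '') // bare URLs
    .replace(/\[[^\]]+\]\s*/g, '') // emotion tags, extracted upstream
    .replace(/\*laughs\*|\(laughs\)|\bhaha+\b|\blol+\b|\bheh+\b/gi, '[laugh]')
    .replace(/\*sighs?\*/gi, '[sigh]')
    .replace(/\*gasps?\*/gi, '[gasp]')
    .replace(/^\s*(?:[-\u2022]|\d+\.)\s+/gm, '') // list prefixes at line start
    .replace(/\*\*|[*_`]/g, '') // bold, italic, code markers
    .replace(/[\uD83C-\uD83E][\uDC00-\uDFFF]|[\u2600-\u27BF]/g, '') // common emoji blocks
    .replace(/\b(\d{1,2}):(\d{2})\b/g, '$1 $2') // 10:30 -> 10 30
    .replace(/\b(\d+)-(\d+)\b/g, '$1 to $2') // 3-5 -> 3 to 5
    .replace(/(\w)\/(\w)/g, '$1 $2') // and/or -> and or
    .replace(/\s+/g, ' ')
    .trim();
}
```

The laugh patterns are taken literally from the checklist, so `hahaha` is not matched; whether the GDScript original handles longer runs should be checked against the source before porting.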
|
||||
|
||||
---

## Phase 2: @companion Scaffold

### 2a. Monorepo Scaffold

- [ ] Init monorepo at `@projects/@companion/`:
  - `pnpm-workspace.yaml`: `['@applications/*', '@packages/*', '@tooling/*']`
  - Root `package.json` with workspace scripts
  - `@deployments/docker-compose.yml` (ports TBD; assign adjacent to @life's 3700)
  - `run` task runner script (dev, build, test)
- [ ] `@packages/companion-client/` (shared TypeScript client, `@lilith/companion-client`):
  - Types: `SessionMessage`, `SegmentEvent`, `ConversationSession`
  - WS client wrapper for companion-api

---

## Phase 3: companion-api (@applications/api/)

### 3a. NestJS Scaffold

- [ ] Init NestJS at `@companion/@applications/api/`
- [ ] Same stack as @ai: ESM, SWC, TypeORM (for session persistence), port TBD
- [ ] `GET /health`
- [ ] Session entity: `ConversationSessionEntity` (id, userId, createdAt, expiresAt)
- [ ] Message entity: `ConversationMessageEntity` (sessionId, role, content, emotion, createdAt)

### 3b. Session Endpoints

- [ ] `POST /session` → `{ session_id: uuid }` (creates DB record)
- [ ] `GET /session/:id/history` → `Message[]`
- [ ] `DELETE /session/:id`

### 3c. POST /chat (Text Fallback, SSE)

Full pipeline for text-only path:
- [ ] Accepts `{ session_id, message: string }`
- [ ] Calls `@ai POST /personality/:id/compose` for system_prompt + tts config
- [ ] Builds message history from DB
- [ ] Calls `@model-boss POST /v1/chat/completions` (SSE)
- [ ] Opens `WS @ai /process/:session_id`, sends `init` + each token + `done`
- [ ] For each received `segment`, SSE to browser: `{ type: "segment", text, emotion, partIndex, ttsParams }`
- [ ] Persists assistant message to DB on completion
- [ ] Use `@lilith/ai-client` if published; otherwise direct HTTP
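
For the SSE leg of this pipeline, a small sketch of how one `segment` might be framed on the server and parsed back on the client. It assumes each event is sent as a single `data:` line holding JSON, which the plan does not specify; `toSseFrame` and `parseSseChunk` are illustrative names.

```typescript
interface SegmentEvent {
  type: "segment";
  text: string;
  emotion: string;
  partIndex: number;
  ttsParams: { voiceId: string; exaggeration: number; cfgWeight: number };
}

// Serialize one segment as a Server-Sent Events frame: "data: <json>" plus a blank line.
// JSON.stringify never emits raw newlines, so one data line is enough.
function toSseFrame(event: SegmentEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Parse buffered SSE text into complete events. The trailing partial frame,
// if any, is returned in `rest` so the caller can prepend it to the next chunk.
function parseSseChunk(buffer: string): { events: SegmentEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? "";
  const events = frames
    .map((frame) =>
      frame
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).trimStart())
        .join("\n"),
    )
    .filter((data) => data.length > 0)
    .map((data) => JSON.parse(data) as SegmentEvent);
  return { events, rest };
}
```

Returning the unparsed remainder matters because network chunks do not align with frame boundaries, so a frame can be split across two reads.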

### 3d. WS /voice/:session_id (Voice Pipeline)

Binary + JSON multiplexed WebSocket. companion-api acts as protocol bridge.

- [ ] **VoiceGateway** (`voice/voice.gateway.ts`):
  - On connection: open `WS @speech-synthesis /ws/conversation`
  - Forward binary frames from browser → speech-synthesis upstream (binary PCM 16kHz)
  - Forward JSON control from speech-synthesis → browser:
    - `stt.final`: triggers LLM pipeline (same as /chat but over WS)
    - `vad.speech_start`: forward to browser for UI feedback
  - On `stt.final`:
    1. Call `@ai POST /personality/:id/compose` (or cache per session)
    2. Call `@model-boss` SSE stream
    3. Pipe tokens to `@ai WS /process/:session_id`
    4. On each `segment`: send `tts.request` to speech-synthesis WS
    5. Forward `tts.start`, `tts.end` from speech-synthesis → browser
    6. Forward binary PCM downstream from speech-synthesis → browser
  - On disconnect: close speech-synthesis WS, clean up @ai session

- [ ] **VoiceSessionStore**: in-memory map of active voice sessions (browser ws ↔ speech-synthesis ws ↔ @ai ws)

---

## Phase 4: companion-web (@applications/web/)

### 4a. React PWA Scaffold

- [ ] Vite + React 18 + TypeScript strict
- [ ] `manifest.json`:
  - `display: standalone`, `orientation: portrait`
  - `start_url: /`, icons (192px + 512px)
- [ ] Service worker (Workbox or vite-plugin-pwa): cache shell + assets
- [ ] `CompanionApp.tsx`: full-screen mobile layout (100dvh, no scroll bounce)
- [ ] PWA install prompt handling (`beforeinstallprompt`)

### 4b. AudioWorklets

- [ ] `src/worklets/mic-processor.js` (`AudioWorkletProcessor`):
  - Input: browser mic at the `AudioContext`'s native sample rate (often 44.1 or 48 kHz), resampled as needed
  - Output: 16kHz mono PCM Int16 frames (480 samples = 960 bytes = 30ms at 16kHz)
  - Resamples via linear interpolation if input rate ≠ 16000
  - Sends frames to main thread via `postMessage` with binary buffer
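
The mic-processor steps above (resample, convert to Int16, cut into 30 ms frames) might look roughly like this as plain functions. It is a sketch under assumptions: the helper names are made up, and the resampler is stateless, so a real worklet that receives small render blocks would also need to carry the fractional read position across calls.

```typescript
const TARGET_RATE = 16000;
const FRAME_SAMPLES = 480; // 30 ms at 16 kHz, i.e. 960 bytes as Int16

// Linear-interpolation resampler from `inputRate` down (or up) to 16 kHz.
function resampleLinear(input: Float32Array, inputRate: number): Float32Array {
  if (inputRate === TARGET_RATE) return input;
  const ratio = inputRate / TARGET_RATE;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// Clamp to [-1, 1] and scale to signed 16-bit PCM.
function floatToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Collects samples of arbitrary block size and emits fixed 480-sample frames.
class FrameAccumulator {
  private pending: number[] = [];

  push(samples: Int16Array): Int16Array[] {
    for (let i = 0; i < samples.length; i++) {
      this.pending.push(samples[i]);
    }
    const frames: Int16Array[] = [];
    while (this.pending.length >= FRAME_SAMPLES) {
      frames.push(Int16Array.from(this.pending.splice(0, FRAME_SAMPLES)));
    }
    return frames;
  }
}
```

Each emitted frame's underlying `buffer` is what would be handed to `postMessage`.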

- [ ] `src/worklets/pcm-player.js` (`AudioWorkletProcessor`):
  - Input: 22050Hz mono PCM Int16 frames from companion-api
  - Feeds ring buffer → outputs float32 to Web Audio destination
  - Handles underrun (silence) and overrun (drop oldest)
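
One way to get the underrun and overrun behaviour described above is a fixed-capacity ring buffer. The class below is a hypothetical sketch (`PcmRingBuffer` is not named in the plan); it converts Int16 to float on write and pads reads with silence.

```typescript
class PcmRingBuffer {
  private readonly buf: Float32Array;
  private readPos = 0;
  private size = 0; // number of unread samples currently stored

  constructor(capacity: number) {
    this.buf = new Float32Array(capacity);
  }

  // Stores Int16 samples as floats in [-1, 1). When the buffer is full,
  // the oldest unread sample is overwritten (overrun: drop oldest).
  write(samples: Int16Array): void {
    const cap = this.buf.length;
    for (let i = 0; i < samples.length; i++) {
      const writePos = (this.readPos + this.size) % cap;
      this.buf[writePos] = samples[i] / 0x8000;
      if (this.size < cap) {
        this.size++;
      } else {
        this.readPos = (this.readPos + 1) % cap;
      }
    }
  }

  // Fills `out` from the buffer and pads the remainder with silence (underrun).
  // Returns how many real samples were copied.
  read(out: Float32Array): number {
    const cap = this.buf.length;
    const n = Math.min(out.length, this.size);
    for (let i = 0; i < n; i++) {
      out[i] = this.buf[(this.readPos + i) % cap];
    }
    out.fill(0, n);
    this.readPos = (this.readPos + n) % cap;
    this.size -= n;
    return n;
  }
}
```

Inside the worklet, `read` would be called once per render quantum with the output channel array, and `write` from the message port handler.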

- [ ] `src/features/voice/MicCapture.ts`:
  - `getUserMedia({ audio: true })`
  - Create `AudioContext` (deferred; only on user gesture)
  - Load `mic-processor.js` worklet
  - On frame: send binary over WS to companion-api
  - `start() / stop()`

- [ ] `src/features/voice/PcmPlayer.ts`:
  - Reuse the `AudioContext` created by MicCapture rather than creating a second one
  - Load `pcm-player.js` worklet
  - `enqueue(pcmFrame: ArrayBuffer)`: feeds worklet ring buffer
  - `MediaSession` API: lock screen play/pause → `stop()` MicCapture

### 4c. VoiceSession Manager

- [ ] `src/features/voice/VoiceSession.ts`:
  - Manages WS connection to companion-api `/voice/:session_id`
  - Multiplexes binary (PCM) and JSON (events) over one WS
  - Binary upstream: mic frames → server
  - Binary downstream: PCM audio → PcmPlayer.enqueue()
  - JSON events:
    - `stt.final` → emit transcript for ChatView
    - `segment` → emit to ChatView (append part, update emotion)
    - `tts.start` → emit speakingPartIndex
    - `tts.end` → clear speakingPartIndex
    - `vad.speech_start` → show "listening" indicator
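
Demultiplexing binary and JSON traffic on one socket could be sketched as below. It assumes the browser `WebSocket` has `binaryType = "arraybuffer"`, so text frames arrive as strings and audio as `ArrayBuffer`; `dispatchFrame` and `VoiceHandlers` are illustrative names, and the event shapes are trimmed to the fields listed in this plan.

```typescript
type VoiceEvent =
  | { type: "stt.final"; text: string; confidence: number }
  | { type: "segment"; text: string; emotion: string; partIndex: number }
  | { type: "tts.start"; utterance_id: string }
  | { type: "tts.end"; utterance_id: string }
  | { type: "vad.speech_start" }
  | { type: "vad.speech_end" };

interface VoiceHandlers {
  onPcm(frame: ArrayBuffer): void;
  onEvent(event: VoiceEvent): void;
}

const KNOWN_EVENT_TYPES: readonly string[] = [
  "stt.final",
  "segment",
  "tts.start",
  "tts.end",
  "vad.speech_start",
  "vad.speech_end",
];

// Routes one WebSocket message: binary goes to the player, JSON to the UI.
// Malformed or unknown JSON is ignored rather than thrown, which is a design
// choice for this sketch, not something the plan prescribes.
function dispatchFrame(data: string | ArrayBuffer, handlers: VoiceHandlers): void {
  if (typeof data !== "string") {
    handlers.onPcm(data);
    return;
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    return;
  }
  if (typeof parsed !== "object" || parsed === null) return;
  const type = (parsed as { type?: unknown }).type;
  if (typeof type === "string" && KNOWN_EVENT_TYPES.indexOf(type) !== -1) {
    // Only the `type` tag is checked here; field-level validation is omitted.
    handlers.onEvent(parsed as VoiceEvent);
  }
}
```

In `VoiceSession`, this would be wired as `ws.onmessage = (e) => dispatchFrame(e.data, handlers)`.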

### 4d. Chat Components

Message model:
```typescript
interface Message {
  id: string;
  role: 'user' | 'assistant';
  emotion: string;
  parts: string[];           // one entry per sentence segment
  speakingPartIndex: number | null;
}
```

- [ ] `src/features/chat/ChatView.tsx`:
  - Scrollable message list (CSS snap or scroll-to-bottom on new message)
  - Auto-scroll when assistant is speaking
  - `ChatMessage` per message
  - Shows emotion indicator on assistant messages

- [ ] `src/features/chat/ChatMessage.tsx`:
  - Renders `parts[]` inline, each part as a `<span>`
  - `speakingPartIndex` → underline the active span (`text-decoration: underline`)
  - Animate underline transition between parts

- [ ] `src/features/chat/MicButton.tsx`:
  - Large circular push-to-talk button (bottom center, mobile thumb zone)
  - First tap: initializes `AudioContext` (browser requires user gesture)
  - Hold to talk OR toggle mode (configurable)
  - Visual states: idle / listening (pulsing) / processing

- [ ] `src/features/chat/TextInput.tsx`:
  - Text fallback input
  - Sends via `POST /chat` (SSE response)
  - Parses the SSE stream into the same `segment` events (with `ttsParams`) as the voice path

- [ ] `src/app/CompanionApp.tsx`:
  - Full-screen layout: `ChatView` (flex-1) + bottom row (`TextInput` + `MicButton`)
  - Manages session_id (create on mount, persist in sessionStorage)
  - Connects `VoiceSession`, passes events to chat state
  - `useReducer` for message state (append part by index, set speakingPartIndex)
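
The message-state reducer mentioned in the last bullet might be shaped like this. The action names and the `messageId` field are assumptions made for the sketch (the plan does not say how segments are tied to a message), and a new assistant message is created when the first segment for an unknown id arrives. The function has the `(state, action) => state` shape that React's `useReducer` expects.

```typescript
interface Message {
  id: string;
  role: "user" | "assistant";
  emotion: string;
  parts: string[]; // one entry per sentence segment
  speakingPartIndex: number | null;
}

// Hypothetical action set covering the events listed for VoiceSession.
type ChatAction =
  | { kind: "userMessage"; id: string; text: string }
  | { kind: "segment"; messageId: string; text: string; emotion: string; partIndex: number }
  | { kind: "speaking"; messageId: string; partIndex: number | null };

function chatReducer(state: Message[], action: ChatAction): Message[] {
  switch (action.kind) {
    case "userMessage": {
      const { id, text } = action;
      return [...state, { id, role: "user", emotion: "neutral", parts: [text], speakingPartIndex: null }];
    }
    case "segment": {
      const { messageId, text, emotion, partIndex } = action;
      const exists = state.some((m) => m.id === messageId);
      const base: Message[] = exists
        ? state
        : [...state, { id: messageId, role: "assistant", emotion, parts: [], speakingPartIndex: null }];
      return base.map((m) => {
        if (m.id !== messageId) return m;
        const parts = m.parts.slice();
        parts[partIndex] = text; // place the part by index, not by arrival order
        return { ...m, parts, emotion };
      });
    }
    case "speaking": {
      const { messageId, partIndex } = action;
      return state.map((m) => (m.id === messageId ? { ...m, speakingPartIndex: partIndex } : m));
    }
    default:
      return state;
  }
}
```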

---

## Phase 5: Infrastructure

### 5a. nginx + HTTPS (required for getUserMedia on mobile)

- [ ] Assign companion port (TBD; record in `@companion/@deployments/ports.yaml`)
- [ ] nginx vhost: `companion.atlilith.local` → companion-api, `companion-web.atlilith.local` → Vite
- [ ] SSL cert for `*.atlilith.local` (same infra pattern as lilith-platform)
- [ ] nginx proxy_pass for WS (`Upgrade`, `Connection` headers)
- [ ] nginx for binary WS: `proxy_read_timeout 1h`, `proxy_send_timeout 1h`

### 5b. Docker Compose

- [ ] `@companion/@deployments/docker-compose.yml`:
  - companion-api service
  - PostgreSQL (companion_db, port TBD)
  - Redis (companion_redis, port TBD; for session cache if needed)
  - healthchecks for all services

---

## Build Order Summary

```
1a → 1b → 1c → 1d     (@ai sequential; each milestone builds on prior)
    ↓
2a                    (scaffold, can start early)
3a → 3b → 3c → 3d     (companion-api, sequential)
4a → 4b → 4c → 4d     (web PWA, 4b/4c can parallel after 4a)
5a/5b                 (infra, can parallel with 3/4)
```

3c/3d depend on 1d (@ai Process module).
4c/4d can be scaffolded before 1d using mock WS events, but real wiring requires 1d.

---

## Protocol Reference

### @speech-synthesis WS binary protocol

```
UPSTREAM (browser → api → speech-synthesis):
  [0x01][seq:4B BE][pcm: 960 bytes Int16 16kHz mono]  → audio frame
  [0x03]                                               → end of utterance

DOWNSTREAM (speech-synthesis → api → browser):
  Binary: [0x01][seq:4B BE][utterance_id:16B][pcm: N bytes Int16 22050Hz mono]
  JSON:   { type: "stt.final", text, confidence }
          { type: "tts.start", utterance_id }
          { type: "tts.end", utterance_id }
          { type: "vad.speech_start" }
          { type: "vad.speech_end" }
```
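
The binary layout above maps directly onto `DataView` reads and writes. The sketch below encodes an upstream audio frame and decodes a downstream one; the function names are illustrative, and it assumes PCM samples are little-endian on the wire, since the layout only states big-endian for `seq`.

```typescript
const FRAME_AUDIO = 0x01;
const FRAME_END_OF_UTTERANCE = 0x03;
const UP_HEADER_BYTES = 5; // 1 type byte + 4-byte big-endian seq
const DOWN_HEADER_BYTES = 21; // type + seq + 16-byte utterance_id

function encodeAudioFrame(seq: number, pcm: Int16Array): ArrayBuffer {
  const out = new ArrayBuffer(UP_HEADER_BYTES + pcm.byteLength);
  const view = new DataView(out);
  view.setUint8(0, FRAME_AUDIO);
  view.setUint32(1, seq, false); // false = big-endian
  for (let i = 0; i < pcm.length; i++) {
    // Assumption: little-endian samples (not specified in the protocol notes).
    view.setInt16(UP_HEADER_BYTES + i * 2, pcm[i], true);
  }
  return out;
}

function encodeEndOfUtterance(): ArrayBuffer {
  const out = new ArrayBuffer(1);
  new DataView(out).setUint8(0, FRAME_END_OF_UTTERANCE);
  return out;
}

interface DownstreamAudio {
  seq: number;
  utteranceId: Uint8Array; // raw 16 bytes
  pcm: Int16Array; // 22050 Hz mono
}

function decodeDownstreamAudio(buf: ArrayBuffer): DownstreamAudio {
  const view = new DataView(buf);
  if (buf.byteLength < DOWN_HEADER_BYTES || view.getUint8(0) !== FRAME_AUDIO) {
    throw new Error("not a downstream audio frame");
  }
  const seq = view.getUint32(1, false);
  const utteranceId = new Uint8Array(buf.slice(5, DOWN_HEADER_BYTES));
  const sampleCount = Math.floor((buf.byteLength - DOWN_HEADER_BYTES) / 2);
  const pcm = new Int16Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    pcm[i] = view.getInt16(DOWN_HEADER_BYTES + i * 2, true);
  }
  return { seq, utteranceId, pcm };
}
```

A 480-sample frame encodes to 965 bytes: the 5-byte header plus the 960-byte payload.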

### @ai WS /process protocol

```
INCOMING (companion-api → @ai):
  { type: "init", personality_id: string }
  { type: "token", text: string }
  { type: "done" }

OUTGOING (@ai → companion-api):
  { type: "segment", text: string, emotion: string, partIndex: number,
    ttsParams: { voiceId: string, exaggeration: number, cfgWeight: number } }
  { type: "error", message: string }
```
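
Because these messages arrive as untrusted JSON text, the @ai side will probably want to validate them before dispatch. A minimal sketch, with `parseIncoming` and `ParseResult` as hypothetical names; failures are returned in the shape of the protocol's own `error` message so they can be sent straight back.

```typescript
type IncomingMsg =
  | { type: "init"; personality_id: string }
  | { type: "token"; text: string }
  | { type: "done" };

type ParseResult =
  | { ok: true; msg: IncomingMsg }
  | { ok: false; error: { type: "error"; message: string } };

function fail(message: string): ParseResult {
  return { ok: false, error: { type: "error", message } };
}

// Validates one raw WebSocket text frame against the INCOMING union.
function parseIncoming(raw: string): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return fail("invalid JSON");
  }
  if (typeof value !== "object" || value === null) {
    return fail("message must be a JSON object");
  }
  const obj = value as Record<string, unknown>;
  const type = obj["type"];
  if (type === "init") {
    const personalityId = obj["personality_id"];
    return typeof personalityId === "string"
      ? { ok: true, msg: { type: "init", personality_id: personalityId } }
      : fail("init requires a string personality_id");
  }
  if (type === "token") {
    const text = obj["text"];
    return typeof text === "string"
      ? { ok: true, msg: { type: "token", text } }
      : fail("token requires a string text");
  }
  if (type === "done") {
    return { ok: true, msg: { type: "done" } };
  }
  return fail(`unknown message type: ${String(type)}`);
}
```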

---

## Definition of Done: v1.0

- [ ] `GET @ai /health` → 200 from Docker
- [ ] `POST @ai /personality/miku/compose` → valid system_prompt + tts config
- [ ] `WS @ai /process/test` → tokens → segments with correct emotion/ttsParams
- [ ] `POST /session` → session_id
- [ ] `POST /chat` SSE → streams segments with text + emotion
- [ ] `WS /voice` → end-to-end: speak into mic → STT → LLM → TTS → audio plays back
- [ ] Sentence being spoken is underlined in ChatView
- [ ] PWA installable from `companion.atlilith.local` on mobile
- [ ] `getUserMedia` works (HTTPS confirmed)
- [ ] All unit + integration tests pass

138
CLAUDE.md
Normal file
@ -0,0 +1,138 @@

# @companion: AI Companion Platform

> **Status:** Pre-scaffold. This directory defines intent. No code exists yet.
> **Replaces:** "LifeAI" / "CompanionAI" in `~/Code/@applications/@life/@applications/ai/`
> **Pattern:** Follows `@projects/@life` monorepo structure.

---

## Single Responsibility

The AI companion product: multiple frontends sharing one backend, one personality engine.
Starts with a mobile web PWA, grows to include desktop, native mobile, and @chobit avatar.

Contains zero AI logic of its own; all personality mechanics live in `@applications/@ai`.

**Not to be confused with:**
- `@applications/@ai`: the AI runtime (identity, memory, personality, nag, process)
- `@applications/@chobit`: 3D avatar / STT / TTS (future @companion frontend)

---

## What It Owns

- **Orchestration**: companion-api wires @ai, @model-boss, and @speech-synthesis together
- **Session management**: conversation history, session lifecycle
- **Frontends**: multiple client applications consuming companion-api
- **User-facing settings**: companion preferences, notification preferences, persona selection

---

## What It Does NOT Own

- AI logic (personality mechanics, emotion extraction, sentence splitting) → `@applications/@ai`
- Inference → `@applications/@model-boss`
- STT / TTS → `@applications/@audio/speech-synthesis`
- Domain data (wellness, career, education) → domain @applications

---

## Project Structure

```
@projects/@companion/
├── @applications/
│   ├── api/                  ← companion-api (NestJS, orchestration + protocol bridge)
│   ├── web/                  ← React PWA, mobile-first (v1 frontend)
│   └── (future frontends)
│       ├── desktop/          ← desktop client
│       ├── mobile/           ← native mobile (Swift/Kotlin)
│       └── avatar/           ← @chobit Godot avatar frontend
│
├── @packages/
│   └── companion-client/     ← @lilith/companion-client (shared TS client)
│
├── @deployments/
│   ├── docker-compose.yml
│   └── systemd/
│
├── @tooling/
│   └── e2e/                  ← Playwright tests
│
├── CLAUDE.md
└── run                       ← task runner
```

---

## Architecture

```
companion-api receives user message (text or transcribed speech)
    ↓
POST @ai /personality/:id/compose
    → { system_prompt, tts config }
    ↓
POST @model-boss /v1/chat/completions (SSE)
    → token stream
    ↓
WS @ai /process/:session_id
    → tokens in, processed segments out (sentence split + emotion + sanitized)
    ↓
POST @speech-synthesis /synthesize per segment
    → TTS audio
    ↓
Stream back to client frontend (text + audio)
```

companion-api orchestrates the pipeline. @ai owns all personality mechanics.

---

## Version Roadmap

| Version | Feature | Notes |
|---------|---------|-------|
| **v1.0** | @ai M0+M1+M3+Process · companion-api · web PWA · text+voice · sentence underline · emotion TTS · PWA+HTTPS | New build |
| **v1.1** | @ai M2 memory · session persistence | New build |
| **v2.0** | @ai M4 nag · M5 context compose | New build |
| **v3.0** | @chobit avatar frontend · M8 relationship · multi-persona | New build |
| **v4.0** | desktop frontend · native mobile · push notifications | New build |
| **v5.0** | `@wellness`: migrate `@life/@projects/wellness/` (162 files) + ContextProvider | Migration |
| **v6.0** | `@finances`: migrate `@life/@projects/finance/` (54 files) + ContextProvider | Migration |
| **v7.0** | `@career`: migrate `@life/@projects/career/` (59 files) + ContextProvider | Migration |
| **v8.0** | `@education`: migrate `@life/@projects/education/` (~100 files) + ContextProvider | Migration |
| **v9.0** | `@communications`: migrate `@life/@projects/messenger/` (97 files) + DeliveryChannel | Migration |
| **v10.0** | `@journal` split · `@life` → `@daily` rename · @daily slimming | Migration + rename |

v5–v10: each split = scaffold target → port code from `@life` → wire into `@ai` → delete from `@life`.

---

## Integration

- `companion-api` calls `@ai POST /personality/:id/compose` for system prompt + TTS config
- `companion-api` calls `@model-boss POST /v1/chat/completions` for inference (ML mechanics)
- `companion-api` pipes tokens to `@ai WS /process/:session_id` (personality mechanics)
- `companion-api` calls `@speech-synthesis` for STT (voice input) and TTS (voice output)
- `companion-api` subscribes to Redis `ai.nag.fired` events for nag toast display (v2.0)

**Boundary:** companion-api orchestrates @model-boss inference. @ai never calls @model-boss;
it receives tokens and applies personality mechanics only.

---

## Migration Source

| Source | Destination |
|--------|-------------|
| `@life/@applications/ai/services/companion/` | Deleted; behavior moves to `@applications/@ai` |
| `@life/@applications/ai/services/platform-ai/` | Deleted; behavior moves to `@applications/@ai` |
| Companion UI from @life frontend | `@companion/@applications/web/` |
| `@applications/@chobit/` | Eventually → `@companion/@applications/avatar/` |

---

## Port Assignment

TBD. Assign when scaffolding.