claire/DESIGN.md
Natalie c1e6f7dbe5 feat: initial Clare scaffold — project manager for the Claude agent fleet
Push A (single-machine):
- HLC + event-sourced SQLite (events table is source of truth, projections rebuildable)
- Pydantic v2 domain models (Project, Task, Assignment, Session, Group, Update)
- rclaude subprocess wrapper (local_sessions via _claude-projects --sessions)
- Typer CLI: init, project, task, assign, pull, status, broadcast, serve, sync
- FastAPI + Jinja2 + HTMX dashboard
- 26 unit tests passing

Push B (HTTP API + sync substrate):
- /api/v1/* JSON routes (projects, tasks, assignments, sessions, status, broadcast, sync)
- CLI refactored as thin httpx client over the API — single business-logic codepath
- web/service.py: every business op defined once; HTML routes + API routes both call into it
- sync.py: peer-to-peer sync via /api/v1/sync/events with HLC + uuid-based dedup
- 32 tests passing including two-Clare convergence test

Push C (cross-host deployment):
- apricot install via uv (Python 3.12.12)
- systemd --user unit for clare-serve on apricot
- Cross-host sync demoed plum (10.9.0.3) ↔ apricot (10.9.0.2) over wg
- .local → .lan rename for forge URLs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:20:23 -07:00

Clare — design

Why Clare exists

rclaude enumerates and addresses live claude tmux sessions across hosts, sends keystrokes, and runs a Haiku-powered triage. What it does not do:

  • Track work as projects and tasks rather than sessions and panes
  • Bind a task to a specific session and remember the binding across restarts
  • Roll up "what's the state of the fleet right now" into a single dashboard
  • Persist a history of progress, decisions, and broadcasts

Clare is the project-management layer above rclaude.

Domain model

| Concept | Identity | Notable fields |
|---|---|---|
| Project | uuid | name (unique), goal, owner, status (active / paused / done) |
| Task | uuid | project_id, title, description, status (todo / in_progress / blocked / done), priority (0-4) |
| Assignment | uuid | task_id, session_uuid, created_hlc, active flag |
| Session | uuid (claude session uuid) | host, cwd, tmux_name, last_seen_mtime, last_triage |
| Group | name | pattern (cwd substring / host / session-name) |
| Update | uuid | assignment_id, source (triage / message / pane-tail), payload, hlc |

All ids are stable UUIDs (uuid4) generated by Clare; the only externally-derived id is Session.uuid, which mirrors claude's own session uuid from ~/.claude/projects/<slug>/<uuid>.jsonl.
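
As an illustration of how two of these concepts relate, here is a minimal sketch of Task and Assignment. It uses stdlib dataclasses to stay dependency-free, whereas the project itself uses Pydantic v2 models; the defaults and the `TaskStatus` enum name are assumptions, not the actual definitions.

```python
from dataclasses import dataclass, field
from enum import Enum
from uuid import UUID, uuid4


class TaskStatus(str, Enum):
    # Status values taken from the table above.
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    BLOCKED = "blocked"
    DONE = "done"


@dataclass
class Task:
    project_id: UUID
    title: str
    description: str = ""
    status: TaskStatus = TaskStatus.TODO   # assumed default
    priority: int = 0                      # assumed default
    uuid: UUID = field(default_factory=uuid4)  # Clare-generated id


@dataclass
class Assignment:
    task_id: UUID
    session_uuid: UUID   # externally derived: mirrors claude's session uuid
    created_hlc: str
    active: bool = True
    uuid: UUID = field(default_factory=uuid4)


# Bind a task to a session; the session uuid here is a random placeholder.
task = Task(project_id=uuid4(), title="Write sync tests", priority=2)
binding = Assignment(
    task_id=task.uuid,
    session_uuid=uuid4(),
    created_hlc="1716253199000.000001.machine-a",
)
```

The binding refers to the task and the session only by id, so it can be persisted and restored without either object being live.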

Event sourcing

The events table is append-only:

CREATE TABLE events (
    rowid       INTEGER PRIMARY KEY,
    uuid        TEXT NOT NULL UNIQUE,        -- event id (uuid4)
    hlc         TEXT NOT NULL,                -- 'wallms.counter.machineid' for sortability
    machine_id  TEXT NOT NULL,
    event_type  TEXT NOT NULL,                -- e.g. 'project_created'
    payload     TEXT NOT NULL,                -- JSON
    created_at  TEXT NOT NULL                 -- wall-clock for humans only
);
CREATE INDEX events_hlc ON events(hlc);

Projections (projects, tasks, ...) are derived tables. apply_event(conn, event) updates them; replay_events(conn) rebuilds them from scratch (used for tests and recovery).
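
A minimal sketch of that fold, assuming a single projects projection. The `project_status_changed` event type, the payload field names, and the projects column set are hypothetical; only `project_created`, `apply_event`, and `replay_events` come from this document.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE events (
    rowid INTEGER PRIMARY KEY,
    uuid TEXT NOT NULL UNIQUE,
    hlc TEXT NOT NULL,
    machine_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE projects (            -- projection; columns are assumed
    uuid TEXT PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    status TEXT NOT NULL
);
"""


def apply_event(conn, event):
    """Fold one event into the projection tables."""
    payload = json.loads(event["payload"])
    if event["event_type"] == "project_created":
        conn.execute(
            "INSERT OR REPLACE INTO projects (uuid, name, status) VALUES (?, ?, ?)",
            (payload["uuid"], payload["name"], "active"),
        )
    elif event["event_type"] == "project_status_changed":  # hypothetical type
        conn.execute(
            "UPDATE projects SET status = ? WHERE uuid = ?",
            (payload["status"], payload["uuid"]),
        )
    # Unknown event types are skipped in this sketch.


def replay_events(conn):
    """Drop projection rows and rebuild them from the event log in HLC order."""
    conn.execute("DELETE FROM projects")
    rows = conn.execute(
        "SELECT event_type, payload FROM events ORDER BY hlc"
    ).fetchall()
    for event_type, payload in rows:
        apply_event(conn, {"event_type": event_type, "payload": payload})


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
log = [
    ("e1", "1716253199000.000001.m1", "project_created",
     {"uuid": "p1", "name": "clare"}),
    ("e2", "1716253199500.000000.m1", "project_status_changed",
     {"uuid": "p1", "status": "paused"}),
]
for uuid, hlc, event_type, payload in log:
    conn.execute(
        "INSERT INTO events (uuid, hlc, machine_id, event_type, payload, created_at)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (uuid, hlc, "m1", event_type, json.dumps(payload), "2024-05-21T00:59:59Z"),
    )

replay_events(conn)
projects = conn.execute("SELECT name, status FROM projects").fetchall()
```

Because the replay starts from an empty projection, calling it twice yields the same rows, which is what makes it usable for recovery.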

Why event-sourced?

  1. Future sync without rewriting state. Push B adds GET /api/v1/sync/events?since=<hlc> between peers; conflict resolution is "merge events, replay projections", which is correct by construction.
  2. History. Every project / task / assignment change is auditable. clare project show <id> --history becomes a one-line query.
  3. HLC stability. Wall-clock skew between machines can't reorder causally related events, and because every peer sorts the merged log by the same HLC string, the resulting order is deterministic.
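
The "merge events, replay projections" rule from point 1 can be sketched in a few lines. This is an illustration only: `merge_events` is a hypothetical helper, the `task_added` event type is made up, and host names stand in for machine ids to keep the HLC strings readable.

```python
def merge_events(local, remote):
    """Union two event logs, dedup by event uuid, order by HLC string."""
    by_uuid = {e["uuid"]: e for e in local}
    for e in remote:
        by_uuid.setdefault(e["uuid"], e)  # already-known events are kept as-is
    # The uuid is only a tiebreaker; HLC strings embed the machine id already.
    return sorted(by_uuid.values(), key=lambda e: (e["hlc"], e["uuid"]))


shared = {"uuid": "a", "hlc": "1716253199000.000000.plum",
          "event_type": "project_created"}
plum = [
    shared,
    {"uuid": "c", "hlc": "1716253199020.000000.plum", "event_type": "task_added"},
]
apricot = [
    shared,
    {"uuid": "b", "hlc": "1716253199010.000000.apricot", "event_type": "task_added"},
]

merged_on_plum = merge_events(plum, apricot)
merged_on_apricot = merge_events(apricot, plum)
order = [e["uuid"] for e in merged_on_plum]
```

Both peers end up with the same ordered log regardless of who merged into whom, so replaying it produces the same projections on each side.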

HLC encoding

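{wall_ms}.{counter:06d}.{machine_id}. This sorts correctly as a string because wall_ms stays 13 digits wide (until the year 2286) and the counter is zero-padded to six digits. Example: 1716253199000.000001.7f9a3c2b-1a4d-4e7f-9c2b-3d8a1e4f6c5b.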
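
For readers unfamiliar with hybrid logical clocks, the following sketch shows one way such timestamps could be produced. It follows the standard HLC update rules rather than Clare's actual clock code; the class name, method names, and the `physical_ms` parameter (injected so the example is deterministic) are assumptions.

```python
import time
from dataclasses import dataclass


@dataclass
class HLC:
    machine_id: str
    wall_ms: int = 0
    counter: int = 0

    def _encode(self):
        # Sorting as a string breaks if the counter ever exceeds 999999.
        return f"{self.wall_ms}.{self.counter:06d}.{self.machine_id}"

    def now(self, physical_ms=None):
        """Timestamp for a locally generated event."""
        pt = int(time.time() * 1000) if physical_ms is None else physical_ms
        if pt > self.wall_ms:
            self.wall_ms, self.counter = pt, 0
        else:
            self.counter += 1  # clock stalled or went backwards: bump counter
        return self._encode()

    def observe(self, remote, physical_ms=None):
        """Advance past a timestamp received from a peer."""
        r_wall_s, r_counter_s, _ = remote.split(".", 2)
        r_wall, r_counter = int(r_wall_s), int(r_counter_s)
        pt = int(time.time() * 1000) if physical_ms is None else physical_ms
        new_wall = max(self.wall_ms, r_wall, pt)
        if new_wall == self.wall_ms == r_wall:
            self.counter = max(self.counter, r_counter) + 1
        elif new_wall == self.wall_ms:
            self.counter += 1
        elif new_wall == r_wall:
            self.counter = r_counter + 1
        else:
            self.counter = 0
        self.wall_ms = new_wall
        return self._encode()


clock = HLC("plum")
t1 = clock.now(physical_ms=1716253199000)
t2 = clock.now(physical_ms=1716253199000)  # same millisecond: counter increments
remote = "1716253200000.000004.apricot"    # peer's clock is one second ahead
t3 = clock.observe(remote, physical_ms=1716253199000)
```

After observing the peer's timestamp, the local clock adopts the peer's later wall time, so anything plum writes next sorts after the event it just received even though plum's own wall clock is behind.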

CLI surface (Push A)

clare init                                        First-run: create DB, generate machine_id.
clare project new <name> [--goal ...] [--owner ...]
clare project list [--status active|paused|done]
clare project show <name-or-id>
clare task add <project> <title> [--prio N] [--desc ...]
clare task list [--project <p>] [--status ...]
clare task show <task-id>
clare task done <task-id>
clare assign <task-id> <session-uuid|--group <g>>
clare status [--project <p> | --group <g>]
clare pull                                        Refresh fleet view from rclaude.
clare broadcast <project|group> --yes -- <text>
clare serve [--host 127.0.0.1] [--port 8765]
clare sync                                        Deferred to Push B; in Push A it exits with a "deferred" error.

Web (Push A)

FastAPI + Jinja2 + HTMX. Routes:

  • GET / — dashboard (per-project task counts, per-session current task)
  • GET /projects — list
  • GET /projects/{id} — task table + assignments + recent updates
  • GET /sessions — fleet view
  • GET /broadcast — composer form
  • POST /broadcast — invokes rclaude send --yes, emits BroadcastSent

5-second polling refresh via HTMX hx-trigger="every 5s". No websockets.

Push B additions

  • clare.sync: pull_from_peer(url, since_hlc) + push_to_peer(url, since_hlc) via httpx
  • Web routes: GET /api/v1/sync/events (with ?since=<hlc>) and POST /api/v1/sync/events
  • clare.toml peer list activated
  • Integration test: two in-process Clares sync state correctly
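
A rough sketch of the data side of that exchange, with the HTTP transport left out so it runs on the stdlib alone. `events_since` and `ingest` are hypothetical names for what the GET and POST handlers might do; the real `pull_from_peer` and `push_to_peer` would wrap the same steps in httpx calls.

```python
import sqlite3

SCHEMA = """
CREATE TABLE events (
    rowid INTEGER PRIMARY KEY,
    uuid TEXT NOT NULL UNIQUE,
    hlc TEXT NOT NULL,
    machine_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    created_at TEXT NOT NULL
);
"""
COLUMNS = "uuid, hlc, machine_id, event_type, payload, created_at"


def open_peer():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


def events_since(conn, since_hlc):
    """Serving side: events with an HLC strictly after the cursor."""
    return conn.execute(
        f"SELECT {COLUMNS} FROM events WHERE hlc > ? ORDER BY hlc", (since_hlc,)
    ).fetchall()


def ingest(conn, rows):
    """Receiving side: UNIQUE(uuid) turns re-delivered events into no-ops."""
    before = count(conn)
    conn.executemany(
        f"INSERT OR IGNORE INTO events ({COLUMNS}) VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    return count(conn) - before  # number of genuinely new events


plum, apricot = open_peer(), open_peer()
ingest(plum, [("e1", "1716253199000.000000.plum", "plum",
               "project_created", "{}", "2024-05-21T00:59:59Z")])
ingest(apricot, [("e2", "1716253199010.000000.apricot", "apricot",
                  "project_created", "{}", "2024-05-21T00:59:59Z")])

# One sync round from plum's point of view; "" sorts before every HLC.
pulled = ingest(plum, events_since(apricot, ""))
pushed = ingest(apricot, events_since(plum, ""))
repeat = ingest(apricot, events_since(plum, ""))  # second push changes nothing
```

The sketch always sends the full log. A real `since` cursor needs some care, because an event can arrive late from a third peer carrying an HLC older than the cursor.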

Ecosystem adjacencies

From apricot:~/Code/@packages/MANIFEST.md (184 TS + 35 Py packages). Notable adjacent packages and Clare's relationship to each:

| Package | Relationship |
|---|---|
| @lilith/claude-continue (TS) | Conceptual overlap with rclaude: a tmux wrapper for Claude with crash recovery. Clare sits above both; we don't reimplement what either does. |
| @lilith/mcp-session-analyzer (TS) | MCP server for ML-analyzing Claude transcripts. Possible alternative or augmentation to _claude-triage as the priority signal source. Worth evaluating before Push B. |
| @lilith/mcp-task-persistence (TS) | Already running in the harness; persists user prompts across Claude sessions. Not the same domain as Clare: that's session-level history, while Clare is fleet-level project tracking. |
| @lilith/service-discovery + @lilith/service-registry (TS) | Push B: replace static peers TOML list with dynamic discovery. |
| @lilith/distributed-lock (TS) | Push B fallback if HLC last-write-wins proves insufficient for some sync scenario. |
| @lilith/circuit-breaker (TS) | Push B inter-Clare HTTP resilience. |
| @lilith/crypt (TS) | If we ever encrypt sync payloads or sensitive event bodies. |

These are all TypeScript; Clare being Python means we'd consume via HTTP (the service-registry pattern) rather than direct imports. The boundary is fine — Clare doesn't need anything from these in Push A.

Trade-offs accepted

  • No ORM. The schemas are simple and the SQL schema is canonical; raw SQL keeps it visible. Cost: more boilerplate; benefit: zero magic, no migration framework to fight.
  • HTMX over SPA. No build step, no frontend framework, server-rendered HTML. Cost: less interactivity; benefit: same author understands the whole stack.
  • Polling over websockets. Push A doesn't need sub-second latency; polling every 5 s is fine for a fleet of fewer than 50 sessions.
  • No auth in Push A. Bound to 127.0.0.1 by default. If you bind to 0.0.0.0, you accept tailnet-only trust.