Natalie c1e6f7dbe5 feat: initial Clare scaffold — project manager for the Claude agent fleet

Push A (single-machine):
- HLC + event-sourced SQLite (events table is source of truth, projections rebuildable)
- Pydantic v2 domain models (Project, Task, Assignment, Session, Group, Update)
- rclaude subprocess wrapper (local_sessions via _claude-projects --sessions)
- Typer CLI: init, project, task, assign, pull, status, broadcast, serve, sync
- FastAPI + Jinja2 + HTMX dashboard
- 26 unit tests passing

Push B (HTTP API + sync substrate):
- /api/v1/* JSON routes (projects, tasks, assignments, sessions, status, broadcast, sync)
- CLI refactored as thin httpx client over the API — single business-logic codepath
- web/service.py: every business op defined once; HTML routes + API routes both call into it
- sync.py: peer-to-peer sync via /api/v1/sync/events with HLC + uuid-based dedup
- 32 tests passing including two-Clare convergence test

Push C (cross-host deployment):
- apricot install via uv (Python 3.12.12)
- systemd --user unit for clare-serve on apricot
- Cross-host sync demoed plum (10.9.0.3) ↔ apricot (10.9.0.2) over wg
- .local → .lan rename for forge URLs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 02:20:23 -07:00


Clare — design

Why Clare exists

rclaude enumerates and addresses live claude tmux sessions across hosts, sends keystrokes, and runs a Haiku-powered triage. What it does not do:

Track work as projects and tasks rather than sessions and panes
Bind a task to a specific session and remember the binding across restarts
Roll up "what's the state of the fleet right now" into a single dashboard
Persist a history of progress, decisions, and broadcasts

Clare is the project-management layer above rclaude.

Domain model

Concept	Identity	Notable fields
Project	uuid	name (unique), goal, owner, status (active / paused / done)
Task	uuid	project_id, title, description, status (todo / in_progress / blocked / done), priority (0–4)
Assignment	uuid	task_id, session_uuid, created_hlc, active flag
Session	uuid (claude session uuid)	host, cwd, tmux_name, last_seen_mtime, last_triage
Group	name	pattern (cwd substring / host / session-name)
Update	uuid	assignment_id, source (triage / message / pane-tail), payload, hlc

All ids are stable UUIDs (uuid4) generated by Clare; the only externally-derived id is Session.uuid, which mirrors claude's own session uuid from ~/.claude/projects/<slug>/<uuid>.jsonl.
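As a rough illustration of two rows from the table above, here is a minimal sketch. It uses stdlib dataclasses only to stay dependency-free, and is not the project's actual model definitions. The default priority of 2, the `new_id` helper, and the `id` field name are assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(str, Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    BLOCKED = "blocked"
    DONE = "done"


def new_id() -> str:
    """Hypothetical helper: a uuid4 string for Clare-generated ids."""
    return str(uuid.uuid4())


@dataclass
class Task:
    project_id: str
    title: str
    description: str = ""
    status: TaskStatus = TaskStatus.TODO
    priority: int = 2  # assumed default; the table only gives the range 0-4
    id: str = field(default_factory=new_id)

    def __post_init__(self) -> None:
        # Enforce the documented priority range.
        if not 0 <= self.priority <= 4:
            raise ValueError(f"priority must be in 0..4, got {self.priority}")


@dataclass
class Assignment:
    task_id: str
    session_uuid: str  # mirrors claude's own session uuid; never generated here
    created_hlc: str
    active: bool = True
    id: str = field(default_factory=new_id)
```

Only `Assignment.session_uuid` comes from outside; every other id is minted locally, which matches the rule stated above.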

Event sourcing

The events table is append-only:

CREATE TABLE events (
    rowid       INTEGER PRIMARY KEY,
    uuid        TEXT NOT NULL UNIQUE,        -- event id (uuid4)
    hlc         TEXT NOT NULL,                -- 'wallms.counter.machineid' for sortability
    machine_id  TEXT NOT NULL,
    event_type  TEXT NOT NULL,                -- e.g. 'project_created'
    payload     TEXT NOT NULL,                -- JSON
    created_at  TEXT NOT NULL                 -- wall-clock for humans only
);
CREATE INDEX events_hlc ON events(hlc);

Projections (projects, tasks, ...) are derived tables. apply_event(conn, event) updates them; replay_events(conn) rebuilds them from scratch (used for tests and recovery).
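A minimal sketch of how `apply_event` and `replay_events` could fit together, using only the stdlib `sqlite3` module. The `projects` projection columns and the `project_status_changed` event type are assumptions for illustration; only `project_created` appears in the schema comment above.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE events (
    rowid       INTEGER PRIMARY KEY,
    uuid        TEXT NOT NULL UNIQUE,
    hlc         TEXT NOT NULL,
    machine_id  TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    payload     TEXT NOT NULL,
    created_at  TEXT NOT NULL
);
-- Assumed projection shape; the real table likely has more columns.
CREATE TABLE projects (
    id     TEXT PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE,
    status TEXT NOT NULL
);
"""


def apply_event(conn: sqlite3.Connection, event: dict) -> None:
    """Fold one event into the projections; unknown event types are ignored."""
    payload = json.loads(event["payload"])
    if event["event_type"] == "project_created":
        conn.execute(
            "INSERT OR REPLACE INTO projects (id, name, status) VALUES (?, ?, ?)",
            (payload["id"], payload["name"], payload.get("status", "active")),
        )
    elif event["event_type"] == "project_status_changed":  # hypothetical type
        conn.execute(
            "UPDATE projects SET status = ? WHERE id = ?",
            (payload["status"], payload["id"]),
        )


def replay_events(conn: sqlite3.Connection) -> None:
    """Clear the projection and rebuild it from the event log in HLC order."""
    conn.execute("DELETE FROM projects")
    rows = conn.execute(
        "SELECT event_type, payload FROM events ORDER BY hlc, uuid"
    ).fetchall()
    for event_type, payload in rows:
        apply_event(conn, {"event_type": event_type, "payload": payload})
    conn.commit()
```

Because the projection is wiped before the replay, calling `replay_events` twice yields the same rows, which is what makes it usable for recovery.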

Why event-sourced?

Future sync without rewriting state. Push B adds GET /api/sync/events?since=<hlc> between peers; conflict resolution is "merge events, replay projections" — already correct by construction.
History. Every project / task / assignment change is auditable. clare project show <id> --history becomes a one-line query.
HLC stability. Wall-clock skew between machines won't reorder events; HLC ordering is deterministic.

HLC encoding

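{wall_ms}.{counter:06d}.{machine_id}: sorts correctly as a string because both numeric parts are fixed-width. wall_ms stays at 13 digits until the year 2286, and the counter is zero-padded to six digits, so it must stay below 1,000,000 per millisecond. Example: 1716253199000.000001.7f9a3c2b-1a4d-4e7f-9c2b-3d8a1e4f6c5b.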
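The encoding can be produced by a small clock object. The sketch below follows the usual hybrid-logical-clock update rules and is not taken from Clare's source. The class name, the `tick` / `observe` method names, and a counter that starts at 0 are all assumptions.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class HLC:
    machine_id: str
    wall_ms: int = 0
    counter: int = 0

    def encode(self) -> str:
        # Fixed-width counter keeps string order equal to (wall_ms, counter) order.
        return f"{self.wall_ms}.{self.counter:06d}.{self.machine_id}"

    def tick(self, now_ms: Optional[int] = None) -> str:
        """Stamp a local event; never moves backwards if the wall clock does."""
        now = int(time.time() * 1000) if now_ms is None else now_ms
        if now > self.wall_ms:
            self.wall_ms, self.counter = now, 0
        else:
            self.counter += 1
        return self.encode()

    def observe(self, remote: str, now_ms: Optional[int] = None) -> str:
        """Merge a remote stamp so that later local stamps sort after it."""
        r_wall_s, r_counter_s, _ = remote.split(".", 2)
        r_wall, r_counter = int(r_wall_s), int(r_counter_s)
        now = int(time.time() * 1000) if now_ms is None else now_ms
        new_wall = max(self.wall_ms, r_wall, now)
        if new_wall == self.wall_ms and new_wall == r_wall:
            self.counter = max(self.counter, r_counter) + 1
        elif new_wall == self.wall_ms:
            self.counter += 1
        elif new_wall == r_wall:
            self.counter = r_counter + 1
        else:
            self.counter = 0
        self.wall_ms = new_wall
        return self.encode()
```

Passing `now_ms` explicitly is only there to make the behaviour reproducible in tests; normal callers would let it default to the system clock.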

CLI surface (Push A)

clare init                                        First-run: create DB, generate machine_id.
clare project new <name> [--goal ...] [--owner ...]
clare project list [--status active|paused|done]
clare project show <name-or-id>
clare task add <project> <title> [--prio N] [--desc ...]
clare task list [--project <p>] [--status ...]
clare task show <task-id>
clare task done <task-id>
clare assign <task-id> <session-uuid|--group <g>>
clare status [--project <p> | --group <g>]
clare pull                                        Refresh fleet view from rclaude.
clare broadcast <project|group> --yes -- <text>
clare web [--host 127.0.0.1] [--port 8765]
clare sync                                        Deferred to Push B; exits with a "deferred" error in Push A.

Web (Push A)

FastAPI + Jinja2 + HTMX. Routes:

GET / — dashboard (per-project task counts, per-session current task)
GET /projects — list
GET /projects/{id} — task table + assignments + recent updates
GET /sessions — fleet view
GET /broadcast — composer form
POST /broadcast — invokes rclaude send --yes, emits BroadcastSent

5-second polling refresh via HTMX hx-trigger="every 5s". No websockets.

Push B additions

clare.sync: pull_from_peer(url, since_hlc) + push_to_peer(url, since_hlc) via httpx
Web routes /api/sync/events GET (with ?since=) + POST
clare.toml peer list activated
Integration test: two in-process Clares exchange events and converge to the same state
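The merge step behind the sync routes can be sketched without any HTTP at all. In the example below, `events_since` stands in for what the GET handler might return and `merge_events` for what a POST handler or `pull_from_peer` might do with the result. Both function names are hypothetical, and the httpx transport is deliberately left out.

```python
import sqlite3

EVENTS_DDL = """
CREATE TABLE events (
    rowid       INTEGER PRIMARY KEY,
    uuid        TEXT NOT NULL UNIQUE,
    hlc         TEXT NOT NULL,
    machine_id  TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    payload     TEXT NOT NULL,
    created_at  TEXT NOT NULL
);
"""
COLUMNS = ("uuid", "hlc", "machine_id", "event_type", "payload", "created_at")


def events_since(conn, since_hlc=""):
    """Return events whose hlc sorts strictly after since_hlc, oldest first."""
    rows = conn.execute(
        "SELECT uuid, hlc, machine_id, event_type, payload, created_at "
        "FROM events WHERE hlc > ? ORDER BY hlc, uuid",
        (since_hlc,),
    ).fetchall()
    return [dict(zip(COLUMNS, row)) for row in rows]


def merge_events(conn, events):
    """Insert unseen events; the UNIQUE uuid makes re-delivery a no-op."""
    inserted = 0
    for ev in events:
        cur = conn.execute(
            "INSERT OR IGNORE INTO events "
            "(uuid, hlc, machine_id, event_type, payload, created_at) "
            "VALUES (:uuid, :hlc, :machine_id, :event_type, :payload, :created_at)",
            ev,
        )
        inserted += cur.rowcount  # 1 if stored, 0 if the uuid already existed
    conn.commit()
    return inserted
```

After a merge that inserts anything, the receiving side would replay its projections. Since both peers end up with the same event set and the same HLC ordering, they rebuild the same projection rows.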

Ecosystem adjacencies

From apricot:~/Code/@packages/MANIFEST.md (184 TS + 35 Py packages). Notable adjacent packages and Clare's relationship to each:

Package	Relationship
`@lilith/claude-continue` (TS)	Conceptual overlap with `rclaude` — a tmux wrapper for Claude with crash recovery. Clare sits above both; we don't reimplement what either does.
`@lilith/mcp-session-analyzer` (TS)	MCP server for ML-analyzing Claude transcripts. Possible alternative or augmentation to `_claude-triage` as the priority signal source. Worth evaluating before Push B.
`@lilith/mcp-task-persistence` (TS)	Already running in the harness — persists user prompts across Claude sessions. Not the same domain as Clare — that's session-level history; Clare is fleet-level project tracking.
`@lilith/service-discovery` + `@lilith/service-registry` (TS)	Push B: replace static `peers` TOML list with dynamic discovery.
`@lilith/distributed-lock` (TS)	Push B fallback if HLC last-write-wins proves insufficient for some sync scenario.
`@lilith/circuit-breaker` (TS)	Push B inter-Clare HTTP resilience.
`@lilith/crypt` (TS)	If we ever encrypt sync payloads or sensitive event bodies.

These are all TypeScript; Clare being Python means we'd consume via HTTP (the service-registry pattern) rather than direct imports. The boundary is fine — Clare doesn't need anything from these in Push A.

Trade-offs accepted

No ORM. The schemas are simple and the SQL schema is canonical; raw SQL keeps it visible. Cost: more boilerplate; benefit: zero magic, no migration framework to fight.
HTMX over SPA. No build step, no frontend framework, server-rendered HTML. Cost: less interactivity; benefit: same author understands the whole stack.
Polling over websockets. Push A doesn't need sub-second latency. Polling every 5 s is fine for a fleet of fewer than 50 sessions.
No auth in Push A. Bound to 127.0.0.1 by default. If you bind to 0.0.0.0, you accept tailnet-only trust.
