General Staff
Humans and AI should be perfectly specialized: the human does only what the AI cannot; the AI (the General Staff) does everything it can.
Design
The naive build
The most naive instantiation: a scheduled loop + a harness with a set of skills + a memory database.
- Scheduled loop — the clock. Ticks on an interval; this is what makes it “alive” rather than invoked.
- Harness with skills — the Staff’s hands. A model loop with a set of tools/skills it can call.
- Memory database — what persists across ticks; the standing knowledge the loop reads at the top and writes at the bottom.
Two clocks
Liveness and the session are different clocks. Conflating them is the bloat trap.
- Hot loop (fast). A perpetual scheduler re-fires mortal ticks. Each tick boots fresh, reads the relevant slice of memory + current doctrine, does one bounded unit of work, writes results/feedback back, and dies. No context survives between ticks, so context rot cannot accumulate from one tick to the next. The system is perpetual; every process in it is mortal. Liveness lives in the scheduler; continuity lives in memory, not in a context window.
- Cold loop (slow). Off the hot path, keeper-agents maintain the store: triage, dedupe, prune stale facts, fold feedback in — truer not longer. (Reference implementation: this repo’s nightly heal.)
Fast loop acts; slow loop learns.
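As a rough illustration of one cold-loop pass, the sketch below dedupes a store file in place. It assumes the store is a plain text file with one fact per line; the function name `keeper_dedupe` is hypothetical, and this is not the repo's nightly heal, only the smallest "truer, not longer" step.

```shell
#!/usr/bin/env bash
# Hypothetical keeper pass: drop exact duplicate lines from a store file,
# keeping the first occurrence and the original order. Blank lines are kept.
keeper_dedupe() {
  local store="$1" tmp
  tmp="$(mktemp)" || return 1
  # awk prints a non-blank line only the first time it is seen.
  if awk 'NF == 0 || !seen[$0]++' "$store" > "$tmp"; then
    # Overwrite in place so the store keeps its path and permissions.
    cat "$tmp" > "$store"
  fi
  rm -f "$tmp"
}
```

A keeper like this runs off the hot path (for example nightly), so a tick never pays for maintenance.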
Orchestrator + ephemeral workers
One perpetual orchestrator; recursion for parallelism, not for liveness. The orchestrator spawns ephemeral worker instances per task (work a PR, review another, scan chat), collects results, writes back. Workers are disposable; the orchestrator is the single owner of memory + verification. Avoid N peer daemons — conflicting writes, compounding errors, a swarm with no commander.
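The single-writer shape described above can be sketched as follows. `run_worker` is a hypothetical stand-in (a real worker would invoke the harness on its task), and `orchestrate` is an assumed name; the point is only that workers write to private scratch files and the orchestrator alone writes the collected results.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for one ephemeral worker. A real worker would
# invoke the harness on the task; this one only echoes a result line.
run_worker() {
  local task="$1"
  echo "result for: $task"
}

# Fan out one background worker per task, wait for all of them, then let
# the orchestrator alone append the collected results to the given file.
orchestrate() {
  local results_file="$1"; shift
  local scratch task i=0 n=1
  scratch="$(mktemp -d)" || return 1
  for task in "$@"; do
    i=$((i + 1))
    # Each worker writes only to its own scratch file, never to memory.
    run_worker "$task" > "$scratch/$i.out" &
  done
  wait  # every worker has exited once this returns
  # Single writer: results are appended in task order by the orchestrator.
  while [ "$n" -le "$i" ]; do
    cat "$scratch/$n.out" >> "$results_file"
    n=$((n + 1))
  done
  rm -rf "$scratch"
}
```

For example, `orchestrate results.txt "work a PR" "review another" "scan chat"` runs three workers in parallel and leaves three result lines in `results.txt`, written by one process.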
Active memory: status-indexed, not time-indexed
A perpetual loop doesn’t live in days, it lives in task state. A daily log is time-indexed — so “what’s still open” smears across many entries, decays, and contradicts, and every tick has to re-derive open work from history. The fix: index by status, not time.
The principle: don’t cache the world, re-read it. The orchestrator re-reads reality (Slack, GitHub) every tick rather than trusting a stale snapshot. Persist only the task list and the done-record. Memory shrinks to its true minimum:
- todo.md: what’s not done. A task lives here, verbatim, until it’s done. Freshness-independent: an open task cannot fall through a crack, because “not done” is its storage location. The task block carries its own context.
- done.md: the archive of completed tasks. The only real memory needed, so the orchestrator doesn’t redo finished work.
- tone.md: stable, hand-set voice for when the Staff speaks as the principal (PR comments, status, chat). The alignment layer for speech.
The loop collapses to: read todo.md → scan Slack/GitHub for new tasks + now-completable ones → do/assign one → move finished tasks to done.md → exit.
Two guardrails this model lives or dies on:
- Dedup on add. Each tick re-sees tasks it already captured. The add-step must check: already in todo or done? If so, skip. (Truer, not longer.)
- The world decides “done,” not the orchestrator. A task moves to done.md only when reality confirms it (PR merged, message sent), re-checked each tick, never when the orchestrator merely believes it acted. Keeps “done” honest and re-derivable.
(Doctrine, if used, regenerates from todo.md — the brief is just a prioritized view of open tasks — not from logs.)
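The two guardrails above can be sketched in a few lines. This assumes a simplification the source does not make: one task per line rather than a multi-line task block. `add_task`, `settle_task`, and `MEMORY_DIR` are hypothetical names, and the "world check" is whatever command the caller supplies (for instance a script that asks GitHub whether a PR merged).

```shell
#!/usr/bin/env bash
# Assumption: one task per line; MEMORY_DIR holds todo.md and done.md.
MEMORY_DIR="${MEMORY_DIR:-memory}"

# Dedup on add: append a task only if it is in neither todo.md nor done.md.
add_task() {
  local task="$1"
  mkdir -p "$MEMORY_DIR"
  touch "$MEMORY_DIR/todo.md" "$MEMORY_DIR/done.md"
  # -F fixed string, -x whole-line match, -q quiet.
  if grep -Fxq -- "$task" "$MEMORY_DIR/todo.md" "$MEMORY_DIR/done.md"; then
    return 0  # already captured; skip
  fi
  printf '%s\n' "$task" >> "$MEMORY_DIR/todo.md"
}

# The world decides "done": move a task only when the supplied check
# command succeeds. Usage: settle_task "<task>" <check command...>
settle_task() {
  local task="$1" tmp
  shift
  [ "$#" -gt 0 ] || return 1  # refuse to settle without a world check
  grep -Fxq -- "$task" "$MEMORY_DIR/todo.md" || return 0  # not open
  "$@" || return 0  # reality has not confirmed it; leave the task open
  tmp="$(mktemp)" || return 1
  # Rewrite todo.md without the task (grep exits 1 if nothing remains).
  grep -Fxv -- "$task" "$MEMORY_DIR/todo.md" > "$tmp" || true
  cat "$tmp" > "$MEMORY_DIR/todo.md"
  rm -f "$tmp"
  grep -Fxq -- "$task" "$MEMORY_DIR/done.md" \
    || printf '%s\n' "$task" >> "$MEMORY_DIR/done.md"
}
```

Because the check is re-run every time `settle_task` is called, a task the orchestrator merely believes it finished stays in todo.md until the check passes.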
v1 spec (the dumb one)
Two things are free: time (it can run forever) and tokens (not my money). So v1 is deliberately wasteful — no event triggers, no budgeting, no smart scheduling. Brute-force ticking proves the skeleton. Optimize nothing.
Goal: one mortal tick, fired on a dumb interval, that reads memory → does one bounded thing → writes memory → exits. Nothing else.
Components (memory folder + 2 scripts):
- memory/: plain markdown files. The whole database. (No DB, no embeddings. Files.)
  - todo.md: open tasks, verbatim, until done.
  - done.md: completed tasks (so it doesn’t redo them).
  - tone.md: hand-set voice. Stable. You write this once.
- tick.sh (hot loop, one tick) does exactly:
  - cat memory/todo.md memory/tone.md into a prompt.
  - Invoke PI (the harness) with that prompt + the skill set, told to: scan Slack/GitHub for new tasks (dedup vs todo+done before adding) and now-completable ones; do/assign ONE; move anything the world confirms finished from todo.md to done.md.
  - Exit. The process dies. No state held between ticks; the world + the two files are the only state.
- clock: the perpetual scheduler. v1 = the dumbest thing that re-fires:
      while true; do ./tick.sh >> clock.log 2>&1; sleep 60; done
  (Upgrade to launchd/cron once the tick works. Don’t start there.)
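A minimal sketch of the tick.sh steps listed above might look like this. The source does not specify how PI is invoked, so `HARNESS_CMD` is a placeholder for any command that reads a prompt on stdin; `build_prompt`, `tick`, and `MEMORY_DIR` are likewise hypothetical names.

```shell
#!/usr/bin/env bash
# tick.sh sketch: one mortal tick. Assumption: HARNESS_CMD names a command
# that reads the prompt on stdin; PI's real interface is not given here.
MEMORY_DIR="${MEMORY_DIR:-memory}"

# Assemble the prompt: open tasks, the voice file, then the instructions.
build_prompt() {
  cat "$MEMORY_DIR/todo.md" "$MEMORY_DIR/tone.md"
  cat <<'EOF'

Scan Slack/GitHub for new tasks (dedup against todo.md and done.md before
adding) and for now-completable ones. Do or assign ONE task. Move anything
the world confirms finished from todo.md to done.md. Then stop.
EOF
}

tick() {
  if [ -z "${HARNESS_CMD:-}" ]; then
    echo "HARNESS_CMD is not set" >&2
    return 1
  fi
  mkdir -p "$MEMORY_DIR"
  touch "$MEMORY_DIR/todo.md" "$MEMORY_DIR/done.md" "$MEMORY_DIR/tone.md"
  # HARNESS_CMD is left unquoted on purpose so it can carry arguments.
  build_prompt | $HARNESS_CMD
}

# Saved as tick.sh, the file would end with a bare call to: tick
```

Nothing is held in shell variables between runs; each invocation rebuilds the prompt from the files and exits, which is what keeps the tick mortal.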
Explicitly NOT in v1: ephemeral worker fan-out, event triggers, token budgets, verification gates, doctrine regeneration, safety on unattended writes. All strict upgrades after one tick runs end-to-end.
Done = the clock runs, ticks fire on the interval, and across ticks todo.md gains real tasks scanned from Slack/GitHub and done.md gains tasks the world confirms finished — all without you touching it. That’s proof of life. Everything else is iteration.