From Systems and Practice

Building with AI

The question I wanted to answer: why do some people get so much more out of coding with AI than others, on the same models? I build this way a lot, often against a clock. What I concluded is that prompt wording barely matters and procedure does, so I wrote the procedure down. This is the loop I run, in order.

The mindset under all of it: the AI is a fast junior who has read everything and remembers nothing about my problem. My job is to be the lead, not the typist.

Step 1 — Understand the spec (product)

Before any code. Specs are always underspecified; the model fills gaps with its defaults, not mine. So the first prompt makes it prove it understood the product and surface the holes, then stop.

Find the empty case, the error case, the huge input, what’s out of scope.
Every gap left open is a decision silently handed to the model. Cheapest to fix here.
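
One way to keep these gaps from staying implicit is to record each one as an explicit decision and treat anything unanswered as a blocker. A minimal sketch in Python; the `SpecGap` type, its field names, and the example entries are hypothetical illustrations, not part of any tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecGap:
    """One underspecified point in the spec (hypothetical structure)."""
    question: str                    # what the spec leaves open
    model_default: str               # what the model would assume if unasked
    decision: Optional[str] = None   # my answer; None means still open


def open_gaps(gaps):
    """Return the gaps that would be silently decided by the model."""
    return [gap for gap in gaps if gap.decision is None]


# Illustrative entries only; a real spec supplies its own.
gaps = [
    SpecGap("Empty input?", "return null", decision="return an empty list"),
    SpecGap("Largest realistic input?", "assume it fits in memory"),
    SpecGap("Malformed input?", "raise a generic exception"),
]

for gap in open_gaps(gaps):
    print(f"OPEN: {gap.question} (model default: {gap.model_default})")
```

Anything this prints is a decision the model would otherwise make on its own.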

The prompt I keep in Raycast (;spec):

Before writing any code, act as my product lead and make sure we actually understand what we are building. Do not propose a solution yet.

Do this:
1. Restate what we are building in 2-3 plain sentences, in your own words, so I can confirm you understood it.
2. Tell me who it is for and what "done" looks like concretely from the user's side.
3. List every place the spec is underspecified or ambiguous: the empty case, the error case, the largest realistic input, anything left implicit. For each, state the assumption you would make by default.
4. List what is explicitly OUT of scope so we don't gold-plate.
5. Ask me the 2-4 questions whose answers would most change the build.

Stop there and wait for my answers. Do not design or code yet.

Here is the spec:

Step 2 — Make the technical plan

Once the product is clear, get the architecture down before any implementation. This is where I make the design decisions, not the model. It plans, I adjust, then it types my plan.

The prompt (;plan):

Now that we understand the spec, make a plan to implement the product. No code yet.

Lay out how you'd build it, in a clear order I can follow. Then flag anything I should decide: technical risks, tradeoffs, or choices that need my call before you start.

Show me the plan and wait for my approval before writing any code.

Step 3 — Build it, one verifiable chunk at a time

Now the model types the plan from Step 2. The only call left is sizing each chunk.

One-shot if I can hold the whole result in my head and check it in one read.
Decompose the moment it’s bigger: build a layer, verify, next layer, checkpoint at every seam.
Unit = what I can review at once, not what the model can generate.
(I have failed both ways: split what should’ve been one prompt and wasted time gluing the pieces together; handed over a blob too big to verify and couldn’t find the bug.)

Step 4 — Run the verify-and-redirect loop (the core)

For every chunk it returns:

1. Read it for correctness and for whether it is what I asked for, not for style.
2. Run it. Plausible is not correct; a bug missed here surfaces three steps downstream at 10x the cost.
3. Accept, or redirect precisely: name the exact problem and the exact fix. (“Empty case returns null, spec wants an empty list, change that, leave the rest.”)
4. Repeat.

Vague redirects (“that’s wrong, fix it”) make it guess again and land on a different wrong answer. Speed here comes from turning the loop quickly, not from polishing the prompt.
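
The run-then-redirect step can be made fairly mechanical: run the chunk against the cases the spec settled, and turn each failure into a redirect that names the exact problem and the exact fix. A sketch in Python, assuming a hypothetical `find_matches` function that stands in for whatever chunk the model returned, with the empty-case bug from the example above left in on purpose. The `verify` helper is also hypothetical.

```python
def find_matches(items, query):
    """Stand-in for a model-written chunk (hypothetical); has the empty-case bug."""
    if not items:
        return None  # bug: the spec wants an empty list here
    return [item for item in items if query in item]


def verify(func, cases):
    """Run func on each (args, expected) case; return precise redirect messages."""
    redirects = []
    for args, expected in cases:
        actual = func(*args)
        if actual != expected:
            redirects.append(
                f"{func.__name__}{args!r} returns {actual!r}, "
                f"spec wants {expected!r}. Change that, leave the rest."
            )
    return redirects


# The cases come from the decisions made in Step 1.
cases = [
    ((["apple", "grape"], "ap"), ["apple", "grape"]),
    (([], "ap"), []),
]

for message in verify(find_matches, cases):
    print(message)
```

An empty result means accept; otherwise each message is already specific enough to paste back as the redirect.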

Step 5 — Clean the context when answers drift

If quality drops mid-build, the context got polluted; the model did not get dumber.

Restate where things actually stand.
Drop the dead ends and abandoned approaches.
Point it only at the files that matter now.
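
Taken together, these three moves replace a long, noisy history with a short, accurate summary. A sketch of a helper that assembles such a reset message; the function name, its parameters, and the example notes and file names are all hypothetical.

```python
def build_reset_message(notes, files):
    """Build a fresh-context message.

    notes: list of (text, is_dead_end) pairs describing the work so far.
    files: paths the model should look at now.
    """
    # Dead ends are dropped entirely rather than summarized.
    live = [text for text, is_dead_end in notes if not is_dead_end]
    lines = ["Where things stand:"]
    lines += [f"- {text}" for text in live]
    lines.append("Only these files matter now:")
    lines += [f"- {path}" for path in files]
    return "\n".join(lines)


notes = [
    ("Parser passes every agreed case", False),
    ("Tried a streaming exporter; abandoned", True),  # dead end, dropped
    ("Exporter fails on empty input", False),
]
print(build_reset_message(notes, ["exporter.py", "test_exporter.py"]))
```

The output is meant to open a fresh conversation, so nothing from the abandoned approach carries over.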

Step 6 — Narrate decisions the whole way

Out loud, even alone: why this order, why one-shot, why this is out of scope.

The narration is the thinking, and the thinking is the only part that’s mine. The code is a commodity; the call about what to build and how I know it’s right is not.

Step 7 — The cleanup pass

Once it works, it’s not done. The verify loop optimizes for correct, not clean, so the first working version is usually overbuilt: duplicated logic, dead branches, names that lie, abstractions added for cases that never come. I run one refinement pass at the end against a fixed bar of what good code looks like.

What I’m asking it to find:

- duplication that should be one function
- dead code, unused vars, leftover scaffolding from abandoned approaches
- names that don’t match what the thing does
- premature abstraction (a layer for one caller) and its opposite (a 60-line function that’s three functions)
- error/empty/edge cases the happy path skipped
- comments that restate the code instead of explaining the why
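
To make two of these concrete, here is a small before-and-after sketch in Python covering duplication and a comment that restates the code. The function names and the email-matching scenario are invented for illustration.

```python
# Before: the same normalization appears twice, and the comment restates the code.
def find_email_before(emails, wanted):
    # strip and lowercase the email
    target = wanted.strip().lower()
    return [e for e in emails if e.strip().lower() == target]


# After: one function owns the logic, and the comment explains why it exists.
def normalize_email(email):
    # Matching must ignore case and stray whitespace
    # (an assumed requirement for this sketch).
    return email.strip().lower()


def find_email_after(emails, wanted):
    target = normalize_email(wanted)
    return [e for e in emails if normalize_email(e) == target]
```

Behavior is unchanged between the two versions, which is the point of the pass: no new features, only less to maintain.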

This is the prompt I keep in Raycast and fire at the end:

Review the code you just wrote as a senior engineer doing a final cleanup pass before merge. Do not add features. Judge it against what good code actually looks like:

- single responsibility, no duplication (same logic in two places = extract it)
- no dead code, unused variables, or leftover scaffolding from earlier attempts
- names that say exactly what the thing is and does
- the right level of abstraction: no layer that has one caller, no function doing three jobs
- every edge case handled (empty, error, large, null), not just the happy path
- comments are load-bearing and explain WHY, never restate WHAT the code already says
- consistent with the conventions already in this file/repo

For each issue: name the file and line, say what's wrong in one sentence, and show the fix. List them worst-first. If something is genuinely fine, don't invent problems. Then ask me which to apply before changing anything.

The last line matters: it reviews first and waits, so I stay the lead on the cleanup too instead of letting it rewrite half the file unprompted.

The one rule under all of it

Stay the lead. Speed comes from judgment, not typing, and judgment is the one thing the model lacks. It’s faster than me at producing code and worse at knowing which code to produce. Getting good at this was never about prompting.