An AI wrote most of it. Here is what kept it honest.

The last post made a claim I owe you the details on. It said the volume of code an AI can produce is the easy part, and that the real work, the part that takes judgment, is keeping all that volume honest. This post is that work.

Start with the uncomfortable thing about coding assistants. They are brilliant inside a single session and they forget all of it the moment the session ends. The tool I use is Claude Code, and it is the best I have worked with, and it will still happily redo a decision I settled last week and reintroduce a bug I fixed last month, because it remembers neither. Left to its own devices, that produces a codebase that drifts: the same problem solved three different ways, the same mistake repeated in three different files, the same explanation typed by me for the fourth time.

So the system around the assistant exists to fix exactly that. It has four working parts, the conventions it reads at the start of every session, a memory that outlives the session, skills that turn a lesson into a tool, and audits that check the work, and then a human gate at the very end. Here is each one, with the real artifacts.

The conventions it reads every session

Every session begins the same way. Before the assistant writes a line, it reads a file of rules for the project. Some of those rules are style. Some are hard rules that exist because something already went wrong once. The single most important one in AlgoForgeX reads roughly like this:

Blazor async discipline (hard rule)

Never block on async in code that can run on a Blazor Server circuit.
No .GetAwaiter().GetResult(), .Result, or .Wait() on a Task in any code
path reachable from a component lifecycle method or event handler. It
deadlocks the circuit, and every panel on the page renders blank.
This caused a real production outage.

That is one instruction and the reason for it. It is there because a single blocking call once took the whole site down, a story I am saving for the next post. The assistant cannot learn that lesson on its own, because it was not there and it would not remember if it had been. So the lesson lives in a file it is made to read every time. The conventions file is where hard won judgment gets turned into something the tool will actually act on, every session, without me having to remember to say it.

There are gentler rules in there too. The same file tells it to never let the product give financial advice, to keep exact model names out of what users see, and, because it is my pet peeve, to never write an em dash. The point is the same in every case: a rule I would otherwise have to repeat becomes a rule the tool reads on its own.

Memory, a past that survives

Conventions are the standing rules. Memory is everything else the project has learned: who I am and how I like to work, corrections from earlier sessions, the current state of work that spans many sessions, and the decisions I have made along with why. Each memory is a small file with a short structured header, so the assistant can scan a one line index and pull in only what is relevant to the task in front of it. One looks like this:

---
name: integration-tests-hit-real-db
description: Integration tests run against a real database, not mocks
type: feedback
---

Integration tests must hit a real database, not mocks. We were burned
once when a mocked test passed but the real migration failed in
production. A rule that comes out of a correction gets written down the
same day, so the next session starts already knowing it.

There is a subtlety here that took me a while to get right. The memory that lives on my desktop does not follow me to a session on my phone or a session running in the cloud. So the decisions that truly must survive everywhere do not live in that desktop memory at all. They live in a separate conventions document that travels with the repository, where any session on any machine will read it. The layering is not bureaucracy. Each layer reaches a place the others cannot.

Skills, turning a lesson into a tool

A convention is a sentence the assistant reads. A skill is a tool it runs. AlgoForgeX has around twenty of them, and they fall into three families.

The first family scaffolds. Ask for a new operation, a new page, a new database entity, and the skill produces it already shaped like the rest of the codebase, with the right safety checks attached, instead of me correcting generic output into the house style afterward. New code is born matching the conventions rather than being dragged toward them.

The second family audits. These exist for the mistakes the assistant makes again and again, the ones a written rule alone does not stop. There are skills that check every page still carries its disclaimer, that no exact model name leaks into the interface, that a database migration is safe to run, that the math behind a number is correct, and that the demo account still looks realistic. One of them has a header like this:

---
name: user-owned-defense-in-depth-audit
description: Every entity owned by a user must carry an explicit owner
  check on its mutating operations, not lean on the global filter alone.
  Run when a sprint wraps.
---

When a tool reliably forgets to add an ownership check to a new operation, the answer is not to type the reminder one more time. It is to write a routine that finds every place the check is missing and lists them. The mistake recurs, so the defense recurs with it.

The third family is the newest. It is a loop that turns a hands on walk through the running app into a structured list of findings, then into a plan, then into a handoff for the next session. It makes the path from "I noticed something is wrong" to "the fix is scoped" repeatable instead of improvised.

Audits, the AI writes but the review is structured

Here is the move that buys the most trust. The assistant both writes the code and helps review it, but the review is deliberately not the same loose pass that wrote it. When a chunk of work is done, I run a structured audit. Every number a user can see is traced back to the data and the code that produced it. Every page that changed is looked at on a desktop and on a phone. The demo account is checked, so a prospect never lands on stale numbers.

And the review is not one assistant glancing over everything. It is several of them, each handed one concern and nothing else, running at the same time. The deepest pass I have run used seven of them, all on Fable, one of Anthropic's models, and it surfaced seventy-seven findings in a single sweep. Fable was the sharpest reviewer I had put on this work, and it was taken away only days after it arrived. I am genuinely sorry it is gone. A routine audit at the end of a cycle runs about five. "The AI reviewed it" is worth nothing when it is the author grading its own homework. A handful of narrow, independent reviewers, each hunting for one kind of problem, is a completely different thing, and it is where a lot of the actual confidence comes from.

The human gate

None of this removes me. Every change goes through a pull request that I read before it ships. Nothing merges on its own, ever. New work lands on an integration branch first and reaches production only on a separate, deliberate step. Anything risky ships turned off behind a flag, so a feature that is not finished cannot reach a real person. The history is append only. I never let the tool rewrite a database migration that has already gone out to a live database.

These are not rules from a textbook. Each one is a scar. The rule that nothing merges on its own, and the habit of shipping features turned off, both come from specific incidents, which is the entire subject of the next post.

The system is the actual skill

AlgoForgeX outgrew the amount I can hold in my head a long time ago. What scales is not me remembering everything. It is the conventions remembering the rules, the memory remembering the decisions, the skills remembering the patterns, and the audits remembering what to check. My job is to decide what goes into each of those, and to read the diff at the end.

I want to be honest about the cost, because most writing about this is not. Building and maintaining this system is real, unglamorous work. The conventions drift out of date and have to be corrected. The memory grows noisy and has to be pruned. A skill that encodes the wrong pattern is worse than no skill at all. But it is far cheaper than the alternative, which is a fast assistant producing a fast mess. It is the only reason the pace I described in the last post did not simply bury me in plausible looking wrongness.

What to take from this

If there is one thing to take away, it is this. A coding assistant will not give itself a memory, and it will not honestly audit its own work. You have to build both. The specifics here are mine, and they are shaped like this particular product, but the shape is portable to anything you are building. Give the tool a past it can read. Give it reviewers that are not itself. Keep the final say.

Next in the series: the things that broke. The outage hiding behind that one line about blocking on async, the job that quietly corrupted production data, the silent failures the audits caught before a user did. The incidents that wrote half the rules in this post.