
25 May 2026

Claude Code in production: lessons from the field

A candid field report on Claude Code in production: excellent on bounded execution and refactors, much weaker whenever context, priorities, or guardrails stay implicit.

Claude Code is not an engineer replacement. It is a very specific force multiplier.

I use Claude Code on real client work, not on toy repos designed to make the tool look smart. That matters, because what follows is a field report, not a product review. In the right setup, Claude Code is genuinely useful. In the wrong setup, it mostly creates the feeling of speed. Its value as an AI coding agent is not that it writes code in place of an engineer. Its value is that it can absorb a slice of concrete execution that normally slows down an engineer who already knows what the right move is.

That distinction matters for anyone building production AI systems. A lot of the conversation around AI-assisted development still asks the theatrical question: can the tool ship by itself? I think that is the wrong standard. The better question is: on which classes of work does it materially reduce the time between intent, implementation, and verification without adding stupid risk? That is the lens through which Claude Code becomes interesting to me, and it is also the lens that makes its limits obvious very quickly.

Where it is clearly strong: the middle of the work

Claude Code is not where I would start for a blank-sheet architecture decision or for a product tradeoff that is still mostly implicit. Where it does shine is what I think of as the middle of the work: following an error trace, understanding an established convention, applying the same pattern across multiple files, fixing a batch of type issues, updating tests, or preparing a localized refactor without getting lost. On those tasks, the AI coding agent is often faster than I am on the mechanics, as long as the repo is legible and the success criteria are clear.

Its most underrated quality is persistence. If I give it the right paths, constraints, verification commands, and non-goals, Claude Code can stay inside an execution loop quite reliably. It reads, proposes, edits, tests, and comes back with something I can audit. For me, the best return from AI-assisted development is not "write the whole feature from scratch." It is compressing the repetitive space between a decision that has already been made and a change that is actually ready to merge.

Where it breaks: ambiguity, bad context, and inflated judgment

The tradeoff is sharp. Claude Code breaks down when the useful context is not explicit. If the real rules live in a PM's head, in an old Slack thread, in contradictory tickets, or in a messy branch, it starts improvising. And when a system like this improvises, it does it with too much confidence. I have seen Claude Code cleanly patch the local symptom of a problem while missing the systemic cause. If the prompt only says "fix the build," it may choose the fastest way to silence the error instead of the right way to repair the system.
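One cheap guardrail against the "silence the error" outcome is to scan the agent's diff for suppression markers before reviewing it. The sketch below is illustrative only: the function name and the marker list are assumptions, not part of Claude Code, and the list should be adapted to the languages and linters a given repo actually uses.

```python
import re

# Hypothetical marker list: comments and decorators that often mean
# "the error was hidden" rather than "the cause was fixed".
SUPPRESSION_PATTERNS = [
    r"#\s*type:\s*ignore",
    r"#\s*noqa",
    r"@ts-ignore",
    r"@ts-expect-error",
    r"eslint-disable",
    r"@pytest\.mark\.skip",
]


def find_suppressions(diff_text: str) -> list[str]:
    """Return the added lines of a unified diff that contain a suppression marker."""
    hits = []
    for line in diff_text.splitlines():
        # Keep only added lines, and skip the "+++ b/<file>" header line.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if any(re.search(pattern, line) for pattern in SUPPRESSION_PATTERNS):
            hits.append(line[1:].strip())
    return hits


if __name__ == "__main__":
    sample_diff = (
        "+++ b/app.py\n"
        "+total = compute(order)  # type: ignore\n"
        "-total = compute(order)\n"
        "+print(total)\n"
    )
    for hit in find_suppressions(sample_diff):
        print("possible silenced error:", hit)
```

A hit is not proof of a bad fix, since some suppressions are legitimate. It is a prompt to ask whether the systemic cause was addressed.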

Another failure mode is bloated context. People often assume more files, more logs, and more history will automatically help. In practice, noisy context degrades prioritization fast. Claude Code gets worse when you dump the repo on it without a signal about what actually matters. In production AI work, I treat context as a scarce resource. I would rather hand it a compact dossier with the objective, source-of-truth files, invariants, validation commands, and stop rules. Less noise, more structure.
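Treating context as a scarce resource can be sketched as a small packing step: rank the candidate pieces and stop adding once a budget is spent. Everything here is an assumption for illustration, including the `ContextItem` type, the priority scheme, and the use of character count as a crude stand-in for tokens.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    label: str      # e.g. "objective", "invariants", "build log"
    text: str
    priority: int   # lower number = more important (assumed convention)


def pack_context(items: list[ContextItem], budget_chars: int) -> list[ContextItem]:
    """Pick items in priority order, skipping any that would exceed the budget."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        if used + len(item.text) > budget_chars:
            continue  # too large for the remaining budget; leave it out
        chosen.append(item)
        used += len(item.text)
    return chosen


if __name__ == "__main__":
    candidates = [
        ContextItem("full build log", "x" * 5000, priority=3),
        ContextItem("objective", "Make the checkout tests pass.", priority=0),
        ContextItem("invariants", "Public API signatures must not change.", priority=1),
    ]
    for item in pack_context(candidates, budget_chars=1000):
        print(item.label)
```

The point of the design is the ordering, not the numbers: the objective and invariants get in first, and bulky logs only make it if room is left.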

The prompts that work look more like execution tickets than conversations

The best prompts I write for Claude Code almost never look like open-ended conversation. They look like execution tickets. I include five things: the target outcome, where to look first, what must not break, how to verify success, and what it is not allowed to touch. That structure changes the quality of the result. It forces the human to clarify the task, and it gives the tool something operational rather than vaguely aspirational. When the task is larger, I split it into short loops: inspect, plan, implement one slice, verify, repeat.
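The five-part ticket can be written down as a small data structure that refuses to render until every part is filled in. This is a minimal sketch under assumptions: the class name, field names, and output layout are hypothetical, and the result is plain prompt text rather than any Claude Code specific format.

```python
from dataclasses import dataclass


@dataclass
class ExecutionTicket:
    outcome: str                # the target outcome
    start_here: list[str]       # where to look first
    must_not_break: list[str]   # invariants to preserve
    verify_with: list[str]      # commands or checks that prove success
    off_limits: list[str]       # what the agent is not allowed to touch

    def missing_fields(self) -> list[str]:
        """Names of fields left empty; an empty field means the task is underspecified."""
        return [name for name, value in vars(self).items() if not value]

    def render(self) -> str:
        """Render the ticket as prompt text, or raise if any part is missing."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"ticket is incomplete: {', '.join(missing)}")
        sections = [
            ("Target outcome", [self.outcome]),
            ("Look first at", self.start_here),
            ("Must not break", self.must_not_break),
            ("Verify with", self.verify_with),
            ("Do not touch", self.off_limits),
        ]
        lines: list[str] = []
        for title, entries in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {entry}" for entry in entries)
        return "\n".join(lines)


if __name__ == "__main__":
    # All paths and commands below are placeholders, not taken from a real repo.
    ticket = ExecutionTicket(
        outcome="Type errors in the billing module are resolved",
        start_here=["<path to billing module>"],
        must_not_break=["Existing public function signatures"],
        verify_with=["<type-check command>", "<test command>"],
        off_limits=["<path to generated code>"],
    )
    print(ticket.render())
```

Failing on an empty field is deliberate: it puts the burden of clarifying the task on the human before the tool ever sees it. For larger tasks, one ticket per slice fits the inspect, plan, implement, verify loop.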

What this changes in my AI engineer workflow

The deepest shift is not that I type less. It is that I frame work much more explicitly. I keep repo landmarks cleaner, I make acceptance criteria more concrete, and I turn more tribal knowledge into checklists. My AI engineer workflow now includes more context design and verification management than it used to. I would not hand Claude Code a critical architecture decision without supervision, but I do use it every week because it is already worth it on a large share of execution-heavy work.