Fast feedback works very well for Claude Code harnesses.
Opus 4.6 is very good at hypothesising about how to do something, but quickly conflates its hypothesis with fact and overcommits to the (possibly) wrong approach.
If you provide DX that lets it fail quickly, accumulate context and hypothesise again, it can solve outsized problems.
I’m interested in less planning and more *steering* using fast feedback.
Author: Eli Mydlarz
At Thoughtworks I participated in and ran an exercise in which two teams do civil engineering projects with dry spaghetti and putty. Team Big Batch would build their tower (or whatever) in one hour, while Team Small Batch would build theirs in 20 minutes, then get to do it again, and then again. Guess who wins?
My software engineering career benefitted a lot from this principle. I learned fast by pairing with great engineers, but also by trying and failing in DX that provided very fast feedback. In our strongest teams, we put great care into building and improving our feedback loops. As an organisation, we built our own tools to do it.
Now I’m seeing the principle in action with coding agents. With a harness (that’s what we call DX for coding agents, I think) that provides fast feedback, coding agents can try, fail, and accumulate context until they succeed. Sometimes they get lucky, but typically success comes because somewhere in that accumulated context there are answers – or even insights. What should we do with them? Note that you and I have benefitted from our insights over decades.
One project ended just as our team was reaching an incredible velocity. Everything had come together and we were absolutely smashing it, then we had to go home. What an accumulation of context we must have had when we threw away all our leverage – but that was business. We shouldn’t do it that way again with agents.
Since I started posting about how token speed is the easiest optimisation and maintains developer engagement (if that’s a thing you still want!), OpenAI has started offering Codex at 1200 tokens/sec and Anthropic has added a /fast mode to Claude where you pay extra to run Opus faster.
I’m really interested in AFK dev and working on things in that space at the moment, and the effort going into autonomy is still very important. But I do wonder how we would feel about the effort going into parallelisation and overnight runs if we had, say, 2000 tokens/sec inference with our favourite models.
Another typical coding agent problem is adding and adding and adding until the problem is solved, then walking away without really understanding what solved the issue, or building that understanding into the codebase. A good question to ask, or incorporate into your instructions and rules as appropriate for your agent:
“Can this be achieved by subtraction or simplification, rather than addition?”
Codex and Claude are way too defensive. I think this is a good time to talk about defensive programming.
Say I believe some scenario is impossible, and if it does happen there will be an error – a console error, a request failure, something noisy – but life will go on.
This is actually good. I am probably not wrong, so there is no reason to complicate my code. If I am wrong, great! Through failing fast (and good observability) I will discover my wrongness and we will all be better off for it. The effects of being wrong in software are cumulative and sometimes fatal, so we want to uncover wrongness early.
Building a good understanding of how data actually flows through your system is important. You should not just guess. You also should not defend against everything by default. You should actually check, be confident that you know, and be eager to discover that you are wrong.
The worst thing is taking a wild guess at how some unexpected edge case should be handled, when you really have no idea why it would have happened, or what the downstream implications of your handling will be. It is routine for coding agents to mishandle, downgrade to warnings, or just completely swallow errors that reveal critical misunderstandings and concomitant design problems in your (their?) software.
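To make the contrast concrete, here is a minimal TypeScript sketch. The `Order` type, its status values, and the `shippingLabelDefensive`, `shippingLabel` and `unreachable` functions are all hypothetical, invented for illustration. The first version is the kind of defensive handling agents tend to write: it downgrades a state that "can't happen" to a warning and guesses a fallback. The second treats that state as a bug and fails loudly, so the misunderstanding surfaces immediately.

```typescript
// Hypothetical domain type: the statuses we believe are the only possible ones.
type OrderStatus = "paid" | "shipped";

interface Order {
  id: string;
  status: OrderStatus;
}

// Defensive version: downgrades the surprise to a warning and guesses a fallback.
function shippingLabelDefensive(order: Order): string {
  if (order.status === "paid") return `READY ${order.id}`;
  if (order.status === "shipped") return `SENT ${order.id}`;
  // "Just in case": the guessed behaviour hides a misunderstanding about our data.
  console.warn(`Unexpected status on order ${order.id}, treating as paid`);
  return `READY ${order.id}`;
}

// Fail-fast helper: reaching this means our model of the data is wrong, so make it loud.
function unreachable(value: never, context: string): never {
  throw new Error(`Impossible state in ${context}: ${JSON.stringify(value)}`);
}

// Fail-fast version: handles exactly what we believe can happen, nothing else.
function shippingLabel(order: Order): string {
  switch (order.status) {
    case "paid":
      return `READY ${order.id}`;
    case "shipped":
      return `SENT ${order.id}`;
    default:
      // The compiler proves this is unreachable; if bad data arrives at runtime anyway, we throw.
      return unreachable(order.status, `shippingLabel(${order.id})`);
  }
}
```

Given an order with an unexpected status such as `"refunded"`, the defensive version quietly returns a plausible-looking label, while the fail-fast version throws an error you will actually see (and, with good observability, notice in production).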
Coding agents love defensive programming. There could be many reasons for this, but two come to mind:
– They just don’t want it to crash, like the early JS mentality.
– They don’t want to “miss an edge case”, perhaps reflective of a lot of training data produced by people who didn’t want to “miss an edge case”.
When you vibe code (not agentically engineer, or whatever we’re calling it) and everything looks amazing, how much of the implementation is just failing quietly because of defensive programming? Perhaps it helps explain the early-euphoria-hard-crash we saw many vibe coders go through.
Instruct your coding agents to fail fast and loud.
Tokens/sec is the easiest optimisation https://lnkd.in/gVZarwKM
You are at the keyboard with your coding agent, building the most important thing as fast and as well as you can. But the coding agent is slow, and you want to be as productive as possible, so you do the things you know how to do. You get the agent running overnight (which takes effort to enable more autonomy), or you parallelise (which takes effort to separate the work and integrate separate git trees). This doesn’t actually help with your goal of getting the most important thing done as fast and as well as you can. It lengthens feedback loops and increases cognitive overhead – all the old cycle time vs throughput arguments apply.
So what can you actually do to achieve your goal? More tokens/sec! Just keep doing what you’re doing, but faster. If it’s fast enough for you to stay directly engaged with your highest priority work at your preferred level of abstraction, you will find it very satisfying.
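A rough back-of-envelope calculation shows why speed changes the experience rather than just the throughput. The figures are assumptions for illustration only, not measurements: a hypothetical task of 20 agent turns, about 3,000 output tokens per turn, and an assumed baseline of 50 tokens/sec compared with 1200 tokens/sec. The sketch ignores tool calls, test runs and prompt processing, so real waits would be longer.

```typescript
// Hypothetical workload: all figures are illustrative assumptions, not measurements.
interface Workload {
  turns: number; // agent turns needed to finish the task
  tokensPerTurn: number; // output tokens generated per turn
}

// Seconds spent waiting on generation alone (ignores tool calls, tests, prompt processing).
function generationWaitSeconds(work: Workload, tokensPerSecond: number): number {
  return (work.turns * work.tokensPerTurn) / tokensPerSecond;
}

const task: Workload = { turns: 20, tokensPerTurn: 3000 };

for (const rate of [50, 1200]) {
  const total = generationWaitSeconds(task, rate);
  const perTurn = total / task.turns;
  console.log(`${rate} tokens/sec: ${perTurn.toFixed(1)}s per turn, ${(total / 60).toFixed(1)} min total`);
}
```

Under these assumptions, 50 tokens/sec means a minute of waiting per turn (20 minutes in total), which is long enough to context-switch away; 1200 tokens/sec means 2.5 seconds per turn, short enough to stay in the loop and steer.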
I don’t want you to think I’m for or against Ralph loops etc. I’m exploring and learning like everybody. But I think we are missing the easiest optimisation, and I do worry that we are introducing unnecessary complexity as workarounds instead.
So now that I can finally get more tokens/sec (Cerebras team, help me!) I’ll go back from Claude Code CLI to Codex CLI for a little bit, and lean back into ADDD (Agentic Dictator Driven Development) and see how I like it at this speed.
Soon I’ll try to talk about *refinement* more. We are all very focused on initial dev which is very exciting at the moment, but the euphoria of a quick AI build can sometimes be short-lived.
I do a lot of experiments, and I know I should share them more. My latest one is a small CLI tool for scheduling agent runs with GitHub Actions.
It writes GitHub Actions config for you, leveraging your existing OpenCode config. It’s simple but enables a lot:
- You could push target test trees only, and trigger an agent that diffs between target and actual and then implements.
- You could run a test review and improvement agent every hour, in a busy trunk-based codebase.
- You could implement really sloppily, and trigger an agent that examines your implementation for intention, writes tests to formalise that, then uses them as the basis for test-driven reimplementation of your code.
All of this just by writing an OpenCode agent (one .md file) and running Tender CLI.
I’m not using it – for now I’m still an ADDD (Agentic Dictator Driven Development) practitioner. I also never liked automatic commits, but maybe Tender is me starting to let go of that.
If you use npm and want to play with it, run `npx @susu-eng/tender` at your repo root.
You can also ask your coding agent to explore the CLI and take care of it for you: just tell it to run `npx @susu-eng/tender --help`. This was my first agent-first UI, and it was fun watching Claude figure out how to operate it (trial, error, actually reading instructions, success).
This was only an experiment, implemented 100% by Codex – please use it carefully.
The style of TDD I learned, practised and taught might seem a bit heavy by today’s standards, but I still believe in its virtues. For example, its very repetitive expression of intention makes it almost impossible for a coding agent to look at a piece of code without knowing what it’s supposed to do, and not do, under every expected set of conditions.
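As a small illustration of that repetitive expression of intention, here is a hypothetical `parsePort` function with tests that spell out what it must do and what it must refuse to do. It uses tiny hand-rolled `test`, `expectEqual` and `expectThrows` helpers (also hypothetical) so the sketch runs without a test framework; in a real project you would use your runner of choice.

```typescript
// Hypothetical unit under test: parse a TCP port from configuration text.
function parsePort(raw: string): number {
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`Port must be digits only, got "${raw}"`);
  }
  const port = Number(trimmed);
  if (port < 1 || port > 65535) {
    throw new Error(`Port must be between 1 and 65535, got ${port}`);
  }
  return port;
}

// Minimal test helpers so the example is self-contained.
const results: { name: string; passed: boolean }[] = [];

function test(name: string, body: () => void): void {
  try {
    body();
    results.push({ name, passed: true });
  } catch {
    results.push({ name, passed: false });
  }
}

function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
}

function expectThrows(fn: () => unknown): void {
  try {
    fn();
  } catch {
    return;
  }
  throw new Error("expected an error, but none was thrown");
}

// Each test states one intention: what the code does, or what it refuses to do.
test("parses a plain port number", () => expectEqual(parsePort("8080"), 8080));
test("ignores surrounding whitespace", () => expectEqual(parsePort(" 443 "), 443));
test("rejects non-numeric input instead of guessing", () => expectThrows(() => parsePort("http")));
test("rejects decimals rather than rounding", () => expectThrows(() => parsePort("80.5")));
test("rejects port 0", () => expectThrows(() => parsePort("0")));
test("rejects ports above 65535", () => expectThrows(() => parsePort("70000")));

for (const r of results) console.log(`${r.passed ? "PASS" : "FAIL"} ${r.name}`);
```

The negative cases carry as much intention as the positive ones: they record decisions (no rounding, no guessed defaults) that an agent would otherwise have to infer from the implementation.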
I can write reliable software automatically, so what’s next?
I’ve been talking to some likeminded folks in similar situations, and current business models around software delivery don’t make much sense to us anymore.
Will we have studios that have mastered AI dev turning out software for clients much more cheaply than before? The agents are already faster than clients can make decisions about what they want.
Will we build speculatively, and then sell to people who believe in the software enough to GTM with it?
Will we build whole integrated development, deployment, and GTM platforms? Some of these already exist in a somewhat disappointing form, but the tech is there to execute well on it now.
Will we just build products we believe in for ourselves? The cost of trying is very low.
I’m super interested in what’s next.
Tight feedback loops are critical DX for eXtreme Programmers, and they help coding agents immensely.
I use the same thing I learned way back at ThoughtWorks University: a single command that comprehensively verifies that the build is green. I expose it as top-level DX, which for my current project is `pnpm green`. My agents call this frequently.
When you feel an urge to do manual testing, debug an error yourself, or anything else that isn’t automated, stop and instead improve “green” until you feel safe again.
Make green as fast as possible – coding agents can be way faster than humans, so waiting 30 seconds for an integration test is a huge delay, relatively speaking. The test pyramid is back!
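A minimal sketch of the kind of logic that could sit behind a command like `pnpm green`, assuming a hypothetical `runGreen` function and `Step` type (neither is from the source). Steps are ordered cheapest first, in the spirit of the test pyramid; the run stops at the first failure so the agent gets the quickest possible signal; and each step's duration is reported so slow steps stay visible. The step names and stubbed `run` functions are placeholders; in a real project each would invoke your type checker, linter or test runner.

```typescript
// A verification step: returns true when the step passes.
interface Step {
  name: string;
  run: () => boolean;
}

interface StepReport {
  name: string;
  passed: boolean;
  ms: number;
}

// Runs steps in order and stops at the first failure (fail fast).
function runGreen(steps: Step[]): { green: boolean; reports: StepReport[] } {
  const reports: StepReport[] = [];
  for (const step of steps) {
    const start = Date.now();
    let passed: boolean;
    try {
      passed = step.run();
    } catch {
      passed = false; // a crashing step is a failing step, never a skipped one
    }
    reports.push({ name: step.name, passed, ms: Date.now() - start });
    if (!passed) return { green: false, reports };
  }
  return { green: true, reports };
}

// Hypothetical steps, cheapest first. Replace the stubs with real tool invocations.
const steps: Step[] = [
  { name: "typecheck", run: () => true },
  { name: "lint", run: () => true },
  { name: "unit tests", run: () => true },
  { name: "integration tests", run: () => false }, // simulate a failure
  { name: "e2e tests", run: () => true }, // never reached: we stop at the first red step
];

const result = runGreen(steps);
for (const r of result.reports) console.log(`${r.passed ? "ok  " : "FAIL"} ${r.name} (${r.ms}ms)`);
console.log(result.green ? "GREEN" : "RED");
```

You would expose a script like this through a `green` entry in the `scripts` section of `package.json`, so that `pnpm green` runs it. A real version should also exit with a non-zero status when red, so that neither agents nor CI can mistake red for green.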
Good DX will get you very far, even before you get into workflow-land and start building quality-focused processes deeply into your coding agents.
