Pairing, Teaching and Coding Agents — January 7, 2026

I love the incredible learning opportunities pair programming creates. So it’s a little sad when I encounter a teachable moment but my pair is a coding agent who can’t learn.

Working with a human engineer, such moments are very exciting. My pair would benefit from learning, I’d feel good about teaching them, and we’d all benefit from the human engineer being more capable going forward.

With current coding agents, there’s little point trying to teach lessons conversationally. Next session, that guy will still be a dummy! We don’t have our usual organic, lovely mechanism for teaching, learning, and growing together. Ouch.

What do we get instead? A free lesson for anybody who thinks leadership is all about performance management. Instead of making your agent smarter or more motivated – because you can’t – you step back and ask how you can make it more successful.

What mechanisms exist in your little agentic org for ensuring that strategy makes its way down to the fine details? What context does the agent need, exactly? How do new threads learn about the codebase quickly? How do you define, communicate and enforce rules? How do you improve the developer experience for your agent? How is work best scoped and defined for it?

When you take individual competence off the table – not because it doesn’t matter, but because you don’t control it – you are forced into systems thinking.

PS: If you want to talk about experiential learning for coding agents, DM me – I’m very interested.

Epistemic Decentralisation — October 15, 2025

Are you losing faith in your experts? 

When we get a second opinion from ChatGPT (or wherever), we see that our experts are fallible, and begin to lose confidence in them. The technology in our pockets hasn’t filtered through to our institutions, and while knowledge monopolies have been broken, the accompanying professional monopolies persist.

So what happens when you bring AI to your appointment? Friction, from institutions that expect you to be ignorant and passive. Overconfidence – you might find yourself saying something you don’t really understand and can’t explain in real time 😂. You learn that AI is about understanding more than output.

But you care more about your situation than a paid expert ever will. You can spend more than some institutionally allotted period of time on it, and even experts make mistakes. Growing your own understanding with AI can help you make sure your doctor is following the latest guidelines for your case, or that your lawyer has considered how a small detail in your contract impacts your claim.

In the next year, we will break knowledge monopolies in areas that impact us, moving the locus of control closer to ourselves, _using_ professional monopolies rather than falling into them haplessly.

Multi-agent vs Multi-generational — December 19, 2024

I’ve been building agentic systems for a while now, and I want to reflect on some of the dominant thinking in the circle of people doing that kind of work.

Multi-agent solutions feel natural to people who understand the value of teamwork. But are we projecting our human need for specialisation onto a new generation of agentic systems, thereby passing our limitations onto them unnecessarily?

When people are trying to do something that seems to exceed the capabilities of their AI system, they usually introduce greater up-front orchestration, which helps short-term – but it’s a trap! As the model gets smarter, the orchestration becomes redundant – wasted effort that you paid a high opportunity cost for. I got burned that way when I first started.

I’ve found going the other way is usually a better decision. I consider how I can empower my model with more context, rather than treating it like a dummy by giving it narrower instructions and a more specialised role. Context is king.

When pursuing longer-term or high-complexity objectives with AI, we should think less about multi-agent solutions, and more about multigenerational solutions. It’s not about architecting the perfect swarm up-front and trying to force fuzzy LLMs into rigid behaviour. Rather, it’s about carrying on the work over many generations, with each generation able to revise past work and continue, setting new proximal goals, while bearing in mind the distal goal. Therein, the fuzziness introduced by LLMs is essential, rather than detrimental.

To put it another way, it’s recursive graph traversal, but the graph is a family tree being built dynamically. Children revise and continue the work left by their parents, and their understanding of the problem begins to exceed that of earlier generations through the buildup of additional context – of history – ultimately leading to better outcomes.
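
Here’s a minimal sketch of that loop. Everything in it is illustrative – `runAgent` stands in for whatever model or agent call you prefer, and the names are mine – but it shows the shape: each generation inherits the full history, revises, and sets the next proximal goal.

```typescript
// Illustrative sketch only – `runAgent` stands in for your model/agent call.
type RunAgent = (prompt: string) => Promise<string>;

interface Generation {
  proximalGoal: string; // what this generation set out to do
  workProduct: string;  // what it actually produced
}

async function pursue(distalGoal: string, runAgent: RunAgent, maxGenerations = 10): Promise<string> {
  const history: Generation[] = [];

  for (let i = 0; i < maxGenerations; i++) {
    // Each generation sees the distal goal plus everything its ancestors did.
    const ancestry = history
      .map((g, n) => `Generation ${n} pursued "${g.proximalGoal}" and produced:\n${g.workProduct}`)
      .join("\n\n");

    // It sets its own proximal goal, bearing the distal goal in mind...
    const proximalGoal = await runAgent(
      `Distal goal: ${distalGoal}\n\n${ancestry}\n\nSet the next proximal goal, or reply DONE.`
    );
    if (proximalGoal.trim() === "DONE") break;

    // ...then revises and continues the work left by its parents.
    const workProduct = await runAgent(
      `Distal goal: ${distalGoal}\nProximal goal: ${proximalGoal}\n\n${ancestry}\n\nRevise and continue the work.`
    );
    history.push({ proximalGoal, workProduct });
  }

  return history.at(-1)?.workProduct ?? "";
}
```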

This has proven to be a good framework for thinking about problem solving with AI – multigenerational operations, with correction and autonomy in each layer of the family tree. It’s more like building the Kailasa temple – carved top-down out of a single rock – than traditional computing.

Just months ago, I thought of these generations as being context windows, modelled as nodes in a graph – relatively short containers of understanding and output, used to delegate to, and build context for, subsequent context windows / nodes. These days, advances in agentic memory and my use of rolling context windows have me focusing more on building and leveraging layers of understanding in a more continuous way. But I still find thinking multigenerationally really helpful whenever I’m tempted to increase orchestration.

Research Journal – The Agents are Coming — November 26, 2024

When I started my AI research, I was focused on code generation. After a while, I could see that even the most challenging approach – building a general software engineering agent – is a solved problem, theoretically.

Start with where we are today. Replit Agent is my favourite so far. The way I see it, Replit Agent and friends already have an advantage over me in speed and cost on some time horizon, and that time horizon will get longer.

Consider the improvements we’re seeing in related areas. Tools like Cody and Aider show how understanding codebases as graphs makes LLMs radically more helpful.
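
I don’t know exactly how those tools build their graphs internally, but the kernel of the idea is simple. A toy version using the TypeScript compiler API might look like this:

```typescript
import * as ts from "typescript";

// Toy version of codebase-as-graph: map each file to the modules it imports.
// Real tools build far richer graphs (symbols, references, call sites), but the
// principle is the same: hand the LLM structure, not a bag of files.
function importGraph(fileNames: string[]): Map<string, string[]> {
  const program = ts.createProgram(fileNames, { allowJs: true });
  const graph = new Map<string, string[]>();

  for (const sourceFile of program.getSourceFiles()) {
    if (sourceFile.isDeclarationFile) continue; // skip lib/.d.ts noise
    const imports: string[] = [];
    for (const statement of sourceFile.statements) {
      if (ts.isImportDeclaration(statement) && ts.isStringLiteral(statement.moduleSpecifier)) {
        imports.push(statement.moduleSpecifier.text);
      }
    }
    graph.set(sourceFile.fileName, imports);
  }
  return graph;
}
```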

Agents are also getting better at interacting with their environments. Anthropic just launched the Model Context Protocol – an open standard for connecting models to new data sources – and the ChatGPT app can already see your editor and terminal.

The models themselves are improving, notably including new planning-oriented models, improvements in reasoning, improvements in efficiency, scaling of output quality with test-time compute, faster inference… Each of these is super helpful for a general software engineering agent, and they are all happening at the same time.

Memory layers are also maturing. Letta and Zep are both amazing.

Orchestration is getting better. There are so many good new tools – I’m trying LangGraph next. That’s going to enable more complex workflows (TDD, outside-in) and – crucially – cognitive architectures.

Then there is a bunch of plain effort: putting together templates and examples for agents to train on and work with, and building software with abstractions that play to the strengths of the agents building it, rather than abstractions designed for humans. Remember, even though it seems like agents can build software the way humans do, this is not the easiest way for AI at all – we will make it much easier.

I suspect a lot of engineers would prefer not to think about it, but software engineering will be generally solved, with multiple approaches.

Research Journal – Syntax Trees and Agents — November 19, 2024

After my experiments with code generation, graph traversal for outside-in prompting, and combined development time and runtime, I was really taken with graph traversal.

But I realised I didn’t need my custom graphs to traverse code – I was working in TypeScript, and its compiler already understands the code as a syntax tree.

Project 2 was orchestrating outside-in graph traversal for code generation, using syntax trees of your existing code to find entry points for generable software. In this approach, an entry point is identified and described briefly with a special type that extends a custom Generable type; a VS Code extension detects and implements Generables; and a tiny custom dependency injection framework injects them at runtime.

You can use code you didn’t write yet and let the system implement it in the background. No need to write special graphs – if you’re writing code, you’re already writing a graph. What a great design language for engineers! If you can get the system to use the same type-based delegation recursively, maybe you can specify pretty challenging high-level Generables in a high-level orchestration layer, and use it while it gets implemented underneath you.
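
To give a feel for the design language – the types and names here are simplified stand-ins, not the project’s actual code:

```typescript
// Simplified stand-ins for the real project's types.

// The marker type: anything extending Generable is an entry point whose
// implementation the tooling may generate in the background.
interface Generable {
  readonly description: string;
}

// An entry point: declare what you want, briefly, and start consuming it.
interface SummariseReport extends Generable {
  readonly description: "Summarise a quarterly report into three bullet points";
  run(reportText: string): Promise<string[]>;
}

// Consumer code, written before the implementation exists. The VS Code
// extension spots the Generable, implements it, and the DI layer injects
// the generated implementation here at runtime.
async function briefTheBoard(report: string, summarise: SummariseReport): Promise<void> {
  const bullets = await summarise.run(report);
  for (const bullet of bullets) console.log(`- ${bullet}`);
}
```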

But developments in agentic approaches made me feel that I was still undershooting the future – that this would be superseded by an agentic, general purpose software engineer, even though that is a much harder version of the problem to solve! And I wasn’t ready to let go of combining development time and runtime.

More on those spaces and how they influenced my work in the next post 🙏 Including some specific tools I have enjoyed, for people who want to try things out.

Research Journal – Graphs & Runtime — November 15, 2024

I want to share a story that challenged my most basic ideas about software engineering.

Months ago, as part of my first new AI project, I designed a DSL for expressing software designs. I found a pretty friendly way to describe a pseudo-graph of operations and data. You send it to my server, with a target node.

My server builds outside-in to reach the target node. At each node, it prompts a model to implement a suitable function, with the code for already-implemented consumers (the node’s parents in the graph) and a plain English description (from the DSL) supplied as system instructions. Engineers who work outside-in and consumer-first might already know why this is a useful approach for AI.
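
That DSL is long gone, so the names below are illustrative, but a request had roughly this shape:

```typescript
// Illustrative request shape – the real DSL differed, but the idea holds.
// Nodes are operations described in plain English; edges say who consumes whom.
const request = {
  graph: {
    nodes: {
      fetchOrders: { description: "Load the orders for a given customer id" },
      totalSpend: { description: "Sum the order amounts", consumes: ["fetchOrders"] },
    },
  },
  target: "totalSpend",           // the node whose result you want back
  inputs: { customerId: "c_42" }, // provided at the outermost edges of the graph
};

// The server generates a function per node, consumers before the functions
// they call, so each implementation is shaped by how it is already used.
```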

You can design and run complex things neatly and correctly – take-home programming exercise level stuff. Though I’m working on something else now, I still love that project. I’m proud of two decisions in particular.

First decision: Pass the graph you want to execute and a target node in at runtime and get the result synchronously – not the software, but the result of running it, providing inputs as the outermost edges of your graph. The server doesn’t build software for you, it computes for you, with software that it builds incidentally to get your result. This means there is no separation between software development and operation – only latency and caching.

It is a shock to work with something that does lots of your job at runtime. Our code is usually so precious to us. Think of the care we take in its production, testing, and deployment. What a strange feeling, to suddenly think of it as so trivial that it might not even be worth caching!

Second decision: Let the client patch their graph – just keep adding to it in separate HTTP calls, specifying whatever parts are needed, referencing parts specified earlier, all while targeting a node in each call to get results.
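
So a session looked something like this – the endpoint and payload shapes are made up for illustration:

```typescript
// Illustrative only – endpoint and payload shapes are made up.
async function patchAndRun(payload: unknown): Promise<unknown> {
  const response = await fetch("https://example.test/graph", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  return response.json();
}

// Extend the earlier graph with a new node that references an existing one,
// target it, and get the result of running it – all in one call.
const tier = await patchAndRun({
  patch: {
    nodes: {
      classify: {
        description: "Classify the customer as bronze/silver/gold by total spend",
        consumes: ["totalSpend"], // defined in an earlier call
      },
    },
  },
  target: "classify",
  inputs: { customerId: "c_42" },
});
console.log(tier);
```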

I used it super-iteratively, designing software with a tight feedback loop between my requests and the resulting responses and software until I had a system I liked. At that point, I could delete the graph and any code, because my requests essentially were my software. Everything after design was incidental – generable at runtime. I could use my honed set of graph-segment-and-target-node patch requests like endpoints to operate the system, even if the system didn’t exist yet. It was a Keanu “woah” moment for me.

I thought deeply about this paradigm we are swimming in – where software must be developed and deployed well in advance of being executed. It felt like seeing water.

We could have a dynamic runtime where models run code ad-hoc to achieve their goals. Code can be used, rather than developed, deployed, and executed, collapsing the space between development and runtime as my project did. There’s no technical obstacle, really. We just need time, compute, and creativity.

That glimpse of the future was really impactful for me, and I carried the lessons into my subsequent work.

Research Journal – Missing the AI Forest for the AI Trees — November 12, 2024

I’m lucky to have been able to throw myself headfirst into AI since leaving SadaPay, but almost everybody I know in the industry is still building traditional software. There’s a lot happening. I’m working on my fourth AI project now, and I’m so excited by the work we’re doing in this space. You’re going to see a whole new generation of user experiences. I’ll try to pause and share back sometimes, from now on.

Sometimes when I talk with people about AI, all they can share is that they aren’t impressed with ChatGPT. To me, that reaction seems like seeing electricity power the first electric light bulb, and not being able to imagine an air conditioner, a television, an electrified subway system – or maybe not even a better lightbulb – and thereby concluding that candles aren’t going anywhere.

It’s not that skeptics are wrong, it’s that they’re missing the forest for the trees by evaluating a fundamentally new form of compute – an enormous foundational breakthrough – as if it were a product, based on the first product(s) they have seen it power.

We have some form of intelligence on demand now, like electricity, but electricity required a lot of work to go from the first lightbulb to the massively electrified society we have today. We are building a whole new ecosystem massively in parallel, from chips to user-facing software, and we are really just getting started, but it’s coming faster than you might think.

Behavioural Coupling is a Silent Killer — December 14, 2023

People worry a lot about static coupling, but not nearly enough about behavioural coupling.

Coupling in your code is easy to see and even measure. You can tolerate quite a lot of it, as modern IDEs are so good at helping us understand what’s going on.

Conversely, behavioural coupling is a silent killer. People usually introduce it with good intentions, not realising the damage they’re doing.

Good intentions go astray because this particular misstep often feels like a eureka moment. In a flash of insight, you realise that because some external system – which could be another unit of code or a distant microservice – works a _certain way_, you can solve some problem very easily in your system by _just doing x_.

Of course, this kind of thing makes us feel smart. We are using holistic knowledge of our software (and problematically, other people’s software) to save effort – which feels like efficiency – and produce briefer solutions, which feels like elegance.

But if our solution was formulated based on knowledge of how external systems behave, it will intrinsically depend on those behaviours and therefore be coupled to them.

Worse still, this coupling will be invisible. You won’t even know where to look for it. It might be in a simple function call, or hiding in complex patterns of message publication and consumption spanning major organisational boundaries.

Such solutions sit around like landmines. Every change in behaviour starts to feel risky, because you don’t know where it’s safe to step.

I’ve figured out how I have been avoiding this, but maybe it will sound a bit weird.

When I’m writing a unit of code, I’m _inhabiting_ it. I think and talk with my pair about what I – as the unit of code – am able to know through properly defined interfaces. I’m not allowing things outside of that to interfere with my behaviour (the behaviour of the unit), because I (as the unit of code) just don’t know about that.

For example, if my pair wants to assume a string passed to some public method will never be empty (say, because the current consumer is getting the value from validated user input with a minimum length), I put my cursor inside the method and say “Well, we (as the method) don’t know that right? We just know it’s a string.”

It’s important that our solution not rely on information that is known to the programmer _but not to the program_, because it could lead to – in this case – an implementation that explodes when an empty string is provided, which of course could happen at any time through a new consumer or a change to the existing consumer.
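
In code, the difference is small but the posture matters. A sketch:

```typescript
// The current consumer happens to pass validated, non-empty input today...
function greetFromSignupForm(validatedName: string): string {
  return greet(validatedName);
}

// ...but this method only knows it receives a string, so it guards accordingly,
// rather than relying on knowledge the programmer has but the program doesn't.
function greet(name: string): string {
  if (name.trim().length === 0) {
    throw new Error("name must be a non-empty string");
  }
  return `Hello, ${name}!`;
}
```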

These things we feel clever about knowing and leveraging aren’t safe to build software on. They are and should be shifting ground, because behaviour should be easy to change.

Maybe this practice sounds a bit silly, but I’ve been doing it automatically for years. I think it’s helped me see the difference between what I believe is the case and what is programmatically guaranteed to be the case, which are very different things that can feel similar if you aren’t careful.

Elegance — December 12, 2023

You can argue almost anything with SOLID. In my experience, elegance is more important.

An elegant solution is immediately understandable by everybody. It’s a way of thinking about the problem that becomes _everybody’s_ way, as soon as they see it.

Questions about how we would handle x, y or z have obvious answers – they follow naturally because an elegant way of thinking gives us leverage. 

In general, we should judge our ideas less by how well they conform to some design pattern, and more by how well they bring understanding to our team. By how much leverage they give us to solve our problems, by their simplicity, and – sometimes – by their brevity. To me, that’s elegance.

Collaborate Your Meetings to Death

Most meetings are only necessary because of gaps in shared context. If we all did everything together, there would be no reason for meetings (except with external stakeholders). But teams can easily fall into a vicious cycle. Inundated by meetings, people begin to isolate themselves, creating a vacuum in shared context that gets filled by more meetings.

Break the cycle. Pick a meeting that’s being used to share context and find a way to share it earlier and more continuously.