2026-04-30T16:29:18+07:00

PMs are Getting the Wrong Impression

April 30, 2026 / May 14, 2026 / Eli Mydlarz

Product-focused leaders are trying out AI on their own codebases and getting exactly the wrong impression.

Jack uses his org's codebase with AI: “Wow, this is going so well, I can add features so easily! AI is much better than our engineers.”

Nick does the same: “Oh no, AI keeps breaking things – I’m glad we have such great engineers, they are even better than AI.”

They are both completely wrong.

Coding agents benefit a lot from good developer experience, consistent architecture, and particularly from strong testing. They are OK at the moment. If you give them a strong platform, they can build on top of it really well without direction. If you give them a bad platform, they will struggle, just like a new team member would.

Jack had a great experience with AI because his team is so diligent in protecting his org's ability to innovate. AI is faster than the team because it wants to make Jack happy this session – ideally this message. Jack’s coding agent is mangling his codebase while showing him new features that only look like they work, but he misinterprets his experience in favour of the AI – now brittleness increases every session until he can’t deliver anymore.

Nick had a bad experience because his team is incompetent or rushing. Their software is already brittle, but they can operate in it for now with tribal knowledge. AI can’t – it’s a builder showing up at a home that’s falling apart and being told to add an extra floor. It doesn’t go well and Nick misinterprets his experience in favour of the team. Now he’s stuck with them and they’re stuck with their brittle code. At least refactoring is still possible.

Obviously there is heaps of context to think about, but these are realistic possibilities for product-focused leaders in small orgs.

Test Trees

April 22, 2026 / May 14, 2026 / Eli Mydlarz

My closest equivalent to “specs” are test trees.

Test trees are seriously battle-tested. They are already used to describe and test the behaviour of popular FinTech apps. I’ve personally spent thousands of hours working with test trees and other engineers to describe, reason about, and verify our software.

Test trees are a living, verifiable description of all behaviour. They are best explained by the test tree for test trees in my own Claude Code plugin, which is at the bottom of this post.

If you think test trees should work differently, you can propose a change:

> "then the tree must exist before implementation starts"
> No, TDD is dumb!
> "then Claude should just go nuts and I'll figure it out later"

We can reason about changes to behaviour very easily this way, which is important when changing real products with millions of users over _years_.

Also, they are verifiable – they prove that the software has upheld them, because they map to tests that in turn _drove_ the implementation. Here I go again with the TDD 😂

			
test-trees-as-requirements (unit: test/test-trees-as-requirements.bats)
  when a project uses contree
    then CLAUDE.md identifies TEST_TREES.md as the definition of functional and cross-functional requirements
    and TEST_TREES.md defines functional requirements using EARS syntax
    and each behavioural unit has its own tree in TEST_TREES.md
    and trees are flat subsections — not grouped by kind or layer
    and every tree reifies exactly one test file
    and every test file reifies exactly one tree
    and every tree names its coverage in parenthesised labelled pairs on the tree-name line, covering the categories src, unit, integration, functional
    and gaps are declared explicitly — "none" for expected-but-uncovered categories, omission for not-applicable ones
    and the EARS rule is embedded in skills that use it
  when a behaviour change is needed
    then the tree must exist before implementation starts
  when implementation reveals new understanding
    then the tree is updated to reflect reality
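
The "one tree, one test file" rule in the tree above lends itself to a mechanical check. Below is a minimal sketch, assuming tree-name lines look like the one shown (`name (label: path, ...)`) and are never indented. The regex and the function names `parse_tree_lines` and `check_one_to_one` are hypothetical illustrations, not part of contree.

```python
import re
from collections import Counter

# Assumed format: an unindented tree-name line ending in parenthesised
# "label: value" pairs, e.g.
#   test-trees-as-requirements (unit: test/test-trees-as-requirements.bats)
TREE_LINE = re.compile(r"^(?P<name>\S+) \((?P<pairs>[^)]*)\)$")


def parse_tree_lines(text):
    """Return {tree_name: {label: value}} for every tree-name line."""
    trees = {}
    for line in text.splitlines():
        match = TREE_LINE.match(line)
        if not match:
            continue  # indented when/then/and lines are not tree names
        pairs = {}
        for pair in match.group("pairs").split(","):
            label, _, value = pair.partition(":")
            pairs[label.strip()] = value.strip()
        trees[match.group("name")] = pairs
    return trees


def check_one_to_one(trees, test_files, label="unit"):
    """List violations of 'one tree per test file, one test file per tree'."""
    problems = []
    claimed = Counter()
    for name, pairs in trees.items():
        path = pairs.get(label)
        if path is None or path == "none":
            continue  # omitted category, or an explicitly declared gap
        claimed[path] += 1
        if path not in test_files:
            problems.append(f"{name}: {path} does not exist")
    for path, count in claimed.items():
        if count > 1:
            problems.append(f"{path}: claimed by {count} trees")
    for path in sorted(set(test_files) - set(claimed)):
        problems.append(f"{path}: no tree reifies this file")
    return problems
```

Calling `check_one_to_one(parse_tree_lines(text), files)` with the contents of TEST_TREES.md and the set of test file paths returns an empty list when the mapping holds, and a readable list of violations otherwise.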

		

Trunk Sync and Seance

March 25, 2026 / April 9, 2026 / Eli Mydlarz

Trunk Sync has a new “seance” feature.

Are you worried about inheriting AI-generated code you don’t understand? No problem, you can always talk to the guy who just wrote it.

Resurrect the long-dead coding agent responsible at exactly the moment in code and context when they changed that line. Learn how the code works and why it works that way straight from the programmer, rather than through post-hoc analysis (guessing).

Seance is a feature of Trunk Sync, which I use for extreme continuous integration with my coding agents. It was the challenge of not being able to personally defend main – normally my last line of defence – that drove me to create Seance.

Typical example at https://lnkd.in/gqMEeBE4 – wanting to know why a Docker image was changed.

In your project folder:

			
npm i @susu-eng/trunk-sync
trunk-sync install

Please remember, I am just sharing my own experiments. I only hope it’s interesting for you.

Trunk Sync: Maximum continuous integration for coding agents. Agents work in parallel on local worktrees, across remote machines – any mix, all with agentic conflict resolution. No resolving conflicts by hand, or discovering that an agent never pushed its work.

Seance: Talk to dead coding agents. Point at any line of code and rewind the codebase and session back to the exact moment it was written. Ask the agent what it was thinking. Understand generated code on demand and stop worrying about keeping up with every change your agents make.
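
How Seance does its rewinding is not described here, so the following is only a rough sketch of one plausible first step: finding which commit last wrote a given line, using `git blame --porcelain`. The function names `parse_blame_porcelain` and `who_wrote` are hypothetical, and mapping that commit back to an agent session is deliberately left out because it depends on Trunk Sync internals.

```python
import re
import subprocess

# First line of porcelain output:
#   <sha> <original line> <final line> [<lines in group>]
HEADER = re.compile(r"^(?P<sha>[0-9a-f]{40,64}) (?P<orig>\d+) (?P<final>\d+)")


def parse_blame_porcelain(output):
    """Extract commit, author, summary and code from blame output for one line."""
    lines = output.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("not `git blame --porcelain` output")
    info = {"commit": match.group("sha"), "line": int(match.group("final"))}
    for line in lines[1:]:
        if line.startswith("\t"):
            info["code"] = line[1:]  # the blamed source line itself
            break
        key, _, value = line.partition(" ")
        if key in ("author", "author-time", "summary"):
            info[key] = value
    return info


def who_wrote(path, line_number):
    """Run git blame for a single line; raises if git itself fails."""
    result = subprocess.run(
        ["git", "blame", "--porcelain", "-L", f"{line_number},{line_number}", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_blame_porcelain(result.stdout)
```

With the commit hash in hand, the codebase can be checked out at that point; restoring the matching agent session is the part Seance adds on top.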

Academic: When you can run multiple Claude Code agents on the same codebase from anywhere and without breaking each other’s work, your comprehension becomes the bottleneck. People are framing this as “cognitive debt”, and here we are exploring the far right of this debate – extreme post-hoc understanding. Don’t worry about cognitive debt at all – just build as fast as you can and make it easier to catch up selectively. I’m not endorsing – just experimenting and learning like you.

Caveats: There’s a flag for pushing Claude transcripts in case the session doing the work was on another machine or needs to be accessed after Claude cleans it up. A better version (please feel free to PR) would push transcripts to a server so they can be accessed securely outside of Git.

There’s another command for summoning the developer who instructed the agent to write the code, but that one is occult – best kept as an easter egg 😂

Thinker CLI

March 21, 2026 / April 9, 2026 / Eli Mydlarz

I’m sharing Thinker CLI.

You’ve seen me talk about how valuable CLIs are in agent-land already:
– Self-documenting
– Model domain objects and lifecycles
– Model workflows
– Provide fast feedback
– Teach agents incrementally (rather than requiring full usage baked into a skill)
– Run by any shell-using agent
Give an agent a good CLI and it can do _the thing_ even if it doesn’t know how, because _how_ is baked into the CLI.

Thinker CLI brings all these benefits _and it’s super simple_.

Thinker lets anybody define (and share!) a guided, multi-step thought process for an agent in a JSON config file. Agents follow user directions (or automation) to use Thinker with the config file, then Thinker walks them through the multi-turn process call by call, using structured inputs, structured outputs, interpolation into templates, and strict validation. This way work is presented to the agent clearly, incrementally, and validated at each step. The agent can “think through” complicated work, programmed in advance.
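
To make the idea concrete, here is a minimal sketch of a config-driven, step-by-step walkthrough with template interpolation and strict output validation. The config shape, the `Walkthrough` class, and its method names are all assumptions made up for illustration; they are not Thinker's real schema or API.

```python
import json
from string import Template

# Hypothetical config shape, NOT Thinker's actual format.
CONFIG = json.loads("""
{
  "steps": [
    {"name": "recall",
     "prompt": "List what you already know about $topic.",
     "outputs": ["known_facts"]},
    {"name": "plan",
     "prompt": "Given: $known_facts. Plan research on $topic.",
     "outputs": ["plan"]}
  ]
}
""")


class Walkthrough:
    """Presents one step at a time and validates each structured answer."""

    def __init__(self, config, inputs):
        self.steps = config["steps"]
        self.values = dict(inputs)  # inputs plus every validated output so far
        self.index = 0

    def next_prompt(self):
        """Render the current step's template; a missing value raises KeyError."""
        step = self.steps[self.index]
        return Template(step["prompt"]).substitute(self.values)

    def submit(self, answer):
        """Strictly validate the agent's answer, store it, and advance."""
        step = self.steps[self.index]
        expected = set(step["outputs"])
        if set(answer) != expected:
            raise ValueError(f"step {step['name']!r} expects exactly {sorted(expected)}")
        for key, value in answer.items():
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{key} must be a non-empty string")
        self.values.update(answer)
        self.index += 1
        return self.index < len(self.steps)  # True while steps remain
```

Each call hands the agent exactly one rendered prompt, and earlier outputs are interpolated into later prompts, so the agent never needs the whole process in its context up front.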

I’ve been using this approach – human-guided CoT sequences with structured inputs and outputs – to great effect in my projects for years now. With good design, it _way_ outperforms the generalised reasoning processes built into current models. I’m really happy I can share it in such a simple way.

With an agent, you can define steps for searching in memory, saving back into memory, researching online, and producing complex artefacts. Thinker CLI allows you to compose any of your agent's functionality in linear sequences using natural language.

Links:
– If you want to read more: https://lnkd.in/g3khXusD
– If you want to tell your agent to install: https://lnkd.in/g-SzcWiU
– Example of a coding agent running it: https://lnkd.in/gyDxBNGv (I normally use Thinker with OpenClaw, but this was easier to get logs of. You see how any agent can use it)

Defensive programming and coding agents

March 12, 2026 / April 9, 2026 / Eli Mydlarz

Codex and Claude are way too defensive. I think this is a good time to talk about defensive programming.

Say I believe some scenario is impossible, and that if it ever did happen there would be an error – a console error, request failure, something noisy – but life would go on.

This is actually good. I am probably not wrong, so there is no reason to complicate my code. If I am wrong, great! Through failing fast (and good observability) I will discover my wrongness and we will all be better off for it. The effects of being wrong in software are cumulative and sometimes fatal, so we want to uncover wrongness early.

Building a good understanding of how data actually flows through your system is important. You should not just guess. You also should not defend against everything by default. You should actually check, be confident that you know, and be eager to discover that you are wrong.

The worst thing is taking a wild guess at how some unexpected edge case should be handled, when you really have no idea why it would have happened, or what the downstream implications of your handling will be. It is routine for coding agents to mishandle, downgrade to a warning, or just completely swallow errors that reveal critical misunderstandings and concomitant design problems in your (their?) software.
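
A small illustration of the difference, using a made-up order-total function (the names `total_defensive` and `total_fail_fast` and the order shape are hypothetical). The defensive version guesses at a fallback and downgrades the problem to a warning; the fail-fast version lets the malformed data raise where the mistake is.

```python
import logging

logger = logging.getLogger("orders")


def total_defensive(order):
    """The style warned about above: guess at a fallback and carry on."""
    try:
        return sum(item["price"] * item["qty"] for item in order["items"])
    except (KeyError, TypeError) as error:
        logger.warning("bad order, defaulting to 0: %s", error)
        return 0  # quietly wrong: a free order now flows downstream


def total_fail_fast(order):
    """No guessing: a malformed order raises at the point of the mistake."""
    return sum(item["price"] * item["qty"] for item in order["items"])
```

Both functions agree on well-formed input. On an order whose item is missing `qty`, the first returns 0 and looks like it works, while the second raises `KeyError` and exposes the misunderstanding about what the data can look like.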

Coding agents love defensive programming. There could be many reasons for this, but two come to mind:
– They just don’t want it to crash, like the early JS mentality.
– They don’t want to “miss an edge case”, perhaps reflective of a lot of training data produced by people who didn’t want to “miss an edge case”.

When you vibe code (not agentically engineer, or whatever we’re calling it) and everything looks amazing, how much of the implementation is just failing quietly because of defensive programming? Perhaps it helps explain the early-euphoria-hard-crash we saw many vibe coders go through.

Instruct your coding agents to fail fast and loud.