Behavioural Coupling is a Silent Killer — December 14, 2023

People worry a lot about static coupling, but not nearly enough about behavioural coupling.

Static coupling in your code is easy to see and even measure. You can tolerate quite a lot of it, as modern IDEs are so good at helping us understand what’s going on.

Conversely, behavioural coupling is a silent killer. People usually introduce it with good intentions, not realising the damage they’re doing.

Good intentions go astray because this particular misstep often feels like a eureka moment. In a flash of insight, you realise that because some external system – which could be another unit of code or a distant microservice – works a _certain way_, you can solve some problem very easily in your system by _just doing x_.

Of course, this kind of thing makes us feel smart. We are using holistic knowledge of our software (and problematically, other people’s software) to save effort – which feels like efficiency – and produce briefer solutions, which feels like elegance.

But if our solution was formulated based on knowledge of how external systems behave, it will intrinsically depend on those behaviours and therefore be coupled to them.

Worse still, this coupling will be invisible. You won’t even know where to look for it. It might be in a simple function call, or hiding in complex patterns of message publication and consumption spanning major organisational boundaries.

Such solutions sit around like landmines. Every change in behaviour starts to feel risky, because you don’t know where it’s safe to step.

I’ve figured out how I’ve been avoiding this, though it might sound a bit weird.

When I’m writing a unit of code, I’m _inhabiting_ it. I think and talk with my pair about what I – as the unit of code – am able to know through properly defined interfaces. I’m not allowing things outside of that to interfere with my behaviour (the behaviour of the unit), because I (as the unit of code) just don’t know about that.

For example, if my pair wants to assume a string passed to some public method will never be empty (say, because the current consumer is getting the value from validated user input with a minimum length), I put my cursor inside the method and say “Well, we (as the method) don’t know that right? We just know it’s a string.”

It’s important that our solution not rely on information that is known to the programmer _but not to the program_, because it could lead to – in this case – an implementation that explodes when an empty string is provided, which of course could happen at any time through a new consumer or a change to the existing consumer.
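
To make that concrete, here’s a minimal sketch in Python. The helper and its behaviour are entirely hypothetical; it just illustrates the gap between what the programmer knows and what the program knows:

    def display_initial_unsafe(name: str) -> str:
        # Leans on knowledge the programmer has (callers validate a minimum
        # length) rather than knowledge the program has. Explodes with an
        # IndexError the moment any consumer passes "".
        return name[0].upper()

    def display_initial(name: str) -> str:
        # Relies only on what the method is actually guaranteed: a string.
        if not name:
            return "?"
        return name[0].upper()

The second version behaves sensibly no matter which consumer calls it, because it doesn’t depend on anything that it, as the unit of code, can’t know.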

These things we feel clever about knowing and leveraging aren’t safe to build software on. They are and should be shifting ground, because behaviour should be easy to change.

Maybe this practice sounds a bit silly, but I’ve been doing it automatically for years. I think it’s helped me see the difference between what I believe is the case and what is programmatically guaranteed to be the case, which are very different things that can feel similar if you aren’t careful.

Elegance — December 12, 2023

You can argue almost anything with SOLID. In my experience, elegance is more important.

An elegant solution is immediately understandable by everybody. It’s a way of thinking about the problem that becomes _everybody’s_ way, as soon as they see it.

Questions about how we would handle x, y or z have obvious answers – they follow naturally because an elegant way of thinking gives us leverage. 

In general, we should judge our ideas less by how well they conform to some design pattern, and more by how well they bring understanding to our team. By how much leverage they give us to solve our problems, by their simplicity, and – sometimes – by their brevity. To me, that’s elegance.

Doing It Well — December 7, 2023

If you care about doing it well, you are most of the way there.

That’s no small thing. Doing it well means caring about improvement more than ego. It means being challenged instead of comfortable, humble instead of prideful, open instead of defensive, earnest instead of cynical, and hard-working instead of lazy. It is believing the best way is still out there and going on a mission to find it every day.

If you can find it in yourself, you are very lucky. If you can practice it well enough that others join you, then you can build a great team – and a great team is something very special.

Delivery Today and Tomorrow — May 3, 2022

My way of thinking about technical debt is a bit different to what I usually hear 😅 Let’s start by rejecting the usual framing – the idea that we are in some eternal struggle to find the right balance between speed and quality.

We’re not balancing speed and quality.  We’re balancing delivery today against our ability to deliver tomorrow.  We care a lot about delivery today, which may tempt us to compromise on delivery tomorrow. But do you really think that tomorrow, we will care less about delivery? Has that been your experience so far? 

There are cases where delivery today is an existential necessity, but they are rare. Even startups need to deliver consistently on a long enough timeline for technical debt to catch them. But that doesn’t mean we should spend today developing complex architectures, implementing fancy design patterns or factoring new code exhaustively – there are good reasons to create technical debt.

Imagine you are on a trip and have just arrived at your hotel. Your luggage is full of useful items you may need during your stay. Do you assume you will use every item and unpack them all right away? Do you immediately fill cupboards and drawers with your belongings? Of course not! You start by keeping everything together so you know where it is (high cohesion) and unpacking things as you need them (YAGNI).

People give similar advice about building microservices. Don’t try to draw service boundaries at the start because you’ll get them wrong and it will be hard to fix. Keep things together in one service (high cohesion) and extract them only when boundaries become clear (YAGNI).

There are some life lessons here for us!

  • We are usually wrong at the start
  • Arranging is much easier than rearranging

That first point is important. When we start building something, we are probably wrong. Wrong about how our domain works, wrong about how to model it, wrong about which parts of the system will need to be extensible, wrong about our cross-functional requirements – there are so many things to be wrong about! The more heavily we factor our code, the more we pour concrete on our wrong assumptions. The more we arrange things at the start, the more difficult they will be to rearrange later when we find out we were wrong.  

This “we’re probably wrong at the start” thing can be a bit tough for experienced people to accept, as we may have built our professional ego around the idea that we are good at being right. If you think you can get things right the first time, this post isn’t for you.

But if you’re with me so far, then my advice is this. Apply what we’ve learned about luggage and microservices right down to the class and function level. Don’t heavily factor your code at the start – just do enough. To build enough, we can use outside-in, test-driven development, test-driven design and close collaboration to produce the simplest, smallest, least-factored thing that works. Anything we are tempted to do beyond that, we can consider technical debt.

Working this way, our initial solution will be:

  • As simple as possible, because we will produce it outside-in while practicing test-driven development
  • Well structured, because test-driven design will guide us in writing data-driven code, separating concerns and avoiding complex conditional logic
  • Written in domain language that makes sense to our business, because we are working closely with non-engineers
  • Sensible to our team, because we are mobbing or pairing with rotation
  • A step in the right direction, because we have thought about the problem holistically (but a small step only, because we have not thought too hard or too long)

The result of all this will be some software and some technical debt, so when and how should we pay that debt off? A few years ago, I would have suggested we briefly record technical debt on a technical debt wall, with axes for effort and value. I’d have suggested reviewing the wall at every story kick-off and picking up items that will make delivery of the coming story easier. That’s not a bad approach – it’s better than what most people are doing.

Now I believe technical debt walls are a local optimum. In a team working as we have discussed here, categorising and prioritising technical debt is of little value:

  • Our opinion about how to improve something will change as our skills and knowledge grow
  • Our opinion about what is worth improving will change as we are asked to adjust and extend our software in ways we did not initially predict
  • We may not work in that part of the codebase again in a relevant timeframe anyway

It is more sensible to ask – before making any change to our software – “Can we improve our engineering design _a bit_ to make the coming change easier?” Why only “a bit”? Because we don’t want to engage in a wholesale, big-bang redesign based on the need for a single change. Every time we change our software, we want to improve its design in a small increment, so that the kind of change we are trying to make is easier next time. This way our design will gradually evolve to support the kinds of changes we are actually making, in the areas we are actually making them. We don’t need to gamble on what should be extensible, how extensible it should be and in what ways we will need to extend it.

Put everything together and the entire approach is super simple:

  1. Build the simplest thing we can
  2. Every time we change it, make the change easier with a small improvement to our engineering design

With this approach in mind, the dichotomy between delivery and technical debt is exposed as false. We improve our engineering design when we deliver something and when we deliver something, we improve our engineering design. There is no need to categorise work as “technical debt” or “delivery”. Such categorisation and the careful management that usually follows is a useful local optimum for teams that are not yet evolving their engineering design continuously enough, but it shouldn’t be our ultimate goal.

“Technical debt” as a model for thinking about the need to continuously improve our engineering design is a good raft – it can help you get across the river as you learn how to do the right amount of work up-front and master improving it gradually over time. But when you get to the other side of that river, you should put the raft down.

How to Stop Branching — August 9, 2021

I know several developers who were forced into trunk-based development without preparation and had a terrible experience. I don’t want to see that happen again, so I’m offering up this post as a practical guide to help your team safely climb down from their branches and practice continuous integration.

If you’re brand new to thinking critically about branching, you can read Martin Fowler’s explanation of continuous integration. If you’re a little further in, I love this blog post by Dave Farley. If you like branching and want to keep doing it, this isn’t the post for you.

Branching is a local optimum – it works just well enough for many teams to tolerate. It requires a number of interdependent practices and when we change one by itself, the system is degraded and the team reverts. When we change all of them together, we throw everything into chaos and nobody wants to try again. It’s a Gordian Knot I’ve spent my whole career untangling.

It has a way of recursively reinforcing itself. By facilitating working independently with limited collaboration, it delays feedback and causes rework during the pull request process, which in turn makes us feel that pull requests – and by extension branching – must be very important. Andrew Cain once told me that recursion is the most powerful force in the universe. Instead of fighting it, we’re going to take steps that will improve our lives now while preparing us for trunk-based development in the future.

Build in quality

Nobody wants to integrate poor quality changes into an unsafe codebase. That’s one of the things people attached to branching are trying to avoid – they want to know that each change is safe.

We can help them achieve this by driving the adoption of quality-focused development practices like:

  • Test-driven development
  • Overlapping coverage across multiple layers of testing
  • Tests as documentation
  • Static analysis
  • Linting

These techniques are more effective at catching defects than pull requests and also provide earlier feedback, reducing rework. They will add a lot of value, even if you keep branching.
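
As a small taste of what “tests as documentation” can look like, here’s a hypothetical sketch in the pytest style (the fee rule itself is invented):

    def apply_late_fee(balance_cents: int, days_overdue: int) -> int:
        # Charge a 5% late fee once an invoice is more than 30 days overdue.
        return balance_cents * 105 // 100 if days_overdue > 30 else balance_cents

    def test_late_fee_is_only_charged_after_thirty_days_overdue():
        assert apply_late_fee(10_000, days_overdue=30) == 10_000
        assert apply_late_fee(10_000, days_overdue=31) == 10_500

The test name and assertions state the rule directly, so a future teammate can learn the intended behaviour without reverse-engineering the implementation.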

Work in small steps

Even people who really like pull requests usually agree that smaller pull requests are preferable, because they tend to be reviewed faster and more thoroughly. More frequent integration also reduces the chance and size of merge conflicts.

We can switch from feature branching to task branching and work in a series of small, safe-to-deploy steps with a pull request for each one. We’ll spend less time debugging, receive earlier feedback from our peers and enjoy faster and more thorough reviews.

Many people are used to tearing everything apart and putting it back together again on their branch. It may take some people a lot of effort to learn a more incremental approach and you will have to be patient and persistent during this phase.

As you improve, you can begin working exclusively in small, production-ready commits. Even the handful of commits made on a small task branch can be made using the red-green-refactor cycle to ensure that your software is working at regular intervals, with only small changes between each known-good state. This will radically reduce cognitive load and debugging effort.

Use feature toggles

One challenge of breaking our work into small task branches is that we must decouple deployment and release. After all, we want to integrate our code – at which point it must be deployable – without releasing our unfinished story.

We can use feature toggles to achieve this – just think of them as runtime branching. We integrate our code, but delay releasing it by disabling it with a feature toggle.
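
Here’s a minimal sketch of that runtime branching in Python. The environment-variable toggle and the checkout functions are hypothetical stand-ins; in practice you’d probably reach for a toggle service or configuration system:

    import os

    def is_enabled(flag: str) -> bool:
        # Read the toggle from the environment, defaulting to off.
        return os.environ.get(f"FEATURE_{flag.upper()}", "off") == "on"

    def existing_checkout(cart: list) -> str:
        return f"checked out {len(cart)} items (existing flow)"

    def new_checkout(cart: list) -> str:
        return f"checked out {len(cart)} items (new flow, not yet released)"

    def checkout(cart: list) -> str:
        # The unfinished work is integrated and deployed, but only released
        # to users once the toggle is flipped on.
        if is_enabled("new_checkout_flow"):
            return new_checkout(cart)
        return existing_checkout(cart)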

In addition to facilitating more frequent integration, feature toggles allow us to quickly disable problematic features in an emergency. With the right tooling, we can even use them to do pre-release testing in production. Think of all the time we’ve spent trying to maintain production-like environments and test data when we could have just tested in production all along!

Feature toggles can be complex to manage. Putting them in the right places, tracking when they can be turned on and remembering to remove them may take some new tooling, some practice and even some workflow changes. Don’t rush this step – we need to be great at this before we can practice trunk-based development.

Collaborate more

Pull requests are now more frequent, but without increased collaboration this is usually slow and annoying. A little tension helps drive us to continue improving, but we have to guide our team carefully in this phase so they don’t learn the wrong lessons and go backwards.

Substantive pull request discussions cause unnecessary rework and can be taken as a sign that we did not collaborate well. We can eliminate this by working together throughout the development process using techniques like:

  • Pair programming with rotation
  • Thorough kick-offs
  • Tech huddles

We’ll get more context-aware, earlier feedback from more people. Over time, pull requests will become trivial – your team will be aligned before the pull request, reducing delay and rework.

Now, we can point out that the feedback achieved through continuous collaboration appears to be more useful than the feedback received in pull requests. If we were comfortable merging solo work with 2 pull request reviewers, shouldn’t we be even more comfortable merging paired work with a single pull request reviewer? What if we rotated pairs – isn’t the context-aware feedback of 3 close collaborators worth more than 2 peers with minimal context looking at a diff?

Stop branching

By now, we’ve already:

  • Built quality into our practices and codebase so that we feel confident integrating our work
  • Learned how to integrate more frequently by working in small steps and using feature toggles
  • Discovered that collaboration is more effective than critique and broken our dependence on pull request reviews for feedback

We’re ready. All that’s left is to be brave and ask our team if branches and pull requests are still adding enough value to justify the delayed integration and context switching they cause.

Pragmatic to a Fault — November 10, 2020

In the last few years, you’ve probably heard these phrases a lot.

Can we focus on the problem in front of us?

We need to be practical here

Let’s try to be pragmatic

We’ve all worked with engineers who care more about applying SOLID principles than delivering working software. As an industry we’ve been reacting to that – and rightly so. But I have also seen this language used to shut down conversations that – though they may have seemed insufficiently pragmatic at the time – turned out to be very important. We push aside conceptual problems to focus on practical problems, but today’s practical problems are often symptoms of the conceptual problems we pushed aside yesterday. Pragmatism can become a shortsighted cycle of conceptual neglect.

Conceptual Problems You Should Care About

Some conceptual problems can be kept in our peripheral vision until we understand them better, but others need to be addressed right away. How do we know which is which? To try and help, here are some of the conceptual problems that have caused issues for my team. When we see one of these now, we tackle it right away.

Inconsistent Thinking

If your code reflects different ways of thinking about and solving the same problem, clashing mental models will cause confusion sooner rather than later.

In publishing certain messages for downstream systems to consume, we needed a mechanism to identify which billing period each message related to. Different messages were implemented by different people over a period of several months and when I examined the whole subsystem at the end, it was clear that the logic for finding the billing period had been invented anew by each engineer. Depending on the message, it was sometimes being found by the bill, sometimes by a created-at timestamp and sometimes by a combination of the two. There were 3 different strategies implemented across 5 different code paths.

When I expressed concern, I was encouraged to be pragmatic – after all, the integration was working fine. But we have enough users that errors in thinking quickly manifest as mishandled edge-cases, so I wanted to get out in front of this one. I began converging our various code paths and strategies around the most sensible solution.
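
To illustrate the kind of convergence I mean, here’s a hypothetical sketch. The field names and fallback rule are invented, but the shape is the important part: one agreed strategy behind one function.

    from datetime import date
    from typing import Optional

    def resolve_billing_period(bill_period: Optional[str], created_at: date) -> str:
        # One agreed strategy: prefer the period recorded on the bill,
        # otherwise derive it from the created-at timestamp.
        return bill_period if bill_period else f"{created_at.year}-{created_at.month:02d}"

Every publisher calls the same function instead of choosing its own approach, so when the thinking has to change, it changes in one place instead of five.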

Just before I completed the job, a Product Manager in another team sounded the alarm – their numbers weren’t adding up because certain messages couldn’t be placed in billing periods and it was costing us money. Because we had begun solving the conceptual problem instead of waiting for a practical problem, we were able to resolve the issue on the same day and avoid serious financial harm to the business.

Self-Defeating Compromises

We make trade-offs to get things to market faster, but there’s no point delivering something faster if the core value remains unrealised.

One of our teams developed a service for centralising ownership of certain account information. Downstream systems had been designed to expect this information with a special key, but so far only one key had been used. Developers of the new upstream system decided to save time by only supporting that one key.

But the downstream systems didn’t expect that key arbitrarily – being able to differentiate these records by key was essential to providing a good user experience. As soon as the business added a second key, the upstream system was rendered useless. The compromise was self-defeating, in that it undermined the core value of the work.

Misleading Behaviour

Anything that doesn’t behave the way it leads people to think it will behave is an expensive mistake waiting to happen.

A member of our team named the primary key column for a new fees table transaction_id. As the team already had a ledger table called transactions and a convention of storing off-ledger supplementary data in other tables (with transaction_id as the foreign key), the naming decision misled many engineers into thinking that the new fees table was another example of this pattern. Much time was wasted explaining to people why they were not able to find those transaction_id values in the transactions table. When the misunderstanding began to influence people’s design decisions, we stepped in and changed it.

Lost Information

We all know not to throw out important information. But many engineers conflate what happened with their interpretation of what happened, discarding the former and only storing the latter. I wrote a whole post on this topic called Write Now, Think Later.

Engineering Managers and Ants: The Search for Consistency — July 23, 2019

Have you ever seen (or been) an Engineering Manager or Software Architect looking for consistency between teams? Have you ever seen a trail of ants, carrying food back to their colony? I’m going somewhere with this – I promise.

Imagine (or remember) a trail of ants. The terrain is complex and – for an ant – dangerous. The food source is very small. The ants have a terrible vantage point and can’t get a good look at anything. They’re not very bright, even by our standards. But the path they’ve found? It’s almost perfect! It has some twists and turns, but it’s relatively direct. There’s a dangerous obstacle nearby, but the trail gives it a wide berth. How did the ants find that food? How did they figure out a path that balances safety and directness from such a terrible vantage point and with such tiny brains?

How do the ants do it?

The basic mechanism behind this is well understood and very efficient – so much so that we frequently use it to solve problems of our own. Let’s start with a simplified model of ant navigation:

  1. Ants follow the pheromone trails left by ants
  2. If there’s no pheromone trail to follow, ants wander off in a for-our-purposes-random direction

Drop a colony of ants in the middle of a flat plane with no pheromone trails to follow and they’ll spread out in all directions, each ant leaving a new pheromone trail. When an ant finds food, it turns around and follows its own pheromone trail back to the colony. This trail is now twice as strong as any other, because it has been traversed twice. Other ants encountering this trail will change course and begin to follow it, strengthening it further until eventually all the ants are using it to ferry food back to the colony.

Now we know how ants find food. It’s an amazing trick – they’re small, stupid (by our standards) and can’t see what’s going on, but with a couple of simple behaviours they can do route-finding in complex environments. It’s a triumph of distributed intelligence – a whole greater than the sum of its parts.

Imitation is the sincerest form of flattery

So I guess we should do that too, right? Teams try all kinds of ideas until one of them finds something that works, then we all converge on that solution and keep doing it (well, until the food runs out). That’s roughly what I’ve seen in many organisations – perhaps you’ve seen it too:

  • “The other tech leads and I talked about it and decided we should all use <Library> from now on”
  • “<Manager> wants all the story walls to be consistent, so we rearranged yours a bit last night”
  • “All the other teams follow <Process>, so we should really be doing it too”

To some extent this emerges naturally – humans are great at learning by imitation – but often convergence on a single solution is driven from the top down. There are plenty of good reasons a leader might nudge people in that direction and you’ve probably heard some of them:

  • “It’s easier to move between codebases if they all use <Pattern/Library>”
  • “We don’t want to be constantly reinventing the wheel”
  • Homogeneous teams and codebases simplify resource allocation and reporting (you don’t hear this one spoken plainly very often, but it’s a biggie)

Local optima

So far, so good. But we haven’t explained the ants’ real secret yet! If ants really used our simplified model of navigation, they’d converge on the first solution they found. You’d get a trail that leads all over the place before it eventually happens to collide with something useful. Ants following it would waste unnecessary energy and be exposed to needless risk using a wildly inefficient solution they arrived at almost by accident. They’d be trapped in a local optimum.

As we said earlier, ants generally do better than that. Ant trails are relatively direct, even over challenging terrain. They’re small, stupid and have a terrible vantage point – so how do they overcome these weaknesses and escape local optima when all they know how to do is follow pheromone trails? Amazingly, they do it by being even worse than you thought! As it turns out, ants can’t even follow pheromone trails properly. They regularly get lost and wander off the trail and in so doing, they sometimes happen upon a shorter or safer route than the one they were trying to follow. Ants that move back and forth along this new route will – quite incidentally – end up strengthening its pheromone trail more quickly, such that it eventually becomes the dominant trail. After enough of this, the ants will end up with something close to the best trail possible – a global optimum (for a suitably narrow definition of the problem). Their inconsistency is a crucial part of their success.
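
If you’d like to see these dynamics in miniature, here’s a toy simulation. It isn’t a faithful ant model (and certainly not a real ant colony optimiser), just arbitrary numbers arranged to show the feedback loop: two routes lead to the same food, ants choose in proportion to pheromone, shorter trips deposit pheromone faster and a little wandering keeps the longer route explored.

    import random

    def simulate(steps: int = 2000, wander_rate: float = 0.05) -> dict:
        routes = {"short": 1.0, "long": 2.0}     # relative trip lengths
        pheromone = {"short": 1.0, "long": 1.0}  # start undifferentiated

        for _ in range(steps):
            if random.random() < wander_rate:
                # Wandering off the trail: pick a route at random.
                choice = random.choice(list(routes))
            else:
                # Follow the pheromone: pick in proportion to trail strength.
                choice = random.choices(
                    list(routes), weights=[pheromone[r] for r in routes])[0]
            # Shorter routes are completed more often per unit of time,
            # so they accumulate pheromone faster.
            pheromone[choice] += 1.0 / routes[choice]
        return pheromone

    print(simulate())  # the short route ends up with far more pheromone

It’s only a cartoon (in real colonies the wandering is also how the shorter route gets discovered in the first place), but it shows how a couple of dumb rules plus a bit of noise add up to route-finding.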

Intellectually we know the risks of converging too quickly on a solution. When we solve problems using particle swarm optimisation or evolutionary algorithms we are careful to keep introducing random variations and thereby avoid getting trapped in local optima. But when you get a few tech leads, architects or engineering managers together we seem to forget all about it. We forget that inconsistency is almost synonymous with innovation – that it is the most effective means by which new ideas can be tried, evaluated and eventually adopted.

Finding balance

There are good reasons to want some amount of consistency, but we regularly take it so far that I suspect there’s more going on. Maybe we want predictability in a stochastic world, control over empowered teams or some other hopeless paradox. Perhaps in our fear of uncertainty and our haste to converge on the safe and the known, we trap ourselves in local optima. Not only does this limit our potential, it is deeply disempowering to the people doing the work. After all, even ants are allowed to wander off the trail!

While a little consistency can be very useful, total consistency is like rigor mortis. No new tools, techniques or ideas. Disempowered developers stuck in the past, feeling like pawns and taking orders from people who think they found all the answers years ago in a rapidly changing world. Developers working under such a regime will – if they have any interest in their field or their sanity – quit and find something more interesting to do.

To find balance, we need only stop meddling. Humans are big imitation learners, we often go along with bad ideas just to fit in, and we’ve all heard that good developers are lazy. If, despite all of this, your teams are still trying out something new, then they probably have a strong intuition that they are in local optima – that a better solution exists somewhere out there. We should respect that intuition and avoid upsetting the balance with top-down pressure to converge prematurely.