RSS Filter

Feed: Fred Hebert

Entries found: 5

Software Acceleration and Desynchronization

Published: Mon, 05 Jan 2026 11:00:00 EST
Updated: Mon, 05 Jan 2026 11:00:00 EST
UTC: 2026-01-05 16:00:00+00:00
URL: https://ferd.ca/software-acceleration-and-desynchronization.html

A bit more than a month ago, I posted the following on social media:

Content Preview

A bit more than a month ago, I posted the following on social media:

Seeing more reports and industry players blaming code reviews for slowing down the quick development done with AI. It's unclear whether anyone's asking if this is just moving the cognitive bottleneck of "understanding what's happening" around. "Add AI to the reviews" seems to be the end goal here.

And I received multiple responses, some that were going "This is a terrible thing" and some going "yeah that's actually not a bad idea." Back then I didn't necessarily have the concepts to clarify these thoughts, but I've since found ways to express the issue in a clearer, more system-centric way. While this post is clearly driven by the discourse around AI (particularly LLMs), it is more of a structural argument about the kind of changes their adoption triggers, and the broader acceleration patterns seen in the industry with other technologies and processes before, and as such, I won’t really mention them anymore here.

The model I’m proposing here is inspired by (or is a dangerously misapplied simplification of) the one presented by Hartmut Rosa’s Social Acceleration , 1 bent out of shape to fit my own observations. A pattern I’ll start with is one of loops, or cycles.

Loops, Cycles, and Activities

Let’s start with a single linear representation of the work around writing software:

This is a simplification because we could go much deeper, such as in this image of what value stream mapping could look like in the DORA report: 2

But we could also go for less linear to show a different type of complexity, even with a simplified set of steps:

Each of the steps above can imply a skip backwards to an earlier task, and emergencies can represent skips forwards. For the sake of the argument, it doesn't matter that our model is adequately detailed or just a rough estimation; we could go for more or less accurate (the “write tests” node could easily be expanded to fill a book), this is mostly for illustrative purposes.

Overall, in all versions, tasks aim to go as quickly as possible from beginning to end, with an acceptable degree of quality. In a mindset of accelerating development, we can therefore take a look at individual nodes (writing code, debugging, or reviewing code) for elements to speed up, or at overall workflows by influencing the cycles themselves.

For example, code reviews can be sped up with auto formatting and linting—automated rule checks that enforce standards or prevent some practices—which would otherwise need to be done by hand. This saves time and lets people focus on higher-level review elements. And the overall cycle can be made faster by moving these automated rules into the development environment, thereby tightening a feedback loop: fix as you write rather than accumulating flaws on top of which you build, only to then spend time undoing things to fix their foundations.

Concurrent Loops

So far, so good. The problem though is that this one isolated loop is insufficient to properly represent most of software work. Not only are there multiple tasks run in parallel by multiple people each representing independent loops, each person also is part of multiple loops as well. For example, while you might tackle a ticket for some piece of code, you may also have to write a design document for an upcoming project, provide assistance on a support ticket, attend meetings, focus on developing your career via mentorship sessions, and keep up with organizational exercises through publishing and reading status reports, and so on.

Here's a few related but simplified loops, as a visual aid:

Once again, each person on your team may run multiple of these loops in parallel during their workday, of various types.

But more than that, there might be multiple loops that share sections. You can imagine how writing code in one part of the system can prepare you or improve your contributions to multiple sorts of tasks: writing code that interacts with it, modifying it, reviewing changes, writing or reviewing docs, awareness of possible edge cases for incidents, better estimates of future tasks, and so on.

You can also imagine how, in planning how to best structure code for new product changes, experience with the current structure of the code may matter, along with awareness of the upcoming product ambitions. Likewise, the input of people familiar with operational challenges of the current system can prove useful in prioritizing changes. This sort of shared set of concerns informed ideas like DevOps, propelled by the belief that good feedback and integration (not throwing things over the fence) would help software delivery.

Basically, a bunch of loops can optionally contribute to one set of shared activities, but some activities can also be contributors to multiple loops, and these loops might be on multiple time scales:

Here, the activity of reviewing code might be the place where the coding loops gets straightforward reviews as desired, but it is also a place where someone's career growth plans can be exercised in trying to influence or enforce norms, and where someone looking at long-term architectural and growth pattern gets to build awareness of ongoing technical changes.

These shared bits are one of the factors that can act like bottlenecks, or can counter speed improvements. To make an analogy, consider the idea that if you were cycling to work 30 minutes each way every day, and sped up your commute by going twice as fast via public or private transit, you’d save 2h30 every week; however some of that time wouldn’t be “saved” if you consider you might still need to exercise to stay as fit physically. You would either need to spend much of your time saved on exercise outside of commute , or end up incidentally trading off commute time for longer-term health factors instead.

Applied to software, we may see this pattern with the idea of “we can now code faster, but code review is the new bottleneck.” The obvious step will be to try to speed up code reviewing to match the increased speed of code writing. To some extent, parts of code reviewing can be optimized. Maybe we can detect some types of errors more reliably and rapidly through improvements. Again, like linting or type checking, these ideally get moved to development rather than reviews.

But code reviewing is not just about finding errors. It is also used to discuss maintainability, operational concerns, to spread knowledge and awareness, to get external perspectives, or to foster broader senses of ownership. These purposes, even if they could be automated or sped up, can all indicate the existence of other loops that people may have to maintain regardless.

Synchronization and Desynchronization

If we decide to optimize parts of the work, we can hope for a decent speedup if we do one of:

Continuously develop a proper understanding of the many purposes a task might have, to make useful, well-integrated changes

give up some of the shared purposes by decoupling loops

The first option is challenging and tends to require research, iteration, and an eye for ergonomics. Otherwise you’ll quickly run into problems of “working faster yet going the same speed”, where despite adopting new tools and methods, the bottlenecks we face remain mostly unchanged. Doing this right implies your changes are made knowing they'll structurally impact work when attempting to speed it up, and be ready to support these disruptions.

The second is easy to do in ways that accidentally slow down or damage other loops—if the other purposes still exist, new activities will need to replace the old ones—which may in turn feed back into the original loop (e.g.: code reviews may block present code writing, but also inform future code writing), with both being weakened or on two different tempos when decoupled. This latter effect is something we’ll call “desynchronization.” One risk of being desynchronized is that useful or critical feedback from one loop no longer makes it to another one.

To cope with this (but not prevent it entirely), we have a third option in terms optimization:

Adopt “norms” or practices ahead of time that ensure alignment and reduce the need for synchronization.

This is more or less what “best practices” and platforms attempt to provide: standards that when followed, reduce the need for communication and sense making. These tend to provide a stable foundation on which to accelerate multiple activities. These don’t fully prevent desynchronization, they just stave it off.

To illustrate desynchronization, let’s look at varied loops that could feed back into each other:

These show shared points where loops synchronize, across ops and coding loops, at review time. The learnings from operational work can feed back into platform and norms loops, and the code reviews with ops input are one of the places where these are "enforced". 3 If you remove these synchronization points, you can move faster, but loops can also go on independently for a while and will grow further and further apart:

There's not a huge difference across both images, but what I chose to display here is that lack of dev-time ops input (during code review) might lead to duplicated batches of in-flight fixes that need to be carried and applied to code as it rolls out, with extra steps peppered through. As changes are made to the underlying platform or shared components, their socialization may lag behind as opportunities to propagate them are reduced. If development is sped up enough without a matching increase in ability to demonstrate the code's fitness (without waiting for more time in production), the potential for surprises goes up.

Keep in mind that this is one type of synchronization across one shared task between two high-level loops. Real work has more loops, with more nodes, more connections, and many subtler synchronization points both within and across teams and roles. Real loops might be more robust, but less predictable. A loop with multiple synchronization points can remove some of them and look faster until the few remaining synchronization points either get slower (to catch up) or undone (to go fast).

Not all participants to synchronization points get the same thing out of them either. It’s possible an engineer gets permission (and protection) from one, another gets awareness, some other team reinforces compliance, and a management layer claims accountability out of it happening, for example.

It's easy to imagine both ends of a spectrum, with on one end, organizations that get bogged down on synchronous steps to avoid all surprises, and on the other, organizations that get tangled into the web of concurrent norms and never-deprecated generations of the same stuff all carried at once because none of the synchronous work happens.

Drift that accumulates across loops will create inconsistencies as mental models lag, force corner cutting to keep up with changes and pressures, widen gaps between what we think happens and what actually happens. 4 It pulls subsystems apart, weakens them, and contributes to incidents—unintended points of rapid resynchronization.

I consider incidents to be points of rapid resynchronization because they're usually where you end up desynchronizing so hard, incident response forces you to suspend your usual structure, quickly reprioritize, upend your roadmap, and (ideally) have lots of people across multiple teams suddenly update their understanding of how things work and break down. That the usual silos can't keep going as usual points to forced repair after too much de-synchronization.

As Rosa points out in his book, this acceleration tends to grow faster than what the underlying stable systems can support, and they become their own hindrances. Infrastructure and institutions are abandoned or dismantled when the systems they enabled gradually feel stalled or constrained by them, and seek alternatives:

[Acceleration] by means of institutional pausing and the guaranteed maintenance of background conditions is a basic principle of the modern history of acceleration and an essential reason for its success as well. [Institutions] were themselves exempted from change and therefore helped create reliable expectations, stable planning, and predictability. [...] Only against the background of such stable horizons of expectation does it become rational to make the long-term plans and investments that were indispensable for numerous modernization processes. The erosion of those institutions and orientations as a result of further, as it were, “unbounded” acceleration [...], might undermine their own presuppositions and the stability of late modern society as a whole and thereby place the (accelerative) project of modernity in greater danger than the antimodern deceleration movement.

The need for less synchronization doesn’t mean that synchronization no longer needs to happen. The treadmill never slows down, and actors in the system must demonstrate resilience to reinvent practices and norms to meet demands. This is particularly obvious when the new pace creates new challenges: what brought us here won’t be enough to keep going, and we’ll need to overhaul a bunch of loops again.

There’s something very interesting about this observation: A slowdown in one place can strategically speed up other parts.

Is This Specific Slow a Good Slow or a Bad Slow?

There’s little doubt to me that one can go through a full cycle of the “write code” loop faster than one would go through “suffering the consequences of your own architecture” loop—generally that latter cycle depends on multiple development cycles to get adequate feedback. You can ship code every hour, but it can easily take multiple weeks for all the corner cases to shake out.

When operating at the level of system design or software architecture (“We need double-entry bookkeeping that can tolerate regional outages”), we tend to require an understanding of the system’s past, a decent sense of its present with its limitations, and an ability to anticipate future challenges to inform the directions in which to push change. This is a different cycle from everyday changes (“The feature needs a transaction in the ledger”), even if both are connected.

The implication here is that if you’re on a new code base with no history and a future that might not exist (such as short-term prototypes or experiments), you’re likely to be able to have isolated short loops. If you’re working on a large platform with thousands of users, years of accrued patterns and edge cases, and the weight of an organizational culture to fight or align with, you end up relying on the longer loops to inform the shorter ones.

The connections across loops accrue gradually over time, and people who love the short loops get very frustrated at how slow they’re starting to be:

Yet irreversible decisions require significantly more careful planning and information gathering and are therefore unavoidably more time intensive than reversible ones. In fact, other things equal, the following holds: the longer the temporal range of a decision is, the longer the period of time required to make it on the basis of a given substantive standard of rationality. This illustrates the paradox of contemporary temporal development: the temporal range of our decisions seems to increase to the same extent that the time resources we need to make them disappear.

That you have some folks go real fast and reap benefits while others feel bogged down in having to catch up can therefore partially be a sign that we haven’t properly handled synchronization and desynchronization. But it can also be a function of people having to deliberately slow down their work when its output either requires or provides the stability required by the fast movers. Quick iterations at the background level—what is generally taken for granted as part of the ecosystem—further increases the need for acceleration from all participants.

In a mindset of acceleration, we will seek to speed up every step we can, through optimization, technological innovation, process improvements, economies of scale, and so on. This connects to Rosa’s entire thesis of acceleration feeding into itself. 5 One of the point Rosa makes, among many, is that we need to see the need for acceleration and the resulting felt pressures (everything goes faster, keeping up is harder; therefore we need to do more as well) as a temporal structure , which shapes how systems work. So while technical innovation offers opportunities to speed things up (often driven by economic forces), these innovations transform how our social structures are organized (often through specialization), which in turn, through a heightened feeling of what can be accomplished and a feeling that the world keeps going faster, provokes a need to speed things up further and fuels technological innovation. Here's the diagram provided in his book:

We generally frame acceleration as an outcome of technological progress, but the idea here is that the acceleration of temporal structures is, on its own, a mechanism that shapes society (and, of course, our industry). Periods of acceleration also tend to come with multiple forms of resistance; while some are a bit of a reflex to try and keep things under control (rather than having to suffer more adaptive cycles), there are also useful forms of slowing down, those which can provide stability and lengthen horizons of other acceleration efforts.

Few tech companies have a good definition of what productivity means, but the drive to continually improve it is nevertheless real. Without a better understanding of how work happens, we’re likely to keep seeing a wide variation in how people frame the impact of new tech on their work as haphazard slashing and boosting of random parts of random work loops. I think this overall dynamic can provide a useful explanation for why some people, despite being able to make certain tasks much faster, either don't feel overall more productive, or actually feel like they don't save time and it creates more work. It's hard to untangle which type of slowdown is being argued for at times, but one should be careful to classify all demands of slowing down as useless Luddite grumblings. 6 It might be more useful down the road to check whether you could be eroding your own foundations without a replacement.

What do I do with this?

A systems-thinking approach tends to require a focus on interactions over components. What the model proposed here does is bring a temporal dimension to these interactions. We may see tasks and activities done during work as components of how we produce software. The synchronization requirements and feedback pathways across these loops and for various people are providing a way to map out where they meet.

Ultimately even the loop model is a crude oversimplification. People are influenced by their context and influence their context back in a continuous manner that isn’t possible to constrain to well-defined tasks and sequences. Reality is messier. This model could be a tool to remind ourselves that no acceleration happens in isolation. Each effort contains the potential for desynchronization, and for a resulting reorganization of related loops. In some ways, the aim is not to find specific issues, but to find potential mismatches in pacing, which suggest challenges in adapting and keeping up.

The analytical stance adopted matters. Seeking to optimize tasks in isolation can sometimes yield positive local results, within a single loop, and occasionally at a wider scale. Looking across loops in all its tangled mess, however, is more likely to let you see what’s worth speeding up (or slowing down to speed other parts up!), where pitfalls may lie, and foresee where the needs for adjustments will ripple on and play themselves out. Experimentation and ways to speed things up will always happen and will keep happening, unless something drastically changes in western society; experimenting with a better idea of what to look for in terms of consequences is not a bad idea.

1 : While I have not yet published a summary or quotes from it in my notes section , it's definitely one of the books that I knew from the moment I started reading it would have a huge influence in how I frame stuff, and as I promised everyone around me who saw me reading it: I'm gonna be very annoying when I'll be done with it. Well, here we are. Grab a copy of Social Acceleration: A New Theory of Modernity . Columbia University Press, 2013.

2 : Original report , figure 50 is on p. 75.

3 : This example isn't there to imply that the synchronization point is necessary, nor that it is the only one, only that it exists and has an impact. This is based on my experience, but I have also seen multiple synchronization points either in code review or in RFC reviews whenever work crosses silo boundaries across teams and projects become larger in organizational scope.

4 : I suspect it can also be seen as a contributor to concepts such as technical debt, which could be framed as a decoupling between validating a solution and engineering its sustainability.

5 : I believe this also connects to the Law of Stretched Systems in cognitive systems engineering, and might overall be one of these cases where multiple disciplines find similar but distinct framings for similar phenomena.

6 : Since I'm mentioning Luddism, I need to do the mandatory reference to Brian Merchant's Blood in the Machine , which does a good job at reframing Luddism in its historical context as a workers' movement trying to protect their power over their own work at the first moments of the Industrial Revolution. Luddites did not systemically resist or damage all new automation technology, but particularly targeted the factory owners that offered poor working conditions while sparing the others.

Ongoing Tradeoffs, and Incidents as Landmarks

Published: Sat, 20 Sep 2025 11:00:00 EDT
Updated: Sat, 20 Sep 2025 11:00:00 EDT
UTC: 2025-09-20 15:00:00+00:00
URL: https://ferd.ca/ongoing-tradeoffs-and-incidents-as-landmarks.html

The Gap Through Which We Praise the Machine

Published: Mon, 09 Jun 2025 10:00:00 EDT
Updated: Mon, 09 Jun 2025 10:00:00 EDT
UTC: 2025-06-09 14:00:00+00:00
URL: https://ferd.ca/the-gap-through-which-we-praise-the-machine.html

In this post I’ll expose my current theory of agentic programming: people are amazing at adapting the tools they’re given and totally underestimate the extent to which they do it, and the amount of skill we build doing that is an incidental consequence of how badly the tools are designed.

Content Preview

In this post I’ll expose my current theory of agentic programming: people are amazing at adapting the tools they’re given and totally underestimate the extent to which they do it, and the amount of skill we build doing that is an incidental consequence of how badly the tools are designed.

I’ll first cover some of the drive behind AI assistant adoption in software, the stochastic-looking divide in expectations and satisfactions with these tools, and the desire to figure out an explanation for that phenomenon.

I’ll then look at what successful users seem to do, explore the type of scaffolding and skills they need to grow to do well with LLMs when coding or implementing features. By borrowing analytical ideas from French Ergonomists, I’ll then explain how this extensive adaptive work highlights a gap in interaction design from AI tool builders, which is what results in tricky skill acquisition.

Basically, things could be much better if we spent less time congratulating machines for the work people do and we instead supported people more directly.

Money Claps for Tinkerbell, and so Must You

A few months ago, Charity Majors and I gave the closing plenary talk at SRECon Americas 2025 . While we were writing the talk, trying to thread a needle between skepticism and optimism, Charity mentioned one thing I hadn’t yet understood by then but was enlightening: investors in the industry already have divided up companies in two categories, pre-AI and post-AI, and they are asking “what are you going to do to not be beaten by the post-AI companies?”

The usefulness and success of using LLMs are axiomatically taken for granted and the mandate for their adoption can often come from above your CEO. Your execs can be as baffled as anyone else having to figure out where to jam AI into their product. Adoption may be forced to keep board members, investors, and analysts happy, regardless of what customers may be needing.

It does not matter whether LLMs can or cannot deliver on what they promise: people calling the shots assume they can, so it’s gonna happen no matter what. I’m therefore going to bypass any discussion of the desirability, sustainability, and ethics of AI here, and jump directly to “well you gotta build with it anyway or find a new job” as a premise. My main focus will consequently be on people who engage with the tech based on these promises, and how they do it. There’s a wide spectrum where at one end you have “true believers,” and at the other you have people convinced of the opposite—that this is all fraudulent shit that can’t work.

In practice, what I’m seeing is a bunch of devs who derive real value from it at certain types of tasks and workflows ranging from copilot-as-autocomplete to full agentic coding, and some who don’t and keep struggling to find ways to add LLMs to their workflows (either because they must due to some top-down mandate, or because they fear they’ll be left behind if they don’t 1 ). I can also find no obvious correlation between where someone lands on that spectrum and things like experience levels; people fall here and there regardless of where they work, how much trust I have in their ability, how good they are at communicating, how much of a hard worker they are, or how willing to learn they might be.

A Theory of Division

So where does that difference come from? It could be easy to assign dissatisfaction to “you just gotta try harder”, or “some people work differently”, or “you go fast now but you are just creating more problems for later.” These all may be true to some degree, and the reality is surely a rich multifactorial mess. We also can’t ignore broader social and non-individual elements like the type of organizational culture people evolve in, 2 on top of variations that can be seen within single teams.

My gut feeling is that, on top of all the potential factors already identified, people underestimate their own situatedness (how much they know and interpret and adjust from “thing I am told to build” and tie that to a richer contextualized “thing that makes sense to build” by being connected participants in the real world and the problem space) and how much active interpretation and steering work they do when using and evaluating coding assistants. 3 Those who feel the steering process as taxing end up having a worse time and blame the machine for negative outcomes; those for whom it feels easy in turn praise the machine for the positive results.

This tolerance for steering is likely moderated or amplified by elements such as how much people trust themselves and how much they trust the AI, how threatened they might feel by it, their existing workflows, the support they might get, and the type of “benchmarks” they choose (also influenced by the preceding factors). 4

I’m advancing this theory because the people I’ve seen most excited and effective about agentic work were deeply involved in constantly correcting and recognizing bugs or loops or dead ends the agent was getting into, steering them away from it, while also adding a bunch of technical safeguards and markers to projects to try and make the agents more effective. When willingly withholding these efforts, their agents’ token costs would double as they kept growing their context windows through repeating the same dead-end patterns; oddities and references to non-existing code would accumulate, and the agents would increasingly do unhinged stuff like removing tests they wrote but could no longer pass.

I’ve seen people take the blame for that erratic behavior on themselves (“oh I should have prompted in that way instead, my bad”), while others would just call out the agent for being stupid or useless.

The early frustration I have seen (and felt) seems to be due to hitting these road blocks and sort of going “wow, this sucks and isn’t what was advertised.” If you got more adept users around you, they’ll tell you to try different models, tweak bits of what you do, suggest better prompts, and offer jargon-laden workarounds.

That gap between “what we are told the AI can do” and “what it actually does out of the box” is significant. To bridge that gap, engineers need to do a lot of work.

The Load-bearing Scaffolding of Effective Users

There are tons of different artifacts, mechanisms, and tips and tricks required to make AI code agents work. To name a few, as suggested by vendors and multiple blog posts, you may want to do things such as:

Play and experiment with multiple models, figure out which to use and when, and from which interfaces, which all can significantly change your experience.

Agent-specific configuration files (such as CLAUDE.md , AGENTS.md , or other rule files ) that specify project structure, commands, style guidelines, testing strategies, conventions, potential pitfalls, and other information. There can be one or more of them, in multiple locations, and adjusted to specific users.

Optimize your prompts by adding personality or character traits and special role-play instructions, possibly relying on prompt improvers .

Install or create MCP servers to extend the abilities of your agents. Some examples can include file management or source control, but can also do stuff like giving access to production telemetry data or issue trackers.

Use files as memory storage for past efforts made by the agent.

Specify checkpoints and manage permissions to influence when user input may be required.

Monitor your usage and cost .

There are more options there, and each can branch out into lots of subtle qualitative details: workarounds for code bases too large for the model’s context, defining broader evaluation strategies, working around cut-off dates, ingesting docs, or all preferences around specific coding, testing, and interaction methods. Having these artifacts in place can significantly alter someone’s experience. Needing to come up with and maintain these could be framed as increasing the effort required for successful adoption.

I’ve seen people experimenting, even with these elements in place, failing to get good results, and then being met with “yeah, of course, that’s a terrible prompt” followed with suggestions of what to improve (things like “if the current solution works, say it works so the agent does not try to change it”, asking for real examples to try and prevent fake ones, or being more or less polite ).

For example, a coworker used a prompt that, among many other instructions, had one line stating “use the newest version of <component> so we can use <feature> ”. The agent ignored that instruction and used an older version of the component. My coworker reacted by saying “I set myself up for refactoring by not specifying the exact version.”

From an objective point of view, asking for the newest version of the component is a very specific instruction: only one version is the newest, and the feature that was specified only existed in that version. There is no ambiguity. Saying “version $X.0 ” is semantically the same. But my coworker knew, from experience, that a version number would yield better results, and took it on themselves to do better next time.

These interactions show that engineers have internalized a complex set of heuristics to guide and navigate the LLM’s idiosyncrasies. That is, they’ve built a mental model of complex and hardly predictable agentic behavior (and of how it all interacts with the set of rules and artifacts and bits of scaffolding they’ve added to their repos and sessions) to best predict what will or won’t yield good results, and then do extra corrective work ahead of time through prompting variations. This is a skill that makes a difference.

That you need to do these things might in fact point at how agentic AI does not behave with cognitive fluency, 5 and instead, the user subtly does it on its behalf in order to be productive.

Whether you will be willing to provide that skill for the machine may require a mindset or position that I’ll caricature as “I just need to get better”, as opposed to taking a stance of “the LLM needs to get better”. I suspect this stance, whether it is chosen deliberately or not, will influence how much interaction (and course-correcting) one expects to handle while still finding an agent useful or helpful.

I don’t know that engineers even realize they’re doing that type of work, that they’re essential to LLMs working for code, that the tech is fascinating but maybe not that useful without the scaffolding and constant guidance they provide. At least, people who speak of AI replacing engineers probably aren’t fully aware that while engineers could maybe be doing more work through assisting an agent than they would do alone, agents would still not do good work without the engineer. AI is normal technology , in that its adoption, propagation, and the efforts to make it work all follow predictable patterns. LLMs, as a piece of tech, mainly offer some unrealized potential.

It may sound demeaning, like I’m implying people lack awareness of their own processes, but it absolutely isn’t. The process of adaptation is often not obvious, even to the people doing it. There are lots of strategies and patterns and behaviors people pick up or develop tacitly as a part of trying to meet goals. Cognitive work that gets deeply ingrained sometimes just feels effortless, natural, and obvious. Unless you’re constantly interacting with newcomers, you forget what you take for granted—you just know what you know and get results.

By extension, my supposition is that those who won’t internalize the idiosyncrasies and the motions of doing the scaffolding work are disappointed far more quickly: they may provide more assistance to the agent than the agent provides to them, and this is seen as the AI failing to improve their usual workflow and to deliver on the wonders advertised by its makers.

The Gap Highlighted Through Adaptive Work

What AI sells is vastly different from what it delivers, particularly what it delivers out of the box. In their study of the difference between work-as-imagined (WAI) and work-as-done (WAD) , ergonomists and resilience engineers have developed a useful framing device to understand what’s going on.

Work-as-imagined describes the work as it is anticipated or expected to happen, how it can be specified and described. The work-as-done comprises the work as it is carried out, along with the supporting tasks, deviations, meanings, and their relationships to the prescribed tasks.

By looking at how people turn artifacts they’re given into useful tools, we can make sense of that gap. 6 This adjustment ends up transforming both the artifacts (by modifying and configuring them) and the people using them (through learning and by changing their behavior). The difference between the original artifact developed by the people planning the work and the forms that end up effectively used in the field offer a clue of the mismatch between WAI and WAD.

Tying this back to our LLM systems, what is imagined is powerful agents who replace engineers (at least junior ones), make everyone more productive, and that will be a total game changer. LLMs are artifacts. The scaffolding we put in place to control them are how we try to transform the artifacts into tools; the learning we do to get better at prompting and interacting with the LLMs is part of how they transform us. If what we have to do to be productive with LLMs is to add a lot of scaffolding and invest effort to gain important but poorly defined skills, we should be able to assume that what we’re sold and what we get are rather different things.

That gap implies that better designed artifacts could have better affordances, and be more appropriate to the task at hand. They would be easier to turn into productive tools. A narrow gap means fewer adaptations are required, and a wider gap implies more of them are needed.

Flipping it around, we have to ask whether the amount of scaffolding and skill required by coding agents is acceptable. If we think it is, then our agent workflows are on the right track. If we’re a bit baffled by all that’s needed to make it work well, we may rightfully suspect that we’re not being sold the right stuff, or at least stuff with the right design.

Bad Interaction Design Demands Greater Coping Skills

I fall in the baffled camp that thinks better designs are possible. In a fundamental sense, LLMs can be assumed to be there to impress you. Their general focus on anthropomorphic interfaces—just have a chat!—makes them charming, misguides us into attributing more agency and intelligence than they have, which makes it even more challenging for people to control or use them predictably. Sycophancy is one of the many challenges here, for example.

Coding assistants, particularly agents, are narrower in their interface, but they build on a similar interaction model. They aim to look like developers, independent entities that can do the actual work. The same anthropomorphic interface is in place, and we similarly must work even harder to peel the veneer of agency they have to properly predict them and apply them in controlled manners.

You can see the outline of this when a coding agent reaches limits it has no awareness of, like when it switches from boilerplate generation (where we’re often fine letting it do its thing) to core algorithms (where we want involvement to avoid major refactors) without proper hand-offs or pauses. Either precise prompting must be done to preempt and handle the mode switch, or we find the agent went too far and we must fix (or rewrite) buggy code rather than being involved at the right time.

And maybe the issue is prompting, maybe it’s the boilerplatey nature of things, maybe it’s because there was not enough training material for your language or framework. Maybe your config files aren’t asking for the right persona, or another model could do better. Maybe it’s that we don’t even know what exactly is the boundary where our involvement is more critical. Figuring that out requires skill, but also it’s kind of painful to investigate as a self-improvement workflow.

Coding agents require the scaffolding, learning, and often demand more attention than tools, but are built to look like teammates. This makes them both unwieldy tools and lousy teammates. We should either have agents designed to look like a teammate properly act like a teammate, and barring that, have a tool that behaves like a tool. This is the point I make in AI: Where in the Loop Should Humans Go? , where a dozen questions are offered to evaluate how well this is done.

Key problems that arise when we’re in the current LLM landscape include:

AI that aims to improve us can ironically end up deskilling us;

Not knowing whether we are improving the computers or augmenting people can lead to unsustainable workflows and demands;

We risk putting people in passive supervision and monitoring roles, which is known not to work well;

We may artificially constrain and pigeonhole how people approach problems, and reduce the scope of what they can do;

We can adopt known anti-patterns in team dynamics that reduce overall system efficiency;

We can create structural patterns where people are forced to become accountability scapegoats.

Hazel Weakly comes up with related complaints in Stop Building AI Tools Backwards , where she argues for design centered on collaborative learning patterns (Explain, Demonstrate, Guide, Enhance) to play to the strengths that make people and teams effective, rather than one that reinforces people into being ineffective.

Some people may hope that better models will eventually meet expectations and narrow the gap on their own. My stance is that rather than anchoring coding agent design into ideals of science fiction (magical, perfect workers granting your wishes), they should be grounded in actual science. The gap would be narrowed much more effectively then. AI tool designers should study how to integrate solutions to existing dynamics, and plan to align with known strength and limitations of automation.

We Oversell Machines by Erasing Ourselves

Being able to effectively use LLMs for programming demands a lot of scaffolding and skills. The skills needed are, however, poorly defined and highly context dependent, such that we currently don’t have great ways of improving them other than long periods of trial and error. 7

The problem is that while the skills are real and important, I would argue that the level of sophistication they demand is an accidental outcome of poor interaction design. Better design, aimed more closely to how real work is done, could drastically reduce the amount of scaffolding and learning required (and increase the ease with which learning takes place).

I don’t expect my calls to be heard. Selling sci-fi is way too effective. And as long as the AI is perceived as the engine of a new industrial revolution, decision-makers will imagine it can do so, and task people to make it so.

Things won’t change, because people are adaptable and want the system to succeed. We consequently take on the responsibility for making things work, through ongoing effort and by transforming ourselves in the process. Through that work, we make the technology appear closer to what it promises than what it actually delivers, which in turn reinforces the pressure to adopt it.

As we take charge of bridging the gap, the machine claims the praise.

1 : Dr. Cat Hicks has shared some great research on factors related to this , stating that competitive cultures that assume brilliance is innate and internal tend to lead to a much larger perceived threat from AI regarding people’s skills, whereas learning cultures with a sense of belonging lowered that threat. Upskilling can be impacted by such threats, along with other factors described in the summaries and the preprint .

2 : Related to the previous footnote, Dr. Cat Hicks here once again shares research on cumulative culture , a framing that shows how collaborative innovation and learning can be, and offers an alternative construct to individualistic explanations for software developers’ problem solving.

3 : A related concept might be Moravec’s Paradox . Roughly, this classic AI argument states that we tend to believe higher order reasoning like maths and logic is very difficult because it feels difficult to us, but the actually harder stuff (perception and whatnot) is very easy to us because we’re so optimized for it.

4 : The concept of self-trust and AI trust is explored in The Impact of Generative AI on Critical Thinking by HPH Lee and Microsoft Research. The impact of AI skill threat is better defined in the research in footnote 1 . The rest is guesswork.
The guess about “benchmarks” is based on observations that people may use heuristics like checking how it does at things you’re good at to estimate how you can trust it at things you’ve got less expertise on. This can be a useful strategy but can also raise criteria for elements where expertise may not be needed (say, boilerplate), and high expectations can lay the groundwork for easier disappointment.

5 : The Law of Fluency states that Well-adapted cognitive work occurs with a facility that belies the difficulty of resolving demands and balancing dilemmas , basically stating that if you’ve gotten good at stuff, you make it look a lot easier than it actually is to do things.

6 : This idea comes from a recent French ergonomics paper . It states that “Artifacts represent for the worker a part of the elements of WAI. These artifacts can become tools only once the workers become users, when they appropriate them. [Tools] are an aggregation of artifacts (WAI) and of usage schemas by those who use them in the field (WAD).”

7 : One interesting anecdote here is hearing people say they found it challenging to switch from their personal to corporate accounts for some providers, because something in their personal sessions had made the LLMs work better with their style of prompting and this got lost when switching.
Other factors here include elements such as how updating models can significantly impact user experience , which may point to a lack of stable feedback that can also make skill acquisition more difficult.

AI: Where in the Loop Should Humans Go?

Published: Fri, 07 Mar 2025 11:00:00 EST
Updated: Fri, 07 Mar 2025 11:00:00 EST
UTC: 2025-03-07 16:00:00+00:00
URL: https://ferd.ca/ai-where-in-the-loop-should-humans-go.html

This is a re-publishing of a blog post I originally wrote for work, but wanted on my own blog as well.

Content Preview

This is a re-publishing of a blog post I originally wrote for work , but wanted on my own blog as well.

AI is everywhere, and its impressive claims are leading to rapid adoption. At this stage, I’d qualify it as charismatic technology —something that under-delivers on what it promises, but promises so much that the industry still leverages it because we believe it will eventually deliver on these claims.

This is a known pattern. In this post, I’ll use the example of automation deployments to go over known patterns and risks in order to provide you with a list of questions to ask about potential AI solutions.

I’ll first cover a short list of base assumptions, and then borrow from scholars of cognitive systems engineering and resilience engineering to list said criteria. At the core of it is the idea that when we say we want humans in the loop, it really matters where in the loop they are.

My base assumptions

The first thing I’m going to say is that we currently do not have Artificial General Intelligence (AGI). I don’t care whether we have it in 2 years or 40 years or never; if I’m looking to deploy a tool (or an agent) that is supposed to do stuff to my production environments, it has to be able to do it now . I am not looking to be impressed, I am looking to make my life and the system better.

Another mechanism I want you to keep in mind is something called the context gap . In a nutshell, any model or automation is constructed from a narrow definition of a controlled environment, which can expand as it gains autonomy, but remains limited. By comparison, people in a system start from a broad situation and narrow definitions down and add constraints to make problem-solving tractable. One side starts from a narrow context, and one starts from a wide one—so in practice, with humans and machines, you end up seeing a type of teamwork where one constantly updates the other:

The optimal solution of a model is not an optimal solution of a problem unless the model is a perfect representation of the problem, which it never is.
— Ackoff (1979, p. 97)

Because of that mindset, I will disregard all arguments of “it’s coming soon” and “it’s getting better real fast” and instead frame what current LLM solutions are shaped like: tools and automation. As it turns out, there are lots of studies about ergonomics, tool design, collaborative design, where semi-autonomous components fit into sociotechnical systems, and how they tend to fail.

Additionally, I’ll borrow from the framing used by people who study joint cognitive systems: rather than looking only at the abilities of what a single person or tool can do, we’re going to look at the overall performance of the joint system.

This is important because if you have a tool that is built to be operated like an autonomous agent, you can get weird results in your integration. You’re essentially building an interface for the wrong kind of component—like using a joystick to ride a bicycle.

This lens will assist us in establishing general criteria about where the problems will likely be without having to test for every single one and evaluate them on benchmarks against each other.

Questions you'll want to ask

The following list of questions is meant to act as reminders—abstracting away all the theory from research papers you’d need to read—to let you think through some of the important stuff your teams should track, whether they are engineers using code generation, SREs using AIOps, or managers and execs making the call to adopt new tooling.

Are you better even after the tool is taken away?

An interesting warning comes from studying how LLMs function as learning aides . The researchers found that people who trained using LLMs tended to fail tests more when the LLMs were taken away compared to people who never studied with them, except if the prompts were specifically (and successfully) designed to help people learn.

Likewise, it’s been known for decades that when automation handles standard challenges, the operators expected to take over when they reach their limits end up worse off and generally require more training to keep the overall system performant.

While people can feel like they’re getting better and more productive with tool assistance, it doesn’t necessarily follow that they are learning or improving. Over time, there’s a serious risk that your overall system’s performance will be limited to what the automation can do—because without proper design, people keeping the automation in check will gradually lose the skills they had developed prior.

Are you augmenting the person or the computer?

Traditionally successful tools tend to work on the principle that they improve the physical or mental abilities of their operator: search tools let you go through more data than you could on your own and shift demands to external memory, a bicycle more effectively transmits force for locomotion, a blind spot alert on your car can extend your ability to pay attention to your surroundings, and so on.

Automation that augments users therefore tends to be easier to direct, and sort of extends the person’s abilities, rather than acting based on preset goals and framing. Automation that augments a machine tends to broaden the device’s scope and control by leveraging some known effects of their environment and successfully hiding them away. For software folks, an autoscaling controller is a good example of the latter.

Neither is fundamentally better nor worse than the other—but you should figure out what kind of automation you’re getting, because they fail differently. Augmenting the user implies that they can tackle a broader variety of challenges effectively. Augmenting the computers tends to mean that when the component reaches its limits, the challenges are worse for the operator.

Is it turning you into a monitor rather than helping build an understanding?

If your job is to look at the tool go and then say whether it was doing a good or bad job (and maybe take over if it does a bad job), you’re going to have problems. It has long been known that people adapt to their tools, and automation can create complacency . Self-driving cars that generally self-drive themselves well but still require a monitor are not effectively monitored.

Instead, having AI that supports people or adds perspectives to the work an operator is already doing tends to yield better long-term results than patterns where the human learns to mostly delegate and focus elsewhere.

(As a side note, this is why I tend to dislike incident summarizers. Don’t make it so people stop trying to piece together what happened! Instead, I prefer seeing tools that look at your summaries to remind you of items you may have forgotten, or that look for linguistic cues that point to biases or reductive points of view.)

Does it pigeonhole what you can look at?

When evaluating a tool, you should ask questions about where the automation lands:

Does it let you look at the world more effectively?

Does it tell you where to look in the world?

Does it force you to look somewhere specific?

Does it tell you to do something specific?

Does it force you to do something?

This is a bit of a hybrid between “Does it extend you?” and “Is it turning you into a monitor?” The five questions above let you figure that out.

As the tool becomes a source of assertions or constraints (rather than a source of information and options), the operator becomes someone who interacts with the world from inside the tool rather than someone who interacts with the world with the tool’s help . The tool stops being a tool and becomes a representation of the whole system, which means whatever limitations and internal constraints it has are then transmitted to your users.

Is it a built-in distraction?

People tend to do multiple tasks over many contexts. Some automated systems are built with alarms or alerts that require stealing someone’s focus, and unless they truly are the most critical thing their users could give attention to, they are going to be an annoyance that can lower the effectiveness of the overall system.

What perspectives does it bake in?

Tools tend to embody a given perspective. For example, AIOps tools that are built to find a root cause will likely carry the conceptual framework behind root causes in their design. More subtly, these perspectives are sometimes hidden in the type of data you get: if your AIOps agent can only see alerts, your telemetry data, and maybe your code, it will rarely be a source of suggestions on how to improve your workflows because that isn’t part of its world .

In roles that are inherently about pulling context from many disconnected sources, how on earth is automation going to make the right decisions? And moreover, who’s accountable for when it makes a poor decision on incomplete data? Surely not the buyer who installed it!

This is also one of the many ways in which automation can reinforce biases—not just based on what is in its training data, but also based on its own structure and what inputs were considered most important at design time. The tool can itself become a keyhole through which your conclusions are guided.

Is it going to become a hero?

A common trope in incident response is heroes —the few people who know everything inside and out, and who end up being necessary bottlenecks to all emergencies. They can’t go away for vacation, they’re too busy to train others, they develop blind spots that nobody can fix, and they can’t be replaced. To avoid this, you have to maintain a continuous awareness of who knows what, and crosstrain each other to always have enough redundancy.

If you have a team of multiple engineers and you add AI to it, having it do all of the tasks of a specific kind means it becomes a de facto hero to your team. If that’s okay, be aware that any outages or dysfunction in the AI agent would likely have no practical workaround. You will essentially have offshored part of your ops.

Do you need it to be perfect?

What a thing promises to be is never what it is—otherwise AWS would be enough, and Kubernetes would be enough, and JIRA would be enough, and the software would work fine with no one needing to fix things.

That just doesn’t happen. Ever. Even if it’s really, really good , it’s gonna have outages and surprises, and it’ll mess up here and there, no matter what it is. We aren’t building an omnipotent computer god, we’re building imperfect software.

You’ll want to seriously consider whether the tradeoffs you’d make in terms of quality and cost are worth it, and this is going to be a case-by-case basis. Just be careful not to fix the problem by adding a human in the loop that acts as a monitor!

Is it doing the whole job or a fraction of it?

We don’t notice major parts of our own jobs because they feel natural. A classic pattern here is one of AIs getting better at diagnosing patients, except the benchmarks are usually run on a patient chart where most of the relevant observations have already been made by someone else . Similarly, we often see AI pass a test with flying colors while it still can’t be productive at the job the test represents.

People in general have adopted a model of cognition based on information processing that’s very similar to how computers work (get data in, think, output stuff, rinse and repeat), but for decades, there have been multiple disciplines that looked harder at situated work and cognition, moving past that model. Key patterns of cognition are not just in the mind, but are also embedded in the environment and in the interactions we have with each other.

Be wary of acquiring a solution that solves what you think the problem is rather than what it actually is . We routinely show we don’t accurately know the latter.

What if we have more than one?

You probably know how straightforward it can be to write a toy project on your own, with full control of every refactor. You probably also know how this stops being true as your team grows.

As it stands today, a lot of AI agents are built within a snapshot of the current world: one or few AI tools added to teams that are mostly made up of people. By analogy, this would be like everyone selling you a computer assuming it were the first and only electronic device inside your household.

Problems arise when you go beyond these assumptions: maybe AI that writes code has to go through a code review process, but what if that code review is done by another unrelated AI agent? What happens when you get to operations and common mode failures impact components from various teams that all have agents empowered to go fix things to the best of their ability with the available data? Are they going to clash with people, or even with each other?

Humans also have that ability and tend to solve it via processes and procedures, explicit coordination, announcing what they’ll do before they do it, and calling upon each other when they need help. Will multiple agents require something equivalent, and if so, do you have it in place?

How do they cope with limited context?

Some changes that cause issues might be safe to roll back, some not (maybe they include database migrations, maybe it is better to be down than corrupting data), and some may contain changes that rolling back wouldn’t fix (maybe the workload is controlled by one or more feature flags).

Knowing what to do in these situations can sometimes be understood from code or release notes, but some situations can require different workflows involving broader parts of the organization. A risk of automation without context is that if you have situations where waiting or doing little is the best option, then you’ll need to either have automation that requires input to act, or a set of actions to quickly disable multiple types of automation as fast as possible.

Many of these may exist at the same time, and it becomes the operators’ jobs to not only maintain their own context, but also maintain a mental model of the context each of these pieces of automation has access to.

The fancier your agents, the fancier your operators’ understanding and abilities must be to properly orchestrate them. The more surprising your landscape is, the harder it can become to manage with semi-autonomous elements roaming around.

After an outage or incident, who does the learning and who does the fixing?

One way to track accountability in a system is to figure out who ends up having to learn lessons and change how things are done. It’s not always the same people or teams, and generally, learning will happen whether you want it or not.

This is more of a rhetorical question right now, because I expect that in most cases, when things go wrong, whoever is expected to monitor the AI tool is going to have to steer it in a better direction and fix it (if they can); if it can’t be fixed, then the expectation will be that the automation, as a tool, will be used more judiciously in the future.

In a nutshell, if the expectation is that your engineers are going to be doing the learning and tweaking, your AI isn’t an independent agent—it’s a tool that cosplays as an independent agent.

Do what you will—just be mindful

All in all, none of the above questions flat out say you should not use AI, nor where exactly in the loop you should put people. The key point is that you should ask that question and be aware that just adding whatever to your system is not going to substitute workers away . It will, instead, transform work and create new patterns and weaknesses.

Some of these patterns are known and well-studied. We don’t have to go rushing to rediscover them all through failures as if we were the first to ever automate something. If AI ever gets so good and so smart that it’s better than all your engineers, it won’t make a difference whether you adopt it only once it’s good. In the meanwhile, these things do matter and have real impacts, so please design your systems responsibly.

If you’re interested to know more about the theoretical elements underpinning this post, the following references—on top of whatever was already linked in the text—might be of interest:

Books:

Joint Cognitive Systems: Foundations of Cognitive Systems Engineering by Erik Hollnagel

Joint Cognitive Systems: Patterns in Cognitive Systems Engineering by David D. Woods

Cognition in the Wild by Edwin Hutchins

Behind Human Error by David D. Woods, Sydney Dekker, Richard Cook, Leila Johannesen, Nadine Sarter

Papers:

Ironies of Automation by Lisanne Bainbridge

The French-Speaking Ergonomists’ Approach to Work Activity by Daniellou

How in the World Did We Ever Get into That Mode? Mode Error and Awareness in Supervisory Control by Nadine Sarter

Can We Ever Escape from Data Overload? A Cognitive Systems Diagnosis by David D. Woods

Ten Challenges for Making Automation a “Team Player” in Joint Human-Agent Activity by Gary Klein and David D. Woods

MABA-MABA or Abracadabra? Progress on Human–Automation Co-ordination by Sidney Dekker

Managing the Hidden Costs of Coordination by Laura Maguire

Designing for Expertise by David D. Woods

The Impact of Generative AI on Critical Thinking by Lee et al.

Local Optimizations Don't Lead to Global Optimums

Published: Mon, 18 Nov 2024 11:00:00 EST
Updated: Mon, 18 Nov 2024 11:00:00 EST
UTC: 2024-11-18 16:00:00+00:00
URL: https://ferd.ca/local-optimizations-don-t-lead-to-global-optimums.html

I like to think that I write code deliberately. I’m an admittedly slow developer, and I want to believe I do so on purpose. I want to know as much as I can about the context of what it is that I'm automating. I also use a limited set of tools. I used old computers for a long time, both out of an environmental mindset, but also because a slower computer quickly makes it obvious when something scales poorly.1

Content Preview

I like to think that I write code deliberately. I’m an admittedly slow developer, and I want to believe I do so on purpose. I want to know as much as I can about the context of what it is that I'm automating. I also use a limited set of tools. I used old computers for a long time, both out of an environmental mindset, but also because a slower computer quickly makes it obvious when something scales poorly. 1

The idea is to seek friction, and harness it as an early signal that whatever I’m doing may need to be tweaked, readjusted. I find this friction, and even frustration in general to also be useful around learning approaches. 2

In opposition to the way I'd like to do things, everything about the tech industry is oriented towards elevated productivity, accelerated growth, and "easy" solutions to whole families of problems.

I feel that maybe we should teach people to program the way they teach martial arts, like only in the most desperate situations when all else failed should you resort to automating something. I don’t quite know if I’m just old and grumpy, seeing industry trends fly by me at a pace I don’t follow, or whether there’s really something to it, but I thought I’d take a walk through a set of ideas and concepts that motivate my stance.

This blog post has a lot of ground to cover. I'll first start with some fundamental properties of systems and how overload propagates through various bottlenecks. Then I'll go over some high-level pressures that are shared by most organizations and force trade-offs down their structure. These two aspects—load propagation and pervasive trade-offs—create the need for compensatory actions, of which we'll discuss some limits. This, finally, will be tied back to friction and ways to listen to it, because it's one of the things that underpins adaptation and keeps systems running.

Optimizing While Leaving Pressures in Place

Optimizing a frictional path without revising the system’s conditions and pressures tends to not actually improve the system. Instead, what you’re likely to do is surface brittleness in all the areas that are now exposed to the new system demands. Whether a bottleneck was invisible or well monitored, and regardless of scale, it offered an implicit form of protection that was likely taken for granted.

For a small scale example, imagine you run a small bit of software on a server, talking to a database. If you suddenly get a lot of visits, simply autoscaling the web front-end will likely leave the database unprotected and sensitive to tipping over (well, usually after having grown the connection pool, raised the connection limit, vertically scaled the servers, and so on). None of this will let you serve heavy traffic at a reasonable price until you rework your caching and data distribution strategy. Building for orders of magnitude more traffic than usual requires changing some fundamental aspects of your solution.

Similar patterns can be seen at a larger scale. An interesting case was the Clarkesworld magazine; as LLMs made it possible to produce slop at a faster rate than previously normal, an inherent bottleneck in authorship ("writing a book takes significant time and effort") was removed, leading to so much garbage that the magazine had to stop taking in submissions . They eventually ended up bearing the cost of creating a sort of imperfect queuing "spam filter" for submissions in order to accept them again. They don't necessarily publish more stories than before, they still aim to publish the good human-written stuff, there's just more costly garbage flowing through the system. 3

A similar case to look for is how doctors in the US started using generative AI to fight insurance claim denials . Of course, insurers are now expected to adopt the same technology to counteract this effect. A general issue at play here is that the private insurance system's objectives and priorities are in conflict with those of the doctors and patients. Without realigning them, most of what we can expect is an increase in costs and technological means to get the same results out of it. People who don’t or can’t use the new tools are going to be left behind.

The optimization's benefit is temporary, limited, and ultimately lost in the overall system, which has grown more complex and possibly less accessible. 4

I think LLMs are top of mind for people because they feel like a shift in how you automate. The common perspective is that machines are good at repetitive, predictable, mechanical tasks, and that solutions always suffered when it came to the fuzzy, unpredictable, and changing human-adjacent elements. LLMs look exactly the opposite of that: the computers can't do math very well anymore, but they seem to hold conversations and read intent much better. They therefore look like a huge opportunity to automate more of the human element and optimize it away, following well-established pressures and patterns. Alternatively, they seemingly increase the potential for new tools that could be created and support people in areas where none existed before.

The issues I'm discussing here clearly apply to AI, Machine Learning, and particularly LLMs. But they also are not specific to them . People who love the solution more than they appreciate the problem risk delivering clumsy integrations that aren’t really fit for purpose. This is why it feels like companies are wedging more AI in our face; that's what the investors wanted in order to signal innovativeness, or because the engineers really wanted to build cool shit, rather than solving the problems the users wanted or needed solved. The challenges around automation were always there from their earliest days and keep being in play now. They remain similar without regards to the type of automation or optimization being put in place, particularly if the system around them does not reorganize itself.

The canonical example here is what happens when an organization looms so large that people can't understand what is going on. The standard playbook around this is to start driving purely by metrics, which end up compressing away rich phenomena . Doing so faster, whether it is by gathering more data (even if we already had too much) or by summarizing harder via a LLM likely won't help run things better. Summaries, like metrics, are lossy compression. They're also not that different from management by PowerPoint slides, which we've seen cause problems in the space program, as highlighted by the Columbia report :

As information gets passed up an organization hierarchy, from people who do analysis to mid-level managers to high-level leadership, key explanations and supporting information is filtered out. In this context, it is easy to understand how a senior manager might read this PowerPoint slide and not realize that it addresses a life-threatening situation.

At many points during its investigation, the Board was surprised to receive similar presentation slides from NASA officials in place of technical reports. The Board views the endemic use of PowerPoint briefing slides instead of technical papers as an illustration of the problematic methods of technical communication at NASA.

There is no reason to think that overly aggressive summarization via PowerPoint, LLM, or metrics would not all end similarly. If your decision-making layer cannot deal with the amount of information required to centrally make informed decisions, there may be a point where the solution is to change the system's structure (and decentralize, which has its own pitfalls) rather than to optimize the existing paths without question. 5

Every actor, component, or communication channel in a system has inherent limits. Any part that suddenly becomes faster or more productive without feedback shifts greater burdens onto other parts. These other parts must adapt, adjust, pass on the cost, or stop meeting expectations. Eliminating friction from one part of the system sometimes just shifts it around. System problems tend to remain system problems regardless of how much you optimize isolated portions of them.

Pressures and Propagation

How can we know what is worth optimizing, and what is changing at a more structural level? 6 It helps to have an idea of where the pressures that create goal conflicts might come from, since they eventually lead to adaptations. Systems tend to continually be stretched to the limit of their capacity , and any improvement is instantly leveraged to accelerate the pace of existing activities.

This is usually where online people say things like "the root cause is capitalism" 7 —you shouldn't expect local solutions to fix systemic problems in the long term. The moment other players dynamically reduce their margins of maneuver to gain efficiency, you become relatively less competitive. You can think of how we could all formally prove software to be safe before shipping it, but instead we’ll compromise by using less formal methods like type analysis, tests, or feature flags to deliver acceptable products at much lower costs—both financial and cognitive. Be late to the market and you suffer, so there's a constant drive to ship faster and course-correct often.

People more hopeful or trusting of a system try to create and apply counteracting forces to maintain safe operating margins. This tends to be done through changing incentives, creating regulatory bodies, and implementing better control and reporting mechanisms. This is often the approach you'll see taken around the nuclear industry, the FAA and the aviation industry, and so on. However, there are also known patterns (such as regulatory capture ) that tend to erode these mechanisms, and even within each of these industries, surprises and adaptations are still a regular occurrence.

Ultimately, the effects of any technological change are rather unpredictable. Designing for systems where experts operate demands constantly revisiting and iterating. The concepts we define to govern systems create their own indifference to other important perspectives , and data-driven approaches carry the risk of "bias laundering" mechanisms that repeat and amplify existing flaws in the system.

Other less predictable effects can happen. Adopting objectively more accurate algorithms can create monocultures in decision-making, which can interact such that the overall system efficiency can go down compared to more diverse environments—even in the absence of disruption.

Basically, the need for increased automation isn't likely to "normalize" a system and make it more predictable. It tends to just create new types of surprises in a way that does not remove the need for adaptation nor shift pressures; it only transforms them and makes them dynamic.

Robust Yet Fragile

Embedded deeply in our view of systems is an assumption that things are stable until they are disrupted. It’s possibly where ideas like “root cause” gain their charisma: identify the one triggering disruptor (or its underlying mechanism) and then the system will be stable again. It’s conceptually a bit Newtonian in that if no force is applied, nothing will change.

A more ecological stance would instead assume that any perceived stability (while maintaining function) requires ongoing dynamic adjustments. The system is always decaying, transforming, interacting, changing. Stop interfering with it and it will eventually reach stability (without maintaining function) by breaking down or failing. If the pressures are constant and shifting as well as the counteracting mechanisms, we can assume that evolution and adaptation are required to deal with this dynamism. Over time, we should expect that the system instead evolves into a shape that fits its burdens while driven by scarcity and efficiency.

A risk in play here is that an ecosystem's pressures make it rational and necessary for all actors to optimize when they’re each other’s focal point—rather than some environmental condition. The more aggressively it is done, the more aggressively it is needed by others to stay in the game.

Robust yet fragile is the nature of systems that are well optimized for their main use cases and competitive within their environment, but which become easily upended by pressures applied from unexpected angles (that are therefore unprotected, since resources were used elsewhere instead).

Good examples of this are Just-In-Time supply chains being far more efficient than traditional ones, but being far easier to disrupt in times of disasters or pandemics . Most buffers in the supply chain (such as stock held in warehouses) had been replaced by more agile and effective production and delivery mechanisms. Particularly, the economic benefits (in stable times) and the need for competitiveness have made it tricky for many businesses not to rely on them.

The issue with optimizations driven from systemic pressures is that as you look at trimming the costs of keeping a subsystem going in times of stability, you may notice decent amounts of slack capacity that you could get rid of or drive harder in order to be more competitive in your ecosystem. That’s often resources that resilience efforts draw on to keep adapting and evolving.

Another form of rationalization in systems is one where rather than cutting "excess", the adoption and expansion of (software) platforms are used to drive economies of scale. Standardization and uniformization of patterns, methods, and processes is a good way to get more bang for your buck on an investment, to do more with less. Any such platform is going to have some things it gives its users for cheap, and some things that become otherwise challenging to do. 8 Friction felt here can both be caused by going against the platform's optimal use cases or by the platform not properly supporting some use cases—it's a signal worth listening to.

In fact, we can more or less assume that friction is coming from everywhere because it's connected to these pressures. They just happen to be pervasive, at every layer of abstraction. If we had infinite time, infinite resources, or infinite capacity, we'd never need to optimize a thing.

Compensatory Adaptive Mechanisms

Successfully navigating these pressures is essentially drawing from concepts such as graceful extensibility and sustained adaptability . In a nutshell, we're looking to know how systems stretch themselves to deal with disruptions and surprises in a context of finite resources, and also how a system manages and regulates its own abilities to do that on an ongoing basis. Remember that every actor or component of a system has inherent limits. This is also true of our ability to know what is going on, something known as local rationality .

This means that even if we're really hoping we could intervene from the system level first and avoid the (sometimes deceptively ineffective) local optimizations, it will regardless be attempted through local efforts. Knowing and detecting the friction behind it is useful for whoever wants the broader systematic view to act earlier, but large portions of the system are going to remain dynamic and co-evolving from locally felt pains and friction. Local rationality impacts everyone, even the most confident of system thinkers.

Friction shifts are unavoidable, so it's useful to also know of the ways in which they show up. Unfortunately, these shifts generally remain unseen from afar, because compensatory mechanisms and adaptation patterns hide them. 9 . So instead, it's more practical to find how to spot the compensatory patterns themselves.

One of the well-known mechanisms is the Efficiency–thoroughness trade-off (ETTO) principle, which states that since time and resources are limited, one has to trade-off efficiency and thoroughness to accomplish a task. Basically, if there's more work to do than there's capacity to do it, either you maintain thoroughness and the work accumulates or gets dropped, or you do work less thoroughly, possibly cut corners, accuracy, or you have to be less careful and keep going as fast as required.

This is also one of the patterns feeding concepts such as "deviance" (often used in normalization of deviance , although the term alone points to any variation relative to norms ), where procedures and rules defining safe work start being modified or bent unofficially, until covert work patterns grow a gap between the work as it is specified and how it is practiced . 10

Of course, another path is one of innovation, which can mean some reorganization or restructuring. We happen to be in tech, so we tend to prefer to increase capacity by using new technology. New technology is rarely neutral and never isolated. It disturbs established patterns—often on purpose, but sometimes in unexpected ways—can require a complex support system, and for everyone to adjust around it to maintain the proper operational context. Adding to this, if automation is clumsy enough, it won’t be used to its full potential to avoid distracting or burdening practitioners using it to do their work. The ongoing adaptations and trade-offs create potential risks and needs for reciprocity to anticipate and respond to new contingencies .

You basically need people who know the system, how it works, understand what is normal or abnormal, and how to work around its flaws. They are usually those who have the capacity to detect any sort of "creaking" in local parts of the system, who harness the friction and can then do some adjusting, mustering and creating slack to provide the margin to absorb surprises. They are compensating for weaknesses as they appear by providing adaptive capacity.

Some organizations may enjoy these benefits without fixing anything else by burning out employees and churning through workers, using them as a kind of human buffer for systemic stressors. This can sustain them for a while, but may eventually reach its limits.

Even without any sort of willful abuse, pressures lead a system to try to fully use or optimize away the spare capacity within. This can eventually exhaust the compensatory mechanisms it needs to function, leading to something called "decompensation".

Decompensation

Compensatory mechanisms are often called on so gradually that your average observer wouldn't even know it's taking place. Systems (or organisms) that appear absolutely healthy one day collapse, and we discover they were overextended for a long while. Let's look at congestive heart failure as an example. 11

Effects of heart damage accumulate gradually over the years—partly just by aging—and can be offset by compensatory mechanisms in the human body. As the heart becomes weaker and pumps less blood with each beat, adjustments manage to keep the overall flow constant over time. This can be done by increasing the heart rate using complex neural and hormonal signaling.

Other processes can be added to this: kidneys faced with lower blood pressure and flow can reduce how much urine they create to keep more fluid in the circulatory system, which increases cardiac filling pressure, which stretches the heart further before each beat, which adds to the stroke volume. Multiple pathways of this kind exist through the body, and they can maintain or optimize cardiac performance.

However, each of these compensatory mechanisms has less desirable consequences. The heart remains damaged and they offset it, but the organism remains unable to generate greater cardiac output such as would be required during exercise. You would therefore see "normal" cardiac performance at rest, with little ability to deal with increased demand. If the damage is gradual enough, the organism will adjust its behavior to maintain compensation: you will walk slower, take breaks while climbing stairs, and will just generally avoid situations that strain your body. This may be done without even awareness of the decreased capacity of the system, and we may even resist acknowledging that we ever slowed down.

Decompensation happens when all the compensatory mechanisms no longer prevent a downward spiral. If the heart can't maintain its output anymore, other organs (most often the kidneys) start failing. A failing organ can't overextend itself to help the heart; what was a stable negative feedback loop becomes a positive feedback loop, which quickly leads to collapse and death.

Someone with a compensated congestive heart failure appears well and stable. They have gradually adjusted their habits to cope with their limited capacity as their heart weakened through life. However, looking well and healthy can hide how precarious of a position the organism is in. Someone in their late sixties skipping their heart medication for a few days or adopting a saltier diet could be enough to tip the scales into decompensation.

Decompensation usually doesn’t happen because compensation mechanisms fail, but because their range is exhausted. A system that is compensating looks fine until it doesn’t. That's when failures may cascade and major breakdowns occur. This applies to all sorts of systems, biological as well as sociotechnical.

A common example seen in the tech industry is one where overburdened teams continuously pull small miracles and fight fires, keeping things working through major efforts. The teams are stretched thin, nobody's been on vacation for a while, and hiring is difficult because nobody wants to jump into that sort of place. All you need is one extra incident, one person falling ill or quitting, needing to add one extra feature (which nobody has bandwidth to work on), and the whole thing falls apart.

But even within purely technical subsystems, automation reaching its limits often shows up a bit like decompensation when it hands control back to a human operator who doesn't have the capacity to deal with what is going on (one of the many things pointed out by the classic text on the Ironies of Automation ). Think of an autopilot that disengages once it reached the limit of what it can do to stabilize a plane in hazardous conditions. Or of a cluster autoscaler that can no longer schedule more containers or hosts and starts crowding them until performance collapses, queues fill up, and the whole application becomes unresponsive.

Eventually, things spin out into a much bigger emergency than you'd have expected as everything appeared fine. There might have been subtle clues—too subtle to be picked up without knowing where to look—which shouldn't distract from their importance. Friction usually involves some of these indicators.

Seeking the Friction

Going back to friction being useful feedback, the question I want to ask is: how can we keep listening? The most effective actions are systemic, but the friction patterns are often local. If we detect the friction, papering over it via optimization or brute-force necessarily keeps it local, and potentially ineffective. We need to do the more complex work of turning friction into a system-level feedback signal for it to have better chances of success and sustainability. We can't cover all the clues, but surfacing key ones can be critical for the system to anticipate surprises and foster broader adaptive responses.

When we see inappropriate outcomes of a system, we should be led to wonder what about its structure makes it a normal output. What are the externalities others suffer as a consequence of the system's strengths and weaknesses? This is a big question that feels out of reach for most, and not necessarily practical for everyday life. But it’s an important one as we repeatedly make daily decisions around trading off “working a bit faster” against the impacts of the tools we adopt, whether they are environmental, philosophical, or sociopolitical.

Closer to our daily work as developers, when we see code that’s a bit messy and hard to understand, we either slow down to create and repair that understanding, or patch it up with local information and move on. When we do this with a tool that manages the information for us, are we in a situation where we accelerate ourselves by providing better framing and structure, or one where we just get where we want without acknowledging the friction? 12

If it's the latter, what are the effects of ignoring the friction? Are we creating technical debt that can’t be managed without the tools? Are we risking increasingly not reorganizing the system when it creaks, and only waiting to see obvious breaks to know it needs attention? In fact, how would you even become good at knowing what creaking sounds like if you just always slam through the hurdles?

Recognizing these patterns is a skill, and it tends to require knowing what “normal” feels like such that you can detect what is not there when you start deviating. 13

If you use a bot for code reviews, ask yourself whether it is replacing people reviewing and eroding the process. Is it providing a backstop? Are there things it can't know about that you think are important? Is it palliating already missing support? Are the additional code changes dictated by review comments worth more than the acts of reviewing and discussing the code? Do you get a different result if the bot only reviews code that someone else already reviewed to add more coverage, rather than implicitly making it easier to ignore reviews and go fast?

Work that takes time is a form of friction, and it's therefore tempting to seek ways to make it go faster. Before optimizing it away, ask yourself whether it might have outputs other than its main outputs. Maybe you’re fixing a broken process for an overextended team. Maybe you’re eroding annoying but surprisingly important opportunities for teams to learn, synchronize, share, or reflect on their practices without making room for a replacement.

When you're reworking a portion of a system to make it more automatable, ask whether any of the facilitating and structuring steps you're putting in place could also benefit people directly. I recall hearing a customer who said “We are now documenting things in human-readable text so AI can make use of it”—an investment that clearly could have been worth it for people too. Use the change of perspective as an opportunity to surface elements hidden in the broader context and ecosystem, and on which people rely implicitly.

I've been disappointed by proposals of turning LLMs into incident reviewers; I'd rather see them becoming analysis second-guessers: maybe they can point out agentive language leading to bias , elements that sound counterfactual , highlights elements that appear blameful to create blame awareness ?

If you make the decision to automate, still ask the questions and seek the friction. Systems adjust themselves and activate their adaptive capacity based on the type of challenges they face . Highlight friction. It’s useful, and it would be a waste to ignore it.

Thanks to Jordan Goodnough, Alan Kraft, and Laura Nolan for reviewing this text.

1 : I’m forced to refresh my work equipment more often now because new software appears to hunger for newer hardware at an accelerating pace.

2 : As a side note, I'd like to call out the difference between friction, where you feel resistance and that your progression is not as expected based on experience, and one of pain, where you're just making no progress at all and having a plain old bad time. I'd put "pain" in a category where you might feel more helpless, or do useless work just because that's how people first gained the experience without any good reason for it to still be learned the same today. Under this casual definition, friction is the unfamiliar feeling when getting used to your tools and seeking better ways of wielding them, and pain is injuring yourself because the tools have poor ergonomic properties.

3 : the same problem can be felt in online book retail, where spammers started hijacking the names of established authors with fake books . The cost of managing this is left to authors—and even myself, having published mostly about Erlang stuff, have had at least two fake books published under my name in the last couple years.

4 : In Energy and Equity , Ivan Illich proposes that societies built on high-speed motorized transportation create a "radical monopoly," basically stating that as the society grows around cars and scales its distances proportionally to time spent traveling, living without affording a car and its upkeep becomes harder and harder. This raises the bar of participation in such environments, and it's easy to imagine a parallel within other sociotechnical systems.

5 : AI is charismatic technology . It is tempting to think of it as the one optimization that can make decisions such that the overall system remains unchanged while its outputs improve. Its role as fantasized by science fiction is one of an industrial supply chain built to produce constantly good decisions. This does not reduce its potential for surprise or risk. Machine-as-human-replacement is most often misguided . I don't believe we're anywhere that point, and I don't think it's quite necessary to make an argument about it.

6 : Because structural changes often require a lot more time and effort than local optimizations, you sometimes need to carry both types of interventions at the same time: a piecemeal local optimization to "extend the runway", and broader interventions to change the conditions of the system. A common problem for sustainability is to assume that extending the runway forever is both possible and sufficient, and never follow up with broader acts.

7 : While capitalism has a keen ability to drive constraints of this kind, scarcity constraints are fairly universal. For example, Sonja D. Schmid, in Producing Power illustrates that some of the contributing factors that encouraged the widespread use of the RBMK reactor design in the USSR—the same design used in Chernobyl—were that its manufacturing was more easily distributed over broad geographic areas and sourced from local materials which could avoid the planned system's inefficiencies, and therefore meet electrification objectives in ways that couldn't be done with competing (and safer) reactor designs. Additionally, competing designs often needed centralized manufacturing of parts that could then not be shipped through communist USSR without having to increase the dimensions of some existing train tunnels, forcing upgrades to its rail network to open power plants.
An entirely unrelated example is that a beehive's honeycomb structure optimizes for using the least material to create a lattice of cells within a given volume.

8 : AWS or Kubernetes or your favorite framework all come with some real cool capabilities and also some real trade-offs. What they're built to do makes some things much easier, and some things much harder . Do note that when you’re building something for the first time on a schedule, prioritizing to deliver a minimal first set of features also acts as an inherent optimization phase: what you choose to build and leave for later fits that same trade-off pattern.

9 : This is similar to something called the Law of Fluency , which states that well-adapted cognitive work occurs with a facility that belies the difficulty of resolving demands and balancing dilemmas. While the law of fluency works at the individual cognitive level, I tend to assume it also shows up at larger organizational or system levels as well.

10 : Rule- and Role-retreat may also be seen when people get overloaded, but won't deviate or adjust their plans to new circumstances. This "failure to adapt" can also contribute to incidents, and is one of the reasons why some forms of deviations have to be considered positive for the system.

11 : Most of the information in this section came from Dr. Richard I. Cook, explaining the concept in a group discussion, a few years before his passing.

12 : this isn’t purely a tooling decision; you also make this type of call every time you choose to refactor code to create an abstraction instead of copy/pasting bits of it around.

13 : I believe but can't prove that there's also a tenuous but real path between the small-scale frictions, annoyances, and injustices we can let slip, and how they can be allowed to propagate and grow in greater systemic scales. There's always tremendously important work done at the local level, where people bridge the gap between what the system orders and what the world needs. If there are paths leading the feedback up from the local, they are critical to keeping things aligned. I'm unsure what the links between them are, but I like to think that small adjustments made by people with agency are part of a negative feedback loop partially keeping things in check.