Musings of an AI engineer on using AI for engineering
I recently read a thought-provoking HBR article that argued that AI doesn't reduce work, but rather intensifies it. This resonated with me perhaps more than it should have for someone who builds AI systems for a living – especially as my own workflows have intensified over the past couple of months with ever more capable AI agents.
We should use agents, no doubt. But how?
There is a particular irony in being an AI engineer who uses AI agents to do engineering. You spend your days building systems meant to make work easier — and end them more mentally spent than before. You write less code, sometimes none at all, yet spend far more time deciding what should be built and why. Instead of doing things the one way you are used to, you now find yourself weighing the pros and cons of ten different approaches that your preferred AI agent presents for every problem. And you're running ten agents in parallel.
The HBR piece captures something real: AI shifts the type of work, not the amount of it. And this new type of work is, in many ways, harder. It is also more emotionally loaded than we like to admit. There is often some anxiety that comes from delegating to something powerful and fast, then realizing the decision-making — the accountability — still lands on you. You don't want to waste AI agent cycles, or tokens, especially if you are footing the bill. The workload gets “evened out” in the sense that you’re always on: always scoping, always checking, always steering. It starts to feel less like coding and more like… management.
Everyone is now a line manager
Here is the thing nobody tells you when you start delegating to AI agents: you become a manager. Not organizationally — you don’t get a promotion or a pay rise — but structurally.
An agent is not a tool in the way a compiler is a tool. It is an entity with a task, a context, a set of capabilities, and an unmatched level of energy and determination. It also has a propensity to go spectacularly off the rails in ways that are genuinely difficult to anticipate. Managing it requires the same things managing an intern requires: clear scoping, well-defined objectives, some guardrails, and regular 1:1s.
The difference is that agents require more scoping and more direction per unit time than a human would. A human engineer brings tacit knowledge, common sense, and the social intelligence to ask a clarifying question when a task doesn’t make sense. An agent might confidently go down the wrong path and produce something syntactically perfect and semantically wrong. A human engineer also takes regular breaks and works 9-to-5 (or 996) – but an AI agent takes a break only when it is waiting for your decision or permission to run some new bash command.
Once you accept this framing — that using agents is fundamentally a management activity — a lot of other things fall into place. The engineers who thrive are the ones who can write a good brief, decompose a problem cleanly, and review output critically. These are not coding skills. They are communication and judgment skills often seen as hallmarks of staff engineers. Now even junior engineers need to develop them.
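To make "write a good brief" concrete, here is a minimal sketch of how a delegation could be structured before handing it to an agent. The `AgentBrief` class, its fields, and the example task are all hypothetical, made up for illustration rather than taken from any real framework.

```python
from dataclasses import dataclass, field

# Hypothetical structure for scoping a task before delegating it to an agent.
# Class and field names are illustrative only, not a real framework's API.
@dataclass
class AgentBrief:
    objective: str                                           # what "done" looks like, in one sentence
    context: str                                             # why the work matters and where it fits
    constraints: list[str] = field(default_factory=list)     # guardrails the agent must respect
    out_of_scope: list[str] = field(default_factory=list)    # things it must not touch
    review_checklist: list[str] = field(default_factory=list)  # what the human checks in the "1:1s"

brief = AgentBrief(
    objective="Add retry logic with exponential backoff to the payments client",
    context="Transient upstream 5xx errors are paging the on-call for no good reason",
    constraints=["Do not change the public client interface", "Keep total retry time under 30 seconds"],
    out_of_scope=["Refactoring unrelated modules", "Upgrading dependencies"],
    review_checklist=["Are retries idempotent?", "Is the backoff jittered?", "Are the new tests deterministic?"],
)
```

Decomposing the problem and reviewing the output critically are then just the before and after of such a brief.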
And as agents become more capable of self-organizing into hierarchies and working in groups — which we'll undoubtedly see a lot more of this year — some of us will find ourselves managing managers of agents. That sounds crazy, and would have been unimaginable even a year or two ago. But it aligns with what managers already do today: coordinate, prioritize, and give direction — just at a very different pace.
More leverage, less understanding?
There is a subtler problem, though, that I find more troubling: you start to understand the code less.
When we write some code or design a system, we understand it. We know why a particular loop is structured that way, why we chose that specific data structure or config, and what the edge cases are. When an agent writes the same function — and it will most likely write it better/more efficiently than most of us would — we have to read it to understand it. This sounds trivially obvious, but the implication is not. Reading and understanding code you did not write is slower and shallower than understanding code you did write. The act of construction is also an act of comprehension.
For tenured engineers who already know a code base well, this is probably a manageable inconvenience. You have enough context to evaluate the agent's output, catch the subtle bugs, and understand the tradeoffs. But for newcomers — e.g. someone spinning up on a code base for the first time — the risk is serious. The traditional path to understanding a system was to build parts of it. If agents are building all the parts, what is the path?
And then there is the code review problem. Humans review code, and humans miss things — we know this empirically. Would a team of agents reviewing a pull request not do a better job, at least on the mechanics? Probably, yes, given a sufficiently detailed pull request (PR) description (and a sufficiently detailed skills.md file). The agents would probably not question the motivation or purpose of the PR though – and sometimes that is critical.
This also raises an uncomfortable question: if agents are writing and reviewing code, what exactly is the responsibility of a software engineer? Taste? Architecture? The "motivation" section of the PR? These are not nothing, but they represent a meaningful contraction of the craft. In this brave new world where agents do most of the work, what exactly distinguishes a great engineer from a good one?
On ownership and craft
The issues of ownership and craft are not sufficiently brought up in discussions about AI and engineering. (Full disclosure – I work at Stripe, and one of our operating principles is "create with craft and beauty".)
For every engineer, there is a particular satisfaction, perhaps even joy, in building something — a system, a feature, even a single function — that you thought through and made real. This is not sentimentality. Ownership produces better outcomes: you debug more carefully, you think harder about edge cases, you care about the technical and business consequences of your decisions because they are yours. You care about the elegance of your code, even if elegance does not make your code any more functional. When an agent does the building and you do the reviewing, you are merely a curator, not a creator. There is, inevitably, a certain psychological distance between you and the output.
For straightforward work, such as updating feature flags, writing unit tests, or extending an API, this does not matter. But for building something genuinely new, something that requires a novel approach and carries a high chance of failure, that loss of ownership compounds over time. Like helping friends watch their pets while they travel: not impossible to care about, but definitely different, and the difference matters.
On the other hand, this might not actually matter if we are simply going to build so much more software – individually and collectively – that the craft of any individual piece becomes irrelevant. Marc Andreessen once said that software is eating the world. Perhaps AI agents are now eating the software. If that is true, maybe the point about beauty and craft is moot. We are in an industrial era now (as far as building software is concerned), and arguing for craftsmanship is akin to arguing against factory production lines.
The real question is: what is ownership and craft in the age of highly capable AI agents? What is craft and beauty when it comes to AI-written software?
Reasoning != Understanding
As models become more advanced with ever-larger context windows, reasoning is no longer the bottleneck. Understanding is.
Current models reason remarkably well within their context windows. Unlike just six months ago, the failures I now see in practice are almost never "the model hallucinated wildly and reasoned wrongly". They are "the model didn't index sufficiently on the rationale of this work" or "the model did not understand the broader system it is operating in." What we desperately need — and what I expect will be enormously valuable when it exists — is an efficient, persistent, and self-learning world model. A structure that lets agents maintain and update a coherent representation of the systems they are working in, the decisions that have been made, the constraints that apply.
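For the sake of concreteness, here is a toy sketch of what such a structure might minimally contain: entities, settled decisions, and standing constraints, kept persistent and updated as work happens. The `WorldModel` class and everything in it are assumptions for illustration, not a description of any existing system.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a persistent, updatable "world model" for an agent.
# All names here are illustrative; no existing library is being described.
@dataclass
class WorldModel:
    entities: dict[str, dict] = field(default_factory=dict)   # systems, services, teams
    decisions: list[str] = field(default_factory=list)        # rationale that has already been settled
    constraints: list[str] = field(default_factory=list)      # invariants the agent must not violate

    def record_decision(self, decision: str) -> None:
        """Update the model as work happens, instead of re-deriving context each session."""
        self.decisions.append(decision)

    def relevant_context(self, entity: str) -> dict:
        """What an agent would load before touching a given part of the system."""
        return {
            "entity": self.entities.get(entity, {}),
            "decisions": self.decisions,
            "constraints": self.constraints,
        }

world = WorldModel(entities={"payments-service": {"owner": "payments-team", "language": "Java"}})
world.record_decision("We chose idempotency keys over distributed locks for retries")
print(world.relevant_context("payments-service"))
```

The hard part, of course, is not the container but keeping it faithful to reality as the system changes, which is what "self-learning" would have to mean here.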
LLMs today "understand" the world by virtue of observations encoded in their training data — which is to say, primarily through text. But text can be a pretty lossy representation, and worse, it is static. When humans perceive the world, we don't transcribe everything into words before remembering it, and we don't ever stop learning. We remember the sights, the sounds, the weight of things, the way something feels. And crucially, we update those memories as we go.
There are better representation structures for certain kinds of knowledge. A graph, for instance, is vastly more efficient than text for representing relationships — what entities are linked to what other entities, how many distinct edges there are for a given node, and which parts of the graph look alike. Text forces a linear narrative onto inherently non-linear information. Agents need built-in access to structured and up-to-date representations of the world they operate in, not just snapshots of text from a training cutoff.
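As a toy illustration of the graph point: with even a plain adjacency map, the questions above (what links to what, how many edges a node has) become direct lookups rather than something to re-parse out of prose. The service names below are made up.

```python
# Toy adjacency map: which services call which other services.
# Service names are invented for illustration.
calls = {
    "checkout": {"payments", "inventory"},
    "payments": {"ledger", "fraud"},
    "inventory": {"ledger"},
    "fraud": set(),
    "ledger": set(),
}

# "What entities are linked to what other entities" is a direct lookup...
print(calls["payments"])        # {'ledger', 'fraud'}

# ...and "how many distinct edges for a given node" is just a length.
print(len(calls["checkout"]))   # 2

# The same facts buried in text ("checkout calls payments and inventory,
# payments calls ledger and fraud, ...") have to be re-parsed every time.
```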
This is, incidentally, why I think the current wave of "give the agent more tools" is partly missing the point. Tools help agents act. Understanding helps agents know what to act on. The two are not the same; we need both. Perhaps Yann LeCun's AMI — a startup focused on world models — is on to something after all.
What AI means for the engineering ladder
If agents can write code and review code, what should the hiring process for engineers look like? The traditional technical interview — whiteboard system design, LeetCode exercises on HackerRank, and so on — is premised on a world where engineers write code from scratch under time pressure and without internet access. That world is long over, though the interview processes of even top tech companies remain unchanged – and thus seriously lag behind.
What replaces these old-school interview loops? How do you evaluate someone's judgment about AI output? Or their taste? Or their ability to scope a problem for an agent? Or juggle five different agents all at once? I don't know how to interview for these things yet, and I suspect most hiring managers don't either.
Can engineering leaders accept that the interviews that they once agonized over (and clearly excelled in) are no longer the right yardstick for new engineering candidates today?
There is a deeper structural problem here too. If entry-level roles — the ones where junior engineers historically learned by doing — are automated away, how does anyone get promoted to senior or staff roles? The traditional path was: write lots of code, deploy things to production, make lots of mistakes, and slowly develop taste and judgment over years of iteration. If the iteration is happening at the agent layer, the learning may no longer happen at the human layer.
And yet — are the only worthwhile roles today the ones building the agents themselves? Should everyone apply to Anthropic, OpenAI and the other frontier AI labs? Will there still be a need for engineers in specific domains, given how Claude and Codex can basically create anything? These questions feel obviously wrong, but there is a certain anxiety I sense amongst friends working in tech (and one I can't stop thinking about myself). The economics of this transition are genuinely uncertain, and the distribution of value (i.e. who gets paid what) is going to shift in ways that are hard to predict. A bifurcation between those who build the systems and those who use them seems likely; what that does to wages and career trajectories is an open question.
I believe that as an industry, we should continue investing in more junior engineers. They are the ones who are starting their careers in this brave new world of AI agents – one could argue they are truly "AI natives", just like how those of us who grew up in the late 80s/90s were "digital natives". And they will be the ones who will shape how engineering is done in the future.
What remains human
There is a version of this story where AI agents are the equivalent of the industrial revolution for not just engineering but all white-collar work. AI agents could be disruptive, affecting different job families in a multitude of ways, but ultimately expansive in what they make possible for humans – pushing out the Pareto frontier of our collective output. There is also a version where we get all of the disruption with none of the economic expansion. I don't know which one we are in. We seem to be participants in a live play for which we don't know the ending.
But here is what I keep coming back to: models are, at their core, optimized to be average. They are trained to predict the most likely next token (alright, I know, cue diffusion models everyone), the most statistically reasonable output – the response that satisfies the most people. This is clearly useful. It is also, by definition, not where originality lives.
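A toy illustration of what "optimized to be average" means mechanically: under greedy decoding, the model emits the mode of its next-token distribution, so the most conventional continuation wins by construction. The probabilities below are invented.

```python
# Invented next-token probabilities for the prompt "The code should be ..."
next_token_probs = {
    "readable": 0.41,    # the safe, most-expected continuation
    "fast": 0.33,
    "beautiful": 0.19,
    "subversive": 0.07,  # the interesting one rarely wins under greedy decoding
}

# Greedy decoding: always take the most likely token, i.e. the mode of the distribution.
print(max(next_token_probs, key=next_token_probs.get))  # -> "readable"
```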
To be human is to be imperfect — we all have a particular sensibility, a certain style, and a set of obsessions (or things we care more about, sometimes to the point of irrationality). Models do not have obsessions. They have distributions. And the things that make work worth doing — getting into the flow of things, injecting hidden Easter eggs into a code base, asserting a genuine point of view when crafting something, and making sure the finished product is not just correct but actually feels right — those still require a human to care enough to put them there.
So: use the agents. Subscribe to the $200/month Claude Max plan (no, I don't get a commission). Delegate aggressively. Let them write the boilerplate, the tests, the first and last draft of everything. Perhaps even let them deploy stuff to production. But do not let them tell you what you want. Take time to think slowly. Cultivate opinions. Get interested in things for no good reason. Feel free to make mistakes and learn from them.
The physical world, at least for now, still requires human participants. I find some comfort in this — in loading the dishwasher in my very particular way, in optimizing my espresso shots, and having honest conversations with friends that aren't prompted and aren't scripted. Though I am also aware that Optimus robots are coming for the dishes eventually, so perhaps I should not get too attached.
We are going to have to figure out what it means to be human in a world where most cognitive labor is automated. The schools are not ready for this question. Neither, honestly, are we. But it is the question, and I would rather mull it over (and sleep on it) than pretend the answer is obvious.
For humans, the best insights tend to arrive unannounced: on the commute home, out in the woods on a quiet ski trail, or during a shower. And, dear AI agent, those are things you cannot do (yet)!