The End of Instructions

For fifty years we told our tools exactly what to do. Now we tell them what we think and they infer the rest. A field guide to the shift from instruction to judgment.

A few weeks ago I was looking at a screen with three pricing options on it. They were laid out cleanly, the typography was correct, nothing was broken. I looked at them for a while and then I said, more or less, "Yes, but we need a very brief explanation of the cost and time difference between the options."

I did not say anything about layout. I did not specify a typeface, a hierarchy, a spacing value, a component, or a place on the page. I did not describe the thing I wanted. I described a dissatisfaction and pointed, vaguely, in a direction. And the system I was working with did something that would have been unremarkable to a colleague and was, on reflection, remarkable coming from software: it understood that the real problem was not a missing sentence. The real problem was that a person looking at those three options could not tell, quickly, why they would choose one over another. The deeper reading was something like: the user is struggling to differentiate. Reduce the cognitive load. Make the decision clearer.

That gap, between what I said and what was understood, is the whole subject of this essay. I have been building software for fifteen years, and I have spent most of that time learning to say exactly what I mean to machines that punished me when I did not. Something has changed in the last two years that is easy to mistake for a change in what software can make. It is actually a change in what we have to say to make it. That is a smaller-sounding claim and a much larger one.

Fifty years of telling machines exactly what we meant

The history of computing is, from one angle, a long negotiation over who carries the burden of precision. For most of that history, the answer was: the human, completely.

A punch card did exactly what its holes said and nothing else. Assembly language did what you wrote, instruction by instruction, and your intentions were irrelevant to it. The command line was a covenant of exactness: a misplaced character was not interpreted charitably, it was an error. We tend to remember the graphical interface as the great humanizing leap, and in some ways it was, but it did not change the underlying contract. A button is still a precise instruction. A slider is a precise instruction with a range. A form is a precise instruction broken into fields. The whole apparatus of modern software, the thing we have spent decades refining, is a system for letting humans issue exact commands a little more comfortably. The comfort improved. The contract did not. You still had to know what you wanted, and you still had to translate it into the machine's terms before the machine would move.

This is worth sitting with, because it shaped a few generations of how we think about skill. To be good with software was to be good at translation. The expert was the person who could take a fuzzy human goal and decompose it into the exact sequence of operations that would produce it. That decomposition was hard, it was learnable, and it was valuable, and so we built entire professions on top of it. The person who could translate "I want to understand my sales" into the correct pivot table was valuable precisely because the pivot table would not meet them halfway.

Search loosened the contract slightly. A search box forgives a misspelling and guesses at intent, and for twenty years that felt like the frontier of forgiveness. But search was still a retrieval system. It found the thing that already existed. It did not infer what you were trying to do and then do it.

What has changed is that the burden of precision has begun, for the first time, to move off the human and onto the machine. Not entirely, and not always well. But the direction is unmistakable, and direction is what matters.

Why "prompting" is the wrong word for this

We have settled on the word "prompting" to describe how people work with these systems, and the word is doing us a disservice. It carries the old contract inside it. To prompt is to author an instruction, just a longer and chattier one, and so we talk about prompt engineering as though the skill is still translation, still the careful construction of an exact specification, only now in prose instead of code.

That is not what most real work with these systems looks like once you get past the first hour. What it looks like is closer to how you would work with a sharp junior colleague who happens to be very fast. You do not write them a specification. You react. You say "this is close but it feels too aggressive." You say "I don't love it and I can't tell you why yet." You say "warmer." You say "what if it were the opposite." You say the thing I said about the pricing options, which was not an instruction at all but a complaint with a heading attached. And the work moves anyway, because the gap between your vague reaction and a concrete next version is now being closed on the other side.

So the more honest description is not that we have learned to prompt. It is that we have stopped issuing instructions and started issuing judgments. The unit of communication is no longer the command. It is the reaction: the preference, the critique, the intuition, the dissatisfaction, the directional nudge that we could not fully justify if you asked us to. We have moved from authoring procedures to authoring judgments, and the machine has taken over the part we used to think was the work, which was turning the judgment into the thing.

Three layers: the request, the intent, and the goal

It helps to be precise about what is actually being communicated, because "judgment" is a slippery word. I find it useful to separate three layers, stacked from shallow to deep.

The first layer is the explicit request. This is the literal surface of what was said. "Add a brief explanation of cost and time." Taken alone, it is an instruction, and a system that only hears this layer behaves like the old software: it adds the sentence, and the underlying problem remains, and you go around again.

The second layer is the intent. This is what the request was for. The intent behind "add an explanation" was "I cannot tell these options apart and I am worried the visitor can't either." A system that hears this layer does not just add a sentence. It looks at the decision the person is actually trying to support and reshapes the moment to support it, which might mean a sentence, or might mean a small comparison, or might mean removing an option that was muddying the other two.

The third layer is the goal, and this is the one that is genuinely new. The goal is the thing the intent is in service of, which the person may not have stated and may not even be holding consciously in the moment. The goal behind the intent behind the request might be "I want more people to choose with confidence and fewer to bounce because they froze." That is a business objective, and it was nowhere in what I said. But it was the reason I said anything at all.

Old software lived entirely on the first layer. Good colleagues operate on the second and reach, sometimes, for the third. What is striking about the current systems is that they have begun to climb this ladder. They infer intent routinely now. They occasionally infer goals. And the trajectory of the technology points directly at the top of the ladder, at systems that infer the objective you are pursuing before you have managed to put it into words. That is a different relationship with a tool than any we have had. It is worth being clear-eyed about both how useful and how strange it is.

Taste is a decision-making system, not a decoration

To understand why this shift favors certain kinds of people, you have to take taste seriously as a faculty rather than treating it as a polite word for opinion.

The cognitive psychology here is well-trodden. Experts do not, for the most part, decide by deliberation. Gary Klein's work on how firefighters and nurses and chess players make decisions under pressure found that they rarely compare options at all. They recognize. A pattern presents itself and the right move comes with it, fast, before any conscious reasoning, and the reasoning, when it happens, is mostly a story told afterward to justify what recognition already produced. He called it recognition-primed decision making, and once you see it you see it everywhere that expertise lives.

Taste is this faculty pointed at quality. The senior designer who glances at a layout and says "no" before they can tell you why is not being precious. They are running a pattern-matcher trained on twenty years of examples, and it has returned a result faster than their capacity to explain it. The explanation, when you press for it, arrives a beat later and is often a rationalization of a judgment that was already made. This is not a flaw in their process. It is the process. It is what expertise feels like from the inside.

Michael Polanyi gave us the phrase for the problem this creates: we can know more than we can tell. The most valuable knowledge in any craft is tacit. It lives below the waterline of language. The master cannot fully transmit it to the apprentice in words, which is why apprenticeship takes years and consists mostly of doing the thing badly next to someone who does it well until the pattern transfers. Design systems, brand guidelines, and style guides are all attempts to drag this tacit knowledge above the waterline and write it down, and they are all, necessarily, lossy. The rules are a little dead because the taste that produced them was alive.

Here is what is new, and it is the hinge of the whole argument. For the entire history of tools, the fact that taste could not be fully articulated meant it could not be fully delegated. You could not hand your judgment to a tool, because a tool needed instructions and your judgment refused to become instructions. You had to carry the judgment all the way down into the production yourself, or stand over someone who could. For the first time, we have systems that act on the output of the taste-function without requiring you to articulate the function. You can run your recognition, say "warmer," and the gap gets closed. The untellable part of what you know has become, suddenly, usable.

Production was never the bottleneck

This reframes what creative and knowledge work actually is, and the reframing is uncomfortable because it asks us to admit something about what we were being paid for.

Think of any piece of creative work as a loop. Something is proposed. Someone reacts to it. It is revised. Around and around until it is good or the deadline arrives. For all of history, humans did both halves of that loop. We proposed, by producing, and we reacted, by judging. And because producing was slow and effortful and required real skill, we came to believe that producing was the work, and that the judgment was a sort of finishing layer on top of it.

I think we had it backwards, and the current moment is exposing it. The scarce thing was never the production. A director does not hold the camera. A creative director does not set the type. An architect does not pour the concrete. The most senior people in every making discipline are precisely the ones who have stopped making with their hands and spend their days reacting, choosing, killing, approving, and saying "warmer" to people and now to machines. We knew, in the structure of our own organizations, that judgment was the senior function and production was the junior one. We just had production and judgment bundled together in most individual workers, so we mistook the visible labor for the value. The labor was visible. The judgment was not. We paid for the bundle and told ourselves we were paying for the labor.

When a machine can do the production half of the loop, the bundle comes apart, and what is left standing in the open is the half we could never quite see: the knowing what is good, and the knowing why it matters here, in this case, for this audience, against this goal. That was always the scarce resource. It is just that for the first time it is unbundled from the labor that used to hide it.

Who becomes more valuable, and the trap inside that

The naive read of all this is that creative and technical workers are in trouble, because the machine can now do what they did. The more careful read is close to the opposite, and it matters who you are.

If your contribution was production without judgment, the kind of work where someone tells you exactly what to make and you make it competently, then the ground is genuinely moving under you, because that is the half the machine now does well. But if your contribution is judgment, if you are the person who can look at three options and know, fast and for reasons you can eventually defend, which one is right and what is wrong with the other two, then the binding constraint of the whole system has just shifted onto your faculty. You become more valuable, not less, because the thing that is now scarce relative to production is exactly the thing you have the most of. The experienced designer, the architect, the creative director, the product thinker who has internalized a thousand cases: these people are not being automated. They are being amplified, because the slow part of their loop just got fast and the only remaining limit is the quality of their judgment.

There is a trap inside this, and I would be lying by omission if I did not name it. Judgment is not innate. It is compressed experience, and experience comes from producing, from making the thing badly enough times that your pattern-matcher gets trained. If an entire generation skips the production years because the machine does them, where does the judgment come from? You cannot evaluate well what you have never struggled to make. The taste that makes a senior valuable was paid for in junior years of bad work. A world that removes the bad work may also, quietly, remove the path to good judgment, and we will not notice the bill for a decade. I do not have a clean answer to this. I suspect the answer involves deliberately keeping humans in the production loop not because it is efficient but because it is how taste is grown, the way a pilot still hand-flies the plane sometimes so the skill does not die in the autopilot. But I am not certain, and anyone who tells you they are is selling something.

The revision history is the real artifact

Here is the idea I cannot stop turning over, and it is where I think the next decade of valuable software actually lives.

We treat the final artifact as the thing. The shipped design, the merged code, the published page. We keep those carefully. And we treat everything that came before, the rejected versions, the comment threads, the "no, warmer," the thing we killed at eleven at night, the third draft that was wrong in an instructive way, as waste. We throw it out, or we let it rot in a version history nobody reads.

I think we have it exactly backwards. The final artifact is a single point. It is one sample drawn from a function we cannot see, the function that maps "things that could exist" to "how much I want them to exist," which is to say, your taste. One point tells you almost nothing about the function. But the revision history is not a point. It is a sequence of differences, each one a small gradient: this was rejected and that was approved, this was made warmer and that was made sharper, this option was removed and these two were kept. Every "no" is a direction. Every "yes, but" is a vector. The sequence of judgments that produced an artifact contains vastly more information about the underlying taste than the artifact itself, because the artifact is where the function happened to land and the history is the shape of the function on the way down.
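To make the "every no is a direction" idea concrete, here is a minimal sketch, not a description of any real system. It assumes, purely for illustration, that each draft can be reduced to two made-up numbers (warmth and urgency), and that each step in a revision history is a pair: the version that was rejected and the version that replaced it. Fitting a simple logistic model to those pairs recovers a rough direction of preference. The feature names, the data, and the functions `fit_taste` and `score` are all hypothetical.

```python
import math

# Hypothetical features for a draft: (warmth, urgency), each in [0, 1].
# Each history entry is (rejected_draft, approved_draft). The data is invented.
HISTORY = [
    ((0.2, 0.9), (0.5, 0.6)),
    ((0.5, 0.6), (0.7, 0.3)),
    ((0.4, 0.8), (0.6, 0.2)),
    ((0.3, 0.5), (0.8, 0.4)),
]

def fit_taste(history, steps=500, lr=0.5):
    """Fit weights w so approved drafts score above the drafts they replaced.

    Uses gradient ascent on the log-likelihood of a logistic model over
    pairwise differences (approved minus rejected).
    """
    dim = len(history[0][0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for rejected, approved in history:
            diff = [a - r for a, r in zip(approved, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Derivative of log(sigmoid(margin)) with respect to margin.
            push = 1.0 / (1.0 + math.exp(margin))
            for i, d in enumerate(diff):
                grad[i] += push * d
        w = [wi + lr * g / len(history) for wi, g in zip(w, grad)]
    return w

def score(w, draft):
    """Linear stand-in for 'how much I want this to exist'."""
    return sum(wi * xi for wi, xi in zip(w, draft))

weights = fit_taste(HISTORY)
calm_draft = (0.8, 0.2)
pushy_draft = (0.2, 0.8)
print(weights)
print(score(weights, calm_draft) > score(weights, pushy_draft))
```

With this invented history, the warmth weight comes out positive and the urgency weight negative, so the calmer draft scores higher. A single final artifact could not have told you either sign. Real taste is obviously not a two-number linear function; the sketch only shows why a sequence of differences carries more information than the point where it ended.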

Almost no organization treats its revision history as its most valuable record of judgment. We keep the conclusions and discard the reasoning, when the reasoning is the part that does not come back when the senior person leaves. The terse code review where the staff engineer wrote "this works but it will hurt us in eighteen months, do it the other way," and was right, is a more valuable artifact than the code. It is judgment, captured, in the wild, applied to a real case. And it is sitting in a pull request that no system will ever learn from.

Approval, rejection, and the wince are data

If you take that seriously, then a lot of things that look like ephemera turn out to be signal.

The speed of a yes is information. The friction of a no is information. The thing that makes a senior person wince, before they have said a word, is information, and it is some of the highest-quality information a company produces, because it is recognition firing in real time on a real case. Approval patterns, the edits people make to a draft, the parts they leave untouched, the proposals that get killed quickly versus the ones that get argued over, the emotional temperature of the reactions: taken together these form a map of preference far more detailed than any brand guideline. Organizational behavior has always known that the real culture lives in what gets praised and what gets quietly rejected, not in the values printed on the wall. Taste is the same. It is distributed across the people who have been told "that's not us" enough times to know, without being able to fully say, what "us" is.

We have never had a way to capture this. It evaporated as it happened. The wince left no trace. For the first time, we have systems that could, in principle, observe the stream of judgments and learn from it, which raises a question that I think is the real one.

Preference memory, and the next category of software

If taste is a function, and the revision history is the gradient of that function, then a system that remembers your judgments across many projects begins, slowly, to approximate the function itself. Not to replace it. To hold a working model of it, the way a long-tenured deputy holds a working model of their principal's preferences and can draft the memo that will get approved on the first pass because they have absorbed a thousand prior approvals and rejections.

This is a different kind of tool than we have built before. A drawing tool is inert. It does what you push it to do and remembers nothing about why you pushed. A tool with preference memory arrives already knowing that you reject false urgency in copy, that you hate a certain kind of rounded friendliness, that you will always choose the quieter of two options when the louder one is merely clever, that your organization warms to evidence and cools to adjectives. It does not wait for the instruction. It pre-empts the judgment, because it has seen ten thousand of your judgments and built a model of the taste behind them.
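As a toy illustration of what "arrives already knowing" could mean mechanically, here is a rough sketch under heavy assumptions. It supposes that past judgments have been labeled with descriptive tags (an assumption; nothing above says how judgments would be encoded) and that a simple tally of approvals and rejections per tag is enough to order new candidates. The class `PreferenceMemory`, its methods, and the tag names are all hypothetical.

```python
from collections import defaultdict

class PreferenceMemory:
    """Hypothetical store of past judgments, keyed by descriptive tags."""

    def __init__(self):
        self._approved = defaultdict(int)
        self._rejected = defaultdict(int)

    def record(self, tags, approved):
        """Remember one judgment: a set of tags and whether it was approved."""
        counts = self._approved if approved else self._rejected
        for tag in tags:
            counts[tag] += 1

    def leaning(self, tag):
        """Return a value in (-1, 1): positive if the tag is usually approved.

        The +2 in the denominator is a smoothing term, so unseen tags
        score 0 and a single judgment cannot swing the score to the extreme.
        """
        yes, no = self._approved[tag], self._rejected[tag]
        return (yes - no) / (yes + no + 2)

    def rank(self, candidates):
        """Order candidates (name -> tags) from most to least likely approved."""
        def total(item):
            return sum(self.leaning(tag) for tag in item[1])
        ordered = sorted(candidates.items(), key=total, reverse=True)
        return [name for name, _ in ordered]

memory = PreferenceMemory()
memory.record(["false-urgency", "loud"], approved=False)
memory.record(["quiet", "evidence"], approved=True)
memory.record(["loud", "clever"], approved=False)
memory.record(["quiet"], approved=True)

ranked = memory.rank({
    "countdown banner": ["false-urgency", "loud"],
    "plain comparison": ["quiet", "evidence"],
})
print(ranked)
```

Here the plain comparison is ranked ahead of the countdown banner, because the memory has only ever seen quiet, evidence-led work approved and loud, urgent work rejected. The design choice worth noticing is that nothing in the store is a rule anyone wrote down. It is only a record of what was approved and what was not.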

I think this is the shape of a major software category that does not quite exist yet, and I do not have a perfect name for it. "Taste infrastructure" is close. "Preference memory" is close. The point is that the defensible thing, the thing that compounds, may not be a better editor or a faster generator, both of which will commoditize fast. The defensible thing may be the accumulated, structured memory of an individual's and an organization's judgment over time. The generator is a commodity. The memory of your taste is not, because it can only be built by living alongside your decisions, and it gets more valuable with every one. The company that captures its judgment compounds it. The company that lets it walk out the door with every departing senior keeps paying to relearn what it already knew.

When the system infers the goal you never stated

The frontier of this, and the place where I get both excited and uneasy, is the top of the ladder I described earlier: goal inference. A system that has watched enough of your judgments does not only learn your aesthetic preferences. It begins to infer your objectives, including the ones you have not articulated and possibly cannot.

Run the pricing example forward. A system notices that across a dozen projects you consistently soften urgency, remove the countdown timers, cut the "only three left," choose the calmer call to action. It could infer a shallow preference, "this person likes calm design." Or it could infer a goal, "this person is building for trust over a long relationship and believes pressure tactics poison that, even when they convert in the short term." If it has inferred the goal, it stops merely matching your style and starts proposing things in service of an objective you never typed, the way a great deputy starts solving the problem you actually have rather than the task you actually assigned.

This is enormously powerful and it is exactly where judgment has to stay human, and not for the reason people usually give. The usual reason is that the machine cannot do it. That is increasingly untrue, and I would rather not rest an argument on a capability gap that keeps closing. The real reason is that someone has to remain the author of the goal, as opposed to the author of the procedure. A system that infers your objectives can serve them, and it can also, without anyone intending it, flatter your taste into a rut, optimize an objective you absorbed without choosing, and make it harder every year to want something genuinely new, because the path of least resistance is now paved with your own past preferences. The procedure can be delegated. The question of what we are for cannot, or rather it can, but only by accident, and accidents in that category are how you wake up having built something efficient and pointless. The machine should author the how. The human has to keep authoring the why.

Organizations have always tried to capture judgment

Step back and this looks less like a technology story and more like a knowledge-management story we have been failing at for a century.

Every organization tries to capture judgment and operationalize it. The brand guideline is captured judgment. The design system is captured judgment. The architectural decision record, the style guide, the checklist, the senior person sitting in the review whose entire job is to be a judgment that does not scale: all of these are attempts to take the tacit thing that lives in a few people's recognition and make it reusable by the many. And all of them are lossy in the same way, because they convert living taste into dead rules, and the rules cannot cover the case the rule-writer did not foresee, which is precisely the case where judgment was needed.

Nonaka described the engine of a knowing organization as the conversion between tacit and explicit knowledge, the slow turning of what individuals sense into what the group can use, and back again. The reason this has always been so hard is that the tacit-to-explicit step required articulation, and the most valuable knowledge resisted articulation, by definition. What is genuinely new is that we may finally have a path to capturing judgment without first forcing it into rules. Not by asking the senior designer to write down why, which they cannot fully do, but by learning from the stream of their actual decisions, which they make all day and which carry the why implicitly. An organization that does this is no longer trying to compress its taste into a document that goes stale. It is maintaining a living model of its own judgment, trained on the real decisions of its best people, available to everyone, improving with use. That is a thing we have wanted for a hundred years and have never had the tools to build. We might be about to.

What is actually ending

So I want to be careful about what the title of this essay claims is ending, because it is not the designer, and it is not the engineer, and it is certainly not the human.

What is ending is the era in which the human contribution to creative and knowledge work was bundled, and hidden, inside the labor of production. For fifty years we communicated with our tools by translating our intentions into precise instructions, and we got so good at the translating that we mistook it for the work. The translation is the part that is ending. The machine will carry the burden of precision now, imperfectly and then less imperfectly, and it will turn our reactions into artifacts, our "warmer" into warmth, our "yes, but" into the next version.

What is left, standing in the open where we can finally see it, is judgment. The taste that knows which of three options is right and can eventually say why. The strategic sense of what the work is for. The stance, the actual opinion about what is good and what matters, held by a person who has earned it across enough cases that their recognition can be trusted. That was always the scarce and valuable thing. We just could not see it clearly while it was wearing the costume of production.

I find this clarifying rather than frightening, though I understand the fear, and the trap I named earlier is real and unsolved. But I keep coming back to that screen with the three pricing options. I did not give an instruction. I gave a judgment, badly phrased, half-formed, pointing in a direction I trusted more than I could justify. And the work got better. The next decade, I think, belongs to the people and the organizations who can do that well: who can look at the thing and say, fast and for real reasons, "yes, but warmer," and mean something specific and correct by it. The instruction is dead. The judgment is the job. It always was. We can just finally see it.
