Don Norman gave interface design its two most durable ideas: the gulf of execution and the gulf of evaluation. For forty years, the first got almost all the attention — and deserved it.
Then AI flipped that balance.
Go deeper: the two gulfs, defined
The gulf of execution is the gap between what a user wants and their ability to make the system do it — every menu, mode, and seventeen-step flow lives here. The gulf of evaluation is the gap between what the system did and the user's ability to tell whether it did it well. Pre-AI software demanded that humans translate intent into machine operations, so designers rightly spent careers shrinking the execution side. Evaluation was usually cheap: you clicked "bold," the text got bold, and the feedback loop confirmed itself.
Execution is collapsing toward zero
When a user can type — or say — what they want and the system produces it, there's little left to execute. No commands to memorize. No interface to master.
Execution design isn't dead. Someone still has to make intent expressible: clear affordances for what the system can and can't do, graceful ways to refine a request. But it's no longer where products live or die.
Evaluation is now the whole bill
Here's the trade nobody priced in: every unit of execution effort the AI absorbs is reissued to the user as evaluation effort.
- The system writes the code in a second. The user spends an hour deciding whether it's safe to ship.
- The assistant drafts the email instantly. The user rereads it three times, wondering if the tone lands wrong with a client.
- The analysis appears in moments. The user has no idea whether the numbers underneath are real.
And when a product gives users no support for that judgment, they don't slow down and verify. They leave. Early distrust doesn't produce complaints — it produces silent, immediate churn.
Example: same capability, opposite outcomes
Before: "Q3 churn was driven primarily by onboarding friction." A paragraph of confident prose, no sources. The user's only options are blind trust or redoing the analysis by hand, and either way the product has given them nothing to base that trust on.
After: The same finding, plus the three data sources it drew from (linked), the assumption it made ("excluding trial accounts"), and a note: "Confidence is high on the friction finding, low on the pricing effect — only 40 accounts in that segment." Now the user can judge in thirty seconds.
That second version is a bridged evaluation gulf. It's slower to build, and it's the one that gets a second session.
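One way to make the difference between the two versions concrete is to treat the evidence as part of the output's data model rather than as optional decoration. The sketch below is a minimal illustration, not a prescribed schema: the names `Claim`, `Finding`, and `evaluation_gaps`, the "high"/"low" confidence labels, and the placeholder source strings are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    confidence: str   # "high" or "low"; the label set is an assumption
    reason: str = ""  # why the confidence is what it is


@dataclass
class Finding:
    headline: str
    sources: list[str] = field(default_factory=list)      # links the user can open
    assumptions: list[str] = field(default_factory=list)  # what the system took as given
    claims: list[Claim] = field(default_factory=list)     # per-claim confidence

    def evaluation_gaps(self) -> list[str]:
        """Name each kind of evidence the user would have to go without."""
        gaps = []
        if not self.sources:
            gaps.append("sources")
        if not self.assumptions:
            gaps.append("assumptions")
        if not self.claims:
            gaps.append("confidence")
        return gaps


# The "before" output: confident prose and nothing else.
before = Finding(headline="Q3 churn was driven primarily by onboarding friction.")

# The "after" output: same finding, with the evidence attached.
# The source strings are placeholders; the original names no specific sources.
after = Finding(
    headline=before.headline,
    sources=["source-1", "source-2", "source-3"],
    assumptions=["excluding trial accounts"],
    claims=[
        Claim("onboarding friction", "high"),
        Claim("pricing effect", "low", "only 40 accounts in that segment"),
    ],
)
```

Here `before.evaluation_gaps()` reports all three kinds of evidence as missing, while `after.evaluation_gaps()` is empty. A check like this could run before an output is ever shown, so an unsupported answer gets caught as a defect instead of shipping as confident prose.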
The flip, side by side
| Pre-AI product design | AI-era product design |
|---|---|
| Ease of use was the moat | Ease of verification is the moat |
| Feedback confirmed actions | Feedback must justify outputs |
| Errors were user errors to prevent | Errors are system errors to expose and recover from |
| Success = user masters the interface | Success = user can confidently delegate |
| Trust accrued slowly through reliability | Trust is won or lost in the first sessions |
Go deeper: why probabilistic systems can't earn trust passively
A deterministic interface earns trust passively: it does the same thing every time, and predictability does the work. A probabilistic system cannot earn trust that way, because the same input can produce different output. Trust has to be designed deliberately — into the reasoning the system shows, the sources it cites, the confidence it signals, and the undo it guarantees.
Evaluation is the foundation, not the polish
Evaluation surfaces are not a layer you add after the feature works. They are the feature.
- A recommendation without visible assumptions asks users to trust judgment they can't inspect.
- A summary without sources asks them to gamble their reputation on the system's homework.
- An action without preview or undo asks them to accept irreversible consequences from a system they've known for four minutes.
Each is an evaluation gulf left unbridged — and each one converts a curious trial user into a quiet ex-user.
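The third bullet, preview and undo, is the most mechanical of the three to build. A minimal sketch follows, assuming every action the system proposes can carry its own inverse. `ReversibleAction` and `ActionLog` are hypothetical names for this illustration, not an established API, and the inbox dictionary stands in for real application state.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReversibleAction:
    description: str            # what the user sees before anything happens
    apply: Callable[[], None]   # performs the change
    revert: Callable[[], None]  # restores the previous state


class ActionLog:
    """Nothing runs without a confirm step, and confirmed actions stay undoable."""

    def __init__(self) -> None:
        self._applied: list[ReversibleAction] = []

    def preview(self, action: ReversibleAction) -> str:
        # Describing the action changes nothing.
        return action.description

    def confirm(self, action: ReversibleAction) -> None:
        action.apply()
        self._applied.append(action)

    def undo_last(self) -> bool:
        """Revert the most recent confirmed action; False if there is none."""
        if not self._applied:
            return False
        self._applied.pop().revert()
        return True


# Toy state standing in for a real application.
inbox = {"archived": False}
archive = ReversibleAction(
    description="Archive this thread",
    apply=lambda: inbox.update(archived=True),
    revert=lambda: inbox.update(archived=False),
)

log = ActionLog()
log.preview(archive)  # shows the description, changes nothing
log.confirm(archive)  # the thread is now archived
log.undo_last()       # the thread is back where it started
```

The design choice worth noting is that an action with no `revert` cannot be constructed at all. Irreversible operations would need a separate, more deliberate path, which is arguably the point.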
Try this: take your product's most important AI output and ask five questions. Where did the data come from — can the user see it? What did the system assume? How confident is it, and does the interface say so? What happens if the user says no? How do they take it back if they said yes too fast? Every "they can't" is a gulf you're asking users to cross alone.
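For teams that want to run this audit across many outputs, the five questions reduce to a checklist. The sketch below is one possible encoding; the key names and the `unbridged_gulfs` helper are assumptions made for illustration, and a missing answer is deliberately treated as a "no".

```python
# The five questions from the exercise above, keyed by hypothetical short names.
FIVE_QUESTIONS = {
    "sources_visible": "Where did the data come from, and can the user see it?",
    "assumptions_visible": "What did the system assume?",
    "confidence_shown": "How confident is it, and does the interface say so?",
    "can_decline": "What happens if the user says no?",
    "can_undo": "How do they take it back if they said yes too fast?",
}


def unbridged_gulfs(answers: dict[str, bool]) -> list[str]:
    """Return every question the product cannot yet answer for the user."""
    return [
        question
        for key, question in FIVE_QUESTIONS.items()
        if not answers.get(key, False)  # unanswered counts as unbridged
    ]


# Example audit: sources and undo exist, the rest do not.
audit = unbridged_gulfs({"sources_visible": True, "can_undo": True})
```

In this example `audit` holds the three remaining questions, in the order they were asked.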
Execution design made products usable. Evaluation design makes them trustworthy. In a market where anyone can ship the same capability in a weekend, trustworthy is the only version of usable that still compounds.