Field guide · Guide 10 of 10: Designing the wait

Slow Is the New Broken: Designing the Wait in AI Products

Users punish silence, not latency. Streaming, named work, partial results, and honest progress — the wait is a design surface.

The human thresholds haven't moved in decades: about a tenth of a second feels instant, about a second keeps people in flow, and past ten seconds you've lost the room. Generative AI routinely blows through all three, and it does so during the most fragile moments a product has: the first sessions, when the user is still deciding whether any of this is worth believing.

Here's the part most teams miss: users don't experience your latency. They experience your silence.

Silence reads as broken

A blank screen while the model thinks doesn't read as "working hard." It reads as "crashed," then as "wasting my time," then as a tab closing. The same ten seconds, filled with visible, named work — "reading your March invoices… comparing against last quarter…" — reads as effort on your behalf. Same latency, opposite verdicts.

Example: the same 40-second report

Version A: spinner for 40 seconds, then a wall of text.

Version B: the outline streams in after two seconds. Sources appear as they're read. The user cuts an irrelevant section mid-run. By second 40 they've already started judging the content — which was the job all along.

Version B isn't faster. It just never goes silent.

Design the wait, plainly

  • Stream something useful in the first second: an outline, a first sentence, a found source. Time to the first useful token matters more than total completion time.
  • Name the work. "Checking 3 sources" builds trust; an animated shimmer builds suspicion. Progress must be honest — invented steps and fake progress bars backfire the moment users catch on, and they catch on.
  • Hand over partial results. Let people start reading, judging, and correcting while the rest cooks.
  • Keep the user in charge. Let people cancel, redirect, or edit the prompt mid-run. An interruptible wait feels shorter than a locked one.
  • Set expectations up front. "This takes about a minute — it's reading all 200 rows" turns a delay into a deliberate act.
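For the engineers building the client, one way to keep these principles from getting lost is to give each its own message kind in the stream the backend sends. The sketch below assumes a hypothetical `WaitEvent` format and a pure reducer, `applyEvent`, that folds each event into what the screen shows; none of these names come from a real library. Cancellation runs the other way (the client aborts the request, for example with an `AbortController` in browser code), so it isn't modeled here.

```typescript
// Hypothetical wire format for a long-running AI job. Each principle in the
// list above gets its own message kind so the UI never has to guess.
type WaitEvent =
  | { kind: "expectation"; estimateSec: number; reason: string } // set expectations up front
  | { kind: "step"; label: string }                               // name the work
  | { kind: "partial"; section: string; text: string }            // hand over partial results
  | { kind: "done" };

// What the screen shows at any moment during the wait.
interface WaitScreen {
  banner: string;                   // e.g. "About 60s: reading all 200 rows"
  currentStep: string;              // e.g. "Checking 3 sources"
  sections: Record<string, string>; // partial results the user can already read
  finished: boolean;
}

const initialScreen: WaitScreen = { banner: "", currentStep: "", sections: {}, finished: false };

// Pure reducer: fold one incoming event into the next screen state.
function applyEvent(screen: WaitScreen, event: WaitEvent): WaitScreen {
  switch (event.kind) {
    case "expectation":
      return { ...screen, banner: `About ${event.estimateSec}s: ${event.reason}` };
    case "step":
      return { ...screen, currentStep: event.label };
    case "partial":
      // Append streamed text to its section so earlier text never disappears.
      return {
        ...screen,
        sections: {
          ...screen.sections,
          [event.section]: (screen.sections[event.section] ?? "") + event.text,
        },
      };
    case "done":
      return { ...screen, currentStep: "", finished: true };
  }
}
```

Appending partial text rather than replacing it is deliberate: text the user has already started reading should never vanish mid-wait.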

When slow is actually better

Speed matters most on low-stakes, high-frequency loops — autocomplete, suggestions, search. But instant answers to heavy questions can feel careless: a verdict on your quarter's strategy delivered in 300 milliseconds reads as a slot machine, not a colleague. For high-stakes outputs, a visible beat of deliberateness — showing the checking — can increase confidence. Match the tempo to the stakes.

Try this: time your product's longest silence — the gap where nothing on screen changes and nothing is named. If it's over three seconds, you've found your churn surface. Fill it with honest work before you spend a dollar on making the model faster.
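Once you log each visible change, the "try this" audit can be automated. Here is a minimal sketch, assuming a hypothetical `ScreenChange` record whose timestamp is measured from the moment the user submits. `longestSilence` and `overBudget` are illustrative names, and the three-second budget is the threshold from the paragraph above. A spinner that merely loops should not be logged as a change, since nothing new is shown or named.

```typescript
// One thing that changed on screen: streamed text, a newly named step, a found source.
// Hypothetical shape: record these from whatever UI instrumentation you already have.
interface ScreenChange {
  t: number;    // milliseconds since the user submitted the request
  what: string; // e.g. "outline streamed", "source 2 read"
}

// Threshold from the guide: a silence over three seconds is a churn surface.
const SILENCE_BUDGET_MS = 3000;

// Longest stretch with no visible change, measured from submit (t = 0)
// up to the last recorded change, plus what was on screen when it began.
function longestSilence(changes: ScreenChange[]): { gapMs: number; after: string } {
  const sorted = [...changes].sort((a, b) => a.t - b.t);
  let prevT = 0;
  let prevWhat = "submit";
  let longest = { gapMs: 0, after: "submit" };
  for (const c of sorted) {
    const gap = c.t - prevT;
    if (gap > longest.gapMs) longest = { gapMs: gap, after: prevWhat };
    prevT = c.t;
    prevWhat = c.what;
  }
  return longest;
}

// True when the worst silence exceeds the budget.
function overBudget(changes: ScreenChange[]): boolean {
  return longestSilence(changes).gapMs > SILENCE_BUDGET_MS;
}
```

Version A from the example logs a single change at 40,000 ms, so `longestSilence` reports a 40-second gap right after submit. A log shaped like Version B, with something new every couple of seconds, stays under budget. Record the final result as a change too, or the silence between the last update and completion goes unmeasured.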

Go deeper: the psychology of the wait

Decades of queueing research say the same three things: uncertain waits feel longer than known waits, unexplained waits feel longer than explained ones, and idle time feels longer than occupied time. AI waits are usually all three at once: unknown duration, unexplained process, idle user. Every technique above attacks at least one of the three. None of them requires making the model faster.

You can't always make the model faster. You can always refuse to go silent — and silence, not latency, is what users punish.
