White Paper · 03 · September 2025

Two-Phase Streaming

Why Users Should See Results Before AI Finishes

Nick Brandt & Leo Gestetner · AI Architecture · 11 min read

Abstract

Full AI roundtrips typically take 2.5–7 seconds depending on model size. By the time results arrive, users have already formed a negative impression. Two-phase streaming solves this by delivering rule-based results at ~700ms while AI analysis continues in the background. Users see value in under a second instead of waiting for the full LLM response.

1. The Perception Problem

Users don't experience latency in milliseconds. They experience it in feelings:

Actual Latency    User Perception
<100ms            Instant
100–300ms         Fast
300–1000ms        Noticeable delay
>1000ms           Slow, frustrating

Full AI roundtrips typically take 2.5–7 seconds depending on model size (~2.5s for small/fast LLMs, 5–7s for larger models like GPT-4 or Claude). By the time results arrive, users have already formed a negative impression — even if the results are excellent.

2. The Two-Phase Alternative

Traditional:
  User Input → Process Everything → Return All Results
                        ↓
               [Wait 2.5–7 seconds]
                        ↓
                "Here's everything"

Two-Phase:
  User Input → Phase 1 (Rules) → Stream at ~700ms
             → Phase 2 (AI)    → Completes at ~2.5s

Phase 1 (~700ms): Rules, validation, consensus checks (anything deterministic). With FastAPI and WebSockets, this target is achievable even including the network roundtrip.

Phase 2 (2.5–7s total): AI analysis, pattern detection, natural language explanations. Small LLMs complete around 2.5s; larger models like GPT-4 or Claude can take 5–7s.

The user sees results at 700ms instead of waiting several seconds. That's the difference between "fast" and "slow" in user perception.
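The flow above can be sketched in a few lines of Python. The field names and message shapes here are illustrative assumptions, and `phase2_ai` is a stand-in for the real model call (which would take 2.5–7s in production):

```python
import asyncio

def phase1_rules(payload: dict) -> dict:
    """Deterministic checks: cheap, synchronous, no model calls."""
    issues = []
    if not payload.get("email"):
        issues.append("email is required")
    return {"phase": 1, "issues": issues}

async def phase2_ai(payload: dict, phase1: dict) -> dict:
    """Stand-in for the slow AI roundtrip; receives Phase 1 as context."""
    await asyncio.sleep(0)  # placeholder for the real model call
    return {"phase": 2, "analysis": "ai-analysis-placeholder"}

async def run_two_phase(payload: dict, send) -> None:
    """Stream Phase 1 immediately, then Phase 2 when it completes."""
    p1 = phase1_rules(payload)
    await send(p1)                     # user sees this at ~700ms
    p2 = await phase2_ai(payload, p1)  # AI continues in the background
    await send(p2)
```

`send` is whatever pushes a message to the client; with FastAPI it would wrap the WebSocket's send method.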

3. The Psychology of Progressive Disclosure

Research on perceived performance shows:

First Paint Matters Most

Users judge speed by when they see anything

Progressive Loading Feels Faster

Even if total time is identical

Uncertainty is Worse Than Waiting

A spinner with no progress is anxiety-inducing

Partial Results Reduce Abandonment

Users stay engaged when feedback arrives

The Core Insight

Two-phase streaming isn't about making AI faster. It's about making users feel like the system is faster by delivering value immediately and enhancing it progressively.

4. Implementation Architecture

Phase 1: Deterministic Processing

├── Input validation (required fields, formats)
├── Business rules (hard constraints)
├── Consensus comparison (data-driven checks)
├── Geographic validation
└── Typo detection (fuzzy matching)
    → Stream to client via WebSocket
    → Target: ~700ms (including network roundtrip)
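One way to structure the Phase 1 pipeline above is as an ordered list of pure check functions. This is a sketch, not a prescribed design: the check names and sample data are assumptions, and the stdlib `difflib` stands in for a real fuzzy matcher:

```python
import difflib

KNOWN_COUNTRIES = ["Germany", "France", "Spain"]  # illustrative reference data

def check_required(record: dict) -> list:
    """Input validation: required fields."""
    return ["name is required"] if not record.get("name") else []

def check_typos(record: dict) -> list:
    """Typo detection via fuzzy matching against known values."""
    country = record.get("country", "")
    if country and country not in KNOWN_COUNTRIES:
        close = difflib.get_close_matches(country, KNOWN_COUNTRIES, n=1)
        if close:
            return [f"country: did you mean '{close[0]}'?"]
    return []

# Business rules, consensus, and geographic checks slot in the same way.
CHECKS = [check_required, check_typos]

def run_phase1(record: dict) -> list:
    """Run every deterministic check; no model calls, so this stays fast."""
    issues = []
    for check in CHECKS:
        issues.extend(check(record))
    return issues
```

Because every check is synchronous and local, the whole pipeline fits comfortably inside the ~700ms budget.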

Phase 2: AI Analysis

├── Receives Phase 1 results as context
├── Pattern detection across fields
├── Contextual reasoning
├── Natural language explanations
└── Esoteric issue detection
    → Stream to client via WebSocket
    → Target: full roundtrip ~2.5s

5. The WebSocket Advantage

HTTP request/response forces you to wait for everything. WebSocket streaming lets you send partial results:

HTTP:      Request → [Processing] → Response

WebSocket: Connect → Partial  → Partial   → Partial   → Complete
                        ↓          ↓           ↓
                     Phase 1    Phase 2     Phase 2
                     (rules)    (AI pt1)    (AI pt2)

Even within Phase 2, AI responses can stream token-by-token if using a streaming LLM API.
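Token-level streaming within Phase 2 can be sketched as an async generator feeding the same send channel. `fake_llm_stream` and the message shapes are assumptions standing in for a real streaming LLM API:

```python
import asyncio

async def fake_llm_stream(prompt: str):
    """Stand-in for a streaming LLM API that yields tokens as they arrive."""
    for token in ["Looks", " good", " overall", "."]:
        await asyncio.sleep(0)  # a real API would await the next chunk here
        yield token

async def stream_phase2(prompt: str, send) -> str:
    """Forward each token to the client immediately, then signal completion."""
    full = ""
    async for token in fake_llm_stream(prompt):
        full += token
        await send({"type": "phase2_token", "token": token})
    await send({"type": "phase2_done", "text": full})
    return full
```

The client renders tokens as they arrive, so even the "slow" phase gives continuous feedback rather than a single multi-second wait.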

6. UX Design for Two-Phase Results

Visual hierarchy matters. Phase 1 results appear instantly. The AI section shows a subtle loading indicator. When AI results arrive, they animate in without disrupting what's already visible.

The key principle: never block visible content on AI completion. Show what you know, enhance progressively.

7. Handling Phase 2 Failures

AI services can be slow or unavailable. Two-phase architecture handles this gracefully:

Scenario    Phase 1     Phase 2          User Experience
Normal      ✓ ~700ms    ✓ ~2.5s total    Full results
AI slow     ✓ ~700ms    ⌛ 2–4s          Phase 1 instant, AI delayed
AI down     ✓ ~700ms    ✗ timeout        Phase 1 results only

The system never blocks on AI. Users always get Phase 1 results, with AI as enhancement.
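A minimal sketch of this fail-soft behavior wraps the AI call in a timeout. The budget, message shapes, and parameter names are assumptions:

```python
import asyncio

AI_TIMEOUT_S = 5.0  # assumed budget; tune per model and deployment

async def phase2_with_fallback(ai_call, payload: dict, send,
                               timeout: float = AI_TIMEOUT_S) -> None:
    """Run the AI phase without ever blocking the user on it.

    Phase 1 results have already been streamed, so any failure here
    degrades to "Phase 1 results only" instead of an error.
    """
    try:
        results = await asyncio.wait_for(ai_call(payload), timeout)
        await send({"type": "phase2", "results": results})
    except asyncio.TimeoutError:
        # AI slow: give up after the budget, keep Phase 1 on screen
        await send({"type": "phase2_unavailable", "reason": "timeout"})
    except Exception:
        # AI down: same graceful degradation
        await send({"type": "phase2_unavailable", "reason": "error"})
```

The client treats `phase2_unavailable` as "no enhancement this time," which matches the table above: every row still delivers Phase 1.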

8. When to Use Two-Phase Streaming

Good Candidates

  • Form validation with AI suggestions
  • Search with AI-enhanced results
  • Content analysis with both rules and intelligence
  • Any workflow where instant feedback + deeper analysis both matter

Not Necessary

  • Pure AI tasks (chat, generation) where there's no fast path
  • Batch processing where latency doesn't matter
  • Simple CRUD operations with no AI component

9. Measuring Success

Metric                           What It Tells You
Time to first result             Phase 1 performance
Time to complete                 Total latency
Phase 2 success rate             AI reliability
User engagement after Phase 1    Are partial results useful?
Abandonment rate                 Comparison to single-phase
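The two latency metrics above can be captured with a small timer per session; the class and metric names here are illustrative, not a prescribed instrumentation scheme:

```python
import time

class PhaseTimer:
    """Record elapsed seconds for per-session latency metrics."""

    def __init__(self):
        self._start = time.monotonic()
        self.marks: dict = {}

    def mark(self, name: str) -> float:
        """Store and return the elapsed time under the given metric name."""
        elapsed = time.monotonic() - self._start
        self.marks[name] = elapsed
        return elapsed
```

In practice: call `mark("time_to_first_result")` when Phase 1 is sent and `mark("time_to_complete")` when Phase 2 finishes, then emit `marks` to your analytics pipeline for aggregation.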

10. Conclusion

Two-phase streaming doesn't make AI faster. It makes the system feel faster by delivering value immediately and enhancing it progressively.

When full AI roundtrips take 2.5–7 seconds, delivering Phase 1 results at ~700ms gives users immediate value while they wait for deeper analysis.
