LLM inference bottlenecks arise because the prefill (prompt processing) and decode (token generation) phases are coupled on the same hardware. Disaggregation separates them so each phase can be scaled independently.
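To make the two phases concrete, here is a minimal, framework-agnostic Python sketch; the names (PrefillWorker, DecodeWorker, KVCache) are hypothetical and stand in for whatever a real serving stack provides. Prefill consumes the whole prompt in one pass and produces the KV cache; decode reuses that cache to emit one token per step. In a disaggregated deployment, the cache is what gets handed from the prefill worker to the decode worker.

```python
from dataclasses import dataclass, field

# Toy illustration only: names and structure are hypothetical,
# not any specific framework's API.

@dataclass
class KVCache:
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


class PrefillWorker:
    """Processes the full prompt in one parallel pass (compute-bound)."""

    def prefill(self, prompt_tokens):
        cache = KVCache()
        # All prompt tokens are processed together: large matmuls,
        # high arithmetic intensity.
        for t in prompt_tokens:
            cache.append(k=f"K({t})", v=f"V({t})")
        first_token = f"tok_after({prompt_tokens[-1]})"
        return cache, first_token


class DecodeWorker:
    """Generates one token per step, reusing the KV cache (memory-bound)."""

    def decode(self, cache, first_token, max_new_tokens=4):
        out = [first_token]
        for _ in range(max_new_tokens - 1):
            last = out[-1]
            # Each step reads the whole cache but does little compute,
            # so throughput is limited by memory bandwidth.
            cache.append(k=f"K({last})", v=f"V({last})")
            out.append(f"tok_after({last})")
        return out


if __name__ == "__main__":
    prompt = ["The", "quick", "brown", "fox"]
    # Disaggregated serving: prefill and decode run on separate workers;
    # the KV cache is the state transferred between them.
    cache, first = PrefillWorker().prefill(prompt)
    print(DecodeWorker().decode(cache, first))
```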
Key Concepts
Prefill Phase: Computes the KV cache from the input prompt (compute-intensive).