TTFT test for streaming APIs
Measure how fast the first token appears.
TTFT stands for time to first token. It shows how long users wait before a streaming LLM response starts, and that wait often shapes perceived speed more than the final completion time does.
TTFT is only the beginning
A low first-token delay makes an app feel responsive, but users still care whether the rest of the answer streams smoothly. The report therefore separates first-token latency from total response time.
What TTFT Measures
TTFT captures the delay between sending a chat completion request and receiving the first streamed content token. It includes routing, queueing, upstream model startup, prompt processing, and network delay.
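As a rough illustration of that definition, the sketch below times a stream from the moment the request is sent until the first non-empty chunk arrives. The names measure_ttft, open_stream, and fake_stream are hypothetical and not part of any real client library; fake_stream simulates the upstream delay with a sleep, so swap in your own streaming call to measure a real endpoint.

```python
import time
from typing import Callable, Iterable, Optional


def measure_ttft(
    open_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return seconds from sending the request to the first non-empty chunk."""
    start = clock()  # start timing just before the request is sent
    for chunk in open_stream():  # open_stream sends the request and yields text
        if chunk:  # ignore empty chunks; they carry no content
            return clock() - start
    return None  # the stream ended without producing any content


def fake_stream():
    """Hypothetical stand-in for a streaming client."""
    time.sleep(0.05)  # simulated routing, queueing, and model startup
    yield ""          # an empty first event, which should not count
    yield "Hello"
    yield " world"


ttft = measure_ttft(fake_stream)
print(f"TTFT: {ttft * 1000:.0f} ms")
```

The timer starts before the request is opened, so connection setup counts toward the result, which matches what a user would actually experience.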
TTFT vs Latency vs Tokens per Second
TTFT is about when the answer begins. Total latency is about when the answer finishes. Tokens per second is about how quickly the answer continues after it starts.
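The three metrics can be derived from three timestamps and a token count. The sketch below is one way to do it; the StreamTiming class and its field names are hypothetical, and the tokens-per-second convention used here (tokens after the first, divided by the time after the first token) is an assumption. Some tools divide all output tokens by total time instead, which gives a lower number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamTiming:
    request_sent: float  # seconds, when the request was sent
    first_token: float   # seconds, when the first content token arrived
    last_token: float    # seconds, when the final token arrived
    output_tokens: int   # number of tokens streamed

    @property
    def ttft(self) -> float:
        """When the answer begins."""
        return self.first_token - self.request_sent

    @property
    def total_latency(self) -> float:
        """When the answer finishes."""
        return self.last_token - self.request_sent

    @property
    def tokens_per_second(self) -> Optional[float]:
        """How quickly the answer continues after it starts."""
        generation_time = self.last_token - self.first_token
        if self.output_tokens < 2 or generation_time <= 0:
            return None  # not enough data to compute a rate
        return (self.output_tokens - 1) / generation_time


timing = StreamTiming(request_sent=0.0, first_token=0.5,
                      last_token=4.5, output_tokens=201)
print(timing.ttft)               # 0.5
print(timing.total_latency)      # 4.5
print(timing.tokens_per_second)  # 50.0
```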
Why First Token Latency Matters
A low TTFT makes an API feel responsive. A high TTFT can make users think a chat app is stuck, even if the final answer eventually arrives at a reasonable speed.
FAQ
Is TTFT the same as full response time?
No. TTFT measures the time until the first token arrives. Full response time measures the time until the whole completion has arrived.
Does TTFT only matter for streaming?
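Not only, but it matters most for streaming, because users can see the response begin before the full answer is done. Without streaming, nothing is shown until the whole response arrives, so total response time is what users feel.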
Can an API have good TTFT but poor output speed?
Yes. An endpoint can start quickly but generate the rest of the answer slowly, so both metrics matter.
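A back-of-the-envelope calculation shows why. The sketch below estimates total time as TTFT plus output tokens divided by output speed, for two made-up endpoints; the function name and all figures are hypothetical and chosen only to illustrate the trade-off.

```python
def estimated_total_seconds(ttft_s: float, tokens_per_s: float,
                            output_tokens: int) -> float:
    """Rough estimate: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s


# Endpoint A starts quickly but streams slowly.
fast_start = estimated_total_seconds(ttft_s=0.25, tokens_per_s=20,
                                     output_tokens=400)
# Endpoint B starts later but streams four times faster.
slow_start = estimated_total_seconds(ttft_s=1.0, tokens_per_s=80,
                                     output_tokens=400)

print(fast_start)  # 20.25
print(slow_start)  # 6.0
```

In this example the endpoint with the better TTFT takes more than three times as long to finish a 400-token answer.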
Should I compare TTFT across models?
Yes, but compare models of similar size on comparable providers. Larger models and reasoning models often have higher first-token latency, so a raw comparison across tiers can be misleading.
Check first-token speed before users notice it.
Measure TTFT alongside streaming total time and output speed in one quick test.