API Latency Test

OpenAI-compatible API guide

OpenAI Streaming Not Working?

If OpenAI-style streaming is not working, first confirm that normal chat completions work, then test whether stream: true returns real server-sent event chunks instead of a buffered full response.

Why Streaming Breaks

Streaming can fail even when the same endpoint answers normal chat requests. The provider may not implement server-sent events, a proxy may buffer chunks, the selected model may not support streaming through that route, or the client may be parsing the stream incorrectly.

Steps to Debug Streaming

  1. Verify the base URL. Use the API root, such as https://api.example.com/v1, not a dashboard URL.
  2. Run a normal chat request. Confirm the endpoint can return a valid non-streaming answer first.
  3. Enable stream: true. Check whether chunks arrive before the full answer is complete.
  4. Inspect the stream format. Look for event-style data: chunks and a final done marker.
  5. Measure TTFT. A high time to first token usually means buffering, slow routing, or overloaded upstreams.
  6. Compare another model or route. If one model fails but another streams, the issue may be provider routing rather than your code.

Common Streaming Symptoms

Symptom Likely cause What to test
Normal chat works, streaming fails Streaming is not implemented or not enabled for the route Run the same model with and without stream: true
All text arrives at once Proxy or CDN buffering Check TTFT and whether chunks arrive before completion
Client throws a parse error Malformed SSE chunks or non-OpenAI response shape Inspect raw chunks and response headers
Very slow first token Slow upstream, overloaded relay, or distant region Repeat the test and compare TTFT across providers

Example Result

A streaming check should show whether chunks arrive, how long the first token takes, and whether the endpoint behaves like an OpenAI-compatible API.

API relay test result showing slower latency, TTFT, and output speed signals.
Example report showing latency, TTFT, streaming total, and output speed signals.

Run the Check

Use the checker to test normal chat, streaming, TTFT, and response shape in one run. Demo mode is available if you want to see the report before entering an API key.

FAQ

Why does streaming fail when normal chat works?

Some providers support normal chat completions but do not correctly implement server-sent event streaming, or a proxy buffers the stream until the full response is ready.

What should a streaming response look like?

A working OpenAI-style stream returns event chunks, usually as data: lines, before the final done marker.

Can a CDN or gateway break streaming?

Yes. Some proxies and gateways buffer responses, which makes streaming appear to arrive only after the full answer is complete.

Is slow TTFT the same as broken streaming?

No. Slow time to first token means streaming eventually starts, while broken streaming means chunks are missing, buffered, malformed, or never delivered.