How ChatGPT types its answers one word at a time over a single HTTP connection.
A standard HTTP request expects a single, complete response. The client asks for data, the server generates the entire payload (say, 2 MB of JSON), and sends it back all at once. If generating the data takes 5 seconds, the user stares at a blank screen for 5 seconds. HTTP streaming solves this. Instead of buffering the whole response, the server sends the HTTP headers immediately and leaves out Content-Length; over HTTP/1.1 the body then travels with Transfer-Encoding: chunked. The server writes data down the open connection one piece at a time, and the browser can render each piece as soon as it arrives. Server-Sent Events are a text format layered on top of such a streamed response.
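To make the chunked mechanism concrete, here is a minimal sketch of how HTTP/1.1 frames each piece of a chunked body: the size in hexadecimal, a CRLF, the bytes, and another CRLF, with a zero-size chunk marking the end. The names encodeChunk and LAST_CHUNK are made up for this illustration. Node's http module applies this framing for you on HTTP/1.1 responses that have no Content-Length, so you would not normally write it by hand.

```javascript
// Illustrative only: HTTP/1.1 chunked framing, normally handled by the server library.
const encoder = new TextEncoder();

// Frame one piece of the body as: <size in hex>\r\n<data>\r\n
function encodeChunk(text) {
  // The size counts bytes, not characters, so encode to UTF-8 first.
  const byteLength = encoder.encode(text).length;
  return `${byteLength.toString(16)}\r\n${text}\r\n`;
}

// A zero-length chunk tells the client that the body is complete.
const LAST_CHUNK = '0\r\n\r\n';

// Two chunks followed by the terminator, as they would appear on the wire.
const body = encodeChunk('Hello') + encodeChunk(' World') + LAST_CHUNK;
console.log(JSON.stringify(body));
```

Because every chunk announces its own length, the client never needs to know the total size up front, which is what lets the server start sending before it has finished generating.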
To implement this elegantly in a browser, we use Server-Sent Events (SSE). The browser opens an EventSource connection. The server holds the request open and writes specially formatted text lines (data: hello\n\n), where the blank line marks the end of each message. The browser triggers a JavaScript callback every time a message arrives, without closing the connection.
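// Client-side (browser)
const source = new EventSource('/api/chat-stream');
const chat = document.getElementById('chat');

source.onmessage = (event) => {
  // Fires once per SSE message, i.e. every time a new word arrives.
  // textContent (not innerHTML) keeps streamed text from being parsed as HTML.
  chat.textContent += event.data;
};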
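// Server-side (Node.js/Express)
const express = require('express');
const app = express();

app.get('/api/chat-stream', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  // Write chunks over time; do NOT call res.end() while the stream is live.
  res.write('data: Hello\n\n');
  const timer = setTimeout(() => res.write('data: World\n\n'), 1000);
  // Cancel the pending write if the client disconnects first.
  res.on('close', () => clearTimeout(timer));
});
app.listen(3000); // the port is an arbitrary choice for this example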
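The data: lines written by the server follow a small text format, and writing them by hand gets error-prone once a payload contains newlines. As a rough sketch, a helper along these lines could serialize one message. The name formatSseMessage and its options object are hypothetical, not part of Express or any SSE library.

```javascript
// Hypothetical helper: serialize one Server-Sent Events message.
function formatSseMessage(data, { event, id } = {}) {
  const lines = [];
  if (event) lines.push(`event: ${event}`); // optional named event type
  if (id) lines.push(`id: ${id}`);          // optional id, sent back as Last-Event-ID on reconnect
  // Each line of the payload needs its own "data:" field;
  // the browser joins them back together with "\n".
  for (const line of String(data).split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the message.
  return lines.join('\n') + '\n\n';
}

// Usage inside the Express handler: res.write(formatSseMessage('Hello'));
console.log(JSON.stringify(formatSseMessage('line one\nline two')));
```

Note that a message carrying an event: field is delivered to listeners registered with addEventListener for that name rather than to onmessage.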
Streaming is easy to break with a proxy. If you put an Nginx load balancer or an AWS API Gateway in front of your streaming server, it will often try to be "helpful" by buffering the response in its own memory before forwarding it to the client, which destroys the streaming effect. You have to explicitly configure every intermediary to disable buffering; in Nginx, for example, that means setting proxy_buffering off for the streaming location.
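Part of this can be done from the application itself. The sketch below assumes the proxy in front of the server is Nginx, which honors an X-Accel-Buffering: no response header by skipping its buffer for that one response. Other intermediaries, AWS API Gateway included, need their own settings. The function name sseHeaders is made up for this example.

```javascript
// Sketch: response headers commonly sent with an SSE stream to discourage buffering.
function sseHeaders() {
  return {
    'Content-Type': 'text/event-stream',
    // Ask caches and proxies not to store or rewrite the stream.
    'Cache-Control': 'no-cache, no-transform',
    // Nginx-specific: turns off proxy buffering for this response only.
    'X-Accel-Buffering': 'no',
  };
}

// Usage inside an Express handler (res is the response object):
//   res.set(sseHeaders());
//   res.flushHeaders(); // send the headers right away, before the first event
console.log(sseHeaders());
```

Headers like these are only a request to the intermediary, so it is still worth testing the full path with a slow stream to confirm that events arrive one at a time.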
You also cannot simply stream a single JSON document such as {"story": "Once upon..."}, because JSON is invalid until the final closing brace } arrives, so the client has nothing it can parse until the very end. You either have to use Server-Sent Events, or a format like NDJSON (Newline Delimited JSON), where each chunk is a complete, separate JSON object terminated by a newline.
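On the receiving side, NDJSON has one catch: network chunks do not line up with line boundaries, so a JSON object may arrive split in two. A minimal sketch of an incremental parser that buffers the trailing partial line could look like this. The name createNdjsonParser and the token field in the sample data are assumptions made for the example.

```javascript
// Sketch: incremental NDJSON parser. Feed it text chunks as they arrive;
// it calls onObject once for every complete line.
function createNdjsonParser(onObject) {
  let buffer = ''; // holds a trailing partial line between chunks
  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // the last element is incomplete (or an empty string)
      for (const line of lines) {
        if (line.trim() !== '') onObject(JSON.parse(line));
      }
    },
    end() {
      // Flush a final line that was not newline-terminated.
      if (buffer.trim() !== '') onObject(JSON.parse(buffer));
      buffer = '';
    },
  };
}

const received = [];
const parser = createNdjsonParser((obj) => received.push(obj));
parser.push('{"token":"Once"}\n{"tok'); // the second object is split across chunks
parser.push('en":" upon"}\n');
parser.end();
console.log(received);
```

In a browser, the chunks would typically come from a fetch response body decoded with TextDecoder; the parser itself does not care where the text comes from.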