Every camera sends one stream up; a smart router fans each stream out to everyone else — no re-encoding, just forwarding.
In a group video call, a mesh topology makes every peer upload a separate copy of its camera stream to each of the other peers. With N people that is N−1 uploads per client, and uplink, the scarcest resource on a home connection, runs out fast. An MCU (multipoint control unit) fixes uplink by decoding every stream, mixing them into one composite, and re-encoding it server-side, but that transcoding is extremely CPU-heavy.
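To see how quickly mesh uplink grows, here is a minimal sketch comparing per-client upload in a mesh with a server hub (MCU or SFU). The 1.5 Mbps per-stream bitrate is an assumed, illustrative figure, not a measurement, and the function names are hypothetical.

```python
# Per-client upload: mesh vs. a server hub (MCU or SFU).
# STREAM_MBPS is an assumed, illustrative per-stream bitrate.
STREAM_MBPS = 1.5


def mesh_uplink_mbps(n: int, stream_mbps: float = STREAM_MBPS) -> float:
    """In a mesh, each client sends its stream to every other peer: N-1 copies."""
    return (n - 1) * stream_mbps


def hub_uplink_mbps(n: int, stream_mbps: float = STREAM_MBPS) -> float:
    """With an MCU or SFU, each client uploads exactly one copy, whatever N is."""
    return stream_mbps


for n in (2, 4, 8, 16):
    print(f"N={n:2d}  mesh={mesh_uplink_mbps(n):5.1f} Mbps  hub={hub_uplink_mbps(n):.1f} Mbps")
```

Mesh upload grows linearly with the room size, while the hub figure stays flat.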
An SFU (selective forwarding unit) sits in the middle. Each peer uploads exactly one stream to the SFU. The SFU then selectively forwards each incoming stream to the other subscribers — without decoding or re-encoding it. So uplink stays at one copy per client, and the server just relays packets: its cost is bandwidth, not transcode CPU. Simulcast lets each sender publish a few quality layers so the SFU can forward a lower-quality layer to receivers on weak connections.
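Simulcast only helps if the SFU has a rule for choosing a layer per receiver. The sketch below is one way such a rule could work. It assumes three hypothetical layers with illustrative bitrates and a receiver bandwidth estimate already expressed in Mbps. It picks the highest layer that fits, otherwise it falls back to the lowest.

```python
# Simulcast layer selection sketch. Layer names and bitrates are
# hypothetical examples; real senders negotiate their own layers.
LAYERS = {"low": 0.15, "mid": 0.5, "high": 1.5}  # layer id -> Mbps


def pick_layer(bwe_mbps: float, layers: dict[str, float]) -> str:
    """Return the highest-bitrate layer that fits the receiver's estimate.

    Falls back to the lowest layer when nothing fits, so a weak receiver
    still gets some video rather than none.
    """
    fitting = [name for name, rate in layers.items() if rate <= bwe_mbps]
    if fitting:
        return max(fitting, key=layers.get)
    return min(layers, key=layers.get)


print(pick_layer(2.0, LAYERS))  # high
print(pick_layer(0.8, LAYERS))  # mid
print(pick_layer(0.1, LAYERS))  # low
```

Production SFUs typically also leave headroom below the estimate and avoid flapping between layers. This sketch omits both.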
The SFU keeps, per receiver, the set of streams that receiver is subscribed to. When a packet arrives from a sender, it forwards a copy to each other subscriber — picking the right simulcast layer for each one based on that receiver's estimated bandwidth. It never decodes the media: it routes RTP packets, so its work is bookkeeping plus egress, not video processing.
```python
# SFU forward loop: relay packets, never transcode.
# pick_layer() and forward() are helpers assumed to be defined elsewhere.
def on_packet(sender, packet, receivers):
    # packet.layer is the simulcast layer id it belongs to (e.g. low / mid / high)
    for r in receivers:
        if r is sender:
            continue  # never echo a sender to itself
        if sender not in r.subscribed:
            continue  # only forward what r asked for
        # choose, per receiver, among the layers this sender publishes
        layer = pick_layer(r.bwe, sender.layers)
        if packet.layer == layer:
            forward(r, packet)  # copy bytes out, no decode/re-encode

# Cost is one upstream copy per sender plus up to N*(N-1) forwarded
# streams of egress. CPU stays near idle because nothing is mixed.
```
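For experimenting with the fan-out, here is a self-contained toy version of the forward loop. The `Peer` and `Packet` classes are hypothetical stand-ins. Each peer's receive layer is preset rather than derived from a bandwidth estimate. "Forwarding" just appends the same packet object to an inbox, which mirrors the point that the SFU never touches the payload.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality/hash, so peers can live in sets
class Peer:
    name: str
    layer: str = "high"  # simulcast layer this peer receives (preset here)
    subscribed: set = field(default_factory=set)  # peers whose streams it wants
    inbox: list = field(default_factory=list)  # packets forwarded to this peer


@dataclass
class Packet:
    seq: int
    layer: str  # simulcast layer id this packet belongs to


def on_packet(sender: Peer, packet: Packet, peers: list[Peer]) -> int:
    """Forward one packet to every interested receiver; return copies sent."""
    sent = 0
    for r in peers:
        if r is sender or sender not in r.subscribed:
            continue  # no echo, and only streams r subscribed to
        if packet.layer == r.layer:
            r.inbox.append(packet)  # relay the same object: nothing is decoded
            sent += 1
    return sent


peers = [Peer(n) for n in "ABCD"]
for p in peers:
    p.subscribed = {q for q in peers if q is not p}  # everyone watches everyone
peers[3].layer = "low"  # D is on a weak link

total = 0
for sender in peers:
    for layer in ("low", "high"):  # each sender publishes two simulcast layers
        total += on_packet(sender, Packet(seq=1, layer=layer), peers)
print(total)  # 12
```

Each sender's packets reach exactly the 3 other peers (one layer each), so 4 × 3 = 12 copies go out, which is the N×(N−1) egress figure.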
| Property | Mesh | MCU | SFU |
|---|---|---|---|
| Client uplink | N−1 streams | 1 stream | 1 stream |
| Server CPU | none (no server) | heavy — decode + mix + re-encode | light — packet forwarding only |
| Server bandwidth | none | N in, N out (one composite per client) | N in, up to N×(N−1) out |
| Scalability | poor — uplink dies past ~4 | CPU-bound per call | bandwidth-bound; scales widest |
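The stream counts in the table can be written as a small function of room size N. This is a sketch under the table's own worst case (every receiver subscribes to every other sender). The function name and dictionary keys are hypothetical.

```python
def stream_counts(n: int) -> dict[str, dict[str, int]]:
    """Streams per call for each topology with n participants.

    'client_up' is uploads per client; 'server_in' / 'server_out' are
    streams at the server (0 for mesh, which has no server).
    """
    return {
        "mesh": {"client_up": n - 1, "server_in": 0, "server_out": 0},
        # MCU: the composite still has to be sent to each of the n clients
        "mcu": {"client_up": 1, "server_in": n, "server_out": n},
        # SFU worst case: each sender forwarded to all n-1 other receivers
        "sfu": {"client_up": 1, "server_in": n, "server_out": n * (n - 1)},
    }


for topology, counts in stream_counts(4).items():
    print(topology, counts)
```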
Egress can reach N×(N−1) streams, so outbound bandwidth, not CPU, is what caps a call. Plan capacity around egress.

Clients that cannot connect directly (for example, behind restrictive NATs or firewalls) fall back to a TURN relay, which adds latency and relay bandwidth cost.

Use NACK retransmits or forward error correction (FEC) so a lost packet doesn't freeze a decoder.

Worked example: a 4-person call. In mesh, each client uploads to the other 3, which is 3 upstream copies per person, and a home uplink usually can't carry that. With an SFU, each client uploads 1 stream and downloads the other 3. The SFU forwards every sender's stream to the 3 other people, so it relays 4 × 3 = 12 streams in total. That is real bandwidth on the server, but almost no CPU, because nothing is decoded or mixed, only forwarded.
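NACK depends on first noticing which packets are missing. RTP sequence numbers are 16-bit and wrap around, so a gap check has to work modulo 2^16. A minimal sketch follows, assuming the receiver remembers the last in-order sequence number it saw. The function name is hypothetical. Building the actual RTCP feedback message (the Generic NACK defined in RFC 4585) is out of scope here.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit and wrap around


def missing_seqs(last_seq: int, new_seq: int) -> list[int]:
    """Sequence numbers skipped between last_seq and new_seq (mod 2**16).

    These are the candidates a receiver could list in a NACK. A jump of
    half the sequence space or more is treated as an old or reordered
    packet, and nothing is reported.
    """
    gap = (new_seq - last_seq) % SEQ_MOD
    if gap == 0 or gap >= SEQ_MOD // 2:
        return []  # duplicate, or an old/reordered packet
    return [(last_seq + i) % SEQ_MOD for i in range(1, gap)]


print(missing_seqs(10, 14))    # [11, 12, 13]
print(missing_seqs(65534, 1))  # [65535, 0]
```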
1. In an SFU call, how many streams does each client upload, regardless of room size?
2. What is the SFU's main cost compared with an MCU?