Video conferencing SFU networking

Every camera sends one stream up; a smart router fans each stream out to everyone else — no re-encoding, just forwarding.

The idea

In a group video call, a mesh topology makes every peer upload a separate copy of its camera stream to each of the other peers. With N people that is N−1 uploads per client, so uplink, the scarce resource on a home network, runs out quickly as the room grows. An MCU (multipoint control unit) fixes the uplink problem by decoding every stream, mixing them into one composite, and re-encoding it server-side, but that transcoding is very CPU-heavy.

An SFU (selective forwarding unit) sits in the middle. Each peer uploads exactly one stream to the SFU, no matter how many people are in the room. The SFU then selectively forwards each incoming stream to the other subscribers without decoding or re-encoding it. So uplink stays at one copy per client, and the server just relays packets: its cost is bandwidth, not transcode CPU. Simulcast lets each sender publish a few quality layers of its camera (a modest uplink increase over a single encoding) so the SFU can forward a lower-quality layer to receivers on weak connections.


How it works

The SFU keeps, per receiver, the set of streams that receiver is subscribed to. When a packet arrives from a sender, it forwards a copy to each other subscriber — picking the right simulcast layer for each one based on that receiver's estimated bandwidth. It never decodes the media: it routes RTP packets, so its work is bookkeeping plus egress, not video processing.

# SFU forward loop: relay packets, never transcode
# (pick_layer and forward are helpers defined elsewhere)
def on_packet(sender, packet):
    # packet.layer is the simulcast layer this packet belongs to (e.g. low / mid / high)
    for r in receivers:
        if r is sender:
            continue                  # never echo a sender to itself
        if sender not in r.subscribed:
            continue                  # only forward what r asked for
        # choose among the layers this sender publishes, per receiver
        layer = pick_layer(r.bwe, sender.layers)
        if packet.layer == layer:
            forward(r, packet)        # copy bytes out, no decode/re-encode

# Cost is one upstream copy per sender + up to N*(N-1) forwarded
# streams of egress. CPU stays near idle because nothing is mixed.
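
The forward loop leaves pick_layer undefined. One plausible way to write it, as a minimal sketch rather than how any particular SFU does it: pick the highest-bitrate layer that fits inside the receiver's bandwidth estimate, with some headroom. The layer names, the bitrates in LAYER_KBPS, and the headroom factor are all illustrative assumptions, not values from this page.

```python
from typing import Dict

# Hypothetical simulcast layer bitrates in kbit/s (assumed for illustration;
# real values depend on the encoder configuration).
LAYER_KBPS: Dict[str, int] = {"low": 150, "mid": 500, "high": 1500}


def pick_layer(bwe_kbps: float,
               layers: Dict[str, int] = LAYER_KBPS,
               headroom: float = 0.8) -> str:
    """Return the highest-bitrate layer that fits in headroom * bwe_kbps.

    Falls back to the lowest-bitrate layer when nothing fits, so the
    receiver still gets some video instead of none.
    """
    budget = bwe_kbps * headroom
    fitting = [name for name, kbps in layers.items() if kbps <= budget]
    if fitting:
        return max(fitting, key=lambda name: layers[name])
    return min(layers, key=lambda name: layers[name])
```

With these assumed numbers, a receiver estimated at 1000 kbit/s has a budget of 800 kbit/s and gets the mid layer, while one estimated at 100 kbit/s falls back to low. The headroom factor is a design choice: it keeps the forwarded layer from consuming the receiver's whole estimated link.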

Cost & trade-offs

Property | Mesh | MCU | SFU
Client uplink | N−1 streams | 1 stream | 1 stream
Server CPU | none (no server) | heavy: decode + mix + re-encode | light: packet forwarding only
Server bandwidth | none | N in, 1 mix out | up to N×(N−1) forwarded streams
Scalability | poor: uplink dies past ~4 | CPU-bound per call | bandwidth-bound; scales widest

Watch out for

Worked example

A 4-person call. In mesh, each client uploads to the other 3 — that is 3 upstream copies per person, and home uplink usually can't carry that. With an SFU, each client uploads 1 stream and downloads the other 3. The SFU forwards every sender's stream to the 3 other people, so it relays 4 × 3 = 12 streams total. That is real bandwidth on the server — but almost no CPU, because nothing is decoded or mixed, only forwarded.
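
The arithmetic above generalizes to any room size. The sketch below counts streams for mesh and SFU, assuming every participant sends one camera stream and subscribes to everyone else; the function name and dictionary keys are made up for illustration.

```python
def stream_counts(n: int) -> dict:
    """Stream counts for an n-person call where everyone sends one
    camera stream and subscribes to all other participants."""
    if n < 2:
        raise ValueError("a call needs at least 2 participants")
    return {
        # Mesh: each client sends a copy to every other peer; no server.
        "mesh": {"client_up": n - 1, "client_down": n - 1, "server_out": 0},
        # SFU: one upload per client; the server forwards each stream
        # to the n - 1 other participants.
        "sfu": {"client_up": 1, "client_down": n - 1, "server_out": n * (n - 1)},
    }


counts = stream_counts(4)
print(counts["mesh"]["client_up"])   # 3 uploads per client in mesh
print(counts["sfu"]["server_out"])   # 12 streams relayed by the SFU
```

Client downlink is the same in both topologies; what changes is that the N−1 uploads per client move off the home uplink and become egress on the server.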

Check yourself

1. In an SFU call, how many streams does each client upload, regardless of room size?

2. What is the SFU's main cost compared with an MCU?
