RTP media relay

A relay forwards voice and video packets between callers in real time. Latency matters more than completeness: a packet that is lost or arrives too late is skipped rather than waited for.

The idea

On a live call, audio and video are chopped into tiny RTP packets and sent over the network. A relay in the middle forwards them between the two callers — useful when the callers can't reach each other directly, or when one stream needs to fan out to many viewers.

The network is messy: packets arrive out of order, late, or not at all. The receiver keeps a small jitter buffer that briefly holds packets so it can re-sort them by sequence number before playing. But it never waits forever — for a live call, bounded latency beats perfect delivery. A late or lost packet is concealed and the audio moves on.

How it works

Every RTP packet carries a small header: a sequence number (increments by one per packet, so the receiver can detect gaps and reorder), a timestamp (the media sampling instant, so playout is paced correctly), and an SSRC (which stream this is). The relay typically forwards packets without re-encoding — a TURN relay copies bytes verbatim; an SFU selectively forwards each sender's stream to subscribers. Cheap and low-latency, because there's no transcoding.
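To make the header fields concrete, here is a minimal sketch that reads the sequence number, timestamp, and SSRC out of the 12-byte fixed RTP header (the RFC 3550 layout). The names `RtpHeader` and `parse_rtp_header` are illustrative, not from any library, and the example packet is made up. The sketch ignores the CSRC list, header extensions, and padding.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    seq: int        # 16-bit sequence number
    timestamp: int  # 32-bit media sampling instant
    ssrc: int       # 32-bit stream identifier


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header; the rest of the packet is ignored."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    # "!" = network byte order, B B = two flag bytes, H = 16-bit sequence,
    # I I = 32-bit timestamp and 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        seq=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )


# Hypothetical packet: version 2, payload type 111, sequence number 4660.
example = struct.pack("!BBHII", 0x80, 111, 4660, 160_000, 0xDEADBEEF) + b"payload"
header = parse_rtp_header(example)
print(header.seq, header.timestamp, hex(header.ssrc))
```

A relay that forwards without re-encoding only needs to look at fields like these to route a packet; the payload bytes can pass through untouched.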

The receiver runs a jitter buffer: incoming packets are inserted by sequence number and held just long enough to absorb network variance. A fixed playout clock pops the next expected sequence on each tick. If that sequence hasn't arrived by its deadline, the buffer conceals it (interpolates or skips) and advances — it never blocks the call waiting for one packet. Loss and timing are reported back over RTCP so the sender can adapt its bitrate.

on_packet(p):
    if p.seq is older than next_seq:   # already played or concealed
        return                         # too late to use: drop it
    buffer.insert(p.seq, p)            # reorder by sequence number

on_tick(now):                          # fixed playout clock
    want = next_seq
    if want in buffer:
        play(buffer.pop(want)); next_seq += 1
    elif now > deadline(want):
        conceal(want); next_seq += 1   # don't wait past the deadline
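The pseudocode can be turned into a small runnable sketch. The version below is one possible reading, not a reference implementation: the class name `JitterBuffer`, the 20 ms packet interval, and the 60 ms playout delay are assumptions chosen for illustration. It ignores sequence wraparound, and it discards packets whose slot has already been played or concealed so they do not pile up in the buffer.

```python
from typing import Dict, Optional, Tuple


class JitterBuffer:
    """Reorders packets by sequence number and never waits past a deadline."""

    def __init__(self, first_seq: int, start: float,
                 packet_interval: float = 0.020,   # assumed 20 ms per packet
                 playout_delay: float = 0.060):    # assumed 60 ms of buffering
        self.buffer: Dict[int, bytes] = {}
        self.first_seq = first_seq
        self.next_seq = first_seq
        self.start = start
        self.packet_interval = packet_interval
        self.playout_delay = playout_delay

    def deadline(self, seq: int) -> float:
        # Latest time at which `seq` can still be played on schedule.
        offset = (seq - self.first_seq) * self.packet_interval
        return self.start + self.playout_delay + offset

    def on_packet(self, seq: int, payload: bytes) -> None:
        if seq < self.next_seq:
            return  # slot already played or concealed: too late to use
        self.buffer[seq] = payload

    def on_tick(self, now: float) -> Optional[Tuple[str, int]]:
        want = self.next_seq
        if want in self.buffer:
            self.buffer.pop(want)
            self.next_seq += 1
            return ("play", want)
        if now > self.deadline(want):
            self.next_seq += 1
            return ("conceal", want)  # don't wait past the deadline
        return None  # still within the deadline: wait for the next tick


jb = JitterBuffer(first_seq=1, start=0.0)
jb.on_packet(2, b"b")        # packet 2 overtakes packet 1
print(jb.on_tick(0.02))      # None: 1 is missing but not yet overdue
jb.on_packet(1, b"a")
print(jb.on_tick(0.04))      # ('play', 1)
print(jb.on_tick(0.06))      # ('play', 2)
print(jb.on_tick(0.12))      # ('conceal', 3): 3 never arrived
```

Raising `playout_delay` gives late packets more time to arrive at the cost of added latency, which is the trade-off the buffer depth controls.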

Trade-offs

Choice	Cost	Note
Deeper jitter buffer	+ latency	Tolerates more reorder & loss; too deep and the call feels laggy
Shallow jitter buffer	+ loss / glitches	Low latency, but late packets miss their deadline and get concealed
Relay (TURN / SFU)	+ latency, + bandwidth $	Works behind almost any NAT; one stream can fan out to many subscribers
Direct P2P	NAT traversal can fail	Lowest latency when it connects; no server media cost
UDP / RTP transport	Must handle loss yourself	No head-of-line blocking — a lost packet never stalls the rest
TCP transport	Head-of-line blocking	Retransmits stall everything behind the loss — wrong for live media

Watch out for

Using TCP for live media. TCP guarantees in-order delivery by retransmitting and blocking everything behind a loss (head-of-line blocking). For a live call you'd rather skip one frame than freeze a second of audio. Use UDP/RTP and conceal loss.
An unbounded jitter buffer. Growing the buffer to "never lose a packet" just trades loss for latency. Once total one-way delay passes roughly 150–200 ms, people start talking over each other and conversation feels broken. Bound the buffer and conceal what's late.
Ignoring sequence wraparound. The RTP sequence number is 16-bit and wraps at 65535 → 0. With plain <, packet 0 sorts before packet 65535 even though it was sent after it. Compare sequences with modular ("serial number") arithmetic instead, or reordering breaks at the wrap.
No RTCP loss feedback. Without receiver reports the sender can't see loss or congestion and won't lower its bitrate — quality collapses instead of degrading gracefully.
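Relay bandwidth at scale. A relay carries every byte of every stream. In an N-party call an SFU sends each of the N streams to N−1 subscribers, so it sends N(N−1) outbound streams in total and its cost grows quadratically: 4 parties means 12 outbound streams, 10 parties means 90. Plan simulcast and layer selection early.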
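For the sequence wraparound item above, a wrap-safe comparison can be sketched in a few lines. The helper names `seq_delta` and `seq_newer` are illustrative. The approach assumes two packets being compared are never more than half the sequence space (32768) apart, which holds for any realistic jitter buffer.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit


def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a modulo 2**16, in the range [-32768, 32767]."""
    return ((a - b + 0x8000) % SEQ_MOD) - 0x8000


def seq_newer(a: int, b: int) -> bool:
    """True if a comes after b, allowing for wraparound."""
    return seq_delta(a, b) > 0


print(seq_newer(0, 65535))   # True: 0 follows 65535 across the wrap
print(seq_newer(65535, 0))   # False
print(seq_delta(2, 65534))   # 4: two packets before the wrap, two after
```

A jitter buffer that sorts or checks for staleness with `seq_delta` instead of `<` keeps working when the counter rolls over mid-call.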

Worked example

Sender A emits packets 1..7, each forwarded by the relay. The network reorders packet 4 so it arrives after 5, and drops packet 6 entirely. Watch what the receiver's jitter buffer does:

arrive: 1 2 3 5 4 7        (6 never shows up)
buffer: holds out-of-order packets, sorted by seq
playout (fixed clock, next_seq advancing):
  1 -> play   2 -> play   3 -> play
  4 -> in buffer (it arrived late) -> play   5 -> play
  6 -> not here, deadline passed -> conceal, skip
  7 -> play
result: 1 2 3 4 5 (6 concealed) 7  with bounded delay

Packet 4 was late but still beat its deadline, so the buffer reordered it back ahead of 5 and played it in order. Packet 6 missed its deadline, so instead of stalling the whole call, the receiver concealed it and moved straight on to 7. Smooth audio, one tiny gap, no freeze.
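The same scenario can be replayed in a few lines to see how buffer depth changes the outcome. This is a simplified model, not a full jitter buffer: time is counted in whole ticks, one packet arrives per tick in the order shown above, and packet `seq` is due for playout at tick `(seq - 1) + delay_ticks`. The arrival ticks and the function name `replay` are assumptions made for illustration.

```python
from typing import Dict, List


def replay(arrivals: Dict[int, int], last_seq: int, delay_ticks: int) -> List[str]:
    """Decide play vs. conceal for each packet.

    `arrivals` maps sequence number -> arrival tick; lost packets are absent.
    """
    out = []
    for seq in range(1, last_seq + 1):
        due = (seq - 1) + delay_ticks       # playout slot for this packet
        arrived = arrivals.get(seq)
        if arrived is not None and arrived <= due:
            out.append(f"{seq}:play")
        else:
            out.append(f"{seq}:conceal")    # lost, or missed its deadline
    return out


# Arrival order 1 2 3 5 4 7, one packet per tick; packet 6 is lost.
arrivals = {1: 0, 2: 1, 3: 2, 5: 3, 4: 4, 7: 5}

print(replay(arrivals, last_seq=7, delay_ticks=2))
# ['1:play', '2:play', '3:play', '4:play', '5:play', '6:conceal', '7:play']

print(replay(arrivals, last_seq=7, delay_ticks=0))
# ['1:play', '2:play', '3:play', '4:conceal', '5:play', '6:conceal', '7:play']
```

With two ticks of buffering, packet 4 still makes its slot and only the lost packet 6 is concealed. With no buffering, packet 4 arrives one tick after it was due and is concealed as well.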

Check yourself

The receiver is still missing packet 6 and its playout deadline just passed. What should the jitter buffer do?

Why is TCP a poor transport for a live voice call?