SIP session routing

Before any audio flows, a chain of SIP messages negotiates the call path between caller, proxies, and callee.

The idea

When you place a voice or video call over the internet, the audio doesn't just appear. First, the two ends have to find each other and agree to talk. SIP (Session Initiation Protocol) is the signalling layer that handles this introduction.

The caller's phone (the UAC, user agent client) sends an INVITE. It travels through one or more SIP proxies. The proxy responsible for the callee's domain looks up where the callee (the UAS, user agent server) is currently reachable, using the binding the callee's phone stored earlier with a registrar, and forwards the INVITE there. The callee rings (180 Ringing), answers (200 OK), and the caller confirms (ACK). Only then does the actual media stream begin, and that media (RTP) usually flows directly between the two endpoints, bypassing the proxy entirely. SIP routes the signalling, not the media.
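The lookup step can be sketched as a dictionary from an address-of-record (AOR) to the contact URIs it registered from. This is a minimal illustration, not a real registrar: `LocationService` and `next_hops` are hypothetical names, the REGISTER message itself is not modelled, and binding expiry is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LocationService:
    """Hypothetical in-memory store of REGISTER bindings: AOR -> contact URIs."""

    bindings: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, aor: str, contact: str) -> None:
        """Record that the address-of-record `aor` is reachable at `contact`."""
        contacts = self.bindings.setdefault(aor, [])
        if contact not in contacts:  # registering the same contact twice keeps one binding
            contacts.append(contact)

    def lookup(self, aor: str) -> List[str]:
        """Current contacts for `aor`; an empty list means it is not registered."""
        return list(self.bindings.get(aor, []))


def next_hops(location: LocationService, request_uri: str) -> List[str]:
    """Where a proxy would forward an INVITE for `request_uri`.

    An empty result means the proxy would reject the INVITE with an error
    response instead of forwarding it; several contacts mean it could fork.
    """
    return location.lookup(request_uri)


if __name__ == "__main__":
    loc = LocationService()
    # Bob's desk phone registered earlier.
    loc.register("sip:bob@example.com", "sip:bob@pc.bob.example.com")
    print(next_hops(loc, "sip:bob@example.com"))    # ['sip:bob@pc.bob.example.com']
    print(next_hops(loc, "sip:carol@example.com"))  # []
```

Keeping a list per AOR, rather than a single contact, reflects that one user can be registered from several devices at once.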

How it works

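SIP is a text protocol that looks a lot like HTTP. A request has a request line (method, Request-URI, and protocol version), a stack of headers, and an optional body; a response starts with a status line instead. The Via headers record the path the request took so the response can retrace it; From and To identify the parties, Call-ID ties every message to the same call, and CSeq numbers the requests within it. Here is a trimmed exchange for Alice calling Bob: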

INVITE sip:bob@example.com SIP/2.0
Via: SIP/2.0/UDP pc.alice.example.com;branch=z9hG4bK77
Max-Forwards: 70
From: Alice <sip:alice@example.com>;tag=1928
To: Bob <sip:bob@example.com>
Call-ID: a84b4c76e66710
CSeq: 1 INVITE
Contact: <sip:alice@pc.alice.example.com>
Content-Type: application/sdp
   ... SDP body offers codecs & the caller's media address ...

SIP/2.0 100 Trying          (proxy: I'm working on it)
SIP/2.0 180 Ringing         (Bob's phone is ringing)
SIP/2.0 200 OK              (Bob picked up; SDP answer attached)

ACK sip:bob@pc.bob.example.com SIP/2.0
   ... three-way handshake done; media (RTP) now flows peer-to-peer ...
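Because SIP is plain text, a request like the INVITE above can be split into its parts with a few string operations. The sketch below is deliberately simplified and its names (`SipRequest`, `parse_request`) are hypothetical: it ignores compact header names, comma-joined header values, folded header lines, and Content-Length.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SipRequest:
    method: str
    request_uri: str
    version: str
    headers: List[Tuple[str, str]]  # in message order; Via may appear many times
    body: str

    def header(self, name: str) -> List[str]:
        """All values of one header, matched case-insensitively, in order."""
        return [value for key, value in self.headers if key.lower() == name.lower()]


def parse_request(raw: str) -> SipRequest:
    # A blank line separates the headers from the body (e.g. an SDP offer).
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    lines = head.split("\n")
    # Request line: METHOD Request-URI SIP-Version, as in HTTP.
    method, request_uri, version = lines[0].split(" ", 2)
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split at the first colon only
        headers.append((name.strip(), value.strip()))
    return SipRequest(method, request_uri, version, headers, body)


if __name__ == "__main__":
    invite = (
        "INVITE sip:bob@example.com SIP/2.0\r\n"
        "Via: SIP/2.0/UDP pc.alice.example.com;branch=z9hG4bK77\r\n"
        "Call-ID: a84b4c76e66710\r\n"
        "CSeq: 1 INVITE\r\n"
        "\r\n"
        "v=0\r\n"
    )
    req = parse_request(invite)
    print(req.method, req.header("call-id"))  # INVITE ['a84b4c76e66710']
```

Headers are kept as an ordered list of pairs rather than a dict because order matters for repeated headers such as Via.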

The INVITE / 200 OK / ACK trio is the three-way handshake that confirms both sides are ready. By default, later in-dialog requests (like BYE) go straight to the other party's Contact address; Record-Route lets a proxy insert itself into the path so that those requests still flow through it.
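The two header stacks can be modelled as lists that each proxy pushes onto as it forwards the INVITE. In this sketch, Via and Record-Route values are reduced to bare host names (real values carry transport, branch, and `;lr` parameters), the proxies `p1.example.com` and `p2.example.com` are hypothetical, and the function names are illustrative only.

```python
from typing import List, Set, Tuple


def proxy_forward(via: List[str], record_route: List[str], proxy: str,
                  stay_in_path: bool) -> Tuple[List[str], List[str]]:
    """Header stacks after `proxy` forwards a request.

    The proxy pushes its own Via on top so the response comes back through it;
    if it wants later in-dialog requests as well, it also pushes a Record-Route.
    """
    new_via = [proxy] + via
    new_rr = ([proxy] + record_route) if stay_in_path else list(record_route)
    return new_via, new_rr


def invite_path(uac: str, proxies: List[str],
                record_routing: Set[str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (response path, caller's route set, callee's route set)."""
    via: List[str] = [uac]
    rr: List[str] = []
    for proxy in proxies:
        via, rr = proxy_forward(via, rr, proxy, proxy in record_routing)
    # The callee answers to the top Via; each proxy removes its own entry and
    # passes the response to the next one, so the response walks the stack.
    response_path = list(via)
    # Route sets for in-dialog requests such as BYE: the callee keeps the
    # Record-Route order, the caller reverses it (RFC 3261, section 12.1).
    callee_routes = list(rr)
    caller_routes = list(reversed(rr))
    return response_path, caller_routes, callee_routes


if __name__ == "__main__":
    path, caller_rs, callee_rs = invite_path(
        "pc.alice.example.com", ["p1.example.com", "p2.example.com"], {"p1.example.com"})
    print(path)       # ['p2.example.com', 'p1.example.com', 'pc.alice.example.com']
    print(caller_rs)  # ['p1.example.com']
```

In the example only `p1.example.com` record-routes, so the response retraces both proxies but a later BYE from Alice passes through `p1.example.com` alone.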

Trade-offs

| Aspect | Cost | Signal to watch |
|---|---|---|
| Signalling vs media separation | Two planes to operate: media goes peer-to-peer while SIP stays on the proxies. | Signalling succeeds but the call is silent because the media path failed independently. |
| NAT traversal | SIP and RTP both struggle through NAT; you need STUN, TURN, or ICE. | One-way audio, or audio only on the same LAN. |
| Statefulness | Stateful proxies track transactions (memory); stateless ones are cheaper but blind. | Lost retransmits or duplicate dialogs when state assumptions break. |
| Latency | Extra round trips through proxies before the callee even rings. | Slow post-dial delay; users hear silence before ringback. |
| Interop | Header quirks and optional features vary across vendors. | Calls work to some destinations but fail to others on the same setup. |
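What "tracking transactions" buys a stateful proxy can be sketched as a table keyed on the top Via's branch and sent-by plus the method, loosely following the RFC 3261 matching idea. This is a simplified, hypothetical `ServerTransactions` class: it omits the rule that an ACK for a non-2xx response matches its INVITE transaction, and entries never time out.

```python
from typing import Dict, Optional, Tuple

TxnKey = Tuple[str, str, str]  # (top Via branch, top Via sent-by, method)


class ServerTransactions:
    """Simplified table a stateful proxy could use to spot retransmissions."""

    def __init__(self) -> None:
        # Latest response sent upstream per transaction (None = nothing yet).
        self._last_response: Dict[TxnKey, Optional[str]] = {}

    def on_request(self, branch: str, sent_by: str, method: str) -> str:
        key = (branch, sent_by, method)
        if key not in self._last_response:
            self._last_response[key] = None
            return "forward"  # first copy opens a new transaction
        last = self._last_response[key]
        # A retransmission is not forwarded again; replay the latest response.
        return f"resend {last}" if last is not None else "absorb"

    def on_response(self, branch: str, sent_by: str, method: str, status: str) -> None:
        """Remember the latest response sent upstream for a known transaction."""
        key = (branch, sent_by, method)
        if key in self._last_response:
            self._last_response[key] = status


if __name__ == "__main__":
    txns = ServerTransactions()
    print(txns.on_request("z9hG4bK77", "pc.alice.example.com", "INVITE"))  # forward
    txns.on_response("z9hG4bK77", "pc.alice.example.com", "INVITE", "100 Trying")
    print(txns.on_request("z9hG4bK77", "pc.alice.example.com", "INVITE"))  # resend 100 Trying
```

A stateless proxy keeps no such table, so it would forward every retransmitted copy downstream; the memory spent here is the cost the table row refers to.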

Worked example

Alice calls Bob through one proxy. Alice's phone sends the INVITE to the proxy. The proxy looks up Bob's current registration (he registered earlier from his desk phone) and forwards the INVITE to him, returning 100 Trying to Alice so she knows it's in flight. Bob's phone rings and sends 180 Ringing back, which is when Alice hears ringback. Bob answers: 200 OK travels back Bob → proxy → Alice. Alice confirms with ACK, completing the three-way handshake; the ACK passes through the proxy only if the proxy record-routed, otherwise it goes straight to Bob's Contact address. Now the talking begins: media (RTP) flows directly between Alice and Bob, not through the proxy. When Bob hangs up, his phone sends BYE, Alice's phone acknowledges it with 200 OK, and the session tears down. The message trace under How it works shows the same INVITE, 100 Trying, 180 Ringing, 200 OK, ACK sequence.
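The worked example can be replayed from Alice's side as a small state machine. The state names (`calling`, `proceeding`, `ringing`, `confirmed`, `terminated`) are hypothetical labels for this walkthrough, not the RFC 3261 transaction states, and the sketch ignores retransmissions, error responses, and CANCEL.

```python
from typing import Dict, List, Optional, Tuple

# (caller state, message Alice's phone receives) -> (next state, message it sends back)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("calling", "100 Trying"): ("proceeding", None),   # proxy has the INVITE
    ("calling", "180 Ringing"): ("ringing", None),
    ("proceeding", "180 Ringing"): ("ringing", None),  # Alice hears ringback
    ("proceeding", "200 OK"): ("confirmed", "ACK"),
    ("ringing", "200 OK"): ("confirmed", "ACK"),       # handshake completes
    ("confirmed", "BYE"): ("terminated", "200 OK"),    # Bob hung up
}


def run_caller(received: List[str]) -> Tuple[str, List[str]]:
    """Replay what Alice's phone receives; return its final state and what it sent."""
    state, sent = "calling", ["INVITE"]  # the call starts with Alice's INVITE
    for message in received:
        if (state, message) not in TRANSITIONS:
            raise ValueError(f"unexpected {message!r} in state {state!r}")
        state, reply = TRANSITIONS[(state, message)]
        if reply is not None:
            sent.append(reply)
    return state, sent


if __name__ == "__main__":
    print(run_caller(["100 Trying", "180 Ringing", "200 OK", "BYE"]))
    # ('terminated', ['INVITE', 'ACK', '200 OK'])
```

Putting the transitions in a table keeps the allowed message order in one place, so an out-of-order message such as a BYE before any answer is rejected instead of silently accepted.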

Check yourself

Does the actual voice audio flow through the SIP proxy?

Which message completes the three-way handshake that establishes the call?