VoIP SIP signaling protocol

SIP is the ringing and the hello — it sets up the call; the voice itself rides separately once both sides agree.

The idea

SIP (Session Initiation Protocol) is a text-based request/response protocol that looks a lot like HTTP. Its only job is signaling: it establishes, modifies, and tears down media sessions between endpoints. It carries no audio of its own.
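
To make the HTTP resemblance concrete, here is a minimal sketch of splitting a SIP message into its start line, headers, and body. The function name `parse_sip_message` is a hypothetical helper for illustration, not part of any SIP library, and the sketch assumes CRLF line endings and ignores header folding and repeated headers.

```python
def parse_sip_message(raw: str):
    """Split a SIP message into (start_line, headers, body).

    Illustrative only: repeated headers (e.g. several Via lines)
    overwrite each other, and folded headers are not handled.
    """
    # As in HTTP, the first blank line separates headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw = (
    "INVITE sip:bob@biloxi.example SIP/2.0\r\n"
    "Call-ID: a84b4c76e66710@pc.atlanta.example\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)
start_line, headers, body = parse_sip_message(raw)
method, request_uri, version = start_line.split(" ")
print(method, request_uri, version)
print(headers["cseq"])
```

The body is returned untouched: SIP treats it as an opaque payload, and the Content-Type header says how to interpret it.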

Instead, SIP messages carry an SDP (Session Description Protocol) body that negotiates the media: codecs, ports, IP addresses. Once both sides agree, the actual voice travels over RTP in a separate stream. The classic three-way exchange, INVITE → 200 OK → ACK, is what brings a call up.
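
The codec part of that negotiation can be sketched as a simple intersection: the answer keeps only the offered payload types that the answerer also supports. This is a simplified illustration, not a full offer/answer implementation; `negotiate_codecs` and the example codec sets are assumptions made up for this sketch.

```python
def negotiate_codecs(offered, supported):
    """Return the offered RTP payload types the answerer also supports.

    The offerer's preference order is preserved. Hypothetical helper
    for illustration only.
    """
    return [pt for pt in offered if pt in supported]


offer = [0, 8, 18]        # assumed offer from Alice: PCMU, PCMA, G729
bob_supports = {8, 0}     # assumed codec set on Bob's phone
answer = negotiate_codecs(offer, bob_supports)
print(answer)  # [0, 8]
```

If the result is empty, the two sides share no codec and the media cannot be set up as offered.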

How it works

Alice's phone sends an INVITE with an SDP offer. Bob's phone may answer with provisional responses (100 Trying, 180 Ringing) before the final 200 OK carries its SDP answer. Alice confirms with ACK, and the dialog is established. From that point RTP audio flows directly; SIP only steps back in to change or end the session.

INVITE sip:bob@biloxi.example SIP/2.0
Via: SIP/2.0/UDP pc.atlanta.example;branch=z9hG4bK776asdhds
From: "Alice" <sip:alice@atlanta.example>;tag=1928301774
To: "Bob" <sip:bob@biloxi.example>
Call-ID: a84b4c76e66710@pc.atlanta.example
CSeq: 314159 INVITE
Content-Type: application/sdp      # SDP offer negotiates the media (codecs, ports)

v=0
m=audio 49170 RTP/AVP 0            # audio over RTP, payload 0 = PCMU
...

SIP/2.0 200 OK                     # Bob answers: SDP answer rides here too
CSeq: 314159 INVITE
Content-Type: application/sdp

ACK sip:bob@biloxi.example SIP/2.0  # Alice confirms — dialog established
CSeq: 314159 ACK                   # RTP audio now flows directly between them
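
The `m=` line in the offer above packs several fields into one line. As a rough sketch, it can be pulled apart like this; `parse_media_line` is a hypothetical helper, the payload table lists only two static types, and port ranges such as `49170/2` are not handled.

```python
# Small subset of the static RTP payload types (audio/video profile).
STATIC_PAYLOAD_TYPES = {0: "PCMU", 8: "PCMA"}


def parse_media_line(line: str):
    """Parse an SDP media line: m=<media> <port> <proto> <fmt> ..."""
    if not line.startswith("m="):
        raise ValueError(f"not a media line: {line!r}")
    media, port, proto, *formats = line[2:].split()
    return {
        "media": media,
        "port": int(port),
        "proto": proto,
        # Map known static payload numbers to codec names.
        "codecs": [
            STATIC_PAYLOAD_TYPES.get(int(f), f"unknown({f})") for f in formats
        ],
    }


info = parse_media_line("m=audio 49170 RTP/AVP 0")
print(info)
```

For the line in the trace, this yields audio on port 49170 over RTP/AVP with PCMU as the only codec.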

Cost & trade-offs

Property              Detail
Transport             UDP (default, lightweight), TCP (large messages), or TLS (SIPS, encrypted signaling)
Signaling vs media    SIP sets up the session; RTP carries the audio on a separate stream, so the two are decoupled
Reliability           Provisional 1xx responses are progress hints (100 Trying is hop-by-hop); the final response plus ACK are what complete the dialog
NAT and firewall      Private IPs in SDP break the media path; STUN, TURN, and ICE discover a reachable address
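
The NAT problem in the table can be detected before the call fails. A minimal sketch, assuming the SDP contains a session-level `c=` line with a literal IP address: the helper names and the example SDP bodies below are hypothetical, and the check uses Python's standard `ipaddress` module.

```python
import ipaddress


def sdp_connection_address(sdp: str):
    """Return the address from the first c= line, or None if absent."""
    for line in sdp.splitlines():
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            return line[2:].split()[2]
    return None


def advertises_private_address(sdp: str) -> bool:
    """True if the SDP tells the peer to send media to a private IP."""
    addr = sdp_connection_address(sdp)
    if addr is None:
        return False
    try:
        return ipaddress.ip_address(addr).is_private
    except ValueError:
        # Hostnames and other non-literal addresses are not judged here.
        return False


behind_nat = "v=0\r\nc=IN IP4 192.168.1.20\r\nm=audio 49170 RTP/AVP 0\r\n"
public = "v=0\r\nc=IN IP4 8.8.8.8\r\nm=audio 49170 RTP/AVP 0\r\n"
print(advertises_private_address(behind_nat))  # True
print(advertises_private_address(public))      # False
```

A `True` result is the cue that the endpoint needs STUN, TURN, or ICE to learn an address the other side can actually reach.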

Watch out for

Worked example

Alice dials Bob. Her phone sends INVITE with an SDP offer; Bob's proxy replies 100 Trying so Alice stops resending. Bob's phone rings and returns 180 Ringing, then — when Bob picks up — 200 OK with its SDP answer. Alice's ACK completes the three-way handshake and the dialog is established. Only now does RTP audio flow directly between the two endpoints, carrying the actual voice. When Bob hangs up, his phone sends BYE, Alice answers 200 OK, the RTP stream stops, and the session is terminated.
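
The walk-through above can be modeled as a small state machine, seen from Alice's side. This is only a sketch: the state names are invented for illustration and are not the transaction states defined by the SIP specification, and the table covers just this one happy-path call.

```python
# (current state, message) -> next state; names are illustrative only.
TRANSITIONS = {
    ("idle", "INVITE"): "calling",
    ("calling", "100 Trying"): "proceeding",
    ("calling", "180 Ringing"): "ringing",
    ("proceeding", "180 Ringing"): "ringing",
    ("ringing", "200 OK"): "answered",
    ("answered", "ACK"): "established",      # RTP audio flows from here
    ("established", "BYE"): "terminating",
    ("terminating", "200 OK"): "terminated",
}


def run_call(messages):
    """Replay a sequence of SIP messages and return the visited states."""
    state = "idle"
    trace = [state]
    for msg in messages:
        key = (state, msg)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {msg!r} in state {state!r}")
        state = TRANSITIONS[key]
        trace.append(state)
    return trace


flow = ["INVITE", "100 Trying", "180 Ringing", "200 OK", "ACK", "BYE", "200 OK"]
trace = run_call(flow)
print(" -> ".join(trace))
```

Note that the same `200 OK` text means different things depending on state: it answers the INVITE while ringing and confirms the BYE while terminating, which is why real SIP stacks match responses to requests through CSeq.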

Check yourself

1. What actually carries the voice once a SIP call is up?

2. Which exchange completes the three-way handshake that establishes the dialog?
