SIP is the ringing and the hello — it sets up the call; the voice itself rides separately once both sides agree.
SIP (Session Initiation Protocol) is a text-based request/response protocol that looks a lot like HTTP. Its only job is signaling: it establishes, modifies, and tears down media sessions between endpoints. It carries no audio of its own.
Instead, SIP messages carry an SDP (Session Description Protocol) body that negotiates the media — codecs, ports, IP addresses. Once both sides agree, the actual voice travels over RTP in a separate stream. The classic three-way exchange — INVITE → 200 OK → ACK — is what brings a call up.
Alice's phone sends an INVITE with an SDP offer. Bob's phone may answer with provisional responses (100 Trying, 180 Ringing) before the final 200 OK carries its SDP answer. Alice confirms with ACK, and the dialog is established. From that point RTP audio flows directly; SIP only steps back in to change or end the session.
INVITE sip:bob@biloxi.example SIP/2.0
Via: SIP/2.0/UDP pc.atlanta.example;branch=z9hG4bK776asdhds
From: "Alice" <sip:alice@atlanta.example>;tag=1928301774
To: "Bob" <sip:bob@biloxi.example>
Call-ID: a84b4c76e66710@pc.atlanta.example
CSeq: 314159 INVITE
Content-Type: application/sdp # SDP offer negotiates the media (codecs, ports)

v=0
m=audio 49170 RTP/AVP 0 # audio over RTP, payload 0 = PCMU
...

SIP/2.0 200 OK # Bob answers: SDP answer rides here too
CSeq: 314159 INVITE
Content-Type: application/sdp

ACK sip:bob@biloxi.example SIP/2.0 # Alice confirms; dialog established
CSeq: 314159 ACK # RTP audio now flows directly between them
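
The trace above is abbreviated, but its shape (start line, headers, blank line, SDP body) is simple to pull apart. What follows is a minimal Python sketch, assuming a well-formed message with CRLF line endings and no folded or repeated headers. `parse_sip` and `audio_media` are hypothetical helper names, not part of any SIP library, and the sample message is a shortened illustration rather than a complete INVITE.

```python
def parse_sip(raw: str):
    """Split a SIP message into (start_line, headers, body)."""
    # Headers and body are separated by the first empty line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


def audio_media(sdp: str):
    """Return (port, payload_types) from the first m=audio line, or None."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # Fields after "m=": media, port, transport, payload types...
            fields = line[2:].split()
            return int(fields[1]), [int(pt) for pt in fields[3:]]
    return None


# Illustrative message only: several mandatory headers are left out.
invite = (
    "INVITE sip:bob@biloxi.example SIP/2.0\r\n"
    "Call-ID: a84b4c76e66710@pc.atlanta.example\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)

start_line, headers, body = parse_sip(invite)
method = start_line.split()[0]
port, payloads = audio_media(body)
print(method, headers["cseq"], port, payloads)
```

The port and payload list recovered here are what the far end needs in order to know where to send RTP and in which codec; everything else in the message is signaling.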
| Property | Detail |
|---|---|
| Transport | UDP (default, lightweight), TCP (large messages), or TLS (SIPS, encrypted signaling) |
| Signaling vs media | SIP sets up the session; RTP carries the audio on a separate stream — they are decoupled |
| Reliability | Provisional 1xx responses are progress hints and are not delivered reliably by default (100 Trying is hop-by-hop only); the final response plus ACK are what complete call setup |
| NAT and firewall | Private IPs in SDP break the media path; STUN, TURN, and ICE discover a reachable address |
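
The NAT row can be made concrete with a small check. This is a sketch under stated assumptions: it only looks at the first IPv4 `c=` line of the SDP and only tests the RFC 1918 private ranges. `sdp_connection_address` and `advertises_private_address` are hypothetical names, and the addresses are illustrative.

```python
import ipaddress

# RFC 1918 private IPv4 ranges; addresses in these are not routable
# across the public internet.
PRIVATE_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def sdp_connection_address(sdp: str):
    """Return the address from the first IPv4 c= line, or None."""
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            return line.split()[2]
    return None


def advertises_private_address(sdp: str) -> bool:
    """True if the SDP tells the peer to send media to a private IP."""
    addr = sdp_connection_address(sdp)
    if addr is None:
        return False
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)


behind_nat = "v=0\r\nc=IN IP4 192.168.1.20\r\nm=audio 49170 RTP/AVP 0\r\n"
reachable = "v=0\r\nc=IN IP4 203.0.113.5\r\nm=audio 49170 RTP/AVP 0\r\n"

print(advertises_private_address(behind_nat))
print(advertises_private_address(reachable))
```

An endpoint that detects the first case would typically turn to STUN, TURN, or ICE to find an address the peer can actually reach before sending its offer.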
Without the ACK, the handshake never completes: the answering side keeps retransmitting 200 OK until it times out.

Alice dials Bob. Her phone sends INVITE with an SDP offer; Bob's proxy replies 100 Trying so Alice stops resending. Bob's phone rings and returns 180 Ringing, then, when Bob picks up, 200 OK with its SDP answer. Alice's ACK completes the three-way handshake and the dialog is established. Only now does RTP audio flow directly between the two endpoints, carrying the actual voice. When Bob hangs up, his phone sends BYE, Alice answers 200 OK, the RTP stream stops, and the session is terminated.
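
The call flow above can be sketched as a small state machine. This is a simplified model for illustration, not the transaction state machine a real SIP stack implements: the state names, the `TRANSITIONS` table, and the `run_call` helper are all assumptions made here, and error responses, retransmissions, and CANCEL are left out.

```python
# (current state, message seen) -> next state
TRANSITIONS = {
    ("idle", "INVITE"): "calling",
    ("calling", "100"): "calling",       # 100 Trying: keep waiting
    ("calling", "180"): "ringing",
    ("calling", "200"): "answered",      # answered without ringing first
    ("ringing", "180"): "ringing",
    ("ringing", "200"): "answered",
    ("answered", "ACK"): "established",  # three-way handshake complete
    ("established", "BYE"): "terminating",
    ("terminating", "200"): "terminated",
}


def run_call(messages):
    """Feed a sequence of SIP messages through the table; return the final state."""
    state = "idle"
    for msg in messages:
        key = (state, msg)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {msg} in state {state}")
        state = TRANSITIONS[key]
    return state


def media_flowing(state: str) -> bool:
    """In this model, RTP flows only once the ACK has been seen."""
    return state == "established"


setup = ["INVITE", "100", "180", "200", "ACK"]
print(run_call(setup), media_flowing(run_call(setup)))
print(run_call(setup + ["BYE", "200"]))
print(run_call(["INVITE", "100", "180", "200"]))  # ACK never sent
```

Note that the same `200` message means different things depending on state: it answers the INVITE during setup and confirms the BYE during teardown, which is why the table keys on the pair rather than on the message alone.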
1. What actually carries the voice once a SIP call is up?
2. Which exchange completes the three-way handshake that establishes the dialog?