SIP, like HTTP, is versatile and simple to use. It can set up collaborative multimedia conferencing and voice-enabled e-commerce. SIP is expected to become the norm for VoIP implementations within a couple of years, though today just about all enterprise VoIP vendors would rather keep you locked into their proprietary signaling solutions. The IETF published the first version of SIP, RFC 2543, in 1999 and the most recent version, RFC 3261, last June.
SIP is ideal for VoIP, where a session over the Internet replaces the traditional end-to-end circuit for a voice call in a legacy network. The ITU's H.323 multimedia standard does this too, as do some vendor-proprietary VoIP phones. VoIP vendors that built products before SIP emerged have adopted H.323. But SIP is simpler to implement than H.323 and is a lighter-weight protocol with less overhead.
SIP is more than a standards-based replacement for legacy phone connections, though. It makes it easier to implement advanced multimedia services, such as presence, which allows you to determine instantly whether a user can and wants to receive a call on a specific phone, as well as over video and instant messaging sessions. It also lets you ring multiple destinations in a VoIP call.
And SIP is making commercial inroads. Microsoft's WinMessenger IM program, which comes packaged with its XP OS, is based on SIP. WinMessenger also uses SIP to make Internet phone calls. Future 3G wireless WANs, too, will use SIP for setting up and tearing down calls.
Still, there are plenty of misconceptions about what SIP can actually do. SIP does not, for instance, transport digitized voice. That's the job of RTP (Real-Time Transport Protocol), which carries the voice after SIP establishes the call. And before SIP can set up voice, text-messaging or video sessions using various codecs and techniques, you need to determine what features the devices in the session support. That's where the Session Description Protocol comes in: SIP relies on SDP to negotiate the capabilities between two endpoints in a potential conversation.
You're Invited
SIP can use UDP (User Datagram Protocol) or TCP as a transport, but by default it uses UDP on Port 5060. Because UDP is unreliable and can drop a SIP packet, SIP handles reliability itself: If no response arrives within a timeout, it retransmits the command.
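To make the retransmission behavior concrete, here is a minimal Python sketch of the schedule for an unanswered invite over UDP. It assumes the defaults in RFC 3261: a base timer (T1) of 500 milliseconds, a wait that doubles after each retransmission, and a give-up point at 64 times T1. The function name is illustrative, not part of any SIP library.

```python
def invite_retransmit_offsets(t1: float = 0.5, timeout_factor: int = 64) -> list[float]:
    """Return the times, in seconds after the first send, at which an
    unanswered INVITE would be re-sent over UDP.

    Assumes RFC 3261 defaults: wait T1, then double the wait after each
    retransmission, and stop once timeout_factor * T1 has elapsed.
    """
    offsets = []
    interval = t1                    # wait before the first retransmission
    elapsed = t1
    deadline = timeout_factor * t1   # the transaction times out here
    while elapsed < deadline:
        offsets.append(elapsed)
        interval *= 2                # back off: 0.5 s, 1 s, 2 s, 4 s, ...
        elapsed += interval
    return offsets


if __name__ == "__main__":
    print(invite_retransmit_offsets())
```

With the defaults, the invite is re-sent six times before the 32-second deadline. Over TCP none of this is needed, because the transport itself guarantees delivery.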
The most common command SIP sends to another endpoint is the "invite" command. When a SIP phone or UA (user agent) wants to connect to another SIP phone or UA, it sends an invite. If the invite is successful, the originator receives a "200" response, which means everything is OK and the session is established.
Like HTTP and SMTP, SIP is in plain text, which makes it easier to parse the commands. And any protocol analyzer can show the actual commands and responses in a simple ASCII translation.
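Because the protocol is plain text, composing an invite and reading back the response takes only string handling. The Python sketch below is illustrative rather than a complete implementation: the helper names, the branch and tag values, and the example.com caller are all made up, and a real invite would normally carry an SDP body.

```python
def build_invite(caller: str, callee: str, call_id: str, local_host: str) -> str:
    """Compose a bare-bones INVITE request as text (no SDP body)."""
    lines = [
        f"INVITE {callee} SIP/2.0",
        # Placeholder branch and tag values; real UAs generate them randomly.
        f"Via: SIP/2.0/UDP {local_host}:5060;branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"To: <{callee}>",
        f"From: <{caller}>;tag=example-tag",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{local_host}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; a blank line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(line: str) -> tuple[int, str]:
    """Split a response's first line, e.g. 'SIP/2.0 200 OK', into (code, reason)."""
    version, code, reason = line.split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"not a SIP/2.0 response: {line!r}")
    return int(code), reason


if __name__ == "__main__":
    print(build_invite("sip:caller@example.com", "sip:peter@nwc.com",
                       "a84b4c76e66710@192.0.2.10", "192.0.2.10"))
    print(parse_status_line("SIP/2.0 200 OK"))
```

A status code of 200 is the success case described above; codes in the 100s, such as 180 Ringing, are provisional progress reports.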
Along with the invite, a SIP header contains a "to" and "from" address similar to that in an e-mail message. Each of these addresses is called a URI (Uniform Resource Identifier) and looks like an e-mail address:
sip:peter@nwc.com
The URI in the "to" field can also contain a standard phone number. The SIP header also contains a "call ID," a unique identifier that ties together the messages belonging to one call, and a "via" field, which tells the UA which IP address to use for sending its response when it's negotiating the initial connection.
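A short Python sketch shows how these header fields and URIs can be pulled out of a message. The sample message, addresses and function names are invented for illustration, and the parser is deliberately simplified: it ignores repeated headers (a real message may carry several "via" lines), compact header names and URI parameters.

```python
SAMPLE = (
    "INVITE sip:peter@nwc.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "To: <sip:peter@nwc.com>\r\n"
    "From: <sip:caller@example.com>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_headers(message: str) -> dict[str, str]:
    """Map lower-cased header names to values (the last value wins)."""
    head = message.split("\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split("\r\n")[1:]:      # skip the request line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def parse_sip_uri(uri: str) -> tuple[str, str]:
    """Split 'sip:user@host' into (user, host); the user may be a phone number."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "sip":
        raise ValueError(f"not a sip: URI: {uri!r}")
    if "@" not in rest:
        return "", rest                      # host-only URI
    user, _, host = rest.partition("@")
    return user, host


if __name__ == "__main__":
    headers = parse_headers(SAMPLE)
    print(headers["call-id"])
    print(parse_sip_uri(headers["to"].strip("<>")))
```

The same URI parser accepts a phone number in the user part, as in the hypothetical sip:+15551234567@gateway.example.com.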
Once the session is established, the "contact" field, which carries the UA's own address, is used. That's the destination the recipient UA uses to talk directly to the originating UA. NAT (network address translation) complicates this: An endpoint behind NAT inserts its private, unroutable address in the SIP header as the return address, so replies sent there never arrive. SIP vendors can use various work-arounds, though: A SIP device can reply to the IP packet's source address, for instance, rather than the IP address that appears in the SIP header. Or a SIP-aware firewall that performs NAT can rewrite the IP address in the SIP header.
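The source-address work-around can be sketched in a few lines of Python. RFC 3261 records the mismatch in a "received" parameter on the "via" field; the function names and addresses below are hypothetical, and the host parsing handles only IPv4 addresses and host names, not bracketed IPv6.

```python
def via_host(via_value: str) -> str:
    """Extract the host from a Via value such as
    'SIP/2.0/UDP 192.168.1.20:5060;branch=...'."""
    sent_by = via_value.split(None, 1)[1].split(";", 1)[0].strip()
    return sent_by.rsplit(":", 1)[0] if ":" in sent_by else sent_by


def mark_received(via_value: str, packet_source_ip: str) -> str:
    """If the address claimed in Via differs from where the packet really
    came from, note the real source in a 'received' parameter."""
    if via_host(via_value) == packet_source_ip:
        return via_value
    return f"{via_value};received={packet_source_ip}"


def response_target(via_value: str) -> str:
    """Prefer the 'received' address, if present, when sending the response."""
    for param in via_value.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "received":
            return value.strip()
    return via_host(via_value)


if __name__ == "__main__":
    via = "SIP/2.0/UDP 192.168.1.20:5060;branch=z9hG4bK776asdhds"
    marked = mark_received(via, "198.51.100.7")   # packet arrived from the NAT's public side
    print(marked)
    print(response_target(marked))
```

This fixes only the signaling path. The addresses advertised for the media stream have the same problem and need their own treatment.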
The invite request uses SDP syntax to tell the UA the caller's media capabilities. When the called party answers, it replies with the OK message, which also includes the supported media capabilities.
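The capability exchange can be illustrated with a small Python sketch. The SDP offer below is a made-up example (addresses, port and session identifiers are placeholders), and the functions look only at the audio media line; a real answer would also account for attributes such as rtpmap and for dynamic payload types.

```python
OFFER = "\r\n".join([
    "v=0",
    "o=peter 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 18",   # offered RTP payload types
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:18 G729/8000",
]) + "\r\n"


def audio_payload_types(sdp: str) -> list[str]:
    """Return the payload types listed on the first m=audio line."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # Layout: m=<media> <port> <transport> <format> <format> ...
            return line.split()[3:]
    return []


def common_codecs(offer: str, supported: set[str]) -> list[str]:
    """Keep the offered payload types the called party also supports,
    preserving the caller's order of preference."""
    return [pt for pt in audio_payload_types(offer) if pt in supported]


if __name__ == "__main__":
    # Hypothetical called party that supports only PCMU (0) and PCMA (8).
    print(common_codecs(OFFER, {"0", "8"}))
```

The called party would list the surviving payload types in the SDP body of its OK message, and RTP then carries audio in one of those formats.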