Session Initiation Protocol

from Wikipedia, the free encyclopedia
SIP (Session Initiation Protocol)
Family: Internet protocol family
Area of application: management of streaming sessions
Ports: 5060, 5061 (TLS encryption)
SIP in the TCP/IP protocol stack:
  Application: SIP
  Transport: UDP, TCP
  Internet: IP (IPv4, IPv6)
  Network access: Ethernet, Token Bus, Token Ring, FDDI, ...
Standards: RFC 3261 (SIP, 2002)

The Session Initiation Protocol (SIP) is a network protocol for establishing, controlling and terminating a communication session between two or more participants. The protocol is specified in RFC 3261, among other documents. SIP is a widely used protocol in IP telephony.


In contrast to H.323, which comes from the ITU-T, SIP was developed by the IETF. Put simply, H.323 can be described as ISDN over IP. This allowed telephone system manufacturers in particular to move communication to IP networks relatively quickly and easily. However, it did not take sufficient account of the strengths and weaknesses of IP networks. This is particularly evident with NAT, the network address translation needed above all in firewalls and end-customer networks (e.g. behind DSL routers), which H.323 can handle only with great effort.

The design of SIP, on the other hand, is based on the Hypertext Transfer Protocol (HTTP), although it is not compatible with it, and is significantly better suited to IP networks. SIP is structured so that new extensions can be added easily without every device involved having to understand them. It is also more general. H.323 is designed specifically for real-time audio and video communication, whereas SIP can manage sessions of any kind. The "payload" of the session is the actual data stream to be transmitted, and it can be any stream that can travel over a network. The main areas of application are audio and video transmission, and some online games also use SIP to manage their transmission.

[Figure: SIP signaling]

An Internet call needs more than SIP. SIP only serves to agree on or negotiate the communication modalities, and the actual communication data must be exchanged using other suitable protocols. For this purpose, the Session Description Protocol (SDP, RFC 4566) is often embedded in SIP to negotiate the details of the video and/or audio transmission. The devices tell each other three things: which video and audio encoding methods they can use (the so-called codecs), which protocol they want to use for transport, and at which network address they want to send and receive.
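The codec negotiation just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a full SDP offer/answer implementation (RFC 3264). It assumes a single audio `m=` line with static RTP/AVP payload types and ignores `a=fmtp` parameters. The function names are hypothetical. The sample offer uses the same `m=audio 6006 RTP/AVP 8 3 0` line as the example INVITE in this article.

```python
from typing import List, Set, Tuple


def parse_audio_formats(sdp: str) -> Tuple[int, List[int]]:
    """Return (port, payload types) from the first 'm=audio' line of an SDP body."""
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio "):
            # m=<media> <port> <proto> <fmt> <fmt> ...
            _media, port, _proto, *fmts = line[2:].split()
            return int(port), [int(f) for f in fmts]
    raise ValueError("no audio media line in SDP body")


def answer_formats(offered: List[int], supported: Set[int]) -> List[int]:
    """One simple policy: keep the offerer's order, drop codecs we cannot handle."""
    return [pt for pt in offered if pt in supported]


offer = "\n".join([
    "v=0",
    "m=audio 6006 RTP/AVP 8 3 0",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:3 GSM/8000",
    "a=rtpmap:0 PCMU/8000",
])

port, offered = parse_audio_formats(offer)
# Hypothetical answerer that supports G.711 (PCMA=8, PCMU=0) but not GSM (3).
print(port, answer_formats(offered, supported={0, 8}))  # 6006 [8, 0]
```

The result matches the `m=audio 6006 RTP/AVP 8 0` line in the example response. Real endpoints may apply other selection policies.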

This media negotiation is therefore not a direct part of SIP. It is achieved by embedding another protocol in SIP. This separation of session and media negotiation is one of SIP's advantages, because it allows great flexibility in the supported payload. For example, a manufacturer that wants to use SIP for a specialized application can design its own media negotiation if no suitable protocol already exists.

In Internet telephony, the Real-Time Transport Protocol (RTP, RFC 3550) is used for media transmission. SIP negotiates the session, the embedded SDP negotiates the media details, and RTP is the protocol that ultimately carries the video and audio streams.
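To make RTP's role concrete, here is a minimal sketch of the 12-byte fixed header that precedes each media packet, following RFC 3550. It assumes no padding, header extension or CSRC list. The helper name `rtp_header` and the sample SSRC are hypothetical.

```python
import struct


def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
               marker: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header of RFC 3550 (no padding, extension or CSRCs)."""
    byte0 = 2 << 6  # version 2; padding, extension and CSRC count all 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # marker bit + 7-bit PT
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


# Payload type 8 = PCMA at 8000 Hz: a 20 ms packet carries 160 samples,
# so the timestamp advances by 160 from one packet to the next.
hdr = rtp_header(payload_type=8, seq=1, timestamp=160, ssrc=0x12345678)
print(len(hdr), hdr.hex())  # 12 80080001000000a012345678
```

The payload type (here 8) is the value agreed on in SDP. This is how the negotiation result shows up in every media packet.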

Participant addresses are written in URI format, which is also used in e-mails and WWW addresses. Such a participant address usually follows one of the following three schemes:

  • Unencrypted SIP connection: sip:user@domain.
  • Encrypted SIP connection: sips:user@domain.
  • Telephone number: tel:number, for example tel:+49-69-1234567. This scheme is mainly used by devices that provide an interface to the "normal" telephone network. If necessary, it can be converted into a SIP URI, for example sip:+49-69-1234567@domain.
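The three address schemes and the tel-to-SIP conversion can be illustrated with a small sketch. It only splits on the first colon and does not validate the rest of the URI. The function names and the placeholder domain `example.com` are hypothetical. Real gateways may also add parameters such as `user=phone` (defined in RFC 3261).

```python
SCHEMES = {
    "sip": "unencrypted SIP",
    "sips": "encrypted SIP (TLS)",
    "tel": "telephone number",
}


def classify_address(uri: str) -> str:
    """Map a participant address to one of the three schemes listed above."""
    scheme, sep, _rest = uri.partition(":")
    if not sep or scheme.lower() not in SCHEMES:
        raise ValueError(f"unsupported address: {uri!r}")
    return SCHEMES[scheme.lower()]


def tel_to_sip(tel_uri: str, domain: str) -> str:
    """Rewrite tel:<number> as sip:<number>@<domain> (domain is caller-supplied)."""
    scheme, _, number = tel_uri.partition(":")
    if scheme.lower() != "tel" or not number:
        raise ValueError(f"not a tel: URI: {tel_uri!r}")
    return f"sip:{number}@{domain}"


print(classify_address("sips:alice@example.com"))       # encrypted SIP (TLS)
print(tel_to_sip("tel:+49-69-1234567", "example.com"))  # sip:+49-69-1234567@example.com
```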

Encryption and security

Because session and media are separated, the two data streams can be encrypted independently of each other. SIP can be encrypted with the TLS protocol (known as SIPS), and the media stream (voice data) with the SRTP protocol. Any combination of the two is possible, but not every combination is secure.

For secure encryption, both data streams (session and media) must be encrypted at the same time. The symmetric keys of the media stream are often exchanged via SDP, i.e. inside SIP (for example with SDES, RFC 4568). They could therefore be intercepted if SIP itself is unencrypted. The symmetric keys of TLS, by contrast, are negotiated in a handshake at the beginning of the connection. That handshake is protected by the public-key mechanisms of the certificates involved. In the classic RSA key exchange, for example, the symmetric key material is encrypted with the public key from the server's certificate.

SIP is often transported over a connectionless protocol, so DTLS was designed as a UDP-based counterpart to TLS, which runs over TCP. However, DTLS for SIP is currently implemented by only one SIP stack (reSIProcate).

Network elements

[Figure: SIP UA registration with a SIP registrar, with authentication via login]
[Figure: Call flow through a redirect server and a proxy]
[Figure: Establishing a connection with a B2BUA]
  • User agent
The user agent is an interface to the user that displays content and receives commands. A SIP telephone is also a SIP user agent that offers the traditional calling functions of a telephone, such as dial, answer, reject and hold.
  • Proxy server
A proxy server is an intermediary in the network that routes requests. It receives requests and forwards them toward another party on the requester's behalf, and its job is to ensure that requests reach the intended user. Proxies are also used to enforce policy, for example by checking whether a user is allowed to make a call.
  • Registrar server
The registrar server is a central element of the SIP architecture. It accepts registration requests (REGISTER) for the domain it is responsible for. For each specific SIP URI, it records one or more contact addresses (for example IP addresses) transmitted in these SIP requests.
  • Redirect server
The redirect server relieves the proxy server by passing routing information directly to the user agent client. It answers incoming requests with redirect responses that direct the client to an alternative set of URIs. This also enables SIP session invitations to be sent to external domains.
  • Session border controller
A session border controller is a network component for securely coupling computer networks with different security requirements. It acts as an intermediate node between the user agent and the SIP server and performs various functions, including support for Network Address Translation (NAT).
  • Gateway
A gateway can interface a SIP network with other networks, such as the public telephone network, which uses different protocols or technologies.
  • B2BUA
A B2BUA (back-to-back user agent) is an intermediary in both the SIP and the RTP data stream. Toward SIP clients, a B2BUA behaves like a user agent server on one side of the connection and like a user agent client on the other. This position allows it to manipulate the data streams.
The B2BUA is defined in RFC 3261.
Examples of application:
  • Call management (including billing, call forwarding, automatic disconnection)
  • Pairing different networks (especially adapting the different dialects of the protocols, depending on the manufacturer)
  • Hide the network structure (including private addresses, network topology)
In principle, a B2BUA can be extended into a proxy with an integrated media gateway.
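The registrar's job of binding a SIP URI to contact addresses can be illustrated with a toy in-memory table. Everything here is hypothetical: the class name, the method names, and the explicit `now` parameter (added so the example is testable). A real registrar also authenticates the user, parses REGISTER messages with their Contact and Expires headers, and stores bindings in a location service. The sketch models only two rules: each binding lapses after its requested lifetime, and an expiry of 0 removes it.

```python
import time
from typing import Dict, List, Optional


class Registrar:
    """Toy in-memory location service for one domain (hypothetical API)."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        # address-of-record -> {contact URI: absolute expiry time}
        self.bindings: Dict[str, Dict[str, float]] = {}

    def register(self, aor: str, contact: str, expires: int,
                 now: Optional[float] = None) -> None:
        if not aor.endswith("@" + self.domain):
            raise ValueError("registrar is not responsible for this domain")
        now = time.time() if now is None else now
        contacts = self.bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)  # expiry 0 removes the binding
        else:
            contacts[contact] = now + expires

    def lookup(self, aor: str, now: Optional[float] = None) -> List[str]:
        """Return the contact addresses that have not yet expired."""
        now = time.time() if now is None else now
        contacts = self.bindings.get(aor, {})
        return sorted(c for c, exp in contacts.items() if exp > now)


reg = Registrar("example.com")
reg.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060", expires=3600, now=0)
reg.register("sip:alice@example.com", "sip:alice@198.51.100.7:5060", expires=60, now=0)
print(reg.lookup("sip:alice@example.com", now=120))  # ['sip:alice@192.0.2.10:5060']
```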

SIP messages

The clients and servers involved in a SIP session send each other requests and answer them with responses that carry status codes.

SIP requests

RFC 3261 defines six request methods: REGISTER, INVITE, ACK, CANCEL, BYE and OPTIONS.
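A request begins with a request line of the form `Method SP Request-URI SP SIP-Version`. The following sketch splits such a line and flags whether the method is one of the six core methods. The function name is hypothetical. Extension methods such as REFER (RFC 3515) are accepted but reported as non-core.

```python
from typing import Tuple

# The six methods defined in RFC 3261; extensions add more (e.g. REFER, RFC 3515).
CORE_METHODS = {"REGISTER", "INVITE", "ACK", "CANCEL", "BYE", "OPTIONS"}


def parse_request_line(line: str) -> Tuple[str, str, bool]:
    """Split 'Method SP Request-URI SP SIP-Version' and flag core RFC 3261 methods."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3 or parts[2] != "SIP/2.0":
        raise ValueError(f"malformed request line: {line!r}")
    method, request_uri, _version = parts
    return method, request_uri, method in CORE_METHODS


print(parse_request_line("INVITE sip:bob@example.com SIP/2.0\r\n"))
print(parse_request_line("REFER sip:bob@example.com SIP/2.0"))
```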

SIP status codes

1xx - Provisional
Provisional status information: the server is still carrying out further actions and therefore cannot yet send a final response.
2xx - Successful
The request was successful.
3xx - Redirection
These responses provide a new contact address for the called party or point to other services through which the connection can be established.
4xx - Request Failures
The request could not be processed by this server (for example because of bad syntax).
5xx - Server Failures
A server involved in the transmission could not process an apparently valid request.
6xx - Global Failures
The server was reached, but the request cannot be fulfilled at any server.
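The class of a response follows from the first digit of its three-digit status code. A minimal lookup, with hypothetical function names, could look like this. The sample codes are common SIP responses: 180 Ringing, 302 Moved Temporarily, 486 Busy Here, 503 Service Unavailable and 603 Decline.

```python
STATUS_CLASSES = {
    1: "Provisional",
    2: "Successful",
    3: "Redirection",
    4: "Request Failure",
    5: "Server Failure",
    6: "Global Failure",
}


def status_class(code: int) -> str:
    """Return the class name for a SIP status code (100-699)."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP status code: {code}")
    return STATUS_CLASSES[code // 100]


def is_final(code: int) -> bool:
    """1xx responses are provisional; 2xx-6xx are final responses."""
    return status_class(code) != "Provisional"


for code in (180, 200, 302, 486, 503, 603):
    print(code, status_class(code), is_final(code))
```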


SIP is supported by many devices from various manufacturers and has become a de facto standard protocol for Voice over IP (VoIP). The 3rd Generation Partnership Project (3GPP) also selected SIP as the protocol for multimedia support in 3G mobile communications (UMTS). The Next Generation Network (NGN) specification of the TISPAN project group is also based on SIP. TISPAN (Telecommunications and Internet converged Services and Protocols for Advanced Networking) is a project group at the European Telecommunications Standards Institute (ETSI).

Advantages and disadvantages

One of the advantages of SIP is that it is an open standard that is now very widely used. Since SIP servers are distributed , an attack only affects the respective provider and not the entire telephony switched via SIP. Another advantage of SIP is the ability to modify an already established session. To do this, another INVITE message with the new SDP session properties is simply sent to the other side within the session. A new medium can thus be added or an existing medium can be modified or removed. The corresponding message is also referred to as a Re-INVITE request.
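A re-INVITE is sent inside the existing dialog. It keeps the same Call-ID and From/To tags, uses a higher CSeq number, and carries a modified SDP body. The sketch below builds such a request as text. It is a simplified illustration, not a complete SIP stack. The `Dialog` class, the function name and all addresses are hypothetical. The mandatory Via and Contact headers are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dialog:
    """Hypothetical minimal dialog state kept by a user agent."""
    call_id: str
    local_uri: str
    local_tag: str
    remote_uri: str
    remote_tag: str
    remote_target: str  # taken from the peer's Contact header
    local_cseq: int


def build_reinvite(d: Dialog, sdp_lines: List[str]) -> str:
    """Build a re-INVITE inside an existing dialog (Via/Contact omitted for brevity)."""
    d.local_cseq += 1  # every new request in the dialog gets a higher CSeq
    body = "\r\n".join(sdp_lines) + "\r\n"
    head = [
        f"INVITE {d.remote_target} SIP/2.0",
        "Max-Forwards: 70",
        f"From: <{d.local_uri}>;tag={d.local_tag}",
        f"To: <{d.remote_uri}>;tag={d.remote_tag}",
        f"Call-ID: {d.call_id}",  # unchanged: same dialog, same session
        f"CSeq: {d.local_cseq} INVITE",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(head) + "\r\n\r\n" + body


dialog = Dialog("48c7df2a9b4@myvoip1", "sip:alice@example.com", "29ae1249",
                "sip:bob@example.com", "2e679cbc", "sip:bob@192.0.2.20", 1)
# Modified SDP: the o= session version is incremented, and the audio stream
# is marked sendonly (a common way to put a call on hold).
new_sdp = ["v=0", "o=alice 1234567890 1234567891 IN IP4 192.0.2.10", "s=-",
           "c=IN IP4 192.0.2.10", "t=0 0", "m=audio 6006 RTP/AVP 8",
           "a=sendonly"]
msg = build_reinvite(dialog, new_sdp)
print(msg)
```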

A disadvantage of SIP is that voice data is transmitted separately via RTP. The UDP ports used for this are assigned dynamically. This makes it difficult to use SIP together with firewalls or Network Address Translation (NAT, RFC 2663), because most firewalls and NAT routers cannot associate the dynamically assigned ports with the signaling connection. One remedy is STUN (Session Traversal Utilities for NAT), which detects NAT routers and helps traverse them. With STUN, a device determines the IP address and port with which the NAT firewall or NAT router appears on the outside (i.e. on the public Internet). A much simpler way around the problem is for the proxy server or the called party to use the source IP address and port of the received packets directly. NAT then works even without a STUN server.

Other protocols, such as IAX (Inter-Asterisk eXchange), avoid the problem altogether by carrying signaling and media data over a single UDP connection. Like H.323, however, IAX is a binary protocol, so troubleshooting is more difficult than with SIP. In addition, IAX has not become a standards-track protocol. Its version 2 was published only as an informational RFC (RFC 5456).
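The simpler method mentioned above is standardized for SIP in two Via parameters. The proxy records the packet's actual source address in `received` (RFC 3261), and, if the client asked for it, the source port in `rport` (RFC 3581). The response is then sent back to that address and port, through the NAT binding. The following sketch performs this on a single Via header. It assumes an IPv4 address or hostname in sent-by, and the function name is hypothetical.

```python
def stamp_via(via: str, src_ip: str, src_port: int) -> str:
    """Add 'received' (RFC 3261) and fill in 'rport' (RFC 3581) on a Via header.

    Assumes a single Via value with an IPv4 address or hostname in sent-by.
    """
    name, _, value = via.partition(":")
    sent_protocol, _, rest = value.strip().partition(" ")
    sent_by, *params = rest.strip().split(";")
    host = sent_by.split(":")[0]
    out, wants_rport = [], False
    for param in params:
        if param.strip().lower() == "rport":  # client asked for symmetric response routing
            wants_rport = True
            out.append(f"rport={src_port}")
        else:
            out.append(param)
    # 'received' is added when sent-by differs from the packet's source address,
    # and always when rport was requested.
    if wants_rport or host != src_ip:
        out.append(f"received={src_ip}")
    return f"{name}: {sent_protocol} " + ";".join([sent_by] + out)


# Client behind NAT advertises its private address; the packet arrives from the NAT's public side.
print(stamp_via("Via: SIP/2.0/UDP 192.168.1.20:5060;branch=z9hG4bK776asdhds;rport",
                "203.0.113.5", 40123))
```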

A more recent IETF method for solving the NAT traversal problem is Interactive Connectivity Establishment (ICE). Some SIP clients already support it, and it can often be added via a firmware upgrade.
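ICE collects several candidate addresses for each media stream: local "host" addresses, server-reflexive addresses learned via STUN, and relayed addresses. It ranks them with a priority formula from the ICE specification (RFC 8445): priority = 2^24 × type preference + 2^8 × local preference + (256 − component ID). The sketch below uses the recommended type preferences. The function name and its defaults (a single-homed host, component 1 for RTP) are assumptions for illustration.

```python
# Recommended type preferences from the ICE specification (RFC 8445).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    if not 0 <= local_pref <= 65535:
        raise ValueError("local preference must fit in 16 bits")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be between 1 and 256")
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_pref << 8)
            + (256 - component_id))


for kind in ("host", "srflx", "relay"):
    # Component 1 = RTP; a single-homed host uses local preference 65535.
    print(kind, candidate_priority(kind))
# host 2130706431, srflx 1694498815, relay 16777215
```

Direct host candidates are therefore tried before STUN-derived ones, and relayed candidates serve as the fallback.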

Another solution to the NAT traversal problem is the Application Layer Gateway (ALG). An ALG is a SIP proxy installed on a NAT router or firewall and placed in the traffic path, where it ensures that SIP signaling and media flows pass through smoothly. An ALG can automatically open the ports needed for SIP calls on a firewall and mark RTP media streams with DiffServ bits. This allows the media packets to be transported with higher priority over IP networks that support it. The public Internet, however, generally offers no prioritization (see network neutrality).
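DiffServ marking means writing a DiffServ code point (DSCP) into the upper six bits of the former IPv4 TOS byte (RFC 2474). Voice is commonly marked with Expedited Forwarding (EF, DSCP 46, RFC 3246). The sketch shows the marking as an endpoint would apply it to its own RTP socket, not how an ALG rewrites packets in transit. The helper names are hypothetical. Whether the operating system honours `IP_TOS` is platform-dependent.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, commonly used for voice


def tos_from_dscp(dscp: int) -> int:
    """DSCP occupies the upper 6 bits of the IPv4 TOS byte; the low 2 bits are ECN."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return dscp << 2


def open_marked_rtp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_from_dscp(dscp))
    return sock


print(tos_from_dscp(DSCP_EF))  # 184 (0xB8)
```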

When IPv6 is used as the network protocol, NAT is generally not required, so the problems typical of NAT do not need to be circumvented. Only the firewall problem remains.


A SIP request could look like this. It consists of a start line, the header fields, an empty line and the body (here an SDP description):

    INVITE sip:8495302002@ SIP/2.0
    Via: SIP/2.0/UDP ;branch=1
    From: sip:8495305005@;tag=29ae1249
    Max-Forwards: 70
    To: sip:8495302002@
    Call-ID: 48c7df2a9b4@myvoip1
    CSeq: 1 INVITE
    Contact: sip:8495305005@
    Content-Length: 202
    Supported: 100rel
    Content-Type: application/sdp

    v=0
    o=Anonymous 1234567890 1234567890 IN IP4
    s=SIGMA is the best
    c=IN IP4
    t=0 0
    m=audio 6006 RTP/AVP 8 3 0
    a=rtpmap:8 PCMA/8000
    a=rtpmap:3 GSM/8000
    a=rtpmap:0 PCMU/8000

And such a SIP response:

    SIP/2.0 200 OK
    Via: SIP/2.0/UDP ;branch=z5K8DSbCGCL8593033654
    From: sip:8495305005@;tag=6248550609-457625817474016
    To: sip:8495302002@;user=phone;tag=2e679cbc
    Call-ID: 6248550609-781762546450147
    CSeq: 15 INVITE
    Contact: sip:8495302002@
    Content-Length: 191
    Content-Type: application/sdp

    v=0
    o=Anonymous 1234567890 7894561230 IN IP4
    s=SIGMA is the best
    c=IN IP4
    t=0 0
    m=audio 6006 RTP/AVP 8 0
    a=rtpmap:8 PCMA/8000
    a=rtpmap:0 PCMU/8000
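The message structure shown in these examples can be taken apart with a short parser. The start line comes first, then "Name: value" header fields, then an empty line, then the body. This is a minimal sketch with a hypothetical function name. It does not handle compact header names (such as `l` for Content-Length), header line folding, or comma-separated header values. If Content-Length is absent, the body is taken to run to the end of the data, as for a UDP datagram.

```python
from typing import Dict, List


def parse_sip_message(raw: bytes) -> dict:
    """Split a SIP message into start line, header fields and body."""
    head, sep, rest = raw.partition(b"\r\n\r\n")  # empty line ends the headers
    if not sep:
        raise ValueError("no empty line after the header fields")
    start_line, *header_lines = head.decode("utf-8").split("\r\n")
    headers: Dict[str, List[str]] = {}
    for line in header_lines:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    if "content-length" in headers:
        length = int(headers["content-length"][0])
        if len(rest) < length:
            raise ValueError("body shorter than Content-Length")
        body = rest[:length]
    else:
        body = rest  # e.g. UDP: the body runs to the end of the datagram
    kind = "response" if start_line.startswith("SIP/2.0 ") else "request"
    return {"kind": kind, "start_line": start_line, "headers": headers, "body": body}


sdp = "v=0\r\nt=0 0\r\nm=audio 6006 RTP/AVP 8 0\r\n"
raw = ("SIP/2.0 200 OK\r\n"
       "Call-ID: 6248550609-781762546450147\r\n"
       "CSeq: 15 INVITE\r\n"
       "Content-Type: application/sdp\r\n"
       f"Content-Length: {len(sdp.encode())}\r\n"
       "\r\n" + sdp).encode("utf-8")
msg = parse_sip_message(raw)
print(msg["kind"], msg["headers"]["cseq"], msg["body"].decode().splitlines()[-1])
# response ['15 INVITE'] m=audio 6006 RTP/AVP 8 0
```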

See also


  • RFC 2543 - SIP (obsolete version)
  • RFC 3261 - SIP
  • RFC 3265 - SIP-Specific Event Notification
  • RFC 3515 - SIP Update: SIP Refer Method
  • RFC 3665 - SIP Basic Call Flow Examples
  • RFC 3581 - SIP Update: Symmetric Response Routing
  • RFC 3853 - SIP Update: Use of AES instead of 3DES
  • RFC 4320 - SIP Update: Issues with the SIP Non-INVITE Transaction
  • RFC 4916 - Connected Identity in the Session Initiation Protocol


Literature

  • Ulrich Trick, Frank Weber: SIP and telecommunications networks. Next Generation Networks and Multimedia over IP - specifically. De Gruyter Oldenbourg, 2015, ISBN 978-3-486-77853-3


References

  1. Using Session Initiation Protocol to build Context-Aware VoIP Support for Multiplayer Networked Games (PDF; 283 kB) by Aameek Singh and Arup Acharya
  2. SIP Session Initiation Protocol structure -, accessed on November 14, 2013