H.264 Evolves For The Internet
In a point-to-point videoconference, the key components are the display device, camera, microphone, and the codec. The codec handles the analog-to-digital conversion, compression, and packetization of digital audio and video streams. It also sends and receives the packets. In the past, videoconferencing systems used codecs from the H.320 line of standards, such as H.261. They worked well on dedicated circuits such as T1s, but quality suffered over the public Internet. Packets carrying video are extremely time-sensitive: a single frame of video may be represented by hundreds of packets, which can arrive erratically if there's any congestion and queuing in the network routers. This variable arrival time, called jitter, requires a receive buffer to smooth out the video.
With H.264 AVC, also known as MPEG-4 Part 10, the industry believed it had the answer to delivering video via IP to varied endpoints. With profiles that supported bit rates from 56 Kbps up to 27 Mbps, vendors thought they could handle everything from mobile clients to broadcast studio cameras.
But one big problem remained: packet loss.
With H.264 AVC compression, even loss rates of just 1% to 2% had a devastating effect on video quality. That's because it often takes a hundred or more packets to represent a single video frame. In MPEG compression, only about every 12th frame, called a key or index frame, actually contains enough information to decode an entire picture. The remaining 11 frames are derived from information in the key frame. If some of the packets for the key frame are lost, the key frame can't be reconstructed cleanly, and the damage carries over into the next 11 frames.
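To see why such small loss rates do so much damage, a rough calculation helps. The Python sketch below assumes each packet is lost independently with the same probability, which is a simplification, since real losses tend to come in bursts. The function name is ours, and the 100-packet and 12-frame figures are illustrative values taken from the description above, not measurements.

```python
def frame_damage_probability(loss_rate: float, packets_per_frame: int) -> float:
    """Chance that at least one packet of a frame is lost.

    Assumes independent, identically distributed packet loss (a simplification).
    """
    return 1.0 - (1.0 - loss_rate) ** packets_per_frame


PACKETS_PER_KEY_FRAME = 100   # "a hundred or more packets" per frame
FRAMES_PER_GROUP = 12         # one key frame plus 11 dependent frames

for loss_rate in (0.01, 0.02):
    p = frame_damage_probability(loss_rate, PACKETS_PER_KEY_FRAME)
    print(f"{loss_rate:.0%} loss: {p:.0%} chance the key frame is damaged, "
          f"affecting up to {FRAMES_PER_GROUP} frames")
```

Under these assumptions, 1% packet loss damages a 100-packet key frame roughly 63% of the time, and 2% loss roughly 87% of the time.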
Vendors scrambled to find a way to conceal errors. The resulting solutions generally involved a form of forward-error correction (FEC). In these techniques, a few extra bits are added to the payload so that the receiver can accumulate an error-correction code. In video FEC, the sending station splits the code and transmits it by placing 1 or 2 bits in each byte or block of data. The catch is that extra bits mean extra bandwidth, and accumulating enough packets to assemble the code means additional delay. So while FEC techniques generally work, they come at a cost.
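The bandwidth and delay costs show up in even the simplest packet-level FEC. The Python sketch below is not the interleaved-bit scheme described above, nor any vendor's method. It is a plain XOR parity code, chosen only for illustration, in which one parity packet protects a group of equal-length packets. The function names and the group size of four are assumptions.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet for a group of equal-length packets."""
    return reduce(xor_bytes, packets)


def recover_group(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild a group in which at most one packet (marked None) was lost."""
    missing = [i for i, pkt in enumerate(received) if pkt is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair only one lost packet")
    repaired = list(received)
    if missing:
        present = [pkt for pkt in received if pkt is not None]
        # XOR of the parity with every surviving packet yields the lost one.
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


group = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # four data packets
parity = make_parity(group)                    # one extra packet on the wire
damaged = [group[0], None, group[2], group[3]] # second packet lost in transit
print(recover_group(damaged, parity) == group)
```

With one parity packet for every four data packets, the sender uses 25% more bandwidth, and the receiver can't repair a loss until the rest of the group and the parity packet have arrived.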
About a decade ago, a small group of vendors decided to take a different approach. They proposed Scalable Video Coding (SVC), an extension to the H.264 standard. In this technique, the original video stream is divided into three substreams. The base stream provides enough information to enable the receiving system to present an acceptable low-resolution image. It carries aggressive error protection, and in most cases, this stream will be delivered even if the network is suffering 3% or more packet loss.
A second stream contains the information to raise the presentation to the level of standard definition and has minimal loss protection. The third stream is the data necessary to present an HD video experience; it has little or no error protection. This three-pronged system works because most of our networks have loss only for very short periods of time, say 10 to 20 milliseconds. When a limited amount of loss occurs, the receiver is presented with a lower-resolution image. These periods tend to be so brief that the typical user doesn't even perceive them.
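The receiver's side of this scheme can be sketched in a few lines. The Python below is a simplified model, not an SVC decoder. It assumes the three substreams described above and assumes that each enhancement stream is usable only when every stream beneath it has arrived intact. The function and label names are hypothetical.

```python
def displayed_quality(base_ok: bool, sd_ok: bool, hd_ok: bool) -> str:
    """Pick the best picture the receiver can show from the streams that arrived."""
    if not base_ok:
        return "no picture"            # nothing to build on
    if not sd_ok:
        return "low resolution"        # base stream only
    if not hd_ok:
        return "standard definition"   # base plus the second stream
    return "high definition"           # all three streams intact


# A brief burst of loss that hits only the unprotected third stream:
print(displayed_quality(base_ok=True, sd_ok=True, hd_ok=False))
```

In this model, a short burst of loss lowers the resolution for a moment instead of corrupting the picture, which is the behavior described above.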
SVC is important enough that vendors including Avaya, HP, LifeSize, Polycom, Radvision, and Vidyo have delivered, or have announced their intention to deliver, products based on the standard. Companies that have a large installed base of H.264 AVC systems will need to use gateways to integrate the newer SVC systems. The gateway is most often a hardware device, although some vendors implement it in software by adding it to the server that acts as the call setup device.
We anticipate gradually escalating adoption of desktop videoconferencing based on H.264 SVC as the market becomes aware of its capabilities and compelling price points. As with VoIP and telephony in 2000, few people thought the Internet could be used to connect video devices for high-quality conferencing. Now that it is feasible, adoption should spread quickly.