Video codecs operate similarly to the software that is commonly used to compress computer files for easier transmission or storage. The biggest difference, however, is the speed at which a video codec must operate. While waiting a few seconds to decompress 2 MB to 3 MB of data may be tolerable for a computer user, a video codec must do that much or more every second, depending on the resolution of the original signal. Just as file compression creates smaller files for easier storage and transmission, a video codec produces a lower-bandwidth stream that carries the images without noticeable loss.
Video codecs are able to do their job because a video signal adheres to certain rules. These rules are few and simple to follow. An incoming video signal comprises frames in a set resolution, transmitted at a set frame rate. By contrast, data needing to be compressed on your hard drive could be in any of many different formats, each with its own rules for how the data is laid out.
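Because resolution and frame rate are fixed, the uncompressed data rate of a signal is simple arithmetic. The sketch below works it out for a hypothetical 640x480, 30-frames-per-second, 24-bit signal; those figures are illustrative assumptions, not values from any particular standard.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video data rate in bits per second."""
    return width * height * bits_per_pixel * fps

# Hypothetical example signal: 640x480 at 30 frames per second, 24-bit color.
rate = raw_bitrate_bps(640, 480, 30)
print(f"{rate / 1e6:.1f} Mbps, or {rate / 8 / 1e6:.1f} MB every second")
```

Even this modest signal is far beyond the 2 MB to 3 MB per second mentioned earlier, which is why the codec has to work so quickly.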
Ten years ago, when the first video codec standards started to emerge, all the work of compressing and decompressing the video signal had to be done within specialized chips. The CPUs back then were unable to perform the necessary calculations fast enough. With today's speedy processors, most video decompression can occur on the desktop computer without the need for special video cards or extra video-specific processors.
All Codecs are Created Equal
Every video codec works in essentially the same manner: it takes an incoming video stream, compresses the signal by deleting extraneous or duplicate information, and transmits it. At the receiving end, the codec decompresses the signal and displays it on the monitor or television.
The primary difference among codecs is how tightly or how quickly they can compress the stream. Tight compression uses less bandwidth for transmission or storage but may require more processing horsepower than is available in a given amount of time, making it impossible to compress the signal in real time. Faster compressors may take shortcuts to get to the end result, leaving a video signal that does not retain all the original quality and clarity or isn't compressed as tightly. The trick is to find the balance between small files and real-time compression that gives the best quality.
Video codecs are typically classified as lossy or lossless. Strictly speaking, a lossless codec reproduces the original signal exactly, whereas a lossy codec discards some information. In everyday use, lossy has come to mean a visible loss of quality and lossless an image with imperceptible loss. Truth be told, most video codecs are lossy to some extent. Because video quality is subjective, some amount of loss is acceptable without degrading the experience.
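The distinction can be shown with a toy example. This is a minimal sketch, not how any video codec is implemented: it uses Python's general-purpose zlib compressor for the lossless case and a made-up quantization step (STEP is a hypothetical value) to stand in for the information a lossy codec throws away.

```python
import zlib

# A short run of made-up 8-bit sample values, repeated to give zlib something to find.
samples = bytes([10, 11, 13, 200, 201, 203, 90, 91] * 32)

# Lossless: the decompressed bytes are identical to the input.
restored = zlib.decompress(zlib.compress(samples))

# Lossy (toy): round every sample down to a multiple of STEP, then compress.
STEP = 8  # hypothetical quantization step
quantized = bytes((s // STEP) * STEP for s in samples)
lossy_restored = zlib.decompress(zlib.compress(quantized))

# The largest per-sample error introduced by the lossy path.
max_error = max(abs(a - b) for a, b in zip(samples, lossy_restored))

print("lossless round trip exact:", restored == samples)
print("lossy round trip exact:", lossy_restored == samples, "max error:", max_error)
```

The lossy path never gets the original values back, but the error stays below the quantization step, which is the kind of bounded loss a viewer may not notice.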
The MPEG Family
The most recognizable form of video codec is the MPEG (Moving Picture Experts Group) family. MPEG started out as a committee dream back in 1988, and the first standard was formalized in November 1992 as MPEG-1 (ISO/IEC 11172). Leonardo Chiariglione and Hiroshi Yasuda are the fathers of the MPEG standards we use today. MPEG-1 began as a standard for broadcasting noninterlaced (progressive scan) video and audio at a combined constant bit rate of 1.5 Mbps. The video portion of the signal occupies 1.15 Mbps, and the audio occupies the remainder.
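To put those bit rates in perspective, the following sketch works out the audio share and a rough compression ratio. The 1.5 Mbps and 1.15 Mbps figures come from the text; the source format (352x240 at 30 frames per second, 12 bits per pixel) is an assumption chosen for illustration.

```python
# Figures from the text: 1.5 Mbps combined, of which 1.15 Mbps is video.
TOTAL_KBPS = 1500
VIDEO_KBPS = 1150
audio_kbps = TOTAL_KBPS - VIDEO_KBPS

# Assumed source format (not from the text): 352x240, 30 fps, 12 bits per pixel.
WIDTH, HEIGHT, FPS, BITS_PER_PIXEL = 352, 240, 30, 12
raw_kbps = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS / 1000

compression_ratio = raw_kbps / VIDEO_KBPS
bits_per_frame = VIDEO_KBPS * 1000 / FPS  # average budget per coded frame

print(f"audio: {audio_kbps} kbps")
print(f"raw video: {raw_kbps:.0f} kbps, ratio about {compression_ratio:.0f}:1")
print(f"average budget: {bits_per_frame / 8 / 1024:.1f} KB per frame")
```

Under these assumptions the encoder has to squeeze each frame into an average of under 5 KB.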
MPEG-1 was designed as a motion-based extension to the JPEG (Joint Photographic Experts Group) still-image format, which was gaining wide popularity in the desktop publishing world. MPEG-1 can also trace its roots back to the H.261 video standard used for teleconferencing. The MPEG-1 bitstream is broken into three main pieces: system, video and audio. System covers the bitstream itself and its format. The video and audio sections are the compressed streams.
The video stream is a constant-bit-rate sequence of individual frames. Each frame is coded as an I (intraframe), P (predicted) or B (bidirectional) frame. An intraframe contains all the data necessary to describe that particular frame and begins each GOP (group of pictures). A predicted frame is described as the difference between itself and the previous I or P frame. A basic MPEG stream is thereby created as an I frame followed by 11 to 14 P frames before the next I frame.
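The idea of an I frame followed by difference-coded P frames can be sketched in a few lines. This is a deliberately simplified model: frames are short lists of made-up sample values, and each P frame stores a plain sample-by-sample difference, whereas a real MPEG encoder also applies motion prediction and further compression. The function names are hypothetical.

```python
def encode_gop(frames):
    """Store the first frame whole (I) and each later frame as a difference (P)."""
    encoded = [("I", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append(("P", [c - p for c, p in zip(cur, prev)]))
    return encoded

def decode_gop(encoded):
    """Rebuild every frame by adding each P difference to the frame before it."""
    frames = []
    for kind, data in encoded:
        if kind == "I":
            frames.append(list(data))
        else:
            frames.append([p + d for p, d in zip(frames[-1], data)])
    return frames

# Three tiny made-up frames in which only one sample changes at a time.
frames = [[5, 5, 5, 5], [5, 5, 6, 5], [5, 5, 6, 7]]
encoded = encode_gop(frames)
print(encoded)
print(decode_gop(encoded) == frames)
```

When little changes between frames, the P frames are mostly zeros, which is what makes them so much cheaper to compress than an I frame.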
The MPEG-1 stream can also use B frames between an I frame and a P frame or between two P frames. Typically, two B frames sit between consecutive I or P frames. A B frame describes itself as the difference between the previous I or P frame and the following one.
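As an illustration, the typical arrangement of two B frames between I or P frames produces a repeating pattern in display order. The helper below is a hypothetical sketch that simply spells that pattern out.

```python
def gop_pattern(num_anchors, b_between=2):
    """Display-order frame types: one I frame, then B frames ahead of each P frame."""
    pattern = ["I"]
    for _ in range(num_anchors - 1):
        pattern.extend(["B"] * b_between)
        pattern.append("P")
    return "".join(pattern)

print(gop_pattern(4))     # IBBPBBPBBP
print(gop_pattern(3, 0))  # IPP (no B frames at all)
```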
Use of B frames requires more RAM on both the encoder and the decoder. On the encode side, the frames to be compressed as B frames have to be stored and kept until the eventual P frame is calculated. On the decode side, both of the surrounding I or P frames have to be held in memory so that the B frames between them can be reconstructed.
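The encoder-side buffering can be sketched as a simple reordering step: B frames wait in memory until the next I or P frame has been emitted, then follow it into the stream. The function name and the frame labels are hypothetical, and a real encoder does far more than shuffle labels.

```python
def to_coded_order(display):
    """Reorder frames so each B frame follows the I or P frame it was waiting for."""
    coded, pending_b = [], []
    for frame in display:
        if frame[0] == "B":
            pending_b.append(frame)   # held in memory until the next I or P frame
        else:
            coded.append(frame)       # the I or P frame goes out first
            coded.extend(pending_b)   # then the B frames that were waiting on it
            pending_b = []
    coded.extend(pending_b)           # flush any B frames left at the end of the input
    return coded

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(to_coded_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The `pending_b` list is the extra memory the text refers to: with two B frames between I or P frames, it never holds more than two frames at once.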
Going a step deeper into the stream is the macroblock, which is a collection of pixels or PELs (picture elements). The PELs within the macroblock are compressed, then the motion for the block is predicted according to whether the frame is an I, B or P frame. The motion is defined as the movement or change in position of PELs from one frame to the next (see "Video Stream" diagram, above).
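One common way to find the motion of a block is an exhaustive block-matching search, sketched below. This is an illustrative technique, not necessarily what any given MPEG encoder does: it compares a block in the current frame against shifted positions in the previous frame and keeps the offset with the smallest sum of absolute differences. The function names, the 4x4 block size and the search range of 2 PELs are assumptions chosen to keep the example small.

```python
def sad(prev, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in prev."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total

def best_motion_vector(prev, cur, bx, by, size=4, search=2):
    """Try every offset within the search range; return (dx, dy, cost) of the best match."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip offsets that would read outside the previous frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(prev, cur, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]

# Made-up 8x8 frame, and a second frame with the same content moved one PEL to the right.
prev = [[x * 7 + y * 13 for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in prev]

print(best_motion_vector(prev, cur, bx=2, by=2))  # (-1, 0, 0)
```

The result says the block's content is found one PEL to the left in the previous frame, with a cost of zero, so only the offset needs to be sent rather than the PELs themselves.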
The macroblock is one of the differences between MPEG-1 and MPEG-2 (ISO/IEC 13818). The amount of data carried in an MPEG-2 macroblock is variable, and a macroblock can include more information than in MPEG-1. This contributes to the variable bit rate common in MPEG-2 streams. The stream is held to a maximum bit rate, but the actual rate can be lower, depending on the video being encoded. The encoder can thereby simplify the stream and reduce the number of coded macroblocks if large areas of the image are not changing much.
MPEG-2 also has support for better audio. MPEG-1 supports only stereo audio synchronized within the stream. MPEG-2 separates the audio, allowing for anything from stereo to multiple channel surround sound to be included with the stream. On the video-quality side, MPEG-2 introduced the ability to encode 10-bit color information, along with handling interlaced video signals.
MPEG-2 can deliver content in one of two forms: as a program stream or as a transport stream. A program stream combines audio and video streams sharing a common time base. It is designed for relatively error-free environments. Packet length is variable, and packets can be relatively long.
A transport stream lets the audio and video run on separate time bases. This allows better transport over networks that are error prone or prone to transmission noise. Transport streams use fixed-length packets of 188 bytes.
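The fixed 188-byte packet makes a transport stream easy to split and inspect. The sketch below reads three fields from the 4-byte header that starts each packet: the sync byte (0x47), the 13-bit packet identifier (PID) and the 4-bit continuity counter. The function names and the sample packet are hypothetical, and the code ignores the remaining header fields and the payload.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def split_packets(stream):
    """Cut a byte string into consecutive 188-byte transport packets."""
    return [stream[i:i + TS_PACKET_SIZE] for i in range(0, len(stream), TS_PACKET_SIZE)]

def parse_ts_header(packet):
    """Extract a few header fields from one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport stream packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "continuity_counter": packet[3] & 0x0F,
    }

# Made-up packet: PID 0x100, payload-unit-start flag set, continuity counter 5,
# followed by 184 bytes of zero payload.
sample = bytes([0x47, 0x41, 0x00, 0x15]) + bytes(184)
stream = sample * 3

packets = split_packets(stream)
print(len(packets), parse_ts_header(packets[0]))
```

Because every packet is the same length and starts with the same sync byte, a receiver that loses its place on a noisy link can find the next packet boundary again quickly.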