FEATURE
Videoconferencing: Bytes, Camera, Action!
by Dave Brown
The future of proprietary algorithms for videoconferencing is as bright
as the future of EBCDIC for data exchange. In the early days of two-way
compressed video, manufacturers engineered specialized, and somewhat secret,
encoding techniques to produce the best possible video and sound between
their own equipment. Yet as the CCITT (now the International Telecommunication
Union) began to negotiate standards, customers demanded interoperability,
and manufacturers had to start offering H.320 options.
These days, all new videoconferencing equipment can observe the standards.
Most manufacturers have shelved work on proprietary audio/video compression
algorithms and are pouring significant development into improving the way
they handle the H.320 protocol suite.
While prospective buyers of video teleconferencing terminals (VTT) have
seen demonstrations between systems from a single vendor running under the
best available conditions, many need to communicate under ISDN basic rate
interface (BRI) line conditions with multivendor equipment. With that in
mind, we tested VTT codecs in standards mode over 112-Kbps connections.
Real-life conditions in the U.S. switched telephone network necessitated
our choice of 112 Kbps, rather than the 128 Kbps possible over an ISDN clear
BRI. Clear 64-Kbps per channel ISDN is routinely connected within Local
Access Transport Areas (LATAs) in many parts of the country, but most inter-LATA
trunks still use 56-Kbps channels. In our tests, all cross-country calls
but one connected at 112 Kbps. Only CLI, dialing from San Jose, Calif.,
came in at 128 Kbps. (To normalize our performance numbers, we adjusted
CLI's channel-limited frame rates down to 112-Kbps equivalents, but let
its codec-limited frame rates stand.)
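The adjustment can be pictured as a simple proportional scaling. The sketch below assumes that a channel-limited frame rate scales linearly with channel speed; that assumption, the function name and the sample figure are illustrative only and are not a record of the exact method used in the tests.

```python
def normalize_frame_rate(measured_fps, measured_kbps=128, target_kbps=112):
    """Scale a channel-limited frame rate to its equivalent at target_kbps.

    Assumes (for illustration only) that frame rate is proportional to
    channel speed when the channel, not the codec, is the bottleneck.
    """
    return measured_fps * target_kbps / measured_kbps


# Hypothetical example: 8 frames per second measured on a 128-Kbps call.
print(normalize_frame_rate(8.0))  # 7.0
```

Codec-limited frame rates would be left alone, since a faster channel would not have raised them.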
You can bet that if things can look reasonably good using codecs that operate
at 112 Kbps, they'll certainly look better at higher connection rates.
Testing the Videoconferencing Systems
We produced a four-minute VHS
video tape with visible frame numbers and shipped an identical copy to each
participating manufacturer. The tape's test clips were designed to generate
various audio patterns and challenge video compression performance. We depict
the results as six graphs on pages 49 through 52.
All participants played the video tape through equipment of their choice
via single-line ISDN to the same RSI Systems ERIS decoder in Network Computing's
test laboratory at the University of Wisconsin-Madison's Engineering campus.
If the sending codec could support more than one H.320 audio format, we
tested using G.711 (which leaves only 56 Kbps available for video) and again
using G.728 (which allows 96 Kbps to carry the video).
A VHS deck in our laboratory captured the reference codec's output, so the
tapes could be analyzed and actual transmission frame rates could be calculated
for each sequence in the test.
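One way to turn such a recording into a frame rate is to read the visible source frame number in every captured frame and count how many distinct pictures arrived. The following is a minimal sketch under assumptions: the function name, the nominal 30-frames-per-second capture rate and the sample data are hypothetical, not the lab's actual procedure.

```python
def transmitted_frame_rate(observed_frame_numbers, capture_fps=30.0):
    """Estimate the frames per second a codec actually delivered.

    observed_frame_numbers: the source-tape frame number visible in each
    captured output frame, in order. Repeated numbers mean the decoder
    held the same picture across several captured frames.
    capture_fps: assumed nominal frame rate of the capturing VHS deck.
    """
    if not observed_frame_numbers:
        raise ValueError("need at least one captured frame")
    distinct_pictures = len(set(observed_frame_numbers))
    duration_seconds = len(observed_frame_numbers) / capture_fps
    return distinct_pictures / duration_seconds


# Hypothetical one-second capture in which a new picture arrives
# every fifth frame: 0, 0, 0, 0, 0, 5, 5, ...
one_second = [n - n % 5 for n in range(30)]
print(transmitted_frame_rate(one_second))  # 6.0
```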
RSI Systems ERIS
To obtain a uniform target for our testing, we borrowed
an ERIS desktop box from RSI Systems, Edina, Minn., and immediately loved
it. ERIS is externally packaged in a 10.5-inch long, 6-inch wide, 5.5-inch
deep box that connects to either a Windows PC or a Macintosh via SCSI cable.
Its current list price is $4,495. A new model to be announced in the next
few months will cost less than $4,000 and will include a standalone option.
As a desktop system, the ERIS provides excellent audio with a freestanding,
stalk-like Telex microphone and 3-inch speaker built into the front of the
unit. An internal microphone can be used instead of the Telex (producing
a less satisfactory "hollow" sound).
We recognize a great future for the ERIS as the core of a build-it-yourself
room system or executive conference facility. The box has Main and Auxiliary
video inputs, each of which can accept SVideo or RCA plugs. Video output
also is provided on SVideo and RCA. Audio inputs are available for the external
microphone, and a VCR or audio mixer. It has a separate output jack for
external speakers, plus two isolated line outputs: one providing local transmitted
audio, the other for received audio.
ERIS supports FCIF video resolution and all three of the current audio algorithms:
G.728, G.722 and G.711. In our performance testing, this codec ranked with
the best room systems.
ERIS' only limit to masquerading as a room system is its single BRI ISDN
interface. Having no RS-449 and RS-366 ports for external IMUXes, current
models are intended to play only in the single-line ISDN arena at no more
than 128 Kbps. Since that is the most pervasive and fastest-growing conferencing
arena, this is not much of a handicap!
Film at 11
We tested four codecs in "room" systems and
six more that manufacturers package as desktop models. Results show that
contemporary videoconferencing systems operating in standards mode are,
well, pretty standard. Some, like PictureTel's 4000/ZX, can crank out impressive
frame rates when the picture is steady, but all were brought to their knees
by the diabolical fine resolution and whirling pattern tests we put in our
benchmark. In the end, the codecs in desktop systems perform as well as or
better than those in room systems. We've withheld top scores for the next
round of improvements, where you'll see more computational power and predictive
intelligence built into future video encoding processors.
If today's codecs are all pretty much the same, the finish and various options
are what distinguish these products. Many of these items will be detailed
in our Buyer's Guide on videoconference systems in our March 15 issue, and
we'll review them specifically later this year. We focus here, instead,
on what we think are the most significant issues for your purchasing decision:
price, audio support and video quality.
Price: What a Picture's Worth
Manufacturers of the room systems we
tested offer models that cluster in price around $20,000 and in the $40,000
to $60,000 range. For $20,000, you can buy a rollabout configuration with
a single 27-inch or better monitor, pan-tilt-zoom camera and user-friendly
control panel. The more expensive systems are usually built into executive
conferencing rooms. They have two pan-tilt-zoom cameras and a document camera.
Large monitors show what's coming from the other end and preview the camera
shots you're about to send. Better quality cameras on the high-end room
systems also contribute to perceived picture quality.
Room system manufacturers aim at specific market niches. VTEL supports telemedicine
with interfaces that can accept sound and video from clinical diagnostic
equipment. PictureTel and CLI have been traditionally strong in the administrative
conference room setting.
In the desktop arena, the tradeoffs are in price and computational load
sharing. The PictureTel LIVE 100 and VTEL Enterprise Series, each listing
for less than $5,000, put all the codec's computational power on circuit
boards that plug into a Windows-based PC. RSI's ERIS system (priced at $4,495)
is similar, but it sits externally on the desktop and can hook into either
a Windows or Macintosh PC via SCSI.
The Intel ProShare ($1,999 list, but you can get it for as low as $999 with
rebates from long distance and local telcos) needs computational support
from its Windows PC host, which should be at least a 486 DX2 or a Pentium
with 16 MB RAM. Apple PowerMacs require very little additional investment
for videoconferencing. Our tests showed the 7100 AV performs codec functions
very well with QuickTime and the Mac's inherent video processing power.
Do You Hear What I Hear?
Poor audio can quickly give end users a
bad taste for videoconferencing. Our testing showed that all manufacturers
had excellent coder-to-decoder audio performance, even when using the G.728
audio compression algorithm. It's in the choice of microphones, noise suppression
and echo canceling that designers can make their systems sound extremely
good or horrible. Aggressive settings of noise cancellation, for example,
can make normal voices sound gargly and choppy.
The trickiest feat for manufacturers is to design microphones that can pick
up normal speaking voices from anywhere in the room, but not pick up loudspeaker
sound and feed your own voice back with an annoying delay. Before buying,
get end-to-end, whole-system demonstrations that test specifically with a
variety of sound sources.
You must investigate one particularly important interaction between audio
and video, especially if you intend to conference routinely at speeds as
low as 112 Kbps. H.320 defines a choice among three audio algorithms: G.711,
which requires as much as 64 Kbps bandwidth, but typically uses 56 Kbps;
G.722, which requires 48 Kbps; and G.728, which gives very good sound, but
uses only 16 Kbps. During a standard call setup, the audio algorithm is
negotiated first. Whatever bandwidth remains is allocated to the H.261 video
algorithm. G.728 audio is the best choice if the codec can support it. Wherever
possible, we tested each codec in G.711 and G.728 to demonstrate the resulting
differences in video throughput.
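The trade-off is simple subtraction, as the short sketch below shows. It uses the audio bit rates quoted above (with G.711 at the 56 Kbps it typically uses) and, as a simplifying assumption, ignores any framing or signaling overhead on the channel; the names are illustrative.

```python
# Audio bit rates in Kbps as quoted in the text. G.711 is listed at the
# 56 Kbps it typically uses rather than its 64-Kbps maximum.
AUDIO_KBPS = {"G.711": 56, "G.722": 48, "G.728": 16}


def video_budget_kbps(channel_kbps, audio):
    """Bandwidth left for H.261 video once the audio algorithm is chosen.

    Simplification: framing and signaling overhead are ignored.
    """
    return channel_kbps - AUDIO_KBPS[audio]


for audio in AUDIO_KBPS:
    print(audio, video_budget_kbps(112, audio))
# G.711 56
# G.722 64
# G.728 96
```

On a 112-Kbps call, choosing G.728 over G.711 leaves the video codec 96 Kbps instead of 56 Kbps to work with.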
When Looks Are Everything
All but one codec we tested generated output
in "Full CIF," the best screen resolution (352-by-288 pixels)
defined for H.320 systems. This is only about half the NTSC resolution used
in the U.S., so don't expect studio monitor quality from your videoconferencing
system.
The ProShare 200 using Room Video software (Intel's H.320 offering) looks
good when viewed in a window no larger than one-fifth of your computer screen.
However, the ProShare image played through to a room system monitor will
look coarse and jagged. This is because the ProShare currently transmits
only in "Quarter CIF" (176 x 144 resolution). Intel introduced
its Room Video option only in the past year to satisfy customer demand.
Look for significant improvements in 1996.
The camera and monitor system(s) do the most to improve overall perceived
visual quality for any video conferencing system. As we learned from our
tests, it is the higher investment in controllable cameras with good optics,
large monitors and enhanced sound systems that distinguish the room systems
from the desktops, not the codecs.
We also learned that time base correction is an important concern if you
plan to play videotape through your conferencing system. Tape decks have
inherent frame-to-frame time-sync instability that modern monitors easily
correct. Some codecs, not as forgiving, will transmit jittery or undulating
pictures from a tape source. Only one of our test participants had an obvious
problem with this, and was able to correct it before running the benchmark.
Another, the Matsushita-Panasonic Vision Pro KXC-M7500, appeared to transmit
the most stable benchmark picture. The company reports that this is because
it has "genlock" built into all of its video inputs.
Bumps On The Way To Videoconference
As with a golfer's swing, you can get many different opinions and absolutely
stated recommendations, but you must ultimately decide on your own what
is most comfortable for you.
One big issue is cost. Operating at 336 Kbps requires an upfront investment
in six 56-Kbps digital service lines or, if you can get ISDN, three BRI
lines. ISDN provides a small bonus. Each of the six channels (two per BRI)
can operate at 64 Kbps, so in aggregate, ISDN can do 384 Kbps.
To get this aggregation, you also need to invest in an inverse multiplexer
(IMUX). The IMUX performs "bonding." It combines the separate
input channel bandwidth and delivers one sum-total channel with a clock
signal to the codec. For outgoing calls, the IMUX can accept dialing instructions
from the codec or look up the information from pre-stored call records.
Then it places calls on each of the individual lines, aligns and bonds them
before telling the codec "ready to go."
Inverse multiplexers that handle up to eight 56-Kbps DS0 circuits and deliver
up to 448 Kbps on output can cost as much as $5,000 for new equipment. However,
you will find very few sites with which you can communicate at those speeds.
Most sites equipped to operate at "high speed" choose 336 or 384
Kbps as the upper bound. ISDN-capable IMUXes for up to four BRI inputs typically
sell for less than $4,000.
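The channel arithmetic behind these configurations is easy to check. The sketch below is illustrative only: the function names are ours, and it treats bonding as a plain sum of channel rates, with no allowance for overhead.

```python
def bonded_kbps(channels, per_channel_kbps):
    """Aggregate rate an IMUX delivers from equal-speed channels."""
    return channels * per_channel_kbps


def channels_needed(target_kbps, per_channel_kbps):
    """Smallest number of channels that reaches target_kbps."""
    return -(-target_kbps // per_channel_kbps)  # ceiling division


def bri_lines_needed(target_kbps, per_channel_kbps=64):
    """BRI lines required, at two bearer channels per BRI."""
    return -(-channels_needed(target_kbps, per_channel_kbps) // 2)


print(bonded_kbps(6, 56))     # 336
print(bonded_kbps(6, 64))     # 384
print(bonded_kbps(8, 56))     # 448
print(bri_lines_needed(384))  # 3
```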
Jerky movements can be noticeable in conferences conducted at 112 Kbps or
128 Kbps. The degree depends on how much motion is taking place. Codecs
achieve video compressions of as much as 1400:1 by analyzing each frame
in the transmission image and only sending information about what actually
changes in successive frames. "Talking head" pictures show up
well at low speeds, since most movement is only around the mouth or eyes.
The sharpness of a conference image depends on a codec parameter called
the Common Intermediate Format (CIF). Only the frame presentation rate varies
as a function of available transmission speed. Low speeds cause low frame
rates when there is a lot of motion in the picture.
Conference system designers have found that for applications like training,
the most cost effective strategy is to invest in high-quality, large-screen
installations at participating sites, but to reduce transmission costs by
operating at only 112 or 128 Kbps between the sites. For applications like
telemedicine, the more effective strategy may be to use 336- or 384-Kbps
transmission speeds.
The cost differences between low-speed and high-speed conferences become
most apparent in multipoint situations, when a bridging multipoint control
unit (MCU) links more than two sites. In a recent conference arranged by the author,
four meeting sites within Wisconsin were bridged for two hours using an
MCU operated by an in-state bridging service, Access Wisconsin. During the
meeting, all sites operated at 336 Kbps. Transmission and bridging charges
totaled $996. Had the same meeting been conducted at 112 Kbps, transmission
and bridging charges would have totaled $572.
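Put on a per-site, per-hour basis, the two totals compare as follows. This is only a worked restatement of the figures above; the function name is illustrative, and no tariff details beyond the two totals are assumed.

```python
def cost_per_site_hour(total_dollars, sites, hours):
    """Average charge for one site for one hour of conference time."""
    return total_dollars / (sites * hours)


high_speed = cost_per_site_hour(996, sites=4, hours=2)  # 336 Kbps
low_speed = cost_per_site_hour(572, sites=4, hours=2)   # 112 Kbps

print(high_speed)  # 124.5
print(low_speed)   # 71.5
print(996 - 572)   # 424 dollars saved over the whole meeting
```

Tripling the transmission speed raised the bill by about three-quarters, not threefold.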
Standards: Above And Beyond H.320
H.320 is often mistakenly applied to the entire suite of standards that
define how video conferencing systems from different manufacturers can interoperate.
There actually are five overall standards suites (see chart, below).
The figure on the next page shows how the H.320 suite applies to a standard
video teleconferencing terminal (VTT) that communicates via narrowband ISDN.
N-ISDN covers bandwidths up to 1.544 Mbps, but it is most commonly installed
as one or more 128-Kbps basic rate interface (BRI) lines.
H.261: The Video Encoding Standard
All but the new H.324 VTT, which
will operate over 28.8-Kbps public switched telephone network circuits,
use H.261 as the mandatory video encoding standard. A neat thing about H.261
is that it rigidly defines what the decoder must do with the digital video
bit stream, but it leaves many options open in the encoder.
Most video compression algorithms, including H.261, use Predictive Coding,
Discrete Cosine Transform (DCT), Motion Compensation and Variable Length
Coding techniques to transmit TV pictures of acceptable quality at very
low bit rates. Encoders are highly sophisticated image processors designed
to find redundancy in successive video frames.
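The core idea of finding redundancy between frames can be shown with a toy example that flags only the blocks that changed since the previous frame. This is a deliberately simplified sketch, not the H.261 algorithm: a real encoder adds motion compensation, the DCT, quantization and variable length coding. The function name, threshold and sample frames are hypothetical; the 16-by-16 block size mirrors H.261's luminance macroblock.

```python
def changed_blocks(prev, curr, block=16, threshold=0):
    """Return (top, left) origins of blocks that differ between frames.

    prev, curr: equal-sized 2-D lists of luminance values.
    A block counts as changed when the sum of absolute pixel
    differences inside it exceeds threshold.
    """
    height, width = len(curr), len(curr[0])
    changed = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            difference = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            )
            if difference > threshold:
                changed.append((top, left))
    return changed


# Hypothetical 32 x 32 frame in which a single pixel changes.
prev = [[0] * 32 for _ in range(32)]
curr = [row[:] for row in prev]
curr[20][5] = 255
print(changed_blocks(prev, curr))  # [(16, 0)]
```

Only one of the four blocks would need to be sent; the other three can be repeated from the previous frame.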
Depending on how much computational energy the manufacturer wishes to invest
in a codec's encoder, its apparent intelligence or predictive capability
has no specified limits. Still, any H.261 device installed today can decode
and present the improved image.
Common Intermediate Format (CIF) is a parameter that has a significant effect
on picture quality as well as the ultimate cost of an encoder. Most manufacturers
of room systems use "Full CIF," which provides a screen resolution
equivalent to 352 x 288 pixels. Many desktop system manufacturers can get
away with "Quarter CIF" (QCIF) if they choose to present the video
images in small 176 x 144 screen windows.
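A quick calculation shows why the choice of format matters so much to the encoder. The sketch below assumes 12 bits per pixel (8-bit luminance plus chrominance subsampled by two in each direction) and a nominal 30 frames per second; the names and those two defaults are illustrative assumptions.

```python
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}


def pixels(fmt):
    """Luminance pixels in one frame of the given format."""
    width, height = FORMATS[fmt]
    return width * height


def raw_kbps(fmt, fps=30.0, bits_per_pixel=12):
    """Uncompressed bit rate in Kbps.

    Assumes 12 bits per pixel and a nominal 30 frames per second.
    """
    return pixels(fmt) * bits_per_pixel * fps / 1000


print(pixels("CIF"))   # 101376
print(pixels("QCIF"))  # 25344
print(raw_kbps("CIF") / raw_kbps("QCIF"))  # 4.0
```

A QCIF encoder starts with one-quarter of the data that a Full CIF encoder must squeeze into the same channel.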
G.711, G.722, G.728 Define Audio Encoding
A modern VTT's audio codec
recognizes three ITU standards that define how the sound is encoded: G.711,
G.722 and G.728. While arranging to connect videoconferencing systems that
operate in standards mode, presetting both units to the same audio protocol
is important, as is being sure that the called unit can detect and adjust
to the caller's algorithm.
In the early days of H.320, particularly when channel bit rates of 112 Kbps
were achieved by installing two Switched 56 circuits, a codec's 56-Kbps
Port A would be used by the audio codec in G.711 mode, and the 56-Kbps Port
B would be used by the H.261 video codec. G.711 uses a simple pulse code
modulation technique to achieve audio bandwidth up to 3 KHz.
G.722 was introduced to afford better audio quality (up to 7 KHz) with a
more sophisticated encoding technique: sub-band adaptive differential pulse
code modulation. This requires a bit rate no greater than 48 Kbps. Some
VTT designs that have Port A devoted to audio will give all of the available
speed, either 56 Kbps or 64 Kbps, to the G.722 encoder. Others will always
set up at 48 Kbps, and waste the remaining Port A capacity.
Modern H.320 VTTs are designed for the ISDN world. Most of those configured
for use as room systems can operate over communication channels running
up to 768 Kbps, a few up to T1 speeds. The figure shows how these VTTs can
be connected to an inverse multiplexer (IMUX), a device that "bonds"
multiple ISDN BRI channels and presents one synchronous clocked signal to
the VTT's Port A or Port B. If an IMUX is available to provide higher communication
speeds (typically 384 Kbps), G.711 or G.722 audio algorithms can be employed
without significant effect. You still have 320 Kbps available to the video
codec, and that provides very good picture quality.
However, the most rapidly growing segment of the videoconference system
installed base throughout the world involves VTTs designed for basic rate
interface (BRI) ISDN at no more than 128 Kbps. When these have to operate
in G.711 or G.722, audio ties up half the available speed. The video codec
has to work very hard with the remaining 64 Kbps (or in many situations,
56 Kbps).
The recent G.728 audio standard is a major development that alleviates this
crowding. Using a code-excited linear prediction algorithm, G.728 can provide
audio bandwidth up to 3 KHz using only 16 Kbps of the bonded communications
channel. Over clear channel ISDN, 112 Kbps remains for the video codec.
Dave Brown is retired from the University of Wisconsin and is presently
a consultant on videoconferencing. He can be reached at dave@dbec.com. We
appreciate the cooperation of Bob Perras, Craig Bluschke and UW-Madison
College of Engineering staff, who helped us with our evaluation.
The T.120 Sideshow: Electronic Document Conferencing
An important selling point for PC-based desktop videoconferencing systems
is that many include document conferencing or "shared whiteboard"
software. These packages are real productivity enhancers. For example, they
would allow two engineers separated by a great distance to view and discuss
a product mock-up, while also having a shared screen window on one of their
CAD application programs and files.
Some videoconferencing systems we tested have whiteboard software as a side
offering when communicating in proprietary mode with other systems from the
same manufacturer. None can interoperate with other manufacturers' systems
today. Tomorrow is a different story.
The generic video teleconferencing terminal (VTT) has a data communications
protocol module that will use the new T.120 overall standard for collaborative
applications, including desktop data conferencing, multiuser applications
and multiplayer gaming. T.126 is a specific standard for still image exchange
and annotation. T.122 and T.125 are for multipoint communication services.
T.124, generic conference control, combined with token passing, will handle
remote camera control. Today, one of the few good reasons to get all of
your videoconferencing equipment from the same manufacturer is this ability
to control "near side" and "far side" cameras from one
location.
All major manufacturers have pledged support for the T.120 standard. Many
have already implemented elements of T.120 under the wraps of their current
data communications control modules.
February 27, 1996