Maybe You Have A Case Of The BLOBs
by Tom Henderson
When I was a boy, I went to the movies to see Steve McQueen spray liquid oxygen at a huge red gelatinous alien beast, the Blob. The Blob grew larger as it attacked and ate its victims. In a similar way, data BLOBs eat not only network bandwidth but network storage space, too. BLOBs, or Binary Large Objects, are big single files of noncharacter data. They include graphics files, still images, sound files and other data types that don't contain any "normal" identifiable characters. Where do BLOBs come from? Sound boards, scanners and fax servers have spawned BLOB sightings on networks everywhere. Graphics, once vices and now habits, can consume network bandwidth in ways that network designers plainly didn't anticipate even three years ago.
Identifying Symptoms
The main symptom of the BLOB problem is network congestion. Network response slows for 10 seconds or so, then suddenly returns to normal, as though someone had stepped on the network garden hose. One remedy is to make the hose foot-proof, so to speak, which is possible through isochronous flow control that prioritizes BLOB data. Another is to increase the diameter of the hose. Newer network technologies like 100BASE-T, 100VG-AnyLAN, TCNS and Asynchronous Transfer Mode (ATM) offer fat network pipes that many of us can only dream of, since we have already sunk large mounds of cash into slower Ethernet and Token Ring networks. A third method, and often a better way of dealing with the water, is to shrink it by compressing the data; compression methods are explained later.
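The idea of flow control that "prioritizes BLOB data" can be illustrated with a greatly simplified stand-in: a strict-priority queue on one outbound link that always sends time-critical packets ahead of ordinary traffic. This is a sketch under stated assumptions, not how any particular isochronous network works; real isochronous schemes also reserve bandwidth and timing, which this toy ignores. The names `PriorityLink`, `ISOCHRONOUS` and `BULK` are hypothetical.

```python
import heapq
from itertools import count

# Hypothetical priority levels: a lower number is transmitted first.
ISOCHRONOUS = 0  # time-critical stream data (e.g. video or sound)
BULK = 1         # ordinary traffic that can tolerate delay

class PriorityLink:
    """Toy strict-priority transmit queue for a single outbound link."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within one priority

    def enqueue(self, priority, packet):
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def dequeue(self):
        """Return the next packet to transmit, or None if nothing is queued."""
        if not self._heap:
            return None
        _, _, packet = heapq.heappop(self._heap)
        return packet

link = PriorityLink()
link.enqueue(BULK, "file-block-1")
link.enqueue(ISOCHRONOUS, "video-frame-1")
link.enqueue(BULK, "file-block-2")
link.enqueue(ISOCHRONOUS, "video-frame-2")

order = [link.dequeue() for _ in range(4)]
print(order)  # ['video-frame-1', 'video-frame-2', 'file-block-1', 'file-block-2']
```

Strict priority is the simplest possible policy; its known weakness is that a heavy high-priority stream can starve bulk traffic entirely, which is one reason real schemes cap or reserve the time-critical share.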
Inside the BLOB
Most BLOBs are composed of compressed rasters: XY grids of visual data. If you've ever looked at the dots on a printout from an older dot-matrix printer, you understand the idea of a raster: a grid of horizontal and vertical positions. In richer rasters, each XY point can also carry attributes such as intensity and color. Each attribute multiplies the size of the file roughly in proportion to the number of bits needed to store its possible values; 256 shades of gray, for instance, take 8 bits per dot instead of 1. Color turns mere scanned documents into hard disk pigs.
Some types of noncharacter data aren't typically thought of as BLOBs but still qualify. These include files made from video streams and other data types that need their own timing or must be timed together; such data is referred to as isochronous data. Isochronous data uses a timing reference that's external to the normal timing of network flow control, and that timing is set by the standards associated with the medium from which the multimedia data was digitized. Video timing, for example, follows standards set by the Society of Motion Picture and Television Engineers (SMPTE). When normal network events interrupt an isochronous data stream, they can prevent the data from being delivered with the correct timing.
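To put numbers on how raster attributes inflate file size, the uncompressed size of a raster is width × height × bits per dot, divided by 8 to get bytes. The sketch below assumes a US letter page (8.5 × 11 inches) scanned at 300 dots per inch; the page size, resolution and function name `raster_bytes` are illustrative choices, not figures from the article.

```python
def raster_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed size of an XY raster in bytes, rounded up to a whole byte."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8

# Assumed example: a US letter page scanned at 300 dots per inch.
DPI = 300
width, height = int(8.5 * DPI), 11 * DPI  # 2550 x 3300 dots

for label, bpp in [("1-bit black and white", 1),
                   ("8-bit grayscale", 8),
                   ("24-bit color", 24)]:
    size = raster_bytes(width, height, bpp)
    print(f"{label:22} {size:>12,} bytes")
```

For this page, that works out to 1,051,875 bytes at 1 bit per dot, 8,415,000 bytes at 8 bits and 25,245,000 bytes at 24 bits, before any compression.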
Other BLOBs
Ultrasound streams, Magnetic Resonance Imaging (MRI) scans and seismological data are other examples of large data masses that aren't typically thought of as BLOBs. Multimedia presentations often combine BLOBs with multimedia data streams (like accompanying video) into a scripted sequence of events that makes up the presentation. Video clip files qualify as BLOBs in some ways, but in others they are simply enormous data files.
Think of BLOBs as large files of noncharacter-based data. Humongous files of any kind of data can be BLOBs, but the "Object" part of the term usually denotes data treated as an aggregate or object, and the "Binary" part tends to connote noncharacter or nontokenized data.
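The "Binary" half of the definition, noncharacter data, can be approximated in code. The sketch below is a common rule-of-thumb heuristic, not a standard test: it treats a NUL byte, or a high share of bytes outside printable ASCII and common whitespace, as a sign of binary data. The function name `looks_binary`, the 4,096-byte sample and the 30% threshold are all arbitrary assumptions, and the check assumes plain ASCII text, so text in other encodings may be misclassified.

```python
def looks_binary(data: bytes, sample_size: int = 4096, threshold: float = 0.30) -> bool:
    """Heuristic guess at whether a byte string is noncharacter (binary) data."""
    sample = data[:sample_size]
    if not sample:
        return False          # empty input: nothing suggests binary content
    if b"\x00" in sample:
        return True           # NUL bytes almost never appear in plain text
    # Printable ASCII plus tab, line feed and carriage return count as "text".
    text_bytes = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    nontext = sum(1 for b in sample if b not in text_bytes)
    return nontext / len(sample) > threshold

print(looks_binary(b"Quarterly memo: ship the scanner by Friday.\r\n"))  # False
print(looks_binary(bytes(range(256))))                                 # True
```

Sampling only the first few kilobytes keeps the check cheap on very large files, at the cost of missing binary data that appears later.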
How Big the BLOB?
BLOBs are made by typographical graphics, document imaging, networked fax machinery, video sources, sound devices and presentation graphics software packages. BLOB compression is usually done at the workstation, where the BLOB is first introduced to the network as data, unless the data must be kept whole so that characters can be read from it (as in OCR).
BLOBs are compressed at the source, or at its interface to the PC, at the time they are digitized. Two kinds of compression are popular: software compression, which uses the workstation's CPU to do the work, and hardware compression, which uses a compression/decompression chip (CODEC) or silicon equivalent to off-load compression from the workstation's CPU. Hardware compression helps vastly in workstation configurations that lack multitasking, and it will eventually become ubiquitous. BLOBs can be compressed by as little as 4:1 or as much as 200:1, depending on the data type and how well the data lends itself to compression algorithms. Compression higher than 50:1 is tough to achieve without loss of quality or fidelity. Large standards bodies, notably the IEEE and ANSI/ISO, have come together to agree on compression standards for data interchangeability and repeatability. (See "Bigger And Bigger BLOBs," above.)
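How much a BLOB shrinks depends on how well the data lends itself to the algorithm. The sketch below uses Python's built-in `zlib` as a stand-in for software compression; it is a general-purpose lossless compressor, not one of the image or video standards the article alludes to, and the high ratios quoted above usually come from lossy codecs. The synthetic "page" (mostly zero bytes, assumed here to mean white, with a short black stroke per row) and the random data are illustrative assumptions.

```python
import random
import zlib

def compression_ratio(data: bytes, level: int = 9) -> float:
    """Original size divided by zlib-compressed size (20.0 means 20:1)."""
    compressed = zlib.compress(data, level)
    if zlib.decompress(compressed) != data:  # lossless: the round trip must be exact
        raise RuntimeError("round trip failed")
    return len(data) / len(compressed)

# Mostly white 1-bit raster rows: long uniform runs compress very well.
row = bytes(300) + b"\xff" * 20 + bytes(300)  # white, a black stroke, white
scanned_page = row * 2000

# Random bytes stand in for data with no redundancy (or already compressed).
noise = random.Random(42).randbytes(200_000)

print(f"mostly blank page: {compression_ratio(scanned_page):.0f}:1")
print(f"random data:       {compression_ratio(noise):.2f}:1")
```

The blank page compresses by far more than 50:1 without any loss, while the random data does not shrink at all (its ratio is essentially 1:1), which is why recompressing an already compressed BLOB gains nothing. `random.Random.randbytes` requires Python 3.9 or later.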
Caging the BLOBs
BLOBs are not usually indexed, because they contain no common objects (like characters) that can be used to group them. They do not lend themselves to normal network database storage, although that is improving. Database vendors now accommodate large discrete data types within the database instead of using pointers to find them outside it. Oracle's "long" datatype, which Oracle's RDBMS used for BLOBs in older releases, grew from a maximum of just 64 KB in Oracle 6 to 2 GB in Oracle 7. Also emerging are object databases designed to handle objects of many sizes, including BLOBs. However, when a BLOB exceeds the capacity of network resources, the type of software managing the BLOB matters less than its delivery mechanism.
BLOBs have traditionally been tagged with character-based identifiers and stored as separate entities, or in databases that can accommodate the enormous data sizes associated with BLOBs. Now they live inside the database, and within the next few years, products like Oracle's Media Server should be able to recognize objects (like a brown pair of shoes) in BLOBs in the same way that optical character recognition (OCR) software discerns characters in a scanned document. Say thank you to NASA satellite object recognition technology.
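Storing a BLOB inside the database, tagged with a character-based identifier, can be sketched with Python's built-in `sqlite3` module, used here purely as a convenient stand-in for a commercial RDBMS such as Oracle. The table, column and function names (`images`, `tag`, `store_blob`, `fetch_blobs`) are hypothetical. Note that the index is built on the character tag, not on the binary content, which mirrors the point that BLOBs themselves are not usually indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the sketch
conn.execute("""
    CREATE TABLE images (
        id        INTEGER PRIMARY KEY,
        tag       TEXT NOT NULL,  -- character-based identifier used for lookup
        file_type TEXT NOT NULL,  -- e.g. 'tiff' or 'fax'
        content   BLOB NOT NULL   -- the binary object itself, stored in the table
    )
""")
# Index the tag, not the BLOB: the binary content offers nothing to group on.
conn.execute("CREATE INDEX idx_images_tag ON images(tag)")

def store_blob(tag, file_type, content):
    """Insert one BLOB under a text tag; return its row id."""
    cur = conn.execute(
        "INSERT INTO images (tag, file_type, content) VALUES (?, ?, ?)",
        (tag, file_type, content),
    )
    conn.commit()
    return cur.lastrowid

def fetch_blobs(tag):
    """Return the binary contents of every BLOB stored under a tag."""
    rows = conn.execute("SELECT content FROM images WHERE tag = ?", (tag,))
    return [bytes(row[0]) for row in rows]

scan = b"\x00\xff" * 1000  # hypothetical scanned-image bytes
store_blob("invoice-0001", "tiff", scan)
print(fetch_blobs("invoice-0001") == [scan])  # True
```

Because Python `bytes` values map to SQLite's BLOB type, no text encoding is applied to the content on the way in or out.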
Indigestion
Transmitting BLOBs across a network is usually an aperiodic event: transfers happen randomly and without discernible frequency. Digesting BLOBs often takes network bandwidth that wasn't even dreamed of when the network was originally designed, and rapid access to images can tax even the most flexible network infrastructure.
BLOB-making software, especially on imaging workstations that store directly to the network, must be carefully controlled. For that reason, very high-traffic workgroups are often purposely isolated from other networks and their traffic. That means using independent network segments, high-speed network cards, lots of workstation and file server driver tinkering, and the highest compression ratios that can be tolerated.
BLOB traffic today is usually confined to a specific workgroup or LAN. The large amount of data (even with compression) makes communicating BLOBs across a distributed network or WAN expensive. In the NetWare world, several methods help BLOBs make it through the network garden hose efficiently when the water cannot be shrunk any further. Packet Burst and the NetWare Link Services Protocol (NLSP) allow long streams of data, which are what BLOBs are made of, to be transmitted more efficiently. Packet switching and forwarding devices (from Kalpana, SMC, Lannet and so on) can also redistribute network bandwidth to relieve traffic on individual network segments. Greater network bandwidth, less background traffic and the largest supportable packet sizes all increase BLOB throughput.
But the best place to start taking control of BLOBs is at their source, where they can be compressed to their smallest usable form. Find the highest compression that still allows flexible, timely access to the data the BLOBs represent.
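Why larger packets, fewer acknowledgments and compression all help can be seen in a rough transfer-time model. This is a hypothetical simplification, not NetWare's actual Packet Burst algorithm: the sender transmits a burst of packets, then waits one round trip for an acknowledgment, and collisions and competing traffic are ignored. The 1 MB BLOB, 64-byte per-packet overhead, 2 ms LAN round trip, and 56 kbit/s WAN link with a 50 ms round trip are all assumed values.

```python
import math

def transfer_seconds(blob_bytes, link_bps, payload_bytes, header_bytes,
                     burst_packets, round_trip_s):
    """Rough time to move one BLOB across a link (simplified, hypothetical model)."""
    packets = math.ceil(blob_bytes / payload_bytes)
    wire_bits = (blob_bytes + packets * header_bytes) * 8  # data plus per-packet headers
    acks = math.ceil(packets / burst_packets)              # one round-trip wait per burst
    return wire_bits / link_bps + acks * round_trip_s

BLOB = 1_000_000           # assumed ~1 MB scanned page
ETHERNET_BPS = 10_000_000  # 10 Mbit/s Ethernet
HEADER = 64                # assumed per-packet overhead, in bytes
RTT = 0.002                # assumed 2 ms LAN round trip

small = transfer_seconds(BLOB, ETHERNET_BPS, 512, HEADER, 1, RTT)
large = transfer_seconds(BLOB, ETHERNET_BPS, 1400, HEADER, 16, RTT)
print(f"LAN, 512-byte packets, ack each:     {small:.2f} s")
print(f"LAN, 1400-byte packets, burst of 16: {large:.2f} s")

# The same BLOB over an assumed 56 kbit/s WAN link, raw and compressed 10:1.
wan_raw = transfer_seconds(BLOB, 56_000, 1400, HEADER, 16, 0.05)
wan_10to1 = transfer_seconds(BLOB // 10, 56_000, 1400, HEADER, 16, 0.05)
print(f"WAN, uncompressed: {wan_raw:.0f} s; compressed 10:1: {wan_10to1:.0f} s")
```

Under these assumptions the LAN transfer drops from roughly 4.8 s to 0.9 s, mostly because far less time is spent waiting for acknowledgments, while on the WAN link the uncompressed page takes about 152 s against about 15 s at 10:1.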
Compression at the workstation pays off in numerous ways: less data to transmit and receive, lower overall storage requirements, more workstations per network segment, and lower cost to move BLOBs over WANs. Graphical user interfaces and the general public's fascination with multimedia objects mean that BLOBs won't go away. Planning for high network utilization by workgroups pays off handsomely when inventive and creative types get interesting ideas. Reengineering a network for better response after the fact is one of the most expensive network costs.
Standards are crucial. Although localized networks can stand a bit of anarchy, sending incompatible formats through an enterprise network causes problems, because a third format is often created as an intermediate data type. This leads to a problem called Alice's Restaurant Syndrome, where everyone has lots of storage space and never needs to take out the trash until the area reeks of rotting data. Then you may need to call in a virtual Steve McQueen to freeze the growing BLOBs with some liquid oxygen of your own.
Tom Henderson is vice president of engineering for Unitel, of Indianapolis, IN. He can be reached at 76711.737@compuserve.com.











