COLUMN
Indexing the Web
May 29, 2000
By: Richard Hoffman

At the recent NetWorld+Interop conference, I met with two companies that are trying to find creative ways of indexing the ever-increasing volume of Web content.

The first of these, LinkGuard (http://www.linkguard.com), has a Web-crawling indexer that captures only the links on each Web site instead of the content itself. This cuts the time needed for a full index of the Web from months (common for many content indexers) to about ten days--an eternity of difference in Internet time. Though much of the company's software is open source, the real value it provides is held within its 4-terabyte database of links. Covering most of the available Web sites and content, this information is organized as a topological map of the entire 'Net. LinkGuard will sell you, for instance, a list of all the links anyone has made to your site or, alternatively, a list of who's linking to your competitors' sites.

More globally useful is a free "agent" program the company offers, which sits on your server. If someone attempts to access a nonexistent page or follows a broken link, it notifies you and the person browsing. Most cleverly, it also notifies LinkGuard. This updates the company's master database, providing both the client and customer with valuable information. There's quite a lot of potential here--definitely a company to watch.

Another clever use of limited Web indexing is RuleSpace Inc.'s Contexion (http://www.rulespace.com). Like LinkGuard, it indexes meta-information rather than the content itself, but in this case it simply stores a categorization of sites and of the pages within those sites. RuleSpace has a staff of information specialists who work specifically on standardizing categorization taxonomies, and the company uses a neural-network-based engine that, in theory, categorizes sites automatically and accurately. In a demo, the Contexion engine did an impressive job of correctly categorizing new sites.
As with LinkGuard, the real value of a service like this lies in its database--an accumulated topological map of meta-information.

Both of these services aim to solve what I call the "jar of screws" problem, which currently afflicts the World Wide Web. If I have a jar of mixed, unsorted screws and I need a particular kind, I'm most likely going to buy a new package of screws rather than sort through the whole jar in the hope of finding the right one. The value of having the item close at hand is totally negated by my inability to quickly find the specific one I need.

Likewise, the value of the information on the Web depends directly on your ability to quickly and easily find exactly what you need. Search engines are struggling, with only limited success, to categorize and index the huge, growing mass of data on the Web. That's why focused, meta-information-based services like LinkGuard and Contexion could help make the difference between a jar of random parts and having exactly what you need to do the job.
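
For readers who want to see what a link-only index looks like in practice, here is a minimal Python sketch. It is not LinkGuard's actual software, whose internals weren't shown to me; it only illustrates the general idea of discarding page content, keeping the links, and then answering "who links to this site?" The `LinkExtractor` class, the `build_inbound_index` function and the sample pages are all hypothetical names and data made up for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags and ignores all other content."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def build_inbound_index(pages):
    """Map each linked-to host to the set of page URLs that link to it."""
    inbound = defaultdict(set)
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        for target in parser.links:
            host = urlparse(target).netloc
            if host:
                inbound[host].add(url)
    return inbound


# Hypothetical pages standing in for crawled sites (no network access needed).
pages = {
    "http://a.example/index.html":
        '<p>Lots of text</p><a href="http://b.example/">B</a>',
    "http://c.example/news.html":
        '<a href="http://b.example/x.html">B again</a>'
        '<a href="/about.html">About</a>',
}

index = build_inbound_index(pages)
print(sorted(index["b.example"]))
```

The paragraph text in each page is never stored. Only the link targets survive, which is why an index like this can be so much smaller and faster to rebuild than a full content index.
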
Finally, as of this issue, I'll be switching gears from Technology Editor to Contributing Editor as I return to the development world. I'll be coordinating all of the Web-related development for Fairfax County Public Schools in Virginia, including content management and application development, as well as some wireless data, distance learning, videoconferencing and other interesting initiatives. I'm looking forward to some exciting challenges there, like being up to my elbows in code again and keeping the real-world focus that makes Network Computing unique among technology industry magazines. You can bet I'll have some good war stories soon. See you in the trenches!
Send your comments on this column to Richard Hoffman at rhoffman@nwc.com.