At the recent NetWorld+Interop conference, I met with two companies that are looking for creative ways to index the ever-increasing volume of Web content. The first, LinkGuard (http://www.linkguard.com), has a Web-crawling indexer that captures only the links on each Web site rather than the content itself. This cuts the time needed for a full index of the Web from months (common for many content indexers) to about ten days--an eternity of difference in Internet time.
Though much of LinkGuard's software is open source, the real value the company provides lies in its 4-terabyte database of links. Covering most of the available Web sites and content, this information is organized as a topological map of the entire 'net. LinkGuard will sell you, for instance, a list of every link anyone has made to your site or, alternately, a list of who's linking to your competitors' sites.
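The column doesn't describe how LinkGuard's crawler works beyond capturing only links, so what follows is a minimal sketch, assuming pages have already been fetched into a dictionary of URL to HTML. It shows why a links-only index is cheap to build and how a "who links to my site" query falls out of it: keep only the `href` targets of each page, then invert the graph by target host. The names `extract_links` and `build_reverse_index` are hypothetical, and fetching, politeness rules and storage at terabyte scale are left out.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href targets of <a> tags and ignores all other page content."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(page_url, html):
    """Return the absolute URLs linked from one page (hypothetical helper)."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.links


def build_reverse_index(pages):
    """Map each target host to the set of outside pages that link to it.

    `pages` is an assumed {page_url: html} dict; fetching is out of scope.
    """
    inbound = defaultdict(set)
    for page_url, html in pages.items():
        source_host = urlparse(page_url).netloc
        for target in extract_links(page_url, html):
            target_host = urlparse(target).netloc
            # Skip links a site makes to itself; only inbound links count.
            if target_host and target_host != source_host:
                inbound[target_host].add(page_url)
    return inbound


if __name__ == "__main__":
    pages = {
        "http://a.example/index.html": '<p>Hi</p><a href="http://b.example/">B</a>',
        "http://c.example/links.html": '<a href="http://b.example/docs">docs</a>'
                                        '<a href="/local">x</a>',
    }
    index = build_reverse_index(pages)
    print(sorted(index["b.example"]))
    # ['http://a.example/index.html', 'http://c.example/links.html']
```

Because only the link graph is kept, answering "who links to a competitor" is a single dictionary lookup rather than a full-text search.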
More broadly useful is a free "agent" program LinkGuard offers, which runs on your server. If someone tries to access a nonexistent page or follows a broken link, the agent notifies both you and the person browsing. Most cleverly, it also notifies LinkGuard, which updates the company's master database and provides both the client and the customer with valuable information. There's a lot of potential here--definitely a company to watch.
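How LinkGuard's agent detects misses and reports them isn't specified, so this is a hypothetical sketch of the idea only. It assumes the agent sees each requested path and the browser's HTTP Referer header (which names the page holding the bad link), and that `notify` is a stand-in callback for e-mail, an error page, or a call to a central service. The recipient labels, including `central_link_database` for the LinkGuard update, are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class BrokenLinkReport:
    requested_path: str
    referring_page: str  # page containing the bad link, if the browser sent one


def handle_request(path, referer, known_paths, notify):
    """Serve a known page, or report a miss to every interested party.

    `notify(recipient, report)` is a hypothetical callback; a real agent
    might send mail, render an error page, or call a remote service.
    """
    if path in known_paths:
        return 200
    report = BrokenLinkReport(path, referer or "(unknown)")
    # One miss yields three notifications: site owner, visitor, and the
    # central link database (standing in for the LinkGuard update).
    for recipient in ("site_owner", "visitor", "central_link_database"):
        notify(recipient, report)
    return 404


if __name__ == "__main__":
    sent = []
    status = handle_request("/old-page.html", "http://c.example/links.html",
                            {"/index.html"}, lambda who, r: sent.append((who, r)))
    print(status, [who for who, _ in sent])
    # 404 ['site_owner', 'visitor', 'central_link_database']
```

The Referer value is what makes the central report useful: it tells the database which page still points at a dead URL.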
Another clever use of limited Web indexing is RuleSpace, Inc.'s Contexion (http://www.rulespace.com). Like LinkGuard, it indexes meta-information rather than the content itself; in this case, it stores a categorization of sites and of the pages within those sites. RuleSpace has a staff of information specialists who work specifically on standardizing categorization taxonomies, and the company uses a neural-network-based engine designed to categorize sites automatically and accurately. In a demo, the Contexion engine did an impressive job of correctly categorizing new sites. As with LinkGuard, the real value of a service like this lies in its database: an accumulated topological map of meta-information.
Both of these services aim to solve what I call the "jar of screws" problem, which currently afflicts the World Wide Web. If I have a jar of mixed, unsorted screws, and I need to find a particular kind, I'm most likely going to buy a totally new package of screws, rather than sorting through all of the screws in my jar in the hopes that I can find the right one.
The value of having the item close at hand is completely negated by my inability to quickly find the specific one I need. Likewise, the value of the information on the Web depends directly on your ability to find exactly what you need quickly and easily. Search engines are clearly struggling, with only limited success, to categorize and index the huge and growing mass of data on the Web. That's why focused, meta-information-based services like LinkGuard and Contexion could make the difference between a jar of random parts and having exactly what you need to do the job.

Links on the Web:
Speaking of links on the Web, take a quick look at a very interesting IBM research report that suggests the Web may not be nearly as interconnected as we think:
http://www.almaden.ibm.com/almaden/webmap_press.html
Other applications that made a particular impression at N+I were:
1) Appstream (http://www.appstream.com), which streams Java applets in real time. Just as audio and video streaming let you start listening or watching without waiting for the entire download, Appstream lets you start working with an application before it has finished downloading. (This product won our Best of Show award in the Network Applications category, as well as the People's Choice award.)
2) Radiant Logic's RadiantOne (http://www.radiantlogic.com), a data store that looks like a standard LDAP directory but also stores, and provides simple access to, records in diverse databases and other data sources.
3) AmikaFreedom.com (http://www.amikanow.com), a service that provides one-stop access to multiple POP3 accounts. It automatically filters, reformats and forwards e-mail to whatever portable device (two-way pager, PDA, WAP phone, etc.) you happen to be using. This service is still in its infancy, but personally, I'd pay good money for anything that could simplify access to multiple e-mail accounts from multiple devices.
Back to the Trenches
Finally, as of this issue, I'll be switching gears from Technology Editor to Contributing Editor as I return to the development world. I'll be coordinating all of the Web-related development for Fairfax County Public Schools in Virginia, including content management and application development, as well as wireless data, distance learning, videoconferencing and other interesting initiatives. I'm looking forward to the challenges there, such as being up to my elbows in code again, and to keeping the real-world focus that makes Network Computing unique among technology industry magazines. So you can bet I'll have some good war stories soon. See you in the trenches!
Send your comments on this column to Richard Hoffman at rhoffman@nwc.com.