
Analysis: Enterprise Search

What's different is its algorithm. The beauty of PageRank is that it enlists millions of humans to do what a program does poorly--make subjective decisions. Each link to a page represents a person's subjective judgment that the page is worth pointing to, but counting those links is purely objective work, and programs excel at objective decisions. Because Google's PageRank algorithm assigns higher relevance to Web pages that have more pages linking to them, it effectively aggregates all those human judgments. So, if 100 people link to page A, which contains content Z, but only five people link to page B, which also contains content Z, chances are page A has more accurate information about content Z and should thus rank higher than page B. Taking the example one step further, page B might use the keywords more often, and might put them in bold and in headings more often, but page A will still have the higher relevance.
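
To make the link-counting idea concrete, here is a minimal sketch of a PageRank-style power iteration over a toy link graph. This is not Google's implementation; the damping factor of 0.85, the iteration count and the page names are assumptions chosen purely for illustration.

```python
# Minimal PageRank-style sketch (illustrative only, not Google's production algorithm).
# Each page's score is spread evenly across its outbound links, then damped.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if not targets:              # dangling page: contributes nothing here
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Toy graph: five pages link to A, two link to B (hypothetical page names).
links = {
    "A": ["B"], "B": ["A"],
    "C": ["A"], "D": ["A"], "E": ["A"], "F": ["A"],
    "G": ["B"],
}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))  # A ends up ranked above B
```

With more inbound links, page A accumulates a higher score regardless of how many times page B repeats its keywords, which is the point of the example above.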

Problem is, the Web and a typical enterprise's cache of data are totally different animals (and Google may be the only vendor in both games right now). Consider the task of indexing the Web and returning relevant results. The Internet consists mostly of structured documents. When a parser works through an HTML file, it's easy to determine which words are highlighted--bold, italicized or in headings--and to give those terms higher relevance. The parsed text is placed in an inverted index, which the query engine uses to find matching results.
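
As a rough sketch of that indexing step--not how any of the products we tested work internally--the following example uses Python's standard html.parser to extract terms, gives extra weight to terms inside bold, italic or heading tags, and files everything in an inverted index keyed by term. The tag list, weights and document names are illustrative assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser

BOOST_TAGS = {"b", "strong", "i", "em", "h1", "h2", "h3"}  # markup that raises a term's weight

class IndexingParser(HTMLParser):
    """Collects (term, weight) pairs, weighting terms inside emphasis/heading tags higher."""
    def __init__(self):
        super().__init__()
        self.open_boost_tags = 0
        self.terms = []  # list of (term, weight)

    def handle_starttag(self, tag, attrs):
        if tag in BOOST_TAGS:
            self.open_boost_tags += 1

    def handle_endtag(self, tag):
        if tag in BOOST_TAGS and self.open_boost_tags:
            self.open_boost_tags -= 1

    def handle_data(self, data):
        weight = 2.0 if self.open_boost_tags else 1.0
        for term in data.lower().split():
            self.terms.append((term, weight))

def index_document(doc_id, html, inverted_index):
    """Adds one document's weighted terms to a shared inverted index."""
    parser = IndexingParser()
    parser.feed(html)
    for term, weight in parser.terms:
        inverted_index[term][doc_id] = inverted_index[term].get(doc_id, 0.0) + weight

inverted_index = defaultdict(dict)  # term -> {doc_id: accumulated weight}
index_document("pageA", "<h1>Enterprise search</h1> <p>plain text</p>", inverted_index)
index_document("pageB", "<p>enterprise search mentioned in passing</p>", inverted_index)
print(inverted_index["enterprise"])  # a query engine would read matches out of this structure
```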

HTML documents also are structured in the sense that they contain links to other HTML documents. This second form of structure, exploited through link analysis--essentially peer review--is the basis of Google's PageRank algorithm.

Before link analysis became mainstream, strict keyword searches did an OK job of finding relevant documents. But keyword searches are easy to fool by adding hidden text to HTML pages, and simply counting the number of times certain keywords appear in a document does not give a good measure of relevance. Link-analysis algorithms help solve both problems.
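
A toy example shows how easily raw keyword counting is gamed. The scorer below simply counts query-term occurrences, so a page stuffed with hidden, repeated keywords outscores a genuinely relevant page; the pages and query here are hypothetical.

```python
import re

def keyword_score(text, query_terms):
    """Relevance = raw count of query-term occurrences, the approach link analysis improved on."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(words.count(term) for term in query_terms)

honest_page = "A short, genuinely useful page about enterprise search."
# Hidden text (e.g., white-on-white) stuffed with keywords inflates the count.
spam_page = "Buy widgets here. " + "enterprise search " * 50

query = ["enterprise", "search"]
print(keyword_score(honest_page, query))  # low score despite being the better result
print(keyword_score(spam_page, query))    # high score purely from repetition
```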

Of course, far fewer results will be returned in the enterprise. In addition, end users often have a good idea of document names, and your file servers are most likely partitioned by department, further reducing the number of documents users must wade through. The search products we tested have a number of automatic and configurable features that help improve search-result relevance. All were easy to install and configure, with even the most challenging taking less than an hour.