Panning for Gold

These search engines also let you re-energize other systems. Processors driving file systems won't spend needless cycles looking for files or content in files. Databases won't have to crunch as many queries, and legacy systems will gain a new lease on life because they're not spending an inordinate amount of time in the search cycle. Better yet, you won't have to train your employees in SQL.

Search-engine software has two components: an indexer and the actual search engine. Indexers retrieve content, extract words and index them for fast retrieval. Engines interpret queries and locate words, concepts or phrases relevant to the question in the index, then format the output in HTML or XML and send it to the user or device that initiated the question.
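To make that division of labor concrete, here is a minimal Python sketch of an indexer and an engine built around an inverted index (a map from each word to the documents that contain it). It illustrates the general idea only and is not how any of the products we tested work internally. The function names and sample file names are our own inventions.

```python
import html
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexer: map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Engine: return the sorted IDs of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


def format_html(doc_ids):
    """Format a result list as a simple HTML unordered list."""
    items = "".join(f"<li>{html.escape(doc_id)}</li>" for doc_id in doc_ids)
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    # Hypothetical documents standing in for crawled content.
    docs = {
        "a.html": "Quarterly sales report",
        "b.doc": "Sales forecast for Q3",
        "c.pdf": "Employee handbook",
    }
    idx = build_index(docs)
    print(format_html(search(idx, "sales")))
```

The index is built once, up front, so each query becomes a few set lookups rather than a scan of every file. That is where the savings in processor cycles come from.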

We went looking for enterprise-class search engines--those that work behind a firewall or secure VPN. The vendors had to supply search-engine software or an appliance that supported it. We did not want it bundled with portal software or content-management software. Our contestants also had to be able to search both structured data in databases and unstructured data on Web servers and file stores. And we required support for a variety of document formats, including those produced by word processors, presentation software and graphics editors.

We required indexers to retrieve content from secure Web pages (HTTPS) and standard HTTP servers and file systems, and to remove duplicate pages. We also required them to extract words from HTML, XML, Microsoft Office and PDF documents, and index the content. Finally, they had to support ODBC or JDBC (Java Database Connectivity) connectors or gateways.
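Two of those indexer chores, extracting words from HTML and removing duplicate pages, can be sketched in a few lines of Python using only the standard library. This is an illustration under our own assumptions: duplicates are detected by hashing the extracted, lowercased text, which catches only exact copies, and the URLs in the example are hypothetical. Commercial indexers may well use fuzzier techniques.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html_source):
    """Return the page's visible text with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def remove_duplicates(pages):
    """Keep the first URL seen for each distinct page text.

    pages maps URL -> HTML source; the result maps URL -> extracted text.
    """
    seen = set()
    unique = {}
    for url, source in pages.items():
        text = extract_text(source)
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[url] = text
    return unique


if __name__ == "__main__":
    # Hypothetical intranet pages; the first two carry the same text.
    pages = {
        "http://intranet.example/": "<p>Hello  world</p>",
        "https://intranet.example/index.html": "<div>hello world</div>",
        "http://intranet.example/other": "<p>Other page</p>",
    }
    print(sorted(remove_duplicates(pages)))
```

Hashing the extracted text rather than the raw markup means the same content served at two addresses, or wrapped in slightly different tags, is indexed only once.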

As for the search engines, we asked that they include a spellchecker and support for phrase searching and stemming (grammatical variations) in addition to keyword searching. We also required a prebuilt search form or user interface to test the indexers and search engines.
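For readers unfamiliar with these features, the following Python sketch shows roughly what phrase searching, stemming and spellchecking each involve. It is deliberately simplistic: the suffix-stripping stemmer is a crude stand-in for a real algorithm such as Porter's, the spelling suggestion leans on the standard library's difflib module, and every function name here is our own.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Crude stemmer: strip one common English suffix if enough word remains."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def contains_phrase(text, phrase):
    """Phrase search: True if the phrase's words appear consecutively in text."""
    words, target = tokenize(text), tokenize(phrase)
    if not target:
        return False
    return any(
        words[i:i + len(target)] == target
        for i in range(len(words) - len(target) + 1)
    )


def matches_stemmed(text, keyword):
    """Stemmed keyword search: compare word stems instead of exact words."""
    return stem(keyword.lower()) in {stem(word) for word in tokenize(text)}


def suggest(word, vocabulary):
    """Spellcheck: return the closest indexed word, or None if nothing is close."""
    hits = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.8)
    return hits[0] if hits else None


if __name__ == "__main__":
    text = "The search engine indexed the files"
    print(contains_phrase(text, "search engine"))
    print(matches_stemmed(text, "indexing"))
    print(suggest("serch", ["search", "engine", "index"]))
```

A keyword search would treat "indexing" and "indexed" as unrelated words; stemming reduces both to a common root so either one finds the document.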

