Review
Panning for Gold

  September 18, 2003
  By Sean Doherty


In this article
• Introduction
• CSIRO Panoptic Enterprise Search Engine 4.2.0
• Kanisa Site Search 5.0
• Mondosoft MondoSearch 5.1
• dtSearch Web 6.20
• Executive Summary | Web Links
• How We Tested
• Report Card

Content is king, as valuable as gold. Enterprises create it, save it, license it and sell it. Wherever possible, they reuse data and even refresh it. But what happens when an important bit of data is misplaced? It takes time and money to search for, re-create or reacquire that information. Increasingly, companies are turning to enterprise search engines to index and manage content and avoid the costs associated with recovering lost data.

However, finding money in the IT budget for a search engine can be tough. Search engines don't create content, so they may be perceived as a low priority. But at many sites, existing content can be reused and even turned into new revenue. If you present a search engine as an ongoing maintenance cost for that content, you may just get the funds you need.

Perhaps the biggest advantage of an enterprise search engine is that a single product can find documents stored in many different formats. As long as data can be displayed as text in a browser, an enterprise search tool can index and search it.

These search engines also let you re-energize other systems. Processors driving file systems won't spend needless cycles looking for files or content in files. Databases won't have to crunch as many queries, and legacy systems will gain a new lease on life because they're not spending an inordinate amount of time in the search cycle. Better yet, you won't have to train your employees in SQL.


Search-engine software has two components: an indexer and the actual search engine. Indexers retrieve content, extract words and index them for fast retrieval. Engines interpret queries and locate words, concepts or phrases relevant to the question in the index, then format the output in HTML or XML and send it to the user or device that initiated the question.
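The indexer/engine split can be illustrated with a toy inverted index. The Python below is a minimal sketch under stated assumptions, not how any of the reviewed products works: the function names (`tokenize`, `build_index`, `search`, `to_xml`), the Boolean AND query model and the XML result shape are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr

def tokenize(text):
    """Indexer step: extract lowercase alphanumeric words from raw text."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Indexer step: map each word to the set of document IDs containing it.

    `docs` is a hypothetical {doc_id: text} mapping standing in for
    content already retrieved from Web servers or file stores."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Engine step: return sorted IDs of documents containing every query word
    (a simple Boolean AND; real engines also rank results)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

def to_xml(query, hits):
    """Engine step: format the hit list as XML for the requesting client."""
    items = "".join(f"<hit>{escape(h)}</hit>" for h in hits)
    return f"<results query={quoteattr(query)}>{items}</results>"
```

For example, with `docs = {"a.html": "Enterprise search engines index content", "b.pdf": "Search logs from the Web site"}`, the query `"search content"` matches only `a.html`, because `b.pdf` lacks the word "content".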

We went looking for enterprise-class search engines, those that work behind a firewall or over a secure VPN. Vendors had to supply search-engine software or an appliance running it, and we did not want the engine bundled with portal or content-management software. Our contestants also had to be able to search both structured data in databases and unstructured data on Web servers and file stores. And we required support for a variety of document formats, including those produced by word processors, presentation programs and graphics editors.

We required indexers to retrieve content from secure (HTTPS) and standard HTTP Web servers and from file systems, and to remove duplicate pages. We also required them to extract words from HTML, XML, Microsoft Office and PDF documents, and index the content. Finally, they had to support ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity) connectors or gateways.
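One simple way an indexer might remove duplicate pages is to fingerprint each page's text and skip any page whose fingerprint has already been seen. The sketch below assumes exact-duplicate detection after whitespace and case normalization; the function names are hypothetical, and the products reviewed here may use other (for example, near-duplicate) techniques.

```python
import hashlib
import re

def content_fingerprint(text):
    """Hash page text after collapsing whitespace and case, so trivially
    reformatted copies of the same page produce the same fingerprint."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def remove_duplicates(pages):
    """Keep the first URL seen for each distinct fingerprint.

    `pages` is an iterable of (url, text) pairs in crawl order;
    returns the list of URLs that should be indexed."""
    seen = set()
    kept = []
    for url, text in pages:
        fp = content_fingerprint(text)
        if fp not in seen:
            seen.add(fp)
            kept.append(url)
    return kept
```

Hashing the normalized text means two URLs serving the same article (say, a regular and a printer-friendly view with identical text) are indexed once, while any change in wording yields a different fingerprint.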

As for the search engines, we asked that they include a spellchecker and support for phrase searching and stemming (grammatical variations) in addition to keyword searching. We also required a prebuilt search form or user interface to test the indexers and search engines.
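Stemming and phrase searching can work together: terms are reduced to a common stem at both index and query time, and a positional index records where each stem occurs so the engine can check that phrase terms appear consecutively. The sketch below uses a deliberately crude, hypothetical suffix-stripping rule set (not the Porter algorithm or any vendor's stemmer) and illustrative function names.

```python
import re
from collections import defaultdict

# Toy rule set for illustration only; real stemmers are far more careful.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix if a stem of 3+ letters remains,
    so 'searching', 'searched' and 'searches' all become 'search'."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split into words and stem each one."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]

def build_positional_index(docs):
    """Map stem -> {doc_id: [positions]} so word order can be checked."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted IDs of docs where the stemmed phrase terms are adjacent."""
    terms = tokenize(phrase)
    if not terms:
        return []
    results = []
    for doc_id, positions in index.get(terms[0], {}).items():
        for start in positions:
            # Each following term must sit exactly i positions after the first.
            if all(start + i in index.get(t, {}).get(doc_id, ())
                   for i, t in enumerate(terms[1:], 1)):
                results.append(doc_id)
                break
    return sorted(results)
```

With `d1 = "Searching enterprise content"` and `d2 = "content for enterprise searches"`, the phrase "enterprise search" matches only `d2`, while "searched enterprise" matches only `d1`: stemming lets different word forms match, and positions enforce word order.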

We sent invitations to 11 vendors. Four stepped up to the table: CSIRO (Commonwealth Scientific and Industrial Research Organisation), Kanisa, Mondosoft and dtSearch Corp. Each sent software products to our Syracuse University Real-World Labs®.

The companies that dropped out, declined or just didn't qualify ran the gamut from small to large. Copernic Technologies didn't qualify because its product doesn't support ODBC or JDBC. Autonomy Corp. and EasyAsk declined to participate but gave no reason. Convera, Dieselpoint and Fast Search & Transfer each declined, saying it was working on a new version of its software. Verity and Google both declined on the basis of company policy, though Verity was changing its policy as this article went to press.



[Figure: Navigational Searching]

As for our four contestants, we tested their ability to satisfy navigational searches using Network Computing's production Web site (www.nwc.com), which contains almost 35,000 pages (see "How We Tested"). We also tested indexing and searching capabilities using informational searches taken directly from the log files on www.nwc.com. Three of the four products we tested performed above average; only dtSearch came in under par.

We judged the search engines on their ability to retrieve content using an indexer, also called a spider or crawler. We put heavy emphasis on the search process, assessing how much control the administrator could exert over it as well as overall performance in navigational searches. We also looked at each vendor's management console and how it handled installation, configuration and customization of the search-engine portion. And we considered log files and reporting capabilities. Finally, we compared prices across all four products.

Panoptic Enterprise Search Engine won our Editor's Choice award. Its secure and easy-to-use administrative interface, navigational deftness and indexing prowess put it on top.


start top Introduction CSIRO Panoptic Enterprise Search Engine 4.2.0 

Best of the Web

Data deduplication: Declawing the clones

Data deduplication is emerging as a critically important new arrow in the storage administrator's quiver to answer hard questions about the increasing problem in storage growth costs.

Quick Read

Compression, Encryption, Deduplication, and Replication: Strange Bedfellows

One of the great ironies of storage technology is the inverse relationship between efficiency and security: Adding performance or reducing storage requirements almost always results in reducing the confidentiality, integrity, or availability of a system.

Quick Read

WAN Optimization Whitelists and Blacklists

Optimization is a fantastic way of saving money and creating really happy customers at the same time, but it doesn't work flawlessly for all applications.

Quick Read

WAN Optimization as a Managed Service: It's Not About the Cost

This insight examines how organizations outsourcing their WAN optimization initiatives to a third-party go about achieving their goals for application performance, reducing operational costs, and streamlining enterprise infrastructure.

Quick Read

  Sponsored Links

Premium Content

Data Centers Gone Wild
February 22, 2010

NWC


Salary

Video