Like that of MondoSearch and dtSearch, Panoptic's sample user interface can be configured from the administrative interface. Even without any configuration, the advanced-search form contained fields that make use of author and title metatags. You can limit your search by document type and date, and you can refine a query to search within your results if you receive too many hits. Panoptic supports all the major standards for metadata, including the Dublin Core. Our other participants support metadata but do not detail their support.
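To make the metatag point concrete, here is a minimal sketch of how an indexer might pull author and title values out of a page's meta tags. It is an illustration only, not Panoptic's code: the function and class names are hypothetical, and the choice to prefer the Dublin Core names (DC.creator, DC.title) over the plain ones is an assumption.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            # Normalize the name so "DC.Title" and "dc.title" are treated alike.
            self.meta[name.lower()] = content


def extract_author_title(html):
    """Return the author and title metadata found in the page, if any."""
    parser = MetaTagParser()
    parser.feed(html)
    meta = parser.meta
    # Assumption: prefer Dublin Core names, fall back to the plain ones.
    return {
        "author": meta.get("dc.creator") or meta.get("author"),
        "title": meta.get("dc.title") or meta.get("title"),
    }
```

A search form with author and title fields would then match the query against these extracted values rather than against the full page text.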
To begin the search process, you create a collection--a finite set of Web pages to index and search. If you have logical divisions in your Web content, you can distinguish them by collection to facilitate search and retrieval. For example, you can create separate collections for different content types, such as news and sales support. This can narrow a user's search and increase the proportion of relevant documents returned.
We created a Web collection by giving it an external display name of "Network Computing Magazine" and a unique internal name of "nwcmag." Then we identified the collection as our Network Computing production site. As with Kanisa and MondoSearch, you can confine the content collection to specific pages such as those on the www.networkcomputing.com site or its alias www.nwc.com. That way, the Web crawler will not detour and follow off-site links. You can also limit the discovery depth from the starting URL. All four search engines in this review support limiting link depth.
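The two restrictions described here, staying on named hosts and capping link depth, amount to a simple scope test applied to each discovered URL. The sketch below is a hypothetical illustration, not any vendor's implementation; the function name and the depth limit of 5 are assumptions, while the two host names come from our test setup.

```python
from urllib.parse import urlsplit

# Hosts the crawler may visit: the production site and its alias.
ALLOWED_HOSTS = {"www.networkcomputing.com", "www.nwc.com"}

# Hypothetical limit on how many links away from the starting URL to follow.
MAX_DEPTH = 5


def in_scope(url, depth, allowed_hosts=ALLOWED_HOSTS, max_depth=MAX_DEPTH):
    """Return True if the URL is on an allowed host and within the depth limit."""
    host = (urlsplit(url).hostname or "").lower()
    return host in allowed_hosts and depth <= max_depth
```

A crawler would call in_scope() on every link it extracts and queue only the URLs that pass, which is what keeps it from wandering to off-site pages.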
Panoptic supports its own Java-based crawler, called FunnelBack. When you set up a Web collection, you define how a crawler will gather data for the search engine to index. In the advanced settings, you can directly edit a collection configuration file that contains the options for FunnelBack. For example, you can limit the length of time the crawler runs. You can also configure a maximum number of pages to store, limit the number of clicks (links) away from the home page and define many other settings. We excluded a file type to disregard Netgravity links. All the crawlers have a similar feature that excludes certain directories or files from a crawler's scrutiny. This is in addition to following the directives in a robots.txt file.
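As a rough sketch of how exclusion rules and robots.txt directives combine, the following uses Python's standard robots.txt parser alongside a list of exclusion patterns. This is not FunnelBack's configuration syntax: the patterns, the user-agent string and the function names are all hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion patterns, matched against the URL path.
EXCLUDE_PATTERNS = ["*/ads/*", "*.zip"]


def build_robots(robots_txt):
    """Parse the text of a robots.txt file into a reusable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(url, robots, user_agent="example-crawler",
              exclude_patterns=EXCLUDE_PATTERNS):
    """Apply the administrator's exclusions first, then the site's robots.txt."""
    path = urlsplit(url).path
    if any(fnmatchcase(path, pattern) for pattern in exclude_patterns):
        return False
    return robots.can_fetch(user_agent, url)
```

The exclusion list is the administrator's own filter, while robots.txt expresses the site owner's wishes; a page is fetched only if both allow it.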
FunnelBack took just under nine hours to crawl our production Web site and index 34,720 documents--more than any other participant. Once it completed the crawl, Panoptic made the collection immediately available to the default search form.
Because Panoptic does not provide a preview or prepublishing database for testing before going live, as Kanisa and MondoSearch do, it has two options that protect you from putting a partially collected database into production. A changeover-percentage option specifies the minimum size a newly gathered collection must reach, relative to the collection it is replacing, before it is made available. In addition, Panoptic has a "vital_servers" option, which prevents an update from overwriting your production database if a server is down during the collection process.
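The logic behind these two safeguards can be sketched in a few lines. This is our reading of the options as described above, not Panoptic's actual code; the function name, the parameter names and the treatment of an empty existing collection are assumptions.

```python
def should_publish(new_doc_count, old_doc_count, changeover_percent,
                   vital_servers, reached_servers):
    """Decide whether a freshly crawled collection may replace the live one.

    changeover_percent: minimum size of the new collection, as a percentage
    of the collection it would replace.
    vital_servers: hosts that must have been reachable during the crawl.
    reached_servers: hosts the crawler actually reached.
    """
    # Refuse the update if any vital server was unreachable during the crawl.
    if set(vital_servers) - set(reached_servers):
        return False
    # Assumption: with no existing collection, there is nothing to protect.
    if old_doc_count == 0:
        return True
    # Compare with integer arithmetic to avoid floating-point rounding.
    return new_doc_count * 100 >= old_doc_count * changeover_percent
```

With a threshold of 50 percent, for example, a crawl that gathered only 10,000 documents would not replace a live collection of 34,720, because it falls short of the 17,360 required.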
Panoptic's easy-to-use administrative interface set it apart from Kanisa and MondoSearch. In addition to setting parameters, you can use a form to schedule collection updates through the crontab file; this is a multistep process for Kanisa and MondoSearch. Panoptic also has extensive log files, but does not provide the reporting that Kanisa does.
Panoptic Enterprise Search Engine, CSIRO (Commonwealth Scientific and Industrial Research Organisation). +61-2-6216-7060. www.panopticsearch.com