FEATURE STORY

The Paperless Office

by Michael Hurwicz

The paperless office. We've talked about it for years, and most of us are no nearer to it now than when we started talking.

One reason: Most of the heavy-duty, expensive document imaging solutions have been based on Unix, while Windows and NetWare have been the standards for office LANs.

We've been told that our puny Windows clients and NetWare servers couldn't handle production-level document imaging. Nevertheless, a few intrepid vendors took up the gauntlet and built document imaging products based on NetWare Loadable Modules (NLMs) and Windows. We tested three such products: Compulink Management Center's LaserFiche Windows/NLM, PaperWise's ImageWise and GroupStore from Imagery Software (a subsidiary of Eastman Kodak).

We wanted to test two others: Simplify Development (Mailroom for Windows and ShareScan) and Lanier Worldwide (IMSONLINE). Both chose not to participate.

The three products share the same basic functionality: scanning, indexing and retrieving. However, they differ in their additional features, in ease of installation, configuration and use, and in reliability and performance.

LaserFiche and ImageWise are neck-and-neck, with ImageWise being a bit less expensive. Yet they differ significantly, most notably in their support for the three basic ways of indexing, searching for and retrieving documents.

All three use templates--collections of index fields associated with a document. Users fill index fields on templates with keywords that identify the particular document. Only ImageWise supports automated creation of keywords. Only GroupStore lets you reuse index field definitions from one template in another.
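To make the template idea concrete, here is a minimal Python sketch of templates as collections of index fields. The names `IndexField`, `Template` and `new_document` are hypothetical and come from none of the three products; the shared field objects only illustrate the kind of reuse GroupStore offers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class IndexField:
    """One index field definition (hypothetical model)."""
    name: str
    kind: str = "text"  # assumed type tag: "text" or "numeric"


@dataclass
class Template:
    """A named collection of index fields associated with a document type."""
    name: str
    fields: List[IndexField] = field(default_factory=list)

    def new_document(self, keywords: Dict[str, str]) -> Dict[str, str]:
        """Fill this template's fields with keywords; unknown fields are rejected."""
        allowed = {f.name for f in self.fields}
        unknown = set(keywords) - allowed
        if unknown:
            raise KeyError(f"not in template {self.name!r}: {sorted(unknown)}")
        # Fields the user leaves blank get an empty keyword.
        return {f.name: keywords.get(f.name, "") for f in self.fields}


# Reusable definitions: both templates point at the same field objects.
customer = IndexField("Customer")
doc_date = IndexField("Date")
invoice = Template("Invoice", [customer, doc_date, IndexField("Invoice Number", "numeric")])
letter = Template("Letter", [customer, doc_date, IndexField("Subject")])
```

Defining "Customer" and "Date" once and sharing them means a change to a field definition shows up in every template that uses it.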

Only LaserFiche offers full-text indexing--translating document images into text via optical character recognition (OCR), making every word a keyword. LaserFiche is also unique in letting you locate documents by browsing a graphical tree of folders and files, much as you would with the Windows File Manager.
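Full-text indexing amounts to building an inverted index from the OCR output, so that every word points back to the documents containing it. The sketch below assumes the OCR text for each document is already available as a string; the function names are hypothetical, and the article does not describe LaserFiche's actual index format.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_full_text_index(ocr_text: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word (lowercased) to the IDs of documents whose OCR text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in ocr_text.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return dict(index)


def full_text_search(index: Dict[str, Set[str]], word: str) -> Set[str]:
    """Every word is a keyword: look it up directly, case-insensitively."""
    return index.get(word.lower(), set())
```

Usage: `full_text_search(build_full_text_index({"doc1": "Invoice from Acme Corp"}), "acme")` finds `doc1` even though nobody typed "Acme" into an index field.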

LaserFiche's graphical browsing and full-text indexing make it especially well suited to general office filing, while ImageWise's automation for template-based indexing makes it faster and easier for processing masses of paper. GroupStore, despite many good features, including reusable index fields, is not as mature as the other two products.

PaperWise ImageWise

If we had to process lots of paper quickly, ImageWise would be our choice, particularly if its automated keyword creation fit the particular application. True, ImageWise has no full-text indexing, and it took significantly longer than the other two products to chug paper through the scanner. However, we think an accelerator card, such as a Kofax Image Processing Controller, could well eliminate that difference. Even without an accelerator card, the total person-hours required for a given quantity of paper will be lowest with ImageWise, if you can use its help where the majority of the person-power typically has to be applied: in indexing.

Separate scanning, indexing and retrieval modules also make it very easy to set up an efficient document imaging assembly line. Batches of documents are dynamically queued and assigned to indexing stations as the operator at each station requests the next batch. Neither LaserFiche nor GroupStore does that.
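The pull-style assembly line can be sketched as a simple first-in, first-out queue: scanned batches wait until an indexing station asks for work. This is a hypothetical model (`BatchQueue` and its methods are invented for illustration), not ImageWise's actual implementation.

```python
from collections import deque
from typing import Deque, Dict, Optional


class BatchQueue:
    """Scanned batches wait in FIFO order; a station receives the next
    batch only when its operator requests one (pull, not push)."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self.assigned: Dict[str, str] = {}  # batch id -> indexing station id

    def add_scanned_batch(self, batch_id: str) -> None:
        """Called by the scanning module when a batch is finished."""
        self._pending.append(batch_id)

    def request_next(self, station_id: str) -> Optional[str]:
        """Hand the oldest waiting batch to the requesting station, or None if idle."""
        if not self._pending:
            return None
        batch_id = self._pending.popleft()
        self.assigned[batch_id] = station_id
        return batch_id
```

Because stations pull work as they finish, a fast operator automatically handles more batches than a slow one, with no supervisor assigning work by hand.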

ImageWise also offers the best template-based retrieval. It was the fastest retriever in the bunch, once index information had been entered. More important for most users is the "query by thesaurus" feature, which lets you enter keywords by selecting from a pick list consisting of either the last 10 keywords entered for that field or any 10 keywords you select beforehand. In contrast, LaserFiche just keeps the most recent keyword as a default for the next search. GroupStore doesn't even do that--although it does "remember" the template type of the most recent search. Query by thesaurus can often nearly eliminate keystrokes.
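A pick list of this kind can be modeled as a bounded deque per index field. The sketch is hypothetical: `ThesaurusPickList` is an invented name, and moving a repeated keyword to the front (rather than listing it twice) is our assumption about sensible behavior, not something the article documents.

```python
from collections import deque
from typing import Iterable, List, Optional


class ThesaurusPickList:
    """Pick list for one index field: either a fixed list chosen
    beforehand, or the last `size` keywords entered (newest first)."""

    def __init__(self, preset: Optional[Iterable[str]] = None, size: int = 10) -> None:
        self.preset = list(preset)[:size] if preset is not None else None
        # With maxlen set, appendleft() silently drops the oldest entry.
        self.recent: deque = deque(maxlen=size)

    def record(self, keyword: str) -> None:
        """Remember a keyword the user just searched on."""
        if keyword in self.recent:
            self.recent.remove(keyword)  # assumed: no duplicates in the list
        self.recent.appendleft(keyword)

    def choices(self) -> List[str]:
        """What the user sees in the drop-down for this field."""
        return list(self.preset) if self.preset is not None else list(self.recent)
```

Usage: after 12 searches, `choices()` shows only the 10 most recent keywords; a field built with `preset=["Imaging", "Scanners"]` always offers those two regardless of history.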

How It Works

ImageWise has five modules: scan, index, query, database manager and integration. PaperWise does not provide its own NLM, but it uses Novell's Btrieve NLM, along with the BREQUEST TSR module on the client. You create a database and index fields with the database manager, populate the database by scanning, supply index values for each document using the index module, retrieve documents using the query module and hook into Novell/SoftSolutions' document management software using the integration module. Everything except Btrieve runs on the client.

PaperWise offers four companion products for ImageWise. Paper-Route provides workflow functions, including document routing, task assignment and flow reporting. HyperDrive provides hard disk-based caching to speed up retrievals from an optical jukebox. DataWise provides Computer Output to Laser Disk (COLD), a function that indexes documents, such as reports or invoices, that were created directly on the computer. DisplayWise provides integration with DOS applications and terminal emulators, so that information on the screen can be used as index values to retrieve images.

We had several days' worth of trouble configuring Kofax Image Products' KF-920 Software Document Processor, which ImageWise requires. Once that worked, we had minor problems--some places where the user interface was nonintuitive or error checking was weak--but no show stoppers.

No OCR, Limited Retrieval

ImageWise lacks OCR. You can use third-party OCR on images exported from ImageWise, but the resulting text file is not connected with the original image, and it doesn't give you any indexing or retrieval capability. (ImageWise version 4.0, which should be available by the time you read this, displays DataWise COLD data side-by-side with images. In addition, ImageWise version 4.1, available later this year, will offer integrated OCR. Novell/SoftSolutions can also provide full-text indexing.)

In general, ImageWise's retrieval functions are somewhat limited. For instance, each database can have only one set of index fields, rather than multiple templates and folders, which the other products offer. You can define only 10 indexes for a database. Although this may keep users from creating unwieldy, slow indexing systems, it might also keep ImageWise from handling some complex filing tasks.

There are no Boolean searches, just an implied "AND" between index fields. Finally, there is no graphical tree browsing interface.
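An implied "AND" means every field you fill in narrows the result, and there is no way to ask for "this OR that" in one query. A minimal sketch, with a hypothetical function name and documents modeled as dictionaries of index values:

```python
from typing import Dict, List


def implied_and_search(documents: Dict[str, Dict[str, str]],
                       criteria: Dict[str, str]) -> List[str]:
    """Return IDs of documents whose index fields match every criterion.
    There is no OR operator: each field/value pair only narrows the result."""
    return [
        doc_id
        for doc_id, fields in documents.items()
        if all(fields.get(name) == value for name, value in criteria.items())
    ]


docs = {
    "d1": {"subject": "Imaging", "year": "1995"},
    "d2": {"subject": "Scanners", "year": "1995"},
    "d3": {"subject": "Imaging", "year": "1994"},
}
# "Imaging OR Scanners" takes two separate searches whose results the user combines.
either = set(implied_and_search(docs, {"subject": "Imaging"})) | set(
    implied_and_search(docs, {"subject": "Scanners"})
)
```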

Automated Indexing

The feature that impressed us the most was automated template-based indexing, implemented via "scan flags." Scan flags let ImageWise create keywords automatically during the scanning process. For instance, scan flags can be used to:

  • Increment a numeric index field.
  • Cause an index field to retain its value across multiple scans.
  • Define a key combination that changes a particular index field to a predefined value. For the next image, the index field returns to a default value or the previous value, depending on how you set it up.
Scan flags work only when index values are repetitive or (for numeric fields) incremental. Those are severe limitations. Still, they can really speed up template-based indexing where you can make them fit. In contrast, LaserFiche only retains the previous value of each index field as a default within a given scan job, and GroupScan assigns a single set of keywords to all images in a particular batch.
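The three scan-flag behaviors listed above can be modeled for a single index field as follows. This is a hypothetical sketch: `FieldFlags`, `index_values` and the input encoding are invented for illustration, and an incrementing field is assumed to start from a numeric value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Action = Optional[Tuple[str, str]]  # None, ("type", value) or ("key", hotkey)


@dataclass
class FieldFlags:
    """Hypothetical scan-flag settings for one index field."""
    start: str = ""          # default value (must be numeric if increment=True)
    increment: bool = False  # numeric field: add 1 for each new image
    retain: bool = False     # keep the last value across scans
    hotkeys: Dict[str, str] = field(default_factory=dict)  # key -> one-shot preset


def index_values(flags: FieldFlags, inputs: List[Action]) -> List[str]:
    """Return the keyword assigned to each scanned image.
    inputs[i] is what the operator did before image i."""
    values: List[str] = []
    current = flags.start
    for i, action in enumerate(inputs):
        if i > 0 and flags.increment:
            current = str(int(current) + 1)  # counts up automatically
        elif i > 0 and not flags.retain:
            current = flags.start            # no retention: back to the default
        if action is None:
            values.append(current)
        elif action[0] == "type":
            current = action[1]              # typed value becomes the new current
            values.append(current)
        else:
            # Hotkey applies to this image only; `current` is untouched, so the
            # next image returns to the previous value (retain) or the default.
            values.append(flags.hotkeys[action[1]])
    return values
```

Usage: `index_values(FieldFlags(start="1001", increment=True), [None, None, None])` yields `["1001", "1002", "1003"]` with no typing at all.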

Compulink LaserFiche NLM/Windows

For general-purpose office filing, LaserFiche is our choice. You may not be able to process mounds of paper as fast as you can with ImageWise. However, convenient retrieval is our primary criterion for office filing, and that's where LaserFiche shines. Full-text indexing is a big plus for many office applications, too.

LaserFiche consists of just two modules: the Windows client program and the NLM. You scan, view images, OCR, fill in index fields and retrieve with the client module. The NLM does file management, full-text indexing, all searches and security administration. There is also a short menu of functions available at the file server console, including several monitoring functions, enabling and disabling logins and indexing, and indexing all text documents. LaserFiche really stood out in the installation department, because it required almost no tinkering compared with the other two products.

Retrieving Images: Easy and Effective

We liked LaserFiche's graphical tree for retrieval. Ninety percent of the time we could browse through folders, easily and intuitively, and find what we wanted without entering any index information.

Index-based retrieval was still available as an alternative, though. In fact, LaserFiche's index-based retrieval was superior, because it supports Boolean searches, which make it easy to get exactly the documents you want in a single search operation. It also offers fuzzy searching, which finds words that are close to the specified keyword instead of requiring an exact match. This feature is especially valuable for full-text searches on OCRed documents, because OCR typically introduces numerous errors. Fuzzy search could save you from having to spell check all your OCRed text.
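One common way to implement fuzzy matching is to accept any indexed word within a small edit distance of the keyword; Boolean AND and OR then become set intersection and union. This sketch uses Levenshtein distance as an assumed stand-in, since the article does not say which matching method LaserFiche uses, and the function names are hypothetical.

```python
from typing import Dict, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def fuzzy_lookup(index: Dict[str, Set[str]], keyword: str, max_edits: int = 1) -> Set[str]:
    """Documents containing any indexed word within max_edits of keyword."""
    hits: Set[str] = set()
    for word, docs in index.items():
        if edit_distance(word, keyword.lower()) <= max_edits:
            hits |= docs
    return hits


# An OCR misread ("lnvoice" for "invoice") is still found by a fuzzy search.
idx = {"lnvoice": {"d1"}, "invoice": {"d2"}, "acme": {"d1", "d3"}, "memo": {"d3"}}
both = fuzzy_lookup(idx, "invoice") & fuzzy_lookup(idx, "acme")    # Boolean AND
either = fuzzy_lookup(idx, "invoice") | fuzzy_lookup(idx, "memo")  # Boolean OR
```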

Comprehensive but Manual Document Management

Considering that template-based indexing was a manual process (no reusable index fields when creating templates, no automated creation of keywords), it was quite easy to scan a batch of images, separate them into multiple documents and index them. It would have been even easier if we could see multiple images simultaneously. Instead, LaserFiche shows a list of numbers, each number representing one page. You only see one image at a time (version 3.0, due by the time you read this, will show multiple images).

Compulink also sells Template Wizard, a product that lets you fill in templates from databases and reuse parts of one template in another. We did not test this, however.

If you import multiple text documents, you can full-text index all of them in a single operation, initiated at the file server console. Bulk indexing, combined with fuzzy search, makes LaserFiche a good choice for managing nonimage documents like customer service records or reports. Importing and full-text indexing are about as automated and convenient as could be (unless, perhaps, they could be ongoing or regularly recurring, instead of manually initiated).

LaserFiche also has application-based security. With other programs, users can access image files through Windows File Manager or DOS, and potentially delete or alter them. With LaserFiche, users don't even need to log in to the file server. LaserFiche determines access to folders, files and permitted operations. Files and folders are invisible if you don't have appropriate access rights. ImageWise cannot offer this type of security, because it doesn't have its own NLM through which to implement it. GroupStore could do it, but doesn't.
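The "invisible unless permitted" behavior can be sketched as a filter applied before the folder tree is ever shown to the user. The access-list layout below (path mapped to the set of users allowed to see it, with every ancestor folder also required) is our assumption for illustration; the article does not describe how LaserFiche stores rights.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Set


def visible_entries(paths: List[str], user: str, acl: Dict[str, Set[str]]) -> List[str]:
    """Return the folder/file paths `user` may see.

    An entry is visible only if the user is granted access to it and to
    every folder above it; everything else is silently left out, so the
    user never learns it exists."""
    visible = []
    for path in paths:
        p = PurePosixPath(path)
        # The entry plus its ancestor folders, minus the top-level "." or "/".
        chain = [p, *p.parents][:-1]
        if all(user in acl.get(str(q), set()) for q in chain):
            visible.append(path)
    return visible
```

Because enforcement happens inside the application rather than in file-system rights, a user with no server login cannot reach the image files by any other route.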

Imagery GroupStore

GroupStore is really still a version 1.0 product. Despite some unique and well-implemented features, it's not quite ready for prime time.

GroupStore consists of three network services plus a shared scanning program (GroupScan) and a client module (Imagery for Windows). The latter two are based on Dynamic Link Libraries (DLLs). The network services include the Mass Storage Service (MSS), the Image Management Service (IMS) and the Document Management Service (DMS). All three are NLMs that run only on NetWare 4.x. MSS and DMS also include DLL-based administration programs that run at client stations.

MSS provides hierarchical storage management. IMS provides services for applications, such as fax servers, that use GroupStore functions. DMS provides an electronic document filing system. We tested DMS and MSS. We couldn't test IMS because only one product, Biscom's Faxcom fax server, was IMS-compatible at the time of testing, and both Imagery and Biscom declined to provide a fax server.

False Starts

We installed GroupStore easily enough, and that was the end of our easy times. To be fair, most functions worked smoothly, and some were really slick, but we struggled for weeks to get MSS going. It seems that the MSS administration program was incorrectly deciding that it was not on a network. Imagery sent us a patched version of the MSS administration DLLs, hard-coded to run on a network.

On another occasion, we were unable either to access or delete a particular document. At the same time, we were unable to create a folder. A DMS error message said the database might be corrupt and we should exit and reload the database. That didn't help, though. The ultimate solution was to wipe out the existing database and reinstall DMS. We were unable to duplicate the problems.

Scanning and Indexing: Too Time-Consuming

In addition to bug fixes, GroupStore could use some improvements in its front end and possibly some tuning of its back end. For instance, there are no automated indexing functions and no drop-down pick lists for keywords. You have to type in every keyword, every time, even if they're very repetitive. That's a pain for low-volume environments, untenable for high volumes.

If you use GroupScan--the only way to create multiple documents in a single scan--the procedure for indexing the documents requires you to retrieve and save each document with Imagery for Windows. That's unnecessarily time-consuming, especially since retrieval functions can be somewhat clumsy.

In addition, GroupStore doesn't have full-text indexing. Text files created via the integrated OCR do not remain associated with the image files. Nor can you index or retrieve an image file using words in the text file. (LaserFiche, in contrast, does keep an association between the image and the text file.) On the other hand, DMS Administrator makes creating templates easier with reusable index fields for templates.

Also on the positive side, GroupScan is the most efficient way we've seen to create multiple documents with a single scan, because it can separate documents based either on a page count or by detecting a blank sheet that acts as a document separator. With the other two products, you have to separate documents manually, either explicitly (with LaserFiche) or by applying appropriate indexing information (with ImageWise).
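The two GroupScan separation rules can be sketched as one function over the stream of scanned pages. It is hypothetical: pages are opaque values, blank detection is passed in as a function (a real scanner would measure how much of the page is dark), and dropping the separator sheets is our assumption.

```python
from typing import Any, Callable, List, Optional


def split_documents(pages: List[Any],
                    pages_per_doc: Optional[int] = None,
                    is_blank: Optional[Callable[[Any], bool]] = None) -> List[List[Any]]:
    """Split one scan into documents, either every `pages_per_doc` pages
    or at each blank separator sheet (separators are discarded)."""
    if (pages_per_doc is None) == (is_blank is None):
        raise ValueError("choose exactly one separation method")
    if pages_per_doc is not None:
        return [pages[i:i + pages_per_doc] for i in range(0, len(pages), pages_per_doc)]
    docs: List[List[Any]] = []
    current: List[Any] = []
    for page in pages:
        if is_blank(page):
            if current:  # consecutive blank sheets do not create empty documents
                docs.append(current)
            current = []
        else:
            current.append(page)
    if current:
        docs.append(current)
    return docs
```

The page-count rule suits uniform forms; the blank-sheet rule handles documents of varying length at the cost of inserting a separator sheet by hand.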

Clunky Retrieval/Cool Display

Retrieval functions are generally a bit primitive and clunky. There is no graphical directory tree. There's also no Boolean searching, just an implied "and" between fields specified. For instance, if you want to look at magazine articles that have either "Imaging" or "Scanners" in the "subject" field, you'll have to do two separate searches.

Retrieval can also be slowed by disappearing pick lists. For instance, if you want to look at all documents with "Imaging" in the subject field, you do your search, select a document and view it. When you close that document and go to open another one, you find the pick list has disappeared. You have to perform the same search all over again!

On the other hand, some of Imagery for Windows' display functions are really good. For instance, it can display up to four pages at a time per document, and you can drag and drop pages within a document or among documents. You can delete, add and shuffle pages around.

Neither of the other products can do this. ImageWise doesn't even track documents as such. Whatever pages are returned by a particular search can be considered a document for the purposes of that search. So, although you can display as many pages as you want and delete any currently displayed page, you move or shuffle pages only by changing index values. For instance, you might define each multipage invoice using an "invoice number" field. All pages with the same number in the "invoice number" field belong to the same invoice--about as nongraphical an approach as we can imagine.
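In this model a "document" is nothing more than the set of pages sharing an index value, so moving a page means editing its keyword. A minimal sketch with hypothetical page records:

```python
from collections import defaultdict
from typing import Dict, List


def pages_by_invoice(pages: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group page images into 'documents' by their invoice-number index value."""
    docs: Dict[str, List[str]] = defaultdict(list)
    for page in pages:
        docs[page["invoice number"]].append(page["image"])
    return dict(docs)


pages = [
    {"image": "img001", "invoice number": "5001"},
    {"image": "img002", "invoice number": "5001"},
    {"image": "img003", "invoice number": "5002"},
]
```

Usage: changing `pages[1]["invoice number"]` to `"5002"` and regrouping moves `img002` into invoice 5002; there is no drag and drop, only a new index value.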

LaserFiche is somewhat closer to Imagery in its approach to manipulating pages within an existing document. Using its batch processing function, LaserFiche lets you move pages within a document or to a new document but not among existing documents. You can also delete pages. LaserFiche represents only one page at a time as an image. It represents others by numbers. (We reviewed LaserFiche version 2.3. The next version of LaserFiche is supposed to have a feature similar to Imagery's.)

Other Pluses

After you OCR a page, Imagery for Windows pops you into a "proofing editor" that highlights possible errors and gives you an enlarged view of the original at the questionable spot. ImageWise doesn't do OCR. LaserFiche just uses a simple proprietary text editor for editing and doesn't highlight potential trouble spots.

Neither of the other vendors provides hierarchical storage management. Then again, MSS will work with LaserFiche and ImageWise, too. Its only requirement is storage on a NetWare volume, including magnetic and optical media (but usually not tape drives). Another potential advantage is that GroupStore is designed as a multivendor platform. Time will tell how many third-party software vendors use IMS, DMS and MSS services. Finally, Imagery uses NetWare Directory Services (NDS).


Michael Hurwicz is an independent consultant and technical writer specializing in LANs and imaging. He can be reached on CompuServe at 74777,1616.

Vendor Information

ImageWise v3.4. $6,995 (five users). PaperWise, (800) 790-8324, (801) 261-8850; fax (801) 261-8842.

LaserFiche NLM/Windows, $7,995 (five users). Compulink Management Center, (310) 212-5465; fax (310) 212-5064; imaging@ix.netcom.com

GroupStore, $9,995 (50 users, single-server license for DMS, MSS and IMS); Imagery for Windows, $1,679 (five users); Mass Storage Service, $5,995 (unlimited users, single server); GroupScan, $1,995 (one user). Imagery Software, (617) 275-7700; fax (617) 280-9710.


How We Tested Imaging Performance

Our NetWare 4.1 file server was a Zenith Z-Server 60-MHz Pentium EISA PC with three 1-GB SCSI disks and 90 MB of RAM. We ran Novell's LANalyzer for Windows network analyzer software v2.1 to measure network traffic. We also used the LANalyzer software as a stopwatch, by capturing packets relating to a particular operation, such as retrieving a file, and noting the packets' time stamps. Server processor utilization percentages were taken at the file server console using Novell's Monitor program.
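Using a packet capture as a stopwatch boils down to a little arithmetic over (time stamp, size) pairs. The sketch below assumes the packets for one operation have already been filtered out of the capture; the function name is hypothetical, it does not read LANalyzer's file format, and it counts 1 KB as 1,024 bytes.

```python
from typing import Dict, List, Tuple


def summarize_capture(packets: List[Tuple[float, int]]) -> Tuple[float, float, float]:
    """packets: (timestamp in seconds, size in bytes) for one operation.
    Returns (elapsed seconds, total KB, peak KB in any one-second bucket)."""
    if not packets:
        return 0.0, 0.0, 0.0
    times = [t for t, _ in packets]
    start = min(times)
    elapsed = max(times) - start  # first packet to last packet
    buckets: Dict[int, int] = {}
    for t, size in packets:
        second = int(t - start)   # whole seconds since the operation began
        buckets[second] = buckets.get(second, 0) + size
    total_kb = sum(size for _, size in packets) / 1024
    peak_kb_per_s = max(buckets.values()) / 1024
    return elapsed, total_kb, peak_kb_per_s
```

Total and peak traffic are reported separately because a short burst can load the network far more heavily than the average suggests.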

Our primary imaging workstation was a 50-MHz 486 EISA PC, a Compaq SystemPro/XL with 1 GB of SCSI hard disk and 16 MB of memory. The scanner, attached to the Compaq via an Adaptec 154CF SCSI host adapter, was a Fujitsu M3097Gm.

We performed two comparative tests, for scanning and retrieving. We noted server processor utilization, total elapsed time and network traffic measured in kilobytes--both total and peak--for each operation. Indexing proved resistant to a quantitative approach, since the user interfaces and procedures were so different for the three programs.

Scanning: Scanning 60 pages took between two and nearly four minutes and raised the overall processor utilization level during that whole time. It also caused occasional sharp spikes.

The most important scanning measures are usually total time elapsed and typical processor utilization. Scanning takes so long that it can't put any huge load on the network. Imagery's GroupScan was the performance champion for scanning, although it beat Compulink by less than 15 percent, and typical server utilizations were quite comparable. Amazingly, GroupScan also put only about half the load on the network that the other two did and yet was competitive in speed--faster than ImageWise and only about 12 percent slower than LaserFiche. GroupScan is one part of Imagery's GroupStore that is definitely ready for prime time.

Note that all tests were done without an accelerator card. Besides speeding all three products up, such a card would tend to reduce the differences in scan times.

Retrieval: For the retrieval test, we retrieved one image from a database of 100 images, each of which was indexed with a single index value. This operation took only a few seconds and resulted in a single flurry of processor activity.

For retrieving, elapsed time is usually the primary concern, but both server and network loading can be significant issues. In this area, PaperWise was most efficient, with the shortest elapsed time and the lowest loads both on the server and the network. Compulink came in second for all measurements and Imagery, third.

Conclusions: What do these tests show about the dreaded ability of imaging applications to overload networks? When retrieving documents, we saw peak network traffic from one workstation of about 40 to 80 KB per second. Even at the lower rate, a mere nine workstations simultaneously retrieving images would bring 10 Mbps Ethernet to nearly 30 percent utilization. Of course, users don't sit there and retrieve images continually. Nevertheless, LAN managers should monitor traffic for these applications, as the potential for overload is definitely there.
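The "nearly 30 percent" figure is easy to check: nine stations at 40 KB per second is 360 KB, or about 2.9 megabits, every second. The helper below is a hypothetical back-of-the-envelope calculation; it ignores Ethernet framing overhead and collisions, so real utilization would run somewhat higher.

```python
def ethernet_utilization(workstations: int, kb_per_s: float,
                         link_mbps: float = 10.0, bytes_per_kb: int = 1000) -> float:
    """Fraction of link capacity consumed by `workstations` stations each
    moving `kb_per_s` kilobytes per second (payload only, no overhead)."""
    bits_per_s = workstations * kb_per_s * bytes_per_kb * 8
    return bits_per_s / (link_mbps * 1_000_000)
```

Usage: `ethernet_utilization(9, 40)` gives 0.288 (about 29 percent, or 0.295 if a KB is taken as 1,024 bytes), and at the upper rate `ethernet_utilization(9, 80)` gives 0.576, well over half the wire.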

The surprise for us, however, was high server processor utilization. True, our server was only a 60-MHz Pentium. Nevertheless, we didn't expect processor utilizations pushing up to 70 percent and even beyond 90 percent for a single workstation (compared with a baseline of 1 to 4 percent with no imaging activity).

Our best guess as to what accounts for the differences in scanning elapsed times: PaperWise uses the KIPP (KF-920) software from Kofax, which may be optimized for the Kofax image processing boards rather than for straight SCSI. The other two, whose times are similar, use driver software from Pixel Translations. Differences in both CPU utilization and amount of data transferred (which roughly track each other) may relate to the complexity of the database tables maintained at the server to track scan jobs. For instance, only Compulink maintains a hierarchical tree structure into which scans must be placed, so some data must be exchanged to indicate where each scan belongs in the tree. PaperWise has some built-in complexities, too; for instance, it must check for "scan flags" used for automated indexing. With Imagery, in version 1.0, there may be less for the client and server to talk about.

The numbers for document retrieval seem to relate directly to the complexity of the database NLM. PaperWise uses only Btrieve, which is quick and efficient. Compulink has its own NLM, which implements a complex relational database. Imagery uses NetWare SQL--a notorious resource hog.

