Digital Reef, a startup backed by venture capital, came out of stealth mode this week and began offering a software platform to search, classify, index, and manage large amounts of unstructured data, aimed at challenges such as e-discovery and storage optimization. It is entering a crowded field: hundreds of vendors are competing for share in one of the few markets still showing strength in the down economy.
The company says its product will stand out from the crowd for several reasons: its architecture; its ability to build large indexes across a variety of documents and languages in a small amount of space; and its "similarity engine" and auto-classification capabilities, which can find correlations among content. Digital Reef uses a tiered architecture built on a cluster of systems, including a gateway into the system, a tier of job routers, and an analytics tier, says Brian Giuffrida, vice president of marketing and business development. "I can add any number of engines on any tier to scale the performance. That is unique to us," he says. Another differentiator, he says, is that the job router detects when an indexing job fails and can restart it on the same machine or on another one, rather than starting the entire process over.
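The article does not describe how Digital Reef's job routers work internally. As a rough illustration of the restart behavior Giuffrida describes, the Python sketch below assumes an indexing job is split into independent chunks, treats each engine as a function that may raise an exception on failure, and retries only the failed chunk on the next engine in the list (or on the same engine, if there is only one). `JobRouter`, `simple_engine`, and `FlakyEngine` are hypothetical names, not part of any Digital Reef API.

```python
# Hypothetical sketch of chunk-level retry in a job router.
# Not Digital Reef's implementation; all names are illustrative.
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Doc = Tuple[int, str]               # (document id, text)
Fragment = Dict[str, List[int]]     # term -> sorted document ids
Engine = Callable[[List[Doc]], Fragment]


def simple_engine(chunk: List[Doc]) -> Fragment:
    """Build an inverted-index fragment for one chunk of documents."""
    index = defaultdict(set)
    for doc_id, text in chunk:
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


class FlakyEngine:
    """Hypothetical engine that crashes on its first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures

    def __call__(self, chunk: List[Doc]) -> Fragment:
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("engine crashed")
        return simple_engine(chunk)


class JobRouter:
    """Runs each chunk on an engine; a failed chunk is retried on the next
    engine (round-robin) without redoing chunks that already succeeded."""

    def __init__(self, engines: List[Engine], max_attempts: int = 3) -> None:
        if not engines:
            raise ValueError("at least one engine is required")
        self.engines = engines
        self.max_attempts = max_attempts

    def _run_chunk(self, i: int, chunk: List[Doc]) -> Fragment:
        last_error = None
        for attempt in range(self.max_attempts):
            # With one engine this retries on the same "machine".
            engine = self.engines[(i + attempt) % len(self.engines)]
            try:
                return engine(chunk)
            except Exception as exc:  # record the failure, try the next engine
                last_error = exc
        raise RuntimeError(f"chunk {i} failed") from last_error

    def run(self, chunks: List[List[Doc]]) -> Fragment:
        """Index every chunk and merge the fragments into one index."""
        merged = defaultdict(set)
        for i, chunk in enumerate(chunks):
            for term, ids in self._run_chunk(i, chunk).items():
                merged[term].update(ids)
        return {term: sorted(ids) for term, ids in merged.items()}


router = JobRouter([FlakyEngine(failures=1), simple_engine])
chunks = [[(1, "merger agreement draft")], [(2, "final merger agreement")]]
index = router.run(chunks)
print(index["merger"])  # [1, 2]
```

Because completed chunks are kept, a crash costs only one chunk's worth of work rather than a full re-index. A production router would also need persistent job state and failure detection through timeouts or heartbeats, which this toy omits.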
Digital Reef also searches out all documents and other unstructured data and uses their descriptions to build its index. It can recognize and reconstruct email threads, and it uses pattern recognition to identify items such as Social Security numbers, vehicle identification numbers, or source code in order to find connections. The company's similarity engine, which Giuffrida describes as the firm's "main intellectual property," can understand the context of information and detect "near duplication." "Exact duplication is easy. Near duplication identification is hard," he says. The software then ranks how relevant various pieces of content are to one another, which helps users find batches of data on the same topic or in the same thread.
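Digital Reef has not disclosed how its similarity engine works. To show why near duplicates are harder than exact ones, here is one standard, generic technique: w-shingling, which splits each document into overlapping runs of k words and compares the resulting sets with Jaccard similarity (shared shingles divided by total distinct shingles). The sketch assumes k = 3 and a 0.5 similarity cutoff; both are illustrative choices, and the function names are hypothetical.

```python
# Generic near-duplicate detection via word shingles + Jaccard similarity.
# Illustrative only: not Digital Reef's similarity engine.
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Return the set of overlapping k-word windows in `text`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Shared shingles divided by distinct shingles; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: Dict[str, str], threshold: float = 0.5,
                    k: int = 3) -> List[Tuple[str, str, float]]:
    """Return document pairs at or above `threshold`, most similar first."""
    sets = {name: shingles(text, k) for name, text in docs.items()}
    pairs = [(x, y, jaccard(sets[x], sets[y]))
             for x, y in combinations(sorted(sets), 2)]
    return sorted((p for p in pairs if p[2] >= threshold),
                  key=lambda p: p[2], reverse=True)


docs = {
    "a": "Please review the attached merger agreement before Friday.",
    "b": "Please review the attached merger agreement before Monday.",
    "c": "Lunch menu for the cafeteria this week.",
}
for x, y, score in near_duplicates(docs):
    print(x, y, round(score, 3))  # a b 0.714
```

Exact duplicates can be found by comparing hashes of the full text. Two emails that differ by a single word hash differently, yet they still share most of their shingles. Large collections usually replace the pairwise loop with MinHash signatures and locality-sensitive hashing to avoid comparing every pair.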
The software can also classify data automatically, without the need to train the system or seed it with examples. "We create a virtual file structure to represent all of the data in an organization, and we can group things based on natural patterns and content. We also expose the top terms that cause a grouping to exist to help people understand why a document is in that group," he says.
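The article does not say which algorithm Digital Reef uses to group documents without training data. One common unsupervised approach, shown here only as a sketch under stated assumptions, is to turn documents into TF-IDF vectors, cluster them with spherical k-means, and report each cluster's highest-weighted centroid terms as the "top terms" that explain the grouping. The number of groups `k`, the deterministic farthest-first seeding, and all function names are illustrative choices, not details from Digital Reef.

```python
# Hypothetical sketch: unsupervised grouping (TF-IDF + spherical k-means)
# with top terms per group. Not Digital Reef's actual classifier.
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse term -> weight


def normalize(vec: Vector) -> Vector:
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf(docs: List[str]) -> List[Vector]:
    """Unit-length TF-IDF vectors; no labels or training examples needed."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter(t for words in tokenized for t in set(words))
    n = len(docs)
    return [normalize({t: c * math.log(n / df[t]) for t, c in Counter(words).items()})
            for words in tokenized]


def dot(a: Vector, b: Vector) -> float:
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors: List[Vector]) -> Vector:
    total: Dict[str, float] = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return normalize(dict(total))


def kmeans(vectors: List[Vector], k: int,
           iterations: int = 10) -> Tuple[List[int], List[Vector]]:
    """Spherical k-means with deterministic farthest-first seeding."""
    centers = [vectors[0]]
    while len(centers) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centers))
        centers.append(vectors[far])
    labels: List[int] = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: dot(v, centers[j])) for v in vectors]
        # Recompute each center; keep the old one if a cluster went empty.
        centers = [centroid([v for v, lab in zip(vectors, labels) if lab == j])
                   or centers[j] for j in range(k)]
    return labels, centers


def top_terms(center: Vector, n: int = 3) -> List[str]:
    """Highest-weighted centroid terms: why documents landed in this group."""
    ranked = sorted(center.items(), key=lambda item: item[1], reverse=True)
    return [t for t, _ in ranked[:n]]


docs = [
    "merger agreement draft for board review",
    "board review of merger agreement terms",
    "quarterly sales report for the west region",
    "west region sales forecast report",
]
labels, centers = kmeans(tfidf(docs), k=2)
print(labels)  # [0, 0, 1, 1]
for j, center in enumerate(centers):
    print(j, top_terms(center))
```

Unlike the system described in the article, plain k-means needs the number of groups up front. Methods that infer the group count from the data exist, but they would make the sketch considerably longer.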
Other enterprise search and e-discovery vendors say their products and services offer similar capabilities. Giuffrida, however, says Digital Reef offers a more complete feature set than anything currently on the market. "And our indexing is very efficient. We only take up 25 percent of the original storage volume, not the 100 percent or 200 percent that others use," he says.