Xerox Researchers Unveil New Document Management Technology

February 27, 2004


Scientists at Xerox's Research Centre Europe in Grenoble, France, announced Thursday that they've come up with new classification software clever enough to "read" an electronic document, decide how it should be classified, then automatically route it to the right person's e-mail address or an online document management system.

The unnamed technology -- Xerox refers to it as a categorizing tool -- is available now, and can be licensed by enterprises that want to incorporate it into existing document systems, as well as by third-party software vendors in the document management, customer relationship management, and information retrieval markets, said Xerox.

The Xerox tool, said Eric Gaussier, a researcher at the Grenoble facility, uses a hierarchical model able to understand the dependency between multiple categories, unlike so-called "flat" search and retrieval tools, which treat each category separately. Biochemistry and biophysics, for example, are closely related, and Xerox's tool treats them that way, while flat retrieval systems would consider them separate and thus would not cross-link documents in each.
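To make the hierarchical-versus-flat distinction concrete, here is a minimal Python sketch. It is not Xerox's model, which the article does not describe in detail; the category tree, the function names, and the rule that "related" means "shares a direct parent" are all assumptions made for illustration.

```python
# Hypothetical category tree (child -> parent). The names are illustrative only.
PARENT = {
    "biochemistry": "life sciences",
    "biophysics": "life sciences",
    "life sciences": "science",
    "astronomy": "physical sciences",
    "physical sciences": "science",
}


def related(a, b):
    """Hierarchical view (assumed rule): two distinct categories are related
    if they share the same direct parent in the tree."""
    parent = PARENT.get(a)
    return a != b and parent is not None and parent == PARENT.get(b)


def flat_related(a, b):
    """Flat view: every category stands alone, so only identical labels match."""
    return a == b


def cross_links(category):
    """Sibling categories whose documents a hierarchical tool could cross-link."""
    return sorted(c for c in PARENT if related(category, c))


print(cross_links("biochemistry"))                  # ['biophysics']
print(flat_related("biochemistry", "biophysics"))   # False
```

In the flat view a search filed under biochemistry never reaches biophysics documents; in the hierarchical view the shared parent is what links them.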

The result of this approach, he said, is faster, better searches, and a virtually hands-off approach to digesting and disseminating digital documents throughout an organization.

In the pilot program that Xerox ran with the Swiss Institute of Bioinformatics, an academic nonprofit foundation, "their traditional search engines for medical articles often presented the most pertinent documents at the end of the list," said Gaussier. "Using our software, they were much more successful at finding what they were looking for, and typically had to browse less than half of the list to find the information."

Xerox's new software, written in Java, and suitable for deploying on Unix, Linux, and Windows, is the result of four years of steady work in linguistic modeling, semantics, and machine learning, said Gaussier.

It can be used "out of the box" by adding it to existing document management applications created by an enterprise, he added. In that approach, "with a set of categories already established, the software takes documents already categorized and, using our models, 'learns' how to automatically classify new documents."
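The general idea of learning from already-categorized documents can be sketched in a few lines of Python. The example below uses a plain naive Bayes word-count classifier as a stand-in; the article does not say which models Xerox uses, and the training texts, category names, and function names here are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, model):
    """Return the category with the highest add-one-smoothed log-probability."""
    word_counts, doc_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in sorted(doc_counts):
        counts = word_counts[category]
        total_words = sum(counts.values())
        # Start from the prior: the share of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical documents that were already filed by hand.
model = train([
    ("protein enzyme reaction", "biochemistry"),
    ("enzyme binding kinetics", "biochemistry"),
    ("invoice payment overdue", "billing"),
    ("refund payment invoice", "billing"),
])
print(classify("enzyme kinetics study", model))  # biochemistry
```

A production system would need far more training data and, per the article, a model that also accounts for how categories relate to one another.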

In a fresh environment not already equipped with a document management and routing solution, Xerox's tool walks users through the process of creating categories, then classifies documents as part of one or more of those categories.

In either case, the technology is bright enough to learn new categories on its own as it comes across additional documents. "After a while, if the system doesn't cover all the new topics that have emerged, it will tell you where it's not up to date," said Gaussier. It then dynamically suggests new categories.

Able to handle documents written in up to 20 different languages, it also serves as an automatic router, shunting categorized documents to the right person -- via e-mail attachments, for instance -- based on a pre-set user profile that administrators establish.

"This can be used, for example, to route incoming mail to the person responsible for a given topic and eliminate mail in your inbox you aren't interested in," said Gaussier.
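Profile-based routing of this kind can be pictured as a simple lookup from a document's categories to the people who have registered an interest in them. The Python sketch below assumes a profile is just a set of topics per e-mail address; the addresses, topics, and the `route` function are hypothetical and do not reflect Xerox's actual interface.

```python
# Hypothetical profiles set up by an administrator: address -> topics of interest.
PROFILES = {
    "support@example.com": {"complaints"},
    "billing@example.com": {"invoices", "complaints"},
    "research@example.com": {"biochemistry"},
}


def route(categories, profiles=PROFILES):
    """Return the sorted addresses whose profile overlaps the document's categories."""
    wanted = set(categories)
    return sorted(addr for addr, topics in profiles.items() if topics & wanted)


print(route(["complaints"]))  # ['billing@example.com', 'support@example.com']
print(route(["sports"]))      # []
```

A document that matches no profile is delivered to no one, which is the inbox-filtering effect described above.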

"Imagine clients' complaints going directly to the person responsible for handling them and your e-mail inbox containing only what you're interested in."
