Network Computing is part of the Informa Tech Division of Informa PLC


Spoken Word Search Analyzes Audio Content: Page 2 of 5

One big potential enterprise benefit of spoken-word search is that content creators--the users producing the podcasts--could skip metadata creation and manual transcription, the conventional steps companies take to make audio files text-searchable. The technology therefore represents a significant advance for a niche enterprise need. It should also work its way into the standard Web search menu fairly quickly: TVEyes expects this to happen within 18 to 24 months, and we concur.

Search Techniques

Searching multimedia content today is primarily done with the equivalent of 1970s technology. The search requires keywords or metadata, or it relies on extrapolating information from a Web page. If a topic or phrase is mentioned in a podcast but doesn't appear on an associated Web page or within metadata, a standard search on that topic will not produce the podcast as a result.
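To make the limitation concrete, here is a minimal Python sketch of a conventional keyword search that sees only a title, metadata and associated Web page text. The catalog entries, field names and the `metadata_search` function are hypothetical and invented for illustration; they do not describe any particular search engine.

```python
import re

# Hypothetical catalog: only the text a conventional engine can see.
# Assume the audio of Episode 12 discusses a VoIP rollout, but nobody
# wrote that down in the metadata or on the Web page.
catalog = [
    {"title": "Weekly IT Podcast, Episode 12",
     "metadata": "podcast networking news",
     "page_text": "Download this week's episode."},
    {"title": "Storage Roundup",
     "metadata": "podcast storage san nas",
     "page_text": "We discuss SAN pricing."},
]

def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def metadata_search(query, items):
    """Return titles whose visible text contains every query term."""
    terms = tokens(query)
    if not terms:
        return []
    return [
        item["title"]
        for item in items
        if terms <= tokens(" ".join((item["title"], item["metadata"], item["page_text"])))
    ]
```

In this sketch a query for "storage" finds the second entry, while a query for "voip" finds nothing, even though the topic is assumed to be spoken in the first podcast.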

With the emerging spoken-word search, the audio portion of a multimedia file is "listened to" by the search engine. An index--not necessarily a word-for-word transcript--is built by converting the spoken words to text using one or more voice-recognition algorithms. TVEyes uses at least eight engines; the algorithms can look at vocal inflection and signatures to guess an unclear phrase. The resulting index is text-searchable data. Unlike conventional voice-recognition software for PCs and telephone systems, these spoken-word search engines don't attempt to learn speech patterns; instead, they require an extensive library of words, phrases and accents. Background noise and music are ignored, though overlap between these sounds and spoken words will reduce accuracy.
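The indexing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TVEyes' method: the engine outputs, the confidence scores, the file names and the `build_index` and `search` functions are all hypothetical. Each recognition engine is assumed to return a list of (word, time in seconds, confidence) guesses, and the index keeps the most confident guess for each word in each file rather than a full transcript.

```python
import re
from collections import defaultdict

# Hypothetical recognizer output: one list of guesses per engine, per file.
engine_outputs = {
    "episode12.mp3": [
        [("voip", 63.0, 0.9), ("rollout", 64.1, 0.8)],    # engine A
        [("void", 63.0, 0.4), ("rollout", 64.1, 0.85)],   # engine B misheard a word
    ],
    "storage.mp3": [
        [("san", 12.5, 0.7), ("pricing", 13.0, 0.9)],
    ],
}

def build_index(outputs):
    """Map each word to {file_id: (best_confidence, seconds)} across all engines."""
    index = defaultdict(dict)
    for file_id, guesses_per_engine in outputs.items():
        for guesses in guesses_per_engine:
            for word, seconds, confidence in guesses:
                key = word.lower()
                best = index[key].get(file_id)
                # Keep only the most confident sighting of this word in this file.
                if best is None or confidence > best[0]:
                    index[key][file_id] = (confidence, seconds)
    return index

def search(index, query, min_confidence=0.5):
    """Return sorted file ids in which every query term was heard confidently."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    result = None
    for term in terms:
        files = {f for f, (conf, _) in index.get(term, {}).items()
                 if conf >= min_confidence}
        result = files if result is None else result & files
    return sorted(result)

index = build_index(engine_outputs)
```

With this sample data, a search for "voip rollout" returns the first file, and the low-confidence mishearing "void" is filtered out by the assumed 0.5 threshold. The stored timestamps could be used to jump to the spot in the audio where the word was heard.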