Speech Recognition Comes of Age

System builders can develop a new customer base by installing speech technology, and training users to actually make it productive. (Courtesy of TechBuilder.org)

January 30, 2006

System builders on the lookout for new products and services to add to their catalog should consider a technology that has come of age: speech recognition, or SR for short. Long a mainstay in sci-fi films like Star Wars and 2001: A Space Odyssey, SR technology is finally ready for the real world, thanks in large part to the Dragon software suite from Nuance Communications, the primary focus of this Recipe.

SR offers an alternative to keyboarding and the mouse as the means for interacting with a PC. Basically, SR works by letting systems equipped with a source of sound input (such as a microphone) interpret human speech for transcription.

But there is one caveat for system builders. While installing SR software is simple enough, most SR users need a good deal of help transitioning from the keyboard to the microphone. And that's where the business opportunity comes into the picture for you.

SR Markets for System Builders

With SR software installed, a user with the proper training and some experience can input text at about 130 words per minute (wpm) with 90% accuracy. On a keyboard, that same user might struggle to reach 30 wpm with comparable accuracy. Also, making corrections with SR is as simple as highlighting the incorrect word and then restating it.

Yet even with this speed advantage, there's been no stampede to adopt SR. There are at least two good reasons. First, most people are conditioned to "think with their fingers," and have become accustomed to traditional keyboarding; for them, dictation is an alien activity. What's more, according to Howard Parks, founder of Highland Park, IL-based Microref Systems, an SR consulting firm, most people cannot mentally compose text faster than 50 wpm anyway, a far cry from that promised 130 wpm.
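To see what those figures mean in practice, here is a rough back-of-the-envelope sketch in Python. It assumes the effective drafting rate is simply the slower of the input speed and the composition speed, and it ignores correction time; the function name and the 1,000-word document length are illustrative choices, not anything from Dragon or from Parks.

```python
def minutes_to_draft(words, input_wpm, compose_wpm=50.0):
    """Return the minutes needed to draft `words` words.

    Assumption: the effective rate is capped by whichever is slower,
    the input method or the writer's mental composition speed.
    """
    effective_wpm = min(input_wpm, compose_wpm)
    return words / effective_wpm


# Figures from the text: about 30 wpm typing, 130 wpm dictating,
# and a composition ceiling of roughly 50 wpm.
typing_minutes = minutes_to_draft(1000, input_wpm=30)
dictating_minutes = minutes_to_draft(1000, input_wpm=130)

print(f"Typing:    {typing_minutes:.1f} minutes")
print(f"Dictating: {dictating_minutes:.1f} minutes")
```

Under these assumptions, a 1,000-word draft takes about 33 minutes at the keyboard and 20 minutes by voice. That is a real gain, but it reflects the 50 wpm composition ceiling rather than the headline 130 wpm.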

Still, markets do exist for SR. Doctors' and lawyers' offices are a good place to start, since both professions have traditionally used dictation (typically with a transcription service or secretary) as a way to save time. Only about 2% of all doctors and lawyers use SR now, estimates Microref's Parks, making them a potentially fertile market for system builders.

Another viable market for SR is victims of motion impairment, such as carpal tunnel syndrome, which makes keyboarding painful. Speech recognition could restore productivity for them almost immediately, and such users would be highly motivated to master the system. And motivation is important. Without proper support and coaching, about three-fourths of SR users eventually go back to their ingrained keyboarding habits, Parks estimates. But with coaching, most users press on and make speech their default input medium.

I recommend that any system builder who plans to pitch SR to customers should get comfortable with SR themselves first. Plan on using it yourself on a regular basis for several weeks. By thoroughly learning the software yourself, you'll be able to offer combined installation, configuration and coaching services to those who can benefit from SR.

Ingredients

To build an SR system, you'll need the SR software, of course. You'll need an adequate PC system. And although the software comes with a microphone, I recommend that you add a higher-quality microphone for better results. Here are some options:

* Software: Dragon NaturallySpeaking (DNS, version 8) from Nuance Communications Inc. of Burlington, MA, has no serious competitor in the U.S. market. Their software comes in four versions:

  • Standard, for $99.

  • Preferred, for $199.

  • Professional, for $799.

  • Medical or Legal, either for $1,099.

The Standard version is best for home use; it works with most word processors. Preferred adds Excel support, some graphical control support, dictation playback, and support for handheld recorders. Professional adds macros and a host of other features. The Medical/Legal packages add specialized vocabularies.
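When matching a customer to an edition, the comparison above can be sketched as a simple lookup. The prices are the ones listed; the feature labels are my own shorthand rather than Nuance terminology, and the sketch assumes each edition includes everything in the one below it (in particular, that Medical/Legal builds on Professional, which the descriptions above imply but do not state outright).

```python
# Feature labels are illustrative shorthand, not Nuance terminology.
STANDARD = {"word_processing"}
PREFERRED = STANDARD | {
    "excel",
    "graphical_controls",
    "dictation_playback",
    "handheld_recorder",
}
PROFESSIONAL = PREFERRED | {"macros"}
# Assumption: the Medical/Legal packages build on Professional.
MEDICAL_OR_LEGAL = PROFESSIONAL | {"specialized_vocabulary"}

# Ordered from cheapest to most expensive (list prices in US dollars).
EDITIONS = [
    ("Standard", 99, STANDARD),
    ("Preferred", 199, PREFERRED),
    ("Professional", 799, PROFESSIONAL),
    ("Medical or Legal", 1099, MEDICAL_OR_LEGAL),
]


def cheapest_edition(needed):
    """Return (name, price) of the cheapest edition covering `needed`."""
    for name, price, features in EDITIONS:
        if set(needed) <= features:
            return name, price
    raise ValueError(f"No edition covers: {sorted(needed)}")


print(cheapest_edition({"excel", "handheld_recorder"}))
```

A customer who only needs Excel support and a handheld recorder lands on Preferred; asking for macros pushes the choice up to Professional.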

[Screen shot: DNS Professional (version 8)]

* PC System: Minimum requirements for running the NaturallySpeaking software include an Intel Pentium III processor running at 500 MHz; at least 256 MB of RAM; at least 500 MB of free disk space; Windows XP; a Creative Labs Sound Blaster sound card or equivalent; and a CD-ROM drive. Of course, newer machines will have much faster processors and much more hard disk space than is required, plus a built-in sound card and CD drive.

The major configuration option will be the RAM. The Dragon software requires 256 MB of free RAM space, which means you need at least 512 MB. A full gigabyte is even better. Without enough RAM, the SR software is still accurate, but it will run so slowly that users will likely find themselves getting ahead of the machine. Also, switching between windows (to edit and navigate) may be painfully slow.
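If you size up customer machines often, the checks above are easy to script. The following is a minimal sketch that takes the thresholds from this Recipe (500 MHz, 256 MB minimum RAM, 512 MB recommended RAM, 500 MB free disk); the PCSpec class and check_pc function are hypothetical helpers, and the spec values must be entered by hand since the sketch does not query the hardware.

```python
from dataclasses import dataclass


@dataclass
class PCSpec:
    cpu_mhz: int
    ram_mb: int
    free_disk_mb: int


# Thresholds taken from the requirements described in the text.
MINIMUM = PCSpec(cpu_mhz=500, ram_mb=256, free_disk_mb=500)
RECOMMENDED_RAM_MB = 512


def check_pc(spec):
    """Return a list of human-readable problems; empty means no concerns."""
    problems = []
    if spec.cpu_mhz < MINIMUM.cpu_mhz:
        problems.append(f"CPU below {MINIMUM.cpu_mhz} MHz minimum")
    if spec.free_disk_mb < MINIMUM.free_disk_mb:
        problems.append(f"Less than {MINIMUM.free_disk_mb} MB free disk space")
    if spec.ram_mb < MINIMUM.ram_mb:
        problems.append(f"RAM below {MINIMUM.ram_mb} MB minimum")
    elif spec.ram_mb < RECOMMENDED_RAM_MB:
        problems.append(
            f"RAM meets the minimum but is below the {RECOMMENDED_RAM_MB} MB "
            "recommended for acceptable speed"
        )
    return problems


for issue in check_pc(PCSpec(cpu_mhz=2800, ram_mb=256, free_disk_mb=40000)):
    print(issue)
```

A machine with a fast processor but only 256 MB of RAM passes the official minimums yet still gets flagged, which matches the advice above to treat RAM as the main upgrade.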

* Microphone: The NaturallySpeaking software comes boxed with a headset microphone that is fine for a quiet office. But for noisy offices (which means most) I recommend a better microphone. You can expect to pay about $50. One I like is the Emicrophones VXI TalkPro Xpress. It's a solid headset mic that retails for about $44. Here's a look:


Other types of microphones are available, such as goosenecks or array microphones. But to cut out extraneous office noise due to ventilators, fans and air conditioners, a headset provides the best results, I've found.

The Lowdown on Speech Recognition

After installing the SR software (explained below), your customer must learn two things: The basics of using SR, and then how to get comfortable with it. It's the second step—getting comfortable—that is the biggest hurdle, since the user must change their mindset instead of merely learning a few new keyboard commands. The good news is that a system builder can provide add-on coaching services, assuming they've mastered the SR software themselves.

Installation

It's easy. Simply load the Dragon CD and follow the directions.

After the Dragon SR software is installed, the user will be asked to don the headset mic and read aloud from one of several canned texts. The computer will listen along, highlighting the text as it recognizes the user's progress. This beginner's training will probably take no more than five minutes.

The Dragon software will then offer to analyze the user's text files to adapt to the user's writing style. The software will scan all Word, WordPerfect, HTML, RTF and TXT files it finds in the My Documents directory. Be aware that this process can easily take 30 minutes or so. Also, the directory may not contain suitable documents. Especially when installing on a client's machine, it might be wiser to use the Dragon Accuracy Center (described below) to perform the analysis at another time, and after suitable files have been identified for analyzing.

After installation and the brief training are completed, the user can begin inputting text.

The Dragon software is resource-intensive, and it's best to avoid multitasking. So first, close all other applications running in the background. If response times are still slow when dictating into a word processor (such as Word), dictating into Dragon's own internal word processor, called DragonPad, usually gives good results. Text can be pasted into it from other documents.

Learning SR Software

The user must learn to correct recognition errors. There will always be a few errors, but this is made up for by the fact that corrections performed via SR can be much faster than by keyboarding. Also, making corrections properly will enhance recognition accuracy, and a 97% recognition rate is possible within a couple of weeks of use.
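To give customers a feel for what these recognition rates mean, the arithmetic can be sketched as follows. This assumes the rate is measured per word and that errors are spread evenly, which is a simplification; the function name is just for illustration.

```python
def expected_errors(words, accuracy_pct):
    """Estimate how many words need correcting at a given recognition rate.

    Assumption: accuracy is a per-word percentage.
    """
    return words * (100 - accuracy_pct) / 100


# Compare the 90% rate cited earlier with the 97% rate cited above.
for pct in (90, 97):
    print(f"{pct}% accuracy: about {expected_errors(1000, pct):.0f} "
          "corrections per 1,000 words")
```

Going from 90% to 97% cuts the corrections on a 1,000-word document from about 100 to about 30, which is why the accuracy features are worth the user's time.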

There are two ways to enhance recognition: Either make proper corrections as you go along, or feed Dragon examples of your writing.

When the user sees a word that has been incorrectly recognized, they highlight it by simply saying "correct" and then the word. For example, if the software misspelled the word "mischief," then the user would say, "correct mischief." The software will then present a list of alternatives for the user to choose from. If a desirable alternative is not listed, the user can then spell the word by simply saying "spell that."

If the user wants to change a word because of a change of mind, they say "select," then the word they want to change, and then the new word. For example, to change "buy" to "acquire," they'd say, "select buy acquire." Or they can use the mouse to highlight the word to be changed. (But it's best not to go to the mouse too often—after all, we're trying to get the user weaned off the old methodologies of inputting!)

The following screen shot gives a glimpse of the correction process in the DragonPad word processor. For instance, the user only has to say "choose one" to make the correction, and Dragon can adjust its recognition database accordingly. You'll note other errors awaiting correction, such as "top" for "tongue," "did I" for "did not," and "its" for "it." Typically, short, common words are more likely to contain errors than are longer ones.


As the user makes corrections, the Dragon software stores the new pronunciation data so it can use it later to enhance recognition. This enhancement does not take place in real time, however. At intervals, you need to go to the Acoustic Optimizer under the Accuracy Center option of the Tools section of the Dragon menu bar. However, running the Acoustic Optimizer may take more than an hour, so it should be done during off hours. This option can also be set to run automatically.

The Accuracy Center accepts lists of words, documents, or whole subdirectories of documents, which it will scan for words to add to its vocabulary. This is similar to the process that takes place at the end of the installation procedure (which is really a generic function for beginners). But in this case, using the Accuracy Center gives the user total control over which files the software should examine. For users who intend to compose e-mail via voice, the Dragon software can also perform similar scans of e-mail files.

If the software repeatedly stumbles over a particular word, it's possible to perform special training on that particular word by using the Train command under the Words option in the Dragon toolbar. Simply type in the word, and then pronounce it as you would normally.

After learning to make corrections, the user can next learn to use voice commands to control applications. Using the mouse is always an alternative, but mixing modes will disrupt the train of thought and may not be an option for the motion-impaired. On the other hand, using the mouse is preferred when dealing with visual controls, such as icons.

Getting Comfortable with SR

Learning to switch gears from keyboarding to dictation is a bigger hurdle than learning the intricacies of the software. Most users either have no experience with dictation, or have dictated only to another person.

You might pretend you're a newscaster on a local TV news station. Speak with the kind of care and precision they use. Slips of the tongue, slurring, or mumbling will result in the verbal equivalent of typographical errors. Concentrate on enunciating each sentence clearly, distinctly and evenly, paying little attention to what's happening on the screen (until you stop to edit). If you're stumped for the next thing to say, use the Go To Sleep command to shut down the microphone momentarily.

Coaching Services

When working with customers, use the first session to get the user set up. Make sure they are using the microphone correctly. Get them to perform the system training correctly. And ensure that they are basically productive with the software. This could easily take three hours. At least two more one-hour sessions over the next couple of weeks are advisable. During these follow-up sessions, make sure the user is employing the accuracy features and mastering the software.
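For quoting purposes, the session plan above adds up as in the sketch below. The session lengths come from the text; the hourly rate is purely a placeholder, so substitute whatever you actually charge.

```python
def coaching_hours(setup_hours=3.0, followups=2, followup_hours=1.0):
    """Total coaching time: one setup session plus follow-up sessions."""
    return setup_hours + followups * followup_hours


def coaching_quote(hourly_rate, **plan):
    """Price a coaching package at a flat hourly rate."""
    return hourly_rate * coaching_hours(**plan)


HYPOTHETICAL_RATE = 75  # dollars per hour; an assumption, not a suggestion

print(f"Hours: {coaching_hours():.0f}")
print(f"Quote: ${coaching_quote(HYPOTHETICAL_RATE):.2f}")
```

The default plan comes to five hours per user. Passing followups=3 or a longer setup_hours lets you price a more cautious package for customers who need extra hand-holding.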

Congratulations, you've just learned the basics of speech recognition and why you should add it to your product and service line. You'll be shouting from the housetops when those new checks clear!

LAMONT WOOD lives in San Antonio, Texas, and has spent more than two decades freelancing in the high-tech field for publications ranging from Scientific American to trade magazines in Hong Kong. He has also written eight books, including Your 24/7 Online Job Search Guide, E-Trepreneur (with Sherry Szydlik), The Net After Dark, Get On-Line!, and Bulletin Board Systems for Business (all published by Wiley).
