Microsoft Speech Server 2004 Lets You Speak to IT

You can develop Web and interactive voice-response applications from a single platform with Microsoft Speech Server 2004.

May 7, 2004

NetworkComputing

Speak to IT

You'll need two computers to develop and deploy MSS speech apps. I installed the Microsoft Speech Application SDK onto a Windows 2003 Server (1,400-MHz dual-Pentium III, with 1,024 MB of RAM) and housed the MSS on another Windows 2003 Server (1,133-MHz Pentium III, with 2,048 MB of RAM) with an Application Server Role to enable Internet Information Services 6.0. Once apps are developed using the SDK platform, they're deployed to MSS and an ASP.Net Web server (aka IIS 6.0).

MSS hosts speech apps and their resources, including voice-prompt databases and grammar files. The Web server generates the app's Web pages, which contain HTML and SALT (Speech Application Language Tags), and delivers them to end users over a telephone or Web interface (see diagram). In my tests of these functions, I had to jump through a number of installation hoops, but I cleared each of them with little effort.

The product supports a range of Intel Dialogic voice-processing telephony cards to interface with PBXs and telephone switches. I installed a Dialogic D/41JCT-LS (four-port analog) card into the MSS server and set up a hotline to it--that is, I connected the Dialogic card to a standard telephone using a telephone-line simulator that provided dial tone, ringing and DTMF (Dual Tone Multi-Frequency) tone detection. I drove the Dialogic card with Intel's Dialogic Speech Platform Software Release 1.0. After installing the driver for the card, I used the DCM (Dialogic Configuration Manager) to make adjustments.
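For readers new to DTMF, here is a minimal Python sketch of the standard keypad tone pairs. It only illustrates the signaling scheme and is not how the line simulator or the Dialogic card is implemented. The function names are hypothetical.

```python
from typing import Tuple

# Standard DTMF keypad layout: each key is signaled as the sum of one
# low-group (row) tone and one high-group (column) tone, in hertz.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = (
    "123A",
    "456B",
    "789C",
    "*0#D",
)

def dtmf_pair(key: str) -> Tuple[int, int]:
    """Return the (low, high) frequency pair for one keypad key."""
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(key)
            if col != -1:
                return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")

def key_for(low: int, high: int) -> str:
    """Inverse lookup: which key does a detected tone pair represent?"""
    return KEYPAD[ROW_HZ.index(low)][COL_HZ.index(high)]
```

A detector that reports 770 Hz and 1477 Hz together, for example, has heard the "6" key.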

With the DCM, I turned on the CSPExtraTimeSlot parameter to support CSP (Continuous Speech Processing). CSP reserves time slots to send echo-canceled data over the CT (computer telephony) bus. I also used the DCM to assign the firmware file (D41JCSP.FWL) to the board's configuration.

Speech Server Architecture

TIM (Telephony Interface Manager) is Microsoft's term for the abstraction-layer interface that sits between the MSS TAS (Telephony Application Services) component and the telephony card system drivers. After executing patches specific to MSS and a VB (Visual Basic) script to update ASP.Net settings, I installed Intel's TIM, called NetMerge Call Manager. Using NetMerge, I configured all four ports on the Dialogic card as inbound trunk channels to receive calls. With the hardware set, I installed MSS.

The only difficulty I had installing MSS was entering the license key and creating a custom MMC (Microsoft Management Console) for MSS to manage the SES (Speech Engine Services) and TAS. The SES enables speech recognition and output for voice apps, and it includes automatic speech recognition, ScanSoft's TTS (text-to-speech) engine, and a prompt database engine that manages recorded and TTS prompts. TAS provides the communication layer between the telephone system, SES and Web server, and manages audio input and output. It includes a SALT interpreter to parse embedded SALT speech tags.
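To give a feel for what parsing embedded SALT tags involves, here is a rough Python sketch that pulls the SALT elements out of a page. The page fragment, element ids and grammar file names are invented for illustration and do not come from the MSS samples. The namespace URI is the one I understand the SALT 1.0 specification to use, so treat it as an assumption. This is not how the TAS interpreter itself works.

```python
import xml.etree.ElementTree as ET

# Assumed SALT namespace URI; adjust if your pages declare another.
SALT_NS = "http://www.saltforum.org/2002/SALT"

# Illustrative page fragment (hypothetical ids and grammar files).
PAGE = f"""
<html xmlns:salt="{SALT_NS}">
  <body>
    <salt:prompt id="askChoice">Press or say 1 to make a reservation.</salt:prompt>
    <salt:listen id="recoChoice">
      <salt:grammar src="choice.grxml" />
    </salt:listen>
    <salt:dtmf id="dtmfChoice">
      <salt:grammar src="digits.grxml" />
    </salt:dtmf>
  </body>
</html>
"""

def salt_elements(page: str):
    """Yield (tag, id, grammar sources) for each SALT element that has an id."""
    root = ET.fromstring(page.strip())
    prefix = "{" + SALT_NS + "}"
    for el in root.iter():
        if el.tag.startswith(prefix) and el.get("id"):
            grammars = [g.get("src") for g in el.findall(prefix + "grammar")]
            yield el.tag[len(prefix):], el.get("id"), grammars

for tag, element_id, grammars in salt_elements(PAGE):
    print(tag, element_id, grammars)
```

The sketch walks the page in document order, so the prompt is listed before the listen and dtmf elements that collect the caller's answer.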

After starting up the services, I picked up the hotline to the server. MSS answered, "Welcome to Microsoft Speech Server." With that, I turned my attention to the SDK.

No Speech Impediment

The SDK includes speech controls, grammar and prompt tools, sample apps, and speech add-ins for Internet Explorer and Pocket IE. To build and debug speech-enabled ASP.Net apps, MSS needed a platform with Visual Studio.Net 2003. In addition, MSS required me to install IIS, IE 6.0 (SP1), ASP.Net 1.1 and MS Enterprise Instrumentation on the Windows 2003 Server.

From VB.Net, you can start a project by selecting the Speech Web Application template and choosing either voice-only or multimodal application settings or modes. Each project can include a Telephony Application Simulator that mimics a telephone for debugging purposes.

Rather than start from scratch, I modified a sample voice-only app and installed it. First, I opened a sample application for user input and output. A GUI popped up, displaying a menu of speech controls designed to answer a call, such as an incoming call on a PBX, and set up a speech session.

Basic speech controls play recorded speech and recognize voice, touch-tone (DTMF) and computer input, such as a mouse click. The GUI leads the caller through a dialogue and elicits responses. For example, the dialogue might begin, "If you would like to make a reservation, press or say '1.'"
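The "press or say" pattern boils down to mapping two kinds of input onto one menu choice. Here is a minimal Python sketch of that idea. The menu contents and function name are hypothetical, and the SDK's speech controls handle this in their own way.

```python
from typing import Optional

# Hypothetical menu: one option, reachable by key press or by voice.
MENU = {"1": "reservation"}
SPOKEN_DIGITS = {"one": "1", "1": "1"}

def menu_choice(kind: str, value: str) -> Optional[str]:
    """Map a caller response to a menu action, or None to re-prompt.

    kind is "dtmf" for a key press or "speech" for recognized text.
    """
    if kind == "dtmf":
        digit = value
    elif kind == "speech":
        digit = SPOKEN_DIGITS.get(value.strip().lower())
    else:
        raise ValueError(f"unknown input kind: {kind!r}")
    return MENU.get(digit)
```

Either `menu_choice("dtmf", "1")` or `menu_choice("speech", "one")` lands on the same reservation branch. Anything else returns `None`, which a real dialogue would treat as a cue to repeat the prompt.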

MICROSOFT SPEECH SERVER 2004, Standard Edition: $7,999 per processor; Enterprise Edition: $17,999 per processor. Microsoft Corp. www.microsoft.com/speech

Next, I recorded my own questions and answers, using the SDK's Speech Prompt Editor and Prompt Validation Tool for debugging:

Q: Would you like to speak to an editor about convergence?

A: Yes | No

Q: Do you want to discuss voice, video or data?

A: Voice | Video | Data
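The two prompts and their answer lists amount to a small dialogue with one grammar per question. Here is a rough Python sketch of that flow, assuming that an out-of-grammar answer triggers a re-prompt and that a "no" to the first question ends the call. Neither behavior is spelled out above, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# The two questions and their allowed answers, as listed above.
DIALOGUE: List[Tuple[str, Tuple[str, ...]]] = [
    ("Would you like to speak to an editor about convergence?", ("yes", "no")),
    ("Do you want to discuss voice, video or data?", ("voice", "video", "data")),
]

def run_dialogue(answers: Iterable[str]) -> Dict[str, str]:
    """Walk the questions in order, accepting only in-grammar answers.

    Out-of-grammar answers are skipped, which stands in for a re-prompt.
    The dialogue ends early if the caller declines the first question.
    """
    results: Dict[str, str] = {}
    pending = iter(answers)
    for question, grammar in DIALOGUE:
        for heard in pending:
            word = heard.strip().lower()
            if word in grammar:
                results[question] = word
                break
        else:
            break  # no more input: the caller hung up
        if word == "no":
            break
    return results
```

Feeding it `["maybe", "Yes", "data"]` skips the first answer, accepts "yes" and then records "data" for the second question.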

I added this information to speech controls designed to obtain input for credit-card numbers and expiration dates. After saving the modified sample app, I was ready to port it to MSS.
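I don't describe how the sample app checks the card details it collects, so the following Python sketch is an assumption about what such a check could look like. It applies the standard Luhn checksum to the card number and a simple month comparison to the expiration date. The function names and the 12-digit minimum are illustrative choices.

```python
from datetime import date

def luhn_ok(number: str) -> bool:
    """Check a card number with the Luhn checksum (spaces and dashes ignored)."""
    if any(c not in "0123456789 -" for c in number):
        return False
    digits = [int(c) for c in number if c in "0123456789"]
    if len(digits) < 12:  # assumed minimum length, for illustration only
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

def not_expired(month: int, year: int, today: date) -> bool:
    """Treat a card as valid through the end of its expiration month."""
    if not 1 <= month <= 12:
        return False
    return (year, month) >= (today.year, today.month)
```

A spoken or keyed number that fails `luhn_ok` is most likely a recognition or entry slip, so re-prompting the caller is usually cheaper than passing the number downstream.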

In Visual Studio, I created a Web Setup Project and included the app as a solution for deployment. This was as easy as creating a .zip archive in WinZip. I then clicked "build" to create MSI (Microsoft Installer), setup.exe and setup.ini files that I exported to MSS. Once the app was installed on the MSS server, I configured TAS to run it (ASPX file) as the start page when answering incoming calls and Web requests. After stopping and restarting the service, I picked up the hotline with my credit card to talk convergence.

Cheap and Easy

With MSS, your enterprise can build killer talking apps in no time. The cost to get started with the standard edition is a drop in the bucket when you figure you can develop your own speech tools for telephone and Web services all from one platform, without the need for third-party consultants. For application-load distribution and redundancy, however, you'll want the pricier enterprise edition.

Sean Doherty is a technology editor and lawyer based at our Syracuse University Real-World Labs. Write to him at [email protected].
