SpeechSC and MRCPv2

This vendor-neutral standard will allow any voice application to control network-based media resources like speech synthesizers and recognizers. But it's getting the cold shoulder from some significant players.

January 19, 2007


Anyone who develops, deploys or uses a voice application knows the benefits of speech processing; the technology enables functions such as retrieving e-mail or instant messages over the corporate PBX from a cell phone using TTS (text-to-speech) technology. However, setting up these capabilities isn't easy, and a standard method of processing and controlling audio streams across network resources has been conspicuously absent.

The IETF's SpeechSC (Speech Services Control) working group is out to fix that problem with MRCP (Media Resource Control Protocol) version 2. The specification will allow any voice application to control network-based media resources, such as speech synthesizers and recognizers. The working group's ultimate goal is to encourage the development of--and lower the financial bar to--new speech-enabled applications.

Speech-processing vendors such as Nuance Communications and Voxpilot are on board, as is Cisco Systems, and MRCPv2 speech engines are already coming out even though the standard is still in development. However, the standard is getting the cold shoulder from some significant players. Microsoft hasn't taken an official position on it and is one of the few heavyweights with a stake in the speech-processing market not to commit. Most major PBX vendors haven't committed to the standard either, though we suspect MRCPv2 will quietly gain support from them through their relationships with speech-processing vendors.

Simplifying Speech

Without MRCPv2, a voice application programmer must develop specifically for each vendor's speech engine. In some cases, a software vendor may even have different interfaces for each product it sells, further increasing confusion. To combat proprietary interfaces, Cisco, Nuance and SpeechWorks developed MRCP in 2001; their effort initially took place outside the domain of the IETF.

The first version of MRCP assumed a highly centralized voice-processing implementation and didn't address key capabilities such as SI (speaker identification) and SV (speaker verification). SI can be used to integrate telephony into other systems: if some employees join a Web conference over a PSTN bridge while others connect by telephone, speaker-ID services can show the Web users who is speaking. SV services can be used biometrically--for password controls, for example. In addition, according to Cantata Technology's Eric Burger, co-chair of the SpeechSC group, MRCPv1 suffered from scalability, security and protocol-engineering problems.

In 2002, the IETF's SpeechSC group was formed to address all those issues, thereby standardizing interfaces for TTS, SI, SV and speech-recognition engines. The basic framework of SpeechSC is described in RFC 4313. MRCPv2 is the implementation of that framework and is expected to be ratified by the IETF before April. MRCPv2 is built on existing VoIP and voice protocols such as SIP (Session Initiation Protocol), RTP (Real-Time Transport Protocol) and VoiceXML. The SpeechSC working group's co-chairs, Burger and David Oran, a Cisco Fellow, are also on the SIP Forum board of directors, and Burger is a member of the VoiceXML Forum MRCP Committee. Thus, the people responsible for MRCPv2 are well positioned to work with other voice and VoIP influencers.
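To make the layering concrete: SIP sets up the session and negotiates a control channel to the media resource, RTP carries the audio, and MRCPv2 messages flow over the control channel. The sketch below builds an illustrative MRCPv2 SPEAK request in the draft's general shape--a start line carrying the method, request ID and total message length, followed by headers and an SSML body. The channel identifier and request ID are made up, and the exact syntax is governed by the specification, not this sketch.

```python
# Illustrative MRCPv2 SPEAK request builder. Assumes a control channel that a
# SIP INVITE has already negotiated; all identifiers here are hypothetical.

CRLF = "\r\n"

def build_speak_request(request_id: int, channel_id: str, ssml: str) -> str:
    headers = [
        f"Channel-Identifier: {channel_id}",          # which resource channel
        "Content-Type: application/ssml+xml",         # body is SSML markup
        f"Content-Length: {len(ssml.encode())}",
    ]
    rest = CRLF + CRLF.join(headers) + CRLF + CRLF + ssml
    # The start line carries the total message length, which includes the
    # digits of the length field itself, so settle on it iteratively.
    length = 0
    while True:
        start = f"MRCP/2.0 {length} SPEAK {request_id}"
        total = len((start + rest).encode())
        if total == length:
            return start + rest
        length = total

msg = build_speak_request(
    543257,
    "32AECB23433802@speechsynth",
    '<?xml version="1.0"?><speak>You have three new messages.</speak>',
)
print(msg.splitlines()[0])
```

The point of the exercise is that any MRCPv2-conformant synthesizer could accept such a request, regardless of which vendor built the engine--exactly the interoperability the working group is after.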


Who's Involved?

Cisco, IBM and more than a dozen other vendors have contributed to MRCPv2. Most of the participants in the working group represent speech-processing companies.

Microsoft is notably absent from the list of SpeechSC participants. Microsoft Speech Server--which is based on the company's own SAPI (Speech Application Programming Interface)--is an important component of the company's ongoing unified communications strategy. Speech Server, for example, lets an Exchange 2007 user access e-mail from a mobile phone using speech recognition and TTS capabilities. But there's little mention of MRCPv2 from Redmond. None of the company's speech team bloggers are talking about it, and Microsoft has made no official commitments or criticisms.

[Figure: Timeline]

[Figure: SpeechSC Communications]

Microsoft's stance toward MRCPv2 is not an indication of a failed or failing standard; the company's history with VoiceXML is evidence of that. Microsoft initially committed to SALT (Speech Application Language Tags) for Speech Server. In April 2006, the company announced it would fully support both SALT and VoiceXML in Speech Server 2007. One perspective is that the market was swinging toward VoiceXML, and Microsoft saw the need to support it; a more cynical analysis is that Microsoft tried to rule with SALT, and the market rejected it. At any rate, we suspect the industry will favor MRCPv2, and Microsoft will come around. If not, it's possible that a SAPI-to-MRCP translator could be developed, effectively addressing the issue of Microsoft's nonparticipation.

Adoption Benefits

Vendor adoption of MRCPv2 will make it easier and cheaper to develop speech apps and therefore will increase the potential market for speech processing. However, the standard also makes it easier to switch engines, which is beneficial for an IT department but questionably so for a voice-processing vendor. Nuance believes a bigger market is more beneficial than a locked-down one, and that anything growing the industry is good for all speech vendors. The speech industry is a relatively small community and hasn't had a period of explosive growth; vendors are hoping MRCPv2 will add fuel to the fire. On the other hand, MRCPv2 should make it easier for a vendor to sway developer support toward its own engine. Finally, Cisco, IBM and other vendors that use, but don't necessarily develop, speech-processing technology stand to benefit from the standard: with a standard interface across all engines, it will be easier to develop apps, change engines and find programmers.

Michael J. DeMaria is an associate technology editor based at Network Computing's Real-World Labs® at Syracuse University. Write to him at [email protected].
