SOTAs: The Telephony Code

Service orientation simplifies the building of telephony-enabled applications, but pitfalls still abound. Here's what vendors are up to and what you should look out for.

August 1, 2006


Most companies implement VoIP systems to meet a variety of short-term goals. But the most significant potential for VoIP is the long-term capability for integrating voice services into the enterprise data services architecture. Once businesses convert their telephony services into packetized networking services, those telephony services can then be folded into and used to transform business applications.

This is the underlying principle behind SOTAs (service-oriented telephony architectures). These telephony/data hybrids enable Web developers to invoke telephony commands by calling well-understood Web services.

As with the virtualization of human resource and database systems, a SOTA allows IT to get more out of an IP PBX. The numbering system used in a telephone, for example, is a unique naming convention that could, in theory, be used by other applications to address devices and users. The corporate phone directory is a massive database that could also play a fundamental role in identity management. The presence server that forms part of a SIP service could be leveraged in other applications as well.

By encouraging corporate and commercial application developers to develop against these interfaces, IT can enrich the application experience. Calling on a generalized presence engine that's constantly updated by different sources, for example, could reduce the development time in creating ad-hoc workflow systems. At the same time, these systems extend the value of IP PBX platforms into the rest of the enterprise.

Little wonder then that IP PBX vendors are pushing to create service-oriented interfaces into their telephony systems. These efforts include Avaya's February 2006 introduction of its Application Enablement Services and Sphere Communications' February release of Sphericall, as well as Siemens' and startup BlueNote's expected introductions of their SOAP-based interfaces by year's end.

To understand the dynamics behind these product introductions, the following special report dissects the coming wave of SOTAs. In particular, we'll detail:

Programming Interfaces: Several programming interfaces already exist for embedding VoIP within applications. Here's what SOTAs have to offer.

The SOTA Architecture: We identify and explain the three layers in all SOTAs.

The Standards: Market activities aren't based on nothing. There was plenty of standards work in the field before vendors jumped in. Here's a look at what's available.

The Players: Who's delivering what? The short list is very short, and the biggest vendor in the market has yet to make its position clear. We looked at the WSDL files of what's available today and give you the read on their architectures.

Our Take: How's this all going to play out? We suggest a picture of what to expect from the SOTA marketplace.

While SOTAs may revolutionize business applications, there's also plenty of hype. Vendors are promising capabilities that aren't yet available. Read on as we explore the facts and hype of this new technology.

Dave Greenfield

Editor, NetworkingPipeline

Well over a dozen computer-telephony APIs are commonly used in the industry today, although they're usually tied to specific platforms or computing models. In broad terms, these interfaces can be broken down into the following four basic categories:

Local APIs: These interfaces allow an application running on a computer system to communicate with local telephony devices and services through a local connection. Some common interfaces in this space are Microsoft's Telephony API (TAPI) for Windows, the Novell/Lucent cross-platform Telephony Server API (TSAPI) for networked applications, and the Java Telephony API (JTAPI) for Java-based apps. CSTA (Computer Supported Telecommunications Applications), from the European Computer Manufacturers Association (ECMA), is widely used by telephony equipment providers and third-party software developers and generally falls into the local API category. Another legacy interface is the Simplified Message Desk Interface (SMDI), which defines a serial-line protocol for phone systems to use when communicating with voice-mail systems.

PSTN/Carrier APIs: These interfaces allow applications to communicate with devices and services on modern public networks, such as digital cellular and ISDN networks. Due to the limited-access nature of these networks, these interfaces are generally used only by companies that provide services on those networks: the telephone companies themselves, or third-party firms that provide things like stock quotes and alerts to customers. With a few exceptions, PSTN/Carrier APIs are not commonly used by enterprise IT organizations. The two main interfaces in this category are Parlay/OSA from the Parlay vendor consortium, and the Java APIs for Integrated Networks (JAIN).

Standardized Network Protocols: Data-network telephony protocols such as H.323 and SIP provide a variety of control services, such as call setup and management, as a necessary part of their functionality; these control services can be incorporated into data-centric applications if the application developer is willing and able to implement the protocol stack into the application directly. In this scenario, the applications do not use APIs to interface with telephony devices and services, but instead act as direct peers to the devices and services on the voice network.

Vendor-Proprietary Interfaces: Apart from the common interfaces described above, there are also vendor-specific proprietary interfaces, which are sometimes supported in software packages and development tools. Two examples are Cisco's Skinny Client Control Protocol (SCCP), which is widely used on Cisco VoIP gear, and Avaya's Communications Manager API (CMAPI), which is used for connecting third-party gear to Avaya's soft switch.

The Problem With Today's APIs

While these low-level CTIs are in widespread use, they tend to be platform-specific and hence are generally not useful for distributed applications. Meanwhile, SIP ostensibly offers a platform-neutral control syntax and provides raw access to many critical functions. But while SIP does indeed work well for many things, the protocol also requires integrating data-centric applications into the voice network, which is a non-starter for many developers.

Web-service CTI interfaces, on the other hand, provide an abstract service-control layer that is distinct from the application and voice service layers. They therefore provide a generalized interface that data-centric applications can use to tap into the voice services (and vice versa) without forcing them to become peer devices. Through a single API, applications can talk to devices on SIP, H.323, and possibly even PSTN networks, without having to program to each of those networks. Instead, applications need only communicate with front-end devices, such as an IP PBX or a gateway, and let those devices perform network-specific functions.

Conversely, developers can also tap into data-centric application services through the same kind of abstraction model, using the same type of tools. For example, a voice-centric application that also ties into e-mail servers or business-process servers through Web services (assuming those systems have the appropriate interfaces) could bring that data into the resulting CTI application through a common interface. Trying to do this with SIP, or some other voice-specific interface, means embedding agents into applications in order to effectively integrate the apps into the voice network.

Another point of consideration: There is no SIP-to-SOAP binding defined in standard form. Although there have been multiple stabs at this, none of these efforts has resulted in a standardized binding, so you'll be on your own if you want to operate at that layer.

Theoretically, wrapping traditional communications services into well-defined XML messages, which are transferred across SOAP, will allow organizations to bring telephony and messaging services directly to their applications. Better still, Web services also have the potential to provide an abstract interface into the telephony service, regardless of the underlying telephony protocols; Web services can provide interfaces to devices on SIP and H.323 networks, and even PSTN networks. And applications must become peer devices of those networks only if they need to be voice-enabled.

On the surface, Web services interfaces to traditional telephony systems don't appear all that different from existing interfaces. All provide a collection of functions that computer-based applications can tap into; the only real visible difference is in the transport mechanisms. For example, platform-specific interfaces usually rely on local transports, such as plain old RS-232 serial connections, while network-oriented interfaces use network protocols to extend the interface across a data network. From a cursory examination, the same appears to be true for SOTA interfaces.

However, the real difference with SOTA interfaces is additional layering. Whereas traditional interfaces merely extend the telephony API outward but still require data-oriented applications to conform to that interface, SOTA interfaces provide an abstract service-control layer that is separate from the telephony layer. Moreover, the application interface looks and feels like any other application-oriented interface, and does not require connected applications to become telephony devices. Executing a telephony task is the same as any other kind of task: you can initiate a phone call just as easily as you can issue a database lookup, or anything else that is available through a parallel Web service, and you can do so without having to become a peer device on the telephony network.
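The layering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's API: `CallControlService`, `TelephonyBackend`, `SipBackend`, `PstnGatewayBackend`, `make_call` and the routing rule are all hypothetical names invented for the example. The point is that the application calls one abstract method and never touches the network-specific code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict


class TelephonyBackend(ABC):
    """Network-specific side (hypothetical); lives in the IP PBX or gateway."""

    @abstractmethod
    def place_call(self, caller: str, callee: str) -> str:
        """Set up a call and return a backend-specific call reference."""


class SipBackend(TelephonyBackend):
    def place_call(self, caller: str, callee: str) -> str:
        # A real system would perform SIP signaling here.
        return f"sip:{caller}->{callee}"


class PstnGatewayBackend(TelephonyBackend):
    def place_call(self, caller: str, callee: str) -> str:
        # A real system would hand the call to a PSTN gateway here.
        return f"pstn:{caller}->{callee}"


@dataclass
class CallControlService:
    """Abstract service-control layer: the only thing applications see."""

    backends: Dict[str, TelephonyBackend]
    calls: Dict[int, str] = field(default_factory=dict)

    def make_call(self, caller: str, callee: str) -> int:
        # Assumed routing rule for the sketch: "@" means a SIP address.
        network = "sip" if "@" in callee else "pstn"
        ref = self.backends[network].place_call(caller, callee)
        call_id = len(self.calls) + 1
        self.calls[call_id] = ref
        return call_id


service = CallControlService({"sip": SipBackend(), "pstn": PstnGatewayBackend()})
first = service.make_call("alice@example.com", "bob@example.com")
second = service.make_call("alice@example.com", "+15555550100")
print(service.calls[first])
print(service.calls[second])
```

Note that the application only ever holds an opaque call identifier. It can manipulate the call through the service, but it has no handle on the media stream itself.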

In the SOTA model, a simple WSDL interface exposes a variety of telephony and communication functions to the application plane, thereby allowing applications to make use of whatever communication services are needed. Meanwhile, back-end systems do whatever is needed to make the task succeed, whether that be placing a call through an available telecommunications interface or communicating with an IVR system through a serial line. In other words, the SOTA model provides an abstract interface for applications to use that is entirely separate from the telephony and communications infrastructure; this is a significant difference from traditional interfaces that simply extend the telephony infrastructure outward without any separation.

But while this separation makes integrating control functions into data-oriented applications simpler, it's also important to recognize that it imposes a wall between the two worlds as well. In particular, applications do not have access to voice media through the SOTA interface: while they can manipulate calls all day long, they cannot actually participate in the call through this interface. If you need to have your users and/or systems take part in a voice call, you'll still have to bring them to the telephony network (whether that be SIP/RTP, a POTS line, or whatever).

Given the interest both carrier and enterprise vendors have in virtualizing their equipment, it's not surprising that standards have emerged addressing the requirements of both communities. Within the enterprise, a collection of standards from the European Computer Manufacturers Association (ECMA) focuses on call-management functions, already widely implemented in the private exchange equipment typically found on enterprise networks. This includes Avaya's and Siemens' IP PBXes, which both use a subset of the ECMA collection for their Web services interfaces.

At the heart of the ECMA collection is CSTA as defined by ECMA-269, which describes a generalized API for telephony applications to use when communicating with other services and devices. ECMA-269 defines more than 130 functions, ranging from basic call-control tasks to operational features, such as putting a device into a "do not disturb" state, and it also describes ASN.1 encoding rules to use for those functions. However, these specifications are heavily focused on call and device management, and are missing many important non-telephony functions, such as presence and instant messaging.

The CSTA specification was subsequently supplemented by ECMA-323, which defines XML-encoding rules as an alternative to the ASN.1 encoding rules, and also provides examples for use with different SOAP bindings. ECMA-323 was further supplemented by ECMA-348, which defines a standard WSDL definition for XML encoding and provides examples for use with SOAP/HTTP. However, we do not know of any vendors that implement ECMA-348. Avaya and Siemens both implement portions of ECMA-269 and ECMA-323, but that's as close as we've seen. Furthermore, many of the vendors we spoke to expressed the opinion that the ECMA standards are too complex for wide-scale adoption outside the vendor community. Given the broad adoption of CSTA, however, we feel it's highly probable that these standards will continue to be adopted in some form.
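To give a feel for what an XML encoding of a call-control request involves, here is a minimal sketch that serializes and parses a make-call request with Python's standard library. It is not the ECMA-323 schema: the namespace is a placeholder, and the element names only follow the general shape of a CSTA-style request and should be treated as illustrative. Consult the specification for the normative definitions.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Placeholder namespace for the sketch; NOT the namespace defined by ECMA-323.
NS = "urn:example:csta-sketch"


def encode_make_call(calling_device: str, called_number: str) -> bytes:
    """Serialize a make-call request as namespaced XML (illustrative names)."""
    root = ET.Element(f"{{{NS}}}MakeCall")
    ET.SubElement(root, f"{{{NS}}}callingDevice").text = calling_device
    ET.SubElement(root, f"{{{NS}}}calledDirectoryNumber").text = called_number
    return ET.tostring(root, encoding="utf-8")


def decode_make_call(payload: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Parse the request back into (calling device, called number)."""
    root = ET.fromstring(payload)
    calling = root.findtext(f"{{{NS}}}callingDevice")
    called = root.findtext(f"{{{NS}}}calledDirectoryNumber")
    return calling, called


payload = encode_make_call("sip:alice@example.com", "5550100")
print(payload.decode("utf-8"))
print(decode_make_call(payload))
```

The same payload could be carried over a raw TCP connection or wrapped in a SOAP envelope. The encoding rules and the transport binding are separate concerns, which is why the standards treat them in separate documents.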

The other significant set of existing standards is the Parlay collection of specifications, as published by the vendor consortium of the same name. Carriers frequently use the Parlay standards for their application interfaces, so the spec is more common in carrier-class systems and associated application-development platforms than in enterprise-class telephony gear. However, IT organizations that want to integrate public-network telephony devices and services into their unified applications will likely need to work with Parlay at some point.

In addition, enterprise application platform vendors like BEA, IBM and Oracle already support Parlay in their "carrier" product lines, and it's likely that one or more of those tools will bring some of that functionality into the corporate space, dragging the Parlay interfaces along with them.

The Parlay consortium developed the core Parlay specification in conjunction with the European Telecommunications Standards Institute (ETSI) and the Third Generation Partnership Project (3GPP), the oversight body for 3G digital-cellular technology. The core Parlay interfaces form the API layer of the 3GPP Open Service Architecture (OSA) and are generally referred to as Parlay/OSA APIs. These APIs are intended to be portable across multiple development environments, are documented for use within CORBA and JAIN environments, and include a WSDL definition.

Separately, there's also a subset specification, called Parlay/X, which describes a lighter set of APIs that are optimized for use with Web service interfaces in particular. Whereas Parlay/OSA provides asynchronous access to numerous low-level functions, Parlay/X provides synchronous access to a much smaller number of functions.
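The difference between the two calling styles can be illustrated with a small sketch. The class and method names below are hypothetical and do not correspond to actual Parlay/OSA or Parlay/X operations. They only contrast a request whose result is the return value with a request whose result is delivered later through a callback.

```python
from typing import Callable, List, Tuple


class SyncStyleService:
    """Synchronous style (hypothetical API): the result is the return value."""

    def make_call(self, caller: str, callee: str) -> str:
        return f"connected:{caller}->{callee}"


class AsyncStyleService:
    """Asynchronous style (hypothetical API): results arrive via callback."""

    def __init__(self) -> None:
        self._pending: List[Tuple[str, str, Callable[[str], None]]] = []

    def create_call(
        self, caller: str, callee: str, on_result: Callable[[str], None]
    ) -> None:
        # Returns immediately; the caller learns the outcome later.
        self._pending.append((caller, callee, on_result))

    def dispatch(self) -> None:
        # Stands in for the network delivering results at some later time.
        pending, self._pending = self._pending, []
        for caller, callee, on_result in pending:
            on_result(f"connected:{caller}->{callee}")


sync_result = SyncStyleService().make_call("alice", "bob")

async_results: List[str] = []
async_service = AsyncStyleService()
async_service.create_call("alice", "bob", async_results.append)
before_dispatch = list(async_results)  # nothing has been delivered yet
async_service.dispatch()

print(sync_result)
print(before_dispatch, async_results)
```

The synchronous style is easier to call from a typical Web application, while the asynchronous style requires the client to keep state and accept incoming notifications, which is one reason a lighter interface is attractive for Web services use.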

However, the Parlay/X dictionary maps quite well to the kinds of functions that corporate CTI developers might want to use, with high-level functions for call control, conferencing, presence, messaging, address book management and so forth; there are also functions that are more suitable for traditional carrier networks, such as functions to manage ring tones and billing information. All in all, the spec provides a fairly straight mapping to most of the services that an IT application developer might want. It would also be very nice to have a single Web CTI interface that worked with devices and users on local and carrier networks simultaneously. All this makes Parlay/X an interesting specification, even if it is not yet widely used in corporate telephony environments.

One problem with the Parlay model is that it is heavily layered. In those cases where OSA provides the network-native application interface (as is the case with 3G cellular networks), Parlay/OSA simply exists as a programmable service. But in other cases, Parlay support is typically provided by a gateway of some kind. Since Parlay/X represents a subset of the Parlay/OSA APIs, it is also usually implemented as a gateway to the full Parlay/OSA system. This means that Parlay/X can be a gateway to Parlay/OSA, which is itself a gateway to the native telephony network.

Worse, though, there is no real support for Parlay/X in the IP PBX market; we do not know of any vendors who support it at the current time. However, if application vendors begin pushing into this space, or if enterprise IT developers start clamoring to expand their applications into cellular networks, there is some likelihood that the market will adapt to those demands.

About a dozen IP PBX vendors currently ship Web service interfaces to their systems, but most of those interfaces are aimed at administrative tasks, such as managing the users and phones attached to those systems, or configuring the PBX itself. We could only find three products capable of performing rudimentary call-management tasks through a Web services interface to their IP PBX: Avaya's Application Enablement Services, Sphere Communications' Sphericall and Siemens' HiPath 8000.

Cisco's line of IP PBX systems does not yet have the ability to manage calls through a general SOTA interface, although the company does make use of Web services for some configuration and administrative tasks. It is theoretically possible to use some of these interfaces to emulate a phone device in software and achieve rudimentary integration, but this is not documented and probably would not provide sufficient functionality. Furthermore, Cisco representatives we spoke to said that their short-term strategy was to continue consolidating various acquisitions around common local interfaces, while relying on third-party vendors like Metreos to provide additional development tools and services. But while Metreos does indeed have a compelling CTI development platform, it does not have a suitable SOTA interface as of yet. Meanwhile, BlueNote Networks says that it is developing a SOTA interface to its IP PBX line, but it would not allow us to examine the interfaces or its documentation.

SPHERICALL WEB SERVICES

Sphere Communications offers the most comprehensive SOTA interface that we saw, in the Web services SDK for its Sphericall IP PBX.

Sphere exposes Sphericall IP PBX services through a SOAP-compliant WSDL interface and provides a lightweight, synchronous messaging interface, optimized for end-user development. Developers can access third-party call-control, conferencing, call-recording, presence and status, instant messaging, number lookup, and call-history lookups. Sphericall Web services also provide some administrative and event notification functions. Multiple Sphericall PBX systems can be clustered, and third-party devices can also be connected through the TAPI and SMDI local interfaces.

We found the call-handling, conferencing, call-recording and IM/presence functions more than suitable for most purposes. Moreover, Sphericall is the only Web services offering that has a sufficiently comprehensive interface at this time; none of the other implementations that we looked at were as broadly usable.

Sphere has also implemented the most complete session management model, with support for asynchronous bi-directional communications over the SOAP channel. Sphere uses semi-permanent session identifiers to maintain long-term state across transactions, coupled with a client-side "fetchEvents" function. In this model, the client opens a connection, asks for any new events that are associated with the session, and then waits, subject to a timeout, for event messages to arrive. If no notifications are received within a specified interval, the client will eventually time out, and then reconnect with the server to restart the process. Cumulatively, this provides for bi-directional asynchronous session-level event messaging over SOAP, something none of the other implementations we saw offers.
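The polling pattern described above can be sketched as follows. This is a simulation under stated assumptions, not Sphere's actual API: the server is an in-process stand-in, and `SimulatedServer`, `fetch_events`, `publish`, the session identifier and the event names are all hypothetical. A real client would issue a SOAP request over HTTP where this sketch calls a local method.

```python
import queue
from typing import Dict, List


class SimulatedServer:
    """In-process stand-in for the PBX side (hypothetical)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, queue.Queue] = {}

    def open_session(self, session_id: str) -> None:
        # The session identifier carries state across separate requests.
        self._sessions[session_id] = queue.Queue()

    def publish(self, session_id: str, event: str) -> None:
        self._sessions[session_id].put(event)

    def fetch_events(self, session_id: str, timeout: float) -> List[str]:
        """Block until an event arrives or the timeout expires."""
        pending = self._sessions[session_id]
        try:
            events = [pending.get(timeout=timeout)]
        except queue.Empty:
            return []  # timed out; the client is expected to ask again
        # Drain anything else that is already waiting.
        while True:
            try:
                events.append(pending.get_nowait())
            except queue.Empty:
                return events


def poll(server: SimulatedServer, session_id: str, polls: int,
         timeout: float = 0.05) -> List[str]:
    """Client loop: ask, wait, time out, reconnect, repeat."""
    received: List[str] = []
    for _ in range(polls):
        received.extend(server.fetch_events(session_id, timeout))
    return received


server = SimulatedServer()
server.open_session("session-1")
server.publish("session-1", "call-offered")
server.publish("session-1", "call-connected")
events = poll(server, "session-1", polls=2)
print(events)
```

Because every exchange is initiated by the client, the server never needs to open a connection back to the application. The design trades some latency and idle traffic for the ability to receive events from behind firewalls using ordinary request/response messaging.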

A couple of other interesting features in the Sphericall Web services implementation are worth noting. For one, presence and status information can be set through the Web services interface, meaning that you can have your application change the user's call status automatically, for example by changing an operator's status to reflect the fact that she is talking to a customer whenever she releases a call from an incoming queue. Also, the Sphericall IP PBX has a feature called "forwarding profiles," which allows for user-defined call routing, and those features are also partially exposed through the Web services interface.
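The operator example above amounts to an event handler that writes presence. A minimal sketch, assuming a hypothetical `PresenceService` with a `set_status` method and made-up event types and status strings (none of these are taken from the Sphericall SDK), might look like this:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PresenceService:
    """Hypothetical stand-in for a presence interface exposed as a Web service."""

    statuses: Dict[str, str] = field(default_factory=dict)

    def set_status(self, user: str, status: str) -> None:
        self.statuses[user] = status


def on_call_event(presence: PresenceService, event: Dict[str, str]) -> None:
    """Update an operator's status from call events (event names are assumed)."""
    if event["type"] == "released_from_queue":
        # The operator has just taken a customer call off the incoming queue.
        presence.set_status(event["agent"], "with-customer")
    elif event["type"] == "call_ended":
        presence.set_status(event["agent"], "available")


presence = PresenceService()
on_call_event(presence, {"type": "released_from_queue", "agent": "operator-7"})
during_call = presence.statuses["operator-7"]
on_call_event(presence, {"type": "call_ended", "agent": "operator-7"})
after_call = presence.statuses["operator-7"]
print(during_call, after_call)
```

In practice the events would come from the PBX's notification mechanism, and the status write would be a Web service request rather than a dictionary update.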

Finally, Sphere provides a simulation server that can be used for offline development and testing; this will prove extremely useful for most in-house developers. Overall, the Sphericall Web services interface is pretty comprehensive, and is by far the most complete offering we saw.

AVAYA APPLICATION ENABLEMENT SERVICES

Avaya's Web service interfaces are part of its Application Enablement Services offering, which is an add-on gateway to its IP PBX products. Application Enablement Services has both a high-level first-generation Web services interface based on WSDL and SOAP, and an XML-over-TCP interface based on CSTA and ECMA-323.

The product also has a handful of classic interfaces--including JTAPI, TSAPI, and CSTA over ASN.1--as well as its own proprietary interfaces, all of which are implemented on the public side of the gateway for applications to tap into. On the back side, the gateway uses Avaya's proprietary CLAN protocol to communicate with the Avaya IP PBX, which in turn implements the local signaling protocol(s) needed for the canonical telephony functions to work.

The Application Enablement Services Web services interface provides three broad sets of functions: interfaces for managing user accounts and settings, interfaces for managing the system and devices, and interfaces for managing call-related activities. At the present time, the range of telephony-related interfaces is pretty small, with fewer than a dozen functions. These include functions to create a new call, answer an incoming call, perform basic conferencing and transfers, and handle basic session management tasks. There are no functions for presence or instant messaging, voice-recording, call-history lookups, or much else.

However, the XML-over-TCP interface, which Avaya refers to as the Communications Manager API (CMAPI) XML SDK, is much more comprehensive than the SOAP interface, and offers significantly more functionality. For example, the current XML SDK contains 238 CSTA XML Schema Definition (XSD) files and 52 Avaya-specific XSD files, which cumulatively provide a fairly large number of call-control and device-management features, as well as ancillary features like call-recording and playback. However, there are no functions for presence or instant messaging in the XML SDK as of yet.

The lack of an interface for presence and IM features is probably the biggest hole in the Avaya system, although to the company's credit, Application Enablement Services is still a relatively early effort, and this hole will almost certainly be filled sooner rather than later. The overall weakness of the existing WSDL/SOAP interface is somewhat problematic, although the CSTA-derived XML-TCP interface is quite robust, and is indicative of the features that are likely to be evolved into the SOAP interface.

SIEMENS HIPATH 8000

Siemens' HiPath 8000 is billed as a carrier-grade, software-based IP PBX solution, and is generally sold into very large networks. Siemens also has a line of add-on products that it sells for the HiPath 8000 platform, including the OpenScape presence and collaboration platform, the Xpressions unified messaging system, and the ProCenter call-center platform. Currently, these layered products use CSTA or SIP to talk to the IP PBX, but Siemens says that its plan is to eventually provide SOA interfaces that can support all of these products directly.

The current HiPath 8000 v2 software release, which just started shipping, is somewhat below that target, but is a good indication of the company's strategic direction. At the moment, the HiPath 8000 provides some basic administrative interfaces for device and user configuration (these were also present in the v1 release), and also has some rudimentary call-management functions for call setup and disconnect, call-history lookups, and address-book tasks.

However, Siemens says that all of the HiPath 8000 internal functions are already represented in XML (some of which is shared with ISV partners for development purposes). Siemens also continues to bring the low-level CSTA and XML interfaces into its high-level WSDL. In particular, the 2.1 release, due out this summer, is likely to have advanced call-control functions, and may also include presence and messaging interfaces, although Siemens would not commit to product details or a release schedule.

Overall, the HiPath 8000 architecture seems to be well-designed for maximum scalability, given that it already uses SIP and CSTA for most of its internal functionality. If Siemens is able to package these functions into a usable WSDL/SOAP interface, it will have a very strong, standards-driven offering in this space.

It's important to recognize that this area is in its early stages, and the industry as a whole is a long way from seeing this kind of promise in a widely available, standardized form. In particular, different functionality and implementation models are still being fleshed out, and several more years of work will be needed before the industry can coalesce around a set of practical and functionally delineated standards.

Vendors and standards bodies are pursuing their own functionality targets. Most first-wave products from IP PBX vendors focus on high-level call-control functions that are abstracted from the underlying technology, with the intention of expanding into related technologies over time. Meanwhile, the current crop of standards is generally focused on particular segments of the telecom industry, and it's only through serendipity that those standards also provide the kind of functionality that's needed by broader markets.

All told, we find it highly unlikely that a single monolithic standard will emerge anytime soon that addresses all of the desired functions, especially one that has a universally applicable level of granularity. Instead, it's most likely that multiple specifications will evolve to address specific functionality at different granularities over the next few years, though some eventual consolidation around a handful of standards is also likely to occur at some point.

Simply put, things are going to get a lot more complicated before they get simpler.
