XML: Revenge of the Nerds
April 5, 1999
The reality is that most interorganizational data-sharing projects begin with two pasty-faced men arguing about how many fields are in a mailing address. Then an overworked, underimaginative project manager asks for five times more documentation than the underlying code will ever require. (I get to say those things because at times, I've been both.)
For years we have heard about the critical, cutting-edge standards that are shaping the networking world. IPSec, IPv6, OSPF and LDAP have received lots of ink in the press and plenty of mindshare from the vendor community. This attention to emerging protocol standards leaves 99 percent of all enterprise IT departments wondering what's in it for them and what difference any emerging standard will make to the systems they are planning to build. Obviously, the enterprise benefits as new protocols come online and drive down the cost of ownership, maximize scalability, increase security and improve management. However, there is very little in the protocol development and implementation lifecycle in which the enterprise can participate. Usually its involvement boils down to a vague sense of "This product was brought to you by Protocol X."
There's a Shift Ahead
That, at least, is changing for the better. Last month in this space I bemoaned the lack of true Internet-based shopping protocols and warned that merely paying for goods on the Web does not a digital economy make. The future of e-commerce will depend upon protocols such as IOTP (Internet Open Trading Protocol). Let's step back for a second and examine the foundation on which IOTP is based, XML (Extensible Markup Language), and what it means to you.
Whether open or proprietary, protocols traditionally have been excellent at delivering arbitrary payloads of user data and always will be; there is no real room for innovation here. IPv4's ascendance came at the expense of OSI, as the development of IPv6 later confirmed. Specialized protocols, by contrast, have exceeded expectations in solving particular problems; SNMP and network management are a case in point.
One basic problem is that there is no standard way to describe the user payload. SQL and ODBC are as close as conventional development has come. SQL is great: it has proved that most, if not all, business problems can be described in a relational database. However, we still don't see organizations exchanging schemas in anything approaching a timely fashion. So while the relational algebra is sound and ODBC has smoothed over many problems with client programming, both knowledge of the actual schema and any kind of security context are completely missing from conventional development.
Until now, there have been only two alternatives to SQL. The first is the ad hoc record layout (the old reliable delimited or fixed-length format). The problem with ad hoc conventions is that they are not self-documenting; they rely on the organizational memory of the project team that created them. Neither characteristic promotes data sharing among organizations, at a time when the demand for shared data is greater than ever. The other alternative is a formal grammar based on ASN.1 (Abstract Syntax Notation One). ASN.1 provides a universal data exchange language and has been the basis for X.500 directories and SNMP content for years. However, it has been completely ignored as an application development tool: it is simply beyond the reach of mainstream developers, and tool vendors have passed it by.
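The first alternative, the ad hoc record layout, is easy to illustrate. The Python sketch below uses a hypothetical fixed-length mailing-address layout; the field names, widths and sample values are all invented for illustration. Notice that the meaning of each column lives only in the `LAYOUT` table inside the program, not in the data itself.

```python
# Hypothetical fixed-length mailing-address layout: (field, start, end).
# Nothing in the record itself says where one field ends and the next
# begins; only this table, known to the original project team, does.
LAYOUT = [
    ("name", 0, 20),
    ("street", 20, 45),
    ("city", 45, 60),
    ("state", 60, 62),
    ("zip", 62, 67),
]


def format_fixed(fields: dict) -> str:
    """Pad (or truncate) each field to its width; the sender must share LAYOUT."""
    return "".join(
        fields[name].ljust(end - start)[: end - start] for name, start, end in LAYOUT
    )


def parse_fixed(record: str) -> dict:
    """Slice a fixed-width record back into named fields using LAYOUT."""
    return {name: record[start:end].strip() for name, start, end in LAYOUT}


record = format_fixed({"name": "Pat Smith", "street": "123 Main St",
                       "city": "Portland", "state": "OR", "zip": "97201"})
print(parse_fixed(record))

# A partner that emits one stray leading character silently shifts every
# field; the parser cannot tell that anything is wrong.
shifted = parse_fixed(" " + record)
print(shifted["state"], shifted["zip"])
```

Nothing in the shifted record signals an error; the parser simply returns wrong values, which is exactly the kind of failure that depends on organizational memory to catch.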
ASN.1 seems destined to remain an embedded data exchange tool used exclusively by vendors. But just as routers and switches must exchange specific configuration and status information without a priori knowledge of who will receive and act on it, the Internet and the expectations of networked businesses now require the enterprise to exchange information in the same way, without reinventing the wheel each time.
XML is the logical candidate for intercompany data exchange, and you need to develop the expertise in-house to implement it. We simply cannot afford the ongoing overhead and delays of big efforts such as EDI, or the inflexibility and lack of added value of conventional import/export data flows.
There are two reasons XML should be your default data format: it's not as difficult as you think it is, and it is the best available way to "future-proof" and "partner-proof" your data. XML manages both by building on conventions you already know from HTML.
All in the Family
HTML and XML share a common intellectual ancestor in SGML (Standard Generalized Markup Language). In fact, a basic XML document is not much different from an HTML document, albeit with a much stricter grammar. Unlike ASN.1, with all its intricacies, XML is easy to pick up: if you can comprehend HTML, you can comprehend XML. Like HTML, all XML documents are text-based and can be easily manipulated by any of the applications you know and love.
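To show how little there is to learn, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` module. The `<address>` vocabulary is hypothetical. The point is that the field names travel with the data, and that an XML parser, unlike a forgiving HTML browser, rejects markup that is not well-formed.

```python
import xml.etree.ElementTree as ET

# A hypothetical mailing address marked up in XML. The tag names are
# invented for illustration; what matters is that they travel with the data.
ADDRESS_XML = """
<address>
  <name>Pat Smith</name>
  <street>123 Main St</street>
  <city>Portland</city>
  <state>OR</state>
  <zip>97201</zip>
</address>
"""


def parse_address(text: str) -> dict:
    """Return a dict mapping each child element's tag to its text."""
    root = ET.fromstring(text.strip())
    return {child.tag: (child.text or "").strip() for child in root}


print(parse_address(ADDRESS_XML))

# The stricter grammar in action: an unclosed <city> element is an error,
# not something the parser quietly guesses its way around.
try:
    ET.fromstring("<address><city>Portland</address>")
except ET.ParseError as err:
    print("rejected:", err)
```

Because the structure is explicit, adding a field later (say, a `<country>` element) does not disturb a reader that looks fields up by name.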
Like HTML, XML takes advantage of the underlying protocol layers for transport, reliability and security. However, XML goes beyond HTML's ability to format and publish information for display: it was designed for extensibility and can define data structures of arbitrary complexity. Document databases, astronomical instruments and push channels can all be described in XML. The time for thousands of enterprisewide import/export applications has come and gone.
Let's return for a moment to the concept of shopping protocols for e-commerce. The problem is how to describe precisely the entities involved: the product, the consumer, the vendor, and the terms and conditions. Moreover, if you want to compare product documents from two sources, how would you accurately compare the multiple variables that describe the product?
If the Web can publish price descriptions for any potential product and XML can provide the accompanying structure, then RDF (Resource Description Framework) offers a way to automate the communication of that data among e-commerce sellers, buyers and everyone in the middle. An XML document is self-describing, and a DTD (Document Type Definition) can supply the specific type information needed to parse and validate the enclosed data. By describing XML data with Class and subClassOf tags, RDF provides a mechanism to map the data directly to an object-oriented language such as Java, C++ or (heaven forbid) Visual Basic.
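As a rough illustration of the Class/subClassOf mapping, the sketch below reads a tiny RDF Schema fragment and builds one Python class per `rdfs:Class`, using `rdfs:subClassOf` to choose the base class. Python stands in here for Java, C++ or Visual Basic. The `Product`/`Book` vocabulary is hypothetical, the namespace URIs are those of the published W3C RDF and RDF Schema specifications (earlier drafts may have used different URIs), and the code assumes single inheritance with no cycles. Treat it as a sketch, not a general RDF processor.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the W3C RDF and RDF Schema specifications.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical product vocabulary: a Book is a kind of Product.
# Book is deliberately declared before its parent class.
SCHEMA = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:ID="Book">
    <rdfs:subClassOf rdf:resource="#Product"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Product"/>
</rdf:RDF>"""


def classes_from_rdfs(text: str) -> dict:
    """Create one Python class per rdfs:Class, honoring rdfs:subClassOf.

    Assumes at most one subClassOf per class, that every referenced parent
    is declared in the same document, and that there are no cycles.
    """
    root = ET.fromstring(text)
    parents = {}
    for cls in root.iter(f"{{{RDFS}}}Class"):
        name = cls.get(f"{{{RDF}}}ID")
        sup = cls.find(f"{{{RDFS}}}subClassOf")
        parents[name] = sup.get(f"{{{RDF}}}resource").lstrip("#") if sup is not None else None

    built = {}

    def build(name):
        # Build the parent first so declaration order does not matter.
        if name not in built:
            parent = parents[name]
            bases = (build(parent),) if parent else (object,)
            built[name] = type(name, bases, {})
        return built[name]

    for name in parents:
        build(name)
    return built


classes = classes_from_rdfs(SCHEMA)
print(issubclass(classes["Book"], classes["Product"]))
```

In a statically typed language the same information would more likely drive a code generator than runtime class creation, but the mapping from schema to class hierarchy is the same.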
An e-commerce provider can thereby publish its catalog in an explicit, deterministic format that can be queried and compared with those of other suppliers. Combined with digital signatures and time stamping, XML provides the core technology for implementing a networkwide trading system.
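Once suppliers agree on a vocabulary, the query-and-compare step becomes ordinary code. The sketch below assumes two hypothetical catalogs that share invented `<catalog>`, `<item sku=...>`, `<price>` and `<stock>` elements, and picks the cheapest in-stock offer for each SKU. It assumes all prices are in one currency and leaves signatures and time stamps out entirely.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Two hypothetical supplier catalogs using a shared, agreed-upon vocabulary.
CATALOG_A = """<catalog vendor="A">
  <item sku="TONER-42"><price currency="USD">59.95</price><stock>12</stock></item>
  <item sku="PAPER-A4"><price currency="USD">4.25</price><stock>0</stock></item>
</catalog>"""

CATALOG_B = """<catalog vendor="B">
  <item sku="TONER-42"><price currency="USD">57.50</price><stock>0</stock></item>
  <item sku="PAPER-A4"><price currency="USD">4.10</price><stock>40</stock></item>
</catalog>"""


def load(text):
    """Map sku -> (vendor, price, stock) for one catalog document."""
    root = ET.fromstring(text)
    vendor = root.get("vendor")
    return {
        item.get("sku"): (vendor, Decimal(item.findtext("price")), int(item.findtext("stock")))
        for item in root.iter("item")
    }


def best_offers(*catalogs):
    """For each SKU, keep the cheapest vendor that has it in stock.

    Assumes every price uses the same currency; the currency attribute
    is not checked.
    """
    best = {}
    for catalog in map(load, catalogs):
        for sku, (vendor, price, stock) in catalog.items():
            if stock > 0 and (sku not in best or price < best[sku][1]):
                best[sku] = (vendor, price)
    return best


print(best_offers(CATALOG_A, CATALOG_B))
```

Using `Decimal` rather than floating point keeps the price comparison exact, which matters once such comparisons feed an actual purchase decision.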
While sites like biddersedge.com have started to collect, parse and present sale information from multiple auction sites, they have to rely on parsing HTML intended for presentation, not for data exchange. It would make the job a lot easier if all auction sites published in XML.
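The difference is easy to see in miniature. In this sketch the listing markup is hypothetical and does not model any real auction site. The HTML version forces the scraper to find the price by its position in a table, while the XML version labels it explicitly. Both use only Python's standard library.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical auction listing as presentation HTML: the price is just the
# third table cell, identified only by position and a dollar sign.
LISTING_HTML = "<table><tr><td>Lot 17</td><td>Vintage modem</td><td>$35.00</td></tr></table>"

# The same listing as XML: every value is labeled.
LISTING_XML = "<lot id='17'><title>Vintage modem</title><bid currency='USD'>35.00</bid></lot>"


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells, self._in_td = [], False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


def price_from_html(html):
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    # Fragile: breaks if the site adds, drops or reorders a column.
    return parser.cells[2].lstrip("$")


def price_from_xml(xml):
    # Robust to layout changes: the bid is found by name, not position.
    return ET.fromstring(xml).findtext("bid")


print(price_from_html(LISTING_HTML), price_from_xml(LISTING_XML))
```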
Finally, the most fascinating and hopeful thing about XML is that it has been used for advanced applications. NASA uses it to remotely control and harvest data from telescopes and deep-space cameras at the South Pole and mounted in Boeing 747s. It has also been used to catalog the database of RFC documents. In fact, Marshall Rose, chief of protocol at Invisible Worlds, wrote an Internet-Draft describing how to use an XML DTD to write Internet-Drafts and RFCs. At the same time, XML is within reach of the rank-and-file programmer. Witness Ed Tittel's XML for Dummies, which illustrates exactly how approachable XML is for average folks and contains more useful information on XML than its title implies.
XML is no longer a bleeding-edge technology. The browser you are using today supports it. Vendors from Allaire to webMethods support it. Major database companies, including Oracle, Sybase and Informix, are now aware that XML is more than a blip on the radar screen. Expect products from them to publish the results of queries as XML and import XML into their tables.
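No vendor's actual XML features are shown here. As a stand-in, this sketch uses Python's built-in `sqlite3` module to illustrate the two operations anticipated above: publishing query results as XML and importing XML into a table. The `parts` table and the one-element-per-column layout are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, row_tag="row"):
    """Run a query and serialize each result row as an XML element.

    Assumes column names are valid XML element names and values are not NULL.
    """
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    root = ET.Element("results")
    for values in cur:
        row = ET.SubElement(root, row_tag)
        for col, val in zip(columns, values):
            ET.SubElement(row, col).text = str(val)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(conn, table, xml_text, columns):
    """Insert each row element's named children into a table.

    Table and column names are interpolated into SQL, so they must come
    from trusted code, never from the XML document itself.
    """
    root = ET.fromstring(xml_text)
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        [tuple(row.findtext(c) for c in columns) for row in root],
    )


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE parts (sku TEXT, price TEXT)")
src.executemany("INSERT INTO parts VALUES (?, ?)", [("TONER-42", "59.95"), ("PAPER-A4", "4.25")])
exported = rows_to_xml(src, "SELECT sku, price FROM parts ORDER BY sku")
print(exported)

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE parts (sku TEXT, price TEXT)")
xml_to_rows(dst, "parts", exported, ["sku", "price"])
print(dst.execute("SELECT * FROM parts ORDER BY sku").fetchall())
```

The exported document carries its own column names, so the receiving side needs only to agree on the element names, not on byte offsets or delimiter conventions.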
In the words of Rose, "In the '80s, RFC 822 was the universal data exchange language. In the '90s, we took a step sideways with ASN.1. But now it looks like XML will be the data representation language for the next millennium."
Brian Walsh is the founder of bwalsh.com, a Portland, Ore., consulting firm specializing in Internet and client/server product strategies, development and testing. Send your comments on this column to him at firstname.lastname@example.org.