The World Wide Web Consortium (W3C) introduced XML in 1996, and the standard's obvious assets -- structure with flexibility, extensibility, adaptability, simplicity and platform-independence -- soon endeared it to Web developers.
More recently, XML has proved itself a box-office draw as well -- after all, what good is stardom if you can't bring in the bucks? Looking to get in on the trend toward B2B exchanges, industry heavyweights -- and the United Nations -- are pushing their visions of the future. In this Special Report, we present a comprehensive look at XML, from a history of the framework to a primer on the standards now in play. We also go in depth, presenting detailed examples of proper syntax, and look ahead at new implementations such as MathML (Mathematical Markup Language), XForms and Microsoft's SOAP (Simple Object Access Protocol).
In addition, we look at exciting projects from Microsoft, Oracle and the open-source community; these projects combine XML with databases to deliver e-business facts and figures dynamically in a form tailored to the requesting device. So settle in with your popcorn and enjoy the story of a standard that is changing the face of the Internet.
XML: A Metastar is Born
One meaning of the word meta is "beyond, transcending, encompassing." Thus XML, a framework that lets users create their own tags to form a markup language, is accurately defined as a metalanguage.
The initial XML activity started in mid-1996 and culminated in February 1998 with the publication of the XML 1.0 recommendation. Continued discussion and work on the recommendation resulted in a second edition, released in October 2000. This was not a new version of XML; it was merely a revision of the first edition that folded in the errata and discussion compiled over the years. Thus, the standard is still known as XML 1.0.
XML is a vibrant and growing standard, so keeping track of all the changes to the main and related specifications can be difficult. Also, some initial undertakings fell by the wayside because of a lack of interest, while other efforts are developing at varying speeds. We'll explain the specifications as they stand at the time of this writing, but bear in mind that they are still changing. A handy, albeit commercialized, Web site to help keep track of this information is www.xml.com. The W3C Web site has a more comprehensive collection of information, but it is typically expressed in language not easily understood by the casual reader.
HTML Killer?
In its infancy, XML was seen as an HTML killer. Not only did it fail to replace HTML, however, but XML is now being used to express HTML 4.0 in a standard known as XHTML (Extensible HTML). One of the main goals of XHTML is to eliminate the variations in how browsers handle HTML (see "XHTML: Crossroads of HTML and XML" for more on XHTML). In addition, after the XML standard was introduced in 1998, two further recommendations were released: Namespaces in XML (www.w3.org/TR/1999/REC-xml-names-19990114/) and the style-sheet linking recommendation (www.w3.org/TR/xml-stylesheet/).
The namespaces recommendation, which defines a mechanism for qualifying element and attribute names, was essential to allow XML vocabularies from different sources to be nested and mixed without name collisions. The closest analogy in the programming world is scoping. For example, if you are writing a C program and define a global variable called x and then define a variable local to your function that is also called x, the name x refers to only one of the two at any given point, depending on your scope.
XML faces the same kind of ambiguity when two vocabularies use the same tag name. Namespaces eliminate this problem by letting you write an element name as a qualified name composed of a namespace prefix, a colon and the local name. As in programming, namespace declarations are scoped. An element name without a prefix takes the default namespace declared on the nearest enclosing element, if there is one. The code below shows an example borrowed from the namespaces recommendation on the W3C Web site.
<?xml version="1.0"?>
<!-- initially, the default namespace is "books" -->
<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <!-- make HTML the default namespace for some commentary -->
    <p xmlns='urn:w3-org-ns:HTML'>
      This is a <i>funny</i> book!
    </p>
  </notes>
</book>
The first line, the XML declaration, is optional but recommended; it states that the document follows the XML 1.0 recommendation. The second line is a comment, as is any text enclosed by the <!-- and --> delimiters. The <book>, <title> and <notes> tags are ones that we define; after all, defining our own tags is the main reason for using XML. The xmlns attribute declares a default namespace, as it does on the <book> and <p> tags, while xmlns:isbn binds the prefix isbn to a namespace of its own. The <isbn:number> tag is a qualified name as explained above, with isbn being the namespace prefix.
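The scoping of the default namespace in this example can be checked with a short Python sketch. It assumes Python's standard xml.etree.ElementTree module, which reports every element name in expanded form as {namespace}localname. The names DOC and expanded_names are hypothetical and exist only for this illustration; the comments from the original listing are left out for brevity.

```python
import xml.etree.ElementTree as ET

# The example document from the namespaces recommendation (comments omitted).
DOC = """<?xml version="1.0"?>
<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <p xmlns='urn:w3-org-ns:HTML'>
      This is a <i>funny</i> book!
    </p>
  </notes>
</book>"""


def expanded_names(xml_text):
    """Return each element's expanded name, in document order."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


if __name__ == "__main__":
    for name in expanded_names(DOC):
        print(name)
```

In the printed list, <title> and <notes> inherit the books namespace from <book>, <isbn:number> takes the namespace bound to its prefix, and <i> inherits the HTML default that <p> declares.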
The other major recommendation that came out of the W3C at the outset addressed style-sheet linking. This is a short recommendation with a huge impact. To understand it, you need to understand CSS (Cascading Style Sheets).
In HTML you can use several elements for formatting output. For example, to make a piece of text appear as bold, you would enclose it within the <b> and </b> tags. This approach has many deficiencies, one of which is that you cannot define style templates that apply across Web pages. If, for example, your headers are displayed using <h1> tags and you decide to switch them to <h2> tags, you'll have to edit every occurrence of the header to change the surrounding tags. Not having style templates also makes it harder to maintain a consistent look and feel throughout a Web site.
To solve these and similar problems, the CSS specification was born.
A style sheet is a formatting template that can be applied to HTML pages. Although defining style sheets can be a very involved process, it's worth the effort. One of the better resources for information on style sheets is www.hotwired.lycos.com/webmonkey/authoring/stylesheets/.
The W3C's recommendation made the concept of style sheets available to XML documents. This is important because, unlike HTML, XML does not have any formatting elements, and without style sheets, all XML documents will look more or less the same. The XSL (Extensible Stylesheet Language) specification was created to address this.
Each XSL style sheet includes a set of template rules, with each rule matching one or more XML tags. For every rule, a specific style is applied to the matched tag's content. An XSL processor parses the XML document and, for each tag, tries to find a matching template rule. When a match is found, the instructions in the body of the rule are applied to produce HTML output.
Most XSL documentation is written in what can best be described as cryptic language; thus, a sample piece of code will go a long way toward explaining the concept. The code below shows a sample XSL fragment. The text between the <!-- and --> delimiters is extra clarification in the form of XML comments, which the XSL processor ignores.
<!-- Basic and required header information -->
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <!-- The XSL processor will look for the <bold> tag in the XML source.
       If found, the XSL processor should take the content inside the
       <bold></bold> tag pair and output it surrounded by <P><B></B></P> tag pairs. -->
  <xsl:template match="bold">
    <P><B><xsl:value-of select="."/></B></P>
  </xsl:template>
  <!-- Similar to the <bold> tag above, the <red> tag is looked for
       and, if found, replaced with the <P style="color:red"></P> tags. -->
  <xsl:template match="red">
    <P style="color:red"><xsl:value-of select="."/></P>
  </xsl:template>
  <!-- Done with the style sheet -->
</xsl:stylesheet>
The fragment tells the XSL processor to look for the <bold> tag in the XML source. When the tag is found, the processor takes the content inside the <bold></bold> tag pair and outputs it surrounded by <P><B></B></P> tag pairs. Similar logic applies to the <red> tag. A sample XML document that can use the XSL code above may look something like that shown below.
<?xml version="1.0"?>
<xslIntro>
<bold> A Test Source XML in bold. </bold>
<red> and some more in the red color </red>
</xslIntro>
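The HTML output will, as expected, wrap each piece of content in the formatting tags named by its template rule. As the two rules show, the output can rely on ordinary HTML tags such as <B> or on CSS styling such as style="color:red".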
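To make the rule-matching idea concrete, here is a minimal Python sketch that imitates the two template rules on the sample document. It is not an XSL processor, and Python's standard library does not include one. The RULES table and the apply_templates function are hypothetical names invented for this illustration, and the sketch simply skips any text that sits directly inside an unmatched tag.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical rule table mirroring the two template rules:
# tag name -> (markup emitted before the content, markup emitted after it)
RULES = {
    "bold": ("<P><B>", "</B></P>"),
    "red": ('<P style="color:red">', "</P>"),
}

SOURCE = """<?xml version="1.0"?>
<xslIntro>
<bold> A Test Source XML in bold. </bold>
<red> and some more in the red color </red>
</xslIntro>"""


def apply_templates(element):
    """Return HTML for one element using the first matching rule, if any."""
    rule = RULES.get(element.tag)
    if rule is not None:
        # Roughly what <xsl:value-of select="."/> yields: the element's text.
        text = "".join(element.itertext())
        return rule[0] + escape(text) + rule[1]
    # No rule matched: descend into the child elements instead.
    return "".join(apply_templates(child) for child in element)


if __name__ == "__main__":
    print(apply_templates(ET.fromstring(SOURCE)))
```

Run against the sample document, the sketch prints the content of <bold> wrapped in <P><B></B></P> followed by the content of <red> wrapped in <P style="color:red"></P>.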