Our series takes a look at Chapter 11: XML Tools for Information Appliances.
NanoXML
A NanoXML document is a tree of nanoxml.XMLElement objects. These correspond to the org.w3c.dom.Node interface in the DOM specification.
NanoXML does not implement the DOM interfaces. You build and retrieve document contents through a proprietary API, but an optional SAX 1.0 component exists for document retrieval. This API is covered in this chapter.
Originally written in April 2000, NanoXML has gone through a few iterations. The current release is 1.6.8. The next major release of NanoXML will be 2.0, and it is scheduled to be available in July 2001. The current beta release is promising, although it seems to have lost compatibility with 1.x releases. We will discuss both releases since they differ significantly in library size and features.
The web site for NanoXML is http://nanoxml.sourceforge.net/.
The source code is available under an open source license. The site is maintained by Marc De Scheemaecker, who is the author of the package. I have found him to be very responsive to support questions.
If your target platform is the Java KVM, the latest NanoXML won't do the job because it has dependencies on classes that are not included in the standard Java KVM.
Without a doubt, the greatest feature of NanoXML is also its smallest: its JAR file. In size it is second only to MinML, although the exact figure depends on which version you use and whether you include the optional SAX component. You can get away with XML parsing in as little as 6047 bytes!
Unfortunately, NanoXML suffers from some performance issues and memory usage problems, which we will discuss.
Current Release – Version 1.6.8
What's Supported, What's Not Supported, and What's Optional
Version 1.6.8 is a non-validating parser. Any reference to a DTD or XML Schema is ignored, although there is support for entity expansion.
Mixed content isn't supported. For example:
<Request>ItemDetail
<ItemId>553</ItemId>
</Request>
will result in an incorrect internal document representation. XML namespaces aren't supported directly, although they won't cause any parsing difficulties. This SOAP envelope, for example, is parsed without problems:
<SOAP:Envelope xmlns:SOAP=
'http://schemas.xmlsoap.org/soap/envelope/'
xmlns:xsi=
'http://www.w3.org/1999/XMLSchema-instance'
xmlns:xsd='http://www.w3.org/1999/XMLSchema'
xmlns:SOAP-ENC='http://schemas.xmlsoap.org/soap/encoding/'
SOAP:encodingStyle=
'http://schemas.xmlsoap.org/soap/encoding/'>
</SOAP:Envelope>
The element <Envelope> is stored literally as <SOAP:Envelope> with no comprehension of the SOAP namespace prefix. It also contains five attributes: xmlns:SOAP, xmlns:xsi, xmlns:xsd, xmlns:SOAP-ENC, and SOAP:encodingStyle. Since document validation isn't supported, namespace URLs are not followed.
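To make the two behaviors above concrete, here is a minimal sketch that parses both sample documents with the DOM parser bundled with the JDK, configured to ignore namespaces. It does not use NanoXML; the class and method names (PrefixAndMixedContentDemo, rootName, attributeCount, describeChildren) are made up for this illustration. It shows what "stored literally" means for a prefixed name, and what a parser that does support mixed content keeps for the <Request> example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: uses the JDK's DOM parser, not NanoXML.
class PrefixAndMixedContentDemo {

    static final String SOAP =
        "<SOAP:Envelope xmlns:SOAP='http://schemas.xmlsoap.org/soap/envelope/'"
        + " xmlns:xsi='http://www.w3.org/1999/XMLSchema-instance'"
        + " xmlns:xsd='http://www.w3.org/1999/XMLSchema'"
        + " xmlns:SOAP-ENC='http://schemas.xmlsoap.org/soap/encoding/'"
        + " SOAP:encodingStyle='http://schemas.xmlsoap.org/soap/encoding/'>"
        + "</SOAP:Envelope>";

    static final String MIXED =
        "<Request>ItemDetail\n<ItemId>553</ItemId>\n</Request>";

    // Parse without namespace processing, so prefixes are just part of the name.
    private static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // The root element name exactly as written, prefix included.
    static String rootName(String xml) {
        return parseRoot(xml).getNodeName();
    }

    // xmlns:* declarations count as ordinary attributes here.
    static int attributeCount(String xml) {
        return parseRoot(xml).getAttributes().getLength();
    }

    // Lists the root's non-blank text and element children in document order.
    static String describeChildren(String xml) {
        StringBuilder sb = new StringBuilder();
        NodeList children = parseRoot(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    sb.append("text(").append(text).append(") ");
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                sb.append("element(").append(child.getNodeName()).append(") ");
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(rootName(SOAP));          // SOAP:Envelope
        System.out.println(attributeCount(SOAP));    // 5
        System.out.println(describeChildren(MIXED)); // text(ItemDetail) element(ItemId)
    }
}
```

The last line is the part NanoXML 1.6.8 cannot represent correctly: an element that holds both text and a child element at the same time.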
Comments are skipped by the parser and not stored internally. The first processing instruction in an XML document:
<?xml version="1.0" encoding="UTF-8"?>
is also skipped and not stored internally. Any subsequent processing instruction causes the parser to throw a nanoxml.XMLParseException.
A SAX-compatible API can optionally be used with parsing (see Package nanoxml, page 586). If the SAX API is not used, retrieval of elements and attributes is through a completely proprietary API (see public class XMLElement, page 587).
Documents can also be built from scratch and written to any Writer object, or they can be modified using the addChild() and removeChild() methods.
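The build-modify-write cycle described above can be sketched with a tiny stand-in class. TinyElement below is hypothetical and written only for this illustration; it is not nanoxml.XMLElement, and its method signatures should not be taken as NanoXML's. Only the method names addChild() and removeChild() and the idea of writing to a Writer come from the text.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT nanoxml.XMLElement.
class TinyElement {
    private final String name;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final List<TinyElement> children = new ArrayList<>();
    private String text = "";

    TinyElement(String name) {
        this.name = name;
    }

    TinyElement setAttribute(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    TinyElement setText(String text) {
        this.text = text;
        return this;
    }

    void addChild(TinyElement child) {
        children.add(child);
    }

    void removeChild(TinyElement child) {
        children.remove(child);
    }

    // Serializes this element and its subtree to any Writer.
    void write(Writer out) throws IOException {
        out.write("<" + name);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            out.write(" " + a.getKey() + "=\"" + escape(a.getValue()) + "\"");
        }
        if (children.isEmpty() && text.isEmpty()) {
            out.write("/>");
            return;
        }
        out.write(">");
        out.write(escape(text));
        for (TinyElement child : children) {
            child.write(out);
        }
        out.write("</" + name + ">");
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Convenience wrapper: serialize to a String via StringWriter.
    static String render(TinyElement root) {
        StringWriter sw = new StringWriter();
        try {
            root.write(sw);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        TinyElement request = new TinyElement("Request");
        TinyElement itemId = new TinyElement("ItemId").setText("553");
        request.addChild(itemId);
        System.out.println(render(request)); // <Request><ItemId>553</ItemId></Request>
        request.removeChild(itemId);
        System.out.println(render(request)); // <Request/>
    }
}
```

Because write() accepts any Writer, the same tree can go to a file, a socket, or an in-memory buffer without changes to the element class.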
The JAR file size of this release, excluding the optional SAX component, is 6047 bytes. Adding SAX functionality brings the library up to 8618 bytes. This small size comes at a price, however. As with most parsers reviewed in this chapter, NanoXML is not XML 1.0 compliant.
You have two choices for parsing: a DOM-style or SAX 1.0 interface. Both choices are multiple-pass parsers, iterating over the same document more than once in order to build an internal representation (this is true even of the SAX interface because it is built on top of the DOM-style interface). This negatively affects performance. Finally, even if the SAX parser is used, an entire document tree is built and kept in memory until the parser object is garbage collected. Not only does this lead to a large memory footprint when parsing large documents, but depending upon the garbage collection mechanism used by your VM, it may severely fragment the heap and prevent subsequent object creation. We discuss this issue in the Java KVM section (page 575).
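For contrast with the tree-backed SAX layer described above, the following sketch shows what event-driven parsing looks like when no document tree is retained. It uses the SAX 2 parser bundled with the JDK rather than NanoXML, and the class and method names (StreamingCountDemo, countElements) are made up for this illustration. The handler keeps a single counter, so the state held by the application does not grow with the size of the document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: uses the JDK's SAX parser, not NanoXML.
class StreamingCountDemo {

    // Counts elements with the given qualified name as events arrive.
    static int countElements(String xml, final String name) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(name)) {
                    count[0]++; // the only state the handler keeps
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<Items><ItemId>1</ItemId><ItemId>2</ItemId><Other/></Items>";
        System.out.println(countElements(xml, "ItemId")); // 2
    }
}
```

A SAX interface layered on a full in-memory tree, as in NanoXML 1.6.8, offers the same callback style but none of this memory advantage.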
Parsing large documents with this version of NanoXML may be inappropriate for lightweight clients. However, for relatively small documents, it could be just the thing.