  W O R K S H O P

XHTML: Crossroads of HTML and XML

November 13, 2000
By Ahmad Abualsamid

By now we all know what HTML is. The collection of tags was made famous by the Web, or was it the other way around? Tim Berners-Lee is credited with inventing HTML and the Web, but they were not completely original inventions: the ideas of hypertext and metatags already abounded in Gopher and in document-management systems based on SGML (Standard Generalized Markup Language).

HTML, however, had the advantage of friendly and straightforward browsers that came of age at the perfect time. HTML was also a snap to use, and HTML browsers were very forgiving, letting HTML coders get sloppy without consequence. Even some HTML editors were not any better and generated improper code, but nobody cared because the browsers accepted pretty much any code. With the proliferation of "bad" HTML code, the browsers became bloated with code just to accommodate the sloppiness.

The X Factor

Then XML (Extensible Markup Language) came into existence, in part to enforce strict coding practices and to ensure both portability and compatibility for marked-up documents. At the time, many people pronounced XML to be HTML's killer and predicted we would soon see nothing but XML documents on the Web. Because it lacked browser support, however, and because XML parsers were inconsistent, XML did not have enough momentum to displace HTML. Instead, XML shone on the server side, where applications can be coerced or picked to work together.

We now have a new standard, XHTML, that stands a good chance of making the Web a better place. XHTML in its simplest definition is HTML 4.0 expressed using an XML DTD (Document Type Definition). In the big picture, XHTML paves the path for the modularization of code snippets (dare we say "applets"?) that are loaded dynamically to handle various XHTML modules. The goal is to have thin Web appliances that can handle XHTML code uniformly and load only the required modules into memory. Freed from handling sloppy code and helped by modularization, truly thin browsers could be built that fit on Web appliances and parse code uniformly.

XHTML 1.0

In January 2000, the World Wide Web Consortium (W3C) published XHTML 1.0, proclaiming the following advantages:

  • Extensibility: The W3C recognizes the definition of the HTML language leaves a lot to be desired. However, the process of extending the language is painful and lengthy, because it must go through committees and discussions. As with any XML application, XHTML is extensible by definition.

  • Interoperability and portability: Mainly because rigorous coding standards are enforced on XHTML programmers, the browsers know exactly what to look for and to expect, thus making it much easier for different browsers to handle code in the same manner.

Although noble, these goals are hard to attain. For one thing, extensibility did not propel XML to the top of the food chain to displace HTML, so why should XHTML be different? For another, different browsers behave differently no matter what. For example, try to figure out the width of a table in Microsoft Internet Explorer (IE) and then in Netscape Navigator. The browsers measure widths differently, making Web pages inconsistent. Coding standards alone will not change these browsers' different implementations.

The Differences

To convert to XHTML, look at the differences between it and HTML:

  • Tags and attributes must be in lowercase. What Unix users have known for a long time is now enforced for everyone: case matters. XHTML is an application of XML; thus, it is case-sensitive. Therefore, <P> and <p> are different, the uppercase being incorrect. (Why aren't uppercase characters allowed? The character set of XML is ISO 10646; thus, element names can contain non-ASCII characters. For most of those, there are no rules for case conversion or collating. Worse yet, any rules that may exist have to be updated when those character sets are extended, which happens frequently at this stage.) For example, many HTML editors automatically generate code like this:

    This is the end of a paragraph.

    <P>

    This is a new paragraph.

    This is wrong on two counts. First, the element name has to be in lowercase. The other reason brings us to the second difference between HTML and XHTML:

  • All XHTML elements must be closed. Thus, the proper form of our example in XHTML would be:

    This is the end of a paragraph.

    <p>This is a new paragraph.</p>

    And all this time you thought <p> was a standalone HTML tag. There aren't many of those, but the few standalone tags in HTML also need to be closed. This is done by adding a trailing / character. A <br> tag, for example, is now:

    This is a line in my document that has a break at its end. <br/>

    However, keep in mind that this may break some browsers. Thus, you may want to add a space between the tag and the slash: <br />, which will work under today's browsers and still be XHTML-compatible.

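  • The id attribute replaces the name attribute. In HTML, the name attribute identifies elements such as anchors, images and forms so you can link to them or script them. XHTML uses the id attribute for that purpose, the same attribute that CSS (Cascading Style Sheets) uses to address individual elements, and XHTML 1.0 deprecates name on those elements. Form controls still need a name attribute to label the data they submit. To make sure your code works with today's browsers and with existing scripts, you can use both a name and an id attribute with the same value.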
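
    As a minimal sketch of using both attributes, the fragment below gives a link target and a text field matching id and name values. The values "differences" and "keyword" are made up for illustration, and the fragment is assumed to sit inside the body of a complete XHTML page.

    ```html
    <!-- Link target: older browsers typically find #differences through name, newer ones through id -->
    <h2><a id="differences" name="differences">The Differences</a></h2>
    <p><a href="#differences">Jump to the differences</a></p>

    <!-- Form field: id serves style sheets and scripts, name labels the submitted value -->
    <p><input type="text" id="keyword" name="keyword" /></p>
    ```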

  • Attribute values must be quoted, and no minimization is allowed. A common practice among HTML coders is to leave the quotes out when specifying attribute values, especially numeric ones. Many HTML editors do that too; some even strip the quotes you manually put around numeric values. This is not valid in XHTML. Several browsers also accept minimized attributes, such as "checked," written without a value. This also is not valid in XHTML. You can't have a dangling attribute:

    <input id="acheckbox" name="acheckbox" checked /> is incorrect.

    <input id="acheckbox" name="acheckbox" checked="checked" /> is correct.
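
    The quoting rule can be illustrated the same way. The table and option start tags below are hypothetical; the first pair shows markup of the kind many HTML editors emit, and the second pair shows the same start tags written to follow the XHTML rules.

    ```html
    <!-- Not acceptable in XHTML: uppercase names, unquoted values, a minimized attribute -->
    <TABLE WIDTH=100 BORDER=0>
    <OPTION SELECTED>

    <!-- XHTML form: lowercase names, quoted values, the attribute written out in full -->
    <table width="100" border="0">
    <option selected="selected">
    ```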

  • XHTML documents have some mandatory elements. You no longer can have documents that contain nothing but text. An XHTML document needs a DOCTYPE definition, which defines the type of the document for validation purposes; an <html> tag pair; a <head> tag pair; a <title> tag pair contained inside the <head> tags; and a <body> tag pair. The minimal XHTML document looks like this:

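    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    <html xmlns="http://www.w3.org/1999/xhtml">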

    <head>

    <title> This is a minimal XHTML document </title>

    </head>

    <body>

    </body>

    </html>
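
Putting the rules above together, a small but complete page might look like the following sketch. It assumes the XHTML 1.0 Strict DTD and the XHTML namespace declaration defined by the W3C; the title, text, form action and field names are invented for illustration.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Sample order form</title>
</head>
<body>
<!-- Lowercase element names, and every paragraph explicitly closed -->
<p>This is the end of a paragraph.</p>
<!-- The empty br element is closed with a space and a trailing slash -->
<p>This is a new paragraph.<br />
It continues after a line break.</p>
<form action="/order" method="post">
<p>
<!-- Quoted values, matching id and name, and no minimized attribute -->
<input type="checkbox" id="acheckbox" name="acheckbox" checked="checked" />
<label for="acheckbox">Send a receipt</label>
</p>
</form>
</body>
</html>
```

Because a page written this way is well-formed XML, an XML parser should accept it, and a validating parser can check it against the DTD named in the DOCTYPE.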



