Network Computing


  W O R K S H O P

XHTML: Crossroads of HTML and XML

November 13, 2000
By Ahmad Abualsamid

By now we all know what HTML is. The collection of tags was made famous by the Web, or was it the other way around? Tim Berners-Lee is credited with inventing HTML and the Web, but they were not completely original inventions, as the ideas of hypertext and metatags abounded in Gopher and in document-management systems that were based on SGML (Standard Generalized Markup Language).

HTML, however, had the advantage of friendly and straightforward browsers that came of age at the perfect time. HTML was also a snap to use, and HTML browsers were very forgiving, letting HTML coders get sloppy without consequence. Even some HTML editors were not any better and generated improper code, but nobody cared because the browsers accepted pretty much any code. With the proliferation of "bad" HTML code, the browsers became bloated with code just to accommodate the sloppiness.

The X Factor

Then XML (Extensible Markup Language) came into existence, in part to enforce strict coding practices and to ensure both portability and compatibility for marked-up documents. At the time, many people pronounced XML to be HTML's killer and predicted we would see nothing but XML documents on the Web in the near future. Because it lacked browser support and its parsers were inconsistent, however, XML did not have enough momentum to displace HTML. Instead, XML shone on the server side, where applications can be coerced or picked to work together.

We now have a new standard, XHTML, that stands a good chance of making the Web a better place. XHTML in its simplest definition is HTML 4.0 expressed using an XML DTD (Document Type Definition). In the big picture, XHTML paves the way for the modularization of code snippets (dare we say "applets"?) that are loaded dynamically to handle various XHTML modules. The goal is to have thin Web appliances that can handle XHTML code uniformly and load only the required modules into memory. Freed from handling sloppy code, and built from modules, truly thin browsers could be created that fit comfortably on Web appliances and parse and understand code uniformly.

XHTML 1.0

In January, the World Wide Web Consortium (W3C) published XHTML 1.0, proclaiming the following advantages:

  • Extensibility: The W3C recognizes the definition of the HTML language leaves a lot to be desired. However, the process of extending the language is painful and lengthy, because it must go through committees and discussions. As with any XML application, XHTML is extensible by definition.

  • Interoperability and portability: Mainly because rigorous coding standards are enforced on XHTML programmers, the browsers know exactly what to look for and to expect, thus making it much easier for different browsers to handle code in the same manner.

Although noble, these goals are hard to attain. For one thing, extensibility did not propel XML to the top of the food chain to displace HTML, so why should XHTML be different? For another, different browsers behave differently no matter what. For example, try to figure out the width of a table in Microsoft Internet Explorer (IE) and then in Netscape Navigator. The browsers measure widths differently, making Web pages inconsistent. Coding standards alone will not change these browsers' different implementations.

The Differences

To convert to XHTML, look at the differences between it and HTML:

  • Tags and attributes must be in lowercase. What Unix users have known for a long time is now enforced on all: case matters. XHTML is an application of XML; thus, it is case-sensitive. Therefore, <P> and <p> are different, the uppercase being incorrect. (Why aren't uppercase characters allowed? The character set of XML is ISO 10646; thus, element names can contain non-ASCII characters. For most of those, there are no rules for case conversion or collating. Worse yet, any rules that may exist have to be updated when those character sets are extended, which happens frequently at this stage.) For example, many HTML editors automatically generate code like this:

    This is the end of a paragraph.
    <P>
    This is a new paragraph.

    This is wrong on two counts. First, the element name has to be in lowercase. The second reason it is wrong brings us to the second difference between HTML and XHTML:

  • All XHTML elements must be closed. Thus, the proper form of our example in XHTML would be:

    This is the end of a paragraph.

    <p>This is a new paragraph.</p>

    And all this time you thought <p> was a standalone HTML tag. There aren't many of those, but the few standalone tags in HTML also need to be closed. This is done by adding a trailing / character. A <br> tag, for example, is now:

    This is a line in my document that has a break at its end. <br/>

    However, keep in mind that the <br/> form may break some browsers. Thus, you may want to add a space between the tag name and the slash: <br />, which will work under today's browsers and still be XHTML-compatible.

  • The id attribute replaces the name attribute. In HTML, the name attribute identifies elements such as anchors, images and forms so you can link to or script them. XHTML uses the id attribute for that purpose; it is the same attribute that CSS (Cascading Style Sheets) uses to address individual elements. To make sure your code works with today's browsers and with existing scripts, you can use both a name and an id attribute with the same value.

  • Attribute values must be quoted, and no minimization is allowed. A common practice among HTML coders is to leave the quotes out when specifying attribute values, especially if the values are numeric. Many HTML editors do that too, and some even strip the quotes you put around numeric values by hand. This is not valid in XHTML. Most browsers also accept minimized attributes, such as "checked," written without a value. This is not valid either. You can't have a dangling attribute:

    <input id="acheckbox" name="acheckbox" checked /> is incorrect.

    <input id="acheckbox" name="acheckbox" checked="checked" /> is correct.

  • XHTML documents have some mandatory elements. You no longer can have documents that contain nothing but text. An XHTML document needs a DOCTYPE definition, which defines the type of the document for validation purposes; an <html> tag pair; a <head> tag pair; a <title> tag pair contained inside the <head> tags; and a <body> tag pair. The minimal XHTML document looks like this:

    <!DOCTYPE html>
    <html>
    <head>
    <title>This is a minimal XHTML document</title>
    </head>
    <body>
    </body>
    </html>
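The differences listed above lend themselves to a mechanical check. As a rough sketch only, the Python script below feeds a document to a strict XML parser, which rejects unclosed elements, unquoted values and minimized attributes outright, and then looks for uppercase names and the mandatory elements. The function names, the sample documents, and the choice of the XHTML 1.0 Strict DOCTYPE and namespace declaration in the sample are illustrative assumptions, not part of the examples above. The script tests well-formedness and a few structural rules; it does not validate against the DTD.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to namespaced names."""
    return tag.rsplit("}", 1)[-1]


def xhtml_problems(markup):
    """Return a list of problems found; an empty list means all checks passed."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        # Unclosed elements, unquoted values and minimized attributes are
        # not well-formed XML, so the parser stops at the first one.
        return ["not well-formed: %s" % err]

    problems = []
    # Rule: element and attribute names must be lowercase.
    for elem in root.iter():
        names = [local_name(elem.tag)] + [local_name(a) for a in elem.attrib]
        problems += ["not lowercase: %s" % n for n in names if n != n.lower()]

    # Rule: <html> wraps <head> (with a <title>) followed by <body>.
    if local_name(root.tag) != "html":
        problems.append("root element is not <html>")
    if [local_name(child.tag) for child in root] != ["head", "body"]:
        problems.append("<html> must contain <head> followed by <body>")
    head = next((c for c in root if local_name(c.tag) == "head"), None)
    if head is None or all(local_name(c.tag) != "title" for c in head):
        problems.append("<head> must contain a <title>")
    return problems


# Hypothetical sample document; the DOCTYPE and xmlns values are the ones
# the W3C publishes for XHTML 1.0 Strict.
GOOD = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>A minimal XHTML document</title></head>
<body>
<p>A line with a break at its end.<br /></p>
<p><input id="acheckbox" name="acheckbox" type="checkbox" checked="checked" /></p>
</body>
</html>"""

# Variants that each break one of the rules discussed above.
SAMPLES = {
    "good": GOOD,
    "unclosed <br>": GOOD.replace("<br />", "<br>"),
    "minimized attribute": GOOD.replace('checked="checked"', "checked"),
    "unquoted value": GOOD.replace('id="acheckbox"', "id=acheckbox"),
    "uppercase tags": GOOD.replace("<p>", "<P>").replace("</p>", "</P>"),
    "missing title": GOOD.replace("<title>A minimal XHTML document</title>", ""),
}

if __name__ == "__main__":
    for label, markup in SAMPLES.items():
        print(label, "->", xhtml_problems(markup) or "no problems found")
```

Running the script prints one line per sample. The well-formedness failures surface as parser errors, while the uppercase and missing-title variants parse cleanly and are caught by the explicit checks, which mirrors the split between what XML itself enforces and what XHTML adds on top.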



