28 February 2001 (Last Updated: 1 July 2002)
Contents
- Introduction: What XHTML is and the different XHTML document types.
- General Rules: A checklist for moving straight to XHTML.
- Attributes in XHTML: How attributes are specified in XHTML.
- XHTML and tables: Tables are also different in XHTML.
- XHTML and images: Using images in XHTML.
- XHTML and Javascript: About changes to be made to scripts.
- XHTML and CSS: Changes to be made to use stylesheets with XHTML.
- Element Prohibitions: The syntax restrictions imposed by XHTML.
- Resources on the Web: Helpful Links.
A Brief Introduction to XHTML
Extensible HyperText Markup Language (XHTML) is a reformulation of HTML 4 as an XML application. This tutorial covers the changes needed to convert HTML documents to valid XHTML, and is meant to guide you through the conversion process.
The W3C, which is the organization that co-ordinates standardisation of Web protocols, has defined three types of XHTML documents. The types are distinguished by the XML Document Type Definition (DTD) that the document uses. The XHTML DTDs are:
- Strict: Used when the XHTML document is devoid of all formatting tags like <font> and Cascading Style Sheets (CSS) are used for controlling all presentation aspects.
- Transitional: This XHTML DTD allows the use of presentation tags in the document. This is the safer choice since most existing pages contain many presentation elements.
- Frameset: Used for XHTML documents that describe frames.
This tutorial covers the important steps to be followed to migrate HTML code to XHTML 1.0 Transitional. A few important reference links are also provided at the end of this article.
General Rules for converting HTML to XHTML
- The first line in the XHTML document may be the XML declaration:
<?xml version="1.0" encoding="iso-8859-1"?>
The W3C recommends that this declaration be included in all XHTML documents, although it is strictly required only when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding is supplied by a higher-level protocol such as HTTP. I said "may" because there can be problems with older browsers, which do not recognise the declaration as valid HTML.
- The second line in the XHTML document should be the document type declaration (DOCTYPE), which names the DTD used. The document type declaration for transitional XHTML documents is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The declaration for the strict XHTML DTD is:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The declaration for the frameset XHTML DTD is:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
- XML requires exactly one root element per document. Hence, in XHTML, all tags should be enclosed within the <html> tag, i.e., <html> should be the root element of the document.
- The starting tag <html> should be modified to include namespace information. The modification is:
<html xmlns="http://www.w3.org/1999/xhtml" lang="EN">
The attribute xmlns is the XML namespace with which we associate the XHTML document. The value of the attribute lang is the code for the language of the document as specified in RFC1766.
- All XHTML element names should be in lower case. That means <HTML> and <Body> are wrong. They should be rewritten as <html> and <body> respectively.
- All XHTML tags should have their end tags. In HTML it is common for paragraphs to have only the starting <p> tag. In XHTML this is not allowed. You need to end a paragraph with the </p> tag. Example: <p>Hello is wrong; it should be written as <p>Hello</p>.
- Empty XHTML tags should be ended with /> instead of >. The commonly used empty tags in XHTML are:
  - <meta />: for meta information (contained in the head section).
  - <base />: used to specify the base URI and also the target frame for hyperlinks (contained in the head section).
  - <basefont />: used to specify a base font for the document. Note that attribute 'size' is mandatory.
  - <param />: parameters for applets and objects.
  - <link />: to specify external stylesheets and other references.
  - <img />: to include images. Attributes 'src' for the source URI and 'alt' for alternate text are mandatory.
  - <br />: used for forced line break.
  - <hr />: for horizontal rules.
  - <area />: used inside image maps. Attribute 'alt' is mandatory.
  - <input />: used inside forms for input form elements like buttons, textboxes, checkboxes and radio buttons.
Example: <br clear="all"> is wrong; it should be rewritten as <br clear="all" />. <img src="back.gif" alt="Back"> is wrong; it should be <img src="back.gif" alt="Back" />.
- Proper nesting of tags is compulsory in XHTML. Example: <b><i>This is bold italics</b></i> is wrong. It should be rewritten as <b><i>This is bold italics</i></b>.
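Putting the rules above together, a converted page might look like the following minimal sketch. The title, the paragraph text and the file name back.gif are placeholders chosen for illustration. The xml:lang attribute and the Content-Type meta tag are not among the steps listed above; they are included here on the assumption that you want to follow the compatibility guidelines in the XHTML 1.0 recommendation.

```html
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- Single root element carrying the XHTML namespace -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- Empty element closed with " />" (assumed addition, see note above) -->
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
  <title>Placeholder title</title>
</head>
<body>
  <!-- Lower-case names, explicit end tags, proper nesting -->
  <p><b><i>This is bold italics</i></b></p>
  <p>First line<br />Second line</p>
  <hr />
  <p><img src="back.gif" alt="Back" /></p>
</body>
</html>
```

Running a page like this through the W3C validator is a quick way to confirm that it conforms to the Transitional DTD.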
Rules for XHTML Attributes
- All XHTML attribute names should be in lower case.
Example: Width="100" and WIDTH="100" are wrong; only width="100" is correct.
Similarly, onMouseOut="javascript:myFunction();" is wrong; it should be rewritten as onmouseout="javascript:myFunction();".
- All attribute values should be quoted.
Example: width=100 is wrong; it should be width="100" or width='100'.
- HTML supports certain attributes which have no values. An example is noshade, as in the HTML tag <hr noshade>. XHTML does not allow such empty or compact attributes. The compact attributes generally found in HTML are compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize and defer. They should always have a value. In XHTML this is done by giving the attribute name itself as the value!
Example: noshade becomes noshade="noshade"; checked becomes checked="checked".
- The name attribute is deprecated for the elements a, applet, form, frame, iframe, img and map, and will be removed in a future version of XHTML; the id attribute will take its place. So, for these tags, an id attribute should also be specified with the same value as that for name.
Example: <frame name="myFrame"> becomes <frame name="myFrame" id="myFrame" />
- All & (ampersand) characters in the source code have to be replaced with &amp;, which is the equivalent character entity code. This change should be done in all attribute values and URIs.
Example: Bee&Nee will result in an error if you try to validate it; it should be written as Bee&amp;Nee.
<a href="my.asp?action=read&value=1">Go</a> is wrong; it should be coded as <a href="my.asp?action=read&amp;value=1">Go</a>.
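As an illustration of the attribute rules, here is a small fragment after conversion, with the original HTML kept in comments. The form name, field names and option values are made up for the example, and the fragment is meant to sit inside the <body> of a Transitional document.

```html
<!-- Before: <FORM ACTION=my.asp NAME=prefs> -->
<!-- name is deprecated on <form>, so id is added with the same value -->
<form action="my.asp" name="prefs" id="prefs">
  <p>
    <!-- Before: <INPUT TYPE=checkbox NAME=news CHECKED> -->
    <input type="checkbox" name="news" checked="checked" />
    <!-- Before: <SELECT NAME=topics MULTIPLE> -->
    <select name="topics" multiple="multiple">
      <option value="xhtml" selected="selected">XHTML</option>
      <option value="css">CSS</option>
    </select>
    <!-- Before: <A HREF=my.asp?action=read&value=1>Go</A> -->
    <a href="my.asp?action=read&amp;value=1">Go</a>
  </p>
</form>
```

The name attribute on <input> and <select> is left as it is, because it is what identifies the field when the form is submitted.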
XHTML Tables
- For the <table> tag, the attribute height is not supported in XHTML 1.0. Only width is supported. The <td> tag does support the height attribute.
- The <table>, <tr> and <td> tags do not support the attribute background, which is used to specify a background image for the table or the cell. Background images will have to be specified either using the style attribute or using an external stylesheet. The attribute bgcolor for background color is, however, supported by these tags.
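A sketch of a table converted along these lines follows. The image name back.gif, the sizes and the colours are placeholder values. The height and background image that HTML authors often put on the <table> tag move into the style attribute here; an external stylesheet would work equally well.

```html
<!-- Before: <TABLE WIDTH=300 HEIGHT=200 BACKGROUND=back.gif BGCOLOR=#ffffff> -->
<table width="300" bgcolor="#ffffff"
       style="height: 200px; background-image: url(back.gif);">
  <tr>
    <!-- height is still accepted on <td> in the Transitional DTD -->
    <td height="50" bgcolor="#cccccc">First cell</td>
    <!-- a cell background image also goes through style -->
    <td style="background-image: url(back.gif);">Second cell</td>
  </tr>
</table>
```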
XHTML Images
- The alt attribute is mandatory. The value of this attribute is the text shown in older browsers, in text-only browsers (like lynx), and in place of the image when it is not available. Note that <img> is an empty tag.
Example: <img src="back.gif" alt="Back" />
XHTML and Javascript
- The type attribute is mandatory for all <script> tags. The value of type is text/javascript for Javascript.
- The use of external scripts is recommended.
Example:
<script type="text/javascript" language="javascript" src="functions.js"></script>
- If you are using internal scripts, enclose them within the markers <![CDATA[ and ]]>. This will mark the script as unparsed character data. Otherwise characters like & and < will be treated as the start of character entities (like &nbsp;) and tags (like <b>) respectively.
Example for XHTML Javascript:
<script type="text/javascript" language="Javascript">
// The leading // hides each CDATA marker from browsers that parse the page as HTML.
//<![CDATA[
document.write('Hello World!');
//]]>
</script>
XHTML and Stylesheets
- The type attribute is mandatory for the <style> tag. The value of type is text/css for stylesheets.
- The use of external stylesheets is recommended.
Example: <link rel="stylesheet" type="text/css" href="screen.css" />
- Enclose internal style definitions within the markers <![CDATA[ and ]]> to mark them as unparsed character data. Otherwise the & and < characters will be treated as the start of character entities (like &nbsp;) and tags (like <b>) respectively. Wrapping each marker in a CSS comment keeps browsers that parse the page as HTML from reading it as part of a style rule.
Example:
<style type="text/css">
/*<![CDATA[*/
.MyClass { color: #000000; }
/*]]>*/
</style>
Element Prohibitions in XHTML
The W3C recommendation also prohibits certain XHTML elements from containing some other elements. These are given below:
- <a> cannot contain other <a> elements.
- <pre> cannot contain the <img>, <object>, <big>, <small>, <sub>, or <sup> elements.
- <button> cannot contain the <input>, <select>, <textarea>, <label>, <button>, <form>, <fieldset>, <iframe>, or <isindex> elements.
- <label> cannot contain other <label> elements.
- <form> cannot contain other <form> elements.
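To make a few of these prohibitions concrete, the fragment below shows markup restructured so that it no longer breaks them. The file names, link targets and field names are invented for the example, and the fragment is meant to sit inside the <body> of a Transitional document.

```html
<!-- Before: <a href="one.html">One <a href="two.html">Two</a></a> -->
<p><a href="one.html">One</a> <a href="two.html">Two</a></p>

<!-- Before: an <img> placed inside <pre> -->
<pre>Preformatted text</pre>
<p><img src="back.gif" alt="Back" /></p>

<!-- Before: a <label> nested inside another <label> -->
<form action="my.asp">
  <p>
    <label for="first">First name</label>
    <input type="text" name="first" id="first" />
    <label for="last">Last name</label>
    <input type="text" name="last" id="last" />
  </p>
</form>
```

XML DTDs cannot express exclusions of this kind, so a validator may not report every violation; it is worth checking for them by eye.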
XHTML Resources on the Web
- The W3C Pages on XML: The W3C are the people who work for the formulation and standardisation of Web technologies including XHTML. They are the best place to go.
- W3Schools tutorials and references on XHTML and CSS: Among the handiest references for any Web developer.
- Download the XHTML 1.0 Transitional DTD: The DTD (Document Type Definition) is used to define an XML application. XHTML is also an XML application and all the rules can be found in this well documented DTD.
- HTML-Tidy: Written by Dave Raggett, this tool will accept any bloated or rotten HTML and make it adhere to standards. It can also be used to accelerate conversion of HTML to XML or XHTML.
- Chami's HTML-Kit: An excellent HTML editor (not visual, but supports previewing) which supports XHTML. It supports HTML-Tidy as a plugin. Recommended.
- The W3C Online Validator for XHTML: XHTML documents can be validated online with this W3C Service. Recommended.
- RFC1766: This RFC defines the tags used for the identification of languages, such as the two-letter code en for English.
