XHTML vs HTML? - Page 4 - Webmaster Forum - Web Design, Programming and SEO forums

09-28-2007, 02:40 PM

#31

insub2

Originally Posted by LeeP

I used XHTML, but I just had my first Website Fundamentals class at university, and they said we're going to be using HTML over XHTML because it is better to use, but don't quote me on that. They also mentioned using HTML because HTML 5 is coming out.

did they say why it was better?

p.s. sorry for quoting you.

10-05-2007, 09:29 AM

#32

LeeP

Originally Posted by insub2

did they say why it was better?

p.s. sorry for quoting you.

XHTML isn't fully supported by IE.

XHTML 2.0 isn't backwards compatible.

HTML 4.01 is supported by IE.

HTML 5 is in draft and will be supported by IE.

So basically you want everybody to be able to view your website, so you should use HTML 4.01 and in the future HTML 5.

10-05-2007, 03:39 PM

#33

RaZoR^

HTML 4.01 is supported by IE.

Hardly. IE7 doesn't come close to doing what it should with HTML 4, let alone when HTML 5 is out of draft. HTML pages work just as poorly as XHTML pages in IE6/7.

XHTML isn't fully supported in IE simply because XML isn't. If you use XHTML without XML, then it'll [probably] just fall back to the SGML parser and the pages will work about the same. In XHTML 1.0 you can even serve the document up as text/html.

10-25-2007, 06:03 AM

#34

jordan23

It's much more appreciated in the world of coding. It's also usually more browser compliant.

11-06-2007, 12:28 AM

#35

Junglist

XHTML always

11-06-2007, 12:52 AM

#36

Village Genius

The purpose of the 10 post limit is so members can contribute something before taking something, not spam.

02-03-2008, 02:12 AM

#37

Szandor

HTML wasn't always about the presentation of data, it was about proper markup and semantics in the beginning, but became corrupt and perverted over time, ending up in a mess of tables and FONT-tags. XHTML is what HTML was meant to be, but I also believe (or rather hope) that HTML 5 will be fully semantic and non-presentational.

I also like the strict rules of XML, making it easier for browsers to read and interpret the code. I write other XML documents as well so it comes naturally to me to use XHTML.

02-08-2008, 11:05 PM

#38

Southern Media

Why should you use HTML instead of XHTML?

Maybe you already knew this one.

Sending XHTML as text/html Considered Harmful
=============================================

Author: Ian Hickson <ian@hixie.ch> (Comments welcome.)

Abstract
--------

A number of problems resulting from the use of the text/html MIME type
in conjunction with XHTML content are discussed. It is suggested that
XHTML delivered as text/html is broken and XHTML delivered as text/xml
is risky, so authors intending their work for public consumption
should stick to HTML 4.01, and authors who wish to use XHTML should
deliver their markup as application/xhtml+xml.

Other versions
--------------

Une traduction française est disponible:
http://www.hixie.ch/advocacy/xhtml.fr

The Safari development team posted a blog entry on this topic:
http://webkit.org/blog/?p=68

Context
-------

This was originally written in September 2002 in the context of this
Web log entry:

http://ln.hixie.ch/?start=1031465247&count=1

It has since been regularly updated to correct errors that have been
brought up in various mailing lists and other discussion forums. As of
2007, it is still just as relevant as when it was originally written.

Note that this document compares XHTML 1.0 compliant to appendix C to
HTML 4.01, because that is the only variant of XHTML that may be sent
as text/html.

Terminology: Appendix C refers to the XHTML specification's
so-called "HTML Compatibility Guidelines", which can be found at:
http://www.w3.org/TR/xhtml1/#guidelines

Executive Summary
-----------------

If you use XHTML, you should deliver it with the application/xhtml+xml
MIME type. If you do not do so, you should use HTML4 instead of XHTML.
The alternative, using XHTML but delivering it as text/html, causes
numerous problems that are outlined below.

Unfortunately, IE6 does not support application/xhtml+xml (in fact, it
does not support XHTML at all).

Why using text/html for XHTML is bad
------------------------------------

What usually happens to authors who decide to send XHTML as text/html
is the following:

1. Authors write XHTML that makes assumptions that are only valid for
tag soup or HTML4 browsers, and not XHTML browsers, and send it as
text/html. (The common assumptions are listed below.)

2. Authors find everything works fine.

3. Time passes.

4. Author decides to send the same content as application/xhtml+xml,
because it is, after all, XHTML.

5. Author finds site breaks horribly. (See below for a list of
reasons why.)

6. Author blames XHTML.

Steps 1 to 5 have been seen by every single person I have spoken to
who has switched to using the XHTML MIME type. The only reason step 6
didn't happen in those cases is that they were advanced authors who
understood how to fix their content.

SPECIFIC PROBLEMS

These are the issues that affect documents when they are switched from
text/html to application/xhtml+xml:

* <script> and <style> elements in XHTML sent as text/html have to be
escaped using ridiculously complicated strings.

This is because in XHTML, <script> and <style> elements are #PCDATA
blocks, not #CDATA blocks, and therefore <!-- and --> really _are_
comment tags, and are not ignored by the XHTML parser. To escape
script in an XHTML document which may be handled as either HTML4 or
XHTML, you have to use:

<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>

To embed CSS in an XHTML document which may be handled as either
HTML4 or XHTML, you have to use:

<style type="text/css"><!--/*--><![CDATA[/*><!--*/
...
/*]]>*/--></style>

Yes, it's pretty ridiculous. If documents _aren't_ escaped like
this, then the contents of <script> and <style> elements get
dropped on the floor when parsed as true XHTML.

(This is all assuming you want your pages to work with older
browsers as well as XHTML browsers. If you only care about XHTML
and HTML4 browsers, you can make it a bit simpler.)

* A CSS stylesheet written for an HTML4 document is interpreted
slightly differently in an XHTML context (e.g. the <body> element
is not magical in XHTML, tag names must be written in lowercase in
XHTML). Thus documents change rendering when parsed as XHTML.

* A DOM-based script written for an HTML4 document has subtly
different semantics in an XHTML context (e.g. element names are
case insensitive and returned in uppercase in HTML4, case sensitive
and always lowercase in XHTML; you have to use the namespace-aware
methods in XHTML, but not in HTML4). BUT, if you send your
documents as text/html, then they will use the HTML4 semantics
DESPITE being XHTML! Thus, scripts are highly likely to break when
the document is parsed as XHTML.

* Scripts that use document.write() will not work in XHTML contexts.
(You have to use DOM Core methods.)

* Current browsers are, for text/html content, HTML4 user agents (at
best) and certainly not XHTML user agents. Therefore if you send
them XHTML you are sending them content in a language which is not
native to them, and instead relying on their error handling. Since
this is not defined in any specification, it may vary from one user
agent to the other.

* XHTML documents that use the "/>" notation, as in "<link />" have
very different semantics when parsed as HTML4. So if there were to
be a fully compliant HTML4 user agent, it would be quite correct to
show ">" characters all over the page.

For more details on this see the third bullet point in the section
entitled "The Myth of "HTML-compatible XHTML 1.0 documents"".
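
The case-sensitivity point above can be seen outside a browser too. The
following is only a rough sketch using Python's standard library, not
anything from the original document: html.parser stands in for a tag
soup parser and xml.etree.ElementTree for an XML parser. Note one
difference from the text: a browser's HTML DOM reports element names in
uppercase, while Python's HTML parser folds them to lowercase. The point
illustrated is only that one side ignores case and the other does not.
The helper names are made up for the illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Collect start-tag names as a lenient HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over the tag name already case-folded.
        self.tags.append(tag)


def html_tag_names(markup):
    """Tag names seen by the HTML parser (case-insensitive)."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_tag_names(markup):
    """Tag names seen by the XML parser (case-sensitive)."""
    return [element.tag for element in ET.fromstring(markup).iter()]


MARKUP = "<div><P>one</P><p>two</p></div>"

print(html_tag_names(MARKUP))  # <P> and <p> come out as the same name
print(xml_tag_names(MARKUP))   # <P> and <p> stay two different names
```

A script or stylesheet that matches on "P" or "p" therefore behaves
differently depending on which parser handled the page.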

COPY AND PASTE

The worst problem, and the main reason (I suspect) for most of the
REALLY invalid XHTML pages out there, is that authors who have no clue
about XHTML simply copied and pasted their DOCTYPE from another
document. So even if you write valid XHTML, by using XHTML, you are
likely to encourage authors who do not know enough to write valid
XHTML to claim to do so.

Why trying to use XHTML and then sending it as text/html is bad
---------------------------------------------------------------

These are not likely to be problems for authors who regularly validate
their pages, but other authors will run into these problems.

* Documents sent as text/html are handled as tag soup [1] by most
browsers.

This is the key. If you send XHTML as text/html, as far as browsers
are concerned, you are just sending them Tag Soup. It doesn't
matter if it validates, they are just going to be treating it the
same way as plain old HTML 3.2 or random HTML garbage.

Since most authors only check their documents using one or two
browsers, rather than using a validator, this means that authors
are not checking for validity, and thus most documents that claim
to be XHTML on the web now are invalid.

See, for example, this study:
http://www.goer.org/Journal/2003/Apr/index.html#results
...but if you don't believe it, feel free to do your own. In any
random sample of documents that appear to claim to be XHTML, the
overwhelming majority of documents are invalid.

Therefore the main advantage of using XHTML, that errors are caught
early because it _has_ to be valid, is lost if the document is then
sent as text/html. (Yes, I said _most_ authors. If you are one of
the few authors who understands how to avoid the issues raised in
this document and does validate all their markup, then this
document probably does not apply to you -- see Appendix B.)

* If you ever switch your documents that claim to be XHTML from
text/html to application/xhtml+xml, then you will in all likelihood
end up with a considerable number of XML errors, meaning your
content won't be readable by users. (See above: most of these
documents do not validate.)

* If a user saves such a text/html document to disk and later
reopens it locally, triggering the content type sniffing code since
filesystems typically do not include file type information, the
document could be reopened as XML, potentially resulting in
validation errors, parsing differences, or styling differences.
(The same differences as if you start sending the file with an XML
MIME type.)

* The only real advantage to using XHTML rather than HTML4 is that it
is then possible to use XML tools with it. However, if tools are
being used, then the same tools might as well produce HTML4 for you.
Alternatively, the tools could take SGML as input instead of XML.
(SGML is over a decade older than XML and the tools have existed
for years.)

* HTML 4.01 contains everything that XHTML 1.0 contains, so there is
little reason to use XHTML in the real world. It appears the main
reason is simply "jumping on the bandwagon" of using the latest and
(perceived) greatest thing.

The Myth of "HTML-compatible XHTML 1.0 documents"
-------------------------------------------------

RFC 2854 spec refers to "a profile of use of XHTML which is compatible
with HTML 4.01". There is no such thing. Documents that follow the
guidelines in appendix C are not valid HTML 4.01 documents. They just
happen to be close enough that tag soup parsers are able to handle
them just like most of the other pages on the Web.

The simplest examples of this are:

* The "/>" empty tag syntax actually has totally different meaning in
HTML4. (It's the SHORTTAG minimisation feature known as NET, if I
recall the name correctly.) Specifically, the XHTML

<p> Hello <br /> World </p>

...is, if interpreted as HTML4, exactly equivalent to:

<p> Hello <br>> World </p>

...and should really be rendered as:

Hello
> World

* Script and style elements cannot have their contents hidden from
legacy browsers. The following XHTML:

<style type="text/css">
<!--
...
-->
</style>

...is exactly equivalent to the following HTML4:

<style type="text/css">

</style>

...because comments are not ignored in XHTML <style> blocks.

* The "xmlns" attribute is invalid HTML4.

* The XHTML DOCTYPEs are not valid HTML4 DOCTYPEs.

Using XHTML and sending it as text/html is effectively the same, from
an HTML4 point of view, as writing tag soup (see "Why browsers can't
handle XHTML sent as text/html as XML" below).

Note: This is covered by HTMLWG issue XHTML-1.0/6232:
http://hades.mn.aptest.com/cgi-bin/v...20c;user=guest

Why browsers can't handle XHTML sent as text/html as XML
--------------------------------------------------------

* Documents sent as text/html are handled as tag soup by most
browsers. This means that authors are not checking for validity,
and thus most XHTML documents on the web now are invalid. A
conforming XML browser would thus be unable to show as many
documents as current browsers, and would therefore never get enough
marketshare to be relevant.

* It is impossible to reliably autodetect XHTML when sent as
text/html. This is why browsers could not ever treat text/html
documents as XML, even if they did not care about not being usable
(see the first point in this section).

+ You can't sniff for the five characters "<?xml" because:

- The <?xml ... ?> header is optional per Appendix C, and it is
recommended not to include it as it causes IE6 to trigger
quirks mode.

- SGML can also contain PIs (see the example below). (A "PI" is
a "processing instruction", a syntactic construct that begins
with the two characters "<?".)

+ You can't trigger from the DOCTYPE since the W3C might introduce
new XHTML DOCTYPEs in future, so you don't know which DOCTYPEs
to look for. (Not to mention that DOCTYPEs are optional for
well-formed XHTML documents, DOCTYPE parsing is hard, DOCTYPEs
may be hidden in comments, and DOCTYPE sniffing has been called
harmful by many leading figures at the W3C and elsewhere.)

+ You can't trigger off the "<html xmlns" string because it might
be there but hidden in a comment (you'd need a complete XML
parser to step past comments, PIs, internal subsets, etc).

e.g. what language is this text/html document in?:

<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
[  ]>

<!--
This is a comment. This document is not XHTML.
<html xmlns="http://www.w3.org/1999/xhtml"/>
Ok, I'm done now. -->
<html>
<title> Need a title in HTML4! </title>
<p> This is a valid HTML4 document.
</html>

* Even if you could detect XHTML, what do you do with a document that
is not well formed (such as the example above)? If you fall back on
HTML4, then there is no advantage to using an XML processor, and you
might as well always treat it as HTML4.

* The HTML working group said that browsers should not do this:
http://lists.w3.org/Archives/Public/...0Sep/0024.html
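
To make the sniffing problem concrete, here is a minimal sketch in
Python. It is an illustration only: naive_looks_like_xhtml is a
hypothetical helper, not how any real browser works, and the test
document is a simplified version of the example above (the internal
DOCTYPE subset is left out). It shows a string-matching sniffer being
fooled by a document that an XML parser would reject outright.

```python
import xml.etree.ElementTree as ET


def naive_looks_like_xhtml(text):
    """Deliberately naive sniffing: look for XHTML-ish marker strings."""
    return (
        text.lstrip().startswith("<?xml")
        or '<html xmlns="http://www.w3.org/1999/xhtml"' in text
    )


def is_well_formed_xml(text):
    """True only if a real XML parser accepts the whole document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Simplified version of the tricky HTML4 document from the text.
TRICKY_HTML4 = """<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
<!--
This is a comment. This document is not XHTML.
<html xmlns="http://www.w3.org/1999/xhtml"/>
Ok, I'm done now. -->
<html>
<title> Need a title in HTML4! </title>
<p> This is a valid HTML4 document.
</html>
"""

print(naive_looks_like_xhtml(TRICKY_HTML4))  # the sniffer says "XHTML"
print(is_well_formed_xml(TRICKY_HTML4))      # the XML parser disagrees
```

Getting the answer right would mean running something close to a full
XML parser first, which is the point made in the list above.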

The advantages of XHTML
-----------------------

When sent as application/xhtml+xml, XHTML has several advantages:

1. XHTML content will be able to be mixed-and-matched with content
from other well-known namespaces (in particular, MathML). This
is the main advantage for content authors.

2. Browsers will immediately catch well-formedness errors (though
other errors still won't be caught).

3. Tools interacting with XHTML documents are guaranteed a
well-formed document.

However, none of these apply when an XHTML document is sent as
text/html, and since authors feel their pages should be readable on
the most popular Web browser, which does not support
application/xhtml+xml, there is basically no point in using XHTML at
the moment.

Conclusion
----------

There are few advantages to using XHTML if you are sending the content
as text/html, and many disadvantages.

In addition, currently, the majority (over 90% by most counts) of the
browser market is unable to correctly render real XHTML content sent
as text/xml (or other XML MIME types). For example, point IE at:

http://www.mozillaquestquest.com/

Only Mozilla, Mozilla-based browsers such as Netscape 6 and 7, recent
versions of Opera, and Safari, are able to correctly render that site.
(IE6 shows a DOM tree!)

Authors who are not willing to use one of the XML MIME types should
stick to writing valid HTML 4.01 for the time being. Once user agents
that support XML and XHTML sent as one of the XML MIME types are
widespread, then authors may reconsider learning and using XHTML.

(Advanced authors should also see appendix B.)

Further Reading
---------------

I wrote another document on a related matter: people wanting browsers
to treat XHTML documents sent as text/html as XML and not tag soup.

http://www.damowmow.com/playground/xhtml-in-uas.xhtml

Henri Sivonen wrote a similar document asking what is the point of
XHTML:

http://hsivonen.iki.fi/xhtml-the-point/

There are also many mailing list posts on this matter, e.g. on
www-talk. The following post summarises some issues relating to using
text/html for XHTML content containing XML extensions:

http://lists.w3.org/Archives/Public/...yJun/0046.html

Some people have run into the problems this document mentions, for
example:

http://flrant.com/index.php?id=P21

There are also some interesting points made in other posts, for
example:

| > But does Mozilla call its xml parser for http://www.w3.org/ ?
|
| Nope. If it did, it would render the page without any expanded
| character entity references, since Mozilla is not a validating
| parser and thus skips parsing the DTD and thus doesn't know what
| &nbsp;, &middot; and &copy; are. Not to mention that it would end up
| ignoring the print-media specific section of the stylesheet, which
| uses uppercase element names and thus wouldn't match any of the
| lower case elements (line 138 of the first stylesheet), and it would
| use an unexpected background colour for the page because the
| stylesheet sets the background on <body> and not <html>, which in
| XHTML will result in a different rendering to the equivalent in
| HTML4 (same sheet, line 5).
-- http://lists.w3.org/Archives/Public/...yJun/0004.html
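
The entity problem in that post is easy to reproduce with any XML
parser that has not read the XHTML DTD. A small sketch using Python's
xml.etree.ElementTree (an assumption for illustration; the post is about
Mozilla, which according to the post would skip the unknown references
rather than stop, whereas Python's parser reports an error):

```python
import xml.etree.ElementTree as ET


def parse_or_error(markup):
    """Return the root element's text, or a short error description."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError as error:
        return f"ParseError: {error}"


# The five entities built into XML itself are always known.
print(parse_or_error("<p>&lt;ok&gt;</p>"))

# HTML entities such as &nbsp; are defined only in the DTD, which this
# parser never loads, so the reference is unknown to it.
print(parse_or_error("<p>a&nbsp;b</p>"))
```

Either way the cause is the same: the entity names live in the DTD, and
a parser that skips the DTD does not know them.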

Or this post, near the end of the thread:

| I'm still looking for a good reason to write websites in XHTML _at
| the moment_, given that the majority of web browsers don't grok
| XHTML. The only reason I was given (by Dan Connolly [1]) is that it
| makes managing the content using XML tools easier... but it would be
| just as easy to convert the XML to tag soup or HTML before
| publishing it, so I'm not sure I understand that. And even then,
| having the content as XML for content management is one thing, but
| why does that require a minority of web browsers to have to treat
| the document as XML instead of tag soup? What's the advantage of
| doing that? And even _then_, if the person in control of the content
| is using XML tools and so on, they are almost certainly in control
| of the website as well, so why not do the content type munging on
| the server side instead of campaigning for UA authors to spend their
| already restricted resources on implementing content type sniffing?
|
| [1] http://lists.w3.org/Archives/Public/...yJun/0031.html
-- http://lists.w3.org/Archives/Public/...lAug/0005.html

Appendix A: application/xhtml+xml
---------------------------------

See: http://ln.hixie.ch/?start=1036767231&count=1

Appendix B: Advanced Authors
----------------------------

Some advanced authors are able to send back XHTML as
application/xhtml+xml to browsers that support it, and as text/html to
legacy browsers.

Assuming you are using XHTML 1.0 compliant to Appendix C (or have
otherwise checked that the XHTML 1.0 you send is compatible with Tag
Soup processors), then that's fine. All I am saying in this document
is that sending XHTML as text/html ONLY is harmful.
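
As a rough sketch of that kind of server-side switch, assuming a plain
Python function that is handed the request's HTTP Accept header
(parse_accept and choose_content_type are hypothetical names, not part
of any framework):

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def parse_accept(header):
    """Return {media_type: q} for the value of an HTTP Accept header."""
    prefs = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when no parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header):
    """Use the XHTML type only if the client names it with q > 0."""
    prefs = parse_accept(accept_header or "")
    if prefs.get(XHTML_TYPE, 0.0) > 0.0:
        return XHTML_TYPE
    # Wildcards such as */* are ignored on purpose: a browser that
    # sends only */* has not said it can handle real XHTML.
    return HTML_TYPE


print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_content_type("image/gif, image/jpeg, */*"))
```

A server doing this should also send a "Vary: Accept" response header so
that caches keep the two variants apart.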

Note: Sending XHTML 1.1 as text/html is NEVER fine. There is no spec
that allows this. Sending XHTML 2.0 as anything in a production
(non-testing) context is NEVER fine either, since that spec has not
reached CR yet.

Also note that I would personally suggest that even advanced authors
not use XHTML sent as text/html, since many authors copy and paste
markup from others and thus may easily end up copying the valid XHTML
markup but using it as HTML4.

Appendix C: Acknowledgements
----------------------------

Thanks to Nick Boalch for the abstract. Thanks to Dan Connolly for
pedantry that has improved the quality of this document. Thanks to Ted
Shaneyfelt, Quinn Comendant, and many others for suggesting
improvements to the text.

Appendix D: Footnotes
---------------------

[1] The term "handled as tag soup" refers to the fact that browsers
typically are very lenient in their error handling, and do not support
any of the "advanced" SGML features. For example, browsers treat the
string "<br/>" as "<br>" and not "<br>>", the latter being what
HTML4/SGML says they should do. Similarly, real world browsers have no
problem dealing with content such as "<b> foo <i> bar </b> baz </i>"
even though according to the HTML4 spec that is meaningless.
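
The difference between lenient and strict handling can be sketched with
Python's standard library, taking html.parser as a stand-in for a tag
soup parser and xml.etree.ElementTree as the XML parser. This is an
illustration under those assumptions, not a model of any particular
browser; the helper names are invented.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextCollector(HTMLParser):
    """Collect text content, ignoring how the tags are nested."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def lenient_text(markup):
    """Text recovered by the forgiving HTML parser."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


def strict_parse_ok(markup):
    """True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


SOUP = "<p><b> foo <i> bar </b> baz </i></p>"
FIXED = "<p><b> foo <i> bar </i></b><i> baz </i></p>"

print(lenient_text(SOUP).split())  # all three words are recovered
print(strict_parse_ok(SOUP))       # mis-nested tags are rejected
print(strict_parse_ok(FIXED))      # properly nested tags are accepted
```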

Source: http://hixie.ch/advocacy/xhtml

02-10-2008, 06:19 PM

#39

Addict Web Studios

XHTML and CSS is just cleaner, faster-loading code, and is miles better than standard HTML. Any coder will know how clean XHTML code is and how easily it can be styled with CSS. I've been coding XHTML and CSS for a while and have never had any browser problems, either for me or for the people I code for.

02-10-2008, 10:42 PM

#40

Dr John

Originally Posted by Addict Web Studios

Wrong.

HTML and CSS are just as clean and easy to use as XHTML. HTML can be as totally free of presentational markup as XHTML. You are confusing the way people used to code their HTML, with lots of inline styles, with the way early adopters of XHTML coded when it came out: no inline styles, and CSS for everything. You can add as many inline styles to your XHTML as you want and it will still render (though it may not validate), and you can have no inline styles in your HTML and lots of CSS if you want. You have simply seen the early adopters of XHTML immediately reject inline styles and assumed, wrongly, that HTML could not do the same.

It's a frame of mind, and early users of HTML had the wrong frame of mind (mixing content and presentation) while early users of XHTML had the right frame of mind (total separation of content and presentation). But the ideas they used in their XHTML were perfectly applicable to HTML; it was just that most people didn't apply them. Take any of your XHTML pages, change the doctype to HTML 4.01 Strict, remove any closing /> that HTML doesn't require, and lo and behold, it will render correctly in most modern browsers. Any errors in rendering will be due to IE bugs, not to HTML being less capable.
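
For anyone who wants to try that experiment, here is a deliberately naive Python sketch of the conversion. The function name is made up, and the regular expressions are an assumption that only holds for simple pages: they will also rewrite any "/>" that happens to appear inside scripts or attribute values. Dropping the XML declaration and the xmlns attribute goes a little beyond what is described above, since neither belongs in HTML 4.01.

```python
import re

HTML401_STRICT = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)


def xhtml_to_html4(markup):
    """Very rough XHTML 1.0 to HTML 4.01 Strict rewrite for simple pages."""
    # Drop a leading XML declaration, if there is one.
    markup = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", markup)
    # Swap the first DOCTYPE for the HTML 4.01 Strict one.
    markup = re.sub(r"<!DOCTYPE[^>]*>", HTML401_STRICT, markup,
                    count=1, flags=re.IGNORECASE)
    # The xmlns attribute is not valid in HTML4.
    markup = re.sub(r'\s+xmlns="[^"]*"', "", markup)
    # Turn XML-style empty tags such as <br /> into <br>.
    markup = re.sub(r"\s*/>", ">", markup)
    return markup


PAGE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    '<body><p>Hello<br />World</p></body></html>'
)

print(xhtml_to_html4(PAGE))
```

Run the result through a validator afterwards; the sketch only handles the syntax differences mentioned here.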

The tag soup problem from several years ago was not a feature of HTML. It was a feature of poor, unskilled, careless coders, with some browsers, mainly IE, being more capable of interpreting their erroneous code, so those coders produced more dud code, unaware of its errors. If they had bothered to validate the code, the errors would have been detected. But they didn't bother, as IE worked out what they meant (usually, not always). You can write tag soup in XHTML too; it's just that those who imagine it is better are more careful and avoid it, and they tend to validate their code, so the errors are detected and corrected. You can't blame HTML for tag soup and messy code. It was just as wrong in HTML as it is in XHTML.

I've just added this to an XHTML Strict web page: <b><i>Manufactured</b></i> and it rendered in bold and italic, exactly as predicted, in IE 7 and in Firefox. Deprecated, inline and tag soup. And invalid XHTML. But it works. Coders write code. They either write it correctly or wrongly.

PS HTML 5 will not be used for at least another 5 years, probably even longer. So forget about that for a while.