On Fri, 2006-01-27 at 16:15, James B. wrote:
> Historical reasons, mostly. HTML started out as inspired by SGML, rather
> than compliant with SGML. The people who built the first web servers
> didn’t know much about SGML, and so they reinvented processing
> instructions in an annoyingly incompatible manner. Ever since, web
> frameworks have been built on a solid foundation of
> don’t-bother-me-with-the-basics-of-SGML/XML-processing. Quite
> successfully too, which really grates my cheese.
>
> Do you have any references for this?

Tim Berners-Lee certainly knew about SGML. The original specification
explicitly mentions it. See
Tags used in HTML.
However, HTML was not fully SGML compliant. For example, there is a
sentence in the original spec that says:
“Currently HTML documents are transmitted without the normal SGML
framing tags, but if these are included parsers will ignore them.”
There was also an original test dataset, including this file:
Hypertext HTML formatting example.
If you look at the source, you can see that it is not fully SGML
compliant. For starters, there is no Doctype. Also, there are tags that
contain formatted text that is not wrapped in a CDATA section. Neither
is allowed in SGML.
An interesting thing to note is that the <P> tag was used to indicate
the end of a paragraph in the test document, though the original spec
said <P> was a paragraph start tag. I remember that all my early HTML
books said <P> was an end tag. Unfortunately, I threw those books away
years ago.
Also, I believe the first HTML DTD was for version 2.0, written in 1995.
Here is the link: http://www.w3.org/MarkUp/html-spec/html.dtd. Since
there was no DTD for version 1.0, it could not have been SGML compliant.
In all fairness, I could be wrong about there being no HTML 1.0 DTD.
There are notes from 1992 that talk about the future of HTML and “a new
DTD”, which indicates the existence of an old one. It’s just that I
haven’t found it. Even so, the lack of a requirement for a Doctype would
be enough to render HTML non-compliant. More importantly, it would not
be parseable by SGML parsers.
At the time, losing the Doctype and CDATA sections, and not supporting
hierarchical chapter and section structures, was probably the right
decision. HTML had to be very simple, or people would not have used it.
If the design had been “better”, we might not have had a web today.

> I’m pretty sure Tim Berners-Lee, Marc Andreessen, etc. knew about SGML,
> and I do not believe that HTML ever had PIs.

I have never seen an HTML spec that mentions processing instructions. Nor
is there any need for one to. Processing instructions can be defined by
anyone who designs a processing application; they are not tied to a
specific DTD or SGML application. (Well, except that some specifications
explicitly define some PIs, but there is nothing that prevents users of
the DTD from specifying more of them.)
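To make the point about PIs belonging to the application rather than the DTD a bit more concrete, here is a minimal sketch in Python using only the standard library's xml.dom.minidom. It is XML rather than SGML, simply because that is what is easy to run today. The `xml-stylesheet` target is the one defined by the W3C style sheet association recommendation; the `acme-typesetter` target is hypothetical, invented for this example. No DTD declares either of them, yet the parser hands both to the application, which acts only on the target it knows.

```python
from xml.dom import minidom

# Two processing instructions, neither declared by any DTD.
# "acme-typesetter" is a hypothetical target made up for illustration.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<chapter>
  <title>Processing instructions</title>
  <?acme-typesetter page-break?>
  <para>Text after the break.</para>
</chapter>"""


def collect_pis(node):
    """Return (target, data) pairs for every PI below node, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data))
        found.extend(collect_pis(child))
    return found


doc = minidom.parseString(DOC)
pis = collect_pis(doc)

# The (hypothetical) typesetter reacts only to its own target and
# silently ignores every other PI, e.g. the style sheet one.
handled = [data for target, data in pis if target == "acme-typesetter"]
print(pis)
print(handled)
```

Note that the XML declaration on the first line is not a processing instruction and does not show up in the list; the document type itself did not have to change for the application to pick up its own instruction.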
I can’t prove that the people who wrote the first web servers did not
know about PIs, but I think it is likely. If they had known, what
possible reason could they have had for deliberately doing something
that was not SGML compliant? (Browser wars and vendor lock-in didn’t
become major issues until later.)

> Also, the xml-dev list is a good place to read varying, but informed,
> opinions on the use of PIs.
> For example:
> xml-dev - Well-established uses of processing instructions?

I follow the list, though not as carefully now as I did a couple of
years ago. In addition to the applications mentioned in the thread you
refer to, XML editors like XMetaL and Arbortext Editor make use of
processing instructions. So do many proprietary SGML/XML processing
systems.
/Henrik
--
http://kallokain.blogspot.com/ - Blogging from the trenches of software development
http://www.henrikmartensson.org/ - Reflections on software development
http://testunitxml.rubyforge.org/ - The Test::Unit::XML Home Page
http://declan.rubyforge.org/ - The Declan Home Page