From: jzakiya
I was checking various websites using this W3C validator:
http://validator.w3.org
In recent months I’ve been meaning to ask, and this seems an
appropriate time: Virtually every Web site I’ve looked at fails
validation, except my own on my Unix shell site. Even my own PHP
output fails on free hosting sites that add advertisements before
or after my PHP-generated valid syntax. Google is an
egregious example, the 800 pound gorilla that can do any FKING
thing it wants any FKING time it wants because nobody has
the power to correct it. Also, every job-search Web site I’ve
examined so far (DICE, Monster.Com, craigslist, hotjobs, etc.)
fails validation. If I ever encounter a Web site I didn’t create
myself which passes validation I’m surprised. I’ve created a Web
page documenting some of this horridly non-validating HTTP output
that pretends to be HTML or XHTML:
job-search Web sites, critique
In the apparent war between W3C and the vast majority of commercial
Websites, who is correct?
Is W3C correct, and all these Websites are broken?
Or are these Websites just fine, and W3C is being pedantic or even
anal retentive?
Off and on during the past several years I’ve developed a
non-validating generic SGML/XML parser. The only legal syntax it
doesn’t understand is SGML null-end-tags, because their syntax
conflicts with XML’s use of self-closing tags.
In the course of trying to use it to parse output from Yahoo,
Google, and other major Websites, I’ve needed to modify my parser
to accept lots of INVALID syntax. The most common cases are:
- URLs and other property values that aren’t enclosed in quotes.
- Bare ampersands in query strings of URLs not expressed as &amp;
- Bare brockets as text not expressed as &lt; or &gt;
- Scripts (usually JavaScript) that aren’t contained within comments.
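
To make the bare-ampersand case above concrete, here is a minimal
sketch in Python of one way a lenient parser could repair it before
building a DOM. This is not the parser described in this message. The
function name, the regular expression, and the choice of Python are
all assumptions made for illustration.

```python
import re

# An ampersand counts as "bare" unless it already starts an entity
# reference: a named one (&amp;), a decimal one (&#38;), or a hex
# one (&#x26;). This pattern is an illustrative assumption.
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(text):
    """Rewrite each bare '&' as '&amp;' and leave existing entities alone."""
    return BARE_AMP.sub("&amp;", text)

print(escape_bare_ampersands("JobSearch?op=306&dockey=xml"))
# JobSearch?op=306&amp;dockey=xml
```

The sketch only looks at the shape of what follows the ampersand, so
text that happens to look like an entity reference is left alone even
if no such entity is defined. A real parser would probably want to
check the name against the DTD's entity list as well.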
I thought I had covered all such broken syntax, allowing it to be
gracefully parsed to create a DOM (parse tree). But last night
while parsing a job ad downloaded from dice.com I found an open
brocket immediately followed by a space character, which my parser
doesn’t currently handle, so I need to fake some “plausible” parse
for the invalid syntax. Here are the original dice URL and the URL
for applying the W3C validator to it:
http://seeker.dice.com/jobsearch/servlet/JobSearch?op=302&dockey=xml/9/d/9d2e5198c1866bf1fe49cb0e9aa302aa@endecaindex&source=19&FREE_TEXT=PHP&rating=99
= http://tinyurl.com/4kq8s9d
(ad for job for Ruby on Rails
Position ID: 10203112000007802
Dice ID: 10106525)
http://validator.w3.org/check?uri=http%3A%2F%2Fseeker.dice.com%2Fjobsearch%2Fservlet%2FJobSearch%3Fop%3D302%26dockey%3Dxml%2F9%2Fd%2F9d2e5198c1866bf1fe49cb0e9aa302aa@endecaindex%26source%3D19%26FREE_TEXT%3DPHP%26rating%3D99&charset=(detect+automatically)&doctype=Inline&group=0
= http://tinyurl.com/5ucbcxe
(179 Errors, 99 warnings)
…
9. Warning Line 172, Column 18: character “<” is the first character
of a delimiter but occurred as data
for (i = 1; i <= 3; i++) {
10. Warning Line 179, Column 66: cannot generate system identifier
for general entity “dockey”
…
"/jobsearch/servlet/JobSearch?op=306&dockey=xml/9/d/9d2e5198c1866bf1fe49cb0e9…
…
248. Warning Line 1203, Column 149: character “<” is the first
character of a delimiter but occurred as data
… United States specializing in audit <
http://www.deloitte.com/dtt/section_no…
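
For the open brocket followed by a space, as in warning 248 above,
one plausible parse is to treat the character as literal text, since
no tag, comment, or declaration can begin that way. A minimal sketch
in Python, assuming that reading is acceptable (the function name and
pattern are hypothetical, not taken from any existing parser):

```python
import re

# "<" followed by whitespace or end of input cannot start a tag, a
# comment, or a declaration, so this sketch assumes it was meant as
# text and escapes it.
STRAY_OPEN = re.compile(r"<(?=\s|$)")

def escape_stray_open_brockets(text):
    """Rewrite '<' followed by whitespace (or end of input) as '&lt;'."""
    return STRAY_OPEN.sub("&lt;", text)

print(escape_stray_open_brockets("specializing in audit < http://www"))
# specializing in audit &lt; http://www
```

This deliberately leaves cases like "i <= 3" in warning 9 alone,
because there the brocket is followed by "=" rather than a space.
Those need separate handling, for example by treating script content
as opaque text.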
and just wanted to congratulate the ruby home site as passing:
www.ruby-lang.org
Indeed, I checked it just now, and it still passes. It’s ironic
that the Ruby language Web site passes, but an advertiser for a job
for Ruby on Rails breaks HTML worse than any other Web page I’ve
ever encountered. Too bad the Ruby folks don’t enforce standards on
their users.
In comparison, those slackers over at www.python.org show 1 error
and 1 warning on their site.
Not an egregious error at all:
1. Error Line 196, Column 30: value of attribute “method” cannot be
“POST”; must be one of “get”, “post”
Should be easy to fix if somebody can get the attention of the Web
author/manager there.
Here are some other sites that pass 100%:
www.msn.com
www.firefox.com
www.mozilla.com
Those three confirmed, firefox & mozilla as expected but msn rather a
pleasant surprise, but:
www.oasis-open.org
Error found while checking this document as XHTML 1.0 Transitional!
Result: 1 Error, 1 warning(s)
Address: http://www.oasis-open.org/home/index.php__________
1. Warning Line 242, Column 40: character “&” is the first character
of a delimiter but occurred as data
… Conference Proceedings & Webcast Available
<div
Another stupid failure-to-convert from text to HTML text, easily
fixed if somebody can get the attention of the Web author/manager
there.
And a shocker, www.openoffice.org has 4 ERRORS! (as of July 7, 2009)
It’s gotten worse since then:
Errors found while checking this document as XHTML 1.0 Transitional!
Result: 10 Errors, 3 warning(s)
Also, some prominent sites that have errors:
www.gmail.com
www.yahoo.com
www.google.com
Yeah, I noticed those already myself. Even worse, JavaScript isn’t
available here (VT100 term on FreeBSD Unix) so gmail doesn’t work
at all here, because it absolutely requires JavaScript. Facebook
likewise, except at least it recognizes the problem and redirects me
to an error page right at the top. Correction: The last twenty
times I tried over the past several years, it redirected me to
“We’re not cool enough to support your web browser.”
But when I tried it just now, for the very first time ever I get a
login form in lynx. Unfortunately I’ve never been able to get a
FaceBook account, even from a public-access Microsoft-IE, so I
can’t test FaceBook login from here. MySpace by comparison worked
fine the last time I tried it from lynx.
But maybe it would be nice to see Merb, Ramaze, Sinatra or…
used to write a little web app to track and list W3C
(non)conformance of sites (if such a project doesn’t already
exist). Let’s out the bad and hail the good!
I have IMO a better idea: If and when http://TinyURL.Com/NewEco
gets enough users, I plan to actually pay users (not cash, just
labor credits, i.e. funny money that can be used to pay for metered
WebSite usage and/or hire others to do contract work) to report
good/bad Web sites (to keep my database up to date) and to pester
managers of bad Web sites to fix their egregious HTML or English
mistakes. The first targets of intense pestering would most likely
be Google.Com, Yahoo.Com, DICE.Com, and Monster.Com. But pestering
by random nobodies wouldn’t convince those big Web sites to fix
their mistakes, so I would use http://TinyURL.Com/RLLink to locate
chains of people (as in “seven degrees of freedom”) from us who are
complaining to those who need to pay attention to our complaints. If
the WebMasters’ best friends start complaining to the WebMasters
that the Web sites are so grossly bad that they (best friends of
the WebMasters) are getting pestered and begged to please pester
the actual WebMaster, maybe they (WebMasters) will finally pay
attention to our complaints.
Which brings me back to my current problem with dice.com: I’m
currently working on building software to harvest job ads from
job-search Web sites and filter them to eliminate jobs for which
the user is not qualified. That’s not just because it’d be useful
to me personally, but also because I believe it will be so useful
to the 25% of the adult population that are either unemployed or
underemployed or “no longer in the workforce” or otherwise not
employed to their desires, that users will start flocking to
http://TinyURL.Com/NewEco in order to get access to job-ad
filtering. They will find the service worth the labor-cost needed
to use it, and so will build up a user base of people willing to do
work for me in exchange for using my service, such work including
finding chains of people from here to bad WebMasters and pestering
through such chains to friends of the bad WebMasters.
Note: I’m opposed to harassing innocent people. But in any free
society I believe we have a right to “redress of grievance” by
petitioning our de jure (government) masters and also by
petitioning our de facto (technological, business) masters such as
Google and DICE. If our masters claim that our petitions for
redress of grievance are “harassing” to them, they are mistaken, and
should “mend their ways” rather than “shoot the messenger”.
I hope that http://TinyURL.Com/NewEco + http://TinyURL.Com/RLLink
will provide a cybernetic means for effective redress of grievance.