W3C Standards Compliant

I was checking various websites using this W3C validator:

http://validator.w3.org
and just wanted to congratulate the ruby home site as passing:

www.ruby-lang.org

In comparison, those slackers over at www.python.org show 1 error and
1 warning on their site. :-)

Here are some other sites that pass 100%:

www.msn.com
www.firefox.com
www.mozilla.com

It’s nice to see prominent OSS projects taking standards seriously.
But hey, what’s with those Microshaft people?

And a shocker, www.openoffice.org has 4 ERRORS! (as of July 7, 2009)

Also, some prominent sites that have errors:

www.gmail.com
www.yahoo.com
www.google.com
I did this just for fun.

But maybe it would be nice to see Merb, Ramaze, Sinatra, or… used to
write a little web app to track and list W3C (non)conformance of
sites (if such a project doesn’t already exist). Let’s out the bad
and hail the good!
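
If someone does spin up such an app, the core of it is small. Below is
a minimal sketch in Ruby of the lookup step. It assumes the W3C
validator reports its verdict in the X-W3C-Validator-Status and
X-W3C-Validator-Errors response headers; verify those names against the
validator’s current API documentation before relying on them.

```ruby
require "uri"
require "net/http"

# Build the W3C validator check URL for a site (helper for the
# hypothetical conformance tracker suggested above).
def validator_uri(site)
  URI("http://validator.w3.org/check?uri=" +
      URI.encode_www_form_component("http://#{site}"))
end

# Fetch a verdict. The header names below are an assumption drawn
# from the validator's documented API, not something guaranteed here.
def validation_status(site)
  res = Net::HTTP.get_response(validator_uri(site))
  [res["X-W3C-Validator-Status"], res["X-W3C-Validator-Errors"]]
end
```

Wrapping `validation_status` in a tiny Sinatra route plus a table of
sites would give the “out the bad, hail the good” list.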

Peace

From: jzakiya [email protected]
I was checking various websites using this W3C validator:
http://validator.w3.org

In recent months I’ve been meaning to ask, and this seems an
appropriate time: Virtually every Web site I’ve looked at, except my
own on my Unix shell site, and even my own PHP output on free
hosting sites that add advertisements before or after my
PHP-generated valid syntax, fails validation. Google is an
egregious example: the 800-pound gorilla that can do any F*ING
thing it wants, any F*ING time it wants, because nobody has
the power to correct it. Also, every job-search Web site I’ve
examined so far (DICE, Monster.com, craigslist, hotjobs, etc.)
fails validation. If I ever encounter a Web site I didn’t create
myself that passes validation, I’m surprised. I’ve created a Web
page documenting some of this horridly non-validating HTTP output
that pretends to be HTML or XHTML:
job-search Web sites, critique

In the apparent war between W3C and the vast majority of commercial
Websites, who is correct?

Is W3C correct, and all these Websites are broken?

Or are these Websites just fine, and W3C is being pedantic or even
anal retentive?

Off and on during the past several years I’ve developed a
non-validating generic SGML/XML parser. The only legal syntax it
doesn’t understand is SGML null-end-tags, because their syntax
conflicts with XML’s use of self-closing tags.

In the course of trying to use it to parse output from Yahoo,
Google, and other major Websites, I’ve needed to modify my parser
to accept lots of INVALID syntax. The most common cases are:

  • URLs and other property values that aren’t enclosed in quotes.
  • Bare ampersands in query strings of URLs, not expressed as &amp;
  • Bare brockets as text, not expressed as &lt; or &gt;
  • Scripts (usually JavaScript) that aren’t contained within comments.

I thought I had covered all such broken syntax, allowing it to be
gracefully parsed to create a DOM (parse tree). But last night
while parsing a job ad downloaded from dice.com I found an open
brocket immediately followed by a space character, which my parser
doesn’t currently handle, so I need to fake some “plausible” parse
for the invalid syntax. Here are the original dice URL and the URL
for applying the W3C validator to it:

http://seeker.dice.com/jobsearch/servlet/JobSearch?op=302&dockey=xml/9/d/9d2e5198c1866bf1fe49cb0e9aa302aa@endecaindex&source=19&FREE_TEXT=PHP&rating=99
= http://tinyurl.com/4kq8s9d
(ad for job for Ruby on Rails
Position ID: 10203112000007802
Dice ID: 10106525)

http://validator.w3.org/check?uri=http%3A%2F%2Fseeker.dice.com%2Fjobsearch%2Fservlet%2FJobSearch%3Fop%3D302%26dockey%3Dxml%2F9%2Fd%2F9d2e5198c1866bf1fe49cb0e9aa302aa@endecaindex%26source%3D19%26FREE_TEXT%3DPHP%26rating%3D99&charset=(detect+automatically)&doctype=Inline&group=0
= http://tinyurl.com/5ucbcxe
(179 Errors, 99 warnings)
…
9. Warning Line 172, Column 18: character “<” is the first character
of a delimiter but occurred as data
for (i = 1; i <= 3; i++) {
10. Warning Line 179, Column 66: cannot generate system identifier
for general entity “dockey”
…
"/jobsearch/servlet/JobSearch?op=306&dockey=xml/9/d/9d2e5198c1866bf1fe49cb0e9…
…
248. Warning Line 1203, Column 149: character “<” is the first
character of a delimiter but occurred as data
… United States specializing in audit <
http://www.deloitte.com/dtt/section_no…
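
Most of the warnings above come down to the two escaping mistakes
listed earlier: bare ampersands and bare brockets dropped into markup
as character data. A quick illustration with Ruby’s standard CGI
library (just a sketch; the sample strings are abbreviated, not the
full dice.com output):

```ruby
require "cgi"

# A URL fragment with a bare "&": invalid inside HTML attributes.
href = "/jobsearch/servlet/JobSearch?op=306&dockey=xml/9/d"
# Escaped, the "&" becomes "&amp;" and the markup validates.
puts CGI.escapeHTML(href)

# Likewise the inline script's comparison operator: a bare "<"
# appearing as data must become "&lt;".
puts CGI.escapeHTML("for (i = 1; i <= 3; i++)")
```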

and just wanted to congratulate the ruby home site as passing:
www.ruby-lang.org

Indeed, I checked it just now, and it still passes. It’s ironic
that the Ruby language Web site passes, but a job ad for Ruby on
Rails breaks HTML worse than any other Web page I’ve ever
encountered. Too bad the Ruby folks don’t enforce standards on
their users.

In comparison, those slackers over at www.python.org show 1 error
and 1 warning on their site. :-)

Not an egregious error at all:
1. Error Line 196, Column 30: value of attribute “method” cannot be
“POST”; must be one of “get”, “post”

Should be easy to fix if somebody can get the attention of the Web
author/manager there.

Here are some other sites that pass 100%:
www.msn.com
www.firefox.com
www.mozilla.com
All three confirmed: firefox and mozilla as expected, but msn was
rather a pleasant surprise. However:

www.oasis-open.org
Error found while checking this document as XHTML 1.0 Transitional!
Result: 1 Error, 1 warning(s)
Address: http://www.oasis-open.org/home/index.php
1. Warning Line 242, Column 40: character “&” is the first character
of a delimiter but occurred as data
… Conference Proceedings & Webcast Available

Another stupid failure to convert text to HTML, easily fixed if
somebody can get the attention of the Web author/manager there.

And a shocker, www.openoffice.org has 4 ERRORS! (as of July 7, 2009)

It’s gotten worse since then:
Errors found while checking this document as XHTML 1.0 Transitional!
Result: 10 Errors, 3 warning(s)

Also, some prominent sites that have errors:
www.gmail.com
www.yahoo.com
www.google.com

Yeah, I noticed those already myself. Even worse, JavaScript isn’t
available here (VT100 term on FreeBSD Unix) so gmail doesn’t work
at all here, because it absolutely requires JavaScript. Facebook
likewise, except at least it recognizes the problem and redirects me
to an error page right at the top. Correction: The last twenty
times I tried over the past several years, it redirected me to
“We’re not cool enough to support your web browser.”
But when I tried it just now, for the very first time ever I get a
login form in lynx. Unfortunately I’ve never been able to get a
FaceBook account, even from a public-access Microsoft-IE, so I
can’t test FaceBook login from here. MySpace by comparison worked
fine the last time I tried it from lynx.

But maybe it would be nice to see Merb, Ramaze, Sinatra or…
used to write a little web app to track and list W3C
(non)conformance of sites (if such a project doesn’t already
exist). Let’s out the bad and hail the good!

I have IMO a better idea: If and when http://TinyURL.Com/NewEco
gets enough users, I plan to actually pay users (not cash, just
labor credits, i.e. funny money that can be used to pay for metered
WebSite usage and/or hire others to do contract work) to report
good/bad Web sites (to keep my database up to date) and to pester
managers of bad Web sites to fix their egregious HTML or English
mistakes. The first targets of intense pestering would most likely
be Google.Com, Yahoo.Com, DICE.Com, and Monster.Com. But pestering
by random nobodies wouldn’t convince those big Web sites to fix
their mistakes, so I would use http://TinyURL.Com/RLLink to locate
chains of people (as in “six degrees of separation”) from those of
us who are complaining to those who need to pay attention to our
complaints. If the WebMasters’ best friends start complaining to the
WebMasters that their Web sites are so grossly bad that they (the
best friends) are getting pestered and begged to please pester the
actual WebMaster, maybe the WebMasters will finally pay attention
to our complaints.

Which brings me back to my current problem with dice.com. I’m
currently building software to harvest job ads from job-search Web
sites and filter out the jobs for which the user is not qualified.
That is not just because it’d be useful to me personally, but also
because I believe it will be so useful to the 25% of the adult
population that is unemployed, underemployed, “no longer in the
workforce”, or otherwise not employed to their desires, that users
will start flocking to http://TinyURL.Com/NewEco to get access to
job-ad filtering. They will find the service worth the labor-cost
needed to use it, and hence will build up a user base of people
willing to do work for me in exchange for using my service. Such
work includes finding chains of people from here to bad WebMasters
and pestering, through those chains, the friends of the bad
WebMasters.
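
The filtering step described above can be sketched in a few lines.
This is purely illustrative: the ad structure and the
every-skill-must-match rule are my own assumptions, not the author’s
design.

```ruby
# Hypothetical sketch of job-ad filtering: keep only ads whose text
# mentions every skill the user claims to have.
def filter_ads(ads, skills)
  ads.select do |ad|
    skills.all? { |s| ad[:text].downcase.include?(s.downcase) }
  end
end

ads = [
  { title: "Rails developer", text: "Ruby on Rails and SQL required" },
  { title: "PHP developer",   text: "PHP and MySQL experience" },
]
filter_ads(ads, ["Ruby", "SQL"])  # keeps only the Rails ad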

Note: I’m opposed to harassing innocent people. But in any free
society I believe we have a right to “redress of grievance” by
petitioning our de jure (government) masters and also by
petitioning our de facto (technological, business) masters such as
Google and DICE. If our masters claim that our petitions for
redress of grievance are “harassing” to them, they are mistaken, and
should “mend their ways” rather than “shoot the messenger”.

I hope that http://TinyURL.Com/NewEco + http://TinyURL.Com/RLLink
will provide a cybernetic means for effective redress of grievance.

Use of expletives on a public mailing list is pretty sub-optimal
vocabulary, wouldn’t you agree? Since when are companies like Google and
DICE (never heard of DICE) our technological masters? Chips are for
eating, not carrying on our shoulders. In a free society, if a citizen’s
main grievance is a general lack of adherence to ad hoc lexical
standards in unregulated public communications, they are doing rather
well in my opinion. Am I wrong in saying that commercial operations can
do what they please, and if people don’t like it they can vote with
their feet and/or wallets?

Additionally, it is all very well to claim the high ground of standards
compliance with plain white pages containing minimal content and zero
stylistic formatting. However, generally speaking, the public at large
has moved on and prefers to see things like colour, animation, pictures,
movies, and new and interesting interfaces, rather than pages designed
primarily for a non-sentient audience. My brain, for instance, appears
to have no trouble parsing an ampersand.

Sam

Use of expletives on a public mailing list is pretty sub-optimal
vocabulary, wouldn’t you agree?

Expletives are simply a concise method of indicating the level of
emotion. And he did self-censor.

OTOH If you think that the level of emotion is misplaced, then that’s
another matter of course. This does read a little bit like a rant.

My 10c.

Additionally, it is all very well to claim the high ground of standards
compliance with plain white pages containing minimal content and zero
stylistic formatting. However, generally speaking, the public at large
has moved on and prefers to see things like colour, animation, pictures,
movies, and new and interesting interfaces, rather than pages designed
primarily for a non-sentient audience. My brain, for instance, appears
to have no trouble parsing an ampersand.

I don’t see that as answering the question; sorry. Is W3C standards
compliance relevant, and if so, how seriously should we take it?

The pragmatic approach – that if current browsers can read the page,
it’s okay – can only go so far.

On Mar 9, 2011, at 15:21 , Sam D. wrote:

Use of expletives on a public mailing list is pretty sub-optimal vocabulary,
wouldn’t you agree?

Please don’t feed the trolls…

On Thu, Mar 10, 2011 at 12:10 PM, Shadowfirebird
[email protected] wrote:

Please don’t feed the trolls…

Sure. Who are you referring to?

I’m not sure that any of these people are deliberately taking a position in
order to create an argument. Are you aware of some history that I’m not?

A rant about W3C compliance on a Ruby mailing list, including cloaked
website links. You do the math.

–
Phillip G.

Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.

On 10/03/11 23:10, Shadowfirebird wrote:

I don’t see that as answering the question; sorry. Is W3C standards compliance
relevant, and if so, how seriously should we take it?

The pragmatic approach – that if current browsers can read the page, it’s okay
– can only go so far.

Yes, well I didn’t see much of a question - more of a foaming diatribe,
and a few advertisements. W3C standards compliance is not something you
can rely on. Not something you have ever been able to rely on. Not
something you will ever be able to rely on. Take it as seriously as you
like for your own work (as we have witnessed some do), but don’t rely on
others doing the same for your benefit. The modern (and pragmatic)
approach is to use a data API where available, and perhaps suggest or
develop one where not. Screen scraping is very 1996, and certainly not
something commercial operations are going to go out of their way to
support.

Sam