SEO, MS Word, Compact HTML, and object/data

We have twenty-or-so MS Word 2000 documents that we want to display on
our website.

What we did was convert the MS Word documents to Compact HTML. We then
display a document via an object element whose data attribute points to
the converted file, e.g. “/doc/somedoc.htm”.

This all works great except for a bit of a fly in the ointment.

Doing an SEO (Search Engine Optimization) analysis shows that at least
one analyzer does not analyze the contents of “/doc/somedoc.htm”.

I guess it is reasonable not to count the contents of the document
pointed to because it might not even be owned by the displaying page.


But these MS Word 2000 documents are ours. So does anyone know of a way
to automatically convert the htm file produced so that I can render its
contents directly in the page rather than refer to the document via
object/data?

On Apr 14, 2011, at 5:41 AM, Ralph S. wrote:

But these MS Word 2000 documents are ours. So does anyone know of a way
to automatically convert the htm file produced so that I can render its
contents directly in the page rather than refer to the document via
object/data?

If these documents are all alike in internal structure, you could
write a little script using Nokogiri to capture only the id=“whatever”
node containing the page content, and then write that back out as a
sort of partial. OR suck it into an ActiveRecord object and persist it
in your database.

require 'rubygems'
require 'nokogiri'
require 'fileutils'

fp = '/path/to/your/file'
# if your starting document is well-formed
doc = Nokogiri::XML(File.read(fp))
# otherwise
# doc = Nokogiri::HTML(File.read(fp))
# grab the node that wraps the page content
div = doc.at_css('#someDiv')
output = div.to_xhtml

Walter