Re: pretty-print and cleanse RHTML?

Phlip · July 22, 2007, 12:58am

Suppose someone hands us fresh HTML to import as eRB (.rhtml), say from
an obsolete PHP project. We ought to upgrade, cleanse, and pretty-print
that HTML like this:

tidy -i -asxhtml old.html > new.rhtml

Below my sig is a program that temporarily replaces <% and %> with
<!--% and %-->, runs Tidy, then swaps them back. Save it as
‘tidyErb.rb’, and use this usage line:

usage: ruby tidyErb.rb <filename.rhtml> >output.rhtml
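
Here is a minimal sketch of the swap on its own, without Tidy in the loop. The method names escape_erb and unescape_erb are made up for illustration and are not part of tidyErb.rb; the placeholder strings are the ones the program's final gsubs undo.

```ruby
# Hypothetical helpers, not part of tidyErb.rb.
# Hide eRB tags inside HTML comments so an HTML tool passes over them.
def escape_erb(rhtml)
  rhtml.gsub('<%', '<!--%').gsub('%>', '%-->')
end

# Reverse the swap: close markers first, then open markers,
# the same order the program uses.
def unescape_erb(html)
  html.gsub('%-->', '%>').gsub('<!--%', '<%')
end

sample  = "<p><%= @user.name %></p>\n<% end %>\n"
escaped = escape_erb(sample)
puts escaped
puts unescape_erb(escaped) == sample
```

The round trip only holds while nothing between the two steps rewrites the comment markers, which is why the program needs its extra gsubs after Tidy runs.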

‘filename.rhtml’ and ‘output.rhtml’ must not be the same file. The
program also overwrites, then deletes, a file called ‘scratch.html’ in
the current directory, with no attempt to spare a source file of the
same name…
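
One way around the scratch.html collision, sketched with Ruby's standard Tempfile library. This is an assumption about how one might harden the script, not what tidyErb.rb does; with_scratch_file is a hypothetical name, and the block below uses a stand-in edit instead of really calling Tidy.

```ruby
require 'tempfile'

# Hypothetical helper: write text to a uniquely named scratch file,
# yield its path, and return whatever the file holds afterwards.
# Tempfile.create removes the file when its block exits.
def with_scratch_file(text)
  Tempfile.create(['tidyErb', '.html']) do |f|
    f.write(text)
    f.close                # flush so an external tool can rewrite it in place
    yield f.path
    File.read(f.path)
  end
end

result = with_scratch_file("<p>hello</p>\n") do |path|
  # stand-in for: system('tidy', '-i', '-asxhtml', '-m', path)
  File.write(path, File.read(path).upcase)
end
puts result
```

The '.html' suffix is kept only so the scratch file still looks like HTML to whatever tool opens it.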

As a convenience, the program lets Tidy report its diagnostics to
STDERR. Obey them (per assert_tidy), to improve your programs!

Note that Tidy treats <!-- --> as flow-tags not block-tags. (My
verbiage: <span> is a flow-tag, and <div> is the canonical block-tag.
Tidy line-wraps the former.)

Searching for <% and moving each tag to its correct indentation (such
as for <% end %>) is a small price to pay for clean HTML!

Oh, also, review my gsubs to see if they match what Tidy did to your
RHTML’s comments, and to any <%%> nested inside attributes. If I return
to this project, I will just upgrade Tidy…
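
To review those gsubs against your own files, it may help to pull them into one method and feed it a line you suspect Tidy mangled. In this sketch restore_erb is a hypothetical name, and the input string is hand-written to imitate the escapes the gsubs expect; it is not captured Tidy output.

```ruby
# Hypothetical helper collecting the program's clean-up gsubs in one place.
def restore_erb(html)
  html = html.gsub('&lt;!--', '<!--').gsub('--&gt;', '-->')    # entity-escaped markers
  html = html.gsub('%3C!--%', '<!--%').gsub('%--%3E', '%-->')  # URL-escaped markers
  html = html.gsub('%20', ' ')              # blunt: also rewrites any genuine %20
  html.gsub('%-->', '%>').gsub('<!--%', '<%')
end

# Hand-written input, not real Tidy output.
mangled = '<a href="%3C!--%=%20url%20%--%3E">&lt;!--%= name %--&gt;</a>'
puts restore_erb(mangled)
# prints: <a href="<%= url %>"><%= name %></a>
```

If your own line comes back with stray markers, that version of Tidy escaped something these gsubs do not cover.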

--
Phlip
http://www.oreilly.com/catalog/9780596510657/
“Test Driven Ajax (on Rails)”
assert_xpath, assert_javascript, & assert_ajax

if ARGV.size != 1 or !File.exist?(filename = ARGV.first)
  # warn writes to STDERR, so the usage text never lands in output.rhtml
  warn 'usage: ruby tidyErb.rb <filename.rhtml> >output.rhtml'
  exit 1
end

rhtml = File.read(filename)
# hide the eRB tags inside HTML comments before Tidy sees them
escaped = rhtml.gsub('<%', '<!--%').gsub('%>', '%-->')
File.open('scratch.html', 'w'){|f| f.write(escaped) }
system('tidy -i -asxhtml -m scratch.html')  # -m rewrites scratch.html in place
html = File.read('scratch.html')
File.unlink('scratch.html')
html.gsub!('&lt;!--', '<!--')  # undo Tidy's CGI-nanny escapes
html.gsub!('--&gt;', '-->')
html.gsub!('%3C!--%', '<!--%')
html.gsub!('%20', ' ')  # TODO nest this gsub?
html.gsub!('%--%3E', '%-->')
# turn the comment placeholders back into eRB tags
rhtml = html.gsub('%-->', '%>').gsub('<!--%', '<%')

puts rhtml