Or: TDD is fun.
= ClothRed HTML 2 Textile converter
== What it is
A library to convert HTML into Textile markup for use, for example, with
RedCloth.
== Requirements
All you need is Ruby.
== Get it
Available as a gem on RubyForge:
gem install ClothRed
Or download from RubyForge:
http://rubyforge.org/frs/?group_id=3427
Or get the source:
svn checkout svn://viewvc.rubyforge.mmmultiworks.com/var/svn/clothred
== New in this release:
- Support for some HTML entities
- Support for tables
== Features
This is alpha software, and only a few Textile rules have been
implemented so far:
- font markup and weight
- text formatting
- Support for headings
- Support for paragraphs and
- Support for Textile entities
== Usage
require 'clothred'
text = ClothRed.new("Bold HTML!")
text.to_textile
== Get Help
Feel free to contact me, or peruse the homepage.
--
Phillip “CynicalRyan” Gawlowski
http://cynicalryan.110mb.com/
http://clothred.rubyforge.org
Rule of Open-Source Programming #11:
When a developer says he will work on something, he or she means
“maybe”.
I think you will be bitten pretty fast by the current approach of
doing only text substitutions when translating HTML into Textile.
For example, HTML is largely insensitive to whitespace between elements:
<h1>main title</h1> <h2>subtitle</h2>
is just as good as
<h1>main title</h1>
<h2>subtitle</h2>
But in Textile,
h1. main title

h2. subtitle
(two blocks separated by a blank line) is not the same as
h1. main title h2. subtitle
or even
h1. main title
h2. subtitle
In the current implementation, ClothRed behaves like this:
$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'clothred'
=> true
irb(main):003:0> t = ClothRed.new("<h1>Foo</h1> <h2>Bar</h2>")
=> "<h1>Foo</h1> <h2>Bar</h2>"
irb(main):004:0> t.to_textile
=> "h1. Foo h2. Bar"
but that’s pretty easy to work around with the simple patch attached.
I just changed the substitution for the closing HTML heading tags from
"" to "\n\n", which produces the paragraph breaks Textile needs.
“test/test_headings.rb” had to be fixed and I also wrote
“test/test_misc.rb” as a test script for HTML with more than one tag
in it.
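The change described above can be sketched roughly as follows. This is not ClothRed's actual source: the `HEADING_RULES` table and the `headings_to_textile` helper are hypothetical names, and swallowing the whitespace after a closing tag is an extra assumption made here so that the next block starts at the beginning of a line.

```ruby
# Hypothetical substitution table, not ClothRed's real one.
# Opening heading tags become Textile block signatures; closing tags
# become a blank line so the next block starts a new paragraph.
HEADING_RULES = (1..6).flat_map do |level|
  [
    [/<h#{level}>/i, "h#{level}. "],
    # Assumption: also swallow whitespace that follows the closing tag.
    [/<\/h#{level}>\s*/i, "\n\n"]
  ]
end

# Apply every rule in turn, then trim the trailing blank line.
def headings_to_textile(html)
  text = HEADING_RULES.inject(html) do |result, (pattern, replacement)|
    result.gsub(pattern, replacement)
  end
  text.strip
end

puts headings_to_textile("<h1>Foo</h1> <h2>Bar</h2>")
```

With the closing tags mapped to an empty string instead, the same input would collapse onto a single line, which is the behaviour shown in the irb session.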
The substitution approach will not work quite right for HTML where
closing tags are missing, because the algorithm can never tell where
such an element ends. So for now this is limited to XHTML, which
requires closing tags.
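A small illustration of that limitation, using a made-up paragraph rule (`paragraphs_to_textile` is a hypothetical helper, not part of ClothRed). HTML 4 allows the closing paragraph tag to be omitted, and a pure substitution has nothing to replace in that case:

```ruby
# Hypothetical rule set: drop <p>, turn </p> into a paragraph break.
def paragraphs_to_textile(html)
  html.gsub(/<p>/i, "").gsub(/<\/p>\s*/i, "\n\n").strip
end

# With closing tags present, the paragraphs are separated by a blank line.
closed = paragraphs_to_textile("<p>one</p><p>two</p>")

# Without closing tags nothing marks the end of a paragraph,
# so the two paragraphs run together.
unclosed = paragraphs_to_textile("<p>one<p>two")

p closed
p unclosed
```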
I think the suggestion of using an HTML parser (like Hpricot) to do
this conversion will become unavoidable pretty soon.
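One dependency-free way to picture the parser-based idea is a small tokenizer that tracks which block is currently open. This is a hand-rolled sketch, not Hpricot's API (which this thread does not show) and not ClothRed's code; `BLOCKS` and `blocks_to_textile` are hypothetical names, and only a few block-level tags are handled.

```ruby
# Block-level tags handled by this sketch, mapped to their Textile prefix.
BLOCKS = {
  "h1" => "h1. ", "h2" => "h2. ", "h3" => "h3. ",
  "p"  => ""
}.freeze

# Split the markup into tags and text runs, then rebuild Textile block by
# block. A new block-level opening tag implicitly closes the block that is
# still open, so missing closing tags are tolerated.
def blocks_to_textile(html)
  blocks = []
  current = nil # Textile text of the block being built, or nil

  html.scan(/<[^>]+>|[^<]+/) do |token|
    if token.start_with?("<")
      tag = token.match(/\A<(\/?)(\w+)/)
      # Inline tags, comments and unknown tags are simply skipped here.
      next unless tag && BLOCKS.key?(tag[2].downcase)

      blocks << current.strip if current # close the open block, if any
      current = tag[1] == "/" ? nil : BLOCKS[tag[2].downcase].dup
    elsif current
      current << token
    end
    # Text outside any block (such as whitespace between tags) is dropped.
  end

  blocks << current.strip if current
  blocks.join("\n\n")
end

puts blocks_to_textile("<h1>main title</h1> <h2>subtitle</h2>")
puts blocks_to_textile("<p>one<p>two")
```

Because the output is assembled from whole blocks rather than from individual tag replacements, whitespace between elements no longer affects the result. Real tag-soup has many more cases (nesting, attributes containing `>`, entities) that this sketch ignores.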
Thanks for the inspiring work.
Adriano F…
On Sat, Apr 14, 2007 at 01:24:02AM +0900, Phillip G. wrote:
>> I think the suggestion of using an HTML parser (like Hpricot) to do
>> this conversion will become unavoidable pretty soon.
> Probably, but I’ll have to play with Hpricot first, to see if it can do
> what is needed (after a short skim, it can convert HTML4 into
> XHTML 1.0, which will make it easier). Trouble is, I want ClothRed to be
> as dependency-free as I can make it.
I think what you’re going to find is that parsing tag-soup HTML is harder
than you think, especially if your goal with ClothRed is to parse
arbitrary tag-soup HTML from arbitrary sources.
Adriano F. wrote:
> but that’s pretty easy to work around with the simple patch attached.
> I just changed the substitution for the closing HTML heading tags from
> "" to "\n\n", which produces the paragraph breaks Textile needs.
> “test/test_headings.rb” had to be fixed and I also wrote
> “test/test_misc.rb” as a test script for HTML with more than one tag
> in it.
Thanks. I’ll add your patch ASAP, and make a new release.
> The substitution approach will not work quite right for HTML where
> closing tags are missing, because the algorithm can never tell where
> such an element ends. So for now this is limited to XHTML, which
> requires closing tags.
And it’s buggy, I’ve noticed, as it ignores self-closing tags.
I’ll fix that together with your patch.
> I think the suggestion of using an HTML parser (like Hpricot) to do
> this conversion will become unavoidable pretty soon.
Probably, but I’ll have to play with Hpricot first, to see if it can do
what is needed (after a short skim, it can convert HTML4 into
XHTML 1.0, which will make it easier). Trouble is, I want ClothRed to be
as dependency-free as I can make it. So maybe I’ll redistribute Hpricot in
non-gem distributions (if that’s possible; I haven’t checked Hpricot’s
license yet).
> Thanks for the inspiring work.
No problem. I’ll need that library for my own ideas.
--
Phillip “CynicalRyan” Gawlowski
http://cynicalryan.110mb.com/
http://clothred.rubyforge.org
Rule of Open-Source Programming #48:
The number of items on a project’s to-do list always grows or remains
constant.