ClothRed 0.3.0 released

Or: TDD is fun.

= ClothRed HTML 2 Textile converter

== What it is

A library to convert HTML into Textile markup for use, for example, with
RedCloth.

== Requirements

All you need is Ruby.

== Get it

Available as a gem on RubyForge:
gem install ClothRed

Or download from RubyForge:
http://rubyforge.org/frs/?group_id=3427

Or get the source:
svn checkout svn://viewvc.rubyforge.mmmultiworks.com/var/svn/clothred

== New in this release:

  • Support for some HTML entities
  • Support for tables

== Features

This is alpha software, and only a few Textile rules have been
implemented so far:

  • font markup and weight (, , …)
  • text formatting (, , ,)
  • Support for headings
  • Support for paragraphs and
  • Support for Textile entities

== Usage

require 'clothred'

text = ClothRed.new("Bold HTML!")
text.to_textile

== Get Help

Feel free to contact me, or peruse the homepage.


Phillip “CynicalRyan” Gawlowski
http://cynicalryan.110mb.com/
http://clothred.rubyforge.org

Rule of Open-Source Programming #11:

When a developer says he will work on something, he or she means
“maybe”.

I think you will be bitten pretty fast by the current approach of
doing only text substitutions when translating HTML into Textile.

For example, HTML mostly ignores whitespace between block elements:

<h1>main title</h1> <h2>subtitle</h2>

is just as good as

<h1>main title</h1>
<h2>subtitle</h2>

But in Textile,

h1. main title

h2. subtitle

is not the same as

h1. main title h2. subtitle

or even

h1. main title
h2. subtitle

In the current implementation, ClothRed behaves like this:

$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'clothred'
=> true
irb(main):003:0> t = ClothRed.new("<h1>Foo</h1> <h2>Bar</h2>")
=> "<h1>Foo</h1> <h2>Bar</h2>"
irb(main):004:0> t.to_textile
=> "h1. Foo h2. Bar"

but that’s pretty easy to work around with the simple patch attached.
I just replaced “” as the substitution for the HTML closing tags
</h1>, </h2>, … with “\n\n”, producing the necessary paragraph breaks
for Textile. “test/test_headings.rb” had to be fixed and I also wrote
“test/test_misc.rb” as a test script for HTML with more than one tag
in it.
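
To make the idea concrete, here is a minimal sketch of that kind of substitution. It is not ClothRed's actual code: the method name headings_to_textile and both regular expressions are made up for illustration. It only shows the effect of mapping closing heading tags to a blank line instead of an empty string.

```ruby
# Hypothetical sketch, not taken from ClothRed's source.
HEADING_OPEN  = /<h([1-6])>/i      # <h1> .. <h6>
HEADING_CLOSE = /<\/h[1-6]>\s*/i   # </h1> .. </h6> plus trailing whitespace

def headings_to_textile(html)
  html.gsub(HEADING_OPEN) { "h#{$1}. " }  # opening tag becomes the "hN. " prefix
      .gsub(HEADING_CLOSE, "\n\n")        # closing tag becomes a paragraph break
      .strip                              # drop the break after the last heading
end

p headings_to_textile("<h1>main title</h1> <h2>subtitle</h2>")
# => "h1. main title\n\nh2. subtitle"
```

Because the closing-tag pattern also swallows trailing whitespace, the single-line and the two-line HTML variants shown earlier give the same Textile.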

The substitution approach will not work quite right for HTML where
closing tags are missing. The algorithm will never understand when the
tags were closed. So this is somewhat limited currently to XHTML which
demands closing tags.
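
A small sketch of that limitation, again with a made-up helper (paragraphs_to_textile is hypothetical and not part of ClothRed): when the break is keyed on the closing tag, HTML that omits it loses the paragraph boundary entirely.

```ruby
# Hypothetical sketch: paragraph breaks are produced only by closing tags.
def paragraphs_to_textile(html)
  html.gsub(/<p>/i, "")              # opening tag carries no Textile markup
      .gsub(/<\/p>\s*/i, "\n\n")     # closing tag becomes a paragraph break
      .strip
end

p paragraphs_to_textile("<p>one</p><p>two</p>")  # => "one\n\ntwo"
p paragraphs_to_textile("<p>one<p>two")          # => "onetwo"
```

The second call is valid HTML 4, where the closing paragraph tag is optional, yet both paragraphs run together in the output.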

I think the suggestion of using a HTML parser (like Hpricot) to do
this conversion will impose itself pretty soon.
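
For comparison, a tree-based conversion could look roughly like the sketch below. It uses REXML as a stand-in rather than Hpricot, so it only handles well-formed XHTML fragments, and the helper name xhtml_headings_to_textile is an assumption made for this example. The point is just that walking elements makes the whitespace between tags irrelevant.

```ruby
require 'rexml/document'

# Hypothetical helper: walks the element tree instead of substituting text.
# REXML is an XML parser, so the input must be well-formed XHTML.
def xhtml_headings_to_textile(xhtml)
  # Wrap the fragment in a dummy root so REXML sees a single document element.
  doc = REXML::Document.new("<root>#{xhtml}</root>")
  blocks = []
  doc.root.each_element do |el|
    match = el.name.match(/\Ah([1-6])\z/)
    next unless match                      # this sketch skips non-heading elements
    blocks << "h#{match[1]}. #{el.text.to_s.strip}"
  end
  blocks.join("\n\n")                      # one blank line between Textile blocks
end

p xhtml_headings_to_textile("<h1>main title</h1> <h2>subtitle</h2>")
# => "h1. main title\n\nh2. subtitle"
```

A real tag-soup parser would still be needed for HTML with missing closing tags; this sketch does not address that case.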

Thanks for the inspiring work.

Adriano F…

On Sat, Apr 14, 2007 at 01:24:02AM +0900, Phillip G. wrote:

> > I think the suggestion of using a HTML parser (like Hpricot) to do
> > this conversion will impose itself pretty soon.
>
> Probably, but I’ll have to play with Hpricot first, to see if it can do
> what is needed (after a short skim, it seems able to convert HTML4 into
> XHTML 1.0, which would make things easier). Trouble is, I want ClothRed
> to be as dependency-free as I can.

I think what you’re going to find is that parsing tag-soup HTML is harder
than you think, especially if your goal with ClothRed is to parse
arbitrary tag-soup HTML from arbitrary sources.

Adriano F. wrote:

> but that’s pretty easy to work around with the simple patch attached.
> I just replaced “” as the substitution for the HTML closing tags
> </h1>, </h2>, … with “\n\n”, producing the necessary paragraph breaks
> for Textile. “test/test_headings.rb” had to be fixed and I also wrote
> “test/test_misc.rb” as a test script for HTML with more than one tag
> in it.

Thanks. I’ll add your patch ASAP, and make a new release.

> The substitution approach will not work quite right for HTML where
> closing tags are missing. The algorithm will never understand when the
> tags were closed. So this is somewhat limited currently to XHTML which
> demands closing tags.

And it’s buggy, I’ve noticed, as it ignores self-closing tags like
. I’ll fix that together with your patch.

> I think the suggestion of using a HTML parser (like Hpricot) to do
> this conversion will impose itself pretty soon.

Probably, but I’ll have to play with Hpricot first, to see if it can do
what is needed (after a short skim, it seems able to convert HTML4 into
XHTML 1.0, which would make things easier). Trouble is, I want ClothRed
to be as dependency-free as I can. So maybe I’ll redistribute Hpricot in
non-gem distributions (if that’s possible; I haven’t checked Hpricot’s
license yet).

> Thanks for the inspiring work.

No problem. I’ll need that library for my own ideas. ;)


Phillip “CynicalRyan” Gawlowski
http://cynicalryan.110mb.com/
http://clothred.rubyforge.org

Rule of Open-Source Programming #48:

The number of items on a project’s to-do list always grows or remains
constant.