Forum: Ruby ClothRed 0.3.0 released

Phillip Gawlowski (Guest)
on 2007-04-12 17:54
(Received via mailing list)
Or: TDD is fun.

= ClothRed HTML 2 Textile converter

== What it is

A library to convert HTML into Textile markup for use, for example, with
RedCloth.


== Requirements

All you need is Ruby.

== Get it

Available as a gem on RubyForge:
gem install ClothRed

Or download from RubyForge:
http://rubyforge.org/frs/?group_id=3427

Or get the source:
svn checkout svn://viewvc.rubyforge.mmmultiworks.com/var/svn/clothred

== New in this release:
   * Support for some HTML entities
   * Support for tables

== Features

This is alpha software, and only a few Textile rules have been
implemented so far:
  * Font markup and weight (<b>, <strong>, ...)
  * Text formatting (<sub>, <sup>, <ins>, <del>)
  * Headings
  * Paragraphs and <blockquote>
  * Textile entities

== Usage

require 'clothred'

text = ClothRed.new("<b>Bold</b> <em>HTML</em>!")
text.to_textile

== Get Help
Feel free to contact me, or peruse the homepage.

  * http://clothred.rubyforge.org/
  * http://rubyforge.org/projects/clothred/

--
Phillip "CynicalRyan" Gawlowski
http://cynicalryan.110mb.com/
http://clothred.rubyforge.org

Rule of Open-Source Programming #11:

When a developer says he will work on something, he or she means
"maybe".
Adriano Ferreira (Guest)
on 2007-04-13 15:42
(Received via mailing list)
I think you will be bitten pretty fast with the current approach of
doing only text substitutions while translating HTML into Textile.

For example, HTML is pretty insensitive to some spaces in between:

    <h1>main title</h1> <h2>subtitle</h2>

is just as good as

    <h1>main title</h1>
    <h2>subtitle</h2>

But in Textile,

    h1. main title

    h2. subtitle

is not the same as

    h1. main title h2. subtitle

or even

    h1. main title
    h2. subtitle

In the current implementation, ClothRed behaves like this:

$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'clothred'
=> true
irb(main):003:0> t = ClothRed.new("<h1>Foo</h1> <h2>Bar</h2>")
=> "<h1>Foo</h1> <h2>Bar</h2>"
irb(main):004:0> t.to_textile
=> "h1. Foo  h2. Bar"

but that's pretty easy to work around with the simple patch attached.
I changed the substitution for the HTML closing tags </h1>, </h2>, ...
from "" to "\n\n", which produces the paragraph breaks Textile needs.
"test/test_headings.rb" had to be updated to match, and I also wrote
"test/test_misc.rb" as a test script for HTML with more than one tag
in it.
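
The effect of that change can be sketched as follows. This is a minimal illustration only, not ClothRed's actual source: the names HEADING_RULES and headings_to_textile are made up for the example, and swallowing the whitespace after a closing tag is an extra assumption that goes beyond the patch described above.

```ruby
# Hypothetical substitution table; not ClothRed's actual source.
HEADING_RULES = (1..6).flat_map do |n|
  [
    [%r{<h#{n}>}i, "h#{n}. "],    # opening tag becomes the Textile prefix
    [%r{</h#{n}>\s*}i, "\n\n"]    # closing tag becomes a paragraph break
  ]
end

# Apply every rule in turn to a copy of the input.
def headings_to_textile(html)
  text = html.dup
  HEADING_RULES.each { |pattern, replacement| text.gsub!(pattern, replacement) }
  text.strip
end

p headings_to_textile("<h1>Foo</h1> <h2>Bar</h2>")
# => "h1. Foo\n\nh2. Bar"
```

With "" as the replacement for the closing tags, the same input collapses onto one line, which Textile reads as a single heading.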

The substitution approach will not work quite right for HTML where
closing tags are missing. The algorithm will never understand when the
tags were closed. So this is somewhat limited currently to XHTML which
demands closing tags.
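
A small sketch of that limitation, assuming a purely substitution-based rule for paragraphs (naive_paragraphs is a hypothetical helper, not part of ClothRed):

```ruby
# Hypothetical rule in the spirit of the thread; not ClothRed's actual source.
# Opening <p> tags are dropped, closing </p> tags become paragraph breaks.
def naive_paragraphs(html)
  html.gsub(/<p>/i, "").gsub(%r{</p>\s*}i, "\n\n").strip
end

well_formed = "<p>one</p><p>two</p>"
tag_soup    = "<p>one<p>two"   # legal HTML 4: the </p> end tag is optional

p naive_paragraphs(well_formed)  # => "one\n\ntwo"
p naive_paragraphs(tag_soup)     # => "onetwo"
```

Without the closing tags there is nothing for the substitution to turn into a break, so the two paragraphs run together.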

I think the suggestion of using a HTML parser (like Hpricot) to do
this conversion will impose itself pretty soon.
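
To illustrate what a parser buys over substitution, here is a rough sketch that uses Ruby's standard StringScanner rather than Hpricot. It only understands <p>, and paragraphs_with_scanner is a hypothetical name; the point is that a scanner can keep state and treat a new <p> as an implicit close of the previous one.

```ruby
require 'strscan'

# Hypothetical helper: collects <p> paragraphs and joins them Textile-style.
def paragraphs_with_scanner(html)
  scanner = StringScanner.new(html)
  paragraphs = []
  current = nil                          # text of the paragraph being collected

  until scanner.eos?
    if scanner.scan(/<p>/i)
      paragraphs << current if current   # implicit close of an open <p>
      current = +""
    elsif scanner.scan(%r{</p>}i)
      paragraphs << current if current
      current = nil
    elsif scanner.scan(/<[^>]*>/)
      next                               # any other tag is skipped in this sketch
    elsif (text = scanner.scan(/[^<]+/))
      current << text if current         # text outside a <p> is dropped here
    else
      scanner.getch                      # stray "<" without a closing ">"
    end
  end

  paragraphs << current if current       # close whatever is still open at the end
  paragraphs.map(&:strip).reject(&:empty?).join("\n\n")
end

p paragraphs_with_scanner("<p>one</p><p>two</p>")  # => "one\n\ntwo"
p paragraphs_with_scanner("<p>one<p>two")          # => "one\n\ntwo"
```

A real HTML parser handles far more cases (nesting, attributes, entities, other optional end tags); this only shows the kind of state tracking that plain substitution cannot do.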

Thanks for the inspiring work.

Adriano Ferreira.
Phillip Gawlowski (Guest)
on 2007-04-13 18:24
(Received via mailing list)
Adriano Ferreira wrote:

> but that's pretty easy to work around with the simple patch attached.
> I changed the substitution for the HTML closing tags </h1>, </h2>, ...
> from "" to "\n\n", which produces the paragraph breaks Textile needs.
> "test/test_headings.rb" had to be updated to match, and I also wrote
> "test/test_misc.rb" as a test script for HTML with more than one tag
> in it.

Thanks. I'll add your patch ASAP, and make a new release.

> The substitution approach will not work quite right for HTML where
> closing tags are missing. The algorithm will never understand when the
> tags were closed. So this is somewhat limited currently to XHTML which
> demands closing tags.

And it's buggy, I've noticed, as it ignores self-closing tags like
<br />. I'll fix that together with your patch.
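
One possible shape for such a fix, as a sketch only: BR_TAG and breaks_to_textile are hypothetical names, and the pattern assumes the tag carries no attributes. Textile treats a single newline inside a paragraph as a line break, so the tag is replaced with "\n".

```ruby
# Matches <br>, <br/> and <br />, case-insensitively (no attributes).
BR_TAG = %r{<br\s*/?>}i

# Replace every line-break tag with a newline.
def breaks_to_textile(html)
  html.gsub(BR_TAG, "\n")
end

p breaks_to_textile("a<br>b<br/>c<br />d<BR>e")
# => "a\nb\nc\nd\ne"
```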

> I think the suggestion of using a HTML parser (like Hpricot) to do
> this conversion will impose itself pretty soon.

Probably, but I'll have to play with Hpricot first, to see if it can do
what is needed (after a short skim, it can convert HTML4 into
XHTML1.0, which will make it easier). Trouble is, I want ClothRed to be
as dependency-free as I can make it. So maybe I'll redistribute Hpricot
in non-gem distributions (if that's possible; I haven't checked
Hpricot's license yet).

> Thanks for the inspiring work.

No problem. I'll need that library for my own ideas. ;)

--
Phillip "CynicalRyan" Gawlowski
http://cynicalryan.110mb.com/
http://clothred.rubyforge.org

Rule of Open-Source Programming #48:

The number of items on a project's to-do list always grows or remains
constant.
Brian Candler (Guest)
on 2007-04-13 21:06
(Received via mailing list)
On Sat, Apr 14, 2007 at 01:24:02AM +0900, Phillip Gawlowski wrote:
> >I think the suggestion of using a HTML parser (like Hpricot) to do
> >this conversion will impose itself pretty soon.
>
> Probably, but I'll have to play with Hpricot first, to see if it can do
> what is needed (after a short skim, it can convert HTML4 into
> XHTML1.0, which will make it easier). Trouble is, I want ClothRed to be
> as dependency-free as I can make it.

I think what you're going to find is, parsing tag-soup HTML is harder
than you think - especially if your goal with ClothRed is to parse
*arbitrary* tag-soup HTML from arbitrary sources.