Forum: Redcloth Stripping textile markup

James King (jamesking)
on 2010-08-05 18:49
Hello,

I have come across certain situations where text marked up with textile
syntax needs to be displayed where HTML isn't wanted. For example, in
the title element of an HTML page.

In these situations, I would like a way of stripping away the textile
markup from a string and displaying it completely "naked" - without HTML
tags and, crucially, without the textile markup too.

So:

>> The _quick_ brown "fox":/fox.html jumps over the *lazy* dog's tail.

becomes

>> The quick brown fox jumps over the lazy dog’s tail.

(N.B. In the example above, I still want my punctuation made pretty:
"dog's tail" should still become "dog’s tail".)

What's the best way of doing this? If there isn't an elegant way of
doing it, could Redcloth have a to_plaintext method?

Thanks,

- James
Magnus Holm (judofyr)
on 2010-08-05 19:12
(Received via mailing list)
You could use Nokogiri:

    require 'redcloth'
    require 'nokogiri'

    html = RedCloth.new(str).to_html
    plaintext = Nokogiri::HTML.fragment(html).text

// Magnus Holm
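
For cases where adding Nokogiri is not an option, the tag-stripping step can be roughly approximated with Ruby's standard library alone. This is only a sketch: `strip_tags` is a hypothetical helper, not part of RedCloth or Nokogiri, and the sample HTML is hand-written on the assumption that it resembles what `to_html` returns for the sentence in the first post. A regular expression is much cruder than a real parser, so Nokogiri remains the sturdier choice.

```ruby
require 'cgi'

# Hypothetical helper (not a RedCloth or Nokogiri method): removes anything
# that looks like an HTML tag, then decodes entities such as &#8217;.
def strip_tags(html)
  text = html.gsub(/<[^>]*>/, '') # drop tags, keep the text between them
  CGI.unescapeHTML(text).strip    # decode entities, trim outer whitespace
end

# Hand-written sample input, assumed to resemble RedCloth's HTML output.
html = '<p>The <em>quick</em> brown <a href="/fox.html">fox</a> jumps over ' \
       'the <strong>lazy</strong> dog&#8217;s tail.</p>'

puts strip_tags(html)
```

Because the pattern treats every `<...>` run as a tag, it will misbehave on input containing script or style blocks, or a literal `>` inside an attribute value.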
James King (jamesking)
on 2010-08-05 19:54
Thanks Magnus!

Magnus Holm wrote:
> You could use Nokogiri:
>
>     require 'redcloth'
>     require 'nokogiri'
>
>     html = RedCloth.new(str).to_html
>     plaintext = Nokogiri::HTML.fragment(html).text
>
> // Magnus Holm

This works nicely. I have also found an example using Hpricot:

http://wiki.github.com/hpricot/hpricot/hpricot-challenge (see under
"Strip All HTML Tags").

I'm happy using this, but I'm still left wondering whether it would make
more sense for RedCloth to have an elegant "to_plaintext" method. It
seems better not to go through an HTML intermediate stage.

Best,

- James
Jason Garber (jgarber)
on 2010-08-08 13:24
(Received via mailing list)
That's a simple, elegant solution!  I was going to suggest writing your
own formatter that passed everything through without adding anything,
but that's way too much work!
Magnus Holm (judofyr)
on 2010-08-10 22:13
(Received via mailing list)
You should probably add that as a TODO, since I've seen plenty of
sites that use the Textile source as a simple plaintext preview (where
a proper formatter would definitely be better).

// Magnus Holm