Forum: Redcloth Stripping textile markup

Posted by James King (jamesking)
on 2010-08-05 18:49
Hello,

I have come across certain situations where text marked up with textile
syntax needs to be displayed where HTML isn't wanted. For example, in
the title element of an HTML page.

In these situations, I would like a way of stripping away the textile
markup from a string and displaying it completely "naked" - without HTML
tags and, crucially, without the textile markup too.

So:

>> The _quick_ brown "fox":/fox.html jumps over the *lazy* dog's tail.

becomes

>> The quick brown fox jumps over the lazy dog’s tail.

(N.B. In the example above, I still want my punctuation made pretty:
"dog's tail" should still become "dog’s tail".)

What's the best way of doing this? If there isn't an elegant way of
doing it, could RedCloth have a to_plaintext method?

Thanks,

- James
Posted by Magnus Holm (judofyr)
on 2010-08-05 19:12
(Received via mailing list)
You could use Nokogiri:

    require 'redcloth'
    require 'nokogiri'

    # str holds the Textile source
    html = RedCloth.new(str).to_html
    # Parse the HTML and keep only its text nodes
    plaintext = Nokogiri::HTML.fragment(html).text

// Magnus Holm
Posted by James King (jamesking)
on 2010-08-05 19:54
Thanks Magnus!

Magnus Holm wrote:
> You could use Nokogiri:
> 
>     require 'redcloth'
>     require 'nokogiri'
> 
>     html = RedCloth.new(str).to_html
>     plaintext = Nokogiri::HTML.fragment(html).text
> 
> // Magnus Holm

This works nicely. I have also found an example using Hpricot:

http://wiki.github.com/hpricot/hpricot/hpricot-challenge (see under 
"Strip All HTML Tags").

I'm happy using this, but I'm still left wondering whether it would make 
more sense for RedCloth to have an elegant "to_plaintext" method. It 
seems better not to go through an HTML intermediate stage.

Best,

- James
Posted by Jason Garber (jgarber)
on 2010-08-08 13:24
(Received via mailing list)
That's a simple, elegant solution!  I was going to suggest writing your 
own formatter that passed everything through without adding anything, 
but that's way too much work!
Posted by Magnus Holm (judofyr)
on 2010-08-10 22:13
(Received via mailing list)
You should probably add that as a TODO, since I've seen plenty of
sites that use the Textile source as a simple plaintext preview (where
a proper formatter would definitely be better).

// Magnus Holm