Stripping textile markup


#1

Hello,

I have come across certain situations where text marked up with Textile
syntax needs to be displayed where HTML isn’t wanted, for example in
the title element of an HTML page.

In these situations, I would like a way of stripping the Textile
markup from a string and displaying it completely “naked”: without HTML
tags and, crucially, without the Textile markup either.

So:

The quick brown "fox":/fox.html jumps over the lazy dog's tail.

becomes

The quick brown fox jumps over the lazy dog’s tail.

(N.B. In the example above, I still want my punctuation made pretty:
"dog's tail" should still become “dog’s tail”.)

What’s the best way of doing this? If there isn’t an elegant way of
doing it, could RedCloth have a to_plaintext method?

Thanks,

James

#2

You could use Nokogiri:

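require 'redcloth'
require 'nokogiri'

str = %q{The quick brown "fox":/fox.html jumps over the lazy dog's tail.}

# Render the Textile to HTML (including smart punctuation), then keep only
# the text nodes; Nokogiri decodes any HTML entities back into characters.
html = RedCloth.new(str).to_html
plaintext = Nokogiri::HTML.fragment(html).text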

// Magnus H.


#3

Thanks Magnus!

This works nicely. I have also found an example using Hpricot:

http://wiki.github.com/hpricot/hpricot/hpricot-challenge (see under
“Strip All HTML Tags”).
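The same tag-stripping step can also be done with no parser dependency at all. The following is a rough, dependency-free sketch under stated assumptions: the helper names strip_tags and decode_entities are hypothetical, the input is assumed to be a small, well-formed fragment (like a rendered title), and the sample HTML is hand-written to resemble RedCloth output rather than captured from it. For arbitrary HTML, a real parser such as Nokogiri remains the safer choice.

```ruby
# Hypothetical regex-based tag stripper for small, well-formed HTML
# fragments. NOT safe for arbitrary HTML: comments, <script> bodies and
# attribute values containing ">" are not handled.

# Block-level tags become a space so adjacent paragraphs don't run together.
BLOCK_TAG = %r{</?(?:p|div|h[1-6]|li|ul|ol|blockquote|br|pre|table|tr|td|th)\b[^>]*>}i
# Any remaining (inline) tag is removed outright.
ANY_TAG = /<[^>]*>/

# Only numeric references and the five XML entities are decoded;
# other named entities (e.g. &nbsp;) are left as they are.
NAMED_ENTITIES = { 'amp' => '&', 'lt' => '<', 'gt' => '>', 'quot' => '"', 'apos' => "'" }.freeze
ENTITY = /&(?:#(\d+)|#[xX](\h+)|(amp|lt|gt|quot|apos));/

def decode_entities(text)
  # A single left-to-right pass, so "&amp;lt;" becomes "&lt;", not "<".
  text.gsub(ENTITY) do
    whole = Regexp.last_match(0)
    dec, hex, name = Regexp.last_match.captures
    if name
      NAMED_ENTITIES.fetch(name)
    else
      begin
        (dec ? dec.to_i : hex.to_i(16)).chr(Encoding::UTF_8)
      rescue RangeError
        whole # leave references to invalid code points untouched
      end
    end
  end
end

def strip_tags(html)
  # Strip tags first, then decode, so a decoded "&lt;" can't form a new tag.
  text = html.gsub(BLOCK_TAG, ' ').gsub(ANY_TAG, '')
  decode_entities(text).gsub(/\s+/, ' ').strip
end

html = %q{<p>The quick brown <a href="/fox.html">fox</a> jumps over the lazy dog&#8217;s tail.</p>}
puts strip_tags(html) # prints: The quick brown fox jumps over the lazy dog’s tail.
```

Collapsing all whitespace to single spaces suits a title element; for longer previews you might prefer to keep paragraph breaks instead.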

I’m happy using this, but I’m still left wondering whether it would make
more sense for RedCloth to have an elegant “to_plaintext” method of its
own. It seems better to avoid going through an intermediate HTML stage.
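To make the to_plaintext idea concrete, here is a minimal sketch of stripping Textile directly, with no HTML step. It is not part of RedCloth: the method name textile_to_plaintext and its regexes are hypothetical, and they cover only a small subset of Textile (links without titles, *strong* and _emphasis_ phrase modifiers, straight double quotes and in-word apostrophes). Block syntax such as h1., bq., lists and tables is passed through unchanged.

```ruby
# Hypothetical to_plaintext sketch: removes a SMALL subset of Textile
# inline markup directly. Everything not matched below is left as-is.

# "link text":url, where the URL runs up to whitespace but trailing
# sentence punctuation (e.g. a final ".") stays outside the link.
LINK = /"([^"\n]+)":\S+?(?=[.,;:!?)]*(?:\s|\z))/
# *strong* and _emphasis_, only when not glued to a word character,
# so identifiers like my_var_name are left alone.
STRONG = /(?<!\w)\*(\S(?:[^*\n]*\S)?)\*(?!\w)/
EMPHASIS = /(?<!\w)_(\S(?:[^_\n]*\S)?)_(?!\w)/

def textile_to_plaintext(src)
  text = src.gsub(LINK, '\1')             # "fox":/fox.html -> fox
  text = text.gsub(STRONG, '\1')          # *quick* -> quick
  text = text.gsub(EMPHASIS, '\1')        # _brown_ -> brown
  text = text.gsub(/"([^"\n]*)"/, '“\1”') # "hi" -> “hi”
  text.gsub(/(?<=\w)'(?=\w)/, '’')        # dog's -> dog’s
end

puts textile_to_plaintext(%q{The quick brown "fox":/fox.html jumps over the lazy dog's tail.})
# prints: The quick brown fox jumps over the lazy dog’s tail.
```

The trade-off is visible here: going through to_html keeps the result consistent with whatever RedCloth actually renders, whereas a direct approach avoids the HTML step but has to re-implement Textile's rules piecemeal.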

Best,

James

#4

That’s a simple, elegant solution! I was going to suggest writing your
own formatter that passes the text through without adding any markup,
but that’s way too much work!


#5

You should probably add that as a TODO, since I’ve seen plenty of
sites that use the Textile source as a simple plain-text preview (where
a proper formatter would definitely be better).

// Magnus H.