Improving on: truncate(..) + rendered text?

Hello all,

On a summary page I need to show the first 100 characters of textilized
messages.
Problem: truncate(…) often cuts in the middle of an HTML tag, so the
rendered result is unpredictable.
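
To illustrate the problem, here is a minimal plain-Ruby sketch. The sample markup is made up, and plain String slicing stands in for truncate(…), which is an assumption about how the cut happens:

```ruby
# Made-up sample of rendered (textilized) output.
html = '<p>Hello <a href="/messages/1">world</a></p>'

# Cutting by character count ignores markup, so the cut can land inside a tag.
snippet = html[0, 20]
puts snippet  # prints: <p>Hello <a href="/m
```

The snippet ends inside the opening <a ...> tag, so the browser receives an unterminated tag and an unclosed <p>.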

My first idea was to “repair” the broken text with Hpricot (as I use
it elsewhere in the project), but it’s not perfect:

abcd</h would give

abcd</h

(I also use white_list to clean the …)

I guess there are only 2 alternatives:

  • a smart html_truncate(…)
    or
  • “unrender” the text (html => plain text)
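
For the first alternative, a rough sketch of what a tag-aware truncate might look like in plain Ruby. The name html_truncate, its parameters and the VOID_TAGS list are all hypothetical (this is not a Rails or Hpricot API), and the regex tokenizer assumes reasonably well-formed markup; it does not handle a ">" inside attribute values or comments, and it counts entities such as &amp; as several characters:

```ruby
# Hypothetical helper, not part of Rails or Hpricot.
# Tags that never take a closing tag (illustrative, incomplete list).
VOID_TAGS = %w(br hr img input).freeze

# Truncate by visible text length, never cut inside a tag,
# and close any tags that are still open at the cut point.
def html_truncate(html, length = 100, omission = "...")
  out = +""
  open_tags = []
  remaining = length

  # Each token is either a complete tag or a run of text.
  html.scan(/<[^>]*>|[^<]+/) do |token|
    if token.start_with?("<")
      out << token
      name = token[/\A<\/?([a-zA-Z][a-zA-Z0-9]*)/, 1]
      next if name.nil?                     # comment, doctype, etc.
      name = name.downcase
      if token.start_with?("</")
        open_tags.pop if open_tags.last == name
      elsif !VOID_TAGS.include?(name) && !token.end_with?("/>")
        open_tags.push(name)
      end
    elsif token.length <= remaining
      out << token
      remaining -= token.length
    else
      # Cut inside this text run, then stop reading further tokens.
      out << token[0, remaining] << omission
      break
    end
  end

  # Close whatever is still open, innermost first.
  open_tags.reverse_each { |name| out << "</#{name}>" }
  out
end
```

With this sketch, html_truncate("<p>Hello <b>world</b></p>", 8) gives "<p>Hello <b>wo...</b></p>": only the visible text counts toward the limit, and both open tags are closed again.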

Has anybody explored those directions?

TIA

Alain R…

Here is my ‘improved’ truncate that transforms html to text:

In sequence, it:

  • sanitizes (with white_list) and removes images
  • strips html tags => you’re only left with plain text
  • truncates
  • applies simple_format => you get newlines and paragraphs back.

code:

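# Have white_list drop <script> and <img> tags.
WhiteListHelper.bad_tags = %w(script img)

# Sanitize, strip the remaining tags, truncate the plain text,
# then let simple_format restore line breaks and paragraphs.
def strip_and_truncate(text, length = 30, truncate_string = "...")
  return if text.nil?
  snip = truncate(strip_tags(white_list(text)), length, truncate_string)
  simple_format(snip)
end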