Forum: Ruby on Rails truncating html text

Kevin O. (Guest)
on 2005-12-23 06:00
I've got a fairly basic problem here that I'm hoping there is an easy
solution for.

I have a chunk of html code that I want to truncate to a given length...
say 20 characters or so.

If I use the 'truncate' helper function I end up with unbalanced tags.

For example.

<a href=www.someplace.com>A really long string of words</a>

becomes

<a href=www.someplace.com>A really long...

When run through the 'truncate' function, leaving off the closing tag,
causing untold trouble and chaos.  On top of that, the truncate
function counts characters in the tag, so you end up getting somewhat
less than what you asked for.

So... is there a way to truncate html text properly?

By this I mean a function or set of functions that returns a chunk of
html with the tags properly closed and where the length of the text
outside the tags is the specified amount.
Ezra Z. (Guest)
on 2005-12-23 10:23
(Received via mailing list)
Kevin-

	How about this:

truncate(html_text.gsub(/(<[^>]+>)/, ''), 20)

	That will just do a naive regex to remove the html tags from
html_text and pass that in to truncate with a length of 20
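
For anyone wanting to try the strip-then-truncate idea outside a Rails view, here is a minimal sketch in plain Ruby. The helper name plain_truncate and its omission parameter are made up for illustration and are not part of Rails. It also assumes the "..." should count toward the requested length.

```ruby
# Hypothetical helper, not part of Rails: strip tags with the same naive
# regex as above, then cut the remaining text to `length` characters.
# Assumption: the omission string counts toward `length`.
def plain_truncate(html_text, length = 20, omission = "...")
  text = html_text.gsub(/<[^>]+>/, "")
  return text if text.length <= length
  text[0, length - omission.length] + omission
end

puts plain_truncate("<a href=www.someplace.com>A really long string of words</a>", 20)
# prints: A really long str...
```

The result is plain text, so any formatting carried by the tags is lost.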


Cheers-

-Ezra



-Ezra Z.
WebMaster
Yakima Herald-Republic Newspaper
removed_email_address@domain.invalid
509-577-7732
Andreas S. (Guest)
on 2005-12-23 10:43
Kevin O. wrote:
> I've got a fairly basic problem here that I'm hoping there is an easy
> solution for.
>
> I have a chunk of html code that I want to truncate to a given length...
> say 20 characters or so.
>
> If I use the 'truncate' helper function I end up with unbalanced tags.
>
> For example.
>
> <a href=www.someplace.com>A really long string of words</a>
>
> becomes
>
> <a href=www.someplace.com>A really long...
>
> When run through the 'truncate' function, leaving off the closing tag,
> causing untold trouble and chaos.  On top of that, the truncate
> function counts characters in the tag, so you end up getting somewhat
> less than what you asked for.
>
> So... is there a way to truncate html text properly?

Try this:
http://www.bigbold.com/snippets/posts/show/295
Kevin O. (Guest)
on 2005-12-23 17:00
>
> Try this:
> http://www.bigbold.com/snippets/posts/show/295

@Ezra... I would like to retain the HTML formatting if possible.
Stripping the tags out would work, but then the formatting gets lost.  Not
ideal, but functional.

Closing the broken tags might work. I need to see how this works if a
tag gets chopped in half.

Something like "<a href=www.someplace.com>My tag link</a..." might make
that algorithm upset.  I'm still stuck with the fact that the truncated
length will be totally wrong.

Something to work with anyway, Thanks guys.

_Kevin
Justin F. (Guest)
on 2005-12-24 02:10
(Received via mailing list)
Kevin O. wrote:
>
> Something like "<a href=www.someplace.com>My tag link</a..." might make
> that algorithm upset.  I'm still stuck with the fact that the truncated
> length will be totally wrong.
>
> Something to work with anyway, Thanks guys.

Perhaps if you explained *why* you want to truncate an HTML string, that
would help...

regards

   Justin
Andreas S. (Guest)
on 2005-12-24 02:26
Kevin O. wrote:
>>
>> Try this:
>> http://www.bigbold.com/snippets/posts/show/295
>
> @Ezra... I would like to retain the HTML formatting if possible.
> Stripping the tags out would work, but then the formatting gets lost.  Not
> ideal, but functional.
>
> Closing the broken tags might work. I need to see how this works if a
> tag gets chopped in half.
>
> Something like "<a href=www.someplace.com>My tag link</a..." might make
> that algorithm upset.

A regex for removing open tags from the end should be quite trivial.
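
One possible version of such a regex, as a sketch only. The helper name strip_partial_tag is hypothetical. It assumes the text itself never contains a literal "<", since anything from an unmatched "<" to the end of the string is dropped (including a trailing "..." added by truncate).

```ruby
# Hypothetical helper: drop an incomplete tag left at the end of a cut string.
# The pattern matches a "<" that is not followed by any ">" before the end.
def strip_partial_tag(html)
  html.sub(/<[^>]*\z/, "")
end

puts strip_partial_tag("<a href=www.someplace.com>My tag link</a")
# prints: <a href=www.someplace.com>My tag link
```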

> I'm still stuck with the fact that the truncated
> length will be totally wrong.

You'll probably have to write your own truncate function with
String#scan and make it count only non-tag characters.
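
A rough sketch of what that could look like. The name truncate_visible is invented for this example. It assumes well-formed tags and no literal "<" or ">" in the text, and it appends the omission on top of the requested length rather than counting it.

```ruby
# Hypothetical helper: copy tags through untouched and count only the
# characters between them. Assumes no literal "<" or ">" in the text.
def truncate_visible(html, length, omission = "...")
  out = +""
  remaining = length
  html.scan(/<[^>]+>|[^<]+/) do |token|
    if token.start_with?("<")
      out << token                        # tags are copied but never counted
    elsif token.length <= remaining
      out << token
      remaining -= token.length
    else
      out << token[0, remaining] << omission
      break                               # limit reached; the rest is dropped
    end
  end
  out
end

puts truncate_visible("<a href=www.someplace.com>A really long string of words</a>", 20)
# prints: <a href=www.someplace.com>A really long string...
```

Note that this only fixes the length problem. Tags that were open at the cut point are still left unclosed, and entities such as &amp; are counted as several characters.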
Kevin O. (Guest)
on 2005-12-24 03:24
Justin F. wrote:
> Perhaps if you explained *why* you want to truncate an HTML string, that
> would help...
>
> regards
>
>    Justin

@Justin:
The goal is to have an 'article' model.  I would like to have the 'list'
view generate a brief excerpt of the article body as a teaser.  For now
the HTML itself is being generated from text using Textile.  I have
considered simply truncating the textile source and then generating html
from that, but you run into similar problems with unbalanced decorations
(sort of like my Christmas tree).

@Andreas, yes, removing the malformed tag at the end is easy.  The rest
of it is a bit tricky, but I am making progress.  It is a good learning
exercise for regex judo.

_Kevin
Justin F. (Guest)
on 2005-12-26 03:56
(Received via mailing list)
Kevin O. wrote:

> The goal is to have an 'article' model.  I would like to have the 'list'
> view generate a brief excerpt of the article body as a teaser.  For now
> the HTML itself is being generated from text using Textile.  I have
> considered simply truncating the textile source and then generating html
> from that, but you run into similar problems with unbalanced decorations
> (sort of like my Christmas tree).

Thanks, that's useful. Have you looked at the feasibility of altering
the textile-to-html conversion, so that it works with a bound on the
number of content characters? On reaching the bound, it would just need
to emit closing tags for all currently unclosed HTML tags.
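
The "emit closing tags for everything still open" step can be sketched on its own with a stack. This works on an already cut HTML string rather than inside the Textile converter, so it is a stand-in for the idea, not the change described above. The names closing_tags_for and VOID_TAGS are hypothetical, the void-tag list is deliberately incomplete, and an unquoted attribute value ending in "/" would be misread as self-closing.

```ruby
# Elements that never take a closing tag (partial list, for illustration).
VOID_TAGS = %w[br hr img input meta link].freeze

# Hypothetical helper: return the closing tags needed to balance `html`.
def closing_tags_for(html)
  open_tags = []
  html.scan(/<(\/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(\/?)>/) do |slash, name, self_closing|
    name = name.downcase
    next if VOID_TAGS.include?(name) || self_closing == "/"
    if slash == "/"
      # Pop back to the most recent matching opener, if there is one.
      idx = open_tags.rindex(name)
      open_tags.slice!(idx..-1) if idx
    else
      open_tags.push(name)
    end
  end
  # Close the innermost element first.
  open_tags.reverse.map { |name| "</#{name}>" }.join
end

puts closing_tags_for("<p>Some <b>bold <i>text")
# prints: </i></b></p>
```

Appending the returned string to the truncated fragment gives balanced markup, provided any half-cut tag at the end has been removed first.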

regards

   Justin
Kevin O. (Guest)
on 2005-12-26 06:18
Justin F. wrote:
>
> Thanks, that's useful. Have you looked at the feasibility of altering
> the textile-to-html conversion, so that it works with a bound on the
> number of content characters? On reaching the bound, it would just need
> to emit closing tags for all currently unclosed HTML tags.
>
> regards
>
>    Justin

Thanks, that's a good suggestion.  This may solve my immediate problem
so long as I continue to use textile.  However, I'm still interested in
finding a more general solution to the problem.

_Kevin