Forum: Ruby on Rails truncating html text

Kevin Olbrich (Guest)
on 2005-12-23 05:00
I've got a fairly basic problem here that I'm hoping there is an easy
solution for.

I have a chunk of html code that I want to truncate to a given length...
say 20 characters or so.

If I use the 'truncate' helper function I end up with unbalanced tags.

For example.

<a href=www.someplace.com>A really long string of words</a>

becomes

<a href=www.someplace.com>A really long...

when run through the 'truncate' function.  The closing tag is left off,
causing untold trouble and chaos.  On top of that, the truncate
function counts the characters inside the tag, so you end up with
somewhat less text than you asked for.

So... is there a way to truncate html text properly?

By this I mean a function or set of functions that returns a chunk of
html with the tags properly closed and where the length of the text
outside the tags is the specified amount.
Ezra Zygmuntowicz (Guest)
on 2005-12-23 09:23
(Received via mailing list)
Kevin-

	How about this:

truncate(html_text.gsub(/(<[^>]+>)/, ''), 20)

	That uses a naive regex to remove the html tags from
html_text and passes the result to truncate with a length of 20.
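
For anyone trying this outside a Rails view, here is a minimal plain-Ruby sketch of the same idea. The name plain_excerpt and the omission parameter are made up for illustration, and the cut-off rule (the result, including the "...", is at most `length` characters) is an assumption about how the helper behaves, not a copy of it.

```ruby
# Hypothetical helper, not part of Rails: strip tags with the same naive
# regex, then cut the remaining plain text down to `length` characters.
def plain_excerpt(html, length = 20, omission = "...")
  text = html.gsub(/<[^>]+>/, "")   # drop anything that looks like a tag
  return text if text.length <= length
  text[0, length - omission.length] + omission
end

puts plain_excerpt("<a href=www.someplace.com>A really long string of words</a>", 20)
# prints: A really long str...
```

The output is plain text, so all formatting and links are lost.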


Cheers-

-Ezra



Andreas S. (andreas)
on 2005-12-23 09:43
Kevin Olbrich wrote:
> I've got a fairly basic problem here that I'm hoping there is an easy
> solution for.
>
> I have a chunk of html code that I want to truncate to a given length...
> say 20 characters or so.
>
> If I use the 'truncate' helper function I end up with unbalanced tags.
>
> For example.
>
> <a href=www.someplace.com>A really long string of words</a>
>
> becomes
>
> <a href=www.someplace.com>A really long...
>
> When run through the 'truncate' function, leaving off the closing tag,
> causing untold trouble and chaos.  On top of that, the truncate
> function counts characters in the tag, so you end up getting somewhat
> less than what you asked for.
>
> So... is there a way to truncate html text properly?

Try this:
http://www.bigbold.com/snippets/posts/show/295
Kevin Olbrich (Guest)
on 2005-12-23 16:00
>
> Try this:
> http://www.bigbold.com/snippets/posts/show/295

@Ezra... I would like to retain the HTML formatting if possible.
Stripping the tags out would work, but then the formatting gets lost.
Not ideal, but functional.

Closing the broken tags might work. I need to see how this works if a
tag gets chopped in half.

Something like "<a href=www.someplace.com>My tag link</a..." might make
that algorithm upset.  I'm still stuck with the fact that the truncated
length will be totally wrong.

Something to work with anyway, Thanks guys.

_Kevin
Justin Forder (Guest)
on 2005-12-24 01:10
(Received via mailing list)
Kevin Olbrich wrote:
>
> Something like "<a href=www.someplace.com>My tag link</a..." might make
> that algorithm upset.  I'm still stuck with the fact that the truncated
> length will be totally wrong.
>
> Something to work with anyway, Thanks guys.

Perhaps if you explained *why* you want to truncate an HTML string, that
would help...

regards

   Justin
Andreas S. (andreas)
on 2005-12-24 01:26
Kevin Olbrich wrote:
>>
>> Try this:
>> http://www.bigbold.com/snippets/posts/show/295
>
> @Ezra... I would like to retain the HTML formatting if possible.
> Stripping them out would work, but then the formatting gets lost.  Not
> ideal, but functional.
>
> Closing the broken tags might work. I need to see how this works if a
> tag gets chopped in half.
>
> Something like "<a href=www.someplace.com>My tag link</a..." might make
> that algorithm upset.

A regex for removing open tags from the end should be quite trivial.
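
A minimal sketch of what such a regex could look like (the method name strip_partial_tag is made up for illustration). It assumes that any "<" with no ">" after it, up to the end of the string, is a tag that got chopped in half:

```ruby
# Hypothetical helper: drop a tag that was cut off at the end of a fragment.
def strip_partial_tag(fragment)
  # "<", then any run of non-">" characters, anchored at the end of the string
  fragment.sub(/<[^>]*\z/, "")
end

puts strip_partial_tag("<a href=www.someplace.com>My tag link</a")
# prints: <a href=www.someplace.com>My tag link
```

Note that a literal "<" in the text (as in "a < b") with no ">" after it would be cut too, so this is only safe on markup where such characters are escaped.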

> I'm still stuck with the fact that the truncated
> length will be totally wrong.

You'll probably have to write your own truncate function with
String#scan and make it count only non-tag characters.
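
A rough sketch of that approach, assuming the input is simple markup that can be split into whole tags and runs of text. The method name truncate_text_only is made up, and this version only fixes the counting; it does not close any tags left open at the cut.

```ruby
# Hypothetical helper: truncate so that only characters outside tags
# count toward `length`. Tags are copied through untouched.
def truncate_text_only(html, length, omission = "...")
  count = 0
  parts = []
  # Each token is either one complete tag or one run of text between tags.
  html.scan(/<[^>]+>|[^<]+/) do |token|
    if token.start_with?("<")
      parts << token                      # tags are free, keep them
    else
      remaining = length - count
      if token.length > remaining
        parts << token[0, remaining] << omission
        break                             # limit reached, stop scanning
      end
      parts << token
      count += token.length
    end
  end
  parts.join
end

puts truncate_text_only("<a href=www.someplace.com>A really long string of words</a>", 13)
# prints: <a href=www.someplace.com>A really long...
```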
Kevin Olbrich (Guest)
on 2005-12-24 02:24
Justin Forder wrote:
> Perhaps if you explained *why* you want to truncate an HTML string, that
> would help...
>
> regards
>
>    Justin

@Justin:
The goal is to have an 'article' model.  I would like to have the 'list'
view generate a brief excerpt of the article body as a teaser.  For now
the HTML itself is being generated from text using Textile.  I have
considered simply truncating the textile source and then generating html
from that, but you run into similar problems with unbalanced decorations
(sort of like my Christmas tree).

@Andreas, yes, removing the malformed tag at the end is easy.  The rest
of it is a bit tricky, but I am making progress.  It is a good learning
exercise in regex judo.

_Kevin
Justin Forder (Guest)
on 2005-12-26 02:56
(Received via mailing list)
Kevin Olbrich wrote:

> The goal is to have an 'article' model.  I would like to have the 'list'
> view generate a brief excerpt of the article body as a teaser.  For now
> the text itself is being generated from text using textile.  I have
> considered simply truncating the textile source and then generating html
> from that, but you run into similar problems with unbalanced decorations
> (sort of like my Christmas tree).

Thanks, that's useful. Have you looked at the feasibility of altering
the textile-to-html conversion, so that it works with a bound on the
number of content characters? On reaching the bound, it would just need
to emit closing tags for all currently unclosed HTML tags.
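
The "emit closing tags" step can be sketched independently of the converter. This is only an illustration applied to an already-truncated fragment, not a change to any Textile library; the names close_open_tags and VOID_TAGS are made up, the list of void tags is a partial assumption, and the tag regex is naive (it will be confused by a ">" inside an attribute value).

```ruby
# Tags that never take a closing tag (assumed, partial list).
VOID_TAGS = %w[br hr img input meta link].freeze

# Hypothetical helper: append closing tags for every tag still open
# at the end of `fragment`, innermost first.
def close_open_tags(fragment)
  stack = []
  fragment.scan(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>/) do |closing, name, self_closing|
    name = name.downcase
    next if VOID_TAGS.include?(name) || self_closing == "/"
    if closing == "/"
      # Pop back to the matching opener; ignore stray closing tags.
      idx = stack.rindex(name)
      stack.slice!(idx..-1) if idx
    else
      stack.push(name)
    end
  end
  fragment + stack.reverse.map { |n| "</#{n}>" }.join
end

puts close_open_tags("<p><a href=www.someplace.com>A really long...")
# prints: <p><a href=www.someplace.com>A really long...</a></p>
```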

regards

   Justin
Kevin Olbrich (Guest)
on 2005-12-26 05:18
Justin Forder wrote:
>
> Thanks, that's useful. Have you looked at the feasibility of altering
> the textile-to-html conversion, so that it works with a bound on the
> number of content characters? On reaching the bound, it would just need
> to emit closing tags for all currently unclosed HTML tags.
>
> regards
>
>    Justin

Thanks, that's a good suggestion.  This may solve my immediate problem
so long as I continue to use textile.  However, I'm still interested in
finding a more general solution to the problem.

_Kevin