Forum: Ruby on Rails Ruby "htmlentities" replacement: code review please!

Dave Silvester (Guest)
on 2006-01-18 17:42
(Received via mailing list)
Hi Railers,

For some time now I've been looking for a decent Rails equivalent of
PHP's "htmlentities" command, because ERB's html_escape (more commonly
called as just "h", e.g. <%=h @somevariable %>) just doesn't go far
enough for me.

Back in PHP land, I actually had an extended version of the htmlentities
command to deal with all kinds of crazy characters that appear if you
copy and paste into a CMS from Word.  So anyway, given the apparent lack
of a function to do this in Rails, and because I'm not particularly
impressed with ERB's html_escape, I decided to do something about it and
roll my own.

Here's my code: http://rafb.net/paste/results/9lI1hc62.html

It defines an htmlentities function, designed to be included in a helper
or in your environment (that's what I'm doing with it).  It then
overrides the h function to call htmlentities instead of ERB's
html_escape, which hopefully means you can just drop this in and your
app will start using it.
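
The pasted code is not reproduced in this thread, so the following is
only a minimal sketch of the idea described above, not the original
implementation. It assumes a modern Ruby and valid UTF-8 input; the
module name EntityHelper and the choice to emit numeric character
references for every non-ASCII character are assumptions made for
illustration.

```ruby
# Hypothetical sketch, NOT the code from the paste linked above.
# Assumes the input is a valid UTF-8 string.
module EntityHelper
  # Markup-significant characters get named entities.
  NAMED = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

  # Escape markup characters by name and every non-ASCII character
  # as a decimal numeric character reference.
  def htmlentities(text)
    text.to_s.gsub(/[&<>"]|[^\x00-\x7F]/) do |ch|
      NAMED.fetch(ch) { "&##{ch.ord};" }
    end
  end

  # Route the usual h shorthand through htmlentities.
  def h(text)
    htmlentities(text)
  end
end
```

With the module mixed into a helper, a template call such as
<%=h @somevariable %> would go through htmlentities, so a pasted curly
quote would come out as a numeric reference rather than a raw byte.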

I'm offering this up for several reasons:

1. Because I know I'm not the only person who wants such functionality,
and I can't seem to find anyone else who has written this yet.

2. For code review - is there anything wrong with it?  Anything missing?
Anything that could be done more efficiently?  I'm hardly a Ruby / Rails
guru, so would really appreciate some second opinions here!

I haven't used this on a production site yet (I only wrote it this
morning), but my thinking is that this code coupled with Rails' caching
should solve the problem in a nice and efficient manner.  This probably
isn't the ideal solution, since it probably doesn't help sites that use
non-Western alphabets (e.g. Japanese, Hebrew, Arabic etc.), but the
thinking is that this command should be at least as good as PHP's
htmlentities command for most Western alphabet users... I hope!  :-)

So, any comments?  Feel free to use the code anywhere you like, but
possibly just wait until a few other folks have looked it over, just in
case there's something heinously wrong with it!  It comes with no
guarantee of suitability for any purpose whatsoever, and you use it
entirely at your own risk.

Regardless, if folks think it's useful, I'll put it online somewhere
more permanent than RAFB NoPaste!

Cheers,

~Dave

--

Dave Silvester
Rent-A-Monkey Website Development
Web: http://www.rentamonkey.com/
Kyle Maxwell (Guest)
on 2006-01-19 00:32
(Received via mailing list)

You might want to post this to rails-core.

--
Kyle Maxwell
Chief Technologist
E Factor Media // FN Interactive
kyle@efactormedia.com
1-866-263-3261
Izidor Jerebic (Guest)
on 2006-01-19 12:43
(Received via mailing list)
On 18.1.2006, at 17:05, Dave Silvester wrote:

> For some time now I've been looking for a decent Rails equivalent of
> PHP's "htmlentities" command, because ERB's html_escape (or more
> commonly called as just "h", eg. <%=h @somevariable %> ) just doesn't
> go far enough for me.

Why not? If you have your encodings set up correctly in web pages or
HTTP headers, everything should just work. What doesn't work?

Besides, your solution assumes one specific single-byte encoding
(there are many single-byte western encodings). This is very wrong...


izidor
Dave Silvester (Guest)
on 2006-01-19 14:56
(Received via mailing list)
On Thursday 19 Jan 2006 11:05, Izidor Jerebic wrote:
> Why not? If you have your encodings set up correctly in web pages or
> HTTP headers, everything should just work. What doesn't work?
> Besides, your solution assumes one specific single-byte encoding
> (there are many single-byte western encodings). This is very wrong...

OK, fair enough, I do see your point - maybe this is the wrong approach
to the problem, and a throwback to my days of PHP.

I nearly always use iso-8859-1 for my sites, which means that when
people paste text into any kind of CMS from Word (with its curly quotes,
long hyphens etc.), the page contains invalid characters unless I strip
those out or replace them.

I guess I should just use UTF-8 instead and ditch my allegiance to
iso-8859-1.
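
For anyone following along, here is a rough sketch of what that switch
could look like for existing data. It assumes a modern Ruby with
String#encode and assumes the stored bytes are really Windows-1252
(which is what Word pastes usually turn out to be on pages labelled
iso-8859-1); the method name cp1252_to_utf8 is made up for this example.

```ruby
# Hypothetical helper: reinterpret raw bytes as Windows-1252 and
# transcode them to UTF-8. Assumes the bytes really are Windows-1252.
def cp1252_to_utf8(bytes)
  bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# "It's 9-5" as Word would paste it: a curly apostrophe (0x92) and an
# en dash (0x96), stored as raw single bytes.
raw = "It\x92s 9\x965".b

utf8 = cp1252_to_utf8(raw)
puts utf8.encoding          # the result is tagged as UTF-8
puts utf8.valid_encoding?   # and is a valid UTF-8 string
```

Once the text is held as UTF-8 and the page is served with a matching
charset, the curly quotes and dashes are ordinary characters and need no
stripping or entity replacement.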

See, this is why I asked for the code review... thanks for the shove in
the right direction, and d'oh that for some stupid reason I didn't think
of that sooner, guess I was just stuck in my old ways there!  :-D

~Dave

--

Dave Silvester
Rent-A-Monkey Website Development
Web: http://www.rentamonkey.com/
Alex Young (Guest)
on 2006-01-19 15:02
(Received via mailing list)
Dave Silvester wrote:
> I guess I should just use UTF-8 instead and ditch my allegiance to iso-8859-1.
>
You'd probably be better off shifting to CP1252 - it's pretty much
ISO-8859-1 with extra printable characters (curly quotes, long dashes
and so on) in the 0x80-0x9F range, so your existing data should be OK.
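
A quick way to see the relationship between the two encodings, sketched
here under the assumption of a modern Ruby with its built-in transcoding
tables (codepoint_for is a made-up helper name): both are single-byte,
and they differ only in what the bytes 0x80-0x9F mean.

```ruby
# Decode one raw byte under the given single-byte encoding and return
# the Unicode code point it maps to.
def codepoint_for(byte, encoding)
  [byte].pack("C").force_encoding(encoding).encode("UTF-8").ord
end

# 0x93 is a C1 control character in ISO-8859-1 ...
latin1_quote = codepoint_for(0x93, "ISO-8859-1")
# ... but a left curly quote (U+201C) in Windows-1252.
cp1252_quote = codepoint_for(0x93, "Windows-1252")

# Outside that range the two agree, e.g. 0xE9 is e-acute in both.
same_e_acute = codepoint_for(0xE9, "ISO-8859-1") ==
               codepoint_for(0xE9, "Windows-1252")

puts format("U+%04X vs U+%04X, e-acute same: %s",
            latin1_quote, cp1252_quote, same_e_acute)
```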