Special characters nightmare

dubstep · May 4, 2011, 1:05pm

Hi,

I’m working on an application that is localized in several languages and
I’m facing some problems with special characters.

I have a name field that can take inputs such as:

Validación
Tom's card

So the two strings above contain the special characters ' and ´.

In the application I need to feed those to a javascript function using
onclick, so I use:

onclick="remove('<%= card.name %>')"

Doing it that way, the HTML results in:

onclick="remove('Validación')"
onclick="remove('Tom's card')"

You can see that the first one is ok, but the second one is not because
of the ' in between the m and the s, which causes the javascript
to fail.
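
For anyone reading along, here is a minimal sketch that reproduces the problem outside the application. The template string and the `render_onclick` helper are stand-ins made up for illustration, not the poster's actual view code; the point is only that the apostrophe ends the JavaScript string literal early.

```ruby
require 'erb'

# Hypothetical stand-in for the view: the same onclick attribute,
# rendered once per name.
template = ERB.new(%q{onclick="remove('<%= name %>')"})

def render_onclick(template, name)
  template.result_with_hash(name: name)
end

ok     = render_onclick(template, "Validación")
broken = render_onclick(template, "Tom's card")

puts ok      # onclick="remove('Validación')"
puts broken  # onclick="remove('Tom's card')"

# An odd number of single quotes means the JavaScript string literal
# is closed right after "Tom", leaving "s card')" as a syntax error.
puts ok.count("'")      # 2
puts broken.count("'")  # 3
```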

To avoid that I can use:

onclick="remove('<%= CGI::escape(tag.name) %>')"

And that results in:

onclick="remove('Tom%27s+card')"
onclick="remove('Validaci%C3%B3n')"

The name in the remove function is used to show a confirmation, so I have
to show the name back to the user.

So I use the javascript call unescape(card_name.replace(/\+/g, " ")) to
convert the %27 and + back to ' and space to show the original. And that
works fine.

BUT now the other one is converted to ValidaciÃ³n
Note the Ã³
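
The effect can be reproduced in Ruby alone. This is only a sketch: `latin1_style_unescape` is a made-up helper that roughly imitates what JavaScript's unescape() does (each %XX becomes one character with that code point), and `utf8_unescape` decodes the same bytes as UTF-8, which is what decodeURIComponent() does on the browser side.

```ruby
require 'cgi'

escaped = CGI.escape("Validación")   # percent-encodes the UTF-8 bytes

# Rough imitation of JavaScript's unescape(): every %XX turns into the
# single character with that code point, so the two UTF-8 bytes of "ó"
# come back as two separate characters.
def latin1_style_unescape(str)
  str.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr(Encoding::UTF_8) }
end

# Decoding the same bytes as UTF-8 instead. CGI.unescape also turns
# "+" back into a space.
def utf8_unescape(str)
  CGI.unescape(str)
end

puts escaped                                  # Validaci%C3%B3n
puts latin1_style_unescape(escaped)           # ValidaciÃ³n
puts utf8_unescape(escaped)                   # Validación
puts utf8_unescape(CGI.escape("Tom's card"))  # Tom's card
```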

I can’t find a way to fix one without breaking the other.
Any suggestions to solve this problem?

Cheers.

comopasta · May 4, 2011, 3:16pm

On Wed, 04 May 2011 13:05:26 +0200, comopasta Gr wrote:

onclick="remove('Tom%27s+card')"
onclick="remove('Validaci%C3%B3n')"
:
BUT now the other one is converted to ValidaciÃ³n
Note the Ã³

There are two parts to this. Firstly, you are using percent-encoding (aka
URL encoding), which maps spaces to plus signs and special characters to
%XY tokens, with X and Y being hexadecimal digits. This works as intended.
Secondly, you are encoding the characters according to utf-8, which maps
“'” to %27 and “ó” to %C3%B3. However, the result is displayed as if it
were ISO-8859-15 or Windows-1252 (aka CP 1252), which interprets %C3 as Ã
and %B3 as ³.

Try either switching to utf-8 or - if this is not possible - using %F3 as
the proper encoding of “ó” for the given character set.
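
To make the byte values concrete, here is a small sketch of how the same character looks in each encoding. The variable names are chosen for illustration only, and it assumes the script itself is saved as UTF-8 (Ruby's default source encoding).

```ruby
require 'cgi'

utf8   = "ó"                         # two bytes in UTF-8
latin1 = utf8.encode("ISO-8859-1")   # same character, one byte

utf8_hex   = utf8.bytes.map { |b| "%02X" % b }
latin1_hex = latin1.bytes.map { |b| "%02X" % b }
puts utf8_hex.join(" ")     # C3 B3
puts latin1_hex.join(" ")   # F3

# Percent-encoding works on bytes, so the result depends on the
# encoding of the string that is passed in.
utf8_escaped   = CGI.escape(utf8)
latin1_escaped = CGI.escape(latin1)
puts utf8_escaped     # %C3%B3
puts latin1_escaped   # %F3

# Reading the UTF-8 bytes as Windows-1252 gives the two characters
# seen in the original post.
mojibake = utf8.dup.force_encoding("Windows-1252").encode("UTF-8")
puts mojibake         # Ã³
```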

Jupp

comopasta · May 4, 2011, 3:30pm

Hi,

onclick="remove('Tom's card')"

You can see that the first one is ok, but the second one is not because
of the ' in between the m and the s, which causes the javascript
to fail.

One way to avoid this is to escape the apostrophe character (') with
a backslash character (\).

Please try the following code:
onclick="remove('<%= card.name.gsub( "'" ){ "\\'" } %>')"
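
A short sketch of why the block form of gsub matters here, using the sample names from the first post. In a replacement string passed as the second argument, \' is a back-reference meaning "everything after the match", so the two forms behave differently.

```ruby
name = "Tom's card"

# Block form: the block's return value is inserted literally, so each
# apostrophe becomes backslash + apostrophe.
escaped = name.gsub("'") { "\\'" }
puts escaped       # Tom\'s card

# String form: \' in a replacement string stands for the text after
# the match, so the apostrophe is replaced by "s card".
surprising = name.gsub("'", "\\'")
puts surprising    # Toms cards card

# A name without an apostrophe passes through unchanged.
accented = "Validación".gsub("'") { "\\'" }
puts accented      # Validación
```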

comopasta · May 4, 2011, 3:48pm

Moreover, you should escape the double quotation characters (") and
the ampersand characters (&) in an HTML attribute string enclosed by
double quotation characters.
So you should use the ERB::Util.h method [1].

require 'ERB'
include ERB::Util

onclick="remove('<%=h card.name.gsub( "'" ){ "\\'" } %>')"

(note the h method next to '=')

[1]
http://www.ruby-doc.org/stdlib/libdoc/erb/rdoc/classes/ERB/Util.html#M000869
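
Putting both escaping steps together, a minimal sketch might look like this. The `js_attr` helper and the third sample name are made up for illustration and are not part of the thread or of ERB. Note that current versions of `h` also turn the apostrophe into a character reference; the browser decodes that again before the JavaScript runs, so the backslash still does its job.

```ruby
require 'erb'
include ERB::Util

# Hypothetical helper (not from the thread): escape for the JavaScript
# string literal first, then for the surrounding HTML attribute.
def js_attr(name)
  h(name.gsub("'") { "\\'" })
end

template = ERB.new(%q{onclick="remove('<%= js_attr(name) %>')"})

# The third name is an invented example containing & and ".
["Validación", "Tom's card", 'R&B "hits"'].each do |name|
  puts template.result_with_hash(name: name)
end
```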

Regards,

comopasta · May 4, 2011, 4:31pm

Sorry, there was a mistake.

incorrect : require 'ERB'
correct : require 'erb'

comopasta · May 5, 2011, 12:32pm

Y. NOBUOKA wrote in post #996593:

Hi,

onclick="remove('Tom's card')"

You can see that the first one is ok, but the second one is not because
of the ' in between the m and the s, which causes the javascript
to fail.

One way to avoid this is to escape the apostrophe character (') with
a backslash character (\).

Please try the following code:
onclick="remove('<%= card.name.gsub( "'" ){ "\\'" } %>')"

I’ve done that and it works without disturbing the accents in Spanish.

Thanks a lot.

comopasta · May 5, 2011, 12:35pm

Josef ‘Jupp’ Schugt wrote in post #996590:

On Wed, 04 May 2011 13:05:26 +0200, comopasta Gr wrote:

onclick="remove('Tom%27s+card')"
onclick="remove('Validaci%C3%B3n')"
:
BUT now the other one is converted to ValidaciÃ³n
Note the Ã³

There are two parts to this. Firstly, you are using percent-encoding (aka
URL encoding), which maps spaces to plus signs and special characters to
%XY tokens, with X and Y being hexadecimal digits. This works as intended.
Secondly, you are encoding the characters according to utf-8, which maps
“'” to %27 and “ó” to %C3%B3. However, the result is displayed as if it
were ISO-8859-15 or Windows-1252 (aka CP 1252), which interprets %C3 as Ã
and %B3 as ³.

Try either switching to utf-8 or - if this is not possible - using %F3 as
the proper encoding of “ó” for the given character set.

Jupp

Hi, thanks for the tip. As replied earlier I managed to solve the issue
with gsub. I just wanted to comment that utf-8 should be in use already.
At least based on the meta info generated:

Thanks again.