Forum: Rails I18n Help with str.sub(pattern, replacement) and French characters

Mitchell Gould (mitch_newbie)
on 2011-01-23 16:04
I am trying to search my database, which has product names with French
accents. They are encoded using HTML entity codes such as &eacute;
etc.

If a user enters a word with a French accent in the search box, I must
convert it to the HTML entity code so it can be found in the database.

So I thought I would use str.sub(pattern, replacement) => new_str

However, it does not work when I try this using
product.sub('é','&eacute;'), for example in the following finder:

find(:all, :select => 'product_id, name', :order => "name", :conditions
=> ["name like ? and locale =?", "%#{product.sub('é','&eacute;')}%",
I18n.locale])

When I enter 'é' in the search box I get the following:

   SELECT product_id, name FROM `product_descriptions` WHERE (name like
'%é%' and locale ='en')

so it does not replace the 'é' with '&eacute;'.

But if I change the pattern from 'é' to 'e' and do a search for 'e', I
get the following:

   SELECT product_id, name FROM `product_descriptions` WHERE (name like
'%&eacute;%' and locale ='en')

so the replacement works.

Can anyone explain why it won't work for the character with the French
accent?

Thank you in advance.
Mitch
Henrik --- (malesca)
on 2011-01-23 16:28
(Received via mailing list)
Ideally, you shouldn't have HTML entities in the database. If you need
them in your HTML (and you don't, if you set an explicit encoding,
except for things like &<>) then you should add them outside the
database.

If you do have "é" stored in the database, not as an entity, I believe
MySQL's "LIKE" will be accent-insensitive by default, unless you use
"COLLATE utf8_bin" (google for details).

Note that if you use "sub", you will only replace the first occurrence
in the string. You probably want "gsub".
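
To illustrate the difference, here is a minimal sketch. The search term
is made up for the example, and the accented characters are written as
\u escapes so the snippet does not depend on the encoding of the source
file.

```ruby
# Hypothetical search term containing two accented characters ("été").
term = "\u00E9t\u00E9"

# sub stops after the first match; gsub replaces every match.
first_only = term.sub("\u00E9", "&eacute;")
all        = term.gsub("\u00E9", "&eacute;")

puts first_only
puts all
```

Only the gsub result has both accented characters converted.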

And if you do something like "bléh".sub("é", "&eacute;") and it doesn't
replace the "é", the issue could be how the "é" is represented. In
Unicode, accented characters can be represented either composed as a
single code point ("latin small letter e with acute") or decomposed as
two code points: "latin small letter e" + "combining acute accent". So
if your string contains the first type of é and your sub/gsub tries to
replace the other type, it won't work. You can normalize the string to
ensure everything is composed or decomposed, but it would be better not
to have entities in the database.
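
A small sketch of the composed versus decomposed mismatch, assuming
Ruby 2.2 or later, where String#unicode_normalize is available (older
Rubies would need a different approach). The sample word is made up.

```ruby
composed   = "\u00E9"    # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two look identical on screen but are different strings.
same_before = (composed == decomposed)

# A composed pattern does not match decomposed text, so nothing changes.
word      = "bl#{decomposed}h"
unchanged = word.gsub(composed, "&eacute;")

# Normalizing to NFC (composed form) first lets the replacement apply.
replaced = word.unicode_normalize(:nfc).gsub(composed, "&eacute;")

puts same_before
puts unchanged == word
puts replaced
```

Normalizing both the stored text and the search term to the same form
avoids this kind of silent mismatch.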
Mitchell Gould (mitch_newbie)
on 2011-01-24 00:14
OK, I will take your advice and remove the HTML entities from the
database.
The reason I put them in was that even with explicit encoding I was
not getting the characters to show properly. I was getting a black
triangle with a question mark.

Could you help me with how to encode the web page so that it shows the
accents? I thought you just use UTF-8.

Thanks for your help. I really appreciate it.
Andrés Mejía (Guest)
on 2011-01-24 00:20
(Received via mailing list)
Yes, encode the file in UTF-8 and add a tag like this in your head
section:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Henrik --- (malesca)
on 2011-01-24 00:22
(Received via mailing list)
Check what your browser thinks the encoding is. Check that UTF-8 is
declared in the HTTP headers or a meta element (and if they disagree,
I'm not entirely sure which wins; research that).
http://htmlpurifier.org/docs/enduser-utf8.html has some info.

Also ensure the font you're using can handle that glyph. I would guess
most fonts can display é. But if everything else looks right, try some
standard font like Times and see what happens.
Mitchell Gould (mitch_newbie)
on 2011-01-24 08:39
I removed some HTML entities from my database to test the effect. I made
sure my web page is UTF-8 encoded.

Now instead of "électronique" I get name: "\xC9lectronic", where the
"\xC9" displays as a black triangle with a "?" in it.

I also changed the font to Times.

I read up and learned that MySQL might be delivering the characters in a
format other than UTF-8.
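
A short sketch of why a lone "\xC9" byte shows up as a replacement
symbol: 0xC9 is "É" in ISO-8859-1 (Latin-1), but on its own it is not a
valid UTF-8 sequence. That the bytes arrive over a Latin-1 connection is
an assumption made for this example; it requires Ruby 2.0 or later for
String#b.

```ruby
# Raw bytes as they might arrive from a Latin-1 database connection
# (assumed for illustration).
raw = "\xC9lectronic".b

# Interpreted as UTF-8, the byte sequence is invalid, so a browser
# would render a replacement character for it.
valid_as_utf8 = raw.dup.force_encoding("UTF-8").valid_encoding?

# Interpreted as ISO-8859-1 and transcoded, it becomes proper UTF-8.
fixed = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts valid_as_utf8
puts fixed
```

Transcoding like this only illustrates the mismatch; making the
connection itself deliver UTF-8 is the cleaner fix.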

I changed my database, table, and field to be UTF-8.

I still get the same problem as stated above.

What gives?

Thanks in advance

Mitch
Mitchell Gould (mitch_newbie)
on 2011-01-24 08:55
Hi,
I figured it all out. I need to explicitly tell Rails that the database
is using utf8 encoding by putting the following in the database.yml
file:

  encoding: utf8

Now it displays perfectly.
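
For context, a rough sketch of where that key sits within an
environment entry. The adapter, database name and username below are
placeholders, not taken from this thread; the fragment is parsed with
Ruby's YAML library only to show the key's position. The squiggly
heredoc needs Ruby 2.3 or later.

```ruby
require "yaml"

# Hypothetical database.yml fragment; every value except the encoding
# line is a placeholder.
config = YAML.safe_load(<<~CONFIG)
  development:
    adapter: mysql2
    database: shop_development
    username: root
    encoding: utf8
CONFIG

# The encoding key belongs to the environment's own settings block.
puts config["development"]["encoding"]
```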

I hope this is still in line with best practices as I don't want to mess
this up again.

Thanks

Mitch