I am trying to search my database, which has product names with French
accents. They are encoded using HTML entity codes such as &eacute;
etc.
If a user enters a word with a French accent in the search box, I must
convert it to the HTML entity code so it can be found in the database.
So I thought to use str.sub(pattern, replacement) => new_str
However, it does not work when I try this using
product.sub('é','&eacute;'), for example in the following:
find(:all, :select => 'product_id, name', :order => "name", :conditions
=> ["name like ? and locale =?", "%#{product.sub('é','&eacute;')}%",
I18n.locale])
When I enter 'é' in the search box I get the following:
SELECT product_id, name FROM `product_descriptions` WHERE (name like
'%é%' and locale ='en')
so it does not replace the 'é' with '&eacute;'.
But if I change the letter in the pattern from 'é' to 'e' and do a
search for 'e', I get the following:
SELECT product_id, name FROM `product_descriptions` WHERE (name like
'%&eacute;%' and locale ='en')
so the replacement works.
Can anyone explain why it won't work for the character with the French
accent?
Thank you in advance.
Mitch
on 2011-01-23 16:04
on 2011-01-23 16:28
Ideally, you shouldn't have HTML entities in the database. If you need
them in your HTML (and you don't, if you set an explicit encoding,
except for things like &<>) then you should add them outside the
database.
If you do have "é" stored in the database, not as an entity, I believe
MySQL's "LIKE" will be accent-insensitive by default, unless you use
"COLLATE utf8_bin" (google for details).
Note that if you use "sub", you will only replace the first occurrence
in the string. You probably want "gsub".
And if you do something like "bléh".sub("é", "&eacute;") and it doesn't
replace the "é", the issue could be how the "é" is represented. In
Unicode, accented characters can be represented either composed as a
single code point ("latin small letter e with acute") or decomposed as
two code points: "latin small letter e" + "combining acute accent". So
if your string contains the first type of "é" and your sub/gsub tries
to replace the other type, it won't work. You can normalize the string
to ensure everything is composed or decomposed, but it would be better
not to have entities in the database.
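A minimal sketch of the sub/gsub and composed/decomposed points above. It assumes Ruby 2.2 or later, where String#unicode_normalize is available; the sample strings are made up for illustration.

```ruby
# Composed form: each "é" is the single code point U+00E9.
composed = "caf\u00E9 \u00E9t\u00E9"
# Decomposed form: "e" followed by U+0301 (combining acute accent).
decomposed = "cafe\u0301"

# sub replaces only the first match; gsub replaces every match.
first_only = composed.sub("\u00E9", "&eacute;")
all_hits   = composed.gsub("\u00E9", "&eacute;")

# The decomposed string contains no U+00E9, so the pattern never matches.
unchanged = decomposed.gsub("\u00E9", "&eacute;")

# Normalizing to NFC composes "e" + U+0301 into U+00E9, so the match succeeds.
normalized = decomposed.unicode_normalize(:nfc).gsub("\u00E9", "&eacute;")

puts first_only
puts all_hits
puts unchanged == decomposed
puts normalized
```

Normalizing user input to one form before comparing it is useful even once the entities are gone from the database, since browsers and operating systems do not all submit the same form.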
on 2011-01-24 00:14
Henrik --- wrote in post #976928:
> Ideally, you shouldn't have HTML entities in the database. If you need
> them in your HTML (and you don't, if you set an explicit encoding,
> except for things like &<>) then you should add them outside the
> database.
>
> If you do have "é" stored in the database, not as an entity, I believe
> MySQL's "LIKE" will be accent-insensitive by default, unless you use
> "COLLATE utf8_bin" (google for details).
>
> Note that if you use "sub", you will only replace the first occurrence
> in the string. You probably want "gsub".
>
> And if you do something like "bléh".sub("é", "&eacute;") and it doesn't
> replace the "é", the issue could be how the "é" is represented. In
> UTF-8, accented characters can be represented either composed as a
> single glyph ("latin small letter e with acute") or decomposed as two
> glyphs: "latin small letter e" + "combining acute accent". So if your
> string contains the first type of "é" and your sub/gsub tries to
> replace the other type, it won't work. You can normalize the string to
> ensure everything is composed or decomposed, but it would be better
> not to have entities in the database.

OK, I will take your advice and remove the HTML entities from the
database.
The reason I put them in was that even with explicit encoding I was
not getting the characters to show properly. I was getting a black
triangle with a question mark.

Could you assist me on how to encode the web page so that it shows the
accents? I thought you just use UTF-8?

Thanks for your help. I really appreciate it.
on 2011-01-24 00:20
Yes, encode the file in UTF-8 and add a tag like this in your head
section:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
on 2011-01-24 00:22
On Mon, Jan 24, 2011 at 00:14, Mitchell Gould <lists@ruby-forum.com> wrote:
>
> Ok I will take your advise and remove the html entities from the
> database.
> The reason I put them in was because even with explicit encoding I was
> not getting the characters to show properly. I was getting a black
> triangle with a question mark.
>
> Could you assist me on how to encode the web page so that it shows the
> accents. I thought you just use UTF-8?
>

Check what your browser thinks the encoding is. Check that UTF-8 is
declared in the HTTP headers or a meta element (and if they disagree,
I'm not entirely sure what goes - research that).
http://htmlpurifier.org/docs/enduser-utf8.html has some info.

Also ensure the font you're using can handle that glyph. I would guess
most fonts can display "é". But if everything else looks right, try
some standard font like Times and see what happens.
on 2011-01-24 08:39
I removed some HTML entities from my database to test the effect. I
made sure my web page is UTF-8 encoded. Now instead of "électronique"
I get name: "\xC9lectronic", where the "\xC9" displays like a black
triangle with a "?" in it. I also changed the font to Times.

I read up and learned that MySQL might be delivering the characters in
an encoding other than UTF-8. I changed my database, table, and field
to be UTF-8. I still get the same problem as stated above.

What gives?

Thanks in advance
Mitch
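A small sketch of what the "\xC9" symptom suggests, assuming the bytes arrive in ISO-8859-1 (Latin-1), where 0xC9 is "É". That diagnosis is an assumption based on the byte value, not something the thread confirms. Re-labeling the bytes this way is only a diagnostic; it is not a substitute for fixing the connection encoding.

```ruby
# The bytes as displayed in the post, held as a binary string.
raw = "\xC9lectronic".b

# Interpreted as UTF-8 the sequence is invalid: 0xC9 starts a two-byte
# sequence, but the next byte ("l") is not a continuation byte.
as_utf8 = raw.dup.force_encoding("UTF-8")
valid = as_utf8.valid_encoding?

# Labeling the same bytes as ISO-8859-1 and transcoding yields valid UTF-8.
fixed = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts valid
puts fixed
```

A browser told to expect UTF-8 shows its replacement character (the black shape with a question mark) for an invalid byte like this one.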
on 2011-01-24 08:55
Hi,

I figured it all out. I need to explicitly tell Rails that the database
is using utf8 encoding by putting the following in the database.yml
file:

encoding: utf8

Now it displays perfectly.

I hope this is still in line with best practices, as I don't want to
mess this up again.

Thanks
Mitch
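For context, a rough sketch of where that setting might sit in database.yml. The adapter, database name, and credentials below are hypothetical placeholders, not taken from the thread; only the encoding line is. The YAML is parsed from a string here so the example runs on its own.

```ruby
require "yaml"

# Hypothetical database.yml contents; only "encoding: utf8" comes from the post.
DATABASE_YML = <<~CONFIG
  development:
    adapter: mysql2
    database: shop_development
    username: root
    password:
    host: localhost
    encoding: utf8
CONFIG

config = YAML.safe_load(DATABASE_YML)
dev = config.fetch("development")

# The encoding key is set per environment, alongside the other connection settings.
puts dev["encoding"]
```

Each environment section (development, test, production) carries its own connection settings, so the encoding line would normally be repeated or shared across all of them.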