Ferret for professionals

Hello,

I'm trying to set up the Ferret search engine (which I did successfully for
searching English phrases and words).

But it doesn't work for UTF-8 characters!

I tried to find a solution for this problem:

  1. I added the following lines to environment.rb:
    ENV['LANG'] = 'de_DE.UTF-8@euro'
    ENV['LC_TIME'] = 'C'
    require 'acts_as_ferret'
  2. Next, using scaffold, I added a few posts (my model, which I index
    this way):
    acts_as_ferret :fields => [ :name, :post ]

Searching for English phrases works well, but for German I get an empty
result array.

I don't know what to do, so I'm asking the professionals who do this!

Thanks for your replies!

Hey …

I just set the following:

ENV['LC_CTYPE'] = 'en_US.UTF-8'
Ferret.locale = "en_US.UTF-8"

see http://bugs.omdb.org/browser/trunk/config/environment.rb

Searching for UTF-8 works great; I've indexed text in German, English,
Asian languages, Hebrew and others.

Ben
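Why the locale matters at all: as far as I know, Ferret's analyzers are written in C and rely on the C library's locale (what `Ferret.locale=` and LC_CTYPE control) to decode multibyte characters. The plain-Ruby sketch below is only an analogy, not Ferret's tokenizer: it contrasts an ASCII-only notion of "letter" with a Unicode-aware one on the same UTF-8 text. The helper names are hypothetical, and `\p{L}` needs Ruby 1.9 or later.

```ruby
# encoding: utf-8
# Analogy only: this is NOT Ferret's tokenizer. It contrasts two
# hypothetical definitions of "word characters" on the same UTF-8 text.

# Treats only A-Z/a-z as letters, like a tokenizer that cannot decode
# multibyte UTF-8 sequences.
def ascii_tokens(text)
  text.scan(/[A-Za-z]+/)
end

# Treats any Unicode letter (\p{L}) as part of a word, like a
# UTF-8-aware tokenizer.
def unicode_tokens(text)
  text.scan(/\p{L}+/)
end

text = "Grüße aus Dresden"
p ascii_tokens(text)   # => ["Gr", "e", "aus", "Dresden"]
p unicode_tokens(text) # => ["Grüße", "aus", "Dresden"]
```

In the ASCII-only view "Grüße" never becomes a token, so a query for it could not match anything, which would be consistent with the empty result arrays described above.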

Benjamin K. wrote:

Thank you for your reply, but even this doesn't help me. Please check my
sources in the attachment.

Here are two controllers: add (for adding new posts) and search.

Thanks

On Sun, Sep 02, 2007 at 12:03:59PM +0200, Igor K. wrote:

Does this apply to searching via a web form, or in unit tests? Maybe
non-ASCII characters in your queries get garbled before Ferret even gets
to see them?

The same applies to the content you save.

Jens


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[email protected] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
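A minimal illustration of the kind of garbling Jens describes: if some layer between the browser, the database and Ferret decodes UTF-8 bytes as Latin-1, the text that arrives no longer matches what was indexed. This sketch assumes Ruby 1.9 or later (`String#force_encoding` and `String#encode` do not exist in Ruby 1.8); `misread_as_latin1` is a hypothetical helper that simulates the faulty layer.

```ruby
# encoding: utf-8

# Hypothetical stand-in for a misconfigured layer: it takes UTF-8 text,
# wrongly reinterprets its bytes as ISO-8859-1 (Latin-1), and converts the
# result back to UTF-8.
def misread_as_latin1(utf8_text)
  utf8_text.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

original = "Müller"
garbled  = misread_as_latin1(original)

# "ü" is two bytes in UTF-8 (C3 BC); read as Latin-1 they become "Ã" and "¼".
puts original.bytes.map { |b| format("%02X", b) }.join(" ") # prints "4D C3 BC 6C 6C 65 72"
puts garbled                                                # prints "MÃ¼ller"
puts garbled == original                                    # prints "false"
```

Pure ASCII text passes through such a layer unchanged, which is one way English searches can keep working while German ones fail.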

Jens K. wrote:

Does this apply to searching via a web form, or in unit tests? Maybe
non-ASCII characters in your queries get garbled before Ferret even gets
to see them?

No, I use UTF-8 in the database, controllers and environment.rb.
When I search with Post.find(:all, :conditions => ['name like ?',
'SOME-UTF8-TEXT']) it finds the records.

Has anybody looked at my test project in the attachment?

Thank you for your replies,

I will check it!

Maybe it depends on the operating system? I have Win XP Home SP2 (Russian).
Where can I get a list of locales for other countries ('en_US.UTF-8',
'de_DE.UTF-8')?

Thanks

On Tue, Sep 04, 2007 at 11:28:11AM +0200, Igor K. wrote:

Has anybody looked at my test project in the attachment?

Yes, I did. After changing the line setting ENV['LANG']
to 'en_US.UTF-8' in config/environment.rb (because I don't have
the de_DE.UTF-8@euro locale installed on my system), it works perfectly.

Before that it didn't, and I couldn't even enter German umlauts on the Rails
console.

So you should make sure the locale you set LANG to also exists on your
system. ‘dpkg-reconfigure locales’ can be used on Debian/Ubuntu to check
which locales you have and enable more if you like.

cheers,
Jens
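Checking whether a locale "exists on your system" can be automated. On glibc-based Linux systems `locale -a` lists installed locales with a normalized codeset (for example `en_US.utf8` rather than `en_US.UTF-8`), so a plain string comparison can give a false negative. A sketch with hypothetical helper names that normalizes the codeset before comparing; it takes the list as a string instead of running `locale -a` itself.

```ruby
# Normalizes a locale name such as "de_DE.UTF-8@euro" into the form glibc's
# `locale -a` prints ("de_DE.utf8@euro"). Assumption: lower-casing the
# codeset and dropping non-alphanumerics is enough for the UTF-8 case.
def normalize_locale(name)
  base, modifier = name.strip.split("@", 2)
  return "" if base.nil?
  lang, codeset = base.split(".", 2)
  normalized = lang.to_s.dup
  normalized << "." << codeset.downcase.gsub(/[^a-z0-9]/, "") if codeset
  normalized << "@" << modifier if modifier
  normalized
end

# True if +wanted+ appears in +locale_list+ (one name per line, as printed
# by `locale -a`). Blank lines are ignored.
def locale_available?(wanted, locale_list)
  target = normalize_locale(wanted)
  locale_list.each_line.any? { |line| !line.strip.empty? && normalize_locale(line) == target }
end

installed = "C\nPOSIX\nen_US.utf8\n"
puts locale_available?("en_US.UTF-8", installed)      # prints "true"
puts locale_available?("de_DE.UTF-8@euro", installed) # prints "false"
```

On Linux the second argument could be the captured output of `locale -a` (for example via Ruby backticks).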



On Tue, Sep 04, 2007 at 04:40:15PM +0200, Igor K. wrote:

Maybe it depends on the operating system? I have Win XP Home SP2 (Russian).
Where can I get a list of locales for other countries ('en_US.UTF-8',
'de_DE.UTF-8')?

IMHO you should be in a UTF-8 environment by default then; however, I
don't have Windows to check this. For starters, maybe just start irb
and have a look at ENV['LANG']?

Jens
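One caveat when looking at ENV['LANG']: under the POSIX rules LANG is only the fallback. LC_ALL overrides every category, and LC_CTYPE (the category that governs character handling) overrides LANG, which is why setting LC_CTYPE as in Ben's environment.rb can be enough. A small sketch with hypothetical helper names that resolves the effective character-type locale from any hash-like object (in irb you could pass ENV itself):

```ruby
# Resolves the locale that governs character handling (the LC_CTYPE
# category), following the POSIX precedence: LC_ALL, then LC_CTYPE,
# then LANG; unset or empty variables are skipped, and "C" is the default.
def effective_ctype(env)
  %w[LC_ALL LC_CTYPE LANG].each do |var|
    value = env[var]
    return value unless value.nil? || value.empty?
  end
  "C"
end

# Rough heuristic (assumption): the codeset is spelled UTF-8 or UTF8.
def utf8_ctype?(env)
  !!(effective_ctype(env) =~ /\.utf-?8/i)
end

# Hypothetical environment: LANG is UTF-8, but an LC_CTYPE of "C" wins.
env = { "LANG" => "de_DE.UTF-8@euro", "LC_CTYPE" => "C" }
puts effective_ctype(env) # prints "C"
puts utf8_ctype?(env)     # prints "false"

# In irb, the real process environment can be passed directly:
puts effective_ctype(ENV)
```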


Hello,
Thanks for the replies :-) you really help me.
I have tried changing:

  • the Windows locale to Russia
  • environment.rb:
    ENV['LC_CTYPE'] = 'ru_RU.UTF-8'
    Ferret.locale = "ru_RU.UTF-8"
  • and, importantly, the database locale from UTF8-unicode to
    UTF8-general.

But as I expected, that was not the end of my problems:

  • it indexes data incorrectly; for example, some data came into the index
    db. The other problem is that some objects can be found and some
    (Russian objects) cannot.
    When I search for '*' I get all objects, but with '**' I get only
    some of them.

What versions of ferret-server and acts_as_ferret, and which locale in the
MySQL database, do you use?

Thanks

Thank you very much for the answers,

but I still have problems:

HTTP/HTML
content-type delivered by web server
should be ‘Content-Type: text/html; charset=utf-8’
html content-type meta tag
should be ‘<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />’

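For this I did the following:
application.rhtml

application.rb:

    class ApplicationController < ActionController::Base
      before_filter :set_charset

      private

      # Sends an explicit UTF-8 charset with every response.
      def set_charset
        if request.xhr?
          headers["Content-Type"] = "text/javascript; charset=utf-8"
        else
          headers["Content-Type"] = "text/html; charset=utf-8"
        end
      end
    end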

Mysql settings:
In your application’s console, execute the following:
r = Post.connection.execute "SHOW VARIABLES LIKE 'character%'"
r.each { |row| puts row }
This should result in something like this:

character_set_client      utf8
character_set_connection  utf8
character_set_database    utf8

Here are my results:
[“character_set_client”, “latin1”]
[“character_set_connection”, “latin1”]
[“character_set_database”, “utf8”]
[“character_set_filesystem”, “binary”]
[“character_set_results”, “latin1”]
[“character_set_server”, “latin1”]
[“character_set_system”, “utf8”]
[“character_sets_dir”, “C:\InstantRails\mysql\share\charsets\”]

If your output differs, try the following:

set ‘encoding: utf8’ in environment.rb
(this affects the character_set_connection and character_set_client
values)

In environment.rb I added:
ENV['LC_CTYPE'] = 'ru_RU.UTF8'
ENV['LANG'] = 'ru_RU.UTF8'
$KCODE = 'u'
require 'acts_as_ferret'

set ‘default-character-set=utf8’ in the mysqld section of the mysql
configuration (/etc/mysql/my.cnf on linux). After restart of the server,
newly created databases will be utf8 by default, and new tables in these
databases will inherit this setting. Maybe it’s possible to change the
character set of existing databases/tables, too, however your data will
have to be converted, too. The per database setting imho is only a
default setting applied to new tables.

What else should I do?
Thanks

On Wed, Sep 05, 2007 at 11:16:20AM +0200, Igor K. wrote:

If your output differs, try the following:

set ‘encoding: utf8’ in environment.rb
(this affects the character_set_connection and character_set_client
values)

Sorry, my mistake. You need to set ‘encoding: utf8’ in database.yml for
your db connections.

Looking at the output above this should fix your problem.

Jens
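Because `encoding: utf8` must appear in every connection block of database.yml, a short script can flag environments that lack it. A sketch using Ruby's standard YAML library (Psych); the sample configuration and database names are made up, ERB tags in database.yml are not handled, and the squiggly heredoc needs Ruby 2.3 or later.

```ruby
require "yaml"

# Made-up database.yml contents; only the "encoding" keys matter here.
DATABASE_YML = <<~YAML
  development:
    adapter: mysql
    database: ferrettest_development
    encoding: utf8
  production:
    adapter: mysql
    database: ferrettest_production
YAML

# Returns the names of environments whose settings lack "encoding: utf8"
# (the comparison is case-insensitive).
def environments_missing_utf8(yaml_text)
  config = YAML.safe_load(yaml_text)
  config.reject { |_env, settings| settings["encoding"].to_s.downcase == "utf8" }.keys
end

p environments_missing_utf8(DATABASE_YML) # => ["production"]
```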


No problem, you really helped me anyway :-)

Now I get:
[“character_set_client”, “utf8”]
[“character_set_connection”, “utf8”]
[“character_set_database”, “utf8”]
[“character_set_filesystem”, “binary”]
[“character_set_results”, “utf8”]
[“character_set_server”, “latin1”]
[“character_set_system”, “utf8”]
[“character_sets_dir”, “C:\InstantRails\mysql\share\charsets\”]

Is [“character_set_server”, “latin1”] okay?

On Wed, Sep 05, 2007 at 08:57:43AM +0200, Igor K. wrote:

  • the Windows locale to Russia
  • environment.rb:
    ENV['LC_CTYPE'] = 'ru_RU.UTF-8'
    Ferret.locale = "ru_RU.UTF-8"
  • and, importantly, the database locale from UTF8-unicode to
    UTF8-general.

But as I expected, that was not the end of my problems:

  • it indexes data incorrectly; for example, some data came into the index
    db. The other problem is that some objects can be found and some
    (Russian objects) cannot.

Have you checked with the Ferret browser to make sure it really indexed
incorrect values? Have you rebuilt your index after changing these locale
settings?

When I search for '*' I get all objects, but with '**' I get only
some of them.

I’m not sure what Ferret does with a query like ‘**’…

What versions of ferret-server and acts_as_ferret, and which locale in the
MySQL database, do you use?

I used the aaf from inside your app when checking it out, and ferret
0.11.4.

Here’s a small checklist I use for making sure everything is UTF-8:

HTTP/HTML
content-type delivered by web server
should be ‘Content-Type: text/html; charset=utf-8’
html content-type meta tag
should be ‘<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />’
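The header part of this item can be checked against the Content-Type value a page is actually served with (as shown by the browser's page info or any tool that displays response headers). A sketch with a hypothetical helper that parses such a value and looks for a utf-8 charset parameter:

```ruby
# Returns true if a Content-Type header value declares charset=utf-8.
# The parameter name and value are compared case-insensitively, and
# surrounding double quotes on the value are tolerated.
def utf8_content_type?(header_value)
  _media_type, *params = header_value.split(";").map(&:strip)
  params.any? do |param|
    name, value = param.split("=", 2)
    name.to_s.downcase == "charset" &&
      value.to_s.delete('"').downcase == "utf-8"
  end
end

puts utf8_content_type?("text/html; charset=utf-8")      # prints "true"
puts utf8_content_type?("text/html")                     # prints "false"
puts utf8_content_type?("text/html; charset=ISO-8859-1") # prints "false"
```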

Mysql settings:
In your application’s console, execute the following:
r = Post.connection.execute "SHOW VARIABLES LIKE 'character%'"
r.each { |row| puts row }
This should result in something like this:

character_set_client      utf8
character_set_connection  utf8
character_set_database    utf8
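Comparing real SHOW VARIABLES output with this list by eye is easy to get wrong; the rows can also be checked in a few lines of Ruby. A sketch, assuming the rows come as [name, value] pairs like the ones Igor posted (a literal array here, so no database connection is needed); `non_utf8_variables` is a hypothetical helper and `Array#to_h` needs Ruby 2.1 or later.

```ruby
# The three variables from the checklist that should report utf8.
REQUIRED_UTF8 = %w[character_set_client character_set_connection character_set_database].freeze

# Takes [name, value] rows as returned by SHOW VARIABLES LIKE 'character%'
# and returns the checklist variables whose value is not utf8.
def non_utf8_variables(rows)
  values = rows.to_h
  REQUIRED_UTF8.reject { |name| values[name].to_s.downcase == "utf8" }
end

rows = [
  ["character_set_client", "latin1"],
  ["character_set_connection", "latin1"],
  ["character_set_database", "utf8"],
  ["character_set_server", "latin1"]
]
p non_utf8_variables(rows) # => ["character_set_client", "character_set_connection"]
```

character_set_server is deliberately not checked; per Jens's later reply, latin1 there is fine as long as the application's database is utf8.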

If your output differs, try the following:

set ‘encoding: utf8’ in environment.rb
(this affects the character_set_connection and character_set_client
values)

set ‘default-character-set=utf8’ in the mysqld section of the mysql
configuration (/etc/mysql/my.cnf on linux). After restart of the server,
newly created databases will be utf8 by default, and new tables in these
databases will inherit this setting. Maybe it’s possible to change the
character set of existing databases/tables, too, however your data will
have to be converted, too. The per database setting imho is only a
default setting applied to new tables.
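For existing databases and tables, MySQL offers `ALTER DATABASE ... CHARACTER SET` (which only changes the default for new tables, as described above) and `ALTER TABLE ... CONVERT TO CHARACTER SET` (which re-encodes existing column data). A sketch that only builds these statements for review and executes nothing; the database and table names are hypothetical. CONVERT TO assumes the stored bytes really are in the column's current character set.

```ruby
# Builds (but does not execute) MySQL statements that set a database's
# default character set to utf8 and convert existing tables' data.
# Identifiers are backtick-quoted, doubling any embedded backticks.
def utf8_conversion_sql(database, tables)
  quote = ->(identifier) { "`" + identifier.to_s.gsub("`", "``") + "`" }
  statements = ["ALTER DATABASE #{quote.call(database)} CHARACTER SET utf8;"]
  tables.each do |table|
    statements << "ALTER TABLE #{quote.call(table)} CONVERT TO CHARACTER SET utf8;"
  end
  statements
end

# Hypothetical database and table names.
puts utf8_conversion_sql("ferrettest_development", %w[posts])
# ALTER DATABASE `ferrettest_development` CHARACTER SET utf8;
# ALTER TABLE `posts` CONVERT TO CHARACTER SET utf8;
```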

With all these settings in place, everything should be fine on the UTF-8
front :-)

Cheers,
Jens



On Wed, Sep 05, 2007 at 11:32:35AM +0200, Igor K. wrote:

is it okay → [“character_set_server”, “latin1”]???

yeah, as long as your app’s database is set to utf8.

ferret-browser is started with

ferret-browser path/to/index

Jens


It seems that Ferret works incorrectly;
please check my sources and the screenshot 22.gif.

PS: when I start the server I get a warning:
C:/ruby/lib/ruby/gems/1.8/gems/acts_as_ferret-0.4.1/lib/ferret_server.rb:123:
warning: parenthesize argument(s) for future version

And how do I start the Ferret browser?

On Wed, Sep 05, 2007 at 12:00:03PM +0200, Igor K. wrote:

It seems that Ferret works incorrectly

In general I don't think so :-)

please check my sources and the screenshot 22.gif.

Where?

PS: when I start the server I get a warning:
C:/ruby/lib/ruby/gems/1.8/gems/acts_as_ferret-0.4.1/lib/ferret_server.rb:123:
warning: parenthesize argument(s) for future version

Nothing to worry about…

Jens


Here is the attachment.

can’t see what’s wrong with the screenshot (besides the missing images
for yes/no).

did you check the index contents?

What do you mean?

Please check another attachment: why can I see only the id field?

On Wed, Sep 05, 2007 at 12:06:50PM +0200, Igor K. wrote:

Attachments:
http://www.ruby-forum.com/attachment/228/ferrettest3.rar

can’t see what’s wrong with the screenshot (besides the missing images
for yes/no). did you check the index contents?

Jens

