Ruby Forum Ferret > ferret for professionals

Posted by Igor K. (demoversion)
on 02.09.2007 12:03
Hello,

I'm trying to set up the Ferret search engine (which I did successfully
for searching English phrases and words).

But it doesn't work for UTF-8 characters!

I tried to find a solution for this problem:
1) I added the following lines to environment.rb:
  ENV['LANG'] = 'de_DE.UTF-8@euro'
  ENV['LC_TIME'] = 'C'
  require 'acts_as_ferret'
2) Next, using scaffold, I added a few posts (my model), which I index
this way:
    acts_as_ferret :fields => [ :name, :post ]

Searching for English phrases works well, but for German I get an empty
result array.

I don't know what to do, so I'm asking the professionals who do this!

Thanks for your replies!
Posted by Benjamin Krause (Guest)
on 02.09.2007 13:28
(Received via mailing list)
Hey ..

I just set the following:

ENV['LC_CTYPE'] = 'en_US.UTF-8'
Ferret.locale = "en_US.UTF-8"

see http://bugs.omdb.org/browser/trunk/config/environment.rb

Searching for UTF-8 works great; I've indexed characters
in German, English, Asian languages, Hebrew and others.

Ben
Posted by Igor K. (demoversion)
on 03.09.2007 10:08
Attachment: ferrettest2.rar (82,1 KB)
Benjamin Krause wrote:
> Hey ..
> 
> i just set the followings:
> 
> ENV['LC_CTYPE'] = 'en_US.UTF-8'
> Ferret.locale = "en_US.UTF-8"
> 
> see http://bugs.omdb.org/browser/trunk/config/environment.rb
> 
> searching for utf-8 works great, i've indexed characters
> in german, english, asian languages, hebrew and other..
> 
> Ben

Thank you for your reply, but even this doesn't help. Please check my
sources in the attachment.

There are 2 controllers: add (for adding new posts) and search.

Thanks
Posted by Jens Kraemer (Guest)
on 04.09.2007 09:57
(Received via mailing list)
On Sun, Sep 02, 2007 at 12:03:59PM +0200, Igor K. wrote:
>   ENV['LC_TIME'] = 'C'
>   require 'acts_as_ferret'
> 2) next i using scaffold added few posts(my model, that i index this
> way:
>     acts_as_ferret :fields => [ :name, :post ]
> 
> English phrases searching well, but for german it get empty result
> array.

Does this apply for searching via a web form, or in unit tests? Maybe
non-ascii characters in your queries get garbled before ferret even gets
to see them?

The same applies for the content you save.
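
One way to test whether text was garbled before Ferret sees it is to check that its bytes are valid UTF-8. A minimal sketch, assuming Ruby 1.9 or later (the thread itself runs Ruby 1.8, which has no String#force_encoding); valid_utf8? is a hypothetical helper, not part of Ferret or acts_as_ferret:

```ruby
# Hypothetical helper (not part of Ferret or acts_as_ferret).
# Returns true when the raw bytes of +text+ form valid UTF-8.
def valid_utf8?(text)
  text.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

utf8_bytes   = "M\u00FCnchen"     # "München" encoded as UTF-8
latin1_bytes = "M\xFCnchen".b     # the same word as ISO-8859-1 bytes

puts valid_utf8?(utf8_bytes)    # true
puts valid_utf8?(latin1_bytes)  # false
```

Calling such a check on the query parameter and on the saved attribute values would show at which step the bytes stop being UTF-8.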

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer@webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
Posted by Igor K. (demoversion)
on 04.09.2007 11:28
Jens Kraemer wrote:
> On Sun, Sep 02, 2007 at 12:03:59PM +0200, Igor K. wrote:
>>   ENV['LC_TIME'] = 'C'
>>   require 'acts_as_ferret'
>> 2) next i using scaffold added few posts(my model, that i index this
>> way:
>>     acts_as_ferret :fields => [ :name, :post ]
>> 

> Does this apply for searching via a web form, or in unit tests? Maybe
> non-ascii characters in your queries get garbled before ferret even gets
> to see them?

No, I use UTF-8 in the database, controllers and environment.rb.
When I search with Post.find(:all, :conditions => ['name like ?',
'SOME-UTF8-TEXT']) it finds the records.

Has anybody looked at my test project in the attachment?
Posted by Jens Kraemer (Guest)
on 04.09.2007 15:51
(Received via mailing list)
On Tue, Sep 04, 2007 at 11:28:11AM +0200, Igor K. wrote:
> > non-ascii characters in your queries get garbled before ferret even gets
> > to see them?
> 
> No, i use UTF8 in database, controllers, environment.rb
> When i do search like Post.find(:all, :conditions => ['name like ?', 
> 'SOME-UTF8-TEXT']) it founds.
> 
> Have anybody looked at my test project in attachment?

Yes, I did. After changing the line setting ENV['LANG']
to 'en_US.UTF-8' in config/environment.rb (because I don't have
the de_DE.UTF-8@euro locale installed on my system) it works perfectly.

Before it didn't, and I couldn't even enter german umlauts on the rails
console.

So you should make sure the locale you set LANG to also exists on your
system. 'dpkg-reconfigure locales' can be used on Debian/Ubuntu to check
which locales you have and enable more if you like.
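
The same check can be sketched in Ruby. This assumes a Unix-like system where the `locale -a` command lists the installed locales, and a Ruby new enough to have String#scrub (2.1+); normalize_locale and locale_installed? are hypothetical helper names:

```ruby
# Hypothetical helpers, not part of Ferret. Names are compared loosely
# because `locale -a` typically prints "en_US.utf8" for "en_US.UTF-8".
def normalize_locale(name)
  name.downcase.delete("-")
end

def locale_installed?(wanted, available)
  available.map { |l| normalize_locale(l) }.include?(normalize_locale(wanted))
end

# The backticks raise Errno::ENOENT when the command does not exist
# (for example on Windows); in that case the list is simply empty.
available = begin
  `locale -a`.scrub.split("\n")
rescue Errno::ENOENT
  []
end

puts locale_installed?("en_US.UTF-8", available)
```

If this prints false for the value assigned to ENV['LANG'], the locale is not available on that machine.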

cheers,
Jens


Posted by Igor K. (demoversion)
on 04.09.2007 16:40
> Yes, I did. After changing the line setting ENV['LANG']
> to 'en_US.UTF-8' in config/environment.rb (because I don't have
> the de_DE.UTF-8@euro locale installed on my system) it works perfectly.
> 
> Before it didn't, and I couldn't even enter german umlauts on the rails
> console.
> 
> So you should make sure the locale you set LANG to also exists on your
> system. 'dpkg-reconfigure locales' can be used on Debian/Ubuntu to check
> which locales you have and enable more if you like.
> 
>
Thank you for your replies,

I will check it!

Maybe it depends on the operating system? I have Win XP Home SP2 (Russian).
Where can I get a list of locales for other countries ('en_US.UTF-8',
'de_DE.UTF-8')?

Thanks
Posted by Jens Kraemer (Guest)
on 04.09.2007 16:44
(Received via mailing list)
On Tue, Sep 04, 2007 at 04:40:15PM +0200, Igor K. wrote:
> > which locales you have and enable more if you like.
> > 
> >
> Thank you for your replies,
> 
> I will check it!
> 
> Maybe it depends on operating system? I have Win XP Home SP2 russian.
> Where i can get a list of locations for other countries('en_US.UTF-8', 
> de_DE.UTF-8)???

imho you should be in a utf8 environment by default then - however I
don't have a windows to check this. Maybe for starters just start irb
and have a look at ENV['LANG'] ?
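
A small sketch of what to look at in irb. LANG is the variable named above; LC_ALL and LC_CTYPE are added here as an assumption because they can also influence the character type locale. locale_settings is a hypothetical helper name:

```ruby
# Hypothetical helper: collects the locale-related variables from an
# environment-like hash (ENV by default). Unset variables come back as nil.
LOCALE_KEYS = %w[LANG LC_ALL LC_CTYPE].freeze

def locale_settings(env = ENV)
  LOCALE_KEYS.each_with_object({}) { |key, found| found[key] = env[key] }
end

locale_settings.each { |key, value| puts "#{key}=#{value.inspect}" }
```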

Jens

Posted by Igor K. (demoversion)
on 05.09.2007 08:57
> 
> imho you should be in a utf8 environment by default then - however I
> don't have a windows to check this. Maybe for starters just start irb
> and have a look at ENV['LANG'] ?
> 
> Jens
> 
Hello,
Thanks for the replies :) you are really helping me.
I have tried changing:
- the Windows locale to Russia
- environment.rb
 ENV['LC_CTYPE'] = 'ru_RU.UTF-8'
 Ferret.locale = "ru_RU.UTF-8"
- and, importantly, the database locale from UTF8-unicode to
UTF8-general.

But, as I expected, that was not the end of the problems:
- it indexes data incorrectly; for example, some data got into the
index db.
- it can find some objects and not others (the Russian ones).
When I search for '*' I get all objects, but with '**' I get only
part of them.

What versions of ferret-server and acts-as-ferret, and which locale in
the MySQL database, do you have?

Thanks
Posted by Jens Kraemer (Guest)
on 05.09.2007 10:28
(Received via mailing list)
On Wed, Sep 05, 2007 at 08:57:43AM +0200, Igor K. wrote:
> - windows locale to Russia
> - environment.rb
>  ENV['LC_CTYPE'] = 'ru_RU.UTF-8'
>  Ferret.locale = "ru_RU.UTF-8"
> - and what was important to change database locale from UTF8-unicode to 
> UTF8-general.
> 
> But as i espect it was no finish of problems:
> - because it index data incorrect, for example some data came to index 
> db. The other problem is when - some objects it can find and some 
> not(russian objects),

Have you checked with the Ferret browser to make sure it really indexed
incorrect values? Have you rebuilt your index after changing these locale
things?

> when i search by '*' i can get all objects, but with '**' i get only 
> part of them.

I'm not sure what Ferret does with a query like '**'...

> what version of ferret-server, acts-as-ferret, locale in MySQL database 
> do you have?

I used the aaf from inside your app when checking it out, and ferret 
0.11.4.

Here's a small checklist I use for making sure everything is UTF-8:

HTTP/HTML
  content-type delivered by web server
    should be 'Content-Type: text/html; charset=utf-8'
  html content-type meta tag
    should be '<meta http-equiv="content-type" content="text/html;charset=utf-8" />'

Mysql settings:
  In your application's console, execute the following:
  r = Post.connection.execute "SHOW VARIABLES LIKE 'character%'"
  r.each {|r|puts r}
  This should result in something like this:

  character_set_client
  utf8
  character_set_connection
  utf8
  character_set_database
  utf8

If your output differs, try the following:

set 'encoding: utf8' in environment.rb
  (this affects the character_set_connection and character_set_client
  values)

set 'default-character-set=utf8' in the mysqld section of the mysql
  configuration (/etc/mysql/my.cnf on linux). After restart of the server,
  newly created databases will be utf8 by default, and new tables in these
  databases will inherit this setting. Maybe it's possible to change the
  character set of existing databases/tables, too, however your data will
  have to be converted, too. The per database setting imho is only a
  default setting applied to new tables.


with all these settings in place, everything should be fine on the UTF-8
front :-)
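
The MySQL part of this checklist can be sketched as a small Ruby check. It assumes the rows arrive as [variable, value] pairs, as printed by the console command above; non_utf8_variables is a hypothetical helper and the sample rows are illustrative values only:

```ruby
# Hypothetical helper. +rows+ is a list of [variable, value] pairs such as
# the ones returned by SHOW VARIABLES LIKE 'character%'.
EXPECTED_UTF8 = %w[
  character_set_client
  character_set_connection
  character_set_database
].freeze

# Returns the names of the checklist variables that are not set to utf8.
def non_utf8_variables(rows)
  rows.select { |name, value| EXPECTED_UTF8.include?(name) && value != "utf8" }
      .map(&:first)
end

# Illustrative sample values, not taken from a real server.
rows = [
  ["character_set_client", "latin1"],
  ["character_set_connection", "latin1"],
  ["character_set_database", "utf8"]
]
p non_utf8_variables(rows)  # ["character_set_client", "character_set_connection"]
```

An empty result would mean the three variables from the checklist are all utf8.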


Cheers,
Jens

Posted by Igor K. (demoversion)
on 05.09.2007 11:16
Thank you very much for the answers,

but I still have problems.

> HTTP/HTML
>   content-type delivered by web server
>     should be 'Content-Type: text/html; charset=utf-8'
>   html content-type meta tag
>     should be '<meta http-equiv="content-type" 
> content="text/html;charset=utf-8" />'

For this I did:
application.rhtml
  <meta http-equiv="content-type" content="text/html;charset=utf-8" />
application.rb
  before_filter :set_charset

  def set_charset
    if request.xhr?
      headers["Content-Type"] = "text/javascript; charset=utf-8"
    else
      headers["Content-Type"] = "text/html; charset=utf-8"
    end
  end

> 
> Mysql settings:
>   In your application's console, execute the following:
>   r = Post.connection.execute "SHOW VARIABLES LIKE 'character%'"
>   r.each {|r|puts r}
>   This should result in something like this:
> 
>   character_set_client
>   utf8
>   character_set_connection
>   utf8
>   character_set_database
>   utf8
> 

Here are the results I get:
["character_set_client", "latin1"]
["character_set_connection", "latin1"]
["character_set_database", "utf8"]
["character_set_filesystem", "binary"]
["character_set_results", "latin1"]
["character_set_server", "latin1"]
["character_set_system", "utf8"]
["character_sets_dir", "C:\\InstantRails\\mysql\\share\\charsets\\"]

> If your output differs, try the following:
> 
> set 'encoding: utf8' in environment.rb
>   (this affects the character_set_connection and character_set_client
>   values)
> 

In environment.rb I added:
ENV['LC_CTYPE'] = 'ru_RU.UTF8'
ENV['LANG'] = 'ru_RU.UTF8'
$KCODE = 'u'
require 'acts_as_ferret'


> set 'default-character-set=utf8' in the mysqld section of the mysql
>   configuration (/etc/mysql/my.cnf on linux). After restart of the 
> server,
>   newly created databases will be utf8 by default, and new tables in 
> these
>   databases will inherit this setting. Maybe it's possible to change the
>   character set of existing databases/tables, too, however your data 
> will
>   have to be converted, too. The per database setting imho is only a
>   default setting applied to new tables.
> 

What else should I do?
Thanks
Posted by Jens Kraemer (Guest)
on 05.09.2007 11:18
(Received via mailing list)
On Wed, Sep 05, 2007 at 11:16:20AM +0200, Igor K. wrote:
> 
> > If your output differs, try the following:
> > 
> > set 'encoding: utf8' in environment.rb
> >   (this affects the character_set_connection and character_set_client
> >   values)

Sorry, my mistake. You need to set
encoding: utf8
in database.yml for your db connections.

Looking at the output above this should fix your problem.
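
A rough sketch of how one might verify that setting, assuming the usual database.yml layout with one section per environment. The adapter, database names and user in the sample are made up, and environments_missing_utf8 is a hypothetical helper (the squiggly heredoc needs Ruby 2.3 or later):

```ruby
require "yaml"

# Sample database.yml content; adapter, database names and user are made up.
SAMPLE_DATABASE_YML = <<~YAML
  development:
    adapter: mysql
    database: ferrettest_development
    username: root
    encoding: utf8
  test:
    adapter: mysql
    database: ferrettest_test
    username: root
YAML

# Hypothetical helper: names of the environments that lack "encoding: utf8".
def environments_missing_utf8(yaml_text)
  config = YAML.safe_load(yaml_text)
  config.reject { |_env, settings| settings["encoding"] == "utf8" }.keys
end

p environments_missing_utf8(SAMPLE_DATABASE_YML)  # ["test"]
```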

Jens

Posted by Igor K. (demoversion)
on 05.09.2007 11:32
> Sorry, my mistake. You need to set
> encoding: utf8
> in database.yml for your db connections.

No problem, you are really helping me anyway :)

Now I get:
["character_set_client", "utf8"]
["character_set_connection", "utf8"]
["character_set_database", "utf8"]
["character_set_filesystem", "binary"]
["character_set_results", "utf8"]
["character_set_server", "latin1"]
["character_set_system", "utf8"]
["character_sets_dir", "C:\\InstantRails\\mysql\\share\\charsets\\"]


Is this okay -> ["character_set_server", "latin1"]?

> Looking at the output above this should fix your problem.
> 
Posted by Igor K. (demoversion)
on 05.09.2007 11:33
And how do I start the Ferret browser?
Posted by Jens Kraemer (Guest)
on 05.09.2007 11:46
(Received via mailing list)
On Wed, Sep 05, 2007 at 11:32:35AM +0200, Igor K. wrote:
> ["character_set_filesystem", "binary"]
> ["character_set_results", "utf8"]
> ["character_set_server", "latin1"]
> ["character_set_system", "utf8"]
> ["character_sets_dir", "C:\\InstantRails\\mysql\\share\\charsets\\"]
> 
> 
> is it okay -> ["character_set_server", "latin1"]???

yeah, as long as your app's database is set to utf8.

ferret-browser is started with

ferret-browser path/to/index

Jens

Posted by Igor K. (demoversion)
on 05.09.2007 12:00
> ferret-browser is started with
> 
> ferret-browser path/to/index
> 

It seems that Ferret works incorrectly;
please check my sources and the screenshot 22.gif.

P.S. When I start the server I get a warning:
C:/ruby/lib/ruby/gems/1.8/gems/acts_as_ferret-0.4.1/lib/ferret_server.rb:123: warning: parenthesize argument(s) for future version
Posted by Igor K. (demoversion)
on 05.09.2007 12:06
Attachment: ferrettest3.rar (81,7 KB)
> 
> Seems to be that ferret works incorrect
> please check my sources and screenshot 22.gif.
> 
> ps: when i start server i get an warning
> C:/ruby/lib/ruby/gems/1.8/gems/acts_as_ferret-0.4.1/lib/ferret_server.rb:123: 
> warning: parenthesize argument(s) for future version

Here is the attachment.
Posted by Jens Kraemer (Guest)
on 05.09.2007 12:07
(Received via mailing list)
On Wed, Sep 05, 2007 at 12:00:03PM +0200, Igor K. wrote:
> > ferret-browser is started with
> > 
> > ferret-browser path/to/index
> > 
> 
> Seems to be that ferret works incorrect

in general I don't think so :-)

> please check my sources and screenshot 22.gif.

where?

> ps: when i start server i get an warning
> C:/ruby/lib/ruby/gems/1.8/gems/acts_as_ferret-0.4.1/lib/ferret_server.rb:123: 
> warning: parenthesize argument(s) for future version

nothing to worry about...

Jens


Posted by Jens Kraemer (Guest)
on 05.09.2007 12:10
(Received via mailing list)
On Wed, Sep 05, 2007 at 12:06:50PM +0200, Igor K. wrote:
> Attachments:
> http://www.ruby-forum.com/attachment/228/ferrettest3.rar

can't see what's wrong with the screenshot (besides the missing images
for yes/no). did you check the index contents?

  Jens


Posted by Igor K. (demoversion)
on 05.09.2007 12:28
Attachment: 333.GIF (7,7 KB)
> can't see what's wrong with the screenshot (besides the missing images
> for yes/no).


> did you check the index contents?

What do you mean?

Please check the other attachment: why can I see only the id field?
Posted by Jens Kraemer (Guest)
on 05.09.2007 12:35
(Received via mailing list)
On Wed, Sep 05, 2007 at 12:28:53PM +0200, Igor K. wrote:
> 
> > can't see what's wrong with the screenshot (besides the missing images
> > for yes/no).
> 
> 
> > did you check the index contents?
> 
> What do you mean?
> 
> please check another attachment, why i can see only id field?

ah, of course. aaf by default only stores the id in the index, the rest
of the data is just indexed (so records can be found, but you cannot
retrieve their original contents from the index). use something like

acts_as_ferret :fields => {
  :name => { :store => :yes },
  :body => { :store => :yes }
}

then you'll see the data in ferret-browser.
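
When several fields should be stored, the options hash above can also be built from a list of field names. A minimal sketch in plain Ruby; stored_fields is a hypothetical helper, not part of acts_as_ferret:

```ruby
# Hypothetical helper: builds a :fields hash in the shape shown above,
# with :store => :yes for every given field name.
def stored_fields(*names)
  names.each_with_object({}) { |name, fields| fields[name] = { :store => :yes } }
end

p stored_fields(:name, :body)
```

In the model this could then be written as acts_as_ferret :fields => stored_fields(:name, :body), followed by an index rebuild so the stored values actually get written.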

Jens

Posted by Igor K. (demoversion)
on 05.09.2007 12:57
Attachment: 1a.rar (48,6 KB)
Ohhhh :),

It is still not working; please check the attachment.

In the console, as you can see, I am searching by the character that
one word in the database starts with.

I don't know what to do.

Locale in system: Russian
Posted by Vitaliy Khudenko (vit)
on 09.05.2008 15:29
Igor, hi!

Seems I've got the same problem. Have you finally solved it?

My working environment is:
- WinXP Pro SP2 rus
- NetBeans 6.1
- MySQL 5.0.45 (with all possible encoding params set to UTF-8)
- RoR 2.0.2 (database.yml has 'encoding: utf8' string at proper places)
- ferret 0.11.5 (i.e. newest)
- acts_as_ferret (newest stable version installed inside of my RoR 
project).

My code example: http://pastie.caboo.se/194268

When I point my browser to http://localhost:3000/people I get 2 persons
in the list, while zero are expected.

I also experimented with all the things mentioned above in this
thread (locales, environment.rb and so on), with no positive result.

A friend of mine who has a Mac with Leopard has just tested my code; on
his machine the code worked as expected. So I think it is an
OS-dependent issue of ferret or aaf...