More sorting problems with untokenized index

I’m having problems sorting on untokenized fields. I have one field that
sorts fine, but there are others that seem to sort on a different field.
Here’s the index description:

acts_as_ferret :remote => true,
  :fields => {
    :name              => { :boost => 2 },
    :name_for_sort     => { :index => :untokenized },
    :city              => { :boost => 2 },
    :city_for_sort     => { :index => :untokenized },
    :state             => { :boost => 2 },
    :state_for_sort    => { :index => :untokenized },
    :tag_list          => { :boost => 0 },
    :tag_list_for_sort => { :boost => 0 },
    :date_summary      => { :boost => 1 },
    :date_for_range    => { :boost => 0 },
    :start_date        => { :boost => 0 }
  }

When I sort on name_for_sort it works fine.

city_for_sort, however, causes problems. Here is a page at an arbitrary offset. There
are 16,000 records, so I wouldn’t expect a single page to span such a wide range of values:

Event.find_by_contents("marathon", :sort => "city_for_sort", :offset => 100).map(&:city_for_sort)
=> ["laguna hills", "burlington", "buffalo", "sun valley", "ottawa",
"alexandria", "green bay", "cleveland", "aurora denver lakewood",
"corpus christi"]

and a later batch:

Event.find_by_contents("marathon", :sort => "city_for_sort", :offset => 400).map(&:city_for_sort)
=> ["ocean shores", "austin", "boca raton", "sauvie island",
"crested butte", "austin", "portland", "avery", "leadville", "houston"]

Notice that name works:

Event.find_by_contents("marathon", :sort => "name_for_sort", :offset => 400).map(&:name_for_sort)
=> ["Columbus Marathon", "Columbus Marathon", "Columbus Marathon",
"Columbus Marathon", "Columbus Marathon", "Columbus Marathon Relay",
"Columbus Marathon Relay", "Columbus Marathon Relay",
"Comcast Baltimore Marathon", "Comcast Baltimore Marathon"]

Notice, however, that the results appear to be sorted by the date_for_range
column (in descending order), even though we asked for city_for_sort:

Event.find_by_contents("marathon", :sort => "city_for_sort", :offset => 400).map(&:date_for_range)
=> ["20060709", "20060708", "20060708", "20060704", "20060704",
"20060704", "20060704", "20060702", "20060701", "20060701"]
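The three pages above can be checked mechanically: an ascending sort on an untokenized field should yield a page of non-decreasing values. Here is a minimal sketch in plain Ruby. `page_sorted?` is a hypothetical helper, and the arrays are copied from the irb output above. It confirms that the city page is out of order, while the name page is ascending and the date_for_range page is descending:

```ruby
# Hypothetical helper: true if every adjacent pair on the page is in order
# (non-decreasing by default, non-increasing when descending is true).
def page_sorted?(values, descending = false)
  values.each_cons(2).all? { |a, b| descending ? a >= b : a <= b }
end

# Values copied from the irb output above.
city_page = ["laguna hills", "burlington", "buffalo", "sun valley", "ottawa",
             "alexandria", "green bay", "cleveland", "aurora denver lakewood",
             "corpus christi"]
name_page = ["Columbus Marathon"] * 5 + ["Columbus Marathon Relay"] * 3 +
            ["Comcast Baltimore Marathon"] * 2
date_page = %w(20060709 20060708 20060708 20060704 20060704
               20060704 20060704 20060702 20060701 20060701)

puts page_sorted?(city_page)        # prints false
puts page_sorted?(name_page)        # prints true
puts page_sorted?(date_page, true)  # prints true
```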

Does anyone have an idea what could cause this? I’ve rebuilt the index
several times, and it doesn’t help.

I’ve also noticed that the default field list doesn’t include the
untokenized columns (name_for_sort, city_for_sort, state_for_sort):

using index in script/…/config/…/config/…/index/development/event
default field list: [:state, :start_date, :name, :tag_list, :city,
:tag_list_for_sort, :date_for_range, :date_summary]

When I look at the index in ferret-browser, it does show the city_for_sort
field. I can browse its values in order, and its settings match those of
name_for_sort, which works.

I’m completely stumped.

Hi!

Just to rule out the possibility of aaf being the culprit here, could
you try your queries using Ferret directly on the index? Shut down your
app and the DRb server first, just to be sure ;-)

Then you could also try to use the sort API instead of string sort
(check out the Sort and SortField classes in Ferret’s API).

The fact that your untokenized fields do not appear in the default field
list is OK. The default field list holds the fields aaf searches when a
query doesn’t name specific fields, and it excludes untokenized fields:
searching them could return fewer results than you would expect (if you
are interested in why, this was discussed on this list a few months ago).
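The behaviour described here can be modelled as a filter over the field options: drop every field declared with `:index => :untokenized`. This is a minimal sketch of that model only, not acts_as_ferret's implementation; `FIELDS` and `default_field_list` are hypothetical names. Applied to the field definitions from the original post, it yields the same eight fields as the default field list printed above, although the order differs:

```ruby
# Field options as declared in the acts_as_ferret call in the original post.
FIELDS = {
  :name              => { :boost => 2 },
  :name_for_sort     => { :index => :untokenized },
  :city              => { :boost => 2 },
  :city_for_sort     => { :index => :untokenized },
  :state             => { :boost => 2 },
  :state_for_sort    => { :index => :untokenized },
  :tag_list          => { :boost => 0 },
  :tag_list_for_sort => { :boost => 0 },
  :date_summary      => { :boost => 1 },
  :date_for_range    => { :boost => 0 },
  :start_date        => { :boost => 0 }
}

# Hypothetical model (not aaf's code): every field whose :index option is
# not :untokenized is searched by default.
def default_field_list(fields)
  fields.reject { |_name, opts| opts[:index] == :untokenized }.map { |name, _opts| name }
end

p default_field_list(FIELDS).map(&:to_s).sort
```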

Your date_for_range and tag_list_for_sort fields should also be
untokenized, though I don’t think that will solve your problem.

Jens

On Fri, Jul 13, 2007 at 05:33:47AM +0200, Mike Mangino wrote:



Posted via http://www.ruby-forum.com/.


Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[email protected] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

Jens K. wrote:

Hi!

Just to rule out the possibility of aaf being the culprit here, could
you try your queries using Ferret directly on the index? Shut down your
app and the DRb server first, just to be sure ;-)

Sure

Then you could also try to use the sort API instead of string sort
(check out the Sort and SortField classes in Ferret’s API).

Good suggestion. Sorting with a sort string is still broken when I use
Ferret directly, and so is a SortField with the default (auto) type:

results = []
total_hits = fidx.search_each("marathon",
    :sort => Ferret::Search::Sort.new([Ferret::Search::SortField.new(:city_for_sort)]),
    :offset => 400) do |hit, score|
  doc = fidx[hit]
  results << doc[:id]
end
=> 1887

Event.find(results).map(&:city_for_sort)
=> ["ocean shores", "austin", "boca raton", "sauvie island",
"crested butte", "austin", "portland", "avery", "leadville", "houston"]

However, when I set the SortField's type to :string explicitly, that seems to fix it:

results = []
total_hits = fidx.search_each("marathon",
    :sort => Ferret::Search::Sort.new([Ferret::Search::SortField.new(:city_for_sort, :type => :string)]),
    :offset => 400) do |hit, score|
  doc = fidx[hit]
  results << doc[:id]
end
=> 1887

Event.find(results).map(&:city_for_sort)
=> ["bellevue", "baton rouge", "basalt", "bend", "bellevue", "bedford",
"bend", "berlin", "berlin", "beijing, china"]
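There is one caveat when reading pages produced this way. `Event.find(results)` with an array of ids does not promise to return records in the order of that array, because the database decides the order. The printed order may therefore differ from the hit order. The following minimal sketch restores the hit order explicitly. It uses a hypothetical `Record` struct in place of Event and made-up ids. It also assumes that stored ids come back from Ferret as strings, which is why they are converted with `to_i`:

```ruby
# Hypothetical stand-in for an ActiveRecord row.
Record = Struct.new(:id, :city_for_sort)

# Reorder +records+ to follow +ids+ (the order in which search_each yielded
# the hits). Ids read back from the index are assumed to be strings.
def in_hit_order(ids, records)
  by_id = {}
  records.each { |r| by_id[r.id] = r }
  ids.map { |id| by_id[id.to_i] }.compact
end

hit_ids = ["42", "7", "19"]                 # hit order from the sorted search
fetched = [Record.new(7, "baton rouge"),     # e.g. as returned by the database
           Record.new(19, "bellevue"),
           Record.new(42, "basalt")]

p in_hit_order(hit_ids, fetched).map { |r| r.city_for_sort }
# prints ["basalt", "baton rouge", "bellevue"]
```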

There were some fields with the text “0”. I wonder if it was guessing
the wrong type of index? I cleaned up that data and I’m rebuilding the
index now.
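Mike's guess about type detection can be illustrated with a simplified model of an auto sort type. This is not Ferret's implementation, which lives in its C sort code. The model assumes the type is guessed from a single sample term, taken here to be the lexicographically smallest value. A "0" would be that value, because digits sort before letters. Under this assumption, one "0" makes the whole column sort as integers, and every city name collapses to the same key:

```ruby
# Simplified, hypothetical model of an "auto" sort type; NOT Ferret's code.
# Assumption: the type is guessed from one sample term, here the smallest
# value in lexicographic order (digits sort before letters, so "0" wins).
def guess_sort_type(values)
  sample = values.min
  if sample =~ /\A-?\d+\z/
    :integer
  elsif sample =~ /\A-?\d*\.\d+\z/
    :float
  else
    :string
  end
end

# The keys each value would be compared by under the guessed type.
def sort_keys(values)
  case guess_sort_type(values)
  when :integer then values.map { |v| v.to_i }
  when :float   then values.map { |v| v.to_f }
  else values
  end
end

cities = ["0", "austin", "boca raton", "ocean shores"]
p guess_sort_type(cities)  # prints :integer
p sort_keys(cities)        # prints [0, 0, 0, 0]
```

With all keys equal, the sort no longer orders the cities alphabetically at all.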

[snip]

Jens

Jens K. wrote:

On Fri, Jul 13, 2007 at 02:51:00PM +0200, Mike Mangino wrote:

Jens K. wrote:

[…]

There were some fields with the text “0”. I wonder if it was guessing
the wrong type of index? I cleaned up that data and I’m rebuilding the
index now.

That sounds like a really good explanation to me.

If only it were true :-)

Event.find(:all).select {|e| /^[0-9]+$/.match(e.city_for_sort)}
=> []
Event.find(:all).select {|e| /^[0-9.]+$/.match(e.city_for_sort)}
=> []

but the problem still exists.
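The two regular expressions above only catch unsigned digits and dots, so values such as "-1", " 0" or "1e5" would slip through. The sketch below is a slightly broader check built on `Kernel#Float`. `numeric_like` is a hypothetical helper, and whether an auto type guesser would treat each of these forms as numeric is an assumption:

```ruby
# Hypothetical helper: return the values that Kernel#Float accepts as numbers
# (this includes signs, surrounding whitespace and exponents).
def numeric_like(values)
  values.compact.select do |v|
    begin
      Float(v)
      true
    rescue ArgumentError
      false
    end
  end
end

sample = ["austin", "0", " 0", "-1", "1e5", nil, "boca raton"]
p numeric_like(sample)  # prints ["0", " 0", "-1", "1e5"]
```

In the app, it would be applied to something like `Event.find(:all).map(&:city_for_sort)`.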

According to
http://ferret.davebalmain.com/trac/browser/trunk/c/src/sort.c, it looks
like cleaning up those values should have fixed it.

When I pass Sort and SortField objects through acts_as_ferret, I get the
DRb error I reported previously, because those objects can’t be marshalled:

Event.find_by_contents("marathon", :sort => Ferret::Search::Sort.new([Ferret::Search::SortField.new(:city_for_sort)]), :offset => 400).map(&:city_for_sort)

DRb::DRbConnError: DRb::DRbServerNotFound

Is there an easy fix for this?
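Some background on that error, based on DRb's standard behaviour: arguments are sent by value using Marshal. If an argument cannot be marshalled, DRb sends a DRbObject reference instead. Creating that reference requires a DRb server running in the calling process, and with none running DRb raises DRb::DRbServerNotFound, which is consistent with the error above. The sketch assumes Ferret's C-backed Sort and SortField objects cannot be marshalled, and it uses a Proc-holding class as a stand-in for them. `OpaqueSort` and `marshalable?` are hypothetical names:

```ruby
# Hypothetical stand-in for a C-backed object such as a Ferret Sort:
# an instance variable holding a Proc makes the object unmarshalable.
class OpaqueSort
  def initialize
    @impl = proc { |a, b| a <=> b }
  end
end

# true if +obj+ survives a Marshal round trip, i.e. could be sent by value.
def marshalable?(obj)
  Marshal.load(Marshal.dump(obj))
  true
rescue TypeError
  false
end

p marshalable?("city_for_sort desc string")  # prints true
p marshalable?(OpaqueSort.new)               # prints false
```

Plain strings pass this check. That is why sending sort specs as strings and building the Sort on the index side avoids the problem.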


Jens K. wrote:

On Fri, Jul 13, 2007 at 05:14:22PM +0200, Mike Mangino wrote:

DRb::DRbConnError: DRb::DRbServerNotFound

Is there an easy fix for this?

Do you already use acts_as_ferret’s trunk?

If yes and this problem still exists, your best bet is to extend
local_index.rb with a custom search method that builds the sort objects
from plain parameters you pass in (parameters that are not sort objects
themselves, to avoid the DRb problems). That method will then be
reachable via the ferret_index property of your model class.

Okay. I did this previously. Here is my change:

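def find_id_by_contents(query, options = {})
  if (sort = options[:sort])
    sort = [sort] unless sort.is_a?(Array)
    # Each entry is a plain string such as "city_for_sort desc string"
    # (field, optional direction, optional sort type), so only strings
    # have to be sent over DRb; the Ferret sort objects are built here.
    sort_fields = sort.map do |field|
      term, direction, sort_type = field.split(/\s+/)
      direction ||= "asc"
      sort_type ||= "auto"
      Ferret::Search::SortField.new(term.to_sym,
                                    :reverse => !!(direction =~ /desc/i),
                                    :type    => sort_type.to_sym)
    end
    options[:sort] = Ferret::Search::Sort.new(sort_fields)
  end
  ...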

That fixes the sort ordering and lets you specify the sort type in the
sort string.
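The string handling in that method does not depend on Ferret, so it can be pulled out and checked on its own. The sketch below applies the same rules and returns plain hashes; `parse_sort_specs` is a hypothetical name. Each hash would then become `Ferret::Search::SortField.new(h[:field], :reverse => h[:reverse], :type => h[:type])` on the index side:

```ruby
# Hypothetical helper: parse sort specs of the form "field [asc|desc] [type]"
# with the same defaults as find_id_by_contents (direction "asc", type "auto").
# No Ferret objects are created here.
def parse_sort_specs(sort)
  sort = [sort] unless sort.is_a?(Array)
  sort.map do |spec|
    field, direction, type = spec.split(/\s+/)
    { :field   => field.to_sym,
      :reverse => !!(direction.to_s =~ /desc/i),
      :type    => (type || "auto").to_sym }
  end
end

specs = parse_sort_specs(["city_for_sort asc string", "name_for_sort desc"])
specs.each { |s| puts "#{s[:field]} reverse=#{s[:reverse]} type=#{s[:type]}" }
# prints:
#   city_for_sort reverse=false type=string
#   name_for_sort reverse=true type=auto
```

Because the words are positional, a type can only be given after a direction. For example, "city_for_sort string" is read as direction "string" (so ascending) with type :auto.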

I can roll that up into a patch if you would like that for inclusion.

Mike


On Fri, Jul 13, 2007 at 06:05:30PM +0200, Mike Mangino wrote:

I can roll that up into a patch if you would like that for inclusion.

yes, please post it to acts_as_ferret’s trac :-)

thanks,
Jens



yes, please post it to acts_as_ferret’s trac :-)

http://projects.jkraemer.net/acts_as_ferret/ticket/155


Mike
