Ruby Forum Ferret > Pagination, sorting and conditions: the combination is breaking my search results...

Posted by Max Williams (max-williams)
on 25.04.2008 12:46
(Received via mailing list)
Hi

I have a problem with a search where i want to get some results 
according to
some conditions, sort the results, and then paginate over the sorted
collection.

My search looks like this:

      @results  = TeachingObject.find_with_ferret(search_term,
                        #(ferret) options
                        {:page => options[:page],
                        :per_page => options[:per_page],
                        :sort => 
Ferret::Search::SortField.new(:asset_count,
:type => :integer, :reverse => true )},
                        #find options
                        { :conditions => ["id in (?)", @ids] } )

where @ids is an array of ids from which the results must come (ie a
collection of 'allowed' results of which @results will be a subset): 
often
the search term is set to * to get all of this collection in @results.
':asset_count' is an untokenized ferret field that stores integers.

Through debugging and experimenting, i've observed the following:

   - The overall results set, without sorting/pagination, is correct
   (therefore :conditions is being taken into account).
   - If :per_page is set to be so large that no pagination is required, 
then
   the sorting occurs properly (therefore sorting is being taken into 
account)
   - If :per_page is reduced so that pagination is required, then the
   sorting of the overall set breaks: it seems as if the results are 
ordered by
   id, then paginated.
   - However, on every individual page, the results are sorted properly 
for
   that page, ie each page-size subset is internally sorted.
   - If i sort on a different untokenized field,  the problem persists.

It seems as if the pagination is happening and THEN the sorting is
happening, which obviously doesn't give the expected results.  This is 
just
a theory on my part though.

Can anyone tell me how to fix this problem?  I've been gnashing my teeth
over it for over a day now and can't find any solutions...

thanks
max
Posted by Jens Kraemer (Guest)
on 25.04.2008 14:00
(Received via mailing list)
Hi Max,

thanks for your detailed report. Might well be that I broke one or more
of the various combinations of pagination / sorting / active record
conditions (where you might specify :order, too, btw) in trunk.

I'll look into it asap.

Cheers,
Jens

On Fri, Apr 25, 2008 at 11:45:30AM +0100, Max Williams wrote:
>                         {:page => options[:page],
> 
>    that page, ie each page-size subset is internally sorted.
> max
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk@rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer@webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold
Posted by Max Williams (max-williams)
on 25.04.2008 14:36
(Received via mailing list)
Fantastic, thanks Jens.

BTW, I can't :order by :asset_count as it's a method, rather than an
instance variable.  I wish AR::find would let me order by method 
returns.

While we're talking, i just tried another way of searching and found 
another
bit of weirdness:  instead of find_with_ferret, i tried using

ActsAsFerret::find(term, class_array) instead, like so:

       #sorting/pagination is broken here
       @results = ActsAsFerret::find(search_term, [TeachingObject],
                      #(ferret) options
                      { :page => options[:page],
                        :per_page => options[:per_page],
                        :sort => 
Ferret::Search::SortField.new(:asset_count,
:type => :integer, :reverse => true )
                      },
                      #find options - need to specify conditions for 
each
searched class individually
                      {:conditions => { :teaching_object => ["id in 
(?)",
@ids] } }
                   )

This gave exactly the same results as the previous search.  However, 
when i
added another class to the search, it works!

       #this works!
       @results = ActsAsFerret::find(search_term, [TeachingObject,
LearningObject],
                      #(ferret) options
                      { :page => options[:page],
                        :per_page => options[:per_page],
                        :sort => 
Ferret::Search::SortField.new(:asset_count,
:type => :integer, :reverse => true )
                      },
                      #find options - need to specify conditions for 
each
searched class individually
                      {:conditions => { :teaching_object => ["id in 
(?)",
@ids],
                                               :learning_object => ["id 
in
(?)", @ids] } }
                   )

So, this works while the previous doesn't.  It so happens in this case 
that
LearningObject is a 'sister' class of TeachingObject (they both extend a
class called Resource), where both are saved in a table called resources
using STI, and at the moment i don't actually have any LearningObject
records, so adding LearningObject doesn't harm my results.  Obviously 
though
this isn't a nice workaround.

Sorry to pile bug reports on you, i just mention it in case it's 
relevant.
:/

Thanks a lot
max

2008/4/25 Jens Kraemer <kraemer@webit.de>:
Posted by Sheldon Maloff (sheldonmaloff)
on 02.05.2008 04:24
Hello Jens,

I think I know what's going on here, because our descending sort 
searches are broken too and I have started to investigate what's causing 
the problem and trying to fix it. I have a January '08 version of the 
trunk. I believe it's changed quite a lot since that time.

Jens, I don't think it's anything you "broke" but rather an artifiact of 
how MySQL works. At least, I'm using MySQL and this is the behaviour I 
see.

I created 6 records, whose ids are 1 to 6 in my database. I am 
paginating on every 5 records. In my reverse sort I would expect to see 
records 6, 5, 4, 3, 2 on page 1 of the results. And id 1 on page 2 of 
the results.

What I see is a method called ar_find_by_contents. It calls 
find_id_by_content that returns an array that in turn calls ferret. The 
array that comes back from ferret is actually correctly sorted:

6 0.928179502487183
5 0.928179502487183
4 0.928179502487183
3 0.928179502487183
2 0.928179502487183
1 0.928179502487183

The first number is the id, the second is the rank.

Now what happens is ar_find_by_contents calls retrieve_records. And 
retrieve_records produces a SELECT statement like so:

SELECT * FROM model WHERE id IN (6, 1, 2, 3, 4, 5) LIMIT 0, 5

It took me a while to figure out that things are being passed around as 
a hash, and hence the wacky order of the ids in the IN clause. Now the 
problem with this statement  is that MySQL doesn't return records in the 
order that the ids appear in the IN clause. MySQL returns records in the 
order of the Primary Key on the table, which happens to be the id 
column. So MySQL is returning records 1, 2, 3, 4, 5, 6, in that order. 
Then the LIMIT clause kicks in and truncates the results to 1 through 5.

Now the rest of ar_find_by_contents valiantly tries to order the AR 
results with the rank returned by ferret (my first table above). The 
problem is, record 6, the youngest, is no longer in the results because 
LIMIT took it out. So AAF sorts records 1 through 5 descending.

Following along we can see how page two returns only record 6. On page 
two, the limit changes to

SELECT * FROM model WHERE id IN (6, 1, 2, 3, 4, 5) LIMIT 5, 5

Once again, My SQL returns records 1, 2, 3, 4, 5, 6, but this time the 
limit returns only the last record, id 6. And then AAF sorts that 
descending.

I working on a patch for the version I have by making MySQL return only 
the correct set of records in the first place. In other words, ensuring 
that the only ids present in the IN clause are the ones that should 
appear on page 1 of the results, or page 2, or pane N.

So my  AR query for page 1 looks like

SELECT * FROM model WHERE id IN (6, 5, 4, 3, 2) LIMIT 0, 5

and the AR query for page 2 looks like

SELECT * FROM model WHERE id IN (1) LIMIT 0, 5

I got it working, but in the process have made every other search, not 
work. Funny. I'm sure I'll figure it out.

Anyway, Jens, that's the gist of the problem at least how it relates to 
MySQL. Other databases may vary.

Regards
Sheldon Maloff
veer.com


Jens Kraemer wrote:
> Hi Max,
> 
> thanks for your detailed report. Might well be that I broke one or more
> of the various combinations of pagination / sorting / active record
> conditions (where you might specify :order, too, btw) in trunk.
> 
> I'll look into it asap.
> 
> Cheers,
> Jens
Posted by Karl Meisterheim (kmeister2000)
on 05.05.2008 04:29
I have been having the exact same problem with a custom search
application I've been working on.

When more than one page of results are returned, sorting sorts only the
items on the current page and if you go to the next page of results, it
is no longer sorted.  (for example, when sorting by a date).

If there's anyway I can help test your fix please let me know, I'm eager
to get a solution to this bug.

Thank you.


Sheldon Maloff wrote:
> Hello Jens,
> 
> I think I know what's going on here, because our descending sort 
> searches are broken too and I have started to investigate what's causing 
> the problem and trying to fix it. I have a January '08 version of the 
> trunk. I believe it's changed quite a lot since that time.
> 
> Jens, I don't think it's anything you "broke" but rather an artifiact of 
> how MySQL works. At least, I'm using MySQL and this is the behaviour I 
> see.
> 
> I created 6 records, whose ids are 1 to 6 in my database. I am 
> paginating on every 5 records. In my reverse sort I would expect to see 
> records 6, 5, 4, 3, 2 on page 1 of the results. And id 1 on page 2 of 
> the results.
> 
Posted by Jeff Webb (Guest)
on 09.07.2008 06:08
(Received via mailing list)
Sheldon/Jens,

Any chance that this has been figured out and patched?

Great analysis Sheldon!

Jeff


On Thu, May 1, 2008 at 7:24 PM, Sheldon Maloff <smaloff@veer.com> wrote:

>
> 5 0.928179502487183
> SELECT * FROM model WHERE id IN (6, 1, 2, 3, 4, 5) LIMIT 0, 5
> results with the rank returned by ferret (my first table above). The
> descending.
> and the AR query for page 2 looks like
> Sheldon Maloff
> > I'll look into it asap.
> >
> > Cheers,
> > Jens
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk@rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>



--

Jeff Webb
jeff@boowebb.com
http://boowebb.com/