Forum: Ferret Pagination, sorting and conditions: the combination is breaking my search results...

73c04e9ef9ca435c5b19a2e765ae6d20?d=identicon&s=25 Max Williams (max-williams)
on 2008-04-25 12:46
(Received via mailing list)
Hi

I have a problem with a search where i want to get some results
according to
some conditions, sort the results, and then paginate over the sorted
collection.

My search looks like this:

      @results  = TeachingObject.find_with_ferret(search_term,
                        #(ferret) options
                        {:page => options[:page],
                        :per_page => options[:per_page],
                        :sort =>
Ferret::Search::SortField.new(:asset_count,
:type => :integer, :reverse => true )},
                        #find options
                        { :conditions => ["id in (?)", @ids] } )

where @ids is an array of ids from which the results must come (ie a
collection of 'allowed' results of which @results will be a subset):
often
the search term is set to * to get all of this collection in @results.
':asset_count' is an untokenized ferret field that stores integers.

Through debugging and experimenting, i've observed the following:

   - The overall results set, without sorting/pagination, is correct
   (therefore :conditions is being taken into account).
   - If :per_page is set to be so large that no pagination is required,
then
   the sorting occurs properly (therefore sorting is being taken into
account)
   - If :per_page is reduced so that pagination is required, then the
   sorting of the overall set breaks: it seems as if the results are
ordered by
   id, then paginated.
   - However, on every individual page, the results are sorted properly
for
   that page, ie each page-size subset is internally sorted.
   - If i sort on a different untokenized field,  the problem persists.

It seems as if the pagination is happening and THEN the sorting is
happening, which obviously doesn't give the expected results.  This is
just
a theory on my part though.

Can anyone tell me how to fix this problem?  I've been gnashing my teeth
over it for over a day now and can't find any solutions...

thanks
max
C9dd93aa135988cabf9183d3210665ca?d=identicon&s=25 Jens Kraemer (Guest)
on 2008-04-25 14:00
(Received via mailing list)
Hi Max,

thanks for your detailed report. Might well be that I broke one or more
of the various combinations of pagination / sorting / active record
conditions (where you might specify :order, too, btw) in trunk.

I'll look into it asap.

Cheers,
Jens

On Fri, Apr 25, 2008 at 11:45:30AM +0100, Max Williams wrote:
>                         {:page => options[:page],
>
>    that page, ie each page-size subset is internally sorted.
> max
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk@rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer@webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold
73c04e9ef9ca435c5b19a2e765ae6d20?d=identicon&s=25 Max Williams (max-williams)
on 2008-04-25 14:36
(Received via mailing list)
Fantastic, thanks Jens.

BTW, I can't :order by :asset_count as it's a method, rather than an
instance variable.  I wish AR::find would let me order by method
returns.

While we're talking, i just tried another way of searching and found
another
bit of weirdness:  instead of find_with_ferret, i tried using

ActsAsFerret::find(term, class_array) instead, like so:

       #sorting/pagination is broken here
       @results = ActsAsFerret::find(search_term, [TeachingObject],
                      #(ferret) options
                      { :page => options[:page],
                        :per_page => options[:per_page],
                        :sort =>
Ferret::Search::SortField.new(:asset_count,
:type => :integer, :reverse => true )
                      },
                      #find options - need to specify conditions for
each
searched class individually
                      {:conditions => { :teaching_object => ["id in
(?)",
@ids] } }
                   )

This gave exactly the same results as the previous search.  However,
when i
added another class to the search, it works!

       #this works!
       @results = ActsAsFerret::find(search_term, [TeachingObject,
LearningObject],
                      #(ferret) options
                      { :page => options[:page],
                        :per_page => options[:per_page],
                        :sort =>
Ferret::Search::SortField.new(:asset_count,
:type => :integer, :reverse => true )
                      },
                      #find options - need to specify conditions for
each
searched class individually
                      {:conditions => { :teaching_object => ["id in
(?)",
@ids],
                                               :learning_object => ["id
in
(?)", @ids] } }
                   )

So, this works while the previous doesn't.  It so happens in this case
that
LearningObject is a 'sister' class of TeachingObject (they both extend a
class called Resource), where both are saved in a table called resources
using STI, and at the moment i don't actually have any LearningObject
records, so adding LearningObject doesn't harm my results.  Obviously
though
this isn't a nice workaround.

Sorry to pile bug reports on you, i just mention it in case it's
relevant.
:/

Thanks a lot
max

2008/4/25 Jens Kraemer <kraemer@webit.de>:
96519a64eb06e187e71ab794d61f3e7c?d=identicon&s=25 Sheldon Maloff (sheldonmaloff)
on 2008-05-02 04:24
Hello Jens,

I think I know what's going on here, because our descending sort
searches are broken too and I have started to investigate what's causing
the problem and trying to fix it. I have a January '08 version of the
trunk. I believe it's changed quite a lot since that time.

Jens, I don't think it's anything you "broke" but rather an artifiact of
how MySQL works. At least, I'm using MySQL and this is the behaviour I
see.

I created 6 records, whose ids are 1 to 6 in my database. I am
paginating on every 5 records. In my reverse sort I would expect to see
records 6, 5, 4, 3, 2 on page 1 of the results. And id 1 on page 2 of
the results.

What I see is a method called ar_find_by_contents. It calls
find_id_by_content that returns an array that in turn calls ferret. The
array that comes back from ferret is actually correctly sorted:

6 0.928179502487183
5 0.928179502487183
4 0.928179502487183
3 0.928179502487183
2 0.928179502487183
1 0.928179502487183

The first number is the id, the second is the rank.

Now what happens is ar_find_by_contents calls retrieve_records. And
retrieve_records produces a SELECT statement like so:

SELECT * FROM model WHERE id IN (6, 1, 2, 3, 4, 5) LIMIT 0, 5

It took me a while to figure out that things are being passed around as
a hash, and hence the wacky order of the ids in the IN clause. Now the
problem with this statement  is that MySQL doesn't return records in the
order that the ids appear in the IN clause. MySQL returns records in the
order of the Primary Key on the table, which happens to be the id
column. So MySQL is returning records 1, 2, 3, 4, 5, 6, in that order.
Then the LIMIT clause kicks in and truncates the results to 1 through 5.

Now the rest of ar_find_by_contents valiantly tries to order the AR
results with the rank returned by ferret (my first table above). The
problem is, record 6, the youngest, is no longer in the results because
LIMIT took it out. So AAF sorts records 1 through 5 descending.

Following along we can see how page two returns only record 6. On page
two, the limit changes to

SELECT * FROM model WHERE id IN (6, 1, 2, 3, 4, 5) LIMIT 5, 5

Once again, My SQL returns records 1, 2, 3, 4, 5, 6, but this time the
limit returns only the last record, id 6. And then AAF sorts that
descending.

I working on a patch for the version I have by making MySQL return only
the correct set of records in the first place. In other words, ensuring
that the only ids present in the IN clause are the ones that should
appear on page 1 of the results, or page 2, or pane N.

So my  AR query for page 1 looks like

SELECT * FROM model WHERE id IN (6, 5, 4, 3, 2) LIMIT 0, 5

and the AR query for page 2 looks like

SELECT * FROM model WHERE id IN (1) LIMIT 0, 5

I got it working, but in the process have made every other search, not
work. Funny. I'm sure I'll figure it out.

Anyway, Jens, that's the gist of the problem at least how it relates to
MySQL. Other databases may vary.

Regards
Sheldon Maloff
veer.com


Jens Kraemer wrote:
> Hi Max,
>
> thanks for your detailed report. Might well be that I broke one or more
> of the various combinations of pagination / sorting / active record
> conditions (where you might specify :order, too, btw) in trunk.
>
> I'll look into it asap.
>
> Cheers,
> Jens
990b8e5a8fe507b03b901bd4e0a23a91?d=identicon&s=25 Karl Meisterheim (kmeister2000)
on 2008-05-05 04:29
I have been having the exact same problem with a custom search
application I've been working on.

When more than one page of results are returned, sorting sorts only the
items on the current page and if you go to the next page of results, it
is no longer sorted.  (for example, when sorting by a date).

If there's anyway I can help test your fix please let me know, I'm eager
to get a solution to this bug.

Thank you.


Sheldon Maloff wrote:
> Hello Jens,
>
> I think I know what's going on here, because our descending sort
> searches are broken too and I have started to investigate what's causing
> the problem and trying to fix it. I have a January '08 version of the
> trunk. I believe it's changed quite a lot since that time.
>
> Jens, I don't think it's anything you "broke" but rather an artifiact of
> how MySQL works. At least, I'm using MySQL and this is the behaviour I
> see.
>
> I created 6 records, whose ids are 1 to 6 in my database. I am
> paginating on every 5 records. In my reverse sort I would expect to see
> records 6, 5, 4, 3, 2 on page 1 of the results. And id 1 on page 2 of
> the results.
>
B41cfdb6193be973bc40b619e711909d?d=identicon&s=25 Jeff Webb (Guest)
on 2008-07-09 06:08
(Received via mailing list)
Sheldon/Jens,

Any chance that this has been figured out and patched?

Great analysis Sheldon!

Jeff


On Thu, May 1, 2008 at 7:24 PM, Sheldon Maloff <smaloff@veer.com> wrote:

>
> 5 0.928179502487183
> SELECT * FROM model WHERE id IN (6, 1, 2, 3, 4, 5) LIMIT 0, 5
> results with the rank returned by ferret (my first table above). The
> descending.
> and the AR query for page 2 looks like
> Sheldon Maloff
> > I'll look into it asap.
> >
> > Cheers,
> > Jens
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk@rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>



--

Jeff Webb
jeff@boowebb.com
http://boowebb.com/
36feb4959db6ab8259a44962f0fa761f?d=identicon&s=25 Jens Krämer (jkraemer)
on 2008-07-09 11:50
(Received via mailing list)
Hi folks,

On 09.07.2008, at 06:01, Jeff Webb wrote:
> Sheldon/Jens,
>
> Any chance that this has been figured out and patched?

I just committed a fix to this problem to trunk, so pagination with
sorting and AR conditions should work now.


cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/     - The new free film database
73c04e9ef9ca435c5b19a2e765ae6d20?d=identicon&s=25 Max Williams (max-williams)
on 2008-07-18 13:12
(Received via mailing list)
Hi Jens/all

I've been playing with latest version, and it seems that the combination
of
AR conditions, paginating and AR order (eg ":order => "name") seems to
be
fine now, even on our server, which was were it was broken before (it
always
worked locally, weirdly).

However, i've just tried it with results that are sorted by ferret on
the
basis of their boost score, and it seems broken.

I'm sorry i don't have any tests set up but if i walk through an example
can
anyone see something that maybe i'm doing wrong?  Or shed any light?

I use AR conditions to filter the results through a set of ids that an
individual user is allowed to view.  I just have a method
User#allowed_ids
for this that returns an array of integers.  For the purpose of
illustration, though, let's say that the allowed ids are everything
between
1000 & 2000.
allowed_ids = (1000..2000).to_a
=> [1000, 1001, 1002, etc, 2000]

For a particular search (on 'rhythm') i get these results back - the
order
is the order calculated by ferret/aaf on the basis of boost values: i've
collected them by id for clarity.  In this example i've set a big
per_page
to get all the results.

>> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page =>
1000 }, {} ).collect(&:id)
=> [4038, 698, 4039, 1830, 1831, 1826, 1832, 1825, 1833, 411, 1834, 702,
1827, 1689, 1680, 1688, 1679, 1686, 1684, 1676, 2129, 2130, 2131, 1858,
1859, 1860, 1861, 1865, 2132, 2141, 2345, 2350, 2352, 2353, 2356, 2360,
2362, 2366, 2368, 2371, 2372, 2373, 2376, 2377, 2378, 2384, 2385, 2389,
2407, 2414, 2417, 2419, 2427, 2428, 2438, 2439, 2458, 2459, 2473, 2474,
2475, 2477, 2478, 2133, 2349, 2361, 2363, 2365, 2370, 2375, 2383, 2386,
2392, 2415, 2430, 2431, 2440, 2441, 2442, 2472, 2142, 3751, 2161, 1236]

Now, when i apply the condition of only being in the allowed ids, i'd
expect
the remaining ids to be in the same order as above, and that is in fact
the
case:

>> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page =>
1000 }, {:conditions => ["id in (?)", allowed_ids] } ).collect(&:id)
=> [1830, 1831, 1826, 1832, 1825, 1833, 1834, 1827, 1689, 1680, 1688,
1679,
1686, 1684, 1676, 1858, 1859, 1860, 1861, 1865, 1236]

Just to check, let's do the original again (with no AR conditions) and
just
do set intersection with the allowed_ids
>> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page =>
1000 }, {} ).collect(&:id) & allowed_ids
=> [1830, 1831, 1826, 1832, 1825, 1833, 1834, 1827, 1689, 1680, 1688,
1679,
1686, 1684, 1676, 1858, 1859, 1860, 1861, 1865, 1236]

OK - looks good.  But, when i try to paginate into some actual pages,
the
order breaks:  first, without AR conditions:

>> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page =>
10 }, {} ).collect(&:id)
=> [4038, 698, 4039, 1830, 1831, 1826, 1832, 1825, 1833, 411]

That all seems to be in order - i get the first ten results from the big
list, above.

Now, if we were to apply the allowed_ids condition here, we'd expect the
results to start
[1830, 1831, 1826, 1832, 1825, 1833, ...] - right?  Because we should
have
the same ordering applied to the remaining resources, and then we get
the
first ten for page 1.  But, the ordering is different -

>> ActsAsFerret::find("rhythm", [TeachingObject],{ :page => 1, :per_page =>
10 }, {:conditions => ["id in (?)", allowed_ids] } ).collect(&:id)
=> [1826, 1825, 1689, 1680, 1688, 1679, 1686, 1684, 1676, 1236]

So, it seems that pagination + ferret score ordering + AR conditions is
a
bad combination

Again, sorry to not supply test cases but we don't use them (i know, i
know!!!).  Can anyone shed any light?

thanks - max

2008/7/9 Jens Kraemer <jk@jkraemer.net>:
B41cfdb6193be973bc40b619e711909d?d=identicon&s=25 Jeff Webb (Guest)
on 2008-07-19 00:29
(Received via mailing list)
Max,

Here is what I changed in the current stable version of AAF in my local
plugin install to get conditions and pagination working. I have not
fully
tested it except for my particular use cases but it seems to work out.
Sorting is not factored into these changes.

Basically if there is no pagination OR no AR conditions you can let
ferret
do all the work for you. If you have both of these then you need to
return
all results, merge with ferret IDs, then do your offset and limit. I
replicated what Jens has done on the AAF trunk.

Hope this works for you. I have not heard if Jens is going to add this
to
the current branch as a hotfix.

replace the existing ar_by_contents method in class_methods.rb with:

####################################
#changes to ar_find_by_contents AAF 0.4.3#
####################################

    def ar_find_by_contents(q, options = {}, find_options = {})
      result_ids = {}
      has_conditions = !find_options[:conditions].blank? || caller.find{
|call| call =~ %r{active_record/associations} }

      # odd case - cannot do pagination combo with AR & Ferret
      # must retrieve all then paginate after
      if options[:per_page] && has_conditions
        late_paginate = true
        offset  = find_options.delete(:offset)
        limit   = find_options.delete(:limit)
        options.delete(:page)
        options.delete(:per_page)
        find_options.delete(:offset)
        find_options.delete(:limit)
        options[:limit] = :all
      end

      total_hits = find_id_by_contents(q, options) do |model, id, score,
data|
        # stores ids, index and score of each hit for later ordering of
        # results
        result_ids[id] = [ result_ids.size + 1, score ]
      end

      result = retrieve_records( { self.name => result_ids },
find_options )

      if has_conditions
        # what we got from the database is our full result set, so take
it's
size
        total_hits = result.length

        if late_paginate
          result = result[offset..offset+limit-1]
        end
      end

      [ total_hits, result ]
    end


Jeff



On Fri, Jul 18, 2008 at 4:11 AM, Max Williams
<toastkid.williams@gmail.com>
wrote:

> I'm sorry i don't have any tests set up but if i walk through an example
> For a particular search (on 'rhythm') i get these results back - the order
> 2407, 2414, 2417, 2419, 2427, 2428, 2438, 2439, 2458, 2459, 2473, 2474,
> 1686, 1684, 1676, 1858, 1859, 1860, 1861, 1865, 1236]
>
> the same ordering applied to the remaining resources, and then we get the
> know!!!).  Can anyone shed any light?
>>>
>> --
>>
> http://rubyforge.org/mailman/listinfo/ferret-talk
>



--

Jeff Webb
jeff@boowebb.com
http://boowebb.com/
73c04e9ef9ca435c5b19a2e765ae6d20?d=identicon&s=25 Max Williams (max-williams)
on 2008-07-19 12:52
(Received via mailing list)
Hi Jeff - thanks a lot for looking at this.

I discovered that since last august, will paginate lets you paginate any
arbitrary array - so you can paginate an existing set of results.  so
what
i'm doing now is this
- get the results using ferret find, passing through the AR conditions
and a
large per page which means there's no pagination, effectively.  The
sorting
(by ferret score) is correct.
- cache the results with memcache, using params (minus the page k-v
pair) as
a cache key - this means that the retrieve will be slow the first time
but
very quick if the user just wants to look at another page
- then paginate the results, simply doing "@all_results.paginate(:page
=>
params[:page], :per_page => 20)"

This is pretty similar to your patch i think - get results with ferret
and
then paginate - and it's working fine for me.  The size of my result set
returned by ferret is rarely more than 150 big, so getting all results
isn't
a problem.  Obviously it's not very scaleable though.

thanks!
max

2008/7/18 Jeff Webb <jeff@boowebb.com>:
A3dc91112e326dfbb3a02849a2c86a97?d=identicon&s=25 Jesson Lee (jessonlee)
on 2008-08-15 09:57
Max Williams wrote:
> Hi Jeff - thanks a lot for looking at this.
>
> I discovered that since last august, will paginate lets you paginate any
> arbitrary array - so you can paginate an existing set of results.  so
> what
> i'm doing now is this
> - get the results using ferret find, passing through the AR conditions
> and a
> large per page which means there's no pagination, effectively.  The
> sorting
> (by ferret score) is correct.
> - cache the results with memcache, using params (minus the page k-v
> pair) as
> a cache key - this means that the retrieve will be slow the first time
> but

>
> 2008/7/18 Jeff Webb <jeff@boowebb.com>:

Beauty Equipment: http://www.beautymachine.net
E96cc17b783bb2543dc0268368c250b5?d=identicon&s=25 Andrew Lippert (alippert)
on 2009-06-30 00:15
Jens Krämer wrote:
> Hi folks,
>
> On 09.07.2008, at 06:01, Jeff Webb wrote:
>> Sheldon/Jens,
>>
>> Any chance that this has been figured out and patched?
>
> I just committed a fix to this problem to trunk, so pagination with
> sorting and AR conditions should work now.
>
>
> cheers,
> Jens
>
> --
> Jens Kr�mer
> http://www.jkraemer.net/ - Blog
> http://www.omdb.org/     - The new free film database

Jens,

I am currently running the latest ferret (0.11.6), acts_as_ferret(0.4.4)
and will_paginate(2.3.11) running against a mysql database. I have a
search with conditions (member.published_at IS NOT NULL) intended to be
ordered by ferret score and paginated. However, every attempt to display
the results in sequence order by ferret_score is failing.

For example, if I have 150 matching records and a page length of 30
records, the first 30 records returned are the oldest 30 records of the
150 sorted by score (ultimately looks like 'order by id asc,
ferret_score desc'). The next 30 records returned include those with
scores that are higher then the last record in the first set of 30, but
newer records (have a larger primary key) and are hence returned in the
second page of results.

So, it appears that you cannot paginate a ferret score based search with
this combination of plugins. Is that true or am I doing something
woefully wrong?

Andrew Lippert
A99ccfa84810c6016a9d3dc3429a2e00?d=identicon&s=25 Tony Primerano (primerano)
on 2010-02-10 14:50
running ferret (0.11.6), aaf (0.4.4 or 0.4.5) and will_paginate (2.3.6)
this problem still exists.

Simple find with pagination works fine
Domain.find_with_ferret('search term', {:page => 1, :per_page => 10})

As soon as I add a condition the results are no longer ordered by ferret
score
Domain.find_with_ferret('search term', {:page => 1, :per_page => 10},
:conditions => [])

Is  Jeff Webb's code above recommended to fix this issue?

Tony Primerano
A99ccfa84810c6016a9d3dc3429a2e00?d=identicon&s=25 Tony Primerano (primerano)
on 2010-02-10 16:00
Tony Primerano wrote:
> running ferret (0.11.6), aaf (0.4.4 or 0.4.5) and will_paginate (2.3.6)
> this problem still exists.
>
> Simple find with pagination works fine
> Domain.find_with_ferret('search term', {:page => 1, :per_page => 10})
>
> As soon as I add a condition the results are no longer ordered by ferret
> score
> Domain.find_with_ferret('search term', {:page => 1, :per_page => 10},
> :conditions => [])
>
> Is  Jeff Webb's code above recommended to fix this issue?
>
> Tony Primerano

FYI..  my condition limited the search results if a user was not an
admin.  To work around this bug I added a field with a fixed piece of
text.  If the user is not an admin I do "Search Term NOT (f00bar1234)"
to filter the admin only results.

When an admin does a search I leave off the "NOT (f00bar1234)"

Not ideal but it seems to work.
This topic is locked and can not be replied to.