Baffling sort problem

I had sort-by-date working almost perfectly with my app. It was
behaving as expected for most data, but had a few hiccups with
certain data. I investigated and discovered that the correct data was
storing this in my ferret index: “1999-10-18 00:00:00” and the
incorrect data was storing this: “Mon Oct 18 00:00:00 EDT
1999” (oops…)
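
To see why the two formats disagreed, here is a minimal plain-Ruby sketch (no Ferret involved). It assumes an untokenized field is compared byte by byte as a string. The "2000-01-01 00:00:00" value is a made-up sample.

```ruby
# Two values in the "1999-10-18 00:00:00" style (one made up) and one in
# the ctime-like style quoted above.
stored = [
  "2000-01-01 00:00:00",
  "Mon Oct 18 00:00:00 EDT 1999",
  "1999-10-18 00:00:00"
]

# Ruby's String comparison is byte-wise, so "M" (0x4D) sorts after every
# digit (0x30-0x39). The ctime-style value lands last whatever its date.
sorted = stored.sort
puts sorted
```
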

So of course I had to fix the incorrect data, and I figured while I
was at it, I would normalize and minimize everything to this format:
“19991018000000”.

Now it seems that sorting on this column does not work at all.

I have not changed how the data is stored in the index; it has always
been:

    :search_date => { :term_vectors => :no, :index => :untokenized, :store => :yes }

Any ideas?

Thanks.
John

On Mar 31, 2007, at 8:07 PM, John B. wrote:

I investigated and discovered that the correct data was
storing this in my ferret index: “1999-10-18 00:00:00” and the
incorrect data was storing this: “Mon Oct 18 00:00:00 EDT
1999” (oops…)

So of course I had to fix the incorrect data, and I figured while I
was at it, I would normalize and minimize everything to this format:
“19991018000000”.

Now it seems that sorting on this column does not work at all.

I just normalized everything to the “1999-10-18 00:00:00” format, and
it is working again.

My guess is that Ferret treats the data differently if it is made up
only of numeric characters? I’ve been using Ferret for quite some time
and have never come across a type issue like this.

Also, on that same model, I have another ferret field, configured the
very same way, that is always a number; sorting works perfectly.
However, those numbers are much smaller (number of messages in the
discussion thread).

So maybe ferret has a problem with big numbers?
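
This hunch can be checked in a few lines. The plain-Ruby sketch below assumes the field is being compared as a signed 4-byte integer (the limit confirmed later in the thread); the message count of 42 is a made-up value standing in for the other field.

```ruby
# Range of a signed 4-byte (32-bit) integer.
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def fits_in_int32?(n)
  n.between?(INT32_MIN, INT32_MAX)
end

full_timestamp = 19991018000000 # YYYYMMDDhhmmss, as stored after normalizing
message_count  = 42             # hypothetical small count, like the other field

puts fits_in_int32?(full_timestamp) # prints false (far above 2_147_483_647)
puts fits_in_int32?(message_count)  # prints true
```
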

Anyway, I’m glad it’s working again, but would be very interested to
know what the problem was.

Cheers,
John

(shamelessly bumping this thread up…)

On Mar 31, 2007, at 8:30 PM, John B. wrote:

Anyway, I’m glad it’s working again, but would be very interested to
know what the problem was.

Does anyone have any insight into what may be causing this behavior?

Thanks,
John

On 4/4/07, John B. wrote:

(shamelessly bumping this thread up…)

Shame on you. :smiley:

Bumping this up won’t help anything, because there’s probably only one
person, David, who can answer your question, and he seems to be in and
out, with long periods of time with no net connectivity. It’s not
that people aren’t willing to help, it’s just that most of us can’t.

-ryan

On Apr 5, 2007, at 10:45 PM, David B. wrote:

Now, if you don’t specify the sort type, Ferret will try and determine
the sort type for you. It will first try to parse the field as an
integer and then as a float before defaulting to a string type. My
guess is that the reason John’s sort isn’t working is that Ferret is
detecting an integer field so it is trying to sort by integer, but the
integers don’t fit in a 4-byte integer, hence the problem.

Hope that explains it. If not, let me know and I’ll try and make it a
little clearer.

That explains it very well, thanks. I wasn’t aware of the default
sort behavior. It might be worth it to mention that (more?)
prominently in the documentation.

Thanks again,
John

On 4/5/07, Ryan K. wrote:

On 4/4/07, John B. wrote:

(shamelessly bumping this thread up…)

Shame on you. :smiley:

Bumping this up won’t help anything, because there’s probably only one
person, David, who can answer your question, and he seems to be in and
out, with long periods of time with no net connectivity. It’s not
that people aren’t willing to help, it’s just that most of us can’t.

And I’m back again. :slight_smile:

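You can actually specify how you want fields to be sorted, i.e. whether
you want to sort by string, bytes, integer or float:

    require 'ferret'
    include Ferret::Search

    # index is a Ferret::Index::Index and query is the query to run.
    # Sort by :field_name as a float (reversed), then by relevance score.
    sort_field = SortField.new(:field_name, {:type => :float, :reverse => true})
    hits = index.search(query, :sort => Sort.new([sort_field, SortField::SCORE]))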

So, John, in your case you will want to set the type to :string or,
even better, :byte. Sorting by :byte basically does a strcmp, ignoring
locale and encoding, making it faster than sorting by :string.
Actually, sorting by integer would be even better and can be done if
you store the dates with day precision (e.g. 19770905). Unfortunately
19991018000000 won’t fit into a single 4-byte integer, so it won’t work
at this precision.
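
Both points can be sketched in plain Ruby, assuming a :byte sort behaves like Ruby's byte-wise String comparison. Apart from 19991018 and 19770905, the sample dates are made up.

```ruby
# Hypothetical sample keys: day precision (YYYYMMDD) and full precision
# (YYYYMMDDhhmmss).
day_keys  = %w[19991018 19770905 20070405]
full_keys = %w[19991018000000 19770905123000 20070405224500]

# Byte-wise String comparison (roughly what a :byte sort amounts to) orders
# fixed-width digit strings exactly as their numeric values would be.
byte_order_matches = [day_keys, full_keys].all? { |keys| keys.sort == keys.sort_by(&:to_i) }

# Day precision fits in a signed 32-bit integer; seconds precision does not.
int32_max = 2**31 - 1
day_fits       = day_keys.all?  { |k| k.to_i <= int32_max }
full_overflows = full_keys.all? { |k| k.to_i >  int32_max }

puts byte_order_matches # prints true
puts day_fits           # prints true
puts full_overflows     # prints true
```

The byte-order approach relies on every key having the same width: as strings, "9" sorts after "10", which is why a zero-padded, fixed-width format matters.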

Now, if you don’t specify the sort type, Ferret will try and determine
the sort type for you. It will first try to parse the field as an
integer and then as a float before defaulting to a string type. My
guess is that the reason John’s sort isn’t working is that Ferret is
detecting an integer field so it is trying to sort by integer, but the
integers don’t fit in a 4-byte integer, hence the problem.
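
The detection order described above can be emulated in a few lines of Ruby. This is an illustrative sketch, not Ferret's C implementation. guess_sort_type and to_int32 are hypothetical helpers, and keeping only the low 32 bits is just one possible way an oversized value could be mangled, not necessarily what Ferret does. 19990101000000 is a made-up sample date.

```ruby
# Illustrative emulation of the :auto rule (NOT Ferret's actual code):
# try an integer, then a float, then fall back to a string.
def guess_sort_type(first_term)
  Integer(first_term, 10) # base 10 so leading zeros are not read as octal
  :integer
rescue ArgumentError
  begin
    Float(first_term)
    :float
  rescue ArgumentError
    :string
  end
end

# Hypothetical model of a 4-byte overflow: keep the low 32 bits and read
# them back as a signed two's-complement integer.
def to_int32(n)
  ((n + 2**31) % 2**32) - 2**31
end

terms = %w[19990101000000 19991018000000] # Jan 1999 and Oct 1999
puts guess_sort_type(terms.first)         # prints integer

chronological = terms.sort
as_int32      = terms.sort_by { |t| to_int32(Integer(t, 10)) }

puts chronological.inspect
puts as_int32.inspect
```

With these two sample keys, the January value ends up sorting after the October one once squeezed into 32 bits: the kind of broken ordering John reported.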

Hope that explains it. If not, let me know and I’ll try and make it a
little clearer. I’m in a bit of a rush to get through all the emails
on the list to see if there are any issues I need to deal with before
putting out another release.

Cheers,
Dave

On 4/9/07, John Joseph B. wrote:

That explains it very well, thanks. I wasn’t aware of the default
sort behavior. It might be worth it to mention that (more?)
prominently in the documentation.

Point taken. I’ve added this to SortField:

  • Note 1: Care should be taken when using the :auto sort-type since
    numbers will occur before other strings in the index so if you are
    sorting a field with both numbers and strings (like a title field
    which might have “24” and “Prison Break”) then the sort_field will
    think it is sorting integers when it really should be sorting strings.
  • Note 2: When sorting by integer, integers are only 4 bytes so
    anything larger will cause strange sorting behaviour.
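
Note 1 can be illustrated by applying the same hypothetical detection rule (integer, then float, then string) to the first term of a made-up title field. This is a plain-Ruby sketch that assumes terms are stored in byte order, as the note implies; it is not Ferret code.

```ruby
# Illustrative detection rule, as described for :auto (not Ferret's code).
def guess_sort_type(term)
  Integer(term, 10)
  :integer
rescue ArgumentError
  begin
    Float(term)
    :float
  rescue ArgumentError
    :string
  end
end

# Hypothetical terms from an untokenized title field.
titles = ["Prison Break", "24", "Lost", "Heroes"]

# Assuming byte-ordered terms, the digit-led "24" comes first.
first_term = titles.sort.first

puts first_term                  # prints 24
puts guess_sort_type(first_term) # prints integer
```
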

Plus this, where the :sort parameter is mentioned:

  • :sort:: A Sort object or sort string describing how the field
    should be sorted. A sort string is made up of field names which
    cannot contain spaces and the word "DESC" if you want the field
    reversed, all separated by commas. For example:
    "rating DESC, author, title". Note that Ferret will try to
    determine a field's type by looking at the first term in the index
    and seeing if it can be parsed as an integer or a float. Keep this
    in mind as you may need to specify a field's type to sort it
    correctly. For more on this, see the documentation for SortField.
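
To make the sort-string format concrete, here is a small plain-Ruby parser that follows the rules in that comment. parse_sort_string and SortSpec are hypothetical names, not Ferret's own parser. In Ferret you would pass the string to :sort directly, and each parsed pair corresponds roughly to SortField.new(field, :reverse => reverse).

```ruby
# Hypothetical representation of one parsed sort clause.
SortSpec = Struct.new(:field, :reverse)

# Split on commas; each clause is a field name optionally followed by DESC.
def parse_sort_string(str)
  str.split(",").map do |clause|
    field, direction = clause.strip.split(/\s+/, 2)
    SortSpec.new(field.to_sym, direction == "DESC")
  end
end

specs = parse_sort_string("rating DESC, author, title")
specs.each do |s|
  puts "#{s.field}: #{s.reverse ? 'descending' : 'ascending'}"
end
```
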
    

Let me know if you have any suggestions where else you might expect to
see something about this.

Cheers,
Dave