Forum: Ferret Problem with large index file

Announcement (2017-05-07): www.ruby-forum.com is now read-only, since I unfortunately no longer have the time to support and maintain the forum. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Jeffrey Gelens (Guest)
on 2007-02-26 07:18
Hello,

Ferret created a 4.5GB index file:
$ 4534029210 2007-02-26 12:46 _el.cfs

The creation of the index went smoothly, and searching through it also
works fine. However, whenever I try to retrieve the contents of an
indexed document with a document number above 621108, I get an error:

irb(main):080:0> searcher[621108].load
IOError: IO Error occured at <except.c>:79 in xraise
Error occured in fs_store.c:289 - fsi_seek_i
        seeking pos -1206037603: <Invalid argument>

As you can see, it is seeking to a negative position. I ran strace on
the process, with the following results:

_llseek(3, 18446744072766697140, 0xbfc555e0, SEEK_SET) = -1 EINVAL (Invalid argument)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(2, "./service.cgi:40:in `[]\'", 24./service.cgi:40:in `[]') = 24
write(2, ": ", 2: )                       = 2
write(2, "IO Error occured at <except.c>:7"..., 43IO Error occured at
<except.c>:79 in xraise) = 43
write(2, " (", 2 ()                       = 2
write(2, "IOError", 7IOError)                  = 7
write(2, ")\n", 2)
)                      = 2
write(2, "Error occured in fs_store.c:289 "..., 90Error occured in
fs_store.c:289 - fsi_seek_i
        seeking pos -942854476: <Invalid argument>

The offset 18446744072766697140 passed to lseek() is over the maximum
value of a long, which is probably why lseek() is returning this error.
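The arithmetic can be checked with plain Ruby, using only the numbers already shown above: the huge strace offset is the unsigned 64-bit view of a negative value, and undoing a 32-bit wrap recovers a plausible file position.

```ruby
# All values below come from the output above; nothing external is assumed.
pos64   = 18446744072766697140   # offset passed to _llseek() per strace
limit32 = 2**31 - 1              # max signed 32-bit offset (2147483647)

# Reinterpret the huge unsigned value as a signed 64-bit integer:
signed = pos64 - 2**64           # => -942854476, matching the Ferret error

# Undo the 32-bit wrap to recover the intended file position:
intended = signed + 2**32        # => 3352112820

puts signed                # -942854476
puts intended              # 3352112820
puts intended > limit32    # true: does not fit in a signed 32-bit int
puts 4534029210 > limit32  # true: the index file itself is past 2GB
```

So the intended offset (about 3.35GB) fits inside the 4.5GB file but not inside a signed 32-bit integer, which is consistent with an offset being truncated to 32 bits somewhere before the seek.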

How can I fix this?
David Balmain (Guest)
on 2007-02-26 17:41
(Received via mailing list)
On 2/26/07, Jeffrey Gelens <jgelens@gmail.com> wrote:
> irb(main):080:0> searcher[621108].load
> write(2, "./service.cgi:40:in `[]\'", 24./service.cgi:40:in `[]') = 24
>
> The lseek() on 18446744072766697140 is over the maximum of long. That's
> why lseek is probably giving this error.
>
> How can I fix this?

Actually, 18446744072766697140 is too big even for a 64-bit long (or a
long long on 32-bit systems), so I'd love to know where that number is
coming from. It is obviously a bug somewhere else. Unfortunately it
would be impractical for you to send me the index. If it is possible
to give me access to your server, I should be able to sort this out.
Otherwise, I'll look into it, but I can't promise anything.

Dave
Jeffrey Gelens (Guest)
on 2007-02-27 02:42
David Balmain wrote:
> Actually 18446744072766697140 is too big for even a 64bit long (or a
> long long on 32bit systems) so I'd love to know where that number is
> coming from. It is obviously a bug somewhere else. Unfortunately it
> would be impractical for you to send me the index. If it is possible
> to give me access to your server I should be able to sort this out
> though. Otherwise, I'll look into it, but I can't promise anything.
>
> Dave

I can't give access to the server as it's a company server, sorry.
Is there a possibility that the index somehow got corrupted? At the
moment I am recreating the index, which takes several days. I'll report
on the findings when it's done.
David Balmain (Guest)
on 2007-02-27 04:49
(Received via mailing list)
On 2/27/07, Jeffrey Gelens <jgelens@gmail.com> wrote:
> I can't give access to the server as it's a company server, sorry.
> Is there a possibility that the index somehow got corrupted? At the
> moment I am recreating the index, which takes several days. I'll report
> on the findings when it's done.

It could be a corrupt index, but I doubt it. I think it is more likely
a bug somewhere else. I have built indexes of this size before without
problems, though. Perhaps if you could give me an idea of what type of
data you are putting in the index, I could try to rebuild a similar
index here to diagnose the problem: i.e. how many documents, how many
fields, what the field settings are (e.g. stored, untokenized,
term_vectors, etc.), how large the fields are on average, what sort
of data they hold (e.g. numbers, dates, English text, code, etc.), and
which analyzer you are using. This should give me enough information to
build a very similar index here and hopefully reproduce the problem.

Cheers,
Dave

PS: send it to me privately if you prefer
Jeffrey Gelens (Guest)
on 2007-03-05 08:52
I recreated the index with the option :max_merge_docs => 100000, and it
seems to work great.
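For reference, a minimal sketch of how that option is passed when opening a Ferret index. The path and everything besides :max_merge_docs here are hypothetical, not from this thread:

```ruby
require 'rubygems'
require 'ferret'

# Hypothetical sketch: the path is illustrative. :max_merge_docs caps how
# many documents may be merged into a single segment, which keeps each
# segment's .cfs file (and thus its internal offsets) well below 2GB.
index = Ferret::Index::Index.new(
  :path           => '/data/my_index',   # hypothetical location
  :max_merge_docs => 100_000
)
```

This works around the symptom rather than the underlying overflow: no single compound file grows large enough for an offset to exceed a signed 32-bit integer.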
David Balmain (Guest)
on 2007-03-06 04:16
(Received via mailing list)
Hi Jeffrey,

That's great to hear. If you have a chance, could you try copying the
index (cp -r), then opening the copy and optimizing it? Then let me
know if you are still getting the same problem you were getting
before. I understand if this is too much trouble; 5GB is a lot of data
to be playing around with.
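In code, that check might look roughly like this. A sketch with hypothetical paths; Index#optimize is Ferret's call to merge all segments into one, which would recreate a single large file:

```ruby
# Shell step first (outside Ruby):  cp -r /data/my_index /data/my_index_copy
require 'rubygems'
require 'ferret'

# Hypothetical path; opens the copied index and forces a full merge.
index = Ferret::Index::Index.new(:path => '/data/my_index_copy')
index.optimize   # merges every segment into one, rewriting the files
index.close
```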

Cheers,
Dave
Jeffrey Gelens (Guest)
on 2007-03-06 06:51
After optimization the exact same problem occurs.

Greetings,
Jeffrey Gelens

David Balmain wrote:
> Hi Jeffrey,
>
> That's great to hear. If you have a chance, could you try copying the
> index (cp -r) and then opening the copy and optimizing it. Then let me
> know if you are still getting the same problem you were getting
> before. I understand if this is too much trouble. 5Gb is a lot of data
> to be playing around with.
>
> Cheers,
> Dave
David Balmain (Guest)
on 2007-03-06 13:26
(Received via mailing list)
On 3/6/07, Jeffrey Gelens <jgelens@gmail.com> wrote:
> After optimization the exact same problem occurs.

Thanks Jeffrey, I'll keep looking into this. I'm glad your index works
for the moment though.

Cheers,
Dave
This topic is locked and cannot be replied to.