Any fast way to update non-indexed fields?

sserdyuk · June 20, 2006, 5:34pm

Hi,

From looking at Ruby sources it seems that every update method deletes
and reinserts documents. It makes sense if indexed fields are changed
but what if it is not the case? It would speed up updates a lot if indexes
did not have to be updated twice for nothing. Any quick way to do it?

--
Sergei S.
Red Leaf Software LLC
web: http://redleafsoft.com

sserdyuk · June 20, 2006, 8:09pm

Sergei S. wrote:

Hi,

From looking at Ruby sources it seems that every update method deletes
and reinserts documents. It makes sense if indexed fields are changed
but what if it is not the case? It would speed up updates a lot if indexes
did not have to be updated twice for nothing. Any quick way to do it?

I’m not an expert with Lucene, but I believe that’s how Lucene indexes
work - there is no update, only create and delete.

sserdyuk · June 21, 2006, 3:33am

On 6/21/06, ryan king wrote:

Sergei S. wrote:

Hi,

From looking at Ruby sources it seems that every update method deletes
and reinserts documents. It makes sense if indexed fields are changed
but what if it is not the case? It would speed up updates a lot if indexes
did not have to be updated twice for nothing. Any quick way to do it?

I’m not an expert with Lucene, but I believe that’s how Lucene indexes
work - there is no update, only create and delete.

It is in fact the way Lucene works. The main problem with the update
method in Ferret is that for each update it needs to open an
IndexReader to read and delete the old doc, then close it and open an
IndexWriter to add the new doc. In the version of Ferret I’m working on
now you’ll be able to do updates directly on the IndexWriter so it
should be a lot faster.
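
To make the delete-and-reinsert behaviour concrete, here is a minimal sketch in plain Ruby. It is a toy in-memory index, not Ferret's API: the class ToyIndex and its methods are made up for illustration, and it only models why an "update" ends up being a delete followed by an add.

```ruby
# Toy in-memory index (hypothetical, NOT Ferret's API). It only models
# why an update is a delete followed by an add.
class ToyIndex
  def initialize
    @docs = {}                                 # doc number => stored fields
    @postings = Hash.new { |h, k| h[k] = [] }  # term => doc numbers
    @next_doc = 0
  end

  # Store the document and index every word of its :content field.
  def add(doc)
    num = @next_doc
    @next_doc += 1
    @docs[num] = doc
    doc[:content].to_s.downcase.split.uniq.each { |term| @postings[term] << num }
    num
  end

  # Remove the stored fields and every posting that points at the document.
  def delete(num)
    @docs.delete(num)
    @postings.each_value { |nums| nums.delete(num) }
  end

  # No in-place update: the old document is dropped and the new one is
  # added under a fresh document number, even if only :price changed.
  def update(num, doc)
    delete(num)
    add(doc)
  end

  def search(term)
    @postings[term.downcase].map { |num| @docs[num] }
  end
end

index = ToyIndex.new
old_num = index.add({ id: 1, content: "red leaf", price: 10 })
new_num = index.update(old_num, { id: 1, content: "red leaf", price: 12 })
```

Note that the postings for "red" and "leaf" are removed and rebuilt even though only the non-indexed :price changed, which is the cost the original question is about.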

As for just updating the stored-unindexed fields, I’ll have to think
about it. It’ll add a bit of complexity to the merge process which I’m
not too keen on. But it is certainly possible. Sergei, what type of
field is it that you need to update? And to everyone else on the list,
is this a common action? That is, do you often need to update
non-indexed fields?

Cheers,
Dave

sserdyuk · June 22, 2006, 4:45pm

Reported as a bug: http://ferret.davebalmain.com/trac/ticket/69

If I were to wish for something in coming Ferret, I’d wish “stability”.
I am getting seg_faults every other time I am doing this:

sserdyuk · June 22, 2006, 4:35pm

They are stored non-indexed fields. In my case I wanted to have some
stock data in searchable index. This is not top priority, as I can
really have a second index or a database and do lookups by :id.
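
The workaround of keeping the volatile data outside the index could look roughly like this. It is only a sketch in plain Ruby: the hash stands in for a database table or second index, the hits array stands in for search results, and the helper names update_stock and with_stock are hypothetical.

```ruby
# Hypothetical sketch: search hits carry only an :id, and the frequently
# changing stock data lives in a separate store keyed by that :id.
stock = { 1 => { quantity: 5 }, 2 => { quantity: 0 } }  # stand-in for a database table

# Stand-in for the hits a search would return (indexed fields plus :id).
hits = [{ id: 1, title: "red leaf" }, { id: 2, title: "green leaf" }]

# Changing stock touches only the side store, never the search index.
def update_stock(store, id, quantity)
  store[id] = { quantity: quantity }
end

# Merge each hit with its current stock data at display time.
def with_stock(hits, store)
  hits.map { |hit| hit.merge(store.fetch(hit[:id], {})) }
end

update_stock(stock, 2, 7)
results = with_stock(hits, stock)
```

The trade-off is one extra lookup per hit in exchange for never rewriting the index when only the stock data changes.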

If I were to wish for something in coming Ferret, I’d wish “stability”.
I am getting seg_faults every other time I am doing this:

def self.internal_field_values(fieldname)
  term_enum = @@reader.terms_from(Ferret::Index::Term.new(fieldname, ""))
  out = []
  while term_enum.term and (term_enum.term.field == fieldname) # seg faults here
    out << term_enum.term.text
    break unless term_enum.next?
  end
  out
end

As for just updating the stored-unindexed fields, I’ll have to think
about it. It’ll add a bit of complexity to the merge process which I’m
not too keen on. But it is certainly possible. Sergei, what type of
field is it that you need to update? And to everyone else on the list,
is this a common action? That is, do you often need to update
non-indexed fields?

Cheers,
Dave

sserdyuk · June 27, 2006, 12:59am

[resending… for some reason, this didn’t go through this morning…]

On Jun 22, 2006, at 7:45 AM, Sergei S. wrote:

If I were to wish for something in coming Ferret, I’d wish
“stability”.
I am getting seg_faults every other time I am doing this:

Dave, I see you’ve done some work with Valgrind, but I’m not sure how
much. To catch errors and memory leaks with KinoSearch, I wrote up a
simple script that runs the whole test suite under Valgrind. The
test suite takes around 15 minutes to run that way instead of 9
seconds (on the one box where I have Valgrind available), so I only
run it rarely – always when preparing a release, and sometimes when
debugging new or refactored C code. Some of the code in KinoSearch’s
test suite doesn’t even produce output; it’s just there to exercise
an area where there might be memory problems.
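
For a Ruby project, a script of that kind might be sketched as follows. This is not the KinoSearch script (which is not shown in this thread); the helper names and the test file path are hypothetical, and the only Valgrind options assumed are --leak-check=full and --error-exitcode=1.

```ruby
# Hypothetical runner: builds one Valgrind command per test file.
def valgrind_command(test_file)
  ["valgrind", "--leak-check=full", "--error-exitcode=1", "ruby", test_file]
end

# Run every test file under Valgrind and return the files that failed.
# `runner` defaults to Kernel#system; pass a lambda for a dry run.
def run_under_valgrind(test_files, runner: ->(cmd) { system(*cmd) })
  test_files.reject { |file| runner.call(valgrind_command(file)) }
end

# Dry run: print the commands instead of executing them.
dry_run = lambda do |cmd|
  puts cmd.join(" ")
  true
end
failed = run_under_valgrind(["test/unit/test_index.rb"], runner: dry_run)
```

With --error-exitcode=1, Valgrind exits non-zero when it reports errors, so a file lands in the failed list whether the tests themselves fail or Valgrind finds a memory error.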

Do you have something like that going on with Ferret? It’s been
extremely helpful for me. I don’t think I’ve seen a single segfault
bug report since KinoSearch was released, though I have missed a
couple memory leaks because the Valgrind output can be a little hard
to interpret (there are a few harmless items in Perl that look like
memory leaks to Valgrind, which makes real leaks harder to spot).

Marvin H.
Rectangular Research
http://www.rectangular.com/

sserdyuk · June 28, 2006, 2:07am

On 6/23/06, Marvin H. wrote:

simple script that runs the whole test suite under Valgrind. The
couple memory leaks because the Valgrind output can be a little hard
to interpret (there are a few harmless items in Perl that look like
memory leaks to Valgrind, which makes real leaks harder to spot).

Hi Marvin,

I do use Valgrind. In fact the reason I have been so quiet on the list
lately is I’ve been working really hard on cleaning up the code in
Ferret so that I can release a more stable version. The tool I need
to make more use of is gcov. The problem is that some areas of the
code just aren’t getting exercised enough.

Cheers,
Dave