Starting from scratch

I have the following models:

===
class Person < ActiveRecord::Base
  has_many :person_organisations, :dependent => true
  has_many :organisations, :through => :person_organisations

  has_many :person_categories, :dependent => true
  has_many :categories, :through => :person_categories
end

class Category < ActiveRecord::Base
  has_many :person_categories, :dependent => true
  has_many :persons, :through => :person_categories
end

class Organisation < ActiveRecord::Base
  has_many :documents

  has_many :person_organisations, :dependent => true
  has_many :persons, :through => :person_organisations
end

class Document < ActiveRecord::Base
  belongs_to :organisation

  has_many :document_topics, :dependent => true
  has_many :topics, :through => :document_topics
end

class Topic < ActiveRecord::Base
  has_many :document_topics, :dependent => true
  has_many :documents, :through => :document_topics
end

I’d like to be able to search for:

  • A person by using an organisation name or by using a category name (as
    well as the person attributes - surname etc.).

  • A document using topic names and organisation names (as well as the
    document attributes - title etc.).

My first attempt at this is described in an earlier thread ("Index only
partially built", Ferret, Ruby-Forum).

How does this look:

===
class Person < ActiveRecord::Base
  acts_as_ferret :additional_fields => [:organisation_names,
                                        :category_titles]

  has_many :person_categories, :dependent => true
  has_many :categories, :through => :person_categories

  has_many :person_organisations, :dependent => true
  has_many :organisations, :through => :person_organisations

  # Space-separated organisation names, indexed as an additional field.
  def organisation_names
    organisations.collect { |organisation| organisation.name }.join(' ')
  end

  # Space-separated category names, indexed as an additional field.
  def category_titles
    categories.collect { |category| category.name }.join(' ')
  end
end

class Category < ActiveRecord::Base
  has_many :person_categories, :dependent => true
  has_many :persons, :through => :person_categories
end

class Organisation < ActiveRecord::Base
  acts_as_ferret

  has_many :documents

  has_many :person_organisations, :dependent => true
  has_many :contacts, :through => :person_organisations
end

class Document < ActiveRecord::Base
  acts_as_ferret :additional_fields => [:organisation_name]

  belongs_to :organisation

  def organisation_name
    return organisation.name
  end
end
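
As a rough illustration of what methods like organisation_names and
category_titles hand to the indexer, here is a minimal sketch in plain
Ruby. PersonSketch, OrganisationStub and CategoryStub are hypothetical
stand-ins that only mimic the associations; they are not part of
acts_as_ferret or ActiveRecord.

```ruby
# Hypothetical stand-ins for the associated ActiveRecord models.
OrganisationStub = Struct.new(:name)
CategoryStub     = Struct.new(:name)

# Hypothetical plain-Ruby version of Person, for illustration only.
class PersonSketch
  attr_reader :surname, :organisations, :categories

  def initialize(surname, organisations = [], categories = [])
    @surname       = surname
    @organisations = organisations
    @categories    = categories
  end

  # Joins the names of all associated organisations into one string;
  # nil names are skipped so they do not leave stray spaces.
  def organisation_names
    organisations.map { |organisation| organisation.name }.compact.join(' ')
  end

  # Same idea for categories.
  def category_titles
    categories.map { |category| category.name }.compact.join(' ')
  end
end

person = PersonSketch.new(
  'Smith',
  [OrganisationStub.new('Acme'), OrganisationStub.new('Globex')],
  [CategoryStub.new('Press')]
)

puts person.organisation_names
puts person.category_titles
```

A person without any organisations yields an empty string, so the
field is simply empty for that record.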

My rebuild is still ending prematurely with only half of the data
indexed.

On Tue, Nov 21, 2006 at 06:08:17PM +0100, Matthew Planchant wrote:

My rebuild is still ending prematurely with only half of the data
indexed.

ok, so what exactly do you do to rebuild your index, and what searches
do you run to check for completeness of your indexes ?

Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

My rebuild is still ending prematurely with only half of the data
indexed.

ok, so what exactly do you do to rebuild your index, and what searches
do you run to check for completeness of your indexes ?

ruby script/console production
Person.rebuild_index
Organisation.rebuild_index
Document.rebuild_index

Then I try a find_by_contents on some of the people (using any of the
fields, e.g. surname). I can find people up to about two thirds of the
way through the data (in id order) but the final third aren’t found.

If I edit, say, the last record (which previously could not be found
using a search) then I can see in the log that the edited data is added
to the index, and a subsequent search finds it.

So for some reason it looks as though a large part of the data isn’t
being added to the index.

On Wed, Nov 22, 2006 at 12:10:45AM +0100, Matthew Planchant wrote:

Then I try a find_by_contents on some of the people (using any of the
fields i.e. surname). I can find people up to about two thirds of the
way through the data (in id order) but the final third aren’t found.

ok, to keep things simple please keep trying with only the Person class
for now. what does the log look like if you do Person.rebuild_index ?

What happens if you do the same (with the same data) in development
mode (maybe on another machine) ?

cheers,
Jens



OK. Here is what is happening. When the indexing starts it selects the
first 1000 records to add to the index. These seem to be added to the
index. When it has added the 1000th record another select appears in the
log file to get the rest of the records (there are 1561 records in the
table):

SELECT * FROM (SELECT TOP 561 * FROM (SELECT TOP 1561 * FROM persons) AS
tmp1 ) AS tmp2e

However this select doesn’t get the records from 1001 to 1561. It get 1
to 1000. So these first 500 or so records are added to the index twice
but the final 500 are never added.

Jens K. wrote:

On Wed, Nov 22, 2006 at 12:10:45AM +0100, Matthew Planchant wrote:

Then I try a find_by_contents on some of the people (using any of the
fields i.e. surname). I can find people up to about two thirds of the
way through the data (in id order) but the final third aren’t found.

ok, to keep things simple please keep trying with only the Person class
for now. what does the log look like if you do Person.rebuild_index ?

Good idea. I’ll give this a go. I’ll try:

class Person < ActiveRecord::Base
acts_as_ferret
end

What happens if you do the same (with the same data) in development
mode (maybe on another machine) ?

Same thing happens in development mode.

Matthew Planchant wrote:

However this select doesn’t get the records from 1001 to 1561. It get 1
to 1000. So these first 500 or so records are added to the index twice
but the final 500 are never added.

Small mistake here. It should read:

However this select doesn’t get the records from 1001 to 1561. It gets 1
to 561. So these first 500 or so records are added to the index twice
but the final 500 are never added.

I have a model with only ~350 records, and this seems to have been
indexed as it should have been: I can find all the records by searching.
The problem seems to occur when there are more than 1000 records (the
batch_size in rebuild_index).
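
The arithmetic behind the symptom can be reproduced without a database.
The following is a minimal sketch that assumes rows come back in id
order and that the nested TOP query behaves as it appears in the log;
expected_batch and observed_batch are hypothetical helper names, not
acts_as_ferret or ActiveRecord methods.

```ruby
total      = 1561   # number of rows in persons, from the log
batch_size = 1000   # batch size used by rebuild_index, per this thread

ids = (1..total).to_a

# What a working :limit / :offset pair should return.
def expected_batch(ids, limit, offset)
  ids[offset, limit] || []
end

# What the logged query effectively returns: without an ORDER BY,
# "TOP outer of TOP inner" simply yields the first rows again.
def observed_batch(ids, limit, offset)
  inner = [offset + limit, ids.size].min   # e.g. SELECT TOP 1561 ...
  outer = inner - offset                   # e.g. SELECT TOP 561 ...
  ids.first(inner).first(outer)
end

# Walk the table in batches the way a rebuild would.
indexed = []
(0...total).step(batch_size) do |offset|
  indexed.concat(observed_batch(ids, batch_size, offset))
end

missing    = ids - indexed
duplicated = indexed.group_by { |id| id }.select { |_, hits| hits.size > 1 }.keys

puts "missing ids:    #{missing.first}..#{missing.last} (#{missing.size})"
puts "duplicated ids: #{duplicated.first}..#{duplicated.last} (#{duplicated.size})"
```

With a single batch (any table of at most batch_size rows) the offset is
0 and both helpers agree, which matches the ~350 record model indexing
correctly.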

On Thu, Nov 23, 2006 at 11:55:12AM +0100, Matthew Planchant wrote:

I have a model with only ~350 records, and this seems to have been
indexed as it should have been: I can find all the records by searching.
The problem seems to occur when there are more than 1000 records (the
batch_size in rebuild_index).

so does setting the batch size to a higher value, say 10000, work for
you ?

Jens



By changing batch_size in rebuild_index to 2000 I’ve been able to index
all my records (I only have ~1500 records).

There may be a problem with this patch
http://projects.jkraemer.net/acts_as_ferret/ticket/24

I’m using MS SQL Server. Should this make a difference?

I think we have the problem.

http://dev.rubyonrails.org/ticket/6254

On Thu, Nov 23, 2006 at 11:28:45AM +0100, Matthew Planchant wrote:

OK. Here is what is happening. When the indexing starts it selects the
first 1000 records to add to the index. These seem to be added to the
index. When it has added the 1000th record another select appears in the
log file to get the rest of records (There are 1561 records in the
table):

SELECT * FROM (SELECT TOP 561 * FROM (SELECT TOP 1561 * FROM persons) AS
tmp1 ) AS tmp2e

ehm, what kind of database is this ? looks really strange ;-)

However this select doesn’t get the records from 1001 to 1561. It get 1
to 1000. So these first 500 or so records are added to the index twice
but the final 500 are never added.

is it possible the :limit and :offset options of ActiveRecord are not
supported or buggy for your kind of database ?

what do you get when calling
Person.find(:all, :limit => 1000, :offset => 1000)
on the console ?

Jens



Jens K. wrote:

On Thu, Nov 23, 2006 at 02:20:14PM +0100, Matthew Planchant wrote:

I think we have the problem.

http://dev.rubyonrails.org/ticket/6254

does it work for you with that patch applied ?

You mean if I reverse the patch and go back to not using batches? I
don’t know, I haven’t tried that yet, but I assume that it will, as it’s
the SQL generated to create the batches which doesn’t work with
MS SQL Server.

For the moment I’ve set the batch size to 5000 (i.e. greater than the
number of records I have in any of my models) and it works.
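
A large batch size only helps while every table stays smaller than it.
A different technique, which is not what acts_as_ferret does in this
thread, is to page by primary key so that no :offset is needed at all.
The sketch below simulates that on an in-memory array; fetch_after and
each_batch_by_id are hypothetical names, and it assumes ids are
positive integers.

```ruby
# Hypothetical stand-in for a query of the form
# "SELECT TOP n * FROM persons WHERE id > ? ORDER BY id".
def fetch_after(rows, last_id, limit)
  rows.select { |row| row[:id] > last_id }
      .sort_by { |row| row[:id] }
      .first(limit)
end

# Yields successive batches, remembering the highest id seen so far
# instead of counting rows with an offset.
def each_batch_by_id(rows, batch_size)
  last_id = 0   # assumes all ids are greater than zero
  loop do
    batch = fetch_after(rows, last_id, batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last[:id]
  end
end

rows = (1..1561).map { |id| { :id => id } }

seen = []
each_batch_by_id(rows, 1000) do |batch|
  seen.concat(batch.map { |row| row[:id] })
end

puts "rows visited:   #{seen.size}"
puts "distinct rows:  #{seen.uniq.size}"
```

Because each query only needs a limit, this style of paging does not
depend on how the database adapter emulates offsets.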

On Thu, Nov 23, 2006 at 02:20:14PM +0100, Matthew Planchant wrote:

I think we have the problem.

http://dev.rubyonrails.org/ticket/6254

does it work for you with that patch applied ?

Jens

