Adding dependent objects to an Index?

I have a design question and I’m wondering what the best way to solve it
is. I’m trying to index HTML content where I have a single model object,
call it Article, that is an acts_as_ferret model, and an article consists
of many HTML files. I would like to index all of the article’s content
with Ferret and search across it. However, since the article’s content is
spread over several files, how would I do that if I don’t have an object
in the database for each page? Is there a way from within my Article
object to add more than one Document to the index? These pages would be
tied to the life cycle of the Article. In other words, if I remove the
article I want to remove all the pages that went along with it. How would
I do that?

Another question: I would like to search the elements of the article
(author, title, etc.) and the contents of the article from one search
field. Can I place all of this data inside a single index, or do I have
to use the multi_search method?

Thanks
Charlie

On Mon, Oct 02, 2006 at 03:30:59PM +0200, Charlie H. wrote:

> In other words if I remove the article I want to remove all the pages
> that went along with that article. How would I do that?

Do you want to be able to find single HTML files in search results, or
is it OK to only find the whole article, without knowing which file the
hit was in?

In the first case, you can either create a Page model representing a
single page and index that, or skip acts_as_ferret and do the indexing
yourself.

The second case is easier: just create a method named html_content that
returns the concatenated contents of all the files belonging to your
article, and add :html_content to the fields list in your call to
acts_as_ferret. This will index all files belonging to your article in a
single Ferret document.
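
A minimal sketch of such an html_content method follows. It uses a plain Ruby class in place of the ActiveRecord model so that it runs on its own; the html_paths attribute and the strip_tags helper are assumptions made for illustration, and the acts_as_ferret call appears only as a comment.

```ruby
# Sketch only: a plain Ruby stand-in for the ActiveRecord model so the
# html_content logic can be run without Rails or Ferret installed.
class Article
  attr_reader :title, :author, :html_paths

  # html_paths is a hypothetical list of paths to the article's HTML files.
  def initialize(title, author, html_paths)
    @title = title
    @author = author
    @html_paths = html_paths
  end

  # In the real model, a line like this would sit at class level:
  #   acts_as_ferret :fields => [:title, :author, :html_content]

  # Concatenate the text of every page so the article is indexed as
  # one Ferret document.
  def html_content
    html_paths.map { |path| strip_tags(File.read(path)) }.join("\n")
  end

  private

  # Naive tag removal, good enough for a sketch; not a real HTML parser.
  def strip_tags(html)
    html.gsub(/<[^>]*>/, ' ').squeeze(' ').strip
  end
end
```

Because title and author are in the same fields list, a single query would then cover both the metadata and the page text.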

> Another question I have is I would like to search the elements of the
> article like author, title, etc, and search the contents of those
> Articles within one search field. Can I place all of this data inside a
> single index? Or do I have to use the multi_search method?

You’ll only need multi_search if you have several indexes (that is,
several model classes where you called acts_as_ferret).
In your case, if you choose the second way, just index your metadata
together with the content; acts_as_ferret (aaf) searches all fields by
default.

cheers,
Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

Jens K. wrote:

> On Mon, Oct 02, 2006 at 03:30:59PM +0200, Charlie H. wrote:
>
>> In other words if I remove the article I want to remove all the pages
>> that went along with that article. How would I do that?
>
> Do you want to be able to find single html files in search results, or
> is it ok to only find the whole article, without knowing which file the
> hit was in ?
>
> In the first case, you can either create a Page model representing a
> single page and index that, or don’t use acts_as_ferret at all and do
> the indexing yourself.

The first case is actually closer to my scenario. I want the user to be
able to jump right to the relevant portions of the article from the
search results, possibly with highlights etc., mainly because these
articles can be quite large.

>> Another question I have is I would like to search the elements of the
>> article like author, title, etc, and search the contents of those
>> Articles within one search field. Can I place all of this data inside a
>> single index? Or do I have to use the multi_search method?
>
> you’ll only need multi_search if you have several indexes (that is,
> several Model classes where you called acts_as_ferret).
> In your case, if you choose the second way, just index your meta data
> together with the content, aaf will by default search in all fields.

So the bottom line is: create a Page object for each page of the article,
put those in the DB, and use the acts_as_ferret options to find them,
with multi_search across the two models.

Thanks
Charlie

On Mon, Oct 02, 2006 at 08:47:04PM +0200, Charlie H. wrote:

> Jens K. wrote:
>> […]
>>
>> you’ll only need multi_search if you have several indexes (that is,
>> several Model classes where you called acts_as_ferret).
>> In your case, if you choose the second way, just index your meta data
>> together with the content, aaf will by default search in all fields.
>
> So bottom line is create a Page object for each page of the article and
> put that stuff in the DB, and use the acts_as_ferret options to find it.
> Use the multi-search across the two models.

Right. To further simplify things, you could index the article’s
metadata with each page, via an indexed method that you list in your
fields. That method should retrieve the metadata from the parent article
object so that it gets indexed together with each page.

This might actually be faster than using multi_search (unless your
article metadata is so large that the overhead of indexing it with each
page becomes significant). In addition, it would save you from having to
handle different kinds of objects (Articles and Pages) in your result
set.
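
A rough sketch of indexing the parent’s metadata with each page could look like the following. Plain Ruby objects stand in for the ActiveRecord models so that it runs on its own; the names body, article_title and article_author are assumptions, and the association and acts_as_ferret lines are shown only as comments describing what the real models might declare.

```ruby
# Plain Ruby stand-ins for the two ActiveRecord models (sketch only).
Article = Struct.new(:title, :author)

class Page
  attr_reader :article, :body

  # In the real models, something along these lines (assumed, untested):
  #   class Article < ActiveRecord::Base
  #     has_many :pages, :dependent => :destroy  # pages are destroyed with the article
  #   end
  #   class Page < ActiveRecord::Base
  #     belongs_to :article
  #     acts_as_ferret :fields => [:body, :article_title, :article_author]
  #   end

  def initialize(article, body)
    @article = article
    @body = body
  end

  # Indexed methods that pull the metadata from the parent article, so
  # each page document carries its article's title and author.
  def article_title
    article.title
  end

  def article_author
    article.author
  end
end
```

With this layout only Page is indexed, so every hit is a Page and points directly at the part of the article that matched.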

cheers,
Jens


