Forum: Ferret Adding extra fields to an index (using RDig?)

Ed -. (Guest)
on 2007-02-10 13:29
Hello everyone,

I am writing an application which collects a set of web sites and caches
them locally for offline viewing. I want to do searches on this
collection and associate extra data with each result (e.g. date
collected, reason for collection, perhaps a sequence number).

Now all this data exists when the harvesting is done and could be stored
in a database. I want to use RDig to index my collection of sites. I also
want to associate the index results with my extra data and display them
along with search results.

The index is built once and searched many times so I want searching to
be as efficient as possible.

The simplest way is to use e.g. the local URL as a key into my database
(easy, but it needs to be done each time and could slow things down).
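
A minimal sketch of that lookup-by-URL approach, in plain Ruby. The names HARVEST_DB and annotate are hypothetical, a Hash stands in for the database, and the hits are plain hashes rather than real Ferret search results:

```ruby
# Hypothetical stand-in for the harvesting database, keyed by local URL.
HARVEST_DB = {
  "file:///cache/site1/index.html" => { :collected_on => "2007-02-10", :sequence => 1 },
  "file:///cache/site2/index.html" => { :collected_on => "2007-02-11", :sequence => 2 }
}

# Merge the stored metadata into each search hit, using the hit's :url as key.
# Hits without a database entry are returned unchanged.
def annotate(hits, db)
  hits.map { |hit| hit.merge(db.fetch(hit[:url], {})) }
end

# Stand-in for search results; only the :url field matters for the lookup.
hits = [
  { :url => "file:///cache/site1/index.html", :title => "Site 1" },
  { :url => "file:///cache/unknown.html",     :title => "Unknown" }
]

annotated = annotate(hits, HARVEST_DB)
annotated.each { |hit| puts hit.inspect }
```

The cost is one lookup per displayed hit on every search, which is the overhead the question is trying to avoid.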

Is it possible to add extra fields to Ferret index entries?

If so, can this be done at create time or must it be done afterwards? If
it can be done at create time is there a way to get RDig to insert these
extra fields?

Thanks for any help with this.

Ed
Jens K. (Guest)
on 2007-02-10 19:55
(Received via mailing list)
Hi!

On Sat, Feb 10, 2007 at 12:29:27PM +0100, Ed Ed wrote:
> along with search results.
>
> The index is built once and searched many times so I want searching to
> be as efficient as possible.
>
> The simplest way is to use e.g. the local URL as a key into my database
> (easy but needs to be done each time and could slow things down)
>
> Is it possible to add extra fields to ferret index entries?

Of course that is possible; RDig itself uses three different fields:
:url, :title and :data.

> If so, can this be done at create time or must it be done afterwards? If
> it can be done at create time is there a way to get RDig to insert these
> extra fields?

Ferret documents cannot be modified after they have been created, so any
custom fields you want to add have to be added when the index is
created.

At the moment RDig doesn't support custom fields; however, I'd be happy
to apply a patch adding this capability ;-)


cheers,
Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer 
removed_email_address@domain.invalid
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
Ed -. (Guest)
on 2007-02-11 20:17
Hi,

To summarise, I can add custom fields at create time but not afterwards.
Furthermore RDig does not presently support the addition of custom
fields.

Please could you post your patch to enable RDig to support custom
fields.

Thanks

Ed

Jens K. (Guest)
on 2007-02-12 11:12
(Received via mailing list)
On Sun, Feb 11, 2007 at 07:17:51PM +0100, Ed Ed wrote:
> Hi,
>
> To summarise, I can add custom fields at create time but not afterwards.
> Furthermore RDig does not presently support the addition of custom
> fields.

Right.
>
> Please could you post your patch to enable RDig to support custom
> fields.

oh, what I wanted to say is that if *you* built such a feature into
RDig, I'd be happy to integrate it. Sorry if I've been unclear here.

Jens
Ed -. (Guest)
on 2007-02-12 13:55
Jens K. wrote:

> oh, what I wanted to say is that if *you* built such a feature into
> RDig, I'd be happy to integrate it. Sorry if I've been unclear here.
>

:-(

OK, I'll have a look at the code and see what might be simplest. Seems
to me that adding an extra optional directive to the configuration file
is easiest. This could name a file containing a user-supplied hook which
rdig/indexer.rb could try to include. Or just define the hook procedure
in the config file?

Then if the hook procedure existed the indexer could pass it the
document and doc data structure and the hook procedure could augment the
doc structure as required.
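
A minimal sketch of that idea, in plain Ruby and not taken from RDig's code. SketchConfig, SketchIndexer, document_hook and HARVEST_LOG are all hypothetical names, and an Array stands in for the Ferret index:

```ruby
# Hypothetical configuration object; RDig's real configuration differs.
class SketchConfig
  attr_accessor :document_hook
end

# Hypothetical stand-in for the indexing step in rdig/indexer.rb.
class SketchIndexer
  attr_reader :added

  def initialize(config)
    @config = config
    @added = []   # stands in for the Ferret index
  end

  # Build the three standard fields, then let the optional hook add more.
  def add(document)
    doc = { :url => document[:url], :title => document[:title], :data => document[:body] }
    hook = @config.document_hook
    hook.call(document, doc) if hook
    @added << doc
    doc
  end
end

# Extra data recorded at harvest time, keyed by URL (made-up values).
HARVEST_LOG = {
  "http://example.org/a.html" => { :collected_on => "2007-02-10", :reason => "reference" }
}

config = SketchConfig.new
# The hook would be defined in the config file; it augments doc in place.
config.document_hook = lambda do |document, doc|
  extra = HARVEST_LOG[document[:url]]
  doc.merge!(extra) if extra
end

indexer = SketchIndexer.new(config)
indexed = indexer.add(:url => "http://example.org/a.html", :title => "Page A", :body => "some text")
plain   = indexer.add(:url => "http://example.org/b.html", :title => "Page B", :body => "other text")
puts indexed.inspect
puts plain.inspect
```

If no hook is configured, the indexer in this sketch behaves exactly as before, so the directive stays optional.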

I guess the only Ferret requirement here is that the hook must add the
same set of extra fields to each document (even if the values are NULL).

Ed
Jens K. (Guest)
on 2007-02-12 14:53
(Received via mailing list)
On Mon, Feb 12, 2007 at 12:55:54PM +0100, Ed Ed wrote:
[..]
>
> OK, I'll have a look at the code and see what might be simplest. Seems
> to me that adding an extra optional directive to the configuration file
> is easiest. This could name a file containing a user-supplied hook which
> rdig/indexer.rb could try to include. Or just define the hook procedure
> in the config file?

defining the hook method in the config sounds good.

> Then if the hook procedure existed the indexer could pass it the
> document and doc data structure and the hook procedure could augment the
> doc structure as required.

exactly.

> I guess the only Ferret requirement here is that the hook must add the
> same set of extra fields to each document (even if values NULL)

Not even that: different Ferret documents can have different sets of
fields.


Jens
