Ruby Forum Ferret > Re: Finding out all terms from search results. How?

Posted by Neville Burnell (Guest)
on 16.06.2006 01:50
(Received via mailing list)
How about something like this, where "field2" is the field you want to
collect:

values = []
index.search_each(query) do |doc, score|
  values.push index[doc]["field2"]
end
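
If only the distinct values matter, the same loop can feed a Set instead of an Array. This is a minimal sketch under assumptions: `distinct_field_values` and `FakeIndex` are hypothetical names, and `FakeIndex` is only a stand-in so the example runs without Ferret installed. The only index calls it relies on are the two used in the snippet above, `search_each` and `index[doc][field]`.

```ruby
require "set"

# Collect the distinct values of one stored field across all hits.
# `index` is anything that responds to search_each(query) { |doc, score| }
# and index[doc][field], as in the snippet above.
def distinct_field_values(index, query, field)
  values = Set.new
  index.search_each(query) do |doc, _score|
    values << index[doc][field]
  end
  values
end

# Tiny stand-in for a Ferret index so the sketch runs on its own (hypothetical).
class FakeIndex
  def initialize(docs)
    @docs = docs
  end

  # Pretends every document matches; yields the doc id and a dummy score.
  def search_each(_query)
    @docs.each_index { |id| yield id, 1.0 }
  end

  def [](id)
    @docs[id]
  end
end

index = FakeIndex.new([
  { "field2" => "poetry" },
  { "field2" => "history" },
  { "field2" => "poetry" },
])
values = distinct_field_values(index, "stock:yes", "field2")
```

This still loads every matching document, so the cost grows with the size of the result set.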
Posted by Sergei Serdyuk (Guest)
on 16.06.2006 15:59
Hi Neville,

It would work for a small resultset, but that is not an assumption I 
would want to make. I hope there is a way to get this info from Ferret 
directly.

Sergei.




Posted by Lee Marlow (Guest)
on 16.06.2006 16:47
(Received via mailing list)
Why would this only work for a small resultset?  Are you looking for a
list of terms from the other field as tokenized by ferret or for just
the value you put in that field during indexing?

-Lee
Posted by Jeremy Bensley (Guest)
on 16.06.2006 17:13
(Received via mailing list)
While I don't completely understand all the constraints, it seems as
though a generalized version of Neville's solution that goes through all
fields in the document would work just fine.

i.e.
fields = []
index.search_each(query) do |doc, score|
  # doc is a document id here, so load the document from the index first
  fields += index[doc].all_fields
end
values = fields.collect { |f| f.string_value }

I don't really know what 'Ferret doing this' would look like ... the
information would have to be stored in and retrieved from the index.
Please elaborate if we have not completely understood the problem.
Posted by Sergei Serdyuk (Guest)
on 16.06.2006 18:45
Let me illustrate my problem a bit more.

There is an index with 1.2M books in it. Every book has a category field, 
and whether a book is currently in stock is stored in a stock field. I 
generally expect 50-60% of the books to be in stock, which leaves me with 
at least 600,000 books to iterate over just to find out which categories 
are currently stocked.

It sounds like a borderline task where one would think a database would 
be more appropriate, but the ability to do advanced search over this 
collection of books is a top priority, and a database would not provide 
that.

--
Sergei Serdyuk
Red Leaf Software LLC
web: http://redleafsoft.com
Posted by Sergei Serdyuk (Guest)
on 16.06.2006 18:51
I would think that Ferret can provide the set of terms connected to a 
set of documents without pulling out those documents one by one.

--
Sergei Serdyuk
Red Leaf Software LLC
web: http://redleafsoft.com

> Jeremy Bensley wrote:
> I don't really know what part of 'Ferret doing this' would be ... the
> information would have to be stored and retrieved from the index. Please
> elaborate if we do not seem to completely understand the problem.
Posted by Erik Hatcher (Guest)
on 16.06.2006 18:55
(Received via mailing list)
I'm not familiar enough with Ferret, but I do this sort of filtering and
set intersection with Java Lucene, primarily using Solr, from a Ruby
on Rails front-end.

I build up bit sets (using Solr's new OpenBitSet class) that
represent "all items collected" and apply that filter to searches and
also intersect (using bit set ANDing) with other sets such as "all
objects from 1861" and "all poetry genre objects", and so on.  I've
also customized Solr to return facet counts, so given your
example it could show how many books were in stock in each category
and allow you to filter to see all those books easily too.  Using
these types of set intersection operations even bypasses the
traditional Lucene search by simply dealing with efficiently
structured sets of document IDs.

	Erik
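
To make the set-intersection idea concrete, here is a minimal pure-Ruby sketch. It does not use Ferret or Solr: a plain Ruby Integer stands in for a bit set such as OpenBitSet, and the helpers `bitset_for` and `count_bits` as well as the document ids and category names are invented for illustration.

```ruby
# Toy model: each set of documents is a Ruby Integer used as a bit set,
# where bit N is 1 when document id N belongs to the set.
# Hypothetical helper, not part of Ferret or Solr.
def bitset_for(doc_ids)
  doc_ids.inject(0) { |bits, id| bits | (1 << id) }
end

# Count the set bits, i.e. the number of documents in the set.
def count_bits(bits)
  bits.to_s(2).count("1")
end

# Made-up sample data: which doc ids are in stock, and which belong
# to each category.
in_stock   = bitset_for([0, 2, 3, 5])
categories = {
  "poetry"  => bitset_for([0, 1, 2]),
  "history" => bitset_for([3, 4]),
  "cooking" => bitset_for([1, 4]),
}

# Facet counts: AND each category set with the in-stock set and count
# the surviving documents; categories with no stocked books are dropped.
facet_counts = {}
categories.each do |name, bits|
  n = count_bits(bits & in_stock)
  facet_counts[name] = n if n > 0
end

stocked_categories = facet_counts.keys.sort
```

Each category costs one AND plus a bit count, so the work scales with the number of categories rather than with the number of matching books.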
Posted by Sergei Serdyuk (Guest)
on 16.06.2006 21:08
Thank you Erik. It is not clear to me what it would look like in Ferret, 
but it sounds like a good direction to dig in.

Posted by Erik Hatcher (Guest)
on 16.06.2006 22:47
(Received via mailing list)
On Jun 16, 2006, at 3:08 PM, Sergei Serdyuk wrote:
> Thank you Erik. It is not clear to me what it would look like in  
> Ferret,
> but it sounds like a good direction to dig in.

In Java, building up such filters is done with code like this:

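       // Walk every term of the given field, starting at its first term.
       TermEnum termEnum = reader.terms(new Term(field, ""));
       TermDocs termDocs = reader.termDocs();
       while (true) {
         Term term = termEnum.term();
         if (term == null || !term.field().equals(field)) break;

         // Collect the ids of all documents that contain this term.
         termDocs.seek(term);
         // Size by maxDoc(): doc ids run up to maxDoc() - 1 when the
         // index has deletions, which can exceed numDocs().
         OpenBitSet bitSet = new OpenBitSet(reader.maxDoc());
         while (termDocs.next()) {
           bitSet.set(termDocs.doc());
         }

         // ... cache bitSet for future use ...

         if (! termEnum.next()) break;
       }
       termDocs.close();
       termEnum.close();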

Ferret has a comparable API underneath that should make this sort of
thing feasible in pure Ruby somehow.

	Erik
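
A rough Ruby analogue of that loop, as a sketch only. It runs against a plain Hash rather than a Ferret reader, because the exact Ferret method names are not given in this thread; `postings` and `build_bitsets` are hypothetical, the data is invented, and a Ruby Integer stands in for the bit set.

```ruby
# Stand-in for an index's term dictionary: [field, term] => doc ids.
# In a real index this would come from the reader's term and term-doc
# enumerators; the data here is invented for illustration.
postings = {
  ["category", "history"] => [3, 4],
  ["category", "poetry"]  => [0, 1, 2],
  ["stock",    "yes"]     => [0, 2, 3, 5],
}

# Walk every term of one field and build a bit set (an Integer) per term,
# mirroring the TermEnum / TermDocs loop in the Java version.
def build_bitsets(postings, field)
  cache = {}
  postings.keys.sort.each do |(f, term)|
    next unless f == field
    bits = 0
    postings[[f, term]].each { |doc_id| bits |= (1 << doc_id) }
    cache[term] = bits # cache the bit set for future use
  end
  cache
end

category_sets = build_bitsets(postings, "category")
stock_sets    = build_bitsets(postings, "stock")

# Categories with at least one stocked book: a non-zero AND means overlap.
stocked = category_sets.select { |_term, bits| (bits & stock_sets["yes"]) != 0 }.keys
```

Only the term dictionary and the posting lists are touched here; no stored document is loaded.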
Posted by David Balmain (Guest)
on 17.06.2006 02:29
(Received via mailing list)
On 6/17/06, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>          if (term == null || !term.field().equals(field)) break;
>        }
>
> Ferret has a comparable API underneath that should make this sort of
> thing feasible in pure Ruby somehow.

It is similar in Ferret. Have a look here to see the solution to a
similar problem:

    http://www.ruby-forum.com/topic/56232#40931

Hope that helps.

Cheers,
Dave