How about something like this, where "field2" is the field you want to collect:

values = []
index.search_each(query) do |doc, score|
  values.push index[doc]["field2"]
end
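If only the distinct values are wanted, the same loop can collect into a Set instead of an Array. A minimal sketch in plain Ruby; `DOCS` and `hits` are hypothetical stand-ins for the stored documents and for the ids a search would yield, and nothing here calls Ferret:

```ruby
require "set"

# Hypothetical stand-in for stored documents, keyed by document id.
DOCS = {
  0 => { "field1" => "a", "field2" => "poetry" },
  1 => { "field1" => "b", "field2" => "history" },
  2 => { "field1" => "c", "field2" => "poetry" },
  3 => { "field1" => "d", "field2" => "history" }
}

# Hypothetical stand-in for the document ids a search would yield.
hits = [0, 2, 3]

# A Set keeps each value once, however many hits share it.
values = Set.new
hits.each { |doc_id| values << DOCS[doc_id]["field2"] }
```

This still loads every matching document, so the cost grows with the size of the result set.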
on 16.06.2006 01:50
on 16.06.2006 15:59
Hi Neville,

It would work for a small resultset, but that is not an assumption I would want to make. I hope there is a way to get this info from Ferret directly.

Sergei.

Neville Burnell wrote:
> How about something like this, where "field2" is the field you want to
> collect
>
> values = []
> index.search_each(query) do |doc, score|
> values.push index[doc]["field2"]
> end
on 16.06.2006 16:47
Why would this only work for a small resultset? Are you looking for a list of terms from the other field as tokenized by ferret or for just the value you put in that field during indexing? -Lee
on 16.06.2006 17:13
While I don't completely understand all the constraints, it seems as though
a generalized version of Neville's solution that goes through all fields in
the document would work just fine.
i.e.
fields = []
index.search_each(query) do |doc, score|
fields += index[doc].all_fields  # doc is a document id, so fetch the document first
end
values = fields.collect { |f| f.string_value }
I don't really know what part of 'Ferret doing this' would be ... the
information would have to be stored and retrieved from the index. Please
elaborate if we do not seem to completely understand the problem.
on 16.06.2006 18:45
Let me illustrate my problem a bit more. There is an index with 1.2M books in it. Every book has a category field, and every book can currently be in stock, which is stored in a stock field. I generally expect 50-60% of the books to be stocked, which leaves me with at least 600,000 books to iterate over to find out which categories are currently stocked.

It sounds like a borderline task where one would think a database would be more appropriate, but the ability to do advanced search over this collection of books is a top priority, and a database would not provide that.

--
Sergei Serdyuk
Red Leaf Software LLC
web: http://redleafsoft.com
on 16.06.2006 18:51
I would think that Ferret can provide a set of terms that are connected to a set of documents without pulling out those documents one by one.

--
Sergei Serdyuk
Red Leaf Software LLC
web: http://redleafsoft.com

> Jeremy Bensley wrote:
> I don't really know what part of 'Ferret doing this' would be ... the
> information would have to be stored and retrieved from the index. Please
> elaborate if we do not seem to completely understand the problem.
on 16.06.2006 18:55
I'm not familiar enough with Ferret, but I do this sort of filtering and set intersection with Java Lucene, primarily using Solr, from a Ruby on Rails front-end.

I build up bit sets (using Solr's new OpenBitSet class) that represent "all items collected" and apply that filter to searches, and also intersect them (using bit set ANDing) with other sets such as "all objects from 1861" and "all poetry genre objects", and so on. I've also customized Solr to return facet counts, so given your example it could show how many books were in stock in each category and allow you to filter to see all those books easily too. Using these types of set intersection operations even bypasses the traditional Lucene search by simply dealing with efficiently structured sets of document ids.

Erik
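To make the set intersection and facet counting described above concrete, here is a minimal sketch in plain Ruby. It is only an illustration: a Ruby Integer used as a bitmask stands in for OpenBitSet, and `BOOKS`, `bitset_for` and `popcount` are hypothetical names. It calls neither Solr nor Ferret.

```ruby
# Hypothetical data: doc id => [category, in_stock].
BOOKS = {
  0 => ["poetry",  true],
  1 => ["poetry",  false],
  2 => ["history", true],
  3 => ["fiction", false],
  4 => ["history", true]
}

# Build a bit set from document ids: bit N is set when doc N is a member.
def bitset_for(doc_ids)
  doc_ids.inject(0) { |bits, id| bits | (1 << id) }
end

# Number of set bits, i.e. number of documents in the set.
def popcount(bits)
  bits.to_s(2).count("1")
end

# One bit set for "in stock", one per category.
in_stock = bitset_for(BOOKS.select { |_id, (_category, stocked)| stocked }.keys)

by_category = Hash.new(0)
BOOKS.each { |id, (category, _stocked)| by_category[category] |= (1 << id) }

# Facet counts: AND each category set with the in-stock set and count the bits.
facets = {}
by_category.each do |category, bits|
  count = popcount(bits & in_stock)
  facets[category] = count if count > 0
end
```

Once the bit sets are built and cached, answering "which categories are stocked, and how many books in each" needs only one AND and one bit count per category, with no documents loaded.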
on 16.06.2006 21:08
Thank you Erik. It is not clear to me what it would look like in Ferret, but it sounds like a good direction to dig in.

> Erik Hatcher wrote:
> I'm not familiar enough with Ferret, but I do this sort filtering and
> set intersections with Java Lucene, primarily using Solr, from a Ruby
> on Rails front-end.
>
> I build up bit sets (using Solr's new OpenBitSet class) that
> represent "all items collected" and apply that filter to searches and
> also intersect (using bit set ANDing) with other sets such as "all
> objects from 1861" and "all poetry genre objects", and so on. I've
> also customized Solr to return back facet counts, so given your
> example it could show how many books were in stock in each category
> and allow you to filter to see all those books easily too. Using
> these types of set intersection operations even bypasses the
> traditional Lucene search by simply dealing with efficiently
> structure sets of document id's.
>
> Erik
on 16.06.2006 22:47
On Jun 16, 2006, at 3:08 PM, Sergei Serdyuk wrote:
> Thank you Erik. It is not clear to me what it would look like in
> Ferret, but it sounds like a good direction to dig in.

In Java, building up such filters is done with code like this:

TermEnum termEnum = reader.terms(new Term(field, ""));
while (true) {
  Term term = termEnum.term();
  if (term == null || !term.field().equals(field)) break;

  termDocs.seek(term);
  OpenBitSet bitSet = new OpenBitSet(reader.numDocs());
  while (termDocs.next()) {
    bitSet.set(termDocs.doc());
  }

  // ... cache bitSet for future use ...

  if (! termEnum.next()) break;
}

Ferret has a comparable API underneath that should make this sort of thing feasible in pure Ruby somehow.

Erik
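The shape of that loop can be sketched in plain Ruby. This is not Ferret's API: `POSTINGS` is a hypothetical stand-in for the index's sorted term dictionary, `bitsets_for_field` is a made-up helper, and an Integer bitmask again stands in for OpenBitSet.

```ruby
# Hypothetical stand-in for a term dictionary: [field, term] => doc ids,
# sorted by field and then term, the order a term enumerator walks them in.
POSTINGS = [
  [["category", "fiction"], [3]],
  [["category", "history"], [2, 4]],
  [["category", "poetry"],  [0, 1]],
  [["stock",    "yes"],     [0, 2, 4]]
]

# Walk the terms of one field and build a bit set of doc ids per term.
def bitsets_for_field(postings, field)
  cache = {}
  postings.each do |(term_field, term), doc_ids|
    next if term_field < field    # have not reached the field yet
    break if term_field != field  # walked past the field's last term
    bits = 0
    doc_ids.each { |id| bits |= (1 << id) }
    cache[term] = bits            # cache the bit set for future use
  end
  cache
end

category_bits = bitsets_for_field(POSTINGS, "category")
stocked       = bitsets_for_field(POSTINGS, "stock")["yes"]

# Categories with at least one stocked book: non-empty intersection.
stocked_categories =
  category_bits.select { |_term, bits| (bits & stocked) != 0 }.keys
```

The work is proportional to the number of distinct terms and their postings, not to the number of stored documents that would otherwise have to be loaded one by one.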
on 17.06.2006 02:29
On 6/17/06, Erik Hatcher <erik@ehatchersolutions.com> wrote:
> if (term == null || !term.field().equals(field)) break;
> }
>
> Ferret has a comparable API underneath that should make this sort of
> thing feasible in pure Ruby somehow.

It is similar in Ferret. Have a look here to see the solution to a similar problem:
http://www.ruby-forum.com/topic/56232#40931

Hope that helps.

Cheers,
Dave