How about something like this, where "field2" is the field you want to collect:

values = []
index.search_each(query) do |doc, score|
  values.push index[doc]["field2"]
end
on 2006-06-16 01:50
on 2006-06-16 15:59
Hi Neville,

It would work for a small resultset, but that is not an assumption I would want to make. I hope there is a way to get this info from Ferret directly.

Sergei.

Neville Burnell wrote:
> How about something like this, where "field2" is the field you want to
> collect
>
> values = []
> index.search_each(query) do |doc, score|
>   values.push index[doc]["field2"]
> end
on 2006-06-16 16:47
Why would this only work for a small resultset? Are you looking for a list of terms from the other field as tokenized by Ferret, or for just the value you put in that field during indexing?

-Lee
on 2006-06-16 17:13
While I don't completely understand all the constraints, it seems as though a generalized version of Neville's solution that goes through all fields in the document would work just fine, i.e.
fields = []
index.search_each(query) do |doc, score|
  # search_each yields a document id, so load the document first
  fields += index[doc].all_fields
end
values = fields.collect { |f| f.string_value }
I don't really know what part of 'Ferret doing this' would be ... the
information would have to be stored and retrieved from the index. Please
elaborate if we do not seem to completely understand the problem.
on 2006-06-16 18:45
Let me illustrate my problem a bit more. There is an index with 1.2M books in it. Every book has a category field, and every book can currently be in stock, which is stored in a stock field. I generally expect 50-60% of the books to be stocked, so that leaves me with at least 600,000 books I would need to iterate over to find out which categories are currently stocked.

This sounds like a borderline task where one would think a database would be more appropriate, but the ability to do advanced search over this collection of books is a top priority, and a database would not provide that.

--
Sergei Serdyuk
Red Leaf Software LLC
web: http://redleafsoft.com
on 2006-06-16 18:51
I would think that it can provide a set of terms that are connected to a set of documents without pulling out those documents one by one.

--
Sergei Serdyuk
Red Leaf Software LLC
web: http://redleafsoft.com

Jeremy Bensley wrote:
> I don't really know what part of 'Ferret doing this' would be ... the
> information would have to be stored and retrieved from the index. Please
> elaborate if we do not seem to completely understand the problem.
on 2006-06-16 18:55
I'm not familiar enough with Ferret, but I do this sort of filtering and set intersection with Java Lucene, primarily using Solr, from a Ruby on Rails front-end.

I build up bit sets (using Solr's new OpenBitSet class) that represent "all items collected" and apply that filter to searches, and also intersect them (using bit set ANDing) with other sets such as "all objects from 1861", "all poetry genre objects", and so on. I've also customized Solr to return facet counts, so given your example it could show how many books were in stock in each category and allow you to filter to see all those books easily too. Using these types of set intersection operations even bypasses the traditional Lucene search by simply dealing with efficiently structured sets of document ids.

Erik
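To make the bit set idea above concrete, here is a minimal pure-Ruby sketch of the intersection and facet-count step. It does not use Ferret or Solr at all: plain Ruby integers stand in for bit sets, and the document ids, the IN_STOCK and CATEGORIES data, and the helper names (bitset, doc_ids, popcount, facet_counts) are all made up for illustration.

```ruby
# Toy illustration only: bit i of an Integer is set when document i
# matches a term. Nothing here is a Ferret or Solr API.

# Build a bit set from a list of document ids.
def bitset(ids)
  ids.reduce(0) { |bits, id| bits | (1 << id) }
end

# Turn a bit set back into a sorted list of document ids.
def doc_ids(bits)
  ids = []
  id = 0
  while bits > 0
    ids << id if bits & 1 == 1
    bits >>= 1
    id += 1
  end
  ids
end

# Number of documents in a (non-negative) bit set.
def popcount(bits)
  bits.to_s(2).count("1")
end

# Hypothetical data: which documents are in stock, and which documents
# carry each category term.
IN_STOCK = bitset([0, 2, 3, 5])
CATEGORIES = {
  "fiction" => bitset([0, 1, 2]),
  "poetry"  => bitset([3, 4]),
  "travel"  => bitset([6]),
}

# AND each facet's bit set with the filter and keep the non-empty counts.
def facet_counts(filter, facets)
  counts = {}
  facets.each do |term, bits|
    n = popcount(bits & filter)
    counts[term] = n if n > 0
  end
  counts
end

STOCKED = facet_counts(IN_STOCK, CATEGORIES)
```

The keys of STOCKED are the categories that have at least one stocked book, and the values are the per-category counts. No document is loaded along the way; the work is one AND and one bit count per category term, which is why this scales with the number of terms rather than the number of matching books.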
on 2006-06-16 21:08
Thank you Erik. It is not clear to me what it would look like in Ferret, but it sounds like a good direction to dig in.

Erik Hatcher wrote:
> I'm not familiar enough with Ferret, but I do this sort of filtering and
> set intersection with Java Lucene, primarily using Solr, from a Ruby
> on Rails front-end.
>
> I build up bit sets (using Solr's new OpenBitSet class) that
> represent "all items collected" and apply that filter to searches, and
> also intersect them (using bit set ANDing) with other sets such as "all
> objects from 1861", "all poetry genre objects", and so on. I've
> also customized Solr to return facet counts, so given your
> example it could show how many books were in stock in each category
> and allow you to filter to see all those books easily too. Using
> these types of set intersection operations even bypasses the
> traditional Lucene search by simply dealing with efficiently
> structured sets of document ids.
>
> Erik
on 2006-06-16 22:47
On Jun 16, 2006, at 3:08 PM, Sergei Serdyuk wrote:
> Thank you Erik. It is not clear to me what it would look like in
> Ferret, but it sounds like a good direction to dig in.

In Java, building up such filters is done with code like this:

// termDocs was not declared in the original snippet; it is assumed
// here to come from the same IndexReader.
TermDocs termDocs = reader.termDocs();
// Position the enumeration at the first term of the field.
TermEnum termEnum = reader.terms(new Term(field, ""));
while (true) {
  Term term = termEnum.term();
  // Stop once the enumeration runs past the last term of this field.
  if (term == null || !term.field().equals(field)) break;

  // Collect the ids of all documents containing this term.
  termDocs.seek(term);
  OpenBitSet bitSet = new OpenBitSet(reader.numDocs());
  while (termDocs.next()) {
    bitSet.set(termDocs.doc());
  }

  // ... cache bitSet for future use ...

  if (! termEnum.next()) break;
}

Ferret has a comparable API underneath that should make this sort of thing feasible in pure Ruby somehow.

Erik
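For readers who do not read Java, the shape of that loop can be sketched in plain Ruby. This is not Ferret code: the TERMS array below is a hypothetical stand-in for the index's sorted term dictionary, and bitsets_for_field is an invented helper name. It only illustrates the walk "start at the first term of the field, build one bit set per term, stop when the field changes".

```ruby
# Hypothetical stand-in for a term dictionary: [field, term, doc ids],
# sorted by field and then by term, the order a Lucene-style term
# enumeration would return them in. Not a Ferret API.
TERMS = [
  ["category", "fiction", [0, 1, 2]],
  ["category", "poetry",  [3, 4]],
  ["stock",    "yes",     [0, 2, 3]],
].freeze

# Build one bit set (a plain Integer) per term of the given field.
def bitsets_for_field(terms, field)
  cache = {}
  # Seek to the first term of the field, like reader.terms(new Term(field, "")).
  start = terms.index { |f, _text, _docs| f == field }
  return cache if start.nil?

  terms[start..-1].each do |f, text, docs|
    # Terms are sorted by field, so a different field means we are done.
    break unless f == field

    bits = 0
    docs.each { |doc| bits |= (1 << doc) }
    cache[text] = bits
  end
  cache
end

CATEGORY_BITS = bitsets_for_field(TERMS, "category")
```

CATEGORY_BITS then maps each category term to the set of documents containing it, and the documents themselves are never fetched. A real version would read the terms and document ids from the index reader instead of a literal array.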
on 2006-06-17 02:29
On 6/17/06, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>       if (term == null || !term.field().equals(field)) break;
>     }
>
> Ferret has a comparable API underneath that should make this sort of
> thing feasible in pure Ruby somehow.

It is similar in Ferret. Have a look here to see the solution to a similar problem:

http://www.ruby-forum.com/topic/56232#40931

Hope that helps.

Cheers,
Dave