David B. wrote:
David
Thanks for your continued help and assistance.
I don’t have code at this stage because I started writing it one way and
realised that the way I was writing it through counts in Ruby would not
work because of pagination.
A little more background is in order. The user will be presented with a
pull-down menu with 5 selections in a main category. Doing 6 queries
(one main query and 5 count queries) in this instance is not a problem.
The problem arises when they select one of these categories.
They will then be presented with up to 5 other category structures. One
would be new or old, another would be type (up to 5 nodes), another
would be, for example, book type (such as fiction, non-fiction,
autobiography) etc. (up to 20 categories), another could have up to 40
categories. The user is free to select any of these category nodes
because they may be interested in old books and fiction. I will
therefore have to populate all of the nodes with the number of documents
in each node. This could leave me spawning 60-odd queries to count
the number of documents in each node. Subsequent selections of nodes
would refine the result set down further.
What I really would like to do is 2 or 3 queries. One which does the
normal search over the document set (collection) and the second to
populate each node in the classification structure with the number of
documents that match each node.
It is pretty easy in 2 queries to tell if there are any documents in
each node but doing a count over all the nodes is more tricky. I was
originally going to have another table which had a row for each node
with the name of the node (and structure) in one field and the
document_ids in another field. For example, [Fishing, “doc1 doc2 doc3
doc4”], [Fishing/Fiction, “doc2 doc3”], [Fishing/Non Fiction, “doc1”]
etc. I would then get a result set that provided all the categories that
had hits against a given query. However, it does not provide the number
of documents against each node. So I could not populate the pull down
categories with Fishing (2), Fiction (1), Non Fiction (1) etc.
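To make the node-table idea concrete, here is a minimal sketch in plain Ruby of how the counts could be derived once the hit list from the main query is in hand. It is not a Ferret feature: the `node_docs` hash and the `node_counts` helper are hypothetical names, and it assumes the node table is small enough to hold in memory.

```ruby
require "set"

# Hypothetical node table: node path => ids of the documents filed under it.
node_docs = {
  "Fishing"             => %w[doc1 doc2 doc3 doc4],
  "Fishing/Fiction"     => %w[doc2 doc3],
  "Fishing/Non Fiction" => %w[doc1]
}

# For each node, count how many of its documents appear in the hit set.
# Nodes with no matching documents are left out of the result.
def node_counts(node_docs, hit_ids)
  hits = hit_ids.to_set
  node_docs.each_with_object({}) do |(node, doc_ids), counts|
    n = doc_ids.count { |id| hits.include?(id) }
    counts[node] = n if n > 0
  end
end

# Assume the main full-text query matched doc1 and doc2.
table_counts = node_counts(node_docs, %w[doc1 doc2])
table_counts.each { |node, n| puts "#{node} (#{n})" }
```

With those two hits, each node is labelled with the number of matching documents it holds, which is the figure the pull-down needs.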
Therefore, what I really need is a function that will return the number
of documents in each node of a given classification structure. An
addition to the Num_Docs capability already available perhaps.
I could easily produce a results set that would be like this…
Fishing doc1
Fishing doc2
Fishing/Fiction doc3
Fishing/Fiction doc1
Fishing/Non Fiction doc4
etc…
Num_Docs would provide 5 in this instance but what I really want is:
Fishing 2
Fishing/Fiction 2
Fishing/Non Fiction 1
etc…
All that, and done in 1 or 2 queries over and above the original
search… Simple eh!
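As a rough illustration of the grouping I am after, this is what it looks like in plain Ruby if the (node, document) rows were already in memory. The `pairs` array and the `count_by_node` helper are made up for the example; the open question is still how to get Ferret to do the equivalent without one query per node.

```ruby
# (node, document id) rows, as in the result set listed above.
pairs = [
  ["Fishing", "doc1"],
  ["Fishing", "doc2"],
  ["Fishing/Fiction", "doc3"],
  ["Fishing/Fiction", "doc1"],
  ["Fishing/Non Fiction", "doc4"]
]

# Group the rows by node and count the distinct documents in each group.
def count_by_node(pairs)
  counts = Hash.new(0)
  # uniq guards against the same (node, doc) row appearing twice.
  pairs.uniq.each { |node, _doc_id| counts[node] += 1 }
  counts
end

pair_counts = count_by_node(pairs)
pair_counts.each { |node, n| puts "#{node} #{n}" }
```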
I hope that I have not confused you too much, but this is something that
I desperately need or my project is kaput!
I found this:
http://www.mail-archive.com/[email protected]/msg00343.html and
http://www.ruby-forum.com/topic/56232#40931
Do you think that this is the way to go?
Thanks very much.
On 7/10/06, BlueJay [email protected] wrote:
fishing_count = index.search_each("sport AND fishing", :num_docs =>
I have several sub categories (taxonomy really) and what I was thinking
of doing was this in 2 queries. Index the data as per normal so that you
can do the full text search but also index the structure of the taxonomy
and have each branch contain the records that contain it.
Run one big search over the fulltext to get the list of hits and then
use this list as a query against the second index to get all the
category bits.
I’m not sure what you mean by “category bits”. Could you possibly
implement the categories like this:
sport/
sport/shooting/
sport/fishing/
sport/fishing/fly
sport/fishing/deep_sea
etc.
Then, let's say you have a query in query_str. You can get all results
in the sport category like this:
index.search_each(query_str + " AND category:sport/*") { |doc_id, score|
# ...
}
You can get all results in the fishing category like this:
index.search_each(query_str + " AND category:sport/fishing/*") { |doc_id, score|
# ...
}
Am I making sense?
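If the category path of each hit can be read back while iterating over the results of the main query, the per-node counts could be built in one pass instead of one wildcard query per node. The following is only a sketch in plain Ruby under that assumption: `rollup_counts` is a hypothetical helper, each document is assumed to carry a single category path, and trailing slashes are dropped so every node is keyed the same way.

```ruby
# Count each hit once for its own category and once for every ancestor,
# which is what a prefix match such as category:sport/* would cover.
def rollup_counts(category_paths)
  counts = Hash.new(0)
  category_paths.each do |path|
    parts = path.split("/") # a trailing "/" yields no empty last element
    (1..parts.size).each do |depth|
      counts[parts.first(depth).join("/")] += 1
    end
  end
  counts
end

# Category paths of four hypothetical hits from the main query.
hit_categories = [
  "sport/fishing/fly",
  "sport/fishing/deep_sea",
  "sport/shooting/",
  "sport/fishing/fly"
]

rolled = rollup_counts(hit_categories)
rolled.each { |node, n| puts "#{node} (#{n})" }
```

The trade-off is that every hit has to be visited, so this suits result sets that are small enough to iterate in full.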
This would be a big query though - although it should be quick but I
would need to re-index the category bits every time a document was added.
You’ve lost me. Could you give some example code?
Does this make sense, and would it make sense in Ferret? I have done
this before in another search engine that required special category
manipulation, but never with Ferret, and I am not sure how to go about
doing this in Ferret.
I am not sure about your idea around filtering the results.
I’ll explain filtering once I understand better what it is you are
trying to do.
Cheers,
Dave