Forum: Ferret Questions about Searching

Tom Davies (Guest)
on 2006-01-20 15:19
(Received via mailing list)
Hi,

I have some questions about searching with Ferret.  I have a user
index with first_name, last_name and full_name (which is just first
plus last with a space).

Here are a couple of questions:

1) If I store the fields tokenized, it appears as though queries are
case-insensitive.  However, for untokenized, the query is
case-sensitive.  How can I make the untokenized searches
case-insensitive?

2) If I have a field with whitespace in it, how can I include the
whitespace in wildcard searches?  For instance, if the full_name I
am searching for is "John Doe", how can I build a query for that?  I
have tried numerous combinations; here are a couple I tried:
  full_name:"#{query}"*  <-- This will match every field in the index
  full_name:"#{query}*" <-- This matches nothing

3) When I store the fields as untokenized, exact matches seem to not
work for me anymore.  For instance, this query worked for tokenized
first_name, but does not for untokenized first_name:
  first_name:John

But this query will return results:
  first_name:Joh?

4) Is there a better way to search for the first and last name
combination than storing another index with them concatenated?

Thanks,

Tom
Erik Hatcher (Guest)
on 2006-01-20 17:14
(Received via mailing list)
On Jan 20, 2006, at 8:39 AM, Tom Davies wrote:
> Here are a couple of questions:
>
> 1) If I store the fields tokenized, it appears as though queries are
> case-insensitive.  However, for untokenized, the query is
> case-sensitive.  How can I make the untokenized searches
> case-insensitive?

By lowercasing the text you index and lowercasing the text in the
query.  Search matches are always case-sensitive, but tokenized
fields generally get lowercased along the way, and the query parser
lowercases terms as well (generally with the same analyzer).
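A minimal plain-Ruby model of that point (it does not use Ferret; index_terms and term_match? are hypothetical helpers, and a field's index is modeled as a simple array of terms). Matching is exact string equality, so lowercasing has to happen consistently at both index time and query time:

```ruby
# Toy model (assumption): a field's index is just an array of stored terms,
# and a term query matches only on exact string equality.

# Index-time normalization: optionally lowercase every stored value.
def index_terms(values, lowercase:)
  values.map { |v| lowercase ? v.downcase : v }
end

# Query-time normalization must mirror the index-time choice.
def term_match?(terms, query, lowercase:)
  q = lowercase ? query.downcase : query
  terms.include?(q)
end

names = ["John", "Jane"]

raw = index_terms(names, lowercase: false)
puts term_match?(raw, "john", lowercase: false)         # "john" is not "John"

normalized = index_terms(names, lowercase: true)
puts term_match?(normalized, "JOHN", lowercase: true)   # both sides lowercased
```

Lowercasing only one side (index or query) reproduces the case-sensitive behavior seen with untokenized fields.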

> 2) If I have a field with whitespace in it, how can I include the
> whitespace in wildcard searches?  For instance, if the full_name I
> am searching for is "John Doe", how can I build a query for that?  I
> have tried numerous combinations; here are a couple I tried:
>   full_name:"#{query}"*  <-- This will match every field in the index
>   full_name:"#{query}*" <-- This matches nothing

I strongly suspect the issue is the field being analyzed during query
parsing.  I'm not sure off the top of my head what facilities Ferret
has for controlling this, but in Java Lucene there is a
PerFieldAnalyzerWrapper that helps with this.  The space would be
problematic, as would the double quotes in the way you have built the
query.  You may need to create a WildcardQuery via the API rather than
using the parser.
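To picture what a per-field analyzer does, here is a plain-Ruby sketch in the spirit of PerFieldAnalyzerWrapper. It is not Ferret's or Lucene's API: the PerFieldAnalyzer class and the two lambdas are hypothetical. The full_name field keeps its whole value as one lowercased term, while every other field is split on whitespace and lowercased:

```ruby
# Hypothetical analyzers (not Ferret's): each turns a field value into terms.
WHITESPACE_LOWER = ->(text) { text.downcase.split(/\s+/) }  # tokenized field
KEYWORD_LOWER    = ->(text) { [text.downcase] }              # whole value, one term

# Picks an analyzer per field name, falling back to a default analyzer,
# which is the idea behind Lucene's PerFieldAnalyzerWrapper.
class PerFieldAnalyzer
  def initialize(default, overrides = {})
    @default = default
    @overrides = overrides
  end

  # Returns the terms produced for the given field's text.
  def analyze(field, text)
    (@overrides[field] || @default).call(text)
  end
end

analyzer = PerFieldAnalyzer.new(WHITESPACE_LOWER, { "full_name" => KEYWORD_LOWER })

p analyzer.analyze("first_name", "John Doe")
p analyzer.analyze("full_name", "John Doe")
```

If the same per-field choice is used at query time, a query against full_name is not broken into separate words by the analyzer.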

> 3) When I store the fields as untokenized, exact matches seem to not
> work for me anymore.  For instance, this query worked for tokenized
> first_name, but does not for untokenized first_name:
>   first_name:John
>
> But this query will return results:
>   first_name:Joh?

This again has to do with the case and analyzer issue.  You are
using a parser that does analysis of the text.  Try using the parser
to create a Query and see what it consists of (.to_s?).
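A toy model of that explanation, in plain Ruby rather than Ferret. It assumes, based only on the results reported in this thread and not on Ferret documentation, that the parser lowercases plain terms but passes wildcard terms through unchanged. The helpers parse_term, wildcard_regex and matches are hypothetical:

```ruby
# Untokenized field: the value is stored verbatim, case preserved.
UNTOKENIZED_FIRST_NAMES = ["John"]

# Assumed parser behaviour: plain terms are analyzed (lowercased),
# while terms containing * or ? are left exactly as typed.
def parse_term(text)
  if text.match?(/[*?]/)
    [:wildcard, text]
  else
    [:term, text.downcase]
  end
end

# Translate a wildcard pattern (* = any run, ? = one char) into an anchored regex.
def wildcard_regex(pattern)
  body = pattern.chars.map do |ch|
    case ch
    when "*" then ".*"
    when "?" then "."
    else Regexp.escape(ch)
    end
  end.join
  /\A#{body}\z/
end

# Returns the stored terms matched by the parsed query text.
def matches(terms, query_text)
  kind, value = parse_term(query_text)
  case kind
  when :term     then terms.select { |t| t == value }
  when :wildcard then terms.select { |t| wildcard_regex(value).match?(t) }
  end
end

p matches(UNTOKENIZED_FIRST_NAMES, "John")  # analyzed to "john": no match
p matches(UNTOKENIZED_FIRST_NAMES, "Joh?")  # left as "Joh?": matches "John"
```

Under that assumption, first_name:John fails because it is searched as "john", while first_name:Joh? succeeds because its case is never changed.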

> 4) Is there a better way to search for the first and last name
> combination than storing another index with them concatenated?

It really all depends on what your searching needs are.  What does
the user interface for searching demand?

	Erik
Tom Davies (Guest)
on 2006-01-20 17:35
(Received via mailing list)
Thanks Erik.  Very informative.  I suspect the QueryParser either has
some bugs or is not designed to handle this scenario.  I will try
manually building the specific types of queries via the API.

> It really all depends on what your searching needs are.  What does
> the user interface for searching demand?

For the full name searches, I just wanted wildcard matches on the
right-hand side of the query.  For instance, any of these should
result in John Doe being found:
  J, Jo, Joh, John, John D, etc.

Tom
Erik Hatcher (Guest)
on 2006-01-20 19:54
(Received via mailing list)
On Jan 20, 2006, at 10:56 AM, Tom Davies wrote:
> Thanks Erik.  Very informative.  I suspect the QueryParser either has
> some bugs or is not designed to handle this scenario.  I will try
> manually building the specific types of queries via the API.

There are many tricky scenarios with the query parser, because
whitespace and special characters have to be handled as separators
and operators, and because of the analyzer (and when it is applied).

So I don't think there are any bugs, per se, in this case.
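Here is a deliberately naive query-string splitter in plain Ruby showing how whitespace acting as a separator breaks a wildcard term in two. It is not Ferret's parser; naive_parse, the Clause struct and the default field name "contents" are hypothetical:

```ruby
# One parsed clause: which field it searches and the text it searches for.
Clause = Struct.new(:field, :text)

# Naive rule set: whitespace separates clauses, and a clause without
# "field:" is sent to the default field.
def naive_parse(query, default_field)
  query.split(/\s+/).map do |chunk|
    field, text = (chunk.include?(":") ? chunk.split(":", 2) : [default_field, chunk])
    Clause.new(field, text)
  end
end

# "john d*" is split, and "d*" ends up searching the default field.
naive_parse("full_name:john d*", "contents").each do |c|
  puts "#{c.field} => #{c.text}"
end

# Building the clause directly keeps the space inside one wildcard term.
direct = Clause.new("full_name", "john d*")
puts "#{direct.field} => #{direct.text}"
```

Constructing the query object directly sidesteps the separator rules entirely, which is why building a WildcardQuery via the API avoids the problem.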

My article at java.net covers this (in the context of Java) in some
of its glory and frustration I think:

	<http://today.java.net/pub/a/today/2003/11/07/Query...

>> It really all depends on what your searching needs are.  What does
>> the user interface for searching demand?
>
> For the full name searches, I just wanted wildcard matches on the
> right-hand side of the query.  For instance, any of these should
> result in John Doe being found:
>   J, Jo, Joh, John, John D, etc.

The simplest approach in this case is what you're already doing for
indexing: combine a field with "firstname lastname", untokenized but
lowercased.  Then build a WildcardQuery for "piece*".  I don't think
that will be possible through the parser with the whitespace involved
(unless you can escape it somehow).  Be sure to lowercase the query
as well.
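A plain-Ruby sketch of that suggestion (no Ferret involved; Person, PEOPLE and search_full_name are hypothetical names). Each person contributes one untokenized, lowercased "first last" term, and the typed text is lowercased and used as a prefix, which is what a "piece*" wildcard on that field amounts to:

```ruby
# Each person yields the single term that would go in the full_name field.
Person = Struct.new(:first_name, :last_name) do
  def full_name_term
    "#{first_name} #{last_name}".downcase
  end
end

PEOPLE = [Person.new("John", "Doe"), Person.new("Jane", "Roe")]

# Equivalent of a "piece*" wildcard: lowercase the input, then prefix-match
# it against the whole untokenized term (spaces included).
def search_full_name(people, partial)
  prefix = partial.downcase
  people.select { |person| person.full_name_term.start_with?(prefix) }
end

["J", "Jo", "Joh", "John", "John D"].each do |partial|
  names = search_full_name(PEOPLE, partial).map(&:full_name_term)
  puts "#{partial.inspect} -> #{names.inspect}"
end
```

Because the whole name is a single term, "John D" matches with its space intact; the trade-off is that a prefix on this field will not find someone by last name alone.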

	Erik
Tom Davies (Guest)
on 2006-01-24 14:05
(Received via mailing list)
Thanks Erik.  Nice article.  I was able to get the wildcard search to
work including whitespace by manually creating the query as follows:

    qp = Ferret::QueryParser.new
    query = qp.get_wild_query('full_name', "#{partial}*")
    INDEX.search_each(query) do |doc, score|
      # handle each matching document number (doc) and its relevance score
    end

where partial is the (possibly incomplete) full name being searched for.

Thanks for your responses.

Tom