Hyphens

Hi there,

I’m working with some legacy data where customer phone numbers are
stored with hyphens between the area code, exchange, and number (e.g.
555-555-5555). Is this the best way to store a phone number? Perhaps
not, but it’s the way they were being stored, so I have to work with
this format.

Right, so when I save a record the log tells me acts_as_ferret indexed
the number with the hyphens in place OK. However, find_by_contents does
not return any results if I query like 555-555-5555.

If I remove the hyphens and save the record, find_by_contents will
return results (e.g. 5555555555).

Does anyone have any thoughts on this?

Thanks in advance!

M.

Hi!
On Wed, Aug 30, 2006 at 05:29:58PM +0200, Michael L. wrote:

not return any results if I query like 555-555-5555.
Seems the tokenizer strips out the hyphens. This happens inside Ferret,
after acts_as_ferret’s debug message.

Use something like

acts_as_ferret :fields => {
  :phone => { :index => :untokenized },
  # …other fields go here
}

to let Ferret store the phone numbers unchanged.
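
To illustrate the guess above, here is a toy model in plain Ruby. It is not Ferret's actual analyzer: it simply assumes a tokenizer that splits on anything that is not a letter or digit, and the names toy_tokenize, indexed_terms and exact_match? are hypothetical helpers invented for this sketch.

```ruby
# Toy model only: this is NOT Ferret's analyzer. It assumes a tokenizer
# that splits text on every character that is not a letter or digit.
def toy_tokenize(text)
  text.downcase.split(/[^a-z0-9]+/).reject(&:empty?)
end

# Terms that would end up in the (imaginary) index for one field value.
def indexed_terms(value, tokenized)
  tokenized ? toy_tokenize(value) : [value]
end

# A single-term query matches only if that exact term was indexed.
def exact_match?(terms, query)
  terms.include?(query)
end

phone = '555-555-5555'

tokenized_terms   = indexed_terms(phone, true)   # hyphens are dropped
untokenized_terms = indexed_terms(phone, false)  # value kept as one term

puts tokenized_terms.inspect
puts untokenized_terms.inspect
puts exact_match?(tokenized_terms, phone)
puts exact_match?(untokenized_terms, phone)
```

Under that assumption the hyphenated number only exists as a single term when the field is left untokenized, which is why an exact query for it can only match in that case.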

Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

Hey Jens,

Thanks for the reply.

When I try this: 'work_phone' => { :index => :untokenized }, 'home_phone'
=> { :index => :untokenized }, I get:

unknown stored parameter untokenized

C:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.6/lib/ferret/document/field.rb:221:in `index='
C:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.6/lib/ferret/document/field.rb:182:in `initialize'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:113:in `work_phone_to_ferret'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:554:in `to_doc'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:553:in `to_doc'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in `ferret_update'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:333:in `callback'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:330:in `callback'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:268:in `update_without_timestamps'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/timestamp.rb:48:in `update'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/base.rb:1760:in `create_or_update_without_callbacks'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/callbacks.rb:242:in `create_or_update'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/base.rb:1523:in `save_without_validation'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/validations.rb:744:in `save_without_transactions'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:120:in `save'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:51:in `transaction'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:86:in `transaction'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:112:in `transaction'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/transactions.rb:120:in `save'
#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/base.rb:1570:in `update_attributes'
#{RAILS_ROOT}/app/controllers/customers_controller.rb:23:in `update'

Hey again Jens,

Strange, I no longer get an error when I reference the constant for
untokenized by the fully qualified name as you suggested, but results
still do not come back when searching with hyphens.

Hmmm…

Thanks for your help thus far.

Jens K. wrote:

On Wed, Aug 30, 2006 at 07:02:17PM +0200, Michael L. wrote:

unknown stored parameter untokenized

C:/ruby/lib/ruby/gems/1.8/gems/ferret-0.9.6/lib/ferret/document/field.rb:221:in

OK, the syntax above is for Ferret 0.10.x. For your version (0.9.6),

:index => Ferret::Document::Field::Index::UNTOKENIZED

should work for you.

Jens


On Wed, Aug 30, 2006 at 08:24:02PM +0200, Michael L. wrote:

Hey again Jens,

Strange, I no longer get an error when I reference the constant for
untokenized by the fully qualified name as you suggested, but results
still do not come back when searching with hyphens.

Maybe the problem isn’t tokenization-related, and you’re instead trying to
search for a substring of the phone number that doesn’t begin at the first
character?

Example:
If your indexed value is ‘123-45-55555’,
a search for ‘123*’ or ‘123-45*’ should find the record, but a search
for ‘45*’ won’t. Is this the behaviour you experience?

(Wildcards at the beginning, as in ‘*45*’, don’t always work. There
is currently another thread about this topic; it’s unclear whether this is
supposed to work or not at the moment. I hope Dave can shed some light on this.)

To be able to search only for area code or phone number, you should
tokenize the phone number into parts (split at the hyphens).
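
As a rough illustration of the prefix behaviour described above, here is a plain-Ruby sketch. It is a toy model and not Ferret's query engine: it assumes a trailing-wildcard query is just a prefix test against each indexed term, and prefix_match? is a hypothetical helper made up for this example.

```ruby
# Toy model only, not Ferret's query engine: a trailing-wildcard query
# such as '123*' is treated as a plain prefix test on each indexed term.
def prefix_match?(terms, query)
  prefix = query.chomp('*')
  terms.any? { |term| term.start_with?(prefix) }
end

value = '123-45-55555'

whole_value_terms = [value]           # untokenized: the value is one term
split_terms       = value.split('-')  # tokenized at the hyphens

puts prefix_match?(whole_value_terms, '123*')
puts prefix_match?(whole_value_terms, '123-45*')
puts prefix_match?(whole_value_terms, '45*')
puts prefix_match?(split_terms, '45*')
```

In this model the split version lets a query hit the middle part directly, at the cost of the full hyphenated number no longer being a single term.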

Jens


Heya Jens,

Actually, I’m having the problem where no records get returned, when I
query for the full number: 123-45-55555 for example.

M.

On 9/1/06, Michael L. [email protected] wrote:

Heya Jens,

Actually, I’m having the problem where no records get returned, when I
query for the full number: 123-45-55555 for example.

M.

Hi Michael,

It works here in version 0.10.1:

irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'ferret'
=> false
irb(main):003:0> include Ferret
=> Object
irb(main):004:0> i = I.new
=> #
irb(main):005:0> i << {:content => "the phone number is 123-45-55555"}
=> nil
irb(main):006:0> i.search("content:123-45-55555")
=> #<struct Ferret::Search::TopDocs total_hits=1, hits=[#],
max_score=0.1534264087677>
irb(main):007:0>

I put a bug-fix for this in version 0.10.1. I think it is fixed in
0.9.6 too but I can’t remember for certain. You’re better off
upgrading to 0.10.1, especially if you are using acts_as_ferret (since
most of the work has already been done for you).

Cheers,
Dave