Hi,
I’m working on a Ferret-based application which indexes content in all
European languages. Thus, I have to deal with those funny European
characters.
After googling a bit, I decided to go with a custom European
analyzer based on MappingFilter, as suggested in the Ferret rdoc.
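In a nutshell, that approach boils down to something like this (with a toy
mapping rather than the full one shown at the bottom of this post):

require 'ferret'

# A much smaller analyzer than the real one, just to show the shape of
# the approach: wrap the standard token stream in a MappingFilter.
class TinyAnalyzer < Ferret::Analysis::StandardAnalyzer
  def token_stream(field, text)
    Ferret::Analysis::MappingFilter.new(super, ['é','è','ê','ë'] => 'e')
  end
end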
Everything works fine with Ferret 0.11.3 on Mac OS X.
But this application needs to run on both Windows and Mac OS X. Since
there’s no mswin32 gem for 0.11.3, I decided to downgrade to 0.10.9 and
replace MappingFilter with a custom-made filter as suggested by David in
the following post:
http://www.ruby-forum.com/topic/85299#156036
See the code I wrote at the bottom of this post. The token streams
produced by this analyzer work fine in unit tests, but the indexer fails
to use them when a document is added. Here's the stack trace I get (on
Mac OS X):
wrong argument type Ferret::Analysis::ToASCIIFilter (expected Data)
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.9/lib/ferret/index.rb:277:in `text='
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.9/lib/ferret/index.rb:277:in `add_document'
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.9/lib/ferret/index.rb:277:in `<<'
/usr/local/lib/ruby/1.8/monitor.rb:238:in `synchronize'
/usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.9/lib/ferret/index.rb:252:in `<<'
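For context, the failure is triggered by code along these lines (the path
and field names here are just placeholders):

require 'ferret'

# Index built with the custom analyzer; adding a document with accented
# text raises the error above.
index = Ferret::Index::Index.new(
  :path     => '/tmp/test_index',
  :analyzer => Ferret::Analysis::EuropeanAnalyzer.new
)
index << {:title => 'Élégie', :content => 'Un été à Orléans'}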
I tried several variants of the code (for example, avoiding super and
inheritance), but without success.
Therefore, I'm wondering whether 0.11.3 will be available soon on
Windows, whether I can build the gem myself (I guess I'll need a
Microsoft C compiler), or whether I can do things differently to get a
European analyzer working with 0.10.9.
Thanks for your help.
Laurent
require 'ferret'
require 'jcode'

# jcode's multibyte-aware String#tr relies on $KCODE; make sure it is UTF-8
$KCODE = 'u'

module Ferret::Analysis
  ACCENTUATED_CHARS =
    'àáâãäåāăçćčĉċďđèéêëēęěĕėĝğġģĥħìíîïīĩĭįıĳĵķĸłľĺļŀñńňņŉŋòóôõöøōőŏœąŕřŗśšşŝșťţŧțùúûüūůűŭũųŵýÿŷžżź'
  REPLACEMENT_CHARS =
    'aaaaaaaacccccddeeeeeeeeegggghhiiiiiiijjjjkklllllnnnnnnooooooooooqrrrsssssttttuuuuuuuuuuwyyyzzz'

  MAPPING = {
    ['à','á','â','ã','ä','å','ā','ă'] => 'a',
    'æ' => 'ae',
    ['ď','đ'] => 'd',
    ['ç','ć','č','ĉ','ċ'] => 'c',
    ['è','é','ê','ë','ē','ę','ě','ĕ','ė'] => 'e',
    ['ƒ'] => 'f',
    ['ĝ','ğ','ġ','ģ'] => 'g',
    ['ĥ','ħ'] => 'h',
    ['ì','í','î','ï','ī','ĩ','ĭ'] => 'i',
    ['į','ı','ĳ','ĵ'] => 'j',
    ['ķ','ĸ'] => 'k',
    ['ł','ľ','ĺ','ļ','ŀ'] => 'l',
    ['ñ','ń','ň','ņ','ŉ','ŋ'] => 'n',
    ['ò','ó','ô','õ','ö','ø','ō','ő','ŏ'] => 'o',
    ['œ'] => 'oe',
    ['ą'] => 'q',
    ['ŕ','ř','ŗ'] => 'r',
    ['ś','š','ş','ŝ','ș'] => 's',
    ['ť','ţ','ŧ','ț'] => 't',
    ['ù','ú','û','ü','ū','ů','ű','ŭ','ũ','ų'] => 'u',
    ['ŵ'] => 'w',
    ['ý','ÿ','ŷ'] => 'y',
    ['ž','ż','ź'] => 'z'
  }

  class TokenFilter < TokenStream
    # Construct a token stream filtering the given input.
    def initialize(input)
      @input = input
    end
  end

  # Replaces accented characters with their ASCII equivalents.
  class ToASCIIFilter < TokenFilter
    def next()
      token = @input.next()
      unless token.nil?
        token.text = token.text.tr(ACCENTUATED_CHARS, REPLACEMENT_CHARS)
      end
      token
    end
  end

  class EuropeanAnalyzer < StandardAnalyzer
    def token_stream(field, string)
      if defined?(MappingFilter)
        return MappingFilter.new(super, MAPPING) # 0.11.x
      else
        return ToASCIIFilter.new(super)          # 0.10.x
      end
    end
  end
end
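And this is the kind of unit test that passes for me (simplified; the
sample strings and expected tokens are only illustrative):

require 'test/unit'
require 'ferret'

class EuropeanAnalyzerTest < Test::Unit::TestCase
  def test_accented_characters_are_folded_to_ascii
    analyzer = Ferret::Analysis::EuropeanAnalyzer.new
    stream   = analyzer.token_stream(:content, 'été élégant')
    tokens   = []
    while token = stream.next
      tokens << token.text
    end
    assert_equal %w(ete elegant), tokens
  end
end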