acts_as_ferret and searching Word docs

I was wondering if it is possible to search Word documents using Ferret.
The actual text in a Word document isn’t stored in a binary format, only the
formatting is. Surely it would be possible to parse that?

Alex MacCaw wrote:

> I was wondering if it is possible to search word documents using ferret.

You might be able to use one of the Ruby extensions for Windows to get the
data out through COM. Or, if you don’t want to run on a Microsoft platform,
you could use Java’s POI (from Jakarta) to parse out the text and hand it to
something that Ruby could then put into Ferret.
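For the COM route, a minimal sketch using Ruby's win32ole library might look like the following. It assumes Windows with Microsoft Word installed; word_doc_text and normalize_word_text are hypothetical helper names, not part of any library. The cleanup step reflects how Word reports paragraph ends ("\r") and table-cell ends ("\r\a") in a document's Content.Text.

```ruby
# Hypothetical helpers: extract plain text from a .doc by automating
# Microsoft Word over COM. Requires Windows with Word installed.

# Word reports paragraph ends as "\r" and table-cell ends as "\r\a";
# convert both to newlines so the text indexes cleanly.
def normalize_word_text(text)
  text.gsub("\r\a", "\n").tr("\r", "\n").delete("\a")
end

def word_doc_text(path)
  require 'win32ole' # loaded lazily so this file still loads on non-Windows hosts

  word = WIN32OLE.new('Word.Application')
  word.Visible = false
  begin
    # Word expects an absolute, backslash-separated path.
    # Arguments: FileName, ConfirmConversions = false, ReadOnly = true
    doc = word.Documents.Open(File.expand_path(path).tr('/', '\\'), false, true)
    begin
      normalize_word_text(doc.Content.Text)
    ensure
      doc.Close(false) # SaveChanges = false
    end
  ensure
    word.Quit # always shut Word down, even if opening the file failed
  end
end
```

The string returned by word_doc_text could then be added to a Ferret index like any other text field. Starting and quitting Word per file is slow; for many documents you would probably keep one Word instance open and loop over the files.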

Charlie

Or there’s AbiWord: it runs on all platforms and outputs nice text. If
you don’t want graphical dependencies, there’s wvWare, too. I’m using
it at the moment.

> I successfully used the wv-utilities (wvText or wvHtml; on Debian, do
> ‘apt-get install wv’) to index Word documents with Ferret.

Thanks Jens.
Is there any way to do this on Windows, or will I just have to wait until
I deploy on Linux?

On Sat, Nov 18, 2006 at 04:33:26PM +0100, Charlie H. wrote:

> Or if you don’t want to run on M$ platform
> you could possibly use Java’s POI from Jakarta to parse out the text and
> put it into something that Ruby could then put into ferret.

I successfully used the wv-utilities (wvText or wvHtml; on Debian, do
‘apt-get install wv’) to index Word documents with Ferret.

You can have a look at RDig (http://rubyforge.org/projects/rdig) to see
an example of how this can be done.
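The wv approach can be sketched in Ruby roughly as follows. This is not RDig's code: wv_text and index_word_docs are hypothetical helpers, and the sketch assumes wvText is on the PATH (invoked as wvText input.doc output.txt) and that the ferret gem is installed for the indexing step.

```ruby
require 'tmpdir'

# Hypothetical helper: run wvText (from the wv package) on a .doc and
# return the extracted text. The wvtext: keyword lets you point at another binary.
def wv_text(doc_path, wvtext: 'wvText')
  Dir.mktmpdir do |dir|
    out = File.join(dir, 'out.txt')
    # Usage is `wvText input.doc output.txt`; passing separate arguments
    # to system avoids going through a shell.
    ok = system(wvtext, doc_path, out)
    raise "#{wvtext} failed on #{doc_path}" unless ok && File.exist?(out)
    File.read(out) # read before mktmpdir deletes the directory
  end
end

# Hypothetical helper: add each document's text to a Ferret index,
# storing the file name so search hits can be mapped back to files.
def index_word_docs(paths, index_dir)
  require 'ferret' # the ferret gem; only needed for indexing
  index = Ferret::Index::Index.new(:path => index_dir)
  paths.each do |path|
    index << { :file => path, :content => wv_text(path) }
  end
  index
end
```

With the index built, something like index.search_each('content:ferret') { |id, score| puts index[id][:file] } would list the matching files. Keeping extraction separate from indexing also makes it easy to swap wvText for another converter such as AbiWord.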

Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66
