Forum: Ruby on Rails Text Extraction and Indexing

Elliott Clark (Guest)
on 2007-04-20 19:40
(Received via mailing list)
Long story short, I am going to have to index and search uploaded files.
They will be in Word, Excel, PDF, and plain text format. So what is the
best way to extract information in RoR so that I can place what is needed
into the database? There are command line utilities that will convert
to txt, but I would prefer an in-code solution if possible. Any
ideas on Excel? The only thing I could find was a Perl module.

I've decided to use acts_as_ferret as my indexing agent. Does anyone
have any tips on using it other than ?

Elliott Clark
Jan Prill (Guest)
on 2007-04-20 19:55
(Received via mailing list)
Hi Elliott,

Have a look at the ContentExtractor of RDig. It
might get you a good way regarding PDF and Word, though it uses command
line utilities as far as I know.
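
To give an idea of what the command line route can look like, here is a minimal sketch. It is not RDig's actual code. It assumes pdftotext, antiword, and xls2csv (from the catdoc package) are installed and write their output to stdout; the names CONVERTERS, extraction_command, and extract_text are made up for this illustration.

```ruby
require "open3"

# Maps a file extension to a lambda that builds the converter command line.
# Assumption: each tool is installed and prints the extracted text to stdout.
CONVERTERS = {
  ".pdf" => ->(path) { ["pdftotext", path, "-"] }, # "-" sends output to stdout
  ".doc" => ->(path) { ["antiword", path] },
  ".xls" => ->(path) { ["xls2csv", path] }         # part of the catdoc package
}.freeze

# Returns the argv array for the matching converter, or nil if there is none.
def extraction_command(path)
  builder = CONVERTERS[File.extname(path).downcase]
  builder && builder.call(path)
end

# Returns the text content of the file at +path+.
# Plain text files are read directly; other known types go through a converter.
def extract_text(path)
  return File.read(path) if File.extname(path).downcase == ".txt"

  command = extraction_command(path)
  raise ArgumentError, "unsupported file type: #{path}" if command.nil?

  # Passing the command as separate arguments avoids shell quoting problems
  # with uploaded file names.
  output, status = Open3.capture2(*command)
  raise "#{command.first} failed for #{path}" unless status.success?

  output
end
```

The returned string could then be stored in a text column of the model and indexed from there.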

Jan Prill