Forum: Ferret How to index PDF

Posted by Sébastien Mizrahi (slum)
on 2008-08-11 17:01
Hello,

I'm currently trying to index PDF files, without success.
Could anyone explain how it works?

Thank you.
Posted by Nathan Li (nasi)
on 2008-08-12 05:54
You must first parse the PDF into plain text using a library; Ferret only indexes text.
Posted by Sébastien Mizrahi (slum)
on 2008-08-12 08:49
Thank you for your quick answer :)
Do you have the name of the lib I should use, and a small tutorial?
Posted by neongrau __ (neongrau)
on 2008-09-26 13:56
I use the command-line tool "pdftotext" for this, which I put into
lib/bin inside my app.

Add a method to your model and add it to your indexed fields.

e.g.

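def text
  # Placeholder path; use the location of the PDF attached to this record
  path = 'path/to/your/file.pdf'
  # "-q" suppresses messages; the trailing "-" makes pdftotext write to stdout
  `#{RAILS_ROOT}/lib/bin/pdftotext -q "#{path}" -`
end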
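
A slightly more defensive variant of the same idea, as a rough sketch only. The helper name extract_pdf_text and its pdftotext_bin parameter are made up for illustration. It assumes a pdftotext binary is available at the given location, and it passes the arguments separately so that file names with spaces or quotes do not need shell escaping.

```ruby
require 'open3'

# Hypothetical helper: runs pdftotext on pdf_path and returns the extracted
# text, or an empty string if the extraction fails.
def extract_pdf_text(pdf_path, pdftotext_bin = 'pdftotext')
  # Arguments are passed one by one, so no shell is involved.
  # "-q" suppresses messages; the trailing "-" sends the text to stdout.
  output, status = Open3.capture2(pdftotext_bin, '-q', pdf_path, '-')
  status.success? ? output : ''
rescue Errno::ENOENT
  # The pdftotext binary was not found at the given location.
  ''
end
```

The model method above could then simply return extract_pdf_text(path, "#{RAILS_ROOT}/lib/bin/pdftotext"). Returning an empty string on failure means a broken or missing PDF results in an empty indexed field rather than an exception during indexing; whether that is the right trade-off depends on your app.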


ralf