Forum: Ferret How to index PDF

Sébastien Mizrahi (slum)
on 2008-08-11 17:01
Hello,

I'm currently trying to index PDF files, without success.
Could anyone explain how it works?

Thank you.
Nathan Li (nasi)
on 2008-08-12 05:54
You must first convert the PDF to plain text using a library, then index that text.
Sébastien Mizrahi (slum)
on 2008-08-12 08:49
Thank you for your quick answer :)
Do you have the name of the library I should use, and a small tutorial?
neongrau __ (neongrau)
on 2008-09-26 13:56
I use the command-line tool "pdftotext" for this, which I put into
lib/bin inside my app.

Add a method to your model that returns the extracted text, and add it to
your indexed fields.

e.g.

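def text
  # Placeholder: replace with the path of the PDF that belongs to this record.
  path = 'path/to/your/file.pdf'
  # -q suppresses messages; the trailing "-" writes the text to stdout,
  # which the backticks capture and the method returns.
  `#{RAILS_ROOT}/lib/bin/pdftotext -q "#{path}" -`
end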
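
If the file name can contain spaces or quotes, a slightly more defensive variant may be worth considering. The following is only a sketch, assuming a current Ruby (keyword arguments, Open3.capture2) and a pdftotext binary that can be found at the given location. The helper name extract_pdf_text and its binary parameter are made up for illustration and are not part of Ferret or Rails.

```ruby
require 'open3'

# Hypothetical helper (not from the thread): extract plain text from a PDF
# by running pdftotext directly, without going through a shell.
# `binary` is an assumption; point it at wherever pdftotext lives in your app.
def extract_pdf_text(path, binary: 'pdftotext')
  # Nothing to index if the file is missing.
  return '' unless File.file?(path)

  # Passing the arguments separately avoids shell-quoting problems with
  # spaces or quotes in the file name. "-" sends the text to stdout.
  output, status = Open3.capture2(binary, '-q', path, '-')
  status.success? ? output : ''
rescue Errno::ENOENT
  # The binary itself could not be found.
  ''
end
```

The model's text method could then simply call extract_pdf_text with the record's file path and the path to the bundled binary, and return the result. Returning an empty string on failure keeps indexing from blowing up on a broken PDF, at the cost of hiding the error; raising instead may be preferable if you want to notice bad files.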


ralf