How to index PDF

slum · August 11, 2008, 5:01pm

Hello,

I’m currently trying to index PDF files, without success.
Could anyone explain how it works?

Thank you.

slum · August 12, 2008, 5:54am

Sébastien Mizrahi wrote:

Hello,

I’m currently trying to index PDF files, without success.
Could anyone explain how it works?

Thank you.

You must first convert the PDF to plain text using a library.

slum · August 12, 2008, 8:49am

Nathan Li wrote:

Sébastien Mizrahi wrote:

Hello,

I’m currently trying to index PDF files, without success.
Could anyone explain how it works?

Thank you.

You must first convert the PDF to plain text using a library.

Thank you for your quick answer.
Do you have the name of the library I should use, and a small tutorial?

slum · September 26, 2008, 1:56pm

Nathan Li wrote:

Sébastien Mizrahi wrote:

Hello,

I’m currently trying to index PDF files, without success.
Could anyone explain how it works?

Thank you.

You must first convert the PDF to plain text using a library.

Thank you for your quick answer.
Do you have the name of the library I should use, and a small tutorial?

I use the command-line tool “pdftotext” for this, which I put into
lib/bin inside my app.

Add a method to your model and add it to your indexed fields,

e.g.

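def text
  path = 'path/to/your/file.pdf'
  # Backticks run the command and return its stdout as a string;
  # the trailing "-" tells pdftotext to write the text to stdout.
  `#{RAILS_ROOT}/lib/bin/pdftotext -q \"#{path}\" -`
end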

ralf
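
A slightly more defensive variant of the same idea, as a rough sketch rather than a definitive implementation. It assumes a pdftotext binary is reachable (here by default on the PATH, whereas the post above keeps it in lib/bin), and the helper name extract_pdf_text and its bin: parameter are made up for illustration. It uses Open3 from Ruby's standard library so the file path is never passed through a shell, and it returns an empty string when extraction fails.

```ruby
require "open3"

# Hypothetical helper: runs a pdftotext-style extractor and returns its stdout.
# bin: defaults to "pdftotext" on the PATH (an assumption; adjust as needed,
# e.g. File.join(RAILS_ROOT, "lib", "bin", "pdftotext")).
def extract_pdf_text(path, bin: "pdftotext")
  # Passing each argument as a separate string bypasses the shell, so spaces
  # or quotes in the file name need no escaping.
  out, status = Open3.capture2(bin, "-q", path, "-")
  # Return the extracted text only if the command exited successfully.
  status.success? ? out : ""
rescue Errno::ENOENT
  # The extractor binary was not found at the given location.
  ""
end
```

In the model, the text method could then simply call extract_pdf_text with the path of the stored file, so that a missing or unreadable PDF indexes as empty text instead of raising.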