PDF to text converter?

Dear all,

Could anyone explain how to convert a PDF to text format?

Thanks in advance

Regards,
Jose Martin

-------- Original Message --------

Date: Mon, 11 Aug 2008 18:41:51 +0900
From: dare ruby [email protected]
To: [email protected]
Subject: PDF to text converter?

Dear all,

Could anyone explain how to convert a PDF to text format?

Thanks in advance

Regards,
Jose Martin


Dear Jose,

It depends on whether your PDF actually contains text or just images
that a human can recognize as text.
In the first case, you can try a tool like pdftotext
(http://en.wikipedia.org/wiki/Pdftotext), at least on Linux and Mac.
On Windows, there are also some PDF viewers that offer a
“Save as text” option.

In the second case, you’ll have to use OCR (optical character
recognition) software. There are some good commercial packages
available. I’ve liked ABBYY’s FineReader (on Windows).
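
To tell the two cases apart from a Ruby program, one rough approach is to look at how much text an extraction tool actually returned. The sketch below is only a heuristic under stated assumptions: the method name `probably_needs_ocr?` and the threshold of 20 characters per page are made up for illustration, and it assumes the extracted text separates pages with form feed characters, as pdftotext output does.

```ruby
# Rough heuristic (an assumption, not a rule): if text extraction returned
# almost no letters or digits per page, the PDF is probably a scan and needs OCR.
# Both the method name and the default threshold are hypothetical.
def probably_needs_ocr?(extracted_text, min_chars_per_page: 20)
  # Assumes pages are separated by form feeds ("\f").
  pages = extracted_text.split("\f")
  pages = [""] if pages.empty?

  # Count only letters and digits, so stray whitespace does not count as text.
  chars = extracted_text.scan(/[[:alnum:]]/).length
  chars < min_chars_per_page * pages.length
end
```

A PDF with a real text layer will usually clear this threshold easily, so a `true` result is a hint to try OCR rather than proof that the file contains only images.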

Best regards,

Axel

Hi,

In “PDF to text converter?” on Mon, 11 Aug 2008 18:41:51 +0900,
dare ruby [email protected] wrote:

Could anyone explain how to convert a PDF to text format?

It seems that Ruby/Poppler(*1), the Ruby bindings of
Poppler(*2), is what you’re looking for.
http://ruby-gnome2.svn.sourceforge.net/viewvc/ruby-gnome2/ruby-gnome2/trunk/poppler/sample/pdf2text.rb?view=markup

(*1) http://ruby-gnome2.sourceforge.jp/hiki.cgi?Ruby%2FPoppler
(*2) http://poppler.freedesktop.org/

pdftotext is a command-line tool bundled with Poppler.

Thanks,

I have some study materials as PDF documents. I need to convert the
PDFs to a text format, such as Microsoft Word or TextPad, on Windows,
and I need to do the parsing with a Ruby program. Could anyone suggest
how to do this?

Thanks in advance

Regards,
Jose Martin

On Mon, Aug 18, 2008 at 11:10 PM, dare ruby [email protected]
wrote:

I have some study materials as PDF documents. I need to convert the
PDFs to a text format, such as Microsoft Word or TextPad, on Windows,
and I need to do the parsing with a Ruby program. Could anyone suggest
how to do this?

Your best bet is a Ruby script that calls out to xpdf to do the actual
PDF-to-text conversion, then parses the text. There’s a Windows port of
the xpdf command-line utilities.

http://gnuwin32.sourceforge.net/packages/xpdf.htm
http://www.perlmonks.org/?node_id=298041
http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/
http://forjournalists.com/cookbook/index.php?title=XPDF
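
The approach described above (shell out for the conversion, then parse the text in Ruby) might look roughly like this. It is a minimal sketch, assuming the `pdftotext` executable from xpdf or Poppler is installed and on the PATH. The method names `pdftotext_command`, `pdf_to_text` and `non_empty_lines` are hypothetical, and the "parsing" step is only a placeholder for whatever processing the study materials need.

```ruby
require "open3"

# Build the argument list for the pdftotext command-line tool.
# Passing "-" as the output file asks pdftotext to write to standard output.
def pdftotext_command(pdf_path, layout: false)
  cmd = ["pdftotext", "-enc", "UTF-8"]
  cmd << "-layout" if layout   # try to keep the physical page layout
  cmd + [pdf_path, "-"]
end

# Run pdftotext and return the extracted text as a string.
# Raises if the tool exits with a non-zero status.
def pdf_to_text(pdf_path, layout: false)
  stdout, stderr, status =
    Open3.capture3(*pdftotext_command(pdf_path, layout: layout))
  raise "pdftotext failed: #{stderr.strip}" unless status.success?
  stdout
end

# Placeholder parsing step: return the stripped, non-empty lines.
def non_empty_lines(text)
  text.each_line.map(&:strip).reject(&:empty?)
end
```

With a file such as `notes.pdf` (a made-up name), `non_empty_lines(pdf_to_text("notes.pdf"))` would give an array of lines to work with. Passing the arguments as an array rather than a single command string avoids shell quoting problems with file names that contain spaces, which are common on Windows.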

martin
