PDF to text converter?

martin_mercy2001 · August 11, 2008, 11:44am

Dear all,

Could anyone explain how to convert a PDF to text format?

Thanks in advance

Regards,
Jose Martin

martin_mercy2001 · August 11, 2008, 12:06pm

-------- Original Message --------

Date: Mon, 11 Aug 2008 18:41:51 +0900
From: dare ruby [email protected]
To: [email protected]
Subject: PDF to text converter?

Dear all,

Could anyone explain how to convert a PDF to text format?

Thanks in advance

Regards,
Jose Martin

Posted via http://www.ruby-forum.com/.

Dear Jose,

it depends on whether your PDF actually contains text or just images
that a human can recognize as text.
In the first case, you can try a tool like pdftotext (see the pdftotext
article on Wikipedia), at least on Linux and Mac. On Windows, there are
also some PDF viewers that offer a “Save as text” option.

In the second case, you’ll have to use OCR (optical character
recognition) software. There are some good commercial packages
available. I’ve liked ABBYY’s FineReader (on Windows).
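
To tell the two cases apart from a script, one rough heuristic is to run
pdftotext and check whether anything visible comes back. The sketch below
assumes pdftotext is installed and on the PATH; the helper names
image_only? and extract_text are made up for illustration, and a PDF that
mixes scanned pages with real text will not be classified cleanly.

```ruby
require 'open3'

# Returns true when the extracted text has no visible characters.
# pdftotext still emits form feeds between pages for image-only PDFs,
# so all whitespace (including "\f") is removed before checking.
def image_only?(extracted_text)
  extracted_text.gsub(/\s+/, '').empty?
end

# Runs pdftotext and returns the extracted text.
# Passing "-" as the output file makes pdftotext write to standard output.
def extract_text(pdf_path)
  out, err, status = Open3.capture3('pdftotext', pdf_path, '-')
  raise "pdftotext failed: #{err}" unless status.success?
  out
end

# Usage: ruby check_pdf.rb some_file.pdf
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  text = extract_text(ARGV[0])
  puts image_only?(text) ? 'No text layer found; try OCR.' : text
end
```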

Best regards,

Axel

martin_mercy2001 · August 11, 2008, 1:11pm

Hi,

In [email protected]
“PDF to text converter?” on Mon, 11 Aug 2008 18:41:51 +0900,
dare ruby [email protected] wrote:

Could anyone explain how to convert a PDF to text format?

It seems that Ruby/Poppler(*1), the Ruby bindings of
Poppler(*2), is what you’re looking for.
Ruby-GNOME 2 download | SourceForge.net

(*1) http://ruby-gnome2.sourceforge.jp/hiki.cgi?Ruby%2FPoppler
(*2) http://poppler.freedesktop.org/

pdftotext is a bundled application in Poppler.

Thanks,

martin_mercy2001 · August 19, 2008, 8:14am

I have some study materials as PDF documents. I need to parse the PDFs
into a text format such as Microsoft Word or TextPad on Windows, and I
need to do the parsing with a Ruby program. Could anyone suggest how?

Thanks in advance

Regards,
Jose Martin

martin_mercy2001 · August 19, 2008, 7:43pm

On Mon, Aug 18, 2008 at 11:10 PM, dare ruby [email protected]
wrote:

I have some study materials as PDF documents. I need to parse the PDFs
into a text format such as Microsoft Word or TextPad on Windows, and I
need to do the parsing with a Ruby program. Could anyone suggest how?

Your best bet is a Ruby script that calls out to xpdf to do the actual
PDF-to-text conversion, then parses the text. There’s a Windows port of
the xpdf command-line utilities.

http://www.perlmonks.org/?node_id=298041
http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/
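
A minimal sketch of that approach, assuming the xpdf pdftotext executable
is on the PATH (on Windows, pdftotext.exe). The helper names
default_txt_path, pdftotext_args, convert and pages are hypothetical, and
splitting on form feeds relies on pdftotext separating pages with "\f".

```ruby
# Derives an output file name: "notes.pdf" becomes "notes.txt".
def default_txt_path(pdf_path)
  "#{pdf_path.sub(/\.pdf\z/i, '')}.txt"
end

# Builds the argument list for pdftotext.
# -layout asks pdftotext to keep the physical layout of the page.
def pdftotext_args(pdf_path, txt_path)
  ['pdftotext', '-layout', pdf_path, txt_path]
end

# Shells out to pdftotext and returns the path of the text file it wrote.
# system returns false on a non-zero exit and nil if the command is missing.
def convert(pdf_path, txt_path = default_txt_path(pdf_path))
  ok = system(*pdftotext_args(pdf_path, txt_path))
  raise "pdftotext failed for #{pdf_path}" unless ok
  txt_path
end

# Splits the converted text into one string per page.
# Empty pages are kept so that array positions still match page numbers.
def pages(text)
  text.split("\f").map(&:strip)
end

# Usage: ruby pdf2txt.rb study_material.pdf
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  txt = convert(ARGV[0])
  pages(File.read(txt)).each_with_index do |page, i|
    puts "page #{i + 1}: #{page.lines.count} lines"
  end
end
```

Passing the arguments to system as separate strings avoids shell quoting
problems with file names that contain spaces.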

martin
