MS Word files and PDFs

Hello,

I’m fairly new to the Ruby scene.
Is there any library that can read MS Word (.doc) files and extract the
plain text? What about libs for PDF files?

Thanks folks,

M.

You could use the win32ole library and read them yourself via OLE.
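
For what it's worth, here is a minimal sketch of the OLE route. It assumes Windows with Word installed; the method name read_word_text and the word: keyword are made up for illustration (the keyword only exists so a stand-in object can be passed in when Word isn't around). The calls on the Word side (Documents.Open, Content.Text, Close, Quit) come from Word's own automation object model.

```ruby
# Sketch only: read the plain text of a .doc file by driving Word over OLE.
# Assumes Windows with MS Word installed. `read_word_text` and the `word:`
# keyword are hypothetical names chosen for this example.
def read_word_text(path, word: nil)
  unless word
    require 'win32ole'                       # Windows-only standard library
    word = WIN32OLE.new('Word.Application')  # starts a Word instance
  end
  word.Visible = false                       # keep the Word window hidden

  begin
    # Word expects an absolute, backslash-separated Windows path.
    doc = word.Documents.Open(File.expand_path(path).tr('/', '\\'))
    # Range.Text marks paragraph ends with "\r"; convert them to newlines.
    doc.Content.Text.tr("\r", "\n")
  ensure
    doc.Close(0) if doc                      # 0 = close without saving
    word.Quit                                # don't leave Word running
  end
end

# Usage (on a machine with Word):
#   puts read_word_text('C:/docs/report.doc')
```

The ensure block matters here: if anything raises after Word has started, an invisible Word process would otherwise stay behind.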

On Sun, 23 Apr 2006, Mateo Barraza wrote:

I’m fairly new to the Ruby scene.
Is there any library that can read MS Word (.doc) files and extract the pure
text…what about libs for PDF files?

Hi,

I don't know of an MS Word library that will easily let you extract
the plain text, but the OLE suggestion is a good idea. Another
method would be to save the document as WordprocessingML (XML), if you
have Word 2003, and parse it with either REXML or libxml-ruby (two Ruby
XML libraries), or transform it with XSLT. Once you've got XML, the
interesting nodes (if you really only want text) are the ‘w:t’ ones.
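
A rough sketch of the REXML variant, assuming the file was saved from Word 2003 as XML. The method name wordml_text is made up for the example, and the namespace URI is the Word 2003 WordprocessingML one; check it against the root element of your own file.

```ruby
require 'rexml/document'

# Namespace bound to the "w" prefix in Word 2003 WordprocessingML files.
WORDML_NS = 'http://schemas.microsoft.com/office/word/2003/wordml'

# Hypothetical helper: returns the document text, one line per w:p paragraph.
def wordml_text(xml)
  doc = REXML::Document.new(xml)
  ns  = { 'w' => WORDML_NS }

  paragraphs = REXML::XPath.match(doc, '//w:p', ns)
  lines = paragraphs.map do |para|
    # A paragraph's text is split across runs; join every w:t inside it.
    REXML::XPath.match(para, './/w:t', ns).map { |t| t.text.to_s }.join
  end
  lines.join("\n")
end

# Usage:
#   puts wordml_text(File.read('report.xml'))
```

Grouping by w:p keeps the paragraph breaks; collecting the w:t nodes directly would run everything together.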

HTH,
Keith

Thanks for your responses. I also found that the POI Java project was
extended to support Ruby:
http://jakarta.apache.org/poi/poi-ruby.html
Still, I think the win32ole solution is the best for simply
reading the content of the docs.

M

Keith S. wrote:

You could use the win32ole library and read them yourself via OLE.

Hi,

Could you provide a code snippet?

On Feb 8, 2008 8:18 AM, Rajesh S. wrote:

Keith S. wrote:
I have found this most useful:
http://rubyonwindows.blogspot.com/
What you want should be somewhere in there, probably among the posts on Word:
http://rubyonwindows.blogspot.com/search/label/word

A most valuable read anyway.

Cheers
Robert


http://ruby-smalltalk.blogspot.com/


Whereof one cannot speak, thereof one must be silent.
Ludwig Wittgenstein