Forum: Ruby MS Word files and PDFs

Mateo Barraza (Guest)
on 2006-04-23 01:49
(Received via mailing list)
Hello,

I'm fairly new to the Ruby scene.
Is there any library that can read MS Word (.doc) files and extract the
plain text? What about libraries for PDF files?

Thanks folks,

M.
Keith Sader (ksader)
on 2006-04-23 02:35
(Received via mailing list)
You could use the win32ole library and read them yourself via OLE.
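
A minimal sketch of the OLE approach, assuming Windows with Microsoft Word installed. The method name word_doc_text and its optional second argument (a way to pass in an already created Word object) are made up for illustration; Documents.Open, Content.Text, Close and Quit belong to Word's automation object model.

```ruby
# Extract the plain text of a .doc file by driving Word through OLE.
# Assumption: runs on Windows with Microsoft Word installed.
# `word` is a hypothetical hook for passing in an existing Word object;
# when omitted, a new hidden Word instance is started.
def word_doc_text(path, word = nil)
  unless word
    require 'win32ole'                     # Windows-only library shipped with Ruby
    word = WIN32OLE.new('Word.Application')
    word.Visible = false                   # keep the Word window hidden
  end

  # Word resolves relative paths against its own working directory,
  # so hand it an absolute path.
  doc = word.Documents.Open(File.expand_path(path))
  begin
    doc.Content.Text                       # whole document body as one string
  ensure
    doc.Close(0)                           # 0 = wdDoNotSaveChanges
  end
ensure
  word&.Quit                               # always shut the Word process down
end
```

Called as `puts word_doc_text('report.doc')` (the file name is a placeholder). Word separates paragraphs with carriage returns, so the result may need `tr("\r", "\n")` before printing.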
Keith Fahlgren (Guest)
on 2006-04-23 02:44
(Received via mailing list)
On Sun, 23 Apr 2006, Mateo Barraza wrote:
> I'm fairly new to the Ruby scene.
> Is there any library that can read MS Word (.doc) files and extract the pure
> text...what about libs for PDF files?

Hi,

There isn't an MS Word library that I know of that will easily allow you
to extract the plain text, but the OLE suggestion is a good idea. Another
method would be to save the file as WordprocessingML (XML), if you have
Word 2003, and parse it with either REXML or libxml-ruby (two Ruby XML
libraries), or with XSLT. Once you've got XML, the interesting nodes (if
you really only want text) are the 'w:t' ones.


HTH,
Keith
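
A rough sketch of the 'w:t' idea using REXML, assuming the file was saved from Word 2003 as WordprocessingML. The method name wordml_text and the constant WORDML_NS are invented for this example; the namespace URI is the one Word 2003 XML files declare for the w: prefix.

```ruby
require 'rexml/document'

# Namespace that Word 2003 XML (WordprocessingML) binds to the "w" prefix.
WORDML_NS = { 'w' => 'http://schemas.microsoft.com/office/word/2003/wordml' }

# Return the plain text of a WordprocessingML string, one line per paragraph.
# Text runs (w:t) inside the same paragraph (w:p) are joined without a separator,
# because Word often splits a single word across several runs.
def wordml_text(xml)
  doc = REXML::Document.new(xml)
  paragraphs = REXML::XPath.match(doc, '//w:p', WORDML_NS)
  paragraphs.map do |para|
    REXML::XPath.match(para, './/w:t', WORDML_NS).map { |t| t.text.to_s }.join
  end.join("\n")
end
```

For a file on disk this would be `puts wordml_text(File.read('report.xml'))`, where report.xml is a placeholder name.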
Mateo Barraza (Guest)
on 2006-04-23 03:23
(Received via mailing list)
Thanks for your responses. I also found that the POI Java project has
been extended to support Ruby:
http://jakarta.apache.org/poi/poi-ruby.html
That said, I think the win32ole solution is the best for simply
reading the content of the docs...

M
Rajesh Soni (soni_rajesh)
on 2008-02-08 08:18
Keith Sader wrote:
> You could use the win32ole library and read them yourself via OLE.

Hi,

Could you provide a code snippet?
Robert Dober (Guest)
on 2008-02-08 09:15
(Received via mailing list)
On Feb 8, 2008 8:18 AM, Rajesh Soni <rajesh.soni@softwarefolks.com> wrote:
> Keith Sader wrote:
I have found this blog most useful:
http://rubyonwindows.blogspot.com/
What you want should be somewhere in the posts labelled "word":
http://rubyonwindows.blogspot.com/search/label/word

A most valuable read anyway.

Cheers
Robert



--
http://ruby-smalltalk.blogspot.com/

---
Whereof one cannot speak, thereof one must be silent.
Ludwig Wittgenstein