Hello,

I'm fairly new to the Ruby scene. Is there any library that can read MS Word (.doc) files and extract the plain text? What about libraries for PDF files?

Thanks, folks,
M.
on 2006-04-23 03:49
on 2006-04-23 04:35
You could use the win32ole library and read them yourself via OLE.
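A minimal sketch of the OLE approach follows. It assumes Windows with Microsoft Word installed, and the method names `word_doc_text` and `normalize_word_text` are hypothetical. It opens the document read-only through Word's `Documents.Open`, reads `Content.Text`, and converts Word's control characters (`\r` paragraph marks, `\r\a` cell ends, `\v` manual line breaks, `\f` page breaks) to newlines.

```ruby
# Sketch only: word_doc_text needs Windows + Microsoft Word; the helper
# names are hypothetical, not part of any library.

# Word's Range#Text uses "\r" for paragraph marks, "\r\x07" for the end of
# a table cell/row, "\x0B" for manual line breaks and "\x0C" for page breaks.
def normalize_word_text(raw)
  raw.gsub(/\r\x07?|[\x0B\x0C]/, "\n") # turn every break into a newline
     .delete("\x07")                   # drop any leftover cell markers
end

def word_doc_text(path)
  require 'win32ole' # loaded lazily so this file still parses on non-Windows
  word = WIN32OLE.new('Word.Application')
  word.Visible = false
  doc = nil
  begin
    # Arguments: FileName, ConfirmConversions = false, ReadOnly = true
    doc = word.Documents.Open(File.expand_path(path).tr('/', '\\'), false, true)
    normalize_word_text(doc.Content.Text)
  ensure
    doc.Close(false) if doc # close without saving changes
    word.Quit
  end
end

# Usage: ruby word_text.rb C:\docs\report.doc
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts word_doc_text(ARGV[0])
end
```

Starting a hidden Word instance per call is slow; for many files you would likely keep one `Word.Application` open and loop over the documents.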
on 2006-04-23 04:44
On Sun, 23 Apr 2006, Mateo Barraza wrote:
> I'm fairly new to the Ruby scene.
> Is there any library that can read MS Word (.doc) files and extract the pure
> text...what about libs for PDF files?

Hi,

I don't know of an MS Word library that will easily let you extract the plain text, but the OLE suggestion is a good idea.

Another method would be to save the document as WordprocessingML (XML), if you have Word 2003, and then either parse it with REXML or libxml-ruby (two Ruby XML libraries) or transform it with XSLT. If you really only want the text, the interesting nodes in the XML are the 'w:t' ones.

HTH,
Keith
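A minimal sketch of the WordprocessingML route with REXML follows. It assumes the file was saved from Word 2003 as an "XML Document", whose elements live in the `http://schemas.microsoft.com/office/word/2003/wordml` namespace. `wordml_text` is a hypothetical helper name. It collects the `w:t` nodes inside each `w:p` paragraph and emits one paragraph per line.

```ruby
require 'rexml/document'

# Namespace of Word 2003 WordprocessingML ("XML Document") files.
WORDML_NS = 'http://schemas.microsoft.com/office/word/2003/wordml'

# Hypothetical helper: plain text of a WordprocessingML string,
# joining the w:t runs of each w:p paragraph and one paragraph per line.
def wordml_text(xml)
  doc = REXML::Document.new(xml)
  ns = { 'w' => WORDML_NS }
  REXML::XPath.match(doc, '//w:p', ns).map do |para|
    # Element#text returns the unescaped text (nil for empty runs, which join ignores)
    REXML::XPath.match(para, './/w:t', ns).map(&:text).join
  end.join("\n")
end
```

For a file on disk, something like `puts wordml_text(File.read('report.xml'))` should work, where `report.xml` is a placeholder name. For large documents, libxml-ruby will probably be noticeably faster than REXML.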
on 2006-04-23 05:23
Thanks for your responses. I also found that the Apache POI Java project was extended to support Ruby: http://jakarta.apache.org/poi/poi-ruby.html

That said, I think the win32ole solution is the best for simply reading the content of the docs.

M
on 2008-02-08 09:18
Keith S. wrote:
> You could use the win32ole library and read them yourself via OLE.

Hi, could you provide a code snippet?
on 2008-02-08 10:15
On Feb 8, 2008 8:18 AM, Rajesh S. <email@example.com> wrote:
> Keith S. wrote:

I have found this most useful: http://rubyonwindows.blogspot.com/

What you want should be somewhere in there:
http://rubyonwindows.blogspot.com/search/label/word

A most valuable read anyway.

Cheers
Robert
--
http://ruby-smalltalk.blogspot.com/

---
Whereof one cannot speak, thereof one must be silent.
Ludwig Wittgenstein