Scanning a Word document in Ruby

I am new to RoR and have just got stuck on something…
…that is, I have to scan a Word document through Ruby…
Please suggest how to do it…

On 20 November 2011 14:03, [email protected]
[email protected] wrote:

I am new to RoR and have just got stuck on something…
…that is, I have to scan a Word document through Ruby…
Please suggest how to do it…

Googling for "ruby parse microsoft word" yields some hits that might be
of help, but it is not going to be easy.

Or have you already rejected those suggestions?

Colin

Check out kete/convert_attachment_to on GitHub: a plugin that takes the value of an uploaded file, converts it to either text or HTML, and inserts it into a specified attribute. It currently works only with PDF, MS Word, HTML, and plain text documents.

It relies on command line utilities to do the conversions.

It has two caveats:

  • it hasn’t been used with Rails 3 or above, so you may need
    to fork it to make it work
  • when it was written, the utilities didn’t support .docx files, so you
    might need to update it to handle them

Cheers,
Walter
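On the .docx caveat: a .docx file is a ZIP archive whose body text normally sits in word/document.xml, with paragraphs in <w:p> elements and the visible text in <w:t> runs. One rough way to handle it from Ruby is to pull that file out and strip the markup. This is a minimal sketch, assuming the `unzip` command is installed; the method names `docx_xml_to_text` and `extract_docx_text` are hypothetical, and the regex approach is a shortcut rather than a real OOXML parser (it ignores headers, footnotes, tabs, and numeric character references).

```ruby
require 'open3'

# The five predefined XML entities; Word normally writes other characters
# as UTF-8 directly, so numeric references are not handled in this sketch.
XML_ENTITIES = {
  '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&apos;' => "'", '&amp;' => '&'
}.freeze

# Rough, regex-based conversion of word/document.xml to plain text:
# one output line per <w:p> paragraph, joining the text of its <w:t> runs.
# <w:pPr> and <w:tab/> are skipped because the patterns require a space or
# ">" right after the tag name. Not a real OOXML parser.
def docx_xml_to_text(xml)
  paragraphs = xml.scan(%r{<w:p[ >].*?</w:p>}m)
  paragraphs.map { |para|
    para.scan(%r{<w:t(?:\s[^>]*)?>(.*?)</w:t>}m)
        .flatten
        .map { |run| run.gsub(/&(?:lt|gt|quot|apos|amp);/, XML_ENTITIES) }
        .join
  }.join("\n")
end

# Hypothetical helper: reads word/document.xml straight out of the ZIP
# archive with `unzip -p` (assumed installed), which writes it to stdout.
def extract_docx_text(path)
  xml, stderr, status = Open3.capture3('unzip', '-p', path, 'word/document.xml')
  raise "unzip failed on #{path}: #{stderr.strip}" unless status.success?
  docx_xml_to_text(xml.force_encoding(Encoding::UTF_8))
end
```

Splitting the pure XML-to-text step from the shell call keeps the fragile part (the regexes) easy to check without a real .docx file at hand.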

On Nov 21, 2011, at 6:34 AM, Petite A. [email protected]
wrote:

On Nov 20, 2011, at 3:03 PM, [email protected] wrote:

…that is, I have to scan a Word document through Ruby…

catdoc?

catdoc and xls2csv - free MS-Office format readers

If you are dropping down to the utility level, docvert is more
heavyweight, but handles more cases (including .docx).

I’ll probably add it to convert_attachment_to as an option at some
point.

Cheers,
Walter
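For the catdoc route, the Ruby side can be a thin wrapper that runs the binary and captures what it prints to standard output. A minimal sketch, assuming catdoc is installed and on the PATH; `extract_doc_text` and `DocConversionError` are hypothetical names, and the `runner:` parameter exists only so the external call can be faked in tests. catdoc targets the older binary .doc format, not .docx.

```ruby
require 'open3'

# Raised when the external converter exits with a non-zero status.
class DocConversionError < StandardError; end

# Hypothetical helper: extract plain text from a legacy .doc file by
# shelling out to catdoc, which prints the extracted text to stdout.
# `runner` defaults to Open3.capture3 and can be swapped out for testing.
def extract_doc_text(path, runner: Open3.method(:capture3))
  # Passing the command as separate arguments avoids invoking a shell,
  # so file names with spaces or quotes are handled safely.
  stdout, stderr, status = runner.call('catdoc', path)
  unless status.success?
    raise DocConversionError, "catdoc failed on #{path}: #{stderr.strip}"
  end
  stdout
end
```

For example, `extract_doc_text('uploads/report.doc')` (a made-up path) would return the document's text as a String, or raise if catdoc reports an error.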

On Nov 21, 2011, at 2:00 PM, Javier Q. [email protected] wrote:

That’s for Python… or is there something I’m missing here?
From my original response:


Check out kete/convert_attachment_to on GitHub: a plugin that takes the value of an uploaded file, converts it to either text or HTML, and inserts it into a specified attribute. It currently works only with PDF, MS Word, HTML, and plain text documents.

It relies on command line utilities to do the conversions.

It has two caveats:

  • it hasn’t been used with Rails 3 or above, so you may need
    to fork it to make it work
  • when it was written, the utilities didn’t support .docx files, so you
    might need to update it to handle them

Docvert is written in Python, but can be run as a command line utility.
Like the other utilities that convert_attachment_to relies upon,
convert_attachment_to would make a system call to docvert and grab the
result.

So it is definitely not a pure Ruby solution, but convert_attachment_to
acts as a Ruby interface to these existing utilities.

Hope that clears it up.

Cheers,
Walter
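The "Ruby interface over command-line utilities" idea can be sketched as a table from file extension to command. This is a hypothetical illustration, not convert_attachment_to's actual code: the tool choices (catdoc for .doc, Poppler's pdftotext for .pdf, where a trailing "-" sends the text to stdout) are assumptions, plain .txt files are read directly, and no docvert entry is included because its command-line options are not given in this thread.

```ruby
require 'open3'

# Hypothetical extension => command table. Each lambda builds an argument
# list for a tool that prints the file's text to standard output.
CONVERTERS = {
  '.doc' => ->(path) { ['catdoc', path] },
  '.pdf' => ->(path) { ['pdftotext', path, '-'] }
}.freeze

class UnsupportedFormatError < StandardError; end
class ConversionFailedError < StandardError; end

# Returns the command array for `path`, or nil when no converter is known
# (plain text is handled separately by convert_to_text).
def converter_command(path)
  builder = CONVERTERS[File.extname(path).downcase]
  builder&.call(path)
end

# Converts `path` to plain text. `runner` defaults to Open3.capture3 and
# `reader` to File.read; both can be replaced in tests.
def convert_to_text(path, runner: Open3.method(:capture3), reader: File.method(:read))
  ext = File.extname(path).downcase
  return reader.call(path) if ext == '.txt'

  command = converter_command(path)
  raise UnsupportedFormatError, "no converter for #{ext.inspect}" unless command

  stdout, stderr, status = runner.call(*command)
  raise ConversionFailedError, "#{command.first} failed: #{stderr.strip}" unless status.success?
  stdout
end
```

Keeping the commands in one hash means supporting another format (a .docx converter, say) is a single new entry rather than another branch.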

Is there a way to read a .doc/.pdf file that is uploaded to my Rails app,
using a gem or something?

Javier

On Mon, Nov 21, 2011 at 5:01 PM, Javier Q. [email protected]
wrote:

Is there a way to read a .doc/.pdf file that is uploaded to my Rails app,
using a gem or something?

Yes, there is a PDF toolkit for Ruby.
Please check this link: http://mishmashmoo.com/blog/?p=4

Regards

sathia

On 21 November 2011 11:31, Javier Q. [email protected] wrote:

Is there a way to read a .doc/.pdf file that is uploaded to my Rails app,
using a gem or something?

Is that not what this whole thread has been about?

Colin