Forum: Ruby Word + win32ole - how to find formatting of a word?

Mohit Sindhwani (Guest)
on 2008-10-25 10:34
(Received via mailing list)
Hi!  I'm trying to use Ruby and win32ole to parse a Word document.  So
far, I'm able to extract the style and text of each paragraph.  That
works well for converting each paragraph into its own div (in the
HTML/CSS sense).

Now, inside the paragraphs, certain words have special formatting
(e.g. the name of a command set in monospace), and I'm trying to find
out how to extract those special cases.  Does anyone know how to
achieve that?

Appreciate your help - thanks!

Cheers,
Mohit.
10/25/2008 | 4:33 PM.
Axel Etzold (Guest)
on 2008-10-25 16:35
(Received via mailing list)
> Hi!  I'm trying to use Ruby and win32ole to parse a Word document.  So
> far, I'm able to extract the style and text of each paragraph.  That
> works well for converting each paragraph into its own div (in the
> HTML/CSS sense).
>
> Now, inside the paragraphs, certain words have special formatting
> (e.g. the name of a command set in monospace), and I'm trying to find
> out how to extract those special cases.  Does anyone know how to
> achieve that?
>

Dear Mohit,

You could save the Word file as HTML and then extract the relevant
information from that.
I did this using OpenOffice and got a file containing the font
information in the following form:


<BODY LANG="en-US" DIR="LTR">
<P STYLE="margin-bottom: 0in">A command in <FONT FACE="Linux
Libertine">Linux
Libertine</FONT></P>
<P STYLE="margin-bottom: 0in">A text in <FONT FACE="Bitstream Charter,
serif">Bitstream
Charter</FONT></P>
</BODY>

If you read in the text of that file as a String, you can then find the
relevant bits using regexps.
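
A minimal sketch of that regexp approach, assuming the export uses <FONT FACE="..."> tags as in the sample above. The names FONT_RE and font_spans are made up for this example, and a regexp like this is fragile against real-world HTML (nested tags, CSS-based styling), where a proper HTML parser would likely be more robust:

```ruby
# Sketch: pull [font face, text] pairs out of HTML saved from a word processor.
# FONT_RE and font_spans are hypothetical names chosen for this example.
# /i ignores tag case; /m lets "." match the line breaks inside the text.
FONT_RE = /<FONT\s+FACE="([^"]*)"[^>]*>(.*?)<\/FONT>/im

def font_spans(html)
  html.scan(FONT_RE).map do |face, text|
    # The exporter wraps long lines, so collapse any run of whitespace
    # (including newlines) into a single space.
    [face.gsub(/\s+/, " ").strip, text.gsub(/\s+/, " ").strip]
  end
end

html = <<~HTML
  <BODY LANG="en-US" DIR="LTR">
  <P STYLE="margin-bottom: 0in">A command in <FONT FACE="Linux
  Libertine">Linux
  Libertine</FONT></P>
  <P STYLE="margin-bottom: 0in">A text in <FONT FACE="Bitstream Charter,
  serif">Bitstream
  Charter</FONT></P>
  </BODY>
HTML

font_spans(html).each { |face, text| puts "#{face}: #{text}" }
```

Each pair holds the font face as written in the attribute and the text it was applied to, so the monospace words can be picked out by comparing the face against a list of known monospace fonts.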

Best regards,

Axel
Mohit Sindhwani (Guest)
on 2008-10-26 10:49
(Received via mailing list)
Axel Etzold wrote:
> Dear Mohit,
> </BODY>
>

Hi Axel

Thanks for replying!  Converting to HTML and working with that is my
last option actually.  In a well-written document, I found that using
Word to return style information about the paragraph is a lot less work
and relatively easy to work with.  I guess it's time to consider your
suggestion!

Cheers,
Mohit.
10/26/2008 | 5:44 PM.
Mohit Sindhwani (Guest)
on 2008-10-26 14:15
(Received via mailing list)
Mohit Sindhwani wrote:
>>>
> Thanks for replying!  Converting to HTML and working with that is my
> last option actually.  In a well-written document, I found that using
> Word to return style information about the paragraph is a lot less
> work and relatively easy to work with.  I guess it's time to consider
> your suggestion!
>
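Actually, after digging around, I found that this gets me most of the
way there:
index = 0           # running count of words visited
ftHash = {}         # font name => 1, i.e. the set of fonts in use
words = doc.Words   # doc is the already-opened Word document
words.each {|w|
  index += 1
  ft = w.Font.Name  # name of the font applied to this word
  ftHash[ft] = 1
}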
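
Building on the loop above, here is a rough sketch of how the per-word font names could be turned into runs of identically formatted text, which is closer to "find the words with special formatting". The helper font_runs and the FakeFont/FakeWord stand-ins are hypothetical names invented for this example; the only assumption about the real objects is that each word responds to Font.Name and Text, as in the snippet above:

```ruby
# Sketch: merge consecutive words that share a font into runs.
# font_runs is a hypothetical helper, not part of win32ole. It accepts any
# collection with #each whose items respond to Font.Name and Text.
def font_runs(words)
  runs = []
  words.each do |w|
    name = w.Font.Name
    if runs.last && runs.last[:font] == name
      runs.last[:text] << w.Text                 # same font: extend the current run
    else
      runs << { font: name, text: w.Text.dup }   # font changed: start a new run
    end
  end
  runs
end

# Stand-ins for Word objects so the sketch runs without Word installed.
FakeFont = Struct.new(:Name)
FakeWord = Struct.new(:Text, :Font)

sample = [
  FakeWord.new("Run ", FakeFont.new("Times New Roman")),
  FakeWord.new("ls ",  FakeFont.new("Courier New")),
  FakeWord.new("-l ",  FakeFont.new("Courier New")),
  FakeWord.new("now.", FakeFont.new("Times New Roman")),
]

font_runs(sample).each { |r| puts "#{r[:font]}: #{r[:text].inspect}" }

# With a real document the call would presumably be: font_runs(doc.Words)
```

Runs whose font differs from the paragraph's main font are then the candidates for inline markup, e.g. wrapping the "Courier New" run in a code-style span.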

Thanks for your help!

Cheers,
Mohit.
10/26/2008 | 9:14 PM.
Axel Etzold (Guest)
on 2008-10-26 21:30
(Received via mailing list)
-------- Original Message --------
> Date: Sun, 26 Oct 2008 22:14:53 +0900
> From: Mohit Sindhwani <mo_mail@onghu.com>
> To: ruby-talk@ruby-lang.org
> Subject: Re: Word + win32ole - how to find formatting of a word?

> >>> Does anyone know how to achieve that?
> >
>   ft = w.Font.Name
>   ftHash[ft] = 1
> }
>
> Thanks for your help!
>
> Cheers,
> Mohit.
> 10/26/2008 | 9:14 PM.
>
>

Dear Mohit,

You're welcome :)
It's always nice to answer one's own questions, isn't it?  Thanks
for the info!

Best regards,

Axel
Mohit Sindhwani (Guest)
on 2008-10-27 04:21
(Received via mailing list)
Axel Etzold wrote:
> You're welcome :)
> It's always nice to answer one's own questions, isn't it?  Thanks for the info!
>
Thanks for your reply again!  Yes, it's good to find the answer yourself
and then share it :)

I find that win32ole is quite powerful; it just takes a little
looking around to learn how to work with it.

Cheers,
Mohit.
10/27/2008 | 11:19 AM.