Hi! I’m trying to use Ruby and win32ole to parse a Word document. So
far, I’m able to extract the style and text of each paragraph. That
works well for converting each paragraph into its own div (in the
HTML/CSS sense). Now, inside the paragraphs, certain words have special
formatting (e.g. the name of a command set in monospace), and I’m
trying to find out how to extract those special cases. Does anyone know
how to achieve that?
Dear Mohit,
you could save the Word file as HTML and then extract the relevant
information from that.
I did that using OpenOffice and got a file containing the font
information in the following form:
A command in Linux Libertine
A text in Bitstream Charter
If you read in the text of that file as a String, you can then find the
relevant bits using regexps.
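For what it's worth, a minimal sketch of that regexp approach might look like the following. The markup is an assumption: it supposes the export wraps each run in a <font face="..."> tag, whereas a real OpenOffice or Word export may use <span style="font-family: ..."> or something else, so the pattern would need adjusting. The names FONT_TAG and font_runs are made up for the illustration.

```ruby
# Sketch: pull [font, text] pairs out of exported HTML with a regexp.
# ASSUMPTION: fonts are marked as <font face="...">...</font>; check the
# actual export and adapt the pattern if it uses different markup.
FONT_TAG = /<font\s+face="([^"]+)"[^>]*>(.*?)<\/font>/im

# Returns an array of [font_name, text] pairs in document order.
# Nested <font> tags are not handled by this simple pattern.
def font_runs(html)
  html.scan(FONT_TAG).map { |face, text| [face, text.strip] }
end

sample = <<~HTML
  <p><font face="Linux Libertine">A command in Linux Libertine</font></p>
  <p><font face="Bitstream Charter">A text in Bitstream Charter</font></p>
HTML

font_runs(sample).each { |face, text| puts "#{face}: #{text}" }
```

For anything beyond simple, flat markup, a real HTML parser would be more robust than a regexp.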
Thanks for replying! Converting to HTML and working with that is
actually my last resort. In a well-written document, I found that asking
Word for the style information of each paragraph is a lot less work
and relatively easy to work with. I guess it’s time to consider your
suggestion!
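Actually, after digging around, I found that this gets me part of the
way there:
  # doc is the Word document already opened through win32ole
  index = 0
  ftHash = {}
  words = doc.Words
  words.each {|w|
    index += 1          # running count of words seen
    ft = w.Font.Name    # font name of this word
    ftHash[ft] = 1      # remember each distinct font once
  }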
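Building on that loop, one possible way to turn per-word fonts into inline markup is to merge consecutive words that share a font and wrap the monospace runs in a span. This is only a sketch under assumptions: the helper names (group_by_font, to_html, word_font_pairs), the "mono" class, and the list of monospace fonts are all invented for the example. Only word_font_pairs touches Word (through Range.Words, Text, and Font.Name); it is defined here but not called, and has not been tried against a real document.

```ruby
require 'cgi'

# Merge consecutive [text, font] pairs that share a font into single runs.
def group_by_font(pairs)
  pairs.chunk_while { |a, b| a[1] == b[1] }
       .map { |run| [run.map(&:first).join, run.first[1]] }
end

# Wrap runs whose font is in mono_fonts (a hypothetical list) in a span.
def to_html(runs, mono_fonts = ['Courier New'])
  runs.map { |text, font|
    escaped = CGI.escapeHTML(text)
    mono_fonts.include?(font) ? "<span class=\"mono\">#{escaped}</span>" : escaped
  }.join
end

# Windows only: collect [text, font] for each word of a Word Range object,
# e.g. paragraph.Range. Not exercised in the example below.
def word_font_pairs(range)
  pairs = []
  range.Words.each { |w| pairs << [w.Text, w.Font.Name] }
  pairs
end

# Stand-in data shaped like what word_font_pairs would return.
pairs = [
  ['Type ',   'Times New Roman'],
  ['git ',    'Courier New'],
  ['status ', 'Courier New'],
  ['now.',    'Times New Roman']
]
puts to_html(group_by_font(pairs))
```

Keeping the grouping separate from the win32ole calls means the HTML side can be tried without Word installed.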
Date: Sun, 26 Oct 2008 22:14:53 +0900
From: Mohit S.
Subject: Re: Word + win32ole - how to find formatting of a word?
Thanks for your help!
Cheers,
Mohit.
10/26/2008 | 9:14 PM.
Dear Mohit,
you’re welcome.
It’s always nice to be able to answer one’s own questions, isn’t it?
Thanks for the info!