Forum: Ruby Scraping off a Word document?

Greg Kujawa (gregarican)
on 2009-01-16 16:58
(Received via mailing list)
Here's a conceptual question. I have a Word mail merge, with a few
dozen documents. There's a certain field, let's call it the Employee
field, on each page. These documents are sorted in order of this
field. What I'd like to do is save off each group of pages into its
own Word document under that field. So if it's Employee: Joe Schmoe on
the first five pages I'd want to save off just those pages and name
the file "Joe Schmoe.Doc" and so on.

The mail merge itself is pretty much hard-coded into a big group of
documents, so that's my basis. Any suggestions about what Ruby modules
and methods I'd start out delving into? I'm thinking win32ole of
course, but have a good-sized task ahead of me I have to deliver in
relatively short order :-/
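
One way to frame the splitting step, independent of how the field is read out of Word: once the Employee value of every page is known in page order, finding the page ranges is plain Ruby. The sketch below assumes that list already exists; the method name `page_groups` and the ".doc" naming are made up for illustration.

```ruby
# Turn a per-page list of Employee values (in page order) into
# one entry per consecutive run, with 1-based page ranges.
def page_groups(employees_by_page)
  employees_by_page.each_with_index
    .chunk_while { |(a, _), (b, _)| a == b }   # same employee on adjacent pages
    .map do |run|
      name = run.first[0]
      first_page = run.first[1] + 1
      last_page = run.last[1] + 1
      { employee: name, pages: first_page..last_page, file: "#{name}.doc" }
    end
end

pages = ["Joe Schmoe"] * 5 + ["Ann Lee"] * 2
page_groups(pages).each do |group|
  puts "#{group[:file]}: pages #{group[:pages]}"
end
```

Because the documents are sorted by the field, consecutive runs are enough; unsorted input would need `group_by` instead and would no longer map to contiguous page ranges.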
Jeff Strickland (Guest)
on 2009-01-16 18:06
(Received via mailing list)
"gregarican" <greg.kujawa@gmail.com> wrote in message
news:a7d21393-9d0b-4a90-9b0b-e58349e911b5@e10g2000vbe.googlegroups.com...
> and methods I'd start out delving into? I'm thinking win32ole of
> course, but have a good-sized task ahead of me I have to deliver in
> relatively short order :-/


I'm not sure you can do what you want to do.

You open a Word doc, then want to save a portion based on the Employee
Name to its own file? Then, after that Save is finished, keep the file
open, advance the database to the next Employee Name and repeat the
save, then repeat the entire process until you get through all of the
Employee Names?

It occurs to me that Word can't do that task because the Employee Name
field in the document is an unknown until the actual time of the merge.
There is only one DOC file for any given letter, and when I do these
kinds of merges all I get to see is <fieldname> where the variables are
that get filled in during the merge. You have to fill the variable from
the database then save the result, advance the database to the next
record and fill the variable again to save that result.

You are going to create a file for each employee for each letter, and
this seems to me to defeat the whole reason to merge data into a
document. The reason I merge is because I want one file for everybody;
I specifically do not want a separate file for each person.
Phlip (Guest)
on 2009-01-17 03:35
(Received via mailing list)
gregarican wrote:
> and methods I'd start out delving into? I'm thinking win32ole of
> course, but have a good-sized task ahead of me I have to deliver in
> relatively short order :-/

Write what you need using the VBA built into Word. Intellisense will
make that rather easy.

Then either replicate your VBA calls using Ruby's win32ole...

...or just shell directly from Ruby to your VBA!
Robert Klemme (Guest)
on 2009-01-17 11:06
(Received via mailing list)
On 16.01.2009 16:51, gregarican wrote:
> and methods I'd start out delving into? I'm thinking win32ole of
> course, but have a good-sized task ahead of me I have to deliver in
> relatively short order :-/

I'd do it with VB from inside Word.  An alternative might be to use
OpenOffice, read the Word file, write OO's format (XML in ZIP) and then
manipulate the XML.  But this sounds pretty awkward.

Can't you force the mail merge to produce multiple documents?

Cheers

  robert
Phlip (Guest)
on 2009-01-17 17:21
(Received via mailing list)
Robert Klemme wrote:

> I'd do it with VB from inside Word.  An alternative might be to use
> OpenOffice, read the word, write OO's format (XML in ZIP) and the
> manipulate the XML.  But this sounds pretty awkward.

I suspect Word can also barf out an XML representation.

It may be awkward (get ready for the horror when you open that file!),
but it's probably the best way. All word processing is heading towards
XML for its interoperability.
Greg Kujawa (gregarican)
on 2009-01-18 03:20
(Received via mailing list)
On Jan 17, 11:18 am, Phlip <phlip2...@gmail.com> wrote:
> Robert Klemme wrote:
> > I'd do it with VB from inside Word.  An alternative might be to use
> > OpenOffice, read the word, write OO's format (XML in ZIP) and the
> > manipulate the XML.  But this sounds pretty awkward.
>
> I suspect Word can also barf out an XML representation.
>
> It may be awkward (get ready for the horror when you open that file!), but it's
> probably the best way. All word processing is heading towards XML for its
> interoperability.

I wound up writing a C# console program to do the work. I just
referred to the ugly underbelly of all of the Word COM stuff and was
able to grab what I needed. It took a while though, since my text was
contained within text frames. So I had to work with the
Document.Shapes property and whatnot.
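
For anyone attempting the same from Ruby rather than C#, the text-frame lookup might look roughly like the sketch below. It is not the C# program described above, just an illustration of walking Document.Shapes. The helper name `text_frame_texts` is made up, and it assumes every shape in the document exposes a TextFrame; shapes that cannot hold text may need extra handling.

```ruby
# Collect the text stored in text frames (text boxes) of a Word document.
# `doc` is expected to behave like a Word Document COM object.
def text_frame_texts(doc)
  texts = []
  doc.Shapes.each do |shape|
    frame = shape.TextFrame
    has_text = frame.HasText
    # COM reports this flag as an integer (0 means "no text"), and 0 is
    # truthy in Ruby, so compare explicitly instead of relying on truthiness.
    next if has_text == false || has_text == 0
    texts << frame.TextRange.Text
  end
  texts
end

# Usage on Windows with Word installed (the path is hypothetical):
#   require 'win32ole'
#   word = WIN32OLE.new('Word.Application')
#   doc  = word.Documents.Open('C:\merge\letters.doc')
#   puts text_frame_texts(doc)
#   doc.Close(0)   # 0 = wdDoNotSaveChanges
#   word.Quit
```

Keeping the COM setup out of the helper means the traversal logic can be exercised with stand-in objects on a machine without Word.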

In searching for a solution I did run across a VBA code snippet that
would save off each document separately after the mail merge
completed. At least now I have a totally automated solution, although
it's cobbled together from various sources. First I pull my data from
a SQL DB using Ruby, dumping that to an Excel data source. Then I have
a C# program that takes that data source, uses a Word mail merge
template and delivers the final document set. Finally, I have a Ruby
program that looks in that save directory and e-mails the documents to
the individual employees. Eventually it'd be a lot cleaner and easier
to maintain if I had all of the work done in a single program written
in a single language. But that's another fight for another day :-)
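
The last step of that pipeline, matching saved files to recipients, could be sketched like this. The actual mailing program is not shown in this thread, so the method name `match_documents` and the name-to-address hash are assumptions for illustration; sending the mail itself is left out.

```ruby
# Pair saved documents with recipients by file name.
# `addresses` maps employee name => e-mail address (hypothetical lookup).
def match_documents(paths, addresses)
  matched = []
  unmatched = []
  paths.sort.each do |path|
    name = File.basename(path, ".*")   # strips the extension, whatever its case
    if addresses.key?(name)
      matched << { name: name, address: addresses[name], path: path }
    else
      unmatched << path
    end
  end
  [matched, unmatched]
end

# Usage (save_dir is hypothetical):
#   paths = Dir.glob(File.join(save_dir, "*.doc"), File::FNM_CASEFOLD)
#   to_send, leftovers = match_documents(paths, addresses)
```

Returning the unmatched files separately makes it easy to report documents that have no known recipient instead of silently skipping them.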
Jeff Strickland (Guest)
on 2009-01-18 06:15
(Received via mailing list)
"gregarican" <greg.kujawa@gmail.com> wrote in message
news:c03df6e7-e4f6-4f12-9aa7-bf03921455b4@o4g2000pra.googlegroups.com...
> On Jan 17, 11:18 am, Phlip <phlip2...@gmail.com> wrote:
> > interoperability.
>
> I wound up writing a C# console program to do the work. I just
> referred to ugly underbelly of all of the Word COM stuff and was able
> to grab what I needed. It took awhile though, since my text was
> contained within text frames. So I had to work with the
> Document.Shapes property and whatnot.
>
> In searching for a solution I did run across a VBA code snippet that
> would save off each document separately after the mail merge
> completed. At least now I have a totally automated solution, although
> it's cobbled together from various sources. First I pull my data from
> a SQL DB using Ruby, dumping that to an Excel data source. Then I have
> a C# program that takes that data source, uses a Word mail merge
> template and delivers the final document set. Finally, I have a Ruby
> program that looks in that save directory and e-mails the documents to
> the individual employees. Eventually it'd be a lot cleaner and easier
> to maintain if I had all of the work done in a single program written
> in a single language. But that's another fight for another day :-)



<JS>
Well, if anybody can figure this out, it's you.

</JS>
Robert Klemme (Guest)
on 2009-01-18 13:30
(Received via mailing list)
On 18.01.2009 03:18, gregarican wrote:
>
> I wound up writing a C# console program to do the work. I just
> referred to ugly underbelly of all of the Word COM stuff and was able
> to grab what I needed. It took awhile though, since my text was
> contained within text frames. So I had to work with the
> Document.Shapes property and whatnot.

I'd say that's pretty fast.  Good job!

> in a single language. But that's another fight for another day :-)
May I suggest a different approach?  Since your primary step is pulling
data from a relational DB using Ruby, you could as well do this: open
the mail merge Word template, replace the mail merge fields with
specially marked text (for example "<<<field name>>>" or whatever doesn't
collide with RTF meta sequences).  Then you save this as an RTF file
(ASCII readable).  Now you only need to read in the mail template file
from Ruby, do all the replacements and then write it out in Ruby again
once for each record.  Sounds pretty simple IMHO.
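
A minimal sketch of that replacement step, under a few assumptions: the placeholders use the "<<<field name>>>" form, each one survives as a single contiguous string in the saved RTF (Word does not guarantee this), and the values are plain ASCII. The method names and the "Employee" field used for file naming are made up for illustration.

```ruby
# Placeholder syntax follows the "<<<field name>>>" example.
PLACEHOLDER = /<<<(.+?)>>>/

# Escape the three characters that have special meaning in RTF.
def rtf_escape(value)
  value.to_s.gsub(/[\\{}]/) { |char| "\\#{char}" }
end

# Return a copy of the template with every known placeholder filled in.
# Unknown placeholders are left untouched so they are easy to spot.
def fill_template(template, record)
  template.gsub(PLACEHOLDER) do |placeholder|
    field = Regexp.last_match(1)
    record.key?(field) ? rtf_escape(record[field]) : placeholder
  end
end

# Write one RTF file per record, named after a field of the record.
def write_letters(template, records, out_dir, name_field = "Employee")
  records.map do |record|
    path = File.join(out_dir, "#{record.fetch(name_field)}.rtf")
    File.write(path, fill_template(template, record))
    path
  end
end
```

The block form of `gsub` is used on purpose: with a plain replacement string, backslashes in the inserted value would be interpreted as backreferences.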

Kind regards

  robert
David Mullet (mully)
on 2009-01-18 17:27
Robert Klemme wrote:
> On 18.01.2009 03:18, gregarican wrote:
>>
>> I wound up writing a C# console program to do the work. I just
>> referred to ugly underbelly of all of the Word COM stuff and was able
>> to grab what I needed. It took awhile though, since my text was
>> contained within text frames. So I had to work with the
>> Document.Shapes property and whatnot.
>
> I'd say that's pretty fast.  Good job!
>
>> in a single language. But that's another fight for another day :-)
> May I suggest a different approach?  Since your primary step is pulling
> data from a relational DB using Ruby, you could as well do this: open
> the mail merge Word template, replace mail merge fields with text with
> special formatting (for example "<<<field name>>>" or whatever doesn't
> collide with RTF meta sequences).  Then you save this as RTF file (ASCII
> readable).  Now you only need to read in the mail template file from
> Ruby, do all the replacements and then write it out in Ruby again once
> for each record.  Sounds pretty simple IMHO.
>
> Kind regards
>
>   robert

FYI, a similar (though not necessarily better) solution using Find &
Replace in Word is demonstrated here:

  http://rubyonwindows.blogspot.com/2007/11/find-rep...

Greg: If you're willing to share your C# code for automating Word, I,
for one, would like to see it. Feel free to email me, if you like.

David