Forum: Ruby Merging two Word documents with Ruby?

Denver Mike (Guest)
on 2005-12-21 05:29
I've got a bugger of a problem and I thought I'd toss it out there to
see if anyone can provide any guidance.

I'm working on an application that needs to merge two Microsoft Word
documents.  However, the application will definitely run on a Linux
server, so Word won't be installed.

My only thought would be to use the new XML format -- maybe I can find a
way to merge two documents with those files.

Has anyone else had any experience merging Word documents in Ruby (and
Rails)? Any other experience in manipulating Word documents in other
ways?

Denver Mike
Graham (Guest)
on 2005-12-21 10:23
(Received via mailing list)
Several points
- What do you mean by "Merge"?.. Word documents have structure and the
interleaving of lines or words would appear to make little sense.

- Unless your application and user base are new, you will have many
files NOT in the XML format, in which case you would need to convert
them, and would need Word installed somewhere. Perhaps you could
reconsider your platform choice (to make the problem simpler), or, if
you have no pre-existing documents, reconsider your approach to make
Word unnecessary? Word can read a wide variety of document types
(including HTML), so perhaps this is another way to simplify your
problem.

More details required...
Graham
Edwin Van leeuwen (blackedder)
on 2005-12-21 10:43
Denver Mike wrote:
> My only thought would be to use the new XML format -- maybe I can find a
> way to merge two documents with those files.

The only way I see is to use OpenOffice. There must be a script
somewhere to run OpenOffice in batch-convert mode. That way you can
convert the .doc format to ODF. ODF is XML-based, so it should be mergeable.
Microsoft's XML-based format is not in use yet. The first Office
version that will support it is Office 12, which has not been released yet.
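
A minimal Ruby sketch of the batch-conversion step, not a tested recipe. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts `--headless`, `--convert-to` and `--outdir`; whether the OpenOffice release you have offers an equivalent mode is something to check. The helper names `convert_command`, `converted_path` and `convert!` are made up for illustration.

```ruby
# Hypothetical helpers; assumes `soffice` (LibreOffice) is on the PATH.

# Builds the argument list for a headless conversion of one file.
def convert_command(input_path, target: "odt", out_dir: ".")
  ["soffice", "--headless", "--convert-to", target, "--outdir", out_dir, input_path]
end

# Path where the converted file is expected to appear:
# same base name, new extension, inside out_dir.
def converted_path(input_path, target: "odt", out_dir: ".")
  File.join(out_dir, "#{File.basename(input_path, '.*')}.#{target}")
end

# Runs the conversion and returns the path of the new file.
def convert!(input_path, target: "odt", out_dir: ".")
  ok = system(*convert_command(input_path, target: target, out_dir: out_dir))
  raise "conversion failed for #{input_path}" unless ok
  converted_path(input_path, target: target, out_dir: out_dir)
end
```

Passing the command as an array to `system` avoids shell quoting problems with file names that contain spaces.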
Paul Duncan (Guest)
on 2005-12-21 14:57
(Received via mailing list)
* Denver Mike (denvermike@comcast.net) wrote:
[snipped]
> I'm working on an application that needs to merge two Microsoft Word
> documents.  However, the application will definitely run on a Linux
> server, so Word won't be installed.

There are the POI Ruby bindings, although I've never used them myself and
have no idea how good they are.

  http://jakarta.apache.org/poi/poi-ruby.html

If that doesn't work, I'd try wv or catdoc.
Denver Mike (Guest)
on 2005-12-21 15:05
> - What do you mean by "Merge"?.. Word documents have structure and the
> interleaving of lines or words would appear to make little sense.

Thanks for your thoughts on this Graham.  By "merge", I meant appending
one Word document to the end of another, but to make things more
complicated, I need to add text into the headings across the entire
document.
Edwin Van leeuwen (blackedder)
on 2005-12-21 16:36
Denver Mike wrote:
> Thanks for your thoughts on this Graham.  By "merge", I meant appending
> one Word document to the end of another, but to make things more
> complicated, I need to add text into the headings across the entire
> document.

Microsoft Word has something called a master document. Maybe you could
add a master document that includes both files plus the extra headings. This
master document might be simple enough that you can actually reverse
engineer it. (Create one in Word once and just edit the parts you need
to edit with Ruby.)
Wilson Bilkovich (Guest)
on 2005-12-21 17:02
(Received via mailing list)
On 12/21/05, Denver Mike <denvermike@comcast.net> wrote:
>
> > - What do you mean by "Merge"?.. Word documents have structure and the
> > interleaving of lines or words would appear to make little sense.
>
> Thanks for your thoughts on this Graham.  By "merge", I meant appending
> one Word document to the end of another, but to make things more
> complicated, I need to add text into the headings across the entire
> document.
>
This can actually be extremely complex, because a named style (such as
'Body', 'Normal', or 'Heading 1') can (and will) have different
properties (fonts, colors, sizes, margins, encoding, etc) in each of
the two documents.  You will need to rename every style and style
reference in the second document in order to prevent the two from
colliding.
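
A rough sketch of that renaming step, assuming the documents have first been converted to ODF as suggested earlier in the thread, so that styles appear as XML attributes. The attribute list is an illustrative subset rather than a complete inventory of where ODF stores style names, and `prefix_styles` is a hypothetical helper name.

```ruby
require "rexml/document"

# Attributes that define or reference a named style in ODF XML.
# Illustrative subset only; a real document has more of these.
STYLE_ATTRS = %w[style:name style:parent-style-name text:style-name].freeze

# Parses the XML and returns a REXML::Document in which every style
# definition and every style reference carries the given prefix.
def prefix_styles(xml, prefix)
  doc = REXML::Document.new(xml)
  REXML::XPath.each(doc, "//*") do |element|
    STYLE_ATTRS.each do |attr_name|
      value = element.attributes[attr_name]
      element.attributes[attr_name] = prefix + value if value
    end
  end
  doc
end
```

Renaming definitions and references in the same pass keeps them in sync, so "Heading 1" from the second document can no longer pick up the first document's properties.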
Lei Wu (Guest)
on 2005-12-21 18:14
(Received via mailing list)
If you have a choice, don't use Word documents. Use the RTF format instead.
RTF files can be opened the same way as Word documents, but are a lot
easier to process.

Lei
Daniel Calvelo (Guest)
on 2005-12-21 22:35
(Received via mailing list)
If your documents are properly structured using styles (which is rare)
and they share the same styles (and I mean the *same* styles), you can
try to use OpenOffice in remote command mode: convert the .doc files into
.odt, parse the XML of both files, merge the XML, and
rebuild an .odt file, perhaps going through OOo again to get a .doc
back. But you will need to ensure that the styles are always converted
into something reliably identifiable.
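
The XML-merging part of that pipeline could look roughly like the sketch below. It assumes each .odt has already been unpacked and that its `content.xml` keeps the document body under an `office:text` element, as ODF text documents do. `append_body` is a hypothetical helper name, and the styles are assumed to be compatible already.

```ruby
require "rexml/document"

# Appends every child element of the second document's <office:text>
# body to the end of the first document's body. Returns the merged
# REXML::Document; neither input string is modified.
def append_body(first_xml, second_xml)
  first  = REXML::Document.new(first_xml)
  second = REXML::Document.new(second_xml)
  target = REXML::XPath.first(first,  "//office:text")
  source = REXML::XPath.first(second, "//office:text")
  raise ArgumentError, "missing office:text body" unless target && source

  # Copy rather than move, so the second document stays intact.
  source.elements.to_a.each { |child| target.add_element(child.deep_clone) }
  first
end
```

Rebuilding the .odt afterwards means zipping the merged `content.xml` back together with the other package files. Ruby's standard library has no zip archive writer, so that step is left out here.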

FAO (the UN branch for food and agriculture) uses a template system
(thus forcing a set of styles) which is used to output RTF which is
converted into XML for storage. Are your documents existing legacy ones
or is this a new setup? If you're building it all, then you might
seriously consider using openoffice all the way.
Dave Howell (Guest)
on 2005-12-23 18:41
(Received via mailing list)
On Dec 21, 2005, at 7:06, Denver Mike wrote:
> Thanks for your thoughts on this Graham.  By "merge", I meant appending
> one Word document to the end of another, but to make things more
> complicated, I need to add text into the headings across the entire
> document.

Does it still need to be a Word document when you're done? An entirely
different approach would be to use some kind of Word file display
program and make PDFs of the files, then chain the PDFs together. Do
the headers by slapping a white block over the existing headers and
writing a new header over them.
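
For the "chain the PDFs together" step, one possible sketch in Ruby, assuming Ghostscript's `gs` is installed and on the PATH. The helper name `concat_pdfs_command` is hypothetical, and the header overlay is not covered.

```ruby
# Hypothetical helper; assumes Ghostscript (`gs`) is on the PATH.
# Builds the argument list that concatenates the input PDFs, in order,
# into a single output PDF via the pdfwrite device.
def concat_pdfs_command(output_path, *input_paths)
  raise ArgumentError, "need at least two PDFs" if input_paths.size < 2
  ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
   "-sOutputFile=#{output_path}", *input_paths]
end
```

Running it would be `system(*concat_pdfs_command("merged.pdf", "first.pdf", "second.pdf"))`, where the file names are placeholders.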

Personally, my approach would be to abandon the project as just too
messy for words. :)
Daniel Calvelo (Guest)
on 2005-12-23 19:39
(Received via mailing list)
OpenOffice.org can do the .doc to pdf conversion. I like your idea very
much, Dave. Maybe PostScript would be easier to fiddle with ex-post.
Dave Howell (Guest)
on 2005-12-23 21:28
(Received via mailing list)
On Dec 23, 2005, at 11:37, Daniel Calvelo wrote:

> OpenOffice.org can do the .doc to pdf conversion. I like your idea very
> much, Dave. Maybe PostScript would be easier to fiddle with ex-post.

Probably. If you have a program that lets you overlay one PDF page on
another, then your best bet is to output a PDF page with your header in
it. (I'd probably use TeX, or maybe script OSX's TextEdit program, and
my copy of full Acrobat 4 for the page overlay.) The other alternative
would be to create (or have somebody create for you) an .eps with the
white box and a line of text in a program like Freehand or Illustrator.
If you pop open the .eps file in a text editor, you'll find it not too
difficult to programmatically replace the text, although you won't
easily be able to duplicate the kerning and other textual adjustments.
Have OpenOffice print to a PostScript file, then figure out what you
can use as a page marker in order to embed the .eps in that file on
each page so that it comes after (and thus covers) the original
headers, if any. Then feed the modified .ps file into a PDF distiller.

That's what I'd try, I think.
Dominic Sisneros (Guest)
on 2006-01-13 11:30
(Received via mailing list)
AbiWord can be used from the command line. See
http://www.advogato.org/person/msevior/diary.html?start=65

This might allow for this to happen.
hari (Guest)
on 2006-01-17 08:18
Hi guys,

I have a question; I hope you guys can help.

I need to build a utility that, when run, merges two MS Word
documents. I should then be able to print the merged document, with
options to select "remove header" and "remove footer",
and the document should then print with the header/footer removed.

Help
hari (Guest)
on 2006-01-19 12:26
Hi guys,

I have a question; I hope you guys can help.

I need to build a utility that, when run, merges two MS Word
documents. I should then be able to print the merged document, with
options to select "remove header" and "remove footer",
and the document should then print with the header/footer removed.

Help, please