TexTile to Word Document

Hi!

I have fallen quite in love with TexTile! It helps me write reasonably
comprehensible documents “on the go” - free from worrying about
formatting, just nice basic clean documents. Combine that with a script
that adds on some formatting style information and I get pretty nice
HTML documents.

To extend this to my next level of use, I would like to use it for
creating Word documents. Does anyone know any scripts/ ideas on
creating Word documents using TexTile? My documents are not overly
complicated. My usage is:

  • Mostly h1 - h3
  • A lot of stuff that should be treated as monospaced (essentially using
    @text@)
  • A bunch of lists
  • Basic formatting using underline, italics and bold
  • Sometimes a div that has a special ID

If this works, it would make it really easy for me to take the content
and create a document, then add on a Word template and get nicely styled
documents.

I don’t mind if the solution is Windows only (i.e., it relies on
win32api).

Any nudges in the correct direction?

Cheers,
Mohit.
4/16/2009 | 12:15 AM.

On Wednesday 15 April 2009 11:16:48 Mohit S. wrote:

To extend this to my next level of use, I would like to use it for
creating Word documents. Does anyone know any scripts/ ideas on
creating Word documents using TexTile?

Don’t! That’s the best advice I can give.

If you must, probably the simplest place to start is creating an ODF document,
or something else Word can read – in the case of ODF, you may have to have
OpenOffice read it, then save as Word.

But I’m really curious to know why you need this? If it’s just for them to be
nicely styled, it would be much faster to learn CSS – or, if that’s not an
option, to perhaps convert the word template to an HTML template.

David M. wrote:

On Wednesday 15 April 2009 11:16:48 Mohit S. wrote:

To extend this to my next level of use, I would like to use it for
creating Word documents. Does anyone know any scripts/ ideas on
creating Word documents using TexTile?

Don’t! That’s the best advice I can give.

If you must, probably the simplest place to start is creating an ODF document,
or something else Word can read – in the case of ODF, you may have to have
OpenOffice read it, then save as Word.

What about rtf? there are some projects (but cannot vouch for them):

http://www.google.com/search?hl=en&q=ruby+rtf&btnG=Google+Search

David M. wrote:

or something else Word can read – in the case of ODF, you may have to have
OpenOffice read it, then save as Word.

But I’m really curious to know why you need this? If it’s just for them to be
nicely styled, it would be much faster to learn CSS – or, if that’s not an
option, to perhaps convert the word template to an HTML template.

Hi David and Joel

Thanks for replying.

I haven’t tried creating an ODF document yet, but I did find that
importing the HTML into OpenOffice works partly. Not all my styles are
retained, but at least the headings are maintained. But the process is
cumbersome. Writer doesn’t allow you to open an HTML file and save it
as DOC. You need to first export it to ODT. Then open the ODT in
Writer and save as Word Doc. When you open the resulting DOC in Word,
quite a bit of the formatting is lost and some of the styling is gone.

Opening the HTML directly in Word also works. It retains the styling but
completely loses the semantics (headings, code blocks, etc.):
everything becomes a directly formatted paragraph.

The reason for wanting to get to Word… well, I do know CSS and use
that for the casual first print from what I write. But I think Word (or
some other publishing software, not sure which one) would give me better
control over the print output. For example, including
page-wise headers and footers, and so on. It’s partly a curiosity and
partly a need :)

Joel: I’ll try RTF to see if that helps.

Cheers,
Mohit.
4/16/2009 | 10:09 AM.

On Wednesday 15 April 2009 21:09:20 Mohit S. wrote:

I haven’t tried creating an ODF document yet, but I did find that
importing the HTML into OpenOffice works partly.

That isn’t what I was suggesting, though it is one way.

I was suggesting that you rip open Textile, or write your own Textile parser,
or even work with the Textile-generated HTML, and write a script that
generates an ODT.

I was mostly suggesting this to discourage you from trying that approach.
There isn’t a Textile-specific way of doing this, to my knowledge, and there
wouldn’t likely be a good, generic way of doing it with HTML – at best, you
could have something take a reference to a CSS class and replace it with a
reference to a given ODF (or Word) style, but you’d probably have to recreate
those styles in the word processor – I don’t know of anything that can take
CSS and generate corresponding ODF (or Word) styles.
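For what it's worth, the class-to-style substitution described above could be sketched in Ruby roughly as follows. This is only an illustration under several assumptions: the Textile-generated HTML is well-formed XML, REXML is available (it ships with Ruby as a bundled gem), and the style names in STYLE_MAP, including the "note" class, are hypothetical and would have to be created in the word processor or template. It emits only the ODF body elements (text:h and text:p); wrapping them in a full document and packaging an .odt file is not shown.

```ruby
require 'rexml/document'

# Hypothetical mapping: HTML tag (optionally "tag.class") to
# [ODF paragraph style name, outline level or nil].
# The style names are assumptions and must exist in the target template.
STYLE_MAP = {
  'h1'     => ['Heading_20_1', 1],
  'h2'     => ['Heading_20_2', 2],
  'h3'     => ['Heading_20_3', 3],
  'p'      => ['Text_20_body', nil],
  'p.note' => ['Note', nil]
}.freeze

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def xml_escape(string)
  string.gsub(/[&<>]/, XML_ESCAPES)
end

# Collect the text of an element and all its descendants,
# dropping inline markup such as <code> or <em>.
def plain_text(element)
  element.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then plain_text(child)
    else ''
    end
  end.join
end

# Turn top-level HTML blocks into ODF text:h / text:p elements.
# Blocks with no entry in STYLE_MAP are skipped.
def html_to_odf_fragment(html)
  root = REXML::Document.new("<body>#{html}</body>").root
  lines = []
  root.each_element do |el|
    key = "#{el.name}.#{el.attributes['class']}"
    style, level = STYLE_MAP[key] || STYLE_MAP[el.name]
    next unless style

    text = xml_escape(plain_text(el))
    line =
      if level
        %(<text:h text:style-name="#{style}" text:outline-level="#{level}">#{text}</text:h>)
      else
        %(<text:p text:style-name="#{style}">#{text}</text:p>)
      end
    lines << line
  end
  lines.join("\n")
end
```

Inline formatting (bold, italics, monospace) is flattened here; keeping it would need text:span elements with character styles, which is where most of the real work would be.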

The reason for wanting to get to Word… well, I do know CSS and use
that for the casual first print from what I write. But I think Word (or
some other publishing software, not sure which one) would give me better
control over the print output.

CSS gives a fair amount of control. It’s possible Word and OpenOffice provide
more, but not much.

For example, including
page-wise headers and footers, and so on.

A quick Google finds this:

http://css-discuss.incutio.com/?page=PrintStylesheets
http://css-discuss.incutio.com/?page=PrintingHeaders

…and so on.

A quote from that last article: “It is now possible, even feasible, to use
HTML as the document format for books.”

David M. wrote:

I was suggesting that you rip open Textile, or write your own Textile parser,
or even work with the Textile-generated HTML, and write a script that
generates an ODT.

Ya! I think I may need to look at ripping open the parser and adding
Word (or ODT) to the parser (if I remain stubborn enough to do this). I
am less keen to do this… yet!

I was mostly suggesting this to discourage you from trying that approach.
There isn’t a Textile-specific way of doing this, to my knowledge, and there
wouldn’t likely be a good, generic way of doing it with HTML – at best, you
could have something take a reference to a CSS class and replace it with a
reference to a given ODF (or Word) style, but you’d probably have to recreate
those styles in the word processor – I don’t know of anything that can take
CSS and generate corresponding ODF (or Word) styles.

I don’t care about styles being generated from my CSS. I’m happy enough
if the document retains the semantics of being different types of
sections. I don’t mind creating the styles again in the Word/ ODF
software. Even in my TexTile -> HTML journey, I have a set of styles
that I include into the HTML so that it works together. I expect that
creating/ customizing the style would be a one-off effort - after that,
it’s in a template and when I create a new document, I would just apply
the template to it!

The reason for wanting to get to Word… well, I do know CSS and use
that for the casual first print from what I write. But I think Word (or
some other publishing software, not sure which one) would give me better
control over the print output.

CSS gives a fair amount of control. It’s possible Word and OpenOffice provide
more, but not much.

I need to see the links below! I am aware of print style sheets and use
that quite a bit for my websites (mostly hosted on Radiant). But, I’m
keen to generate some “office” documents from the musings on my Palm
phone when I’m out (I find the Documents To Go software not so great…
and not so clean).

…and so on.

A quote from that last article: “It is now possible, even feasible, to use
HTML as the document format for books.”

That last quote is fantastic! I’m actually kind of writing a book. But
I’m not sure if I will complete it. If I don’t complete it, I hope to
release the material on one of my websites. That’s why working in
TexTile is so attractive. It’s already ready for the (Radiant) website
if need be. If I could get to Word, it would open up other applications
for me.

Cheers,
Mohit.
4/16/2009 | 11:25 AM.

Hi Saji!

hmmm… what about some form of tex? Sometime ago, I tried some collaborative
document preparation in the office with the help of a Wiki to create our
annual report (PDF file). I used a tex macro package called ConTeXt.
My experience can be found at:

http://wiki.contextgarden.net/HTML_and_ConTeXt

I shall take a look and see if that helps. I do love the fact that
TexTile is so easy to use :)

However one needs to learn a bit of ConTeXt, mostly how to customize
the style part. It is relatively easy to translate html tags to ConTeXt
tags (say h1 → section; h2 → subsection, etc.).

Another alternative is maruku (http://maruku.rubyforge.org) which can
create LaTeX output and from that pdf.

Again, I’ll take a look. I have written a script that can convert a
basic Word document to TexTile (for a Radiant site) which I insert into
my website using your methods! I’ll try to see what works from here.

Thanks for replying.
Cheers,
Mohit.
4/16/2009 | 11:26 AM.

On Wednesday 15 April 2009 22:25:08 Mohit S. wrote:

David M. wrote:

I was suggesting that you rip open Textile, or write your own Textile
parser, or even work with the Textile-generated HTML, and write a script
that generates an ODT.

Ya! I think I may need to look at ripping open the parser and adding
Word (or ODT) to the parser (if I remain stubborn enough to do this). I
am less keen to do this… yet!

Now that I think of it, it’s probably simpler to read the Textile-generated
HTML. But either way, you’ll have to deal with an office format, which isn’t
going to be fun.

I don’t care about styles being generated from my CSS. I’m happy enough
if the document retains the semantics of being different types of
sections. I don’t mind creating the styles again in the Word/ ODF
software.

In that case, it’s probably not too difficult. Still harder than adding a print
mode to CSS, but feasible.

I’ll strongly suggest ODF if you go that route, even if you’re targeting word,
unless you have a very good Word library. The reason is simple: Last I
checked, the ODF spec is 600 pages. The Microsoft OpenXML spec is 6000 pages,
and is incomplete. On a more subjective level, ODF XML is actually reasonably
readable, while OpenXML is not. I’d much rather let a tool like OpenOffice, or
the OpenDocument plugin for Word, handle that for me, rather than trying to
deal with OpenXML.

That last quote is fantastic! I’m actually kind of writing a book.
[snip]
If I could get to Word, it would open up other applications
for me.

Maybe. It’s possible Word does something CSS doesn’t, here.

What I’m suggesting is that plain old HTML/CSS will probably give you what you
need for styling, even for print media, without having to use a word
processor. If you can do it with CSS, it will be easier, more portable, and
likely more future-proof than trying to do it with a word processor.

The reason for wanting to get to Word… well, I do know CSS and use
that for the casual first print from what I write. But I think Word (or
some other publishing software, not sure which one) would give me better
control over the print output.

hmmm… what about some form of tex? Sometime ago, I tried some collaborative
document preparation in the office with the help of a Wiki to create our
annual report (PDF file). I used a tex macro package called ConTeXt.
My experience can be found at:

http://wiki.contextgarden.net/HTML_and_ConTeXt

However one needs to learn a bit of ConTeXt, mostly how to customize
the style part. It is relatively easy to translate html tags to ConTeXt
tags (say h1 → section; h2 → subsection, etc.).
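
As a rough illustration of that tag translation, here is a minimal Ruby sketch that rewrites HTML headings as ConTeXt sectioning commands. The h1 and h2 mappings follow the example above; mapping h3 to \subsubsection is an added assumption, and the method name is made up for this example. It uses a simple regular expression, so it only suits the flat, predictable HTML that Textile produces, not arbitrary pages.

```ruby
# h1 and h2 follow the mapping mentioned above; h3 is an assumption.
CONTEXT_HEADINGS = {
  'h1' => 'section',
  'h2' => 'subsection',
  'h3' => 'subsubsection'
}.freeze

# Replace <h1>..</h1>, <h2>..</h2>, <h3>..</h3> with ConTeXt commands.
# Attributes on the heading tag (such as id) are dropped; everything
# else in the input is passed through unchanged.
def headings_to_context(html)
  html.gsub(%r{<(h[1-3])[^>]*>(.*?)</\1>}m) do
    tag   = Regexp.last_match(1)
    title = Regexp.last_match(2)
    "\\#{CONTEXT_HEADINGS[tag]}{#{title.strip}}"
  end
end

puts headings_to_context("<h1>Intro</h1>\n<h2 id=\"setup\">Setup</h2>")
```

Paragraphs, lists and inline markup would need their own rules, and characters that are special in TeX would need escaping before the result could be typeset.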

Another alternative is maruku (http://maruku.rubyforge.org) which can
create LaTeX output and from that pdf.

cheers!
saji

Saji N. Hameed

APEC Climate Center
1463 U-dong, Haeundae-gu, +82 51 745 3951
BUSAN 612-020, KOREA [email protected]
Fax: +82-51-745-3999

HI David

Thanks for your replies.

I’m not averse to working in HTML :) I do know the full benefits of a
future-ready text based format. In fact, that’s one of the reasons that
I like TexTile also. I was probing to see if there was a nice enough
way to go to Word. I don’t think I’m going to consider generating a
Word document based on TexTile. Working through the Word spec will
probably be difficult enough! If I go that way at all, I may consider
using win32ole to get Word to generate that document for me based on
parsing Textile.
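
A very rough sketch of that win32ole idea, in case it helps anyone reading later. It assumes Windows with Word installed, and the English built-in style names "Heading 1" to "Heading 3" and "Normal". The tiny Textile reader only understands "h1." to "h3." headings and plain paragraphs; lists, @code@ and inline formatting are not handled. Both method names are made up for this example.

```ruby
HEADING = /\Ah([1-3])\.\s+(.*)\z/m

# Split Textile source into [word_style_name, text] pairs.
# Only headings and plain paragraphs are recognised in this sketch.
def textile_blocks(source)
  source.split(/\n{2,}/).map(&:strip).reject(&:empty?).map do |block|
    if (m = HEADING.match(block))
      ["Heading #{m[1]}", m[2]]
    else
      # Join hard-wrapped lines into a single paragraph.
      ['Normal', block.gsub(/\s*\n\s*/, ' ')]
    end
  end
end

# Drive Word through OLE automation (Windows only).
# path should be an absolute path; Word picks its default file format.
def write_with_word(blocks, path)
  require 'win32ole'
  word = WIN32OLE.new('Word.Application')
  doc = word.Documents.Add
  selection = word.Selection
  blocks.each do |style, text|
    selection.Style = style   # style name is an assumption (English Word)
    selection.TypeText(text)
    selection.TypeParagraph
  end
  doc.SaveAs(path)
ensure
  word.Quit if word
end
```

Usage would be something like write_with_word(textile_blocks(File.read('notes.textile')), 'C:\\docs\\notes.doc'), after which a Word template can be attached to restyle the headings.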

David M. wrote:

Ya! I think I may need to look at ripping open the parser and adding
Word (or ODT) to the parser (if I remain stubborn enough to do this). I
am less keen to do this… yet!

Now that I think of it, it’s probably simpler to read the Textile-generated
HTML. But either way, you’ll have to deal with an office format, which isn’t
going to be fun.

Yes! that is absolutely correct.

I don’t care about styles being generated from my CSS. I’m happy enough
if the document retains the semantics of being different types of
sections. I don’t mind creating the styles again in the Word/ ODF
software.

In that case, it’s probably not too difficult. Still harder than adding a print
mode to CSS, but feasible.

I already have a print mode for the CSS, written in CSS2. With CSS3, it
seems I can add even more.

I’ll strongly suggest ODF if you go that route, even if you’re targeting word,
unless you have a very good Word library. The reason is simple: Last I
checked, the ODF spec is 600 pages. The Microsoft OpenXML spec is 6000 pages,
and is incomplete. On a more subjective level, ODF XML is actually reasonably
readable, while OpenXML is not. I’d much rather let a tool like OpenOffice, or
the OpenDocument plugin for Word, handle that for me, rather than trying to
deal with OpenXML.

You make a good case here!

What I’m suggesting is that plain old HTML/CSS will probably give you what you
need for styling, even for print media, without having to use a word
processor. If you can do it with CSS, it will be easier, more portable, and
likely more future-proof than trying to do it with a word processor.

Understood… and agreed!

Cheers,
Mohit.
4/17/2009 | 1:18 AM.

Dean W. wrote:

You might look at Asciidoc. I don’t know if it supports Word generation,
however.

Will do! It’s shaping up to be a busy weekend on this :)

Cheers,
Mohit.
4/17/2009 | 1:19 AM.

You might look at Asciidoc. I don’t know if it supports Word generation,
however.



Dean W.
twitter: @deanwampler, @chicagoscala
Chicago-Area Scala Enthusiasts (CASE):