Hi list!

From the tests Jason provided in his Treetop experiment, I am starting
to get the feeling that it can be very hard to write a token-based
parser for Textile, since you sometimes need to know all the characters
of the current line to decide whether “**” actually starts a bold
sentence. For example:

This **is not a bold sentence, even if you would think it is, it’s not
**. Believe me.
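
To make the problem concrete, here is a minimal Ruby sketch of the kind of
lookahead a tokenizer would need when it meets “**”. The helper name
`opens_bold?` and the rule it applies (the closing “**” must directly follow
a non-space character on the same line) are my own simplifying assumptions,
not the actual Textile rules:

```ruby
# Hypothetical helper, for illustration only.
# Decides whether the "**" found at index i of `line` opens a bold span.
# Assumed rule: a later "**" on the same line must directly follow a
# non-space character. Checking this requires reading the rest of the line.
def opens_bold?(line, i)
  rest = line[(i + 2)..-1] || ""
  rest.match?(/\A\S(?:.*?\S)?\*\*/)
end

not_bold = "This **is not bold, it's not **. Believe me."
bold     = "This **is bold** for sure."

puts opens_bold?(not_bold, not_bold.index("**"))
puts opens_bold?(bold, bold.index("**"))
```

The point is only that the decision at the “**” token depends on text the
tokenizer has not consumed yet.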

I had a look at the original PHP implementation of Textile, and it
actually runs tons of very complicated regexps. Here is an extract that
parses inline elements (bold, em, …): http://gist.github.com/131486.

All this to say that I think I will abandon my “move forward” parser
solution and try another route, the “split” parser:

  1. split text into paragraphs/tables
  2. split paragraphs into inline elements (loop until no more split)
  3. split inline elements into links, etc
  4. continue splitting and replacing
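
A rough sketch of the first two steps in Ruby, to show what I mean by
splitting. Everything here is hypothetical: the `SplitParser` module, the
`BOLD` pattern and the node format are made up for illustration, tables and
links are left out, and the bold rule is a simplification (the “**” markers
must hug the text they wrap):

```ruby
# Hypothetical sketch of a "split" parser, not a real Textile implementation.
module SplitParser
  # Assumed, simplified bold rule: "**" must touch the wrapped text.
  # The capture group makes String#split keep the matched spans.
  BOLD = /(\*\*\S(?:.*?\S)?\*\*)/

  # Step 1: split the text into paragraphs on blank lines.
  def self.blocks(text)
    text.strip.split(/\n\s*\n/)
  end

  # Step 2: split a paragraph into plain text and bold spans.
  # With one capture group, odd indices of the result are the matches.
  def self.inline(block)
    parts = block.split(BOLD)
    nodes = parts.each_with_index.map do |part, i|
      i.odd? ? [:bold, part[2..-3]] : [:text, part]
    end
    nodes.reject { |_, content| content.empty? }
  end

  def self.parse(text)
    blocks(text).map { |block| [:p, inline(block)] }
  end
end

p SplitParser.parse("Plain and **bold** text.\n\nNot **bold here **. Really.")
```

Further steps (links, em, …) would repeat the same split on the `:text`
nodes until nothing more matches.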

This is the fastest way I can imagine to parse elegantly something like

I’ll let you know when I have a prototype…


PS: forget about my other message on word processing. I had not
understood that these specs only relate to some internal word parser.
