Hi list!

From the tests Jason provided in his treetop experiment, I started to get the feeling that it can be very hard to write a token-based parser for Textile, since you sometimes need to know all the characters of the current line to decide whether "**" actually starts a bold sentence. For example:

This **is not a bold sentence, even if you would think it is, it's not **. Believe me.

I had a look at the original implementation of Textile in PHP, and it actually runs tons of very complicated regexps. This is an extract that parses inline elements (bold, em, ...): http://gist.github.com/131486.

All this to say that I think I will abandon my "move forward" parser solution and try another route, the "split" parser:

1. split text into paragraphs/tables
2. split paragraphs into inline elements (loop until no more splits)
3. split inline elements into links, etc.
4. continue splitting and replacing

This is the fastest way I can imagine to parse something like Textile elegantly. I'll let you know when I have a prototype...

Gaspard

PS: forget about my other message on word processing; I had not understood that these specs only related to some internal word parser.
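For readers who want to picture the "split" idea, here is a minimal sketch of steps 1 and 2 in Python. It is not the prototype mentioned above and not how the PHP implementation works. The function names (split_blocks, split_inline, parse) are hypothetical, and the bold rule is an assumption made for illustration: "**" opens a span only when directly followed by a non-space character and closes it only when directly preceded by one, within a single line.

```python
import re

# Assumed rule (not taken from the Textile spec): "**" opens bold only when
# followed by a non-space character and closes only when preceded by one.
# "." does not match newlines, so a span never crosses a line break.
BOLD = re.compile(r"(\*\*(?=\S).+?(?<=\S)\*\*)")


def split_blocks(text):
    """Step 1: split the text into blocks separated by blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]


def split_inline(block):
    """Step 2: split one block into ("text", s) and ("bold", s) pieces."""
    pieces = []
    # The capturing group makes re.split keep the matched bold spans.
    for part in BOLD.split(block):
        if not part:
            continue
        if BOLD.fullmatch(part):
            pieces.append(("bold", part[2:-2]))  # drop the "**" markers
        else:
            pieces.append(("text", part))
    return pieces


def parse(text):
    """Run both split steps and return one list of pieces per block."""
    return [split_inline(block) for block in split_blocks(text)]
```

Because the regexp sees the whole line at once, the example sentence from the message (where the second "**" follows a space) comes back as a single text piece, while "A **bold** word." is split into text, bold, text. Further steps (links, etc.) would split the remaining text pieces again in the same way.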
on 2009-06-17 22:52