Gaspard, here's a copy of my complaint to _why, which contains a few examples of what I'm up against. This was a while back, so I've moved on a bit, but what you were asking about still applies. There's probably a way to do everything I need in Ragel, but I can't figure it out. I spent a few hours working out a different capture mechanism and thought I was being quite clever, but in the end it didn't work out. I was relieved, because it felt like I was reinventing something a tool should provide anyway.

Jason

Begin forwarded message:
on 2009-06-08 06:41
on 2009-06-08 13:07
Hi Jason!

I've looked at the Ragel code for Textile and you are right: it has become quite hard to understand. I have gone through the list of difficult defects and through the current Textile reference, and I have the feeling that the current parser is quite complicated for the task at hand. Textile does not look like such a complicated grammar (at least not what is listed on the reference page), but maybe I'm wrong and there are many places where determinism is not easily attained.

I really feel that the parts that are difficult for the parser are also difficult for the reader when editing text. And most of these hard-to-parse and hard-to-read features in Textile (except for tables) are not about describing content but about styling. Something like setting an "id" in an article seems really bad to me: what if you display two articles on one page and they both define a "hot" id? The same goes for "em" padding: that's not content, that's styling.

I am very concerned about all these issues with Textile because I am building a CMS in which my clients put *everything*: letters, comments, documents, quality certification stuff, control lists, etc. So I really need a Textile parser that can survive in the long run (10 years). To achieve this goal, we need to:

a. have a parser that is easy to extend with new needs without breaking old text
b. have a grammar that is easy to parse

For point "a", I think we can live with S-expression generation, plus customization during S-expression tree processing. For example, an image with a caption would be parsed as:

!file.jpg (foo bar baz)! ==> [:image, "file.jpg (foo bar baz)"]

The processor would then run a Ruby regex to "finish the work". This keeps the C parser simple, and if someone wants to add more features to the "image" tag, she just has to change the Ruby regex.

For point "b": we need to *not* support shortcut syntax for styling features such as the "id" thing or "em" padding (at least not at the C parser level).
If someone really wants em padding, she should use HTML (it's not nice to use, and that is an indication that it's bad practice):

<div style='padding-left:4em;'>
# one
# two
</div>

Since I *really* need such a tool, I could help refactor RedCloth into a two-step parser (half in C, half in Ruby). What do you think?

Gaspard
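Gaspard's point "a" describes a two-step design: a small C parser that emits S-expressions, and Ruby code that finishes each node with a regex. The Ruby below is a minimal sketch of that second step, assuming the C side hands over an array of [tag, payload] pairs such as [:image, "file.jpg (foo bar baz)"]. The module name TextileSexp, the :text tag, and the figure/figcaption HTML are hypothetical choices for illustration, not RedCloth's actual API or output.

```ruby
# Hypothetical second step of a two-step Textile parser: a simplified C parser
# emits [tag, payload] pairs, and Ruby finishes each node with a regex.
# The node format and the HTML produced are illustrative assumptions.
module TextileSexp
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

  # "src (caption)", with the parenthesised caption optional.
  IMAGE_RE = /\A(?<src>\S+)(?:\s+\((?<caption>[^)]*)\))?\z/

  # Minimal HTML escaping so payload text cannot inject markup.
  def self.escape(text)
    text.gsub(/[&<>"]/, ESCAPES)
  end

  # Turns the raw payload of an :image node into HTML. Extending the image
  # syntax means editing this regex and method, not the C grammar.
  def self.image(payload)
    m = IMAGE_RE.match(payload.strip)
    return escape(payload) unless m # not recognised: fall back to plain text

    src = escape(m[:src])
    return %(<img src="#{src}" alt="">) unless m[:caption]

    caption = escape(m[:caption])
    %(<figure><img src="#{src}" alt="#{caption}"><figcaption>#{caption}</figcaption></figure>)
  end

  # Dispatches on the node tag; unknown tags are rendered as escaped text.
  def self.render(nodes)
    nodes.map do |tag, payload|
      case tag
      when :image then image(payload)
      else escape(payload)
      end
    end.join("\n")
  end
end

puts TextileSexp.render([[:image, "file.jpg (foo bar baz)"], [:text, "a < b"]])
# prints:
# <figure><img src="file.jpg" alt="foo bar baz"><figcaption>foo bar baz</figcaption></figure>
# a &lt; b
```

Because the image syntax lives entirely in IMAGE_RE and image, supporting a new option would only touch Ruby, which is the kind of extensibility point "a" asks for. The trade-off is that input the regex does not recognise falls through to escaped text instead of being rejected by the grammar.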