I just went through the ticket list and dropped a bunch from the 4.2 milestone that are just too difficult with Ragel. Many of them I've poked at and they've left me saying, "how the heck am I supposed to do that!?" Multi-byte content will probably never work because the Ragel docs say it won't work with conditionals (actions that return true or false to determine if a state should be accepted), and I see no way around that. Not recognizing vertical pipes escaped with notextile tags in tables, exiting the HTML machine on the first closing block tag it sees, leaving pre blocks prematurely... all these bugs would require a lot of time and code to fix. And they're just the tip of the iceberg. If I walk through the code and look at it through the lens of nondeterminism, I can see lots more problems that people just haven't run into yet.

I'd like to release RedCloth 4.2 once I fix the low-hanging fruit. Then, I plan to poke around for alternatives to Ragel. It's been great, but RedCloth has gotten really difficult to maintain because:

1.) It has to compile.
2.) It compiles to three languages, has a couple of binary gem distributions, and needs to work with Ruby 1.8 and 1.9, which is always a challenge.
3.) Many reported bugs involve nondeterminism and require things that DFA-based tools like Ragel have a hard time doing.
4.) Not that many people can fix bugs themselves because they don't know Ragel or they don't understand the code.
5.) It's hard to tell people they can't mix in extensions. Right now RedCloth is a black box and you have to pre- or post-parse for extra patterns, like wiki links. I want people to be able to use it how they want. If that means mixing in their own cruddy patterns, awesome.

A PEG might be the way to go. I'm looking at Treetop, which is nice, decently maintained, has some history, and is used by Cucumber. It doesn't let me manipulate the parser's acceptance of expressions in code, though. That's a known problem, which is why you don't see any YAML parsers in Treetop yet (they have a proposal on Global Parsing State and Semantic Backtrack Triggering). Also, without backreferences or the equivalent in code, it would be hard to match things like HTML tags.

I'm also looking at James Edward Gray II's Ghost Wheel. I like the grammar syntax better and he says it "provides hooks for Ruby code that can be used to make parsing decisions or transform parsed results," but it's less widely used and less well documented, and I haven't tried it out, so I don't know its limitations.

If anyone else has suggestions of things I should explore, do let me know! I want to keep RedCloth fast, but it also needs to be maintainable.

Jason
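To make the backreference point above concrete, here is a minimal Ruby sketch. It is not RedCloth code, and the `PAIRED_TAG` constant and `paired_tag` helper are hypothetical names made up for illustration. It shows the kind of check at issue: the closing tag name has to equal whatever the opening tag captured, which is hard to express in a grammar that has neither backreferences nor code hooks.

```ruby
# Hypothetical illustration only, not taken from RedCloth.
# The named group "tag" captures the opening tag name, and \k<tag>
# requires the closing tag to repeat exactly that name.
PAIRED_TAG = /<(?<tag>[a-z][a-z0-9]*)\b[^>]*>(?<body>.*?)<\/\k<tag>>/m

# Returns [tag_name, inner_text] for the first properly closed element,
# or nil when no opening tag has a matching closing tag.
def paired_tag(text)
  m = PAIRED_TAG.match(text)
  m && [m[:tag], m[:body]]
end

p paired_tag("<div>one <span>two</span> three</div>")
# prints ["div", "one <span>two</span> three"]

p paired_tag("<b>bold</i>")
# prints nil, because </i> does not close <b>
```

This only handles lowercase tag names and stops at the first closing tag with the same name, so an element nested inside another element of the same name would be cut short. A real parser would need a stack or a recursive rule for that.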
on 2009-06-07 12:59
on 2009-06-07 21:39
Hi Jason!

Hmmm, this is good and bad news:

Good: Ruby hooks mean I could use a single pass to parse Textile customizations in zena instead of running two parsers: nice.

Bad: I have just switched to Ragel for QueryBuilder to parse pseudo SQL and I fear your shortcomings (if that's an English phrase).

Could you describe more precisely what you are missing with Ragel? I'm parsing just about anything I want with this thing, but maybe I'm too dumb to see the walls I'm running into...

Gaspard
on 2009-06-08 06:29
It's probably me who's too dumb for Ragel. :) Take a look at the bugs tagged "difficult" on the tracker. I'll also forward you what I sent to why describing the problems.