Close to a 4.2 release; experimenting with Ragel alternatives

I just went through the ticket list and dropped a bunch from the 4.2
milestone that are just too difficult with Ragel. Many of them I’ve
poked at and they’ve left me saying, “how the heck am I supposed to do
that!?” Multi-byte content will probably never work: the Ragel docs
say it doesn’t work with conditionals (actions that return true or false
to determine whether a state should be accepted), and I see no way
around that. Not recognizing vertical pipes escaped with notextile tags in
tables, exiting the HTML machine on the first closing block tag it
sees, leaving pre blocks prematurely… all these bugs would require a
lot of time and code to fix. And they’re just the tip of the
iceberg. If I walk through the code and look at it through the lens
of nondeterminism, I can see lots more problems that people just
haven’t run into yet.

I’d like to release RedCloth 4.2 once I fix the low-hanging fruit.
Then, I plan to poke around for alternatives to Ragel. It’s been
great, but RedCloth has gotten really difficult to maintain because:
1.) It has to compile
2.) It compiles to three languages, has a couple binary gem
distributions, and needs to work with Ruby 1.8 and 1.9, which is
always a challenge
3.) Many reported bugs involve nondeterminism and require things that
DFA-based tools like Ragel have a hard time doing
4.) Not that many people can fix bugs themselves because they don’t
know Ragel or they don’t understand the code.
5.) It’s hard to tell people they can’t mix in extensions. Right now
RedCloth is a black box and you have to pre- or post-parse for extra
patterns, like wiki links. I want people to be able to use it how
they want. If that means mixing in their own cruddy patterns, awesome.
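To make the pre-parse workaround from point 5 concrete, here is a minimal sketch of what people end up writing today. The `[[Page Name]]` link syntax, the `/wiki/` URL prefix, and the `expand_wiki_links` helper are all assumptions for illustration; none of them is part of RedCloth.

```ruby
# Hypothetical pre-parse step: rewrite [[Page Name]] wiki links as Textile
# links ("text":url) before the text reaches RedCloth. The [[...]] syntax
# and the /wiki/ prefix are assumptions made up for this example.
WIKI_LINK = /\[\[([^\]\|]+)\]\]/

def expand_wiki_links(text, base = "/wiki/")
  text.gsub(WIKI_LINK) do
    title = Regexp.last_match(1).strip
    slug  = title.gsub(/\s+/, "_")   # spaces become underscores in the URL
    %("#{title}":#{base}#{slug})
  end
end

textile = expand_wiki_links("See [[Release Notes]] for details.")
puts textile
# The result would then be handed to RedCloth as usual,
# e.g. RedCloth.new(textile).to_html
```

The weakness is that this pass knows nothing about Textile context, so it would also rewrite links inside pre blocks or notextile spans. That is the argument for letting extensions hook into the parser itself.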

A PEG might be the way to go. Looking at Treetop, which is nice,
decently maintained, has some history, and is used by Cucumber.
Doesn’t let me manipulate the parser’s acceptance of expressions in
code, though. It’s a known problem, which is why you don’t see any
YAML parsers in Treetop yet (they have a proposal on Global Parsing
State and Semantic Backtrack Triggering). Also, without
backreferences or the equivalent in code, it would be hard to match
things like HTML tags.
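As a rough illustration of why backreferences matter here, this is the kind of match a plain Ruby regex can express: the closing tag has to name whatever the opening tag captured. The `PAIRED_TAG` pattern and `paired_tag` helper are hypothetical and are not how RedCloth matches HTML.

```ruby
# Match an element whose closing tag names the same tag as the opening one.
# \1 refers back to whatever the first group captured.
PAIRED_TAG = /<([a-z][a-z0-9]*)\b[^>]*>(.*?)<\/\1>/mi

# Returns the tag name and body of the first paired element, or nil.
def paired_tag(text)
  m = PAIRED_TAG.match(text)
  m && { tag: m[1], body: m[2] }
end

p paired_tag("<div>one <span>two</span></div>")
p paired_tag("<div>unclosed</span>")
```

The first call should pick out the `div` with the `span` left inside its body, and the second should find nothing because no closing tag matches an opener. This sketch does not handle an element nested inside another of the same name, which is part of what a real grammar would need to get right.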

Also looking at James Edward G. II’s Ghost Wheel. I like the
grammar syntax better and he says it “provides hooks for Ruby code
that can be used to make parsing decisions or transform parsed
results,” but it’s less widely used and less well documented, and I haven’t
tried it out, so I don’t know its limitations.

If anyone else has suggestions of things I should explore, do let me
know! I want to keep RedCloth fast, but it also needs to be
maintainable.

Jason

Hi Jason!

Hmmm, this is good and bad news:

Good: Ruby hooks mean I could use a single pass to parse Textile
customizations in zena instead of running two parsers: nice.

Bad: I have just switched to Ragel for QueryBuilder to parse pseudo
SQL and I fear your shortcomings (if that’s an English phrase).

Could you describe more precisely what you are missing with Ragel?
I’m parsing about anything I want with this thing but maybe I’m too
dumb to see the walls I’m running into…

Gaspard

It’s probably me who’s too dumb for Ragel. :) Take a look at the bugs
tagged difficult on the tracker. Also I’ll forward you what I sent to
why describing the problems.
