[ANN] Walters - a faster HTML escaping lib for JRuby

Walters (https://github.com/wmeissner/walters) is a JRuby extension
optimised for fast HTML/XML/URI/URL/href escaping. It started as a
Houdini wrapper to test XNI, but I then ported the Houdini C code to
Java so JRuby would have a pure-Java implementation (which makes it a
lot easier to deploy in a WAR). It was named for Norman Murray Walters,
an Australian contemporary of Houdini’s.

Benchmarks.

The performance delta isn’t as impressive as the one Houdini/EscapeUtils
give on MRI, because JRuby has HotSpot doing the heavy lifting, but you
still end up with at least a 3x speedup.
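The numbers below are in the format of Ruby's standard Benchmark library (user, system, total and real seconds). For anyone who wants to time the baselines on their own machine, here is a minimal sketch of a harness of that shape using only the standard library. It is not the original benchmark script: the input string, the iteration count and the `gsub_escape` helper (a hypothetical plain String#gsub escaper) are all assumptions, and Walters, Rack and Haml are left out so it runs anywhere.

```ruby
require 'benchmark'
require 'erb'

# Assumed input: 1000 bytes made entirely of characters that need escaping.
# (The post does not say exactly what text was used.)
INPUT = (%q(<&>"') * 200).freeze

# Hypothetical replacement table for a plain String#gsub escaper.
GSUB_MAP = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&#39;'
}.freeze

# Escape every special character with a single regex pass and a lookup table.
def gsub_escape(str)
  str.gsub(/[&<>"']/, GSUB_MAP)
end

# Far fewer iterations than the 1,000,000 in the post, so it finishes quickly.
N = 10_000

Benchmark.bm(22) do |x|
  x.report('ERB::Util.html_escape') { N.times { ERB::Util.html_escape(INPUT) } }
  x.report('String#gsub')           { N.times { gsub_escape(INPUT) } }
end
```

On JRuby the first runs include JIT warm-up, so raise `N` or use `Benchmark.bmbm` (which does a rehearsal pass first) before comparing numbers.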

Escaping 1000 bytes of text that requires escaping, 1,000,000 times,
under jruby-1.7.4 (times in seconds):

                                 user     system      total        real
Rack::Utils.escape_html     89.980000   0.230000  90.210000 ( 90.272000)
Haml::Helpers.html_escape   50.420000   0.170000  50.590000 ( 51.147000)
ERB::Util.html_escape       44.650000   0.130000  44.780000 ( 45.518000)
CGI.escapeHTML              36.230000   0.090000  36.320000 ( 36.358000)
String#gsub                 35.490000   0.090000  35.580000 ( 35.587000)
Walters.escape_html         10.090000   0.030000  10.120000 ( 10.126000)

That works out to not quite 100MB/sec on a 2.26GHz Core 2 Duo. If your
web app is trying to push more than that through a 5-year-old CPU, you
probably have other problems :)
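The headline figures can be checked against the real (wall-clock) column of the table: the smallest speedup, over String#gsub, is 35.587 / 10.126, about 3.5x, and 10^9 bytes in 10.126 s is roughly 98.8 million bytes per second. The same arithmetic as a small Ruby script (numbers copied from the table; nothing is measured here):

```ruby
# Real (wall-clock) seconds copied from the benchmark table above.
REAL_SECONDS = {
  'Rack::Utils.escape_html'   => 90.272,
  'Haml::Helpers.html_escape' => 51.147,
  'ERB::Util.html_escape'     => 45.518,
  'CGI.escapeHTML'            => 36.358,
  'String#gsub'               => 35.587
}.freeze
WALTERS_SECONDS = 10.126

# Speedup of Walters.escape_html over each alternative.
REAL_SECONDS.each do |name, secs|
  puts format('%-26s %.1fx', name, secs / WALTERS_SECONDS)
end

# Throughput: 1000 bytes per call, 1,000,000 calls.
TOTAL_BYTES = 1000 * 1_000_000
THROUGHPUT  = TOTAL_BYTES / WALTERS_SECONDS # bytes per second

puts format('%.1f MB/s', THROUGHPUT / 1e6)          # prints 98.8 MB/s
puts format('%.1f MiB/s', THROUGHPUT / 1_048_576.0) # prints 94.2 MiB/s
```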

You can also monkey-patch any of the above methods (see the README) to
use Walters.escape_html, so you get automagical speedups with just a
few lines in a Rails initializer.
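The README has the actual recipes. Just to give an idea of the shape, here is a hypothetical initializer (not copied from the README) that routes ERB::Util.html_escape through Walters.escape_html when the gem loads, and leaves the stock method alone otherwise. The file name and the `require 'walters'` path are assumptions.

```ruby
# Hypothetical config/initializers/walters.rb (a sketch, not the README's code).
require 'erb'

begin
  require 'walters' # assumed require path for the walters gem
rescue LoadError
  # Gem not available: keep the stock ERB::Util.html_escape.
end

if defined?(Walters)
  module ERB::Util
    # Redefine the instance method, then re-export it as a module function
    # so that ERB::Util.html_escape(...) also picks up the new version.
    def html_escape(s)
      Walters.escape_html(s.to_s)
    end
    module_function :html_escape
  end
end

puts ERB::Util.html_escape('<b>Tom & Jerry</b>')
```

Two caveats with this sketch: the `h` alias still points at the original method, and in a Rails app ActiveSupport already overrides ERB::Util.html_escape so that html_safe strings pass through unescaped, which a blunt patch like this would bypass.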

Oh nice… I saw a few folks complaining about the Houdini ext and JRuby
not getting an equivalent. Nice going, Wayne :)

- Charlie (mobile)

Oh, do you still have the XNI version? Might be a good showcase.

- Charlie (mobile)

The XNI version is still there (it’s what you get when you install the
gem on MRI), but it is currently fairly slow (though still faster than
the pure-Ruby alternatives). When I wrote the native XNI backends for
JRuby and MRI, I went for a fairly simple implementation rather than
something as highly optimised as JRuby’s FFI. The MRI backend also
releases the GIL on every Ruby->C call (to emulate the JRuby
behaviour), which adds a bit of overhead.

Things like Houdini (i.e. where “MOAR SP33D!” is the design goal) are
pretty much the worst-case scenario for XNI: by design it has to do
things like copy strings instead of passing a pointer to the string’s
backing store (since it can’t do that on JRuby), and release the GIL on
MRI. That shouldn’t worry most C extensions, whose purpose should be
to access native functionality rather than to gain performance (for
that, you really want a dedicated Java/C ext for the specific VM).