Possible bug in oniguruma, not 64-bit clean?

I have a ruby script that uses some oniguruma features. It works on
this setup:
FreeBSD
32-bit intel
ruby 1.8.5 (2006-08-25) [i386-freebsd4]
ruby compiled from the FreeBSD port, with oniguruma support
It does not work on this setup:
Linux
x64
ruby 1.9.0 (2007-05-07 patchlevel 0) [x86_64-linux]
I suspect that the bug may be a problem with code in oniguruma that’s
not 64-bit clean. Unfortunately, it’s been difficult for me to trim
this down to a minimal example that demonstrates the bug. What seems
to happen is that oniguruma evaluates a certain regex repeatedly, but
at some point (possibly after hundreds of evaluations), it overwrites
the first 8 bytes of the regex source with nulls. Here’s what the
source code looks like:

tex.split(/\\(?:begin|end){#{x}}/).each { |m|

Here’s the error:

…/translate_to_html.rb:479:in `block in handle_tables': unmatched
close parenthesis: /\000\000\000\000\000\000\000\000in|end){tabular}/
(RegexpError)

Notice how the source code quoted in the error message is not the same
as the actual source code. I’m imagining C code something like this.
(Pardon me if my C syntax is incorrect – I’m rusty.)
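typedef struct {
    char *a;
    char *b;
    char s[];   /* the regex text is meant to start here */
} regex_t;

regex_t *p;
p = malloc(…);
/* Bug: 8 is the size of two pointers only if pointers are 4 bytes.
   With 8-byte pointers, offset 8 is where p->b lives, not s[]. */
strcpy(((char *) p) + 8, string);
p->b = NULL;    /* zeroes the first 8 characters of the copied string */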
Of course the real C code inside the oniguruma implementation would have
to be a lot more complex than this, or else the error would be easier
to reproduce, and would not occur seemingly randomly, after hundreds of
evaluations. I’m guessing that the error occurs because my regex
includes the interpolated string #{x}, which would cause the regex
object to get recreated every time the value of x changes.
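
To illustrate the guess about interpolation, here is a minimal sketch of how a regex literal containing #{x} behaves. The method names are hypothetical and not taken from the original script, and the doubled backslash (to match a literal backslash before begin/end) is an assumption about the original pattern. In the Ruby versions I am aware of, an interpolated literal builds a new Regexp object on every evaluation, while the /o flag interpolates only once.

```ruby
# Sketch only: method names are hypothetical, not from the original script.

def dynamic_regex(x)
  /\\(?:begin|end){#{x}}/     # interpolated: rebuilt on each evaluation
end

def once_regex(x)
  /\\(?:begin|end){#{x}}/o    # /o: interpolated the first time only
end

# Two evaluations with the same x still give two distinct objects.
same_object = dynamic_regex("tabular").equal?(dynamic_regex("tabular"))

# With /o the second argument is ignored; the first compiled regex is reused.
once_sources = [once_regex("tabular"), once_regex("align")].map(&:source)

puts "dynamic literal reused: #{same_object}"
puts "sources with /o:        #{once_sources.inspect}"
```

So a loop that evaluates the literal hundreds of times allocates and frees hundreds of regex objects, which fits a bug that only shows up after many evaluations. Note that /o is not a drop-in fix here, because it would freeze the first value of x.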

I would be willing to put more effort into trying to make a short,
reproducible example of the bug, if people on this group thought it
would be helpful. However, I’ve already put ~8 hours into trying to
make a short test case, and just haven’t had any luck. I thought that
maybe if I posted here, the folks who work on the oniguruma code might
look at my post and say, “Oh, I can imagine how such a bug would occur.
I’ll review the relevant part of the code.”

On 6/11/07, Ben C. wrote:

I would be willing to put more effort into trying to make a short,
reproducible example of the bug, if people on this group thought it
would be helpful. However, I’ve already put ~8 hours into trying to
make a short test case, and just haven’t had any luck. I thought that
maybe if I posted here, the folks who work on the oniguruma code might
look at my post and say, “Oh, I can imagine how such a bug would occur.
I’ll review the relevant part of the code.”

I suggest filing a bug in either the Ruby tracker [1] or the Oniguruma
tracker [2].
That way the report doesn’t get lost among the tons of other mails. I
don’t know which one is better; I’d choose [1]. (I haven’t found a
“canonical” one, nor anything useful on the oniguruma homepage [3].)

Jano

[1] http://rubyforge.org/tracker/?atid=1698&group_id=426&func=browse
[2] http://rubyforge.org/tracker/index.php?func=detail&aid=11478&group_id=3289&atid=12694
[3] Notice of end of service

Jano S. wrote:

I suggest filing a bug in either the Ruby tracker [1] or the Oniguruma
tracker [2].
That way the report doesn’t get lost among the tons of other mails. I
don’t know which one is better; I’d choose [1]. (I haven’t found a
“canonical” one, nor anything useful on the oniguruma homepage [3].)

Hmm…I guess the problem is that I don’t have any easy way to help
someone else reproduce the bug, so I’m not sure I know how to file a
useful bug report at this point.

Okay, I managed to make something a little smaller:
http://www.lightandmatter.com/ruby_bug.tar.gz
When I run the script “bug” in this tarball on my
x64 machine, I get the following output:


warning, undefined reference nonmetricunits
warning, undefined reference electroncapture
./a.rb:570:in `block in handle_math': unmatched close parenthesis: /\220\007~\000\000\000\000\000in|end){align\*?}/ (RegexpError)
        from ./a.rb:564:in `each'
        from ./a.rb:564:in `handle_math'
        from ./a.rb:760:in `parse_para'
        from ./a.rb:1048:in `block in <main>'
        from ./a.rb:1046:in `each'
        from ./a.rb:1046:in `<main>'

(The two warning messages are output from my program, not
from ruby.)
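
As a side note on the garbage at the start of the reported regex: it is exactly 8 bytes long, and the bytes can be read as a single little-endian 64-bit integer. The sketch below just decodes the bytes copied from the error message. Whether the result is really a pointer or some other value written by the interpreter is only a guess.

```ruby
# The 8 bytes printed in place of the start of the regex, copied from
# the error message (octal escapes, exactly as ruby printed them).
garbage = "\220\007~\000\000\000\000\000".b

# Interpret them as one little-endian unsigned 64-bit integer.
value = garbage.unpack("Q<").first

puts garbage.bytesize        # => 8
puts format("%#x", value)    # => 0x7e0790
```

A small value followed by five zero bytes looks more like a machine word stored over the string than like random text, which would be consistent with an 8-byte write on x86_64.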

Ruby’s behavior in this example is really, really strange.
Believe it or not, if I cut out the third line of the file,
which is a comment, I no longer get the error from the ruby
interpreter. In fact, almost any seemingly trivial change I
make to the file a.rb seems to get rid of the error. This is
the kind of behavior that’s made it extremely difficult for
me to cut this down to a small test program that demonstrates
the bug.

I’ll post this on the oniguruma bug tracker.

On Jun 11, 3:43 pm, Ben C. wrote:

Okay, I managed to make something a little smaller:
http://www.lightandmatter.com/ruby_bug.tar.gz
When I run the script “bug” in this tarball on my
x64 machine, I get the following output:


warning, undefined reference nonmetricunits
warning, undefined reference electroncapture
./a.rb:570:in `block in handle_math': unmatched close parenthesis:
/\220\007~\000\000\000\000\000in|end){align\*?}/ (RegexpError)
^

Where’s the matching opening paren for “end)” there?

Regards,

Dan

Daniel B. wrote:

./a.rb:570:in `block in handle_math': unmatched close parenthesis:
/\220\007~\000\000\000\000\000in|end){align\*?}/ (RegexpError)
^

Where’s the matching opening paren for “end)” there?

It’s not there, because the ruby interpreter has changed my
source code! If you look back at my original post, I gave the
source code that I actually wrote:

tex.split(/\\(?:begin|end){#{x}}/).each { |m|

The opening paren is there in my source code. When the error
message is printed out, it’s apparent that the first 8 bytes
of the regex string have been overwritten with garbage by
a bug in the ruby interpreter.
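
The 8-byte claim can be checked directly against the first error message. This is a small sketch, assuming the regex source starts with an escaped backslash (two backslash characters, to match a literal backslash before begin/end) and that x was "tabular" at the time of the failure, as the error text suggests.

```ruby
# Sketch: compare the intended regex source with the text in the error.
x = "tabular"                                 # value implied by the error message
intended = /\\(?:begin|end){#{x}}/.source     # the source text of the regex

head = intended.byteslice(0, 8)       # the bytes that were replaced
tail = intended.byteslice(8..-1)      # the bytes that survived

reported_tail = "in|end){tabular}"    # copied from the first error message

puts "first 8 bytes: #{head.inspect}"
puts "remainder:     #{tail}"
puts "matches error: #{tail == reported_tail}"

# Native pointer width of the running interpreter, for comparison:
# 8 on a 64-bit build, 4 on a 32-bit one.
puts "pointer size:  #{[0].pack('J').bytesize}"
```

The opening paren sits inside those first 8 bytes, which is why the regex parser then complains about an unmatched close parenthesis.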