Read_multipart bug (in Ruby's CGI and Rail's AbstractRequest

unknown · June 1, 2007, 6:45pm

I have written code that receives multipart forms. I’m currently
testing the software by using curl to push files through a Rails
controller. One of my test cases fails, and it strongly looks like
the failure is due to incorrect code in read_multipart:

…
bufsize = 10 * 1024
…
buf =
buf.sub(/\A((?:.|\n)*?)(?:[\r\n]{1,2})?#{quoted_boundary}([\r\n]{1,2}|–)/n)
do
body.print $1
if “–” == $2
content_length = -1
end
boundary_end = $2.dup
“”
end
…
break if buf.size == 0
break if content_length === -1
end
raise EOFError, “bad boundary end of body part” unless
boundary_end=~/–/

The problem appears to be that if a bufsize chunk is read that matches
the “buf.sub” regular expression above exactly, buf can become “”,
which has size 0, even though there is more information to be read.
This can leave $2 set to “\r\n” with buf.size 0, so the EOFError
exception is raised, even though there are more bytes to read
(including the “–”).

If I change bufsize to be 100 * 1024, or 4 * 1024, the problem goes
away, because I no longer get a buffer that matches the pattern
exactly. If I comment out the “break if buf.size == 0” the problem
goes away, because the code can keep going and find the “–” it’s
looking for. Specifically, if I comment out the “break if buf.size ==
0”
and instrument the code to show me some of the values of the variables
I get the output that I have after my signature. In it, it’s clear
that $2 is “\r\n” and buf.size is 0, even though there’s more to come,
but since we don’t break on buf.size 0, the loop goes around once more
and picks up the extra bytes.

Looking at all of read_multipart, I’m not sure why the
“break if buf.size == 0” code is there, since earlier in the code an
EOFError will be raised if body.read returns nil or “”. On the other
hand, the exit may be necessary to prevent some sort of infinite loop
not associated body.read running into the end of the file.

I’m posting this to the Ruby mailing list, because I suspect there are
people on the list intimately familiar with read_multipart who might
be able to see the problem and fix it properly. The particular data
set that I’m exercising contains sensitive data, but if there’s
interest, I can capture curl’s output and scrub the data, although I’m
not sufficiently familiar with CGI to know how to write a unit test
for read_multipart.

I’m cc’ing this to the Rails mailing list because Rails has its own
copy of the CGI code in actionpack/lib/action_controller/request.rb,
and that’s where I first encountered the bug.

–
Cliff M. [email protected]
Stolen Bases
aka [email protected]

Instrumented code:

…
File.new(‘/tmp/request.log’,‘a’).syswrite(“quoted_boundary =
"#{quoted_boundary}", buf = #{buf}\n”)
buf =
buf.sub(/\A((?:.|\n)*?)(?:[\r\n]{1,2})?#{quoted_boundary}([\r\n]{1,2}|–)/n)
do
…
boundary_end = $2.dup
File.new(‘/tmp/request.log’,‘a’).syswrite(“boundary_end =
#{boundary_end}, $2 = #{$2.inspect}\n”)
“”
…
File.new(‘/tmp/request.log’,‘a’).syswrite(“buf.size =
#{buf.size}, content_length = #{content_length}\n”)

break if buf.size == 0

        break if content_length == -1
      end
      File.new('/tmp/request.log','a').syswrite("raise? boundary_end

= #{boundary_end}\n")

      raise EOFError, "bad boundary end of body part" unless

boundary_end=~/–/

Output from instrumented code:

quoted_boundary =
“------------------------------d138ece8d971”,
buf = <>

------------------------------d138ece8d971

$2 = “\r\n”
buf.size = 0, content_length = 124
quoted_boundary =
“------------------------------d138ece8d971”,
buf = application/pdf

------------------------------d138ece8d971–

$2 = “–”
buf.size = 2, content_length = -1