Clients hang on large PUTs to Mongrel::HttpHandler-based web service

Randy_Fischer · June 3, 2008, 11:10pm

Hi folks,

I have a problem with a storage web service our group wrote using
Mongrel::HttpHandler. We have a consistent problem when using
HTTP PUT to this service when the data is larger than about 4 GB.

The web service actually receives and processes the data, but the
clients hang: the TCP connection is still in the ESTABLISHED
state on the client side, but the TCP session no longer exists on
the server side (the temporary file is unlinked from the directory
and, somewhat later, from the mongrel process).

I’m using mongrel 1.1.4, and as far as clients go, I’ve tried curl and
a Java-based application using the Jakarta Commons HttpClient
library - same issue with both.

I’m wondering if this is a simple 32-bit int issue in the
ragel-generated code?

Any advice on how to approach debugging/fixing this would be
appreciated - this is very repeatable.

Work-arounds would be met with almost equal glee.

Thanks,

-Randy Fischer

Randy_Fischer · June 4, 2008, 2:25am

On Tue, 3 Jun 2008 17:09:05 -0400
“Randy Fischer” wrote:

Hi folks,

I’m wondering if this is a simple 32-bit int issue in the ragel-generated
code?

Shouldn’t be, since the ragel code is only used to parse the headers,
and when that’s done it then just streams 16k chunks from the socket to
a tempfile. Now, if your headers are 4G then I’d like to know how you
did that since Mongrel would block you hard.
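The body handling Zed describes (fixed-size reads copied straight into a tempfile) can be pictured with a small sketch. This is not Mongrel's actual code: the method name `stream_body` is hypothetical, and it assumes the caller already knows `content_length` from the parsed headers. The point it illustrates is that the body size never passes through the header parser; it only drives a copy loop.

```ruby
require "stringio"
require "tempfile"

CHUNK_SIZE = 16 * 1024 # 16 KB reads, as described above

# Hypothetical helper (not Mongrel's API): copy exactly content_length bytes
# from io into out, one chunk at a time. Returns the number of bytes copied,
# which is smaller than content_length if the peer closed early.
def stream_body(io, out, content_length, chunk_size = CHUNK_SIZE)
  remaining = content_length
  while remaining > 0
    data = io.read([remaining, chunk_size].min)
    break if data.nil? || data.empty? # EOF before the declared length
    out.write(data)
    remaining -= data.bytesize
  end
  content_length - remaining
end

# Usage: stream a fake 40,000-byte body from an in-memory IO into a tempfile.
body = StringIO.new("x" * 40_000)
Tempfile.create("upload") do |tmp|
  copied = stream_body(body, tmp, 40_000)
  tmp.flush
  puts copied               # 40000
  puts File.size(tmp.path)  # 40000
end
```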

Only thing that I could think of is that you aren’t setting a size
properly at some point. Either your header reports the wrong size, or
you’re not setting it.
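One way to rule out a wrong or missing size on the client side is to take Content-Length directly from the file on disk when building a streaming PUT. A minimal sketch with Ruby's standard Net::HTTP; the helper name `build_put`, the request path and the host are hypothetical placeholders. Net::HTTP will not send a `body_stream` unless either Content-Length or chunked Transfer-Encoding is set, so the header has to be filled in explicitly.

```ruby
require "net/http"
require "tempfile"

# Hypothetical helper: build a streaming PUT whose Content-Length comes from
# the file size, so the header cannot disagree with the body actually sent.
def build_put(path, file_path)
  req = Net::HTTP::Put.new(path)
  req["Content-Type"] = "application/octet-stream"
  req["Content-Length"] = File.size(file_path).to_s
  req.body_stream = File.open(file_path, "rb")
  req
end

Tempfile.create("payload") do |f|
  f.write("z" * 1234)
  f.flush
  req = build_put("/storage/example", f.path)
  puts req["Content-Length"] # 1234
  req.body_stream.close
end

# Sending would look like this (host and port are placeholders):
# Net::HTTP.start("storage.example.org", 80) { |http| http.request(req) }
```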

–
Zed A. Shaw

Hate: http://savingtheinternetwithhate.com/
Good: http://www.zedshaw.com/
Evil: http://yearofevil.com/

Randy_Fischer · June 4, 2008, 2:31am

Randy,
Are you sure this is an issue with the size of the input and not the amount
of time that the connection is left open?

Michael

Randy_Fischer · June 4, 2008, 6:12am

On Tue, Jun 3, 2008 at 8:30 PM, Michael D’Auria wrote:

Randy,
Are you sure this is an issue with the size of the input and not the amount
of time that the connection is left open?
Michael

I’ll check by using a smaller file size that I know will work, with
curl’s bandwidth limit feature to really increase the connection time.
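For a test like this, the value to pass to curl's `--limit-rate` option (bytes per second) is just the file size divided by the duration you want to exceed. A small sketch; the helper name, the 500 MiB file size, the 30-minute target and the URL are illustrative assumptions, not numbers from this thread.

```ruby
# Hypothetical helper: bytes-per-second rate that stretches an upload of
# `bytes` over at least `seconds` (rounding down makes it slightly slower).
def limit_rate_for(bytes, seconds)
  (bytes.to_f / seconds).floor
end

small_file = 500 * 1024 * 1024 # 500 MiB test file (example size)
target     = 30 * 60           # 30 minutes, chosen to outlast a 4 GB transfer

rate = limit_rate_for(small_file, target)
puts rate # 291271
puts "curl --limit-rate #{rate} -T test.bin http://host/path"
```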

Thanks for the suggestion!

-Randy

Randy_Fischer · June 4, 2008, 6:39am

On Tue, Jun 3, 2008 at 8:16 PM, Zed A. Shaw wrote:

On Tue, 3 Jun 2008 17:09:05 -0400
“Randy Fischer” wrote:

I’m wondering if this is a simple 32-bit int issue in the ragel-generated
code?

Shouldn’t be, since the ragel code is only used to parse the headers,
and when that’s done it then just streams 16k chunks from the socket to
a tempfile. Now, if your headers are 4G then I’d like to know how you
did that since Mongrel would block you hard.

Naw, it’s the content length of the body of a PUT. I ask since I saw

int content_length

in http11_parser.c

Only thing that I could think of is that you aren’t setting a size
properly at some point. Either your header reports the wrong size, or
you’re not setting it.

Easily double-checked with tcpdump and curl’s --dump-header option.
And so I will - thanks for the suggestion.

-Randy

Randy_Fischer · June 4, 2008, 10:25pm

On Wed, 4 Jun 2008 00:35:16 -0400
“Randy Fischer” wrote:

On Tue, Jun 3, 2008 at 8:16 PM, Zed A. Shaw wrote:

Naw, it’s the content length of the body of a PUT. I ask since I saw

int content_length

Well, looking in the source I can’t see where that’s actually used to
store the Content-Length header value. It actually seems to be dead.
Instead you have this line in http_request.rb:

content_length = @params[Const::CONTENT_LENGTH].to_i

Now, that means it relies on Ruby’s base integer type to store the
content length:

http://www.ruby-doc.org/core/classes/Fixnum.html

“A Fixnum holds Integer values that can be represented in a native
machine word (minus 1 bit). If any operation on a Fixnum exceeds this
range, the value is automatically converted to a Bignum.”

Which is kind of vague, but there’s a good chance it’s implemented as a
32-bit signed integer, giving you a problem with a 4G content size. It
should be converted to a Bignum on overflow, but a quick test would be
to check the class of content_length right after this line to see
what it’s getting.
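Zed's suggested check can also be run outside Mongrel as a quick sketch. The header string below is a made-up value just over 4 GiB; inside Mongrel it would come from `@params[Const::CONTENT_LENGTH]`. The printed class depends on the Ruby version: Ruby 2.4 and later unified Fixnum and Bignum into Integer, while older Rubies report Fixnum or Bignum depending on the platform word size.

```ruby
# A Content-Length just over 4 GiB, as it would arrive in the header hash
# (example value, not taken from the thread).
header_value = "4294967297" # 2**32 + 1

content_length = header_value.to_i

puts content_length              # 4294967297
puts content_length == 2**32 + 1 # true: no 32-bit wraparound
puts content_length.class        # Integer on Ruby 2.4+, Fixnum/Bignum on older Rubies
```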

–
Zed A. Shaw

Randy_Fischer · June 5, 2008, 2:19am

Great! I’ll check the content length - right now, it’s looking to
be some sort of network (maybe firewall) issue. When I have
it figured out I will report back. But instrumenting the mongrel
handler I wrote shows that it’s attempting to put out a reasonable
response header.

Unfortunately security on this system is difficult - I just got
sudo access to tcpdump… I started out as a sysadmin, but
man, they drive me crazy sometimes…

Did I say that mongrel rocks?

Thanks Zed.

-Randy

Randy_Fischer · July 14, 2008, 1:07am

Follow up to an old problem, finally solved, in case anyone else
stumbles across the same problem.

I have a problem with a storage web service our group wrote using
Mongrel::HttpHandler. We have a consistent problem when using
HTTP PUT to this service when the data is larger than about 4 GB.

Well, it turns out I could only repeat it consistently between two
particular systems. There was some back and forth on this
list, and I threw out the red herring that the http11_parser.c code
used a plain int for the content size. Zed pointed out that
particular variable was just dead code:

Instead you have this line in http_request.rb:

content_length = @params[Const::CONTENT_LENGTH].to_i

Now, that means it relies on Ruby’s base integer type to store the
content length:

Since @params[Const::CONTENT_LENGTH] is a string, Ruby’s
to_i method gets it right, promoting the result to a Bignum when it
doesn’t fit in a Fixnum - integer overflow was not the issue.

On Tue, Jun 3, 2008 at 8:30 PM, Michael D’Auria wrote:

Randy,
Are you sure this is an issue with the size of the input and not the amount
of time that the connection is left open?
Michael

That turns out to be the correct answer, though I eliminated it
(incorrectly) by using curl’s --limit-rate option to get times greater
than those exhibited by my 4GB transfers - those all worked, since a
throttled upload keeps packets flowing and never goes idle.

What was causing the problem was the lag between the end of the
upload/request from the client and the time when the server finally
sent a response after processing the request (the processing time was
entirely taken up with copying the upload from Mongrel’s temporary
file to its permanent location on disk).
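Because the failure depends on the connection going silent after the body has been sent, a reproduction has to recreate that gap rather than a slow upload. A hypothetical harness, not something used in the thread: a bare TCPServer that reads one request and its body, stays quiet for `delay` seconds (standing in for the copy to permanent storage), then answers. Run through the suspect firewall with a delay longer than its state timeout, the client should hang; locally with a short delay it simply completes. The method name and the 201 response are assumptions for illustration.

```ruby
require "socket"

# Hypothetical test server: accept one request, read headers and body,
# stay silent for `delay` seconds, then send a minimal response.
def serve_once_with_delay(server, delay)
  client = server.accept
  headers = {}
  while (line = client.gets) && line != "\r\n"
    name, value = line.split(":", 2)
    headers[name.downcase] = value.strip if value # skips the request line
  end
  length = headers.fetch("content-length", "0").to_i
  client.read(length) if length > 0 # read and discard the body
  sleep delay                       # the dead time a stateful firewall sees
  client.write("HTTP/1.1 201 Created\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
ensure
  client&.close
end

# Local usage with a short delay: the client gets its response after ~0.2 s.
server = TCPServer.new("127.0.0.1", 0)
thread = Thread.new { serve_once_with_delay(server, 0.2) }
sock = TCPSocket.new("127.0.0.1", server.addr[1])
sock.write("PUT /x HTTP/1.1\r\nHost: localhost\r\nContent-Length: 5\r\n\r\nhello")
puts sock.gets.chomp # HTTP/1.1 201 Created
sock.close
thread.join
server.close
```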

Still, tcpdump showed that the response was making it back
from the server to the client intact.

What was timing out was the firewall on the client system, which
was using stateful packet filtering (iptables on an oldish Red Hat
system). The dead time in the HTTP request/response had
exceeded the time to live for the state table entries. Turning off the
keep-state flag in the firewall rules allowed the transfer to
complete. Now it’s just a matter of tweaking the parameters so
we can get keep-state working again.
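Besides raising the firewall’s state timeout, a commonly used client-side mitigation (not the one tried in this thread) is TCP keepalive with an idle time shorter than that timeout, so the connection still exchanges packets while the server is busy copying. A sketch for a raw Ruby socket; the helper name and the 60/30/5 values are illustrative assumptions, and the `TCP_KEEP*` tuning constants exist only on some platforms (Linux has them), so missing ones are skipped. Whether this helps depends on the firewall refreshing its state entry on keepalive probes, which is worth confirming with tcpdump on the actual setup.

```ruby
require "socket"

# Hypothetical helper: turn on TCP keepalive so an otherwise idle connection
# keeps sending probes, which stateful firewalls normally count as activity.
def enable_keepalive(sock, idle: 60, interval: 30, count: 5)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  # Per-socket tuning is platform specific; skip any constant not defined here.
  {
    TCP_KEEPIDLE: idle,      # seconds of silence before the first probe
    TCP_KEEPINTVL: interval, # seconds between unanswered probes
    TCP_KEEPCNT: count       # unanswered probes before the connection is dropped
  }.each do |name, value|
    next unless Socket.const_defined?(name)
    sock.setsockopt(Socket::IPPROTO_TCP, Socket.const_get(name), value)
  end
  sock
end
```

Usage is a single call on the client socket right after connecting, for example `enable_keepalive(TCPSocket.new(host, port), idle: 45)`.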

Thanks for all the help on this.

-Randy Fischer

Randy_Fischer · July 18, 2008, 7:58am

On Sun, 13 Jul 2008 19:03:54 -0400
“Randy Fischer” wrote:

What was timing out was the firewall on the client system, which
was using stateful packet filtering (iptables on an oldish Red Hat
system). The dead time in the HTTP request/response had
exceeded the time to live for the state table entries. Turning off the
keep-state flag in the firewall rules allowed the transfer to
complete. Now it’s just a matter of tweaking the parameters so
we can get keep-state working again.

Ah, yes, classic mistake. People tend to think the client side just
works, but firewalls, routers, and those stupid antivirus programs
are often the more likely causes of trouble.

Good job.

Randy_Fischer · July 17, 2008, 4:44pm

On Jul 13, 2008, at 6:03 PM, Randy Fischer wrote:

Follow up to an old problem, finally solved, in case anyone else
stumbles across the same problem.

I have a problem with a storage web service our group wrote using
Mongrel::HttpHandler. We have a consistent problem when using
HTTP PUT to this service when the data is larger than about 4 GB.

I wrote a mongrel handler (and a small patch to mongrel) about a year
ago that handled PUT a little more gracefully than the default. It
prevented mongrel from blocking during the upload.

Want me to send you the code? I imagine it’s a tad out of date now,
but the idea was sound.

cr