10% 500 Errors

I have nginx running as a proxy to about twelve upstream app servers,
serving a rails app. Nothing else really in this configuration.

I am seeing about 10% of requests throwing 500 errors, and this in my
error log:

2008/03/10 08:41:05 [info] 6632#0: *12005 client closed prematurely
connection while sending response to client, client: xxx, server: xxx,
request: xxx, host: xxx, referrer: xxx

I’m also seeing lots of:

client xxx closed keepalive connection

but that strikes me as normal, and I’m seeing:

client closed prematurely connection while reading client request line,
client: xxx, server: xxx

I have googled far and wide, and the best answers I came up with were to
add these lines to my conf:

proxy_ignore_client_abort on;
proxy_next_upstream error;

but, that doesn’t seem to have solved the problem.

Any ideas?

Thanks in advance.

I’ve run into things like that before, though not that specifically.

I was using round-robin load balancing and one of my mongrel instances
was ever so slightly misconfigured, causing every request that went to
it to fail. It seemed more or less random, and that was the annoying
part.

Philip Ratzsch
Software Engineer Developer I
Information Systems, Rackspace
[email protected]
210-312-3191 [w]
706-799-9799 [c]


I was seeing something similar with PHP5 FastCGI and lighttpd, though it
was a lot more than 10% - maybe 25-50%. I’m getting a little worried, as
I think I may be seeing the same thing at around 5-10% with nginx.

During my investigations into why this was happening in lighttpd, I came
across the following paragraph in the mod_fcgi docs:

[…] causing errors.

Not too sure if your Rails app/Mongrel is restarting processes after a
set limit and hitting the same race condition?

If this problem could surface in nginx, it would be great if there were
a module to spawn and manage FastCGI processes that could limit the
number of connections to each backend.

HTH.
Phill


Phillip B Oldham
The Activity People
[email protected] mailto:[email protected]



Yeah, I don’t think it’s mongrel, as they’re all started the identical
way (through a process monitor).

Also, every time I get the error, I see one of those issues in the nginx
error.log. But, I guess it still could be happening upstream…

I will test them all and report back.

On Mon, Mar 10, 2008 at 9:29 AM, Phillip B Oldham wrote:

On Mon, Mar 10, 2008 at 09:12:35AM -0400, James Golick wrote:

I’m also seeing lots of:

proxy_ignore_client_abort on;
proxy_next_upstream error;

The default

proxy_next_upstream error timeout invalid_header;

is better.
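For context, here is a minimal sketch of where these directives would sit in a proxy setup like the one described (the upstream name and ports are placeholders, not taken from the original config):

```nginx
upstream mongrels {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    # ...one entry per Mongrel backend
}

server {
    listen 80;

    location / {
        proxy_pass http://mongrels;
        # the default: retry the next backend on a connection error,
        # a timeout, or an invalid header from the upstream
        proxy_next_upstream error timeout invalid_header;
    }
}
```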

but, that doesn’t seem to have solved the problem.

Any ideas?

Do you see 500, 502, or 504 errors? They are very different things.
Messages logged at the info level are not caused by 50x errors.

Okay, I’ll change it back.

What should I set my log level to in order to determine which type of
error it is?

J.

On Mon, Mar 10, 2008 at 09:53:01AM -0400, James Golick wrote:

What should I set my log level to in order to determine which type of error it is?

egrep '\[(alert|crit|error)\]' logfile

Nothing… all of the mongrels respond normally when hit without nginx.

This has got to be an nginx issue…

On Mon, Mar 10, 2008 at 09:55:23AM -0400, James Golick wrote:

Nginx will always log an error when there’s a 500?

Yes, at least I have tried to log them all. The log is the only way
to know about the problems.

So what do you see - 500, 502, 503, or 504 ?

Scratch that, it’s already at debug, and you’re right, those errors are
at info level.

Nginx will always log an error when there’s a 500?

On Mon, Mar 10, 2008 at 10:06:27AM -0400, James Golick wrote:

Nothing.

I’m thinking now that these must be coming occasionally from my upstream
servers.

I mean what do you see in access_log - 500, 502, etc ?

Nothing.

I’m thinking now that these must be coming occasionally from my upstream
servers.

Thanks for your help

But what does the production.log or mongrel logs say?

I mean, can you find errors in the backend logs matching the date/time
for the errors in the nginx log?


Aníbal Rojas

http://anibal.rojas.com

That’s the really weird thing - nothing.

It seems like maybe my upstream is responding with 200, but actually
showing a 500-style error?

On Mon, Mar 10, 2008 at 10:18:53AM -0400, James Golick wrote:

That’s the really weird thing - nothing.

It seems like maybe my upstream is responding with 200, but actually showing
a 500-style error?

You may log $upstream_status in access_log to see an exact upstream
status.
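For example, a custom log format could capture both codes side by side (the format name and log path below are placeholders; `$status` and `$upstream_status` are standard nginx variables):

```nginx
http {
    log_format upstream_trace '$remote_addr [$time_local] "$request" '
                              'status=$status upstream=$upstream_status';

    access_log /var/log/nginx/access.log upstream_trace;
}
```

A request where the two values disagree would point at nginx; matching values would point at the backend.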

Still says 200. Somebody is throwing this error and won’t admit to it.
I’m guessing it’s mongrel, but at this point, I’ve really got no idea.

Yes, if the failsafe handler is activated (basically, an exception is
thrown in the exception handler), then Rails/Mongrel returns a 200 code
with a response body that contains a 500 error message.
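If that’s what’s happening, one way to spot it is to inspect the bodies of nominally successful responses. A rough heuristic sketch (the error-page phrases below are assumptions about what a Rails 2-era failsafe page contains; tune them to the actual app):

```ruby
# Flag responses whose status says "OK" but whose body looks like a
# Rails error page. Heuristic only -- the marker patterns are guesses.
FAILSAFE_MARKERS = [
  /500 Internal Server Error/i,
  /We're sorry, but something went wrong/i, # default public/500.html text
].freeze

def masked_error?(status, body)
  status == 200 && FAILSAFE_MARKERS.any? { |m| body =~ m }
end
```

Running captured status/body pairs from each Mongrel through a check like this would show which backend is reporting 200 while actually failing.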

Nothing. There’s nothing in the logs.

According to the mongrel mailing list, it can raise 500s when it’s in an
error state. But does it respond with an incorrect error code? I dunno.

I mean, I’m not seeing anything in any error logs, and nginx is
reporting a 200 for all requests. wtf?

On Mon, Mar 10, 2008 at 10:17 AM, Aníbal Rojas [email protected] wrote:

James,

As some kind of "last resort"...

What about hacking a quick controller to respond with the offending
codes and checking how they are being handled?


Aníbal
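As a stand-in for hacking a Rails controller, Aníbal’s idea could be sketched as a one-file Rack app (everything below is hypothetical; the path convention is made up for illustration):

```ruby
# status_echo.rb -- respond with whatever status code appears in the
# request path, e.g. GET /502 returns HTTP 502. A probe for checking
# how the proxy chain relays each status code.
STATUS_ECHO = lambda do |env|
  # pull a three-digit code off the end of the path; default to 200
  code = env['PATH_INFO'][%r{/(\d{3})\z}, 1]
  status = code ? code.to_i : 200
  [status, { 'Content-Type' => 'text/plain' }, ["responded with #{status}\n"]]
end

# To serve it (assumes the rack gem is installed):
#   put `run STATUS_ECHO` in a config.ru, then run `rackup`
```

Hitting it through nginx and comparing what the client sees against what the app sent would isolate where the status code changes.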

Dave - is there any common reason that might be happening intermittently
on just about any controller/action combination for an app?

On Mon, Mar 10, 2008 at 11:04 AM, Aníbal Rojas [email protected] wrote: