Proxy errors with apache2.2.3 + mongrels

I’ve posted this to rails-deployment as well.

I administer a medium-sized Rails app (1.5 million requests each day).
I recently switched from lighttpd + fcgi to Apache + Mongrel. The
platform is as follows:

All machines run Debian Etch, with 4 GB RAM and dual-core 32-bit Intel
processors. The web server runs Debian’s apache 2.2.3-4 mpm worker
package and is also the NFS server. Two app servers run 14 mongrels
each. There are two db servers: the master runs mysql5 (Debian package)
with InnoDB, and the slave uses MyISAM tables. The slave server has
8 GB RAM and runs a memcached with 4 GB reserved.

We are experiencing problems at peak hours, with timeouts and slow
page navigation; there are also problems with some file uploads.
I’m investigating how to improve quality of service on the site, and
I’ve found some proxy errors in the Apache log, which appear with more
than 300 simultaneous established connections on the web server [1].

There are 4 kinds of proxy errors:

[Wed Oct 17 06:24:14 2007] [error] (111)Connection refused: proxy:
HTTP: attempt to connect to 192.168.10.154:21004 (bomer) failed

[Tue Oct 16 23:28:01 2007] [error] [client 67.142.130.19] proxy: error
reading status line from remote server bomber, referer:
http://www.google.com/search?hl=es&q=tipos+de+herramintas++++&lr=

[Wed Oct 17 07:14:11 2007] [error] (70007)The timeout specified has
expired: proxy: prefetch request body failed to 192.168.10.154:21013
(kgb) from 200.66.133.30 ()

[Wed Oct 17 07:16:54 2007] [error] [client 201.141.199.86] proxy:
Error reading from remote server returned by
/myfiles/clubsx/Sin t??tulo 007_0001(1).jpg, referer:
http://www.espacioblog.com/clubsx/post/2006/06/04/los-archivos-chiwasnaked

With light or moderate traffic the mongrels work well (they use CPU and
appear in the first lines of ‘top’), but under heavy traffic the
mongrels go down.

I am looking for help solving these issues. I’ve searched Google for
these proxy errors, but what I’ve found hasn’t helped. I can provide
more details if they would help.

It has also been suggested that I use the nginx web server instead of
Apache, but I am reluctant to switch unless it is proven better than
Apache. Can anyone point me to technical arguments for using nginx?

[1] $ netstat -nat | grep ':80' | grep 'EST' | wc -l

Thanks for your time.

Jacobo García

“Error reading status line” means Mongrel closed the socket without
sending any content back.

IT WOULD BE REALLY HELPFUL IF MONGREL WOULD SEND BACK “503 Server Busy”
WHEN IT’S BUSY. That way we would know immediately whether the
num_processors limit had been reached or not. HUGE TIMESAVER FOR
MONGREL USERS!

====

Next…

Seeing your load balancer config, and mongrel_cluster.yml would help
narrow things down & simplify the discussion.

Also, add this to your log file config:

%D (request duration, in microseconds)
%{BALANCER_WORKER_NAME}e

Request duration, in combination with start time and
balancer_worker_name, is useful in finding cases where requests to a
back end app server overlap. These numbers will also allow you to
construct a histogram of concurrent requests each second.
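
A minimal sketch of that histogram in Python. It assumes each access-log
line has already been parsed into a start time in epoch seconds (from %t),
a duration in microseconds (from %D) and a worker name (from
%{BALANCER_WORKER_NAME}e). The function names and the sample records are
made up for illustration; they are not taken from the real logs.

```python
import math
from collections import Counter, defaultdict

def concurrency_histogram(records):
    """Count the requests in flight during each one-second bucket.

    records: iterable of (start_epoch_s, duration_us, worker) tuples.
    """
    in_flight = Counter()
    for start, duration_us, _worker in records:
        end = start + duration_us / 1_000_000
        first = int(start)
        # Every request occupies at least the second in which it started.
        last_exclusive = max(first + 1, math.ceil(end))
        for second in range(first, last_exclusive):
            in_flight[second] += 1
    return dict(in_flight)

def overlapping_requests(records):
    """Return (worker, start) for each request that began while an
    earlier request on the same worker was still running."""
    by_worker = defaultdict(list)
    for start, duration_us, worker in records:
        by_worker[worker].append((start, start + duration_us / 1_000_000))
    overlaps = []
    for worker, spans in by_worker.items():
        latest_end = None
        for start, end in sorted(spans):
            if latest_end is not None and start < latest_end:
                overlaps.append((worker, start))
            latest_end = end if latest_end is None else max(latest_end, end)
    return overlaps

# Hypothetical records, for illustration only.
sample = [
    (100.0, 2_500_000, "http://bomber:21000"),
    (101.0, 500_000, "http://bomber:21000"),
    (101.2, 300_000, "http://kgb:21000"),
]
print(concurrency_histogram(sample))
print(overlapping_requests(sample))
```

Any worker that shows up in the overlap list had a second request handed
to it while the first was still being served, which is where queueing
behind a slow request would start to appear.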

thanks.

mongrel_cluster.yml:


port: "21000"
docroot: /var/www/mysite/current/public/
cwd: /var/www/mysite/current
pid_file: /var/www/mysite/tmp/pids/mongrel.pid
log_file: /var/www/mysite/current/log/mongrel.log
environment: production
servers: 14
user: deploys
group: www-data

proxy balancer:
ServerName my.ip.add.ress
NameVirtualHost *:80
<Proxy balancer://mongrel_cluster>
BalancerMember http://bomber:21000 keepalive=on max=1 retry=30
BalancerMember http://bomber:21001 keepalive=on max=1 retry=30
BalancerMember http://bomber:21002 keepalive=on max=1 retry=30
BalancerMember http://bomber:21003 keepalive=on max=1 retry=30
BalancerMember http://bomber:21004 keepalive=on max=1 retry=30
BalancerMember http://bomber:21005 keepalive=on max=1 retry=30
BalancerMember http://bomber:21006 keepalive=on max=1 retry=30
BalancerMember http://bomber:21007 keepalive=on max=1 retry=30
BalancerMember http://bomber:21008 keepalive=on max=1 retry=30
BalancerMember http://bomber:21009 keepalive=on max=1 retry=30
BalancerMember http://bomber:21010 keepalive=on max=1 retry=30
BalancerMember http://bomber:21011 keepalive=on max=1 retry=30
BalancerMember http://bomber:21012 keepalive=on max=1 retry=30
BalancerMember http://bomber:21013 keepalive=on max=1 retry=30
BalancerMember http://bomber:21014 keepalive=on max=1 retry=30
BalancerMember http://bomber:21015 keepalive=on max=1 retry=30
BalancerMember http://bomber:21016 keepalive=on max=1 retry=30
BalancerMember http://bomber:21017 keepalive=on max=1 retry=30
BalancerMember http://bomber:21018 keepalive=on max=1 retry=30
BalancerMember http://bomber:21019 keepalive=on max=1 retry=30
BalancerMember http://bomber:21020 keepalive=on max=1 retry=30
BalancerMember http://bomber:21021 keepalive=on max=1 retry=30
BalancerMember http://bomber:21022 keepalive=on max=1 retry=30
BalancerMember http://bomber:21023 keepalive=on max=1 retry=30
BalancerMember http://bomber:21024 keepalive=on max=1 retry=30
BalancerMember http://bomber:21025 keepalive=on max=1 retry=30
BalancerMember http://bomber:21026 keepalive=on max=1 retry=30
BalancerMember http://bomber:21027 keepalive=on max=1 retry=30
BalancerMember http://bomber:21028 keepalive=on max=1 retry=30
BalancerMember http://bomber:21029 keepalive=on max=1 retry=30
BalancerMember http://kgb:21000 keepalive=on max=1 retry=30
BalancerMember http://kgb:21001 keepalive=on max=1 retry=30
BalancerMember http://kgb:21002 keepalive=on max=1 retry=30
BalancerMember http://kgb:21003 keepalive=on max=1 retry=30
BalancerMember http://kgb:21004 keepalive=on max=1 retry=30
BalancerMember http://kgb:21005 keepalive=on max=1 retry=30
BalancerMember http://kgb:21006 keepalive=on max=1 retry=30
BalancerMember http://kgb:21007 keepalive=on max=1 retry=30
BalancerMember http://kgb:21008 keepalive=on max=1 retry=30
BalancerMember http://kgb:21009 keepalive=on max=1 retry=30
BalancerMember http://kgb:21010 keepalive=on max=1 retry=30
BalancerMember http://kgb:21011 keepalive=on max=1 retry=30
BalancerMember http://kgb:21012 keepalive=on max=1 retry=30
BalancerMember http://kgb:21013 keepalive=on max=1 retry=30
BalancerMember http://kgb:21014 keepalive=on max=1 retry=30
BalancerMember http://kgb:21015 keepalive=on max=1 retry=30
BalancerMember http://kgb:21016 keepalive=on max=1 retry=30
BalancerMember http://kgb:21017 keepalive=on max=1 retry=30
BalancerMember http://kgb:21018 keepalive=on max=1 retry=30
BalancerMember http://kgb:21019 keepalive=on max=1 retry=30
BalancerMember http://kgb:21020 keepalive=on max=1 retry=30
BalancerMember http://kgb:21021 keepalive=on max=1 retry=30
BalancerMember http://kgb:21022 keepalive=on max=1 retry=30
BalancerMember http://kgb:21023 keepalive=on max=1 retry=30
BalancerMember http://kgb:21024 keepalive=on max=1 retry=30
BalancerMember http://kgb:21025 keepalive=on max=1 retry=30
BalancerMember http://kgb:21026 keepalive=on max=1 retry=30
BalancerMember http://kgb:21027 keepalive=on max=1 retry=30
BalancerMember http://kgb:21028 keepalive=on max=1 retry=30
BalancerMember http://kgb:21029 keepalive=on max=1 retry=30
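
One thing worth cross-checking in the two configs above: the yml starts 14
servers from port 21000, while the balancer lists ports 21000 to 21029 on
each host. The Python sketch below compares the two. It assumes both app
servers use the same mongrel_cluster.yml and that mongrel_cluster numbers
its servers on consecutive ports starting at `port`; the helper names are
invented for this example. Note that the refused connection in the log was
on port 21004, so a mismatch like this would not explain every error.

```python
import re

def mongrel_ports(first_port, servers):
    """Ports the cluster is assumed to listen on: consecutive from first_port."""
    return set(range(first_port, first_port + servers))

def balancer_ports(config_text):
    """Map each backend host to the ports named in its BalancerMember lines."""
    members = {}
    pattern = r"BalancerMember\s+http://([^:/\s]+):(\d+)"
    for host, port in re.findall(pattern, config_text):
        members.setdefault(host, set()).add(int(port))
    return members

def unbacked_members(config_text, first_port, servers):
    """Return {host: sorted ports} for members with no mongrel behind them."""
    listening = mongrel_ports(first_port, servers)
    return {
        host: sorted(ports - listening)
        for host, ports in balancer_ports(config_text).items()
        if ports - listening
    }

# Rebuild the member list as posted: ports 21000-21029 on each host.
config = "\n".join(
    f"BalancerMember http://{host}:{port} keepalive=on max=1 retry=30"
    for host in ("bomber", "kgb")
    for port in range(21000, 21030)
)
missing = unbacked_members(config, first_port=21000, servers=14)
for host, ports in missing.items():
    print(host, ports[0], "-", ports[-1], f"({len(ports)} members)")
```

If nothing is listening on the reported ports, Apache would get
"Connection refused" whenever the balancer picks one of those members.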

I’ve added the log parameters you suggested to Apache; I’ll post them
in a future message.

thanks

I use munin to monitor the machines, and all of them are in excellent shape :)

Robert M. wrote:

Another avenue to consider: is everything in order on the systems
involved? If Rails is chewing up lots of RAM you could be swapping…
here are some diagnostics I usually run (caveat: I’m not a sysadmin, nor
do I play one on TV):

sar -rB 1 0
sar 1 0
sar -n EDEV 1 0 (network IO incl. errors)
sar -d 1 0 ( look for excessive block device IO )?

What sorts of things do other folks on this list look at?

Jacobo Garcia wrote:

[Tue Oct 16 23:28:01 2007] [error] [client 67.142.130.19] proxy: error
reading status line from remote server bomber, referer:
http://www.google.com/search?hl=es&q=tipos+de+herramintas++++&lr=

We had this problem and solved it by adding the following lines to our
Apache conf:

#Fix for Apache bug 39499
SetEnv force-proxy-request-1.0 1
SetEnv proxy-nokeepalive 1

Regards!

I already have these lines, but in the virtual host conf, which is
separate from the proxy conf. Should I put them in the proxy conf?

I have them in the virtualhost conf, not in the proxy conf.

Maybe you will find the solution in this post:
http://www.overset.com/2007/04/03/mod_proxy-and-internet-explorer-problems/

With the fix proposed in the article and some tweaks to the
BalancerMember settings (retry=5, no max, no keepalive=on) I’ve been
able to reduce the proxy errors, but I am still getting some
“Connection refused: proxy: HTTP: attempt to connect to
192.168.10.153:21000 (bomber) failed” errors.

I am not sure how this setup will behave under heavy load. I’m going to
test it this evening and see what happens.

I have been testing nginx as well, and it has been running flawlessly,
consuming very little memory. I think it is worth a try.

What I’m not seeing in those bug reports is “Connection Refused” errors.

As an aside, I bet that as of Apache 2.2.4 “SetEnv proxy-nokeepalive 1”
is no longer needed. One cause of the “error reading headers” error
was that up until 2.2.4 mod_proxy wasn’t checking to see whether a
previously used socket was still good or not before using it for the
next request. The fix is in 2.2.4. It makes sense that turning
keepalives off for proxy connections works. It probably makes Apache
reconnect for each request: a blank slate. That’s a practical option
if you’ve got a few dozen to a hundred requests per second.
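
For scale, a back-of-the-envelope calculation in Python using the 1.5
million requests per day quoted at the start of the thread. The peak
factor is purely an assumption for illustration; the thread gives no
peak-hour numbers, and it does not say how many of those requests reach
the proxy rather than being served as static files.

```python
SECONDS_PER_DAY = 24 * 60 * 60

requests_per_day = 1_500_000   # figure from the original post
average_rps = requests_per_day / SECONDS_PER_DAY

assumed_peak_factor = 3        # hypothetical: peak traffic at 3x the daily average
assumed_peak_rps = average_rps * assumed_peak_factor

print(f"average:      {average_rps:.1f} req/s")
print(f"assumed peak: {assumed_peak_rps:.1f} req/s")
```

Under that assumption the site stays within the "few dozen to a hundred
requests per second" range where reconnecting for every request is cheap.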

So two possible causes of error reading response headers are:

  1. Mongrel has hit 950 workers ( or whatever num_processors is ) –
    unlikely, I suppose, unless there’s a bug in Mongrel’s tracking of
    worker list length.
  2. Apache attempted to read/write to a bad socket

The no-keepalives probably addresses the second case. I’m not sure what
it does in the first case. The way mod_proxy checks is to read a zero
byte buffer off the socket. If Mongrel just closes right away I’m not
sure what state the socket’s in at the client ( mod_proxy ) side of
things – whether it’s in close_wait or whatever – and what a zero-byte
read would return.
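
As a rough illustration of what the client side sees, here is a small
Python sketch. It is not mod_proxy's code: it uses a one-byte non-blocking
peek rather than whatever check Apache performs, and socketpair() merely
stands in for the Apache-to-Mongrel connection (so there are no real TCP
states involved). On a real TCP connection, the side that has received
the peer's FIN but has not yet closed its own end sits in CLOSE_WAIT.

```python
import socket

def peer_has_closed(sock):
    """Peek one byte without blocking; b"" means the peer has sent EOF."""
    sock.setblocking(False)
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except BlockingIOError:
        # No data waiting and no EOF: the connection is idle but still open.
        return False
    except ConnectionResetError:
        # The peer reset the connection; treat it as closed.
        return True
    finally:
        sock.setblocking(True)

# Stand-in for the proxy <-> backend connection.
proxy_side, backend_side = socket.socketpair()

before_close = peer_has_closed(proxy_side)
backend_side.close()   # backend closes without sending any content
after_close = peer_has_closed(proxy_side)

print("closed before backend close:", before_close)
print("closed after backend close: ", after_close)
proxy_side.close()
```

A check along these lines, run before reusing a kept-alive socket, is
the kind of test that distinguishes an idle connection from one the
backend has already dropped.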

This is one reason why returning a 503 from Mongrel would be helpful –
it’d be nice to rule out the num_processors accept/close right off the
bat, however unlikely it may be deemed.

It might be worth upgrading to Apache 2.2.6 and seeing whether that
fixes the issue. It’s a crap shoot, done without fully understanding
the problem, but if it works, what the heck – it makes your day better.

Anyhow, I’m really interested in the outcome of this – we rely on
mod_proxy_balancer, and I want to be sure I understand it.
