Forum: Ruby on Rails Performance issue.. after a while

Bogdan Ionescu (Guest)
on 2006-02-25 21:34
(Received via mailing list)
Hello,

I have a project running on a dedicated server:
Debian, P4 CPU 3.00GHz, 1GB RAM,
ruby 1.8.4 (2005-12-24) [x86_64-linux],
rails (1.0.0), activerecord (1.13.2)
lighttpd-1.4.10 + fastcgi + mysql 5.0
7 dispatchers.

The project is a game, so a typical user would visit 100+ pages.
When the server is busiest, it gets 35-40k requests/hour.

For some mysterious reason, after a number of hours the whole thing
starts moving slower: typically the server load goes up to 5-8, and I
know that I have to either start killing dispatch.fcgi processes or
simply restart the whole thing.
It is definitely not that the server cannot deal with the number of
requests. It appears that some of the dispatch.fcgi processes simply
bring the server to a semi-halt. Killing the culprit makes the load go
under 1% and the game itself several times faster. The problem is that
I never know which one is causing the problems.
I have attempted to find and fix memory leaks; I have removed rmagick
from file_column, since it was said that rmagick was causing leaks.
I have removed the unnecessary services and I am keeping the lighttpd
configuration to a minimum, yet I pretty much have to restart the
server daily.
Are there any special tricks needed to make the dispatchers behave?
And maybe to use less RAM? ;)
Any suggestions are welcome.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
avirtual  8177  0.1  0.2  20124  2232 ?        S    13:35   0:11 lighttpd -f /root/lighttpd.conf
avirtual  8178  2.2 11.2 147620 115436 ?       R    13:35   2:37 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8179  2.0 14.2 177640 145588 ?       S    13:35   2:22 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8180  2.1 13.6 172560 140140 ?       S    13:35   2:31 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8181  1.4  0.1 178156  1512 ?        S    13:35   1:43 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8182  2.2  9.1 131236 93564 ?        R    13:35   2:34 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8184  2.0 14.1 177920 145164 ?       S    13:35   2:23 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8186  2.7 13.8 173764 141844 ?       S    13:35   3:12 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
Pat Maddox (pergesu)
on 2006-02-25 22:05
(Received via mailing list)
I would go through all your code and make sure there are no
possibilities that an infinite loop occurs.  Every time I've had an
app go fine for a while and then suddenly start crawling, it's because
some infinite loop that I didn't notice occurred.  In a game I'm sure
there are lots of possibilities for things like this to happen.

Pat
Ezra Zygmuntowicz (Guest)
on 2006-02-25 23:23
(Received via mailing list)
Bogdan-

I have had trouble with lighttpd 1.4.10, and I am currently running
my apps on either 1.4.8 or 1.4.9. 1.4.9 has been working great for me
on Debian specifically. I saw something similar with 1.4.10, where
I got zombied fcgi's. So try downgrading to 1.4.9 or 1.4.8 and see if
that solves your problem. Also, are you running your fcgi's with unix
sockets or over IP:PORTNUM? You might want to run each fcgi on its
own consecutive port number as a standalone spawn-fcgi process and let
lighty just load balance between them. This way you can reap them
easily with script/process/reaper. Or you can go through the output of
ps awxx | grep dispatch.fcgi, see which ones are zombied, and
kill and respawn them. You could do this in a script.
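
The "go through the ps output in a script" idea could look roughly like the sketch below. It is only an illustration: the helper names and the 5,000 KB resident-size threshold are assumptions, not something from the thread, and it only reports candidates rather than killing or respawning anything.

```ruby
# Sketch: flag dispatch.fcgi processes that look zombied or mostly
# non-resident, given `ps aux`-style text. The helper names and the
# RSS threshold are hypothetical.
PsRow = Struct.new(:pid, :rss_kb, :stat, :command)

# Parse `ps aux` output: 11 columns, and COMMAND may contain spaces.
def parse_ps(text)
  rows = text.lines.drop(1).map do |line|   # drop(1) skips the header row
    fields = line.split(nil, 11)
    next if fields.size < 11
    PsRow.new(fields[1].to_i, fields[5].to_i, fields[7], fields[10].strip)
  end
  rows.compact
end

# A dispatcher is suspect if it is a zombie (STAT starts with "Z")
# or if its resident size is below the threshold.
def suspect_dispatchers(text, min_rss_kb: 5_000)
  parse_ps(text).select do |row|
    next false unless row.command.include?("dispatch.fcgi")
    row.stat.start_with?("Z") || row.rss_kb < min_rss_kb
  end
end

# Two rows taken from the listing in the first post.
SAMPLE_PS = <<~PS
  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  avirtual  8178  2.2 11.2 147620 115436 ?       R    13:35   2:37 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
  avirtual  8181  1.4  0.1 178156  1512 ?        S    13:35   1:43 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
PS

suspect_dispatchers(SAMPLE_PS).each do |row|
  puts "PID #{row.pid}: state #{row.stat}, RSS #{row.rss_kb} KB"
end
```

On a live server the same function could be fed the real listing, for example suspect_dispatchers(`ps aux`). A small resident size only means most of the process is not in RAM at that moment, so a flagged PID is worth checking by hand before it is killed.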

Cheers-
-Ezra
Bogdan Ionescu (Guest)
on 2006-02-25 23:39
(Received via mailing list)
I can't really blame 1.4.10 for the troubles. I upgraded to 1.4.10
only a few days ago, to get a bug fixed.
I have a basic lighttpd.conf which uses unix sockets.
I will look for some documentation/blogs about spawning fcgi's on
different ports.
If you have a configuration file that I could look at, that would be
great.

Thanks,
Bogdan
Ezra Zygmuntowicz (Guest)
on 2006-02-26 00:00
(Received via mailing list)
Eric Hodel (Guest)
on 2006-02-26 03:30
(Received via mailing list)
On Feb 25, 2006, at 12:32 PM, Bogdan Ionescu wrote:

> I have an project running on a dedicated server:
> Debian, P4 CPU 3.00GHz, 1GB RAM,
> ruby 1.8.4 (2005-12-24) [x86_64-linux],
> rails (1.0.0), activerecord (1.13.2)
> lighttpd-1.4.10 + fastcgi + mysql 5.0
> 7 dispatchers.
>
> The project is a game, so a typical user would visit 100+ pages.
> When the server is busiest, it gets 35-40k requests/hour.

You're using caching, right?  Judging from your process run times
Rails doesn't see most of those requests.  Your processes should
accumulate several minutes of CPU time if you're serving nearly a
million requests per day.

> For some misterious reason after a number of hours the whole thing
> starts moving slower, typically the server load goes up to 5-8 and
> I know that I have to either start killing dispatch.fcgi processes,
> or simply restart the whole thing.

From your process sizes you're probably spending most of your time
swapping.
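
To put rough numbers on the swapping suspicion, the resident sizes (RSS column, in KB) of the seven dispatchers in the original listing can be added up and compared with the machine's 1GB of RAM. This is a back-of-the-envelope sketch with illustrative names; it ignores MySQL, lighttpd, the kernel and any pages shared between processes.

```ruby
# RSS values (KB) of the seven dispatch.fcgi processes in the first listing.
DISPATCHER_RSS_KB = [115_436, 145_588, 140_140, 1_512,
                     93_564, 145_164, 141_844].freeze
RAM_KB = 1024 * 1024   # 1GB of physical RAM, as stated in the first post

# Fraction of physical RAM held resident by the given processes.
def resident_share(rss_kb, ram_kb)
  rss_kb.sum.to_f / ram_kb
end

total_mb = DISPATCHER_RSS_KB.sum / 1024.0
share    = resident_share(DISPATCHER_RSS_KB, RAM_KB)
puts format("dispatchers resident: %.0f MB (%.0f%% of RAM)", total_mb, share * 100)
```

With these figures the dispatchers alone hold about 765MB resident, roughly three quarters of physical memory, before the database and web server are counted.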

>
> COMMAND
> avirtual  8184  2.0 14.1 177920 145164 ?       S    13:35   2:23 /usr/bin/ruby1.8
> avirtual  8186  2.7 13.8 173764 141844 ?       S    13:35   3:12 /usr/bin/ruby1.8

Judging from your process times I doubt you need seven fastcgi
processes.  It looks like you sent this mail nine hours (at 22:32)
after starting these processes and they've each accumulated less than
three minutes of CPU time.  Try running just four.

How big is your app when you start it?  130MB to 180MB virtual is
alarmingly large.

--
Eric Hodel - drbrain@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com
Bogdan Ionescu (Guest)
on 2006-02-26 10:31
(Received via mailing list)
On 2/26/06, Eric Hodel <drbrain@segment7.net> wrote:
>
>
> You're using caching, right?  Judging from your process run times
> Rails doesn't see most of those requests.  Your processes should
> accumulate several minutes of CPU time if you're serving nearly a
> million requests per day.


The content is dynamic, with no static pages. The 'ps' was taken about
1 hour after killing several processes.
This is how it looks after 15 hours
(note that for the past 4-5 hours the server was more or less idle):

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
avirtual  8179  1.7 10.3 209064 105580 ?       S    Feb25  15:24 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8181  1.5 13.8 174060 142088 ?       S    Feb25  13:19 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8182  1.8 15.5 192668 159224 ?       S    Feb25  15:59 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8631  1.6  0.1 180692  1520 ?        S    Feb25  12:02 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8669  1.8 14.6 183140 149996 ?       S    Feb25  13:42 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8918  0.0  0.1 129200  1472 ?        S    Feb25   0:02 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual  8927  1.5 14.8 183840 151868 ?       S    Feb25  10:51 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi



>
> Judging from your process times I doubt you need seven fastcgi
> processes.  It looks like you sent this mail nine hours (at 22:32)
> after starting these processes and they've each accumulated less than
> three minutes of CPU time.  Try running just four.
>
> How big is your app when you start it?  130MB to 180MB virtual is
> alarmingly large.


Lighttpd was started 2-3 hours before I took the ps. It was not rush
hour.
Right after starting lighttpd, an idle dispatch.fcgi takes 53-63MB of
RAM.
The one or two that are active will quickly jump to 131MB.
After that, in a matter of hours, all of them will jump to 200-220MB.
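
As a rough worked example of what that growth means on a 1GB machine, the sketch below multiplies the reported per-process size by the number of dispatchers. It takes the 200-220MB figure at face value (it may refer to virtual rather than resident size), and the function name is made up for illustration.

```ruby
TOTAL_RAM_MB = 1024   # 1GB, as stated in the first post

# Footprint range (MB) if every dispatcher reaches the given size range.
def projected_footprint_mb(dispatchers, per_process_mb)
  (dispatchers * per_process_mb.min)..(dispatchers * per_process_mb.max)
end

[7, 5].each do |count|
  range = projected_footprint_mb(count, 200..220)
  puts "#{count} dispatchers: #{range.min}-#{range.max} MB of #{TOTAL_RAM_MB} MB RAM"
end
```

Seven dispatchers at that size would need 1400-1540MB, well beyond physical memory, and even five would need 1000-1100MB, so cutting the process count alone would not be expected to stop the swapping if each process still grows that large.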

In the meantime I've lowered the number of dispatchers to 5 (though it
seems the dispatchers are simply attempting to steal as much RAM as
possible) and compiled ruby on the server.
I am also going to try to spawn the fcgi's as separate processes and
see how it goes.
Bogdan
Bogdan Ionescu (Guest)
on 2006-02-26 18:07
(Received via mailing list)
Changing the way the dispatchers are started seems to have generated
immediate results.

Lighttpd.conf before:
fastcgi.server = ( ".fcgi" =>
  ( "localhost" =>
    ( "min-procs" => 3,
      "max-procs" => 5,
      "socket" => "/home/avirtual/railsData/log/fcgi.socket",
      "bin-path" => "/home/avirtual/railsData/public/dispatch.fcgi",
      "bin-environment" => ( "RAILS_ENV" => "production" )
    )
  )
)


Lighttpd.conf now:
fastcgi.server = ( ".fcgi" =>
  ( "localhost" =>
    ( "socket" => "/home/avirtual/railsData/tmp/railsData-0.socket" ),
    ( "socket" => "/home/avirtual/railsData/tmp/railsData-1.socket" ),
    ( "socket" => "/home/avirtual/railsData/tmp/railsData-2.socket" ),
    ( "socket" => "/home/avirtual/railsData/tmp/railsData-3.socket" ),
    ( "socket" => "/home/avirtual/railsData/tmp/railsData-4.socket" )
  )
)

I've created a spawner described in:
http://jamis.jamisbuck.org/articles/2006/02/11/tip...
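
A spawner for this layout mostly has to start one dispatcher per socket. The following is a rough sketch of that idea and not the script from the linked article: it only builds spawn-fcgi command lines (using spawn-fcgi's -f and -s options for the FastCGI program and the unix socket) and prints them. The helper names, the dispatcher count passed in, and the use of env to set RAILS_ENV are assumptions.

```ruby
RAILS_APP_ROOT = "/home/avirtual/railsData"   # path taken from the thread

# Socket paths in the same naming scheme as the new lighttpd.conf.
def socket_paths(count, app_root = RAILS_APP_ROOT)
  (0...count).map { |i| "#{app_root}/tmp/railsData-#{i}.socket" }
end

# Build (but do not run) one spawn-fcgi command per socket.
# Assumption: the dispatcher inherits RAILS_ENV from the environment.
def spawn_commands(count, app_root = RAILS_APP_ROOT)
  socket_paths(count, app_root).map do |socket|
    ["env", "RAILS_ENV=production",
     "spawn-fcgi", "-f", "#{app_root}/public/dispatch.fcgi", "-s", socket]
  end
end

# Printing is enough for a dry run; none of these paths contain spaces.
spawn_commands(5).each { |cmd| puts cmd.join(" ") }
```

To actually start the processes, each array could be handed to system(*cmd). Keeping the commands as arrays avoids shell quoting problems if the application path ever changes.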

Somehow the dispatchers use less RAM, and they do not jump to 130M
instantly.
It is obvious that the requests are now balanced in a logical way, and
that the first dispatchers handle most requests.
avirtual 19450  7.9  5.9  89876 60584 ?        S    07:08  23:17 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual 19452  1.3  5.6  87108 57848 ?        S    07:08   3:57 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual 19454  0.2  3.7  67476 38204 ?        S    07:08   0:51 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual 19456  0.0  3.9  70088 40756 ?        S    07:08   0:17 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
avirtual 19458  0.0  3.9  70160 40824 ?        S    07:08   0:07 /usr/bin/ruby1.8 /home/avirtual/railsData/public/dispatch.fcgi
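
How unevenly the work is spread can be read off the TIME column. The sketch below converts each value to seconds and computes every dispatcher's share of the total CPU time; the function names are illustrative, and it assumes the minutes:seconds format shown in this listing.

```ruby
# Convert a ps TIME value such as "23:17" (minutes:seconds) to seconds.
def cpu_seconds(time)
  minutes, seconds = time.split(":").map(&:to_i)
  minutes * 60 + seconds
end

# Share of total CPU time per process, as a fraction between 0 and 1.
def cpu_shares(times_by_pid)
  secs  = times_by_pid.transform_values { |t| cpu_seconds(t) }
  total = secs.values.sum.to_f
  secs.transform_values { |s| s / total }
end

# PID => TIME, copied from the listing above.
DISPATCHER_TIMES = { 19450 => "23:17", 19452 => "3:57", 19454 => "0:51",
                     19456 => "0:17", 19458 => "0:07" }.freeze

cpu_shares(DISPATCHER_TIMES).each do |pid, share|
  puts format("%d: %5.1f%%", pid, share * 100)
end
```

The first dispatcher accounts for a little over 80% of the accumulated CPU time, which matches the observation that the first listeners absorb most of the traffic and the later ones mostly sit idle.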

Performance has been steady during the day. I will know more in a
couple of hours or tomorrow, but it seems that although I haven't found
any 'omygod what a stupid endless loop' in the code, changing the
configuration helped more than I could have anticipated.

Now, maybe someone more experienced could try to explain why the
standard lighttpd configuration was so bad in my case.

Bogdan
Ezra Zygmuntowicz (Guest)
on 2006-02-26 23:00
(Received via mailing list)
On Feb 26, 2006, at 9:04 AM, Bogdan Ionescu wrote:

> [...]
> It is obvious that the requests are balanced in a logical way now,
> [...]

Bogdan-

There is nothing wrong with the standard Rails lighty conf that uses
min-procs/max-procs. But sometime around lighty 1.3.x, dynamic
spawning was removed from lighty, so min-procs doesn't have any effect
at all and lighty will always spawn as many processes as you set in
max-procs. For some unknown reason, though, the way this works can
get a little weird under heavier load with more fcgi's. The load
balancing between fcgi's doesn't seem to work as well with the
min-procs/max-procs directives and sockets. So, like you, I have had
much better luck with explicitly listing all fcgi listeners in lighty
and using spawn-fcgi to load the fcgi listeners standalone.

I have also had really good luck with using IP:PORTNUM listeners for
the fcgi's instead of sockets. It seems to me that lighty has an
easier time load balancing between listeners when it doesn't have to
think about it as much and the fcgi's are each listed explicitly.
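
For the IP:PORTNUM variant, the explicit listener list can be generated rather than typed by hand. The sketch below renders a fastcgi.server block that uses lighttpd's "host" and "port" backend options; the function name, the 127.0.0.1 address, the starting port 8000 and the "localhost-N" labels are placeholders rather than values from the thread.

```ruby
# Render a lighttpd fastcgi.server block with one TCP listener per port.
def fastcgi_server_block(host:, first_port:, count:)
  entries = (0...count).map do |i|
    format('    "localhost-%d" => ( "host" => "%s", "port" => %d )',
           i, host, first_port + i)
  end
  ['fastcgi.server = ( ".fcgi" =>', "  (", entries.join(",\n"), "  )", ")"].join("\n")
end

puts fastcgi_server_block(host: "127.0.0.1", first_port: 8000, count: 5)
```

Each listener would then be started separately on the matching address and port, for example with spawn-fcgi's -a and -p options in place of -s.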

I'm glad it's running for you. 200-250MB of RAM for each fcgi seems
a bit excessive. My fcgi's are usually between 25-80MB of RAM each. But
you are running a game, so maybe they are each doing more work and
holding more in memory than I am.

Cheers-
-Ezra
Bogdan Ionescu (Guest)
on 2006-02-26 23:15
(Received via mailing list)
It's been 10 hours since I've started lighttpd with the new
configuration.
The top dispatch.fcgi now uses 100MB. The other 4 are at 88-90MB.
Plus, no lag at all and the fifth dispatcher barely gets used.
Also the system load is under 1%.
There is something magical in this configuration ;)
Ezra Zygmuntowicz (Guest)
on 2006-02-26 23:27
(Received via mailing list)
On Feb 26, 2006, at 2:15 PM, Bogdan Ionescu wrote:

> It's been 10 hours since I've started lighttpd with the new
> configuration.
> The top dispatch.fcgi uses now 100MB. The other 4 are at 88-90MB.
> Plus, no lag at all and the fifth dispatcher barely gets used.
> Also the system load is under 1%.
> There is something magical in this configuration ;)
>

Cool, that's how lighty should behave ;^)

-Ezra