Server optimizations for php

Hi all,

I am planning to consolidate a 16-node php cluster onto a couple of machines.

The current machines have 2GB RAM and they run both nginx and php.
Each machine is running 5 php processes using php-fpm.

I would like to move all php requests to machines with 64GB RAM and a few CPU cores (4-8).
The systems are running linux.

My question is: what performance settings can I tune on the php processing machines?

Hi Jure,

Thanks for the valuable advice.
I will look into the CoolThreads servers from Sun. We usually buy from Sun, but mostly the x64 servers.

The php application is a typical CMS for a hosting company.

On Fri, 23 Jan 2009 15:24:01 +0100
Atif G. [email protected] wrote:

Hi Jure,

Thanks for the valuable advice.
I will look into the CoolThreads servers from Sun. We usually buy from Sun, but mostly the x64 servers.

The php application is a typical CMS for a hosting company.

Then you should analyze it further. Are you sure your bottleneck is really the cpu?

Because “typical CMS” usually means a poorly designed database, and that’s where your bottleneck is. For that kind of problem it’s usually more efficient in the long term to throw experienced programmers at it :)

Jure Pečar
http://jure.pecar.org
http://f5j.eu

Hi Jure,

The programs have been quite optimized already and now we need to throw more hardware at it.

Jure Pečar wrote:

On Fri, 23 Jan 2009 15:24:01 +0100
Atif G. [email protected] wrote:

Hi Jure,

Thanks for the valuable advice.
I will look into the CoolThreads servers from Sun. We usually buy from Sun, but mostly the x64 servers.

I tested a CoolThreads t2250 with 64 threads from Sun a couple of weeks ago. My conclusion was that for our php application, one thread wasn’t powerful enough to serve a php page fast enough, so in our case we would end up with a lot of parallel but slower processes. Our current x86_64 hardware could deliver the pages about 2 secs faster per php-cgi process.

The company I work for (http://www.hyves.nl) now has over 500 webservers, serving 200M+ pageviews a day and 17.7M per hour at peak. This is done with quadcores or 2 x quadcores with 8GB mem. I’m currently migrating them to nginx + php-fpm because we gain about 20% performance over apache.

Some tips I would like to give you,

  • use an opcode cacher like eaccelerator, apc or xcache; you can gain up to
    about 8x performance.
  • benchmark the ideal number of php-cgi processes for your application. Too
    many will lead to unnecessary context switches (which cost performance),
    waits for backend connections, etc. Too few processes will lead to wait
    times between nginx and php fastcgi. For our application I spawn about 5
    php-cgi's per cpu core.
  • If nginx and php-fpm run on the same host, use unix sockets; these are
    slightly faster than tcp sockets (see the config sketch after this list).
  • cache complex sql queries in memcached or in shared memory (all of the
    above opcode caches provide APIs for it).
  • Tune your nginx config accordingly. We also serve static content from
    the same webservers; we set different headers (expires etc.), matched by
    location regexes, to save bandwidth and cpu cycles.
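
As a minimal sketch of the unix-socket and expires tips above (the paths and values here are placeholders, not our production config):

```
# Static files, matched by a location regex, get far-future expires
# headers to save bandwidth and cpu cycles.
location ~* \.(css|js|gif|jpe?g|png|ico)$ {
    expires 30d;
}

# php is handed to the local php-fpm pool over a unix socket,
# avoiding tcp overhead on the same host.
location ~ \.php$ {
    include       fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass  unix:/var/run/php-fpm.sock;
}
```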

Don’t know if all the rules apply in your environment but see what you can use from the above tips :)

Marlon de Boer
System A. http://www.hyves.nl

On Fri, Jan 23, 2009 at 4:46 PM, Marlon de Boer [email protected] wrote:

Jure Pečar wrote:

On Fri, 23 Jan 2009 15:24:01 +0100
Atif G. [email protected] wrote:

Hi Jure,

Thanks for the valuable advice.
I will look into the CoolThreads servers from Sun. We usually buy from Sun, but mostly the x64 servers.

Hi Marlon,
firstly thanks for your helpful response.
Please see my comments below.

I tested a CoolThreads t2250 with 64 threads from Sun a couple of weeks ago. My conclusion was that for our php application, one thread wasn’t powerful enough to serve a php page fast enough, so in our case we would end up with a lot of parallel but slower processes. Our current x86_64 hardware could deliver the pages about 2 secs faster per php-cgi process.

I have ordered a T5120 as well for a test.
Usually we stick with x86_64 systems too.
If I understand correctly, you are suggesting to stay on x86_64 systems for the php processors.
We also use nginx+php-cgi+fpm.

The company I work for (http://www.hyves.nl) now has over 500 webservers, serving 200M+ pageviews a day and 17.7M per hour at peak. This is done with quadcores or 2 x quadcores with 8GB mem. I’m currently migrating them to nginx + php-fpm because we gain about 20% performance over apache.

Definitely. We have recently moved from apache+mod-php to nginx+php-cgi+fpm and are not going back.

Some tips I would like to give you,

  • use an opcode cacher like eaccelerator, apc or xcache; you can gain up to
    about 8x performance.

Yup, we are using eaccelerator.

  • benchmark the ideal number of php-cgi processes for your application. Too
    many will lead to unnecessary context switches (which cost performance),
    waits for backend connections, etc. Too few processes will lead to wait
    times between nginx and php fastcgi. For our application I spawn about 5
    php-cgi's per cpu core.

So with this calculation, on a 2x quadcore box there would be 40 php-cgi processes (5 per core x 8 cores).
Would it not then be better on a 32-core machine, where all these processes could use different cores?

  • If nginx and php-fpm run on the same host, use unix sockets; these are
    slightly faster than tcp sockets.

At the moment nginx and php-cgi share the host, but that will change.

In the future nginx will run on its own hosts and the php servers on their own.

The php servers will run php-cgi+fpm and only that.

The nginx servers serve some locally static content and some files from cache.
For the rest they redirect to a generic php backend (see the sketch below).
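
Roughly like this on the nginx side (just a sketch; the hostnames, port and docroot here are made up):

```
# Pool of dedicated php-fpm backends, reached over tcp since they
# run on separate hosts.
upstream php_backend {
    server 10.0.0.11:9000;
    server 10.0.0.12:9000;
}

location ~ \.php$ {
    include       fastcgi_params;
    fastcgi_param SCRIPT_FILENAME /var/www/$host$fastcgi_script_name;
    fastcgi_pass  php_backend;
}
```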

  • cache complex sql queries in memcached or in shared memory (all of the
    above opcode caches provide APIs for it).

Yes, we heavily use caching, but neither memcached (it did not work for us) nor shared memory, as the cache must persist across all hosts.
We are a cms provider and each cms instance runs from its own db and under its own name, so caching a “select * from users” would need to be done per domain. Anyway, on the caching side we are fine.
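
For illustration, the per-domain keying itself could look something like this (a sketch, not our actual code; it uses eaccelerator's cache api with an assumed db wrapper method), though as said a per-host cache does not fit our cross-host requirement:

```
<?php
// Illustrative only: prefix cache keys with the CMS instance's domain
// so the same query cached for website1.com and website2.com never
// collides.
function cache_key($domain, $sql) {
    return $domain . ':' . md5($sql);
}

function cached_query($db, $domain, $sql, $ttl = 300) {
    $key = cache_key($domain, $sql);
    $hit = eaccelerator_get($key);   // eaccelerator shared-memory api
    if ($hit !== null) {
        return unserialize($hit);
    }
    $rows = $db->fetchAll($sql);     // assumed db wrapper method
    eaccelerator_put($key, serialize($rows), $ttl);
    return $rows;
}
```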

  • Tune your nginx config accordingly. We also serve static content from
    the same webservers; we set different headers (expires etc.), matched by
    location regexes, to save bandwidth and cpu cycles.

We separate static content into locally static (css for website1.com) and globally static (a javascript file that is used on all websites).
For the globally static content we use yet other nginx servers.

Don’t know if all the rules apply in your environment but see what you can use from the above tips :)

All the tips are very useful.

On Jan 24, Atif G. wrote:

files and smarty generated templates in the RAM.
With the low prices of RAM it won't make a hole in the budget either.

Oh, and for page views we have around 0.5M at the moment, but this is growing quite fast.

Atif, I’d suggest you do not go about buying beefier machines without strong justification.

RAM scaling

I’ve worked with many websites that do millions of requests a day on apache+php and am yet to see an application that needs even 32GB of RAM. When you are consolidating web servers, the following math does not hold in general: ‘x’ machines with ‘y’ amount of RAM each need to be replaced by one machine with x*y amount of memory.

In a well designed web application, large portions of memory are used by shared resources (code segments, shared cache, etc.). The amount of memory required for actual connection handling is very low (not even close to 1MB per request); this is all the more true for nginx as compared to apache. So memory should be scaled less than linearly when you consolidate servers; see the back-of-envelope example after the list below. The only exception is when both of the following conditions hold true:

  1. Large amounts of memory are used purely for application caching
  2. There is some sort of well defined partitioning of cached data
    across nodes, ensuring that the overlap is minimal
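
As a back-of-envelope illustration (all numbers here are made up for the example, not measurements):

```
naive consolidation math:  16 nodes x 2GB           = "32GB needed"
what actually multiplies:  per-request memory, say
                           100 workers x 1MB        = ~0.1GB
what is counted once:      code segments, opcode and
                           shared caches, say       = ~1-2GB
realistic total:           a few GB, nowhere near 32GB
```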

Where nginx should make a huge difference over apache: if your php application is I/O bound and a large number of connections need to be held in a waiting state, then nginx should help with better memory and cpu usage.

CPU scaling

As you might know, adding more CPUs scales less than linearly. Much of the problem arises from lock contention. So unless your entire application stack is known to scale well on a large number of cores, do not try to use something like a 64-core machine; the results might be devastating.

Jure and Marlon,

Here is some more information which might give some more insight.

The current system looks like this:

16 x single-core AMD based systems with 2GB RAM.
Each system is running nginx + php-cgi+fpm (10 instances of php-cgi).
Here is a graph of a typical day's CPU load:

Here is a weekly graph, which shows almost no CPU activity for half a day on the 21st.

The only thing I changed was redirecting the php requests to a dedicated server.

And here you can see the graph for one server on the 21st, between 00:00 and 15:00:

It seems like the server was doing nothing while getting the same amount of requests.
So from this I know that moving the php-cgi to dedicated hosts will allow me to scale the system better.

Now about the php processes.
I have a memory log for each php request, and each takes between 0.5MB and 2.5MB (this is after the eaccelerator opcode cache).
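
Taking those measurements at face value, a quick sanity check along Arvind's lines (the worker count is an assumption for the math, not our current config):

```
40 php-cgi workers x 2.5MB worst case per request  = ~100MB
with 10x headroom for growth                       = ~1GB
```

So RAM alone is unlikely to be the limiting factor.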

But this is now. I want to scale the system so it stays comfortable for the next 2 years.
So would 2-4 of these servers (Hardware | Oracle) with 32GB RAM be overkill?

Or would these be more suitable: 2-4 Hardware | Oracle servers with 8 cores and 32-64GB RAM?

Price-wise I think they will come to more or less the same.
I like the idea of taking some extra RAM and putting the tmp files, cached opcode files and smarty generated templates in RAM.
With the low prices of RAM it won't make a hole in the budget either.
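
On linux that can be done with tmpfs mounts; a sketch with assumed paths and sizes:

```
# /etc/fstab: keep smarty compiled templates and tmp files in RAM.
# Contents vanish on reboot, which is fine since smarty recompiles
# templates that are missing.
tmpfs  /var/www/smarty_compile  tmpfs  size=512m,mode=0755  0 0
tmpfs  /var/www/tmp             tmpfs  size=256m,mode=1777  0 0
```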

Oh, and for page views we have around 0.5M at the moment, but this is growing quite fast.

Thanks and keep the ideas coming.

best regards
Atif

Arvind,

Thanks for your helpful suggestions.

I will start doing tests with different classes of machines and pick the ones that work best.

I will report my results to the list.
best regards


On Fri, Jan 23, 2009 at 4:46 PM, Marlon de Boer [email protected] wrote:

I tested a CoolThreads t2250 with 64 threads from Sun a couple of weeks ago. My conclusion was that for our php application, one thread wasn’t powerful enough to serve a php page fast enough, so in our case we would end up with a lot of parallel but slower processes. Our current x86_64 hardware could deliver the pages about 2 secs faster per php-cgi process.

Marlon,
I have just finished testing on the T5210 with 64 threads and have come to the same conclusion as you.
Thanks for the correct advice. I had to try it out myself though.

best regards