A hardware question

I ordered some servers from my provider - they were supposed to be 3
identical ones. However, they gave me 3 different ones, each with a
different processor.

a) Single Processor Dual Core Xeon 5160 - 3.00GHz (Woodcrest) - 1 x 4MB
cache

b) Single Processor Quad Core Xeon 5310 - 1.60GHz (Clovertown) - 1 x
8MB cache (low voltage one)

c) Single Processor Quad Core Xeon 5335 - 2.00GHz (Clovertown) - 1 x
8MB cache (low voltage one)

All of them have a single SATA2 disk and 2GB of RAM.

Which one would be best for nginx + PHP/FastCGI? I know there is a
debate on more cores vs. faster clock speed, and certain apps take
better advantage of one or the other. I think nginx will use so few
resources that the dual-core with the higher clock speed would be better…

There are a handful of FastCGI pools running under different uid/gids,
so those processes can bounce around to different cores often, and nginx
can have 4 workers, each bound to a single core. So perhaps having
4 individual cores able to service requests is better than 2 faster ones?
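
Roughly what I have in mind for the nginx side (just a sketch - the
affinity masks assume a 4-core box):

    worker_processes      4;
    # one mask per worker, one bit per core (assumes 4 cores)
    worker_cpu_affinity   0001 0010 0100 1000;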

I plan on growing to up to 2 million PHP requests per day (per machine)
and maybe 5 or 6 million static files (video, images, HTML, etc.).

Just wondering what you guys would prefer?

The single SATA disk will be your bottleneck, followed by system time.
I’d reserve the machines with the largest physical cache for PHP; that’s
where you will see the biggest payoff.

IMHO

Dave

physical cache being L2/L3 cache type stuff?

Yeah, I shouldn’t be hitting the SATA bottleneck. Right now most of it is
served via NFS; I’m hoping to migrate scripts/templates to local disk,
and the data to MogileFS (which will run on these same machines).

As Igor suggested, serving files directly from NFS will cause the
workers to stall. You should be able to compensate by using more
workers, perhaps 2 or 3 per physical CPU, but it depends heavily on the
setup of your NFS server, the network in between, etc.

Understood. I don’t -want- to use NFS, but nobody else has given me
any other options. I tried iSCSI+OCFS2, and that had some odd issues
and I am not sure it was reliable enough for a low-latency web
environment with millions of files.

I’m pretty OCD, I’d like all my machines to match, and I have the
ability right now to get them synced up before I start using them.

Also, would FreeBSD or Linux be better for the dual or quad core? Last
answer I got was nginx probably works better under FBSD. NFS works
better under FBSD too. My NFS server is already FBSD…

On 27.04.2008 at 22:35, mike wrote:

better under FBSD too. My NFS server is already FBSD…
We are using AFS (Andrew File System), which can have big caches on
every machine, and it performs very well, although it was a little bit
problematic to set up. There are now 60GB of data in the AFS, but only
a fraction of it is served daily.

HTH,

__Janko

On Sun, Apr 27, 2008 at 01:35:06PM -0700, mike wrote:

Understood. I don’t -want- to use NFS, but nobody else has given me
any other options. I tried iSCSI+OCFS2, and that had some odd issues
and I am not sure it was reliable enough for a low-latency web
environment with millions of files.

We use proxying in this case instead of NFS:

client > nginx (1) > nginx

On nginx (1) it’s better to set “proxy_max_temp_file_size 0” for
the proxied location.
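
Roughly (the upstream name and address here are only placeholders):

    # on nginx (1); "nfs_backend" and 192.168.1.10 are placeholders
    upstream nfs_backend {
        server   192.168.1.10:80;
    }

    location /files/ {
        proxy_pass                 http://nfs_backend;
        proxy_max_temp_file_size   0;
    }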

I’m pretty OCD, I’d like all my machines to match, and I have the
ability right now to get them synced up before I start using them.

Also, would FreeBSD or Linux be better for the dual or quad core? Last
answer I got was nginx probably works better under FBSD. NFS works
better under FBSD too. My NFS server is already FBSD…

Use the OS that you know better. I think FreeBSD and Linux are both good
for nginx.

On Sun, Apr 27, 2008 at 10:52:59PM -0700, mike wrote:

In your opinion, would you go with lower clock speed quad cores, or
higher clock speed dual cores?

From my experience, nginx needs more CPU in 3 cases:

  1. nginx does a lot of gzipping (see the example below),
  2. nginx handles many SSL connections,
  3. the kernel processes a lot of TCP connections.
    “A lot” means at least about 3,000 requests/s.
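
For the gzip case, the compression level is the main CPU knob; something
like this (example values only) keeps the cost down:

    gzip              on;
    gzip_comp_level   1;    # level 1 is much cheaper on CPU than higher levels
    gzip_types        text/css application/x-javascript;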

I would use quad core for PHP and dual core for nginx.

Igor S. wrote:

We use proxying in this case instead of NFS:

client > nginx (1) > nginx

On nginx (1) it’s better to set “proxy_max_temp_file_size 0” for
the proxied location.

You recommend using proxying for mass file serving. That is my case:
I use nginx and a specific module that I have coded (to simplify
things, it controls access to files and redirects the client via
ngx_http_internal_redirect).

I had thought of using NFS to serve the files. I don’t understand how to
use proxying instead of that.

Okay.

In your opinion, would you go with lower clock speed quad cores, or
higher clock speed dual cores?

Right now I’m doing the same thing with nginx (1) -> nginx myself. I
might switch back to something else so I can do healthchecks and
remove servers from the pool.

I’ll probably stick with Linux for the clients as I am more
comfortable with those. I won’t need to do much work on the NFS server
so that’s fine keeping it FBSD…

I’d really appreciate your opinion though on the hardware.

I am actually thinking about making a MogileFS passthrough

(I posted an email about it, looks like someone in Japan figured it out)

It would allow for minimal PHP/Perl/Python/whatever language to be
involved, just enough to look up where a file is. Then pass that back
to nginx with “file is on $server” and use X-Accel-Redirect to
bounce to that server…

Haven’t done it yet, but I think it would work amazingly well!
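
On the nginx side it would probably just need an internal location that
proxies to the storage node; a rough sketch (the /reproxy/ path and
storage01:7501 are made up, and picking the right node per request is
exactly the lookup part the app would have to handle):

    # the app would return "X-Accel-Redirect: /reproxy/<path on the storage node>"
    location /reproxy/ {
        internal;
        proxy_pass   http://storage01:7501/;
    }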

On Mon, Apr 28, 2008 at 12:16:59PM +0200, Chavelle V. wrote:

I use nginx and a specific module that I have coded (to simplify
things, it controls access to files and redirects the client via
ngx_http_internal_redirect).

I had thought of using NFS to serve the files. I don’t understand how to
use proxying instead of that.

Set up nginx on the NFS server and proxy to it instead of getting files
via NFS:

    location /files/ {
         proxy_pass                 http://nfs_server_nginx;
         proxy_max_temp_file_size   0;
    }

I know there are MogileFS client libraries for Perl/PHP/Ruby, but if
someone could code a C library for MogileFS, it would then be possible to
make an nginx module to support it. I guess this would be more efficient
than going down the Perl/PHP/Ruby route.

Does it make sense at all? Would the benefits outweigh the extra
efforts to code the nginx module?

-Liang

kingler from 72pines

If some enterprising C hacker wanted to work on this, Engine Y.
would pay a bounty to make it happen, as long as it is released open
source. Contact me if you are interested in working on this project.

Cheers-

I would also contribute a few hundred bucks if it was developed and
released under an open source license (so we could use it and other
people could enhance it, etc.!)

I think it would be something worthwhile myself.

It just needs parameters - the tracker database info and the domain,
right? Then you just use the domain + key to get the file. Maybe the
domain isn’t even required.

However, wouldn’t you need some sort of logic to determine how to
create the key?

Some people use an md5 of the URI I guess, so it would be something like

foo.com/media/abc72849482bc8398af98effd9g843

/media/ would be mapped to something like

Option #1 would be for nginx to do the MySQL connection itself.
Option #2 would be for nginx to just connect to the tracker (this is
probably best), since it’s a simple socket connection:

    location /media/ {
        mogilefs          on;
        mogilefs_domain   images;

        # option #1
        mogilefs_tracker_username   mogilefs;
        mogilefs_tracker_password   foo;
        mogilefs_tracker_host       192.168.1.203;
        mogilefs_tracker_database   mogilefs;

        # option #2
        mogilefs_tracker_host       192.168.1.202:7001;

        # and perhaps something like "mogilefs_key_style hash;" or something…
        # (used to determine how the keys are actually set up)
    }

I think the biggest missing piece of the puzzle is how to determine
the key structure. Otherwise I am sure it’s quite simple, and it uses
dav/HTTP to retrieve the files anyway, so the C code needed would be:

option #1:

  1. mysql client
  2. something which can do efficient HTTP GETs with offset support
    (probably already in nginx)

option #2:

  1. something that can do a simple socket connect, ask the tracker for
    info, and then pass it on to step 2 (probably already in nginx)
  2. something which can do efficient HTTP GETs with offset support
    (probably already in nginx)

If I knew anything worthwhile in C I would try it myself. I am sure
someone even with basic C skills could hack it up really quick. It
just needs one socket to talk to the tracker, then depending on the
reply, open a second connection to grab the file…

I’m thinking it would be quite easy, once the key hashing style was
determined.

I.e., how will it create the MogileFS key, and then how do you emulate
that in the application when storing the file?

Getting the file would be like:
GET http://foo.com/somepath/foo.jpg

How do you translate that to a
domain + key in MogileFS?

And when the file is being stored in MogileFS, how do you create that key
so http://foo.com/somepath/foo.jpg matches up with it?

I’m going to see if I can crosspost to the mogilefs mailing list and
see if anyone there is interested in this and has any ideas.

I don’t think there is any reason this mod_mogilefs would need any
write support. I’d see the application level as the place where files
are added to the MogileFS cluster; I see nginx’s main purpose as
interfacing just to get the files.

I believe when the request comes in, nginx will check to see if that
location has mogilefs on; - if it does, then it applies the key “hashing”
mechanism (TBD) there. Then it connects to the MogileFS tracker using
a simple socket connection and issues a “get_paths $key”, which returns
a list of storage node(s) with the file. Then nginx would basically
mimic a standard GET through an HTTP upstream proxy to one of those
servers (or perhaps add them all to a list and try them in order, just
like it does today with proxy_next_upstream).

Since mogstoreds are just normal web servers with DAV support (you can even
use nginx for that), it’s just a simple HTTP GET to get the file. No
scripting-level language needed…
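
For the storage node side, a rough sketch of what an nginx-based mogstored
replacement might look like (assumes nginx is built with the DAV module;
the port and data path are just examples):

    server {
        listen   7501;
        root     /var/mogdata;

        location / {
            # writes come in as DAV PUTs, reads are plain GETs
            dav_methods            PUT DELETE MKCOL;
            create_full_put_path   on;
            dav_access             user:rw group:r all:r;
        }
    }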

I think this would be a -very- simple thing to do. If one knows C,
that is. The biggest thing is how to hash the URL or URI in a
consistent fashion that can be done in PHP/Perl/Python/whatever and
later retrieved the same way through nginx.

mike wrote:

[talking about a MogileFS module for Nginx]

foo.com/media/abc72849482bc8398af98effd9g843

/media/ would be mapped to something like

[…]

I think that the better solution is to extend the nginx upstream module
so that the list of backend servers (address, port and weight) can be
obtained at request runtime, and not only at configuration time.

This means that a module should only implement the logic to obtain the
backends list, and all the work can be done by Nginx.

However this assumes that on every backend, the URI to the file is the
same (but this should not be a problem, if Nginx is on each node).

Regards Manlio P.

On 4/30/08, Manlio P. wrote:

I think that the better solution is to extend the nginx upstream module so
that the list of backend servers (address, port and weight) can be obtained
at request runtime, and not only at configuration time.

How would you implement this so it doesn’t require interaction with an
application to do key lookup?

MogileFS stores the files in
http://mogstored-server:7501/dev1/0/000/000/0000000005.fid

Something has to map what the user typed in to the mogstored format.

This means that a module should only implement the logic to obtain the
backends list, and all the work can be done by Nginx.

However this assumes that on every backend, the URI to the file is the same
(but this should not be a problem, if Nginx is on each node).

I believe it’s the same with mogile.

Oh, and it would be nice if there was an X-Accel-Mogilefs: header (or
something) where you passed it the file domain/key. That way the apps
can do application-level checks (like user authentication, etc.) and
then hand it off to nginx to fulfill the request instead of throwing an
access denied.