Horizontal scaling - advice needed


#1

Hi folks,

I posted this in railsweenie, but it’s not really a Rails question, and the post isn’t getting any answers.

I’m not an experienced web developer. I’m still rather groping around; I come from a client-server background. I’ve plumped for Rails, although this post’s focus isn’t really Rails.

I’m putting together a relatively simple site which I want to design from the ground up for horizontal scalability, partly for the challenge, partly because I need to learn and get experience. To help me do this I am going to run at least two virtual machines to enforce the correct environment.

Currently my idea is to federate the data so that the users are divided between the two or more machines, perhaps splitting alphabetically by user name (i.e. A-G to machine 1, etc.). Where there is interaction between account holders I am thinking of using DRb. Obviously Rails is not going to be able to do the interaction side of things, but I am fine with that; I’m prepared for a bit of manual labour.
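To make the splitting idea concrete, the lookup might be sketched like this in Ruby (the machine names, letter ranges, and helper name are invented for illustration, not from any library):

```ruby
# Hypothetical sketch of the alphabetical split: map a login name to a
# machine by its first letter. SHARDS and shard_for are made-up names.
SHARDS = {
  ('a'..'g') => 'machine1.example.com',
  ('h'..'p') => 'machine2.example.com',
  ('q'..'z') => 'machine3.example.com'
}

def shard_for(login)
  first = login.to_s.downcase[0, 1]
  SHARDS.each { |range, host| return host if range.include?(first) }
  'machine1.example.com'  # catch-all for digits, punctuation, etc.
end
```

One obvious wrinkle with any such scheme is that the split has to stay stable as machines are added, or accounts have to migrate.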

I would love comments/advice on my above ideas, and further insights into horizontal scaling.

But also…

To facilitate the above I need some kind of proxy in front of the two machines, directing incoming requests to the correct machine based on the login name, which will be part of the URL. Here I come unstuck, since I have no idea how to do this.

There must be proxies of this kind, but I’ll be blowed if I know what an
appropriate one would be, or where to start in making it do what I want.

Can anyone give me a few pointers? Is Squid the thing? Mongrel (I don’t really know what Mongrel is)? Can Apache be made to do this, and if so is it a bad idea? Obviously it needs to be pluggable, since I’ll be using my own code (C or Pascal) to do the lookups for the redirection.

Thanks for your words of wisdom,

Greg


#2

On 07.03.2007 18:35, Greg Loriman wrote:

> I’m putting together a relatively simple site which I want to design from the ground up for horizontal scalability, partly for the challenge, partly because I need to learn and get experience. To help me do this I am going to run at least two virtual machines to enforce the correct environment.
>
> Currently my idea is to federate the data so that the users are divided between the two or more machines, perhaps splitting alphabetically by user name (i.e. A-G to machine 1, etc.). Where there is interaction between account holders I am thinking of using DRb. Obviously Rails is not going to be able to do the interaction side of things, but I am fine with that; I’m prepared for a bit of manual labour.

I am not sure whether I understand your “interaction” correctly. What types of things would have to be dealt with between users that are not handled through persistent state?

> There must be proxies of this kind, but I’ll be blowed if I know what an appropriate one would be, or where to start in making it do what I want.
>
> Can anyone give me a few pointers? Is Squid the thing? Mongrel (I don’t really know what Mongrel is)? Can Apache be made to do this, and if so is it a bad idea? Obviously it needs to be pluggable, since I’ll be using my own code (C or Pascal) to do the lookups for the redirection.
>
> Thanks for your words of wisdom,

Just a quickie as I’m on my way out: your DRb’ing will certainly hurt horizontal scalability, apart from the issue of finding instances etc. If possible, you should build your app in a way that it does not need this. Ideally you create it so that each HTTP request can be satisfied by communicating with the backend store (database) only.

It’s probably also ok to assume some session stickiness as load
balancing routers can do that (for example based on IP) and this seems a
fairly common scenario. If not, you need some mechanism to make session
information available to all app servers (either via the backend store
or via some other mechanism).

Kind regards

robert


#3

Greg Loriman wrote:

> Hi folks,
>
> I posted this in railsweenie, but it’s not really a Rails question, and the post isn’t getting any answers.

Maybe try: http://www.ruby-forum.com/forum/3

> I would love comments/advice on my above ideas, and further insights into horizontal scaling.
> […]
> Can anyone give me a few pointers? Is Squid the thing? Mongrel (I don’t really know what Mongrel is)? Can Apache be made to do this, and if so is it a bad idea? Obviously it needs to be pluggable, since I’ll be using my own code (C or Pascal) to do the lookups for the redirection.

One approach that works quite well is to use Apache 2.2 with mod_proxy_balancer in front of a cluster of Mongrel processes, while keeping session state in the database. That should scale horizontally quite nicely until the database becomes the bottleneck; by then, I expect you’d have a pretty respectable amount of volume. I have no idea how much traffic it would take to swamp MySQL on a beefed-up server with super-fast disks, but my guess is it’s enough to allow you to pay someone to cluster MySQL :-)
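For reference, the Apache 2.2 side of that setup is only a few lines. This is a minimal sketch with example ports, not a production config:

```apache
# Minimal mod_proxy_balancer front-end for two local Mongrel processes.
<Proxy balancer://mongrels>
  BalancerMember http://127.0.0.1:8000
  BalancerMember http://127.0.0.1:8001
</Proxy>
ProxyPass / balancer://mongrels/
ProxyPassReverse / balancer://mongrels/
```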

http://mongrel.rubyforge.org/

http://blog.codahale.com/2006/06/19/time-for-a-grown-up-server-rails-mongrel-apache-capistrano-and-you

http://www.simplisticcomplexity.com/2006/8/13/apache-2-2-mod_proxy_balancer-mongrel-on-ubuntu-6-06

I’ve been down the road of traditional clustering, with appservers sending messages to each other furiously duplicating state, and also with session affinity to avoid some of the duplication, but presently I feel the “shared nothing” architecture is best for me. It has some successful precedents with high-volume sites, and you basically get it for free with Ruby/Rails/Apache/Mongrel.


#4

On Thu, 8 Mar 2007, Greg Loriman wrote:

> a bit of manual labour.

DRb would rapidly become a bottleneck there. And explicitly federating your data like that seems needlessly complicated.

Put a fast proxy of some sort in front of your backend processes. When
you need more throughput, add another machine and some more processes.

> To facilitate the above I need some kind of proxy in front of the two machines directing incoming requests to the correct machine based on the login name which will be part of the URL. Here I come unstuck since I have no idea how to do this.
>
> There must be proxies of this kind, but I’ll be blowed if I know what an appropriate one would be, or where to start in making it do what I want.

If I were doing something that required some very specific proxying behavior that I couldn’t get with an off-the-shelf solution (HAProxy is a very nice general-purpose proxy with a ton of features), I’d write a purpose-built proxy. It’s not really too hard to do.
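To give a flavour of how small a purpose-built proxy can be, here is a deliberately naive Ruby sketch: backend addresses are invented, and there is no keep-alive, pipelining, or error handling. It routes on the login name in the URL path and relays bytes:

```ruby
require 'socket'

# Invented backend map: logins a-g on one Mongrel, the rest on another.
BACKENDS = { :a_to_g => ['127.0.0.1', 3001], :h_to_z => ['127.0.0.1', 3002] }

# Pick a backend from a request path like "/greg/profile".
def backend_for(path)
  login = path.split('/')[1].to_s.downcase
  login[0, 1].between?('a', 'g') ? BACKENDS[:a_to_g] : BACKENDS[:h_to_z]
end

# Thread-per-connection relay loop; call run_proxy(8080) to start it.
# One request per connection, whole response buffered: a toy, not HAProxy.
def run_proxy(port)
  server = TCPServer.new(port)
  loop do
    Thread.new(server.accept) do |client|
      request = client.readpartial(65_536)       # enough for small GETs
      path = request[/\A\S+ (\S+)/, 1] || '/'
      host, backend_port = backend_for(path)
      upstream = TCPSocket.new(host, backend_port)
      upstream.write(request)
      client.write(upstream.read)                # relay entire response
      upstream.close
      client.close
    end
  end
end
```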

Kirk H.


#5

> DRb would rapidly become a bottleneck there.

I’ve not seen any articles on the characteristics and limitations of DRb, but it isn’t immediately apparent to me that it should be a bottleneck. Why do you think it would be a problem?

I can only see that in the longer term there might be a network traffic problem if I had a lot of machines. I haven’t worked it out yet, but perhaps the growth in inter-machine traffic would not be linear.
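A back-of-envelope check is easy here: if users are spread evenly over n machines and interact with effectively random partners, the fraction of interactions that cross machines is (n - 1) / n, so aggregate inter-machine traffic grows roughly in proportion to total traffic and approaches 100% of interactions as machines are added.

```ruby
# Fraction of random pairwise interactions that land on different
# machines when users are spread evenly over n machines.
def cross_shard_fraction(n)
  (n - 1).to_f / n
end

# 2 machines: half of all interactions cross the network; 10 machines: 90%.
```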

> And explicitly federating your data like that seems needlessly complicated.

I would agree, except that the application is simple while at the same time such federation is the ultimate in horizontal scaling AFAIK, and that is the exercise I would like to embark on. I want to find out the ins and outs, costs and difficulties.

> Put a fast proxy of some sort in front of your backend processes. When you need more throughput, add another machine and some more processes.

Eventually the database will be the bottleneck, hence federation.

> There must be proxies of this kind, but I’ll be blowed if I know what an appropriate one would be, or where to start in making it do what I want.

> If I were doing something that required some very specific proxying behavior that I couldn’t get with an off-the-shelf solution (HAProxy is a very nice general-purpose proxy with a ton of features), I’d write a purpose-built proxy. It’s not really too hard to do.

I’ll keep that in mind. I hope you are right. But I’ve always had the idea that TCP/IP is a somewhat painful affair, and I suspect one would have to drop to that level to get adequate efficiency/speed to act as a router/redirector of incoming requests.

Thanks for the words.


#6

> I am not sure whether I get your interaction correctly. What types of things would have to be dealt with between users that are not done through persistent state?

By interaction I just mean recording user relationships, like when one user records another user as a friend on Slashdot. There’ll be a ‘friend’ table with two foreign keys.

In my naive idea of things each machine would be equivalent to the others. They would each have webserver -> Rails -> database. Interaction between users means recording a new relationship between two users, normally quite straightforward on a single database, but requiring inter-machine communication and a certain amount of fiddling about where the user accounts are on different machines/database backends. Two-phase commit comes to mind, but I intend to work around the lack of that.
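One common shape for such a workaround, purely as a sketch (the Shard class below is an in-memory stand-in, not a real database adapter), is to write one side first and compensate if the second write fails:

```ruby
# In-memory stand-in for one machine's 'friend' table.
class Shard
  attr_reader :friends
  def initialize
    @friends = []
  end
  def add_friend(a, b)
    raise 'duplicate friendship' if @friends.include?([a, b])
    @friends << [a, b]
  end
  def remove_friend(a, b)
    @friends.delete([a, b])
  end
end

# Write to shard_a, then shard_b; undo the first write if the second
# fails. Weaker than two-phase commit (the undo itself can fail), but
# often good enough in practice.
def record_friendship(shard_a, shard_b, user_a, user_b)
  shard_a.add_friend(user_a, user_b)
  begin
    shard_b.add_friend(user_b, user_a)
  rescue StandardError
    shard_a.remove_friend(user_a, user_b)   # compensating action
    raise
  end
end
```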

In other words I’m shifting the scalability problem to the network (routing, switches, etc.). I may also address the networking problem, in the distant future, by migrating accounts based on usage patterns, so that user “interactions” will tend to be local to one machine. Obviously that is a strategy with some interesting problems to be solved, as you can probably immediately guess, which I am looking forward to.

> Just a quickie as I’m on my way out: your DRb’ing will certainly hurt horizontal scalability - apart from the issue of finding instances etc.

Do you think that my answer above addresses that?

> It’s probably also ok to assume some session stickiness as load balancing routers can do that (for example based on IP) and this seems a fairly common scenario. If not, you need some mechanism to make session information available to all app servers (either via the backend store or via some other mechanism).

Definitely. I have had this in mind. Perhaps ultimately I would end up with two (or more) domain (as in data) databases, one session database, one proxy-redirection database, and the proxy-redirector itself. And one day each running on its own machine. Right now I am imagining several VMware instances to allow for development.


#7

> I’ve been down the road of traditional clustering, with appservers sending messages to each other furiously duplicating state, and also with session affinity to avoid some of the duplication, but presently I feel the “shared nothing” architecture is best for me. It has some successful precedents with high-volume sites, and you basically get it for free with Ruby/Rails/Apache/Mongrel.

I would agree in general, but as you mentioned earlier, eventually something is going to become a bottleneck if user growth is enormous. I expressly wish to explore the clustered-appserver side of things, since it does ultimately scale endlessly. This particular app is simple enough for me to consider trying my hand at it.

So it is interesting to read your comments about “furious duplication” and session affinity. Were there other notable features to such a system, and to the development of it?

Greg


#8

On Thu, 8 Mar 2007, Greg L. wrote:

> DRb would rapidly become a bottleneck there.

> I’ve not seen any articles on the characteristics and limitations of DRb, but it isn’t immediately apparent to me that it should be a bottleneck. Why do you think it would be a problem?

Sure. Because while DRb is terribly cool and useful for a lot of things, it’s not particularly fast. Even if everything that you do with it is very lightweight, it is not hard for it to become a bottleneck when talking about heavy, large-scale usage. But benchmark it yourself for the use case you envision, and see if it’ll move the volume of traffic that you would need it to.

> I would agree, except that the application is simple while at the same time such federation is the ultimate in horizontal scaling AFAIK, and that is the exercise I would like to embark on. I want to find out the ins and outs, costs and difficulties.

Yeah, but you should be able to scale to a very large amount of traffic before having to even consider adding in the complication of federating your data based on some criterion.

> Eventually the database will be the bottleneck, hence federation.

Again, though, you’ll have to scale very, very large before you ever have to worry about the DB being a bottleneck, provided you pay attention to schema design and proper DB tuning.

> I’ll keep that in mind. I hope you are right. But I’ve always had the idea that TCP/IP is a somewhat painful affair, and I suspect one would have to drop to that level to get adequate efficiency/speed to act as a router/redirector of incoming requests.

It’s not that hard, even writing it in a language like Ruby. Over the last two weeks I have implemented a very fast, purpose-built proxy for clustering web apps. It’s not a big, general-purpose tool like HAProxy. It’s narrowly targeted, and for its narrow purpose it is comparable in speed to HAProxy (faster as of the last tests this weekend, but I am adding some code that will cut into that a bit).

Kirk H.


#9

Greg L. wrote:

> since it does ultimately scale endlessly. This particular app is simple enough for me to consider trying my hand at it.
>
> So it is interesting to read your comments about “furious duplication” and session affinity. Were there other notable features to such a system, and to the development of it?

Probably the most “notable” feature was that I really do want to do it again :-) I don’t know of any highly scalable architectures that use that type of approach - they could very well exist, I just don’t know of them. On the other hand, I know of several using the “shared nothing” approach.

It might be enlightening for you to perform some tests to determine what
type of load is required to cause the database to become the limiting
factor in your growth. It may be premature for you to worry about it. It
might be cheaper for you to simply purchase a faster database server.

This should do:
http://www.sun.com/servers/highend/sunfire_e25k/index.xml

Or even something more attainable:
http://www.dell.com/content/products/productdetails.aspx/pedge_6800?c=us&l=en&s=bsd&cs=04

If we ignore the fact that you most likely can’t create a workload large enough to swamp a fast database server (and if you can, you’ll have no trouble hiring some brilliant performance freaks to handle the mundane scaling task while you do the fun stuff), you may want to consider other ways of reducing the limiting factor of the database besides partitioning your data - maybe one database instance for inserts/updates, with replication to several read-only database instances, for example.

Google for livejournal architecture - there is some interesting info
regarding how they scaled.


#10

On Mar 7, 2007, at 3:41 PM, removed_email_address@domain.invalid wrote:

> Eventually the database will be the bottleneck, hence federation.

> Again, though, you’ll have to scale very, very large before you ever have to worry about the db being a bottleneck, provided you pay attention to schema design and proper db tuning.

I think that long before the database becomes a performance bottleneck, you’ll be worried about it being a single point of failure.

Redundancy and availability issues are tricky even for low-volume
applications.

Gary W.


#11

On Thu, Mar 08, 2007 at 02:40:07AM +0900, Greg Loriman wrote:

> a bit of manual labour.
>
> I would love comments/advice on my above ideas, and further insights into horizontal scaling.

Well, my advice is that the sort of “loose federation” you describe is something which is very difficult to build. You can make it work for, say, a proxy-based POP3/IMAP mail cluster: here the protocol is straightforward, the session can be unambiguously proxied to the right backend server, and there is no interaction between accounts. (When you start using IMAP shared folders, this breaks down.)

However, even in such a scenario, you don’t have resilience. If you lose the machine where the A to G accounts are stored, then all those users lose their mail. So in fact each backend machine has to be a mini-cluster, or at least have mirrored disks and a warm spare machine to plug them into when disaster strikes.

Many people have resilience as high on their agenda as performance, or higher. So this doesn’t sound like a good way to go.

My advice would be:

  1. Keep your database in one place, so that all the front-ends have shared access to the same data at all times.

To start with, have a single database machine. Then expand this to a two-machine database cluster. You can then point 2, 3, 4 or more front-ends at this cluster; for many applications you may find that you won’t need to scale the database until later.

(Note that regardless of whether your application ends up heavier on front-end CPU or back-end database resources, scaling the front-ends and the database cluster separately makes it much easier to monitor resource utilisation and scale each part as necessary.)

The easiest way to do database clustering is with a master-slave arrangement: do all your updates on the master, and let these replicate out to the slaves, where read-only queries take place. Of course, this isn’t good enough for all applications, but for others it’s fine.
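That read/write split can be kept out of application logic with a thin wrapper. A hedged Ruby sketch follows: ReplicatedDB and StubConnection are invented names, and real connections would be database driver objects responding to something like execute.

```ruby
# Route writes to the master connection and reads to a random slave.
# Connections are any objects responding to #execute; the stub below
# exists only to make the example self-contained.
class ReplicatedDB
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
  end

  def write(sql)
    @master.execute(sql)
  end

  def read(sql)
    @slaves[rand(@slaves.size)].execute(sql)
  end
end

class StubConnection
  attr_reader :log
  def initialize(name)
    @name = name
    @log = []
  end
  def execute(sql)
    @log << sql
    @name
  end
end
```

Replication lag is the catch: a read issued right after a write may not see it yet, so read-your-own-writes paths should go to the master.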

Full database clustering is challenging, but if your site is making you lots of money you can always throw an Oracle 10g grid at it. If you’re seriously thinking of that route, you can start with Oracle on day one; it is now free for a single processor with up to 1GB of RAM and 4GB of table space.

  2. For transient session state, assuming your session objects aren’t enormous, use DRb to start with. Point all your front-ends at the same DRb server. DRb is remarkably fast for what it does, since all the marshalling is done in C.

When you outgrow that, go to memcached instead. This is actually not hard to set up: you just run a memcached process on each server. The session data is automatically distributed between the nodes.

Neither case is totally bombproof: if you lose a node, you’ll lose some session data. Either put important session data in the database, or build a bombproof memcached server [boots from flash, no hard drive, fanless].

If that’s not important, then you don’t need a separate memcached server. If you have N webapp front-ends, then just run memcached on each of them.
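The DRb option suggested above is only a few lines. Here is a sketch, with an invented URI and class name, of one process exporting a hash-like store that all front-ends share; for illustration the “server” and “client” sides run in the same process:

```ruby
require 'drb'

# Hash-like session store exported over DRb. In real use this runs in
# its own process and the front-ends connect to its druby:// URI.
class SessionStore
  def initialize
    @data = {}
  end
  def [](session_id)
    @data[session_id]
  end
  def []=(session_id, value)
    @data[session_id] = value
  end
end

# Server side (port 0 lets the OS pick; fix the port in real use):
DRb.start_service('druby://127.0.0.1:0', SessionStore.new)

# Front-end side: a remote reference that behaves like the hash above.
sessions = DRbObject.new_with_uri(DRb.uri)
sessions['abc123'] = { 'user_id' => 42 }
```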

> To facilitate the above I need some kind of proxy in front of the two machines directing incoming requests to the correct machine based on the login name which will be part of the URL. Here I come unstuck since I have no idea how to do this.

Well, the traditional approach is to buy a web load-balancing appliance (or a resilient pair of them), and configure it for “sticky” load balancing based on a session cookie or some other attribute in the URL.

Hardware appliances are generally good. They are reliable over time; there’s much less to go wrong than with a PC. They do a single job well.

You could instead decide to use a recent version of Apache with mod_proxy to do the proxying for you.

But it may be better to design your app with a single shared database and a single shared session store, such that it actually doesn’t matter where each request arrives.

> Can anyone give me a few pointers? Is Squid the thing? Mongrel (I don’t really know what Mongrel is)? Can Apache be made to do this, and if so is it a bad idea? Obviously it needs to be pluggable since I’ll be using my own code (C or Pascal) to do the lookups for the redirection.

mod_proxy with mod_rewrite is “pluggable” in the way you describe. See
http://httpd.apache.org/docs/2.2/misc/rewriteguide.html
and skip to the section headed “Proxy Throughput Round-Robin”.

You’d use an External Rewriting Program (map type prg) to choose which back-end server to redirect to. The example above is written in Perl, but the same is equally possible in Ruby, C, Pascal or whatever.
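Such a rewriting program could just as well be Ruby. This sketch (backend names invented) follows the prg: map protocol: Apache writes one lookup key per line to the program’s stdin and reads one answer per line from its stdout:

```ruby
#!/usr/bin/env ruby
# External RewriteMap helper: one login per line in, one backend per line out.

$stdout.sync = true   # Apache needs unbuffered answers

# Invented backend names; the A-G split discussed earlier in the thread.
def backend_for(login)
  login.to_s.downcase[0, 1].between?('a', 'g') ? 'backend1:3000' : 'backend2:3000'
end

# Defined as a method so the loop only runs when wired up to Apache.
def serve_rewrite_map
  while (login = $stdin.gets)
    puts backend_for(login.strip)
  end
end
```

It would be wired in with a `RewriteMap` directive pointing at the script and a RewriteRule that proxies to the map’s answer; the exact rule syntax is in the guide linked above.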

However, if you don’t know anything about Apache, this is certainly not
where I’d recommend you start.

Squid is a proxy cache. You can use it to accelerate static content, but it won’t help you much with dynamic pages from Rails. Mongrel is a webserver written in Ruby, much as WEBrick is, although it is apparently more efficient.

In summary I’d say start your design with the KISS principle:

  • one database; scale it horizontally (by database clustering) when needed

  • one global session store; scale it horizontally when needed

  • one frontend application server; scale horizontally when needed

In addition to that, consider:

  • serve your static HTML, images and CSS from a fast webserver
    (e.g. apache, lighttpd). This is easy to arrange.

  • consider serving your Rails application from the same webserver using FastCGI (e.g. Apache mod_fcgid), rather than a Ruby webserver like Mongrel or WEBrick. Harder to set up, but you can migrate to this later. Then most HTTP protocol handling is being done in C.

  • profile your application carefully to find out where the bottlenecks are, before you throw hardware at performance problems.

HTH,

Brian.


#12

On Mar 7, 2007, at 09:40, Greg Loriman wrote:

> I’m putting together a relatively simple site which I want to design from the ground up for horizontal scalability, partly for the challenge, partly because I need to learn and get experience. To help me do this I am going to run at least two virtual machines to enforce the correct environment.

The set of problems you’ll run into by trying to plan for scaling by adding artificial constraints will not overlap the problems you’ll really encounter if you want to scale.

Scaling is done by making the stuff that is objectively slow fast, not by planning for what you think might possibly end up being slow and guessing how you can make those things fast.

> a bit of manual labour.

Use a database. You can run at least tens of thousands of users on good hardware without breaking a sweat. When that DB starts sweating, get better hardware. If you’re lucky enough to be really successful then you might need two machines. In that case you can hire somebody who knows what they’re doing.

> I have no idea how to do this.
>
> There must be proxies of this kind, but I’ll be blowed if I know what an appropriate one would be, or where to start in making it do what I want.

Don’t bother. You don’t need them until your hardware proves
unreliable, or one machine can’t handle the load. Investigate then.

> Can anyone give me a few pointers? Is Squid the thing? Mongrel (I don’t really know what Mongrel is)? Can Apache be made to do this, and if so is it a bad idea? Obviously it needs to be pluggable since I’ll be using my own code (C or Pascal) to do the lookups for the redirection.

Stop trying to plan for the unknowable and write the software to be
simple. When you find something slow, optimize that. Then repeat.


#13

On 07.03.2007 21:04, Greg L. wrote:

> between users means recording a new relationship between two users,
> […]
> Obviously that is a strategy with some interesting problems to be solved, as you can probably immediately guess, which I am looking forward to.

Don’t do that - i.e. don’t work with multiple data stores - because there are solutions out there that deal with all the replication and consistency issues (if you need them), even in open-source land. It’s not worth reinventing that wheel, because it’s difficult to get right and fast.

As others have suggested, use one data store. Start out with a single DB server (or a tandem if you need failover) and scale later. Your options depend on the DB vendor you are going to use, so you should explore scalability options before you start (Oracle has them, MS SQL Server has them, and MySQL has them too AFAIK; don’t use MaxDB).

Btw, Oracle can scale pretty well when using Shared Server - of course this depends on the workload. See
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14220/process.htm#sthref1644

If I were doing something like you are and targeting a high-profile commercial application, I would go with Oracle. It has its drawbacks, but it has a proven track record for stability, and it’s a lot more flexible and tailored to dealing with large volumes than other products (which at the same time also means it’s more complex). DB2 also seems a likely candidate for large installations, but I only have brief experience with it and I did not like some things I saw. That may have changed, though.

> or via some other mechanism).
>
> Definitely. I have had this in mind. Perhaps ultimately I would end up with two (or more) domain (as in data) databases, one session database, one proxy-redirection database, and the proxy-redirector itself. And one day each running on its own machine. Right now I am imagining several VMware instances to allow for development.

I’d rather have the load balancing in the router. Routers are fast and built to do the job; creating a proprietary proxy for distribution based on some custom criteria is almost guaranteed to be slower than a router. (Even if the router is a Linux box - dunno whether such solutions exist.)

Btw, if you are willing to pay for robustness you can also create a solution with VMware ESX, where a number of VMs can be served by a pool of physical machines; if one machine fails, others can take over its VMs. Sounds cool but is not cheap. :-)

Kind regards

robert


#14

On Thu, Mar 08, 2007 at 05:05:07AM +0900, Greg L. wrote:

> By interaction I just mean recording user relationships, like when one user records another user as a friend on Slashdot. There’ll be a ‘friend’ table with two foreign keys.
>
> In my naive idea of things each machine would be equivalent to the others. They would each have Webserver->Rails->Database. Interaction between users means recording a new relationship between two users, normally quite straightforward on a single database, but requiring inter-machine communication and a certain amount of fiddling about where the user accounts are on different machines/database backends.

And it is very hard (if not impossible) to enforce referential integrity on, since user A would need to have a ‘foreign key’ entry pointing to user X in a different table on a different machine.

You could solve this by each machine having at least a complete table of all users on the system (which could be a read-only copy of a single master table, if addition of new users is relatively infrequent compared to reading data about users). But if you’re going to do that, I’d argue you might as well replicate all the data.

Also, imagine trying to make use of this ‘federated’ data. Say user A (on machine 1) has a relationship with user H (on machine 2), and user H has a relationship with user X (on machine 3). These are three tables in three separate databases. Now try doing a join across these to find out who is related two hops away. Ouch. This will become a very expensive operation, requiring a series of separate queries against each of the databases. If you had a single, central database, properly indexed, it would be a relatively cheap single query.

Hence I think the federated approach may turn out to be a false economy anyway.

Regards,

Brian.


#15

Sorry to rain on your parade, but you won’t learn about horizontal scalability by building a proof-of-concept application using a single architecture.

My guess is there are a few ways to learn this - through experience or through reading - and neither requires Ruby (though Ruby isn’t a priori bad for this). There’s a nice 1000-page book called “Distributed Systems” by Coulouris, but in 2007 you’re possibly well served using Wikipedia as a jumping-off point and learning the following concepts:

Amdahl’s Law, cache coherency, distributed computing, distributed shared memory, MPP, multiprocessing, multitasking, NUMA, parallelism, shared nothing, SMP, virtual synchrony.

If you are using VMs, why stop at two?

Peter

On Mar 7, 2007, at 12:40 PM, Greg Loriman wrote:

> post’s focus isn’t really Rails.
> […]
> code (C or Pascal) to do the lookups for the redirection.
>
> Thanks for your words of wisdom,
>
> Greg

Peter B.
removed_email_address@domain.invalid
917 445 5663


#16

On Mar 7, 10:24 pm, Brian C. removed_email_address@domain.invalid wrote:

> For transient session state […]

AFAIK edge Rails does client-side sessions (signed cookie) by default. Also an option that could be considered.
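The mechanism behind a signed cookie is easy to sketch. The following illustrates the idea only, not Rails’ actual implementation; SECRET and the helper names are placeholders:

```ruby
require 'openssl'
require 'base64'

SECRET = 'replace-with-a-long-random-secret'  # placeholder

# Serialize session data into a cookie value with an HMAC appended.
def write_cookie(data)
  digest = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA1'), SECRET, data)
  "#{Base64.strict_encode64(data)}--#{digest}"
end

# Return the data if the signature checks out, nil if tampered with.
def read_cookie(cookie)
  encoded, digest = cookie.split('--', 2)
  data = Base64.strict_decode64(encoded)
  expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA1'), SECRET, data)
  data if expected == digest
end
```

The appeal for scaling is that the server stores nothing; the trade-offs are cookie size limits and the fact that the client can read (though not alter) the contents.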

Isak