Highwire, Ruby Load Balancer


#1

After stumbling across the post below, I was playing around with it, but
noticed that if there was ever a bug in the app, the whole domain would
freeze up. Is anyone else out there using this? I just tried
emailing him… we’ll see if support is still around…

Paul B. paul at paulbutcher.com
Wed Sep 20 06:18:53 EDT 2006


We have been searching for a Rails deployment architecture which works
for us for some time. We’ve recently moved from Apache 1.3 + FastCGI to
Apache 2.2 + mod_proxy_balancer + mongrel_cluster, and it’s a significant
improvement. But it still exhibits serious performance problems.
We have the beginnings of a fix that we would like to share.

To illustrate the problem, imagine a 2-element mongrel cluster running a
Rails app containing the following simple controller:

class HomeController < ApplicationController
  def fast
    sleep 1
    render :text => "I'm fast"
  end

  def slow
    sleep 10
    render :text => "I'm slow"
  end
end

and the following test app:

#!/usr/bin/env ruby
require File.dirname(__FILE__) + '/config/boot'
require File.dirname(__FILE__) + '/config/environment'
require 'net/http'

end_time = 1.minute.from_now

fast_count = 0
slow_count = 0

fastthread = Thread.start do
  while Time.now < end_time do
    Net::HTTP.get 'localhost', '/home/fast'
    fast_count += 1
  end
end

slowthread = Thread.start do
  while Time.now < end_time do
    Net::HTTP.get 'localhost', '/home/slow'
    slow_count += 1
  end
end

fastthread.join
slowthread.join

puts "Fast: #{fast_count}"
puts "Slow: #{slow_count}"

In this scenario, there will be two requests outstanding at any time,
one “fast” and one “slow”. You would expect approximately 60 fast and
6 slow GETs to complete over the course of a minute. This is not what
happens; instead, approximately 12 fast and 6 slow GETs complete per
minute.

The reason is that mod_proxy_balancer assumes that it can send multiple
requests to each mongrel, and fast requests end up waiting for slow
requests, even if there is an idle mongrel server available.

We’ve experimented with various different configurations for
mod_proxy_balancer without successfully solving this issue. As far as
we can tell, all other popular load balancers (Pound, Pen, balance)
behave in roughly the same way.

This is causing us real problems. Our user interface is very
time-sensitive. For common user actions, a page refresh delay of more
than a couple of seconds is unacceptable. What we’re finding is that if
we have (say) a reporting page which takes 10 seconds to display (an
entirely acceptable delay for a rarely-used report) then our users are
seeing similar delays on pages which should be virtually instantaneous
(and would be, if their requests were directed to idle servers). Worse,
we’re occasionally seeing unnecessary timeouts because requests are
queuing up on one server.

The real solution to the problem would be to remove Rails’ inability to
handle more than one thread. In the absence of that solution, however,
we’ve implemented (in Ruby) what might be the world’s smallest load
balancer. It only ever sends a single request to each member of the
cluster at a time. It’s called HighWire and is available on RubyForge
(no gem yet; it’s on the list of things to do!):

svn checkout svn://rubyforge.org/var/svn/highwire
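HighWire’s actual source isn’t reproduced in this post, but the discipline it enforces (hand each backend exactly one request at a time) can be sketched in a few lines of Ruby. The class and backend names below are our own illustration, not HighWire code:

```ruby
# Sketch of the "one request per backend at a time" discipline.
# Illustrative only; the names are invented, not HighWire's.
# Queue is thread-safe and built into modern Ruby.
class IdlePool
  def initialize(backends)
    @free = Queue.new                 # queue of currently idle backends
    backends.each { |b| @free << b }
  end

  # Blocks until some backend is idle, yields it to the caller,
  # and returns it to the idle pool afterwards.
  def with_backend
    backend = @free.pop               # blocks while every backend is busy
    begin
      yield backend
    ensure
      @free << backend                # mark the backend idle again
    end
  end
end

# Four requests, two backends: while the slow request occupies one
# backend, that backend is simply absent from the pool, so a fast
# request only ever waits for whichever backend frees up first.
pool = IdlePool.new([:mongrel_8000, :mongrel_8001])
threads = [0.1, 0.01, 0.01, 0.01].map do |cost|
  Thread.start { pool.with_backend { |backend| sleep cost } }
end
threads.each(&:join)
```

A real balancer would proxy the HTTP bytes to the backend inside `with_backend`; the sketch only shows the dispatch discipline that mod_proxy_balancer lacks in this scenario.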

Using this instead of mod_proxy_balancer, and running the same test
script above, we see approximately 54 fast and 6 slow requests per
minute.

HighWire is very young and has a way to go. It’s not had any serious
optimization or testing, and there are a bunch of things that need
doing before it can really be considered production-ready. But it does
work for us, and does produce a significant performance improvement.

Please check it out and let us know what you think.


#2

On 5/5/07, Eddy removed_email_address@domain.invalid wrote:
[…]

  sleep 10

end_time = 1.minute.from_now

puts “Fast: #{fast_count}”
puts “Slow: #{slow_count}”

Ok, first things first:

sleep is not good in “threaded” Ruby applications. Long sleeps can
freeze the whole VM, not just the thread involved.

Also, a Rails app is locked inside a big mutex to work around the
thread-safety (better named thread-unsafety) issues of Rails. So any
incoming connection that needs to be served by the Rails dispatcher
will get put into the queue.

Most of the load balancers named behave like that: round-robin
balancing. Even if you can weight them, the weights are static and
don’t adapt well over time.

We’ve experimented with various different configurations for
mod_proxy_balancer without successfully solving this issue. As far as
we can tell, all other popular load balancers (Pound, Pen, balance)
behave in roughly the same way.

From my point of view, they should learn the timings from each member
of the cluster and recalculate the weight each one can handle.
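As a rough sketch of that idea (our own toy code with invented names, not taken from any of the balancers mentioned), a dispatcher could keep an exponentially weighted moving average of each member’s response times and always pick the member that currently looks fastest:

```ruby
# Toy latency-aware chooser: tracks an exponentially weighted moving
# average (EWMA) of each backend's response time and dispatches to
# the one that currently looks fastest. Illustrative names only.
class TimingAwareChooser
  def initialize(backends, alpha = 0.3)
    @alpha = alpha                       # weight given to the newest sample
    @ewma  = {}
    backends.each { |b| @ewma[b] = 0.0 } # optimistic initial estimate
  end

  # Pick the backend with the lowest estimated response time.
  def choose
    @ewma.min_by { |_backend, time| time }.first
  end

  # Feed an observed response time (in seconds) back into the estimate.
  def record(backend, seconds)
    @ewma[backend] = @alpha * seconds + (1 - @alpha) * @ewma[backend]
  end
end

chooser = TimingAwareChooser.new([:mongrel_a, :mongrel_b])
chooser.record(:mongrel_a, 1.0)   # observed a fast response
chooser.record(:mongrel_b, 10.0)  # observed a slow response
chooser.choose                    # => :mongrel_a
```

Unlike static weights, the estimates keep adapting: a backend that slows down drifts to the bottom of the preference order as new samples arrive.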

unnecessary timeouts because requests are queuing up on one server.

Maybe you could “switch” to a lightweight solution that partially
covers these problems (Mongrel + erb, do a Google search for it) :wink:

The real solution to the problem would be to remove Rails’ inability to
handle more than one thread.

That is a real problem: a lot of parts of Rails aren’t thread-safe,
and adapting them will require a huge amount of work, but I agree it’s
worth it.

svn checkout svn://rubyforge.org/var/svn/highwire

Haven’t checked the code (yet), but it sounds interesting. It would
also be nice if the load-balancing strategy were configurable (maybe
via callbacks or something), so that you could change how the
balancing works.

Please check it out and let us know what you think.

Excellent news, thanks for sharing it with us.


Luis L.
Multimedia systems

Leaders are made, they are not born. They are made by hard effort,
which is the price which all of us must pay to achieve any goal that
is worthwhile.
Vince Lombardi


#3

BTW, I’m not surprised HighWire isn’t talked about more.

A response from the Highwire guy:

I’m afraid that we haven’t been using Highwire for a while now. That
doesn’t mean that the problem Highwire was designed to address
doesn’t still exist (it very definitely does), but we have decided to
follow a different solution. We have divided our mongrel cluster into
two halves - a “normal” cluster on which the common “fast” operations
take place and an “admin” cluster on which occasional “slow” things
take place. Very soon, we plan to take this further and move the
“slow” cluster onto an entirely separate server.

That means, I’m afraid, that we haven’t got a patch for the problem
you mention because we’ve not developed it any further. Having said
that, Highwire really couldn’t be simpler, so if you want to take on
creating a patch, or even ownership of the Highwire project, please
be my guest! Let me know if you’re interested.

BTW - you might be interested in a couple of blog articles we’ve
recently written on Ruby on Rails:

http://about.82ask.com/news/wizardry/


Paul B.
CTO
82ASK
Mobile: +44 (0) 7740 857648
Main: +44 (0) 1223 309080
Fax: +44(0) 1223 309082
Email: removed_email_address@domain.invalid
MSN: removed_email_address@domain.invalid
AIM: paulrabutcher
Skype: paulrabutcher
LinkedIn: http://www.linkedin.com/in/paulbutcher



#4

Just a note for anyone else who, like me, kept hitting this forum entry
while googling around. We successfully replaced HighWire with balance
(http://www.inlab.de/balanceng/).

We had to change one of the default build constants (MAXCHANNELS) from
16 to 32, but after that it’s been working great against our pool of 10
mongrels.

Running it as follows configures it to feed one request at a time,
round-robin, to each free mongrel instance unless they’re all
concurrently busy, at which point it switches to sending requests
round-robin to all 10, à la mod_proxy_balancer.

/usr/local/bin/balance -M -b 127.0.0.1 7050 127.0.0.1:8000:1
127.0.0.1:8001:1 127.0.0.1:8002:1 127.0.0.1:8003:1 127.0.0.1:8004:1
127.0.0.1:8005:1 127.0.0.1:8006:1 127.0.0.1:8007:1 127.0.0.1:8008:1
127.0.0.1:8009:1 ! 127.0.0.1:8000 127.0.0.1:8001 127.0.0.1:8002
127.0.0.1:8003 127.0.0.1:8004 127.0.0.1:8005 127.0.0.1:8006
127.0.0.1:8007 127.0.0.1:8008 127.0.0.1:8009
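That command defines two balancing groups: group 0 (before the `!`) caps each mongrel at one concurrent connection (the trailing `:1`), and group 1 (after the `!`) is the plain round-robin overflow used when every mongrel is busy. A rough Ruby sketch of this two-tier dispatch (our own illustration of the behaviour described above, not balance’s C source):

```ruby
# Sketch of a two-tier dispatcher: prefer an idle backend (at most
# one request each); if all are busy, overflow to plain round-robin.
# Illustrative only; invented names, not balance's implementation.
class TwoTierDispatcher
  def initialize(backends)
    @idle  = Queue.new                 # group 0: idle backends, one request each
    backends.each { |b| @idle << b }
    @all   = backends                  # group 1: round-robin overflow
    @rr    = 0
    @mutex = Mutex.new
  end

  # Returns [backend, from_idle_pool].
  def checkout
    [@idle.pop(true), true]            # non-blocking pop of an idle backend
  rescue ThreadError                   # raised when the idle pool is empty
    overflow = @mutex.synchronize { @all[(@rr += 1) % @all.size] }
    [overflow, false]
  end

  # Only backends checked out of the idle pool go back into it.
  def checkin(backend, from_idle)
    @idle << backend if from_idle
  end
end
```

With 10 mongrels this reproduces the pattern visible in the status output: group 0 entries carry maxc 1, while group 1 only accumulates traffic when all ten are concurrently busy.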

So far it’s been working very well. Stats from the first afternoon of
use are as follows:

/usr/local/etc/rc.d/balance status

balance at 127.0.0.1:7050
GRP Type #  S   ip-address  port  c  totalc  maxc      sent      rcvd
  0   RR 0  ENA 127.0.0.1   8000  0    1972     1   1534959  17503343
  0   RR 1  ENA 127.0.0.1   8001  0    2122     1   1604954  38727099
  0   RR 2  ENA 127.0.0.1   8002  0    2220     1   6587106  21329912
  0   RR 3  ENA 127.0.0.1   8003  1    1412     1   6868618  16225686
  0   RR 4  ENA 127.0.0.1   8004  0    1952     1   2376831  21449204
  0   RR 5  ENA 127.0.0.1   8005  1    1564     1  12380050  18871952
  0   RR 6  ENA 127.0.0.1   8006  0    1894     1   5663904  21877505
  0   RR 7  ENA 127.0.0.1   8007  0    2025     1  22239136  20035666
  0   RR 8  ENA 127.0.0.1   8008  1    1787     1   8410442  21476800
  0   RR 9  ENA 127.0.0.1   8009  0    1914     1  11913518  18254808
  1   RR 0  ENA 127.0.0.1   8000  0     231     0    116116   1439980
  1   RR 1  ENA 127.0.0.1   8001  0     232     0    150685   2026796
  1   RR 2  ENA 127.0.0.1   8002  0     232     0    115881    638747
  1   RR 3  ENA 127.0.0.1   8003  0     231     0    117039   1072487
  1   RR 4  ENA 127.0.0.1   8004  0     233     0    121611   1491177
  1   RR 5  ENA 127.0.0.1   8005  0     233     0    120231   1162390
  1   RR 6  ENA 127.0.0.1   8006  0     231     0   4665602   1843309
  1   RR 7  ENA 127.0.0.1   8007  0     229     0    118730   1373138
  1   RR 8  ENA 127.0.0.1   8008  0     229     0    114949   1497304
  1   RR 9  ENA 127.0.0.1   8009  0     230     0    119177   1224900
