Here are some notes from the scalability session of last week’s Rails
camp. They were entered by another session participant and are posted
at:
http://www.rubyonrailscamp.com/10%3A15%2Bsession%2B-%2Bscaling
The key points from my point of view:
- the Ruby VM is sketchy, rather like the Java VM around 1997
- the single-threaded nature of Rails dispatch handling means we may
incur a big memory/hardware hit, for example if pages depend on remote
services with varying response times.
Nonetheless Rails is still attractive because of its elegance and
expressiveness. But keep your eyes open.
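To make the single-threaded dispatch point concrete, here is a minimal sketch of how one slow request holds up the fast ones queued behind it. The `completion_times` helper, the request timings, and the assumption that all requests arrive at once are mine, for illustration only; nothing here was shown at the session.

```ruby
# Simulate requests that all arrive at time 0 and are served by a fixed
# pool of single-threaded workers (hypothetical stand-ins for Rails processes).
# Each request goes to whichever worker becomes free first.
def completion_times(service_ms, workers)
  free_at = Array.new(workers, 0)      # time at which each worker is next idle
  service_ms.map do |ms|
    idx = free_at.index(free_at.min)   # earliest-available worker
    free_at[idx] += ms
    free_at[idx]                       # completion time of this request
  end
end

# One slow remote-service call among otherwise fast pages (milliseconds).
requests = [50, 2000, 50, 50]

one_worker  = completion_times(requests, 1)
two_workers = completion_times(requests, 2)

puts "1 worker:  #{one_worker.inspect}"   # [50, 2050, 2100, 2150]
puts "2 workers: #{two_workers.inspect}"  # [50, 2000, 100, 150]
```

With one worker the two fast pages behind the slow call finish after more than two seconds. Adding a second process fixes that, but every extra process costs memory, which is the hardware hit mentioned above.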
At what scale do applications on Engine Yard run?
* Terminology: a ‘slice’ is a virtual Xen server – around 30 requests/second.
* theballot.org – Ran on 2 slices. 1Gb/second of traffic. They ran
on a $20/month host before then.
* kongregate – Flash game distribution site. 3 slices now. Deploying
several times a day. They plan for 67 boxes by next summer.
* They have made scaling easy, to levels equivalent to basecamp.
Scaling solutions of theirs
- Start with 2 Load balancers
- Slices don’t even have disks; they mount root from an external FS via GFS.
Each slice gets 5 mongrel instances. This stuff runs nginx (‘engine x’).
- Each ‘slice’ machine stores a DB instance. There is a Rails plugin
for managing writes/reads.
- Use AoE (ATA over Ethernet) RAID for disk store.
- Likely bottleneck is slices, not file system. Single cluster would
be 16-24 machines (which is a big web site).
- On a sudden spike when hosted with them, they can add slices within
an hour.
- For us, what we build now doesn’t need anything special to be
hosted by them. It’ll generally migrate easily.
- Capistrano is used by them for deployment. It helps a lot.
- The number 1 performance issue they see is the N+1 problem: poorly
structured SQL.
- attr_accessible, attr_protected is IMPORTANT (they control which
attributes can be set through mass assignment).
- Memory usage is an issue on servers. A Mongrel process is at least
40Meg. Some extreme cases are above 140Meg. Memory is cheap.
Processor usage has not been a factor. All boxes are dual-processor
quad-core AMDs and they are sleeping.
- Don’t worry about it until it becomes a problem! Don’t
preoptimize.
- pennyarcade is a Rails site and it is huge.
- Amount of silicon used for Rails is 30% to 5x more than other
machines typically used – but so what?
- Statement: In the end the DB limits you, not the application.
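The N+1 issue from the list above is easy to show without Rails. The sketch below uses a tiny in-memory stand-in for a database that counts the queries it serves. `FakeDb`, `comments_for` and `comments_for_all` are hypothetical names of mine, not a Rails API. In a real Rails app the usual remedy is eager loading of the association rather than a hand-written batch call.

```ruby
# In-memory stand-in for a database that counts how many queries it serves.
# All names here are illustrative assumptions, not part of Rails.
class FakeDb
  attr_reader :query_count

  def initialize(post_ids, comments)
    @post_ids = post_ids     # array of post ids
    @comments = comments     # hash: post id => array of comment strings
    @query_count = 0
  end

  # One query: fetch every post id.
  def posts
    @query_count += 1
    @post_ids
  end

  # One query per call: fetch the comments of a single post.
  def comments_for(post_id)
    @query_count += 1
    @comments.fetch(post_id, [])
  end

  # One query total: fetch the comments of many posts at once.
  def comments_for_all(post_ids)
    @query_count += 1
    @comments.select { |id, _| post_ids.include?(id) }
  end
end

data = { 1 => ["a"], 2 => ["b", "c"], 3 => [] }

# N+1 pattern: one query for the posts, then one more per post.
slow = FakeDb.new([1, 2, 3], data)
slow.posts.each { |id| slow.comments_for(id) }

# Batched pattern: one query for the posts, one for all their comments.
fast = FakeDb.new([1, 2, 3], data)
fast.comments_for_all(fast.posts)

puts "N+1 queries:     #{slow.query_count}"  # 4
puts "batched queries: #{fast.query_count}"  # 2
```

With 3 posts the difference is 4 queries against 2. With a page listing hundreds of records, the N+1 version issues hundreds of round trips to the database, which fits the statement that the DB is what limits you in the end.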
Lack of multithreading is raised as a question
Case study:
- Java vs Ruby – Say, 1000 simultaneous requests
* Mongrel itself can multithread, but requests can back up behind a
slow request dispatch.
* When a request has to wait on slow work, BackgrounDRb is used.
This releases the lock on the worker. Also look at ‘merb’ – mongrel
plus erb. First use for this is image upload.
* In a typical Rails environment an image upload locks the process.
* Worst case – 100Meg mongrel processes, 1000 requests
simultaneously. That’s 100Gig, which at 16Gig per machine makes for
about 8 machines… Not a big deal.
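The worst-case arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch using the figures from the notes. The variable names are mine, and I assume decimal units (1 Gig = 1000 Meg) and one process per simultaneous request.

```ruby
# Figures taken from the worst case in the notes.
process_mb = 100    # memory per Mongrel process, in Meg
concurrent = 1000   # simultaneous requests, assuming one process each
machine_gb = 16     # RAM per machine, in Gig

total_gb = process_mb * concurrent / 1000.0   # total memory needed, in Gig
machines = (total_gb / machine_gb).ceil       # round up to whole machines

puts "total memory: #{total_gb} Gig"
puts "machines needed: #{machines}"
```

Strict rounding up gives 7 machines (100 / 16 = 6.25). The figure of 8 quoted in the session presumably leaves some headroom for the operating system and everything else on the box, though that is my reading rather than something that was stated.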
Array implementation and rails calls
* Supposedly each rails call creates 60000(!) arrays.
* There is a patch to make Array implementation quicker – but it is
not accepted yet.
Problem: Ruby is some guy’s hobby
* At rubyconf matz’s talk was underwhelming. Development way slow.
* Rubinius – The interpreter would be written in Ruby and compiled
to C. Apparently good performance gains have been seen.
Corporate support, etc.
* IBM hosting this
* Sun doing jruby
* See recent post on digg – php eats rails for lunch? Presumably
this post: http://ohloh.net/wiki/articles/php_eats_rails
Hiring
* Hiring is about to go dot.com stupid – anybody who breathes is
almost good enough.
* Hard to find good programmers who know rails and ruby
* Good interview question for them: Have you ever implemented a
binary level protocol?