Recent Criticism about Ruby (Scalability, etc.)

On Wed, Oct 03, 2007 at 06:40:02PM +0900, Robert K. wrote:

On 23.09.2007 21:08, Phlip wrote:

That’s not scaling! (Okaaay, that’s only one aspect of scaling!)

It definitively is. One aspect of Ruby that hinders scaling is the
absence of native threads IMHO. On the other hand, mechanisms are
provided for IPC (DRb for example) which are easy to use and thus may be
counted as compensating at least partially for the lack of native threading.
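
(For reference, the DRb mechanism Robert mentions really is about this easy
to use. The sketch below is purely illustrative: the object, URI, and port
are invented for the example, not taken from anyone’s post.)

    # server.rb - run in one Ruby process
    require 'drb/drb'

    class Worker
      def heavy_task(n)
        (1..n).reduce(:+)              # stand-in for real work
      end
    end

    DRb.start_service('druby://localhost:8787', Worker.new)
    DRb.thread.join                    # keep the server process alive

    # client.rb - run in a second process (or on another machine)
    require 'drb/drb'

    DRb.start_service
    worker = DRbObject.new_with_uri('druby://localhost:8787')
    puts worker.heavy_task(1_000_000)  # the call executes in the server process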

Agreed. This is a far more useful argument against Ruby’s ability to
scale than benchmarks from some website.

Wrong term here: it’s scalability of code, but not of system load
handling. There are different possible uses of the term “scale”, and I
think some of us are running up against that discrepancy. You’re talking
about the scalability of an application, and Phlip is talking about the
scalability of the development project behind the app, from what I see.

On Thu, 4 Oct 2007 01:42:22 +0900, Chad P. [email protected]
wrote:

That’s true. However, very roughly, compute resource can scale about
linearly with compute requirement.

What about Amdahl’s law?

What about it? Unless you’re writing software that doesn’t scale with
the hardware, more hardware means linear scaling, assuming bandwidth
upgrades. If bandwidth upgrades top out, you’ve got a bottleneck no
amount of hardware purchasing or programmer time will ever solve.

Amdahl’s law is relevant because most software can’t be written to
scale entirely linearly with the hardware, because most computational
problems are limited in the amount of parallelism they admit. You may
have been fortunate enough to have been presented with a lot of
embarrassingly parallel problems to solve, but that isn’t the norm.
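
To put rough numbers on that point (the figures below are invented for
illustration): with a parallelizable fraction p, Amdahl’s law caps the
speedup on n processors at 1 / ((1 - p) + p / n).

    # Back-of-the-envelope check of the ceiling Amdahl's law imposes.
    def amdahl_speedup(p, n)
      1.0 / ((1 - p) + p / n)
    end

    [2, 8, 64, 1024].each do |n|
      printf "p=0.95, n=%4d  ->  %.1fx\n", n, amdahl_speedup(0.95, n)
    end
    # Even with 95% of the work parallelizable, the speedup never exceeds
    # 1 / 0.05 = 20x, no matter how much hardware you add.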

It was probably meant as a stand-in for “more work at streamlining
design, combined with greater code cleverness needs to scale without
throwing hardware at the problem.”

No argument there, as long as it’s understood that there are limits to
what can be achieved. I don’t want to discourage anyone from seeking
linear scalability as an ideal, but it’s not a realistic thing to
promise or assume.

-mental

On Oct 3, 2007, at 5:40 AM, Robert K. wrote:

It definitively is. One aspect of Ruby that hinders scaling is the
absence of native threads IMHO. On the other hand, mechanisms are
provided for IPC (DRb for example) which are easy to use and thus
may be counted as compensating at least partially for the lack of
native threading.

I admit to being puzzled by the general fascination
with threads (native or not) for solving scaling
problems.

Good old processes have always seemed like a reasonable
way to partition problems and take advantage of
concurrency opportunities due to waiting on I/O
(single processor) or the parallel nature of a
CPU-bound calculation (multi-processor).

Processes also prevent the kind of problems associated
with concurrent access to shared memory that are
inherent in a multi-thread/single-process model.
A multi-process solution can more easily be
retargeted to a multi-machine solution across a network
than can a multi-thread solution.

I suspect that language and OS features have a great
effect on the concurrency model a programmer might
select or prefer.

I’m not suggesting that processes are in all cases
preferred to threads, just that I would tend to explore
a multi-process solution first before a multi-thread
solution.
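
As a toy illustration of that multi-process-first approach (a sketch of
the general idea, not code from anyone’s post; it assumes a Unix Ruby with
fork): fan a CPU-bound job out to child processes and collect the results
over pipes, so there is no shared memory to protect.

    # Split the work, fork one child per chunk, read each result back
    # over a pipe.
    chunks = (1..1_000_000).each_slice(250_000).to_a

    readers = chunks.map do |chunk|
      reader, writer = IO.pipe
      fork do
        reader.close
        writer.puts chunk.reduce(:+)   # the actual work runs in the child
        writer.close
      end
      writer.close                     # parent keeps only the read end
      reader
    end

    total = readers.map { |r| r.gets.to_i }.reduce(:+)
    Process.waitall
    puts total                         # => 500000500000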

On Thu, 4 Oct 2007 02:39:28 +0900, MenTaLguY [email protected] wrote:

Amdahl’s law is relevant because most software can’t be written to
scale entirely linearly with the hardware, because most computational
problems are limited in the amount of parallelism they admit. You may
have been fortunate enough to have been presented with a lot of
embarrassingly parallel problems to solve, but that isn’t the norm.

Actually, this is not entirely true. Amdahl’s law only applies to
optimizing a fixed workload. I need to rethink this argument.

-mental

Gary W. wrote:

I’m not suggesting that processes are in all cases
preferred to threads, just that I would tend to explore
a multi-process solution first before a multi-thread
solution.

+1
threads are a misfeature

On Thu, Oct 04, 2007 at 02:15:09AM +0900, Brian A. wrote:

majority (all?) of readers of this ng will be involved in scenarios in
which the cost of development time far exceeds electricity or server
costs for their deployed applications.

Part of what kept me from getting involved in Ruby sooner than I did
was my erroneous view that I wanted to be using technology that would
be sufficient to run Amazon, Ebay, etc. Little did it matter that I
wasn’t pursuing that type of project - analogous to the fact that
most, if not all, Hummer drivers will never encounter serious off-road
or combat situations :)

That’s okay, though, because most civilian Hummers are unsuited to
offroading and combat situations.

I’m all for increasing the performance and scalability of Ruby, but I
think the productivity gains still outweigh the extra runtime costs
for most projects.

Agreed – for the vast majority of projects in the range for which Ruby
is typically used, as compared with languages with execution performance
that is actually sufficiently better than Ruby’s to bother measuring the
difference. I mean, sure, it’s fun to compare benchmarks between Python
and Ruby (for instance), but if you actually need a performance boost you
should be comparing Ruby with C instead (or comparing Python with C, as
your tastes in productivity-enhancing languages may lead you).

On Thu, Oct 04, 2007 at 03:06:03AM +0900, Gary W. wrote:

I admit to being puzzled by the general fascination
with threads (native or not) for solving scaling
problems.

Good old processes have always seemed like a reasonable
way to partition problems and take advantage of
concurrency opportunities due to waiting on I/O
(single processor) or the parallel nature of a
CPU-bound calculation (multi-processor).

There’s entirely too much focus on threads these days. In many cases,
separate processes are superior to multithreading concurrency. On the
other hand, there are cases where multithreading concurrency is superior
to multiprocessing concurrency.

Processes also prevent the kind of problems associated
with concurrent access to shared memory that are
inherent in a multi-thread/single-process model.
A multi-process solution can more easily be
retargeted to a multi-machine solution across a network
than can a multi-thread solution.

On the other hand, sometimes it’s nice to keep process overhead down.
Most of the time, however, I think multithreaded concurrency is a case of
premature optimization.

I suspect that language and OS features have a great
effect on the concurrency model a programmer might
select or prefer.

I’m not suggesting that processes are in all cases
preferred to threads, just that I would tend to explore
a multi-process solution first before a multi-thread
solution.

Same here. That’s one of the reasons that, though I know a fair bit
about multithreaded concurrency in theory, I’ve done very little with
multithreaded development in practice. In fact, all I’ve needed to do
with multithreaded dev so far is deal with others’ code, where the
code was implemented using multithreaded concurrency before I ever laid
eyes on it.

On Thu, Oct 04, 2007 at 02:39:28AM +0900, MenTaLguY wrote:

Amdahl’s law is relevant because most software can’t be written to
scale entirely linearly with the hardware, because most computational
problems are limited in the amount of parallelism they admit. You may
have been fortunate enough to have been presented with a lot of
embarrassingly parallel problems to solve, but that isn’t the norm.

Maybe not “entirely”, but certainly close enough for government (or
corporate) work. I was under the impression we were talking about
massive-traffic server-based systems here, where throwing more hardware
at the problem (in the sense of extra blades, or whatever) is an option.
I did not think we were talking about something like a desktop app where
opportunities for parallelism are strictly limited – in which case I’d
agree that throwing more hardware at the problem is a non-starter. Of
course, I don’t know anyone who thinks endlessly adding processors to
your desktop system is the correct answer to a slow word processor.

It was probably meant as a stand-in for “more work at streamlining
design, combined with greater code cleverness needs to scale without
throwing hardware at the problem.”

No argument there, as long as it’s understood that there are limits to
what can be achieved. I don’t want to discourage anyone from seeking
linear scalability as an ideal, but it’s not a realistic thing to
promise or assume.

It’s close enough (again), for many purposes, to “realistic”. When you
can get roughly linear scaling as your capacity needs grow a hundredfold,
as opposed to trying to get similar scaling capabilities out of throwing
programmers (or programmer time) at the problem, that’s certainly
“realistic” in my estimation.

Obviously I’m not saying that you should write crap code and throw
hardware at it. On the other hand, there’s a sweet spot for effort spent
in developing good, performant code – and beyond that point, you should
consider throwing hardware at the problem. In such circumstances, one of
the primary measures of quality code is “Does it scale in a roughly
linear manner when you add compute resources?”
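
One rough way to answer that question for a given piece of code (a sketch
with an invented workload, again assuming a Unix Ruby with fork): time a
fixed batch of CPU-bound jobs at a few different process counts and see
whether throughput tracks the worker count.

    require 'benchmark'

    # Stand-in for one unit of real work.
    def job
      (1..500_000).reduce(:+)
    end

    jobs = 48

    [1, 2, 4].each do |workers|
      elapsed = Benchmark.realtime do
        workers.times { fork { (jobs / workers).times { job } } }
        Process.waitall
      end
      printf "%d worker(s): %.2fs, %.1f jobs/s\n",
             workers, elapsed, jobs / elapsed
    end
    # If throughput roughly doubles each time the worker count doubles
    # (up to the number of cores), the code is scaling about linearly
    # with the hardware you give it.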

Daniel DeLorme wrote:

+1
threads are a misfeature

Threads are worse than GOTO.

Yes, sometimes you need GOTO (locally, catch/throw). Do your darndest to
avoid it!

On Thu, 4 Oct 2007 03:06:03 +0900, Gary W. wrote:

I admit to being puzzled by the general fascination
with threads (native or not) for solving scaling
problems.

Good old processes have always seemed like a reasonable
way to partition problems and take advantage of
concurrency opportunities due to waiting on I/O
(single processor) or the parallel nature of a
CPU-bound calculation (multi-processor).

Ayup. At AOL, when we grew beyond the abilities of sendmail to handle
incoming mail from the Internet, we rolled our own SMTP server. It had
the following characteristics:

  • Non-threaded, event-driven main loop
  • Asynchronous DNS calls
  • Asynchronous database calls
  • No disk I/O other than logging (which itself might have been async, I
    forget)
  • Therefore, no process ever waited on anything

The best configuration? One process per CPU.
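
A stripped-down version of that shape in Ruby (just a single-process echo
loop, nothing AOL-specific, with the port and buffer size invented for the
example) looks something like this; as the post suggests, you would then
run one such process per CPU.

    require 'socket'

    # Single-threaded, event-driven main loop: one IO.select multiplexes
    # the listening socket and every connected client, so the process
    # never sits blocked on any one slow peer.
    server  = TCPServer.new(2525)
    clients = []

    loop do
      ready, = IO.select([server] + clients)
      ready.each do |io|
        if io == server
          clients << server.accept_nonblock    # new connection
        else
          begin
            data = io.read_nonblock(4096)
            io.write("echo: #{data}")          # handle the request
          rescue EOFError, Errno::ECONNRESET
            clients.delete(io)
            io.close
          end
        end
      end
    end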

On Thu, 4 Oct 2007 04:40:52 +0900, Chad P. [email protected]
wrote:

Maybe not “entirely”, but certainly close enough for government (or
corporate) work. I was under the impression we were talking about
massive-traffic server-based systems here, where throwing more hardware
at the problem (in the sense of extra blades, or whatever) is an option.
I did not think we were talking about something like a desktop app where
opportunities for parallelism are strictly limited – in which case I’d
agree that throwing more hardware at the problem is a non-starter. Of
course, I don’t know anyone who thinks endlessly adding processors to
your desktop system is the correct answer to a slow word processor.

I wasn’t thinking of only desktop systems, but I think you’re right:
many massive-traffic server-based systems can be embarrassingly
parallel, since jobs are often relatively independent (e.g. individual
user sessions/HTTP requests), and in that context adding more work
usually translates into simply adding more jobs. It’s going to depend
a lot on the nature of the problem domain, though: once the jobs
become sufficiently interdependent, you do start to hit a scalability
wall (often a problem in e.g. simulations or online games).

Anyway, I think I was wrong: Amdahl’s law may not be appropriate
here – we aren’t just talking about making a fixed-size job go
faster, but what happens when more work is added. Unless you’re doing
something like n-body simulations, the relative amount of
unparallelizable work can decrease as the absolute amount of work
increases, because the shared stuff that forces serialization can often
be partitioned, allowing more overall parallelism than was practical
with a smaller workload.
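
A rough numeric illustration of that point (all figures invented): if the
serial portion stays more or less fixed while the parallelizable portion
grows with the workload, the serial fraction, and with it the ceiling on
speedup, keeps shrinking. That is essentially Gustafson’s observation.

    serial = 10.0                    # seconds of unavoidably serial work

    [100, 1_000, 10_000].each do |parallel|  # parallelizable seconds at this load
      p_frac  = parallel / (parallel + serial)
      ceiling = 1.0 / (1 - p_frac)           # Amdahl ceiling at this workload
      printf "total work %6.0fs: parallel fraction %.4f, speedup ceiling %5.0fx\n",
             parallel + serial, p_frac, ceiling
    end
    # As the workload grows 100x, the achievable speedup ceiling grows
    # from about 11x to about 1000x.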

It’s close enough (again), for many purposes, to “realistic”. When you
can get roughly linear scaling as your capacity needs grow a hundredfold,
as opposed to trying to get similar scaling capabilities out of throwing
programmers (or programmer time) at the problem, that’s certainly
“realistic” in my estimation.

I’d submit that while you’re still able to get a two-orders-of-magnitude
increase in performance from simply improving your algorithms, you’re
probably not to the point where you could scale linearly. On the other
hand, by the time you’re scaling linearly, there’s probably not much
else to squeeze out of the thing, aside from micro-optimizations which
may get you some fractions of an order of magnitude improvement.

Obviously I’m not saying that you should write crap code and throw
hardware at it. On the other hand, there’s a sweet spot for effort spent
in developing good, performant code – and beyond that point, you should
consider throwing hardware at the problem. In such circumstances, one of
the primary measures of quality code is “Does it scale in a roughly
linear manner when you add compute resources?”

Yes, agreed.

-mental

On Wed, 03 Oct 2007 11:37:01 +0200, Robert K. wrote:

It definitively is. One aspect of Ruby that hinders scaling is the
absence of native threads IMHO. On the other hand, mechanisms are
provided for IPC (DRb for example) which are easy to use and thus may be
counted as compensating at least partially for the lack of native threading.

Depends what you mean by “scaling”. When I think of scaling, I think of
tens or hundreds of thousands of servers running
AOL/eBay/Amazon/Google-style apps. At that level, threads don’t matter
much; what matters is that your app scales near-linearly with hardware
(what we used to call “dollar scalable”).

What do you mean by scaling?

On Thu, Oct 04, 2007 at 06:09:02AM +0900, MenTaLguY wrote:

probably not to the point where you could scale linearly. On the other
hand, by the time you’re scaling linearly, there’s probably not much
else to squeeze out of the thing, aside from micro-optimizations which
may get you some fractions of an order of magnitude improvement.

Algorithms are important to performance, of course, and there’s a minimal
amount of attention you should always want to employ in writing quality
code. My point is that ultimately, as things continue to scale upward,
you tend to pass the point where algorithm tweaking no longer helps enough
to bother with very much, at about the same time you notice that no
matter what else you do you’re going to have to add more hardware
resources.

Obviously I’m not saying that you should write crap code and throw
hardware at it. On the other hand, there’s a sweet spot for effort spent
in developing good, performant code – and beyond that point, you should
consider throwing hardware at the problem. In such circumstances, one of
the primary measures of quality code is “Does it scale in a roughly
linear manner when you add compute resources?”

Yes, agreed.

It seems that on one hand I’ve been advocating writing code so you can
scale upward in hardware resources, and that throwing hardware at the
problem is eventually the only likely option you have left, while on the
other hand you’re advocating for people writing good code in the first
place because crap code of sufficiently bad quality won’t scale very well
no matter how much hardware you throw at it. In other words, I was
assuming reasonably good code as a baseline, and you were assuming
reasonably bad code as a baseline. These incompatible assumptions may be
influenced by our respective work environments.

I’ll expand on my position, then:

  1. Hire good programmers.

  2. Have them write good code.

  3. Throw hardware at the scaling problem, because your good code
    written by good programmers can handle it.

On Thu, 4 Oct 2007 06:05:09 +0900, Phlip [email protected] wrote:

Daniel DeLorme wrote:

+1
threads are a misfeature

Threads are worse than GOTO.

I think the biggest issue is not threads per se, but rather threads
with shared state. A non-shared-state thread could be preferable
to an OS process.

-mental
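
A tiny sketch of that style in Ruby (the task itself is just a
placeholder): the worker thread owns all of its state and talks to the
rest of the program only through Queues, so there is nothing shared to
lock.

    # Queue is built in on modern Rubies; 1.8-era Rubies also need
    # `require 'thread'`.
    requests = Queue.new
    results  = Queue.new

    worker = Thread.new do
      # Everything used in here is local to this thread; communication
      # happens only via the two queues.
      while (job = requests.pop) != :done
        results << job.upcase
      end
    end

    %w[alpha beta gamma].each { |word| requests << word }
    requests << :done
    worker.join

    puts results.pop until results.empty?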

On Wed, 3 Oct 2007 22:19:57 +0900, Charles Oliver N. wrote:

I find this perspective puzzling. In most large datacenters, the big
cost of operation is neither the cost of the servers nor the cost of the
development time to put code on them; it’s the peripheral electricity,
administration, and cooling costs once the written application must be
deployed to thousands of users.

Speaking from my AOL experiences (which are admittedly getting a bit long
in the tooth compared to today’s apps and hardware):

Yeah, but.

On the one hand, our hardware costs certainly dwarfed our development
costs. (And by “certainly”, I mean “I certainly imagine so, having
absolutely no memory of what the actual numbers were.”) And our operating
costs were a big part of those.

What’s more, cooling and power do NOT scale linearly; at some point, the
incremental cost of one more CPU is a new data center or an extra utility
trench. Plus, at the time, anyway, there weren’t big enough power/cooling
products on the market to handle our needs. I imagine that’s changed now.

On the other hand, that wasn’t our biggest problem. If handling ten
million users costs $X/user, and makes $Y/user profit, and doubling to 20
million users still costs roughly $X/user and thus makes $Y/user profit,
then doubling capacity to double the user base is really easy.
Hardware/operating costs do impact whether you can make a per-user profit
in the first place, but they don’t impact scaling costs. Whether Ruby
takes up too much server power is a base hardware cost, not a scaling
cost. If it’s too heavyweight, it’s too heavyweight at 1000 users, just
like it will be at 100 million.

But our biggest problem was development - not even development costs, but
development ability. Software development does not scale anywhere NEAR
linearly, as Frederick Brooks well knows. Double the number of
programmers, and the number of interpersonal communication channels and
meetings grows with roughly the square of the head count. That needs more
project managers and technical managers, which again increases the number
of communication channels. And that needs more support staff (HR, legal,
facilities, etc.) More people need more office space and parking, and the
bigger your campus, the more time is wasted travelling, and the more space
is “wasted” on non-productive square footage like kitchens and hallways.
And so forth.

I haven’t even mentioned the sheer difficulty of hiring that many
developers. I imagine many here have had the problem of “If I could hire
someone else, I’d have less to do; but I have too much to do to find time
to hire someone.” That doesn’t scale either. Then there’s saturation. At
one point, there were nearly as many technical job openings in the
Washington Post as there were total unemployed people in the Washington
area. Bad news.

Then there’s the sad fact that software simply does not scale cleanly.
Even if the core functions are dollar-scalable - and I’ll touch on that in
another post - things like maintenance, operability, reporting, metering,
debuggability, etc. are not. So as your business scales, you’ll spend more
and more time doing infrastructure development instead of feature
development. There comes a point where there’s a negative economy of
scale; you’re too big to actually do any real innovation, because all your
resources are sucked up keeping the beast alive.

The point of all this is that anything - ANYTHING - that increases
developer productivity is a huge, huge win. Sure, developer salaries may
not have been over 50% of our expenditures. But without developer
productivity, you stagnate, and someone else zooms right past you.

On Thu, 4 Oct 2007 04:40:52 +0900, Chad P. wrote:

No argument there, as long as it’s understood that there are limits to
what can be achieved. I don’t want to discourage anyone from seeking
linear scalability as an ideal, but it’s not a realistic thing to
promise or assume.

It’s close enough (again), for many purposes, to “realistic”. When you
can get roughly linear scaling as your capacity needs grow a hundredfold,
as opposed to trying to get similar scaling capabilities out of throwing
programmers (or programmer time) at the problem, that’s certainly
“realistic” in my estimation.

A lot depends on your application requirements. If you design it from the
ground up to be “shared nothing”, then you may well be lucky enough to
truly HAVE shared nothing. But you’ll also have a pretty limited feature
set.

What’s the big buzzword today? Social networking. What did we used to
call that? “Community.” What was the single biggest sticky-paper
community feature? Buddy lists. Who does buddy lists besides the Big Guys
(who can throw money at it) and the really small guys (who fit on a single
server)? Nobody. Why? Doesn’t scale linearly. Think about what it takes
to offer a feature that, for every simultaneous user, checks the list of
every other simultaneous user for people you know. Shared-nothing that.

My area of expertise was the AOL mail system. And, looking back, there
were a number of core features we offered that simply couldn’t be done in
a shared-nothing world over slow phone lines:

  • You could see when each recipient had read your e-mail.
  • If nobody had read it, you could unsend it.
  • You could forward large attachments and long threads without
    re-uploading them.
  • Corollary: the server could handle large attachments and long threads
    without storing multiple copies. (Disks were tiny then.)
  • When sending e-mail, the system would check if all your recipients were
    valid, not full, accepting e-mail from you, etc. If any were not, the
    message wouldn’t send. No bounces. (This requires a two-phase commit.)
  • If the sender or other recipients of an e-mail were online, their
    address would become a hyperlink so you could IM them. (Buddy lists
    again.)
  • Your outbox pointed to the same message body as your recipients’ inbox.
    And all recipients’ inboxes pointed to the same message as well. (Disk
    space again.)
  • The same e-mail message would appear differently to different clients,
    depending on the feature set of that client.
  • If you were the BCC recipient of an e-mail, you’d see a BCC header with
    your name. If you were the author, you’d see all the BCCs.
  • Large bulk mailings stored only a single copy of the message.
  • E-mail complaints could be sent to us in a manner that preserved their
    evidentiary value in court.
  • The client stayed in a wait state after sending until the servers could
    guarantee that your e-mail had actually been delivered. (Two-phase
    commit again.)

That’s just off the top of my head; I’m sure there were dozens of other
things I’ve forgotten that were designed when all the mail servers fit on
one machine, and then had to scale to multiple replicated data centers.

Could e-mail live without these features? Sure. Internet e-mail never had
them, a generation grew up without them, and these days nobody bemoans the
fact that you can’t instantly know your message was delivered to its
destination; in fact, even bounces are becoming a thing of the past. If
you mistype an address, you may never know, and that’s just the way it
works. “Did you get my e-mail?” is a real question, not just a
passive-aggressive way of saying “I see you read my message, but have not
yet responded.”

And some of the features were only important in an age where pipes (both
last-mile and LAN) were very narrow and disks and RAM were very small.
Spam, in particular, made the “one copy of each message” model obsolete,
because spammers wouldn’t play by the rules.

But restricting yourself to only shared-nothing features means ruling out
an awful lot of features. Including anything depending on a database
index, or a table that fits completely in memory, or any sort of
rate-limiting or duplicate-detection or spam prevention, or in fact
anything that makes any assumptions at all about the state of any database
you’re interacting with or relational integrity or any other transaction
in the system, ever. Including whether the disk drive holding the
transaction you just wrote to disk has disappeared in a puff of head
crash.

It was always the little things that bit us. Know why AOL screen names
are often “Jim293852”? Well, it started out as “The name ‘Jim’ is already
taken. Would you like ‘Jim2’?”. Guess how well that scales when the first
available Jim is “Jim35000”? Not very.

Pop-quiz: Which of your core features would you have to eliminate with
three million simultaneous users?

Jay L. wrote:

Pop-quiz: Which of your core features would you have to eliminate with
three million simultaneous users?

R&D vacations :)

On 10/4/07, Todd B. [email protected] wrote:

Ruby will eventually (I think) become something more useful than PHP,
mostly because it is a language, like other cool ones (Lua, Erlang,
etc.), that attracts some very smart people.

Everyone seems to be focusing on the cost/benefit analysis. Have we
forgotten about how fun it is to program?

Todd

Hmm… I guess I happen to be in the wrong thread, but the point
still stands (-: (Yes, I stole that smiley from someone else).

Todd

2007/10/3, Jay L. [email protected]:

much; what matters is that your app scales near-linearly with hardware
(what we used to call “dollar scalable”).

What do you mean by scaling?

To me “scaling” just means “adjusting to higher load”. What you have
in mind seems a concrete solution to provide scaling but that’s just
one (albeit a very common one).

Kind regards

robert

On Thu, 4 Oct 2007 17:08:49 +0900, Robert K. wrote:

What you have in mind seems a concrete solution to provide scaling but
that’s just one (albeit a very common one).

Sure; it also depends on what you mean by “higher load”. Threads help with
what they now call “vertical scaling” - bigger and bigger boxes. Or,
rather, they can help, if you’re I/O bound. (So can adding processes,
but presumably threads have less overhead.)

If you’re CPU bound, splitting your operation into threads doesn’t help any
more than just running more processes.

And, of course, bigger boxes are much more expensive than small boxes. We
had some apps that used to run on 8-way or 16-way machines (ten years ago
when such a thing was a rarity), but we spent a lot of development effort
rearchitecting them to run on more standard hardware, because that’d be
cheaper.