Ruby on Rails performance with lots of requests and DB updates per second

I’m developing a polling application that will deal with an average of
1000-2000 votes per second coming from different users. In other words,
it’ll receive 1k to 2k requests per second with each request making a DB
insert into the table that stores the voting data.

I’m using RoR 4 with MySQL and planning to push it to Heroku or AWS.

What performance issues related to the database and the application itself
should I be aware of?

How can I address this number of inserts per second into the database?

EDIT

I was thinking of not inserting into the DB for each request, but instead
writing the insert data to a memory stream. I would then have a scheduled job
running every second that would read from this memory stream and generate a
bulk insert, so that each insert doesn’t have to be made individually. But I
can’t think of a nice way to implement this.

On Heroku your most important bottleneck (although not the only one) is
the average response time of your requests. You want all your response
times under 500ms, ideally under 200ms.

See this document for an explanation:
https://devcenter.heroku.com/articles/request-timeout

This is the most important thing you should worry about.

The performance of the database, and its proximity to the Heroku dynos,
are also important, but those can be optimized by getting a bigger
database.

Moving anything long-running into a job queue is definitely the way to
go. Generally you do this with a Resque (or Delayed Job) back-end, and in
Rails 4 you can use the ActiveJob paradigm to create your job classes.
Most of the time jobs use a Redis back-end, which fortunately for you is
really, really fast.
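
For instance, a minimal sketch of what that could look like in Rails 4 with ActiveJob, assuming Resque as the queue adapter (RecordVoteJob, Vote, and the column names are made up for illustration):

```ruby
# config/application.rb
# config.active_job.queue_adapter = :resque   # assumption: Resque backed by Redis

# app/jobs/record_vote_job.rb
class RecordVoteJob < ActiveJob::Base
  queue_as :votes

  # poll_id / option_id / user_id are hypothetical columns on the votes table
  def perform(poll_id, option_id, user_id)
    Vote.create!(poll_id: poll_id, option_id: option_id, user_id: user_id)
  end
end

# In the controller, enqueue instead of inserting during the request:
#   RecordVoteJob.perform_later(params[:poll_id], params[:option_id], current_user.id)
```

The request then only has to push a job onto Redis, which keeps response times low; the actual MySQL insert happens in a worker.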

As far as “bulk” operations go, you would have to write some logic
yourself. You may want to experiment with using a separate Redis
instance (separate from the one keeping track of the job queue) as your
temporary data store, then have your jobs do bulk operations that read
from Redis and move the data into MySQL.
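
A rough sketch of that idea, assuming the redis gem, a second Redis instance, and a votes table; the env var, key name, and job name are all illustrative:

```ruby
require "redis"
require "json"

# Assumption: a second Redis instance, separate from the job-queue Redis.
VOTE_BUFFER = Redis.new(url: ENV["VOTE_BUFFER_REDIS_URL"])
BUFFER_KEY  = "votes:pending"

# In the controller: append the insert data to a Redis list instead of hitting MySQL.
def buffer_vote(attrs)
  VOTE_BUFFER.rpush(BUFFER_KEY, attrs.to_json)
end

# A job scheduled every few seconds (e.g. via resque-scheduler) drains the list
# and writes a single multi-row INSERT.
class FlushVotesJob < ActiveJob::Base
  queue_as :low_priority

  def perform
    rows = []
    while (raw = VOTE_BUFFER.lpop(BUFFER_KEY))
      rows << JSON.parse(raw)
    end
    return if rows.empty?

    conn   = ActiveRecord::Base.connection
    values = rows.map { |r|
      "(#{conn.quote(r['poll_id'])}, #{conn.quote(r['option_id'])}, #{conn.quote(r['user_id'])})"
    }.join(", ")

    conn.execute("INSERT INTO votes (poll_id, option_id, user_id) VALUES #{values}")
  end
end
```

The trade-off is durability: anything sitting in the buffer when a dyno or Redis dies is lost, so this only makes sense if dropping a second or two of votes is acceptable.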

Check out this tool for load testing – I’ve found it slightly hard to
work with but it is very powerful:

https://www.blitz.io

In particular, use it to measure your average response times on Heroku;
you want to make sure your response times don’t slow down at scale. Make
sure you have a good understanding of Heroku random (aka “dumb”) routing
and why scale creates request queuing.

-Jason

On Oct 8, 2014, at 9:04 AM, LZ Olem [email protected] wrote:

I was thinking of not inserting into the DB for each request, but instead
writing the insert data to a memory stream. I would then have a scheduled job
running every second that would read from this memory stream and generate a bulk
insert, so that each insert doesn’t have to be made individually. But I can’t
think of a nice way to implement this.


Jason Fleetwood-Boldt
[email protected]

All material Jason Fleetwood-Boldt 2014. Public conversations may be
turned into blog posts (original poster information will be made
anonymous). Email [email protected] with questions/concerns about
this.

On Wed, Oct 8, 2014 at 6:04 AM, LZ Olem [email protected] wrote:

it’ll receive 1k to 2k requests per second with each request making a DB
insert into the table that stores the voting data.

I’m using RoR 4 with MySQL and planning to push it to Heroku or AWS.

Heroku offers PostgreSQL only, so you might want to switch over
in development to avoid any incompatibilities.

How can I address this number of inserts per second into the database?

This seems like a perfect example of premature optimization :)

If you’re really concerned, set up a test app on Heroku and fire up
jmeter or ab or something and see exactly how it performs. You may
find you have nothing to worry about.

FWIW,

Hassan S. ------------------------ [email protected]

twitter: @hassan

On Oct 8, 2014, at 2:51 PM, Hassan S.
[email protected] wrote:

Heroku offers PostgreSQL only, so you might want to switch over
in development to avoid any incompatibilities.

That’s not entirely true; Heroku offers a MySQL add-on through ClearDB.
You can also use Heroku with an Amazon RDS instance (obviously
additional setup is required).

Since scale is an issue, you might want to test Postgres and MySQL themselves
for the specific operations you are doing (at scale), then choose the
database based on the results of that test.

How can I address this number of inserts per second into the database?

This seems like a perfect example of premature optimization :)

If you’re really concerned, set up a test app on Heroku and fire up
jmeter or ab or something and see exactly how it performs. You may
find you have nothing to worry about.

Yes, I agree, although if it actually is thousands of web requests per
second it will require more than 1 dyno, or a larger Heroku dyno (like
the Performance dynos), and it certainly will hit some bottlenecks at
some point.

I wouldn’t agree that thinking about how an app is going to scale (and
how to architect it so that it will scale) is necessarily premature
optimization. Over-architecting for scale before load testing to find
the bottlenecks certainly can be, though, since you end up optimizing
things that don’t need to be optimized. I think the point is: for any
specific part of the stack you might optimize, try to disprove the
hypothesis “If I optimize X, I will see performance gain Y.” If you
genuinely try and can’t disprove it, it is reasonable to conclude you
have identified a bottleneck and can justify spending time optimizing
that area of the stack.

Since no one else mentioned it, I would add that New Relic will help you
significantly here. You will need the New Relic “plus” or “premium”
plan so you can drill down into individual requests and see where the
bottlenecks are.

By “ab” do you mean “Apache Bench”?

I think both of those tools will generate requests from the programmer’s
own connection. One thing I like about Blitz is that it can hit your
website with thousands of concurrent requests from different data centers
across the planet, so you can see how performance differs by geographic
region.

-Jason


Jason Fleetwood-Boldt
[email protected]


Interestingly, at just 1K votes per second sustained for 10 hours, you
could record a vote for the entire population of California! At 2K/sec
for 12 hours, you could record votes for every single person in
California, Texas, New York, and Florida!
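
(As a rough check on table size: 1,000 votes/sec × 3,600 sec/hour × 10 hours is
36 million rows, and 2,000/sec for 12 hours is roughly 86 million.)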

As for the question at hand, depending on how long you expect your voting
to run, you may want to make sure your database is set up well enough to
handle inserts at that speed when you’re pushing past 80M votes,
especially considering that the biggest rush probably occurs toward the
end of a voting period.

On Wed, Oct 8, 2014 at 12:54 PM, Jason Fleetwood-Boldt
[email protected] wrote:

Heroku offers PostgreSQL only

That’s not entirely true; Heroku offers a MySQL add-on through ClearDB.

I hadn’t looked through the addons for a while - good to know.

You can also use Heroku with an Amazon RDS instance (obviously
additional setup is required)

Of course there are multiple options if you’re not trying to keep the
deployment within Heroku’s sphere, but that kind of gets away from
the whole ‘no-admin-work’ reason to use Heroku in the first place :)

By “ab” do you mean “Apache Bench”?

Yes.

I think both of those tools will generate requests from the programmer’s
own connection.

Mostly; JMeter does have the ability to run via multiple distributed
engines so you can use any systems you have available.


Hassan S. ------------------------ [email protected]

twitter: @hassan

On Wednesday, 8 October 2014 09:04:29 UTC-4, LZ Olem wrote:

How can I address this number of inserts per second into the database?

EDIT

I was thinking of not inserting into the DB for each request, but instead
writing the insert data to a memory stream. I would then have a scheduled job
running every second that would read from this memory stream and generate a
bulk insert, so that each insert doesn’t have to be made individually. But I
can’t think of a nice way to implement this.

Don’t implement that, certainly not as a first thing. Build something
straightforward that does what you intend (collecting poll results), then
load-test it. Building a hyper-scalable DB layer isn’t going to do much
good until you’re sure the layers in front of it (app servers, etc.) can
handle the load.

You may also want to consider what part of these results needs to be
durable. For instance, if an exact “user -> vote” mapping isn’t needed,
you could hack something together with Redis and its INCR command.
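
For example, a minimal sketch with the redis gem (the key scheme is made up):

```ruby
require "redis"

redis = Redis.new

# Each vote becomes one atomic counter increment instead of a row in MySQL.
redis.incr("poll:42:option:3")       # "poll:<id>:option:<id>" is an illustrative key scheme

# Reading a tally back later:
redis.get("poll:42:option:3").to_i   # => current vote count for that option
```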

–Matt J.