Looking for advice on how to report performance issues

Hello,

We have a Rails app running on Ruby 1.9.3, and under Windows we’re
seeing a 20x time penalty in our RSpec tests.

I have looked at some big patches from both Xavier and Luis and while
they were very helpful, we’re still seeing it take 7-8x as long to run
the tests. That’s the difference between a minute (very usable) and
7:30 (nearly unusable).

I’m fairly new to Ruby, but not to performance analysis, so I’ve
volunteered to follow up on this problem. If I want to file/find a bug
to associate with this problem, what sort of data should I be
gathering, and how?

Background/random data:

We have 800+ tests, and on a good day they all run in a minute. Some
do no I/O (pure ‘unit’ test), most do light I/O (usually a couple
create+reads to MongoDB or a render partial). Only a few do any heavy
lifting.

I don’t see any glaring hotspots. Most of the tests seem to take
several times longer to run. Dots just accumulate slower.

RSpec reports 360 seconds to run the tests, but the test run actually
completes in about 440 seconds. So the rest is what, RSpec and Ruby
spool-up time?
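
One way I can think of to check (just a sketch of the idea): time the
Rails environment load on its own, with no tests involved, and compare
that against the 80-second gap. Something like this from the app root,
where config/environment is the standard Rails boot file:

  require 'benchmark'
  # how long does booting the app take, with no specs loaded at all?
  puts Benchmark.realtime { require './config/environment' }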

Thanks,

  • Jason

Profile on Linux and on Windows and look for differences.

Here’s something I just googled up, see if it works for you:
http://old.thoughtsincomputation.com/posts/profiling-with-rspec-and-ruby-prof
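
Untested, but something along these lines in spec_helper.rb should let
you profile the whole run instead of whitelisting individual examples
(assumes the ruby-prof gem is installed; the output file name is
arbitrary):

  require 'ruby-prof'

  RSpec.configure do |config|
    config.before(:suite) { RubyProf.start }
    config.after(:suite) do
      result = RubyProf.stop
      # flat profile of the entire run, dropping anything under 1%
      File.open('rspec_profile.txt', 'w') do |io|
        RubyProf::FlatPrinter.new(result).print(io, :min_percent => 1)
      end
    end
  end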

– Matma R.

Hey thanks for the reply.

It looks like I have to whitelist tests for that solution to work. I’m
worried it’ll miss the actual problems (setup time?). But it might be
worth looking at to narrow down the cause…

-Jason

jason marshall wrote in post #1075372:

Hello,

We have a Rails app running on Ruby 1.9.3, and under Windows we’re
seeing a 20x time penalty in our RSpec tests.

I have looked at some big patches from both Xavier and Luis and while
they were very helpful, we’re still seeing it take 7-8x as long to run
the tests. That’s the difference between a minute (very usable) and
7:30 (nearly unusable).

Are your 7-8x-as-long numbers using the
tcs-ruby193_require_fenix_gc_hash_20120527.7z ready-to-use binary for
Windows from https://github.com/thecodeshop/ruby/wiki/Downloads ?

This download includes performance improvements from Luis and others,
and I’m curious to see how it runs in your Windows environment.

https://groups.google.com/forum/?fromgroups=#!topic/thecodeshop/WVA4N2lJoM4

Jon


Fail fast. Fail often. Fail publicly. Learn. Adapt. Repeat.
http://thecodeshop.github.com | http://jonforums.github.com/
twitter: @jonforums

jason marshall wrote in post #1075372:

We have a Rails app running on Ruby 1.9.3, and under Windows we’re
seeing a 20x time penalty in our RSpec tests.

You say you use a MongoDB backend. Are you pointing to the same
MongoDB server in both cases, or are you comparing a Windows MongoDB
with a non-Windows MongoDB?

It might be interesting to run just the unit tests on both machines,
and just the database-heavy tests on both machines.
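
If the suite isn’t already split that way, RSpec tags are one way to
do it (the tag names here are just examples, assuming RSpec 2’s --tag
filtering):

  # tag the example groups:
  describe "pure unit behaviour", :unit => true do
    it "adds numbers" do
      (1 + 1).should eq(2)
    end
  end

  # then run one slice at a time on each platform:
  #   rspec --tag unit
  #   rspec --tag db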

If the unit tests are also much slower, then it looks like there’s
something about Ruby startup or Rails startup which is slowing you
down.

You could try creating an empty Rails app, creating a suite of 800 dummy
tests (assert true), and running it on both platforms. If you can
reproduce the problem like this then it will be much easier for people
to work on improving it.
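
Something along these lines could generate the dummy suite (the file
names and layout are arbitrary):

  require 'fileutils'

  FileUtils.mkdir_p('spec/dummy')
  800.times do |i|
    File.open("spec/dummy/dummy_#{i}_spec.rb", 'w') do |f|
      f.puts "require 'spec_helper'"
      f.puts "describe 'dummy #{i}' do"
      f.puts "  it('passes') { true.should be_true }"
      f.puts "end"
    end
  end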

On Mon, Sep 10, 2012 at 10:16 PM, jason marshall [email protected] wrote:

We have a Rails app running on Ruby 1.9.3, and under Windows we’re seeing
a 20x time penalty in our RSpec tests.

And the hardware is comparable?

I’m fairly new to Ruby, but not to performance analysis, so I’ve volunteered
to follow up on this problem. If I want to file/find a bug to associate
with this problem, what sort of data should I be gathering, and how?

I have zero experience with Rails profiling, but maybe using the std
lib profiler can give you a first data point.
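
For example, wrapping a suspect chunk of code with the stdlib profiler
(it is very slow itself, so only use it on a small slice; the loop
below is just a stand-in workload):

  require 'profiler'

  Profiler__.start_profile
  # replace this stand-in with whatever you suspect, e.g. a require or a render
  1_000.times { File.exist?(__FILE__) }
  Profiler__.stop_profile
  Profiler__.print_profile($stdout)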

Background/random data:

We have 800+ tests, and on a good day they all run in a minute. Some do no
I/O (pure ‘unit’ test), most do light I/O (usually a couple create+reads to
MongoDB or a render partial). Only a few do any heavy lifting.

I don’t see any glaring hotspots. Most of the tests seem to take several
times longer to run. Dots just accumulate slower.

Just to exclude an obvious issue: do you have an AV engine up and
running on the Windows box? If yes, excluding some directories might
help. Some engines still slow down operation even then, so completely
switching it off for the test might also be an option to determine
whether it’s the AV.

To exclude something else: is there concurrent activity that eats CPU
and / or IO bandwidth?

Kind regards

robert

Oops, correction: 2000s vs 440s vs ~60s

Comments inline

On Mon, Sep 10, 2012 at 11:27 PM, Robert K. [email protected] wrote:

On Mon, Sep 10, 2012 at 10:16 PM, jason marshall [email protected] wrote:

We have a Rails app running on Ruby 1.9.3, and under Windows we’re
seeing a 20x time penalty in our RSpec tests.

And the hardware is comparable?

Yes, good question. On the same model hardware, it’s 20x initial, 4.5x
current. (2000, 440, vs <100 seconds)

I’m fairly new to Ruby, but not to performance analysis, so I’ve
volunteered to follow up on this problem. If I want to file/find a bug
to associate with this problem, what sort of data should I be
gathering, and how?

I have zero experience with Rails profiling, but maybe using the std
lib profiler can give you a first data point.

Yeah my readings from Google say it’s not easy to do what I need to do.
I’m hoping someone could talk me down from the ledge and point me to a
baked solution.

Most of the tests seem to take several times longer to run. Dots just
accumulate slower.

Just to exclude an obvious issue: do you have an AV engine up and
running on the Windows box? If yes, excluding some directories might
help. Some engines still slow down operation even then, so completely
switching it off for the test might also be an option to determine
whether it’s the AV.

I’ll entertain the question, but let’s take it as a given that turning
off AV on Windows is a very, very unwise action.

Those numbers reflect my project directory already being excluded. I’m
not sure I’m comfortable having Ruby whitelisted, though it’s an
interesting data point:

With the Ruby install also excluded, regular Ruby is 1490 seconds and
the patched build is 400 seconds.

To exclude something else: is there concurrent activity that eats CPU
and / or IO bandwidth?

Not much. Or at least, there’s not a lot of jitter in the numbers. I
just have RubyMine and a web browser (low CPU) open.

Kind regards

robert


remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

As a mostly unrelated anecdote, the Subversion team had conversations
like this for a long time, and resisted making changes for a number of
years. In the 1.7 release they made some pretty substantial changes to
how they use the filesystem, and ended up with something that was both
simpler (for users) and happened to work a hell of a lot faster on
Windows.

It sounds like you guys are on that path at least.

I used that for a couple of months (then lost the URL, thanks for
reposting it). It took about 100-110s versus the ~60s my coworkers got.

Unfortunately it would also randomly crash with a 10-digit error code
while I was in the middle of testing. I was looking for something more
stable. Currently I’m using

https://raw.github.com/gist/3242245/5cdb932c0a544cfbb51fdee5759942d4d2c263f5/1-complete.diff

which Luis pointed me at in a Stack Overflow comment.

But again, I’m kind of a hard-numbers guy, and I’d like to find a way
to offer something more constructive than some anecdotal numbers.