Minitest randomization

Minitest was intended to be a minimal replacement for Test::Unit and
RSpec. It mostly succeeds in this goal; however, it violates it in one
annoying case. It adds a feature not found in RSpec, a feature that I
find noble in theory, but troublesome in practice: test randomization.
And it enables it by default.

This has no practical effect on an all-succeeding suite, or one in which
a single test is failing. But if you have a few failing tests, it
completely messes with your mind. Usually you will focus in on a single
failing test and work to make that one pass; usually this is the first
one to fail. But the randomization means that when you run it and watch
it fail again you have to squint and read through all the randomly
ordered failures to see where your current victim is. It’s a real flow
killer.

Randomization also mandates spewing options (notably the seed) onto the
console, which serve only to clutter up the display and obscure the
all-important success or failure message. And for some reason you’re
printing them twice (once at the beginning and again at the end of the
run). And even granting that the seed matters when it provokes a
failure, why bother printing it during a successful run?

Moreover, most well-factored OO code these days does not exhibit
isolation problems. Yes, it’s a good idea to randomize your tests once
in a while – say, once before checking in, or before a release – but
the benefit is not worth the above costs. And even if I thought it were
worth it, an alleged functional replacement library should not go
changing default behavior like that.

If I submit a patch to do the following, will you accept it?

  • make test randomization an option (“randomize”)
  • make the default for that option be “off”
  • output the command-line options only if one of the following is true:
    ◦ “verbose” is on
    ◦ “randomize” is on AND there was a failure

Cheers -

  • Alex (the Wrong guy :-))

The best solution would be to make the execution random, but the
output in order.

On Oct 6, 2010, at 15:03, Alex C. wrote:

This has no practical effect on an all-succeeding suite, or one in which a single test is failing.

Really? I think preventing test order dependency has a very practical
effect.

If I submit a patch to do the following, will you accept it?

  • make test randomization an option (“randomize”)

It is an option on a class by class basis. See ri MiniTest::Unit::TestCase::test_order.
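(For reference, a minimal sketch of that per-class override, assuming the pre-5 MiniTest::Unit API under discussion; the class and test names are made up.)

    require 'minitest/autorun'

    class TestWidget < MiniTest::Unit::TestCase
      # Anything other than :random makes this class run its tests
      # sorted by name; the rest of the suite stays randomized.
      def self.test_order
        :alpha
      end

      def test_aaa_runs_first_alphabetically
        assert true
      end

      def test_bbb_runs_second
        assert true
      end
    end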

  • make the default for that option be “off”

I actually disagree with this, but I think that is less important atm.

Why aren’t you using --seed when you rerun your specs? If you use --seed
with the previous value, all of your complaint about having to “squint”
to find your previous failure goes away.

Phrogz contributed some patches for 1.6.0 that ensured that specs could
be run in defined order by serializing their names. Combine that with
::test_order above and you have exactly what you want. Easy-peasy!

I could have sworn that Spec already overrode ::test_order to be
:sorted, but I have absolutely no commit to that effect. I guess Phrogz
was explicitly defining that on his specs.

I still think that random tests/specs are stronger tests/specs and
completely disagree with you that “most well-factored OO [test] code
these days does not exhibit isolation problems” on the basis that most
OO [test] code is not well-factored. minitest’s test dir flays at 535.
Wrong’s tests flay at 1150.

Anecdotally (unfortunately, nothing I can go into in great detail), I’ve
seen far too many projects with test order dependencies, which is the
reason that feature went into the library in the first place.

In the cases where I’m doing big refactorings and getting huge swaths of
errors and the randomization is bugging me, I’ll temporarily pop in
test_order to sort them and get through my work. But I always remove it
before committing.

  • output the command-line options only if one of the following is true:
    ◦ “verbose” is on
    ◦ “randomize” is on AND there was a failure

Unfortunately, we need to output the seed value at the beginning in the
case that your tests not only fail, but crash (like when you’re Aaron
Patterson and you’re working on C extensions instead of writing ruby
like a good person).

I can see it going the verbose route, but maybe there needs to be a
middle level verbosity? Full verbosity is damn noisy/annoying unless you
need those test method times (and then it is awesome).

On Oct 6, 2010, at 16:11, Steve K. wrote:

The best solution would be to make the execution random, but the
output in order.

that’s not a bad idea at all… but it would be DAMN confusing if you
did have test order dependency errors but couldn’t see why you were
getting your failures because they always showed up in sorted order…
I’ll have to think about that one for a bit.
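(To make Steve K.’s idea concrete, here is a toy sketch, not anything minitest does: run in a shuffled order, buffer each result, and print the report sorted by test name so reruns always read the same way.)

    tests = {
      "test_alpha" => -> { true },
      "test_beta"  => -> { false },
      "test_gamma" => -> { true },
    }

    results = {}
    tests.keys.shuffle.each do |name|        # random execution order
      results[name] = tests[name].call ? "PASS" : "FAIL"
    end

    results.sort.each do |name, status|      # deterministic report order
      puts "#{status}  #{name}"
    end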

P.S. I didn’t want to go there, but if you’re gonna throw down –
“flay lib” for Wrong gives 168 to Minitest’s 795, sucka! :) (and they
both have about 28Kloc)

Sorry, read “wc” wrong – they both have about 1100 lines of code.

And I just did a checkin that makes Wrong’s flay 0.

Sucka. :)

  • A

On Oct 9, 2010, at 16:50, Alex C. wrote:

I guess this comes down to idempotency. I expect that if I do
something twice in a row I will get the same result. Randomizing by
default breaks this expectation. It’s astonishing, therefore bad, no
matter how good from a theoretical standpoint, and especially
astonishing when people have 10+ years of xUnit and its heirs building
these expectations.

Idempotency is a red herring. There is nothing about the xunit
family/philosophy of tools (or any other test tool that I have used or
studied, except rspec) that suggests that tests must be run in the
order they are defined (or in any particular order at all). Just look at
the new tools coming out that distribute and multithread/multiprocess
your tests and you can see that the notion has to be thrown out the
window by design.

As for your astonishment, I thought it was pretty well addressed in the
first line of my reply: “Really? I think preventing test order
dependency has a very practical effect”. If you’re still astonished
after that, then you’re probably misusing the word.


At this point I’m going to cut much of your reply and everything I’ve
written so far in response and cut to the chase:

And I’m not even disagreeing with your observation! I agree that there
should be a randomizing mode, and that people should run it fairly
often. Just not all the time and not without a config or command-line
option to turn it off.

Apparently this is the crux of our disagreement:

I do think that people should randomize their tests a MAJORITY of
the time and turn it off TEMPORARILY when they need to sort out an
issue. If it wasn’t random by default, it wouldn’t happen at all.

If you disagree with that (and still want to use minitest), you ALREADY
have multiple ways to fix it for your own tests:

  1. define ::test_order to return :sorted. That’d be your “config”
    suggestion above.
  2. use --seed when you want the order to be fixed, via TESTOPTS if
    you’re using rake. And that’d be your command line option…

If you want a third option available, feel free to propose it and I’ll
gladly consider it.

P.S. I’m still mulling over Steve K.'s suggestion that the output
be sorted. I think it could be very confusing when you do have test
dependency errors but that there might be some way to mitigate the
confusion. I’d like to hear what you think about his suggestion.

On Sun, Oct 10, 2010 at 6:25 PM, Ryan D. [email protected] wrote:

If you want a third option available, feel free to propose it and I’ll
gladly consider it.

P.S. I’m still mulling over Steve K.'s suggestion that the output be
sorted. I think it could be very confusing when you do have test dependency
errors but that there might be some way to mitigate the confusion. I’d like
to hear what you think about his suggestion.

When running in a console, is it possible to highlight test
failures/errors in a different (red) color, since most modern terminals
support colored text? (A pity that I’m stuck w/ Windows due to smth.)

Once the test run finishes, maybe an additional report could be provided
with only the highlighted failures collected in order, with some
reporting-related options/command-line arguments added to activate this
behavior. Furthermore, as it’s an additional report, it needn’t be just
text; maybe a temporary HTML file that is automatically opened in my
browser?

On Sun, Oct 10, 2010 at 11:25 AM, Ryan D. [email protected] wrote:

P.S. I’m still mulling over Steve K.'s suggestion that the output be
sorted.

I haven’t made any psychological tests on this, but I suspect that if
I’m looking at a largish print out it might be easier to notice
something unexpected if the print out is always in the same order. So
that seems a useful option to me.

I think it could be very confusing when you do have test dependency errors

good point

but that there might be some way to mitigate the confusion.

I hesitate to suggest this, because it seems too obvious, so there
might be something wrong with it that I’m missing. If the output is
going to be sorted, the whole of the unsorted output will be available
(?), so there could be an option to write the unsorted output to a file,
as well as outputting the sorted output?
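(A toy sketch of that variant; the log file name is made up and none of this is a minitest feature: keep the console report sorted, but dump the raw execution order to a file so order-dependency failures can still be reconstructed.)

    # results in the order the tests actually ran
    results = [["test_c", "FAIL"], ["test_a", "PASS"], ["test_b", "PASS"]]

    File.open("test_order.log", "w") do |f|
      results.each { |name, status| f.puts "#{status}  #{name}" }   # raw order
    end

    results.sort.each { |name, status| puts "#{status}  #{name}" }  # sorted for reading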

I guess this comes down to idempotency. I expect that if I do
something twice in a row I will get the same result. Randomizing by
default breaks this expectation. It’s astonishing, therefore bad, no
matter how good from a theoretical standpoint, and especially
astonishing when people have 10+ years of xUnit and its heirs building
these expectations.

Thanks for your detailed reply, but I should point out that your
responses didn’t address either my idempotency or astonishment
arguments.

Why aren’t you using --seed when you rerun your specs?

Because I say “rake” to run my tests, and seed is not preserved. How
are you running yours?

If you use --seed with the previous value, all of your complaint about having to “squint” to find your previous failure goes away.

Well, no, not all of them :)

And I wasn’t being metaphorical. When I’m poring over a console full
of fail, I squint. And sometimes I sigh.

  • make test randomization an option (“randomize”)

It is an option on a class by class basis. See ri MiniTest::Unit::TestCase::test_order.

Yeah, I saw that before, but that doesn’t fix my core complaint unless
I hack minitest to always run in consistent order. Hence my request
for a patch.

And you’re right, it seems that that patch is trivial for
MiniTest::Spec, and probably not too hard for
MiniTest::Unit::TestCase.

BTW your docs should reflect that for MiniTest::Specs, :alpha and
:sorted both really mean :defined – or whatever you want to call the
traditional “order in which they occur in the file” (which is often
roughly in order of complexity, so earlier failures tend to be more
fundamental and should be fixed first). (And for
MiniTest::Unit::TestCases, too, but only in Ruby 1.9.)

Anecdotally (unfortunately, nothing I can go into in great detail), I’ve seen far too many projects with test order dependencies, which is the reason that feature went into the library in the first place.

I’m sure you thought you had a good reason for being inconsistent. But
in this case, where the library ships with core Ruby and is supposed
to be an invisible drop-in replacement for Test::Unit, it’s a bridge
too far.

And I’m not even disagreeing with your observation! I agree that there
should be a randomizing mode, and that people should run it fairly
often. Just not all the time and not without a config or command-line
option to turn it off.

Unfortunately, we need to output the seed value at the beginning in the case that your tests not only fail, but crash (like when you’re Aaron P. and you’re working on C extensions instead of writing ruby like a good person).

Wait, are you saying the reason we all have to look at console spam is
so that Aaron doesn’t have to type “--verbose” when he’s writing C in
Ruby? Does he have compromising photos of you with the maid or
something? :)

I still think that random tests/specs are stronger tests/specs and completely disagree with you that “most well-factored OO [test] code these days does not exhibit isolation problems” on the basis that most OO [test] code is not well-factored. minitest’s test dir flays at 535. Wrong’s tests flay at 1150.

That’s a nice debate trick – insert a qualifier and then disagree
with it, not with what I actually said :)

Anecdotally, when you see test order dependency problems, are they
because the tests are not isolated or because the production code
isn’t? I was talking about production OO code, not test code (which is
more like a bunch of functions than like an object anyway).

Flay’s a nice tool, but the threshold for DRY in test code is higher
than in production code. Test code needs to be understandable above
all else, so you can home in on the scenario leading up to a
file-and-line failure, and that can mean some duplication is
desirable. That doesn’t mean the tests aren’t isolated.

  • A

(Apologies if the quotes don’t come out right in plain text – I’m using
both Apple Mail and GMail and they’re playing crazy HTML games with my
draft.)

That’s a fair point. The idempotency I was referring to was that running
“rake test” twice on a failing suite gets different results – if not
different failures, then the same failures in a different order.

As for your astonishment, I thought it was pretty well addressed in the
first line of my reply: “Really? I think preventing test order dependency
has a very practical effect”. If you’re still astonished after that, then
you’re probably misusing the word.

I’m using it in its technical sense (the principle of least astonishment).

And I stand by what I wrote: if your tests are all passing, and they’re well
isolated, then randomizing them has no practical effect. It’s just shuffling
a deck full of aces.

At this point I’m going to cut much of your reply and everything I’ve
written so far in response and cut to the chase:

And I’m not even disagreeing with your observation! I agree that there
should be a randomizing mode, and that people should run it fairly
often. Just not all the time and not without a config or command-line
option to turn it off.

Apparently this is the crux of our disagreement:

I do think that people should randomize their tests a MAJORITY of the
time and turn it off TEMPORARILY when they need to sort out an issue. If it
wasn’t random by default, it wouldn’t happen at all.

This is a noble position, as I said before. You’re the self-appointed
isolation vigilante, crusading against a problem you abhor. But I’ve rarely
encountered it. I feel that my tests don’t need randomization, and the extra
output clutters my console (*), and the shuffling cramps my debugging style,
so I want it off unless I ask for it. If you’re Batman, I feel like I’m the
Lorax. I speak for the trees whose pristine consoles are being polluted, but
who haven’t spoken out. (I haven’t really heard a chorus of protestors in
favor of randomization either, fwiw.)

You’re the library author, so you have the privilege of deciding what mode
is the default. I’m hoping to convince you of a few things, but if I don’t,
I won’t take it personally.

Sounds like we’re approaching a compromise, though: an option for me,
defaulting to off for you. (And option != monkey patch – it’s a clear API
like a named switch on the command line and/or a value on some Minitest
object, e.g. “Minitest::Config.randomize = false”.)
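(To make that proposal concrete, here is a rough sketch; it is entirely hypothetical and nothing like it exists in minitest: a single module-level setting the runner would consult before shuffling.)

    module Minitest
      module Config
        class << self
          attr_accessor :randomize
        end
        self.randomize = true   # which default to pick is exactly the debate
      end
    end

    def order_tests(test_names)
      Minitest::Config.randomize ? test_names.shuffle : test_names.sort
    end

    Minitest::Config.randomize = false
    p order_tests(%w[test_c test_a test_b])  # => ["test_a", "test_b", "test_c"]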

I’d also be happy with just a verbosity setting, maybe with several levels
like you suggest.

  2. use --seed when you want the order to be fixed, via TESTOPTS if you’re
    using rake. And that’d be your command line option…

TESTOPTS. Roger that. Never used it before. Maybe the Minitest README should
say something about that when it talks about --seed. (Oh, looks like it
doesn’t talk about --seed either.)
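(For reference, a typical Rakefile using the stock Rake::TestTask; the test pattern and seed value here are just examples. Extra flags reach the runner through the TESTOPTS environment variable.)

    require 'rake/testtask'

    Rake::TestTask.new(:test) do |t|
      t.libs << 'test'
      t.pattern = 'test/**/test_*.rb'
    end

    # Pin the order from a previous failing run:
    #   rake test TESTOPTS="--seed 31337"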

If you want a third option available, feel free to propose it and I’ll
gladly consider it.

Had a weird thought while doing the dishes… what if you write out seed
somewhere persistent like .minitest_seed, then erase it after the run but
only if the run was successful. Then when a run starts, if .minitest_seed
exists, it uses it (and says so) instead of rolling a new one. That way you
don’t have to print out anything for successful runs and the user doesn’t
have to remember anything and idempotency is preserved (if it fails once,
it’ll fail the next time, in exactly the same way) and it’ll keep failing
consistently until you fix the problem. It also works for C hackers since a
crash means it won’t erase the cached seed.
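(A rough sketch of that idea; the file name and the run_suite helper are made up, and none of this exists in minitest: reuse the seed from the last failed or crashed run, and only discard it after a fully green run.)

    SEED_FILE = '.minitest_seed'

    def run_suite(seed)
      srand(seed)
      # ... run the tests here; return true iff everything passed
      true
    end

    seed = if File.exist?(SEED_FILE)
             File.read(SEED_FILE).to_i   # rerun with the exact failing order
           else
             rand(0xFFFF)                # fresh seed for a fresh run
           end

    File.write(SEED_FILE, seed.to_s)     # written up front, so it survives a crash
    passed = run_suite(seed)
    File.delete(SEED_FILE) if passed     # forget the seed only on success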

(And hey, also, for Aaron’s sake, can’t you trap SIGSEGV and print the seed
then? Not a rhetorical question since I haven’t done any C+Ruby stuff and I
know signals are sometimes flakey.)

Since --seed makes in-test randomization freeze out too, I think there
should be separate options for all three (--seed, --randomize, and
--verbose).

P.S. I’m still mulling over Steve K.'s suggestion that the output be
sorted. I think it could be very confusing when you do have test dependency
errors but that there might be some way to mitigate the confusion. I’d like
to hear what you think about his suggestion.

I like it, though it’s pretty weird: it’s a very pleasant dream, and I’m
not sure it will survive in the cold light of day.

  • A

(*) My console is already way cluttered even with the minimum verbosity –
my collaborator Steve wrote some code that runs each of our tests in its own
VM process, to ensure isolation of dependencies and other stuff, so I get a
big long scroll of test runs, each of which is now 2 lines longer because of
“test run output” cruft. git clone wrong and run “rake rvm:test” to see what
I mean. Every line of output I save is multiplied by (N tests) x (M Ruby
versions). Since it’s slow, I only run it before checkin.

On Oct 10, 2010, at 08:04, Colin B. wrote:

On Sun, Oct 10, 2010 at 11:25 AM, Ryan D. [email protected] wrote:

P.S. I’m still mulling over Steve K.'s suggestion that the output be
sorted.

I haven’t made any psychological tests on this, but I suspect that if I’m
looking at a largish print out it might be easier to notice something
unexpected if the print out is always in the same order. So that seems a
useful option to me.

Well, shouldn’t any failure/error be unexpected (beyond the regular
test-first pattern)?

On Oct 10, 2010, at 20:15, redstun wrote:

When running in a console, is it possible to highlight test failures/errors
in a different (red) color, since most modern terminals support colored
text? (A pity that I’m stuck w/ Windows due to smth.)

Once the test run finishes, maybe an additional report could be provided
with only the highlighted failures collected in order, with some
reporting-related options/command-line arguments added to activate this
behavior. Furthermore, as it’s an additional report, it needn’t be just
text; maybe a temporary HTML file that is automatically opened in my
browser?

Neither of these suggestions is something that I think belongs in minitest.
“mini” is the prefix for a reason. I am working on an extension system
right now that should help you write them as plugins. It will be part of
the 2.0 release.