Testy.rb - ruby testing that's mad at the world

On Mar 29, 2009, at 2:09 PM, Phlip wrote:

Or even less:

name = 42
assert{ name == ultimate.answer }

well, for one that won’t run :wink: assert takes an arg, not a
block… but that aside

cfp:~ > cat a.rb
require 'test/unit'

class TC < Test::Unit::TestCase
  def test_with_shitty_error_reporting
    name = 42
    assert name == 'forty-two'
  end
end

cfp:~ > ruby a.rb
Loaded suite a
Started
F
Finished in 0.007649 seconds.

  1) Failure:
test_with_shitty_error_reporting(TC) [a.rb:6]:
<false> is not true.

1 tests, 1 assertions, 1 failures, 0 errors

the error message ‘<false> is not true’, on the 8th line of output (out of
possibly thousands), makes me want to hunt down the programmer that
wrote that and club him to death with a 2x4. the assert api facilitates
insanity for the code maintainer

now, in testy

cfp:~/src/git/testy > ruby -I lib a.rb

my lib:
  name compare:
    failure:
      expect:
        name: forty-two
      actual:
        name: 42
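
(The testy version of a.rb isn’t shown in the thread; a minimal sketch that
would produce the output above, assuming the Testy.testing / test /
result.check API demonstrated further down, might look like this:)

# hypothetical reconstruction of the testy a.rb, not Ara's actual file
require 'testy'

Testy.testing 'my lib' do
  test 'name compare' do |result|
    name = 42
    result.check :name, :expect => 'forty-two', :actual => name
  end
end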

my thinking, currently, is that the test writer should be forced to

. name the test suite
. name the test
. name the check

because doing these three things allows for informative error
reporting. testing just shouldn’t facilitate obfuscating what went
wrong - imho.

cheers.

a @ http://codeforpeople.com/

On Mar 29, 2009, at 1:51 PM, Brian C. wrote:

Ara Howard wrote:

  result.check :name, :expect => 42, :actual => ultimate.answer

I’m afraid I’m missing something. Why is this better than

assert_equal 42, ultimate.answer, "name"

you can basically do that too, but i continually forget which is
expected and which is actual and, as you know, that’s a slippery error
to track down at times.
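
(To make the slip concrete - a hedged illustration using plain test/unit;
the Ultimate stub is made up for the example. assert_equal's signature is
(expected, actual, message), so swapping the arguments inverts the report:)

require 'test/unit'

Ultimate = Struct.new(:answer)   # stand-in object, not from the thread

class SwapTC < Test::Unit::TestCase
  def test_swapped_args
    ultimate = Ultimate.new(41)
    # arguments accidentally reversed: the failure reads "<41> expected but was <42>"
    assert_equal ultimate.answer, 42, 'name'
  end

  def test_correct_args
    ultimate = Ultimate.new(41)
    # correct order: the failure reads "<42> expected but was <41>"
    assert_equal 42, ultimate.answer, 'name'
  end
end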

. testing should improve your code and help your users, not make you want to
kill yourself

Hear hear to that!

:wink:

requiring programmers learn exactly 2 new method calls

Well, it would be nice to know what those 2 method calls were, and
their semantics, without reverse-engineering the code. Are these
Testy.testing and Result#check ?

yes. that is it.

I did find the gem, and installed it, but the generated rdoc is entirely
free of comments or explanation.

what are these comments you speak of?

seriously, this is an experiment at this point but they will be
forthcoming if it sprouts wings

at this point my feeling is that it breaks the concept of BDD

“Using examples to describe the behavior of the application, or of
units of code”

because littering the example code with esoteric testing framework
voodoo turns it into code in the testing language that does not
resemble how people might actually use the code - at least not without
mentally unraveling considerable indirection. i always end up writing
both samples and tests - one of the goals of testy is that, by having
a really simple interface and really simple human friendly output we
can just write examples that double as tests. in the latest testy you
can do this

cfp:~/src/git/testy > cat a.rb
require 'testy'

Testy.testing 'my lib' do

  test 'foo' do |result|
    list = [42]
    result.check :fooness, :expect => 42, :actual => list.last
  end

  test 'bar' do |result|
    list = [42.0]
    result.check :barness, :expect => 42.0, :actual => list.last
  end

end

get a listing of which tests/examples i can run

cfp:~/src/git/testy > ruby -I lib a.rb --list

- foo
- bar

run one of them (actually regex matching so you can select more than
one)

cfp:~/src/git/testy > ruby -I lib a.rb bar

my lib:
  bar:
    success:
      barness: 42.0
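
(presumably a pattern like 'foo|bar' would select both tests, since the
argument is a regex - the exact invocation below is a guess, not from the
thread:)

cfp:~/src/git/testy > ruby -I lib a.rb 'foo|bar'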

you can also do something like this (experimental) to just make a
simple example

cfp:~/src/git/testy > cat a.rb
require 'testy'

Testy.testing 'my lib' do

  test 'just an example of summing an array using inject' do
    a = 1, 2
    a.push 3
    sum = a.inject(0){|n,i| n += i}
  end

end

cfp:~/src/git/testy > ruby -I lib a.rb 'just an example'

my lib:
  just an example of summing an array using inject:
    success: 6

testy will just display the return value if no results are explicitly
checked but, of course, exceptions are still reported and cause a
failed test in the normal way.
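
(For instance - a sketch, not from the thread - a test that raises and checks
nothing would still be reported as a failure rather than silently passing:)

require 'testy'

Testy.testing 'my lib' do
  test 'raises' do
    [42].fetch(1)   # IndexError: no result.check, but still counts as a failed test
  end
end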

so the goal is making it even easier to have a user play with your
tests/examples to see how they work, and even to allow simple examples
to be integrated with your test suite so you make sure your samples
still run without error too. of course you can do this with test/unit
or rspec but the output isn’t friendly in the least - not from the
perspective of a user trying to learn a library, nor is it useful to
computers because it cannot be parsed - basically it’s just vomiting
stats and backtraces to the console that are hard for people to read
and hard for computers to read. surely i am not the only one that
sometimes resorts to factoring out a failing test into a separate
program because test/unit and rspec output is too messy to play nice
with instrumenting code? and that’s not even getting to what they do
with at_exit exception raising…

I think I’d also miss the ability to have setup and teardown before each
test (something which 'shoulda' makes very simple and effective).

yeah that’s on deck for sure. i do really like contexts with
shoulda. but still
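
(for readers who haven’t seen them, the shoulda contexts being referred to
look roughly like this - User here is a hypothetical model stubbed out just
to keep the example self-contained:)

require 'rubygems'
require 'test/unit'
require 'shoulda'

class User
  def accessories; []; end   # stub model, invented for the example
end

class UserTest < Test::Unit::TestCase
  context 'a new user' do
    setup { @user = User.new }   # runs before each should-block in this context

    should 'start with no accessories' do
      assert_equal [], @user.accessories
    end
  end
end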

cfp:/opt/local/lib/ruby/gems/1.8/gems/thoughtbot-shoulda-2.9.1 > find
lib/ -type f|xargs -n1 cat|wc -l
3910

if we accept the research and assume that bugs scale linearly with
the # of lines of code this is not good for robustness. this is one
of my main gripes with current ruby testing - my current rails app has
about 1000 lines of code and 25,000 lines of testing framework!

Please don’t get me wrong - I’m absolutely interested in something which
will make testing simpler and easier, if I can understand how to use it
effectively.

feedback is awesome - this is a new idea i’m just fleshing out so i
really appreciate it.

cheers.

a @ http://codeforpeople.com/

On Sun, Mar 29, 2009 at 11:33 PM, ara.t.howard [email protected]
wrote:

that aside
Phlip is referring to his own assert2, not the Test::Unit one. You
should check it out - it’s really a great idea.

On Mar 29, 2009, at 2:44 PM, Lake D. wrote:

You see, the output does not contain the second Testy.testing()

my stupidity. i hadn’t considered having more than one in a file :wink:
actually a bit tricky with the exit status issue - but i’ll roll out a
fix.

/me hangs head

a @ http://codeforpeople.com/

ara.t.howard wrote:

On Mar 29, 2009, at 1:29 PM, Phlip wrote:

What, besides instantly fix it (or revert) do you want to do with an
error message from a broken test?

report it in your ci tool

Don’t integrate a broken build!

Are you implying that an incremental build server, such as
CruiseControl, should
have enough errors it needs to count and report them?

Sean O’Halpin wrote:

Phlip is referring to his own assert2, not the Test::Unit one. You
should check it out - it’s really a great idea.

Tx, but it sucks! Here’s the ideal test case (regardless of its .should
or it{} syntax):

def test_activate
  x = assemble()
  g = x.activate()
  g == 42
end

The test should simply reflect the variables and values of everything after
the activate line. Anything less is overhead. I only want to spend my time
setting up situations and equating their results. No DSLs or reflection or
anything!

Now, why can’t our language just /do/ that for us? It knows everything
it needs…

On Mar 29, 2009, at 5:01 PM, Sean O’Halpin wrote:

Phlip is referring to his own assert2, not the Test::Unit one. You
should check it out - it’s really a great idea.

ah - reports via the binding i assume - that is a nice idea.

a @ http://codeforpeople.com/

On Mon, Mar 30, 2009 at 07:22:35AM +0900, ara.t.howard wrote:

if we accept the research and assume that bugs scale linearly with the #
of lines of code this is not good for robustness. this is one of my main
gripes with current ruby testing - my current rails app has about 1000
lines of code and 25,000 lines of testing framework!

This, of course, just means you are not writing big enough rails apps
:-).

ara.t.howard wrote:

On Mar 29, 2009, at 5:01 PM, Sean O’Halpin wrote:

Phlip is referring to his own assert2, not the Test::Unit one. You
should check it out - it’s really a great idea.

ah - reports via the binding i assume - that is a nice idea.

If only the binding were enough…

You gotta use Ripper to convert the block to opcodes, then eval them one
by one.

On Mar 29, 2009, at 5:47 PM, Jeremy H. wrote:

This, of course, just means you are not writing big enough rails
apps :-).

doh! i knew i was doing it wrong!

a @ http://codeforpeople.com/

On Mar 29, 2009, at 5:14 PM, Phlip wrote:

Don’t integrate a broken build!

Are you implying that an incremental build server, such as
CruiseControl, should have enough errors it needs to count and
report them?

i’m saying that whatever you are doing with automated tests, be it ci
or simply editor support, it’s vastly easier for people to build tools
if the output can be parsed with one line of YAML.load - that’s all.
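
(a sketch of the kind of one-liner meant here - the shape of the parsed hash
is an assumption based on the testy output shown earlier in the thread:)

require 'yaml'

# run the suite and parse its whole report in one call
report = YAML.load(%x(ruby -I lib a.rb))
# e.g. {"my lib"=>{"bar"=>{"success"=>{"barness"=>42.0}}}}

report.each do |suite, tests|
  tests.each do |name, outcome|
    puts "#{suite} / #{name} FAILED" if outcome.key?('failure')
  end
end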

cheers.

a @ http://codeforpeople.com/

On Mar 29, 2009, at 6:04 PM, Phlip wrote:

If only the binding were enough…

You gotta use Ripper to convert the block to opcodes, then eval them
one by one.

dude - that is hardcore :wink:

a @ http://codeforpeople.com/

Jeremy H. wrote:

On Mon, Mar 30, 2009 at 07:22:35AM +0900, ara.t.howard wrote:

if we accept the research and assume that bugs scale linearly with the #
of lines of code this is not good for robustness. this is one of my main
gripes with current ruby testing - my current rails app has about 1000
lines of code and 25,000 lines of testing framework!

I incredibly don’t understand that. Our apps are big by Rails standards,
yet our test:code ratio is only 2.5:1. That’s mostly because we can’t
“refactor” the tests too much, or they are hard to read.

From: “ara.t.howard” [email protected]

On Mar 29, 2009, at 6:04 PM, Phlip wrote:

If only the binding were enough…

You gotta use Ripper to convert the block to opcodes, then eval them
one by one.

dude - that is hardcore :wink:

I think some examples might help. These are from Phlip’s
January 26, 2008 post Re: assert_{ 2.0 } on the XP list:

assert_{ 'a topic' == topics['first'] } output:

"a topic" == ( topics["first"] ) → false
topics → {"first"=>"wrong topic"}
( topics["first"] ) → "wrong topic".

assert_{ [false] == shrew.accessories.map(&:active).uniq } output:

[true] == shrew.accessories.map(&:active).uniq → false
shrew → User 4
shrew.accessories → [Accessory 501]
shrew.accessories.map(&:active) → [false]
shrew.accessories.map(&:active).uniq → [false].

It’s badass.

Phlip’s work on assert_{ 2.0 } immediately left me with two
impressions:

  1. Having seen this, why would we ever accept anything less
    awesome from any testing rig? :slight_smile:

  2. The facilities needed to implement this kind of reflection
    should be part of ruby core.

…right? :slight_smile:

Regards,

Bill

On Mar 29, 2009, at 6:30 PM, Bill K. wrote:

…right? :slight_smile:

yes. it addresses one of my biggest issues with the current
standards: too many assertion methods and crappy reporting.

a @ http://codeforpeople.com/

From: “Bill K.” [email protected]

assert_{ [false] == shrew.accessories.map(&:active).uniq } output:

[true] == shrew.accessories.map(&:active).uniq → false
shrew → User 4
shrew.accessories → [Accessory 501]
shrew.accessories.map(&:active) → [false]
shrew.accessories.map(&:active).uniq → [false].

Er, sorry, I think I failed at reconstructing that from
the original email.

Presumably should be:

assert_{ [true] == shrew.accessories.map(&:active).uniq } output:

[true] == shrew.accessories.map(&:active).uniq → false
shrew → User 4
shrew.accessories → [Accessory 501]
shrew.accessories.map(&:active) → [false]
shrew.accessories.map(&:active).uniq → [false].

Regards,

Bill

On Mar 29, 2009, at 6:09 PM, Phlip wrote:

I incredibly don’t understand that. Our apps are big by Rails
standards, yet our test:code ratio is only 2.5:1. That’s mostly
because we can’t “refactor” the tests too much, or they are hard to
read.

i’m not talking about code to test ratios, but code to test
framework ratios. of course these numbers are a little bit high
(whitespace, comments, etc) but

cfp:/opt/local/lib/ruby/gems/1.8/gems > for gem in thoughtbot-* rspec*
faker* mocha*;do echo $gem;find $gem/lib -type f|grep .rb|xargs -n1
cat|wc -l;done
thoughtbot-factory_girl-1.2.0
937
thoughtbot-shoulda-2.9.1
3854
rspec-1.1.12
8773
rspec-1.1.3
7785
rspec-1.1.4
8083
faker-0.3.1
299
mocha-0.9.5
3294

most people would consider an 8000 line rails app ‘large’.

a @ http://codeforpeople.com/

ara.t.howard wrote:

thoughtbot-factory_girl-1.2.0
937
thoughtbot-shoulda-2.9.1
3854
rspec-1.1.12
8773

Well, the good news is Ruby is essentially a “post-Agile” environment. Your
boss can no longer fire you for writing “too many” unit tests. Many
environments, even in this day and age, have not matured so far!

The other good news is our enthusiasm has made us forget many of the original
Agile ground rules. TDD requires rapid feedback over tiny code changes, close
to the tested code, and setting up a Cucumber suite with a handful of mock
objects does not qualify as rapid feedback, running close to the code. Mocks
are indeed a way to help your code’s design couple together. So
BigAgileUpFront, Ruby-style, might indeed be a big risk here.

If it isn’t, we have to learn from it, because Ruby’s BDD community represents
an amazing sight. The original BDD, FIT and Fitnesse, required hordes of
overpaid consultants to install, typically at big-iron sites. Ruby BDD, by
contrast, is just as light, free, and community-supported as Rails itself, and
I suspect it’s the leading edge of “customer testing” research in general.

You gotta use Ripper to convert the block to opcodes, then eval them
one by one.
dude - that is hardcore :wink:

Who was just fussing about “too many lines of code in the framework”?

  1. The facilities needed to implement this kind of reflection
    should be part of ruby core.

Right:

report = journal do
  my_code()
end

Inside journal{}, everything the Ruby VM does gets packed into a report. Then
you rip the report to get back to your details. That’s what assert{ 2.0 }
can’t do, so it goes in two passes.

The first pass is a normal block.call, to detect success or failure.
This is so
any bugs in the second pass don’t throw the test run.

The second pass puts the block into Ripper to decompile it, and then the
fun starts.
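
(A greatly simplified sketch of that two-pass idea - not the real assert{ 2.0 }
implementation. Here the expression arrives as a string and a simple method
chain is split textually; the real thing recovers the block’s source and walks
the Ripper parse tree instead:)

def assert_(src, bind)
  return true if eval(src, bind)          # pass 1: just run it

  # pass 2: on failure, report the pieces. Ripper.sexp(src) is what the real
  # version would walk; this sketch only handles "lhs == simple.method.chain".
  rhs = src.split('==', 2).last.strip
  puts "#{src} --> false"
  parts = rhs.split('.')
  (1..parts.size).each do |n|
    fragment = parts.first(n).join('.')
    puts "#{fragment} --> #{eval(fragment, bind).inspect}"
  end
  false
end

list = [1, 2, 2]
assert_ '[3] == list.uniq', binding
# [3] == list.uniq --> false
# list --> [1, 2, 2]
# list.uniq --> [1, 2]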

On Mar 29, 2009, at 7:29 PM, Phlip wrote:

Who was just fussing about “too many lines of code in the framework”?

i meant hardcore in a good way - really! :wink:

a @ http://codeforpeople.com/