How to run unit tests w/o first blowing away test db?

The documentation on fixtures makes a tantalizing claim:

| If you preload your test database with all fixture data
| (probably in the Rakefile task) and use transactional
| fixtures, then you may omit all fixtures declarations
| in your test cases since all the data’s already there
| and every case rolls back its changes.

Yup! That’s exactly what I want: I have some LARGE constant tables that
are used pervasively, so I want to load them ONCE and exploit the
efficiency of transactional fixtures thereafter. I know I can preload
the test database from fixtures like this:

% rake db:fixtures:load RAILS_ENV=test
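For what it’s worth, transactional fixtures are already switched on in my
test_helper.rb, something like this:

# test/test_helper.rb: each test case runs inside a transaction
# that is rolled back when the test finishes.
class ActiveSupport::TestCase
  self.use_transactional_fixtures = true
end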

But then how do I run unit tests while preserving my carefully preloaded
data?

  • ff

[some details…]

I know I can’t simply use ‘rake test:units’, defined as:

Rake::TestTask.new(:units => "db:test:prepare") do |t|
  t.libs << "test"
  t.pattern = 'test/unit/**/*_test.rb'
  t.verbose = true
end
Rake::Task['test:units'].comment = "Run the unit tests in test/unit"

The dependency on “db:test:prepare” clears the database. I could define
a rake task “test:units_without_blowing_away_my_data_please” that
eliminates that dependency, or perhaps better, define a task that
loads the fixtures before running the unit tests.
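A minimal sketch of that first idea (with a shorter, made-up task name)
might be:

# Hypothetical: the same TestTask as test:units, minus the
# db:test:prepare dependency that rebuilds the test database.
namespace :test do
  Rake::TestTask.new(:units_preserving_data) do |t|
    t.libs << "test"
    t.pattern = 'test/unit/**/*_test.rb'
    t.verbose = true
  end
end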

But the comment in the fixture documentation makes me think that there’s
an easier way. Is there?

Fearless F. wrote:

The documentation on fixtures makes a tantalizing claim:

| If you preload your test database with all fixture data
| (probably in the Rakefile task) and use transactional
| fixtures, then you may omit all fixtures declarations
| in your test cases since all the data’s already there
| and every case rolls back its changes.

Yup! That’s exactly what I want: I have some LARGE constant tables that
are used pervasively, so I want to load them ONCE and exploit the
efficiency of transactional fixtures thereafter.

No you don’t. You don’t need all the large constant table data to run
your tests.

Besides, as you already know, fixtures are crap. What you want to do is
use factories to craft exactly the data you need for each individual
unit test. This will normally only be a few records at a time.
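For instance, with factory_girl (a sketch only; the model and attributes
here are made up):

# Define a factory once...
Factory.define :user do |f|
  f.name  "Alice Example"
  f.email "[email protected]"
end

# ...then each test creates only the records it actually needs:
user = Factory(:user, :name => "Bob")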

I know I can preload
the test database from fixtures like this:

% rake db:fixtures:load RAILS_ENV=test

But then how do I run unit tests while preserving my carefully preloaded
data?

You don’t. Good test isolation requires clearing the database for
every test.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Mon, Jul 12, 2010 at 23:23, Marnen Laibow-Koser
[email protected] wrote:

Fearless F. wrote:

But then how do I run unit tests while preserving my carefully preloaded
data?

You don’t. Good test isolation requires clearing the database for
every test.

Good ideal to strive for… but not absolutely necessary all the time.
99% of the time, whether in testing Rails apps or anything else,
wisdom lies in knowing where to draw the line… and it’s rarely all
the way to one side or the other.

-Dave, foolish about many things, but at least possessed of this one
nugget of wisdom :-)


Specialization is for insects. | Professional: http://davearonson.com
-Robert Anson Heinlein | Programming: http://codosaur.us
-------------------------------+ Leadership: http://dare2xl.com
Have Pun, Will Babble! -me | Et Cetera: http://davearonson.net

On Tue, Jul 13, 2010 at 09:19, Marnen Laibow-Koser
[email protected] wrote:

Good ideal to strive for… but not absolutely necessary all the time.

Wrong! If the test has any potential of touching the database, it is
absolutely necessary all the time to reset the DB to a known state.
Usually that means clearing it.

Ah, there’s the key: “Usually”. So, the line is at least a smidgen
never-ward of always. I’ll grant you at least a few “nines” of
always-ness, but it’s still not 100%. Except by rounding. :-)

If a given database (or table or set of tables) is absolutely
constant, so that it will not be added to or subtracted from during
the course of any test, only referenced, as seems to be the case in
FF’s situation, then it is (IMVHO) sufficient to only clear-and-load
it at the start of the entire suite, rather than incur the penalties
of doing so on every test. Perhaps that could even be enforced (at
least for the duration of the tests) with some triggers.
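To sketch the idea (assuming MySQL 5.5+, where a trigger can SIGNAL an
error, and a made-up table name; you’d want matching INSERT and DELETE
triggers too):

# Hypothetical read-only guard, installed once before the suite runs:
ActiveRecord::Base.connection.execute(<<-SQL)
  CREATE TRIGGER constant_facts_readonly
  BEFORE UPDATE ON constant_facts
  FOR EACH ROW
    SIGNAL SQLSTATE '45000'
    SET MESSAGE_TEXT = 'constant_facts is read-only during tests'
SQL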

On the other claw, if there aren’t a large number of tests, or he can
make do with a much smaller (i.e., faster to clear and load) database,
or the database clearing and loading can be done by a much faster
mechanism (such as pointing a symlink to a different set of files?),
the point may be moot…

-Dave


Specialization is for insects. | Professional: http://davearonson.com
-Robert Anson Heinlein | Programming: http://codosaur.us
-------------------------------+ Leadership: http://dare2xl.com
Have Pun, Will Babble! -me | Et Cetera: http://davearonson.net

Dave A. wrote:

On Tue, Jul 13, 2010 at 09:19, Marnen Laibow-Koser
[email protected] wrote:

Good ideal to strive for… but not absolutely necessary all the time.

Wrong! If the test has any potential of touching the database, it is
absolutely necessary all the time to reset the DB to a known state.
Usually that means clearing it.

Ah, there’s the key: “Usually”. So, the line is at least a smidgen
never-ward of always. I’ll grant you at least a few “nines” of
always-ness, but it’s still not 100%. Except by rounding. :-)

Setting the database to a known state must be 100%. Clearing (as
opposed to resetting to known constant data) may be only 99.999…%.

If a given database (or table or set of tables) is absolutely
constant, so that it will not be added to or subtracted from during
the course of any test, only referenced, as seems to be the case in
FF’s situation,

But there is no way to guarantee that the “constant” table is in fact
constant, unless we get into using permissions in the DB (which will
probably cause more problems than they solve). The values in it could
get accidentally changed at any time.

then it is (IMVHO) sufficient to only clear-and-load
it at the start of the entire suite, rather than incur the penalties
of doing so on every test. Perhaps that could even be enforced (at
least for the duration of the tests) with some triggers.

That is only true insofar as the table can be guaranteed read-only,
which it usually can’t be. Transactional tests will probably work here,
though.

On the other claw, if there aren’t a large number of tests, or he can
make do with a much smaller (i.e., faster to clear and load) database,

The number of tests has nothing to do with it. I routinely run fairly
large test suites with a small database (by using factories, so that
each test only has a few records in the DB while it runs).

or the database clearing and loading can be done by a much faster
mechanism (such as pointing a symlink to a different set of files?),

What, set up multiple test databases? Yuck. (Of course, it’s entirely
possible that the DBMS is doing this and abstracting it anyway.)

the point may be moot…

-Dave

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Dave A. wrote:

On Mon, Jul 12, 2010 at 23:23, Marnen Laibow-Koser
[email protected] wrote:

Fearless F. wrote:

But then how do I run unit tests while preserving my carefully preloaded
data?

You don’t. Good test isolation requires clearing the database for
every test.

Good ideal to strive for… but not absolutely necessary all the time.

Wrong! If the test has any potential of touching the database, it is
absolutely necessary all the time to reset the DB to a known state.
Usually that means clearing it.

99% of the time, whether in testing Rails apps or anything else,
wisdom lies in knowing where to draw the line… and it’s rarely all
the way to one side or the other.

-Dave, foolish about many things, but at least possessed of this one
nugget of wisdom :-)


Specialization is for insects. | Professional: http://davearonson.com
-Robert Anson Heinlein | Programming: http://codosaur.us
-------------------------------+ Leadership: http://dare2xl.com
Have Pun, Will Babble! -me | Et Cetera: http://davearonson.net

On Tue, Jul 13, 2010 at 10:45, Colin L. [email protected] wrote:

So what happens if there is a bug that accidentally writes to that
table during a test?

Reread last sentence. :-)

-Dave


Specialization is for insects. | Professional: http://davearonson.com
-Robert Anson Heinlein | Programming: http://codosaur.us
-------------------------------+ Leadership: http://dare2xl.com
Have Pun, Will Babble! -me | Et Cetera: http://davearonson.net

On 13 July 2010 14:45, Dave A. [email protected]
wrote:


If a given database (or table or set of tables) is absolutely
constant, so that it will not be added to or subtracted from during
the course of any test, only referenced, as seems to be the case in
FF’s situation, then it is (IMVHO) sufficient to only clear-and-load
it at the start of the entire suite, rather than incur the penalties
of doing so on every test. Perhaps that could even be enforced (at
least for the duration of the tests) with some triggers.

So what happens if there is a bug that accidentally writes to that
table during a test?

Colin

Dave A. wrote:

On Tue, Jul 13, 2010 at 10:45, Colin L. [email protected] wrote:

So what happens if there is a bug that accidentally writes to that
table during a test?

Reread last sentence. :-)

I can’t. You didn’t quote it.

Are you referring to the bit about the triggers? If so, do you really
think that that’s any less work than using transactions or clearing the
database? (Hint: it isn’t. Triggers sound great in theory, but they
are a pain in all implementations that I’m aware of.)

-Dave

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Dave A. wrote:
[…]

It seemed to me
that the OP’s objective was to avoid reloading the database for each
test, which isn’t any more human work than doing it once (at least,
having it in the standard setup section versus each test). Hence his
objective is probably speed, which is also important for tests, in
order to run them frequently. Having a trigger, or constraint, or
other such mechanism, on something that isn’t supposed to happen, is
bound to be faster than frequently reloading a huge database.

Faster than reloading? Perhaps. Faster than rolling back a
transaction? Doubtful. Faster than truncating a bunch of tables and
writing tests so they don’t involve lots of unnecessary records? Almost
certainly not.

(Yes, I said “unnecessary”. You really don’t need the entire constant
table for tests.)

-Dave

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

In the spirit of peace, love, and understanding, I’ll start by saying that
I appreciate Marnen’s philosophy, and I agree that “if the test has any
potential of touching the database, it is absolutely necessary all the
time to reset the DB to a known state.”

But in my application, I have a few huge tables that are used
exclusively as reference data, are required for running any meaningful
tests and are never modified by the application.

For example, one is a TimeDimension table with one entry per hour for 10
years (that’s 85K+ rows). Each row is a “highly decorated TimeStamp”
which maps a timestamp to day of week as a string, whether or not it’s a
holiday and 17 other columns of dimensional goodness. It would take a
LONG time to set it up and tear it down for each test. In comparison to
loading the table, MySQL’s rollback mechanism is breathtakingly fast.

[Before you ask why the heck a giant table is preferable to simply using
the functions associated with Date and Time, I’ll refer you to:
The Data Warehouse Toolkit, Third Edition - Ralph Kimball
Data Warehousing
]

Apart from the large “readonly” tables, I avoid preloaded fixtures for
all the reasons Marnen and others have pointed out: they’re “brittle”,
they tend to obscure tests by separating “expected” and “observed”
results, etc.

But in this case, it really makes sense to preload the db with large
static tables. The alternative – reloading the tables for every test
– would not give me more insights about the business logic in my
application, and would be prohibitively slow.

Peace out.

  • ff

On Tue, Jul 13, 2010 at 13:34, Marnen Laibow-Koser
[email protected] wrote:

I can’t. You didn’t quote it.

Eh? It’s quoted in the copy I have here…

Are you referring to the bit about the triggers? If so, do you really
think that that’s any less work than using transactions or clearing the
database? (Hint: it isn’t. Triggers sound great in theory, but they
are a pain in all implementations that I’m aware of.)

Work is not the only dimension being considered here. It seemed to me
that the OP’s objective was to avoid reloading the database for each
test, which isn’t any more human work than doing it once (at least,
having it in the standard setup section versus each test). Hence his
objective is probably speed, which is also important for tests, in
order to run them frequently. Having a trigger, or constraint, or
other such mechanism, on something that isn’t supposed to happen, is
bound to be faster than frequently reloading a huge database.

-Dave


Specialization is for insects. | Professional: http://davearonson.com
-Robert Anson Heinlein | Programming: http://codosaur.us
-------------------------------+ Leadership: http://dare2xl.com
Have Pun, Will Babble! -me | Et Cetera: http://davearonson.net

Fearless F. wrote:

In the spirit of peace love and understanding, I’ll start by saying that
I appreciate Marnen’s philosophy, and I agree that “if the test has any
potential of touching the database, it is absolutely necessary all the
time
to reset the DB to a known state.”

But in my application, I have a few huge tables that are used
exclusively as reference data, are required for running any meaningful
tests and are never modified by the application.

For example, one is a TimeDimension table with one entry per hour for 10
years (that’s 85K+ rows). Each row is a “highly decorated TimeStamp”
which maps a timestamp to day of week as a string, whether or not it’s a
holiday and 17 other columns of dimensional goodness. It would take a
LONG time to set it up and tear it down for each test.

But you don’t need to do that. For any given test, I’ll bet that you
don’t need more than 2 or 3 TimeDimension records. And you can create
those on the fly. You don’t need 85,000 records of data to test with.
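Something along these lines, say (a sketch; I’m guessing at your column
names):

# Hypothetical: build just the two rows this test needs, on the fly.
def setup
  @weekday_hour = TimeDimension.create!(
    :timestamp   => Time.utc(2010, 7, 12, 9),
    :day_of_week => "Monday",
    :holiday     => false
  )
  @holiday_hour = TimeDimension.create!(
    :timestamp   => Time.utc(2010, 12, 25, 9),
    :day_of_week => "Saturday",
    :holiday     => true
  )
end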

Actually, you probably don’t need them to deploy with either, but that’s
another issue.

[…]

[Before you ask why the heck a giant table is preferable to simply using
the functions associated with Date and Time, I’ll refer you to:
The Data Warehouse Toolkit, Third Edition - Ralph Kimball
Data Warehousing
]

Skimmed the Greenspun. I don’t buy it at all, particularly not for date
information. You may need fact tables for some things, but your
TimeDimension seems to be along the lines of
| x | y | sum |
| 1 | 1 | 2 |
| 1 | 2 | 3 |

But I don’t want to lose the primary topic here…

[…]

But in this case, it really makes sense to preload the db with large
static tables.

No, it really doesn’t. It really only makes sense to create a few
representative records for your actual tests.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Fearless F. wrote:
[…]

It would take a
LONG time to set it up and tear it down for each test. In comparison to
loading the table, MySQL’s rollback mechanism is breathtakingly fast.

Oh, yeah, one other thing. For a whole variety of performance and
maintainability reasons, you probably should not be using MySQL for
something like this. Check out PostgreSQL.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Jul 13, 2010, at 3:11 PM, Fearless F. wrote:

tests and are never modified by the application.

For example, one is a TimeDimension table with one entry per hour for 10
years (that’s 85K+ rows). Each row is a “highly decorated TimeStamp”
which maps a timestamp to day of week as a string, whether or not it’s a
holiday and 17 other columns of dimensional goodness. It would take a
LONG time to set it up and tear it down for each test. In comparison to
loading the table, MySQL’s rollback mechanism is breathtakingly fast.

I think I recall this kind of question being answered before. The
gist was that you can define a ‘facts’ section of your database.yml
(similar to your ‘development’ or ‘test’ section) and create the
connection to the special ‘facts’ database for that model. It would
be unaffected by the initialization of the ‘test’ database.
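For example (adapter and database name made up):

# config/database.yml: a 'facts' entry alongside the usual ones
facts:
  adapter: mysql
  database: myapp_facts
  username: root
  password:
  host: localhost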

The implementation is something like:

class Fact < ActiveRecord::Base
  establish_connection "facts" if
    RAILS_ENV == 'test' && configurations.has_key?('facts')
end

class TimeDimension < Fact
end

Then uses of TimeDimension in the test environment will use the
explicitly established connection. (Of course, you could do the same
thing regardless of environment, too.)

-Rob


Rob B. wrote:

class Fact < ActiveRecord::Base
  establish_connection "facts" [snip]
end

class TimeDimension < Fact
end

Rob B.
[email protected] http://AgileConsultingLLC.com/
[email protected] http://GaslightSoftware.com/

I like it! That’s an elegant approach, and I’m putting that on the
refactoring TODO list.

Many thanks.

  • ff

On Jul 13, 2010, at 4:58 PM, Marnen Laibow-Koser wrote:

difference from production, more general hassle.

I still maintain that the right answer is not to test with all 85,000
records. Just create what you need for a particular test.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Perhaps you missed the part about this being a data warehouse. It’s
probably already a separate database. Technically, it’s just a
separate schema within a database. And why does this have to be
different from production? If this is truly read-only data, then
production or any environment can use exactly the same technique. (And
multiple environments could use one golden copy of that reference
data, too.)

-Rob

Rob B.
[email protected] http://AgileConsultingLLC.com/
[email protected] http://GaslightSoftware.com/

Rob B. wrote:
[…]

I think I recall this kind of question being answered before. The
gist was that you can define a ‘facts’ section of your database.yml
(similar to your ‘development’ or ‘test’ section) and create the
connection to the special ‘facts’ database for that model. It would
be unaffected by the initialization of the ‘test’ database.

What, create a second database for that data? Why? More overhead, more
difference from production, more general hassle.

I still maintain that the right answer is not to test with all 85,000
records. Just create what you need for a particular test.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Marnen, Rob:

I agree with Rob that it makes good sense to partition constant data
into a separate db and use that consistently for development, test and
production.

Marnen, maybe I misunderstand you, but even if I knew a priori which
TimeDimension records my tests were going to use, it’s a PITA to
generate TimeDimension records piecemeal (what day did Easter fall on in
2005?)

Furthermore, I have other large constant tables such as the location of
FAA certified airports in North America, cobbled from multiple sources.
Again, it would be a pain to figure out exactly which ones my test code
will reference and then ETL their records on demand.

ANYWAY, I’m in the process of setting up a “constant” database (separate
from development, test or production) and I have questions about that.
But I’ll start a new post for that one. Watch for it… :-)

Thanks, y’all.

  • ff

Fearless F. wrote:
[…]

Marnen, maybe I misunderstand you, but even if I knew a priori which
TimeDimension records my tests were going to use,

Uh…what? How can you not know a priori how your tests should
behave? If you don’t know the results you expect a priori, then you
can’t write proper tests.

It sounds like there’s something wrong, or at least strange, with your
testing methodology. Can you post some example test code so I can see
how you’re using these records?

it’s a PITA to
generate TimeDimension records piecemeal (what day did Easter fall on in
2005?)

Who cares? Just generate a TimeDimension record that has the properties
you need for the particular test. If that means saying Easter fell in
December, that’s perfectly OK.

Furthermore, I have other large constant tables such as the location of
FAA certified airports in North America, cobbled from multiple sources.
Again, it would be a pain to figure out exactly which ones my test code
will reference and then ETL their records on demand.

You shouldn’t need to ever do that in your tests. Just create fake data
that has the properties you want to test.
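To make that concrete (again, a sketch; the attribute names are guesses
from your earlier description):

# The test just declares a "holiday" hour in December; no need to
# know when Easter actually fell.
def test_holiday_hours_are_flagged
  fake_easter = TimeDimension.create!(
    :timestamp   => Time.utc(2005, 12, 1, 12),
    :day_of_week => "Thursday",
    :holiday     => true
  )
  assert fake_easter.holiday?
end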

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Sent from my iPhone