Testing without Fixtures (?)

jhearn · February 7, 2009, 1:33pm

I’m writing unit tests for a rails app whose behavior depends on the
relationships of tens of thousands of items in a DB. I need this data
in the test DB for a lot of tests. Storing this data as fixtures,
keeping it in sync with the production DB, and waiting HOURS for the
fixtures to load whenever the test suite is run are all untenable
options.

Here is what I think I want to do:

1. Create a reference archive of the test DB once all the data is
   loaded. This will be checked into the repo so anyone can load it
   (quickly and easily with a Capistrano task) prior to running the
   test suite.
2. Trick Rails into believing the data was loaded from fixtures, so
   tests will behave normally. Presumably this means overriding the
   standard procedure of deleting, re-inserting, and instantiating
   test data that happens before each test method. I believe this
   could probably be done with a test_helper method without having to
   modify ActiveRecord.
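
A minimal sketch of the first step, assuming a MySQL test database and a dump checked in at a hypothetical path, `db/reference/test_data.sql`. The helper name `reference_load_command` is made up for illustration and is not a Rails or Capistrano API; it only builds the shell command, so a rake or Capistrano task could run it with `system`.

```ruby
require "shellwords"

# Hypothetical location of the checked-in dump; adjust to the repo layout.
REFERENCE_DUMP = "db/reference/test_data.sql"

# Build the shell command that pipes a SQL dump into the test database.
# `config` mirrors the keys of a database.yml entry (MySQL is assumed).
# The password is assumed to come from the client's own config file.
def reference_load_command(config, dump_path = REFERENCE_DUMP)
  args = ["mysql"]
  args.push("-h", config["host"]) if config["host"]
  args.push("-u", config["username"]) if config["username"]
  args << config.fetch("database")
  "#{args.shelljoin} < #{dump_path.shellescape}"
end
```

For the second step, it may be enough to leave out the `fixtures` declaration in the tests that use this data and keep `self.use_transactional_fixtures = true`, so that each test runs in a transaction that is rolled back and the preloaded rows survive. That behaviour is worth verifying against the Rails version in use.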

Concerns:

1. This seems to be in violation of the Rails philosophy of “make right
   things easy and the wrong things hard.”
2. I think this has to be a fairly common problem, but after a lot of
   Google searching, I have found no instances of someone else trying
   to do this sort of thing.

This makes me think there is a better way, or at least a different way
that someone has already implemented.

What I am looking for:

1. A reality check. Is this the right way to handle this problem?
2. Technical guidance. I am a relative novice (read: n00b) with Ruby
   development, so any input on things I should be aware of as I try
   to figure this out would be helpful…

OR

3. Sample code. If someone has already solved this problem and talked
   about it somewhere, just point me in their direction.

Thanks,
John

jhearn · February 9, 2009, 7:51am

On Feb 6, 2:37 pm, jhearn wrote:

> I’m writing unit tests for a rails app whose behavior depends on the
> relationships of tens of thousands of items in a DB. I need this data
> in the test DB for a lot of tests. Storing this data as fixtures,
> keeping it in sync with the production DB, and waiting HOURS for the
> fixtures to load whenever the test suite is run are all untenable
> options.

This is the first red flag: test data shouldn’t need to “keep in sync
with the production DB”; that’s why it’s TEST DATA.

> Concerns:
>
> This seems to be in violation of the Rails philosophy of “make right
> things easy and the wrong things hard.”

This does sound hard, and I think it’s the wrong thing…

Even in cases where the production DB is large, there should be a way
to trim down the data to get test fixtures. For example, I’ve been
working recently with some code using zipcode distances; the
production DB mapping zips to coordinates is 50k+ records. But my
fixture contains only a few zips: the ones needed by the rest of the
test data set. I’d advise a similar strategy for your data set.
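
A rough sketch of that trimming step, assuming the large table has been read into an array of hashes. The helper name `trimmed_fixture` and the `zip`, `lat`, and `lng` columns are illustrative only. It keeps the rows whose key is referenced by the rest of the test data and renders them as a YAML fixture.

```ruby
require "yaml"
require "set"

# Keep only the rows whose key is actually referenced by the other test
# data, and render them as a YAML fixture. All names are illustrative.
def trimmed_fixture(rows, needed_keys, key: "zip")
  needed = needed_keys.to_set
  kept = rows.select { |row| needed.include?(row[key]) }
  fixture = kept.each_with_object({}) do |row, out|
    out["#{key}_#{row[key]}"] = row   # fixture label, e.g. "zip_94105"
  end
  fixture.to_yaml
end

# Usage: write the result to a fixture file such as zips.yml.
all_rows = [
  { "zip" => "10001", "lat" => 40.75, "lng" => -73.99 },
  { "zip" => "94105", "lat" => 37.79, "lng" => -122.39 }
]
puts trimmed_fixture(all_rows, ["94105"])
```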

–Matt J.

jhearn · February 9, 2009, 6:25pm

Because of an NDA, I can’t really talk about what is in the DB or why
it all needs to be there with any specificity, but will try to explain
by analogy.

Suppose I have an app that tries to smartly generate a list for
grocery shopping. The db has a table for all the items sold by the
grocery store. There is also a table for categories of groceries, and
a join table that associates the items with the categories. Cheddar,
for instance, could be joined with the categories: cheeses, dairy,
items_that_require_refrigeration, etc. We also have a table for
different scenarios we might be shopping for, e.g. general_grocery,
camping_trip, dinner_party, christmas_baking, etc. and some items are
joined with certain scenarios.

The way this might play out in our app: We select dinner_party as our
scenario and indicate that we need to buy some cheese. The app
provides a list of cheeses to pick from. If we add camembert, perhaps
the list suggests a bottle of white wine that pairs well with the
cheese, but if we add parmesan, the list gives us the option of adding
spaghetti, pasta sauce, and/or all the raw ingredients to make our own
pasta sauce.

Suppose the scenario is camping_trip, and we add graham crackers.
Chocolate and marshmallows are added to the list automatically.
However, if we add graham crackers while shopping for general_grocery,
those items are not added.
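
To make the analogy concrete, here is a toy sketch in which a plain hash stands in for the join tables. The constant `AUTO_ADDS` and the method `shopping_list` are invented for illustration; in the real app the rules would come from the database.

```ruby
# Toy stand-in for the join tables: which extra items a picked item
# pulls in, per scenario. Purely illustrative data.
AUTO_ADDS = {
  "camping_trip" => { "graham crackers" => ["chocolate", "marshmallows"] }
}.freeze

# Return the picked items plus whatever the scenario adds automatically.
def shopping_list(scenario, picked)
  rules = AUTO_ADDS.fetch(scenario, {})
  extras = picked.flat_map { |item| rules.fetch(item, []) }
  (picked + extras).uniq
end

p shopping_list("camping_trip", ["graham crackers"])
p shopping_list("general_grocery", ["graham crackers"])
```

The code is the same for both calls; only the data decides the outcome, which is why the data itself ends up under test.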

Some of that behavior may sound a bit obnoxious for generating a
shopping list, but remember this is an analogy, and in the real app,
that sort of thing is more appropriate. The point is that the behavior
of the application is heavily dependent on the grouping relationships
of a large data set. So really, our tests need to verify the data
relationships as much as the code. I can’t really do that if I use a
different data set, or restrict the data to a smaller, more manageable
load.
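
One way to verify the data relationships separately from the code is a consistency check that runs over the full data set. The sketch below assumes join rows exported as hashes with `item_id` and `category_id` columns; both the column names and the helper `dangling_links` are assumptions for illustration.

```ruby
require "set"

# Return the join rows that point at an item or category id that does
# not exist. Rows are plain hashes; the column names are assumptions.
def dangling_links(join_rows, item_ids, category_ids)
  items = item_ids.to_set
  categories = category_ids.to_set
  join_rows.reject do |row|
    items.include?(row["item_id"]) && categories.include?(row["category_id"])
  end
end
```

A test could then assert that `dangling_links` returns an empty array for the reference data.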

jhearn · February 16, 2009, 3:22pm

On 9 Feb., 18:24, jhearn wrote:

> Suppose I have an app that tries to smartly generate a list for
> grocery shopping. The db has a table for all the items sold by the
> grocery store. There is also a table for categories of groceries, and
> a join table that associates the items with the categories. Cheddar,
> for instance, could be joined with the categories: cheeses, dairy,
> items_that_require_refrigeration, etc. We also have a table for
> different scenarios we might be shopping for, e.g. general_grocery,
> camping_trip, dinner_party, christmas_baking, etc. and some items are
> joined with certain scenarios.

There are some tools that can trim down the data in a way that
respects the relationships between the tables (http://jailer.sf.net/,
for instance).
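
The general idea of a relationship-preserving subset can be sketched in a few lines. This is not how Jailer itself is implemented; it is an illustrative breadth-first walk over in-memory hashes, with a made-up method name (`referential_subset`) and made-up table layouts. Starting from a few seed rows, it follows each foreign key to the referenced row, so every kept row has its parents in the subset.

```ruby
require "set"

# tables:       { "table" => { id => row_hash } }
# foreign_keys: { "table" => { "column" => "referenced_table" } }
# seeds:        { "table" => [ids to start from] }
# Returns { "table" => [ids to keep] } with no dangling references.
def referential_subset(tables, foreign_keys, seeds)
  keep = Hash.new { |hash, table| hash[table] = Set.new }
  queue = seeds.flat_map { |table, ids| ids.map { |id| [table, id] } }
  until queue.empty?
    table, id = queue.shift
    next unless keep[table].add?(id)   # nil means the row was already kept
    row = tables.fetch(table).fetch(id)
    foreign_keys.fetch(table, {}).each do |column, target|
      ref = row[column]
      queue << [target, ref] unless ref.nil?
    end
  end
  keep.transform_values(&:to_a)
end
```

Seeding with the join rows that the tests exercise would pull in only the items and categories those rows need.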

jhearn · February 17, 2009, 8:58pm

Thanks, that's quite helpful.