Scenarios on production data

johanatan · September 4, 2008, 8:12pm

I’m just thinking out loud here…
It could be useful to have a way to run scenarios on a copy of a
fully populated production database, as an alternative to normal use.
Not sure how that’d work; maybe replace the Givens but leave the
Whens and Thens?

linoj

johanatan · September 5, 2008, 5:57pm

On 4 Sep 2008, at 18:55, Jonathan L. wrote:

I’m just thinking out loud here…
It could be useful to have a way to run scenarios on a copy of a
fully populated production database, as an alternative to normal use.
Not sure how that’d work; maybe replace the Givens but leave the
Whens and Thens?

Hi Jonathan

Every time someone asks me this my answer is always the same…

Don’t. Determine what class of issue is being exposed by your
production database, distil it into suitable stories and specs, fix
the code (migrating as necessary), then deploy to a staging
environment running off a recent production backup DB.

Trying to run tests against a production database risks blurring the
line between the well specified behaviour of your app and the pile of
crap users inevitably fill it with. IMHO.

Ashley

–
http://www.patchspace.co.uk/

johanatan · September 5, 2008, 6:31pm

On Sep 5, 2008, at 11:50 AM, Ashley M. wrote:

Ashley

Thanks, I agree. I probably would not use it to diagnose a problem,
but rather to ferret out any problems I might not know about.
That is, if my stories run with my well controlled, relatively small
setups, I’d like to ensure they run on a large, fully populated,
somewhat ‘random’ set of real data.

johanatan · September 5, 2008, 7:13pm

Jonathan L. wrote:


What you are describing sounds a lot like fuzzing… Have you checked
out tarantula yet?

-Ben
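Ben’s fuzzing suggestion becomes less “random” if the generator is seeded, so any failing input can be replayed and then distilled into a proper spec. The sketch below is plain Ruby, not Tarantula (which crawls a running Rails app); `slugify` is a hypothetical stand-in for app code, and the slug property being checked is an assumption chosen for illustration.

```ruby
# Hypothetical app helper under test: turns a page title into a URL slug.
def slugify(title)
  title.to_s.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

# Property every slug should satisfy: empty, or lowercase alphanumeric
# runs joined by single dashes.
SLUG_FORMAT = /\A([a-z0-9]+(-[a-z0-9]+)*)?\z/

# Throws seeded random titles at slugify and returns the first input that
# breaks the property (or nil). Reusing the same seed replays the same inputs.
def fuzz_slugify(iterations:, seed:)
  rng = Random.new(seed)
  alphabet = [*"a".."z", *"A".."Z", *"0".."9", " ", "-", "_", "!", "é", "\t"]
  iterations.times do
    length = rng.rand(0..20)
    input = Array.new(length) { alphabet[rng.rand(alphabet.size)] }.join
    return input unless slugify(input).match?(SLUG_FORMAT)
  end
  nil
end

seed = 1234
failure = fuzz_slugify(iterations: 1_000, seed: seed)
puts failure ? "seed #{seed} breaks slugify on #{failure.inspect}" : "seed #{seed}: no failures"
```

Recording the seed is what turns a random discovery into a reproducible one.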

johanatan · September 5, 2008, 7:44pm

On Fri, Sep 5, 2008 at 9:11 AM, Jonathan L. wrote:

That is, if my stories run with my well controlled, relatively small
setups, I’d like to ensure they run on a large, fully populated, somewhat
‘random’ set of real data.

If you think there are situations where that approach will discover
bugs, then I’d agree with Ashley that you ought to define those situations
specifically in your tests. But if you just want to throw a bunch of random
data at them, then I’d expect your discoveries will also be random. That
wouldn’t give me the degree of confidence to make it worth the effort.

Are you maybe looking for performance testing?

///ark

johanatan · September 5, 2008, 10:12pm

On 2008-09-05, at 14:14, Jonathan L. wrote:

It just seems to me that while I’m running my tests on toy data
sets, it could be reassuring to see the same run on real production
data. Or, maybe not.
I agree it’s not really scientific, but neither is the weather…
(aka, shit happens)

I know what you mean about it feeling reassuring to know that your
tests/specs passed when run on production data. However, if your
specs, scenarios, tests, etc. cover all behaviours, situations, edge
cases, etc., then you needn’t worry. All is well.

You don’t truly control what’s in your production data. Thus, you’ll
be testing random things, and will obtain random results. While said
results should pass (provided that your specs, scenarios, tests, etc.
are complete and all pass), the results won’t give you any sort of
quantifiable coverage other than “this set of production data passed”.
Is that actually useful?
-Nick

johanatan · September 6, 2008, 9:16am

On 5 Sep 2008, at 19:14, Jonathan L. wrote:

It just seems to me that while I’m running my tests on toy data
sets, it could be reassuring to see the same run on real production
data. Or, maybe not.
I agree it’s not really scientific, but neither is the weather…
(aka, shit happens)

Can you give us a concrete example of something that you’re worried
might go wrong?

cheers,
Matt

johanatan · September 5, 2008, 8:18pm

On Sep 5, 2008, at 1:27 PM, Mark W. wrote:

throw a bunch of random data at them, then I’d expect your
discoveries will also be random. That wouldn’t give me the degree
of confidence to make it worth the effort.

Are you maybe looking for performance testing?

///ark


It just seems to me that while I’m running my tests on toy data
sets, it could be reassuring to see the same run on real production
data. Or, maybe not.
I agree it’s not really scientific, but neither is the weather…
(aka, shit happens)

johanatan · September 6, 2008, 10:16am

On Sep 5, 2008, at 3:11 PM, Matt W. wrote:

Can you give us a concrete example of something that you’re worried
might go wrong?

Well, actually linoj ran into one regarding gems installed on the
production box. For some reason ruby2ruby was being required, which
redefines method_missing on nil to return nil. That means that if a
method was called on nil, he wouldn’t see it; more than likely, a
template would just display “nil” instead of raising an error.

Scott
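Scott’s ruby2ruby story suggests a cheap guard that a staging app could run at boot. This is a minimal sketch in plain Ruby; `method_missing_overridden?`, `assert_nil_is_strict!` and `Swallower` are hypothetical names, and the swallowing behaviour is shown on a stand-in class rather than by reproducing ruby2ruby’s actual patch to nil.

```ruby
# True if some class between klass and BasicObject overrides method_missing.
# BasicObject#method_missing is the default that raises NoMethodError.
def method_missing_overridden?(klass)
  klass.instance_method(:method_missing).owner != BasicObject
end

# Hypothetical boot-time guard: refuse to start if a loaded library has
# taught nil to swallow unknown messages.
def assert_nil_is_strict!
  return unless method_missing_overridden?(NilClass)

  owner = NilClass.instance_method(:method_missing).owner
  raise "nil#method_missing is redefined by #{owner}; errors on nil will be hidden"
end

# Stand-in that swallows unknown messages the way Scott describes,
# used instead of patching NilClass itself.
class Swallower
  def method_missing(_name, *_args)
    nil
  end
end

assert_nil_is_strict!
puts method_missing_overridden?(NilClass)
puts method_missing_overridden?(Swallower)
puts Swallower.new.title.inspect
```

On a clean Ruby the guard passes silently; with a patched nil it fails at boot instead of letting templates quietly render nothing.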

johanatan · September 8, 2008, 8:38pm

2008/9/8 Jonathan L. wrote:

Thinking further, wouldn’t it be neat to make this a user feature, e.g.
via a “Validate This Project” button?
That sounds like it might be a useful feature to implement: say, export
project (and all its dependencies) and a corresponding import project
feature. Then you could export a project from the production environment
and have it as canned data to pre-populate your project scenarios. Also your
users could then create offline backups of their projects.

That would be specific to your application though, and not an rspec
feature. Or am I missing something?

Cheers,
Dan

johanatan · September 8, 2008, 6:24pm

On 4 Sep 2008, at 18:55, Jonathan L. wrote:

I’m just thinking out loud here…
It could be useful to have a way to run scenarios on a copy of a
fully populated production database, as an alternative to normal use.
Not sure how that’d work; maybe replace the Givens but leave the
Whens and Thens?

On Sep 5, 2008, at 3:11 PM, Matt W. wrote:

Can you give us a concrete example of something that you’re worried
might go wrong?

Here’s one example: let’s say my app is a specialized CMS, where
account owners can set up their own projects, pages and forms. I’d
like to run scenarios against setups that users have created.
Thinking further, wouldn’t it be neat to make this a user feature, e.g.
via a “Validate This Project” button?

johanatan · September 8, 2008, 8:49pm

On Sep 8, 2008, at 12:35 PM, Dan N. wrote:

That sounds like it might be a useful feature to implement - say
export project (and all its dependencies) and a corresponding
import project feature. Then you could export a project from the
production environment and have it as canned data to pre-populate
your project scenarios. Also your users could then create offline
backups of their projects.

That would be specific to your application though, and not an rspec
feature. Or am I missing something?

Right, I’m not looking for an RSpec feature, just suggestions on how to
manage running against production data. Export/import utilities may
be the way to go. Or maybe export to YAML, and run stories with those
as fixtures?
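The export-to-YAML idea, building on Dan’s export/import suggestion, might look roughly like this. It is a minimal sketch assuming a project can be represented as a hash of pages with form fields; `export_project`, `import_project` and the `name`/`pages`/`form_fields` shape are hypothetical, and wiring the exported file into a scenario’s Given step is not shown.

```ruby
require "yaml"

# Hypothetical export: string keys only, so the YAML does not depend on
# Ruby class names and can be read back with safe_load.
def export_project(project)
  data = {
    "name" => project[:name],
    "pages" => project[:pages].map { |page| page.slice(:title, :form_fields).transform_keys(&:to_s) }
  }
  data.to_yaml
end

# Hypothetical import: YAML.safe_load only builds basic types (hashes,
# arrays, strings, numbers), which suits data that came from users.
def import_project(yaml)
  data = YAML.safe_load(yaml)
  {
    name: data.fetch("name"),
    pages: data.fetch("pages").map { |page| page.slice("title", "form_fields").transform_keys(&:to_sym) }
  }
end

project = {
  name: "Spring campaign",
  pages: [
    { title: "Home", form_fields: [] },
    { title: "Sign up", form_fields: %w[email postcode] }
  ]
}

yaml = export_project(project)
puts yaml
puts import_project(yaml) == project
```

A lossless round trip is the property worth specifying first; the same format then serves as scenario fixtures and as user backups.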

johanatan · September 8, 2008, 10:01pm

2008-09-05 16:50, Ashley M.:

Every time someone asks me this my answer is always the same…

Don’t.

I think I happen to have a case where it would be plausible to run
specs/stories against production data.

I need a sanity checker to detect “data smells” of various magnitude
from (a copy of) our production data. Not all the fumbles are
detectable at action time using validations. Not all that needs to be
or has to be allowed is sane or does any good. Also the app has come
a long way and the validity checking has not always been as good as it
is now, and will most certainly still improve over time.

That would of course be using RSpec for something other than testing,
but would it work? Would it be kosher? And most importantly, how
would I do that?

rake db:what?
rake db:whatelse?
rake spec:smells

Trying to run tests against a production database risks blurring the
line between the well specified behaviour of your app and the pile
of crap users inevitably fill it with. IMHO.

That’s exactly why you should not test against production data.
However, could RSpec be a handy tool to build a gadget to point out the
poop?
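A “data smells” checker of the kind described above could start as a list of named predicates run over records from a copy of production. This is a minimal sketch in plain Ruby rather than RSpec, to keep it self-contained; the smells, record fields and `sniff` are all hypothetical, and in a real app a rake task (the post’s `rake spec:smells`) might wrap it.

```ruby
# Each smell is a name plus a predicate over one record.
Smell = Struct.new(:name, :check)

SMELLS = [
  Smell.new("page without a title", ->(r) { r[:type] == :page && r[:title].to_s.strip.empty? }),
  Smell.new("form with no fields",  ->(r) { r[:type] == :form && Array(r[:fields]).empty? }),
  Smell.new("orphaned record",      ->(r) { r[:project_id].nil? })
].freeze

# Returns { smell name => [ids of offending records] }, leaving out smells
# with no offenders so the report only lists what needs attention.
def sniff(records, smells = SMELLS)
  smells.each_with_object({}) do |smell, report|
    offenders = records.select { |r| smell.check.call(r) }.map { |r| r[:id] }
    report[smell.name] = offenders unless offenders.empty?
  end
end

records = [
  { id: 1, type: :page, title: "Home", project_id: 10 },
  { id: 2, type: :page, title: "  ",   project_id: 10 },
  { id: 3, type: :form, fields: [],    project_id: nil }
]

sniff(records).each { |name, ids| puts "#{name}: #{ids.join(', ')}" }
```

Keeping smells separate from validations matches the point in the post: these are things the app may have to allow, but that still deserve a report.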

johanatan · September 8, 2008, 10:51pm

2008-09-05 16:10, Nick H.:

I know what you mean about it feeling reassuring to know that your
tests/specs passed when run on production data. However, if your
specs, scenarios, tests, etc. cover all behaviours, situations, edge
cases, etc., then you needn’t worry. All is well.

I think you are overlooking the often-stressed point of tests. You
are supposed to gain confidence in your codebase by testing. If you
have covered everything you can think of and still have the gut
feeling that some end user voodoo hiding in production environment
will kill your mighty app, why not run tests against production data
too?

You know very well that if you have that icky feeling, it’s not gonna
go away if somebody says “you needn’t worry”.

Of course, if you hit something in production data, you should
immediately isolate and reproduce the case in your tests and then go
debugging. That way you aren’t actually using production data as test
data, but using it to develop the test data. And isn’t that
(reproducing a crack in production env into tests) just what the
Laziness[1] is for too?

[1] http://agilewebdevelopment.com/plugins/laziness

johanatan · September 9, 2008, 4:38pm

On 9 Sep 2008, at 14:54, Ashley M. wrote:


Sometimes it feels like waste, but then insurance products have an
expected net loss, and people still consider them valuable.

Ashley

–
http://www.patchspace.co.uk/
http://aviewfromafar.net/

The great thing about using tools like RSpec is that you write far
fewer bugs in your code, freeing testers up to do exploratory testing
[1] which is where you find the sort of issues I think Jonathan is
worried about.

[1] Exploratory testing - Wikipedia

cheers,
Matt

http://blog.mattwynne.net


johanatan · September 9, 2008, 4:21pm

On 8 Sep 2008, at 17:21, Jonathan L. wrote:

Here’s one example: let’s say my app is a specialized CMS, where
account owners can set up their own projects, pages and forms. I’d
like to run scenarios against setups that users have created.
Thinking further, wouldn’t it be neat to make this a user feature,
e.g. via a “Validate This Project” button?

The more I think about it, the more I come to the conclusion you
should just take (very) regular backups, and when a problem occurs,
isolate the situation.

Dan’s export suggestion would be really useful here. In case of data
loss, you could fire up a backup version of the app, export missing/corrupt
pages, and re-import into the production environment.

This would encourage you to define a robust data format too. It
wouldn’t directly lead you to catching edge cases, but it will give
you more confidence about recovering from them. And anyway, if your
tests-on-real-data throw up edge cases, they will still miss the ones
that real data doesn’t exist for. So maybe being able to say “well,
I’m only 98% sure your data is safe, but I’m 99.9% sure I can get it
back if there’s a problem”, is more reassuring than hoping you covered
everything.

I’ve found that testing can sometimes drive out new features from an
app. The one I’m starting to work on now, for example, is a daemon
process that sits and polls some RSS and XML services. But to test it
I’ve been driven to start writing a (crude, right now) socket-based
remote control and corresponding client interface. I can’t say if it
will ever be used, but it’s reassuring to know I can always go in and
prod it if I’m not sure it’s working right live.

Sometimes it feels like waste, but then insurance products have an
expected net loss, and people still consider them valuable.

Ashley

–
http://www.patchspace.co.uk/
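Ashley’s recovery scenario (fire up a backup, export missing or corrupt pages, re-import them) needs a step that decides which pages to bring back. This is a minimal sketch assuming pages can be compared as id-keyed hashes of content; `plan_restore` and the sample data are hypothetical. It deliberately flags changed pages for review instead of overwriting them, since a difference may be a legitimate edit made after the backup was taken.

```ruby
# Hypothetical restore planner: compares pages exported from a backup with
# the pages currently live (both hashes of page id => content).
#   reimport: ids present in the backup but missing from production
#   review:   ids present in both whose content differs (possible corruption,
#             or a legitimate later edit, so a human should decide)
def plan_restore(backup_pages, live_pages)
  missing = backup_pages.keys - live_pages.keys
  shared  = backup_pages.keys & live_pages.keys
  changed = shared.reject { |id| backup_pages[id] == live_pages[id] }
  { reimport: missing.sort, review: changed.sort }
end

backup = { 1 => "Welcome", 2 => "Pricing", 3 => "Contact us" }
live   = { 1 => "Welcome", 3 => "C0nt@ct" }

p plan_restore(backup, live)
```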
