Unit Testing and Instance Vars

S-gw · July 25, 2009, 1:27pm

I find myself often wondering about some design choices that result from
code which looks similar to the contrived example below.

class Places

attr_reader :addresses

def initialize(some_source)
@addresses = []
@raw_addresses = []

load_addresses(some_source)

end

def load_addresses(some_source)
some_source.each do |this_source|
@raw_addresses << this_source.address
end
clean_addresses
end

private

def clean_addresses
@raw_addresses.each do |this_address|
@addresses << GisTools.normalize_address(this_address)
end
end
end

Testing for the final results of @addresses is easy as the instance var
is exposed through the attr_reader, but there’s two problems:

A) Testing clean_addresses is dependent on running load_addresses first
in order to acquire data.

B) Seeing the intermediate results of @raw_addresses is not possible for
testing load_addresses independently because @raw_addresses is not
exposed.

To fix (A), we could insist on passing parameters. So, we would have:

def load_addresses(some_source)
some_source.each do |this_source|
@raw_addresses << this_source.address
end
clean_addresses(@raw_addresses) # <----- passed explicitly
end

private

def clean_addresses(raw_addresses) # <----- passed explicitly
raw_addresses.each do |this_address|
@addresses << GisTools.normalize_address(this_address)
end
end

That would allow clean_addresses to be tested independently and be fed
data directly by the test which could be a good thing. Regardless of
that advantage, I’ve often wondered if requiring the passing of instance
vars as parameters really is a ‘higher standard’ or not. I see a lot of
code both ways. I tend to believe it’s better to be explicit, but admit
it feels simpler in many cases to just use the hard-coded instance var.
I find myself flip flopping (always a bad thing).

Either way, I still can’t see the results of @raw_addresses. I find
myself having to expose :raw_addresses so that tests can see the
results, but there really is no need for :raw_addresses to be exposed,
and it shouldn’t be a part of the interface.

Questions:

Any consensus on the parameters issue (always pass params, or
usually)?
What’s the ‘best practice’ approach to seeing the results of
@raw_addresses?

Many thanks for the discussion.

– gw

S-gw · July 25, 2009, 1:55pm

Hi –

On Sat, 25 Jul 2009, Greg W. wrote:

private

some_source.each do |this_source|
end
Either way, I still can’t see the results of @raw_addresses. I find
myself having to expose :raw_addresses so that tests can see the
results, but there really is no need for :raw_addresses to be exposed,
and it shouldn’t be a part of the interface.

Questions:

Any consensus on the parameters issue (always pass params, or
usually)?

What’s the ‘best practice’ approach to seeing the results of
@raw_addresses?

The key word, I think, is “interface”, which in your example consists
of initialize and load_addresses. A year or so ago someone asked on
either this list or the Rails list about testing private methods, and
I said, yes, test all your private methods. David C. chimed in
with the observation that testing private methods is not generally
considered best practice; rather, the private methods are there to
help the class do what it has to do so that its public interface will
act correctly.

So I’m not sure you’d want to test clean_addresses directly. And the
fact that @raw_addresses isn’t exposed is fine. Exposing instance
variables is just an implementation technique for attributes anyway.
You should be able to move instance variables around in various ways
without changing your tests – whereas “attributes” are methods and
therefore part of the public interface.

What you really need to verify, in the large, is that calling
initialize with known input produces the expected array of addresses.
You could get more granular and unit-test load_addresses, since it’s
also part of the public API. After that, you sink below the horizon of
implementation detail.

At least, that’s my take on it.

David

S-gw · July 25, 2009, 7:54pm

David A. Black wrote:

On Sat, 25 Jul 2009, Greg W. wrote:

private

some_source.each do |this_source|
end
Either way, I still can’t see the results of @raw_addresses. I find
myself having to expose :raw_addresses so that tests can see the
results, but there really is no need for :raw_addresses to be exposed,
and it shouldn’t be a part of the interface.

Questions:

Any consensus on the parameters issue (always pass params, or
usually)?

What’s the ‘best practice’ approach to seeing the results of
@raw_addresses?

The key word, I think, is “interface”, which in your example consists
of initialize and load_addresses. A year or so ago someone asked on
either this list or the Rails list about testing private methods, and
I said, yes, test all your private methods. David C. chimed in
with the observation that testing private methods is not generally
considered best practice; rather, the private methods are there to
help the class do what it has to do so that its public interface will
act correctly.

Ah, yes. I recall reading some debates about that. When thinking about
this, I was indeed going after the test all methods stance, but I
actually like the idea of testing the interface. If changing an
implementation doesn’t break the application, it could be argued that it
shouldn’t break the tests either.

– gw

S-gw · July 25, 2009, 2:54pm

I’d agree with what David said. But I’d also point out that if you
really want to test those methods independently, you have the option of
using instance_variable_set and instance_variable_get - so you can set
up the object directly into a known state (even if the public API does
not allow you to do this), then run a method, then check the state
(ditto).

Alternatively, you can refactor into a more functional style. You got
part way with this:

def clean_addresses(raw_addresses) # <----- passed explicitly
raw_addresses.each do |this_address|
@addresses << GisTools.normalize_address(this_address)
end
end

However you could take this further to:

def clean_addresses
@addresses = do_clean_addresses(@raw_addresses)
end

def do_clean_addresses(raw_addresses)
addresses = []
raw_addresses.each do |this_address|
addresses << GisTools.normalize_address(this_address)
end
addresses
end

Now do_clean_addresses is very easy to test, and you can also
drastically simplify it using #map or #collect:

def do_clean_addresses(raw_addresses)
raw_addresses.map do |this_address|
GisTools.normalize_address(this_address)
end
end

At which point, you realise that do_clean_addresses probably doesn’t
need a unit test, because it’s just applying GisTools.normalize_address
to a collection.

Now, you’ve probably spotted the subtle API change I’ve implemented
here. Each time clean_addresses is called, @addresses is reset. In your
original API you just appended to @addresses, so that means if you
called clean_addresses more than once, @addresses would grow and grow.

It’s up to you to decide whether this is what you actually want. If you
do, you can revert to your original behaviour like this:

def clean_addresses
@addresses.concat(do_clean_addresses(@raw_addresses))
end

Maybe that’s irrelevant, since clean_addresses is a private method, and
only called once at initialization time.

But in that case, it seems pointless to set @addresses and
@raw_addresses to [] in your initialize method, before appending to them
later, when load_addresses and clean_addresses could be responsible for
initializing the instance variables too. Indeed, why not just have:

def initialize(raw_addresses)
@raw_addresses = raw_addresses
clean_addresses(@raw_addresses)
end

where clean_addresses is responsible for initializing @addresses as
described above. (Another subtle change: @raw_addresses contains the
same array as was passed in. Usually this is considered fine, but to get
the exact same semantics as you had before, you can do @raw_addresses =
raw_addresses.dup)

Another option to consider is memoisation. If clean_addresses is an
expensive computation, then you can leave it until when it is needed.

def initialize(raw_addresses = [])
@raw_addresses = raw_addresses
@addresses = nil
end

def addresses
@addresses ||= clean_addresses(@raw_addresses)
end

So ‘addresses’ looks like a read accessor for the @addresses variable,
but it also does the cleaning “on demand” when first called.

Hope this gives you a few ideas.

Regards,

Brian.

S-gw · July 25, 2009, 8:06pm

Brian C. wrote:

I’d agree with what David said. But I’d also point out that if you
really want to test those methods independently, you have the option of
using instance_variable_set and instance_variable_get

Mmm, wasn’t aware of those. Yes, that could be handy.

Alternatively, you can refactor…

Yeah, this was a quickly contrived example to show the testing
questions. I tend to like lazy loading (and therefore memoization)
approaches to things where its possible, and that would have driven me
to end up down a path similar to what you did if this were real code.

– gw