Iterating over sub arrays

James_H.on · May 17, 2010, 12:19am

Hey folks,

I feel like there’s a really, really simple solution to this problem,
but I’m just missing it.

<-----code----->
array1 = ["0", "1", "2", "3"]
array2 = ["jam", "bees", "please"]
array3 = ["6", "7", "8"]

array1.each do |a|
  array2.each do |b|
    array3.each do |c|
      puts "#{a} #{b} #{c}"
    end
  end
end
<-----/code----->

The output of that little snippet is every possible combination of one
element from each of the three arrays.

I want to generalise this, such that if I have an arbitrary length array
of arrays:

arrayCollection = [["0", "1", "2", "3"], ["jam", "bees", "please"],
                   ["6", "7", "8"]]

I can get the same output.

In my mind’s eye, I see:

arrayCollection.each do |subarray|
end

as the start, to get to each subarray, but I can’t see what the next
step would be.

Any pointers in the right direction?

thanks!

James

James_H.on · May 17, 2010, 12:52am

On Sunday, May 16, 2010 05:19:05 pm James H.on wrote:

Minor nit: by convention, Ruby people tend to use CamelCase for
constants only, and underscores for variables. Call it array_collection.

Anyway, seems like one obvious way would be recursion:

def each_join array, context=[], &block
  if array.length == 0
    yield context
  else
    first = array.first
    rest = array[1...array.length]
    first.each do |elem|
      each_join rest, context+[elem], &block
    end
  end
end

Not pretty, and I’m sure someone could improve it, but it works.

It does seem pretty weird, though. Out of curiosity, what do you need
this for?

James_H.on · May 17, 2010, 2:50am

Minor nit: by convention, Ruby people tend to use CamelCase for constants
only, and underscores for variables. Call it array_collection.

When I keep to that convention, I find my code’s more readable. I can
tell what something is by looking at it. Thanks for the reminder.

      each_join rest, context+[elem], &block
    end
  end
end

Not pretty, and I’m sure someone could improve it, but it works.

That it does, that it does. Unfortunately, I don’t understand it. If
you’ve got a minute, would you give a hand?

It looks to me like:

each_join takes three arguments. The first is an array, the second is
predefined to be an empty array, the third is a reference to a block.

First, I don’t understand why you need to pass in a reference to a
block, or why one would want to do that in general. Programming Ruby
covers it by saying that this allows a block to be treated as a Proc
object. I guess my misunderstanding, then, is what the hell a Proc
object is really used for. I’ve been putting this off, and it meant that
a simple solution escaped me, apparently. Onwards with understanding,
then!

A Proc object lets you assign a block to a variable. My naive reading of
the RDoc documentation for class Proc leads me to imagine that the
first benefit of this is that you can assign what’s essentially a method
call to a variable, almost like you’re making a new object out of the
variable. What does the object do? Take some number of arguments to its
call method, and do something with those arguments.

Why would you want to use this? I don’t know. But it looks like that’s
all it does.

And what does this have to do with each_join?

Okay. The second argument, context. I’m not sure what it’s doing. It’s
an empty array which is handed back to the block when the length of the
original argument array is zero. I don’t see how the argument array ever
gets set to zero, though.

Okay, here goes my explanation:

Why is &block passed in? So that on successive calls to each_join, the
block is correctly associated with the method.

At the if statement, array.length is non-zero. As such, we go to the
else clause.
The first sub-array is assigned to the variable “first”. The other
subarrays are assigned to the variable “rest”.
So in the case of an array of arrays:
[["bacon", "time", "ostrich"], ["jam", "bees", "please"],
 ["1", "2", "3"]]

After the first pass:

first = ["bacon", "time", "ostrich"]
rest = [["jam", "bees", "please"], ["1", "2", "3"]]

And then you start iterating over first.

first.each do |elem| # first time around, elem == "bacon"
  each_join rest, context+[elem], &block

So what happens now?

first = ["jam", "bees", "please"]
rest = [["1", "2", "3"]]

elem is set to “jam” on the iteration over first, context becomes
[“bacon”, “jam”] and we enter the whole thing again.

Okay, the above is a nice start for any other new folks looking to trace
this, but I just finished up and my mind == blown. It took me a while to
realise that the control structure here is actually:

first.each do |elem|

and that’s what’s driving everything. For anyone else wanting to follow
along, try this out:

arrayCollection = [["bacon", "1", "2", "4"], ["jam", "bees", "please"],
                   ["6", "7", "8"]]

def each_join array, context=[], &block
  if array.length == 0
    puts "now context is #{context}"
    yield context
  else
    first = array.first
    puts "context is #{context}"
    puts "first is #{first}"
    rest = array[1...array.length]
    puts "rest is #{rest}"
    first.each do |elem|
      puts "elem is #{elem}"
      puts ""
      each_join rest, context+[elem], &block
    end
  end
end

each_join(arrayCollection) do |b|
  puts "-------"
  puts "#{b}"
  puts "-------"
end

It does seem pretty weird, though. Out of curiosity, what do you need this
for?

Okay, you gave me quite an education today, so turn about is fair play.

I have a program which accepts input and outputs output.
It has both a gui and a scripting environment.

For the scripting environment, I documented every option that you can
pass in. The options look like:

foo=bar

or
foo=1

and several can be handed in at a time.

foo=bar;jet=bam;

and so on. Each option can only be specified once, but each option can
have multiple valid values. In order to verify that every option that
can be handed in actually works as expected, I want to generate a set of
scripts from the documentation I wrote. The parser I have for the
documentation finds a series of lines of type:

foo=bar – does something fooish
foo=bam – does something else fooish

jet=bar – does something jetish
jet=bam – does something else jetish

and so on. I’m popping each option string into an array:

[[foo=bar, foo=bam], [jet=bar, jet=bam]]

And from there, generate every possible combination of all options. I’d
done everything else, but that last item on the list was killin’ me.

James_H.on · May 17, 2010, 8:44pm

On Sunday, May 16, 2010 07:49:56 pm James H.on wrote:

Not pretty, and I’m sure someone could improve it, but it works.

That it does, that it does. Unfortunately, I don’t understand it. If you’ve
got a minute, would you give a hand?

It looks to me like:

each_join takes three arguments. The first is an array, the second is
predefined to be an empty array, the third is a reference to a block.

The second defaults to an empty array. You can override it, but the
idea is not for you to override it; I do that myself when I call the
method recursively.

A Proc object lets you assign a block to a variable. My naive reading of
the RDoc documentation for class Proc leads me to imagine that the
first benefit of this is that you can assign what’s essentially a method
call to a variable, almost like you’re making a new object out of the
variable. What does the object do? Take some number of arguments to its
call method, and do something with those arguments.

Why would you want to use this? I don’t know. But it looks like that’s all
it does.

Well, one example would be something like Rake. I can specify a task
like this:

task :foo do
  # stuff
end

Rake will then take my block and store it in some data structure, as
part of a Task object, which tracks its dependencies and everything
else. When it’s time to actually run that code, it digs it up and calls
it.
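
To make the store-now, call-later idea concrete, here is a rough sketch.
It is not how Rake is actually implemented; define_task, run_task and the
plain Hash used as a registry are all names made up for this example.

```ruby
# Store the block (converted to a Proc by &block) under a name.
# Nothing runs at definition time.
def define_task(registry, name, &block)
  registry[name] = block
end

# Look the stored Proc up later and call it.
def run_task(registry, name)
  registry.fetch(name).call
end

registry = {}
define_task(registry, :foo) { puts "running foo" }
run_task(registry, :foo)
```

The block only runs when run_task is called, which is the whole point of
capturing it as an object.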

It’s really quite useful, but you should avoid capturing the block when
possible (use yield and block_given? instead) because it’s slightly more
expensive to actually convert it to a proc object.

And what does this have to do with each_join?

Because the outer call to each_join never calls ‘yield’. What it’s doing
is taking that same block and passing it to the inner calls, so that
eventually, when you get to that first part of the ‘if’ statement (if
the array is empty, which I should’ve written as ‘if array.empty?’), it
will be calling the same block you passed in in the first place.

Okay. The second argument, context. I’m not sure what it’s doing. It’s an
empty array which is handed back to the block when the length of the
original argument array is zero. I don’t see how the argument array ever
gets set to zero, though.

Let’s step through it:

Basically, I’m using the Lisp idea of dealing with the first element,
and then the rest of the array. So, look here:

first = array.first
rest = array[1...array.length]

So, if your array was [1, 2, 3],
first is 1
rest is [2, 3]

Then, look at the actual recursive call:

each_join rest, context+[elem], &block

So here, I’m calling each_join with ‘array’ set to ‘rest’, and I’m also
adding an element from ‘first’ to the context. So the first call to
each_join has array as [1, 2, 3]. As it gets deeper, you get a call of
[2, 3], then one with [3], then one with []. At each step, it’s adding
the current element of first (from that loop) to the context.
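
One way to check the walkthrough is to collect everything the block
receives. This is only a sketch: each_join is the same method as before,
written with array.empty?, and collect_joins is a helper name invented
here for illustration.

```ruby
def each_join(array, context = [], &block)
  if array.empty?
    # No subarrays left: context now holds one element per subarray.
    yield context
  else
    rest = array[1...array.length]
    array.first.each do |elem|
      # Same block, shorter array, longer context.
      each_join(rest, context + [elem], &block)
    end
  end
end

# Hypothetical helper: gather every yielded context into one array.
def collect_joins(array)
  results = []
  each_join(array) { |combo| results << combo }
  results
end

p collect_joins([["bacon", "time"], ["jam", "bees"]])
```

Every inner array printed has one element from each subarray, in the
order the nested loops would have produced them.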

Okay, here goes my explanation:

Why is &block passed in? So that on successive calls to each_join, the
block is correctly associated with the method.

Not “successive”, but recursive.

And then you start iterating over first.

first.each do |elem| # first time around, elem == "bacon"
  each_join rest, context+[elem], &block

So what happens now?

first = ["jam", "bees", "please"]
rest = [["1", "2", "3"]]

And, here’s the important point, for the first iteration,

context = ["bacon"].

Then, after that entire inner loop finishes, each_join gets called
again, like this:

context = ["time"]
first = ["jam", "bees", "please"]
rest = [["1", "2", "3"]]

And so on.

Let’s follow it to an inner iteration:

context = ["bacon", "jam"]
first = ["1", "2", "3"]
rest = []

And finally, to the place you actually yield:

context = ["bacon", "jam", "1"]
array = []

I don’t know if that makes more sense. You probably saw most of this
with your trace.

It does seem pretty weird, though. Out of curiosity, what do you need
this for?

Okay, you gave me quite an education today, so turn about is fair play.

I have a program which accepts input and outputs output.
It has both a gui and a scripting environment.
[snip]
jet=bam – does something else jetish

and so on. I’m popping each option string into an array:

[[foo=bar, foo=bam], [jet=bar, jet=bam]]

And from there, generate every possible combination of all options. I’d
done everything else, but that last item on the list was killin’ me.

Are all of them required, though? Because that’s the assumption I was
working with. One way to avoid that might be to add an iteration to each
of these where you don’t add that item. That is, change the inner loop
to:

first.each do |elem|
  each_join rest, context+[elem], &block
end

# call again, without anything from this array
each_join rest, context, &block
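
Put together as a complete method, that change might look like the
sketch below. The names each_optional_join and optional_joins are made
up here to keep it separate from the original each_join.

```ruby
def each_optional_join(array, context = [], &block)
  if array.empty?
    yield context
  else
    rest = array[1...array.length]
    array.first.each do |elem|
      each_optional_join(rest, context + [elem], &block)
    end
    # call again, without anything from this array
    each_optional_join(rest, context, &block)
  end
end

# Hypothetical helper: gather every yielded option set.
def optional_joins(array)
  results = []
  each_optional_join(array) { |combo| results << combo }
  results
end

p optional_joins([["foo=bar", "foo=bam"], ["jet=bar"]])
```

Each subarray now contributes one extra choice ("leave it out"), so the
example yields (2 + 1) * (1 + 1) = 6 option sets, including the empty one.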

And if you really want to balloon the number of test cases, you could
make sure it works in any order. That is, when you use this:

each_join(array_collection) do |options|
  options.permutation.each do |perm|
    # do something with each permutation
  end
end

Also, as a mental exercise, you could try re-implementing this without
recursion. I wouldn’t do that unless you actually start having thousands
of items, though. I would hope you won’t run into stack overflows,
because that would be an insane number of iterations!
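
For anyone who wants to compare notes on that exercise, one possible
non-recursive version is sketched below. joins_iteratively is a made-up
name, and unlike each_join it builds the whole list in memory rather
than yielding one combination at a time. It assumes a Ruby recent enough
to have flat_map.

```ruby
def joins_iteratively(array)
  # Start with a single empty combination, then extend every partial
  # combination with each element of the next subarray.
  array.inject([[]]) do |partials, subarray|
    partials.flat_map do |prefix|
      subarray.map { |elem| prefix + [elem] }
    end
  end
end

p joins_iteratively([["0", "1"], ["jam", "bees"]])
```

The accumulator grows by a factor of subarray.length on each pass, so
memory use is proportional to the total number of combinations.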

Ok, I thought I’d be done, but this is just bothering me. I’m going to
make it prettier:

class Array
  def each_join
    return enum_for(:each_join) unless block_given?

    if empty?
      yield []
    else
      # self is the array of arrays: pair each element of the first
      # subarray with every join of the remaining subarrays.
      first.each do |elem|
        self[1...length].each_join do |subarray|
          yield [elem] + subarray
        end
      end
    end
  end
end

I think that still does the same thing. It’s probably less efficient,
but I’m really not sure. It might be a little easier to understand,
though.
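
For comparison, Ruby’s built-in Array#product computes the same
combinations when it is given the remaining subarrays as arguments. The
wrapper below is only a sketch, and product_joins is a name invented for
it; it returns the full list instead of yielding to a block.

```ruby
def product_joins(array)
  # An empty collection has exactly one (empty) combination.
  return [[]] if array.empty?

  head, *tail = array
  head.product(*tail)
end

product_joins([["0", "1"], ["jam", "bees"], ["6"]]).each do |combo|
  puts combo.join(" ")
end
```

This is handy as a cross-check on a hand-written each_join: for the same
input, both should produce the same combinations in the same order.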