Extraction of single subarrays from multidimensional array

okkezSS · October 26, 2010, 1:05am

Hi there,

I am a new member of this group and a newbie to Ruby. After an
extensive but unsuccessful search on this topic, I am asking for your
help with a problem: picking out single subarrays from a
multidimensional array.
In short, I somehow stored the following m-array (single strings are
DNA codons):
ss = [["tcg", "agt", "tct", "agc", "tca", "tcc"], ["aaa", "aag"],
["ctg", "tta", "ctt", "cta", "ctc", "ttg"]]

What I would like to get are the separated arrays ss[0], ss[1] and ss[2]
by iteration over array ss.
The method array.clone looks perfect for this aim:

irb(main):039:0> v0 = ss[0].clone
=> ["tcg", "agt", "tct", "agc", "tca", "tcc"]

but I did not find the right way to iterate this method over the m-
array and get indexed subarrays.

I tried iterations like:

v"#{n}"= ss.clone(n) do |n|
end
or

ss(n).each do |n|
v"#{n}" = ss.clone(n)
end

with no success.
Any help is greatly appreciated. Thanks.

– Maurizio

Maurizio_C · October 26, 2010, 1:40am

You’re working too hard. The simplest way usually works. Just

ss.each do |codon_cluster|
  # do something with codon_cluster, like:
  p codon_cluster
end

is enough.
Since all these strings are different objects (eating memory) you might
want to convert them into symbols (google them) as soon as possible.
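
As a rough sketch of the symbol suggestion above, the strings can be converted while keeping the nested structure. It assumes the codon clusters from the original post; sym_clusters is a name made up for this example.

```ruby
# Codon clusters from the original post.
ss = [
  %w[tcg agt tct agc tca tcc],
  %w[aaa aag],
  %w[ctg tta ctt cta ctc ttg]
]

# Convert every codon string to a Symbol, keeping the nested structure.
sym_clusters = ss.map { |cluster| cluster.map(&:to_sym) }

p sym_clusters[1]     # prints [:aaa, :aag]

# Equal symbols always refer to the same object, so each codon is stored once.
p :aaa.equal?(:aaa)   # prints true
```

The original ss is left untouched; map builds new arrays.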

Maurizio_C · October 26, 2010, 8:32am

On 26.10.2010 01:01, Maurizio C. wrote:

I am a new member of this group and a newbie to Ruby. After an
extensive but unsuccessful search on this topic, I am asking for your
help with a problem: picking out single subarrays from a
multidimensional array.
In short, I somehow stored the following m-array (single strings are
DNA codons):
ss = [["tcg", "agt", "tct", "agc", "tca", "tcc"], ["aaa", "aag"],
["ctg", "tta", "ctt", "cta", "ctc", "ttg"]]

Btw, I assume you do bioinformatics. If all your Arrays contain three
letter sequences you should probably change entries to Symbols (there
are only 4 ^ 3 = 64 of them).

I’d probably replace these Arrays by a custom class for handling
sequences and work with that. Then you can optimize internal
representation (e.g. use a Fixnum to code a three letter sequence) to
save even more memory. I believe there are libraries for bioinformatics
out there which probably do exactly that.
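
One possible reading of the integer-encoding idea above, as a minimal sketch: treat each base as a base-4 digit, so a codon packs into an integer from 0 to 63. The helper names encode_codon and decode_codon are hypothetical and do not come from BioRuby or any other library.

```ruby
# Hypothetical helper: pack a three-letter codon into an integer in 0..63.
# Each base is one base-4 digit: a=0, c=1, g=2, t=3.
def encode_codon(codon)
  codon.each_char.inject(0) do |code, base|
    digit = "acgt".index(base) or raise ArgumentError, "unknown base: #{base}"
    code * 4 + digit
  end
end

# Hypothetical helper: unpack an integer in 0..63 back into a codon string.
def decode_codon(code)
  [code / 16, (code / 4) % 4, code % 4].map { |digit| "acgt"[digit, 1] }.join
end

p encode_codon("aaa")                 # prints 0
p encode_codon("ttt")                 # prints 63
p decode_codon(encode_codon("tcg"))   # prints "tcg"
```

Whether this saves enough memory to be worth the extra indirection depends on the data; for a few thousand codons, symbols are probably simpler.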

What I would like to get are the separated arrays ss[0], ss[1] and ss[2]
by iteration over array ss.
The method array.clone looks perfect for this aim:

irb(main):039:0> v0 = ss[0].clone
=> ["tcg", "agt", "tct", "agc", "tca", "tcc"]

but I did not find the right way to iterate this method over the m-
array and get indexed subarrays.

Do you actually need a copy or do you want to reference the original?
If you need the original here’s the simplest approach

a, b, c = *ss

For a copy you can do:

a, b, c = ss.map(&:clone) # 1.9.*
a, b, c = *ss.map {|x| x.clone} # 1.8.6 and earlier

Note that then you still share String instances! So if you want to
manipulate individual strings you need to take a different approach
(e.g.)

a, b, c = *ss.map {|arr| arr.map {|s| s.dup}}
a, b, c = *Marshal.load(Marshal.dump(ss))
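
The difference between the two kinds of copy can be checked directly. This is a small sketch with a shortened, made-up ss; shallow and deep are names chosen for the example.

```ruby
ss = [%w[tcg agt], %w[aaa aag]]

shallow = ss.map { |arr| arr.clone }               # new arrays, same strings
deep    = ss.map { |arr| arr.map { |s| s.dup } }   # new arrays, new strings

# The sub-arrays are distinct objects in both copies...
p shallow[0].equal?(ss[0])         # prints false
# ...but only the shallow copy still shares the String instances.
p shallow[0][0].equal?(ss[0][0])   # prints true
p deep[0][0].equal?(ss[0][0])      # prints false

# Mutating a string in place is therefore visible through the shallow copy.
ss[0][0].upcase!
p shallow[0][0]   # prints "TCG"
p deep[0][0]      # prints "tcg"
```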

I tried iterations like:

v"#{n}"= ss.clone(n) do |n|
end
or

Apart from that it does not work, where’s the point in creating
variables with calculated names with indexes if you can do indexed
access via the Array already? That does not seem like a viable
approach.

Kind regards

robert

Maurizio_C · October 26, 2010, 4:01pm

Thanks a lot Siep for your prompt help.

– Maurizio

Maurizio_C · October 26, 2010, 4:08pm

On Oct 26, 1:29am, Robert K. [email protected] wrote:

Do you actually need a copy or do you want to reference the original?
If you need the original here’s the simplest approach

a, b, c = *ss

This is simpler.

a, b, c = ss

Maurizio_C · October 26, 2010, 4:06pm

Thanks a lot Robert for your clear explanation and help.
In order to fully understand the code you provided, could you
please tell me what the role of the asterisk is in the
statement:

a, b, c = *ss

I did not find (or probably I just missed) this operator in the Ruby
docs I have.
Btw, bioinformatics libraries for the Ruby community are provided by
the BioRuby project.

– Maurizio

Maurizio_C · October 26, 2010, 5:25pm

On Tue, Oct 26, 2010 at 3:05 PM, Maurizio C. [email protected] wrote:

the BioRuby project guys.

There’s an explanation of *array in the online Programming Ruby,
probably in the sections on assignment and/or method calls: I did think
about searching for it, but the link below looks as though it has a
reasonable explanation.
Subject to correction by anyone more knowledgeable than me, the second
statement below (extracted from the linked page) also applies to
assignment, so you can do something like:
aa = [1, 2]
bb = [4, 5]
cc = [7, 8]
a, b, c, d, e, f, g = *aa, 3, bb, 6, *cc
which sets a to 1, b to 2, c to 3, d to [4, 5], e to 6, f to 7, g to 8.

As w_a_x_man pointed out, if the right hand side of an assignment
statement is an array, and there are two or more variables on the left
hand side of the assignment statement, then Ruby automatically expands
the array for you, so you can omit the “*” operator if you want to…

http://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Method_Calls
…
Variable Length Argument List, Asterisk Operator

The last parameter of a method may be preceded by an asterisk(*), which
is sometimes called the ‘splat’ operator. This indicates that more
parameters may be passed to the function. Those parameters are collected
up and an array is created.
…
The asterisk operator may also precede an Array argument in a method
call. In this case the Array will be expanded and the values passed in
as if they were separated by commas.
…

Maurizio_C · October 26, 2010, 4:59pm

On Tue, Oct 26, 2010 at 4:05 PM, Maurizio C. [email protected]
wrote:

Thanks a lot Robert for your clear explanation and help.
In order to fully understand the code you provided, could you
please to tell what is the role of the asterisk in the
statement:

a, b, c = *ss

I did not find (or probably I just missed) this operator in the Ruby
docs I have.

It’s usually called the splat operator, and its function in the above
expression is to take the array elements one by one and use them in
the parallel assignment, so that the first element is assigned to a,
the second to b, the third to c, and any others are discarded.

It’s also used to collect the rest of the parameters in an assignment
or in a method call:

irb(main):001:0> ss = [1,2,3,4,5]
=> [1, 2, 3, 4, 5]
irb(main):002:0> a,b,c = *ss
=> [1, 2, 3, 4, 5]
irb(main):003:0> a
=> 1
irb(main):004:0> b
=> 2
irb(main):005:0> c
=> 3
irb(main):006:0> a,b,*c = *ss
=> [1, 2, 3, 4, 5]
irb(main):007:0> a
=> 1
irb(main):008:0> b
=> 2
irb(main):009:0> c
=> [3, 4, 5]
irb(main):010:0> def test a,b,*c
irb(main):011:1> p [a,b,c]
irb(main):012:1> end
=> nil
irb(main):013:0> test 1,2,3,4,5,6
[1, 2, [3, 4, 5, 6]]

Jesus.

Maurizio_C · October 26, 2010, 6:05pm

Thank you all for the very instructive replies.
I have one last question: how can I make this iteration through the
splat operator general, i.e. flexible enough to cover cases in which
the number of subarrays (a, b, c in the above example) is unknown?
I mean, the splat operator, which does the iteration automatically,
does not return any count of the columns of the input m-array ss, so
how do I know how many variables to put on the left side of the
assignment
a, b, c = *ss ?

Maybe such a question is trivial, but not for me: I spent several hours
thinking about it and I still have no clue how to do it (the hard life
of beginners!!)

Thanks again.

Maurizio

Maurizio_C · October 26, 2010, 6:14pm

On 10/26/2010 11:05 AM, Maurizio C. wrote:

Maybe such a question is trivial, but not for me: I spent several hours
thinking about it and I still have no clue how to do it (the hard life
of beginners!!)

It is not possible to do what you’re proposing. If you just want to
iterate over the array contents, use the each method of the array object:

ss.each do |item|
  # Do something with the item here.
end

-Jeremy

Maurizio_C · October 27, 2010, 10:01am

On Tue, Oct 26, 2010 at 6:11 PM, Jeremy B. [email protected] wrote:

a, b, c = *ss ?

Maybe such a question is trivial, but not for me: I spent several hours
thinking about it and I still have no clue how to do it (the hard life
of beginners!!)

It is not possible to do what you’re proposing.

I go further and say: it is not even reasonable to do that. That’s
the same as setting local variables with calculated names like v1, v2,
v3 etc. If someone wants to do that he must be aware that access to
these variables (since they are generated) must be generated as well.
In this case Array indexing is the more appropriate
mechanism.
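
A minimal sketch of that point, using a shortened, made-up ss: indexed access already works for any number of sub-arrays, and if names are really wanted, a Hash with computed keys is one option. The keys "v0", "v1", ... and the name named are purely illustrative.

```ruby
ss = [%w[tcg agt tct], %w[aaa aag], %w[ctg tta]]

# Indexed access works no matter how many sub-arrays there are.
ss.each_index do |i|
  puts "sub-array #{i} has #{ss[i].length} codons"
end

# A Hash keyed by a computed name replaces generated local variables.
named = {}
ss.each_with_index { |sub, i| named["v#{i}"] = sub.clone }

p named["v1"]   # prints ["aaa", "aag"]
p named.keys    # prints ["v0", "v1", "v2"]
```

Unlike generated local variables, the Hash can be iterated, counted, and passed around like any other object.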

If you just want to
iterate over the array contents, use the each method of the array object:

Exactly!

Kind regards

robert

Maurizio_C · October 27, 2010, 12:15pm

OK, so looks like there is no way with Ruby to extract
single subarrays from md-arrays with unknown dimensions.

Thanks all for help.

– Maurizio

Maurizio_C · October 27, 2010, 12:35pm

On Wed, 27 Oct 2010 19:15:13 +0900, Maurizio C.
[email protected]
wrote:

OK, so looks like there is no way with Ruby to extract
single subarrays from md-arrays with unknown dimensions.

Thanks all for help.

– Maurizio

It almost certainly can. I think you just need to rephrase your question
so people can see what exactly you want to do. Here is an irb session
showing you one way of doing what I think you want to do:

ss = [[1,2,3],[4,5,6],[7,8,9]]
=> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
main_scope = binding()
=> #<Binding:0x1011c2828>
ss.each_with_index{|x,i| eval("v#{i} = x.clone",main_scope) }
=> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v1
=> [4, 5, 6]
v0
=> [1, 2, 3]
v2
=> [7, 8, 9]

It’s almost certainly not the ‘right’ way to do what you really want
though.

Maurizio_C · October 27, 2010, 12:47pm

On Wed, Oct 27, 2010 at 12:15 PM, Maurizio C. [email protected]
wrote:

OK, so looks like there is no way with Ruby to extract
single subarrays from md-arrays with unknown dimensions.

I can only chime in to what Alex wrote: you can extract arrays from
arbitrary nested arrays but generating variable names is almost
certainly the wrong way to go about it. What exactly do you want to
do? Can you describe the input and what you want to do with it with
more context than you provided so far?

Cheers

robert

Maurizio_C · October 27, 2010, 9:17am

On Tue, Oct 26, 2010 at 5:21 PM, Colin B.
[email protected] wrote:

docs I have.
aa = [1, 2]
bb = [4, 5]
cc = [7, 8]
a, b, c, d, e, f, g = *aa, 3, bb, 6, *cc
which sets a to 1, b to 2, c to 3, d to [4, 5], e to 6, f to 7, g to 8.

As w_a_x_man pointed out, if the right hand side of an assignment statement
is an array, and there are two or more variables on the left hand side of
the assignment statement, then Ruby automatically expands the array for you,
so you can omit the “*” operator if you want to…

It even works with one variable to the left - but then you need a comma:

09:11:30 ~$ ruby19 -e 'a=%w{foo bar baz};b,=a;p b'
"foo"

While splat alone does not work in this case:

09:11:50 ~$ ruby19 -e 'a=%w{foo bar baz};b=*a;p b'
["foo", "bar", "baz"]

You need to add the comma here as well

09:12:24 ~$ ruby19 -e 'a=%w{foo bar baz};b,=*a;p b'
"foo"

Of course, you could also do

09:12:50 ~$ ruby19 -e 'a=%w{foo bar baz};b=a.first;p b'
"foo"
09:13:18 ~$ ruby19 -e 'a=%w{foo bar baz};b=a[0];p b'
"foo"

Or, if destruction is allowed:

09:13:23 ~$ ruby19 -e 'a=%w{foo bar baz};b=a.shift;p b'
"foo"

http://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Method_Calls
…
Variable Length Argument List, Asterisk Operator

The last parameter of a method may be preceded by an asterisk(*), which is
sometimes called the ‘splat’ operator. This indicates that more parameters
may be passed to the function. Those parameters are collected up and an
array is created.
…

Actually this is not correct any more for 1.9.*: here the splat
operator can occur at any position and Ruby will do the pattern
matching for you:

09:13:29 ~$ ruby19 -e 'def f(a,*b,c) p a, b, c end;f(1,2,3,4,5)'
1
[2, 3, 4]
5
09:15:03 ~$ ruby19 -e 'def f(*a,b,c) p a, b, c end;f(1,2,3,4,5)'
[1, 2, 3]
4
5
09:15:42 ~$ ruby19 -e 'def f(a,b,*c) p a, b, c end;f(1,2,3,4,5)'
1
2
[3, 4, 5]
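
The same pattern matching is also available in parallel assignment, which may be closer to the original question. A small sketch, assuming Ruby 1.9 or later and a made-up four-element ss:

```ruby
ss = [%w[tcg agt], %w[aaa aag], %w[ctg tta], %w[ggg gga]]

# A splat in the middle of the left-hand side collects whatever is left
# after the first and last elements have been assigned.
first, *middle, last = ss

p first    # prints ["tcg", "agt"]
p middle   # prints [["aaa", "aag"], ["ctg", "tta"]]
p last     # prints ["ggg", "gga"]
```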

Kind regards

robert

Maurizio_C · October 27, 2010, 1:10pm

Sorry for my bad explanation. In my case I actually know how many
columns are in the md-array, i.e. how many sub-arrays are to be
extracted, because I read them as back-translated amino acids from a
gene database, but this number can change from case to case. So to make
my code generally usable I have to take this “variable” into
consideration; otherwise I have to change this number by hand every
time I run the program. In the example I provided this number is 3, but
the number of subarrays can vary. That is what I meant by “unknown”
dimension.
Sorry again for the misunderstanding.

Maurizio

Maurizio_C · October 27, 2010, 12:50pm

On Oct 27, 5:14am, Maurizio C. [email protected] wrote:

OK, so looks like there is no way with Ruby to extract
single subarrays from md-arrays with unknown dimensions.

Thanks all for help.

– Maurizio

If you don’t know the length of ss when you write your program,
then obviously you don’t know how many variables to use in your
assignment statement. You don’t know whether to say

a,b,c = ss

or

a,b,c,d = ss

or

a,b,c,d,e = ss

However, those variables are not needed at all. To extract the
first subarray, say ss[0] or ss.first. To extract the last
subarray, say ss[-1] or ss.last. To extract each in turn along
with its index:

ss.each_with_index{|x,i| p i, x}
0
["tcg", "agt", "tct", "agc", "tca", "tcc"]
1
["aaa", "aag"]
2
["ctg", "tta", "ctt", "cta", "ctc", "ttg"]

All of this will become obvious after you have some programming
experience.

Maurizio_C · October 27, 2010, 2:00pm

On Wed, Oct 27, 2010 at 1:10 PM, Maurizio C. [email protected]
wrote:

number of subarrays can vary. In this respect I wrote “unknown”
dimension.

Then why don’t you just iterate the outermost Array and be done?

robert

Maurizio_C · October 27, 2010, 2:01pm

On Wed, Oct 27, 2010 at 1:10 PM, Maurizio C. [email protected]
wrote:

number of subarrays can vary. In this respect I wrote “unknown”
dimension.
Sorry again for misunderstanding.

The question is: what do you do with those subarrays? What’s common in
both cases?
When you take a look at that answer you can come up with a way of
making a generalized algorithm that can work for different sizes, and
I bet that you don’t need to generate a local variable for each
subarray.
For example, say that you have to take each subarray, join the strings
together comma separated and pass it to a method for further
processing:

ss = [ … your array of arrays …]

ss.each {|subarray| process_subarray(subarray.join(",")) }

See? You don’t need to know there were 2, 3 or 20 subarrays, or even
the length of each subarray.

So, take a look at the requirement from a higher perspective and you
will get to a generalized algorithm.
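
To make the one-liner above concrete, here is a self-contained sketch. process_subarray is only a hypothetical stand-in (the thread never defines it), and process_all is a name invented for this example.

```ruby
# Hypothetical stand-in for the real processing step.
def process_subarray(joined)
  "processed: #{joined}"
end

# Works for any number of sub-arrays of any length; no per-subarray
# variables are needed.
def process_all(ss)
  ss.map { |subarray| process_subarray(subarray.join(",")) }
end

p process_all([%w[aaa aag]])
# prints ["processed: aaa,aag"]

p process_all([%w[tcg agt], %w[aaa], %w[ctg tta ctt]]).length   # prints 3
```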

Jesus.