Select "columns" from multidimensional array?

There’s probably a simpler answer to this than the ways I’ve come up
with.
What’s the best way to select columns from a two-dimensional array?

I build arrays to match excel-style formatting, like this but larger:


a = [
  [ 'A1', 'A2', 'A3' ],
  [ 'B1', 'B2', 'B3' ],
  [ 'C1', 'C2', 'C3' ]
]

def get_cols(multi_array, headers)
  indices = []
  headers.each { |val| indices << multi_array[0].index(val) }
  indices.compact!

  multi_array.map do |ar|
    indices.map { |idx| ar[idx] }
  end
end

get_cols a, %w(A1 A3)

=> [["A1", "A3"], ["B1", "B3"], ["C1", "C3"]]


I haven’t been able to work out a way to do this without writing
long-winded code. Is there a simple solution?

Thanks.

On Thu, Jan 31, 2013 at 10:46 AM, Joel P. [email protected]
wrote:

get_cols a, %w(A1 A3)

=> [["A1", "A3"], ["B1", "B3"], ["C1", "C3"]]

Is this the desired output?

For getting the columns filtering the ones that have a header:

def get_cols(multi_array, headers)
  multi_array.transpose.select { |(header, _)| headers.include? header }
end

1.9.2p290 :001 > a = [
1.9.2p290 :002 > [ 'A1', 'A2', 'A3' ],
1.9.2p290 :003 > [ 'B1', 'B2', 'B3' ],
1.9.2p290 :004 > [ 'C1', 'C2', 'C3' ]
1.9.2p290 :005?> ]
1.9.2p290 :008 > def get_cols(multi_array, headers)
1.9.2p290 :009?> multi_array.transpose.select { |(header, _)| headers.include? header }
1.9.2p290 :010?> end
=> nil
1.9.2p290 :011 > get_cols a, %w(A1 A3)
=> [["A1", "B1", "C1"], ["A3", "B3", "C3"]]

Jesus.

Praise Jesus!

My initial output was the desired one, but the “transpose” method is
what I was missing. Thanks :)

I didn’t know about that “|header,_|” trick either, very nice.

Documenting the end result.
I added a sort to keep the input column order, and reordered the inputs
to make the array itself optional for ease-of-use within the parent
class.

def get_cols(headers, multi_array = nil)
  multi_array = @data if multi_array.nil?
  multi_array.transpose
             .select  { |header,| headers.include?(header) }
             .sort_by { |header,| headers.index(header) || headers.length }
             .transpose
end

irb(main):017:0> get_cols %w(A3 A1), a
=> [["A3", "A1"], ["B3", "B1"], ["C3", "C1"]]
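
For comparison, here is a minimal sketch of the same selection done without transposing, using Array#values_at on each row. The method name cols_by_header is made up for illustration and is not from the thread; it assumes the first row holds the headers and silently skips headers that are not found.

```ruby
# Hypothetical helper (not from the thread): pick columns by header name.
# Assumes rows.first is the header row.
def cols_by_header(headers, rows)
  header_row = rows.first
  # Position of each requested header, in the order requested;
  # headers that do not exist are dropped by compact.
  indices = headers.map { |h| header_row.index(h) }.compact
  # values_at pulls the chosen positions out of every row.
  rows.map { |row| row.values_at(*indices) }
end

a = [
  %w(A1 A2 A3),
  %w(B1 B2 B3),
  %w(C1 C2 C3)
]

p cols_by_header(%w(A3 A1), a)
# => [["A3", "A1"], ["B3", "B1"], ["C3", "C1"]]
```

Because the indices are built from the requested headers, the output follows the requested order and no sort step is needed.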

Thanks robert

My current approach is
HTML Table -> Nokogiri Nodeset -> Multidimensional Array -> Excel / TSV

A Matrix looks like a useful way of grabbing the values I need when I
have to alter specifics in the data.

On Thu, Jan 31, 2013 at 11:34 AM, Joel P. [email protected]
wrote:

Praise Jesus!

My initial output was the desired one, but the “transpose” method is
what I was missing. Thanks :)

I’d really start by creating a class for this - or use Matrix from the
standard library.

irb(main):008:0> m = Matrix[[1,2,3],[4,5,6]]
=> Matrix[[1, 2, 3], [4, 5, 6]]
irb(main):009:0> m.row 1
=> Vector[4, 5, 6]
irb(main):010:0> m.column 1
=> Vector[2, 5]

Kind regards

robert
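
As a rough sketch of how the Matrix suggestion could apply to the header-based selection from the start of the thread: the helper name matrix_cols is hypothetical, and it assumes row 0 holds the headers. Matrix lives in the 'matrix' library, so it has to be required first.

```ruby
require 'matrix'

m = Matrix[
  %w(A1 A2 A3),
  %w(B1 B2 B3),
  %w(C1 C2 C3)
]

# Hypothetical helper (not from the thread): build a new Matrix from the
# columns whose header (the value in row 0) was requested.
def matrix_cols(matrix, headers)
  header_row = matrix.row(0).to_a
  picked = headers.map { |h| header_row.index(h) }.compact
                  .map { |i| matrix.column(i).to_a }
  Matrix.columns(picked)
end

selected = matrix_cols(m, %w(A1 A3))
p selected.to_a
# => [["A1", "A3"], ["B1", "B3"], ["C1", "C3"]]
```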

On Thu, Jan 31, 2013 at 3:14 PM, Joel P. [email protected]
wrote:

Thanks robert

You’re welcome!

My current approach is
HTML Table → Nokogiri Nodeset → Multidimensional Array → Excel / TSV

A Matrix looks like a useful way of grabbing the values I need when I
have to alter specifics in the data.

Whatever you do - reuse class Matrix, write your own - it is the most
reasonable thing to have a specific class for handling this instead of
writing functions which work with a nested Array structure. It will
make your life much easier because then your Matrix class can enforce
proper internal state, which you cannot do as easily when using a set of
functions to manipulate an Array structure.

Kind regards

robert

Looks like good advice, both. My concern with matrices is being able to
modify elements in the same way as I could in a multidimensional
array, but I assume that’s the reason for creating a child class.
There’s a whole thread full of people waxing philosophical about the
subject!

I’ve never written a class based on someone else’s before, sounds like
fun. I’ll see what happens when I play with it a bit.

Hi Joel,

Worth mentioning that transpose requires each “row” to contain the same
number of “columns” or you will get an index error.

Here’s a Gist on how to alter Array#transpose to allow a block for
populating missing elements.

Best

randym
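
To illustrate the point about ragged rows, here is a small sketch. It is not the Gist mentioned above; padded_transpose is a hypothetical helper that pads short rows with a filler value before transposing.

```ruby
ragged = [
  %w(A1 A2 A3),
  %w(B1 B2),
  %w(C1)
]

# Array#transpose raises IndexError when the rows differ in length.
begin
  ragged.transpose
rescue IndexError => e
  puts "transpose failed: #{e.message}"
end

# Hypothetical helper: pad every row to the widest row, then transpose.
def padded_transpose(rows, filler = nil)
  width = rows.map(&:length).max || 0
  rows.map { |row| row + [filler] * (width - row.length) }.transpose
end

p padded_transpose(ragged)
# => [["A1", "B1", "C1"], ["A2", "B2", nil], ["A3", nil, nil]]
```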

Robert K. wrote in post #1094583:

write your own - it is the most
reasonable thing to have a specific class for handling this instead of
writing functions which work with a nested Array structure.

It’s only just dawned on me just how useful this kind of thing could be.
I could have a class tailor-made to my expected outputs, yet versatile
enough to adapt to new challenges… If I learn how to use blocks
effectively as well, then it could do pretty much anything I need.
All hail Ruby!

On Sat, Feb 2, 2013 at 10:04 PM, Joel P. [email protected]
wrote:

Robert K. wrote in post #1094583:

write your own - it is the most
reasonable thing to have a specific class for handling this instead of
writing functions which work with a nested Array structure.

It’s only just dawned on me just how useful this kind of thing could be.
I could have a class tailor-made to my expected outputs, yet versatile
enough to adapt to new challenges…

That’s the whole point of OO - or rather software engineering in
general: create proper abstractions. Advantage of OO is that one can
first reason about the interface and hide all the details behind that,
including the state necessary to make the interface of the class
work as needed.

If I learn how to use blocks
effectively as well, then it could do pretty much anything I need.
All hail Ruby!

:)

Cheers

robert

And my first attempt at using a block with it:

def skip_headers
  yield self[1..-1]
end

test.skip_headers do |row|
  p row
end

It works!

I decided to try and build on the Array class as I don’t really
understand Matrices yet. I’ve added a few handy methods. The hidden Bang
stuff is justified, I think, as this class is intended to mimic Excel’s
layout.

I’ll add more useful bits as I come up with them, this is just an
experiment at the moment.

class Excel_Sheet < Array

  def initialize( val=[] )
    fail ArgumentError, 'Must be multidimensional array' unless
      val[0].class == Array || val.empty?
    super( val )
  end

  def columns
    ensure_shape
    self[0].length
  end

  def rows
    self.length
  end

  # Pad short rows with nil so that transpose does not raise IndexError
  def ensure_shape
    max_size = self.max_by(&:length).length
    self.map! { |ar| ar.length == max_size ? ar : ar + Array.new( max_size - ar.length, nil ) }
  end

  def get_cols( headers )
    ensure_shape
    self.transpose
        .select  { |header,| headers.include?(header) }
        .sort_by { |header,| headers.index(header) || headers.length }
        .transpose
  end

  def get_cols!( headers )
    self.replace get_cols( headers ) # headers must be passed through
  end

  def to_s
    self.map { |ar| ar.map { |el| "#{el}".strip.gsub( /\s/, ' ' ) }.join "\t" }.join "\n"
  end

end

On Sun, Feb 3, 2013 at 11:43 PM, Joel P. [email protected]
wrote:

I decided to try and build on the Array class as I don’t really
understand Matrices yet. I’ve added a few handy methods. The hidden Bang
stuff is justified, I think, as this class is intended to mimic Excel’s
layout.

class Excel_Sheet<Array

I wouldn’t do that. With the basic types it is usually much better to
use delegation (i.e. have a member of that type) than exposing the
full API via inheritance. The whole point of OO is to control
internal state which is usually quite difficult when exposing a
complete API of Array because anybody can insert and remove elements.

Btw. the “self.” in your code are superfluous.

Kind regards

robert
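
A minimal sketch of what delegation could look like here, using Forwardable from the standard library. The class name SheetWrapper and the choice of which methods to expose are assumptions for illustration, not something specified in the thread.

```ruby
require 'forwardable'

# Hypothetical class: holds the rows as a member instead of inheriting
# from Array, and exposes only a few read-only Array methods.
class SheetWrapper
  extend Forwardable
  include Enumerable

  # Only these calls are forwarded to the internal array.
  def_delegators :@rows, :each, :length, :empty?

  def initialize(rows = [])
    unless rows.all? { |row| row.is_a?(Array) }
      raise ArgumentError, 'Must be multidimensional array'
    end
    # Shallow copy of each row, so pushing onto the caller's arrays
    # afterwards does not change this object.
    @rows = rows.map(&:dup)
  end

  def headers
    @rows.empty? ? [] : @rows.first.dup
  end

  # Hand out copies of the rows rather than the internal arrays.
  def to_a
    @rows.map(&:dup)
  end
end

sheet = SheetWrapper.new([%w(A1 A2), %w(B1 B2)])
p sheet.length              # => 2
p sheet.headers             # => ["A1", "A2"]
p sheet.respond_to?(:push)  # => false
```

Since push, delete and friends are not forwarded, callers cannot add or remove rows behind the object's back, which is the kind of state control described above.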

Ok, I wasn’t sure how to create an object which behaved like an array.
I’ve tried this instead:

class Excel_Sheet

  def initialize( val=[] )
    val = %w(A1 A2 A3 B1 B2 B3 C1 C2 C3).each_slice(3).to_a if val == 'test'
    fail ArgumentError, 'Must be multidimensional array' unless
      val[0].class == Array || val.empty?
    replace val
  end

end

It seems to work ok.

I didn’t know how to use self at first, I think I understand it a bit
better now.

Thanks for the advice :)

I’ve decided to inherit from array after all, since all I want to do
with this is extend support for multidimensional arrays, but without
overwriting any of Array’s methods.

Anyway, the obstacle I’ve hit is one I can avoid, but I was wondering
whether I’m doing something wrong, or whether there’s a nice Rubyish way
around this. Here’s a simplified version to demonstrate the issue:


class Excel_Sheet < Array

  def initialize( val=[] )
    val = %w(A1 B1 C1 A2 B2 C2 A3 B3 C3).each_slice(3).to_a if val == 'test'
    super( val )
  end

  def skip_headers
    block_given? ? ( [ self[0] ] + yield( self[1..-1] ) ) : ( self[1..-1] )
  end

  def filter( header, regex )
    idx = self[0].index header
    skip_headers { |xl| xl.select { |ar| ar[idx] =~ regex } }
  end

end


When I do this sort of thing:
result = object.filter('Header', /value1|value2/)

I get the return as an Array, so I can’t use my extra methods on it
anymore.

Here’s my current workaround. It’s the only way I could think of doing
this but it doesn’t look right.


def filter( header, regex )
  idx = self[0].index header
  Excel_Sheet.new( skip_headers { |xl| xl.select { |ar| ar[idx] =~ regex } } )
end


So in short, my question is how can I return my class type after using
Array’s methods on my child-class?

On Tue, Feb 12, 2013 at 1:19 PM, Joel P. [email protected]
wrote:

I’ve decided to inherit from array after all, since all I want to do
with this is extend support for multidimensional arrays, but without
overwriting any of Array’s methods.

I usually do not engage in predictions since I don’t have a crystal
ball, but in this case I’ll say: you won’t be happy with that
approach. For example, anybody can overwrite header values or complete
rows / columns, violating your class’s idea of internal state.

super( val )
end

def skip_headers
  block_given? ? ( [ self[0] ] + yield( self[1..-1] ) ) : ( self[1..-1] )
end

What is this supposed to do? Ah, I think I see. I’d probably name it
differently, i.e. each_data_cell or something.

def filter( header, regex )
  idx = self[0].index header
  skip_headers { |xl| xl.select { |ar| ar[idx] =~ regex } }
end

end

That combines too much logic in one method IMHO. I’d rather select a
row based on header and then I would use #select on that.

So in short, my question is how can I return my class type after using
Array’s methods on my child-class?

Do you mean as return value from #map and the like? Well, you can’t
without overriding all methods with this approach, I’m afraid. That’s
one of the reasons why this approach does not work well. :)

Kind regards

robert
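
A small sketch of the behaviour being discussed, under some assumptions: MiniSheet is a hypothetical stand-in for the class above, and the method is called filter_rows rather than filter because Ruby 2.6 and later define Array#filter as an alias of select. Methods inherited from Array such as select return a plain Array, so one option is to re-wrap the result explicitly with self.class.new.

```ruby
# Hypothetical minimal subclass, for illustration only.
class MiniSheet < Array
  # Keep the header row plus the data rows whose value under `header`
  # matches `regex`, and return the result as the same class.
  def filter_rows(header, regex)
    idx = first.index(header)
    raise ArgumentError, "unknown header: #{header}" if idx.nil?
    body = self[1..-1].select { |row| row[idx] =~ regex }
    self.class.new([first] + body)
  end
end

sheet = MiniSheet.new([%w(Name Status), %w(x Open), %w(y Closed)])

p sheet.select { |row| row[1] == 'Open' }.class  # => Array
p sheet.filter_rows('Status', /Open/).class      # => MiniSheet
p sheet.filter_rows('Status', /Open/)
# => [["Name", "Status"], ["x", "Open"]]
```

Using self.class.new instead of the class name means further subclasses get their own type back as well.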

On 12.02.2013 13:06, Joel P. wrote:

Ok, in that case I’m happy with an array return, but there are plenty
of sequential events where I’d like to have my own class type returned.
Is passing it through the initializer again really the best way to do
that?

As Robert says I’m not sure there is any (clean) way to do what you
want, which is why delegation as opposed to inheritance is often used in
this scenario. I’m sure this isn’t a complete solution, but does this
rewritten snippet (untested!) help at all?

class ExcelSheet

  def initialize( val=[] )
    val = %w(A1 B1 C1 A2 B2 C2 A3 B3 C3).each_slice(3).to_a if val == 'test'
    @arr = val
  end

  def skip_headers
    block_given? ? ( [ @arr[0] ] + yield( @arr[1..-1] ) ) : ( @arr[1..-1] )
  end

  def filter( header, regex )
    idx = @arr[0].index header
    @arr = skip_headers { |xl| xl.select { |ar| ar[idx] =~ regex } }
  end

  # Forward anything else to the wrapped array; wrap Array results
  def method_missing(meth, *args, &block)
    ret = @arr.send meth, *args, &block
    ret.instance_of?(Array) ? ExcelSheet.new(ret) : ret # was @ret, which is always nil
  end

end
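
A slightly more defensive variant of the method_missing idea, as a sketch only. DelegatingSheet is a hypothetical name; the additions are a respond_to_missing? definition so that respond_to? stays truthful, a fallback to super for unknown methods, and an explicit to_a for getting at the raw rows. One caveat under this scheme: any call that returns an Array, including a single row from first, comes back wrapped.

```ruby
# Hypothetical class, for illustration only.
class DelegatingSheet
  def initialize(rows = [])
    @arr = rows
  end

  # Explicit escape hatch to the underlying array (not wrapped).
  def to_a
    @arr
  end

  def method_missing(meth, *args, &block)
    # Unknown to Array as well: let Ruby raise the usual NoMethodError.
    return super unless @arr.respond_to?(meth)
    ret = @arr.public_send(meth, *args, &block)
    # Re-wrap plain Array results so chained calls keep the custom type.
    ret.instance_of?(Array) ? self.class.new(ret) : ret
  end

  def respond_to_missing?(meth, include_private = false)
    @arr.respond_to?(meth, include_private) || super
  end
end

sheet = DelegatingSheet.new([%w(A1 A2), %w(B1 B2)])
p sheet.reverse.class       # => DelegatingSheet
p sheet.reverse.to_a        # => [["B1", "B2"], ["A1", "A2"]]
p sheet.length              # => 2
p sheet.respond_to?(:map)   # => true
```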

Robert K. wrote in post #1096452:

anybody can override header values or complete
rows / columns violating your class’s idea of internal state.

This class only gets added into scripts, and I’m the only one who knows
Ruby where I work, so I don’t really see this being a problem yet.

def skip_headers
  block_given? ? ( [ self[0] ] + yield( self[1..-1] ) ) : ( self[1..-1] )
end

What is this supposed to do? Ah, I think I see. I’d probably name it
differently, i.e. each_data_cell or something.

This basically allows me to make summaries or edits within the data
itself without having to worry about the headers, then it tacks the
headers back on again once I’ve made the changes. The switch is just so
I can rip the headers off data and also read my code back later and see
what it’s doing.

def filter( header, regex )
  idx = self[0].index header
  skip_headers { |xl| xl.select { |ar| ar[idx] =~ regex } }
end

That combines too much logic in one method IMHO. I’d rather select a
row based on header and then I would use #select on that.

In this case, I’m filtering the data like Excel does. This means I’m
keeping all of the columns, but only specific rows based on the values
in a single column.

So in short, my question is how can I return my class type after using
Array’s methods on my child-class?
Do you mean as return value from #map and the like? Well, you can’t
without overriding all methods with this approach, I’m afraid. That’s
one of the reasons why this approach does not work well. :)

You’re probably thinking in terms of classes which are constantly active
as objects accessible to multiple users, whereas I’m just using this
class to make scripts easier to write.

My usage in this case is:

  1. Run a report which returns a table of data
  2. Filter the data on a set of criteria
  3. Output the results to Excel
  4. Exit

Using the class I’ve built I can easily pare down the data to the
important sections, I’ve added other handy methods suitable to a
multidimensional array such as “find” which returns the header and row,
“to_s” which converts to TSV format, and several more. It saves me
rewriting the same stuff over and over.

It’s great when I want to build excel-style tabs out of the data, like
this little snippet:


@data[:EXT] = @WIP_data.filter 'Job Status', /External Repair/
@data[:Awaiting_Parts] = @WIP_data.filter 'Job Status', /Awaiting Parts/

@data.each { |name, data| @data[:Summary].push [ name.to_s, ( data.length - 1 ) ] }

reporter.output_to_excel_tabs @data, filename


Ok, in that case I’m happy with an array return, but there are plenty of
sequential events where I’d like to have my own class type returned. Is
passing it through the initializer again really the best way to do that?

Thanks for the advice and examples, I’ll see whether I can understand
how the classes and methods work with each other there and set about
experimenting with them.

One thing which put me off generating a custom class “from scratch” is
that Array appears to be equal to its content (I assume this is a
language shortcut), but it seems “custom” objects’ values have to be
accessed via their accessors.
I was hoping for some more succinct syntax than this sort of thing:
puts [] # Array is so easy to create
puts CustomObject.new([]).value # This looks clunky next to that

I’d love to get accustomed to proper OO thinking, but I’ll inevitably make
all the rookie mistakes in the process. It’s a lot to get used to all at
once given that I’ve been using Ruby for less than a year, and I have no
training other than helpful hints and googling. Thanks again for your
patience.
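
On the point about needing an accessor to print a custom object: puts calls to_s on objects that are not strings or arrays, so a class that defines to_s can be printed directly. The sketch below assumes a hypothetical Grid class (not from the thread) that also includes Enumerable and defines ==, so instances can be iterated and compared by content much like an Array.

```ruby
# Hypothetical class, for illustration only.
class Grid
  include Enumerable

  def initialize(rows = [])
    @rows = rows
  end

  # Enumerable builds map, select, to_a and so on from each.
  def each(&block)
    @rows.each(&block)
  end

  # Used by puts and string interpolation: tab-separated rows.
  def to_s
    @rows.map { |row| row.join("\t") }.join("\n")
  end

  # Two grids are equal when their rows are equal.
  def ==(other)
    other.is_a?(Grid) && to_a == other.to_a
  end
end

grid = Grid.new([%w(A1 A2), %w(B1 B2)])
puts grid                                     # prints two tab-separated lines
p grid.map(&:first)                           # => ["A1", "B1"]
p grid == Grid.new([%w(A1 A2), %w(B1 B2)])    # => true
```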