Parsing excel CVS data on a mac OSX to extract blocks of cel


#1

I would like to parse some excel CVS data which has a repetitive block
pattern

“Experiment ID: 1”,
"Subject ID: 1013938829432171e868c340.
Trial,stimulus,time,type,field1,field2,text_response,Abs. time of
response,
26,undefined,14828,KEY,RETURN,UNUSED,DCS,Sat Oct 15 17:48:04 GMT-0400
2005,
23,undefined,15078,KEY,RETURN,UNUSED,244,Sat Oct 15 17:48:19 GMT-0400
2005,
7,nixontrialleft copy.pct [TAG: 1],5953,KEY,1,UNUSED,Sat Oct 15
17:49:24 GMT-0400 2005,
8,nixontrialfront copy.pct [TAG: 3],6250,KEY,3,UNUSED,Sat Oct 15
17:49:31 GMT-0400 2005,
9,nixontrialright copy.pct [TAG: 2],2469,KEY,2,UNUSED,Sat Oct 15
17:49:34 GMT-0400 2005,

more data

,4374.347222,
,1.00
,0.93

and a new block starts

“Experiment ID: 3”,0.92

Question 1:

Arr = IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"))
has a length of 1, why?
I noticed that the puts has ^M everywhere
Arr =
IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"),sep_string="^M")
seems to split the array into lines.
Arr = Arr.split(",")
gives me a message, private method `split’ called
puts Arr.length gets me no response. I expected split to take the 1D
array and transform it into a 2D array.

What is the best way to get the excel CVS data into a Ruby 2D array for
further analysis?

Question 2:

Once I have a 2D array, what is the best way to find the index of the
key word Trial (which starts the data I want).

In matlab, I would do something like,
Find the index pair (I,J) at which Array = Trial
If J==1
Extract the array starting at row I+1 to I+82, for columns [2, 3, and
5]
and I would love some pointers at Ruby structures, methods, etc I need
to look at.

  • First ruby code – Thanks for your help

#2

PS: I have found a tutorial which should help me through part 1.
It extracts columns from a CVS file, and working my way through it, I
should figure out how to use split.

http://www.linuxdevcenter.com/pub/a/linux/2003/09/18/ruby_csv.html?page=1

I would love some pointers to question 2.


#3

On Mon, 21 Nov 2005, anne001 wrote:

Question 1:

require “csv”

path = File::expand_path “~/Desktop/FaceRetest.cvs”

table = [] # 2d array
CSV::open(path, ‘r’){|row| table << row}

that ought to get you going.

-a


#4

On Nov 20, 2005, at 7:37 PM, anne001 wrote:

2005,
,1.00
,0.93

and a new block starts

“Experiment ID: 3”,0.92

Question 1:

Arr = IO.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"))
has a length of 1, why?

Well, if you’re on Windows, Ruby is looking for a \015\012 sequence
to end the line. On Unix it would be looking for a \012…

I noticed that the puts has ^M everywhere

^M (control-M) is just \015. So it’s not seeing the line endings.
Some OSes actually used this line ending, like Mac OS 9 and lower,
but it’s pretty rare.

Arr =
IO.readlines(File.expand_path("~/Desktop/
FaceRetest.cvs"),sep_string="^M")

That’s a fine fix. You should be able to replace sep_string="^M"
with just “\r”, I think.

seems to split the array into lines.
Arr = Arr.split(",")
gives me a message, private method `split’ called

First, don’t start Ruby variable names with a capital letter. This
isn’t your problem here, but it’s still not a habit to get into. A
capital variable is a constant in Ruby.

The real problem here is that Arr is an Array, and you are calling a
String function on it, split(). Try:

Arr.first.split(",")

… or …

Arr.map { |row| row.split(",") }

But it’s better to use a real parser as Ara suggested.

puts Arr.length gets me no response. I expected split to take the 1D
array and transform it into a 2D array.

What is the best way to get the excel CVS data into a Ruby 2D array
for
further analysis?

Try:

require “csv”
arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

It gives two-dimensional arrays:

Neo:~/Desktop$ cat data.csv
1,2,3
4,5
Neo:~/Desktop$ ruby -r pp -r csv -e ‘pp CSV.read(“data.csv”)’
[[“1”, “2”, “3”], [“4”, nil, “5”]]

Question 2:

Once I have a 2D array, what is the best way to find the index of the
key word Trial (which starts the data I want).

Hmm, what about something like:

in_section = false

csv.each do |row|
if row.first == “Trial”
in_section = true
next
elsif in_section
# process row here…
end
end

Hope that gets you going.

James Edward G. II


#5

On 11/21/05, Gregory B. removed_email_address@domain.invalid wrote:

On 11/20/05, anne001 removed_email_address@domain.invalid wrote:

What is the best way to get the excel CVS data into a Ruby 2D array for
further analysis?

This might be overkill and more useful when it is more complete, but
Ruby Reports has basic support for stuff like this, via DataSets.

As a side not, you do not need to install DBI or any database related
dependencies to use this set of features.


#6

On 11/20/05, anne001 removed_email_address@domain.invalid wrote:

What is the best way to get the excel CVS data into a Ruby 2D array for
further analysis?

This might be overkill and more useful when it is more complete, but
Ruby Reports has basic support for stuff like this, via DataSets.

require “rubygems”
require “ruportlib”

my_data = Report::DataSet.load(“foo.csv”)

this will create a DataSet filled with DataRows which can be indexed
ordinally or by field name. (The first row of the CSV is used to
define field names)

example:

my_data.each do |row|
puts row[“name”]
end

would print the value of the “name” column if it existed.

both DataSet and DataRow are Enumerable, which might make finding your
data easier.

if you want to try this approach,

sudo gem install ruport

should get you the latest version

The rather sparse documentation is housed at:
http://ruport.rubyforge.org/docs/

You’d want to look at (and inspect the source on) Report::DataSet and
Report::DataRow

HTH,
Greg


#7

Thank you so much for your help
The data was collected by another lab. I don’t know what they use. I
saved their excel file data into a csv file. I don’t know if there is a
way of telling excel to do a better job. I am on panther, with
office 2004 for mac.

require “csv”
arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

I get an error:
/usr/local/lib/ruby/1.8/csv.rb:607:in `get_row’:
CSV::IllegalFormatError (CSV::IllegalFormatError)

Ara’s formulation does the same thing.

I think there is a way to replace special characters in emacs but what
I tried so far does not work.

I thought I could use readlines and a puts and then copy the ruby
output, which would not longer have the ^M, but somehow, ruby gets
mixed up, and suddenly stop parsing the lines, so I can’t relie on ruby
readline either. It sounds like I need to figure out how to format the
input file better. And then I will be able to use your suggestions.

thank you


#8

anne001 wrote:

/usr/local/lib/ruby/1.8/csv.rb:607:in `get_row’:
readline either. It sounds like I need to figure out how to format the
input file better. And then I will be able to use your suggestions.

thank you

There’s a few ways to handle converting newlines between O/S:

  • require ‘ptools’; File#nl_convert here:
    http://ruby-miscutils.sourceforge.net/ptools.html

  • specify $/ input record separator and $\ output separators for
    IO#gets, readline, each_line etc.

  • Komodo has a Code / Clean Line Endings which will convert to the O/S
    format you’re
    running it on

but there’s no automatic conversion like python
open(“filename.ext”,“U”), I think


#9

On Nov 21, 2005, at 6:47 AM, anne001 wrote:

Thank you so much for your help
The data was collected by another lab. I don’t know what they use. I
saved their excel file data into a csv file. I don’t know if there
is a
way of telling excel to do a better job. I am on panther, with
office 2004 for mac.

Excel might be legacy enough that it’s still using \r as a return
here, for the old Mac OS. Try this:

ruby -p -i.bak -e ‘sub("\r", “\n”)’ your_csv_file_here.csv

require “csv”
arr = CSV.read(File.expand_path("~/Desktop/FaceRetest.cvs"))

I get an error:
/usr/local/lib/ruby/1.8/csv.rb:607:in `get_row’:
CSV::IllegalFormatError (CSV::IllegalFormatError)

Ara’s formulation does the same thing.

I think this is a line ending issue. The CSV format does not allow a
naked \r outside of a quoted field. Try my fix above and see if it
gets you going, or you can switch the line ending for CSV’s readlines():

require “csv”
arr = CSV.readlines(File.expand_path("~/Desktop/FaceRetest.cvs"), “\r”)

If you’re still having trouble and it’s not a privacy concern, you
are welcome to email me the CSV file off list, and I will try to
figure out how to read it.

James Edward G. II


#10

I have moved onto Tiger. Still much installing to do. Sorry in the
delay in processing your responses.

The line ending is extremely simple,
http://www.delorie.com/gnu/docs/emacs/emacs_220.html

…-unix newline to separate lines.
…-dos carriage-return + linefeed to separate lines
…-mac carriage-return to separate lines

But the editors show what they want. Text wrangler lets you replace \r
with \n
but when you look, it still looks as a \r. Did it make the
substitution?
Emacs has a command to change the file to unix. But the ^M are not
changed. Did something change? hard to say. So I gave up on that.

I like the command, thank you
ruby -p -i.bak -e ‘sub("\r", “\n”)’ your_csv_file_here.csv

It seems to work, it creates your_cvs_file_here.csv.bak and csv read no
longer gives an error, but… It seems to parse each cell, with realms
of NIL in between bits of text.
Conceptually I think it is easier to process line by line. But I will
look into CVS to see what it can do.

In any case, I can combine the ruby line command with the readlines
ruby function. So I should be OK.


#11

On Mon, 21 Nov 2005, anne001 wrote:

/usr/local/lib/ruby/1.8/csv.rb:607:in `get_row’:
readline either. It sounds like I need to figure out how to format the
input file better. And then I will be able to use your suggestions.

run dos2unix on it. if it still fails you’ll need see why the file is
badly
formatted.

regards.

-a


#12

anne001 removed_email_address@domain.invalid wrote:

5]
and I would love some pointers at Ruby structures, methods, etc I need
to look at.

This should get you started:

#------------------------------------------------

ary = [
[“this”, “is”, “some”, “code”],
[“test”, “please”, “ignore”],
[“Trial”, “section”, “begins”, “here”],
[“foo”, “bar”, “baz”, “quux”]
]

index = nil
searchterm = “Trial”
catch(:done) do
ary.each_with_index {|row, i|
row.each_with_index {|cell, j|
puts “testing [#{i},#{j}]”
if cell =~ /^#{searchterm}/
index = [i,j]
throw :done
end
}
}
end

puts “-----------------”

puts “found #{searchterm} at #{index.inspect}”

#------------------------------------------------

martin


#13

I’ve had a similar problem. It sounds like you used Excel somewhere.
My experience has been that Excel for the mac likes to save CSV files
with MacOS 9 line endings which Ruby doesn’t like. In the future, you
can choose ‘Windows CSV’ or something to that extent. It’s really
silly, but it works.

Ruby’s CSV library will you give you back arrays if you open the file
with the CSV module.
I did notice that your file had this in it:
"Subject ID: 1013938829432171e868c340.
Trial,stimulus,time,type,field1,field2,text_response,Abs. time of
response,

with the " not closing anywhere. You might want to look at that, Excel
might be generating invalid CSV.

Anyways, goodluck.
.adam