Parsing text into usable numerical data

Hey total ruby n00b here…
I’m having trouble parsing data into Ruby for statistical analysis.

The data looks like this:
32 0 0 0 0 0 0 0 0 8412803500 0 0 0 0 0 0 0 46655166 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 240554000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 85321000 0 0
0 0 0 0 0 479719000 0 0 0 97823285 283432000 0 73887750 0 0 157225000
88659750 285211000 70285000 0 161747000 161167000 234739666 120400000
300083000 0 0 202327250 111865000 183127000 0 161027000 0 0 0

33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I need to store the index of each entry, in this case 32 and 33, as the
name of the two-dimensional array (an array of arrays, as Ruby handles it?)
and then each value (a lot of zeros in the case of 33) as a unique entry.
The format is plain text right now and I have had no luck using
File.readline.

For some reason this does NOT work, the array dimensions do not match
expected structure:
elsif /aqua_t/=~(files_to_parse[i])
line_counter = 0
line = File.readlines(files_to_parse[i]).each do |line|
line.each{|x| x.to_i}
raw_data_t[line_counter]=line.split
aqua = [0]
line_counter+=1
end#ends block over lines

Any advice would be much appreciated.
–m

On Tue, Jun 17, 2008 at 7:44 PM, Cthulhu __ wrote:

33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I need to store the index of each entry, in this case 32 and 33, as the
name of the two-dimensional array (an array of arrays, as Ruby handles it?)
and then each value (a lot of zeros in the case of 33) as a unique entry.

If you need a “name” for the entry then you might be thinking of a hash,
in which 32 and 33 would be the keys and the rest of the line the value.
What do you mean by “a unique entry”? Maybe the rest of the string, as a
string? Or an array with an element for each number in the string?

The format is plain text right now and I have had no luck using
File.readline.

For some reason this does NOT work, the array dimensions do not match
expected structure:
elsif /aqua_t/=~(files_to_parse[i])
line_counter = 0
line = File.readlines(files_to_parse[i]).each do |line|

The each method returns the full enumerable, so after the
iteration, the line variable will reference an array of all lines.
Not sure that’s what you want.
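
To illustrate the point about the return value, here is a minimal sketch. The array of strings is a made-up stand-in for what File.readlines would give you:

```ruby
# Stand-in for File.readlines(...): one string per line of the file.
lines = ["32 0 8\n", "33 0 0\n"]

# The block's return values are ignored; each hands back its receiver.
result = lines.each { |line| line.split }

puts result.equal?(lines)   # true: result is the very same array of lines
```

So assigning the result of each to a variable just gives you the original list of lines again.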

line.each{|x| x.to_i}

This line does nothing, you don’t assign the return values of the block
anywhere or modify anything inside the block. But this line makes
me think you want an array of numbers.
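
A small sketch of the difference between each and map here (the sample string is invented for illustration):

```ruby
fields = "32 0 8412803500".split

# each runs the block but throws away what it returns,
# so fields still holds strings afterwards.
fields.each { |x| x.to_i }
p fields    # ["32", "0", "8412803500"]

# map collects the block's return values into a new array.
numbers = fields.map { |x| x.to_i }
p numbers   # [32, 0, 8412803500]
```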

raw_data_t[line_counter]=line.split
aqua = [0]
line_counter+=1
end#ends block over lines

This works for me:

irb(main):001:0> line_counter = 0
irb(main):005:0> raw = []
irb(main):006:0> File.readlines("data.txt").each do |line|
irb(main):007:1* raw[line_counter] = line.split
irb(main):008:1> line_counter += 1
irb(main):009:1> end

irb(main):010:0> raw
=> [["32", "0", "0", "0", "0", "0", "0", "0", "0", "8412803500", "0",
"0", "0", "0", "0", "0", "0", "46655166", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "240554000", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "85321000", "0", "0", "0", "0", "0", "0",
"0", "479719000", "0", "0", "0", "97823285", "283432000", "0",
"73887750", "0", "0", "157225000", "88659750", "285211000",
"70285000", "0", "161747000", "161167000", "234739666", "120400000",
"300083000", "0", "0", "202327250", "111865000", "183127000", "0",
"161027000", "0", "0", "0"], ["33", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"]]

So I’m not sure what the exact problem you are having is. The above
could be simplified, though, to avoid the counter:

raw = []
File.readlines("data.txt").each do |line|
raw << line.split
end

If you need the hash with keys 32 and 33, and as value an array of
numbers, you can do this:

raw = {}
File.readlines("data.txt").each do |line|
key, *value = line.split
value.map! {|x| x.to_i}
raw[key] = value
end

If the key needs to be a number you can to_i it too.
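
For instance, a sketch with integer keys might look like the following. The helper name parse_records and the in-memory sample text are made up for illustration, and it assumes each record sits on a single line:

```ruby
# Hypothetical helper: turn text into { index => [values] }.
# Assumes one record per line; blank lines are skipped.
def parse_records(text)
  records = {}
  text.each_line do |line|
    key, *values = line.split
    next if key.nil?                         # blank line: split returned []
    records[key.to_i] = values.map { |x| x.to_i }
  end
  records
end

sample = "32 0 0 8412803500\n\n33 0 0 0\n"
p parse_records(sample)
```

With integer keys you look records up as records[32] rather than records["32"].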

Hope this helps,

Jesus.

that is exactly what I was looking for. Thank you for the very elegant
solution.
–m

Still having some problems with this program. I’m getting a TypeError
when the value of

idx, *dat = line.scan(/\d+/).map.compact

is nil.

Any suggestions on how to handle this? My current implementation looks
like this:

raw_data = Array.new
File.foreach files_to_parse[i].to_s do |line|
idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
raw_data[idx] = dat
end

but if I include a conditional statement that excludes nil inside this
block
{|s| s.to_i}

then the size of the array changes, which I need to avoid.

–m

oh… and also, I’m not sure exactly how idx is being incremented?
–m

On 17.06.2008 23:34, Jesús Gabriel y Galán wrote:

What do you mean by “a unique entry”? Maybe the rest of the string, as a string?
An array with an element for each number in the string?

I believe he wants to use the first number as index and the rest of the
line as array of integers thus yielding a two dimensional array.

This is probably what I’d do:

data = []
File.foreach "data.txt" do |line|
idx, *dat = line.scan(/\d+/).map {|s| s.to_i}
data[idx] = dat
end
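
One thing worth knowing about the array approach, shown as a tiny sketch with made-up values: assigning to index 32 of an empty array pads the lower positions with nil.

```ruby
data = []
data[32] = [0, 0, 8412803500]

p data.length           # 33: slots 0..31 were created as nil
p data[0]               # nil
p data.compact.length   # 1: only one real record is stored
```

If the indices in the file are sparse or large, a hash keyed by the index may be the better fit.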

Kind regards

robert

On Jun 18, 2:10 pm, Cthulhu __ wrote:

File.foreach files_to_parse[i].to_s do |line|
idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
raw_data[idx] = dat
end

Looks like the blank line between records is the culprit.
You could just skip it explicitly:

File.foreach files_to_parse[i].to_s do |line|
next if line.strip == '' #skip line if it is empty
idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
raw_data[idx] = dat
end

In response to your question about where idx comes from: nothing is
being incremented. The code in that line creates an array of ints from
the current line, then the parallel assignment puts the first number
of the array into idx and the rest of the array into dat.
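
A quick sketch of that parallel assignment on its own. The helper name split_record is invented for this example:

```ruby
# Hypothetical helper: split an array into its first element and the rest,
# using the same "first, *rest = array" form as the parsing code.
def split_record(values)
  idx, *dat = values
  [idx, dat]
end

p split_record([32, 0, 0, 8412803500])   # [32, [0, 0, 8412803500]]
p split_record([])                       # [nil, []]
```

The second call shows what happens on a blank line: there is no first element, so idx ends up nil.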

hth
Chris
cheers

Hi –

On Thu, 19 Jun 2008, Cthulhu __ wrote:

raw_data = Array.new
File.foreach files_to_parse[i].to_s do |line|
idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
raw_data[idx] = dat
end

but if I include a conditional statement that excludes nil inside this
block
{|s| s.to_i}

then the size of the array changes, which I need to avoid.

There won’t be any nils in the array resulting from line.scan(/\d+/).
It will either be empty or contain strings. So you shouldn’t need to
compact it. That line is a bit tangled in general. What happens if you
run Robert’s code as posted?
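
As a rough sketch of where a TypeError can still come from even without nils in the scan result: a blank line gives an empty array, so idx itself is nil, and using nil as an array index raises. The blank line here is simulated with a literal string:

```ruby
raw_data = []
caught = nil

# A blank line yields [] from scan, so idx is nil and dat is [].
idx, *dat = "\n".scan(/\d+/).map { |s| s.to_i }

begin
  raw_data[idx] = dat        # nil cannot be used as an array index
rescue TypeError => e
  caught = e
  puts "#{e.class}: #{e.message}"
end
```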

David

So the code as taken from Chris’ post returns this error:

read_data.rb:72:in `[]=': no implicit conversion from nil to integer (TypeError)
from read_data.rb:72
from read_data.rb:69:in `foreach'
from read_data.rb:69

The code looks like this with line numbers…

 69         File.foreach files_to_parse[i].to_s do |line|
 70                 next if line.strip == ''
 71                 idx, *dat = line.scan(/\d+/).map {|s| s.to_i}
 72                 raw_data[idx] = dat
 73         end
 74         puts raw_data

I suspect it may be a problem in the formatting of the data… perhaps
there is some way to remove any non-integer characters eg. delimiting
chars before parsing?
–m

Hi –

On Thu, 19 Jun 2008, Chris H. wrote:

raw_data = Array.new
next if line.strip == '' #skip line if it is empty
idx, *dat = line.scan(/\d+/).map.compact! {|s| s.to_i}
raw_data[idx] = dat
end

That scan/map/compact! line is still wrong. The block should go with
map, not compact!, and compact! returns nil if its receiver doesn’t
change:

[1,2,3].compact! # => nil

There’s no reason that line.scan(/\d+/) would ever contain nil, so
compact! will always return nil and map, in that position, does
nothing in 1.8 and makes compact! blow up in 1.9 :-)
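
A short sketch of the compact / compact! difference, plus the scan line with the block attached to map (sample values are made up):

```ruby
unchanged = [1, 2, 3].compact!     # nil, because nothing was removed
changed   = [1, nil, 3].compact!   # [1, 3]
safe      = [1, 2, 3].compact      # [1, 2, 3]: the non-bang form always returns an array

# The block goes with map, and no compact is needed at all.
numbers = "32 0 17".scan(/\d+/).map { |s| s.to_i }

p unchanged, changed, safe, numbers
```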

David

Hi –

On Thu, 19 Jun 2008, Cthulhu __ wrote:

69         File.foreach files_to_parse[i].to_s do |line|
70                 next if line.strip == ''
71                 idx, *dat = line.scan(/\d+/).map {|s| s.to_i}
72                 raw_data[idx] = dat
73         end
74         puts raw_data

I suspect it may be a problem in the formatting of the data… perhaps
there is some way to remove any non-integer characters eg. delimiting
chars before parsing?

The scan(/\d+/) will scan for digits, and will ignore everything else,
so you don’t have to pre-treat the lines.
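
A small sketch of that behaviour. The delimiters in the sample line are invented, and the second example is a caveat that only matters if the data ever contains signs or decimal points:

```ruby
# Made-up line with mixed delimiters and a Windows line ending.
line = "32,\t0; 0 | 8412803500\r\n"
numbers = line.scan(/\d+/).map { |s| s.to_i }
p numbers    # [32, 0, 0, 8412803500]

# Caveat: /\d+/ only knows about runs of digits, so a minus sign is
# dropped and a decimal number is split in two.
pieces = "-5 1.5".scan(/\d+/)
p pieces     # ["5", "1", "5"]
```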

I would throw in:

puts line if idx.nil?

before line 72, and see which lines are giving you the problems. All
the lines you showed in your sample data were either blank or
contained only digits, so they shouldn’t cause this problem.

David

2008/6/18 Cthulhu __:

69         File.foreach files_to_parse[i].to_s do |line|
70                 next if line.strip == ''
71                 idx, *dat = line.scan(/\d+/).map {|s| s.to_i}
72                 raw_data[idx] = dat
73         end
74         puts raw_data

I suspect it may be a problem in the formatting of the data… perhaps
there is some way to remove any non-integer characters eg. delimiting
chars before parsing?

Your condition for next is pretty weak, because it lets through every
line that is not blank, including lines that contain only garbage.
Rather, do this:

data = []
File.foreach "data.txt" do |line|
idx, *dat = line.scan(/\d+/).map {|s| s.to_i}
data[idx] = dat if idx
end

or

# replace 42 with the number you need

data = []
File.foreach "data.txt" do |line|
idx, *dat = line.scan(/\d+/).map {|s| s.to_i}
data[idx] = dat if idx && dat.length == 42
end

or even

data = []
File.foreach "data.txt" do |line|
(idx, *dat = line.scan(/\d+/).map {|s| s.to_i}).empty? or
data[idx] = dat
end
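
To try the first variant without a data file, here is a self-contained sketch that reads from a StringIO instead. The sample text, including the garbage line, is made up for illustration:

```ruby
require "stringio"

# Hypothetical input: two records, a blank line, and a line with no digits.
io = StringIO.new("32 0 0 8412803500\n\n# comment\n33 0 0 0\n")

data = []
io.each_line do |line|
  idx, *dat = line.scan(/\d+/).map { |s| s.to_i }
  data[idx] = dat if idx     # lines without any digits leave idx nil and are skipped
end

p data[32]   # [0, 0, 8412803500]
p data[33]   # [0, 0, 0]
```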

Kind regards

robert

Thank you all for the valuable suggestions.
The program works fine now and reads the data in very nicely, and I have
learned a lot about Ruby. I’ll certainly be using it a lot more in the
future.

cheers
–m