Forum: Ruby reading data files with headers and columns

Yaj Bhattacharya (Guest)
on 2009-03-13 18:10
(Received via mailing list)
Hello,

As a noob to Ruby, I need to read a data file whose first few lines
are a header, with the rest of the file holding columns of different
variables (in this case, 5 columns) that I want to read in as arrays.
After reading in the variables, I want to do some calculations on
individual data fields, addressed by position (e.g. column 3, row 4),
then write the file back out with the headers and the data in columns.

Could someone help with a few lines of basic code?

Thanks in advance
Yaj
Luc Traonmilin (Guest)
on 2009-03-13 18:26
(Received via mailing list)
Yaj Bhattacharya wrote:
>
> Thanks in advance
> Yaj
>
>
>
You could start with this basic piece of code (replace the <tags> with
your own code):
  # here are the columns as arrays
  @column1 = []
  <rest_of_columns>
  # open file in read mode (r)
  file = File.new("<your_filename>", "r")
  # read each line one after another and process it
  file.each_line do |line|
    # process as header if necessary
    # or
    # extract your variables in the line into an array
    array = line.split("<your_column_separator>")
    # assign variable to column
    @column1 << array[0]
    # ... process other columns
  end
  # close the file once all lines have been read
  file.close

Additionally, you can google "ruby array" and "ruby file"; the API docs
are well written.

Luc
Yaj Bhattacharya (Guest)
on 2009-03-13 19:06
(Received via mailing list)
Thanks very much, Luc. Here are a few follow-up questions (complete
noob), added as comments:

  # here are the columns as arrays
  @column1 = []  # should this be @column1 = [variable1], or just the
                 # empty square brackets?
  <rest_of_columns>
  # open file in read mode (r)
  file = File.new("<your_filename>", "r")
  # read each line one after another and process it
  file.each_line do |line|
    # Where do I specify how the header is to be skipped, and how many
    # lines are to be skipped or otherwise processed?
    # process as header if necessary
    # or
    # extract your variables in the line into an array
    array = line.split("<your_column_separator>")
    # I am guessing that a space column separator would be " ". Does it
    # work with an arbitrary number of spaces between the columns, i.e.
    # if the column width is variable but there are one or several
    # spaces between two columns?
    # assign variable to column
    @column1 << array[0]
    # If @column1 = [variable1] (my comment on the second line of the
    # code above) is not correct, where do the variables get their names?
    # ... process other columns
  end
Luc Traonmilin (Guest)
on 2009-03-13 19:33
(Received via mailing list)
Well, [] only initializes the array as empty; if you want to initialize
it with variables, it is [var1, var2...]. Alternatively, you can use a
Hash:
@column1 = {"var1" => value, ...}
and then access the values with @column1["var1"], if that is what you
want to do.
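
To make the difference concrete, here is a minimal sketch of both options. The column names "time" and "temp" and all the values are made up for illustration; they are not from the original question.

```ruby
# An empty array that is filled one value at a time, e.g. inside the
# each_line loop. The array itself carries no variable names.
column1 = []
column1 << "1.5"
column1 << "2.5"

# An array that starts out with known values.
column2 = [10, 20]

# A Hash keyed by (hypothetical) column names keeps the columns together
# and is one way to give them names.
columns = { "time" => [], "temp" => [] }
columns["time"] << 0.0
columns["temp"] << 21.3

first_time = columns["time"][0]
```

With the Hash, the names live in the keys, so a single value is reached with something like columns["temp"][0].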

I don't know what your header looks like, so it is difficult to say how
to skip it. If you just want to skip the header, you may be able to
match each line against the pattern of your columns. Or, if the header
has a fixed number of lines, you can use a counter.
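
Both ideas could look roughly like the sketch below. The sample data, the header count of 2, and the "every field is a plain number" pattern are assumptions for illustration only; a real file would be read with File.open instead of StringIO. Note that split with no argument splits on runs of whitespace, so a variable number of spaces between columns is not a problem.

```ruby
require "stringio"

# Hypothetical sample: two header lines, then whitespace-separated numbers.
sample = <<~DATA
  Station: demo
  time temp
  0.0   21.3
  1.0     21.9
DATA

# Option 1: skip a fixed number of header lines with a counter.
header_lines = 2
by_count = []
StringIO.new(sample).each_line.with_index do |line, index|
  next if index < header_lines
  by_count << line.split
end

# Option 2: keep only lines where every field looks like a number.
number = /\A-?\d+(\.\d+)?\z/
by_pattern = []
StringIO.new(sample).each_line do |line|
  fields = line.split
  next if fields.empty? || !fields.all? { |field| field.match?(number) }
  by_pattern << fields
end
```

The counter is simpler when the header length is known in advance; the pattern is more forgiving when it varies, as long as no header line happens to look like data.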
Rob Biedenharn (Guest)
on 2009-03-13 20:22
(Received via mailing list)
If you know how many header lines there are (I'll assume 3), just read
them before you start the loop:


headers = []
rows = []
File.open("your_filename", 'r') do |file|
   3.times { headers << file.gets }

   file.each do |line|
     rows << line.split(' ') # or ',' or /,\s*/ depending
                             # on how "columns" are formed
   end
end  # the block given to File.open will automatically close the file

# for the value in the 3rd column of the 4th data row
# (indices count from ZERO, so that is row index 3, column index 2)

rows[3][2]

# do your calculations, then open the same or a new file for writing:

File.open("your_output", 'w') do |file|
   file.puts headers
   rows.each do |row|
     file.puts row.join(' ') # or however you separate columns
   end
end
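
For the "do your calculations" step, one possible sketch follows. The rows here are invented sample data standing in for what split(' ') returns, and the sketch assumes every field is numeric and every row has the same number of fields (otherwise Float or transpose will raise an error).

```ruby
# Hypothetical rows, as split(' ') would return them: arrays of strings.
rows = [
  ["1", "2.0", "3.5", "4", "5"],
  ["6", "7.0", "8.5", "9", "10"]
]

# Convert every field to a number so that arithmetic works.
numeric = rows.map { |row| row.map { |field| Float(field) } }

# Change a single cell: row index 1, column index 2.
numeric[1][2] *= 2

# Transpose to get each column as its own array, then work per column.
columns = numeric.transpose
column3_mean = columns[2].sum / columns[2].size

# Turn the rows back into text lines, ready for file.puts.
lines = numeric.map { |row| row.join(" ") }
```

Keeping the data as rows makes writing the file easy, while transpose gives a column view whenever a calculation needs a whole column.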

-Rob

Rob Biedenharn    http://agileconsultingllc.com
Rob@AgileConsultingLLC.com
Yaj Bhattacharya (Guest)
on 2009-03-17 13:45
(Received via mailing list)
Thanks Rob!
Thanks Luc!