Input file, change data, write to file

paulbutcher · November 2, 2006, 2:30am

I’m a ruby newbie trying to read data from a file, make a few changes,
and write the output to a file so it can be imported into a MySQL
database.

I found a partial solution on page 138 in Maik Schmidt’s “Enterprise
Integration with Ruby” book but it lacks a means to write the output to
a file.

How can I write the output to a file using the below code?

For what it’s worth, I’ll be working with files that contain between
20,000 and 60,000 rows.

Below is a data sample:

01234567890123456789012345678901234567890123456789012

00123 random text 3.0010/20/200610/21/2006 -3.45
00253 more text 275.0007/01/200606/12/2006 12.45

Here’s what I want the file to look like with tabs between each section:

01234567890123456789012345678901234567890123456789012
123 random text 3.00 2006-10-20 2006-10-21 -3.45
253 more text 275.00 2006-07-01 2006-06-12 12.45

Filename: fixtest1.rb

class FixedLengthRecordFile
  def FixedLengthRecordFile.open(filename, field_sizes)
    if field_sizes.nil? or field_sizes.empty?
      raise ArgumentError, "Empty field sizes not allowed!"
    end

    # Build an unpack pattern such as "a2a3a12..." from the field sizes.
    field_pattern = 'a' + field_sizes.join('a')
    IO.foreach(filename) do |line|
      record = line.chomp.unpack(field_pattern)
      record.each { |f| f.strip! }  # strip each field in place
      yield record
    end
  end
end

Filename: rw1.rb

require 'fixtest1'

FixedLengthRecordFile.open('test1.abc', [2, 3, 12, 7, 2, 1, 2, 1, 4, 2,
    1, 2, 1, 4, 10]) do |row|
  puts "#{row[1]}\t#{row[2]}\t#{row[3]}\t#{row[8]}-#{row[4]}-#{row[6]}\t#{row[13]}-#{row[9]}-#{row[11]}\t#{row[14]}"
end

Any feedback is greatly appreciated!

paulbutcher · November 2, 2006, 10:56am

I’m not sure I understood what you want to do. Do you want to write the
modified data to another file or to the same file?

In the first case, all you need to do is the following (in file rw1.rb):

File.open('output_file', 'w'){|f|
  FixedLengthRecordFile.open('test1.abc', [2, 3, 12, 7, 2, 1, 2, 1, 4, 2,
      1, 2, 1, 4, 10]) do |row|
    f.write "#{row[1]}\t#{row[2]}\t#{row[3]}\t#{row[8]}-#{row[4]}-#{row[6]}\t#{row[13]}-#{row[9]}-#{row[11]}\t#{row[14]}\n"
  end
}

Instead, if you want to write the data back to the same file, you could
write your FixedLengthRecordFile.open method as

def FixedLengthRecordFile.open(filename, field_sizes)
  if field_sizes.nil? or field_sizes.empty?
    raise ArgumentError, "Empty field sizes not allowed!"
  end

  field_pattern = 'a' + field_sizes.join('a')
  # Caution: this writes through a second handle while the file is still
  # being read. It only lines up if each returned string has exactly the
  # same length as the line it replaces.
  File.open(filename, 'r+'){|file|
    IO.foreach(filename) do |line|
      record = line.chomp.unpack(field_pattern)
      record.each { |f| f.strip! }
      file.write(yield(record))
    end
  }
end

or you could write

def FixedLengthRecordFile.open(filename, field_sizes)
  if field_sizes.nil? or field_sizes.empty?
    raise ArgumentError, "Empty field sizes not allowed!"
  end

  field_pattern = 'a' + field_sizes.join('a')
  lines = File.readlines(filename)
  File.open(filename, 'w'){|file|
    lines.each do |line|
      record = line.chomp.unpack(field_pattern)
      record.each { |f| f.strip! }
      file.write(yield(record))
    end
  }
end

I don’t know whether this approach would perform worse, given the size
of your files, since it reads every line into memory before writing.

In both cases, the block you pass to the open method should return the
string to write:
FixedLengthRecordFile.open('test1.abc', [2, 3, 12, 7, 2, 1, 2, 1, 4, 2,
    1, 2, 1, 4, 10]) do |row|
  "#{row[1]}\t#{row[2]}\t#{row[3]}\t#{row[8]}-#{row[4]}-#{row[6]}\t#{row[13]}-#{row[9]}-#{row[11]}\t#{row[14]}\n"
end
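
If holding every line in memory is a concern for files of this size, one alternative is to stream the rewritten lines to a scratch file and then rename it over the original. This is only a minimal sketch: the helper name `rewrite_lines` and the `.tmp` scratch name are hypothetical, and the block is assumed to return the full string to write, including the trailing "\n".

```ruby
# Hypothetical helper: rewrites the file at +path+ one line at a time.
# Each input line is passed to the block, and whatever the block returns
# is written to a scratch file that finally replaces the original.
def rewrite_lines(path)
  tmp_path = "#{path}.tmp" # assumed scratch name next to the original
  File.open(tmp_path, 'w') do |out|
    File.foreach(path) { |line| out.write(yield(line)) }
  end
  File.rename(tmp_path, path) # swap the finished copy into place
end
```

A call such as `rewrite_lines('test1.abc') { |line| line.upcase }` would then replace the file's contents without loading it all at once.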

A couple of notes:

- You need to add the "\n" at the end of your string in the rw1 file;
  otherwise all the rows in the original file will be written as one
  line.
- This method will only work when all the lines of the data file have
  the same structure (for example, it won’t work with the first line of
  your data file example above).
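
The second note can be handled by checking each line before unpacking it. The following is a rough sketch, not part of the original code: it assumes a data line can be recognized by two MM/DD/YYYY dates in columns 24 to 43, the name `data_line?` is hypothetical, and the padding in the sample record is an assumption derived from the field sizes passed in rw1.rb.

```ruby
# Assumption: a real record carries two MM/DD/YYYY dates in columns 24..43.
# Anything else (such as the column ruler) is treated as a non-data line.
DATE_COLUMNS = %r{\A.{24}\d\d/\d\d/\d{4}\d\d/\d\d/\d{4}}

def data_line?(line)
  DATE_COLUMNS.match?(line)
end

lines = [
  "01234567890123456789012345678901234567890123456789012",
  "00123 random text   3.0010/20/200610/21/2006     -3.45"
]

# Keep only the lines that look like records before unpacking them.
records = lines.select { |line| data_line?(line) }
puts records.length
```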

paulbutcher · November 2, 2006, 4:21pm

Stefano,

Thanks for your reply!

I want to write the modified data to another file. Your solution was
terrific!

I should have been clearer in the initial post about the long row of
numbers. That shouldn’t have been part of the data sample, as its
purpose was to document character spacing.

Thanks for the alternate solutions too. You’ve provided this ruby
newbie with lots of valuable tidbits!

Paul

paulbutcher · November 7, 2006, 10:53am

Paul Br wrote:

Filename: rw1.rb

require 'fixtest1'

FixedLengthRecordFile.open('test1.abc', [2, 3, 12, 7, 2, 1, 2, 1, 4, 2,
    1, 2, 1, 4, 10]) do |row|
  puts "#{row[1]}\t#{row[2]}\t#{row[3]}\t#{row[8]}-#{row[4]}-#{row[6]}\t#{row[13]}-#{row[9]}-#{row[11]}\t#{row[14]}"
end

Any feedback is greatly appreciated!

It is very important in a case like this to define the problem clearly.

For example, it would greatly improve the code if you were to clearly
say what the field sizes are. A list of field sizes would be a first
step toward a much more elegant and understandable program. In my
solution below, I guess at some of the field sizes.

Also, you do not want fixed-width fields in your output file, as in your
diagram. The first step in the project is to understand that modern
database files use variable-width fields, separated by delimiters like
tabs. In your example, you refer to tabs as delimiters, but you still
show the output format with a column scale as though fixed widths were
in force. It’s not clear from your diagram that you understand that the
output record’s fields won’t fall on specific columns, and don’t need to.
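
One way to state the field sizes explicitly is to keep them in a single named table and derive the unpack pattern from it. This is only a sketch: the field names are hypothetical, the widths are an assumption obtained by grouping the sizes passed in rw1.rb (2+3, 12, 7, 10, 10, 10), and the padding in the sample line is assumed from those widths.

```ruby
# Hypothetical layout: field name => width in characters.
LAYOUT = {
  id:     5,
  text:   12,
  amount: 7,
  date1:  10,
  date2:  10,
  value:  10
}.freeze

# Builds an unpack pattern such as "a5a12a7a10a10a10" from the layout.
PATTERN = LAYOUT.values.map { |width| "a#{width}" }.join

# Splits one fixed-width line into a hash of stripped field values.
def parse_record(line)
  values = line.chomp.unpack(PATTERN).map(&:strip)
  LAYOUT.keys.zip(values).to_h
end

record = parse_record("00123 random text   3.0010/20/200610/21/2006     -3.45")
# Tab-delimited output: the fields no longer fall on fixed columns.
puts record.values.join("\t")
```

Changing a width then means editing one entry in the table rather than hunting for index ranges.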

Sample code:

#!/usr/bin/ruby -w

data = [
  "00123 random text 3.1210/20/200610/21/2006 -3.45",
  "00253 more text 275.8707/01/200606/12/2006 13.46",
  "00254 more text 777.3407/01/200606/12/2006 14.47",
  "00255 more text 555.2107/01/200606/12/2006 15.48"
]

out_file = File.open("outfile.txt", "w")

data.each do |record|
  # Slice each record into fields by (guessed) column ranges.
  fields = [ record[0..4], record[5..17], record[18..23],
             record[24..33], record[34..43], record[44..51] ]
  # Fields 3 and 4 are the dates: replace the slashes with dashes.
  fields[3..4].each do |field|
    field.gsub!(%r{/}, "-")
  end
  out_record = fields.join("\t") + "\n"
  out_file.write out_record
end

out_file.close

Output (may wrap when posted):

00123 random text 3.12 10-20-2006 10-21-2006 -3.45
00253 more text 275.87 07-01-2006 06-12-2006 13.46
00254 more text 777.34 07-01-2006 06-12-2006 14.47
00255 more text 555.21 07-01-2006 06-12-2006 15.48

paulbutcher · November 7, 2006, 10:53am

Paul L. wrote:

/ …

I’m a ruby newbie trying to read data from a file, make a few changes,
and write the output to a file so it can be imported into a MySQL
database.

A correction. I just noticed that you mentioned MySQL, and your output
has the date format 2006-07-01, typical of MySQL, something I managed to
overlook on the first read. So (note the single changed line):

#!/usr/bin/ruby -w

data = [
  #01234567890123456789012345678901234567890123456789012
  "00123 random text 3.1210/20/200610/21/2006 -3.45",
  "00253 more text 275.8707/01/200606/12/2006 13.46",
  "00254 more text 777.3407/01/200606/12/2006 14.47",
  "00255 more text 555.2107/01/200606/12/2006 15.48"
]

out_file = File.open("outfile.txt", "w")

data.each do |record|
  fields = [ record[0..4], record[5..17], record[18..23],
             record[24..33], record[34..43], record[44..51] ]
  fields[3..4].each do |field|
    # Reorder MM/DD/YYYY to YYYY-MM-DD. The replacement is single-quoted
    # so that \3, \1 and \2 are read as group references.
    field.gsub!(%r{(\d+)/(\d+)/(\d+)}, '\3-\1-\2')
  end
  out_record = fields.join("\t") + "\n"
  out_file.write out_record
end

out_file.close

Output:

00123 random text 3.12 2006-10-20 2006-10-21 -3.45
00253 more text 275.87 2006-07-01 2006-06-12 13.46
00254 more text 777.34 2006-07-01 2006-06-12 14.47
00255 more text 555.21 2006-07-01 2006-06-12 15.48
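
As an alternative to the regular expression, the standard `date` library can do the same reordering and will also reject impossible dates. This is a small sketch rather than part of the program above, and the helper name `mysql_date` is hypothetical.

```ruby
require 'date'

# Converts "MM/DD/YYYY" to "YYYY-MM-DD". Date.strptime raises an
# ArgumentError (Date::Error on newer Rubies) if the date is not valid.
def mysql_date(us_date)
  Date.strptime(us_date, '%m/%d/%Y').strftime('%Y-%m-%d')
end

puts mysql_date('10/20/2006')
```

The trade-off is speed: parsing every date is slower than a substitution, which may or may not matter for tens of thousands of rows.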

paulbutcher · November 7, 2006, 10:53am

Paul Br wrote:

I’m a ruby newbie trying to read data from a file, make a few changes,
and write the output to a file so it can be imported into a MySQL
database.

I found a partial solution on page 138 in Maik Schmidt’s “Enterprise
Integration with Ruby” book but it lacks a means to write the output to
a file.

How can I write the output to a file using the below code?

I think the most newbie-appealing approach to files is with

open() do |f|
end

because when the block finishes the file closes automatically. The docs
on the modes for opening files, and on the methods you need after that,
are here:

http://www.ruby-doc.org/core/classes/IO.html

With big data that comes in lines, where each line is to be processed
independently, you presumably want two files, reading and writing a line
at a time, so the whole operation could be structured like this:

def munge(s)
  return s.gsub(/[aeiou]/, '') # but do your own task here instead
end

open("path1", "r") do |f1|
  open("path2", "w") do |f2|
    f1.each { |line| f2.puts munge(line) }
  end
end
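
For the fixed-width task in this thread, `munge` could look roughly like the sketch below. It reuses the field sizes from rw1.rb; the constant names and the `convert_file` helper are hypothetical, and the code assumes every line has the same layout.

```ruby
# Field sizes as passed in rw1.rb; the dates are split into MM / DD / YYYY.
FIELD_SIZES = [2, 3, 12, 7, 2, 1, 2, 1, 4, 2, 1, 2, 1, 4, 10].freeze
FIELD_PATTERN = FIELD_SIZES.map { |size| "a#{size}" }.join

# Turns one fixed-width line into a tab-separated line (no trailing newline).
def munge(line)
  row = line.chomp.unpack(FIELD_PATTERN).map(&:strip)
  [row[1], row[2], row[3],
   "#{row[8]}-#{row[4]}-#{row[6]}",    # first date as YYYY-MM-DD
   "#{row[13]}-#{row[9]}-#{row[11]}",  # second date as YYYY-MM-DD
   row[14]].join("\t")
end

# Reads in_path one line at a time and writes the converted lines to out_path.
def convert_file(in_path, out_path)
  File.open(out_path, 'w') do |out|
    File.foreach(in_path) { |line| out.puts munge(line) }
  end
end
```

Calling `convert_file('test1.abc', 'output_file')` would then process the input without reading it all into memory.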

m.