Parsing a zip file for rows of string data

I am really new to Ruby and could use some help with a program. I need
to open a zip file that contains multiple text files, each with many
rows of data, e.g.:

CDI|3|3|20100515000000|20100515153000|2008|XXXXX4791|0.00|0.00
CDI|3|3|20100515000000|20100515153000|2008|XXXXX5648|0.00|0.00
CHO|3|3|20100515000000|20100515153000|2114|XXXXX3276|0.00|0.00
CHO|3|3|20100515000000|20100515153000|2114|XXXXX4342|0.00|0.00
MITR|3|3|20100515000000|20100515153000|0000|XXXXX7832|0.00|0.00
HR|3|3|20100515000000|20100515153000|1114|XXXXX0238|0.00|0.00

I first need to read through the zip file, read the text files inside
it, and write only the complete rows that start with CDI or CHO to two
output files: one for the rows starting with CDI and one for the rows
starting with CHO (basically parsing the file). I have to do this in
Ruby, and if possible I would like the program to run automatically
whenever new zip files with the same structure arrive. I would really
appreciate any advice, direction, or sample code anyone can give.
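For the automatic part, here is a minimal sketch, assuming new zip files are dropped into an inbox folder: it picks up every *.zip there, hands it to a block, and then moves it to a done folder so it is not processed twice. The folder names and the process_new_zips method are hypothetical; the actual row extraction is whatever you pass in as the block.

```ruby
require 'fileutils'

# Hypothetical helper: hands every *.zip in inbox_dir to the block, then moves
# it to done_dir so the next run skips it. A file whose block raises is not
# moved and stays in the inbox. Use forward slashes in inbox_dir, because
# Dir.glob treats backslashes as escape characters.
def process_new_zips(inbox_dir, done_dir)
  FileUtils.mkdir_p(done_dir)
  Dir.glob(File.join(inbox_dir, '*.zip')).sort.each do |zip_path|
    yield zip_path
    FileUtils.mv(zip_path, done_dir)
  end
end

# Example: poll once a minute. A cron job or Windows Task Scheduler entry
# that runs the script periodically would work as well.
#
#   loop do
#     process_new_zips('C:/Data/inbox', 'C:/Data/done') do |zip_path|
#       # extract the CDI and CHO rows from zip_path here
#     end
#     sleep 60
#   end
```

Moving the file only after the block returns means a zip that fails halfway is retried on the next run instead of being silently skipped.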

I have been given a little help with reading the text files located in
the zip file with the following.

require 'zip/zip'

# To open the zip file and pass each entry to a block
Zip::ZipFile.foreach(path_to_zip) do |text_file|

  # Read from entry, turn String into Array, and pass to block
  text_file.read.split("\n").each do |line|
    if line.start_with?("CDI") || line.start_with?("CHO")
      # Do something
    end
  end
end
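One way to fill in the # Do something placeholder, sketched without the zip part so it can be tried on its own: look up each row's output by its first pipe-delimited field, so each row goes to at most one file and rows such as MITR or HR are skipped. The route_rows name and the hash of outputs are my own illustration, and StringIO stands in for the real output files.

```ruby
require 'stringio'

# Writes each row to the output registered for its first '|' field.
# Rows whose code has no entry in outputs (MITR, HR, ...) are skipped.
def route_rows(lines, outputs)
  lines.each do |line|
    code = line.split('|', 2).first
    out = outputs[code]
    out.puts(line) if out
  end
end

sample = <<~ROWS
  CDI|3|3|20100515000000|20100515153000|2008|XXXXX4791|0.00|0.00
  CHO|3|3|20100515000000|20100515153000|2114|XXXXX3276|0.00|0.00
  MITR|3|3|20100515000000|20100515153000|0000|XXXXX7832|0.00|0.00
  HR|3|3|20100515000000|20100515153000|1114|XXXXX0238|0.00|0.00
ROWS

cdi_output = StringIO.new
cho_output = StringIO.new
route_rows(sample.each_line, { 'CDI' => cdi_output, 'CHO' => cho_output })
print cdi_output.string # only the CDI row
print cho_output.string # only the CHO row
```

With real files you would pass File objects opened once before the loop, e.g. { 'CDI' => File.open('cdiout.txt', 'a'), 'CHO' => File.open('choout.txt', 'a') }, and close them afterwards. Comparing the whole first field, rather than using start_with?("CDI"), also keeps a longer code that merely begins with CDI out of the CDI file.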

I am hoping for some help finishing this program, either direction or
a sample, so that it reads the zip file and sends those rows of data to
my two output text files.

Thank you in advance for any help with this,
Kind regards, Jay

Well, you’re almost done?

To open a file for writing, you can use f = File.open('filename.txt',
'w'). ('w' means that any data in the file will be erased first. To
open it in append mode, use 'a' instead. To open it for reading, use
'r'.)

To write data to it, use f.write data; in your case, f.write line.

To close the file when you’re done writing, use f.close.

That’s it - just open the two files at the beginning of your code,
write to them in the loop, and close them at the end.
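A small sketch of the difference between the two write modes, using a temporary directory so it leaves nothing behind. It uses the block form of File.open, which closes the file when the block ends, as an alternative to the explicit f.close. It also uses puts rather than write: the lines produced by split("\n") no longer end in a newline, so write would run them together on one line, whereas puts adds the newline back.

```ruby
require 'tmpdir'

# Compares 'w' (erase first) with 'a' (append). Each File.open block closes
# its file automatically when the block finishes.
results = Dir.mktmpdir do |dir|
  path = File.join(dir, 'cdiout.txt')

  File.open(path, 'w') { |f| f.puts 'CDI|first' }
  File.open(path, 'w') { |f| f.puts 'CDI|second' } # 'w' erased the first row
  contents_after_w = File.read(path)

  File.open(path, 'a') { |f| f.puts 'CDI|third' }  # 'a' keeps what was there
  [contents_after_w, File.read(path)]
end
after_w, after_a = results

p after_w # => "CDI|second\n"
p after_a # => "CDI|second\nCDI|third\n"
```

For this script, 'a' means rerunning it on the same zip appends the rows a second time; 'w' starts each run with empty output files.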

– Matma R.

Thank you for the reply. When I try running this I get a syntax error.

require 'zip/zip'

# To open the zip file and pass each entry to a block
Zip::ZipFile.foreach(C:\Data\file.zip) do |text_file|

  # Read from entry, turn String into Array, and pass to block
  text_file.read.split("\n").each do |line|
    if line.start_with?("CDI") || line.start_with?("CHO")

      cdi_output = File.open("cdiout.txt", "a") # Open an output file for CDI
      cho_output = File.open("choout.txt", "a") # Open an output file for CHO

      while line = f.gets # Read each line in the input
        cdi_output.puts line if /^CDI/ =~ line # Print if line starts with CDI
        cho_output.puts line if /^CHO/ =~ line # Print if line starts with CHO
      end
    end

cdi_output.close # Close cdi_output file
cho_output.close # Close cho_output file

I appreciate any help.

Thank you, Jay

Hi Jay,

First, you should move the File.open calls outside of the
Zip::ZipFile.foreach… loop.

You don't need the while… loop, because the
text_file.read.split("\n").each loop already goes through every line,
nor the if line.start_with? … check.

i.e.

require 'zip/zip'

cdi_output = File.open("cdiout.txt", "a") # Open an output file for CDI
cho_output = File.open("choout.txt", "a") # Open an output file for CHO

# To open the zip file and pass each entry to a block
Zip::ZipFile.foreach(C:\Data\file.zip) do |text_file|

  # Read from entry, turn String into Array, and pass to block
  text_file.read.split("\n").each do |line|
    cdi_output.puts line if /^CDI/ =~ line # Print to file if line starts with CDI
    cho_output.puts line if /^CHO/ =~ line # Print to file if line starts with CHO
  end
end

cdi_output.close # Close cdi_output file
cho_output.close # Close cho_output file

I didn't test it, but it should work this way.

Regards,

Eduardo

Thank you for the reply and the help Eduardo. I will give this a test
run and let you know.

I really appreciate everyone’s help with this.

Kind regards,
Jay

Hi Eduardo,
I tried that out and it threw an ArgumentError: wrong number of
arguments (0 for 2…3)

It has me wondering if it's my files. The zip file has a folder that
contains 8 text files with the rows of data I need to output.
I really appreciate the help so far and am open to any ideas regarding
the ArgumentError. I feel really close on this project. Also,
should I be defining a class in the program?

Thank you,
Jay

Jason P. wrote in post #1067719:

Thank you for the reply. When I try running this I get a syntax error.

There are several errors. You have to quote the string with the file
path, you have to escape backslashes (or use normal slashes), you forgot
to end two of the blocks, a ZipEntry doesn’t have a “read” method, the
variable “f” isn’t defined anywhere, and “cdi_output” and “cho_output”
are in the wrong scope.

So this really cannot work. Ruby doesn't have magical powers to guess
what you mean. ;-)
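To illustrate the file-path point: without quotes, C:\Data\file.zip is not valid Ruby at all, and inside a double-quoted string \f is the escape for a form feed character, so the backslashes have to be doubled (or replaced by forward slashes). A quick check:

```ruby
# Double quotes: \f becomes a form feed and \D becomes a plain D,
# so this is NOT the path C:\Data\file.zip.
broken  = "C:\Data\file.zip"

escaped = "C:\\Data\\file.zip" # double quotes with each backslash doubled
single  = 'C:\Data\file.zip'   # single quotes keep \D and \f literal
forward = 'C:/Data/file.zip'   # forward slashes also work with Ruby on Windows

p broken.include?("\f") # => true
p escaped == single     # => true
p forward
```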

Jason P. wrote in post #1067738:

I tried that out and it threw an ArgumentError: wrong number of
arguments (0 for 2…3)

For which expression/line do you get this error?

I don't get an ArgumentError (Ruby 1.9.3). If I correct the two
copy-and-paste errors in the script, it does work:

  • use a valid string for the file path: 'C:/Data/file.zip'
  • replace "text_file.read" with "text_file.get_input_stream {|str|
    str.read}", or, shorter in Ruby 1.9, "text_file.get_input_stream(&:read)"
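Putting those two fixes together with Eduardo's structure (outputs opened once, outside the loop), a sketch of the whole script might look like this. It is written for rubyzip 1.0 or later, which is loaded with require 'zip' and renamed Zip::ZipFile to Zip::File; the zip/zip API used in this thread is the older pre-1.0 one. The extract_rows name is hypothetical, and skipping non-file entries with entry.file? is only a precaution.

```ruby
# Sketch for rubyzip >= 1.0 (gem install rubyzip).
def extract_rows(zip_path, cdi_path, cho_path)
  require 'zip' # loaded here, so only calling this method needs the gem

  # Open each output file once, outside the zip loop; the block form of
  # File.open closes them again when the blocks finish.
  File.open(cdi_path, 'a') do |cdi_output|
    File.open(cho_path, 'a') do |cho_output|
      Zip::File.open(zip_path) do |zip_file|
        zip_file.each do |entry|
          next unless entry.file? # precaution: skip folder entries

          # Read the whole entry into a String, then walk it line by line.
          entry.get_input_stream(&:read).each_line do |line|
            cdi_output.puts line if line.start_with?('CDI|')
            cho_output.puts line if line.start_with?('CHO|')
          end
        end
      end
    end
  end
end

# Usage, with forward slashes in the Windows path:
#   extract_rows('C:/Data/file.zip', 'cdiout.txt', 'choout.txt')
```

Matching on 'CDI|' rather than 'CDI' checks the whole first field. each_line keeps each row's newline, so puts does not add a second one.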

It has me wondering if it's my files. The zip file has a folder that
contains 8 text files with the rows of data I need to output.

Folders are no problem. The “foreach” only iterates over the actual
files.

I really appreciate the help so far and am open to any ideas regarding
the ArgumentError. I feel really close on this project. Also,
should I be defining a class in the program?

No, I think that would be way over the top for this simple script –
unless you plan to extend it.