Changing the format of a text file

Hello everyone,

I am new to Ruby and I'm having some problems trying to reformat a text
file.

Basically, I have a large log file, around 200 MB, in the
following format:

1000000 name
Status :A
Basetype :2
Version :1.0
|
|
(more
fields)
|
Name :/file/name/etc
1000001 name
Status :B
Basetype :2
Version :a20
|
|
Name :/file/name/etc
1000002 name
Status :C
|

… and so on

So each 200 MB file contains a lot of entries.

What I want to do is open the file, read the data into an array,
reformat the text, and save it to another file with the following
output:

id, Status, Basetype, … , Name
1000000, A, 2, … , /file/name/etc
1000001, B, 2, … , /file/name/etc

I tried to write a script in Ruby to do that task, but I don't get any
output so far.

def getfile(file_name)
  entry = []
  IO.foreach(file_name) do |fl|
    if fl.include? 'name'
      entry.push fl.scan(/\d+/)[0]
    elsif fl.strip =~ /\A\d/
    end
  end
  entry
end

def writefile(file, *linedata)
  linedata.each do |line|
    file << line.join(", ") +\n"
  end
end

def readfile(file, outputfile)
out = File.new(outputfile, "w+")
info = []

wline = ['id', 'Status', 'Basetype', … 'Name']

IO.foreach(file) { |line|

if line =~ //
wline[0]= line.scan(/\d+/)
elsif line =~ /Status/
wline[1]= line.split(":")[1].scan(/[a-zA-Z]+/).join("")
elsif line =~ /Basetype/
wline[2]= line.split(":")[1].scan(/\d+/).join("")
|
|
|
wline all fields
|
writefile(out, wline)
end
out.close
end

readfile('filename', 'outputfile')

This is what I've done so far. Can someone tell me what's wrong? I don't
get any output at all…

Thanks in advance

2009/2/25 Bary B.:

Basetype :2
Version :a20
So each 200 MB file contains a lot of entries.

What I want to do is open the file, read the data into an array,
reformat the text, and save it to another file with the following
output:

id, Status, Basetype, … , Name
1000000, A, 2, … , /file/name/etc
1000001, B, 2, … , /file/name/etc

I would strongly recommend looking at Treetop
(http://treetop.rubyforge.org/). It’s a parser generator that produces
tree structures from text files using a grammar that you specify. If you
know regular expressions, it shouldn’t be too big a leap to use
Treetop’s grammar language.

For this particular task it may be overkill, but it is certainly worth
looking at.

On Feb 25, 2009, at 5:30 AM, Bary B. wrote:

Hello everyone,

Hello and welcome.

I am new to Ruby and I'm having some problems trying to reformat a text
file.

Basically, I have a large log file, around 200 MB, in the
following format:

1000000 name
Status :A
Basetype :2
Version :1.0

id, Status, Basetype, … , Name
1000000, A, 2, … , /file/name/etc
1000001, B, 2, … , /file/name/etc

Do you just read the log file replacing variables holding Status,
Basetype, Version, and Name then spit out a new entry each time you
run across a number?

I tried to write a script in Ruby to do that task, but I don't get any
output so far.

I’ll try to give some feedback…

def getfile(file_name)
  entry = []
  IO.foreach(file_name) do |fl|
    if fl.include? 'name'
      entry.push fl.scan(/\d+/)[0]
    elsif fl.strip =~ /\A\d/
    end
  end
  entry
end

I don’t see this method used anywhere in the code.

def writefile(file, *linedata)
  linedata.each do |line|
    file << line.join(", ") +\n"

You are missing a quote there. It should be:

… + "\n"

IO.foreach(file) { |line|

if line =~ //

Don’t do that. It doesn’t do what you think it does. :)

What are you looking for here? A line that starts with a digit? If
so, use this:

if line =~ /\A\s*(\d+)/
# the digit is in the $1 variable here…

wline[0]= line.scan(/\d+/)
elsif line =~ /Status/
wline[1]= line.split(":")[1].scan(/[a-zA-Z]+/).join("")

The above two lines can be simplified to:

elsif line =~ /\A\s*Status\s*:\s*([a-zA-Z]+)/
wline[1] = $1

The other assignments could be handled in a similar way.
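
As a rough sketch of that idea, the same capture pattern can be pulled into one small helper and reused for every field. The method name field_value is hypothetical and is not part of the original script; it only assumes the "Field :value" layout shown in the sample log.

```ruby
# Hypothetical helper (not from the original script): returns the value of a
# "Field :value" line, or nil when the line does not hold that field.
def field_value(line, field)
  # String#[] with a regexp and a group index returns that capture group.
  line[/\A\s*#{Regexp.escape(field)}\s*:\s*(\S+)/, 1]
end

puts field_value("Status :A", "Status")      # prints A
puts field_value("Basetype :2", "Basetype")  # prints 2
p field_value("Version :a20", "Status")      # prints nil
```

Inside the loop this would look like wline[2] = field_value(line, "Basetype"), with one such line per field.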

end

readfile('filename', 'outputfile')

This is what I've done so far. Can someone tell me what's wrong? I
don't get any output at all…

It’s not easy for me to tell why you don’t see output. It looks
like output might only happen in that last elsif. If that’s the
case, you won’t see output unless the code makes it there. I’m
guessing it doesn’t, maybe because of the line =~ // condition, which
is problematic.
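
To illustrate why that condition is a problem, taking it exactly as written in the post: in Ruby an empty pattern matches at position 0 of any string, and 0 is truthy, so the first branch of the chain is always taken. The sketch below is a stripped-down stand-in for the if/elsif chain, and branch_for is a hypothetical name used only for the demonstration.

```ruby
# Hypothetical stand-in for the if/elsif chain in readfile: reports which
# branch a given line would fall into.
def branch_for(line)
  if line =~ //            # an empty pattern matches every string at index 0
    :id
  elsif line =~ /Status/   # never reached, the branch above always wins
    :status
  else
    :other
  end
end

p branch_for("1000000 name")  # prints :id
p branch_for("Status :A")     # prints :id as well
```

With a chain like this, any output that depends on a later elsif never happens.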

I believe the code below does something like what you want. I hope it
can be adapted to your needs.

James Edward G. II

#!/usr/bin/env ruby -wKU

fields         = ["id"]
fields_written = false
entry          = { }

DATA.each do |line|
  case line
  when /\A\s*(\d+)/
    unless entry.empty?
      unless fields_written
        puts fields.join(", ")
        fields_written = true
      end
      puts fields.map { |f| entry[f] }.join(", ")
      entry.clear
    end
    entry["id"] = $1
  when /\A\s*([a-zA-Z]+)\s*:\s*(\S+)/
    fields << $1 unless fields.include? $1
    entry[$1] = $2
  end
end

__END__
1000000 name
Status :A
Basetype :2
Version :1.0
Name :/file/name/etc
1000001 name
Status :B
Basetype :2
Version :a20
Name :/file/name/etc
1000002 name
Status :C
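
For the real 200 MB log, the same approach could read from one file and write to another, one line at a time. What follows is only a sketch under some assumptions: convert_log and its two path arguments are hypothetical names, the header is taken from the fields of the first entry (so that entry is assumed to list every field), and the last entry is written after the loop because no id line follows it.

```ruby
# Sketch only: convert_log and its arguments are hypothetical names.
# Streams the log line by line, so the whole file is never held in memory.
def convert_log(input_path, output_path)
  fields         = ["id"]
  entry          = {}
  header_written = false

  File.open(output_path, "w") do |out|
    # Writes the header once, then the current entry as one output line.
    flush = lambda do
      next if entry.empty?
      unless header_written
        out.puts fields.join(", ")
        header_written = true
      end
      out.puts fields.map { |f| entry[f] }.join(", ")
      entry.clear
    end

    File.foreach(input_path) do |line|
      case line
      when /\A\s*(\d+)/                       # an id line starts a new entry
        id = $1
        flush.call
        entry["id"] = id
      when /\A\s*([a-zA-Z]+)\s*:\s*(\S+)/     # a "Field :value" line
        fields << $1 unless fields.include?($1)
        entry[$1] = $2
      end
    end

    flush.call  # the final entry has no following id line to trigger a write
  end
end

# Example call, using the placeholder file names from the original post:
# convert_log("filename", "outputfile")
```

Fields missing from an entry come out as empty columns, since nil joins as an empty string.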