Forum: Ruby changing the format of a text file

Bary B. (Guest)
on 2009-02-25 13:30
Hello everyone,

I am new to Ruby and I'm having some problems trying to reformat a text
file.

Basically, I have a large log file, around 200 MB, in the
following format:
----------------------------------------------------------
1000000             name
Status            :A
Basetype          :2
Version           :1.0
   |
   |
 (more
  fields)
   |
Name              :/file/name/etc
1000001             name
Status            :B
Basetype          :2
Version           :a20
   |
   |
Name              :/file/name/etc
1000002             name
Status            :C
   |

... and so on


So each 200 MB file contains a lot of entries.

What I want to do is open the file, read the data into an array,
reformat the text, and save it into another file with the following
output:

id, Status, Basetype, .... , Name
1000000, A, 2, ..... , /file/name/etc
1000001, B, 2, ..... , /file/name/etc

I tried to write a script in Ruby to do that task, but I don't get any
output so far.

def getfile(file_name)
 entry = []
 IO.foreach(file_name) do |fl|
  if fl.include? 'name'
   entry.push fl.scan(/\d+/)[0]
  elsif fl.strip =~ /\A\d/
  end
 end
 entry
end

def writefile(file, *linedata)
 linedata.each do |line|
  file << line.join(", ") +\n"
  end
end

def readfile(file, outputfile)
 out = File.new(outputfile, "w+")
 info = []

 wline = ['id', 'Status', 'Basetype', .... 'Name']

 IO.foreach(file) { |line|

 if line =~ //
  wline[0]= line.scan(/\d+/)
 elsif line =~ /Status/
  wline[1]= line.split(":")[1].scan(/[a-zA-Z]+/).join("")
 elsif line =~ /Basetype/
  wline[2]= line.split(":")[1].scan(/\d+/).join("")
    |
    |
    |
 wline all fields
    |
 writefile(out, wline)
 end
out.close
end

readfile('filename', 'outputfile')


This is what I've done so far. Can someone tell me what's wrong and why
I don't get any output at all?

Thanks in advance
James C. (Guest)
on 2009-02-25 13:43
(Received via mailing list)
2009/2/25 Bary B. <removed_email_address@domain.invalid>

> Basetype          :2
> Version           :a20
> So each 200 MB file contains a lot of entries.
>
> What I want to do is open the file, read the data into an array,
> reformat the text, and save it into another file with the following
> output:
>
> id, Status, Basetype, .... , Name
> 1000000, A, 2, ..... , /file/name/etc
> 1000001, B, 2, ..... , /file/name/etc



I would strongly recommend looking at Treetop
(http://treetop.rubyforge.org/). It's a parser generator that produces
tree structures from text files using a grammar that you specify. If
you know regular expressions, it shouldn't be too big a leap to use
Treetop's grammar language.

For this particular task it may be overkill, but certainly worth
looking at.
James G. (Guest)
on 2009-02-25 16:17
(Received via mailing list)
On Feb 25, 2009, at 5:30 AM, Bary B. wrote:

> Hello everyone,

Hello and welcome.

> I am new to Ruby and I'm having some problems trying to reformat a text
> file.
>
> Basically, I have a large log file, around 200 MB, in the
> following format:
> ----------------------------------------------------------
> 1000000             name
> Status            :A
> Basetype          :2
> Version           :1.0

> id, Status, Basetype, .... , Name
> 1000000, A, 2, ..... , /file/name/etc
> 1000001, B, 2, ..... , /file/name/etc

So the idea is to read through the log file, updating variables that
hold Status, Basetype, Version, and Name, and then spit out a new entry
each time you run across a number?

> I tried to write a script in Ruby to do that task, but I don't get any
> output so far.

I'll try to give some feedback…

> def getfile(file_name)
> entry = []
> IO.foreach(file_name) do |fl|
>  if fl.include? 'name'
>   entry.push fl.scan(/\d+/)[0]
>  elsif fl.strip =~ /\A\d/
>  end
> end
> entry
> end

I don't see this method used anywhere in the code.

> def writefile(file, *linedata)
> linedata.each do |line|
>  file << line.join(", ") +\n"

You are missing a quote there.  It should be:

   … + "\n"
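For reference, here is a minimal sketch of the method with the quote restored. The StringIO object is only a stand-in for the real output file so that the snippet runs on its own; it is not part of the original script.

```ruby
require "stringio"

# Same method as quoted above, with the newline written as a proper "\n" string.
def writefile(file, *linedata)
  linedata.each do |line|
    # + binds tighter than <<, so the joined row and "\n" are appended together.
    file << line.join(", ") + "\n"
  end
end

out = StringIO.new                      # stand-in for File.new(outputfile, "w+")
writefile(out, ["1000000", "A", "2"])   # sample row in the poster's format
puts out.string
```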

> IO.foreach(file) { |line|
>
> if line =~ //

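Don't do that.  It doesn't do what you think it does.  :)  An empty
regex matches every line, so this branch always runs and the elsif
branches after it are never reached.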

What are you looking for here?  A line that starts with a digit?  If
so, use this:

   if line =~ /\A\s*(\d+)/
     # the digit is in the $1 variable here...
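To see the difference between the two conditions, here is a small sketch that runs both patterns against a few sample lines. The sample strings are taken from the log format in the original post; the variable names are only for illustration.

```ruby
# An empty regex matches at position 0 of any string, so =~ returns 0,
# and 0 is truthy in Ruby. The anchored pattern matches only ID lines.
samples = ["1000000             name", "Status            :A", ""]

results = samples.map do |line|
  empty_hit = (line =~ //)              # position of the empty match
  id_hit    = (line =~ /\A\s*(\d+)/)    # nil unless the line starts with digits
  puts "#{line.inspect}: // gives #{empty_hit.inspect}, id pattern gives #{id_hit.inspect}"
  [empty_hit, id_hit]
end
```

Because the first condition is true for every line, an if/elsif chain that starts with it never gets to the Status or Basetype branches.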

>  wline[0]= line.scan(/\d+/)
> elsif line =~ /Status/
>  wline[1]= line.split(":")[1].scan(/[a-zA-Z]+/).join("")

The above two lines can be simplified to:

   elsif line =~ /\A\s*Status\s*:\s*([a-zA-Z]+)/
     wline[1] = $1

The other assignments could be handled in a similar way.
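As a rough sketch of what "a similar way" could look like, a single pattern can capture any "Field :value" line, so no separate elsif per field is needed. The sample lines are copied from the format in the original post. The entry hash is an assumption made for illustration, and note that \S+ would cut a value at its first space.

```ruby
# Sample lines in the format shown in the original post.
lines = [
  "1000000             name",
  "Status            :A",
  "Basetype          :2",
  "Version           :1.0",
  "Name              :/file/name/etc"
]

entry = {}   # hypothetical holder: field name => value
lines.each do |line|
  if line =~ /\A\s*(\d+)/
    entry["id"] = $1                 # leading number is the entry ID
  elsif line =~ /\A\s*([a-zA-Z]+)\s*:\s*(\S+)/
    entry[$1] = $2                   # $1 is the field name, $2 its value
  end
end

p entry
```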

> end
>
> readfile('filename', 'outputfile')
>
>
> This is what I've done so far. Can someone tell me what's wrong and why
> I don't get any output at all?

It's not real easy for me to tell why you don't see output.  It looks
like output might only happen in that last elsif.  If that's the
case, you won't see output unless the code makes it there.  I'm
guessing it doesn't, maybe because of the line =~ // condition, which
is problematic.

I believe the code below does something like what you want.  I hope it
can be adapted to your needs.

James Edward G. II

#!/usr/bin/env ruby -wKU

fields         = ["id"]
fields_written = false
entry          = { }

DATA.each do |line|
   case line
   when /\A\s*(\d+)/
     unless entry.empty?
       unless fields_written
         puts fields.join(", ")
         fields_written = true
       end
       puts fields.map { |f| entry[f] }.join(", ")
       entry.clear
     end
     entry["id"] = $1
   when /\A\s*([a-zA-Z]+)\s*:\s*(\S+)/
     fields << $1 unless fields.include? $1
     entry[$1] = $2
   end
end

__END__
1000000             name
Status            :A
Basetype          :2
Version           :1.0
Name              :/file/name/etc
1000001             name
Status            :B
Basetype          :2
Version           :a20
Name              :/file/name/etc
1000002             name
Status            :C
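One thing to watch when adapting the script above to a real file: a row is printed only when the next ID line shows up, so the last entry in the input needs an explicit flush after the loop. Below is a minimal sketch of that variant, written against generic input and output objects. The convert method, the flush lambda, and the sample data are assumptions for illustration, not part of the original script, and it assumes every entry carries the same fields as the first one.

```ruby
require "stringio"

# Reads block-style log lines from input and writes comma-separated rows to output.
def convert(input, output)
  fields         = ["id"]
  fields_written = false
  entry          = {}

  # Writes the pending entry (and the header line, once) to output.
  flush = lambda do
    unless entry.empty?
      unless fields_written
        output.puts fields.join(", ")
        fields_written = true
      end
      output.puts fields.map { |f| entry[f] }.join(", ")
      entry.clear
    end
  end

  input.each_line do |line|
    case line
    when /\A\s*(\d+)/
      flush.call                     # a new ID closes the previous entry
      entry["id"] = $1
    when /\A\s*([a-zA-Z]+)\s*:\s*(\S+)/
      fields << $1 unless fields.include?($1)
      entry[$1] = $2
    end
  end
  flush.call                         # the last entry has no following ID line
end

sample = <<~LOG
  1000000             name
  Status            :A
  Basetype          :2
  Name              :/file/name/etc
  1000001             name
  Status            :B
  Basetype          :2
  Name              :/file/name/etc
LOG

out = StringIO.new
convert(StringIO.new(sample), out)
puts out.string
```

For a real log, the same method should work with File objects, for example by opening the input with File.open and the output with File.open in "w" mode and passing both to convert. Because each_line reads one line at a time, the 200 MB file does not have to be loaded into an array first.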