Forum: Ruby Parse csv similar file

Rebhan, Gilbert (Guest)
on 2007-02-06 16:33
(Received via mailing list)
Hi,

<newbie>

I have a text file in a format like this:

AP850KP;INCLIB;E023889;AP013;240107;0730
AP850SD$;INCLIB;E052337;AP013;240107;0730
AP850SDA;INCLIB;E050441;AP013;240107;0730
AP850SDI;INCLIB;E023889;AP013;240107;0730
AP850SDO;INCLIB;E052337;AP013;240107;0730
AP850SDS;INCLIB;E050441;AP013;240107;0730
...

i want to get a collection for every E followed by digits,
so with the example above, i want to get =

collections:
  E023889
  E052337
  E050441
  ...

Each collection should contain datasets with the rest of the line, so
for example E023889 would have:

[AP850KP;INCLIB;AP013;240107;0730,AP850SDI;INCLIB;AP013;240107;0730]

questions=
what kind of collection is the best ? is an array sufficient ?

right now i have =

efas=Array.new
File.open("mycsvfile", "r").each do |line|
  if line =~ /(\w+.?);(\w+);(\w+);(\w+);(\w+);(\w+)/
    efas<<$3.to_s<<',' unless efas.include?($3.to_s)
  end
end
puts efas.to_s.chop

So I have all the E\d+ values, but how do I go further?

Are there better ways than regular expressions?
Any ideas?

<newbie/>

Regards, Gilbert
Brian C. (Guest)
on 2007-02-06 16:38
(Received via mailing list)
On Tue, Feb 06, 2007 at 11:32:27PM +0900, Rebhan, Gilbert wrote:
> questions=
> what kind of collection is the best ? is an array sufficient ?

Depends what you want to do with it. If you want to be able to find an
entry E123456 quickly, then you'd use a hash. If you want to keep only
the first/last entry for a particular key (as it seems you do), using a
hash speeds things up here too.

>      puts efas.to_s.chop
Try:

efas = Hash.new
...
    efas[$3] = [$1,$2,$4,$5,$6] unless efas.has_key?($3)
...
puts efas.inspect

> Are there better ways as regular expressions ?

You could look at String#split instead
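
A minimal sketch of the String#split approach, assuming the key is
always the third field. The helper name group_rows is hypothetical, and
the sample lines are taken from the original question:

```ruby
# Sample lines copied from the original question.
sample = <<~'EOS'
  AP850KP;INCLIB;E023889;AP013;240107;0730
  AP850SD$;INCLIB;E052337;AP013;240107;0730
  AP850SDA;INCLIB;E050441;AP013;240107;0730
  AP850SDI;INCLIB;E023889;AP013;240107;0730
EOS

# group_rows is a hypothetical helper name, not something from the thread.
# It assumes the key sits at key_index (third field by default).
def group_rows(text, key_index = 2)
  groups = Hash.new { |hash, key| hash[key] = [] }
  text.each_line do |line|
    fields = line.chomp.split(";")
    next if fields.size <= key_index     # skip blank or short lines
    key = fields.delete_at(key_index)    # take the key out, keep the rest
    groups[key] << fields
  end
  groups
end

efas = group_rows(sample)
p efas.keys
p efas["E023889"]
```

Because split works on the separator rather than on word characters, a
name such as AP850SD$ needs no special handling.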

HTH,

Brian.
Rebhan, Gilbert (Guest)
on 2007-02-06 16:55
(Received via mailing list)
Hi,
Brian C. (Guest)
on 2007-02-06 17:14
(Received via mailing list)
On Tue, Feb 06, 2007 at 11:54:59PM +0900, Rebhan, Gilbert wrote:
> that belong to the different E.....
> ...
> puts efas.inspect
> */
>
> that gives me only one dataset in the hash, but there are more
> entries that have E123456 in it.

I was just following your original example, which only kept the first
line for a particular E key.

If you want to keep them all, then I'd use a hash with each element
being an array.

     efas[$3] ||= []               # create empty array if necessary
     efas[$3] << [$1,$2,$4,$5,$6]  # add a new line

So, given the following input

aaa;bbb;E123;ddd;eee;fff
ggg;hhh;E123;iii;jjj;kkk

you should get

efas = {
  "E123" => [
         ["aaa","bbb","ddd","eee","fff"],
         ["ggg","hhh","iii","jjj","kkk"],
  ],
}

puts efas["E123"].size   # 2
puts efas["E123"][0][3]  # "eee"
puts efas["E123"][1][3]  # "jjj"

In practice, to make it easier to manipulate this data, you'd probably
want to create a class to represent each object, rather than using a
5-element array.

You would give each attribute a sensible name. I don't know what these
values mean, so I've just called them a to e here.

class Myclass
  attr_accessor :a, :b, :c, :d, :e
  def initialize(a, b, c, d, e)
    @a = a
    @b = b
    @c = c
    @d = d
    @e = e
  end
end

...
     efas[$3] ||= []
     efas[$3] << Myclass.new($1,$2,$4,$5,$6)
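
As a possible shorthand for the same idea, a Struct gives named
accessors without writing initialize by hand. This is only a sketch: the
name Entry is hypothetical, and the field names are guesses borrowed
from the copy_ticket signature that appears later in this thread.

```ruby
# Entry is a hypothetical name; the field names are assumptions borrowed
# from the copy_ticket(filename, folder, ticket, user, date, time)
# signature posted later in the thread.
Entry = Struct.new(:filename, :folder, :user, :date, :time)

lines = [
  "AP850KP;INCLIB;E023889;AP013;240107;0730",
  "AP850SDI;INCLIB;E023889;AP013;240107;0730",
]

efas = {}
lines.each do |line|
  filename, folder, ticket, user, date, time = line.split(";")
  efas[ticket] ||= []                       # create empty array if necessary
  efas[ticket] << Entry.new(filename, folder, user, date, time)
end

first = efas["E023889"].first
puts first.filename   # AP850KP
puts first.date       # 240107
```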

HTH,

Brian.
Gavin K. (Guest)
on 2007-02-06 17:30
(Received via mailing list)
On Feb 6, 7:32 am, "Rebhan, Gilbert" <removed_email_address@domain.invalid>
wrote:
> i want to get a collection for every E followed by digits,
> so with the example above, i want to get =

lines = DATA.readlines.map{ |line|
  line.chomp.split( ';' )
}
lookup = {}
lines.each{ |data|
  key = data.find{ |value| /^E/ =~ value }
  lookup[ key ] = data
}
p lookup[ "E050441" ]
#=> ["AP850SDS", "INCLIB", "E050441", "AP013", "240107", "0730"]
__END__
AP850KP;INCLIB;E023889;AP013;240107;0730
AP850SD$;INCLIB;E052337;AP013;240107;0730
AP850SDA;INCLIB;E050441;AP013;240107;0730
AP850SDI;INCLIB;E023889;AP013;240107;0730
AP850SDO;INCLIB;E052337;AP013;240107;0730
AP850SDS;INCLIB;E050441;AP013;240107;0730
Drew O. (Guest)
on 2007-02-06 17:36
Gavin K. wrote:
> On Feb 6, 7:32 am, "Rebhan, Gilbert" <removed_email_address@domain.invalid>
> wrote:
>> i want to get a collection for every E followed by digits,
>> so with the example above, i want to get =
>
> lines = DATA.readlines.map{ |line|
>   line.chomp.split( ';' )
> }
> lookup = {}
> lines.each{ |data|
>   key = data.find{ |value| /^E/ =~ value }
>   lookup[ key ] = data
> }
> p lookup[ "E050441" ]
> #=> ["AP850SDS", "INCLIB", "E050441", "AP013", "240107", "0730"]
> __END__
> AP850KP;INCLIB;E023889;AP013;240107;0730
> AP850SD$;INCLIB;E052337;AP013;240107;0730
> AP850SDA;INCLIB;E050441;AP013;240107;0730
> AP850SDI;INCLIB;E023889;AP013;240107;0730
> AP850SDO;INCLIB;E052337;AP013;240107;0730
> AP850SDS;INCLIB;E050441;AP013;240107;0730

I think he wants to append to the array each time he sees the same
key, so modify your code like so:

lines = DATA.readlines.map{ |line|
  line.chomp.split( ';' )
}
lookup = {}
lines.each{ |data|
  key = data.find{ |value| /^E/ =~ value }
  lookup[ key ] ||= []
  lookup[ key ] << data
}
Gregory B. (Guest)
on 2007-02-06 17:53
(Received via mailing list)
On 2/6/07, Rebhan, Gilbert <removed_email_address@domain.invalid> wrote:
> AP850SDI;INCLIB;E023889;AP013;240107;0730
>         E050441
>         ...
>
> each collection should contain datasets with the rest of the line, so
> f.e.
>  E023889 would have =
>
> [AP850KP;INCLIB;AP013;240107;0730,AP850SDI;AP013;240107;0730]
>
> questions=
> what kind of collection is the best ? is an array sufficient ?

Just for fun, here's a Ruport example:

require "rubygems"
require "ruport"
DATA = <<-EOS
AP850KP;INCLIB;E023889;AP013;240107;0730
AP850SD$;INCLIB;E052337;AP013;240107;0730
AP850SDA;INCLIB;E050441;AP013;240107;0730
AP850SDI;INCLIB;E023889;AP013;240107;0730
AP850SDO;INCLIB;E052337;AP013;240107;0730
AP850SDS;INCLIB;E050441;AP013;240107;0730
EOS

table = Ruport::Data::Table.parse(DATA, :has_names => false,
                                        :csv_options=>{:col_sep=>";"})

table.column_names = %w[c1 c2 c3 c4 c5 c6] # BUG! you shouldn't need colnames

e = table.column(2).uniq
e.each { |x| table.create_group(x) { |r| r[2].eql?(x) } }

groups = table.groups

>> groups.attributes
>> ["E023889", "E052337", "E050441"]

>> groups["E023889"].map { |r| r[0] }
>> ["AP850KP", "AP850SDI"]

>> groups.each { |t| p t[0].c1 }
"AP850KP"
"AP850SD$"
"AP850SDA"

===============

note that in making this example, I found a small bug in Ruport's
grouping support which I will fix :)
Gavin K. (Guest)
on 2007-02-06 18:55
(Received via mailing list)
On Feb 6, 8:36 am, Drew O. <removed_email_address@domain.invalid> wrote:
>   lookup[ key ] << data
>
> }

Curses, I didn't read carefully enough. Right you are. (And, though
it's not clear from his example, he might not even need to split the
original line into arrays of pieces, but just keep the lines.)
Gavin K. (Guest)
on 2007-02-06 19:00
(Received via mailing list)
On Feb 6, 8:36 am, Drew O. <removed_email_address@domain.invalid> wrote:
> I think he wants to append this array with information each time he sees
> the same key [...]

So here's another version:

lookup = Hash.new{ |h,k| h[k]=[] }

DATA.each_line{ |line|
  line.chomp!
  unless key = line[ /\bE\w+/ ]
    warn "No key in '#{line}'"
    next   # skip the line instead of filing it under a nil key
  end
  lookup[ key ] << line
}

p lookup[ "E050441" ]
#=> ["AP850SDA;INCLIB;E050441;AP013;240107;0730",
#=>  "AP850SDS;INCLIB;E050441;AP013;240107;0730"]

require 'pp'
pp lookup
#=> {"E050441"=>
#=>   ["AP850SDA;INCLIB;E050441;AP013;240107;0730",
#=>    "AP850SDS;INCLIB;E050441;AP013;240107;0730"],
#=>  "E052337"=>
#=>   ["AP850SD$;INCLIB;E052337;AP013;240107;0730",
#=>    "AP850SDO;INCLIB;E052337;AP013;240107;0730"],
#=>  "E023889"=>
#=>   ["AP850KP;INCLIB;E023889;AP013;240107;0730",
#=>    "AP850SDI;INCLIB;E023889;AP013;240107;0730"]}

__END__
AP850KP;INCLIB;E023889;AP013;240107;0730
AP850SD$;INCLIB;E052337;AP013;240107;0730
AP850SDA;INCLIB;E050441;AP013;240107;0730
AP850SDI;INCLIB;E023889;AP013;240107;0730
AP850SDO;INCLIB;E052337;AP013;240107;0730
AP850SDS;INCLIB;E050441;AP013;240107;0730
Rebhan, Gilbert (Guest)
on 2007-02-07 10:48
(Received via mailing list)
Hi,
Brian C. (Guest)
on 2007-02-07 11:43
(Received via mailing list)
On Wed, Feb 07, 2007 at 05:47:26PM +0900, Rebhan, Gilbert wrote:
> AP540RBP;INCLIB;E052337;AP013;240107;0730
>
> in the subfolder which is field 2
> the format might look like
> File.open("mycsvfile", "r").each do |line|
>         if line =~ /(\w+.?);(\w+);(\w+);(\w+);(\w+);(\w+)/
>
>          efas<<$3.to_s<<',' unless efas.include?($3.to_s)
>
> i get an array with all ticketnr
> then i create a folderstructure for every index in that array
> and put the files in it, but i don't get it.
>
> Any ideas ?

I'd do all the work on-the-fly. Untested code:

require 'fileutils'
SRCDIR="/path_to_src"
DSTDIR="/path_to_dst"

def copy_ticket(filename, folder, ticket, user, date, time)
  srcdir = SRCDIR + File::SEPARATOR + folder
  dstdir = DSTDIR + File::SEPARATOR + ticket + File::SEPARATOR + folder
  FileUtils.mkdir_p(dstdir)
  FileUtils.cp(srcdir + File::SEPARATOR + filename,
               dstdir + File::SEPARATOR + filename)

  # write out status file
  statusfile = dstdir + File::SEPARATOR + "status.txt"
  unless File.exist?(statusfile)
    File.open(statusfile, "w") do |sf|
      sf.puts "user=#{user}"
      sf.puts "date=#{date}"
      sf.puts "time=#{time}"
    end
  end
end

def process_meta(f)
  f.each_line do |line|
    # first field may contain non-word characters, e.g. "AP850SD$"
    next unless line =~ /^([^;]+);(\w+);(\w+);(\w+);(\w+);(\w+)$/
    copy_ticket($1,$2,$3,$4,$5,$6)
  end
end

# Main program
File.open("mycsvfile") do |f|
  process_meta(f)
end

If you want to build up a hash of ticket IDs seen, you can do that in
process_meta as well. I'd pass in an empty hash, and update it in the
each_line loop.
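
A rough sketch of that idea, under some assumptions: the seen parameter
is an addition, the lines are split on ";" rather than matched with a
regex, and a block stands in for the copy_ticket call so that the
example touches no files.

```ruby
require "stringio"

# Variant of process_meta that also records which ticket IDs were seen.
# The `seen` parameter and the block are assumptions for this sketch; the
# block stands in for the copy_ticket call so nothing touches the disk.
def process_meta(f, seen)
  f.each_line do |line|
    fields = line.chomp.split(";")
    next unless fields.size == 6             # skip malformed lines
    ticket = fields[2]
    seen[ticket] = (seen[ticket] || 0) + 1   # count lines per ticket
    yield(*fields) if block_given?
  end
  seen
end

# StringIO stands in for the real metadata file.
input = StringIO.new(<<~'EOS')
  AP850KP;INCLIB;E023889;AP013;240107;0730
  AP850SD$;INCLIB;E052337;AP013;240107;0730
  AP850SDI;INCLIB;E023889;AP013;240107;0730
EOS

seen = {}
process_meta(input, seen) do |filename, folder, ticket, *_rest|
  puts "#{ticket}: #{folder}/#{filename}"
end
p seen.keys
```

After the loop, seen maps each ticket ID to the number of lines that
mentioned it, so seen.keys is the list of tickets.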

HTH,

Brian.
Rebhan, Gilbert (Guest)
on 2007-02-07 12:29
(Received via mailing list)
Hi,
Rebhan, Gilbert (Guest)
on 2007-02-07 13:33
(Received via mailing list)
Hi,
Brian C. (Guest)
on 2007-02-07 16:31
(Received via mailing list)
On Wed, Feb 07, 2007 at 07:28:03PM +0900, Rebhan, Gilbert wrote:
>   srcdir = SRCDIR + File::SEPARATOR + folder
>   dstdir = DSTDIR + File::SEPARATOR + ticket + File::SEPARATOR + folder
>   filename=filename<<EXT
> ...
>
> is there a better way ?

That's OK, just beware that the way you've done it you've modified the
string which was passed in. e.g.

a="foobar"
copy_ticket(a, "/tmp", "E123", "x", "y", "z")
puts a

will print "foobar.txt" (assuming EXT is ".txt")

To avoid that:

   filename = filename + EXT

(which creates a new String object, and then updates the local variable
'filename' to point to this new object)
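
The difference can be seen in isolation with a small sketch. Both helper
names and the ".txt" default are assumptions made for the example:

```ruby
# Both helper names and the ".txt" default are assumptions for this sketch.
def append_in_place(name, ext = ".txt")
  name << ext        # String#<< mutates the receiver, so the caller sees it
end

def append_copy(name, ext = ".txt")
  name + ext         # String#+ returns a new String; the receiver is untouched
end

a = "foobar".dup     # dup so this also works with frozen string literals
copy = append_copy(a)
puts a               # foobar
puts copy            # foobar.txt

same = append_in_place(a)
puts a               # foobar.txt
puts same.equal?(a)  # true
```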

This is an interesting "small" file-chomping task. I wonder what the
equivalent Java program would look like :-)

B.
Rebhan, Gilbert (Guest)
on 2007-02-07 16:51
(Received via mailing list)
Hi,

>   filename=filename<<EXT
> ...
>
> is there a better way ?

/*
That's OK, just beware that the way you've done it you've modified the
string which was passed in. e.g.
...
*/

Yup, I know, but I read somewhere that string concatenation
via << is faster than + because no new String object gets created.


Regards, Gilbert
Erik V. (Guest)
on 2007-02-07 18:16
(Received via mailing list)
Just an idea...

Greetings,
Erik V. - http://www.erikveen.dds.nl/

----------------------------------------------------------------

 hash =
 File.open("input.txt") do |f|
   f.readlines.collect do |line|
     k   = line.scan(/;(E\d+);/).flatten.shift
     v   = line.scan(/;E\d+;(.*)/).flatten.shift

     [k, v]
   end.select do |k, v|
     k and v
   end.inject({}) do |h, (k, v)|
     (h[k] ||= []) << v ; h
   end.inject({}) do |h, (k, v)|
     h[k] = v.join(",") ; h
   end
 end

 p hash
Erik V. (Guest)
on 2007-02-07 20:27
(Received via mailing list)
Nice abstraction... ;]

(From memory: this group_by is part of one of the Rails packages.)

Greetings,
Erik V. - http://www.erikveen.dds.nl/

----------------------------------------------------------------

 module Enumerable
   def hash_by(&block)
     inject({}){|h, o| (h[block[o]] ||= []) << o ; h}
   end

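   # Note: current Ruby versions ship their own Enumerable#group_by, which
   # returns a Hash; defining this method overrides that built-in.
   def group_by(&block)
     #hash_by(&block).values
     hash_by(&block).sort.transpose.pop
   end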
 end

 hash =
 File.open("input.txt") do |f|
   f.readlines.group_by do |line|
     line.scan(/;(E\d+);/)
   end.collect do |group|
     group.collect do |string|
       string.scan(/;E\d+;(.*)/).flatten.shift
     end.join(",")
   end
 end

 p hash
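
For comparison, current Ruby versions ship Enumerable#group_by in the
core library (it returns a Hash keyed by the block's result), so a
similar result can be sketched without extending Enumerable. This
assumes Ruby 2.4 or later for Hash#transform_values and uses the sample
lines from the original question in place of input.txt:

```ruby
# Sample lines from the original question stand in for input.txt.
lines = [
  "AP850KP;INCLIB;E023889;AP013;240107;0730",
  "AP850SD$;INCLIB;E052337;AP013;240107;0730",
  "AP850SDA;INCLIB;E050441;AP013;240107;0730",
  "AP850SDI;INCLIB;E023889;AP013;240107;0730",
  "AP850SDO;INCLIB;E052337;AP013;240107;0730",
  "AP850SDS;INCLIB;E050441;AP013;240107;0730",
]

# Built-in group_by: key each line by its E-number (first capture group).
grouped = lines.group_by { |line| line[/;(E\d+);/, 1] }

# Keep only the text after the E-number and join each group with commas.
result = grouped.transform_values do |group|
  group.map { |line| line[/;E\d+;(.*)/, 1] }.join(",")
end

puts result["E023889"]   # AP013;240107;0730,AP013;240107;0730
```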