Need Help with creating classes for log parsing

I am creating a ruby script to parse ftp logs collected on client
computers. I am very new to Ruby but have the basics of the script
functioning. However, it feels very procedural. Basically the script
recursively searches directories for files and then begins parsing them.
I would like to use this script as a basis to learn more OOP but my
attempts so far have not worked.

Eventually, I want to submit the results to a database for other uses.

Any advice you could give would be greatly appreciated.

Thanks

Example of the log being parsed:
Connected to ftp.server.com.
220 ftp-server-8 FTP server (Version 6.00LS) ready.
331 Password required for ftpuser.
230 User ftpuser logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
200 Type set to I.
250 CWD command successful.
250 CWD command successful.
250 CWD command successful.
local: /Volumes/Mvol/Test/BLT/10-0000/123654-BLT-10-0000–A.mov remote:
123654-BLT-10-0000–A.mov
229 Entering Extended Passive Mode (|||58539|)
150 Opening BINARY mode data connection for ‘123654-BLT-10-0000–A.mov’
(23448938061 bytes).
226 Transfer complete.
23448938061 bytes received in 04:15 (87.63 MB/s)
221 Goodbye.

Current Code:

def proc_dir(dir_name)
Dir.foreach(dir_name) do |dir|
dir_path = dir_name + ‘/’ + dir
if File.directory?(dir_path) then
if dir != ‘.’ && dir != ‘…’ then
proc_dir(dir_path)
end
else
log_file = File.new(dir_path)
parsed_log =[]
log_file.each do |line|
if line =~ /^Connected./
remote_server = line.sub!(/^Connected to /,‘’).chomp.chop
parsed_log << remote_server
end
if line =~ /^Remote.
/
remote_type = line.split(/\W/).last
parsed_log << remote_type
end
if line =~ /^local:/
local_file = line.sub(/ remote.*/, ‘’).sub!(/^local: /,
‘’).chomp
remote_file = line.sub(/^local:.*remote: /, ‘’).chomp
parsed_log << local_file << remote_file
end
if line =~ /bytes received/
line = line.split(/\s/)
bytes_transferred = line[0]
transfer_time = line[-3]
avg_transfer_rate = line[-2,2].join(’ ‘).gsub(/(|)/,’')
parsed_log << bytes_transferred << transfer_time <<
avg_transfer_rate
end
end
puts parsed_log.join(“\t”)
end
end
end

------

proc_dir(‘/Volumes/Mvol/log’)

I’d say you’ve got this about right, and there’s no need for “objects”
here. An object would be something which keeps state - for example, if
you want to open a file and then call a method to parse one line, so it
keeps the open file between calls.

There are however several nice ruby idioms you could use.

(1) case statements:

case line
when /foo/
do_this
when /bar/
do_that
end

The case statement uses the === operator with the arguments reversed,
i.e. the above expands to

if /foo/ === line
do_this
elsif /bar/ === line
do_that
end

(2) regexp captures:

Rather than chomp’ing and sub’ing the lines, let the regexps do the work
for you.

case line
when /^Connected to (.+).$/
parsed_log << $1
when … etc

end

(3) using ‘next’ to break out of a loop

A bit tidier than a big hanging ‘else’ clause:

if File.directory?(dir_path) then
  if ['.', '..'].include?(dir)
    proc_dir(dir_path)
  end
  next
end
... carry on processing files here

Perhaps where there is an argument for using an object is for
parsed_log, say a Struct rather than an Array. Then it can have members
for each of the elements of interest, making it easier to use by the
consumer. But that may be overkill if you’re just writing out a
tab-separated line as you’re doing.

HTH,

Brian.

Oops, I got that ‘if’ condition the wrong way round. But also:

(4) for single expressions you can put the ‘if’ or ‘unless’ condition at
the end of the line.

if File.directory?(dir_path)
proc_dir(dir_path) unless [’.’, ‘…’].include?(dir)
next
end

Thanks very much for your feedback. The suggestions have greatly
simplified what I had written.

As for the tab delimited file that is just a temporary holding place.
Though I have yet to begin this, my intent is to submit this to a
database. Would a Struct as opposed to an Array be more beneficial to
this process?

Thanks again for your help.

Brad

Brian C. wrote in post #1008694:

I’d say you’ve got this about right, and there’s no need for “objects”
here. An object would be something which keeps state - for example, if
you want to open a file and then call a method to parse one line, so it
keeps the open file between calls.

There are however several nice ruby idioms you could use.

(1) case statements:

case line
when /foo/
do_this
when /bar/
do_that
end

The case statement uses the === operator with the arguments reversed,
i.e. the above expands to

if /foo/ === line
do_this
elsif /bar/ === line
do_that
end

(2) regexp captures:

Rather than chomp’ing and sub’ing the lines, let the regexps do the work
for you.

case line
when /^Connected to (.+).$/
parsed_log << $1
when … etc

end

(3) using ‘next’ to break out of a loop

A bit tidier than a big hanging ‘else’ clause:

if File.directory?(dir_path) then
  if ['.', '..'].include?(dir)
    proc_dir(dir_path)
  end
  next
end
... carry on processing files here

Perhaps where there is an argument for using an object is for
parsed_log, say a Struct rather than an Array. Then it can have members
for each of the elements of interest, making it easier to use by the
consumer. But that may be overkill if you’re just writing out a
tab-separated line as you’re doing.

HTH,

Brian.

Brad A. wrote in post #1009286:

As for the tab delimited file that is just a temporary holding place.
Though I have yet to begin this, my intent is to submit this to a
database. Would a Struct as opposed to an Array be more beneficial to
this process?

Perhaps; but depending on what database access layer you use, it may
give you an ORM model class anyway. e.g.

l = FTPLogEntry.new
l.remote_server = “ftp.example.com
l.remote_type = “UNIX”
l.save!

You can do this sort of stuff with ActiveRecord, Datamapper, Sequel…

Otherwise, a Struct or OpenStruct may keep things tidier, if you end up
doing some low-level SQL like:

l = OpenStruct.new
l.remote_server = “ftp.example.com
l.remote_type = “UNIX”

c.execute(“insert into ftplog (remote_server,remote_type) values
(?,?)”,
l.remote_server, l.remote_type)

(e.g. directly using one of the low-level APIs like mysql-ruby, or
ruby-dbi; however I’d say most people go for the ORMs these days)

Regards,

Brian.