I have a text-based log file that has records in two formats, of the
following form:
A|B|C|D\n
A|B|C|D|E\n
\n
Exception\n
\n
\tstack trace line1\n
\tstack trace line2\n
\tstack trace line3\n
\n
A|B|C|D\n
The first form (A|B|C|D) has statically defined columns delimited by a
pipe symbol. The second form ends with an extra “E” column, which marks
an exception record. If it is an exception record, the information about
the exception follows. The exception information starts with a blank
line and a line “Exception”, followed by another blank line and the
stack trace on multiple lines. Each stack trace element starts with a tab.
I am parsing this file with ruby. Currently I am reading line by line
and building the log records. This is working fine.
I am wondering if I could rely on regular expressions to do it instead
of reading line by line - I could read a chunk of the file and apply two
regular expressions to see if there is a match and if I find the match
process the record and move to the next record. If there is no match,
then I combine multiple chunks until I find a match. Is this approach a
valid consideration? Is this doable with Ruby? If there are any open source
projects, that do something like this, can someone point me to it? Also
any thoughts which one is more efficient and why? Appreciate any
feedback.
I am parsing this file with ruby. Currently I am reading line by line
and building the log records. This is working fine.
I am wondering if I could rely on regular expressions to do it instead
of reading line by line - I could read a chunk of the file and apply two
regular expressions to see if there is a match and if I find the match
process the record and move to the next record. If there is no match,
then I combine multiple chunks until I find a match. Is this approach a
valid consideration?
Question is: why do you want to do that? Line based parsing is simple
and has the advantage that you always get a complete record. Note
also that underneath Ruby uses buffered reading - just in case you
wonder about IO efficiency.
Is this doable with Ruby?
Yes, certainly.
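For what it's worth, a minimal sketch of the chunk-based idea could look like the following. It assumes that only record header lines contain a “|” (stack trace lines do not), and the names RECORD_START and each_raw_record are made up for illustration. An exception record can only be known to be complete once the start of the next record (or the end of the file) has been seen, which is why the last piece is always held back.

```ruby
require "stringio"

# Zero-width pattern that matches at the start of any line containing a pipe.
# Assumption: only record header lines contain "|".
RECORD_START = /^(?=[^|\n]*\|)/

# Reads io in fixed-size chunks and yields the raw text of each record.
def each_raw_record(io, chunk_size = 4096)
  # read(n) returns binary-encoded strings, so the buffer is binary as well.
  buffer = String.new
  while (chunk = io.read(chunk_size))
    buffer << chunk
    pieces = buffer.split(RECORD_START)
    # The last piece may be cut off mid-record, so keep it for the next round.
    buffer = pieces.pop || String.new
    pieces.each { |text| yield text }
  end
  # At end of input, whatever is left is the final record.
  yield buffer unless buffer.empty?
end

sample = "A|B|C|D\nA|B|C|D|E\n\nException\n\n\tstack trace line1\n\nA|B|C|D\n"
each_raw_record(StringIO.new(sample), 16) { |text| p text }
```

Each yielded string still has to be taken apart (header fields, exception text), so this buys little over reading line by line, and the buffer is re-split on every chunk.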
If there are any open source
projects, that do something like this, can someone point me to it? Also
any thoughts which one is more efficient and why? Appreciate any
feedback.
My implementation of this would use a single regular expression with
an optional part for the “|E”. That way you need to match only once
and you can immediately distinguish record types.
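# untested
Record = Struct.new :a, :b, :c, :d, :e

def parse
  # Both variables must live inside the method: a def cannot see outer locals.
  last = nil
  ex = false

  ARGF.each do |line|
    if %r{^([^|]*)\|([^|]*)\|([^|]*)\|([^|\n]*)(\|E)?} =~ line
      ex = $5
      r = Record.new $1, $2, $3, $4
      r.e = +"" if ex
      yield last if last
      last = r
    elsif ex
      # continuation line of the current exception record
      last.e << line
    else
      warn "Dunno what to do with line #{line.inspect}"
    end
  end

  yield last if last
end

parse do |rec|
  p rec
end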
Thanks for the prompt response. I appreciate your taking the time to
respond with sample code. I have just started on this as a pet project
to learn Ruby. The task is to build a log analysis web application. The
log file is not a standard one, in the sense that it is dynamically
constructed and some columns are optional, but all of them are
separated by the ‘|’ character. Initially I am starting with reading a
static file, but at some point my plan is to use SSH to read the live
file contents and provide realtime information. So I was considering what
other alternatives might work well in the realtime scenario as well.
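To make the realtime case a bit more concrete, here is a rough sketch of a push-style variant of the line-based approach: lines are fed in one at a time from wherever they come from, and a record is handed out as soon as the next header line shows up. All names (LogRecord, RecordAssembler, HEADER) are invented for this example, and the four-column header is an assumption taken from the sample format above.

```ruby
# Hypothetical names throughout; a sketch, not a finished parser.
LogRecord = Struct.new(:a, :b, :c, :d, :exception)

class RecordAssembler
  # Four pipe-delimited fields plus an optional "|E" exception marker.
  HEADER = /\A([^|\n]*)\|([^|\n]*)\|([^|\n]*)\|([^|\n]*)(\|E)?$/

  def initialize(&on_record)
    @on_record = on_record
    @current = nil
  end

  # Feed one line at a time. A pending record is emitted when the next
  # header line arrives, because only then is it known to be complete.
  def feed(line)
    if (m = HEADER.match(line))
      flush
      @current = LogRecord.new(m[1], m[2], m[3], m[4], m[5] ? "".dup : nil)
    elsif @current && @current.exception
      @current.exception << line
    end
    # Any other line (e.g. a blank line after a plain record) is ignored.
  end

  # Emit the pending record, e.g. at end of input or after an idle timeout.
  def flush
    @on_record.call(@current) if @current
    @current = nil
  end
end

lines = ["A|B|C|D\n", "A|B|C|D|E\n", "\n", "Exception\n", "\n",
         "\tstack trace line1\n", "\n", "A|B|C|D\n"]
assembler = RecordAssembler.new { |rec| p rec }
lines.each { |line| assembler.feed(line) }
assembler.flush
```

The same object could be fed from a file, from standard input, or from whatever delivers the remote lines later on. The catch in the live case is that the last record stays pending until another header arrives or flush is called.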
Robert K. wrote in post #979584:
My implementation of this would use a single regular expression with
an optional part for the “|E”. That way you need to match only once
and you can immediately distinguish record types.
# untested
Record = Struct.new :a, :b, :c, :d, :e

def parse
  # Both variables must live inside the method: a def cannot see outer locals.
  last = nil
  ex = false

  ARGF.each do |line|
    if %r{^([^|]*)\|([^|]*)\|([^|]*)\|([^|\n]*)(\|E)?} =~ line
      ex = $5
      r = Record.new $1, $2, $3, $4
      r.e = +"" if ex
      yield last if last
      last = r
    elsif ex
      # continuation line of the current exception record
      last.e << line
    else
      warn "Dunno what to do with line #{line.inspect}"
    end
  end

  yield last if last
end

parse do |rec|
  p rec
end
Cheers
robert