Parsing text files

I have some text files in roughly the following format:

Lorem ipsum|And so on|And on|And on
5|3|4|77|2|3|5
More lorem|And more ipsum|And so forth

Just several more lines :)

What would be the easiest way to parse these files? Just iterate line
by line and regex-split on "|"?
Some of the files might also have the number 5 on line
3 (for example), which means that there will be 5 "blocks" of
information (spanning say 3 lines per block), starting with line 4…

Any suggestions?

Christian…

Hi Christian,

So you have 2 formats to parse: a single-line format and a multi-line
format. It sounds like you could just use String.split for the
single-line format:

data = line.split('|')

after which data will be an array with an element for each column. For
line 2 of your example, the array would be
["5", "3", "4", "77", "2", "3", "5"]. Note that split returns strings,
so call to_i on an element if you need it as a number.
See here for more info on split:
class String - RDoc Documentation.
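
In case it helps, here is a minimal sketch of the single-line case, using line 2 of the sample file. The variable names (fields, numbers) are only for illustration, and whether you want integers at all is an assumption on my part:

```ruby
# Sample line taken from the example file in this thread.
line = "5|3|4|77|2|3|5\n"

# chomp drops the trailing newline so the last field stays clean.
# A String pattern is matched literally, so '|' needs no escaping.
fields = line.chomp.split('|')
# fields => ["5", "3", "4", "77", "2", "3", "5"]

# split always returns strings; convert explicitly when numbers are needed.
# Integer() raises on non-numeric input, unlike to_i, which returns 0.
numbers = fields.map { |f| Integer(f) }
# numbers => [5, 3, 4, 77, 2, 3, 5]

# By default split drops trailing empty fields; pass -1 to keep them.
short = "a|b||".split('|')
full  = "a|b||".split('|', -1)
# short => ["a", "b"]
# full  => ["a", "b", "", ""]
```

The -1 limit only matters if empty columns at the end of a line carry meaning in your files.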

For the second format, how do you know the number of lines each block
will span?

-Dan

Yes, I thought the simplest ones would be simple ;)

The number of lines per block is a constant. If there is no info in a
line, there is just a newline there.
So a 4-line block could look like this:

Line1
Line2

Line4

Christian…

From your original email:

There are also some of the files that might have the number 5 on line
3 (for example) which means that there will be 5 "blocks" of
information (spanning say 3 lines per block), starting with line 4.

Does the 5 mean 5 "blocks" of information, or 5 lines of information?
I'll assume 5 lines for now; let me know if that's incorrect.

This isn’t the prettiest code, but it should work:

in_block = false
count = 0
max = nil
File.foreach(file) do |line|
  if in_block
    # do something with the next line in the block, e.g. add it to an array
    # update block counters
    count += 1
    # exit the block after max lines; the whole block could be processed now
    in_block = false if count == max
    next
  end
  if line =~ /\A\d+\s*\z/
    # entering a block: the line holds nothing but the line count
    count = 0
    max = line.strip.to_i
    in_block = max > 0
  else
    # parse a regular line
    data = line.chomp.split('|')
    # do something with the line data here…
  end
end
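
If the number turns out to mean a count of blocks instead, each with a constant number of lines as Christian describes, something along these lines might work. This is only a sketch: parse_lines and its lines_per_block parameter are names I made up, the default of 3 is taken from the "say 3 lines per block" in the original question, and it assumes a regular line never consists of digits alone.

```ruby
# Hypothetical helper: splits regular lines on '|' and collects
# fixed-size blocks announced by a line that holds only a number.
def parse_lines(lines, lines_per_block = 3)
  rows = []    # parsed single-line records
  blocks = []  # each block is an array of lines_per_block strings
  i = 0
  while i < lines.length
    line = lines[i].chomp
    if line =~ /\A\d+\z/
      # The number is read as a block count (assumption).
      block_count = line.to_i
      body = lines[i + 1, block_count * lines_per_block] || []
      # Empty lines inside a block are kept as "" so positions stay fixed.
      body.map(&:chomp).each_slice(lines_per_block) { |b| blocks << b }
      i += 1 + body.length
    else
      rows << line.split('|') unless line.empty?
      i += 1
    end
  end
  [rows, blocks]
end

sample = [
  "Lorem ipsum|And so on\n",
  "2\n",
  "a1\n", "\n", "a3\n",
  "b1\n", "b2\n", "b3\n",
  "More lorem|And more\n"
]
rows, blocks = parse_lines(sample)
# rows   => [["Lorem ipsum", "And so on"], ["More lorem", "And more"]]
# blocks => [["a1", "", "a3"], ["b1", "b2", "b3"]]
```

For a real file, the lines could come from File.readlines(path), where path is whatever file you are parsing. If the file ends in the middle of a block, the last block will simply be shorter than lines_per_block.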

-Dan