Splitting a multirecord per file format to a single record p

randy_k · January 19, 2007, 4:30pm

I’m trying to write essentially what I guess you’d call a filter (or maybe not quite exactly). It needs to:

read multi-line records from a file (one record at a time)
then, with that one record:
- prepend some additional lines
- make substitutions for some of the lines already in the record
- grab some other portions of the record (less than a line, but usually multiple words), find the “non-null” pieces, and incorporate those in another header line
- create a unique filename
- write that (single) record to that file
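
The steps listed above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names (`summary_line`, `transform_record`, `unique_filename`), the `Keywords:` and `X-Source:` header lines, the `Subject:` substitution, and the counter-based naming scheme are all hypothetical and not taken from the actual files.

```ruby
require 'tmpdir'

# Hypothetical helper: keep only the non-empty pieces and fold them
# into one extra header line.
def summary_line(pieces)
  kept = pieces.compact.map(&:strip).reject(&:empty?)
  "Keywords: #{kept.join(', ')}"
end

# Hypothetical helper: apply each substitution to the record text,
# then put the extra header lines in front of it.
def transform_record(record, extra_header_lines, substitutions)
  body = substitutions.reduce(record) do |text, (pattern, replacement)|
    text.gsub(pattern, replacement)
  end
  (extra_header_lines + [body]).join("\n")
end

# Hypothetical naming scheme: a zero-padded counter keeps every
# filename unique within one run.
def unique_filename(dir, index)
  File.join(dir, format('record_%04d.txt', index))
end

# Made-up sample data standing in for records already read from the file.
records = ["Subject: taxes\nbody one\n", "Subject: roads\nbody two\n"]
headers = ['X-Source: politics.twk']
subs    = { /^Subject:/ => 'Title:' }

out_dir = Dir.mktmpdir
written = records.each_with_index.map do |record, i|
  path = unique_filename(out_dir, i + 1)
  File.write(path, transform_record(record, headers, subs))
  path
end
```

Each record ends up in its own file, so the reading step and the per-record processing can be developed separately.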

I got started (maybe) by finding a likely-looking piece of code in the Ruby Cookbook, and tried to modify it to fit my situation:

open('/rhk/work/ask_notes/politics.twk') { |f|
  f.each('\x80\x81\x82\x83') { |record| p record } }

At this point, I’m stuck, and need some clues to move forward. (In addition, I have a few less essential questions, below.)

I think the next step is, within the code block / continuation (is either of those the right name?), to slurp the entire record into a string, prepend the additional lines, do the substitutions, …, and finally write a single record to the new filename.

Main Question:

Am I on the right track, or must I take some different approach to be able to process the content of a single record at a time? I ask because I did a little experiment (possibly a bad one), like this:

rec_num = 0

open('/rhk/work/ask_notes/politics.twk') { |f|
  f.each('\x80\x81\x82\x83') { |record| rec_num = rec_num + 1 } }

p rec_num

It only counts to one, instead of 70 to reflect the 70 records I know are in that particular file (and which are all printed out with the earlier version, which has the line “{ |record| p record }”).
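
One plausible explanation for a count of one, assuming the separator was written in single quotes as it appears in the post (the quote characters shown are typographic, so this cannot be confirmed): in Ruby, `'\x80'` is four ordinary characters, while `"\x80"` is one byte. A separator that never occurs in the file makes `each` yield the whole file as a single record, and `p` on that single string would still display all 70 records. The sketch below uses made-up data and a hypothetical `count_records` helper, with `StringIO` standing in for the real file.

```ruby
require 'stringio'

escaped = "\x80\x81\x82\x83".b  # double quotes: the escapes become four raw bytes
literal = '\x80\x81\x82\x83'    # single quotes: sixteen ordinary characters

# Hypothetical helper: count how many records each(separator) yields.
def count_records(data, separator)
  count = 0
  StringIO.new(data).each(separator) { |_record| count += 1 }
  count
end

# Made-up data: three short records, each followed by the four-byte separator.
data = %w[one two three].map { |body| "#{body}\n".b + escaped }.join

with_escaped = count_records(data, escaped)  # separator is found after each record
with_literal = count_records(data, literal)  # separator never matches
```

Here `with_escaped` is 3 and `with_literal` is 1, which matches the symptom described above.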

Other questions: (I could start a thread for each, but I’ll start this way and split them up if I get either too much or not enough response.)

What is the right name for that construction: is it a continuation, a (code?) block, or something else? (Is it possible that Ruby calls this a code block and some other languages call it a continuation, or is it an example of one kind of continuation available in Ruby?)

What’s the story on white space in that kind of structure? I experimented with trying to format it to make it (possibly) easier to read, something like this:

open('/rhk/work/ascii_notes/politics.twk') {
|f| f.each('\x80\x81\x82\x83') {
|record| p record

     <anticipated location of code to process a single record>

}
}

But any whitespace (i.e., newlines) that I added just caused syntax errors. Is there a way to “prettyformat” that structure?
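
For what it is worth, Ruby calls the `{ |x| ... }` construct a block, and the conventional multi-line layout keeps the block parameters (`|f|`, `|record|`) on the same line as the opening `{` or `do`. Moving them to the following line may be what triggered the syntax errors, though that is a guess. A minimal sketch with made-up data, using `StringIO` and a `'|'` separator as stand-ins for the real file and header:

```ruby
require 'stringio'

seen = []

# do ... end is the usual form for multi-line blocks;
# the parameters stay on the same line as `do`.
StringIO.open("alpha|beta|gamma|") do |f|
  f.each('|') do |record|
    # code to process a single record goes here
    seen << record.chomp('|')
  end
end

# Braces also work across several lines, as long as the parameters
# follow the opening brace directly.
lengths = seen.map { |record|
  record.length
}
```

Blank lines and indentation inside the block body are fine in either form.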

The content of the files I have to convert is actually more like this:

Record header ('\x80\x81\x82\x83')

Record (with blank lines)
(trailing blank line)
Record header ('\x80\x81\x82\x83')

Record (with blank lines)

The Ruby code that I copied from the Ruby Cookbook is aimed more at separating records that end with a record separator (instead of starting with a record header). I can work this way: worst case, I modify every input file to remove the first record header and add a record header at the end of the file, but that’s probably not really necessary.

But it seems like I’m using not quite the right tool. Is there a better approach that more exactly fits the format of my files?
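
One way to handle records that start with a header, without editing the input files, might be to read the whole file as bytes and split on the header, discarding the empty piece that comes before the first header. This is a sketch under assumptions: it assumes the files are small enough to read in one go, and `split_records` and the sample data are hypothetical.

```ruby
# The four header bytes, as a binary string (note the double quotes).
HEADER = "\x80\x81\x82\x83".b

# Hypothetical helper: split raw file content into header-led records.
# With a real file the input could come from File.binread(path).
def split_records(data)
  # .b gives a binary copy so the byte-wise header always compares cleanly.
  data.b.split(HEADER).reject { |piece| piece.strip.empty? }
end

# Made-up sample in the layout described above: header, blank line, record.
sample = [HEADER, "\nfirst record\n\nwith a blank line\n\n",
          HEADER, "\nsecond record\n"].join

records = split_records(sample)
```

Because the header is treated as a delimiter and empty pieces are dropped, it does not matter whether the header comes before or after each record.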

Thanks!
Randy K.

randy_k · January 19, 2007, 4:31pm

On 12.01.2007 15:02, Randy K. wrote:

  * create a unique filename
  * write that (single) record to that file

[…]

(trailing blank line)
Record header (’\x80\x81\x82\x83’)

Record (with blank lines)

You could do:

# create a class for your records or use OpenStruct

YourRecord = Struct.new :name, :length, :foo, :bar do
  def dump()
    File.open(file_name, "w") do |io|
      # whatever
    end
  end
end

current = nil

File.foreach('your file') do |line|
  line.chomp!

  case line
  when /^$/
    # start a new record
    current = YourRecord.new
  when /^$/
    # write the finished record and reset
    current.dump
    current = nil
  when /Record header/
    …
  else
    # ignore or whatever
  end
end
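
A runnable variation on the line-by-line idea above might look like the following. It is only a sketch under assumptions: the `Record` struct, the `split_into_files` helper, the numbered filenames, and the use of the literal text "Record header" as a stand-in for the real four-byte header are all hypothetical.

```ruby
require 'tmpdir'

# Hypothetical record type: collects lines and writes itself to its own file.
Record = Struct.new(:number, :lines) do
  def dump(dir)
    path = File.join(dir, format('record_%03d.txt', number))
    File.write(path, lines.join("\n") + "\n")
    path
  end
end

# Walk the text line by line; a header line closes the previous record
# and opens a new one. Returns the paths of the files written.
def split_into_files(text, dir, header = /Record header/)
  paths = []
  current = nil
  text.each_line do |line|
    line = line.chomp
    if line =~ header
      paths << current.dump(dir) if current  # finish the previous record
      current = Record.new(paths.size + 1, [])
    elsif current
      current.lines << line                  # body line, blank lines included
    end
  end
  paths << current.dump(dir) if current      # finish the last record
  paths
end

# Made-up input in the header-first layout described earlier in the thread.
text  = "Record header\n\nfirst\n\nRecord header\n\nsecond\n"
paths = split_into_files(text, Dir.mktmpdir)
```

Since a new header is what closes the previous record, no separate end-of-record marker is needed; only the last record has to be flushed after the loop.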

Kind regards

robert

randy_k · January 19, 2007, 4:31pm

On Friday 12 January 2007 09:15 am, Robert K. wrote:

On 12.01.2007 15:02, Randy K. wrote:

I’m trying to write essentially what I guess you’d call a filter (or
maybe not quite exactly). It needs to:

You could do:

Thanks–that will get me started!

Randy K.