Replace Text at Specific Positions Across Files

shinyhydra · March 17, 2010, 7:27pm

Hello everyone,

I’m new to Ruby, and after trying to look through a ton of classes and
methods, I decided it would be best to ask some more seasoned
individuals for help. I’m currently working on a project that
essentially deals with a relational DB in text format. There is a
standard layout throughout a variety of text files, and each of them has
corresponding information. For example, if positions 10-17 are populated
with genderM in a file with an .aaa extension, male should be written in
positions 30-34 in a file with a .bbb extension. There can be multiple
lines, each relating to a different object.

After looking through the File and Dir classes, I can’t find an
obvious way to code this. How would I write/overwrite a specific
position in a file with a certain extension based on information from
another file? I know that to read the information I just use readlines
and pick out the field using something similar to textfile1[10,7], and
that I can use File.extname to get the extension, but beyond this I’m
stuck. I apologize for the basic question, but I would greatly
appreciate the help.

Thanks!

shinyhydra · March 18, 2010, 11:18am

2010/3/17 Shiny H. [email protected]:

I’m new to Ruby, and after trying to look through a ton of classes and
methods, I decided it would be best to ask some more seasoned
individuals for help. I’m currently working on a project that
essentially deals with a relational DB in text format. There is a
standard layout throughout a variety of text files, and each of them has
corresponding information. For example, if positions 10-17 are populated
with genderM in a file with an .aaa extension, male should be written in
positions 30-34 in a file with a .bbb extension. There can be multiple
lines, each relating to a different object.

So your file has fixed width records? This is important to know,
otherwise approach 2 from below becomes tricky (you basically need to
read line by line in order to find the proper position whereas you
otherwise can calculate the position via record size).

After looking through the File and Dir classes, I can’t find an
obvious way to code this. How would I write/overwrite a specific
position in a file with a certain extension based on information from
another file? I know that to read the information I just use readlines
and pick out the field using something similar to textfile1[10,7], and
that I can use File.extname to get the extension, but beyond this I’m
stuck. I apologize for the basic question, but I would greatly
appreciate the help.

You have basically two options:

1. do it in memory
2. do it on disk

ad 1: You can read a complete file by using IO#read into a String,
then you manipulate it via String manipulation methods and write it
out. This obviously only works up to a certain file size.

ad 2: Use IO#seek to find the proper write position and use IO#write
to overwrite bytes at this position.
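
The two options above can be sketched roughly as follows. This is an illustration under assumptions, not a Mail.dat tool: RECORD_SIZE, FIELD_OFFSET, FIELD_WIDTH and the method names patch_in_memory and patch_on_disk are made up for the example, and each record is assumed to be 40 bytes including its newline.

```ruby
require "tmpdir"

RECORD_SIZE  = 40  # assumed: 39 data bytes plus "\n" per record
FIELD_OFFSET = 29  # assumed: position 30, counted from 1, is byte index 29
FIELD_WIDTH  = 5   # assumed: the field covers positions 30-34

# Pad or cut a value so that it fills the field exactly.
def fit(value)
  value.ljust(FIELD_WIDTH)[0, FIELD_WIDTH]
end

# Option 1: read the whole file into a String, patch it, write it back.
def patch_in_memory(path, record_index, value)
  data = File.binread(path)
  start = record_index * RECORD_SIZE + FIELD_OFFSET
  data[start, FIELD_WIDTH] = fit(value)
  File.binwrite(path, data)
end

# Option 2: seek to the byte offset and overwrite only those bytes.
def patch_on_disk(path, record_index, value)
  File.open(path, "r+b") do |file|
    file.seek(record_index * RECORD_SIZE + FIELD_OFFSET)
    file.write(fit(value))
  end
end

# Try both on a throwaway file with three blank records.
Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.bbb")
  File.binwrite(path, ("." * 39 + "\n") * 3)
  patch_in_memory(path, 0, "male")
  patch_on_disk(path, 2, "male")
  puts File.read(path)
end
```

Both variants keep the file size unchanged because the replacement is always exactly FIELD_WIDTH bytes. The on-disk variant only works this simply when every record has the same byte length.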

Btw, is there a particular reason why you create what looks like a
relational database based on text files?

Kind regards

robert

shinyhydra · March 18, 2010, 2:52pm

So your file has fixed width records? This is important to know,
otherwise approach 2 from below becomes tricky (you basically need to
read line by line in order to find the proper position whereas you
otherwise can calculate the position via record size).

Yes, the records are all fixed width. The width of the file is based on
the extension type.

Btw, is there a particular reason why you create what looks like a
relational database based on text files?

I’m working off of a standard format that has been used for years
(called Mail.dat). Editing the files has been an extremely time
consuming process, so I’m trying to write an automated script to batch
replace specific parameters. After doing some research, it seemed like
Ruby was a great language to learn for this type of text manipulation
and it turned out to be quite fun to boot.

I’m currently working through the book Beginning Ruby: From Novice to
Professional, but it does not go very in depth on text file manipulation
techniques. I tried looking through the classes and methods online, but
without a strong foundation in the language it’s difficult to navigate
that amount of information. If you could provide any additional
information it would be immensely helpful.

Thanks again!

shinyhydra · March 18, 2010, 5:30pm

2010/3/18 Shiny H. [email protected]:

I’m working off of a standard format that has been used for years
(called Mail.dat). Editing the files has been an extremely time
consuming process, so I’m trying to write an automated script to batch
replace specific parameters. After doing some research, it seemed like
Ruby was a great language to learn for this type of text manipulation
and it turned out to be quite fun to boot.

That’s good! I hope you continue to enjoy your journey.

I’m currently working through the book Beginning Ruby: From Novice to
Professional, but it does not go very in depth on text file manipulation
techniques. I tried looking through the classes and methods online, but
without a strong foundation in the language it’s difficult to navigate
that amount of information. If you could provide any additional
information it would be immensely helpful.

You could start with searching the archives of ruby-talk for “File”
and “seek”. That should give you some bits of code which deal with
file IO different from sequentially reading or writing.

Kind regards

robert

shinyhydra · March 19, 2010, 3:05pm

This got me excited, my file manipulation isn’t very good, so thought
I’d give it a try. Here is what I got file.rb · GitHub

Thank you very much for writing up a sample of how I could begin making
a script to manipulate this data. This provides a fantastic spot to jump
in and begin building this application!

shinyhydra · March 18, 2010, 9:27pm

On Thu, Mar 18, 2010 at 11:29 AM, Robert K.
[email protected] wrote:

This got me excited, my file manipulation isn’t very good, so thought
I’d give it a try. Here is what I got file.rb · GitHub

shinyhydra · March 19, 2010, 4:33pm

2010/3/19 Shiny H. [email protected]:

This got me excited, my file manipulation isn’t very good, so thought
I’d give it a try. Here is what I got file.rb · GitHub

Thank you very much for writing up a sample of how I could begin making
a script to manipulate this data. This provides a fantastic spot to jump
in and begin building this application!

See also
http://groups.google.de/group/comp.lang.ruby/msg/c53f394410a6cff0
which did not (yet) make it into the mailing list.

Kind regards

robert

shinyhydra · March 19, 2010, 5:07pm

On Mar 19, 2010, at 10:31 AM, Robert K. wrote:

which did not (yet) make it into the mailing list.

Yeah, the Gateway is having trouble talking to our Usenet host. I’ve
emailed him about it.

James Edward G. II

shinyhydra · March 19, 2010, 5:10pm

On Fri, Mar 19, 2010 at 10:31 AM, Robert K.
[email protected] wrote:

Hi, Robert

I don’t have the time to go completely through your code. Few remarks
anyway:

That is fine; I appreciate what you did take the time to go through.

It is not clear to me why you have MailRecord composed of Records.
I’d probably rather have picked another name, e.g. MailFile. Class
comments would also help.

That name would probably make sense. I’m not sure what you mean by Class
comments, though.

Your method #validate is invoked on individual fields but you rather
want to check the complete record length.

Well, each line must match the line length, or when we try to pull
specific attributes from that record they will not be correct. If each
line is the correct length, then the record is the correct length.

You can use Struct to easily define Record in fewer lines:

Record = Struct.new(:title, :id, :from, :to, :offset)

That makes a lot of sense, thanks for the tip. I default to OO first
because that is what I am most familiar with, so structs don’t come
quickly to mind. Though I have read your article probably three times
(http://blog.rubybestpractices.com/posts/rklemme/017-Struct.html), it
usually only comes to mind when my structure is very dynamic, i.e.
OpenStruct.

Your MailRecord is a good abstraction of the file storage.

Thank you

I would not store the position of the record in the Record instance
because that way you mix business logic state (record contents) and
storage related state. If you have another storage medium your position
in the Record will be superfluous.

I would not use puts with your fixed size records. Rather I would use
write and read. Plus, I’d place those methods in MailRecord and not in
Record because they are specific to this particular storage medium.

I understand your points here, but I am having difficulty thinking of a
way to implement this. Perhaps the record should have a reference to the
MailRecord (or MailFile, as you suggest), and then tell the file to
write itself? But the position thing still seems to be an issue.

Maybe the problem is that I am considering it to be almost an array of
files based on position in the file, but I should remove index/offset
from consideration and instead consider it as a set, where ordering is
arbitrary and can be altered as necessary to accommodate encapsulated
logic and data integrity.

I’m not really sure what a proper approach would look like here.

Anyway, thanks for taking the time to look and comment

shinyhydra · March 19, 2010, 5:11pm

On Fri, Mar 19, 2010 at 11:07 AM, Josh C. [email protected]
wrote:

Well, each line must match the line length, or when we try to pull specific
attributes from that record they will not be correct. If each line is the
correct length, then the record is the correct length.

Anyway, thanks for taking the time to look and comment

I just realized this is false, because I was using readline. Initially I
was calculating the offset of each attribute, but then I removed that.
So you are correct, I could just validate the total length. It would be
easier and cleaner.

shinyhydra · March 20, 2010, 1:47pm

On 03/18/2010 09:27 PM, Josh C. wrote:

This got me excited, my file manipulation isn’t very good, so thought I’d
give it a try. Here is what I got file.rb · GitHub

I don’t have the time to go completely through your code. A few remarks
anyway:

1. It is not clear to me why you have MailRecord composed of Records.
   I’d probably rather have picked another name, e.g. MailFile. Class
   comments would also help.
2. Your method #validate is invoked on individual fields, but you rather
   want to check the complete record length.
3. You can use Struct to easily define Record in fewer lines:

   Record = Struct.new(:title, :id, :from, :to, :offset)

4. I would not store the position of the record in the Record instance
   because that way you mix business logic state (record contents) and
   storage-related state. If you have another storage medium, your
   position in the Record will be superfluous.
5. Your MailRecord is a good abstraction of the file storage.
6. I would not use puts with your fixed size records. Rather I would use
   write and read. Plus, I’d place those methods in MailRecord and not in
   Record because they are specific to this particular storage medium.
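
To make the remarks about Struct, whole-record validation and keeping the position out of Record a bit more concrete, here is a rough sketch. The LAYOUT table, its offsets and widths, and the helper names parse_record and format_record are invented for illustration; they are not taken from the gist or from the real Mail.dat layout. The :offset member is left out on purpose, following the remark about storage-related state.

```ruby
# Hypothetical layout: field name => [character offset, width].
LAYOUT = {
  title: [0, 10],
  id:    [10, 6],
  from:  [16, 8],
  to:    [24, 8]
}.freeze

RECORD_LENGTH = 32 # sum of the assumed widths above

# Record only holds business data, no file position.
Record = Struct.new(*LAYOUT.keys)

# Validate the complete record length once, then slice out the fields.
def parse_record(raw)
  unless raw.length == RECORD_LENGTH
    raise ArgumentError, "expected #{RECORD_LENGTH} characters, got #{raw.length}"
  end
  Record.new(*LAYOUT.values.map { |offset, width| raw[offset, width].rstrip })
end

# Turn a Record back into a fixed-width string, padding or cutting each field.
def format_record(record)
  LAYOUT.map { |name, (_offset, width)| record[name].to_s.ljust(width)[0, width] }.join
end

raw = "Invoice".ljust(10) + "000042" + "alice".ljust(8) + "bob".ljust(8)
record = parse_record(raw)
record.to = "carol"
puts format_record(record)
```

Because the Record knows nothing about where it lives in a file, the same parse and format helpers would still work if the records came from another storage medium.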

Kind regards

robert

shinyhydra · March 20, 2010, 2:04pm

On 03/19/2010 05:07 PM, Josh C. wrote:

I’d probably rather have picked another name, e.g. MailFile. Class
comments would also help.

That name would probably make sense. I’m not sure what you mean by Class
comments, though.

A comment that describes what the class is about.

Your method #validate is invoked on individual fields but you rather
want to check the complete record length.

Well, each line must match the line length, or when we try to pull specific
attributes from that record they will not be correct. If each line is the
correct length, then the record is the correct length.

So you are saying that all fields in the record have the same length? I
thought LINE_WIDTH would refer to the record’s length.

I’m not really sure what a proper approach would look like here.

Maybe you could have a method in MailFile that writes record n.
Internally it would seek to n * record length and then write the record
passed. Your caching functionality (i.e. keeping read records) could
probably better go into another class - which potentially wraps
MailFile.
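
A minimal sketch of that idea, under assumptions: the class name MailFile comes from the thread, but the constructor arguments and the method names read_record and write_record are made up here, and the record length is taken to include the line terminator. Caching is deliberately left out, since it would live in a separate wrapper class.

```ruby
require "tmpdir"

# Hypothetical wrapper around one fixed-width file.
class MailFile
  def initialize(path, record_length)
    @path = path
    @record_length = record_length # bytes per record, terminator included
  end

  # Read record n by seeking to n * record length.
  def read_record(n)
    File.open(@path, "rb") do |file|
      file.seek(n * @record_length)
      file.read(@record_length)
    end
  end

  # Overwrite record n in place; the new record must have the exact length.
  def write_record(n, raw)
    unless raw.bytesize == @record_length
      raise ArgumentError, "record must be exactly #{@record_length} bytes"
    end
    File.open(@path, "r+b") do |file|
      file.seek(n * @record_length)
      file.write(raw)
    end
  end
end

# Usage on a throwaway file with three 8-byte records.
Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.aaa")
  File.binwrite(path, "aaaaaaa\nbbbbbbb\nccccccc\n")
  mail_file = MailFile.new(path, 8)
  mail_file.write_record(1, "genderM\n")
  puts mail_file.read_record(1)
end
```

Using write and read with explicit lengths, rather than puts and readline, keeps the byte arithmetic exact, which is what the seek calculation depends on.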

Kind regards

robert