On Nov 12, 2012, at 5:21 PM, Ruby S. wrote:
You can make this quite efficient by using the important fact that the
files are already sorted by your key value.
Your problem is a slight variation on the merge phase of an External
Merge Sort.
Let’s name the two files P (as in “previous”) and N (as in “new”).
At each step, you have the next record from P and from N.
One of three things will be true:

1. The keys match (Pkey == Nkey)
   Compare the rest of the two records and
     a) write the update if different
     b) do nothing if all are the same
   Read the next P and the next N.

2. Pkey < Nkey
   Pkey was missing from N.
   Delete the record P (update to nothing?) if deletions can happen.
   Read the next P (only; use the same N for the next iteration).

3. Pkey > Nkey
   Nkey was missing from P.
   Insert a new record N.
   Read the next N (only; use the same P for the next iteration).

If the end of file is reached for only one of the files, treat that
file's key as always greater than any key from the file that is not yet
at its end.
When both ends-of-file have been reached, you are done.
Close the files.
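
For what it's worth, the loop described above could be sketched roughly
like this in Python. This is only an illustration under some assumptions
of mine: each record is a (key, rest) pair, keys are unique within each
file, and both inputs are sorted by key in the same order that < uses.
The name diff_sorted and the "update"/"delete"/"insert" labels are made
up for the example.

```python
def diff_sorted(prev_records, new_records):
    """Yield (action, key, rest) changes that turn P into N.

    Assumes both inputs are iterables of (key, rest) pairs, sorted by
    key, with no duplicate keys inside either input.
    """
    p_iter, n_iter = iter(prev_records), iter(new_records)
    p = next(p_iter, None)  # None marks end of file for P
    n = next(n_iter, None)  # None marks end of file for N

    while p is not None or n is not None:
        if n is None or (p is not None and p[0] < n[0]):
            # Pkey < Nkey (or N is exhausted): Pkey is missing from N.
            yield ("delete", p[0], p[1])
            p = next(p_iter, None)
        elif p is None or p[0] > n[0]:
            # Pkey > Nkey (or P is exhausted): Nkey is missing from P.
            yield ("insert", n[0], n[1])
            n = next(n_iter, None)
        else:
            # Keys match: emit an update only if the rest differs.
            if p[1] != n[1]:
                yield ("update", n[0], n[1])
            p = next(p_iter, None)
            n = next(n_iter, None)


if __name__ == "__main__":
    P = [(1, "a"), (2, "b"), (4, "d")]
    N = [(1, "a"), (2, "B"), (3, "c")]
    for change in diff_sorted(P, N):
        print(change)
```

Treating an exhausted file as "always greater" shows up here as the
None checks in the first two branches, so no sentinel key value is
needed. Whether you actually act on the "delete" results depends on
whether deletions can happen in your data.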
Of course, this depends on being able to read the files one record at a
time (or having a buffer to do so). Since it seems like these might be
line-oriented files, I suspect you’re OK here.
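
If the files really are line-oriented, reading one record at a time
might look something like the Python sketch below. I'm guessing at the
layout here: it assumes one record per line with the key as the first
field, separated from the rest by a comma. The name read_records and
the sep parameter are just for illustration.

```python
import io


def read_records(lines, sep=","):
    """Lazily yield (key, rest) pairs, one per non-empty line.

    Assumes the key is everything before the first separator. Only the
    current line is held in memory, so large files are fine.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, rest = line.partition(sep)
        yield key, rest


if __name__ == "__main__":
    # An in-memory stand-in for an open file; a real file object from
    # open(...) can be passed the same way.
    sample = io.StringIO("a,1\nb,2\n")
    for record in read_records(sample):
        print(record)
```

One thing to watch: the keys come back as strings, so they compare
lexicographically. That has to agree with how the files were sorted; if
they were sorted numerically, convert the key before comparing.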