Design question - mmap, insert/delete editing on huge binary

kelleysislander · March 4, 2007, 6:42pm

Hi,

I’d like to create Ruby bindings for a library that may or
may not exist yet.

What I’d like to be able to do, is make an arbitrary number
of insert/delete (cut/paste) style edits on huge binary
files (> 4 GB). There should be minimal overhead for
initially “opening” the file, and the inserts and
deletes should be very fast, until the changes are finally
committed.

(Further, on commit, if the edits to the file did not cause
existing data to be shifted to different offsets–i.e. the
edits changed some existing bytes but did not shift the
file contents, then the commit should be optimized by just
patching/updating the changed parts of the file, rather than
rewriting the whole file.)

If anyone knows of an open source (non-GPL) library that
already does this sort of thing, that would be fantastic.

If not, here’s how I’m thinking of implementing it. If
anyone has thoughts on better ways to implement it, feedback
is welcome.

For the sake of illustration, let’s consider a Hex editor
implemented on top of this library, performing a very
simple insert operation:

We have a 10 GB file, and we want to run the Hex editor
and jump to about 5 GB into the file, and insert a byte
(shifting the remaining 5GB up by one byte.)

From the user’s point of view, there should be no delay
when opening the file, jumping ahead 5GB and inserting
the byte. The user should see a view representing the
edited data, and there should be no delays until the edits
are actually committed (Save key pressed.)

What I figure on doing is mmap’ing a swap file, and
accessing the swap file in getpagesize()-sized blocks.

I have four kinds of block types in mind for the pagefile:

directory block (the zeroth block, contains offset-ptrs
to the first data block, and first free block)
free block (points to next free block if any)
virtual data block (represents arbitrary-length span
of data in original file)
edit data block (holds literal data pasted by the user)

Initially, the page file would consist of the directory
block, and a single virtual data block representing the
entirety of the original file.

As edits are made, the virtual data block will be split,
and a linked list of virtual- and edit- data blocks will
be formed, which if walked linearly, represent the edited
version of the file.

In our hex-editing example with the insertion of one byte
midway through the 10GB file, we’d see something like this:
(Note: 10GB == 0x280000000)

Initially,

blk#
0 [dir] [<0001><0000>______________________]
1 [virt] [<0000><0000>(000000000)(27fffffff)]

The values represent block offsets within the pagefile.
(Not sure if they will be 32- or 64- bit quantities.)

In the case of the [dir] block, the <0001> points to the first
in a linked list of data blocks, and the <0000> would point
to the first in a linked list of free blocks, but there
aren’t any free blocks initially.

In the case of the [virt] block, the <0000><0000> are next/prev
pointers in a doubly linked list of virtual- and edit- blocks.

The (mmmmmmmmmmm) values in the [virt] block represent 64-bit
offsets into the original file. (The remainder of the virtual
block is unused.)

So our page file starts off simply representing the entirety
of the original file.

When our hex-editing user inserts a byte at the 5GB point,
our virtual block will be split, and an edit block containing
the literal byte from the user will be inserted, as follows:
(Note: the user inserted a byte with value 42, ASCII ‘*’.)

blk#
0 [dir] [<0001><0000>]
1 [virt] [<0002><0000>(000000000)(13fffffff)]
2 [edit] [<0003><0001>(001)*]
3 [virt] [<0000><0002>(140000000)(27fffffff)______]

If we walk the list of data blocks linearly, we see the
first 5GB of the original file, the single byte inserted
by the user, and finally the remaining 5GB of original
file.

Since edit blocks hold literal data, they can hold up
to (getpagesize() - n) bytes, where n is the overhead
for the linked list ptrs and length count.

Anyway: it seems to me this structure should allow me to
handle arbitrary inserts/deletes with good speed. And with
a little logic to handle coalescing neighboring edit blocks
we should be able to avoid ending up with a bunch of
partially-full edit blocks linked together wasting space.

Anyway–if you made it this far, thanks very much for
your interest!

If anyone has thoughts about weak points of this design or
how to improve it, your feedback is most welcome.

This will be my first project actually using mmap, even
though I’ve wanted to have a reason to use it for a
decade or two now.

One concern: I hope that mmap is pretty fast, as my naive
approach would be to change which block in the page file
I’m “looking at” very often.

Regards,

Bill

kelleysislander · March 4, 2007, 7:12pm

On 3/4/07, Bill K. [email protected] wrote:

committed.
This is a very interesting read on the topic:

http://www.chiark.greenend.org.uk/~sgtatham/tweak/btree.html

martin

kelleysislander · March 4, 2007, 7:24pm

Bill K. wrote:

If anyone has thoughts about weak points of this design or
how to improve it, your feedback is most welcome.
Pardon my ignorance in this area…

It seems like this will cause quite a bit of data fragmentation
quickly (slowing down reads).
Once you’ve made the edit in swap space, you still have to persist
the binary to the filesystem, no? And that’s fixed block size, correct?

Devin

kelleysislander · March 5, 2007, 12:04am

From: “Martin DeMello” [email protected]

An Efficient Data Structure For A Hex Editor

Wow, and source code to boot!

Thanks, interesting indeed.

I’m not sure the O(N/K) linear scan problem he’s solving with
trees would be a problem with my proposal in practicality, for
a hex editor, anyway.
(I wouldn’t end up with a ton of nodes in the swap file unless
the user performed a ton of sporradic edits all over the file;
and even then, access to the data isn’t really very random
unless the user is winging the scrollbar all over the place.)

But I must say I expected his code to look a lot more
complicated after reading his paper. His implementation looks
very clean considering what it’s doing.

Thanks for the link! In the words of Bomb#20, “I must think
on this further…”

Regards,

Bill

kelleysislander · March 4, 2007, 8:26pm

On Mon, 5 Mar 2007, Bill K. wrote:

committed.

i do exactly this using guy’s ruby mmap bindings all the time, check out
the
api - it might already do much of what you want to do. one issue you
may have
is addressing - since c ints map to ruby fixnums i’m not sure you’ll be
able
to map files quite this big with the pure ruby impl (not that it
couldn’t be
fixed and guy may already have fixed it). anyhow, you could almost
certainly
overcome this by mapping using length offset. i otherwords, map 4
chunks of
1gb.

http://moulon.inra.fr/ruby/mmap.html

btw. it’s fast, look at the last example here

http://codeforpeople.com/lib/ruby/nmap/nmap-0.1.0/README

regards.

-a

kelleysislander · March 5, 2007, 1:00am

From: [email protected]

is addressing - since c ints map to ruby fixnums i’m not sure you’ll be able
to map files quite this big with the pure ruby impl (not that it couldn’t be
fixed and guy may already have fixed it). anyhow, you could almost certainly
overcome this by mapping using length offset. i otherwords, map 4 chunks of
1gb.

http://moulon.inra.fr/ruby/mmap.html

Thanks! Looking at the source, I think it may need to be
using mmap64 to handle specifying offsets in files > 4GB,
although I’m not positive. (I don’t need to map a huge
amount of the file at once, just to specify huge offsets.)

Also I’d need to get it working on Windows too (I wonder
if guy would even accept a patch for that, .)

btw. it’s fast, look at the last example here

http://codeforpeople.com/lib/ruby/nmap/nmap-0.1.0/README

Nice…!

Regards,

Bill

kelleysislander · March 5, 2007, 3:31am

On Mon, 5 Mar 2007, Bill K. wrote:

the
1gb.

http://moulon.inra.fr/ruby/mmap.html

Thanks! Looking at the source, I think it may need to be
using mmap64 to handle specifying offsets in files > 4GB,
although I’m not positive. (I don’t need to map a huge
amount of the file at once, just to specify huge offsets.)

Also I’d need to get it working on Windows too (I wonder
if guy would even accept a patch for that, .)

you can wrap it and the windows version that daniel berger did - it’s in
rubyforge.

cheers.

-a

kelleysislander · March 5, 2007, 12:29am

Hi,

From: “Devin M.” [email protected]

Bill K. wrote:

If anyone has thoughts about weak points of this design or
how to improve it, your feedback is most welcome.
Pardon my ignorance in this area…

It seems like this will cause quite a bit of data fragmentation
quickly (slowing down reads).

Hmm, I wonder. On the plus side, any fragmentation in the swap
file would disappear whenever the user hit the Save button. (Because
the swap file would revert its original state of simply referencing
the entirety of the saved file.) Also, I’d be presuming the OS’
Virtual Memory Manager would be keeping the blocks in the swap file
cached to a degree… But thanks for pointing that out. I don’t
know how to guess how bad it would be. For an application like an
editor, I’d think it might not be noticable.

Once you’ve made the edit in swap space, you still have to persist
the binary to the filesystem, no? And that’s fixed block size, correct?

The user would be able to make an arbitrary number of edits, but
we would have to eventually commit when they hit the Save button.

So the Save would be slow, but I don’t know how to avoid that.
At least it happens at a defined point, and the user can go get
a cup of coffee or something.

Regards,

Bill

kelleysislander · March 5, 2007, 3:57am

From: [email protected]

you can wrap it and the windows version that daniel berger did - it’s in
rubyforge.

Awesome. That Daniel B. fellow is on the ball.

Regards,

Bill