Efficient access to a large int32 array


#1

I have a file that is a dump of a large array of 32-bit unsigned ints
(bitvectors actually). All I need is to load this into memory and have
fast, efficient access to the array members, and a few bitmasking
operations - the array will never be written to, and ideally should
not need to be bounds-checked. Should I be looking at something like
mmap or narray to do this, or would I be better served by a custom C
extension?

martin


#2

On 16.01.2007 15:10, Martin DeMello wrote:

I have a file that is a dump of a large array of 32-bit unsigned ints
(bitvectors actually). All I need is to load this into memory and have
fast, efficient access to the array members, and a few bitmasking
operations - the array will never be written to, and ideally should
not need to be bounds-checked. Should I be looking at something like
mmap or narray to do this, or would I be better served by a custom C
extension?

How large is “large”?

robert


#3

Martin DeMello wrote:

I have a file that is a dump of a large array of 32-bit unsigned ints
(bitvectors actually). All I need is to load this into memory and have
fast, efficient access to the array members, and a few bitmasking
operations - the array will never be written to, and ideally should
not need to be bounds-checked. Should I be looking at something like
mmap or narray to do this, or would I be better served by a custom C
extension?

I would write my own extension and use mmap (if you’re on Linux) to
save trouble and memory. That’s just a question of a few hours, and
you’ll get high performance.

Cheers,

Vince, in a posting mood today…


#4

On 1/16/07, Robert K. wrote:

On 16.01.2007 15:10, Martin DeMello wrote:

I have a file that is a dump of a large array of 32-bit unsigned ints
(bitvectors actually). All I need is to load this into memory and have
fast, efficient access to the array members, and a few bitmasking
operations - the array will never be written to, and ideally should
not need to be bounds-checked. Should I be looking at something like
mmap or narray to do this, or would I be better served by a custom C
extension?

How large is “large”?

Order of 1 MB - not gigantic, but not trivial either.

martin


#5

On 16.01.2007 15:29, Martin DeMello wrote:

How large is “large”?

Order of 1 MB - not gigantic, but not trivial either.

Um, I would not say 1MB is large. What accesses do you do? Do you just
access based on index? Do you often need to load this data? etc.

In your case I’d start out writing a Ruby version (either using a single
String or an Array with integers) and change that into an extension if
performance is insufficient. My 0.02 EUR…
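
As a rough sketch of the two pure-Ruby options mentioned here (an Array of integers versus a single String), something like the following might be a starting point. The names `load_words` and `PackedWords` are made up for illustration, and the `"V"` format assumes the dump is little-endian; `"N"` would be the big-endian equivalent and `"L"` the machine's native order. The sample file is generated only so the sketch runs on its own.

```ruby
require "tempfile"

# Hypothetical helper: read the whole dump and decode it into an Array of Integers.
# "V*" assumes little-endian unsigned 32-bit values (an assumption about the file).
def load_words(path)
  File.binread(path).unpack("V*")
end

# Hypothetical alternative: keep the raw bytes in one frozen String and
# decode a single 32-bit word on each access.
class PackedWords
  def initialize(path)
    @bytes = File.binread(path).freeze
  end

  # Number of 32-bit words in the dump.
  def size
    @bytes.bytesize / 4
  end

  # Decode the word at +index+ (no explicit bounds check is performed).
  def [](index)
    @bytes.byteslice(index * 4, 4).unpack1("V")
  end
end

# Build a small sample dump so the sketch can be run as-is.
sample = Tempfile.new("words")
sample.binmode
sample.write([0x0000_0001, 0xFFFF_FFFF, 0x8000_0000].pack("V*"))
sample.close

words  = load_words(sample.path)
packed = PackedWords.new(sample.path)
```

The Array version pays the decoding cost once at load time and then indexes directly; the String version keeps memory close to the file size but unpacks on every access.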

Kind regards

robert


#6

“Martin DeMello” writes:

I have a file that is a dump of a large array of 32-bit unsigned ints
(bitvectors actually). All I need is to load this into memory and have
fast, efficient access to the array members, and a few bitmasking
operations - the array will never be written to, and ideally should
not need to be bounds-checked. Should I be looking at something like
mmap or narray to do this, or would I be better served by a custom C
extension?

I think Ara has been hacking some package to use NArray on mmap’ed
files, which works fast and is persistent. Search the archives.


#7

On 1/16/07, Christian N. wrote:

I think Ara has been hacking some package to use Narray on mmap’ed
files, which works fast and is persistent. Search the archives.

http://codeforpeople.com/lib/ruby/nmap/

You’ll need to install Guy’s mmap package …

http://moulon.inra.fr/ruby/mmap.html

Blessings,
TwP


#8

On 1/16/07, Robert K. wrote:

Um, I would not say 1MB is large. What accesses do you do? Do you just
access based on index? Do you often need to load this data? etc.

Pure array indexing, just a lot of it.

In your case I’d start out writing a Ruby version (either using a single
String or an Array with integers) and change that into an extension if
performance is insufficient. My 0.02 EUR…

Good point - it’d be the quickest thing to get up and running, if
nothing else :)
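
One way to judge whether the plain Ruby version is fast enough would be to time a lot of index-and-mask operations before reaching for an extension. The sketch below uses synthetic data in place of the real dump (262,144 words is 1 MB of 32-bit values), and `bit_set?` and `count_matching` are hypothetical helpers, not part of any library.

```ruby
require "benchmark"

# Synthetic stand-in for the dump: 262,144 words is 1 MB of 32-bit values.
WORD_COUNT = 262_144
words = Array.new(WORD_COUNT) { |i| (i * 2_654_435_761) & 0xFFFF_FFFF }.freeze

# Hypothetical helper: is bit +bit+ (0 = least significant) set in word +index+?
def bit_set?(words, index, bit)
  words[index][bit] == 1
end

# Hypothetical helper: count the words that have every bit of +mask+ set.
def count_matching(words, mask)
  words.count { |w| (w & mask) == mask }
end

# Time 20 full passes of a simple mask test over the whole array.
hits = nil
elapsed = Benchmark.realtime do
  20.times { hits = count_matching(words, 0x0000_00FF) }
end

puts format("20 passes over %d words took %.3fs (%d hits per pass)",
            WORD_COUNT, elapsed, hits)
```

If the timings for the real access pattern look acceptable, the Ruby version may be all that is needed; if not, the same helpers mark the spots worth moving into C.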

martin