Forum: Ruby efficient access to a large int32 array

Martin DeMello (Guest)
on 2007-01-16 15:11
(Received via mailing list)
I have a file that is a dump of a large array of 32-bit unsigned ints
(bitvectors actually). All I need is to load this into memory and have
fast, efficient access to the array members, and a few bitmasking
operations - the array will never be written to, and ideally should
not need to be bounds-checked. Should I be looking at something like
mmap or narray to do this, or would I be better served by a custom C
extension?

martin
Robert Klemme (Guest)
on 2007-01-16 15:16
(Received via mailing list)
On 16.01.2007 15:10, Martin DeMello wrote:
> I have a file that is a dump of a large array of 32-bit unsigned ints
> (bitvectors actually). All I need is to load this into memory and have
> fast, efficient access to the array members, and a few bitmasking
> operations - the array will never be written to, and ideally should
> not need to be bounds-checked. Should I be looking at something like
> mmap or narray to do this, or would I be better served by a custom C
> extension?

How large is "large"?

  robert
Vincent Fourmond (Guest)
on 2007-01-16 15:17
(Received via mailing list)
Martin DeMello wrote:
> I have a file that is a dump of a large array of 32-bit unsigned ints
> (bitvectors actually). All I need is to load this into memory and have
> fast, efficient access to the array members, and a few bitmasking
> operations - the array will never be written to, and ideally should
> not need to be bounds-checked. Should I be looking at something like
> mmap or narray to do this, or would I be better served by a custom C
> extension?

  I would write my own extension and use mmap (if you're on Linux) to
save trouble and memory. That's just a question of a few hours, and
you'll get high performance.

  Cheers,

  Vince, in a posting mood today...
Martin DeMello (Guest)
on 2007-01-16 15:30
(Received via mailing list)
On 1/16/07, Robert Klemme <shortcutter@googlemail.com> wrote:
> On 16.01.2007 15:10, Martin DeMello wrote:
> > I have a file that is a dump of a large array of 32-bit unsigned ints
> > (bitvectors actually). All I need is to load this into memory and have
> > fast, efficient access to the array members, and a few bitmasking
> > operations - the array will never be written to, and ideally should
> > not need to be bounds-checked. Should I be looking at something like
> > mmap or narray to do this, or would I be better served by a custom C
> > extension?
>
> How large is "large"?

Order of 1 MB - not gigantic, but not trivial either.

martin
Robert Klemme (Guest)
on 2007-01-16 16:16
(Received via mailing list)
On 16.01.2007 15:29, Martin DeMello wrote:
>> How large is "large"?
>
> Order of 1 MB - not gigantic, but not trivial either.

Um, I would not say 1MB is large.  What accesses do you do?  Do you just
access based on index?  Do you often need to load this data? etc.

In your case I'd start out writing a Ruby version (either using a single
String or an Array with integers) and change that into an extension if
performance is insufficient.  My 0.02 EUR...
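
A minimal sketch of the two pure-Ruby variants mentioned above (an Array of
integers versus a single String), not a definitive implementation. It assumes
the dump is little-endian unsigned 32-bit words (the "V" pack directive); the
thread does not state the byte order. The names load_words, PackedWords and
bit_set? are hypothetical, and unpack1 needs Ruby 2.4 or later.

```ruby
# Assumption: the file is a raw dump of little-endian uint32 values.
WORD_SIZE = 4

# Array variant: decode every word up front into Ruby Integers.
# Fast to index afterwards, but uses more memory than the raw bytes.
def load_words(path)
  File.binread(path).unpack("V*")
end

# String variant: keep the raw bytes and decode one word per lookup.
# Compact, at the cost of a small decode on each access.
class PackedWords
  def initialize(bytes)
    @bytes = bytes.b.freeze # binary copy, read-only
  end

  # Number of 32-bit words held.
  def size
    @bytes.bytesize / WORD_SIZE
  end

  # Returns the word at index as an Integer in 0..0xFFFFFFFF.
  # An out-of-range index raises, since byteslice returns nil there.
  def [](index)
    @bytes.byteslice(index * WORD_SIZE, WORD_SIZE).unpack1("V")
  end

  # True if bit number `bit` (0 = least significant) is set in the word.
  def bit_set?(index, bit)
    self[index][bit] == 1
  end
end
```

Usage would look like words = PackedWords.new(File.binread(path)) followed by
words[i] & mask, where path and mask are whatever the application supplies.
Timing both variants on the real access pattern is the simplest way to decide
whether an extension is needed at all.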

Kind regards

  robert
Martin DeMello (Guest)
on 2007-01-16 16:22
(Received via mailing list)
On 1/16/07, Robert Klemme <shortcutter@googlemail.com> wrote:
>
> Um, I would not say 1MB is large.  What accesses do you do?  Do you just
> access based on index?  Do you often need to load this data? etc.

Pure array indexing, just a lot of it.

> In your case I'd start out writing a Ruby version (either using a single
> String or an Array with integers) and change that into an extension if
> performance is insufficient.  My 0.02 EUR...

Good point - it'd be the quickest thing to get up and running, if
nothing else :)

martin
Christian Neukirchen (Guest)
on 2007-01-16 20:25
(Received via mailing list)
"Martin DeMello" <martindemello@gmail.com> writes:

> I have a file that is a dump of a large array of 32-bit unsigned ints
> (bitvectors actually). All I need is to load this into memory and have
> fast, efficient access to the array members, and a few bitmasking
> operations - the array will never be written to, and ideally should
> not need to be bounds-checked. Should I be looking at something like
> mmap or narray to do this, or would I be better served by a custom C
> extension?

I think Ara has been hacking some package to use Narray on mmap'ed
files, which works fast and is persistent.  Search the archives.
Tim Pease (Guest)
on 2007-01-16 20:52
(Received via mailing list)
On 1/16/07, Christian Neukirchen <chneukirchen@gmail.com> wrote:
> I think Ara has been hacking some package to use Narray on mmap'ed
> files, which works fast and is persistent.  Search the archives.
>

http://codeforpeople.com/lib/ruby/nmap/

You'll need to install Guy's mmap package ...

http://moulon.inra.fr/ruby/mmap.html

Blessings,
TwP