Good or best way to allocate a large array

Robert,

Tuesday, November 10, 2009, 1:23:48 AM, you wrote:

RK> Please do not top post.

Yup, you are right. The culture here is to do it this way.

Thanks for pointing it out.

While I’m at it, does this list like/dislike HTML-based posts?

RK> 2009/11/9 Ralph S. [email protected]:

Doesn’t that just move things into a BiNum … which is not native 64?

RK> Well, define “native”. I chose to interpret it as “does Ruby come
RK> with built in, C implemented support for 64 integers” - which it
does.
RK> :slight_smile:

There is a footnote on page 829 of the 1.9 Pickaxe book that reads:

“Fixnums are stored as 31-bit numbers … Or 63 bit on wider CPU
architectures.”

Is there any way to tell what architecture Ruby thinks I’m using?

RK> Cheers

RK> robert

What’s about using C or C++ to allocate and deallocate the array? I do
not
think Ruby was made to create such Arrays.

Jonathan Schmidt-Dominé - Developer wrote:

What’s about using C or C++ to allocate and deallocate the array?

Unnecessary. Just use a database.

I do
not
think Ruby was made to create such Arrays.

What gives you that idea?

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Nov 10, 2009, at 10:39 AM, Robert K. wrote:

Kind regards

robert

It was not meant as a complaint, more as an explanation why the behavior
in my first mail happens (~~40% performance gain by switching off the GC
before allocating the array). It is just a sample where the heuristic
applied by the GC somewhat fails.

The GC behavior is reasonable, but not efficient in that case. In most
cases,
this one second does not really matter, so I would recommend against
tampering with it. Thats why I wrote that you should handle this with
care.

It is, in the end, a nice example illustrating how the GC behaves and
that’s what I used it for.

Regards,
Florian

On Nov 10, 2009, at 4:25 PM, Jonathan Schmidt-Dominé - Developer wrote:

What’s about using C or C++ to allocate and deallocate the array? I
do not
think Ruby was made to create such Arrays.

All basic Methods in Array (especially #new) are implemented in C, so
I don’t think you will gain a lot. Implementing the whole algorithm in
C might be worthwhile, because you would bypass using a Ruby object for
every slot, but the Array is not the issue here.

As others have shown, it works reasonably quick.

Regards,
Florian

Is there any way to tell what architecture Ruby thinks I’m using?

RK> Cheers

RK> robert

Check the size of an integer.

$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
$ irb
irb(main):001:0> 1.size
=> 8

$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
$ irb
irb(main):001:0> 1.size
=> 4

-Justin

On Nov 10, 2009, at 4:50 PM, Marnen Laibow-Koser wrote:

Jonathan Schmidt-Dominé - Developer wrote:

What’s about using C or C++ to allocate and deallocate the array?

Unnecessary. Just use a database.

Why are you so insisting on a database? In contrast to your opinion,
there are valid cases where an in-memory array is just what you want.

Varnish for example is an interesting example that uses data structures
in that magnitude.

Regards,
Florian G.

Hi!

I thaught about allocation and deallocation from C. And I think there
you have
more control over GC and the heap.

The User

Jonathan,

JSDD> What’s about using C or C++ to allocate and deallocate the array?
I do not
JSDD> think Ruby was made to create such Arrays.

Yes, I am beginning to come to such a conclusion. Now I have to
figure out how to interface the two. The Ruby book is good … but a
Windows-based example of adding two numbers together and returning the
result would be even better. :slight_smile:

Justin,

Is there any way to tell what architecture Ruby thinks I’m using?

RK> Cheers

RK> robert

JC> Check the size of an integer.

JC> $ ruby -v
JC> ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
JC> $ irb
irb(main):001:0>> 1.size
=>> 8

JC> $ ruby -v
JC> ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
JC> $ irb
irb(main):001:0>> 1.size
=>> 4

What a lovely and elegant solution!

Thanks!

Ralph

Florian G. wrote:

On Nov 10, 2009, at 4:50 PM, Marnen Laibow-Koser wrote:

Jonathan Schmidt-Domin� - Developer wrote:

What’s about using C or C++ to allocate and deallocate the array?

Unnecessary. Just use a database.

Why are you so insisting on a database?

Because almost any array that large is better dealt with through a DB
interface.

In contrast to your opinion,
there are valid cases where an in-memory array is just what you want.

How about an in-memory DB? SQLite can function that way.

Varnish for example is an interesting example that uses data structures
in that magnitude.

Varnish is a caching package, not what most people will want to
implement. Besides, if the OP were implementing something like Varnish,
I suspect he’d be knowledgeable enough that he wouldn’t have asked the
question in the first place.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Regards,
Florian G.

On Thu, Nov 12, 2009 at 10:05 AM, Simon K. [email protected] wrote:

In my understanding these are orthogonal concepts. How to find out what
to collect, and how to do the collection.

Mark and sweep is a technique to find collectable objects just like
reference counting. And a copying gc (like a generational) is a
way of collecting these objects, more efficient than calling free().

No, a copying GC doesn’t substitute for the sweep phase of mark and
sweep.

Instead it works by tracking references to new objects from old
objects, and copying live objects to a new generation space, leaving
new objects which are not reachable behind. After an object has
survived a number of such scavenging cycles it gets moved to old space
which is then usually gc’d using another method such as mark and
sweep. A copying GC does a bit of marking via a write barrier which
notes when a reference to a ‘young’ object has been stored in an ‘old’
object, and combines the traversal of the young object space with
collection by moving surviving objects.

It’s efficient because the overhead of eliminating young dead objects
is quite a bit lower, and typically there is a lot of infant mortality
in a highly object oriented system.

Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

Mark/Sweep is a fairly primitive GC technique, it was probably the
second technique applied in the long history of GC. There are more
recent techniques such as generation scavenging which makes use of the
observance that most objects in a uniformly object-oriented language
like Ruby either live a very short, or a reasonably long life time
with the preponderance being short-lived. Generation scavenging
collects the short-lived objects very efficiently, and typically uses
mark-sweep less frequently to clean up the older ones.

In my understanding these are orthogonal concepts. How to find out what
to collect, and how to do the collection.

Mark and sweep is a technique to find collectable objects just like
reference counting. And a copying gc (like a generational) is a
way of collecting these objects, more efficient than calling free().

mfg, simon … l

On 11/9/09, Ralph S. [email protected] wrote:

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

I’m surprised that no-one has yet mentioned narray; I’d think it’s the
ideal data structure for this kind of problem. (I’ve never used it,
tho.)

I’d like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger…
something like 32 bytes or so, even if it’s only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.

On 12.11.2009 21:24, Caleb C. wrote:

tell Ruby, “OK, I’m done with this array.”

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

I’m surprised that no-one has yet mentioned narray; I’d think it’s the
ideal data structure for this kind of problem. (I’ve never used it,
tho.)

Actually everybody waited for you to finally come up with your reply.
Shame on you that it took you so long. :wink:

I’d like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger…
something like 32 bytes or so, even if it’s only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.

Certainly. However, if Bignums work as well, why worry to use another
lib? Whether the allocation overhead is OK or not depends of course on
the use case. Btw, did we see a more concrete description of the
problem? I cannot remember having seen it.

Kind regards

robert

CC> On 11/9/09, Ralph S. [email protected] wrote:

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

CC> I’m surprised that no-one has yet mentioned narray; I’d think it’s
the
CC> ideal data structure for this kind of problem. (I’ve never used it,
CC> tho.)

CC> I’d like to point out that while the memory cost of a Fixnum is
small
CC> (4 or 8 bytes), the memory cost of a Bignum is considerably
larger…
CC> something like 32 bytes or so, even if it’s only storing a 64bit
CC> value. If you expect a significant number of Bignums to be stored in
CC> this array, you should be aware of this. Yet another reason to
CC> consider narray, to my mind.

Could someone point me at documentation on narray, please.

On 11/13/09, Ralph S. [email protected] wrote:

Could someone point me at documentation on narray, please.

http://narray.rubyforge.org/

Now that I’m looking at it, I see that it says:

Supported element types are 1/2/4-byte Integer

No 8-byte integers… maybe you could use two parallel arrays, one for
low half and one for high half of your data. Depends on what you’re
using this for.

Friday, November 13, 2009, 9:45:10 AM, you wrote:

RK> On 12.11.2009 21:24, Caleb C. wrote:

tell Ruby, “OK, I’m done with this array.”

Can one create such an array such that the garbage collector can
free the memory in one fast operation?

I’m surprised that no-one has yet mentioned narray; I’d think it’s the
ideal data structure for this kind of problem. (I’ve never used it,
tho.)

RK> Actually everybody waited for you to finally come up with your
reply.
RK> Shame on you that it took you so long. :wink:

I’d like to point out that while the memory cost of a Fixnum is small
(4 or 8 bytes), the memory cost of a Bignum is considerably larger…
something like 32 bytes or so, even if it’s only storing a 64bit
value. If you expect a significant number of Bignums to be stored in
this array, you should be aware of this. Yet another reason to
consider narray, to my mind.

RK> Certainly. However, if Bignums work as well, why worry to use
another
RK> lib? Whether the allocation overhead is OK or not depends of course
on
RK> the use case. Btw, did we see a more concrete description of the
RK> problem? I cannot remember having seen it.

I am the OP and there really isn’t much more to the problem.

Basically, each row of the array is a list of integers. The first
element of the array is an id. The remaining elements of the array are
a list of ids.

Thus, if
MyMatrix = [ [100, [200, 300, 400]],
[200, [300, 400, 500, 600]],
[300, [400, 600, 555, 1902, 1708]],
[400, 101, 200, 100] ]

If I am processing the last vector (i.e. [400, 101, 200, 100]) then I
want to be able to note that

101 has no corresponding element
200 was found and its remaining content is [300, 400, 500, 600]
100 was found and its remaining content is [200, 300, 400]

Speed is critical. The array is more-or-less static. That is, access
to this array swamps any changes to the array.

The order of the rows is irrelevant … and for speed reasons the
Matrix will be sorted by the first column so that I can do a binary
search.

Of course, I could use a hash … but I doubt that a hash would beat
my binary search … I could be wrong about this.

RK> Kind regards

RK> robert

Ralph S. wrote:
[…]

Could someone point me at documentation on narray, please.

You could have found it in about 5 seconds with Google. Please try that
before posting in future.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

MLK> Ralph S. wrote:
MLK> […]

Could someone point me at documentation on narray, please.

MLK> You could have found it in about 5 seconds with Google. Please try
that
MLK> before posting in future.

I am new to Ruby.

There are/were indications that this is an incmoplete project.

Indeed, as a newbie, I attempted to install/expand the tar file and
could not get the installation to work.

Hence, my request which I had hoped would point me at the most recent
information.