Robert K. wrote in post #1131967:
On Wed, Jan 1, 2014 at 10:46 PM, Jabari Z. [email protected] wrote:
Ideally I want to store the boolean (True/False) data in arrays that
take the least amount of memory to enable me to run out of the cpu cache
as much as possible and to allow the algorithm to search over larger
numbers.
What algorithm is that?
Currently I’m memory limited on the size of numbers I can process.
On a 32/64 bit cpu a boolean would take 4/8 bytes of memory.
Even if I could do just byte addressing I could compute up to 4/8 times
larger numbers, and with bit arrays 32/64 times larger number within the
same address space, which I could fit in cache memory.
This is done all the time with languages like C/++, et al, which allow
you to control hardware at the bit level of registers and memory.
Well, if your performance needs are that pressing then Ruby may not
be the proper tool.
If you chunk your sizes correctly, the performance/memory advantages can
exceed the administrative overhead versus standard arrays,
It would be nice if Ruby would include this capability natively.
There are indeed some options (see below - albeit probably not exactly
what you expect) but I am not sure whether the performance gains you
are hoping for will be realized. The reason is that if the rest of
your algorithm is implemented in Ruby and it allocates objects then
there will be overhead of Ruby’s GC plus you have no control over the
placement in memory. The effect of well aligned byte arrays in memory
may just not measurable.
I did try the gems that were suggested (Peter C. and Ingram’s) and
they do work for me. Now I want to see if I can make them faster or find
something that is.
There are two quite obvious alternatives:
I once cooked something together using Strings with an optimization
using chunks:
Basis for a bit set which uses String as storage. Storage is split up into multiple String instances to avoid too large allocations. · GitHub
Note sure what state that is in.
If you rarely change bits and query frequently then a simple Integer
works pretty well because operator [] is used for bit indexing:
irb(main):003:0> flag = ‘0011011010’.to_i(2)
=> 218
irb(main):004:0> flag.to_s 16
=> “da”
irb(main):005:0> 8.times {|idx| printf “%3d %1d\n”, idx, flag[idx]}
0 0
1 1
2 0
3 1
4 1
5 0
6 1
7 1
=> 8
If there are frequent updates then solutions using String are more
efficient. If you consider non Ruby solutions then of course a C
solution is even more efficient - if that difference is measurable. I
would always start with a Ruby implementation and only switch to a C
implementation if Ruby proves too slow.
Kind regards
robert
Thanks for the info on strings holding flags.
I tried it in my algorithm, and it can functionally work, but it is
waaayy, waaayy, slower than with arrays,
I need to store boolean arrays on the order of 100s of millions of bits
and greater. I recall reading that strings greater than something like
just 29 characters cause a severe processing hit in performance, so this
isn’t a practical solution for me.
What I was able to do, though, is use use Ingram’s bitarray gem (which
is written in C) and tweak my code to actually not only run faster than
the code with standard Ruby arrays, but so allow me to process at
minimum 4 times larger arrays.
More performance/features can definitely be squeezed out of the gem
since it seems development activity ceased on it in 2008. In the
comments in the source code he even stated some of the algorithms were
not optimal, and could be improved.
I believe there are a number of C/C++ libraries, or techniques from
them, that can be used to improve the code. I’m not really a C/C++ coder
though, so someone who is would probably be aware of tricks and
practices to make the code better (faster).
This would seem like a nice project for something like a Google Summer
of Code (GSoC) to create a standard Ruby BitArray library.