Bit count or bit set

In Ruby1.9 is there any good way to count the number of on bits in an
integer (after an & operation)?
Alternatively, is there any VERY light-weight implementation of a bit
set? I’d prefer to use Fixnums, as I’m probably going to need thousands
of these, if the tests work out. But before I can test, I need a decent
bit counter. (shift, xor, &, and | are already present for integer
values, but I also need to count the number of “true” items after the
logical operation. So if a bitset is the correct approach, I’ll need it
to implement those operations, or their equivalents in terms of union
and intersection.)

Or do I need to drop into C for this?

2012/10/25 Charles H. [email protected]:

In Ruby1.9 is there any good way to count the number of on bits in an
integer (after an & operation)?

number.to_s(2).count(‘1’)? I’m not sure how fast this is going to be,
but might turn out to be faster than a simple while loop.

– Matma R.

On 10/25/2012 02:00 PM, Bartosz Dziewoński wrote:

2012/10/25 Charles H.[email protected]:

In Ruby1.9 is there any good way to count the number of on bits in an
integer (after an& operation)?
number.to_s(2).count(‘1’)? I’m not sure how fast this is going to be,
but might turn out to be faster than a simple while loop.

– Matma R.

Thank you. I expect that it would be faster than a loop in the Ruby
interpreter. I was hoping for some built-in method that I hadn’t
noticed. (Converting a Fixnum to a bit string should be pretty fast,
but it would be clearly faster to skip that step.)

Hei Charles,

Any number of these bit twiddling hacks here:
Bit Twiddling Hacks

will translate straight to Ruby. I guess the lookup table approach is
easy to implement and reasonably fast.

Another strategy would be to implement this in C and load it as an
extension. Make sure this is really the hotspot of your code before
optimizing…

kaspar

Kaspar S. wrote:

optimizing…

kaspar
Right now I’m trying to do the simple optimizations that happen while
designing. The reason for the question is that I don’t want to drop
into C, so I’m trying to design something that’s “fast enough” in a
higher level language. And I’m skeptical about bithacks being very fast
in a high level language. They are a way to do something if you must,
but they are really MUCH more appropriate in C. Or C++, Ada, D, any of
those. I’m not even sure, though, that they are appropriate in Java.
In Java one should probably look for something that has been implemented
in the JVM.

So. Part of the design optimization is choosing the implementation
language. You need to pick one that has support for the primitive
operations that you’ll be using a lot. Which means you need to design
the general approach at the same time that you pick the language. A
part of my design turned out to be that I need long lists that will
support multiple kinds of entries. This let out, e.g., Java. (Well, I
COULD use a list of objects, but that is fighting with the language, a
thing I try to avoid.) So this narrowed the choices to a few, Ruby,
Python, Smalltalk, etc. So then I go through the other requirements.
So far Ruby1.9 and Python3 are the leading candidates. I must admit
that Ruby’s fake parallelism bothers me, but I don’t like the way Java
handles unicode, so JRuby is out. (I prefer utf8 or 32bit unicode.
Utf8, because that’s what the files are, and it takes less space in
RAM. 32bit unicode because it’s easy to handle. 16bit unicode seems to
blend the worse characteristics of both.)

P.S.: ARE Threads going to eventually get real parallelism? Multiple
CPUs are becoming much more common. I understand that originally the
reason was because many libraries weren’t thread-safe, but is that being
addressed?

On Fri, Oct 26, 2012 at 12:34 PM, Charles H.
[email protected] wrote:

I must admit that Ruby’s fake parallelism bothers
me, but I don’t like the way Java handles unicode, so JRuby is out. (I
prefer utf8 or 32bit unicode. Utf8, because that’s what the files are, and
it takes less space in RAM. 32bit unicode because it’s easy to handle.
16bit unicode seems to blend the worse characteristics of both.)

JRuby handles unicode (and other encodings) the same way MRI does. We
do not use Java Strings.

  • Charlie

Charles Oliver N. wrote:

  • Charlie

Thank you. I misunderstood.

On Fri, Oct 26, 2012 at 7:34 PM, Charles H.
[email protected] wrote:

Another strategy would be to implement this in C and load it as an
MUCH more appropriate in C. Or C++, Ada, D, any of those. I’m not even
sure, though, that they are appropriate in Java. In Java one should
probably look for something that has been implemented in the JVM.

There exist gems that implement a bitset in C. For example:

I haven’t used any, but you might try some of them.

Jesus.