Forum: JRuby Faster primality testing and factorization methods in Ruby

F4c2dcb24f2828cde331ba6a2c9ea10d?d=identicon&s=25 Jabari Z. (jabari_z)
on 2013-06-27 16:25
I have created various implementations of methods to replace prime? and
prime_division (from the standard lib file prime.rb) based on a class of
mathematical operators I've developed called prime generators (PG).Using
a special form of these PG I term Strictly Prime (SP) prime generators I
have created extremely simple and fast algorithms that can find all the
primes up to some number n, determine the primality of n, or its
factorization. These methods are significantly faster in JRuby than the
standard methods.

Below are links to the paper which provides the mathematical basis for
the proposed methods with two tables of benchmark results comparing the
performance of the library methods prime? and prime_division with my
proposed methods. I provide test results on 32-bit and 64-bit Linux
systems using 5 reference primes of 17 to 19 digits.

My paper, 'Improved Primality Testing and Factorization in Ruby'
http://www.scribd.com/doc/150217723/Improved-Prima...

The code rile 'primeszp.rb' is available in my github repository:
https://gist.github.com/jzakiya/455f2357cdb08f4ee1c4
F1d37642fdaa1662ff46e4c65731e9ab?d=identicon&s=25 Charles Nutter (headius)
on 2013-06-27 21:44
(Received via mailing list)
On Thu, Jun 27, 2013 at 9:25 AM, Jabari Z. <lists@ruby-forum.com> wrote:
> I have created various implementations of methods to replace prime? and
> prime_division (from the standard lib file prime.rb) based on a class of
> mathematical operators I've developed called prime generators (PG).Using
> a special form of these PG I term Strictly Prime (SP) prime generators I
> have created extremely simple and fast algorithms that can find all the
> primes up to some number n, determine the primality of n, or its
> factorization. These methods are significantly faster in JRuby than the
> standard methods.

Very nice!

> Below are links to the paper which provides the mathematical basis for
> the proposed methods with two tables of benchmark results comparing the
> performance of the library methods prime? and prime_division with my
> proposed methods. I provide test results on 32-bit and 64-bit Linux
> systems using 5 reference primes of 17 to 19 digits.

We ship a stock prime.rb pulled from CRuby, at present. Ideally you
would want to submit this as a patch to CRuby folks
(http://bugs.ruby-lang.org) so it can get into the standard library
and we'll eventually pick it up.

So, I would recommend doing that...but we would be happy to
incorporate your patch directly, too. We're always a release or two
behind CRuby, so when they incorporate new libraries changes we don't
get them right away unless we commit them ourselves.

Would it be possible for you to restructure your code in terms of a
patch for (or complete replacement of) lib/ruby/1.9/prime.rb? You'll
need to agree to The Ruby License and the 2-clause BSDL (see
http://www.ruby-lang.org/en/about/license.txt) but otherwise it sounds
great!

- Charlie
F4c2dcb24f2828cde331ba6a2c9ea10d?d=identicon&s=25 Jabari Z. (jabari_z)
on 2013-06-27 23:21
> We ship a stock prime.rb pulled from CRuby, at present. Ideally you
> would want to submit this as a patch to CRuby folks
> (http://bugs.ruby-lang.org) so it can get into the standard library
> and we'll eventually pick it up.
>
> So, I would recommend doing that...but we would be happy to
> incorporate your patch directly, too. We're always a release or two
> behind CRuby, so when they incorporate new libraries changes we don't
> get them right away unless we commit them ourselves.
>
> Would it be possible for you to restructure your code in terms of a
> patch for (or complete replacement of) lib/ruby/1.9/prime.rb? You'll
> need to agree to The Ruby License and the 2-clause BSDL (see
> http://www.ruby-lang.org/en/about/license.txt) but otherwise it sounds
> great!
>
> - Charlie

Thanks for your reply.

I've never done a substantial patch before, and think if would be easier
just to submit this as a replacement implementation. I submitted this to
the Ruby-core list, but got a message back that I had to be member
first. So I went through that process, and I guess I'm waiting for
confirmation that I can now submit stuff.

The code is completely OSS from my perspective, and if I need to stamp a
particular license(s) on it that's no problem. I put it out for people
to use.

Regarding the particular implementation to use: as the benchmarks show
there is variability of performance based on the particular form of
code. It would seem JRuby would want to use the 'best' form for its VM,
MRI for it, and Rubinius for it.

I don't know what the politics of code compliance is to be considered
compatible (100% same source code or 100% same results), so I was
thinking each VM should use the versions that perform the best for it,
assuming a standard behaviour for inputs and other variables of
operation (small primes behavior, factorization output formatting, etc)
is agreed upon.

You have my complete permission to use the code as you see fit.

Please provide more detail on how I can best proceed to make the code
more acceptable.

jz
F1d37642fdaa1662ff46e4c65731e9ab?d=identicon&s=25 Charles Nutter (headius)
on 2013-06-28 02:53
(Received via mailing list)
Unfortunately it's not even as simple as picking one impl for JRuby
that's the fastest, since we have many different execution modes :-)

I have run some tests with JRuby master on Java 7u40, compile with
invokedynamic, compiled without invokedynamic, and interpreted.

For the compiled+indy run, primzp7 and primzp(13) trade off for top
honors.

For the compiled-indy run, primzp7a is consistently the fastest.

For the interpreted run, primzp7a is fastest.

Now it gets more complicated...

primzp7 is fastest in compiled+indy, but primzp7 is *slowest* in
interpreter, and second fastest in compiled-indy.

primzp(13) is fastest in compiled+indy, but third fastest in
interpreter and on the slow side in compiled-indy.

primzp7a is fastest in interpreter, but *slowest* in compiled+indy and
average in compiled-indy.

Now we generally don't optimize for the interpreter, but we don't want
to pick the slowest version...so I think primzp7 is out.

compiled modes should take priority, indy should take priority for our
future...so I think primzp(13) is the way to go.

I will post a patch modifying prime.rb with factorzp and primzp(13)
shortly.

- Charlie
F1d37642fdaa1662ff46e4c65731e9ab?d=identicon&s=25 Charles Nutter (headius)
on 2013-06-28 17:42
(Received via mailing list)
Here's an initial patch: https://gist.github.com/headius/5885551

This works fine for Integer#prime? but the behavior for prime_division
isn't quite right. The MRI version returns the factors along with
their exponentiation. I tried to add that into your factorization
code, but had some troubles.

This patch also does not patch the Prime class/module, so going
directly in still uses the old logic.

FWIW, I am also a committer on MRI. If we can decide on the fastest
version for MRI, I can open an issue and start the process of getting
it into MRI.

So, to-do:

* Fix the factorization logic to match MRI
* Choose a version for MRI
* Patch Prime appropriately
* Perhaps plug your logic into the other methods in prime, like #each
* Get MRI to incorporate the logic

Thanks again!

- Charlie

On Thu, Jun 27, 2013 at 7:51 PM, Charles Oliver Nutter
F4c2dcb24f2828cde331ba6a2c9ea10d?d=identicon&s=25 Jabari Z. (jabari_z)
on 2013-06-28 20:25
Charles Nutter wrote in post #1113832:
> Here's an initial patch: https://gist.github.com/headius/5885551
>
> This works fine for Integer#prime? but the behavior for prime_division
> isn't quite right. The MRI version returns the factors along with
> their exponentiation. I tried to add that into your factorization
> code, but had some troubles.
>
> This patch also does not patch the Prime class/module, so going
> directly in still uses the old logic.
>
> FWIW, I am also a committer on MRI. If we can decide on the fastest
> version for MRI, I can open an issue and start the process of getting
> it into MRI.
>
> So, to-do:
>
> * Fix the factorization logic to match MRI
> * Choose a version for MRI
> * Patch Prime appropriately
> * Perhaps plug your logic into the other methods in prime, like #each
> * Get MRI to incorporate the logic
>
> Thanks again!
>
> - Charlie
>
> On Thu, Jun 27, 2013 at 7:51 PM, Charles Oliver Nutter

I'll fix factorzp to match the output format of prime_division, which
creates an array of arrays of factors like [[f1,n1],[f2,n2]...] where fi
is the factor value and ni is the number of its occurrences (or
exponent).
I will just edit my gist code to include it (to keep all the code in one
place) and I'll let you know when its done.

jz
F4c2dcb24f2828cde331ba6a2c9ea10d?d=identicon&s=25 Jabari Z. (jabari_z)
on 2013-06-28 21:53
Jabari Z. wrote in post #1113848:
> Charles Nutter wrote in post #1113832:
>> Here's an initial patch: https://gist.github.com/headius/5885551
>>
>> This works fine for Integer#prime? but the behavior for prime_division
>> isn't quite right. The MRI version returns the factors along with
>> their exponentiation. I tried to add that into your factorization
>> code, but had some troubles.
>>
>> This patch also does not patch the Prime class/module, so going
>> directly in still uses the old logic.
>>
>> FWIW, I am also a committer on MRI. If we can decide on the fastest
>> version for MRI, I can open an issue and start the process of getting
>> it into MRI.
>>
>> So, to-do:
>>
>> * Fix the factorization logic to match MRI
>> * Choose a version for MRI
>> * Patch Prime appropriately
>> * Perhaps plug your logic into the other methods in prime, like #each
>> * Get MRI to incorporate the logic
>>
>> Thanks again!
>>
>> - Charlie
>>
>> On Thu, Jun 27, 2013 at 7:51 PM, Charles Oliver Nutter
>
> I'll fix factorzp to match the output format of prime_division, which
> creates an array of arrays of factors like [[f1,n1],[f2,n2]...] where fi
> is the factor value and ni is the number of its occurrences (or
> exponent).
> I will just edit my gist code to include it (to keep all the code in one
> place) and I'll let you know when its done.
>
> jz

Here is the code for method prime_division_new to produce the same
output as prime_division. It's now included in the gist for the previous
code.

  # This produces the same output format as lib method prime_division
  def prime_division_new(p=13) # P13 is default prime generator here
    h=Hash.new(0); factorzp(p).each {|f| h[f] = h[f]+1}; h.to_a
  end
F4c2dcb24f2828cde331ba6a2c9ea10d?d=identicon&s=25 Jabari Z. (jabari_z)
on 2013-06-29 06:40
> Here is the code for method prime_division_new to produce the same
> output as prime_division. It's now included in the gist for the previous
> code.
>
>   # This produces the same output format as lib method prime_division
>   def prime_division_new(p=13) # P13 is default prime generator here
>     h=Hash.new(0); factorzp(p).each {|f| h[f] = h[f]+1}; h.to_a
>   end

One more revision to ensure output is sorted and some synaptic sugar.

   # This produces the same output format as lib method prime_division
   def prime_division_new(p=13) # P13 is default prime generator here
     h=Hash.new(0); factorzp(p).each {|f| h[f] +=1}; h.to_a.sort
   end
F4c2dcb24f2828cde331ba6a2c9ea10d?d=identicon&s=25 Jabari Z. (jabari_z)
on 2013-08-28 18:34
Hi Charles,

Just to let you know I've come up with a much faster, and more standard,
method for doing primality testing and factorization in Ruby.

[Un/L]inux comes with the standard cli command "factor".

$ factor 30409113
30409113: 3 7 1448053                # here the number is composite

$ factor 6000000000000001
6000000000000001: 6000000000000001   # here the number is a prime

This can be used to create MUCH FASTER and more portable code that will
work exactly the same with all versions of Ruby run under *nix systems.

Here's the code:

class Integer
   def factors
     factors = `factor #{self.abs}`.split(' ')[1..-1].map {|i| i.to_i}
     h = Hash.new(0); factors.each {|f| h[f] +=1}; h.to_a.sort
   end

   def primality?
     return true if `factor #{self.abs}`.split(' ')[1..-1].size == 1
     return false
   end
end

irb(main):003:0> 30409113.factors
 => [[3, 1], [7, 1], [1448053, 1]]

irb(main):003:0> 6000000000000001.primality?
 => true

I used these names to not conflict with the other methods in my code
base.

Now Ruby can do REALLY BIG NUMBERS with consistent fast performance.

Since Ruby uses other system calls and dependencies anyway, I don't see
where this should be a problem (technically or politically), especially
for the enormous performance gains. This should even work for Windoze
builds, as they use *nix emulators.

Let me know what you think.

Jabari
Please log in before posting. Registration is free and takes only a minute.
Existing account

NEW: Do you have a Google/GoogleMail, Yahoo or Facebook account? No registration required!
Log in with Google account | Log in with Yahoo account | Log in with Facebook account
No account? Register here.