Faster primality testing and factorization methods in Ruby

addis_a · June 27, 2013, 4:25pm

I have created various implementations of methods to replace prime? and
prime_division (from the standard lib file prime.rb) based on a class of
mathematical operators I’ve developed called prime generators (PG).Using
a special form of these PG I term Strictly Prime (SP) prime generators I
have created extremely simple and fast algorithms that can find all the
primes up to some number n, determine the primality of n, or its
factorization. These methods are significantly faster in JRuby than the
standard methods.

Below are links to the paper which provides the mathematical basis for
the proposed methods with two tables of benchmark results comparing the
performance of the library methods prime? and prime_division with my
proposed methods. I provide test results on 32-bit and 64-bit Linux
systems using 5 reference primes of 17 to 19 digits.

My paper, ‘Improved Primality Testing and Factorization in Ruby’
Improved Primality Testing and Factorization in Ruby Revised | PDF | Prime Number | Ruby (Programming Language)…

The code rile ‘primeszp.rb’ is available in my github repository:

gist.github.com

https://gist.github.com/jzakiya/455f2357cdb08f4ee1c4

primeszp.rb

#!/usr/local/bin/ruby -w

require 'rational' if RUBY_VERSION =~ /^(1.8)/   # for 'gcd' method

=begin
Author:  Jabari Zakiya
Version: June 26, 2013
Version: August 1, 2013  -- shorter/faster/adjustable primemr?
Version: August 28, 2013 -- two new methods use cli command "factor"
Version: September 13, 2013 -- refactored/simplified "primality"

This file has been truncated. show original

jabari_z · June 27, 2013, 9:44pm

On Thu, Jun 27, 2013 at 9:25 AM, Jabari Z. [email protected] wrote:

I have created various implementations of methods to replace prime? and
prime_division (from the standard lib file prime.rb) based on a class of
mathematical operators I’ve developed called prime generators (PG).Using
a special form of these PG I term Strictly Prime (SP) prime generators I
have created extremely simple and fast algorithms that can find all the
primes up to some number n, determine the primality of n, or its
factorization. These methods are significantly faster in JRuby than the
standard methods.

Very nice!

Below are links to the paper which provides the mathematical basis for
the proposed methods with two tables of benchmark results comparing the
performance of the library methods prime? and prime_division with my
proposed methods. I provide test results on 32-bit and 64-bit Linux
systems using 5 reference primes of 17 to 19 digits.

We ship a stock prime.rb pulled from CRuby, at present. Ideally you
would want to submit this as a patch to CRuby folks
(http://bugs.ruby-lang.org) so it can get into the standard library
and we’ll eventually pick it up.

So, I would recommend doing that…but we would be happy to
incorporate your patch directly, too. We’re always a release or two
behind CRuby, so when they incorporate new libraries changes we don’t
get them right away unless we commit them ourselves.

Would it be possible for you to restructure your code in terms of a
patch for (or complete replacement of) lib/ruby/1.9/prime.rb? You’ll
need to agree to The Ruby License and the 2-clause BSDL (see
http://www.ruby-lang.org/en/about/license.txt) but otherwise it sounds
great!

Charlie

jabari_z · June 27, 2013, 11:21pm

We ship a stock prime.rb pulled from CRuby, at present. Ideally you
would want to submit this as a patch to CRuby folks
(http://bugs.ruby-lang.org) so it can get into the standard library
and we’ll eventually pick it up.

So, I would recommend doing that…but we would be happy to
incorporate your patch directly, too. We’re always a release or two
behind CRuby, so when they incorporate new libraries changes we don’t
get them right away unless we commit them ourselves.

Would it be possible for you to restructure your code in terms of a
patch for (or complete replacement of) lib/ruby/1.9/prime.rb? You’ll
need to agree to The Ruby License and the 2-clause BSDL (see
http://www.ruby-lang.org/en/about/license.txt) but otherwise it sounds
great!

Charlie

Thanks for your reply.

I’ve never done a substantial patch before, and think if would be easier
just to submit this as a replacement implementation. I submitted this to
the Ruby-core list, but got a message back that I had to be member
first. So I went through that process, and I guess I’m waiting for
confirmation that I can now submit stuff.

The code is completely OSS from my perspective, and if I need to stamp a
particular license(s) on it that’s no problem. I put it out for people
to use.

Regarding the particular implementation to use: as the benchmarks show
there is variability of performance based on the particular form of
code. It would seem JRuby would want to use the ‘best’ form for its VM,
MRI for it, and Rubinius for it.

I don’t know what the politics of code compliance is to be considered
compatible (100% same source code or 100% same results), so I was
thinking each VM should use the versions that perform the best for it,
assuming a standard behaviour for inputs and other variables of
operation (small primes behavior, factorization output formatting, etc)
is agreed upon.

You have my complete permission to use the code as you see fit.

Please provide more detail on how I can best proceed to make the code
more acceptable.

jz

jabari_z · June 28, 2013, 2:53am

Unfortunately it’s not even as simple as picking one impl for JRuby
that’s the fastest, since we have many different execution modes

I have run some tests with JRuby master on Java 7u40, compile with
invokedynamic, compiled without invokedynamic, and interpreted.

For the compiled+indy run, primzp7 and primzp(13) trade off for top
honors.

For the compiled-indy run, primzp7a is consistently the fastest.

For the interpreted run, primzp7a is fastest.

Now it gets more complicated…

primzp7 is fastest in compiled+indy, but primzp7 is slowest in
interpreter, and second fastest in compiled-indy.

primzp(13) is fastest in compiled+indy, but third fastest in
interpreter and on the slow side in compiled-indy.

primzp7a is fastest in interpreter, but slowest in compiled+indy and
average in compiled-indy.

Now we generally don’t optimize for the interpreter, but we don’t want
to pick the slowest version…so I think primzp7 is out.

compiled modes should take priority, indy should take priority for our
future…so I think primzp(13) is the way to go.

I will post a patch modifying prime.rb with factorzp and primzp(13)
shortly.

Charlie

jabari_z · June 28, 2013, 8:25pm

Charles Nutter wrote in post #1113832:

Here’s an initial patch: Partial patch for faster prime? and prime_division · GitHub

This works fine for Integer#prime? but the behavior for prime_division
isn’t quite right. The MRI version returns the factors along with
their exponentiation. I tried to add that into your factorization
code, but had some troubles.

This patch also does not patch the Prime class/module, so going
directly in still uses the old logic.

FWIW, I am also a committer on MRI. If we can decide on the fastest
version for MRI, I can open an issue and start the process of getting
it into MRI.

So, to-do:

Fix the factorization logic to match MRI

Choose a version for MRI

Patch Prime appropriately

Perhaps plug your logic into the other methods in prime, like #each

Get MRI to incorporate the logic

Thanks again!

Charlie

On Thu, Jun 27, 2013 at 7:51 PM, Charles Oliver N.

I’ll fix factorzp to match the output format of prime_division, which
creates an array of arrays of factors like [[f1,n1],[f2,n2]…] where fi
is the factor value and ni is the number of its occurrences (or
exponent).
I will just edit my gist code to include it (to keep all the code in one
place) and I’ll let you know when its done.

jz

jabari_z · June 28, 2013, 5:42pm

Here’s an initial patch: Partial patch for faster prime? and prime_division · GitHub

This works fine for Integer#prime? but the behavior for prime_division
isn’t quite right. The MRI version returns the factors along with
their exponentiation. I tried to add that into your factorization
code, but had some troubles.

This patch also does not patch the Prime class/module, so going
directly in still uses the old logic.

FWIW, I am also a committer on MRI. If we can decide on the fastest
version for MRI, I can open an issue and start the process of getting
it into MRI.

So, to-do:

Fix the factorization logic to match MRI
Choose a version for MRI
Patch Prime appropriately
Perhaps plug your logic into the other methods in prime, like #each
Get MRI to incorporate the logic

Thanks again!

Charlie

On Thu, Jun 27, 2013 at 7:51 PM, Charles Oliver N.

jabari_z · June 29, 2013, 6:40am

Here is the code for method prime_division_new to produce the same
output as prime_division. It’s now included in the gist for the previous
code.

This produces the same output format as lib method prime_division

def prime_division_new(p=13) # P13 is default prime generator here
h=Hash.new(0); factorzp§.each {|f| h[f] = h[f]+1}; h.to_a
end

One more revision to ensure output is sorted and some synaptic sugar.

This produces the same output format as lib method prime_division

def prime_division_new(p=13) # P13 is default prime generator here
h=Hash.new(0); factorzp§.each {|f| h[f] +=1}; h.to_a.sort
end

jabari_z · August 28, 2013, 6:34pm

Hi Charles,

Just to let you know I’ve come up with a much faster, and more standard,
method for doing primality testing and factorization in Ruby.

[Un/L]inux comes with the standard cli command “factor”.

$ factor 30409113
30409113: 3 7 1448053 # here the number is composite

$ factor 6000000000000001
6000000000000001: 6000000000000001 # here the number is a prime

This can be used to create MUCH FASTER and more portable code that will
work exactly the same with all versions of Ruby run under *nix systems.

Here’s the code:

class Integer
def factors
factors = factor #{self.abs}.split(’ ')[1…-1].map {|i| i.to_i}
h = Hash.new(0); factors.each {|f| h[f] +=1}; h.to_a.sort
end

def primality?
return true if factor #{self.abs}.split(’ ')[1…-1].size == 1
return false
end
end

irb(main):003:0> 30409113.factors
=> [[3, 1], [7, 1], [1448053, 1]]

irb(main):003:0> 6000000000000001.primality?
=> true

I used these names to not conflict with the other methods in my code
base.

Now Ruby can do REALLY BIG NUMBERS with consistent fast performance.

Since Ruby uses other system calls and dependencies anyway, I don’t see
where this should be a problem (technically or politically), especially
for the enormous performance gains. This should even work for Windoze
builds, as they use *nix emulators.

Let me know what you think.

Jabari

jabari_z · June 28, 2013, 9:53pm

Jabari Z. wrote in post #1113848:

Charles Nutter wrote in post #1113832:

Here’s an initial patch: Partial patch for faster prime? and prime_division · GitHub

This works fine for Integer#prime? but the behavior for prime_division
isn’t quite right. The MRI version returns the factors along with
their exponentiation. I tried to add that into your factorization
code, but had some troubles.

This patch also does not patch the Prime class/module, so going
directly in still uses the old logic.

FWIW, I am also a committer on MRI. If we can decide on the fastest
version for MRI, I can open an issue and start the process of getting
it into MRI.

So, to-do:

Fix the factorization logic to match MRI

Choose a version for MRI

Patch Prime appropriately

Perhaps plug your logic into the other methods in prime, like #each

Get MRI to incorporate the logic

Thanks again!

Charlie

On Thu, Jun 27, 2013 at 7:51 PM, Charles Oliver N.

I’ll fix factorzp to match the output format of prime_division, which
creates an array of arrays of factors like [[f1,n1],[f2,n2]…] where fi
is the factor value and ni is the number of its occurrences (or
exponent).
I will just edit my gist code to include it (to keep all the code in one
place) and I’ll let you know when its done.

jz

Here is the code for method prime_division_new to produce the same
output as prime_division. It’s now included in the gist for the previous
code.

This produces the same output format as lib method prime_division

def prime_division_new(p=13) # P13 is default prime generator here
h=Hash.new(0); factorzp(p).each {|f| h[f] = h[f]+1}; h.to_a
end