Hamming Distance

What is the fastest way to calculate the Hamming Distance between two
equally long strings in Ruby? A have looked at the amatch gem, but I was
wondering what can be done with Ruby only?

Here is a very interesting discussion on the subject:

http://www.perlmonks.org/?node_id=500235

Note the elegant and extremely fast Perl solution:

sub hd {
return ($[0] ^ $[1]) =~ tr/\001-\255//;
}

Cheers,

Martin

On Mon, Mar 7, 2011 at 8:47 AM, Martin H. [email protected] wrote:

sub hd {
return ($[0] ^ $[1]) =~ tr/\001-\255//;
}

There are a bunch of ways. There isn’t a built in xor of strings in
Ruby, but it’s easy to define one. Once you have that, you can do it
pretty similarly to the Perl example:

def hd(l, r)
l.xor(r).tr(“\x00”,‘’).length
end

You can implement xor of two strings using pure ruby, but to get it
fast, you’ll probably want to utilize an extension. Here’s one
approach using a technique that, I believe, was in the mailing list
several years ago:

require “narray”

class String
def xor(other)
if other.empty?
self
else
left = self
right = other

  if left.length < right.length
    n,r = right.length.divmod(left.length)
    left = left * n + left[0, r]
  elsif right.length < left.length
    n,r = left.length.divmod(right.length)
    right = right * n + right[0, r]
  end

  left_na = NArray.to_na(left, "byte")
  right_na = NArray.to_na(right, "byte")

  (left_na ^ right_na).to_s
end

end

def hamming_distance(other)
self.xor(other).tr(“\x00”,‘’).length
end
end

Kirk H.
Software Engineer
Engine Y.

On 07.03.2011 18:11, Kirk H. wrote:

Note the elegant and extremely fast Perl solution:
l.xor®.tr("\x00",’’).length
class String
elsif right.length< left.length

def hamming_distance(other)
self.xor(other).tr("\x00",’’).length
end
end

I’ll throw in two other solutions for whoever wants to do a benchmark.

class String
def hd_1(s)
c = 0

 each_codepoint.zip(s.each_codepoint) do |l, r|
   c += 1 unless l == r
 end

 c

end

def hd_2(s)
each_codepoint.zip(s.each_codepoint).select {|l, r| l != r}.length
end
end

Cheers

robert

On Mon, Mar 7, 2011 at 10:11 AM, Kirk H. [email protected] wrote:

You can implement xor of two strings using pure ruby, but to get it
fast, you’ll probably want to utilize an extension. Here’s one

Actually, I should revise that statement. To get it really fast with
MRI, you’ll probably want to utilize an extension. I am guessing, as
I haven’t benchmarked anything, but one may well be able to get much
faster pure ruby speeds with jruby or rubinius. When in doubt,
benchmark it!

Kirk H.

On Tue, Mar 8, 2011 at 5:49 AM, Kirk H. [email protected] wrote:

On Mon, Mar 7, 2011 at 10:11 AM, Kirk H. [email protected] wrote:

You can implement xor of two strings using pure ruby, but to get it
fast, you’ll probably want to utilize an extension. Here’s one
Actually, I should revise that statement. To get it really fast with
MRI, you’ll probably want to utilize an extension. I am guessing, as
I haven’t benchmarked anything, but one may well be able to get much
faster pure ruby speeds with jruby or rubinius. When in doubt,
benchmark it!

am staying w your modest narray implem :wink:

$ ruby test_hamming_distance.rb
Rehearsal ------------------------------------------------
kirk narray 0.140000 0.030000 0.170000 ( 0.160551)
robert zip 1 22.570000 0.010000 22.580000 ( 22.589685)
robert zip 2 22.870000 0.020000 22.890000 ( 22.922881)
-------------------------------------- total: 45.640000sec

               user     system      total        real

kirk narray 0.050000 0.030000 0.080000 ( 0.080348)
robert zip 1 22.490000 0.090000 22.580000 ( 22.805407)
robert zip 2 22.900000 0.040000 22.940000 ( 22.963609)

wow, your narray scheme is immune to length of string…

thanks kirk and robert
best regards -botp