# Ordered contrast for String or Array

I have two strings: "aabc" and "aacc". I want to get an "ordered
contrast" to see the difference in their spellings. E.g.

"aabc".ordered_contrast( "aacc" ) => "  cc"

In the example a space represents a matching character, although I suppose
there may be a better alternative. In any case, one could also imagine
a non-ordered contrast:

"aabc".contrast( "aacc" ) => "  c "

And conversely one could ask for the intersect.

"aabc".intersect( "aacc" ) => "aa c"
"aabc".ordered_intersect( "aacc" ) => "aa  "

These could be extended to Array as well, and String could just use
split(//) with those.

So the question is: Is there an efficient way to calculate these?

Thanks,
T.

On Sep 2, 2006, at 12:10 PM, Trans wrote:

I have two strings: "aabc" and "aacc". I want to get an "ordered
contrast" to see the difference in their spellings. E.g.

"aabc".ordered_contrast( "aacc" ) => "  cc"

I assume that's an error. The last two letters match.

Also, how are you determining which letter to show for the contrast?

James Edward G. II

Trans wrote:

I have two strings: "aabc" and "aacc". I want to get an "ordered
contrast" to see the difference in their spellings. E.g.

"aabc".ordered_contrast( "aacc" ) => "  cc"

In the example a space represents a matching character, although I suppose
there may be a better alternative. In any case, one could also imagine
a non-ordered contrast:

"aabc".contrast( "aacc" ) => "  c "

From this, the ‘ordered contrast’ means that the first different letter,
and all following letters, are shown? And unordered means only the
different letters are shown?

Google refused to provide a definition or examples.

William C. wrote:

"aabc".contrast( "aacc" ) => "  c "

From this, the ‘ordered contrast’ means that the first different letter,
and all following letters, are shown? And unordered means only the
different letters are shown?

That’s right.

Google refused to provide a definition or examples.

Yes, they aren’t technical terms, just me trying my best to describe
them.

T.

James Edward G. II wrote:

On Sep 2, 2006, at 12:10 PM, Trans wrote:

I have two strings: "aabc" and "aacc". I want to get an "ordered
contrast" to see the difference in their spellings. E.g.

"aabc".ordered_contrast( "aacc" ) => "  cc"

I assume that's an error. The last two letters match.

It's correct. By "ordered" I mean by sort order (i.e. alphabetic), so
"aabc" and "aacc" diverge at the third letter. Regular, unordered
"contrast" doesn't care about that and would blank the last letter too.

Also, how are you determining which letter to show for the contrast?

If they are != or == and in the same position. Hmmm… maybe the Array
form would be a better example:

[ "a", "a", "b", "c" ].contrast => [ nil, nil, "b", nil ]
[ "a", "a", "b", "c" ].ordered_contrast => [ nil, nil, "b", "c" ]

[ "a", "a", "b", "c" ].intersect => [ "a", "a", nil, "c" ]
[ "a", "a", "b", "c" ].ordered_intersect => [ "a", "a", nil, nil ]

Also, maybe the terms ‘negative’ and ‘positive’ would have been better
than ‘contrast’ and ‘intersect’.

T.
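The "diverge at the third letter" idea can be sketched directly: find the first index where the two strings differ, then either blank everything before it (ordered contrast) or everything from it onward (ordered intersect). This is only an illustration. The helper names `divergence_index`, `ordered_contrast` and `ordered_intersect` as top-level methods are hypothetical, and the sketch assumes both strings have the same length.

```ruby
# Hypothetical helpers for illustration only.
# Both strings are assumed to have the same length.

# Index of the first position where a and b differ, or a.length if none.
def divergence_index(a, b)
  a.chars.to_a.zip(b.chars.to_a).index { |x, y| x != y } || a.length
end

# Blank out the shared prefix, then show the rest of b.
def ordered_contrast(a, b, fill = " ")
  cut = divergence_index(a, b)
  fill * cut + b[cut..-1].to_s
end

# Keep the shared prefix, then blank out everything after it.
def ordered_intersect(a, b, fill = " ")
  cut = divergence_index(a, b)
  a[0, cut] + fill * (a.length - cut)
end

p ordered_contrast("aabc", "aacc")   # => "  cc"
p ordered_intersect("aabc", "aacc")  # => "aa  "
```

Computing the cut point once keeps the two operations symmetric: each is a slice of one string padded with the fill character.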

Trans wrote:

[ "a", "a", "b", "c" ].contrast => [ nil, nil, "b", nil ]
[ "a", "a", "b", "c" ].ordered_contrast => [ nil, nil, "b", "c" ]

[ "a", "a", "b", "c" ].intersect => [ "a", "a", nil, "c" ]
[ "a", "a", "b", "c" ].ordered_intersect => [ "a", "a", nil, nil ]

Oops… I forgot the comparison array and screwed it up. Let me try
that again:

[ "a", "a", "b", "c" ].contrast([ "a", "a", "c", "c" ]) => [ nil, nil, "c", nil ]
[ "a", "a", "b", "c" ].ordered_contrast([ "a", "a", "c", "c" ]) => [ nil, nil, "c", "c" ]

In these, the difference being shown is the parameter's. If we
swap the receiver and the parameter:

[ "a", "a", "c", "c" ].contrast([ "a", "a", "b", "c" ]) => [ nil, nil, "b", nil ]
[ "a", "a", "c", "c" ].ordered_contrast([ "a", "a", "b", "c" ]) => [ nil, nil, "b", "c" ]

And of course the inverse:

[ "a", "a", "b", "c" ].intersect([ "a", "a", "c", "c" ]) => [ "a", "a", nil, "c" ]
[ "a", "a", "b", "c" ].ordered_intersect([ "a", "a", "c", "c" ]) => [ "a", "a", nil, nil ]

Also interesting:

[ "a", "a", "b", "c" ].contrast([ "a", "a", "c", "c" ]) => [ nil, nil, "c", nil ]
[ nil, nil, "c", nil ].contrast([ "a", "a", "c", "c" ]) => [ "a", "a", nil, "c" ]

which is the intersection.

T.
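The observation that contrasting twice against the same parameter yields the intersection can be checked with a small sketch. The top-level `contrast` and `intersect` functions here are hypothetical stand-ins written just for this check, using `==` for element comparison.

```ruby
# Hypothetical standalone helpers; not part of any library.

# nil where the elements match, otherwise the element from b.
def contrast(a, b)
  a.zip(b).map { |x, y| x == y ? nil : y }
end

# The element where they match, otherwise nil.
def intersect(a, b)
  a.zip(b).map { |x, y| x == y ? y : nil }
end

left  = %w[a a b c]
right = %w[a a c c]

diff = contrast(left, right)
p diff                                                 # => [nil, nil, "c", nil]
p contrast(diff, right)                                # => ["a", "a", nil, "c"]
p contrast(diff, right) == intersect(left, right)      # => true
```

The second contrast blanks exactly the positions the first one filled in, and restores the parameter's element everywhere else, which is why the result lines up with the intersection.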

Here’s one way to do it. There are probably better ways.

rick@frodo:~/rubyscripts\$ cat enum_contrast.rb

```ruby
module Enumerable
  # Position by position: eql_val where the elements match,
  # otherwise the element from enum.
  def contrast(enum, eql_val=nil)
    result = []
    self.zip(enum) { |a, b| result << (a.eql?(b) ? eql_val : b) }
    result
  end

  # Like contrast, but once the first difference is seen every
  # remaining element of enum is shown, matching or not.
  def ordered_contrast(enum, eql_val=nil)
    result = []
    diff = false
    self.zip(enum) do |a, b|
      diff = diff || !a.eql?(b)
      result << (diff ? b : eql_val)
    end
    result
  end

  # Position by position: the element where they match,
  # otherwise diff_val.
  def intersect(enum, diff_val=nil)
    result = []
    self.zip(enum) { |a, b| result << (a.eql?(b) ? b : diff_val) }
    result
  end

  # Like intersect, but everything from the first difference
  # onward is replaced by diff_val.
  def ordered_intersect(enum, diff_val=nil)
    result = []
    diff = false
    self.zip(enum) do |a, b|
      diff = diff || !a.eql?(b)
      result << (diff ? diff_val : b)
    end
    result
  end
end

class String
  # Split the string into an array of single-character strings.
  def to_chars_array
    unpack('a'*length)
  end

  def contrast(str)
    to_chars_array.contrast(str.to_chars_array, ' ').join
  end

  def ordered_contrast(str)
    to_chars_array.ordered_contrast(str.to_chars_array, ' ').join
  end

  def intersect(str)
    to_chars_array.intersect(str.to_chars_array, ' ').join
  end

  def ordered_intersect(str)
    to_chars_array.ordered_intersect(str.to_chars_array, ' ').join
  end
end
```

rick@frodo:~/rubyscripts\$ cat test_enum_contrast.rb

```ruby
require 'enum_contrast.rb'
require 'test/unit'

class TestSubranges < Test::Unit::TestCase

  def test_array_contrast
    assert_equal([ nil, nil, "c", nil ],
                 [ "a", "a", "b", "c" ].contrast([ "a", "a", "c", "c" ]))

    assert_equal([ nil, nil, "b", nil ],
                 [ "a", "a", "c", "c" ].contrast([ "a", "a", "b", "c" ]))

    assert_equal([ nil, nil, "c", nil ],
                 [ "a", "a", "b", "c" ].contrast([ "a", "a", "c", "c" ]))

    assert_equal([ "a", "a", nil, "c" ],
                 [ nil, nil, "c", nil ].contrast([ "a", "a", "c", "c" ]))
  end

  def test_array_ordered_contrast
    assert_equal([ nil, nil, "c", "c" ],
                 [ "a", "a", "b", "c" ].ordered_contrast([ "a", "a", "c", "c" ]))

    assert_equal([ nil, nil, "b", "c" ],
                 [ "a", "a", "c", "c" ].ordered_contrast([ "a", "a", "b", "c" ]))
  end

  def test_array_intersect
    assert_equal([ "a", "a", nil, "c" ],
                 [ "a", "a", "b", "c" ].intersect([ "a", "a", "c", "c" ]))
  end

  def test_array_ordered_intersect
    assert_equal([ "a", "a", nil, nil ],
                 [ "a", "a", "b", "c" ].ordered_intersect([ "a", "a", "c", "c" ]))
  end

  def test_string_ordered_contrast
    assert_equal("  cc",
                 "aabc".ordered_contrast( "aacc" ))
  end

  def test_string_contrast
    assert_equal("  c ",
                 "aabc".contrast("aacc"))
  end

  def test_string_intersect
    assert_equal("aa c",
                 "aabc".intersect("aacc"))
  end

  def test_string_ordered_intersect
    assert_equal("aa  ",
                 "aabc".ordered_intersect("aacc"))
  end
end
```
rick@frodo:~/rubyscripts\$ ruby test_enum_contrast.rb

```
Started

Finished in 0.010052 seconds.

8 tests, 12 assertions, 0 failures, 0 errors
```

Rick DeNatale

My blog on Ruby

On 9/4/06, Trans wrote:

Rick DeNatale wrote:

Here’s one way to do it. There are probably better ways.

Nice! Good use of zip, and using #unpack for the String rendition makes
a lot of sense and is probably the fastest way. Thanks for these good
solutions. Beats the heck out of what I had.

Well, I was surprised when I couldn’t seem to find a standard String
method for splitting a String into an array of single character
strings.

Thinking about it again, another way is

string.scan /./

I haven’t benchmarked the two though so I don’t know which is faster.
I guess that’s a task for the to-do list.
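For reference, here is a small sketch of the splitting options mentioned in the thread side by side, plus `String#chars`, which was added to Ruby after this discussion. It assumes a single-byte (ASCII) string with no newlines: `unpack('a' * length)` emits one element per byte rather than per character, and `.` in `/./` does not match a newline by default, so the approaches can disagree on other input.

```ruby
str = "aabc"

by_unpack = str.unpack("a" * str.length)  # one "a" directive per byte
by_scan   = str.scan(/./)                 # every non-newline character
by_split  = str.split(//)                 # split on the empty pattern
by_chars  = str.chars.to_a                # String#chars postdates this thread

p by_unpack                               # => ["a", "a", "b", "c"]
p [by_scan, by_split, by_chars].all? { |chars| chars == by_unpack }  # => true
```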

I'm going to add these to Facets, albeit I'm going to give some thought
to possibly better names. You'll get the credit of course and be added to
the list of Authors/Contributors if that's okay with you.

That’s fine. I don’t have any advice on the names, except that
intersect is definitely a bad name, since it sounds too much like a
set operation which would have slightly different semantics.

Thanks!
T.

P.S. Your blog seems to be down. Something about:
SQLite3::CantOpenException in ArticlesController#index

Thanks for pointing that out. This is the second time in a week that
Typo has gotten me. Although this time it was just a matter of
restarting it. Maybe it’s time to consider migrating to Mephisto!?!

Rick DeNatale


Rick DeNatale wrote:

Here’s one way to do it. There are probably better ways.

Nice! Good use of zip, and using #unpack for the String rendition makes
a lot of sense and is probably the fastest way. Thanks for these good
solutions. Beats the heck out of what I had.

I'm going to add these to Facets, albeit I'm going to give some thought
to possibly better names. You'll get the credit of course and be added to
the list of Authors/Contributors if that's okay with you.

Thanks!
T.

P.S. Your blog seems to be down. Something about:
SQLite3::CantOpenException in ArticlesController#index

On 9/4/06, Rick DeNatale wrote:

It looks like unpack is a clear winner:

Then on second thought, it looks like the problem with scan was the
construction of the regex. Using scan without pre-compiling the regex
is about twice as slow as unpack, but using scan with a pre-compiled
regexp looks like it's about 100 times faster!

rick@frodo:~/rubyscripts\$ cat benchstringsplit.rb

```ruby
require 'benchmark'
include Benchmark

class String
  To_chars_regex = Regexp.new('/./')

  def to_chars_array_with_unpack
    unpack('a'*length)
  end

  def to_chars_array_with_scan
    scan /./
  end

  def to_chars_array_with_scan_precomp
    scan To_chars_regex
  end
end

iterations = 100
str = "abcdefghijklmnopqrstuvwxyz" * 5
bmbm do | x |
  5.times do
    x.report("unpack #{str.length} character string") do
      iterations.times do
        str.to_chars_array_with_unpack
      end
    end

    x.report("scan #{str.length} character string") do
      iterations.times do
        str.to_chars_array_with_scan
      end
    end

    x.report("scan-precomp #{str.length} character string") do
      iterations.times do
        str.to_chars_array_with_scan_precomp
      end
    end
    str += str
  end
end
```

rick@frodo:~/rubyscripts\$ ruby benchstringsplit.rb

```
Rehearsal ---------------------------------------------------------------------
unpack 130 character string         0.960000   0.010000   0.970000 (  0.984373)
scan 130 character string           2.150000   0.000000   2.150000 (  2.178162)
scan-precomp 130 character string   0.010000   0.000000   0.010000 (  0.012862)
unpack 260 character string         0.910000   0.000000   0.910000 (  0.910658)
scan 260 character string           2.040000   0.000000   2.040000 (  2.100734)
scan-precomp 260 character string   0.010000   0.000000   0.010000 (  0.010890)
unpack 520 character string         0.940000   0.000000   0.940000 (  0.942446)
scan 520 character string           1.990000   0.000000   1.990000 (  2.020499)
scan-precomp 520 character string   0.010000   0.000000   0.010000 (  0.010869)
unpack 1040 character string        0.980000   0.010000   0.990000 (  0.995709)
scan 1040 character string          2.140000   0.000000   2.140000 (  2.160120)
scan-precomp 1040 character string  0.010000   0.000000   0.010000 (  0.013315)
unpack 2080 character string        1.130000   0.000000   1.130000 (  1.214512)
scan 2080 character string          2.110000   0.000000   2.110000 (  2.132072)
scan-precomp 2080 character string  0.010000   0.000000   0.010000 (  0.011119)
------------------------------------------------------------ total: 15.420000sec

                                        user     system      total        real
unpack 130 character string         1.270000   0.000000   1.270000 (  1.338689)
scan 130 character string           2.530000   0.000000   2.530000 (  2.710398)
scan-precomp 130 character string   0.010000   0.000000   0.010000 (  0.011328)
unpack 260 character string         1.350000   0.000000   1.350000 (  1.445329)
scan 260 character string           2.420000   0.000000   2.420000 (  2.532545)
scan-precomp 260 character string   0.000000   0.000000   0.000000 (  0.010712)
unpack 520 character string         1.080000   0.010000   1.090000 (  1.086219)
scan 520 character string           2.120000   0.000000   2.120000 (  2.128990)
scan-precomp 520 character string   0.010000   0.000000   0.010000 (  0.010815)
unpack 1040 character string        1.080000   0.000000   1.080000 (  1.078558)
scan 1040 character string          2.120000   0.000000   2.120000 (  2.129707)
scan-precomp 1040 character string  0.010000   0.000000   0.010000 (  0.010945)
unpack 2080 character string        1.210000   0.000000   1.210000 (  1.267488)
scan 2080 character string          2.460000   0.000000   2.460000 (  2.627165)
scan-precomp 2080 character string  0.010000   0.000000   0.010000 (  0.012961)
```

Rick DeNatale


On 9/4/06, Rick DeNatale wrote:

Well, I was surprised when I couldn’t seem to find a standard String
method for splitting a String into an array of single character
strings.

Thinking about it again, another way is

string.scan /./

I haven’t benchmarked the two though so I don’t know which is faster.
I guess that’s a task for the to-do list.

It looks like unpack is a clear winner:
rick@frodo:~/rubyscripts\$ cat benchstringsplit.rb

```ruby
require 'benchmark'
include Benchmark

class String
  def to_chars_array_with_unpack
    unpack('a'*length)
  end

  def to_chars_array_with_scan
    scan /./
  end
end

iterations = 100
str = "abcdefghijklmnopqrstuvwxyz" * 5
bmbm do | x |
  5.times do
    x.report("unpack #{str.length} character string") do
      iterations.times do
        str.to_chars_array_with_unpack
      end
    end

    x.report("scan #{str.length} character string") do
      iterations.times do
        str.to_chars_array_with_scan
      end
    end
    str += str
  end
end
```

rick@frodo:~/rubyscripts\$ ruby benchstringsplit.rb

```
Rehearsal ---------------------------------------------------------------
unpack 130 character string   1.130000   0.010000   1.140000 (  1.139545)
scan 130 character string     2.590000   0.000000   2.590000 (  3.678513)
unpack 260 character string   1.100000   0.000000   1.100000 (  1.672301)
scan 260 character string     2.170000   0.000000   2.170000 (  2.850112)
unpack 520 character string   1.000000   0.000000   1.000000 (  1.808740)
scan 520 character string     2.230000   0.010000   2.240000 (  3.562964)
unpack 1040 character string  1.080000   0.000000   1.080000 (  2.122081)
scan 1040 character string    2.260000   0.000000   2.260000 (  3.962422)
unpack 2080 character string  1.020000   0.000000   1.020000 (  1.518132)
scan 2080 character string    2.120000   0.000000   2.120000 (  3.091133)
------------------------------------------------------ total: 16.720000sec

                                  user     system      total        real
unpack 130 character string   0.920000   0.000000   0.920000 (  1.348387)
scan 130 character string     2.200000   0.000000   2.200000 (  5.322029)
unpack 260 character string   0.990000   0.000000   0.990000 (  1.363863)
scan 260 character string     2.210000   0.020000   2.230000 (  3.426876)
unpack 520 character string   1.010000   0.000000   1.010000 (  1.761541)
scan 520 character string     2.140000   0.000000   2.140000 (  2.251753)
unpack 1040 character string  1.010000   0.000000   1.010000 (  1.136075)
scan 1040 character string    2.220000   0.000000   2.220000 (  2.373706)
unpack 2080 character string  1.000000   0.000000   1.000000 (  1.083459)
scan 2080 character string    2.130000   0.000000   2.130000 (  2.199584)
```

Rick DeNatale


On 9/4/06, Rick DeNatale wrote:

On 9/4/06, Rick DeNatale wrote:

It looks like unpack is a clear winner:

Then on second thought, it looks like the problem with scan was the
construction of the regex. Using scan without pre-compiling the regex
is about twice as slow as unpack, but using scan with a pre-compiled
regexp looks like it's about 100 times faster!

Disregard that last test. It's been pointed out to me that my code
using the pre-compiled regex was flawed: I used Regexp.new('/./') when
it should have been either Regexp.new('.') or Regexp.new(/./).
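A short sketch of why the flawed pattern looked so fast: when `Regexp.new` is given a string, the slashes are taken as literal characters, so the pattern only matches a slash, any one character, and another slash. The benchmark string contains no slashes, so `scan` finds nothing and returns an empty array. The variable names below are illustrative only.

```ruby
str = "abcdefghijklmnopqrstuvwxyz" * 5

flawed = Regexp.new('/./')  # literal "/", any character, literal "/"
fixed  = Regexp.new('.')    # any single character except newline

p str.scan(flawed).length   # => 0, nothing in str matches
p str.scan(fixed).length    # => 130, one match per character
p fixed == /./              # => true
p "a/b/c".scan(flawed)      # => ["/b/"]
```

So the "100 times faster" figure measured a scan that produced no character array at all, not a faster way of splitting.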

Rick DeNatale
