Afternoon,

On Tue, Nov 2, 2010 at 3:04 PM, Dv Dasari [email protected] wrote:

Unless for some reason you are doing homework or something and need to

use a

regular expression, may I suggest the following instead.

You have a very small string and a very small set of options in terms of

possible letters. Your string can have no more than 5 different

characters

so why not create a “binary” version of your string and then check to

see if

anything appears more than you want?

For example

chars = {}

# Why the powers jump by 5 explanation follows

chars[‘a’] = 2**0**

chars[‘b’] = 25

chars[‘c’] = 2**10**

chars[‘d’] = 215

test_string = ‘aabcd’

bit_value = 0

test_string.downcase.each_char{ |c| bit_value = bit_value + chars[c] }

if (bit_value & 949214) != 0 #Magic number explanation to follow

puts ‘Bad string’

else

puts ‘Accepted string’

end

Really what this does is goes character by character and puts each

character

into it’s “bucket”

So we start with a binary value of all zeros

00000 00000 00000 00000 - we use 5 slots (or increase our power of 2 by

5)

for each character because they could appear 5 times each

Going through the test_string - ‘aabcd’

First an a - we add 2**0 which is 1

0+1 = 1

Binary

00000 00000 00000 00001

Next another a - add another 1

1+1 = 2

Binary

00000 00000 00000 00010

b is next - add 2**5 or 32

2+32 = 34

Binary

00000 00000 00001 00010

c - add 2**10 or 1024

34+1024 = 1058

Binary

00000 00001 00001 00010

d - add 2**15 or 32768

Binary

00001 00001 00001 00010

It should be fairly apparent that the location of the 1 in each “bucket”

represents the number of times that character appears in our string.

If you had aaabb then your binary value would be 00000 00000 00010 00100

dddcd would look like 01000 00001 00000 00000

And so on - obviously if we have none of a character then we have no 1

in

that “bucket”

So lets now take a look at the “magic” number of 949214 which has a

binary

rep of

11100 11110 11110 11110

This number represents all the possible locations of the binary digit

one

that you would find unacceptable. Either 5,4, or 3 Ds and 5, 4, 3, or 2

Cs,

Bs, or As.

So now we have

00001 00001 00001 00010 - test value

11100 11110 11110 11110 - our mask of unacceptable options

And when we & (bit wise AND) the two values we get

00000 00000 00000 00010 - we have too many As in this case.

Therefore since this value is above 0 we have a problem. Any valid value

will return zero here.

Not a regular expression - but I believe this would be faster given your

limited set of options and string lengths.

John