Regular Expression

Dobai-Pataky_BSSSSl · November 2, 2010, 10:05pm

I am trying to write a regular expression to match a word against my input
string.

Input string can be ( inpstr = ABCDD )

I should be able to match all the below words with 0 or 1 A, 0 or 1 B, 0
or 1 C and 0 or 1 or 2 D

A
AC
CA
DAD
BAC
ADD

but not words that repeat a letter too often, for example two As or two Cs:
AAB
ABCC
ADDD

I am using an expression like ^[ABCDD]*$, but this also matches words like
ABB or ABCC, which contain two Bs or two Cs.
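
A quick way to see why the bracket expression lets repeats through: a character class is a set, so the second D in [ABCDD] adds nothing, and * then allows any number of characters from that set. This is only a minimal sketch; the variable names are made up for illustration, and \A and \z are used instead of ^ and $ so the whole string is anchored.

```ruby
loose = /\A[ABCDD]*\z/   # the pattern from the question
same  = /\A[ABCD]*\z/    # identical behaviour: duplicates in a class are ignored

%w[A DAD ABB ABCC ADDD].each do |word|
  puts "#{word}: loose=#{loose.match?(word)} same=#{same.match?(word)}"
end
```

Every word in the list is accepted by both patterns, including the three that should be rejected.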

Any suggestion on how I can fix this?

Thanks

diivii · November 2, 2010, 10:35pm

On Tue, Nov 2, 2010 at 9:05 PM, Dv Dasari [email protected] wrote:

I am trying to write a regular expression to match a word with my input
string.

Input string can be ( inpstr = ABCDD )

Any suggestion on how I can fix this?

You are going to need some more advanced regexes to get a match in the
way you are hoping for.

I would get out a copy of a good regex cheat sheet and work through some
examples in http://rubular.com. One such cheat sheet:
http://www.addedbytes.com/download/regular-expressions-cheat-sheet-v2/pdf/

The main reason for rubular is that you list all of the strings you want
to match, and which to reject, and keep tweaking the regex until you get
what you want.

OTOH this regex might scratch your itch:
/^A?B?C?D{0,2}$/

diivii · November 2, 2010, 10:57pm

On Tue, Nov 2, 2010 at 11:34 PM, Richard C.
[email protected] wrote:

On Tue, Nov 2, 2010 at 9:05 PM, Dv Dasari [email protected] wrote:

I am trying to write a regular expression to match a word with my input
string.

Input string can be ( inpstr = ABCDD )

Any suggestion on how I can fix this?

Take a look at lookahead and lookbehind assertions. They are used to
conditionally match when followed or preceded by a given pattern.

OTOH this regex might scratch your itch:
/^A?B?C?D{0,2}$/

Note that the last quantifier here (the {0,2}) only applies to the
last character, the D, not the whole expression.

"ABBC" =~ /^A?B?C?D{0,2}$/
=> nil

HTH,
Ammar

diivii · November 2, 2010, 11:04pm

Kendall G. wrote in post #958838:

On Tue, Nov 2, 2010 at 3:34 PM, Richard C.
[email protected] wrote:

You are going to need some more advanced regexes to get a match in the way
/^A?B?C?D{0,2}$/

The above works so long as each of ABC or D must come in said order,
if present. This goes against the OP’s examples: CA, DAD, and BAC.

My suspicion is that your problem isn’t solvable by a regular
expression alone, but that you’ll need to do some parsing (still
possibly using regular expressions in the process).

Yes, you are correct, this expression doesn’t match words like CA or BAD
or CAD.

Just wondering if there is an option to say all different combinations
or orders.

diivii · November 2, 2010, 10:58pm

On Tue, Nov 2, 2010 at 3:34 PM, Richard C.
[email protected] wrote:

You are going to need some more advanced regexes to get a match in the way
/^A?B?C?D{0,2}$/

The above works so long as each of ABC or D must come in said order,
if present. This goes against the OP’s examples: CA, DAD, and BAC.

My suspicion is that your problem isn’t solvable by a regular
expression alone, but that you’ll need to do some parsing (still
possibly using regular expressions in the process).
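
The ordering limitation described above can be checked directly. A small sketch, with illustrative variable names, that splits the OP's sample words by whether the fixed-order pattern accepts them:

```ruby
ordered = /^A?B?C?D{0,2}$/

# Words whose letters already appear in A, B, C, D order are accepted...
accepted = %w[A AC ADD ABCDD].select { |w| ordered.match?(w) }
# ...while valid words in any other order are not.
rejected = %w[CA DAD BAC].reject { |w| ordered.match?(w) }

puts "accepted: #{accepted.join(' ')}"
puts "rejected: #{rejected.join(' ')}"
```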

diivii · November 2, 2010, 11:28pm

On Tue, Nov 2, 2010 at 4:04 PM, Dv Dasari [email protected] wrote:

My suspicion is that your problem isn’t solvable by a regular
expression alone, but that you’ll need to do some parsing (still
possibly using regular expressions in the process).

Yes, you are correct, this expression doesn’t match words like CA or BAD
or CAD.

Just wondering if there is an option to say all different combinations
or orders.

In theory, any language/grammar/syntax construct of finite length can
be matched with a regular expression, it’s just that the expression
would, for most things, get really huge fast as the length of said
construct grows.

So, for your example you CAN match it with just ONE regular
expression, but as you’ve noticed, it will be loooong:

Better to just use code in such situations, utilizing simple regex
patterns in the process.
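
To make the "one long regular expression" idea concrete, here is a hedged sketch that builds such an expression mechanically instead of writing it by hand. It is not the expression from the post above; it simply enumerates every distinct arrangement of up to five letters drawn from A, B, C, D, D and joins them into one alternation. The names letters, words and long_regex are made up for the example.

```ruby
letters = %w[A B C D D]

# Every ordering of every non-empty selection of the five letters.
# The two Ds are interchangeable, so duplicates are dropped with uniq.
words = (1..letters.size).flat_map { |n| letters.permutation(n).map(&:join) }.uniq

# One big alternation, anchored to the whole string.
long_regex = /\A(?:#{Regexp.union(words).source})\z/

%w[DAD BAC ABCDD AAB ADDD].each do |w|
  puts "#{w}: #{long_regex.match?(w)}"
end
```

The alternation lists every allowed word explicitly, which is why it grows so quickly as letters are added.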

diivii · November 2, 2010, 11:32pm

Afternoon,

On Tue, Nov 2, 2010 at 3:04 PM, Dv Dasari [email protected] wrote:

Unless for some reason you are doing homework or something and need to
use a regular expression, may I suggest the following instead.

You have a very small string and a very small set of options in terms of
possible letters. Your string can have no more than 5 characters, so why
not create a “binary” version of your string and then check to see if
anything appears more than you want?

For example

chars = {}

# Why the powers jump by 5: explanation follows
chars['a'] = 2**0
chars['b'] = 2**5
chars['c'] = 2**10
chars['d'] = 2**15

test_string = 'aabcd'

bit_value = 0

test_string.downcase.each_char { |c| bit_value = bit_value + chars[c] }

if (bit_value & 949214) != 0 # Magic number: explanation to follow
  puts 'Bad string'
else
  puts 'Accepted string'
end

Really what this does is go character by character and put each
character into its “bucket”.

So we start with a binary value of all zeros

00000 00000 00000 00000 - we use 5 slots (or increase our power of 2 by
5) for each character because they could appear 5 times each

Going through the test_string - ‘aabcd’

First an a - we add 2**0 which is 1

0+1 = 1

Binary
00000 00000 00000 00001

Next another a - add another 1

1+1 = 2
Binary
00000 00000 00000 00010

b is next - add 2**5 or 32

2+32 = 34
Binary
00000 00000 00001 00010

c - add 2**10 or 1024

34+1024 = 1058
Binary
00000 00001 00001 00010

d - add 2**15 or 32768

1058+32768 = 33826
Binary
00001 00001 00001 00010

It should be fairly apparent that the location of the 1 in each “bucket”
represents the number of times that character appears in our string.

If you had aaabb then your binary value would be 00000 00000 00010 00100
dddcd would look like 01000 00001 00000 00000

And so on - obviously if we have none of a character then we have no 1
in that “bucket”.

So let’s now take a look at the “magic” number of 949214, which has a
binary rep of

11100 11110 11110 11110

This number represents all the possible locations of the binary digit
one that you would find unacceptable: either 5, 4, or 3 Ds and 5, 4, 3,
or 2 Cs, Bs, or As.
So now we have

00001 00001 00001 00010 - test value
11100 11110 11110 11110 - our mask of unacceptable options

And when we & (bit wise AND) the two values we get

00000 00000 00000 00010 - we have too many As in this case.

Therefore since this value is above 0 we have a problem. Any valid value
will return zero here.

Not a regular expression - but I believe this would be faster given your
limited set of options and string lengths.

John

diivii · November 3, 2010, 11:45am

On Wed, Nov 3, 2010 at 11:10 AM, Robert K.
[email protected] wrote:

alt: true

end

There are different requirements for each letter. Adapting your
solution:

def check(input)
  limits = {'A' => 1, 'B' => 1, 'C' => 1, 'D' => 2}
  # Up to five characters are possible now that D may appear twice
  raise ArgumentError, "Illegal chars in sequence: %p" % input unless /\A[A-D]{0,5}\z/ =~ input
  cnt = Hash.new 0
  input.scan(/./) do |m|
    cnt[m] += 1
  end

  cnt.each do |letter, amount|
    raise ArgumentError, "Illegal chars in sequence: %p" % input if amount > limits[letter]
  end
end

Jesus.

diivii · November 3, 2010, 11:11am

On Tue, Nov 2, 2010 at 11:31 PM, John W Higgins [email protected]
wrote:

Afternoon,

chars = {}

So we start with a binary value of all zeros

00000 00000 00000 00000 - we use 5 slots (or increase our power of 2 by 5)
for each character because they could appear 5 times each

Do they? In order to verify that you would need a separate regexp
which limits overall length of the sequence to 4.

But I think your math is wrong. Since you always add the same value
(e.g. 2**0 for “a”) as you show below, you can store 2**5 = 32
different values (0 to 31) for each character. The 2**5 thing only
makes sense if you use binary OR and shift the mask for each
character. Am I missing something?
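
One way to read the point about OR and shifting, as a hedged sketch rather than what either poster actually ran: keep a single marker bit per bucket and move it one position left for each occurrence, so the position of the 1 records the count. With plain addition, three d's leave the d bucket at 00011, which the 11100 part of the mask never touches. The method name acceptable? and the local names are made up for this example, and the length check up front keeps any marker from leaving its five-bit bucket.

```ruby
def acceptable?(string)
  width    = 5                                  # bits reserved per letter
  bucket_m = (1 << width) - 1                   # 0b11111
  offsets  = { 'a' => 0, 'b' => 5, 'c' => 10, 'd' => 15 }
  bad_mask = 0b11100_11110_11110_11110          # 949214, the mask from the earlier post

  # At most five letters from a-d, so a marker never overflows its bucket.
  return false unless string.match?(/\A[a-d]{0,5}\z/i)

  bits = 0
  string.downcase.each_char do |c|
    offset = offsets[c]
    bucket = (bits >> offset) & bucket_m
    bucket = bucket.zero? ? 1 : bucket << 1     # first hit sets bit 0, later hits shift left
    bits   = (bits & ~(bucket_m << offset)) | (bucket << offset)
  end
  (bits & bad_mask).zero?
end

%w[aabcd abcdd addd dad].each { |s| puts "#{s}: #{acceptable?(s)}" }
```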

c - add 2**10 or 1024
represents the number of times that character appears in our string.
11100 11110 11110 11110
And when we & (bit wise AND) the two values we get

00000 00000 00000 00010 - we have too many As in this case.

Therefore since this value is above 0 we have a problem. Any valid value
will return zero here.

Not a regular expression - but I believe this would be faster given your
limited set of options and string lengths.

Interesting approach. I would simply have done

def check(input)
  raise ArgumentError, "Illegal chars in sequence: %p" % input unless /\A[A-D]{0,4}\z/ =~ input
  cnt = Hash.new 0
  input.scan(/./) do |m|
    raise ArgumentError, "Illegal sequence %p" % input if (cnt[m] += 1) > 1
    # alt: return false
  end

  # alt: true
end

Kind regards

robert

diivii · November 3, 2010, 5:45pm

On Nov 3, 7:31am, Robert K. [email protected] wrote:

def check(input)
  raise ArgumentError, "Illegal chars in sequence: %p" % input unless /\A[A-D]{0,5}\z/ =~ input
  cnt = {'A' => 1, 'B' => 1, 'C' => 1, 'D' => 2}
  input.scan(/./) do |m|
    raise ArgumentError, "Illegal sequence %p" % input if (cnt[m] -= 1) < 0
    # alt: return false
  end

  # alt: true
end

“We don’t need no stinkin’ loops!”

def check input
  # {0,5} rather than {0,4}: ABCDD is five characters long
  return false unless /\A[A-D]{0,5}\z/ =~ input
  %w(A B C D).map{|s| input.count s}.zip( [1,1,1,2] ).
    map{|a,b| b-a}.all?{|n| n >= 0}
end

diivii · November 3, 2010, 1:31pm

2010/11/3 Jess Gabriel y Galn [email protected]:

alt: return false

end

alt: true

end

There are different requirements for each letter. Adapting your solution:

Good point! I overlooked that.

raise ArgumentError, “Illegal chars in sequence: %p” % input if
amount > limits[letter]
end
end

I’d rather do:

def check(input)
  # {0,5} rather than {0,4}: with two Ds allowed, ABCDD is a legal input
  raise ArgumentError, "Illegal chars in sequence: %p" % input unless /\A[A-D]{0,5}\z/ =~ input
  cnt = {'A' => 1, 'B' => 1, 'C' => 1, 'D' => 2}
  input.scan(/./) do |m|
    raise ArgumentError, "Illegal sequence %p" % input if (cnt[m] -= 1) < 0
    # alt: return false
  end

  # alt: true
end

Cheers

robert

diivii · November 4, 2010, 9:11am

On Wed, Nov 3, 2010 at 5:45 PM, w_a_x_man [email protected] wrote:

On Nov 3, 7:31am, Robert K. [email protected] wrote:

alt: true

end

“We don’t need no stinkin’ loops!”

def check input
  return false unless /\A[A-D]{0,5}\z/ =~ input
  %w(A B C D).map{|s| input.count s}.zip( [1,1,1,2] ).
    map{|a,b| b-a}.all?{|n| n >= 0}
end

The question is: what qualifies as a loop? Explicit loops are only
done with “while”, “until” and “for”; everything else is just a
method call with a block.

According to the strict loop definition my code does not have a loop
either. According to the wide loop definition your code has a lot
more loops than mine. I can spot at least five of them - plus a lot of
temporary Array instances.
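
For comparison, a hedged sketch of a variant that keeps the block style but avoids the intermediate arrays mentioned above. It is not code from the thread; the method name within_limits? and the limits hash are assumptions for illustration, and it returns true or false instead of raising.

```ruby
def within_limits?(input)
  limits = { 'A' => 1, 'B' => 1, 'C' => 1, 'D' => 2 }

  # Only A-D allowed, then one String#count call per letter; all? stops
  # at the first letter that exceeds its limit.
  input.match?(/\A[A-D]*\z/) &&
    limits.all? { |letter, max| input.count(letter) <= max }
end

%w[DAD ABCDD AAB ADDD MIKE].each { |w| puts "#{w}: #{within_limits?(w)}" }
```

No explicit length limit is needed here, because the per-letter limits already cap the string at five characters.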

Cheers

robert

diivii · November 4, 2010, 6:31pm

On Thu, Nov 4, 2010 at 5:46 AM, Mike C. [email protected] wrote:

'CADD' => CADD
'CADDD' =>
'DAD' => DAD
'BAC' => BAC
'BBAD' =>
'AABCCD' =>

However, this also appears by my test to match “DADD”, “ABA”, etc:

["ABA", "DADD", "CAC"].each do |string|
  puts "'#{string}' => #{string.match(regex)}"
end
'ABA' => ABA
'DADD' => DADD
'CAC' => CAC

It does get you closer though. I rarely remember to make use of
look-ahead (and look-behind and other “(?X” style patterns) since when
switching languages/regexp engines, I’m never sure what features will
be there (and will still work the same). I guess I’m too conservative,
sticking with core/basic features.

This does make me curious how short of a regex using these features
could be written for this one case…

diivii · November 4, 2010, 6:35pm

On Thu, Nov 4, 2010 at 11:27 AM, Kendall G. [email protected]
wrote:

‘CAD’ => CAD
[“ABA”, “DADD”, “CAC”].each do |string|
sticking with core/basic features.

This does make me curious how short of a regex using these features
could be written for this one case…

Okay, now this is even closer:

regex = /^(A(?=[^A]+$)|B(?=[^B]+$)|C(?=[^C]+$)|D{1,2}(?!D))+$/

However, it still has problems with “DADD”, “DADBD” and such…

diivii · November 4, 2010, 7:11pm

On Nov 3, 10:42am, w_a_x_man [email protected] wrote:

“We don’t need no stinkin’ loops!”

def check input
  return false unless /\A[A-D]{0,5}\z/ =~ input
  %w(A B C D).map{|s| input.count s}.zip( [1,1,1,2] ).
    map{|a,b| b-a}.all?{|n| n >= 0}
end

def check input
  /\A[A-D]{0,5}\z/.match input and
    %w(A B C D).map{|s| input.count s}.zip( [1,1,1,2] ).
      map{|a,b| b-a}.min >= 0
end

diivii · November 4, 2010, 12:46pm

regex = /^(A(?!A)|B(?!B)|C(?!C)|D{1,2}(?!D))+$/

["ABCDD","CA","CAD","CADD","CADDD","DAD","BAC","BBAD","AABCCD"].each do |string|
  puts "'#{string}' => #{string.match(regex)}"
end

==============
'ABCDD' => ABCDD
'CA' => CA
'CAD' => CAD
'CADD' => CADD
'CADDD' =>
'DAD' => DAD
'BAC' => BAC
'BBAD' =>
'AABCCD' =>

On Nov 2, 2010, at 5:55 PM, Kendall G. wrote:

OTOH this regex might scratch your itch:
–
Kendall G.
[email protected]

Mike C.

[email protected]

diivii · November 4, 2010, 11:54pm

On Nov 4, 2010, at 1:34 PM, Kendall G. wrote:

‘CA’ => CA

be there (and will still work the same). I guess I’m too conservative,
However, it still has problems with “DADD”, “DADBD” and such…

–
Kendall G.
[email protected]

Oh… missed that detail… I think this covers the bases…

regex = /^(A(?!.*A)|B(?!.*B)|C(?!.*C)|D{1,2}(?!.*D))+$/

["ABCDD","CA","CAD",
 "CADD","CADDD","DAD",
 "BAC","BBAD","AABCCD",
 "DADD","DADBD","ABCDDA",
 "MIKE"].each do |string|
  puts "'#{string}' => #{string.match(regex)}"
end

'ABCDD' => ABCDD
'CA' => CA
'CAD' => CAD
'CADD' => CADD
'CADDD' =>
'DAD' =>
'BAC' => BAC
'BBAD' =>
'AABCCD' =>
'DADD' =>
'DADBD' =>
'ABCDDA' =>
'MIKE' =>
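
Note that the output above leaves 'DAD' unmatched, although the original question lists it as a word that should match: the D{1,2}(?!.*D) branch only allows the Ds to sit next to each other. A hedged alternative, not from the thread, is to put one negative lookahead per letter at the start of the pattern, each rejecting the string if that letter occurs once too often anywhere. The same word list is reused; the variable names are illustrative.

```ruby
# Reject two As, two Bs, two Cs or three Ds anywhere; then require only A-D.
regex = /\A(?!.*A.*A)(?!.*B.*B)(?!.*C.*C)(?!.*D.*D.*D)[A-D]+\z/

words = %w[ABCDD CA CAD CADD CADDD DAD BAC BBAD AABCCD DADD DADBD ABCDDA MIKE]
matched = words.select { |w| regex.match?(w) }

words.each { |w| puts "'#{w}' => #{w[regex]}" }
```

With this pattern the order of the letters no longer matters, so 'DAD' is accepted while 'DADD' and 'DADBD' are still rejected.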

Mike C.

[email protected]
