Reverse the Polarity (#143)

The three rules of Ruby Q.:

  1. Please do not post any solutions or spoiler discussion for this quiz
    until
    48 hours have passed from the time on this message.

  2. Support Ruby Q. by submitting ideas as often as you can:

http://www.rubyquiz.com/

  1. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps
everyone
on Ruby T. follow the discussion. Please reply to the original quiz
message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Benjohn B.

Usually, you write a regular expression and then search for it in a text
string.
How about doing the opposite? Given a regular expression, generate all
of the
strings that it will match.

Rather amazingly, this has independently come up a couple of times in
conversations with friends, so perhaps it’s actually useful. I wanted to
use it
to specify possible domain names for a potential new company…

/(lovely|delicious|splendid)(food|snacks|munchies)/.generate
=> [lovelyfood, deliciousfood, splendidfood,
lovelysnacks, delicioussnacks, splendidsnacks,
lovelymunchies, deliciousmunchies, splendidmunchies]

The full regular expression grammar is pretty complex, so maybe you’ll
want to
just go with a small subset. :slight_smile:

Looks like its been 2 days…
Did not have a lot of time this weekend, so my solution just implements
operators used by the author:

require ‘strscan’

class Regexp

Create a list of all values matching the regex.

Currently only supports groupings and ‘|’

def generate
regex = self.inspect
regex = regex.slice(1, regex.size - 2) # Remove leading/trailing
‘/’’
s = StringScanner.new(regex)

# Build a list containing each grouping, and blocks in between
groups = []
result = ''
while (result = s.scan_until(/\(|\)/)) != nil
  result = result.sub("(", "") # Does not support '\' escape chars

  if result.size > 0
    if result[-1].chr == ")"
      groups << result.split("|")
      groups[-1][-1] = groups[-1][-1].sub(")", "")
    else
      groups << [result]
    end
  end
end

# Create all combinations of those groups and return them
find_list_combinations(groups)

end

Return an array of all combinations of values in given list of lists

def find_list_combinations(lists)
lists.reverse!
results = lists.pop
list = lists.pop

while list != nil
  new_results = []
  for result in results
    for item in list
      new_result = Array.new([result])
      new_result << item
      new_results << new_result
    end
  end

  results = new_results
  list = lists.pop
end

new_results.map{|list| list.flatten.join }

end
end

Examples of usage:

/(lovely|delicious|splendid)(food|snacks|munchies)/.generate
/Hello, my name is (Linus|Yukihiro|Dennis)
(Torvalds|Matsumoto|Ritchie)/.generate

Thanks,

Justin

On 10/12/07, Ruby Q. [email protected] wrote:

by Benjohn B.

Usually, you write a regular expression and then search for it in a text string.
How about doing the opposite? Given a regular expression, generate all of the
strings that it will match.

Hi, this is my solution. For the design I took the following decisions:

  • I will limit the number of repetitions provided by * and + to a
    constant. In my examples I’ve set it to 2

  • I will limit the set of characters generated by . and [^…]
    constructs to a fixed list of chars. For my tests I used CHARS = %w{a
    b c d e}, but also tested with
    #CHARS = [(“a”…“z”).to_a, (“A”…“Z”).to_a, “.”, “,”, “;”].flatten
    but the results can be big !!

I have the concept of repeaters and groups. A repeteater is a class
that knows how to repeat the result of the groups it contains. I’ve
implemented OneTimeRepeater for the groups without repetition
modifier, StarRepeater, PlusRepeater and QuestionMarkRepeater for the
appropriate modifiers. RangeRepeater takes care of the {a,b} modifier.

Groups: a group represents a piece of the string for which we can
geneate a set of possible values. The groups I have implemented are
SingleChar, CharGroup (don’t support ranges yet, just a list of chars)
and Dot. I haven’t implemented escaped characters like \s, \d, etc,
but I think they are trivial to implement. I have a couple of special
groups to cope with parens and |. MultiGroup takes care of the parens
nesting by combining the result of the contained repeaters, and
OrGroup adds together the result of the contained groups.

Things I don’t support:

  • Ranges in char groups [a-zA-Z]
  • non-greedy repetitions: I don’t even know how to do this, cause I
    think you have to take into account what you are going to generate
    after the group
  • Character classes: \s, \d, etc and [:alpha:], etc, but I think they
    are easy to implement
  • Backreferences: /(abc)\1/, didn’t even thought about them, but
    probably a nightmare.
  • Other things I don’t even know :-).

Most of the interesting stuff is in the parse and combine methods. The
first one understands the syntax and creates groups and repeaters.
Uses recursive calls for parens and |. The combine method is able to
combine an array of arrays to produce all the possible combinations:

[[“”, “a”], [“b”, “bb”]] → [“b”, “bb”, “ab”, “abb”]

Here it is:

Number of times to repeat for Star and Plus repeaters

TIMES = 2

Set of chars for Dot and negated [^] char groups

#CHARS = [(“a”…“z”).to_a, (“A”…“Z”).to_a, “.”, “,”, “;”].flatten
CHARS = %w{a b c d e}

class OneTimeRepeater
def initialize(group)
@group = group
end

def result
@group.result
end
end

class StarRepeater
def initialize(group)
@group = group
end

def result
r = []
group_res = @group.result
group_res.unshift(“”)
TIMES.times do
r << group_res
end
combine(r).uniq
end
end

class PlusRepeater
def initialize(group)
@group = group
end

def result
group_res = @group.result
r = [group_res]
temp = [“”].concat(group_res)
(TIMES - 1).times do
r << temp
end
combine(r).uniq
end
end

class QuestionMarkRepeater
def initialize(group)
@group = group
end

def result
@group.result.unshift(“”)
end
end

class RangeRepeater
def initialize(group,min,max)
@group = group
@min = min
@max = max
end

def result
result = @group.result
r = []
r << [“”] if @min == 0
@min.times {r << result}
temp = result.dup.unshift(“”)
(@max - @min).times do
r << temp
end
combine(r).uniq
end
end

class SingleChar
def initialize(c)
@c = c
end
def result
[@c]
end
end

TODO: support ranges [a-zA-Z]

class CharGroup
def initialize(chars)
@negative = chars[0] == “^”
@chars = chars
@chars = @chars[1…-1] if @negative
end
def result
if @negative
CHARS - @chars
else
@chars
end
end
end

class Dot
def result
CHARS
end
end

class MultiGroup
def initialize(groups)
@groups = groups
end

def result
strings = @groups.map {|x| x.result}
combine(strings)
end
end

class OrGroup
def initialize(first_groupset, second_groupset)
@first = first_groupset
@second = second_groupset
end

def result
strings = @first.map {|x| x.result}
s = combine(strings)
strings = @second.map {|x| x.result}
s.concat(combine(strings))
end
end

Combines arrays, calling + on each possible pair

Starts from the first two arrays, then goes on

combining another array to the result

def combine(arrays)
string = arrays.inject do |r, rep|
temp = []
r.each {|aa| rep.each {|bb| temp << (aa + bb)}}
temp
end
string
end

def parse(s, i = 0)
repeaters = []
group = nil
while i < s.length
char = s[i].chr
case char
when ‘(’
groups, i = parse(s, i+1)
group = MultiGroup.new(groups)
when ‘)’
return repeaters,i
when ‘[’
chars = []
i += 1
until s[i].chr == ‘]’
chars << s[i].chr
i += 1
end
group = CharGroup.new(chars)
when ‘.’
group = Dot.new
when ‘|’
groups, i = parse(s, i + 1)
group = OrGroup.new(repeaters, groups)
return [group], i
else
group = SingleChar.new(char)
end

repeater = nil
i += 1
if i < s.length
  case s[i].chr
  when '*'
    repeater = StarRepeater.new(group)
  when '+'
    repeater = PlusRepeater.new(group)
  when '?'
    repeater = QuestionMarkRepeater.new(group)
  when '{'
    first = ""
    i += 1
    while s[i].chr != ","
      first << s[i].chr
      i += 1
    end
    i += 1
    second = ""
    while s[i].chr != "}"
      second << s[i].chr
      i += 1
    end
    repeater = RangeRepeater.new(group, first.to_i, second.to_i)
  else
    repeater = OneTimeRepeater.new(group)
    i -= 1
  end
  i += 1
else
  repeater = OneTimeRepeater.new(group)
end
repeaters << repeater

end
return repeaters, i
end

class Regexp
def generate
r = self.inspect[1…-2]
repeaters, _ = parse(r)
strings = repeaters.map {|x| x.result}
s = combine(strings)
s
end
end

def show(regexp)
s = regexp.generate
puts “#{regexp.inspect} → #{s.inspect}”
#puts “Checking…”
#errors = s.reject {|string| string =~ regexp}
#if errors.size == 0

puts “All strings match”

#else

puts “These don’t match: #{errors.inspect}”

#end

end

Some tests with TIMES=2 and CHARS = %w{a b c d e}:

show(/ab+[def]?[ghi]j/)
show(/abc+/)
show(/ab(c+(de)
)?f/)
show(/a{0,3}/)
show(/a|b|c/)
show(/ab(c)+|xy*|jjjk+[^jk]/)
show(/(lovely|delicious|splendid)(food|snacks|munchies)/)

/ab+[def]?[ghi]j/ → [“abgj”, “abhj”, “abij”, “abdgj”, “abdhj”,
“abdij”, “abegj”, “abehj”, “abeij”, “abfgj”, “abfhj”, “abfij”,
“abbgj”, “abbhj”, “abbij”, “abbdgj”, “abbdhj”, “abbdij”, “abbegj”,
“abbehj”, “abbeij”, “abbfgj”, “abbfhj”, “abbfij”]

/ab*c+/ → [“ac”, “acc”, “abc”, “abcc”, “abbc”, “abbcc”]

/ab(c+(de)*)?f/ → [“abf”, “abcf”, “abcdef”, “abcdedef”, “abccf”,
“abccdef”, “abccdedef”]

/a{0,3}/ → [“”, “a”, “aa”, “aaa”]

/a|b|c/ → [“a”, “b”, “c”]

/ab(c)+|xy*|jjjk+[^jk]/ → [“abc”, “abcc”, “x”, “xy”, “xyy”, “jjjka”,
“jjjkb”, “jjjkc”, “jjjkd”, “jjjke”, “jjjkka”, “jjjkkb”, “jjjkkc”,
“jjjkkd”, “jjjkke”]

/(lovely|delicious|splendid)(food|snacks|munchies)/ → [“lovelyfood”,
“lovelysnacks”, “lovelymunchies”, “deliciousfood”, “delicioussnacks”,
“deliciousmunchies”, “splendidfood”, “splendidsnacks”,
“splendidmunchies”]

It was fun.

Regards,

Jesus.

On Oct 12, 7:17 am, Ruby Q. [email protected] wrote:

Usually, you write a regular expression and then search for it in a text string.
How about doing the opposite? Given a regular expression, generate all of the
strings that it will match.

I’ve had no time to implement the actual quiz. (The quiz is very
interesting, though; I hope to find time to tackle it soon.)

However, I thought I’d post a piece of code I wrote for Quiz 48, since
it is related:

class Array
def random
self[ rand( self.length ) ]
end
end

class String
def variation( values={} )
out = self.dup
while out.gsub!( /(([^())?]+))(?)?/ ){
( $2 && ( rand > 0.5 ) ) ? ‘’ : $1.split( ‘|’ ).random
}; end
out.gsub!( /:(#{values.keys.join(‘|’)})\b/ ){ values[$1.intern] }
out.gsub!( /\s{2,}/, ’ ’ )
out
end
end

Used like so:

irb(main):023:0> str = “(lovely|delicious|splendid)(food|snacks|
munchies)”
irb(main):024:0> 10.times{ puts str.variation }
deliciousmunchies
delicioussnacks
splendidsnacks
splendidmunchies
splendidsnacks
deliciousmunchies
lovelysnacks
splendidsnacks
lovelyfood
delicioussnacks

irb(main):025:0> str = "(How (much|many)|What) is (the (value|result)
of)? 67?"
irb(main):026:0> 10.times{ puts str.variation }
How many is 6
7?
How much is the value of 67?
How much is 6
7?
What is 67?
How many is the result of 6
7?
What is the value of 67?
What is the value of 6
7?
How many is 67?
What is 6
7?
What is 6*7?

On 10/15/07, Jesús Gabriel y Galán [email protected] wrote:

On 10/12/07, Ruby Q. [email protected] wrote:

by Benjohn B.

Usually, you write a regular expression and then search for it in a text string.
How about doing the opposite? Given a regular expression, generate all of the
strings that it will match.

I made a second version, supporting:

  • Ranges in char groups [a-zA-Z]
  • Backreferences: /(abc)\1/, didn’t even thought about them, but
    probably a nightmare.

I also support the {} modifier, previously I only supported
{,}.

Backreferences were indeed a nightmare, but I managed a solution. I
tried first to integrate them seemlessly in my architecture with no
luck, since the generation of the groups is isolated from one another,
and there was no easy way. So I settled for the following:

Multigroups will carry a number depending on appearance.
When I generate the result for a multigroup, I add the result of that
group to each string.
When I generate a backreference, I generate a special syntax
() to gsub in a second step, when all groups have been
generated and stored. This can of course fail if the regexp matches
that syntax, since I will later on try to sub things that I shouldn’t
(any ideas here are welcome).
In the second step I gsub every occurrence of with the
appropriate group value for that string.

Here’s the code and some examples of the new things:

Number of times to repeat for Star and Plus repeaters

TIMES = 2

Set of chars for Dot and negated [^] char groups

#CHARS = [(“a”…“z”).to_a, (“A”…“Z”).to_a, “.”, “,”, “;”].flatten
CHARS = %w{a b c d e}

class OneTimeRepeater
def initialize(group)
@group = group
end

def result
@group.result
end
end

class StarRepeater
def initialize(group)
@group = group
end

def result
r = []
group_res = @group.result
group_res.unshift(“”)
TIMES.times do
r << group_res
end
combine(r).uniq
end
end

class PlusRepeater
def initialize(group)
@group = group
end

def result
group_res = @group.result
r = [group_res]
temp = [“”].concat(group_res)
(TIMES - 1).times do
r << temp
end
combine(r).uniq
end
end

class QuestionMarkRepeater
def initialize(group)
@group = group
end

def result
@group.result.unshift(“”)
end
end

class RangeRepeater
def initialize(group,min,max)
@group = group
@min = min
@max = max
end

def result
result = @group.result
r = []
r << [“”] if @min == 0
@min.times {r << result}
temp = result.dup.unshift(“”)
if @max
(@max - @min).times do
r << temp
end
end
combine(r).uniq
end
end

class SingleChar
def initialize(c)
@c = c
end
def result
[@c]
end
end

class CharGroup
def initialize(chars)
@chars = chars
if chars[0] == “^”
@negative = true
@chars = @chars[1…-1]
else
@negative = false
end

# Ranges a-b
# save first and last "-" if present
first = nil
last = nil
first = @chars.shift if @chars.first == "-"
last = @chars.pop if @chars.last == "-"
while i = @chars.index("-")
  @chars[i-1..i+1] = (@chars[i-1]..@chars[i+1]).to_a
end
# restore them back
@chars.unshift(first) if first
@chars.push(last) if last

end
def result
if @negative
CHARS - @chars
else
@chars
end
end
end

class Dot
def result
CHARS
end
end

class MultiGroup
attr_reader :group_num
def initialize(groups, group_num)
@groups = groups
@group_num = group_num
end

Generates the result of each contained group

and adds the filled group of each result to

itself

def result
strings = @groups.map {|x| x.result}
result = combine(strings)
result.each {|x| x.add_filled_group(@group_num, x)}
result
end
end

class OrGroup
def initialize(first_groupset, second_groupset)
@first = first_groupset
@second = second_groupset
end

def result
strings = @first.map {|x| x.result}
s = combine(strings)
strings = @second.map {|x| x.result}
s.concat(combine(strings))
end
end

class BackReference
attr_reader :num
def initialize(num)
@num = num
end

def result
[“#{@num}”]
end
end

Combines arrays, concatenating each string

merging the possible groups they have

Starts combining the first two arrays, then goes on

combining each other array to the result of the

previous combination

def combine(arrays)
string = arrays.inject do |r, rep|
temp = []
r.each {|aa| rep.each {|bb| temp <<
(aa.concat_and_merge_groups(bb))}}
temp
end
string
end

class String
attr_accessor :filled_groups

def add_filled_group(num, group)
@filled_groups ||= {}
@filled_groups[num] = group
end

def concat_and_merge_groups(other)
temp = self + other
groups = {}
groups.merge!(self.filled_groups) if self.filled_groups
groups.merge!(other.filled_groups) if other.filled_groups
temp.filled_groups = groups
temp
end

end

class Regexp
attr_reader :num_groups
def parse(s, i = 0)
repeaters = []
group = nil
while i < s.length
char = s[i].chr
case char
when ‘(’
num = @num_groups + 1
@num_groups += 1
groups, i = parse(s, i+1)
group = MultiGroup.new(groups, num)
when ‘)’
return repeaters,i
when ‘[’
chars = []
i += 1
until s[i].chr == ‘]’
chars << s[i].chr
i += 1
end
group = CharGroup.new(chars)
when ‘.’
group = Dot.new
when ‘|’
groups, i = parse(s, i + 1)
group = OrGroup.new(repeaters, groups)
return [group], i
when ‘\’
i += 1
p s[i…-1]
m = s[i…-1].match(/^(\d+)/)
if m
group = BackReference.new(m[0].to_i)
i += m[0].size - 1
end
else
group = SingleChar.new(char)
end

  repeater = nil
  i += 1
  if i < s.length
    case s[i].chr
    when '*'
      repeater = StarRepeater.new(group)
    when '+'
      repeater = PlusRepeater.new(group)
    when '?'
      repeater = QuestionMarkRepeater.new(group)
    when '{'
      m = s[i..-1].match(/\{(\d+)(,(\d+))?\}/)
      first = m[1].to_i if m[1]
      second = m[3].to_i if m[3]
      repeater = RangeRepeater.new(group, first, second)
      i += m[0].size - 1
    else
      repeater = OneTimeRepeater.new(group)
      i -= 1
    end
    i += 1
  else
    repeater = OneTimeRepeater.new(group)
  end
  repeaters << repeater
end
return repeaters, i

end

def generate
@num_groups = 0
r = self.inspect[1…-2]
repeaters, _ = self.parse(r)
strings = repeaters.map {|x| x.result}
s = combine(strings)
# Makes a pass for the backreferences
s.each do |string|
string.gsub!(/(\d+)/) do |match|
string.filled_groups[$1.to_i]
end
end
s
end
end

def show(regexp)
s = regexp.generate
puts “#{regexp.inspect} → #{s.inspect}”
puts “Checking…”
errors = s.reject {|string| string =~ regexp}
if errors.size == 0
puts “All strings match”
else
puts “These don’t match: #{errors.inspect}”
end
end

Samples:

show(/a{3}bcd/)
show(/a[a-d1-9-]/)
show(/a[-a-c1-3]/)
show(/a(b|c)d\1xyz/)
show(/(a)(b)(c)(d)(e)(f)(g)(h)(i)(jjjj|xxxx)(k)\10/)

/a{3}bcd/ → [“aaabcd”]

/a[a-d1-9-]/ → [“aa”, “ab”, “ac”, “ad”, “a1”, “a2”, “a3”, “a4”,
“a5”, “a6”, “a7”, “a8”, “a9”, “a-”]

/a[-a-c1-3]/ → [“a-”, “aa”, “ab”, “ac”, “a1”, “a2”, “a3”]

/a(b|c)d\1xyz/ → [“abdbxyz”, “acdcxyz”]

/(a)(b)(c)(d)(e)(f)(g)(h)(i)(jjjj|xxxx)(k)\10/ →
[“abcdefghijjjjkjjjj”, “abcdefghixxxxkxxxx”]

Regards,

Jesus.