Re: Reverse the Polarity (#143)

I had fun with this quiz.

/Ruby (should be|better be|is) (very )* fun[…!] (You see?)?/.generate

=>["Ruby should be fun. ", “Ruby should be fun. You see?”, "Ruby
should be fun! ", “Ruby should be fun! You see?”, "Ruby should be very
fun. ", “Ruby should be very fun. You see?”, "Ruby should be very fun!
", “Ruby should be very fun! You see?”, "Ruby should be very very fun.
", “Ruby should be very very fun. You see?”, "Ruby should be very very
fun! ", “Ruby should be very very fun! You see?”, "Ruby should be very
very very fun. ", “Ruby should be very very very fun. You see?”, "Ruby
should be very very very fun! ", “Ruby should be very very very fun!
You see?”, "Ruby should be very very very very fun. ", “Ruby should be
very very very very fun. You see?”, "Ruby should be very very very very
fun! ", “Ruby should be very very very very fun! You see?”, "Ruby
should be very very very very very fun. ", “Ruby should be very very
very very very fun. You see?”, "Ruby should be very very very very very
fun! ", “Ruby should be very very
very very very fun! You see?”, "Ruby better be fun. ", “Ruby better
be fun. You see?”, "Ruby better be fun! ", “Ruby better be fun! You
see?”, "Ruby better be very fun. ", “Ruby better be very fun. You
see?”, "Ruby better be very fun! ", “Ruby better be very fun! You
see?”, "Ruby better be very very fun. ", “Ruby better be very very
fun. You see?”, "Ruby better be very very fun! ", “Ruby better be very
very fun! You see?”, "Ruby better be very very very fun. ", “Ruby
better be very very very fun. You see?”, "Ruby better be very very very
fun! ", “Ruby better be very very very fun! You see?”, "Ruby better be
very very very very fun. ", “Ruby better be very very very very fun.
You see?”, "Ruby better be very very very very fun! ", “Ruby better be
very very very very fun! You see?”, "Ruby better be very very very very
very fun. ", “Ruby better be very very very very very fun. You see?”,
"Ruby better be very very very very very fun!
", “Ruby better be very very very very very fun! You see?”, "Ruby is
fun… ", “Ruby is fun. You see?”, "Ruby is fun! ", “Ruby is fun! You
see?”, "Ruby is very fun. ", “Ruby is very fun. You see?”, "Ruby is
very fun! ", “Ruby is very fun! You see?”, "Ruby is very very fun. ",
“Ruby is very very fun. You see?”, "Ruby is very very fun! ", “Ruby is
very very fun! You see?”, "Ruby is very very very fun. ", “Ruby is
very very very fun. You see?”, "Ruby is very very very fun! ", “Ruby
is very very very fun! You see?”, "Ruby is very very very very fun. ",
“Ruby is very very very very fun. You see?”, "Ruby is very very very
very fun! ", “Ruby is very very very very fun! You see?”, "Ruby is
very very very very very fun. ", “Ruby is very very very very very
fun. You see?”, "Ruby is very very very very very fun! ", “Ruby is very
very very very very fun! You see?”]

Thank you to Evan for making the comment about nesting. My solution
broke on nesting. Luckily, that was easy to fix.

/(a+(bc)?){1,2}/.generate
=>[“a”, “abc”, “aa”, “aabc”, “aaa”, “aaabc”, “aaaa”, “aaaabc”, “aaaaa”,
“aaaaabc”, “aa”, “aabc”, “aaa”, “aaabc”, “aaaa”, “aaaabc”, “aaaaa”,
“aaaaabc”, “aaaaaa”, “aaaaaabc”, “abca”, “abcabc”, “abcaa”, “abcaabc”,
“abcaaa”, “abcaaabc”, “abcaaaa”, “abcaaaabc”, “abcaaaaa”, “abcaaaaabc”,
“aaa”, “aaabc”, “aaaa”, “aaaabc”, “aaaaa”, “aaaaabc”, “aaaaaa”,
“aaaaaabc”, “aaaaaaa”, “aaaaaaabc”, “aabca”, “aabcabc”, “aabcaa”,
“aabcaabc”, “aabcaaa”, “aabcaaabc”, “aabcaaaa”, “aabcaaaabc”,
“aabcaaaaa”, “aabcaaaaabc”, “aaaa”, “aaaabc”, “aaaaa”, “aaaaabc”,
“aaaaaa”, “aaaaaabc”, “aaaaaaa”, “aaaaaaabc”, “aaaaaaaa”, “aaaaaaaabc”,
“aaabca”, “aaabcabc”, “aaabcaa”, “aaabcaabc”, “aaabcaaa”, “aaabcaaabc”,
“aaabcaaaa”, “aaabcaaaabc”, “aaabcaaaaa”, “aaabcaaaaabc”, “aaaaa”,
“aaaaabc”, “aaaaaa”, “aaaaaabc”, “aaaaaaa”, “aaaaaaabc”, “aaaaaaaa”,
“aaaaaaaabc”, “aaaaaaaaa”, “aaaaaaaaabc”, “aaaabca”, “aaaabcabc”,
“aaaabcaa”, “aaaabcaabc”, “aaaabcaaa”, “aaaabcaaabc”, “aaaabcaaaa”,
“aaaabcaaaabc”, “aaaabcaaaaa”, “aaaabcaaaaabc”, “aaaaaa”, “aaaaaabc”,
“aaaaaaa”, “aaaaaaabc”, “aaaaaaaa”, “aaaaaaaabc”, “aaaaaaaaa”,
“aaaaaaaaabc”, “aaaaaaaaaa”, “aaaaaaaaaabc”, “aaaaabca”, “aaaaabcabc”,
“aaaaabcaa”, “aaaaabcaabc”, “aaaaabcaaa”, “aaaaabcaaabc”, “aaaaabcaaaa”,
“aaaaabcaaaabc”, “aaaaabcaaaaa”, “aaaaabcaaaaabc”]

My solution uses a macro-character system inspired by the Lisp parser.
Every time it encounters a special character, it calls a function
associated with that character, passing in both the tree it’s
encountered so far, and the reader. This tree isn’t actually a physical
tree, however. Instead, it consists of functions that generate all
possible combinations of a segment of the regex. Each function is
curried with the relevant arguments so as to require no new arguments.
For example, the regex-segment “a{1,2}” becomes quantified_range curried
with the range 1…2 and the method quote, which is in turn curried with
“a”. The array that is passed in to the macro-character functions
consists of all of those trees of curried functions that have been seen
so far; these will eventually be curried into the capturing_group method
by the parse_regex method. A capturing group, in this case, is not only
anything between unescaped parentheses but also the top level of
the regex itself.

My solutions supports most of regex syntax. Non-capturing groups should
be an unmissed exception, as should be more exotic quoting.
Unfortunately, I could not think of a way to add in backreferencing
capturing groups. My first thoughts involved having a global array
referencing all capturing groups and then having backreferencing refer
to that, but I couldn’t think of a way so that each backreference would
know which characters were matched by its capturing group in that
particular match. Having the tree be evalutated sequentially rather
than layer-by-layer as it is now (i.e.: rather than having the
possibilities for each segment evaluated and then applying higher
functions on those possibilities, do something more of an inorder
traversal, so that all possibilities if a capturing group matches some
characters are generated before the possibilities of the next
possibility for the capturing group are encountered) was briefly thought
of, but that would
make this solution several times more complex (i.e.: needing actual
tree nodes rather than curried methods).

A few notes: Since + and * can match infinite amounts of characters, I
just have them match upto the value of INF_QUANTIFIER_LEN, which is set
to 5. Additionally, I anything along the lines of /|a/ will not work
(though /a|/ does work); however, should I bother, I can make a
work-around with ease.

Anyway, here’s the code:

INF_QUANTIFIER_LEN = 5

module Invokable
def curry(*largs)
proc {|args| self.call((largs+args))}
end
end
class Method
include Invokable
end

$escape_seqs = {
?a => ?\a,
?b => ?\b,
?f => ?\f,
?n => ?\n,
?r => ?\r,
?t => ?\t,
?v => ?\v
}

$predefined_classes = {
?d => (?0…?9).to_a,
?D => (0…255).to_a - (?0…?9).to_a,
?s => [" “[0], ?\t,?\n,?\v,?\r],
?S => (0…255).to_a - [” "[0], ?\t,?\n,?\v,?\r],
?w => (?a…?z).to_a + (?A…?Z).to_a + (?0…?9).to_a + [?],
?W => (0…255).to_a -
((?a…?z).to_a + (?A…?Z).to_a + (?0…?9).to_a + [?
])
}

###Given a StringIO removes the next character if it’s a ?
def remove_reluctant(strio)
return if strio.eof?
if (ch=strio.getc) == ??
#Do nothing
else
strio.ungetc(ch)
end
end

###Given a StringIO, returns the everything until an unnested closed
parenthesis is read
def get_outer_layer(strio)
str = “”
nest = 0
until (ch=strio.read(1)) == ‘)’ and nest == 0
str << ch
str << strio.read(1) if ch == “\”
nest += 1 if ch == ‘(’
nest -= 1 if ch == ‘)’
end
str
end

###Returns an array whose elements are subarrays containing all distinct
###combinations of one element from each argument
def all_combinations(first, *rest)
return first if rest == []
rest = all_combinations(*rest)
combs = []
first.each do |v1|
rest.each do |v2|
combs << v1 + v2
end
end
combs
end

###The following methods return an array of all valid matches to the
entity.

###Note: The functions corresponding to regex operators that operate on
###valid subregexes accept curried functions that return the values to
operate on,
###not the values themselves. (That’s why the quote function exists.)

def char_class(ascii_vals)
ascii_vals.map{|i|i.chr}
end

def or(left, right)
left.call+right.call
end

def quantified_range(range, vals)
vals = [vals.call] * range.end
range.to_a.map{|n|n == 0 ? “” :
all_combinations(*vals[0…(n-1)])}.flatten
end

def capturing_group(vals)
all_combinations(*vals.map{|f|f.call})
end

def quote(val)
[val]
end

###Following is a hash that maps characters to procedures accepting the
###previously-encountered entities and the StringIO of the regex-source
reader.
###These procedures add to prev curried functions that a form a tree of
functions
###that return all possible values
$macro_chars = {
?\ => proc do |prev, strio|
ch = strio.getc
prev << if $predefined_classes.has_key? ch
method(:char_class).curry($predefined_classes[ch])
elsif $escape_seqs.has_key? ch
method(:quote).curry($escape_seqs[ch])
else
method(:quote).curry(ch.chr)
end
end,
?. => proc do |prev, strio|
prev << method(:char_class).curry((0…255).to_a)
end,
?[ => proc do |prev,strio|
ascii_vals = []

        char_str = strio.gets("]")[0...-1]

        neg = if char_str[0] == ?^
                  char_str = char_str[1..-1]
                  true
                else
                  false
                end

        ##The next three lines handle escape characters. \- is a 

special case
char_str.gsub!(/\-/) {ascii_vals << ?-; “”}
char_str.gsub!(/\(.)/) {
$escape_seqs.has_key?($1[0]) ? $escape_seqs[$1[0]] : $1}
char_str.scan(/.-.|./) do |seg|
if seg =~ /(.)-(.)/
ascii_vals += (($1[0])…($2[0])).to_a
else
ascii_vals << seg[0]
end
end
prev << method(:char_class).curry(
neg ? (0…255).to_a - ascii_vals : ascii_vals)
end,
?( => proc do |prev,strio|
prev << parse_regex(get_outer_layer(strio))
end,
?* => proc do |prev, strio|
remove_reluctant(strio)
prev[-1] = method(:quantified_range).
curry(0…INF_QUANTIFIER_LEN, prev[-1])
end,
?+ => proc do |prev, strio|
remove_reluctant(strio)
prev[-1] = method(:quantified_range).
curry(1…INF_QUANTIFIER_LEN, prev[-1])
end,
?? => proc do |prev, strio|
remove_reluctant(strio)
prev[-1] = method(:quantified_range).
curry(0…1, prev[-1])
end,
?{ => proc do |prev, strio|
remove_reluctant(strio)
contents = strio.gets(“}”)
prev[-1] = if contents =~ /(\d),(\d)/
method(:quantified_range).curry(($1.to_i)…($2.to_i),
prev[-1])
elsif contents=~ /(\d),/
method(:quantified_range).curry(
($1.to_i)…INF_QUANTIFIER_LEN,
prev[-1])
elsif contents =~ /(\d)/
method(:quantified_range).curry(($1.to_i)…($1.to_i),
prev[-1])
end
end,
?| => proc do |prev, strio|
prev[0…-1] = method(:or).curry(
method(:capturing_group).curry(prev[0…-1]),
parse_regex(strio.gets(nil)))
end
}

def parse_regex(src)
return method(:quote).curry(“”) if src == “” or src.nil?
func_arr = []
strio = StringIO.new(src)
until strio.eof?
ch = strio.getc
if $macro_chars.has_key? ch
$macro_chars[ch].call(func_arr, strio)
else
func_arr << method(:quote).curry(ch.chr)
end
end
method(:capturing_group).curry(func_arr)
end

class Regexp
def generate
parse_regex(self.source).call
end
end

  ____________________________________________________________________________________

Fussy? Opinionated? Impossible to please? Perfect. Join Yahoo!'s user
panel and lay it on us.
http://surveylink.yahoo.com/gmrs/yahoo_panel_invite.asp?a=7