Match against multiple patterns problem

What I would want to do, if it were legal code would be:
if not s.is_a?(String) or len(s) < 1 then
return [“1”,""]
if (m = /^[^\W_]+/.match(s) ) !== nil then return
[“a”, m[0]]
elsif (m = /^\s+/.match(s) ) != nil then return
[“s”, m[0]]
elsif (m = /^[[:cntrl:]]+/) != nil then return
[“c”, m[0]]
elsif (m = /^[[:punct:]_]+/) != nil then return
[“p”, m[0]]
elsif (m = /^[^\w\s[:cntrl:][:punct:]]+/) != nil
then return
[“r”, m[0]]
But this isn’t legal, because one can’t do an assignment within a logic
test (any expression?).
So what SHOULD I do? I can double the length of the code by pulling the
assignments out, and
interleaving tests and assignments, but that’s ugly (i.e., it makes the
code harder to read). Any better choices?

On Thu, Apr 18, 2013 at 7:40 PM, Charles H.
[email protected] wrote:

m[0]]
elsif (m = /^[^\w\s[:cntrl:][:punct:]]+/) != nil
then return [“r”,
m[0]]
But this isn’t legal, because one can’t do an assignment within a logic test
(any expression?).
So what SHOULD I do? I can double the length of the code by pulling the
assignments out, and
interleaving tests and assignments, but that’s ugly (i.e., it makes the code
harder to read). Any better choices?

Actually you can do assignments inside logical expressions. The
problems in the code above are related to nonexistent methods (len),
operator precedence(not and or) and a typo (!==). Try this:

if (!s.is_a?(String) || (s.length < 1))
return [“1”,“”]
end
if (m = /^[^\W_]+/.match(s) ) != nil then return [“a”, m[0]]
elsif (m = /^\s+/.match(s) ) != nil then return [“s”, m[0]]
elsif (m = /^[[:cntrl:]]+/) != nil then return [“c”, m[0]]
elsif (m = /^[[:punct:]_]+/) != nil then return [“p”, m[0]]
elsif (m = /^[^\w\s[:cntrl:][:punct:]]+/) != nil then return [“r”,
m[0]]
end

Changing the returns for puts to test outside a method:

1.9.2p290 :116 > s = " "
=> " "
1.9.2p290 :117 > if (!s.is_a?(String) || (s.length < 1))
1.9.2p290 :118?> puts [“1”,“”]
1.9.2p290 :119?> end
=> nil
1.9.2p290 :120 > if (m = /^[^\W_]+/.match(s) ) != nil then puts
[“a”, m[0]]
1.9.2p290 :121?> elsif (m = /^\s+/.match(s) ) != nil then
puts [“s”, m[0]]
1.9.2p290 :122?> elsif (m = /^[[:cntrl:]]+/) != nil then
puts [“c”, m[0]]
1.9.2p290 :123?> elsif (m = /^[[:punct:]_]+/) != nil then
puts [“p”, m[0]]
1.9.2p290 :124?> elsif (m = /^[^\w\s[:cntrl:][:punct:]]+/) != nil
then puts [“r”, m[0]]
1.9.2p290 :125?> end
s

It’s not very pretty still, but at least it works. I’m sure someone
will come up with a more elegant solution.

Jesus.

Hello,

there are two possible solutions coming to my mind:

  • Using the global $~, the last matched expression (equivalent to the
    return value of Regexp.last_match). This is not as good as the second
    idea imo, because you just get rid of the assignment, but your code is
    still as wet.
  • Since the pattern is “if match then return [something_changing,
    m[0]]”, you could put all your regular expressions as well as the
    changing part in a hash and then loop through it.

I think the second approach is better, since you avoid a lot of code
duplication and it gets more maintainable (because you just have to
modify the hash if you want to add/remove/modify one or more of your
patterns and changing strings).

It would look something like this:

hsh = { /^[^\W_]+/ => “1”, /^\s+/ )> “a” }
s = s.to_s
return [“1”, “”] unless s.size >= 1
hsh.each do |regex, text|
if regex =~ s
return [text, Regexp.last_match 0]
end
end

Hope this helps.

Regards,
Calvin

How about this?

def test(s)
return [‘1’,’’] if !s.is_a?(String) || s.empty?
case s
when /^([^\W_]+)/
[‘a’, $1]
when /^(\s+)/
[‘s’, $1]
when /^([[:cntrl:]]+)/
[‘c’, $1]
when /^([[:punct:]_]+)/
[‘p’, $1]
when /^([^\w\s[:cntrl:][:punct:]]+)/
[‘r’, $1]
end
end

Wouldn’t using the case statement be a better option than all the elsif
statements? Seems like this is the perfect situation to use case.

Wayne

On 04/18/2013 11:17 AM, Wayne B. wrote:

Wouldn’t using the case statement be a better option than all the
elsif statements? Seems like this is the perfect situation to use case.

Wayne

The case expression might be a bit cleaner, but that would appear to
coerce using the $~ pattern match value, which I don’t like at all. So
I think I’ll stick to cleaning up the typo, and changing len(s) to
s.length. Then I’ll merge the initial if onto the rest, so everything
is the same form.

I’m glad I hadn’t misremembered how the assignment should be written,
but the error messages sure indicated (to me) that I had. Now that the
problem is explained I’ve cleared it up a lot. (E.g., those != nil
things turned out to be superfluous.) The current version is:
if not s.is_a?(String) or s.length < 1 then
return [“1”,""]
if (m = /^[^\W_]+/.match(s) ) then return [“a”,
m[0]]
if (m = /^\s+/.match(s) ) then return [“s”,
m[0]]
if (m = /^[[:cntrl:]]+/) then return [“c”,
m[0]]
if (m = /^[[:punct:]_]+/) then return [“p”,
m[0]]
if (m = /^[^\w\s[:cntrl:][:punct:]]+/) != nil
then return [“r”,
m[0]]
which looks a lot better.

On 04/18/2013 11:37 AM, Joel P. wrote:

[‘c’, $1]
when /^([[:punct:]_]+)/
[‘p’, $1]
when /^([^\w\s[:cntrl:][:punct:]]+)/
[‘r’, $1]
end
end

Given that $1 means what I think it means from context, that would work,
but I dislike the global variables with pattern matching. I understand
that some people consider assignment within an if statement to be worse,
but I’m not one of them. Were I to use that form, I’d indent it
differently (maybe you normally do). Also, there would need to be a
default case to handle uncovered options (which just prints an error
message and returns [“2”, “”]. empty? instead of a length test is
reasonable, and I should think about adopting that change.

But really my problem was being mislead by error messages about what the
problem actually was.

On Thu, Apr 18, 2013 at 8:37 PM, Joel P. [email protected]
wrote:

[‘c’, $1]
when /^([[:punct:]_]+)/
[‘p’, $1]
when /^([^\w\s[:cntrl:][:punct:]]+)/
[‘r’, $1]
end
end

I prefer a more OO approach:

Tests = [
lambda {|s| (!(String === s) || s.empty?) and [‘1’, ‘’]},
lambda {|s| /^([^\W_]+)/ =~s and [‘a’, $1]},
lambda {|s| /^(\s+)/ =~s and [‘s’, $1]},
]

def test(s)
Tests.each do |t|
r = t[s] and return r
end

raise “No match!”
end

Kind regards

robert

Charles H. wrote in [#1106215]:

The case expression might be a bit cleaner, but that would appear to
coerce using the $~ pattern match value, which I don’t like at all

$~ is the same object as your ‘m’, so you can do your m[0]' thing with it, or call m.to_s’ on it, or whatever you like to explicitly coerce
it. I’ll put some examples below.

The current version is:
if not s.is_a?(String) or s.length < 1 then return [“1”,“”]
if (m = /^[^\W_]+/.match(s) ) then return [“a”, m[0]]
if (m = /^\s+/.match(s) ) then return [“s”, m[0]]
if (m = /^[[:cntrl:]]+/) then return [“c”, m[0]]
if (m = /^[[:punct:]_]+/) then return [“p”, m[0]]
if (m = /^[^\w\s[:cntrl:][:punct:]]+/) != nil
then return [“r”, m[0]]
which looks a lot better.

Sure, except that it’s syntactically invalid (missing all the 'end’s)
and the final 3 if-statements assign the regexp (not the match data) to
m. Hopefully this is just a copy-paste error; otherwise, decent unit
tests should catch it.

Also, there would need to be a default case to handle uncovered
options (which just prints an error message and returns [“2”, “”].

Your original (and final) code missed that part.

If you want to not assign m in the condition, you could use $~. Here’s
a straight copy of your final code (with some fixes):

if !s.is_a?(String) or s.length < 1 then return [“1”,“”]; end
if /^[^\W_]+/.match(s) then return [“a”, $~[0]]; end
if /^\s+/.match(s) then return [“s”, $~[0]]; end
if /^[[:cntrl:]]+/.match(s) then return [“c”, $~[0]]; end
if /^[[:punct:]_]+/.match(s) then return [“p”, $~[0]]; end
if /^[^\w\s[:cntrl:][:punct:]]+/.match(s)
then return [“r”, $~[0]]; end
puts “error message”
[“2”,“”]

Or with a case statement:

return [‘1’,‘’] unless s.is_a?(String) and !s.empty?
case s
when /^[^\W_]+/ ; [‘a’, $~[0]]
when /^\s+/ ; [‘s’, $~[0]]
when /^[[:cntrl:]]+/ ; [‘c’, $~[0]]
when /^[[:punct:]_]+/ ; [‘p’, $~[0]]
when /^[^\w\s[:cntrl:][:punct:]]+/
[‘r’, $~[0]]
else puts “error message”
[‘2’, ‘’]
end

Obviously you can arrange it however you like, I’ve just attempted to
match your layout style based on a guess.

Charles H. wrote in [#1106217]:

Given that $1 means what I think it means from context, that would work,
but I dislike the global variables with pattern matching.

The global assignment happens whether or not you like/use it.
Assertively not using the feature isn’t a very strong way to express
your displeasure with it, and it won’t just shrivel up and drop off lack
of use. :wink:

See also: Feature #8110: Regex methods not changing global variables - Ruby master - Ruby Issue Tracking System

But really my problem was being mislead by error messages about what the
problem actually was.

That is the main thing. As a tip, in future you should include the
error message in your post; possibly someone with more experience could
help you find the root cause more quickly that way.