Why no Regexp#to_regexp?

Detlef_R · May 13, 2014, 5:55pm

Given:

 p RUBY_VERSION    # => "2.1.1"

Then:

 p [].to_ary       # => []
 p ''.to_str       # => ""

But not:

 p //.to_regexp    # undefined method `to_regexp`...

Why is there no Regexp#to_regexp?

waynefchin · May 13, 2014, 9:44pm

The #to_foo methods exist primarily on other types of objects as a signal that the underlying data structures are essentially compatible, for implicit type coercion purposes and related magic.
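Here is a minimal sketch of implicit coercion. The `Name` class is hypothetical and does not come from the thread. It defines `#to_str` and `#to_ary`, so core operations that expect a String or an Array accept it without an explicit conversion call:

```ruby
# Hypothetical class that opts into Ruby's implicit String and Array
# conversions by defining #to_str and #to_ary.
class Name
  def initialize(first, last)
    @first = first
    @last = last
  end

  # Implicit String conversion: String#+ calls this on its argument.
  def to_str
    "#{@first} #{@last}"
  end

  # Implicit Array conversion: multiple assignment calls this.
  def to_ary
    [@first, @last]
  end
end

name = Name.new("Ada", "Lovelace")
greeting = "Hello, " + name  # String#+ calls name.to_str
first, last = name           # destructuring calls name.to_ary

p greeting  # => "Hello, Ada Lovelace"
p first     # => "Ada"
p last      # => "Lovelace"
```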

What other object is fundamentally like a regex, that would cause the
language to ever attempt to coerce objects using #to_regex[p]?

waynefchin · May 13, 2014, 10:43pm

On 05/13/2014 12:43 PM, Matthew K. wrote:

The #to_foo methods exist primarily on other types of objects as a
signal that the underlying data structures are essentially compatible,
for implicit type coercion purposes and related magic.

What other object is fundamentally like a regex, that would cause the
language to ever attempt to coerce objects using #to_regex[p]?

No object that is currently a part of Ruby or its standard library
defines #to_regexp. It would be some object created by the programmer.

Since Regexp does not define #to_regexp, I need to do this in order for my code to accept either a Regexp or an object that responds to #to_regexp:

 if regexp = Regexp.try_convert(o)
   regexp = Regexp.try_convert(o)
   ...

What I’d rather do is this:

 if o.respond_to?(:to_regexp)
   regexp = o.to_regexp
   ...

This is symmetrical with this:

 if o.respond_to?(:to_ary)
   ary = o.to_ary
   ...

which works fine if o is an Array, since Array defines #to_ary; and with

 if o.respond_to?(:to_str)
   str = o.to_str
   ...

which works fine if o is a String, since String defines #to_str.
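The following is a minimal sketch of the asymmetry described above. It uses a hypothetical `WordPattern` class that opts in via `#to_regexp`. The `respond_to?` check works for Array and String themselves, but it misses a plain Regexp. `Regexp.try_convert` accepts both. This reflects the behavior of the Ruby versions discussed in the thread:

```ruby
# Hypothetical regexp-like class that opts in via #to_regexp.
class WordPattern
  def initialize(word)
    @word = word
  end

  # Whole-word, case-insensitive match on the literal word.
  def to_regexp
    /\b#{Regexp.escape(@word)}\b/i
  end
end

custom = WordPattern.new("ruby")
plain  = /ruby/

# The respond_to? idiom works for Array and String themselves...
p([].respond_to?(:to_ary))         # => true
p(''.respond_to?(:to_str))         # => true
# ...but misses a plain Regexp, which has no #to_regexp.
p(plain.respond_to?(:to_regexp))   # => false
p(custom.respond_to?(:to_regexp))  # => true

# Regexp.try_convert accepts both kinds of object.
p Regexp.try_convert(plain)        # => /ruby/
p Regexp.try_convert(custom)       # => /\bruby\b/i
```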

waynefchin · May 13, 2014, 11:21pm

On Tue, May 13, 2014 at 1:42 PM, Wayne C. wrote:

…
Since Regexp does not define #to_regexp, I need to do this in order for my code to accept either a Regexp or an object that responds to #to_regexp:
if regexp = Regexp.try_convert(o)
  regexp = Regexp.try_convert(o)
  ...

You seem to have forgotten that you pretty much are not forced to do
anything at all within Ruby.

 class Regexp
   def to_regexp
     self
   end
 end

That seems to solve your problem pretty easily. If you are worried about future issues if/when Ruby defines to_regexp itself, just wrap the definition in an unless statement that checks whether the class already defines to_regexp.
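The following sketches the guarded monkey patch suggested above. It uses `Module#method_defined?` as the check, which is an assumption, since the post does not name one. The patch defines `Regexp#to_regexp` only if nothing has already defined it:

```ruby
# Define Regexp#to_regexp only if Ruby (or another library) has not
# already defined it, so a future built-in version takes precedence.
class Regexp
  unless method_defined?(:to_regexp)
    def to_regexp
      self
    end
  end
end

re = /ab+c/
p re.to_regexp.equal?(re)     # => true
p re.respond_to?(:to_regexp)  # => true
```

With this patch loaded, the `o.respond_to?(:to_regexp)` idiom from the earlier post also accepts plain Regexp objects.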

John

waynefchin · May 13, 2014, 11:38pm

On 05/13/2014 02:21 PM, John W Higgins wrote:

You seem to have forgotten that you pretty much are not forced to do
anything at all within Ruby.

 class Regexp
   def to_regexp
     self
   end
 end

Yes, I can define it myself. Why doesn’t Ruby define it? If there’s no
reason against it, then I want to try to make it exist and submit a
patch.

waynefchin · May 13, 2014, 11:44pm

On 14 May 2014 06:42, Wayne C. wrote:

#to_regexp. It would be some object created by the programmer.

which works fine if o is an Array, since Array defines #to_ary; and with
if o.respond_to?(:to_str)
  str = o.to_str
  ...
which works fine if o is a String, since String defines #to_str.

Well, why is there no Range#to_range, Exception#to_exception, Class#to_class, etc.?

Because there’s no demand. The day someone legitimately creates a class
that waddles and quacks just like a Regexp, in pretty much every way a
Regexp does, and there is an operation that expects only a Regexp*, then
there might be a case for #to_regexp. Until that day, YAGNI rules.

Conversely, there are lots of classes that waddle and quack like an integer and lots of operations that expect simple numbers, so #to_int makes sense; human IO depends almost entirely on strings, so #to_str makes sense; and hashes and procs are a core part of the language’s calling conventions, so #to_hash and #to_proc make sense. What does #to_regexp buy us?

* AFAIK most regexen are used in case-when/#=== comparisons, so duck-typing is already trivial. The only other real use I could think of would be overloading String#=~.
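The following illustrates the footnote with a hypothetical `VowelStart` matcher. It is not a Regexp at all, yet it works in `case`/`when` because that construct only needs `#===`. It also works as the right-hand side of `String#=~`, because `String#=~` calls `obj =~ str` when it is given an object that is neither a Regexp nor a String:

```ruby
# Hypothetical regexp-like matcher: matches words starting with a vowel.
class VowelStart
  PATTERN = /\A[aeiou]/i

  # case/when calls matcher === value.
  def ===(other)
    other.is_a?(String) && PATTERN === other
  end

  # "str" =~ matcher delegates to matcher =~ "str".
  def =~(other)
    other =~ PATTERN
  end
end

def classify(word)
  case word
  when /\A\d+\z/ then :number
  when VowelStart.new then :vowel_start
  else :other
  end
end

p classify("42")              # => :number
p classify("apple")           # => :vowel_start
p classify("pear")            # => :other
p("Apple" =~ VowelStart.new)  # => 0
```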

waynefchin · May 13, 2014, 11:48pm

Sorry for replying to myself; I just realised your particular use-case. It’s because Regexp.try_convert already uses #to_regexp. What you’re saying is, effectively, you want to duplicate .try_convert’s underlying logic, without actually converting the object to a Regexp.

I don’t know if that’s a strong enough argument to convince Matz, but it
might work if you express it properly, and can justify why you want to
detect the regexp-compatibility without actually doing the conversion.

waynefchin · May 14, 2014, 2:07am

On May 13, 2014, at 14:38, Wayne C. wrote:

Yes, I can define it myself. Why doesn’t Ruby define it? If there’s no reason against it, then I want to try to make it exist and submit a patch.

Because there is no reason FOR it.

#to_ary exists for splat assignments and other “array’y” things.

#to_str exists for stringy things. It’s used in various places like
objects that can behave like paths and backtraces and the like.

#to_regex isn’t called anywhere. Where would it be called? For what?
When should something be cast to a regexp?

We don’t add methods to core just for symmetry’s sake. There needs to be
a purpose. I haven’t seen you propose a purpose for it yet other than to
make it exist.

waynefchin · May 14, 2014, 12:13am

Argh, I’m being Mr Spammypants today, sorry everyone.

On 14 May 2014 06:42, Wayne C. wrote:

On 05/13/2014 12:43 PM, Matthew K. wrote:

...

Hopefully you mean:

 if regexp = Regexp.try_convert(o)

or:

 if tmp = Regexp.try_convert(o)
   regexp = tmp

What I’d rather do is this:

if o.respond_to?(:to_regexp)
  regexp = o.to_regexp
  ...

My personal feeling here is that Regexp actually has it more right. Why defensively test for the method when you can just invoke it outright? Especially since, in your example, you’re just going to invoke it straight away anyway.

The other important thing is that I could define:

 class String
   def to_regexp
     Regexp.new(self) rescue nil
   end
 end

Now I can use Regexp.try_convert("foo+ba[rz]") and Regexp.try_convert("[invalid re") without fear of surprise nils or exceptions, because we already expect Regexp.try_convert to return nil when it cannot convert.
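The following is a runnable sketch of the nil-returning contract described above. To avoid patching String globally, it uses a hypothetical `PatternSource` wrapper. Its `#to_regexp` returns nil when the pattern is invalid, and `Regexp.try_convert` passes that nil straight through to the caller:

```ruby
# Hypothetical wrapper around a pattern string. Its #to_regexp returns
# nil (instead of raising RegexpError) when the source is not a valid
# regular expression.
class PatternSource
  def initialize(source)
    @source = source
  end

  def to_regexp
    Regexp.new(@source)
  rescue RegexpError
    nil
  end
end

good = Regexp.try_convert(PatternSource.new("foo+ba[rz]"))
bad  = Regexp.try_convert(PatternSource.new("[invalid re"))

p good               # => /foo+ba[rz]/
p bad                # => nil
p "fooobaz" =~ good  # => 0
```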

This contract (being able to return nil when it doesn’t work) can’t be used for the other to_foo methods, because of the pattern of assuming that, since to_foo exists, it can always be called safely.

I’d rather argue for adding .try_convert to the other classes, especially those for which a Kernel.Integer()-type method already exists.
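For comparison, here is a short sketch of the `try_convert` methods that Array and String already provide. These return nil when the implicit conversion method is missing. By contrast, `Kernel#Integer` raises on input it cannot convert:

```ruby
# try_convert returns nil when the object lacks the implicit
# conversion method (#to_ary for Array, #to_str for String).
p Array.try_convert([1, 2])  # => [1, 2]
p Array.try_convert("1, 2")  # => nil (String has no #to_ary)
p String.try_convert("abc")  # => "abc"
p String.try_convert(:abc)   # => nil (Symbol has no #to_str)

# Kernel#Integer, by contrast, raises on input it cannot convert.
begin
  Integer("12abc")
rescue ArgumentError => e
  p e.class                  # => ArgumentError
end
```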

waynefchin · May 14, 2014, 4:29am

On May 13, 2014, at 18:55, Jon A. Lambert wrote:

Matthew Kerwin wrote:

What other object is fundamentally like a regex, that would cause the language
to ever attempt to coerce objects using #to_regex[p]?
String

Why? Regexp defines =~ and String defines =~. What more do you need?

 "blah" =~ /a/
 => 2
 /a/ =~ "blah"
 => 2

If you defined #to_regex, what would it mean? When would it get invoked?
When two strings are used?:

 "blah" =~ "a"
 TypeError: type mismatch: String given
     from (irb):1:in `=~'
     from (irb):1
     from /usr/bin/irb:12:in `<main>'
 "a" =~ "blah"
 TypeError: type mismatch: String given
     from (irb):4:in `=~'
     from (irb):4
     from /usr/bin/irb:12:in `<main>'

Which one is the pattern? How can you know?
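`String#=~` refuses a String argument, so the caller has to decide which string is the pattern. Here is a short sketch of making that choice explicit. The variable names are illustrative:

```ruby
text   = "blah"
needle = "a"

# String#=~ will not guess which String is the pattern.
begin
  p(text =~ needle)
rescue TypeError => e
  p e.message                                # => "type mismatch: String given"
end

# Explicitly treat needle as a literal pattern...
p text =~ Regexp.new(Regexp.escape(needle))  # => 2
# ...or skip regexps entirely for a plain substring search.
p text.index(needle)                         # => 2
```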

waynefchin · May 14, 2014, 3:55am

Matthew Kerwin wrote:

What other object is fundamentally like a regex, that would cause the language to
ever attempt to coerce objects using #to_regex[p]?

String

waynefchin · May 14, 2014, 1:45pm

On 05/13/2014 05:06 PM, Ryan D. wrote:

#to_regex isn’t called anywhere. Where would it be called? For what?
When should something be cast to a regexp?

Actually, it is called. From re.c:

 VALUE
 rb_check_regexp_type(VALUE re)
 {
     return rb_check_convert_type(re, T_REGEXP, "Regexp", "to_regexp");
 }

 /*
  *  call-seq:
  *     Regexp.try_convert(obj) -> re or nil
  *
  *  Try to convert <i>obj</i> into a Regexp, using to_regexp method.
  *  Returns converted regexp or nil if obj cannot be converted
  *  for any reason.
  *
  *     Regexp.try_convert(/re/)         #=> /re/
  *     Regexp.try_convert("re")         #=> nil
  *
  *     o = Object.new
  *     Regexp.try_convert(o)            #=> nil
  *     def o.to_regexp() /foo/ end
  *     Regexp.try_convert(o)            #=> /foo/
  *
  */
 static VALUE
 rb_reg_s_try_convert(VALUE dummy, VALUE re)
 {
     return rb_check_regexp_type(re);
 }

waynefchin · May 14, 2014, 2:13pm

On 05/13/2014 02:48 PM, Matthew K. wrote:

Sorry for replying to myself; I just realised your particular
use-case. It’s because Regexp.try_convert already uses #to_regexp.
What you’re saying is, effectively, you want to duplicate
.try_convert’s underlying logic, without actually converting the
object to a Regexp.

Yes, exactly. Here’s the code that is driving this:

 def filter_proc(filter)
   if filter.respond_to?(:to_a)
     filter_procs = filter.to_a.map(&method(:filter_proc))
     ->(word) {
       filter_procs.any? { |p| p.call(word) }
     }
   elsif filter.respond_to?(:to_str)
     words = filter.to_str.split.map { |word| word.downcase }
     ->(word) { words.include?(word.downcase) }
   elsif regexp = Regexp.try_convert(filter)
     ->(word) { word =~ regexp }
   elsif filter.respond_to?(:to_proc)
     filter.to_proc
   else
     raise ArgumentError, "Incorrect filter type"
   end
 end

Having to use Regexp.try_convert just seemed… asymmetric in this
context. That’s what started me wondering why not Regexp#to_regexp.

I think you may be right that the .try_convert methods have “got it right.” This code could indeed use String.try_convert. It cannot use Array.try_convert, since that method relies upon to_ary rather than to_a. This code wants to accept an Enumerable as though it were an Array, and Enumerable quite sensibly defines #to_a rather than #to_ary.
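Here is a sketch of the same filter_proc method that uses `String.try_convert` where it can, as suggested above, followed by a small usage example. The Enumerable branch still tests `#to_a`, because `Array.try_convert` would reject anything without `#to_ary`. The sample words and filters are made up for illustration:

```ruby
# Variant of filter_proc that uses String.try_convert for the string case.
# The Enumerable case keeps respond_to?(:to_a), because Array.try_convert
# only accepts objects that define #to_ary.
def filter_proc(filter)
  if filter.respond_to?(:to_a)
    filter_procs = filter.to_a.map { |f| filter_proc(f) }
    ->(word) { filter_procs.any? { |pr| pr.call(word) } }
  elsif (str = String.try_convert(filter))
    words = str.split.map(&:downcase)
    ->(word) { words.include?(word.downcase) }
  elsif (regexp = Regexp.try_convert(filter))
    ->(word) { word =~ regexp }
  elsif filter.respond_to?(:to_proc)
    filter.to_proc
  else
    raise ArgumentError, "Incorrect filter type"
  end
end

words = %w[Apple banana Cherry date]

by_list = filter_proc(["apple cherry", /\Ad/])
by_proc = filter_proc(->(w) { w.length > 5 })

p words.select { |w| by_list.call(w) }  # => ["Apple", "Cherry", "date"]
p words.select { |w| by_proc.call(w) }  # => ["banana", "Cherry"]
```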

waynefchin · May 15, 2014, 1:05am

On May 14, 2014, at 4:45, Wayne C. wrote:

I stand corrected. In all my years of using ruby, I’ve never seen this
(nor needed it, obviously).