Forum: Ruby String#to_rx ?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
alex (Guest)
on 2005-11-23 00:49
(Received via mailing list)
Possible RCR: would anyone else find this a useful addition to the core

class String
  def to_rx
    Regexp.new( Regexp.escape(self) )
  end
end

as a more straightforward & readable alternative to interpolation:

/#{Regexp.escape(a_string)}/

I turned up a few references to this sort of thing in ruby code on the
web.
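A minimal sketch of the proposal, checking that it builds the same pattern as the interpolated form. String#to_rx is not defined by stock Ruby, so the definition is repeated here to let the snippet run on its own; the sample string is arbitrary.

```ruby
# Proposed above as a core addition; stock Ruby does not define String#to_rx.
class String
  def to_rx
    Regexp.new(Regexp.escape(self))
  end
end

a_string = "1+1=2?"

via_method        = a_string.to_rx
via_interpolation = /#{Regexp.escape(a_string)}/

# Both regexps share the same escaped source, so they match the same text.
p via_method.source == via_interpolation.source  # prints true
puts via_method.source                            # prints 1\+1=2\?
p("is 1+1=2? yes" =~ via_method)                  # prints 3
```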

alex
transfire (Guest)
on 2005-11-23 03:06
(Received via mailing list)
Yes, I agree. In Facets, it's more like:

  class String
    def to_re( esc=true )
      Regexp.new( esc ? Regexp.escape(self) : self )
    end
  end

T.
halostatue (Guest)
on 2005-11-23 15:25
(Received via mailing list)
I disagree that either #to_re or #to_rx would be a good name for this
construct. I think that it would be better to have a method on Regexp
itself (a new, alternative constructor) that does this. Maybe:

Regexp.compile_escaped(str)

I don't think I've actually ever seen Regexp.compile used, so maybe it
can be repurposed in 1.9 to do this.
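A sketch of the class-method alternative. Regexp.compile_escaped is only the name suggested above and does not exist in Ruby; Regexp.union, on the other hand, is a real method that already escapes a lone String argument, so it can act as an escaping constructor as things stand. The price string is just sample input.

```ruby
class Regexp
  # Hypothetical alternative constructor (name taken from the suggestion above).
  def self.compile_escaped(str)
    new(escape(str))
  end
end

price = "$5.00"

p(Regexp.compile_escaped(price) =~ "cost: $5.00")  # prints 6
p(Regexp.compile_escaped(price) =~ "cost: $5X00")  # prints nil

# Existing API: Regexp.union escapes a String argument.
p Regexp.union(price).source == Regexp.escape(price)  # prints true
```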

FWIW, I don't tend to use the construct that Alex did -- I tend to
either anchor my strings or insert them in the middle of a larger
regexp, which is why I don't particularly think that this is a method
that belongs on String.

If it's to be on String, though, it should probably be on a few others
as well (Fixnum) and it should be explicit: #to_regexp.

-austin
transfire (Guest)
on 2005-11-23 18:23
(Received via mailing list)
Austin Ziegler wrote:
> I disagree that either #to_re or #to_rx would be a good name for this
> construct. I think that it would be better to have a method on Regexp
> itself (a new, alternative constructor) that does this. Maybe:
>
> Regexp.compile_escaped(str)

The problem here is largely one of brevity. Who wants to type all that
when #to_re will do?

> I don't think I've actually ever seen Regexp.compile used, so maybe it
> can be repurposed in 1.9 to do this.
>
> FWIW, I don't tend to use the construct that Alex did -- I tend to
> either anchor my strings or insert them in the middle of a larger
> regexp, which is why I don't particularly think that this is a method
> that belongs on String.

> If it's to be on String, though, it should probably be on a few others
> as well (Fixnum) and it should be explicit: #to_regexp.

#to_regexp would imply that the object was some type of Regexp already.
#to_re or #to_rx clearly indicate it is a conversion. It's a pretty
innocent method and I don't think it is too much trouble to have even
if it's not the most useful method in the world.

I think even more useful though would just be a method on String that
does the Regexp escaping. I had been using my own small Kernel method
#resc(str) for this, but now I see it would be much more useful as a
String method:

  /^#{foo.resc}/ =~ bar

All things being the same, I'll put that in the next version of Facets.

T.
halostatue (Guest)
on 2005-11-23 18:39
(Received via mailing list)
On 11/23/05, Trans <transfire@gmail.com> wrote:
> Austin Ziegler wrote:
> > I disagree that either #to_re or #to_rx would be a good name for this
> > construct. I think that it would be better to have a method on Regexp
> > itself (a new, alternative constructor) that does this. Maybe:
> >
> > Regexp.compile_escaped(str)
>
> The problem here is largely one of brevity. Who wants to type all that
> when #to_re will do?

Because neither #to_re nor #to_rx is expressive enough. As such, they
don't belong in the core.

> I think even more useful though would just be a method on String that
> does the Regexp escaping. I had been using my own small Kernel method
> #resc(str) for this, but now I see it would be much more useful as a
> String method:
>
>   /^#{foo.resc}/ =~ bar
>
> All things being the same, I'll put that in the next version of Facets.

Another name without expressiveness. #escape_regexp would be better.
But that doesn't suit your apparent need for brevity. Maybe #regesc,
but I would still oppose its inclusion in the core, so it really does
probably belong in Facets, where I don't have to even care that it
exists.

-austin
alex (Guest)
on 2005-11-23 19:19
(Received via mailing list)
Hi Austin

Austin Ziegler wrote:
> I disagree that either #to_re or #to_rx would be a good name for this
> construct.

I don't feel strongly about the name, though compare #to_f, #to_i, and
#to_s already in the core - I don't see those as obviously more
'expressive' than to_rx, simply more familiar.

After all, #to_s *could* mean 'to_symbol', and #to_f *could* mean
'to_file', if one were being perverse.

> I don't think I've actually ever seen Regexp.compile used, so maybe it
> can be repurposed in 1.9 to do this.

Yes, the special constructor always makes me think it should do something
"special" - perhaps like the /o modifier in perlre. However, I'd prefer
an instance method in String to a new or repurposed class method in
Regexp.

> FWIW, I don't tend to use the construct that Alex did -- I tend to
> either anchor my strings or insert them in the middle of a larger
> regexp, which is why I don't particularly think that this is a method
> that belongs on String.

I also commonly use them anchored within a larger regexp, but it would
still be nicer to be able to write

/before #{str.to_rx} after/

than to have to wedge a long call to a class method into an interpolated
section.
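One detail of this style, sketched below under the assumption that String#to_rx is defined as proposed in this thread: interpolating a Regexp object into a regexp literal goes through Regexp#to_s, which wraps the pattern in a (?-mix:...) group so its own options are kept.

```ruby
class String
  # Assumed from the proposal in this thread; not in stock Ruby.
  def to_rx
    Regexp.new(Regexp.escape(self))
  end
end

str = "a.b"
pattern = /before #{str.to_rx} after/

puts str.to_rx.to_s               # prints (?-mix:a\.b)
p(pattern =~ "before a.b after")  # prints 0
p(pattern =~ "before axb after")  # prints nil
```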

I suppose the point I'm making is that Strings have a 'natural' affinity
to or representation as Regexps - viz their mutually substitutable uses
in #split, #sub and friends, and so it would be nice to make conversion
between the two less unwieldy and verbose.
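To make the "mutually substitutable" point concrete: String#split and String#sub take either a String, matched literally, or a Regexp, matched as a pattern. A small illustration with an arbitrary sample string:

```ruby
text = "a.b.c"

# A String argument is matched literally...
p text.split(".")     # prints ["a", "b", "c"]
p text.sub(".", "-")  # prints "a-b.c"

# ...while a Regexp argument is a pattern, where "." matches any character.
p text.sub(/./, "-")  # prints "-.b.c"

# Escaping the string recovers the literal behaviour with a Regexp.
p text.sub(Regexp.new(Regexp.escape(".")), "-")  # prints "a-b.c"
```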

Regexp seems to me a 'major' core class, with its own literal syntax (like
Float, Integer, String, and Symbol). I wouldn't like to make anyone
write "#{an_integer}" to do the work of Integer#to_s, unless they really
wanted to.

I agree there is some ambiguity about the semantics re anchoring -
should #to_rx mean

/#{Regexp.escape(a_string)}/
or
/\A#{Regexp.escape(a_string)}\z/

The strongest argument for the former is that the latter doesn't do
anything useful that #== doesn't already do.
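A quick sketch of that argument (the sample strings are arbitrary): the fully anchored, escaped pattern accepts exactly the strings that == accepts. \A and \z are used because ^ and $ match at line boundaries in Ruby, as the last line shows.

```ruby
a_string = "x+y"

unanchored = /#{Regexp.escape(a_string)}/
anchored   = /\A#{Regexp.escape(a_string)}\z/

candidates = ["x+y", "let x+y", "x+y\nmore", "xy"]

anchored_hits   = candidates.select { |s| anchored =~ s }
equality_hits   = candidates.select { |s| s == a_string }
unanchored_hits = candidates.select { |s| unanchored =~ s }

p anchored_hits == equality_hits  # prints true
p unanchored_hits                 # prints ["x+y", "let x+y", "x+y\nmore"]

# With ^ and $ the pattern also matches inside a multi-line string.
p(/^#{Regexp.escape(a_string)}$/ =~ "x+y\nmore")  # prints 0
```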

> If it's to be on String, though, it should probably be on a few others
> as well (Fixnum) and it should be explicit: #to_regexp.

Perhaps, yes. It's not something I've ever yearned for personally.

cheers
alex
transfire (Guest)
on 2005-11-23 19:19
(Received via mailing list)
> Maybe #regesc

#regesc is better, thanks.

> but I would still oppose its inclusion in the core, so it really does
> probably belong in Facets, where I don't have to even care that it
> exists.

Uhuh, like you haven't scoured through its source for what suits you
;-p

T.
halostatue (Guest)
on 2005-11-23 19:28
(Received via mailing list)
On 11/23/05, Trans <transfire@gmail.com> wrote:
> > Maybe #regesc
> #regesc is better, thanks.

> > but I would still oppose its inclusion in the core, so it really does
> > probably belong in Facets, where I don't have to even care that it
> > exists.
> Uhuh, like you haven't scoured through its source for what suits you
> ;-p

I've looked through the docs. There's so much there that I can get
elsewhere... ;)

-austin
mailing-lists.ruby-talk (Guest)
on 2005-11-24 14:29
(Received via mailing list)
Alex Fenton wrote:

>
> /#{Regexp.escape(a_string)}/
>
> I turned up a few references to this sort of thing in ruby code on the
> web.

Please, everyone read http://www.perl.com/pub/a/2002/06/04/apo5.html and
let's use that as a base for regexes in Ruby 2.  Trying to put a string
literal in a regex shouldn't be hard and Ruby should just "do the right
thing" for you.  I think that regular expressions have been grossly
misused in a lot of places and a lot of ways in the applications of
computer science.  I realize that we can't go back, but to be able to go
forward, perhaps we need to stop and look around - more and more
metacharacters, funky escapes, and so on isn't sustainable in the long
run.  Apocalypse 5 is a good base and I think that a lot of nice things
can come out of it (well, we're here, three years later, and still
nothing, but bear with me).

I also want a way to feed data to a regular expression for christmas, so
if you're into the whole thing of "the joy of giving", then I'm
certainly into the whole thing of "the joy of receiving".

        nikolai
jeff.darklight (Guest)
on 2005-11-26 01:28
(Received via mailing list)
I believe the Facets project already contains a method like this for
String objects.

facets.rubyforge.org

j.


--
"Remember. Understand. Believe. Yield! -> http://ruby-lang.org"

Jeff Wood
mailing-lists.ruby-talk (Guest)
on 2005-11-26 09:47
(Received via mailing list)
Jeff Wood wrote:

[me discussing the merits of a better regex syntax over having #to_*
methods]

> I believe the Facets project already contains a method like this for String
> objects.

What it may contain is a #to_re method.  It doesn't include anything
relating to my message.

        nikolai
transfire (Guest)
on 2005-11-26 23:33
(Received via mailing list)
nikolai,

Could you give us a summary of how it would apply. I really don't have
time to weed through all that material.

Thanks,
T.
mailing-lists.ruby-talk (Guest)
on 2005-11-27 00:50
(Received via mailing list)
Trans wrote:

> Could you give us a summary of how it would apply. I really don't have
> time to weed through all that material.

There are two possible solutions as I see it that make more sense than
#to_rx: a) strings are interpreted as just that - strings of symbols
that you want to interpret literally and b) add syntax to allow you to
easily embed strings inside a regular expression.

In Perl 6, the suggestion is to interpret $string as a string and
<$string> as a regular expression.  In Ruby that'd be #{string} and
<#{string}>, I suppose.  (#{regex} would still mean what it means today
in both cases.)

Another thing that'd be nice to have is a way to insert literal strings
directly, e.g., /<'common regex operators include *, ?, and .'>/, where
<'...'> is syntax for a literal string.  A way to embed a string
variable would then perhaps be /<'#{var}'>/, but I think that the
solution in the previous paragraph may make more sense.

I'm probably not making myself clear enough, but if what I'm saying
seems interesting I'd suggest you at least read "Synopsis 5" [1].

        nikolai

[1] http://www.perl.com/pub/a/2002/06/26/synopsis5.html
jeff.darklight (Guest)
on 2005-11-27 01:06
(Received via mailing list)
How does this differ from embedding variables in regular expressions now
with

a = "hello world"
b = "hello"
c = /#{ b }/
c.match( a ).to_a
#=> ["hello"]

Let me know if I'm missing something...

j.

Jeff Wood
transfire (Guest)
on 2005-11-27 01:18
(Received via mailing list)
Thanks,  I glanced over the synopsis...and whoa! that's a lot of
changes --basically remaking regular expressions. Looks like they're
good changes mostly, but still, that's a major shift.

As for the interpolation itself, I totally agree. It would be better to
have some standard construct. Your proposal corresponding to Perl 6
seems reasonable to me.

T.
transfire (Guest)
on 2005-11-27 01:22
(Received via mailing list)
Jeff,

  a = "hello world"
  b = "w.*"
  c = /#{ b }/
  c.match( a ).to_a
  #=> ["world"]

Characters aren't escaped.
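For contrast, the same example with the string escaped first, which is the behaviour the proposed methods would wrap up (the last sample string is added only for illustration):

```ruby
a = "hello world"
b = "w.*"

raw     = /#{ b }/                 # "w.*" used as pattern source
escaped = /#{ Regexp.escape(b) }/  # "w.*" matched literally

p raw.match(a).to_a                  # prints ["world"]
p escaped.match(a)                   # prints nil
p escaped.match("say w.* now").to_a  # prints ["w.*"]
```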

T.
jeff.darklight (Guest)
on 2005-11-27 01:54
(Received via mailing list)
I didn't want them to be ... I wanted the body of the string to be
passed in literally ...

Yes, I understand if I wanted otherwise I would have to do just a
touch more work...

c = /#{ Regexp.escape( b ) }/

but, it's all literal, and doesn't surprise me... as it shouldn't
surprise anyone else.

/#{ b }/ where b = "w.*" should be /w.*/

... guess I'm not understanding the original point ... somebody
wanting an additional wrapper for strings that auto escapes them ?

why not write one %e( ) or something like that.

...

j.


Jeff Wood
jeff.darklight (Guest)
on 2005-11-27 02:03
(Received via mailing list)
Although I am surprised there isn't a String#escape ( or maybe #escaped
) method

/#{ b.escaped }/

... or something like that. would be literal, and make sense ...

j.

Jeff Wood
mailing-lists.ruby-talk (Guest)
on 2005-11-27 11:07
(Received via mailing list)
Jeff Wood wrote:

> Although I am surprised there isn't a String#escape ( or maybe
> #escaped ) method
>
> /#{ b.escaped }/
>
> ... or something like that. would be literal, and make sense ...

Did you even read this thread?  That's what was being proposed, see the
subject.  The point is that that's not the right way to solve this
problem.

        nikolai
jeff.darklight (Guest)
on 2005-11-27 12:36
(Received via mailing list)
Yes I did read the original thread.

And although from time to time I have been "thumbs-up" for adding
punctuation soup into Ruby, I'm learning and growing into the fact
that it's not the ruby way.

I was simply trying to suggest something that felt more ruby to me.

adding <blah> syntax ( or anything directly from Perl ) just continues
to feed where most of the complaints I've ever heard about ruby ... It
doesn't need to feel any more perlish than it already does in some
places.

The only opinion that matters to me is Matz. If he likes it, I'm sure
I'll get used to it, otherwise, bleck, I'll pass.

j.

Jeff Wood
mailing-lists.ruby-talk (Guest)
on 2005-11-27 13:25
(Received via mailing list)
Jeff Wood wrote:

> Yes I did read the original thread.

So in what way is String#escaped better than String#to_rx?

> And although from time to time I have been "thumbs-up" for adding
> punctuation soup into Ruby, I'm learning and growing into the fact
> that it's not the ruby way.
>
> I was simply trying to suggest something that felt more ruby to me.
>
> adding <blah> syntax ( or anything directly from Perl ) just continues
> to feed where most of the complaints I've ever heard about ruby ... It
> doesn't need to feel any more perlish than it already does in some
> places.

You obviously haven't read "Apocalypse 5" or "Synopsis 5".  The whole
point of that syntax is to be able to clean up the regex syntax, i.e.,
to thin down the "punctuation soup".  Did you know that many features of
Ruby have been heavily influenced by, and even lifted directly from,
Perl?  The syntax for regular expressions, for example (with some minor
incompatibilities).  So I don't really see where your "we shouldn't use
perlish stuff" argument is coming from; especially seeing as how the
suggestions in the mentioned articles intend to make Perl's regular
expressions less "perlish".

> The only opinion that matters to me is Matz. If he likes it, I'm sure
> I'll get used to it, otherwise, bleck, I'll pass.

How very individualistic of you.

        nikolai
jeff.darklight (Guest)
on 2005-11-27 14:13
(Received via mailing list)
On 11/27/05, Nikolai Weibull
<mailing-lists.ruby-talk@rawuncut.elitemail.org>
wrote:
>
> Jeff Wood wrote:
>
> > Yes I did read the original thread.
>
> So in what way is String#escaped better than String#to_rx?


When I read #to_rx I can only guess at the specifics of the call (
something to do with regular expressions ... or prescriptions ) ...
but what it's doing to the string, no idea...  when I read #escaped
all the lights light up ... ah, you want an escaped version of the
string.  There's a bit too much implicit in #to_rx

>
> You obviously haven't read "Apocalypse 5" or "Synopsis 5".  The whole
> point of that syntax is to be able to clean up the regex syntax, i.e.,
> to thin down the "punctuation soup".  Did you know that many features of
> Ruby have been heavily influenced by, and even lifted directly from,
> Perl?  The syntax for regular expressions, for example (with some minor
> incompatibilities).  So I don't really see where your "we shouldn't use
> perlish stuff" argument is coming from; especially seeing as how the
> suggestions in the mentioned articles intend to make Perl's regular
> expressions less "perlish".


Nope, I haven't. And I surely don't plan to.  And, I don't need a
history lesson from you, thanks anyways. I've been using ruby for a
few years now and am quite well aware of its origins and some of the
decisions that were made early on ( as well as the features that were
implemented to help Perl users move away from Perl to Ruby ).

If you want Larry Wall's "less perlish" perl ... by all means, perl6
should be ready in about 5 years.

> > The only opinion that matters to me is Matz. If he likes it, I'm sure
> > I'll get used to it, otherwise, bleck, I'll pass.
>
> How very individualistic of you.


Actually, it has NOTHING to do with individualism, even if I don't
like the choices, Matz is still the one that makes 'em.  I have my
opinions about things, sometimes they match, sometimes they don't.

... If you'd like to continue this discussion, I suggest we move it
off the list, it's caused enough noise for everybody else.

j.

Jeff Wood
langstefan (Guest)
on 2005-11-27 14:33
(Received via mailing list)
On Sunday 27 November 2005 14:13, Jeff Wood wrote:
> > So in what way is String#escaped better than String#to_rx?
>
> When I read #to_rx I can only guess at the specifics of the call (
> something to do with regular expressions ... or prescriptions ) ...
> but what it's doing to the string, no idea...  when I read #escaped
> all the lights light up ... ah, you want an escaped version of the
> string. ...

But escaped for what purpose? For usage as XML attribute value?
Or does it escape special shell characters? Or special regex
characters? Or special characters for File.fnmatch?

A string can be escaped for many purposes. I'm sure others can
come up with more examples. IMO adding a method to String is a
bad choice. Either new syntax, or use the existing Regexp.escape,
which is a clean and readable solution.
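Stefan's point can be illustrated with escapers that already ship with Ruby: the same input gets a different escaped form for each target syntax. (The sample input is arbitrary.)

```ruby
require "erb"
require "shellwords"

input = "a <b> & c.d"

# The same string, "escaped" for three different target syntaxes:
for_regexp = Regexp.escape(input)          # regular-expression metacharacters
for_shell  = Shellwords.escape(input)      # a single POSIX shell word
for_html   = ERB::Util.html_escape(input)  # HTML/XML text or attribute value

puts for_regexp  # prints a\ <b>\ &\ c\.d
puts for_shell
puts for_html    # prints a &lt;b&gt; &amp; c.d
```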

Regards,
  Stefan
mailing-lists.ruby-talk (Guest)
on 2005-11-27 15:22
(Received via mailing list)
Jeff Wood wrote:

> On 11/27/05, Nikolai Weibull
> <mailing-lists.ruby-talk@rawuncut.elitemail.org> wrote:

> > So in what way is String#escaped better than String#to_rx?

> When I read #to_rx I can only guess at the specifics of the call (
> something to do with regular expressions ... or prescriptions ) ...
> but what it's doing to the string, no idea...  when I read #escaped
> all the lights light up ... ah, you want an escaped version of the
> string.  There's a bit too much implicit in #to_rx

As Stefan Lang already pointed out, that's totally nonsensical.

> > You obviously haven't read "Apocalypse 5" or "Synopsis 5".

> Nope, I haven't. And I surely don't plan to.

So why do you feel that you can comment on the syntax?

> If you want Larry Wall's "less perlish" perl ... by all means, perl6
> should be ready in about 5 years.

The regular-expression syntax planned for Perl 6 has very little to do
with Perl 6 itself.  I have no plans on waiting for or moving to Perl 6
once it is ready.  That's also the reason why I really want the planned
syntax in Ruby, so that there wouldn't be a compelling reason to do so.

I'm not saying that "Apocalypse 5" is gospel, but there are some good
ideas in it, and I think that now that we have the chance (with the
imminent release of 2.0) we should review all parts of the language and
make sure that they're all up to code.

        nikolai
jeff.darklight (Guest)
on 2005-11-27 15:22
(Received via mailing list)
Quite true. And comparatively, Regexp.escape does make more sense.

j.

Jeff Wood
jeff.darklight (Guest)
on 2005-11-27 17:56
(Received via mailing list)
I do agree that I should have reviewed the document before posting a
response.  I have done that, and still hold to what I've stated.

The Perl syntax he proposed is a hideous mess of line noise ( no
surprises ).

I don't believe that the suggested functionality can be added without
slowing regular expression matching.

But there IS one idea in the document I like, and it would require NO
changes to regexp syntax.

# define a normal regex with 1 group
a = /(\d+)/

# add a rule for group 1's matching (proposed API; no Regexp#rules exists yet)
a.rules << Regexp::Rule.new( :group, 1 ) { |val| val.to_i > 20 }

# now play.
b = "200 19 14 21 1"
b.scan a
#=>[ "200","21" ]

... at least that's how I'd do it ...  Beyond that, I still feel trying
to get anything more than "ideas" for features from Perl is NOT a good
path for ruby.

Anyways, I do see value in what you brought to the table ... but maybe
next time explain the feature for its merits not "hey look what perl is
doing!" ...
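Ruby has no Regexp#rules or Regexp::Rule. As a rough sketch only, assuming a hypothetical wrapper class named RuledRegexp, the per-group rule idea can be emulated today by scanning and keeping only the matches whose captures pass a block:

```ruby
# Hypothetical wrapper emulating the proposed per-group rules; not a real Ruby API.
class RuledRegexp
  def initialize(regexp)
    @regexp = regexp
    @rules  = Hash.new { |h, k| h[k] = [] }  # group number => list of predicates
  end

  # Register a predicate that capture group `group` must satisfy.
  def rule(group, &predicate)
    @rules[group] << predicate
    self
  end

  # Like String#scan, but returns only whole matches whose groups pass every rule.
  def scan(str)
    results = []
    str.scan(@regexp) do
      md = Regexp.last_match
      ok = @rules.all? { |group, preds| preds.all? { |pr| pr.call(md[group]) } }
      results << md[0] if ok
    end
    results
  end
end

a = RuledRegexp.new(/(\d+)/).rule(1) { |val| val.to_i > 20 }
p a.scan("200 19 14 21 1")  # prints ["200", "21"]
```

For a single group this reduces to scanning and then selecting; a wrapper like this only pays off if several groups each need their own check.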
mailing-lists.ruby-talk (Guest)
on 2005-11-28 01:17
(Received via mailing list)
Jeff Wood wrote:

> I do agree that I should have reviewed the document before posting a
> response.  I have done that, and still hold to what I've stated.
>
> The Perl syntax he proposed is a hideous mess of line noise ( no
> surprises).

So you think /(?:abc)/ is easier to read than /[abc]/?  How about
/\w+\s*=\s*\w+/ versus /<word> = <word>/w (or /<word><ws>=<ws><word>/
without 'w' modifier and assuming word = /\w+/)?
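For comparison, the nearest thing in current Ruby is to build small regexps and interpolate them into a larger one. This sketch only approximates the idea: nothing is looked up in a grammar, and each piece is simply embedded as a (?-mix:...) group.

```ruby
# Small named pieces, standing in for <word> and <ws> sub-rules.
word = /\w+/
ws   = /\s*/

# Roughly /<word><ws>=<ws><word>/ from the example above.
assign = /#{word}#{ws}=#{ws}#{word}/

puts assign.source        # prints (?-mix:\w+)(?-mix:\s*)=(?-mix:\s*)(?-mix:\w+)
p(assign =~ "foo = bar")  # prints 0
p(assign =~ "foo bar")    # prints nil
```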

> I don't believe that the suggested functionality can be added without
> slowing regular expression matching.

What functionality are you talking about?  The obvious one that slows
down matching is the one that executes arbitrary Perl (Ruby) statements,
but you're certainly not forced to use that functionality if you don't
want to, which brings us to...

> b = "200 19 14 21 1"
> b.scan a
> #=>[ "200","21" ]
>
> ... at least that's how I'd do it ...  Beyond that, I still feel
> trying to get anything more than "ideas" for features from Perl is NOT
> a good path for ruby.

How is that easier to understand than

b = "200 19 14 21 1"
b.scan /(\d+){ |val| val > 2 }/
# => ["200", "21"]

?  With this syntax, it's pretty obvious what's going on, at least I
think so.  "Match a sequence of one or more digits and call the given
block to see if we should actually accept it as a valid match."

Furthermore, everything could be added _without_ changing the regex
syntax, but that's not the point.  The point is to make the regex syntax
as expressive as possible, just like Ruby's syntax is as expressive as
possible (or at least tries to be).

> Anyways, I do see value in what you brought to the table ... but maybe
> next time explain the feature for it's merits not "hey look what perl
> is doing!" ...

I didn't even mention Perl in my original posting.  I mentioned an
article about a new regular expression syntax, which happened to be for
Perl 6, but I didn't say "Hey, look what almighty Larry Wall has said we
must do."

        nikolai
jeff.darklight (Guest)
on 2005-11-28 06:27
(Received via mailing list)
No further response. There isn't a point. I've stated my opinion, you've
stated yours. We aren't the only people in the world, and this is
generating far too much noise.

Let it be.  You've brought it up, and I'm sure other people will chip in
with their $0.02USD

j.

Jeff Wood
w_a_x_man (Guest)
on 2005-11-28 10:25
(Received via mailing list)
Nikolai Weibull wrote:

> b = "200 19 14 21 1"
> b.scan /(\d+){ |val| val > 2 }/
> # => ["200", "21"]

irb(main):020:0> b.scan /(\d+){ |val| val > 2 }/
(irb):20: warning: regexp has invalid interval
(irb):20: warning: regexp has `}' without escape


irb(main):019:0> b.scan( /\d+/).select{|n| n.to_i > 20}
=> ["200", "21"]
transfire (Guest)
on 2005-11-28 14:00
(Received via mailing list)
So I'll put in the (likely) last 2 cents. I agree with Nikolai that if
there is a proper Regexp way to interpolate a string, that should be
used, and I'm all for anything that makes Regexp less cryptic and more
usable (the "perlishness" of Regexp is just the nature of them, that
won't change, but as Apocalypse shows it can be improved). Nonetheless,
that's all up to Matz and a future version of Ruby and really no use
to us at the moment, so in that regard I have to agree with Jeff and
that's why I am making these methods a part of Facets:

  String#to_re - Convert string to Regexp escaping.
  String#to_rx - Convert string to Regexp non-escaping.
  String#regesc - Regexp escape
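A minimal sketch of the three methods as described in this post. It illustrates the stated behaviour only and is not taken from the Facets source; note that in this naming #to_rx does not escape:

```ruby
class String
  # Regexp-escape the string (same as Regexp.escape(self)).
  def regesc
    Regexp.escape(self)
  end

  # Convert to a Regexp, escaping metacharacters first.
  def to_re
    Regexp.new(regesc)
  end

  # Convert to a Regexp without escaping: the string is pattern source.
  def to_rx
    Regexp.new(self)
  end
end

p "a.c".regesc                    # prints "a\\.c"
p("abc" =~ "a.c".to_re)           # prints nil
p("abc" =~ "a.c".to_rx)           # prints 0
p(/^#{"1+1".regesc}/ =~ "1+1=2")  # prints 0
```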

T.
halostatue (Guest)
on 2005-11-28 18:27
(Received via mailing list)
On 11/27/05, Nikolai Weibull
<mailing-lists.ruby-talk@rawuncut.elitemail.org> wrote:
> Jeff Wood wrote:
> > The Perl syntax he proposed is a hideous mess of line noise ( no
> > surprises).
> So you think /(?:abc)/ is easier to read than /[abc]/?  How about
> /\w+\s*=\s*\w+/ versus /<word> = <word>/w (or /<word><ws>=<ws><word>/
> without 'w' modifier and assuming word = /\w+/)?

No, but I don't think that they're regular expressions at that point.

I wouldn't be opposed to a parser-literal type being included in Ruby
if done right (because that's *really* what Larry proposed), but I
would be opposed to rewriting Ruby regex into the Perl6 format.

-austin
mailing-lists.ruby-talk (Guest)
on 2005-11-28 19:12
(Received via mailing list)
Austin Ziegler wrote:

> On 11/27/05, Nikolai Weibull
> <mailing-lists.ruby-talk@rawuncut.elitemail.org> wrote:

> > Jeff Wood wrote:

> > > The Perl syntax he proposed is a hideous mess of line noise ( no
> > > surprises).

> > So you think /(?:abc)/ is easier to read than /[abc]/?  How about
> > /\w+\s*=\s*\w+/ versus /<word> = <word>/w (or
> > /<word><ws>=<ws><word>/ without 'w' modifier and assuming word =
> > /\w+/)?

> No, but I don't think that they're regular expressions at that point.

No, but what we call regular expressions aren't in any way regular
either.  Actually, it depends on how this is actually done.  In Perl 6,
<word> will call the regex in the current grammar called 'word', so
that's certainly not regular.  If <word> is simply substituted by
whatever 'word' is defined to in the current grammar, then that doesn't
change anything.

> I wouldn't be opposed to a parser-literal type being included in Ruby
> if done right (because that's *really* what Larry proposed),

Yes.

> but I would be opposed to rewriting Ruby regex into the Perl6 format.

Why?

(In Perl 6 (vaporware, I know), the Perl 5 syntax will still be
available.)

        nikolai
eric.mahurin (Guest)
on 2005-11-28 21:09
(Received via mailing list)
On 11/28/05, Nikolai Weibull
<mailing-lists.ruby-talk@rawuncut.elitemail.org> wrote:
> that's certainly not regular.  If <word> is simply substituted by
> whatever 'word' is defined to in the current grammar, then that doesn't
> change anything.

I don't think trying to push full parser capabilities into regexes is
the right way to go.  Instead, I think we should have a standard OO
mechanism in ruby to build grammars - and easily put actions in.  So,
in your example above, I think we should be able to do something like
this:

# we previously defined word, ws, and equal
assign = word + ws + equal + ws + word

This is exactly how my grammar package works.  It is LL, but this same
way of defining grammars could be used for LR/LALR parsing or
DFA/NFA/backtracking of regular expressions.  Instead of specifying
your entire parser in a very complex string (the regular expression),
you simply create and combine grammars using methods to make your
parser (the most basic methods would be for creating an element,
sequencing (+), alternation (|), and recursion).  You could easily
associate actions (i.e. ruby blocks) with certain elements with what
you are trying to parse.  And since this is OO (and duck-typed), we
wouldn't be limited to just parsing characters like regexes - we could
parse tokens or whatever.
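A toy sketch of the approach described above, using hypothetical Parser and element names (this is not the API of the grammar package mentioned): + sequences two parsers, | tries alternatives, and each element wraps a small regexp. It does no backtracking and omits recursion and actions.

```ruby
# Toy parser combinators; hypothetical names, not a real library's API.
class Parser
  # fn receives (string, position) and returns the position after a
  # successful match, or nil on failure.
  def initialize(&fn)
    @fn = fn
  end

  def call(str, pos = 0)
    @fn.call(str, pos)
  end

  # Sequencing: match self, then other from where self stopped.
  def +(other)
    Parser.new do |str, pos|
      mid = call(str, pos)
      mid && other.call(str, mid)
    end
  end

  # Alternation: try self first; if it fails, try other at the same spot.
  def |(other)
    Parser.new { |str, pos| call(str, pos) || other.call(str, pos) }
  end

  # True only if the parser consumes the entire string.
  def matches?(str)
    call(str) == str.length
  end
end

# An element that matches regexp `re` exactly at the current position.
def element(re)
  Parser.new do |str, pos|
    m = re.match(str, pos)
    m && m.begin(0) == pos ? m.end(0) : nil
  end
end

word  = element(/\w+/)
ws    = element(/\s*/)
equal = element(/:=/) | element(/=/)

assign = word + ws + equal + ws + word

p assign.matches?("foo = bar")   # prints true
p assign.matches?("foo := bar")  # prints true
p assign.matches?("foo bar")     # prints false
```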
mailing-lists.ruby-talk (Guest)
on 2005-11-28 22:42
(Received via mailing list)
Eric Mahurin wrote:

> I don't think trying to push full parser capabilities into regexes is
> the right way to go.  Instead, I think we should have a standard OO
> mechanism in ruby to build grammars - and easily put actions in.  So,
> in your example above, I think we should be able to do something like
> this:
>
> # we previously defined word, ws, and equal
> assign = word + ws + equal + ws + word

That's a valid point.

Still, my original intent was to point out that the regex syntax may
benefit from an overhaul.  Fine, forget about grammars and such, but one
example of a possible improvement is that it should be just as easy
to embed a literal string as to embed another regex.

> parse tokens or whatever.

Well, Perl 6 doesn't define the parser as a very complex string.  It's a
set of regexes - a grammar.  But I'm not saying that that's the best way
to do it.  I like your grammar package and once namespaces are
implemented in Ruby I think that it can be made even sweeter (it's a
little too verbose at the moment).

        nikolai
eric.mahurin (Guest)
on 2005-11-29 01:20
(Received via mailing list)
On 11/28/05, Nikolai Weibull
<mailing-lists.ruby-talk@rawuncut.elitemail.org> wrote:
> Well, Perl 6 doesn't define the parser as a very complex string.  It's a
> set of regexes - a grammar.

Yep, you're right.  Even in perl5, you do it with a set of regexes -
where you can do regex recursion with (??{ $re }).  This is one or two
more languages you have to learn on top of perl - the grammar/regex
language.  And then trying to integrate actions into this becomes a
mess.  I think my approach (no new grammar language - just classes,
objects, and methods) is more flexible.

> But I'm not saying that that's the best way
> to do it.  I like your grammar package and once namespaces are
> implemented in Ruby I think that it can be made even sweeter (it's a
> little too verbose at the moment).

Thanks.  There are a few things I'm doing for my next release that
will help the verbosity:

- the recommended way to make a parser/lexer will be to sub-class
Grammar (you can do this in the current release - examples in the next
release will do it):

  class Expression < Grammar
    def initialize
      # we don't have to use the Grammar:: prefix all over the place now
      # (expr stands for the top-level grammar built here; its construction is not shown)
      super(expr)
    end
  end
  g = Expression.new
  g.scan(...)

- make Grammar subclasses (in the package) effectively callable
(calling new).  To do this, I just defined class and instance methods
with the same name as the classes that called klass.new.  Before, you
would do something like this Grammar::Code.new {...}, and now (with
the above sub-classing Grammar for parsers), you'd say Code {...}.  I
would have had real callable objects like Python, but this seems like
a good enough hack.
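Ruby allows method names that start with a capital letter (Kernel#Integer and Kernel#Array are core examples), which is what makes this hack work. A minimal sketch, with hypothetical Grammar and Grammar::Code classes standing in for the package's own:

```ruby
# Hypothetical stand-ins; the real grammar package's classes will differ.
class Grammar
  # A grammar element carrying an action block.
  class Code
    attr_reader :action

    def initialize(&action)
      @action = action
    end
  end

  # Instance-method shortcut: inside Grammar subclasses, Code { ... }
  # means Grammar::Code.new { ... }.
  def Code(&action)
    Code.new(&action)
  end

  # Class-method shortcut: Grammar.Code { ... }.
  def self.Code(&action)
    Code.new(&action)
  end
end

class Expression < Grammar
  def doubler
    Code { |x| x * 2 }  # parsed as a method call because a block follows the name
  end
end

p Expression.new.doubler.action.call(21)      # prints 42
p(Grammar.Code { |x| x + 1 }.action.call(1))  # prints 2
```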