Forum: Ruby No way of looking for a regexp match starting from a partic

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Kenneth McDonald (Guest)
on 2007-06-03 06:00
(Received via mailing list)
I'm probably just missing something obvious, but I haven't found a way
to match a regular expression against only part of a string, in
particular only past a certain point of a string, as a way of finding
successive matches. Of course, one could do a match against a string,
take the substring past that match and do a match against the substring,
and so on, to find all of the matches for the string, but that could be
very expensive for very large strings.

I'm aware of the String.scan method, but that doesn't work for me
because it doesn't return MatchData instances.

What I want is just something like regexp.match(string, n), where the
regexp starts looking for a match at or after position n in the string.


Thanks,
Ken
Nobuyoshi Nakada (Guest)
on 2007-06-03 06:47
(Received via mailing list)
Hi,

At Sun, 3 Jun 2007 12:59:24 +0900,
Kenneth McDonald wrote in [ruby-talk:254054]:
> What I want is just something like regexp.match(string, n), where the
> regexp starts looking for a match at or after position n in the string.

string.index(regexp, n)
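A minimal sketch of Nobu's suggestion for readers who want a MatchData rather than a position. When String#index is given a Regexp and a start offset, it returns the index of the first match at or after that offset and also sets $~ to the corresponding MatchData. The helper name match_from is hypothetical.

```ruby
# Hypothetical helper: returns the MatchData for the first match of +re+
# at or after character index +n+ in +str+, or nil if there is none.
def match_from(str, re, n)
  # String#index(regexp, n) searches from offset n and sets $~ on success.
  str.index(re, n) ? $~ : nil
end

md = match_from("abcabc", /b(c)/, 2)
p md.begin(0)   # prints 4: offsets refer to the whole string
p md[1]         # prints "c"
```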
Harry Kakueki (Guest)
on 2007-06-03 06:49
(Received via mailing list)
On 6/3/07, Kenneth McDonald <kenneth.m.mcdonald@sbcglobal.net> wrote:
>
> What I want is just something like regexp.match(string, n), where the
> regexp starts looking for a match at or after position n in the string.
>
>
> Thanks,
> Ken
>
>

You could match the string but ignore the first part of the match.

str = "abcdefghabcehjjjuabcfjkiabcgdfg"
str =~ /(abc.)/
p $1 # abcd
str =~ /a.*ju(abc.)/
p $1 #abcf

Harry

--

A Look into Japanese Ruby List in English
http://www.kakueki.com/
Patrick Hurley (Guest)
on 2007-06-03 06:50
(Received via mailing list)
On 6/2/07, Kenneth McDonald <kenneth.m.mcdonald@sbcglobal.net> wrote:
>
> What I want is just something like regexp.match(string, n), where the
> regexp starts looking for a match at or after position n in the string.
>
>
> Thanks,
> Ken
>
>

I don't know of anything obvious, but I would probably do something a
little more like:


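class String
  # Yields a MatchData for each successive match of +exp+.
  # Offsets in each MatchData are relative to the remaining substring
  # (post_match), not the original string; loops forever if +exp+
  # can match the empty string.
  def match_each(exp)
    str = self
    while (md = str.match(exp))
      yield md
      str = md.post_match
    end
  end
end

foo = "foo bar foo bar foo"
foo.match_each(/[oa][or]/) do |md|
  puts "Found: #{md}"
end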

# pth
Patrick Hurley (Guest)
on 2007-06-03 06:57
(Received via mailing list)
On 6/3/07, Nobuyoshi Nakada <nobu@ruby-lang.org> wrote:
> Nobu Nakada
>
>

I think he wanted MatchData objects. The String#index method returns
the index (numeric position of the match). But if all you want are
captures, then index is a good solution.

pth
Edwin Fine (efine)
on 2007-06-03 07:02
Kenneth McDonald wrote:
> I'm probably just missing something obvious, but I haven't found a way
> to match a regular expression against only part of a string, in
> particular only past a certain point of a string, as a way of finding
> successive matches. Of course, one could do a match against a string,
> take the substring past that match and do a match against the substring,
> and so on, to find all of the matches for the string, but that could be
> very expensive for very large strings.
>
> I'm aware of the String.scan method, but that doesn't work for me
> because it doesn't return MatchData instances.
>
> What I want is just something like regexp.match(string, n), where the
> regexp starts looking for a match at or after position n in the string.
>
>
> Thanks,
> Ken

How about this?

def match(s, re, n)
  /(?:.{#{n}})(#{re})/.match(s)
end

irb(main):043:0> p s
"abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh abdefgh "
irb(main):044:0> p match(s, /abd/, 10).begin(1)
16
irb(main):045:0> p match(s, /abd/, 20).begin(1)
24
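A variant of the same trick, sketched under stated assumptions; it is a refinement, not Edwin's code. Anchoring with \A pins the overall match to the start of the string, so the fixed-width prefix is not retried from every later start position. The lazy .*? then stops at the first occurrence of re at or after offset n, and the m flag lets . match newlines, so multi-line strings work too. The helper name match_after is hypothetical, and as in the original, capture group 1 holds the match of re itself.

```ruby
# Hypothetical helper: first match of +re+ at or after character index +n+.
# \A pins the match to the string start, .{n} skips n characters,
# .*? lazily advances to where +re+ first matches, and /m lets "." cross
# newlines. Group 1 is the match of +re+; re's own groups shift up by one.
def match_after(s, re, n)
  /\A.{#{n}}.*?(#{re})/m.match(s)
end

s = "abdefgh " * 10
p match_after(s, /abd/, 10).begin(1)  # prints 16
p match_after(s, /abd/, 20).begin(1)  # prints 24
```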
Harry Kakueki (Guest)
on 2007-06-03 07:21
(Received via mailing list)
On 6/3/07, Harry Kakueki <list.push@gmail.com> wrote:
>
> You could match the string but ignore the first part of the match.
>
> str = "abcdefghabcehjjjuabcfjkiabcgdfg"
> str =~ /(abc.)/
> p $1 # abcd
> str =~ /a.*ju(abc.)/
> p $1 #abcf
>
> Harry
>

If you want to specify the point in the string by number, you could do
this.

str = "abcdefghabcehjjjuabcfjkiabcgdfg"
str =~ /.{10}(abc.).*(abc.)/
p $1 #abcf
p $2 #abcg

Harry


--

A Look into Japanese Ruby List in English
http://www.kakueki.com/
Kenneth McDonald (Guest)
on 2007-06-03 07:23
(Received via mailing list)
Edwin Fine wrote:
>> I'm aware of the String.scan method, but that doesn't work for me
> How about this?
> irb(main):045:0> p match(s, /abd/, 20).begin(1)
> 24
>
>
That's clever. Obscure, but clever :-). I wonder if the regexp engine is
clever enough to turn a match like .{n} into a constant time operation?

Thanks,
Ken
Nobuyoshi Nakada (Guest)
on 2007-06-03 07:32
(Received via mailing list)
Hi,

At Sun, 3 Jun 2007 13:56:05 +0900,
Patrick Hurley wrote in [ruby-talk:254059]:
> I think he wanted MatchData objects. The String#index method returns
> the index (numeric position of the match). But if all you want are
> captures, then index is a good solution.

String#index also sets $~.
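Building on that, a sketch of collecting every MatchData from a starting offset by calling String#index repeatedly and reading $~ after each call. The method name match_data_from is hypothetical. Offsets in each MatchData refer to the whole string. The loop steps one character past a zero-length match so that it always advances.

```ruby
# Hypothetical helper: every MatchData for +re+ in +str+, searching from
# character index +start+. Offsets in each MatchData refer to the whole string.
def match_data_from(str, re, start = 0)
  results = []
  pos = start
  while pos <= str.length && (i = str.index(re, pos))
    md = $~                # String#index set $~ for this match
    results << md
    # Resume after this match; step one character past an empty match
    # so the loop always advances.
    pos = md.end(0) > i ? md.end(0) : i + 1
  end
  results
end

found = match_data_from("foo bar foo bar foo", /[oa][or]/, 6)
p found.map { |m| [m[0], m.begin(0)] }   # prints [["oo", 9], ["ar", 13], ["oo", 17]]
```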
Patrick Hurley (Guest)
on 2007-06-03 07:48
(Received via mailing list)
On 6/3/07, Nobuyoshi Nakada <nobu@ruby-lang.org> wrote:
> --
> Nobu Nakada
>
>

I should have known never to question Nobu Nakada :-). I always forget
about those variables.

Thanks
pth
Robert Klemme (Guest)
on 2007-06-03 09:31
(Received via mailing list)
On 03.06.2007 07:30, Nobuyoshi Nakada wrote:
> Hi,
>
> At Sun, 3 Jun 2007 13:56:05 +0900,
> Patrick Hurley wrote in [ruby-talk:254059]:
>> I think he wanted MatchData objects. The String#index method returns
>> the index (numeric position of the match). But if all you want are
>> captures, then index is a good solution.
>
> String#index also sets $~.

But then you can also use String#scan:

irb(main):002:0> "ababb".scan(/(a)b+/) {p $~}
#<MatchData:0x7ff94618>
#<MatchData:0x7ff94578>
=> "ababb"
irb(main):003:0> "ababb".scan(/(a)b+/) {p $~.to_a}
["ab", "a"]
["abb", "a"]
=> "ababb"
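The same observation can be turned into a small helper that keeps each MatchData. The method name all_match_data is hypothetical. Inside the scan block, $~ is the current match, and storing it keeps it. The irb session above already shows a distinct MatchData object per iteration.

```ruby
# Hypothetical helper: one MatchData per match of +re+ in +str+, using
# String#scan's block form, which sets $~ for each match.
def all_match_data(str, re)
  found = []
  str.scan(re) { found << $~ }
  found
end

mds = all_match_data("ababb", /(a)b+/)
p mds.map(&:to_a)              # prints [["ab", "a"], ["abb", "a"]]
p mds.map { |m| m.begin(0) }   # prints [0, 2]
```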

Ken, why do you need MatchData objects?

Kind regards

  robert
Devin Mullins (twifkak)
on 2007-06-03 10:27
(Received via mailing list)
Nobuyoshi Nakada wrote:
> String#index also sets $~.
For that matter, so does String#scan.
Logan Capaldo (Guest)
on 2007-06-03 14:44
(Received via mailing list)
On Sun, Jun 03, 2007 at 12:59:24PM +0900, Kenneth McDonald wrote:
>
> What I want is just something like regexp.match(string, n), where the
> regexp starts looking for a match at or after position n in the string.
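require 'strscan'
scanner = StringScanner.new(string)
scanner.pos = n
# scan_until searches forward from pos; plain #scan only matches at pos itself
if scanner.scan_until(regexp)
  p scanner[1]        # first capture group
  p scanner.matched   # text matched by regexp
  p scanner.pos       # position just past the match
end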

It's in the stdlib. (Note: it doesn't actually give you a MatchData or
set $~, but off the top of my head I can't think of anything that a
MatchData can do that the StringScanner can't.)
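A sketch of walking all matches from a given position with StringScanner, under these assumptions. It uses scan_until, which searches forward from the current position and leaves the scan pointer just past the match. Positions here are byte offsets, so the example uses ASCII text. The method name each_match_from is hypothetical.

```ruby
require 'strscan'

# Hypothetical helper: [matched_text, start_byte_offset] for each match of
# +re+ at or after byte offset +n+ in +string+.
def each_match_from(string, re, n)
  scanner = StringScanner.new(string)
  scanner.pos = n
  results = []
  # scan_until searches forward from scanner.pos and advances past the match.
  while scanner.scan_until(re)
    start = scanner.pos - scanner.matched_size
    results << [scanner.matched, start]
    # Step over a zero-length match so the loop always makes progress.
    if scanner.matched_size.zero?
      break if scanner.eos?
      scanner.getch
    end
  end
  results
end

p each_match_from("foo bar foo bar foo", /[oa][or]/, 6)
# prints [["oo", 9], ["ar", 13], ["oo", 17]]
```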
Rick Denatale (rdenatale)
on 2007-06-03 17:01
(Received via mailing list)
On 6/3/07, Devin Mullins <twifkak@comcast.net> wrote:
> Nobuyoshi Nakada wrote:
> > String#index also sets $~.
> For that matter, so does String#scan.

Hence:
irb(main):001:0> "abcdefabc".scan(/abc/) {puts "#{$~.inspect}, #{$~}"}
#<MatchData:0xb7b0220c>, abc
#<MatchData:0xb7b021e4>, abc
=> "abcdefabc"

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
Kenneth McDonald (Guest)
on 2007-06-04 00:45
(Received via mailing list)
Is $~ thread safe?

Too bad it has to be done this way (though my library will hide it). I
first looked at Ruby several years ago, and at that time, didn't go
further with it because it was too PERLish for me. (PERL was great for
its time, but speaking as someone who actually had to maintain a lot of
PERL code, it's actually a pretty grotty language). One of the things
that brought me back to Ruby was the fact that an effort was being made
to move Ruby away from its PERLisms. But I guess it'll take a while
longer...

Thanks everyone,
Ken
Joel VanderWerf (Guest)
on 2007-06-04 00:50
(Received via mailing list)
Kenneth McDonald wrote:
> Is $~ thread safe?

Yes. All the regex match "global" variables are actually per-thread. See
  p.319 of Pick Axe 2nd ed.
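A small sketch of what this means in practice. The Ruby documentation describes these match variables as both thread-local and method-local, so neither a helper method nor another thread that runs its own match overwrites the caller's $~. The method names are hypothetical.

```ruby
# Hypothetical helper: runs its own match; its $~ is local to this method.
def match_in_helper
  "xyz" =~ /y/
  $~[0]
end

# Sets $~ here, then lets a helper method and a separate thread do their own
# matches, and finally reads this method's $~ again.
def demo_match_scope
  "abc" =~ /b/                                          # this method's $~
  from_helper = match_in_helper                         # helper's own $~
  from_thread = Thread.new { "pqr" =~ /q/; $~[0] }.value # thread's own $~
  [from_helper, from_thread, $~[0]]
end

p demo_match_scope   # prints ["y", "q", "b"]
```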
Robert Klemme (Guest)
on 2007-06-04 11:48
(Received via mailing list)
On 04.06.2007 00:44, Kenneth McDonald wrote:
> Is $~ thread safe?

Yes.

> Too bad it has to be done this way (though my library will hide it). I
> first looked at Ruby several years ago, and at that time, didn't go
> further with it because it was too PERLish for me. (PERL was great for
> its time, but speaking as someone who actually had to maintain a lot of
> PERL code, it's actually a pretty grotty language). One of the things
> that brought me back to Ruby was the fact that an effort was being made
> to move Ruby away from its PERLisms. But I guess it'll take a while
> longer...
>
> Thanks everyone,

Ken, I still don't understand why exactly you need MatchData objects.
What are you trying to achieve?

Kind regards

  robert
Robert Dober (Guest)
on 2007-06-04 12:20
(Received via mailing list)
On 6/3/07, Kenneth McDonald <kenneth.m.mcdonald@sbcglobal.net> wrote:
>
> What I want is just something like regexp.match(string, n),
Hmm, apart from using #scan and #index with $~ as indicated, I do not
think that there is a performance penalty if you do

rg.match(string[n..-1])

Cheers
Robert
Robert Dober (Guest)
on 2007-06-04 12:21
(Received via mailing list)
On 6/4/07, Robert Dober <robert.dober@gmail.com> wrote:

> rg.match(string[n..-1])

My bad, how stupid. Am I thinking in C????
Robert
Trans (Guest)
on 2007-06-04 12:56
(Received via mailing list)
On Jun 4, 6:19 am, "Robert Dober" <robert.do...@gmail.com> wrote:
>
> > What I want is just something like regexp.match(string, n),
>
> Hmm, apart from using #scan and #index with $~ as indicated, I do not
> think that there is a performance penalty if you do
>
> rg.match(string[n..-1])

How can that be? You have to create a whole new String. If that can be
avoided in the internal implementation then adding an optional offset
index to #match is not an unreasonable idea.

T.
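For readers on Ruby 1.9 or later, an optional start position along these lines is available in core. Regexp#match(str, pos) begins searching at character index pos without slicing the string and returns a MatchData whose offsets refer to the whole string. The helper name match_offsets is hypothetical and assumes the pattern cannot match the empty string.

```ruby
# Hypothetical helper: start offsets of successive matches of +re+ in +str+,
# beginning the search at character index +pos+. Assumes +re+ cannot match
# the empty string; otherwise the loop would not advance.
def match_offsets(str, re, pos = 0)
  offsets = []
  while (m = re.match(str, pos))   # Regexp#match(str, pos), Ruby 1.9+
    offsets << m.begin(0)          # offsets refer to the whole string
    pos = m.end(0)
  end
  offsets
end

p(/b(c)/.match("abcabc", 2).begin(0))  # prints 4
p match_offsets("abcabc", /b(c)/)      # prints [1, 4]
```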
Robert Dober (Guest)
on 2007-06-04 13:30
(Received via mailing list)
On 6/4/07, Trans <transfire@gmail.com> wrote:
> >
> How can that be? You have to create a whole new String.
Beating a dead man, Tom? As mentioned, I had a terrible slip to C in my
reasoning, no idea why :(
unknown (Guest)
on 2007-06-04 13:48
(Received via mailing list)
Hi --

On Mon, 4 Jun 2007, Kenneth McDonald wrote:

> Is $~ thread safe?
>
> Too bad it has to be done this way (though my library will hide it). I first
> looked at Ruby several years ago, and at that time, didn't go further with it
> because it was too PERLish for me. (PERL was great for its time, but speaking
> as someone who actually had to maintain a lot of PERL code, it's actually a
> pretty grotty language). One of the things that brought me back to Ruby was
> the fact that an effort was being made to move Ruby away from its PERLisms.
> But I guess it'll take a while longer...

The best thing is really just to use Ruby without thinking about Perl.
They're very different languages, and get mentioned in the same breath
far too often.


David
Robert Klemme (Guest)
on 2007-06-04 13:51
(Received via mailing list)
On 04.06.2007 13:28, Robert Dober wrote:
>> > > take the substring past that match and do a match against the
>> > Hmm, apart from using #scan and #index with $~ as indicated, I do not
>> > think that there is a performance penalty if you do
>> >
>> > rg.match(string[n..-1])
>>
>> How can that be? You have to create a whole new String.
> Beating a dead man Tom? As mentioned I had a terrible slip to C in my
> reasoning, no idea why :(
>> If that can be avoided in the internal implementation then adding an
>> optional offset
>> index to #match is not an unreasonable idea.

Robert, actually string[n..-1] is cheaper than you might assume: I
believe the new string shares the char buffer with the old string, so
you basically just get a new String object with a different offset - the
large bit (the char data) is not copied.

Kind regards

  robert
Robert Dober (Guest)
on 2007-06-04 14:07
(Received via mailing list)
On 6/4/07, Robert Klemme <shortcutter@googlemail.com> wrote:
> On 04.06.2007 13:28, Robert Dober wrote:

> Robert, actually string[n..-1] is cheaper than you might assume: I
> believe the new string shares the char buffer with the old string, so
> you basically just get a new String object with a different offset - the
> large bit (the char data) is not copied.
I am afraid that this is no longer true when the slice is passed as
a formal parameter; the data has to be copied :(

irb(main):011:0> def change(x)
irb(main):012:1> x << "changed"
irb(main):013:1> end
=> nil
irb(main):014:0> a="abcdef"
=> "abcdef"
irb(main):015:0> change(a[1..2])
=> "bcchanged"
irb(main):016:0> a
=> "abcdef"

Cheers
Robert
Robert Klemme (Guest)
on 2007-06-04 14:22
(Received via mailing list)
On 04.06.2007 14:06, Robert Dober wrote:
> irb(main):011:0> def change(x)
> irb(main):012:1> x << "changed"
> irb(main):013:1> end
> => nil
> irb(main):014:0> a="abcdef"
> => "abcdef"
> irb(main):015:0> change(a[1..2])
> => "bcchanged"
> irb(main):016:0> a
> => "abcdef"

Copying in this case is not caused by using the string as a parameter
but by appending to it.

I thought this thread was about /scanning/ which is a read only
operation.  Did I miss something?

Kind regards

  robert
Robert Dober (Guest)
on 2007-06-04 14:51
(Received via mailing list)
On 6/4/07, Robert Klemme <shortcutter@googlemail.com> wrote:
> >
>
> Copying in this case is not caused by using the string as a parameter
> but by appending to it.
>
> I thought this thread was about /scanning/ which is a read only
> operation.  Did I miss something?
No, you did not. Theoretically it might work like this:

def change(x)
  x << "changed"   # copy on write
end

a = "some string"
b = a[1..3]        # shallow copy
b << "changed"     # copy on write
a << "changed"     # no copy of course

But do you think it does? Note that the object must have state to know
when and how to copy the underlying data. I am about to read string.c,
but it is quite complicated and I have some work to do :(.

Cheers
Robert