I need a string#all_indices method--is there such a thing?

In Ruby you can use String#index as follows:
str = "some text"
str.index(/t/)
=> 5

But what if I want to get all the indices for a regex in the string?
Is there a String#all_indices method?

I wrote the following, which works, but there must be a more elegant
way:

class String
  def all_indices(regex)
    indices = []
    index = 0
    # index will be nil upon first match failure; otherwise quit the loop
    # when index is equal to the string length
    while index && index < self.length
      index = self.index(regex, index)
      if index.is_a? Numeric # avoids getting a nil into the indices array
        indices << index
        index += 1
      end
    end
    indices
  end
end
p "this is a test string for the ts in the worldt".all_indices(/t/)
p "what is up with all the twitter hype".all_indices(/w/)

>> [0, 10, 13, 16, 26, 30, 36, 45]
>> [0, 11, 25]

scan

Scan gives you the matches, not the indices (which is what I need).

"this is a test for scan".scan(/t/)
=> ["t", "t", "t"]

On Fri, Aug 28, 2009 at 10:25 AM, timr wrote:

   indices << index

What about
class String
  def indices rgx, idx=0
    [].tap{ |r|
      loop do
        idx = index rgx, idx
        break unless idx
        r << idx
        idx += 1
      end
    }
  end
end

p "baaababbabbbba".indices( /a/ )

Sorry, bad idea.

Hi,

On Friday, 28 Aug 2009, 17:40:05 +0900, timr wrote:

Scan gives you the matches, not the indices (which is what I need).

"this is a test for scan".scan(/t/)
=> ["t", "t", "t"]

There’s a trick to do it with String#scan:

a = []
"this is a test for scan".scan( /t/) { a.push $`.length }
a

This does not work when the matches overlap.

"banana".scan /ana/ #=> ["ana"]

Bertram
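
For what it's worth, the same scan idea can be sketched without the temporary array by turning scan into an enumerator. This is only a sketch: match_offsets is a made-up helper name, and like the block form above it reports non-overlapping matches only.

```ruby
# match_offsets is a hypothetical helper, not part of Ruby's String API.
def match_offsets(str, pattern)
  # to_enum(:scan, pattern) yields once per match; inside the block,
  # Regexp.last_match is the MatchData of the match being yielded.
  str.to_enum(:scan, pattern).map { Regexp.last_match.begin(0) }
end

p match_offsets("this is a test for scan", /t/)
p match_offsets("banana", /ana/)
```

The second call still finds only one match, because scan resumes searching after the end of the previous match.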

On Fri, Aug 28, 2009 at 5:25 PM, timr wrote:

In ruby you can use string#index as follows:
str = "some text"
str.index(/t/)
=> 5

But what if I want to get all the indices for a regex in the string?
Is there an string#all_indices method?

Does this do what you want?

class String
  def all_indices(reg)
    tmp,idx = [],[]
    (0...self.length).each{|x| tmp[x] = self[x..-1]}
    tmp.each_with_index{|y,i| idx << i if y =~ /\A#{reg}/}
    idx
  end
end

p "this is a test string for the ts in the worldt".all_indices(/th/)
#> [0, 26, 36]

It may not be very fast for very long strings (I didn't check).
But for strings like your example it seems OK.

Harry

Sorry, it looks like I had an unnecessary line in there.

class String
  def all_indices(reg)
    idx = []
    (0...self.length).each{|x| idx << x if self[x..-1] =~ /\A#{reg}/}
    idx
  end
end

p "this is a test string for the ts in the worldt".all_indices(/th/)
#> [0, 26, 36]
p "banana".all_indices(/ana/) #> [1, 3]

Harry

On Aug 28, 4:25 am, timr wrote:

In ruby you can use string#index as follows:
str = "some text"
str.index(/t/)
=> 5

But what if I want to get all the indices for a regex in the string?
Is there an string#all_indices method?

Facets has:

def index_all(s, reuse=false)
  s = Regexp.new(Regexp.escape(s)) unless Regexp===s
  ia = []; i = 0
  while (i = index(s,i))
    ia << i
    i += (reuse ? 1 : $~[0].size)
  end
  ia
end
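
To see what the reuse flag changes, here is a standalone adaptation of that logic. It is a sketch only: it is written as a free function taking the string as its first argument (my assumption, so it runs without Facets loaded), whereas the quoted version is a String method.

```ruby
# Hypothetical free-function adaptation of the index_all logic quoted above.
def index_all(str, pattern, reuse = false)
  # Plain strings are escaped so they match literally.
  pattern = Regexp.new(Regexp.escape(pattern)) unless pattern.is_a?(Regexp)
  found = []
  pos = 0
  while (pos = str.index(pattern, pos))
    found << pos
    # reuse: step one character, so overlapping matches are seen;
    # otherwise skip past the whole match.
    pos += reuse ? 1 : Regexp.last_match(0).size
  end
  found
end

p index_all("banana", "ana")        # non-overlapping matches
p index_all("banana", "ana", true)  # overlapping matches
```

One caveat with this shape: without reuse, a pattern that can match the empty string never advances pos, so it should only be given patterns that consume at least one character.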

[].tap?
You must have defined a tap method for Array somewhere, but not in the
code you showed. I can't run the code without a definition for tap.
Thanks,
Tim

On Aug 28, 2:01 am, Robert D. wrote:

Oh, tap is new in 1.9. Sorry, I hadn’t come across it before and was
in 1.8.6 so it wasn’t running. Got it now.
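
For anyone else who trips over this: tap simply yields the receiver to the block and then returns the receiver. A rough sketch of the behaviour follows; tap_like is a hypothetical stand-in I made up for interpreters that lack tap, not a real Ruby method.

```ruby
# Object#tap yields the receiver to the block, then returns the receiver.
result = [].tap { |r| r << 1 << 2 }
p result

# Hypothetical stand-in with the same behaviour, written as a plain method
# so that nothing in core classes is redefined.
def tap_like(obj)
  yield obj
  obj
end

p tap_like([]) { |r| r << 3 }
```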

On Aug 28, 5:02 am, Harry K. wrote:

(0...self.length).each{|x| idx << x if self[x..-1] =~ /\A#{reg}/}



This works, and the code is more concise than what I had, but it is a
brute-force approach that tests for a match at every possible starting
position in the string. That would be a bit slow.

At 2009-08-28 04:20AM, "timr" wrote:

This is a bit simpler:
class String
  def all_indices(substring)
    idx = 0
    indices = []
    loop do
      idx = index(substring, idx)
      break if idx.nil?
      indices << idx
      idx += 1
    end
    indices
  end
end

require 'test/unit'
class TestAllIndices < Test::Unit::TestCase
  def test_it
    assert_equal(
      [0, 10, 13, 16, 26, 30, 36, 45],
      "this is a test string for the ts in the worldt".all_indices(/t/)
    )
    assert_equal(
      [0, 11, 25],
      "what is up with all the twitter hype".all_indices(/w/)
    )
    assert_equal(
      [12, 17, 26, 41],
      "the quick brown fox jumps over the lazy dog".all_indices('o')
    )
    assert_equal(
      [1, 3, 5],
      "bananana".all_indices('ana')
    )
  end
end

Bertram S. wrote:

a = []
"this is a test for scan".scan( /t/) { a.push $`.length }
a

This does not work when the matches overlap.

"banana".scan /ana/ #=> ["ana"]

Bertram

Same difficulty with overlap, but for variety:

class String
  def all_indexes re
    a=[];scan(re) {a<<$~.begin(0)};a
  end
end

p "foo bar baz".all_indexes(/.../)
p "banana".all_indexes(/ana/)

__END__

Output:

[0, 3, 6]
[1]

timr:

But what if I want to get all the indices for a regex
in the string? Is there an string#all_indices method?

How about the below?

class String
  def all_indices needle
    all = []
    offset = 0
    loop do
      i = index needle, offset
      break if i.nil?
      all << i
      offset = i + 1
    end
    all
  end
end

-- Shot

On Fri, Aug 28, 2009 at 4:55 PM, timr wrote:

[].tap?
You must have defined a tap method for Array somewhere, but not in the
code you showed. I can't run the code without a definition for tap.
Thanks,
Tim

Sorry, I am an unconditional one-niner. I really should be more careful
to mark 1.9-only features with comments. At least for some more weeks
;)

On Sat, Aug 29, 2009 at 12:05 AM, timr wrote:

p "this is a test string for the ts in the worldt".all_indices(/th/)

This is not fast enough?

class String
  def all_indices(reg)
    idx = []
    (0...self.length).each{|x| idx << x if self[x..-1] =~ /\A#{reg}/}
    idx
  end
end

p(("this is a test string for the ts in the worldt" * 1000).all_indices(/th/))

I guess you are processing some big strings.
Speed is not what you asked for.
Well, until now :)

Harry

Hi --

On Sat, 29 Aug 2009, Joel VanderWerf wrote:

Same difficulty with overlap, but for variety:

class String
  def all_indexes re
    a=[];scan(re) {a<<$~.begin(0)};a
  end
end

Just to add to the collection: there's also $~.offset(0)[0]
(offset takes a group number, and 0 is the whole match).
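
A quick sketch of how MatchData#begin and MatchData#offset relate, in case it helps. Both take a group number, where 0 means the whole match; the capture group in the pattern below is added purely for illustration.

```ruby
m = "banana".match(/(an)a/)

p m.begin(0)   # start of the whole match
p m.offset(0)  # [start, end] of the whole match
p m.offset(1)  # [start, end] of capture group 1

# Inside a scan block, $~ is the MatchData for the current match,
# so offset(0)[0] gives the same value as begin(0).
starts = []
"banana".scan(/a/) { starts << $~.offset(0)[0] }
p starts
```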

David


Joel VanderWerf wrote:

class String
  def all_indexes re
    a=[];scan(re) {a<<$~.begin(0)};a
  end
end

p "foo bar baz".all_indexes(/.../)
p "banana".all_indexes(/ana/)

and this variant counts overlaps:

p "banana".all_indexes(/(?=ana)/)
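
The lookahead works because it matches a zero-width position, so scan moves forward one character at a time instead of skipping over the matched text. A small sketch that wraps any regex this way; overlapping_indexes is a hypothetical name, and it assumes the caller passes a Regexp rather than a String.

```ruby
# overlapping_indexes is a hypothetical helper, not part of Ruby.
# It assumes pattern is a Regexp; interpolating it keeps its own flags.
def overlapping_indexes(str, pattern)
  lookahead = /(?=#{pattern})/
  starts = []
  # Each match is empty, so only its starting position is of interest.
  str.scan(lookahead) { starts << Regexp.last_match.begin(0) }
  starts
end

p overlapping_indexes("banana", /ana/)
p overlapping_indexes("bananana", /ana/)
```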

On Aug 28, 4:25 am, timr wrote:

In ruby you can use string#index as follows:
str = "some text"
str.index(/t/)
=> 5

But what if I want to get all the indices for a regex in the string?
Is there an string#all_indices method?

A lot of solutions have been given here. It would be nice to see a
test/benchmark matrix to compare them, if anyone is up to it.
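
As a starting point, here is a rough benchmark sketch using the standard Benchmark library. The three functions are my own standalone restatements of approaches from this thread (the names, input size, and repeat count are arbitrary assumptions), so treat any timings as indicative only.

```ruby
require 'benchmark'

# Index loop: step one character past each hit (finds overlaps).
def by_index_loop(str, pattern)
  found = []
  pos = 0
  while (pos = str.index(pattern, pos))
    found << pos
    pos += 1
  end
  found
end

# Brute force: test an anchored match at every starting position.
def by_brute_force(str, pattern)
  anchored = /\A#{pattern}/
  (0...str.length).select { |i| str[i..-1] =~ anchored }
end

# scan with a block: non-overlapping matches only.
def by_scan(str, pattern)
  found = []
  str.scan(pattern) { found << Regexp.last_match.begin(0) }
  found
end

text = "this is a test string for the ts in the worldt" * 200
pattern = /th/

Benchmark.bm(12) do |bm|
  bm.report("index loop")  { 20.times { by_index_loop(text, pattern) } }
  bm.report("brute force") { 20.times { by_brute_force(text, pattern) } }
  bm.report("scan")        { 20.times { by_scan(text, pattern) } }
end
```

The pattern here has no overlapping matches, so all three return the same indices; with overlaps, the scan version would report fewer.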