Split a string based on change of character

From: James Edward G. II

>> require "strscan" # => true
>> scanner = StringScanner.new("ZBBBCZZ") # => #<StringScanner 0/7 @ "ZBBBC...">
>> char_runs = Array.new # => []
>> char_runs << scanner.matched while scanner.scan(/(.)\1*/m) # => nil
>> char_runs # => ["Z", "BBB", "C", "ZZ"]

I just started playing with StringScanner after getting a hint from dblack and
reading this Rubyish example from James. I think StringScanner is an ideal
solution for string-scanning problems. I noticed that
StringScanner#scan returns the matched string, so:

irb> s = StringScanner.new("ZBBBCZZ")
=> #<StringScanner 0/7 @ "ZBBBC...">
irb> a=[]
=> []
irb> a << x while x=s.scan(/(.)\1*/m)
=> nil
irb> a
=> ["Z", "BBB", "C", "ZZ"]
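For anyone pasting this into a script rather than irb, here is a minimal sketch of the same idea with the assignment moved into the loop condition. The method name char_runs is only an illustration, not something from the posts above. One caveat worth knowing: in the one-liner, Ruby parses the x in "a << x" before it sees "x = ...", so in a fresh script where x has not been assigned yet it is treated as a method call and raises NameError. Writing the loop the long way sidesteps that.

```ruby
require "strscan"

# Collect runs of identical characters with StringScanner#scan.
# char_runs is a hypothetical helper name used only for this sketch.
def char_runs(str)
  scanner = StringScanner.new(str)
  runs = []
  # scan returns the matched substring, or nil once the pattern no longer
  # matches at the current position (here: at end of string).
  while (run = scanner.scan(/(.)\1*/m))
    runs << run
  end
  runs
end

p char_runs("ZBBBCZZ") # prints ["Z", "BBB", "C", "ZZ"]
```

The /m flag lets "." match newlines too, so a run of blank lines is grouped like any other character.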

again, short and readable. ruby rocks.
kind regards -botp

ps: stringscanner docs are here
http://www.ruby-doc.org/stdlib/libdoc/strscan/rdoc/index.html

Peña wrote:

From: Logan C.

On 8/13/07, Brad P. wrote:

> Enumerator.new(s, :scan, /(.)\1*/).map {$&}

What’s wrong with s.enum_for(:scan, /(.)\1*/).map { $& } ?

somehow i missed the enumerator hack. thank you logan and brad for the update.
kind regards -botp

Hey cool … ‘enum_for’ is exactly what I was looking for. I couldn’t
understand why it didn’t exist, and it does. Scratch my suggestion
for ‘scan_enum’.

B

Andrew S. wrote:

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

Another variant, which gets rid of one of the capture
groups and does not introduce an artificial split character:

Enumerator.new(s, :scan, /(.)\1*/).map {$&}

Note the $& will not work in the example

x = s.scan(/((.)\2*)/).map {|i| i[0]}

because the map is run on an array after the
scan has happened. To run the map inline with
the scan you need the Enumerator object.
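To make the difference concrete, here is a small sketch comparing the two orderings. The constant RUN and the method names runs_inline and runs_after are made up for the illustration. It uses enum_for rather than the Enumerator.new(obj, :method, args) form, on the assumption that you are running a current Ruby where, as far as I know, Enumerator.new expects a block.

```ruby
RUN = /(.)\1*/

# map's block is driven by scan itself, so it runs once per match
# while $& still refers to that match.
def runs_inline(str)
  str.enum_for(:scan, RUN).map { $& }
end

# Here scan has already finished and returned an array of captures
# before map starts, so $& cannot change from one element to the next.
def runs_after(str)
  str.scan(RUN).map { $& }
end

p runs_inline("ZBBBCZZ")          # prints ["Z", "BBB", "C", "ZZ"]
p runs_after("ZBBBCZZ").uniq.size # prints 1: every element saw the same $&
```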

I doubt using Enumerator is any faster though.

Wouldn’t it be nicer if scan returned an
enumerable instead of an array? We could
define

class String
  def scan_enum(regexp)
    Enumerator.new self, :scan, regexp
  end
end

and then be able to do

s.scan_enum(/(.)\1*/).map {$&}
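A hedged sketch of the same helper for a current Ruby, assuming (as I believe is the case) that the Enumerator.new(obj, :method, args) form from that era is no longer accepted. The method name scan_enum is the one proposed above; only the body changes, delegating to enum_for.

```ruby
class String
  # scan_enum as proposed in the post, written with enum_for.
  # The enumerator calls scan lazily, yielding once per match.
  def scan_enum(regexp)
    enum_for(:scan, regexp)
  end
end

p "ZBBBCZZ".scan_enum(/(.)\1*/).map { $& } # prints ["Z", "BBB", "C", "ZZ"]
```

Since the pattern has a capture group, scan yields the captures to the block; $& is used because it holds the whole match rather than just the first character.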

On 8/14/07, Brad P. wrote:

> s.enum_for.scan(/(.)\1*/).map { $& }
>
> (To be correct though a full implementation should use the BlankSlate
> class and properly implement the original enum_for interface )

That’s the opposite of clear, IMO. Any time you use method_missing there’s
potential for breakage, and I don’t even see a good reason in this case.
Read it as s.enum[erator]_for[the method](:scan …

Any time I’m tempted to use . as an argument separator, I think about
variables, i.e.

foo = s.enum_for
… code …
foo.scan(/(.)\1*/).map { $& }

Raise your hand if you think foo is a string now.

… months later …
puts "Hello, #{foo}!"

B


Brad P. wrote:

Hey cool … ‘enum_for’ is exactly what I was looking for. I couldn’t
understand why it didn’t exist, and it does. Scratch my suggestion
for ‘scan_enum’.

B

Would it not be clearer if enum_for worked as

s.enum_for.scan(/(.)\1*/).map { $& }

Quickie

require 'enumerator'

module Enumerable
  class Expr
    def initialize( enum )
      @enum = enum
    end

    def method_missing(name, *args)
      @enum.enum_for_old(name, *args)
    end
  end
end

class Object
  alias :enum_for_old :enum_for
  def enum_for
    Enumerable::Expr.new(self)
  end
end

s = "aabbccvvvvfg dddd"
r = s.enum_for.scan(/(.)\1*/).map {$&}
puts r

(To be correct though a full implementation should use the BlankSlate
class and properly implement the original enum_for interface )
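As a rough sketch of what that fuller version might look like on a current Ruby: BasicObject can stand in for BlankSlate, and the proxy can live under a separate method so the original enum_for interface is left alone. The names EnumProxy and enum_proxy are invented for this illustration and are not part of Ruby or of the code above.

```ruby
# Hypothetical proxy: any method the wrapped object responds to is turned
# into an enumerator over that method. BasicObject keeps the proxy nearly
# method-free, so calls such as scan fall through to method_missing.
class EnumProxy < BasicObject
  def initialize(target)
    @target = target
  end

  def method_missing(name, *args)
    # Unknown methods still raise NoMethodError instead of being forwarded.
    return super unless @target.respond_to?(name)
    @target.enum_for(name, *args)
  end
end

class Object
  # Separate, made-up entry point; Object#enum_for itself is not redefined.
  def enum_proxy
    EnumProxy.new(self)
  end
end

s = "aabbccvvvvfg dddd"
r = s.enum_proxy.scan(/(.)\1*/).map { $& }
puts r

# The proxy is not a String, which is the hazard raised in the reply above.
puts(String === s.enum_proxy) # prints false
```

Keeping the proxy behind its own method name means existing callers of enum_for(:scan, ...) are unaffected, at the cost of one more name to remember.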

B

On 8/13/07, Brad P. wrote:

> and then be able to do
>
> s.scan_enum(/(.)\1*/).map {$&}

What’s wrong with s.enum_for(:scan, /(.)\1*/).map { $& } ?