Split a string based on change of character

starmaerker · August 11, 2007, 2:53am

For a string “ZBBBCZZ”, I want to produce a list [“Z”, “BBB”, “C”, “ZZ”]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

I’m new to Ruby and am interested to learn if there is a better way to
do it.

BTW, in Python, it can be done with a regex (similar to above) or via
their itertools library:

import itertools
s = "ZBBBCCZZ"
x = [''.join(g) for k, g in itertools.groupby(s)]

Does anyone know if Ruby has a similar library to Python’s itertools?

Thanks,
/-\

starmaerker · August 11, 2007, 5:56am

From: Andrew S.

s = "ZBBBCZZ"

x = s.scan(/((.)\2*)/).map {|i| i[0]}

when it comes to string patterns like this, nothing beats regex

import itertools

s = "ZBBBCCZZ"

x = [''.join(g) for k, g in itertools.groupby(s)]

Does anyone know if Ruby has a similar library to Python’s itertools?

hmm, you seem to like this more than your previous regex+map solution, why?
(i ask because i prefer your first solution --not that it’s ruby)

in 1.9 or the upcoming ruby, it keeps getting better and better and may
look like this,

s = "ZBBBCZZ"
x = s.split('').group_by{|x| x}.entries

or possibly to

x = s.split('').group_by.entries

but unfortunately i don’t have a 1.9 build here to test (grrr, shouldn’t
have deleted that vm).

kind regards -botp

starmaerker · August 11, 2007, 8:06am

— Peña, Botp wrote:

hmm, you seem to like this more than your previous regex+map solution, why?
(i ask because i prefer your first solution --not that it’s ruby)

Actually, I’m not super happy with either solution.

The annoyance with the regex solution:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

is that you capture the backref (when you don’t really want to), only
to discard it in the map, which seems a bit awkward and inefficient.
However, I can’t see any way around this owing to the regex semantics
of returning all fields in parens (one has the same problem in Perl
and Python, BTW). [If there were a way to specify a non-capturing
back-ref, that would do the trick.]

in 1.9 or the upcoming ruby, it keeps getting better and better and may look
like this,

s = "ZBBBCZZ"
x = s.split('').group_by{|x| x}.entries

My reading of:

eigenclass.org

indicates that Enumerable#group_by can’t work because it would seem to
lose the ordering and, grouping by key, will have only one group for ‘Z’
above, when I want two distinct groups. (I would be delighted to be
proved wrong, however).

I also scanned the Facets library but didn’t find anything obvious.
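
The ordering concern above can be checked directly. This is a minimal sketch, assuming a Ruby new enough to provide Enumerable#chunk (1.9.2 or later, so not available when this thread was written); the variable names are illustrative. group_by builds one bucket per distinct key, while chunk only groups consecutive elements, which is much closer to Python's itertools.groupby.

```ruby
s = "ZBBBCZZ"

# group_by collects every "Z" under a single key, so the two Z runs merge.
merged = s.each_char.group_by { |c| c }.values.map(&:join)

# chunk groups only consecutive elements that share a block value and
# yields [key, elements] pairs in order of appearance.
runs = s.each_char.chunk { |c| c }.map { |_key, chars| chars.join }

p merged # => ["ZZZ", "BBB", "C"]
p runs   # => ["Z", "BBB", "C", "ZZ"]
```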

Cheers,
/-\

starmaerker · August 11, 2007, 11:14am

On Sat, Aug 11, 2007 at 09:52:24AM +0900, Andrew S. wrote:

BTW, in Python, it can be done with a regex (similar to above) or via
their itertools library:

import itertools
s = "ZBBBCCZZ"
x = [''.join(g) for k, g in itertools.groupby(s)]

Does anyone know if Ruby has a similar library to Python’s itertools?

Nothing off the top of my head, but how does this work for you ?

in_str.split('').inject([]) do |m,l|
    if m.last and m.last[0].chr == l
        m[-1] += l
    else
        m << l
    end
    m
end

It’s not two lines, but it will return the same array

enjoy

-jeremy
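
The inject loop above can also be wrapped in a small method. The following is only a sketch: the name char_runs is hypothetical, and it assumes Ruby 1.9 or later, where String#[] with an index returns a one-character string rather than a number.

```ruby
# Hypothetical helper: walks the string once and extends the current run
# for as long as the character repeats.
def char_runs(str)
  str.each_char.each_with_object([]) do |ch, runs|
    if runs.last && runs.last[-1] == ch
      runs.last << ch   # same character: grow the current run in place
    else
      runs << ch.dup    # character changed: start a new run
    end
  end
end

p char_runs("ZBBBCZZ") # => ["Z", "BBB", "C", "ZZ"]
```

Using each_with_object instead of inject removes the need to return the accumulator at the end of the block.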

starmaerker · August 11, 2007, 11:27am

Andrew S. wrote:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

Maybe this is faster:

result = []
"ZBBBCZZ".scan(/((.)\2*)/){erg.push [$~[0]]}
p erg # => [["Z"], ["BBB"], ["C"], ["ZZ"]]

Wolfgang Nádasi-Donner

starmaerker · August 11, 2007, 11:29am

Maybe this is faster:

result = []
"ZBBBCZZ".scan(/((.)\2*)/){erg.push [$~[0]]}
p erg # => [["Z"], ["BBB"], ["C"], ["ZZ"]]

Wolfgang Nádasi-Donner

result = []
"ZBBBCZZ".scan(/((.)\2*)/){result.push [$~[0]]}
p result # => [["Z"], ["BBB"], ["C"], ["ZZ"]]

Sorry - typo when translating the variable name

Wolfgang Nádasi-Donner

starmaerker · August 11, 2007, 2:45pm

Hi –

On Sat, 11 Aug 2007, Peña, Botp wrote:

hmm, you seem to like this more than your previous regex+map solution, why? (i ask because i prefer your first solution --not that it’s ruby)

in 1.9 or the upcoming ruby, it keeps getting better and better and may look like this,

s = "ZBBBCZZ"
x = s.split('').group_by{|x| x}.entries

or possibly to

x = s.split('').group_by.entries

I’m going to have to get special glasses that can read invisible
ink…

David

starmaerker · August 11, 2007, 12:01pm

Andrew S. wrote:

For a string “ZBBBCZZ”, I want to produce a list [“Z”, “BBB”, “C”, “ZZ”]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

you may want to write it as …map{|i,|i}

I’m new to Ruby and am interested to learn if there is a better way to
do it.

BTW, in Python, it can be done with a regex (similar to above) or via
their itertools library:

import itertools
s = "ZBBBCCZZ"
x = [''.join(g) for k, g in itertools.groupby(s)]

Does anyone know if Ruby has a similar library to Python’s itertools?

No idea, here is another variant to play with:

x = /#{s.gsub(/(.)\1*/, '(\1+)')}/.match(s).captures

funny little problem.

cheers

Simon

starmaerker · August 11, 2007, 3:15pm

Hi –

On Sat, 11 Aug 2007, Andrew S. wrote:

For a string “ZBBBCZZ”, I want to produce a list [“Z”, “BBB”, “C”, “ZZ”]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

I’m new to Ruby and am interested to learn if there is a better way to
do it.

Probably not better, but just for fun, here’s a way using the strscan
extension. I’d be very interested if anyone can get this to be less
clunky – in particular, the - [""] at the end.

require 'strscan'
s = StringScanner.new("AABCCCDAAAEE")

s.string.split(//).inject([]) {|a,b| a << s.scan_until(/(?!#{b})/) } - [""]

=> ["AA", "B", "CCC", "D", "AAA", "EE"]

David
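
One way to avoid the trailing - [""] is to let the scanner itself drive the loop instead of iterating over every character of the string. This is only a sketch under that assumption, not a claim about what the poster had in mind; it uses StringScanner#getch, #scan and #eos?, and the variable names are illustrative.

```ruby
require "strscan"

scanner = StringScanner.new("AABCCCDAAAEE")
runs = []

until scanner.eos?
  ch = scanner.getch                                  # consume one character
  # scan with * always matches (possibly the empty string), so it never
  # returns nil here; it picks up any further repeats of the same character.
  runs << ch + scanner.scan(/#{Regexp.escape(ch)}*/)
end

p runs # => ["AA", "B", "CCC", "D", "AAA", "EE"]
```

Because each pass consumes at least one character, no empty strings are produced and nothing has to be subtracted afterwards.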

starmaerker · August 11, 2007, 3:41pm

On Aug 11, 2007, at 2:52 AM, Andrew S. wrote:

For a string “ZBBBCZZ”, I want to produce a list [“Z”, “BBB”, “C”,
“ZZ”]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

Yeah, it’s short but I agree with things you dislike about it. My
approach was essentially the same as Jeremy’s;

s.split(//).inject([]) {|g, c| (g.last && g.last[c] ? g.last : g) << c; g}

That’s just playing around though, I think that approach is not better.

In my view a better idiom would be to split on character switches.
That would be concise. But as you know if you put groups you get them
back. I see no way to express the condition for boundaries without
using groups.

– fxn
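
The idea of splitting on character switches, without regex groups, can be expressed with Enumerable#slice_when. This is a sketch that assumes Ruby 2.2 or later, which postdates this thread; the block receives each adjacent pair and a new slice starts wherever it returns true.

```ruby
s = "ZBBBCZZ"

# Start a new slice at every boundary where the character changes.
runs = s.each_char.slice_when { |prev, cur| prev != cur }.map(&:join)

p runs # => ["Z", "BBB", "C", "ZZ"]
```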

starmaerker · August 11, 2007, 6:25pm

On Aug 11, 2007, at 8:14 AM, [email protected] wrote:

s = "ZBBBCZZ"
require 'strscan'
s = StringScanner.new("AABCCCDAAAEE")

s.string.split(//).inject([]) {|a,b| a << s.scan_until(/(?!#{b})/) } - [""]

=> ["AA", "B", "CCC", "D", "AAA", "EE"]

My best effort:

require "strscan"
=> true

scanner = StringScanner.new("ZBBBCZZ")
=> #<StringScanner 0/7 @ "ZBBBC...">

char_runs = Array.new
=> []

char_runs << scanner.matched while scanner.scan(/(.)\1*/m)
=> nil

char_runs
=> ["Z", "BBB", "C", "ZZ"]

James Edward G. II

starmaerker · August 12, 2007, 1:45am

On Aug 10, 7:52 pm, Andrew S. wrote:

s = "ZBBBCZZ"
==>"ZBBBCZZ"
s.scan( /((.)\2*)/ ).transpose.first
==>["Z", "BBB", "C", "ZZ"]
s.gsub( /(.)(?!\1)/, "\\1\n" ).split
==>["Z", "BBB", "C", "ZZ"]

starmaerker · August 11, 2007, 7:01pm

On 8/11/07, [email protected] [email protected] wrote:

s = "ZBBBCZZ"
x = s.split('').group_by{|x| x}.entries

or possibly to

x = s.split('').group_by.entries

I’m going to have to get special glasses that can read invisible
ink…

whoops, sorry =)
that should be

from
x = s.split('').group_by{|x| x}.entries.map{|x| x.join}

to
x = s.split('').group_by.entries.map{|x| x.join}

i assume that group_by without a block would group the elements by
themselves. maybe i should name it group not group_by

kind regards -botp

starmaerker · August 12, 2007, 6:27am

From: William J.

s = "ZBBBCZZ"

==>"ZBBBCZZ"

s.scan( /((.)\2*)/ ).transpose.first

==>["Z", "BBB", "C", "ZZ"]

s.gsub( /(.)(?!\1)/, "\\1\n" ).split

==>["Z", "BBB", "C", "ZZ"]

ruby hacker, James, that is cool! gotta keep this.
kind regards -botp

starmaerker · August 12, 2007, 10:11am

Peña wrote:

From: William J.

s = "ZBBBCZZ"

==>"ZBBBCZZ"

s.scan( /((.)\2*)/ ).transpose.first

==>["Z", "BBB", "C", "ZZ"]

s.gsub( /(.)(?!\1)/, "\\1\n" ).split

==>["Z", "BBB", "C", "ZZ"]

ruby hacker, James, that is cool! gotta keep this.
kind regards -botp

Yeah, nice!

i think one can simplify from

s.gsub( /(.)(?!\1)/, "\\1\n" ).split

to

s.gsub(/(.)\1*/, '\0 ').split

?

cheers

Simon

starmaerker · August 12, 2007, 1:28pm

On 8/12/07, Simon Kröger wrote:

ruby hacker, James, that is cool! gotta keep this.
s.gsub(/(.)\1*/, '\0 ').split

Yes it appears so. Another variation would be (this lets you use the
method correctly on strings that already contain whitespace):

require 'enumerator'
s.enum_for(:gsub, /(.)\1*/).to_a

Which is sort of back to the original scan method.
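
The difference between the two variants shows up as soon as the input contains whitespace. A small sketch, assuming Ruby 1.8.7 or later, where String#gsub called with a pattern and no block returns an enumerator directly, so the explicit require and enum_for are optional; the variable names are illustrative.

```ruby
s = "ZBBBC  ZZ"   # note the run of two spaces

# Delimiter approach: the run of spaces is swallowed by split.
via_split = s.gsub(/(.)\1*/, '\0 ').split

# Enumerator approach: gsub without a block or replacement enumerates the
# matched substrings, so the whitespace run survives.
via_enum = s.gsub(/(.)\1*/).to_a

p via_split # => ["Z", "BBB", "C", "ZZ"]
p via_enum  # => ["Z", "BBB", "C", "  ", "ZZ"]
```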

?

starmaerker · August 12, 2007, 1:38pm

Hi –

On Sun, 12 Aug 2007, botp wrote:

themselves. maybe i should name it group not group_by

Actually I think group_by with nothing specified just returns an
enumerator over the array itself, so it probably will never be used (I
hope

I don’t think group_by will work for this problem, though, because it
groups everything together:

irb(main):014:0> s
=> "AABCDAAE"
irb(main):015:0> s.split(//).group_by {|x| x }.map {|e| e.join }
=> ["AAAAA", "CC", "EE", "DD", "BB"]

Notice how all the A’s got put in one result.

David

starmaerker · August 12, 2007, 7:29pm

On 8/12/07, [email protected] [email protected] wrote:

irb(main):015:0> s.split(//).group_by {|x| x }.map {|e| e.join }
=> ["AAAAA", "CC", "EE", "DD", "BB"]

Notice how all the A’s got put in one result.

arrghh, sorry, yes. it’s really grouping w no regards to sequence.
thank you for the update
kind regards -botp

starmaerker · August 13, 2007, 9:46am

On Aug 12, 3:06 am, Simon Kröger wrote:

ruby hacker, James, that is cool! gotta keep this.
s.gsub(/(.)\1*/, '\0 ').split

?

cheers

Simon

Yes, with the possible exception of "\\1\n". I was anticipating the
need to allow the string to contain any character but a newline.

s = 'ZBBBC ZZ'
==>"ZBBBC ZZ"
s.gsub(/(.)\1*/, "\\0\n").split("\n")
==>["Z", "BBB", "C", " ", "ZZ"]
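
The newline delimiter still rules out one character. A sketch that avoids delimiters altogether, going back to the original scan approach: the /m flag (an addition here, not part of the posts above) lets . match newlines as well, so runs of any character are kept.

```ruby
s = "ZBBBC ZZ\n\nQ"

# With /m, "." also matches "\n"; each scan result is [run, char],
# so keep only the first element of each pair.
runs = s.scan(/((.)\2*)/m).map(&:first)

p runs # => ["Z", "BBB", "C", " ", "ZZ", "\n\n", "Q"]
```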