Break apart a string by kind of characters

dewie · September 27, 2007, 5:54pm

Hi all, I’ve an interesting problem. Imagine the following string:

'a1000aa'

I want to break it apart like so:

[ 'a', '1000', 'aa' ]

I did a search on the forums and came up with this regex:

'a1000aa'.scan(/((.)\2*)/).map { |i| i[0] }

Which is pretty close, but it groups on a change of character, so I would get:

[ 'a', '1', '000', 'aa' ]

I tried playing around with the regex (e.g. swapping the . for (\d|\w))
but to no avail.
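Swapping the `.` for `(\d|\w)` cannot help, because `\2` is a backreference: it repeats the exact text the second group captured, not "another character of the same kind". Both patterns therefore still break on every change of character (and `\w` already matches digits anyway). The sketch below uses a hypothetical helper, `runs_of`, to show that the two patterns give identical results, and that `scan` returns one array of captures per match, which is why the original needs `map { |i| i[0] }`.

```ruby
# Hypothetical helper: group a string into runs using a pattern whose
# first capture group is the whole run.
def runs_of(str, pattern)
  # With capture groups, scan yields one array of captures per match,
  # e.g. ["000", "0"] for the run "000"; keep only the outer group.
  str.scan(pattern).map { |captures| captures[0] }
end

same_char = /((.)\2*)/       # \2 must repeat the exact captured character
swapped   = /((\d|\w)\2*)/   # still a backreference to one exact character

p 'a1000aa'.scan(same_char)       # => [["a", "a"], ["1", "1"], ["000", "0"], ["aa", "a"]]
p runs_of('a1000aa', same_char)   # => ["a", "1", "000", "aa"]
p runs_of('a1000aa', swapped)     # => ["a", "1", "000", "aa"]
```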

Any ideas?

dewie · September 27, 2007, 6:31pm

I figured out one possible solution. Granted, it’s not as elegant as a
single regex, but it works and I understand it. Here goes…

First, I opened up class String to add some convenience and make things
a bit shorter:

class String
  # True if the string starts with an ASCII letter.
  # (String#first comes from ActiveSupport, not core Ruby,
  # so plain slicing is used instead.)
  def letter?
    !!(self[0, 1] =~ /[A-Za-z]/)
  end

  # True if the string starts with a decimal digit.
  def digit?
    !!(self[0, 1] =~ /[0-9]/)
  end
end

And my method:

def break_apart_rule_increment(string = 'a1000aa')
  groups = []

  string.each_char do |character|
    # Put the first character into a group of its own.
    if groups.empty?
      groups << character
      next
    end

    # If this character is of the same kind as the last group,
    # append it to that group; otherwise start a new group with it.
    if (groups.last.letter? && character.letter?) ||
       (groups.last.digit? && character.digit?)
      groups.last << character
    else
      groups << character
    end
  end

  groups
end
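The same character-by-character grouping can be written without reopening `String`, using `Enumerable#chunk` (available from Ruby 1.9.2, so not in the 1.8 Ruby of this thread). This is an alternative sketch, not the poster's code; the names `char_kind` and `group_by_kind` are hypothetical. Returning the reserved key `:_alone` from the `chunk` block puts that character in a chunk by itself, which mirrors how the method above gives every character that is neither a letter nor a digit its own group.

```ruby
# Hypothetical helper: classify a single character.
def char_kind(c)
  case c
  when /[A-Za-z]/ then :letter
  when /[0-9]/    then :digit
  else :_alone # chunk's reserved key: this character forms its own chunk
  end
end

# Hypothetical name: consecutive letters and consecutive digits are merged.
def group_by_kind(string)
  string.each_char
        .chunk { |c| char_kind(c) }
        .map { |_kind, chars| chars.join }
end

p group_by_kind('a1000aa')   # => ["a", "1000", "aa"]
p group_by_kind('ab, 12')    # => ["ab", ",", " ", "12"]
```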

dewie · September 27, 2007, 7:17pm

On Sep 27, 9:55 am, Daniel W. wrote:

Hi all, I’ve an interesting problem. Imagine the following string:

'a1000aa'

I want to break it apart like so:

[ 'a', '1000', 'aa' ]

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

dewie · September 27, 2007, 7:40pm

On Sep 27, 11:11 am, Phrogz wrote:

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

Or, if you want multiple types of character groupings:

irb(main):001:0> s = 'hello world, you crazy world!'
=> "hello world, you crazy world!"

irb(main):003:0> s.scan( /[aeiou]+|[b-df-hj-np-tv-z]+|[^a-z]+/ )
=> ["h", "e", "ll", "o", " ", "w", "o", "rld", ", ", "y", "ou", " ", "cr", "a", "zy", " ", "w", "o", "rld", "!"]
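The same alternation idea applies to the original question with three classes instead of vowels and consonants: runs of letters, runs of digits, and runs of everything else. Because the three classes cover every character, nothing is dropped. This is a sketch; the helper name `split_by_kind` is hypothetical, and unlike the `String#letter?` method earlier in the thread it keeps consecutive spaces and punctuation together in one run.

```ruby
# Hypothetical helper: split into runs of ASCII letters, runs of digits,
# and runs of any other characters (spaces, punctuation, ...).
def split_by_kind(str)
  str.scan(/[A-Za-z]+|\d+|[^A-Za-z\d]+/)
end

p split_by_kind('a1000aa')        # => ["a", "1000", "aa"]
p split_by_kind('a1000aa, b2!')   # => ["a", "1000", "aa", ", ", "b", "2", "!"]
```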

dewie · September 28, 2007, 2:05am

Gavin K. wrote:

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

WOW! Freakin’ awesome!

One caveat…

irb(main):004:0> '11aa1000aaa'.split(/(\d+)/)
=> ["", "11", "aa", "1000", "aaa"]

For some reason it answers with a blank element, but I’m sure that’s an
easy one to solve.

Thanks, Gavin!
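The blank element comes from how `split` treats fields: the string starts with a match (`11`), so the text before that first delimiter is an empty field, and `split` keeps leading empty fields (with no limit given, it only suppresses trailing ones). A minimal way to stay with `split` is to discard empty strings afterwards; the helper name `split_digits` is hypothetical.

```ruby
# Hypothetical helper: keep split (the capture group makes it return the
# digit runs too) but drop the empty leading field produced when the
# string starts with digits.
def split_digits(str)
  str.split(/(\d+)/).reject(&:empty?)
end

p '11aa1000aaa'.split(/(\d+)/)   # => ["", "11", "aa", "1000", "aaa"]
p split_digits('11aa1000aaa')    # => ["11", "aa", "1000", "aaa"]
p split_digits('a1000aa')        # => ["a", "1000", "aa"]
```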

dewie · September 28, 2007, 2:09am

On Sep 27, 2007, at 7:05 PM, Daniel W. wrote:

irb(main):004:0> '11aa1000aaa'.split(/(\d+)/)
=> ["", "11", "aa", "1000", "aaa"]

For some reason it answers with a blank element, but I’m sure that’s an easy one to solve.

If you just want digits and non-digits, I suggest:

'11aa1000aaa'.scan(/\D+|\d+/)
=> ["11", "aa", "1000", "aaa"]
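This works because `scan` collects every non-overlapping match, and each match is a maximal run of either non-digits (`\D+`) or digits (`\d+`). Every character belongs to exactly one of the two classes and both alternatives need at least one character, so the runs cover the whole string with no empty pieces. A quick check, with `digit_runs` as an illustrative helper name:

```ruby
# Illustrative helper: alternating runs of non-digits and digits.
def digit_runs(str)
  str.scan(/\D+|\d+/)
end

['11aa1000aaa', 'a1000aa', '42'].each do |s|
  runs = digit_runs(s)
  # The runs partition the string, so they always join back to the input.
  raise "lost characters in #{s.inspect}" unless runs.join == s
  p runs
end
# prints ["11", "aa", "1000", "aaa"], then ["a", "1000", "aa"], then ["42"]
```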

James Edward G. II

dewie · September 28, 2007, 8:33am

James G. wrote:

If you just want digits and non-digits, I suggest:

'11aa1000aaa'.scan(/\D+|\d+/)
=> ["11", "aa", "1000", "aaa"]

I LOVE it! I gotta brush up on my regex skills. Wait, I need to get some
regex skills first.

Thanks, Edward; that made my night.

dewie · September 28, 2007, 1:09pm

Gavin K. wrote:

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

Gavin, how in the WORLD does this bit of black magic work and how did
you ever figure it out???

dewie · September 28, 2007, 2:24pm

On Sep 28, 2007, at 6:09 AM, Lloyd L. wrote:

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

Gavin,

I’m not Gavin, but…

how in the WORLD does this bit of black magic work

Captures in a Regexp passed to split() are returned as part of the
result.
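To see the difference, compare the same pattern with and without a capture group. Without one, the delimiters are thrown away; with one, each captured delimiter is returned in the array between the fields on either side of it:

```ruby
s = 'a1000aa'

p s.split(/\d+/)     # => ["a", "aa"]           delimiter discarded
p s.split(/(\d+)/)   # => ["a", "1000", "aa"]   captured delimiter kept

# The same rule with a one-character delimiter class:
p 'a,b;c'.split(/([,;])/)   # => ["a", ",", "b", ";", "c"]
```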

and how did you ever figure it out???

Interestingly, the documentation doesn’t seem to mention it. I guess
I knew it was there because Perl works the same way and I tried it
sometime.

James Edward G. II

dewie · September 28, 2007, 3:08pm

On Sep 28, 7:23 am, James Edward G. II wrote:

I’m not Gavin, but…

Ditto

Interestingly, the documentation doesn’t seem to mention it. I guess
I knew it was there because Perl works the same way and I tried it
sometime.

Ditto

James Edward G. II

Not ditto