String mangling


#1

Imagine an input string
aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee…

I have regexps for the parts a, b and c;
e can be considered as "else".

So how can I efficiently search/step through the string from left to
right, calling the fitting handler for each section, kind of like

case section
when /aaaa/ …
when /bbb/ …
when /cccc/ …
else
end

Thanks for ideas!
Martin


#2

On 12/14/05, Martin P. removed_email_address@domain.invalid wrote:

> Imagine an input string
> aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee…
>
> I have regexp for the parts a,b,c
> and e can be considered as else.
>
> So how can I efficiently search/step through the string from left to
> right, while calling for each section the fitting handler, kind of

irb(main):001:0> s = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceee'
=> "aaaaabbccccceeebbbbbbbbbbeaaabaacccceee"
irb(main):002:0> s.scan(/(a+)|(b+)|(c+)|([^abc]+)/)
=> [["aaaaa", nil, nil, nil], [nil, "bb", nil, nil], [nil, nil,
"ccccc", nil], [nil, nil, nil, "eee"], [nil, "bbbbbbbbbb", nil, nil],
[nil, nil, nil, "e"], ["aaa", nil, nil, nil], [nil, "b", nil, nil],
["aa", nil, nil, nil], [nil, nil, "cccc", nil], [nil, nil, nil,
"eee"]]
irb(main):003:0>
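Because exactly one element of each yielded array is non-nil, a block with four parameters can destructure it and pick the handler directly, with the fourth group acting as the "else" branch. A minimal sketch; the `labels` array is only a stand-in for real handlers:

```ruby
s = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceee'

labels = []
# Each match yields [a, b, c, other]; exactly one of them is non-nil.
s.scan(/(a+)|(b+)|(c+)|([^abc]+)/) do |a, b, c, other|
  if a
    labels << "A:#{a}"
  elsif b
    labels << "B:#{b}"
  elsif c
    labels << "C:#{c}"
  else
    labels << "else:#{other}"   # text that is not a run of a, b or c
  end
end
puts labels.join(' ')
```

The fourth group `([^abc]+)` is what makes the "else" case visible; without it, scan silently skips the text between sections.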


#3

Martin P. wrote:

> when /aaaa/ …
> when /bbb/ …
> when /cccc/ …
> else
> end

You are pretty close:

s = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee'
=> "aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee"

s.scan(/a+|b+|c+/) do |m|
  case m
  when /a+/
    puts "A"
  when /b+/
    puts "B"
  when /c+/
    puts "C"
  end
end
A
B
C
B
A
B
A
C
=> "aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee"

Or, if you want to avoid a second match:

s.scan(/(a+)|(b+)|(c+)/) do |m|
  # m is [a, b, c]: count the leading nils to get the index of the group that matched
  case (m.inject(0) { |i, e| break i if e; i + 1 })
  when 0
    puts "A"
  when 1
    puts "B"
  when 2
    puts "C"
  end
end
A
B
C
B
A
B
A
C
=> "aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee"
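On Ruby 1.9 and later (not the 1.8 series that was current when this thread was written), named groups plus `MatchData#names` give the same single-pass dispatch without counting positions. A sketch; the group names `a`, `b` and `c` are illustrative:

```ruby
s = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee'
re = /(?<a>a+)|(?<b>b+)|(?<c>c+)/

out = []
s.scan(re) do
  m = $~                                  # MatchData for the current match
  kind = m.names.find { |name| m[name] }  # first named group that participated
  case kind
  when 'a' then out << 'A'
  when 'b' then out << 'B'
  when 'c' then out << 'C'
  end
end
puts out.join(' ')   # A B C B A B A C
```

The regexp is matched only once per section; the `case` compares plain strings.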

Kind regards

robert

#4

Martin P. wrote:

> Imagine an input string
> aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee…
>
> I have regexp for the parts a,b,c
> and e can be considered as else.
>
> So how can I efficiently search/step through the string from left to
> right, while calling for each section the fitting handler, kind of

You could use String#scan to find the sections that match any of your
patterns, then check which one matched (your patterns could be more
complicated, as long as they stay distinguishable from one another):

str = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee'

a_rx = /a+/
b_rx = /b+/
c_rx = /c+/

# Combine the patterns into one regexp, then test each found part against them.
str.scan(/(?:#{a_rx}|#{b_rx}|#{c_rx})/) do |part|
  case part
  when a_rx
    # …
  when b_rx
    # …
  when c_rx
    # …
  end
end
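Taking this one step further, the patterns and their handlers can live in a single table, with `Regexp.union` building the combined pattern. A sketch; the `handlers` hash and its lambdas are hypothetical placeholders for real handlers:

```ruby
str = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee'

# Hypothetical handler table: pattern => lambda called with the matched text.
handlers = {
  /a+/ => ->(part) { "A(#{part.size})" },
  /b+/ => ->(part) { "B(#{part.size})" },
  /c+/ => ->(part) { "C(#{part.size})" }
}

# One combined regexp built from the individual patterns (no capture groups,
# so scan yields plain strings).
combined = Regexp.union(handlers.keys)

results = str.scan(combined).map do |part|
  # Regexp#=== is what case/when uses; pick the first pattern that matches.
  _rx, handler = handlers.find { |rx, _| rx === part }
  handler.call(part)
end
p results
```

Adding a new kind of section then means adding one table entry instead of editing both the regexp and the `case`.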


#5

On Dec 14, 2005, at 6:32 AM, Simon S. wrote:

> irb(main):001:0> s = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceee'
> => "aaaaabbccccceeebbbbbbbbbbeaaabaacccceee"
> irb(main):002:0> s.scan(/(a+)|(b+)|(c+)|([^abc]+)/)
> => [["aaaaa", nil, nil, nil], [nil, "bb", nil, nil], [nil, nil,
> "ccccc", nil], [nil, nil, nil, "eee"], [nil, "bbbbbbbbbb", nil, nil],
> [nil, nil, nil, "e"], ["aaa", nil, nil, nil], [nil, "b", nil, nil],
> ["aa", nil, nil, nil], [nil, nil, "cccc", nil], [nil, nil, nil,
> "eee"]]
> irb(main):003:0>

My similar thought:

str = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceee'
=> "aaaaabbccccceeebbbbbbbbbbeaaabaacccceee"

str.scan(/((\w)\2*)/).map { |chunk| chunk.first }
=> ["aaaaa", "bb", "ccccc", "eee", "bbbbbbbbbb", "e", "aaa", "b",
"aa", "cccc", "eee"]
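On Ruby 2.3 or later, `Enumerable#chunk_while` can group runs of identical characters without a regexp. An alternative sketch, not something posted in the thread:

```ruby
str = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceee'

# Split into characters, group equal neighbours, join each group back together.
runs = str.chars.chunk_while { |x, y| x == y }.map(&:join)
p runs
```

Unlike the `\w` backreference version, this also groups runs of spaces or punctuation.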

James Edward G. II


#6

Robert K. removed_email_address@domain.invalid wrote:

> when /aaaa/ …
> when /bbb/ …
> when /cccc/ …
> else
> end
> […]
> Or, if you want to avoid a second match:

of course I want :-)

> A
> B
> C
> B
> A
> B
> A
> C
> => "aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee"

…but this doesn’t allow a processing step in the “else” case?

Martin


#7

Martin P. wrote:

> Imagine an input string
> aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee…
>
> I have regexp for the parts a,b,c
> and e can be considered as else.
>
> So how can I efficiently search/step through the string from left to
> right, while calling for each section the fitting handler, kind of
>
> case section
> when /aaaa/ …
> when /bbb/ …
> when /cccc/ …
> else
> end

Aside from the oft-mentioned String#scan, you might look into
using StringScanner (require 'strscan') from the stdlib. It is
very good for more complex scanning tasks. Documentation is
available, for example, at http://www.ruby-doc.org/stdlib.
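A minimal StringScanner loop that also covers the "else" case: try each pattern at the current position and, if none matches, consume everything up to the next known section. The patterns and the `:a`/`:b`/`:c`/`:else` labels are illustrative:

```ruby
require 'strscan'

ss = StringScanner.new('aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee')
sections = []

until ss.eos?
  if (tok = ss.scan(/a+/))
    sections << [:a, tok]
  elsif (tok = ss.scan(/b+/))
    sections << [:b, tok]
  elsif (tok = ss.scan(/c+/))
    sections << [:c, tok]
  else
    # "else": the next character is not a, b or c, so this consumes at least
    # one character up to the next a, b or c (or the end of the string).
    sections << [:else, ss.scan(/[^abc]+/)]
  end
end
p sections.map(&:first)
```

Because each `scan` is anchored at the current position, the patterns are tried in order and the loop walks the string strictly left to right.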

> Thanks for ideas!
> Martin

E


#8

Martin P. wrote:

> […]
> …but this doesn’t allow a processing step in the “else” case?

You can have an else clause, but it will never be called. I guess you
want to process the text between matches. In that case scan won't help,
at least not as used in my example.

A simple option would be to use #split with a group around the whole
regexp (so the matched sections are kept in the result) and then operate
on the resulting array of strings. Whether that's feasible in your case
(volume?) I cannot decide.

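s.split(/((?:a+)|(?:b+)|(?:c+))/).each do |m|
  next if m.empty?   # split yields "" where two sections touch
  case m
  when /a+/
    # …
  when /b+/
    # …
  when /c+/
    # …
  else
    # … the text between sections ("e" in the example)
  end
end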
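If volume makes building the whole array unattractive, String#scan can still see the gaps: inside the block `$~` holds the current MatchData, so the text between the end of the previous match and `$~.begin(0)` is the "else" part. A sketch of that approach; the `handle` lambda is a hypothetical stand-in for real handlers:

```ruby
s = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee'
log = []
handle = ->(kind, text) { log << [kind, text] }   # hypothetical handler

pos = 0                                  # end of the previous match
s.scan(/a+|b+|c+/) do |m|
  md = $~                                # MatchData for this match
  gap = s[pos...md.begin(0)]
  handle.call(:else, gap) unless gap.empty?   # text between matches
  handle.call(m[0].to_sym, m)                 # :a, :b or :c
  pos = md.end(0)
end
tail = s[pos..-1]
handle.call(:else, tail) unless tail.empty?   # text after the last match
p log.map(&:first)
```

This keeps a single pass over the string and never materialises the full list of pieces.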

Kind regards

robert