Forum: Ruby string mangling

Martin Pirker (Guest)
on 2005-12-14 13:20
(Received via mailing list)
Imagine an input string
aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee...

I have regexp for the parts a,b,c
and e can be considered as else.

So how can I efficiently search/step through the string from left to
right, while calling for each section the fitting handler, kind of

case section
  /aaaa/ ...

  /bbb/  ...

  /cccc/ ..

  else

end


Thanks for ideas!
Martin
Simon Strandgaard (Guest)
on 2005-12-14 13:35
(Received via mailing list)
On 12/14/05, Martin Pirker <crf@sbox.tu-graz.ac.at> wrote:
> Imagine an input string
> aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee...
>
> I have regexp for the parts a,b,c
> and e can be considered as else.
>
> So how can I efficiently search/step through the string from left to
> right, while calling for each section the fitting handler, kind of

irb(main):001:0> s = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceee'
=> "aaaaabbccccceeebbbbbbbbbbeaaabaacccceee"
irb(main):002:0> s.scan(/(a+)|(b+)|(c+)|([^abc]+)/)
=> [["aaaaa", nil, nil, nil], [nil, "bb", nil, nil], [nil, nil,
"ccccc", nil], [nil, nil, nil, "eee"], [nil, "bbbbbbbbbb", nil, nil],
[nil, nil, nil, "e"], ["aaa", nil, nil, nil], [nil, "b", nil, nil],
["aa", nil, nil, nil], [nil, nil, "cccc", nil], [nil, nil, nil,
"eee"]]
irb(main):003:0>
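
A possible way to go from that scan result to per-section handlers, sketched here as an illustration: the position of the non-nil capture says which alternative matched, so it can index into a list of handlers. The `handlers` array and its lambdas are hypothetical stand-ins for the real processing; the fourth one plays the role of the "else" branch.

```ruby
s = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceee'

# Hypothetical handlers, one per capture group, in the same order as the
# groups in the regexp; the last one is the "else" case.
handlers = [
  ->(m) { "A(#{m.size})" },
  ->(m) { "B(#{m.size})" },
  ->(m) { "C(#{m.size})" },
  ->(m) { "else(#{m.size})" }
]

result = s.scan(/(a+)|(b+)|(c+)|([^abc]+)/).map do |groups|
  i = groups.index { |g| !g.nil? }  # exactly one group is non-nil per match
  handlers[i].call(groups[i])
end

puts result.join(' ')
```

Because the four alternatives together cover every character, nothing is skipped between matches, and each section is matched only once.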
Robert Klemme (Guest)
on 2005-12-14 14:14
(Received via mailing list)
Martin Pirker wrote:
>   /aaaa/ ...
>
>   /bbb/  ...
>
>   /cccc/ ..
>
>   else
>
> end

You are pretty close:

>> s='aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee'
=> "aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee"
>> s.scan /a+|b+|c+/ do |m|
?>   case m
>>     when /a+/
>>       puts "A"
>>     when /b+/
>>       puts "B"
>>     when /c+/
>>       puts "C"
>>   end
>> end
A
B
C
B
A
B
A
C
=> "aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee"

Or, if you want to avoid a second match:

>> s.scan /(a+)|(b+)|(c+)/ do |m|
?>   case ( m.inject(0) {|i,e| break i if e; i + 1} )
>>     when 0
>>       puts "A"
>>     when 1
>>       puts "B"
>>     when 2
>>       puts "C"
>>   end
>> end
A
B
C
B
A
B
A
C
=> "aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee"
>>

Kind regards

    robert
Alex Fenton (Guest)
on 2005-12-14 14:14
(Received via mailing list)
Martin Pirker wrote:
> Imagine an input string
> aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee...
>
> I have regexp for the parts a,b,c
> and e can be considered as else.
>
> So how can I efficiently search/step through the string from left to
> right, while calling for each section the fitting handler, kind of

You could use String#scan to find the sections that match any of your
requirements, then check which pattern matched (your patterns could be
more complicated, as long as they remain distinguishable from one another):

str = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee'

a_rx = /a+/
b_rx = /b+/
c_rx = /c+/

str.scan(/(?:#{a_rx}|#{b_rx}|#{c_rx})/) do | part |
  case part
  when a_rx
    # ...
  when b_rx
    # ...
  when c_rx
    # ...
  end
end
Martin Pirker (Guest)
on 2005-12-14 14:23
(Received via mailing list)
Robert Klemme <bob.news@gmx.net> wrote:
>>   /aaaa/ ...
>>
>>   /bbb/  ...
>>
>>   /cccc/ ..
>>
>>   else
>>
>> end
[...]
> Or, if you want to avoid a second match:

of course I want :-)

> A
> B
> C
> B
> A
> B
> A
> C
> => "aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee"

....but this doesn't allow a processing step in the "else" case?

Martin
James Gray (bbazzarrakk)
on 2005-12-14 14:39
(Received via mailing list)
On Dec 14, 2005, at 6:32 AM, Simon Strandgaard wrote:

> irb(main):001:0> s = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceee'
> => "aaaaabbccccceeebbbbbbbbbbeaaabaacccceee"
> irb(main):002:0> s.scan(/(a+)|(b+)|(c+)|([^abc]+)/)
> => [["aaaaa", nil, nil, nil], [nil, "bb", nil, nil], [nil, nil,
> "ccccc", nil], [nil, nil, nil, "eee"], [nil, "bbbbbbbbbb", nil, nil],
> [nil, nil, nil, "e"], ["aaa", nil, nil, nil], [nil, "b", nil, nil],
> ["aa", nil, nil, nil], [nil, nil, "cccc", nil], [nil, nil, nil,
> "eee"]]
> irb(main):003:0>

My similar thought:

 >> str = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceee'
=> "aaaaabbccccceeebbbbbbbbbbeaaabaacccceee"
 >> str.scan(/((\w)\2*)/).map { |chunk| chunk.first }
=> ["aaaaa", "bb", "ccccc", "eee", "bbbbbbbbbb", "e", "aaa", "b",
"aa", "cccc", "eee"]

James Edward Gray II
Eero Saynatkari (rue)
on 2005-12-14 20:39
Martin Pirker wrote:
> Imagine an input string
> aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee...
>
> I have regexp for the parts a,b,c
> and e can be considered as else.
>
> So how can I efficiently search/step through the string from left to
> right, while calling for each section the fitting handler, kind of
>
> case section
>   /aaaa/ ...
>
>   /bbb/  ...
>
>   /cccc/ ..
>
>   else
>
> end

Aside from the oft-mentioned String#scan, you might look into
using StringScanner (require 'strscan') from the stdlib. It is
very good for more complex cases of scanning. Documentation is
available, for example, at http://www.ruby-doc.org/stdlib.
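
A minimal sketch of what that could look like for the string in question (the `sections` array and the `:a`/`:b`/`:c`/`:else` labels are only illustrative; real handlers would go in each branch). StringScanner#scan matches only at the current position and advances past the match, so each section is matched once and the "else" branch gets a natural place:

```ruby
require 'strscan'

s = 'aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee'
ss = StringScanner.new(s)
sections = []  # illustrative: collects [label, text] pairs instead of calling handlers

until ss.eos?
  if (m = ss.scan(/a+/))
    sections << [:a, m]
  elsif (m = ss.scan(/b+/))
    sections << [:b, m]
  elsif (m = ss.scan(/c+/))
    sections << [:c, m]
  else
    # "else" case: none of a, b, c starts here, so consume up to the next one
    sections << [:else, ss.scan(/[^abc]+/)]
  end
end

sections.each { |label, text| puts "#{label}: #{text}" }
```

The else branch always consumes at least one character, so the loop cannot stall.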

> Thanks for ideas!
> Martin


E
Robert Klemme (Guest)
on 2005-12-15 11:08
(Received via mailing list)
Martin Pirker wrote:
>>> case section
>> Or, if you want to avoid a second match:
>>>>       puts "C"
>> => "aaaaabbccccceeebbbbbbbbbbeaaabaacccceeee"
>
> ...but this doesn't allow a processing step in the "else" case?

You can have an else clause - but it will never be called.  I guess you
will process entries between matches.  In that case scan won't help - at
least not as used in my example.

A simple option would be to use #split with a group around the whole
regexp and then operate on the array of strings you get.  Whether that's
feasible (volume?) in your case I cannot decide.

s.split(/(a+|b+|c+)/).each do |m|
  next if m.empty?   # split yields "" between adjacent matches
  case m
  when /a+/
    # ...
  when /b+/
    # ...
  when /c+/
    # ...
  else
    # text between matches
  end
end
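
To see what #split returns when the separator regexp has a capturing group, here is a small sketch on a shortened, made-up input (the `parts` and `labels` names are only for illustration). The separators themselves are kept in the result, and an empty string appears wherever two matches are adjacent or the string starts with a match, so those need to be skipped:

```ruby
s = 'aabxxcc'

# With a capturing group, the matched separators are included in the result.
parts = s.split(/(a+|b+|c+)/)
p parts  # => ["", "aa", "", "b", "xx", "cc"]

# Classify each non-empty piece; anything that is not a run of a, b or c
# falls through to the else branch.
labels = parts.reject(&:empty?).map do |part|
  case part
  when /\Aa+\z/ then :a
  when /\Ab+\z/ then :b
  when /\Ac+\z/ then :c
  else :else
  end
end
p labels  # => [:a, :b, :else, :c]
```

Note that this builds the whole array up front, which is the volume concern mentioned above.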


Kind regards

    robert