Forum: Ruby Splitting A String

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Andrew Stewart (Guest)
on 2007-03-16 14:35
(Received via mailing list)
Hello,

What's a (good) way to convert this:

   'a quick "brown fox" jumped "over the lazy" dog'

into this:

   [ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?

Thanks!

Regards,
Andy Stewart
Jan Friedrich (janfri)
on 2007-03-16 14:47
Andrew Stewart wrote:
> > What's a (good) way to convert this:
>
>    'a quick "brown fox" jumped "over the lazy" dog'
>
> into this:
>
>    [ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?


require 'csv'
CSV.parse_line('a quick "brown fox" jumped "over the lazy" dog', ' ')


regards
Jan
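
A note for readers on current Ruby versions: as far as I know, the CSV library no longer accepts the separator as a second positional argument, and it is passed as the `col_sep:` keyword instead. A minimal sketch of the same idea, assuming fields are separated by exactly one space and quotes appear only at field boundaries (the `line` and `words` names are just for illustration):

```ruby
require 'csv'

line = 'a quick "brown fox" jumped "over the lazy" dog'

# col_sep: makes a single space the field separator, so each
# double-quoted run is kept together as one field.
words = CSV.parse_line(line, col_sep: ' ')

p words
```

Two consecutive spaces would produce an empty field (returned as nil), so this approach fits tidy input better than free-form text.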
Satish Talim (indianguru)
on 2007-03-16 14:52
(Received via mailing list)
On 3/16/07, Andrew Stewart <boss@airbladesoftware.com> wrote:
>
> Thanks!
>
> Regards,
> Andy Stewart
>

Do this -
'a quick "brown fox" jumped "over the lazy" dog'.split

Satish Talim
Learning Ruby - http://rubylearning.com/
James Gray (bbazzarrakk)
on 2007-03-16 14:58
(Received via mailing list)
On Mar 16, 2007, at 8:47 AM, Jan Friedrich wrote:

> require 'csv'
> CSV.parse_line('a quick "brown fox" jumped "over the lazy" dog', ' ')

Wow, that's mighty clever.  I didn't even think of trying that.  Nice
job.

James Edward Gray II
James Gray (bbazzarrakk)
on 2007-03-16 14:59
(Received via mailing list)
On Mar 16, 2007, at 8:51 AM, Satish Talim wrote:

>>    [ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?
>>
>> Thanks!
>>
>> Regards,
>> Andy Stewart
>>
>
> Do this -
> 'a quick "brown fox" jumped "over the lazy" dog'.split

Not quite the same.  Look again.  ;)

James Edward Gray II
Andrew Stewart (Guest)
on 2007-03-16 15:03
(Received via mailing list)
Hello Jan,

On 16 Mar 2007, at 13:47, Jan Friedrich wrote:
> require 'csv'
> CSV.parse_line('a quick "brown fox" jumped "over the lazy" dog', ' ')

Nice!

Thank you!
Andy Stewart
Jan Friedrich (janfri)
on 2007-03-16 15:04
Satish Talim wrote:
> 'a quick "brown fox" jumped "over the lazy" dog'.split
This was also my first idea, but

['a', 'quick', '"brown', 'fox"', 'jumped', '"over', 'the', 'lazy"',
'dog'] != ['a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog']

regards
Jan
Satish Talim (indianguru)
on 2007-03-16 15:10
(Received via mailing list)
Sorry, I goofed!!

Satish
Xavier Noria (Guest)
on 2007-03-16 15:15
(Received via mailing list)
On Mar 16, 2007, at 2:35 PM, Andrew Stewart wrote:

> Hello,
>
> What's a (good) way to convert this:
>
>   'a quick "brown fox" jumped "over the lazy" dog'
>
> into this:
>
>   [ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ]

Can quotes be escaped? If not there's a simple regexp that does the job:

   str = 'a quick "brown fox" jumped "over the lazy" dog'
   puts str.scan(/"([^"]*)"|(\w+)/).flatten.select {|s| s}

You can also handle slashes, but it gets uglier.

-- fxn
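
For anyone wondering what the "uglier" escaped-quote case might look like, here is a rough sketch. It assumes a backslash escapes the next character inside double quotes only, and `split_with_escapes` is a made-up helper name, not something from this thread or from any library:

```ruby
# Hypothetical helper: like the scan above, but a backslash inside a
# quoted run escapes the following character (for example \" or \\).
def split_with_escapes(str)
  str.scan(/"((?:\\.|[^"\\])*)"|(\S+)/).map do |quoted, bare|
    # Exactly one of the two groups matched; unescape only quoted text.
    quoted ? quoted.gsub(/\\(.)/, '\1') : bare
  end
end

p split_with_escapes('a quick "brown fox" jumped "over the lazy" dog')
p split_with_escapes('say "a \"quoted\" word" now')
```

Unbalanced quotes are not handled specially here; a stray quote simply ends up inside an ordinary word.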
Xavier Noria (Guest)
on 2007-03-16 15:22
(Received via mailing list)
On Mar 16, 2007, at 3:14 PM, Xavier Noria wrote:

>>   [ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ]
>
> Can quotes be escaped? If not there's a simple regexp that does the
> job:
>
>   str = 'a quick "brown fox" jumped "over the lazy" dog'
>   puts str.scan(/"([^"]*)"|(\w+)/).flatten.select {|s| s}

Heh, reading it I recalled there's a more specific idiom for that
last select:

   str = 'a quick "brown fox" jumped "over the lazy" dog'
   puts str.scan(/"((?:\\.|[^"])*)"|(\w+)/).flatten.compact

-- fxn
Logan Capaldo (Guest)
on 2007-03-16 15:26
(Received via mailing list)
On Fri, Mar 16, 2007 at 10:35:01PM +0900, Andrew Stewart wrote:
> Hello,
>
> What's a (good) way to convert this:
>
>   'a quick "brown fox" jumped "over the lazy" dog'
>
> into this:
>
>   [ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?
>
Here's yet another way:

#!/usr/bin/env ruby
require 'test/unit'
require 'strscan'
class TestScan < Test::Unit::TestCase
  def test_splitter
    assert_equal( %w(a b c), splitter(%q{a b c}))
    assert_equal(["the", "\"quick brown\"", "fox", "jumped", "over", "the", "lazy", "dog"],
                 splitter(%{the "quick brown" fox jumped over the lazy dog}))
  end
end

def splitter(s)
  res = []
  scanner = StringScanner.new(s)
  scanner.skip(/\s*/)
  until scanner.eos?
    if scanner.scan(/"/)
      # quoted string
      scanner.scan(/([^"]*")/)
      res << '"' + scanner[1]
    elsif scanner.scan(/(\S+)/)
      res << scanner[1]
    end
    scanner.skip(/\s*/)
  end
  res
end
__END__
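
Note that the splitter above keeps the surrounding quotes in its output, as its own test shows. If the goal is the array from the original question, a variant along these lines might do. It is only a sketch assuming balanced quotes and no escaping, and `split_unquoted` is a name invented here:

```ruby
require 'strscan'

# Hypothetical variant of the splitter above that drops the quotes.
def split_unquoted(s)
  words = []
  scanner = StringScanner.new(s)
  scanner.skip(/\s*/)
  until scanner.eos?
    if scanner.scan(/"([^"]*)"/)
      words << scanner[1]        # text between the quotes only
    elsif scanner.scan(/\S+/)
      words << scanner.matched   # ordinary word
    end
    scanner.skip(/\s*/)
  end
  words
end

p split_unquoted('a quick "brown fox" jumped "over the lazy" dog')
```

An unmatched opening quote falls through to the `\S+` branch, so the loop still terminates on malformed input.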
Xavier Noria (Guest)
on 2007-03-16 15:27
(Received via mailing list)
On Mar 16, 2007, at 3:21 PM, Xavier Noria wrote:

>   puts str.scan(/"((?:\\.|[^"])*)"|(\w+)/).flatten.compact

Sorry, that regexp was part of a test and got copied by accident. I
just meant to clean up the select, that's:

   str = 'a quick "brown fox" jumped "over the lazy" dog'
   puts str.scan(/"([^"]*)"|(\w+)/).flatten.compact

-- fxn
Robert Klemme (Guest)
on 2007-03-16 15:40
(Received via mailing list)
On 16.03.2007 15:27, Xavier Noria wrote:
> On Mar 16, 2007, at 3:21 PM, Xavier Noria wrote:
>
>>   puts str.scan(/"((?:\\.|[^"])*)"|(\w+)/).flatten.compact
>
> Sorry, that regexp was part of a test and got copied by accident. I just
> meant to clean up the select, that's:
>
>   str = 'a quick "brown fox" jumped "over the lazy" dog'
>   puts str.scan(/"([^"]*)"|(\w+)/).flatten.compact

And solutions with #inject:

irb(main):010:0> require 'enumerator'
=> true
irb(main):011:0> str = 'a quick "brown fox" jumped "over the lazy" dog'
=> "a quick \"brown fox\" jumped \"over the lazy\" dog"
irb(main):012:0> str.to_enum(:scan, /"([^"]*)"|(\S+)/).inject([]) {|a,m| a << m.compact!.shift}
=> ["a", "quick", "brown fox", "jumped", "over the lazy", "dog"]
irb(main):013:0> str.to_enum(:scan, /"([^"]*)"|(\S+)/).inject([]) {|a,(m,n)| a << (m||n)}
=> ["a", "quick", "brown fox", "jumped", "over the lazy", "dog"]

But honestly, I found Jan's solution much more elegant.  Great stuff!

Kind regards

  robert
Brendan Baldwin (Guest)
on 2007-03-16 15:54
(Received via mailing list)
s = 'a quick "brown fox" jumped "over the lazy" dog'
a = s.split(/"([^"]+)"/).map{|s|s.strip}
Xavier Noria (Guest)
on 2007-03-16 16:06
(Received via mailing list)
On Mar 16, 2007, at 3:53 PM, Brendan Baldwin wrote:

> s = 'a quick "brown fox" jumped "over the lazy" dog'
> a = s.split(/"([^"]+)"/).map{|s|s.strip}

Preserving part of the separator is a good trick, but the split
itself is wrong:

   ["a quick", "brown fox", "jumped", "over the lazy", "dog"]

-- fxn
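
The separator-preserving trick can be carried one step further so that the unquoted stretches are split into words as well. This is a sketch only, assuming balanced, non-empty quoted runs and no escaping; the variable names are illustrative:

```ruby
s = 'a quick "brown fox" jumped "over the lazy" dog'

# Because the pattern has a capture group, split keeps the quoted text:
# even-indexed pieces are unquoted stretches, odd-indexed pieces were quoted.
pieces = s.split(/"([^"]+)"/)

words = pieces.each_with_index.flat_map do |piece, i|
  i.odd? ? [piece] : piece.split
end

p words
```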
Kyle Schmitt (Guest)
on 2007-03-16 16:37
(Received via mailing list)
OK this was my answer, but it didn't quite work...
I need to work harder on my regexes I guess
s = 'a quick "brown fox" jumped "over the lazy" dog'
a = s.gsub(/("[a-z]*) ([a-z ]*")/i,"#{$1}_#{$2}")
a.each_index{|i| a[i].gsub!('_',' ')}
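
The idea in the post above (protect the spaces inside quotes with a placeholder, split, then restore them) can be made to work. Two things trip up the original: `"#{$1}_#{$2}"` is interpolated before `gsub` runs, and `gsub` returns a String, which has no `each_index`. A possible repair, assuming the text itself never contains an underscore (names such as `protected_text` are made up for this sketch):

```ruby
s = 'a quick "brown fox" jumped "over the lazy" dog'

# Step 1: inside each quoted run, swap spaces for the placeholder and
# drop the quotes. The block form makes $1 refer to the current match.
protected_text = s.gsub(/"([^"]*)"/) { $1.tr(' ', '_') }

# Step 2: split on whitespace, then turn the placeholder back into spaces.
a = protected_text.split.map { |word| word.tr('_', ' ') }

p a
```

Any underscore already present in the input would be turned into a space, which is the main weakness of the placeholder approach.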
Lou Scoras (ljscoras)
on 2007-03-17 02:17
(Received via mailing list)
Not a good solution by any means, but somebody might find it
interesting.  Assumes balanced quotes, no escaping, etc.

    require 'enumerator'

    def my_split s
      s.split('"')                   .  # Just split on the quote
        to_enum(:each_slice,2)       .  # ... and deal w/ pairs
        inject([]) {
           |a,(e,o)| a               .
                     concat(
                        e.split(' ') +  # Split unquotes
                        [o]             # Stuff in quotes is okay as is
                     )
        }                            .
        compact                         # Finally remove nils
    end
George Ogata (Guest)
on 2007-03-17 06:54
(Received via mailing list)
On 3/17/07, Andrew Stewart <boss@airbladesoftware.com> wrote:
> Thanks!
If you're looking for shell-quoting-like behavior, I actually think
it's more appropriate to use shellwords for this:

irb(main):001:0> require 'shellwords'
=> true
irb(main):002:0> Shellwords.shellwords 'a quick "brown fox" jumps "over the lazy" dog'
=> ["a", "quick", "brown fox", "jumps", "over the lazy", "dog"]

It will also handle sloshing:

irb(main):003:0> Shellwords.shellwords 'a\ b c'
=> ["a b", "c"]

And distinguish between single and double quotes (for better or for
worse):

irb(main):004:0> Shellwords.shellwords %{"a\\"a"}
=> ["a\"a"]
irb(main):005:0> Shellwords.shellwords %{'a\\'}
=> ["a\\"]

Shellwords is in the standard library.

Regards,
George.
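
For use in a script rather than irb, the same library also offers `Shellwords.split` and, once required, `String#shellsplit`. A short sketch, which also shows that unbalanced quotes raise an error instead of being guessed at (the variable names are illustrative):

```ruby
require 'shellwords'

line = 'a quick "brown fox" jumped "over the lazy" dog'

words = Shellwords.split(line)   # module-level form
same  = line.shellsplit          # String method added by requiring shellwords

# Unbalanced quotes are rejected with an ArgumentError.
error = begin
  Shellwords.split('a "broken quote')
  nil
rescue ArgumentError => e
  e
end

p words
p error.class
```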
Andrew Stewart (Guest)
on 2007-03-19 11:39
(Received via mailing list)
On 17 Mar 2007, at 05:53, George Ogata wrote:
> If you're looking for shell-quoting-like behavior, I actually think
> it's more appropriate to use shellwords for this:
[snip]
> Shellwords is in the standard library.

Thanks for the pointer -- I didn't know about Shellwords.

Regards,
Andy Stewart
Brian Candler (Guest)
on 2007-09-25 23:02
(Received via mailing list)
On Sun, Mar 18, 2007 at 01:34:07PM +0900, Bernard Kenik wrote:
> >To:
> >>>Hello,
>
> 'a quick "brown fox" jumped "over the lazy" dog'.gsub('"','').split(/ /)

However, that splits into individual words. If you look carefully at the
example, the quoted strings "brown fox" and "over the lazy" each need to
end up in a *single* array element in the result.

Regards,

Brian.
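
To make the difference concrete, a quick check of what the `gsub`/`split` one-liner returns compared with the array asked for in the original question (variable names are illustrative):

```ruby
str = 'a quick "brown fox" jumped "over the lazy" dog'

stripped = str.gsub('"', '').split(/ /)
wanted   = ['a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog']

p stripped.size       # 9: every word is its own element
p wanted.size         # 6: quoted phrases stay together
p stripped == wanted  # false
```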
Bernard Kenik (bbiker)
on 2007-09-25 23:04
(Received via mailing list)
ruby-talk-admin@ruby-lang.org wrote:
> ruby-talk@ruby-lang.org (ruby-talk ML)
>>>
>>> What's a (good) way to convert this:
>>>
>>>   'a quick "brown fox" jumped "over the lazy" dog'
>>>
>>> into this:
>>>
>>>   [ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ]
>>
From a newbie

'a quick "brown fox" jumped "over the lazy" dog'.gsub('"','').split(/ /)