Splitting A String

Hello,

What’s a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?

Thanks!

Regards,
Andy S.

Andrew S. wrote:

What’s a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?

require 'csv'
CSV.parse_line('a quick "brown fox" jumped "over the lazy" dog', ' ')

regards
Jan
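
A note for readers on a current Ruby: the positional separator argument reflects the CSV library as it was in 2007. The CSV library that ships with Ruby today takes the separator as the `col_sep` keyword instead, so a minimal sketch of the same idea might look like this (the method name `csv_words` is made up for illustration and does not appear in the thread):

```ruby
require 'csv'

# Hypothetical helper name, not from the thread.
# Treats a single space as the field separator and relies on CSV's
# double-quote handling to keep quoted phrases together.
def csv_words(line)
  CSV.parse_line(line, col_sep: ' ')
end

p csv_words('a quick "brown fox" jumped "over the lazy" dog')
```

Because every space counts as a separator, runs of several spaces come back as empty (nil) fields, so this fits input with exactly one space between words.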

On Mar 16, 2007, at 8:47 AM, Jan F. wrote:

require 'csv'
CSV.parse_line('a quick "brown fox" jumped "over the lazy" dog', ' ')

Wow, that’s mighty clever. I didn’t even think of trying that. Nice
job.

James Edward G. II

On 3/16/07, Andrew S. wrote:

Thanks!

Regards,
Andy S.

Do this -
'a quick "brown fox" jumped "over the lazy" dog'.split

Satish T.
Learning R. - http://rubylearning.com/

Hello Jan,

On 16 Mar 2007, at 13:47, Jan F. wrote:

require 'csv'
CSV.parse_line('a quick "brown fox" jumped "over the lazy" dog', ' ')

Nice!

Thank you!
Andy S.

Satish T. wrote:

'a quick "brown fox" jumped "over the lazy" dog'.split

This was also my first idea, but

['a', 'quick', '"brown', 'fox"', 'jumped', '"over', 'the', 'lazy"',
'dog'] != ['a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog']

regards
Jan

On Mar 16, 2007, at 8:51 AM, Satish T. wrote:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?

Thanks!

Regards,
Andy S.

Do this -
'a quick "brown fox" jumped "over the lazy" dog'.split

Not quite the same. Look again. ;)

James Edward G. II

On Mar 16, 2007, at 2:35 PM, Andrew S. wrote:

Hello,

What’s a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ]

Can quotes be escaped? If not there’s a simple regexp that does the job:

str = 'a quick "brown fox" jumped "over the lazy" dog'
puts str.scan(/"([^"]*)"|(\w+)/).flatten.select {|s| s}

You can also handle slashes, but it gets uglier.

– fxn
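
To make the remark about escaping concrete, here is one possible sketch of the "uglier" variant. It assumes that a backslash escapes the next character inside double quotes. The method name `split_quoted`, the unescaping step, and the use of `\S+` instead of `\w+` for bare words are illustrative choices, not something posted in this thread:

```ruby
# Hypothetical helper name, not from the thread.
# Assumption: inside double quotes, a backslash escapes the next character.
def split_quoted(str)
  # First alternative: a quoted run made of escaped characters or
  # anything that is neither a quote nor a backslash.
  # Second alternative: a bare run of non-whitespace characters.
  str.scan(/"((?:\\.|[^"\\])*)"|(\S+)/).map do |quoted, bare|
    # Strip the escaping backslashes from quoted text; bare words pass through.
    quoted ? quoted.gsub(/\\(.)/, '\1') : bare
  end
end

p split_quoted('a quick "brown fox" jumped "over the lazy" dog')
p split_quoted('say "a \"quoted\" word" twice')
```

Using `map` over the two capture groups avoids the `flatten.compact` step, at the cost of a slightly longer block.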

On Mar 16, 2007, at 3:14 PM, Xavier N. wrote:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ]

Can quotes be escaped? If not there’s a simple regexp that does the
job:

str = 'a quick "brown fox" jumped "over the lazy" dog'
puts str.scan(/"([^"]*)"|(\w+)/).flatten.select {|s| s}

Heh, reading it I recalled there’s a more specific idiom for that
last select:

str = 'a quick "brown fox" jumped "over the lazy" dog'
puts str.scan(/"((?:\.|[^"])*)"|(\w+)/).flatten.compact

– fxn

Sorry, I goofed!!

Satish

On Fri, Mar 16, 2007 at 10:35:01PM +0900, Andrew S. wrote:

Hello,

What’s a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ] ?

Here’s yet another way:

#!/usr/bin/env ruby
require 'test/unit'
require 'strscan'

class TestScan < Test::Unit::TestCase
  def test_splitter
    assert_equal(%w(a b c), splitter(%q{a b c}))
    # Note: this version keeps the surrounding quotes on quoted tokens.
    assert_equal(["the", "\"quick brown\"", "fox", "jumped", "over",
                  "the", "lazy", "dog"],
                 splitter(%{the "quick brown" fox jumped over the lazy dog}))
  end
end

def splitter(s)
  res = []
  scanner = StringScanner.new(s)
  scanner.skip(/\s*/)
  until scanner.eos?
    if scanner.scan(/"/)
      # quoted string: consume up to and including the closing quote
      scanner.scan(/([^"]*")/)
      res << '"' + scanner[1]
    elsif scanner.scan(/(\S+)/)
      # bare word
      res << scanner[1]
    end
    scanner.skip(/\s*/)
  end
  res
end
__END__

On Mar 16, 2007, at 3:21 PM, Xavier N. wrote:

puts str.scan(/"((?:\.|[^"])*)"|(\w+)/).flatten.compact

Sorry, that regexp was part of a test and got copied by accident. I
just meant to clean up the select, that’s:

str = 'a quick "brown fox" jumped "over the lazy" dog'
puts str.scan(/"([^"]*)"|(\w+)/).flatten.compact

– fxn

s = 'a quick "brown fox" jumped "over the lazy" dog'
a = s.split(/"([^"]+)"/).map{|s|s.strip}

On Mar 16, 2007, at 3:53 PM, Brendan Baldwin wrote:

s = 'a quick "brown fox" jumped "over the lazy" dog'
a = s.split(/"([^"]+)"/).map{|s|s.strip}

Preserving part of the separator is a good trick, but the split
itself is wrong:

["a quick", "brown fox", "jumped", "over the lazy", "dog"]

– fxn
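
One way the captured-separator trick could be finished, as a sketch: `split` with a capture group leaves the quoted text at the odd indices of the result, so only the even-indexed pieces still need a whitespace split. The method name `split_keeping_quotes` is made up for illustration, and balanced quotes with no escaping are assumed:

```ruby
# Hypothetical helper name, not from the thread.
# Assumes balanced double quotes and no escaping.
def split_keeping_quotes(s)
  # With a capture group in the pattern, String#split also returns the
  # captured text, so quoted phrases land at odd indices.
  s.split(/"([^"]+)"/).each_with_index.flat_map do |part, i|
    # Odd index: a quoted phrase, kept whole.
    # Even index: unquoted text, split on whitespace.
    i.odd? ? [part] : part.split
  end
end

p split_keeping_quotes('a quick "brown fox" jumped "over the lazy" dog')
```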

On 16.03.2007 15:27, Xavier N. wrote:

On Mar 16, 2007, at 3:21 PM, Xavier N. wrote:

puts str.scan(/"((?:\.|[^"])*)"|(\w+)/).flatten.compact

Sorry, that regexp was part of a test and got copied by accident. I just
meant to clean up the select, that’s:

str = 'a quick "brown fox" jumped "over the lazy" dog'
puts str.scan(/"([^"]*)"|(\w+)/).flatten.compact

And solutions with #inject:

irb(main):010:0> require 'enumerator'
=> true
irb(main):011:0> str = 'a quick "brown fox" jumped "over the lazy" dog'
=> "a quick \"brown fox\" jumped \"over the lazy\" dog"
irb(main):012:0> str.to_enum(:scan, /"([^"]*)"|(\S+)/).inject([]) {|a,m| a << m.compact!.shift}
=> ["a", "quick", "brown fox", "jumped", "over the lazy", "dog"]
irb(main):013:0> str.to_enum(:scan, /"([^"]*)"|(\S+)/).inject([]) {|a,(m,n)| a << (m||n)}
=> ["a", "quick", "brown fox", "jumped", "over the lazy", "dog"]

But honestly, I found Jan’s solution much more elegant. Great stuff!

Kind regards

robert

OK this was my answer, but it didn’t quite work…
I need to work harder on my regexes I guess
s = 'a quick "brown fox" jumped "over the lazy" dog'
a = s.gsub(/("[a-z]) ([a-z ]")/i,"#{$1}#{$2}")
a.each_index{|i| a[i].gsub!(’
’,’ ')}

Not a good solution by any means, but somebody might find it
interesting. Assumes balanced quotes, no escaping, etc.

require 'enumerator'

def my_split s
  s.split('"')                   .  # Just split on the quote
    to_enum(:each_slice,2)       .  # ... and deal w/ pairs
    inject([]) {
       |a,(e,o)| a               .
                 concat(
                    e.split(' ') +  # Split unquotes
                    [o]             # Stuff in quotes is okay as is
                 )
    }                            .
    compact                         # Finally remove nils
end

On 17 Mar 2007, at 05:53, George O. wrote:

If you’re looking for shell-quoting-like behavior, I actually think
it’s more appropriate to use shellwords for this:
[snip]
Shellwords is in the standard library.

Thanks for the pointer – I didn’t know about Shellwords.

Regards,
Andy S.

[email protected] wrote:

[email protected] (ruby-talk ML)

What’s a (good) way to convert this:

'a quick "brown fox" jumped "over the lazy" dog'

into this:

[ 'a', 'quick', 'brown fox', 'jumped', 'over the lazy', 'dog' ]

From a newbie

'a quick "brown fox" jumped "over the lazy" dog'.gsub('"','').split(/ /)

On 3/17/07, Andrew S. wrote:

Thanks!
If you’re looking for shell-quoting-like behavior, I actually think
it’s more appropriate to use shellwords for this:

irb(main):001:0> require 'shellwords'
=> true
irb(main):002:0> Shellwords.shellwords 'a quick "brown fox" jumps "over the lazy" dog'
=> ["a", "quick", "brown fox", "jumps", "over the lazy", "dog"]

It will also handle sloshing:

irb(main):003:0> Shellwords.shellwords 'a\ b c'
=> ["a b", "c"]

And distinguish between single and double quotes (for better or for
worse):

irb(main):004:0> Shellwords.shellwords %{"a\\"a"}
=> ["a\"a"]
irb(main):005:0> Shellwords.shellwords %{'a\\'}
=> ["a\\"]

Shellwords is in the standard library.

Regards,
George.
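
One practical caveat, sketched under the assumption that the input may contain an unbalanced quote: Shellwords raises ArgumentError in that case, so a caller may want to rescue it. `safe_shellsplit` is a hypothetical wrapper name, not part of the library, and returning nil on bad input is just one possible policy:

```ruby
require 'shellwords'

# Hypothetical wrapper, not part of Shellwords itself.
# Returns the split words, or nil when the quoting is unbalanced.
def safe_shellsplit(line)
  Shellwords.shellwords(line)
rescue ArgumentError
  # Raised by Shellwords for an unmatched quote.
  nil
end

p safe_shellsplit('a quick "brown fox" jumped "over the lazy" dog')
p safe_shellsplit('a quick "brown fox')
```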