I want to split a string into words, but group quoted words together
such that…
some words "some quoted text" some more words
would get split up into:
["some", "words", "some quoted text", "some", "more", "words"]
So far I’m drawing a blank on the ‘Ruby way’ to do this and the only
solutions I can think of are turning out to be fairly ugly.
Any advice would be great. Thanks in advance.
On 2006.01.07 09:08, Richard L. wrote:
solutions I can think of are turning out to be fairly ugly.
Any advice would be great. Thanks in advance.
Naively, you can try something like this:
s = 'foo bar "baz quux" roo'
s.scan(/(?:"")|(?:"(.*[^\\])")|(\w+)/).flatten.compact
Elaborate as necessary (add support for single quotes or something).
R.Livsey
E
Richard L. writes:
I want to split a string into words, but group quoted words together
such that…
some words "some quoted text" some more words
would get split up into:
["some", "words", "some quoted text", "some", "more", "words"]
How about the csv module? Despite the name, you don’t have to use
commas.
require 'csv'
CSV::parse_line('some words "some quoted text" some more words', ' ')
I hope this helps,
Tim
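For anyone reading this on a newer Ruby: as far as I know, the positional separator argument shown above belongs to the old csv library, and current versions take the separator as the col_sep keyword instead. A minimal sketch, assuming a reasonably recent Ruby with the csv library available:

```ruby
require 'csv'

line = 'some words "some quoted text" some more words'

# col_sep takes the place of the positional separator argument used above.
words = CSV.parse_line(line, col_sep: ' ')
p words
```

One thing to watch, if I remember the behaviour correctly: runs of several spaces are treated as empty fields, so this suits single-space-separated input best.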
On Jan 6, 2006, at 6:08 PM, Richard L. wrote:
only solutions I can think of are turning out to be fairly ugly.
Any advice would be great. Thanks in advance.
I agree that CSV is the way to go, but here’s a direct attempt:
example = %Q{some words "some quoted text" some more words}
=> "some words \"some quoted text\" some more words"
example.scan(/\s+|\w+|"[^"]*"/).
?> reject { |token| token =~ /^\s+$/ }.
?> map { |token| token.sub(/^"/, "").sub(/"$/, "") }
=> ["some", "words", "some quoted text", "some", "more", "words"]
Hope that gives you some fresh ideas.
James Edward G. II
some words "some quoted text" some more words
would get split up into:
["some", "words", "some quoted text", "some", "more", "words"]
s = 'some words "some quoted text" some more words'
sa = s.split(/"/).collect { |x| x.strip }
(0...sa.size).to_a.zip(sa).collect { |i,x| (i&1).zero? ? x.split : x }.flatten
(0...sa.size).to_a.zip(sa).collect { |i,x| (i&1).zero? ? x.split : x }.flatten
Just realized that Range responds to zip, so the to_a is unnecessary.
This looks slightly cleaner to me:
(1..sa.size).zip(sa).collect { |i,x| (i&1).zero? ? x : x.split }.flatten
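The same split-on-quotes idea can be written without building a range of indexes at all. This is only a sketch: the method name is made up for illustration, and it assumes the quotes are balanced, so every odd-numbered piece is quoted text.

```ruby
# Hypothetical helper name. Splits on double quotes; pieces at even
# positions are outside quotes (split them on whitespace), pieces at odd
# positions are inside quotes (keep them whole).
def split_by_quote_parity(str)
  str.split('"').each_with_index.flat_map do |piece, i|
    i.even? ? piece.split : [piece]
  end
end

p split_by_quote_parity('some words "some quoted text" some more words')
```

Unlike the version above, this does not strip whitespace inside the quoted pieces, which may or may not be what you want.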
On Jan 7, 2006, at 1:08, Richard L. wrote:
I want to split a string into words, but group quoted words
together such that…
some words "some quoted text" some more words
would get split up into:
["some", "words", "some quoted text", "some", "more", "words"]
Curiously, someone asked exactly that on freenode#perl tonight.
If the input is that simple and is assumed to be well-formed this is
enough:
irb(main):005:0> %q{some words "some quoted text" some "" more words}.scan(/"[^"]*"|\S+/)
=> ["some", "words", "\"some quoted text\"", "some", "\"\"", "more", "words"]
Since nothing was said about this, it does not handle escaped quotes,
and it assumes quotes are always balanced, so a field cannot be
%q{"foo}, for example.
– fxn
example = %Q{some words "some quoted text" some more words}
=> "some words \"some quoted text\" some more words"
example.scan(/\s+|\w+|"[^"]*"/).
?> reject { |token| token =~ /^\s+$/ }.
?> map { |token| token.sub(/^"/, "").sub(/"$/, "") }
=> ["some", "words", "some quoted text", "some", "more", "words"]
impressive
So long
Michael ‘entropie’ Trommer; http://ackro.org
ruby -e "0.upto((a='njduspAhnbjm/dpn').size-1){|x| a[x]-=1}; p 'mailto:'+a"
Hi –
On Sat, 7 Jan 2006, James Edward G. II wrote:
So far I’m drawing a blank on the ‘Ruby way’ to do this and the only
solutions I can think of are turning out to be fairly ugly.
Any advice would be great. Thanks in advance.
I agree that CSV is the way to go, but here’s a direct attempt:
Me too (end of disclaimer).
example = %Q{some words "some quoted text" some more words}
=> "some words \"some quoted text\" some more words"
example.scan(/\s+|\w+|"[^"]*"/).
?> reject { |token| token =~ /^\s+$/ }.
?> map { |token| token.sub(/^"/, "").sub(/"$/, "") }
=> ["some", "words", "some quoted text", "some", "more", "words"]
I think you could do less work:
example.scan(/"[^"]+"|\S+/).map { |word| word.delete('"') }
(Or am I overlooking some reason you’d want to capture sequences of
spaces?)
I changed the \w+ to \S+ (and moved it after the | to avoid having it
sponge up too much) in case the words included non-\w characters.
I guess with zero-width positive lookbehind/ahead one could do it
without the map operation.
David
–
David A. Black
“Ruby for Rails”, from Manning Publications, coming April 2006!
On Sat, 7 Jan 2006, Tim Heaney wrote:
How about the csv module? Despite the name, you don’t have to use
commas.
require 'csv'
CSV::parse_line('some words "some quoted text" some more words', ' ')
I hope this helps,
brilliant!
-a
On Jan 6, 2006, at 8:33 PM, [email protected] wrote:
(Or am I overlooking some reason you’d want to capture sequences of
spaces?)
I changed the \w+ to \S+ (and moved it after the | to avoid having it
sponge up too much) in case the words included non-\w characters.
You’re right, that’s better all around.
I guess with zero-width positive lookbehind/ahead one could do it
without the map operation.
You can drop the map(), if you’re willing to replace it with two
other calls:
example = %Q{some words "some quoted text" some more words}
=> "some words \"some quoted text\" some more words"
example.scan(/"([^"]+)"|(\S+)/).flatten.compact
=> ["some", "words", "some quoted text", "some", "more", "words"]
James Edward G. II
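Wrapped in a method, the scan/flatten/compact version is easy to poke at with other inputs. A rough sketch follows; the method name is invented here, and the regex is the one from the post above. Note what happens with an unbalanced quote: the stray quote simply stays attached to its word, which ties in with the caveat raised earlier about assuming balanced quotes.

```ruby
# Hypothetical helper name wrapping the scan/flatten/compact idea.
# Each match fills exactly one of the two capture groups; compact drops
# the nil left by the group that did not participate.
def quoted_words(str)
  str.scan(/"([^"]+)"|(\S+)/).flatten.compact
end

p quoted_words('some words "some quoted text" some more words')
p quoted_words('foo-bar "a b"')  # non-\w characters are kept by \S+
p quoted_words('a "b c')         # unbalanced quote: no grouping happens
```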
On Mon, 2006-01-09 at 18:13 +0900, William J. wrote:
s = 'some words "some quoted text" some more words'
p s.split( / *"(.*?)" *| / )
Which along with the CSV solution can’t handle complex cases:
s = 'one two" "\'with quotes\' "three "'
s.split( / *"(.*?)" *| / )
=> ["one", "two", " ", "'with", "quotes'", "three "]
require 'csv'
CSV::parse_line(s)
=> []
but Shellwords can:
require 'shellwords'
Shellwords.shellwords(s)
=> ["one", "two with quotes", "three "]
Richard L. wrote:
I want to split a string into words, but group quoted words together
such that…
some words "some quoted text" some more words
would get split up into:
["some", "words", "some quoted text", "some", "more", "words"]
s = 'some words "some quoted text" some more words'
p s.split( / *"(.*?)" *| / )
Geoff Jacobsen wrote:
require 'csv'
CSV::parse_line(s)
=> []
but Shellwords can:
require 'shellwords'
Shellwords.shellwords(s)
=> ["one", "two with quotes", "three "]
Another option is to use scan instead of split:
'some words "some quoted text" some more words'.scan %r{"(?:(?:[^"]|\\.)*)"|\S+}
=> ["some", "words", "\"some quoted text\"", "some", "more", "words"]
With some additional effort even the quotes can be removed (using
grouping, for example).
r=[];'some words "some quoted text" some more words'.scan(%r{"((?:[^"]|\\.)*)"|(\S+)}) {|m| r << m.detect {|x|x}};r
=> ["some", "words", "some quoted text", "some", "more", "words"]
Kind regards
robert
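If backslash-escaped quotes inside a quoted section need to survive, the character class has to exclude the backslash so that the escape branch gets a chance to match. Here is a sketch under that assumption; the method name and the unescaping step are illustrative additions, not something from the posts above.

```ruby
# Hypothetical helper: inside double quotes a backslash escapes the next
# character. [^"\\] excludes the backslash so that \\. can consume the
# escape pair as a unit; the gsub then drops the escaping backslashes.
def split_with_escapes(str)
  str.scan(/"((?:[^"\\]|\\.)*)"|(\S+)/).map do |quoted, bare|
    quoted ? quoted.gsub(/\\(.)/, '\1') : bare
  end
end

p split_with_escapes('say "a \"quoted\" word" now')
```

In the single-quoted Ruby literal the backslashes are kept as-is, so the input really does contain \" sequences.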
On Tue, 2006-01-10 at 04:23 +0900, William J. wrote:
["some", "words", "some quoted text", "some", "more", "words"]
s = 'some words "some quoted text" some more words'
p s.split( / *"(.*?)" *| / )
Which along with the CSV solution can’t handle complex cases:
s = 'one two" "\'with quotes\' "three "'
s.split( / *"(.*?)" *| / )
=> ["one", "two", " ", "'with", "quotes'", "three "]
…
The shellwords “solution” is a solution to a different problem, not
to this one. It can’t even handle a simple case:
require 'shellwords'
s = "why can't you think?"
Shellwords.shellwords(s)
ArgumentError: Unmatched single quote: 't you think?
I agree my example doesn’t match the originators request but I think
there is enough ambiguity about the post to postulate that they may want
more real-world cases such as:
s = 'symbol "William said: \"why can\'t you think?\"" 123 "foo"'
Shellwords.shellwords(s)
=> ["symbol", "William said: \"why can't you think?\"", "123", "foo"]
So Shellwords may indeed be a solution to this problem but the problem
is not stated precisely enough to know.
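If Shellwords is used on free-form text, the unmatched-quote error shown above has to be dealt with somewhere. One possible way, sketched here with a made-up wrapper name, is to rescue the ArgumentError and let the caller fall back to another strategy:

```ruby
require 'shellwords'

# Hypothetical wrapper: returns nil instead of raising when the input
# contains an unmatched quote (a bare apostrophe, for instance).
def shell_split_or_nil(str)
  Shellwords.split(str)
rescue ArgumentError
  nil
end

p shell_split_or_nil('some words "some quoted text" some more words')
p shell_split_or_nil("why can't you think?")
```

Whether nil is the right fallback depends on the input you expect; the point is only that shell-style quoting rules and plain prose do not always mix.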
Geoff Jacobsen wrote:
require 'csv'
CSV::parse_line(s)
=> []
but Shellwords can:
require 'shellwords'
Shellwords.shellwords(s)
=> ["one", "two with quotes", "three "]
This is not a “more complex case”; it is an invalid case.
The original poster simply wanted to avoid splitting on spaces
within double quotes, not within single quotes.
The shellwords “solution” is a solution to a different problem, not
to this one. It can’t even handle a simple case:
require 'shellwords'
s = "why can't you think?"
Shellwords.shellwords(s)
ArgumentError: Unmatched single quote: 't you think?