Forum: Ruby on Rails Regex for splitting string

Yash (Guest)
on 2006-04-05 16:19
Hi

We have a search website where the user can type in individual words
separated by spaces and/or phrases enclosed in single or double quotes.
We are looking for a way to obtain a list of words and phrases from the
search string.
Can someone help?

Thanks,
Yash
Alex Y. (Guest)
on 2006-04-05 17:20
(Received via mailing list)
Yash wrote:
> Hi
>
> We have a search website where the user can type in individual words
> separated by spaces and/or phrases enclosed in single or double quotes.
> We are looking for a way to obtain a list of words and phrases from the
> search string.
> Can someone help?

   string.scan(/\w+/)
to give an array of words, or
   string.split(/\W+/)
to split on non-words.  You can string.gsub(/["']/, '') if you want to
get rid of quotes.
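
A quick sketch of how the two calls above differ, using a made-up query string (not from the thread). Both break a quoted phrase into separate words; split additionally returns a leading empty string when the input starts with a non-word character such as a quote.

```ruby
query = %q{'Ruby on rails' Java}

# scan collects every run of word characters.
words_by_scan = query.scan(/\w+/)
# split cuts on runs of non-word characters; because the string starts
# with a quote, the first element is an empty string.
words_by_split = query.split(/\W+/)

p words_by_scan   # => ["Ruby", "on", "rails", "Java"]
p words_by_split  # => ["", "Ruby", "on", "rails", "Java"]
```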
Yash (Guest)
on 2006-04-05 18:46
If the input string is:
Java Ruby 'Ruby on rails' "software development" "technology"

The list of words should be:
Java
Ruby
Ruby on rails
software development
technology

With your approach, the result will be:
Java
Ruby
Ruby
on
rails
software
development
technology
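
One possible way to get the list described above is a single scan with three alternatives. This is only a sketch: search_terms is a hypothetical helper name, and it assumes quotes are balanced and never nested. A token with an unbalanced quote falls through to the last alternative and keeps its quote character.

```ruby
# Hypothetical helper, not from the thread: returns words and quoted phrases.
def search_terms(query)
  # Each match fills exactly one of three groups: a double-quoted phrase,
  # a single-quoted phrase, or a bare run of non-space characters.
  query.scan(/"([^"]*)"|'([^']*)'|(\S+)/).map { |groups| groups.compact.first }
end

p search_terms(%q{Java Ruby 'Ruby on rails' "software development" "technology"})
# => ["Java", "Ruby", "Ruby on rails", "software development", "technology"]
```

Using \S+ rather than \w+ for bare words keeps tokens such as don't in one piece.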

Richard L. (Guest)
on 2006-04-05 19:41
(Received via mailing list)
Yash wrote:

 >>> We have a search website where the user can type in individual words
 >>> separated by spaces and/or phrases enclosed in single or double quotes.
 >>> We are looking for a way to obtain a list of words and phrases from the
 >>> search string.

> If the input string is:
> Java Ruby 'Ruby on rails' "software development" "technology"
>
> The list of words should be:
> Java
> Ruby
> Ruby on rails
> software development
> technology

 >> example = 'some text and \'some inside\' test "double quotes"'

Using the CSV module:

 >> require 'csv'
 >> CSV.parse_line(example, col_sep: ' ')
=> ["some", "text", "and", "'some", "inside'", "test", "double quotes"]

Fairly elegant, but it doesn't handle single quotes the way you want.

or

 >> example.split( / *["'](.*?)["'] *| / )
=> ["some", "text", "and", "some inside", "test", "double quotes"]

That seems closer to what you want.

Hope that helps.
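
Another standard-library option, not mentioned in the thread, is Shellwords, which treats single and double quotes the same way a shell does. A minimal sketch, assuming shell-style rules are acceptable for search input: backslashes act as escapes, and unbalanced quotes raise ArgumentError, so real user input would need some fallback.

```ruby
require 'shellwords'

example = %q{some text and 'some inside' test "double quotes"}
terms = Shellwords.split(example)
p terms
# => ["some", "text", "and", "some inside", "test", "double quotes"]

# An unbalanced quote raises ArgumentError instead of returning a guess.
begin
  Shellwords.split(%q{rock 'n roll})
rescue ArgumentError => e
  puts "could not parse: #{e.message}"
end
```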
Hank M. (Guest)
on 2006-04-05 19:41
(Received via mailing list)
I'm not enough of a regex guru to do it in one pass, so I'd probably do it
in two:

/"([^"]+)"|('[^']+)'/ to grab the quoted phrases, then replace those
occurrences with '' in the original string.

Then use the \w+ or \W+ patterns suggested earlier to get the individual
tokens, at some point cleaning the original string of anything you
consider garbage.
Hank M. (Guest)
on 2006-04-05 19:44
(Received via mailing list)
Oops, I seem to be capturing the opening single quote. The group should
start after it:

/"([^"]+)"|'([^']+)'/
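
The two-pass idea could look roughly like this with the corrected regex. It is a sketch only: the constant name PHRASE is made up, and the phrases are replaced with a space rather than '' so that neighbouring words are not glued together. Note that the result lists single words first and phrases second, so the original order of the query is not preserved in general.

```ruby
PHRASE = /"([^"]+)"|'([^']+)'/

query = %q{Java Ruby 'Ruby on rails' "software development" "technology"}

# Pass 1: collect the text inside each quoted phrase.
phrases = query.scan(PHRASE).map { |double, single| double || single }
# Pass 2: blank out the phrases, then pick up the remaining single words.
words = query.gsub(PHRASE, ' ').scan(/\w+/)

p words + phrases
# => ["Java", "Ruby", "Ruby on rails", "software development", "technology"]
```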
James L. (Guest)
on 2006-04-05 20:21
(Received via mailing list)
On 4/5/06, Yash wrote:
> If the input string is:
> Java Ruby 'Ruby on rails' "software development" "technology"
>
> The list of words should be:
> Java
> Ruby
> Ruby on rails
> software development
> technology

irb(main):086:0> s = 'Java Ruby \'Ruby on rails\' "software development" "technology"'
=> "Java Ruby 'Ruby on rails' \"software development\" \"technology\""

irb(main):087:0> a = s.split(/(".*")|('.*')|((?=[^"'])\w+(?=[^"']))/).find_all {|s| s.match(/\w+/)}
=> ["Java", "Ruby", "'Ruby on rails'", "\"software development\" \"technology\""]

If you don't mind having some array elements that are all whitespace
you can drop the "find_all" part.

-- James
James L. (Guest)
on 2006-04-05 20:24
(Received via mailing list)
On 4/5/06, James L. wrote:
>
> If you don't mind having some array elements that are all whitespace
> you can drop the "find_all" part.

Of course, the second I hit "send" I realized that I pasted the wrong
regex.

a = s.split(/(".*?")|('.*?')|((?=[^"'])\w+(?=[^"']))/).find_all {|x| x.match(/\w+/)}
=> ["Java", "Ruby", "'Ruby on rails'", "\"software development\"", "\"technology\""]

-- James
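
The phrases in that result still carry their quotes. A small follow-up sketch, assuming the goal is the bare list from the original question, strips one matching pair of quotes from each token (the names TOKEN, tokens and terms are made up for illustration):

```ruby
TOKEN = /(".*?")|('.*?')|((?=[^"'])\w+(?=[^"']))/

s = %q{Java Ruby 'Ruby on rails' "software development" "technology"}

tokens = s.split(TOKEN).find_all { |x| x.match(/\w+/) }
# Remove one matching pair of quotes from the ends of a token, if present.
terms = tokens.map { |t| t.sub(/\A(["'])(.*)\1\z/, '\2') }

p terms
# => ["Java", "Ruby", "Ruby on rails", "software development", "technology"]
```

One caveat with this regex: the trailing lookahead needs a character after each bare word, so a query that ends in an unquoted word gets its last word cut short ("Java Ruby" comes back as "Java", "Rub", "y").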