New to regexp and Ruby, need help parsing a string

kadabra · November 23, 2007, 9:27am

I’m building a little test console for a ruby project. When using a
function I might get something like this:

input_string = "and stuff and nice things not bad girls not greasy boys
and girlsandboys"

As you may have guessed, I want to turn it into a structure like the
following:

smoking_table = {"and" => ["stuff", "nice things", "girlsandboys"],
"not" => ["bad girls", "greasy boys"]}

Thus, a regexp that splits a string on code words like “and” and “not”
is what I need.

Please help me

kadabra · November 23, 2007, 11:34am

Try this:

str = 'and stuff and nice things not bad girls not greasy boys
and girlsandboys'

smoking_table = {'and'=>[], 'not'=>[]}

# The capturing group keeps each 'and ' / 'not ' delimiter in the result.
pieces = str.split(/(and |not )/)
len = pieces.length

index = 0
while index < len

  case pieces[index]
  when 'and '
    # The piece right after a delimiter is the text it introduces.
    smoking_table['and'] << pieces[index+1].strip
    index += 2
  when 'not '
    smoking_table['not'] << pieces[index+1].strip
    index += 2
  else
    index += 1
  end

end

p smoking_table

kadabra · November 23, 2007, 11:47am

One possible implementation is:

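smoking_table = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

=> {:and => ["stuff", "nice things", "girlsandboys"],
:not => ["bad girls", "greasy boys"]}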

I hope that this works for you,

Raul

kadabra · November 23, 2007, 11:50am

Normally when you split() a string, you do something like this:

str = 'aXbXc'
pieces = str.split('X')
p pieces
# => ["a", "b", "c"]

Notice that the pattern you use to split the string is not part of the
results: it is chopped out of the string, and the pieces are what is left
over. However, there is a little-known feature: if your split pattern
contains a group, formed by putting parentheses around part of the
pattern, then the text matched by the group is also returned in the
results. I put parentheses around the whole split pattern to get a
result array like this:

["", "and ", "stuff ", "and ", "nice things ", "not ", "bad girls ",
"not ", "greasy boys\n", "and ", "girlsandboys"]

By including the split pattern in the results, you can see that each
piece of the string is preceded by either 'and ' or 'not '. The 'and '
or 'not ' then serves as an identifier for the piece that follows it.
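A minimal sketch of the split behaviour described above, using a shorter sample string. The variable names (plain, grouped, pairs) are only illustrative and are not part of the posted solution:

```ruby
str = 'and stuff not bad girls and girlsandboys'

# Without a group, the delimiters are dropped from the result.
plain = str.split(/and |not /)
# => ["", "stuff ", "bad girls ", "girlsandboys"]

# With a capturing group, each matched delimiter is kept as its own element.
grouped = str.split(/(and |not )/)
# => ["", "and ", "stuff ", "not ", "bad girls ", "and ", "girlsandboys"]

# Skipping the leading "" leaves [delimiter, text] pairs.
pairs = grouped.drop(1).each_slice(2).to_a
# => [["and ", "stuff "], ["not ", "bad girls "], ["and ", "girlsandboys"]]

p plain
p grouped
p pairs
```

The leading empty string appears because the input starts with a delimiter; split only discards empty strings at the end of the result.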

kadabra · November 23, 2007, 2:10pm

Raul P. wrote:

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Thanks.
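For anyone else reading the scan pattern quoted above, here is the same regexp spread over several lines with comments, which the /x flag permits. This is only an annotated sketch: the input is assumed to be on a single line, and the names pattern, keyword and text are illustrative.

```ruby
str = "and stuff and nice things not bad girls not greasy boys and girlsandboys"
smoking_table = { :and => [], :not => [] }

pattern = /
  (and|not)               # group 1: the keyword that opens a chunk
  (.*?)                   # group 2: as little text as possible after it
  (?= \band | \bnot | $ ) # lookahead: stop just before the next keyword or at end of line
/x

# scan yields the two groups of every match to the block.
str.scan(pattern) do |keyword, text|
  smoking_table[keyword.to_sym].push(text.strip)
end

p smoking_table
```

The lookahead consumes nothing, so the next match can start exactly at the keyword that ended the previous chunk.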

kadabra · November 23, 2007, 4:29pm

Raul,
Interesting solution. One question, how did you print the output? I’m
a newbie and the output I got when I tried your solution came out like:

andstuffnice thingsgirlsboysnotbad girlsgreasy boys

I used puts smoking_table. I’m assuming that’s not the correct way to
do it.
Thanks,
PV

kadabra · November 23, 2007, 7:09pm

Gabra Kadabra wrote:

Raul P. wrote:

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Don’t be fooled by one-liners. Ruby syntax lets you chain multiple
method calls together compactly, yet the result can be
very inefficient. Whenever I see a one-liner with multiple method calls
strung together and regexes sprinkled in for good measure, I immediately
assume there is a more efficient solution. The solution I posted is a
case in point: even though it has five times the number of lines, it is
70% faster on my system than the one-liner you find so alluring.

In addition, I find one-liners hard to decipher, and since I don’t
aspire to write hard-to-read code that is also inefficient, I rarely try
to cram a whole program into a single line.
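Timings like the one above depend on the machine and the Ruby version, so they are worth measuring locally. The following is a rough sketch using the standard Benchmark module; the method names with_split and with_scan, the iteration count N, and the use of string keys in both versions are assumptions made for the comparison, not part of either posted solution.

```ruby
require 'benchmark'

STR = "and stuff and nice things not bad girls not greasy boys and girlsandboys"
N = 20_000 # arbitrary iteration count

# Loop over the pieces returned by split with a capturing group.
def with_split(str)
  table = { 'and' => [], 'not' => [] }
  pieces = str.split(/(and |not )/)
  index = 0
  while index < pieces.length
    case pieces[index]
    when 'and ', 'not '
      table[pieces[index].strip] << pieces[index + 1].strip
      index += 2
    else
      index += 1
    end
  end
  table
end

# Single scan with a lazy match and a lookahead.
def with_scan(str)
  table = { 'and' => [], 'not' => [] }
  str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
    table[k] << v.strip
  end
  table
end

Benchmark.bm(12) do |bm|
  bm.report('split loop:') { N.times { with_split(STR) } }
  bm.report('scan:')       { N.times { with_scan(STR) } }
end
```

Both methods return the same hash for this input, so the comparison only measures how each one gets there.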

Peter V. wrote:

I used puts smoking_table. I’m assuming that’s not the correct
way to do it.

Use p instead of puts to print the hash in its literal form.

kadabra · November 23, 2007, 7:23pm

Gabra Kadabra wrote:

Raul P. wrote:

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Thanks.

I totally agree with you; this is not a subject that you learn by ‘just
trying’ or even by reading the forum. Start calmly from the basics with
a good book, and soon those funny hieroglyphics will become your
friends.

By the way, the code above did not deal with ‘notorious bad girls’ (I
mean words beginning with ‘not’); I had only checked for a word boundary
before the keyword, not after it. So, here it is (the ‘\b’ before and
after a word makes sure that it is indeed a whole word):

str = "and stuff and nice things not notorious bad girls not greasy boys
and girlsandboys"

h = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
  h[k.to_sym].push(v.strip) }
end

p h # => {:and=>["stuff", "nice things", "girlsandboys"],
    #     :not=>["notorious bad girls", "greasy boys"]}
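To make the point about ‘\b’ concrete, here is a small comparison sketch. The names loose and strict are illustrative, and the strict pattern is a slight variation on the code above: it also puts ‘\b’ around the leading keyword and uses a non-capturing group in the lookahead, so that scan returns only two values per match.

```ruby
str = "and stuff not notorious bad girls and girlsandboys"

# Boundary only before the keyword: "\bnot" also matches the start of "notorious".
loose = str.scan(/(and|not)(.*?)(?=\band|\bnot|$)/).map { |k, v| [k, v.strip] }
# => [["and", "stuff"], ["not", ""], ["not", "orious bad girls"], ["and", "girlsandboys"]]

# Boundaries on both sides: only the standalone words count as keywords.
strict = str.scan(/\b(and|not)\b(.*?)(?=\b(?:and|not)\b|$)/).map { |k, v| [k, v.strip] }
# => [["and", "stuff"], ["not", "notorious bad girls"], ["and", "girlsandboys"]]

p loose
p strict
```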

Peter V. wrote

Interesting solution. One question, how did you print the output? I’m
a newbie and the output I got when I tried your solution came out …

In Ruby 1.8, puts and print call to_s on a hash, which concatenates its
keys and values; use ‘p h’ (or ‘puts h.inspect’) to see its structure.
If you are in irb, just typing the name of the hash will show it to you.
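A short sketch of those options side by side; the hash contents and the variable name shown are just for illustration:

```ruby
h = { :and => ["stuff"], :not => ["bad girls"] }

p h               # prints h.inspect followed by a newline
puts h.inspect    # prints the same text as the line above
shown = h.inspect # the string itself, e.g. for writing to a log
```

The exact text that inspect produces varies a little between Ruby versions, so it is best treated as a debugging aid rather than a stable format.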

Regards
Raul

kadabra · November 23, 2007, 7:36pm

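When I typed the final solution, an unwanted ‘}’ got in. Here is the
code again:

str = "and stuff and nice things not notorious bad girls not greasy boys
and girlsandboys"

h = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
  h[k.to_sym].push(v.strip)
end

p h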

Regards
Raul

kadabra · November 24, 2007, 10:31pm

On Nov 23, 10:29 am, Peter V. wrote:

p smoking_table

(Same as 7stud’s example.)

HTH,
Richard