Thus, a regexp that splits a string on code words like “and” and “not”
is what I need.
Please help me.
Try this:
str = 'and stuff and nice things not bad girls not greasy boys and girlsandboys'
smoking_table = {'and' => [], 'not' => []}

# The parentheses make split() keep the matched 'and ' / 'not ' in the result.
pieces = str.split(/(and |not )/)
len = pieces.length
index = 0

while index < len
  case pieces[index]
  when 'and '
    # The piece right after the keyword belongs to that keyword.
    smoking_table['and'] << pieces[index+1].strip
    index += 2
  when 'not '
    smoking_table['not'] << pieces[index+1].strip
    index += 2
  else
    index += 1
  end
end

p smoking_table
Normally when you split() a string, you do something like this:
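For instance, a minimal sketch with a made-up sentence (the string and the variable names are only for illustration):

```ruby
sentence = 'apples and oranges and pears'

# Split on the word ' and '; the separator itself is dropped.
parts = sentence.split(/ and /)
p parts   # => ["apples", "oranges", "pears"]
```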
Notice that the pattern you use to split the string is not part of the
results; it’s chopped out of the string and the pieces are what’s left
over. However, there is a little-known feature: if your split
pattern has a group in it, which is formed by putting parentheses around
part of the pattern, then the group will be returned in the results. I
used parentheses around the whole split pattern to get a result array
like this:
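As a sketch, using the same string and pattern as in the code above (the leading empty string is the text in front of the first match):

```ruby
str = 'and stuff and nice things not bad girls not greasy boys and girlsandboys'

# The capture group makes split() return the separators as well.
pieces = str.split(/(and |not )/)
p pieces
# => ["", "and ", "stuff ", "and ", "nice things ", "not ", "bad girls ",
#     "not ", "greasy boys ", "and ", "girlsandboys"]
```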
By including the split pattern in the results, you can see that each
piece of the string is preceded by either 'and ' or 'not '. The 'and '
or 'not ' then serves as an identifier for each piece of the string.
str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
smoking_table[k.to_sym].push(v.strip)
end
I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.
Don’t be fooled by one-liners. Ruby syntax allows you to string
multiple method calls together in a compact way, yet the result can be
very inefficient. Whenever I see a one-liner with multiple method calls
strung together and regexes sprinkled in for good measure, I immediately
assume there is a more efficient solution. The solution I posted is a
case in point: even though it has five times the number of lines, it is
70% faster on my system than the one-liner you find so alluring.
In addition, I find one-liners hard to decipher, and since I don’t
aspire to write hard-to-read code that is also inefficient, I rarely try
to cram a whole program into a single line.
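A rough way to check a claim like this on your own machine is Ruby's standard Benchmark module. The sketch below is only an assumption about how the two approaches might be compared: the method names with_split and with_scan and the iteration count are made up for illustration, and the timings will differ between systems and Ruby versions.

```ruby
require 'benchmark'

STR = 'and stuff and nice things not bad girls not greasy boys and girlsandboys'

# The split-and-loop approach (hypothetical wrapper name).
def with_split(str)
  table = { 'and' => [], 'not' => [] }
  pieces = str.split(/(and |not )/)
  index = 0
  while index < pieces.length
    case pieces[index]
    when 'and ', 'not '
      table[pieces[index].strip] << pieces[index + 1].strip
      index += 2
    else
      index += 1
    end
  end
  table
end

# The scan one-liner, wrapped in a method (hypothetical wrapper name).
def with_scan(str)
  table = { 'and' => [], 'not' => [] }
  str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) { |k, v| table[k] << v.strip }
  table
end

n = 20_000  # arbitrary iteration count
Benchmark.bm(12) do |bm|
  bm.report('split loop:') { n.times { with_split(STR) } }
  bm.report('scan:')       { n.times { with_scan(STR) } }
end
```

Both methods should build the same table for this string, so any difference in the report is down to speed alone.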
Peter V. wrote:
I used puts smoking_table. I’m assuming that’s not the correct
way to do it.
Use the p method instead of puts to get the nicely formatted hash output.
str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
smoking_table[k.to_sym].push(v.strip)
end
I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.
Thanks.
I totally agree with you; this is not a subject that you learn ‘just
by trying’ or even by reading the forum. Start calmly from the basics
with a good book, and soon those funny hieroglyphics will become your
friends.
By the way, the code above did not deal with ‘notorious bad girls’ (I
mean words beginning with ‘not’); I had only checked the boundary before
the keyword, not the one after it. So, here it is (the ‘\b’ before and
after a word makes sure that it is indeed a ‘word’):
str = "and stuff and nice things not notorious bad girls not greasy boys and girslsandboys"
h = { :and => [], :not => [] }
# k is the keyword, v is the text up to the next standalone keyword (or the end).
str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
  h[k.to_sym].push(v.strip)
end
p h # => {:and=>["stuff", "nice things", "girslsandboys"],
    #     :not=>["notorious bad girls", "greasy boys"]}
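To see what the second ‘\b’ changes, here is a small check on made-up strings (the names strict and loose are only for illustration):

```ruby
strict = /\bnot\b/   # word boundary on both sides of 'not'
loose  = /\bnot/     # word boundary only in front of 'not'

['not bad', 'notorious', 'knot'].each do |s|
  puts "#{s.inspect}: strict=#{strict.match?(s)} loose=#{loose.match?(s)}"
end
```

Only the strict pattern rejects ‘notorious’, which is why the lookahead needs a boundary after the keyword too.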
Peter V. wrote
Interesting solution. One question, how did you print the output? I’m
a newbie and the output I got when I tried your solution came out …
By default, puts and print call to_s on a hash, which in Ruby 1.8
concatenates the keys and values (from Ruby 1.9 on, to_s gives the same
text as inspect); you can use ‘p h’ (or ‘puts h.inspect’) to see the
hash. If you are in irb, just typing the name of the hash will show it
to you.
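A short sketch of the difference, with a small made-up hash (the exact text printed depends on your Ruby version):

```ruby
h = { :and => ["stuff", "nice things"], :not => ["bad girls"] }

puts h           # goes through Hash#to_s; on Ruby 1.8 keys and values run together
p h              # prints h.inspect, the readable hash form
puts h.inspect   # same text as p h

shown = h.inspect
```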