#split vs. #length. Different returns

tomx · February 8, 2013, 7:48am

I am wondering why these two lines of code at the bottom, which seem to
say the same thing, produce different results.

text is simply a long string.

words = text.scan(/\w+/)

stop_words = %w{the a by on for of are with just but and to the my I has
some in}
key_words = text.split{/\w…/}.select{|word| !stop_words.include?(word)}

This line of code results in a higher percentage of key words to stop
words: 76.58%
key_words_to_stop_words = ((key_words.length.to_f /
text.split{/\w…/}.count.to_f) * 100)

This line has been rendered as a comment, but produces 75.13% when run
through ruby

key_words_to_stop_words = ((key_words.length.to_f / words.length.to_f) * 100)

puts "#{key_words_to_stop_words} % of key words."

tomx · February 8, 2013, 9:57am

On Fri, Feb 8, 2013 at 7:48 AM, Tom S. wrote:

This line of code results in a higher percentage of key words to stop
words: 76.58%
key_words_to_stop_words = ((key_words.length.to_f /
text.split{/\w…/}.count.to_f) * 100)

This line has been rendered as a comment, but produces 75.13% when run
through ruby

key_words_to_stop_words = ((key_words.length.to_f / words.length.to_f) * 100)

puts "#{key_words_to_stop_words} % of key words."

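String#split doesn’t take a block to specify where to split, so the
block is ignored and the regex inside it is never used. That means
text.split {/\w…/} is the same as text.split, which splits the text
on whitespace and leaves punctuation attached to the words.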

1.9.2p290 :008 > text = "one, two.three four five"
=> "one, two.three four five"
1.9.2p290 :009 > text.scan(/\w+/)
=> ["one", "two", "three", "four", "five"]
1.9.2p290 :010 > text.split
=> ["one,", "two.three", "four", "five"]
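To see how that changes the percentage, here is a minimal sketch. The sample text and the shortened stop-word list are made up for illustration and are not the original poster's data. It calls split with no argument rather than with a block, because Ruby 2.6 and later give a block to String#split a different meaning than the 1.9 session above.

```ruby
# Hypothetical sample text and stop-word list, for illustration only.
text = "one, two.three four the five"
stop_words = %w[the a of and]

by_whitespace = text.split        # no argument: split on runs of whitespace
by_scan       = text.scan(/\w+/)  # every run of word characters

# Punctuation stays attached when splitting on whitespace, so the two
# arrays differ in length: 5 tokens versus 6 words.
key_words = by_whitespace.reject { |word| stop_words.include?(word) }

# Same numerator, different denominators, hence different percentages.
ratio_split = (key_words.length.to_f / by_whitespace.length * 100).round(2)
ratio_scan  = (key_words.length.to_f / by_scan.length * 100).round(2)

puts "#{ratio_split} % using split, #{ratio_scan} % using scan"
```

The smaller denominator from split is what pushes the first percentage higher. Using the same tokenizer for both key_words and the total keeps the ratio consistent.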

Jesus.

tomx · February 10, 2013, 3:37am

This:

words = text.scan(/\w+/)

"Now is the winter of our discontent".scan(/\w+/)
=> ["Now", "is", "the", "winter", "of", "our", "discontent"]

is not the same as this:

text.split(/\w…/)

"Now is the winter of our discontent".split(/\w…/)
=> ["", " ", "", " ", "", " ", "", " ", "", "", "t"]
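The underlying difference is that scan returns the matches themselves, while split returns whatever lies between the matches. A short sketch with /\w+/ and its complement /\W+/ (patterns chosen here for illustration, not taken from the posts above):

```ruby
line = "Now is the winter of our discontent"

words_by_scan  = line.scan(/\w+/)   # the matched words
gaps_by_split  = line.split(/\w+/)  # the text between the words
words_by_split = line.split(/\W+/)  # split on runs of NON-word characters

p words_by_scan   # the seven words
p gaps_by_split   # a leading empty string, then the single spaces
p words_by_split  # the same seven words as scan
```

To get words out of split, the pattern has to describe the separators, not the words. Even then, a string that begins with punctuation yields a leading empty string from split(/\W+/), so scan(/\w+/) is usually the simpler choice for counting words.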