Forum: Ruby #split vs. #length. Different returns.

Posted by Tom Stut (tomst)
on 2013-02-08 07:48
I am wondering why these two lines of code at the bottom, which seem to
say the same thing, produce different results.

text is simply a long string.

words = text.scan(/\w+/)

stop_words = %w{the a by on for of are with just but and to the my I has
some in}
key_words = text.split{/\w../}.select{|word| !stop_words.include?(word)}

# This line of code results in a higher percentage of key words to stop words: 76.58%
key_words_to_stop_words = ((key_words.length.to_f / text.split{/\w../}.count.to_f) * 100)
# This line has been commented out, but produces 75.13% when run through ruby
# key_words_to_stop_words = ((key_words.length.to_f / words.length.to_f) * 100)

puts "#{key_words_to_stop_words} % of key words."
Posted by "Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> (Guest)
on 2013-02-08 09:57
(Received via mailing list)
On Fri, Feb 8, 2013 at 7:48 AM, Tom Stut <lists@ruby-forum.com> wrote:
>
> # This line of code results in a higher percentage of key words to stop words: 76.58%
> key_words_to_stop_words = ((key_words.length.to_f / text.split{/\w../}.count.to_f) * 100)
> # This line has been commented out, but produces 75.13% when run through ruby
> # key_words_to_stop_words = ((key_words.length.to_f / words.length.to_f) * 100)
>
> puts "#{key_words_to_stop_words} % of key words."

String#split doesn't take a block to specify where to split, so the
block is simply ignored:

text.split {/\w../} is the same as text.split, which splits the text
on whitespace.

1.9.2p290 :008 > text = "one, two.three four five"
 => "one, two.three four five"
1.9.2p290 :009 > text.scan(/\w+/)
 => ["one", "two", "three", "four", "five"]
1.9.2p290 :010 > text.split
 => ["one,", "two.three", "four", "five"]
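To make the effect on the percentage concrete, here is a minimal sketch that computes both ratios for the sample string above. The stop-word list is the one from the original post (minus the duplicate "the"); the names tokens, pct_vs_tokens and pct_vs_words are made up for illustration. It calls split without a block, which is what the block form amounts to on 1.9. As far as I know, Ruby 2.6 and later yield each piece to a block and return the string itself, so the original line may not even produce an array there.

```ruby
text = "one, two.three four five"
stop_words = %w{the a by on for of are with just but and to my I has some in}

words  = text.scan(/\w+/)  # runs of word characters: 5 items
tokens = text.split        # whitespace-separated chunks, punctuation kept: 4 items

# Same filter as in the original post, written with reject.
key_words = tokens.reject { |word| stop_words.include?(word) }

# Same numerator, two different denominators.
pct_vs_tokens = key_words.length.to_f / tokens.length * 100
pct_vs_words  = key_words.length.to_f / words.length * 100

puts "tokens: #{tokens.length}, words: #{words.length}"
puts format("%.2f %% against split, %.2f %% against scan", pct_vs_tokens, pct_vs_words)
```

Because split keeps "two.three" as one chunk, its count is smaller than the scan count, so dividing by it gives the larger percentage. That is the same direction as the 76.58% vs. 75.13% in the question.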


Jesus.
Posted by tamouse mailing lists (Guest)
on 2013-02-10 03:37
(Received via mailing list)
This:
> words = text.scan(/\w+/)

"Now is the winter of our discontent".scan(/\w+/)
 => ["Now", "is", "the", "winter", "of", "our", "discontent"]


is not the same as this:
> text.split(/\w../)

"Now is the winter of our discontent".split(/\w../)
 => ["", " ", "", " ", "", " ", "", " ", "", "", "t"]
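
The odd result comes from split treating every match of the pattern as a separator and throwing it away. A small sketch, assuming nothing beyond the string above (the variable names are only illustrative), shows what /\w../ actually consumes, and that splitting on runs of non-word characters is one way to get the same list as scan for this string:

```ruby
line = "Now is the winter of our discontent"

consumed   = line.scan(/\w../)   # the three-character chunks split discards
leftover   = line.split(/\w../)  # only what sits between those chunks
by_nonword = line.split(/\W+/)   # split on runs of non-word characters instead

p consumed
p leftover
p by_nonword == line.scan(/\w+/)
```

Here consumed holds chunks such as "Now", "is " and "the", which is why almost nothing but spaces and a trailing "t" is left over.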