I'm learning Ruby and I'm reading an expression that I saw on the
forum. I'm coming from JavaScript, and this is really hard for me.
Please help explain it to me in plain English. I understand that it's a
function that takes a string and counts words, returning a Hash.
def count_words(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/).map{|word| res[word] =
    string.downcase.scan(/\b#{word}\b/).size}
  return res
end
on 2013-01-28 18:39
on 2013-01-28 18:46
For those more versed than myself, I have a follow-on question (thanks
for posting this, Jooma).
In this example can't you get rid of return res?
Wayne
on 2013-01-28 18:55
On Mon, Jan 28, 2013 at 6:39 PM, jooma lavata <lists@ruby-forum.com>
wrote:
> end
That's not a very idiomatic way, because the result of the map
function, which returns an array, is ignored. This signals that map is
not the correct method to use. Now, with that said:
string.downcase #=> returns a new string with all the characters
downcased
.scan(/\w+/) #=> returns an array of strings, one for each match of the
regular expression. \w+ means: one or more word characters, so this
should return an array of words.
.map #=> returns a new array where each position is filled with the
result of invoking the block with each element of the array. Example:
[1,2,3].map {|x| "x is #{x}"} #=> ["x is 1", "x is 2", "x is 3"]
res[word] = string.downcase.scan(/\b#{word}\b/).size
What this means is, take the string, downcase it again, scan it for
the current word surrounded by word boundaries (so, whole word), take
the size of that array and place it in the hash under the key for this
word.
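To make the steps above concrete, here is a minimal sketch that names each intermediate value. The sample sentence and the variable names (lowered, words, the_count) are made up for illustration and are not from the original post.

```ruby
string = "The cat and the hat"   # hypothetical sample input

lowered = string.downcase        # "the cat and the hat"
words   = lowered.scan(/\w+/)    # ["the", "cat", "and", "the", "hat"]

# The inner scan counts whole-word occurrences of one particular word.
the_count = lowered.scan(/\bthe\b/).size   # 2

# Putting it together, as the original method does. The array that map
# returns is thrown away; only the side effect on res matters.
res = Hash.new(0)
words.map { |word| res[word] = lowered.scan(/\b#{word}\b/).size }
# res now maps "the" to 2 and "cat", "and", "hat" to 1 each
```

Note that "the" is looked up twice here, once for each time it appears in words, which is the repeated work discussed next.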
This is extremely inefficient, since, first of all, for each word it's
downcasing the string again, and then scanning for each word through
the full string again (which you are already doing). So this seems to
be O(N^2), where a single pass through the string should suffice.
Also, the block-less form of scan and using map like that is creating
many intermediate objects that are not used.
I'd do something like:
res = Hash.new(0)
string.downcase.scan(/\w+/) {|word| res[word] += 1}
return res
This uses the block form of scan, which instead of building an array,
just yields each match to the block. Since we are not doing anything
with that array, this is more efficient. We take advantage of the
default value of hash, which is set to 0, to just increment the count
for each word.
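As a rough illustration of why the default value matters, the sketch below contrasts Hash.new(0) with a plain {} and then wraps the single-pass version in a method. The sample sentence is an assumption chosen for the example.

```ruby
counts = Hash.new(0)   # a missing key reads as 0
plain  = {}            # a missing key reads as nil

counts["ruby"] += 1    # fine: 0 + 1
# plain["ruby"] += 1   # would raise NoMethodError, because nil has no +

def count_words(string)
  res = Hash.new(0)
  # Block form of scan: each match is yielded, no array of words is built.
  string.downcase.scan(/\w+/) { |word| res[word] += 1 }
  res
end

result = count_words("Ruby is fun and ruby is fast")
# result maps "ruby" and "is" to 2, and "fun", "and", "fast" to 1
```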
Hope this helps,
Jesus.
on 2013-01-28 18:57
The regex is critical to this one. \w matches a word character (\b is
the word boundary), so \w+ matches a whole word. scan returns an array
of everything that matches that regex. downcase isn't necessary if all
you want is the total number of words; that count would be the same
either way, and you don't even need the hash. If you're trying to count
instances of each word, that's a different story. Suggested reading:
Enumerable, blocks, scan, inject, and reduce. Enumerable covers most of
those. Read the Ruby docs. Seeing as I'm on my phone at the moment,
could someone else rewrite that code a bit? It'd look all types of funky
if I did it right now. Cheers.
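A small sketch of the regex pieces involved, assuming a made-up sample string. It shows the difference between \w and \b, and that a total word count needs neither downcase nor a hash, while a per-word count does depend on case.

```ruby
text = "Hello, hello world"   # hypothetical sample input

char_count  = text.scan(/\w/).size     # 15: \w matches one word character
words       = text.scan(/\w+/)         # ["Hello", "hello", "world"]
total_words = words.size               # 3: total word count, no hash needed

# \b is the zero-width word boundary; matching is case-sensitive.
exact     = text.scan(/\bhello\b/)                # ["hello"]
any_case  = text.downcase.scan(/\bhello\b/).size  # 2 once case is normalized
```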
on 2013-01-28 18:58
Never mind... Now I see what's going on. (I just had to run it in irb
and look at the results with and without the return res.)
on 2013-01-28 18:59
On Mon, Jan 28, 2013 at 6:45 PM, Wayne Brisette <wbrisett@att.net>
wrote:
> For those more versed than myself, I have a follow on question (thank for
> posting this Jooma).
>
> In this example can't you get rid of return res?
You could, using inject, but some people might say this is less
readable, and it also creates some intermediate object that is not
really needed:
string.downcase.scan(/\w+/).inject(Hash.new(0)) {|h, word| h[word] += 1; h}
Jesus.
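For comparison, here is the inject one-liner wrapped in a method next to a variant using Enumerable#each_with_object. That second method is not mentioned in the thread; it is shown only as a possible alternative that avoids the trailing "; h", since it passes the same accumulator object to every iteration. The method names are made up for the example.

```ruby
def count_words_inject(string)
  # inject uses the block's return value as the next accumulator,
  # hence the explicit "; h" at the end of the block.
  string.downcase.scan(/\w+/).inject(Hash.new(0)) { |h, word| h[word] += 1; h }
end

def count_words_each_with_object(string)
  # each_with_object ignores the block's return value and returns
  # the accumulator itself. Note the reversed block arguments.
  string.downcase.scan(/\w+/).each_with_object(Hash.new(0)) { |word, h| h[word] += 1 }
end

a = count_words_inject("one two Two three three THREE")
b = count_words_each_with_object("one two Two three three THREE")
# both map "one" to 1, "two" to 2 and "three" to 3
```

Both variants still build the intermediate array of words, so neither is meant as a performance improvement over the block form of scan.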
on 2013-01-28 19:00
I'll try to break it down, let us know if there's anything further that
needs clarifying.
# Declare a method with one argument
def count_words(string)
  # Create an empty Hash (aka Dictionary) whose missing keys default to 0,
  # to be filled in later
  res = Hash.new(0)
  # Convert the whole string to lowercase (returns a new object, doesn't
  # modify in place)
  string.downcase
    # Use a regex to return every word as an array ("+" means one or more
    # word characters, i.e. up to the next non-word character)
    .scan(/\w+/)
    # Iterate through each of the words and return (map) a new array
    # (which isn't used in this case)
    .map{|word|
      # Populate the hash on each iteration (overwriting existing values)
      res[word] =
        # Get the "size" of the array returned by searching the string for
        # all instances of the current word
        string.downcase.scan(/\b#{word}\b/).size}
  # Explicitly return the hash ("return" isn't strictly required as this
  # is the last line)
  return res
end
I can't help feeling that there is a more efficient way to do this,
given that the loop needlessly repeats the scan for every duplicate
word.
This does the same thing (not sure whether it's faster):
def count_words(string)
  res = {}
  string.downcase!
  string.scan( /\w+/ ).uniq.each{ |word| res[word] =
    string.scan(/\b#{word}\b/).size }
  res
end
on 2013-01-28 19:16
"Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> wrote in post
#1094106:
> string.downcase.scan(/\w+/) {|word| res[word] += 1}
I tried benchmarking out of curiosity and that is a lot faster! Nicely
done.
on 2013-01-28 21:15
On Mon, Jan 28, 2013 at 7:16 PM, Joel Pearson <lists@ruby-forum.com>
wrote:
> "Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> wrote in post
> #1094106:
>> string.downcase.scan(/\w+/) {|word| res[word] += 1}
>
> I tried benchmarking out of curiosity and that is a lot faster! Nicely
> done.
I guess the reason is that you avoid the intermediate arrays.
Jesus.
on 2013-01-29 10:25
On Jan 28, 2013, at 10:01 , Joel Pearson <lists@ruby-forum.com> wrote:
> def count_words(string)
>   res = {}
>   string.downcase!
>   string.scan( /\w+/ ).uniq.each{ |word| res[word] =
>     string.scan(/\b#{word}\b/).size }
>   res
> end
This modifies the argument coming in. Don't ever call downcase! or other
mutating methods on an argument or you'll wind up in debugging hell.
Make a copy instead:
string = string.downcase
on 2013-01-29 10:25
On Jan 28, 2013, at 12:13 , Jesús Gabriel y Galán
<jgabrielygalan@gmail.com> wrote:
> On Mon, Jan 28, 2013 at 7:16 PM, Joel Pearson <lists@ruby-forum.com> wrote:
>> "Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> wrote in post
>> #1094106:
>>> string.downcase.scan(/\w+/) {|word| res[word] += 1}
>>
>> I tried benchmarking out of curiosity and that is a lot faster! Nicely
>> done.
>
> I guess the reason is that you avoid the intermediate arrays.
I suspect only scanning once is much more important than the extra
arrays.
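One way to check this for yourself is a small benchmark along the following lines. This is only a sketch: the method names, the generated sample text (300 distinct words repeated 10 times) and the label width are assumptions, and the timings will vary by machine and Ruby version, so no numbers are shown.

```ruby
require "benchmark"

# Rescans the whole string once per distinct word.
def count_rescan(string)
  res = Hash.new(0)
  lowered = string.downcase
  lowered.scan(/\w+/).uniq.each do |word|
    res[word] = lowered.scan(/\b#{word}\b/).size
  end
  res
end

# Walks the string exactly once.
def count_single_pass(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/) { |word| res[word] += 1 }
  res
end

# Hypothetical sample: word1 .. word300, each appearing 10 times.
words  = (1..300).map { |i| "word#{i}" }
sample = (words * 10).join(" ")

Benchmark.bm(12) do |x|
  x.report("rescan:")      { count_rescan(sample) }
  x.report("single pass:") { count_single_pass(sample) }
end
```

The more distinct words the input has, the more full scans the first version performs, which is why the gap grows with the size of the vocabulary rather than with the number of intermediate arrays.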
on 2013-01-29 10:39
Ryan Davis wrote in post #1094171:
> On Jan 28, 2013, at 10:01 , Joel Pearson <lists@ruby-forum.com> wrote:
>
>> def count_words(string)
>>   res = {}
>>   string.downcase!
>>   string.scan( /\w+/ ).uniq.each{ |word| res[word] =
>>     string.scan(/\b#{word}\b/).size }
>>   res
>> end
>
> This modifies the argument coming in. Don't ever call downcase! or other
> mutating methods on an argument or you'll wind up in debugging hell.
> Make a copy instead:
>
> string = string.downcase
Thanks, I thought that those two things were equivalent. Doesn't
string = string.downcase overwrite the argument string anyway?
on 2013-01-29 11:45
On Mon, Jan 28, 2013 at 6:55 PM, Jesús Gabriel y Galán
<jgabrielygalan@gmail.com> wrote:
> res = Hash.new(0)
> string.downcase.scan(/\w+/) {|word| res[word] += 1}
> return res
And to answer Wayne's question how to get rid of the "return":
Hash.new(0).tap do |res|
  string.downcase.scan(/\w+/) {|word| res[word] += 1}
end
Kind regards
robert
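For readers new to tap: Object#tap yields the receiver to the block and then returns the receiver, whatever the block itself returns. A minimal sketch of the snippet above placed inside a method (the method name and sample input are assumptions):

```ruby
def count_words(string)
  # tap returns the hash it was called on, so the hash becomes the
  # method's return value without an explicit "return res".
  Hash.new(0).tap do |res|
    string.downcase.scan(/\w+/) { |word| res[word] += 1 }
  end
end

counts = count_words("To be or not to be")
# counts maps "to" and "be" to 2, and "or" and "not" to 1

# The block's own return value is ignored by tap:
list = [1, 2].tap { |a| a << 3; nil }   # [1, 2, 3]
```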
on 2013-01-29 12:08
On Tue, Jan 29, 2013 at 10:25 AM, Ryan Davis <ryand-ruby@zenspider.com>
wrote:
>>
>> I guess the reason is that you avoid the intermediate arrays.
>
> I suspect only scanning once is much more important than the extra arrays.
Sure, you are right. I didn't really read Joel's proposal, and assumed
he had removed the double scan.
Jesus.
on 2013-01-29 13:19
Ryan Davis wrote in post #1094171:
> This modifies the argument coming in. Don't ever call downcase! or other
> mutating methods on an argument or you'll wind up in debugging hell.
> Make a copy instead:
>
> string = string.downcase
Ah, I didn't know that a bang method would also change the argument
outside of the current scope as well! Dangerous.
irb(main):001:0> a = 'a'
=> "a"
irb(main):002:0> def t1(b)
irb(main):003:1> b.upcase
irb(main):004:1> end
=> nil
irb(main):005:0> def t2(b)
irb(main):006:1> b.upcase!
irb(main):007:1> end
=> nil
irb(main):008:0> t1 a
=> "A"
irb(main):009:0> a
=> "a"
irb(main):010:0> t2 a
=> "A"
irb(main):011:0> a
=> "A"
on 2013-01-29 18:00
On Tue, Jan 29, 2013 at 1:19 PM, Joel Pearson <lists@ruby-forum.com>
wrote:
> Ryan Davis wrote in post #1094171:
>> This modifies the argument coming in. Don't ever call downcase! or other
>> mutating methods on an argument or you'll wind up in debugging hell.
>> Make a copy instead:
>>
>> string = string.downcase
>
> Ah, I didn't know that a bang method would also change the argument
> outside of the current scope as well! Dangerous.
That's why there is the exclamation mark in the first place. It means
"potentially dangerous method" (defined by Matz). Btw, this does not
have that much to do with scope but it's rather which object gets
changed. All places in code which reference that particular instance
will notice the change once they use the object.
> irb(main):008:0> t1 a
> => "A"
> irb(main):009:0> a
> => "a"
> irb(main):010:0> t2 a
> => "A"
> irb(main):011:0> a
> => "A"
Yeah, String methods with exclamation mark typically change the
instance itself whereas the "less dangerous" brothers typically return
a modified instance.
Kind regards
robert
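The difference between reassigning the parameter and mutating the object can be sketched as follows. The method names (rebinding, mutating) are made up for illustration, and String.new is used only to make sure the sample string is not frozen.

```ruby
def rebinding(str)
  str = str.downcase   # the local name str now points at a NEW string
  str
end

def mutating(str)
  str.downcase!        # changes the caller's object in place
  str                  # (downcase! itself returns nil when nothing changes)
end

original = String.new("Hello")

rebinding(original)
after_rebind = original.dup             # still "Hello": caller's object untouched

result = mutating(original)
same_object = result.equal?(original)   # true: one and the same object
# original is now "hello"
```

So string = string.downcase only overwrites the method's local variable; the caller's string is left alone, which is why it is the safer of the two.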
on 2013-01-29 18:00
As usual Robert, you've shown me a very elegant way to handle this!
Thanks!
Wayne
on 2013-01-29 18:37
I have a project in NetBeans 6.8. I created a global module like so:
module SharedVariables
  @prueba = 1
  def variable
    @prueba ||= 1
  end
  def variable=(var)
    @prueba = var
  end
end
This module is in the file global_var.rb, and I want to call it from
another Ruby file.
What should I do?
Thanks
on 2013-01-30 08:16
On Tue, Jan 29, 2013 at 6:38 PM, sasan sasgho <lists@ruby-forum.com>
wrote:
> what to do ?
First of all, please do not hijack other threads. Then, please
explain what your goal is, i.e. what you want to achieve.
Kind regards
robert
on 2013-01-31 08:30
On Tue, Jan 29, 2013 at 3:16 PM, Wayne Brisette <wbrisett@att.net>
wrote:
> As usual Robert, you've shown me a very elegant way to handle this! Thanks!
You're welcome! But I think the elegance is rather due to language
and library design than me. Thank Matz!
Kind regards
robert