Say we want to count the number of words in a document. I know we
can do the following:
totalWords = 0
text.each_line { |line| totalWords += line.split.size }
Now say I want to add some exceptions, so that the following are not
counted as words:
(1) numbers
(2) standalone letters
(3) email addresses
How can we do that?
Thanks.
Instead of
line.split.size
use
line.split.select { |word| .... }.size
and have the select block return true for each word that should be
counted.
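A minimal sketch of that idea, assuming a hypothetical helper named countable? (not from the thread) that decides whether a token counts as a word. The checks for numbers and email addresses are deliberately crude and are my own assumptions, not a complete definition:

```ruby
# Hypothetical predicate: true if the token should be counted as a word.
def countable?(word)
  return false if word.match?(/\A\d+\z/)  # (1) plain whole numbers
  return false if word.length == 1        # (2) standalone letters (any single character)
  return false if word.include?("@")      # (3) crude email check, an assumption
  true
end

# Count the tokens on every line that pass the predicate.
def count_words(text)
  total = 0
  text.each_line { |line| total += line.split.select { |word| countable?(word) }.size }
  total
end

# Made-up sample text with one number, two single letters and one address.
sample = "call me at 5 or mail bob@example.com\na b words here"
puts count_words(sample)
```

Keeping the rules in one small method means each exception can be changed or tested on its own without touching the counting loop.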
SW Engineer wrote in post #1175922:
Say that we want to count the number of words in a document. I know we
can do the following:
text.each_line(){ |line| totalWords = totalWords + line.split.size }
text.scan(/\b\w{2,}\b/).size
Regis d’Aubarede wrote in post #1175993:
text.scan(/\b\w{2,}\b/).size
This doesn’t exclude numbers and email addresses…
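To see why, here is a small check on a made-up sample string (the address is a placeholder, not from the thread). Since \w also matches digits, numbers of two or more digits are still returned, and an address is broken at the @ and the dots into several separate matches:

```ruby
# Made-up sample: one email address and one number among ordinary words.
text = "mail bob@example.com 42 times"

# \w{2,} drops single characters, but digits and the pieces of the
# address still come back as matches.
tokens = text.scan(/\b\w{2,}\b/)
p tokens
puts tokens.size
```

So the scan approach handles standalone letters, but an address ends up inflating the count instead of being skipped.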
Following Ronald’s route (using reject instead of select), you can use
something like this:
text = "some words 16 ss i [email protected] \n 14 51 51 other words"
# Reject whole numbers, single characters, and email addresses.
regexp = /^(\d+|.|[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})$/i
count = text.each_line.inject(0) { |s, l| s + l.split.reject { |w| w =~ regexp }.size }
puts count
The regular expression for the emails is copied from
http://www.regular-expressions.info/email.html, where you can also read
about the limitations of this approach.
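Two of those limitations are easy to try out. The sketch below uses only the email part of the pattern, anchored with \A and \z, and a few made-up addresses (all of them assumptions for illustration). A top-level domain longer than four letters is not recognised, and neither is an address with punctuation stuck to it, so both would still be counted as words:

```ruby
# Email part of the pattern, anchored to the whole token.
EMAIL = /\A[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\z/i

# Made-up tokens, as they might come out of line.split.
samples = ["bob@example.com", "bob@example.museum", "bob@example.com,"]

samples.each do |token|
  puts "#{token.inspect} treated as email: #{token.match?(EMAIL)}"
end
```

Stripping trailing punctuation from each token before matching would probably help with the last case, at the cost of one more step per word.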