Counting words with some exceptions

Say that we want to count the number of words in a document. I know we
can do the following:

text.each_line(){ |line| totalWords = totalWords + line.split.size }

Now suppose I want to add some exceptions, so that the following are not
counted as words:

(1) numbers
(2) standalone letters
(3) email addresses

How can we do that?

Thanks.

Instead of

line.split.size

use

line.split.select { |word| .... }.size

and have the select block return true for each word that should be
counted.
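A minimal sketch of what that select block could look like. The helper names countable_word? and count_words are made up for illustration, and the email check is a deliberately rough assumption (anything shaped like x@y.z), not a full validator:

```ruby
# Hypothetical helper: true if the token should be counted as a word.
def countable_word?(token)
  return false if token.match?(/\A\d+\z/)           # (1) pure numbers
  return false if token.length == 1                 # (2) standalone letters (any single character)
  return false if token.match?(/\A\S+@\S+\.\S+\z/)  # (3) rough email shape, an assumption
  true
end

# Hypothetical helper: sums the countable tokens of every line.
def count_words(text)
  text.each_line.sum { |line| line.split.select { |word| countable_word?(word) }.size }
end

sample = "call me at 5 or mail bob@example.com a b ok"
puts count_words(sample)
```

Here only call, me, at, or, mail and ok pass the block, so the count for the sample line is 6.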

SW Engineer wrote in post #1175922:

Say that we want to count the number of words in a document. I know we
can do the following:

text.each_line(){ |line| totalWords = totalWords + line.split.size }

text.scan(/\b\w{2,}\b/).size

Regis d’Aubarede wrote in post #1175993:

text.scan(/\b\w{2,}\b/).size

This doesn’t exclude numbers and email addresses…
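To see why, here is a small check on a made-up sample string. The scan pattern splits the email address at the @ and the dot and keeps the number, so both end up in the count:

```ruby
text = "mail bob@example.com 42 times"

# \w{2,} only drops single characters; digits and the pieces of the
# email address still match.
tokens = text.scan(/\b\w{2,}\b/)
p tokens
# => ["mail", "bob", "example", "com", "42", "times"]
puts tokens.size
```

Only mail and times should be counted here, but the scan reports 6.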

Following Ronald’s route (using reject instead of select), you can use
something like this:

# user@example.com is a placeholder address
text = "some words 16 ss i user@example.com \n 14 51 51 other words"
# matches pure numbers, single characters, and email addresses
regexp = /^(\d+|.|[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})$/i
count = text.each_line.inject(0) { |s, l| s + l.split.reject { |w| w =~ regexp }.size }
puts count

The regular expression for the email addresses is copied from
http://www.regular-expressions.info/email.html, where you can also read
about the limitations of this approach.
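As a rough illustration of those limitations, the sketch below runs a few made-up tokens through a pattern of the same shape (digits, any single character, or the email expression from that page). The tokens and variable names are assumptions chosen for the example:

```ruby
exclude = /^(\d+|.|[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})$/i

tokens = ["3.14", "user@example.com,", "info@example.museum", "user@example.com"]

# Tokens that do NOT match the pattern would still be counted as words.
slipped = tokens.reject { |t| t =~ exclude }
p slipped
# => ["3.14", "user@example.com,", "info@example.museum"]
```

A decimal number, an address followed by punctuation, and an address whose top-level domain is longer than four letters all slip through, so stripping punctuation first or loosening the pattern may be needed depending on the text.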
