Counting words


#1

I’ve researched this but am still having trouble getting it right …
Can someone give me code that counts the number of words in a string via
RegExp and MatchData objects? I think I’d like a word to be defined as
contiguous characters surrounded by white space (or the start/end of the
string), though am open to other interpretations.

Jamal


#2

On Sat, Apr 29, 2006 at 02:43:30AM +0900, Jamal M. wrote:

I’ve researched this but am still having trouble getting it right …
Can someone give me code that counts the number of words in a string via
RegExp and MatchData objects? I think I’d like a word to be defined as
contiguous characters surrounded by white space (or the start/end of the
string), though am open to other interpretations.

Here is a naive implementation:

class String
  def words
    scan(/\b\S+\b/)
  end
end

'this is a sentence with some words'.words
=> ["this", "is", "a", "sentence", "with", "some", "words"]
'this is a sentence with some words'.words.size
=> 7
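
A note on why this is "naive": the pattern requires a word boundary at both ends of each match, so a token made up only of punctuation is never counted. The sketch below uses a made-up sample string (not from the original post) to compare it with a purely whitespace-based pattern.

```ruby
# Hypothetical sample containing a token that is only punctuation.
text = 'a - b'

boundary_words   = text.scan(/\b\S+\b/)  # needs a word character at each edge
whitespace_words = text.scan(/\S+/)      # any run of non-whitespace characters

p boundary_words    # => ["a", "b"]
p whitespace_words  # => ["a", "-", "b"]
```

Which count is "right" depends on whether a lone dash should be treated as a word.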

marcel


#3

On 4/28/06, Jamal M. removed_email_address@domain.invalid wrote:

I’ve researched this but am still having trouble getting it right …
Can someone give me code that counts the number of words in a string via
RegExp and MatchData objects? I think I’d like a word to be defined as
contiguous characters surrounded by white space (or the start/end of the
string), though am open to other interpretations.

Jamal

I’m a bit of a nuby, and this is my first post to the list, but I
think the following one-liner will do the job:

number_of_words = string.split(/\s/).length

I haven’t tested it because I’m at work without access to a Ruby
interpreter :(.


#4

On 4/28/06, Bira removed_email_address@domain.invalid wrote:

number_of_words = string.split(/\s/).length

Eh, sorry. I meant to write:

number_of_words = string.split(/\s+/).length

The “+” is needed to cover words with more than one whitespace
character between them.
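
To illustrate the difference, here is a small sketch with made-up sample strings. It also shows one caveat that seems worth knowing: with a regexp separator, leading whitespace still produces an empty first element, whereas `split` with no argument does not.

```ruby
# Hypothetical sample strings, not taken from the thread.
doubled = 'two  spaces'            # two spaces between the words
padded  = '  leading whitespace'   # starts with two spaces

p doubled.split(/\s/)   # => ["two", "", "spaces"]
p doubled.split(/\s+/)  # => ["two", "spaces"]
p padded.split(/\s+/)   # => ["", "leading", "whitespace"]
p padded.split          # => ["leading", "whitespace"]
```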


#5

2006/4/28, Jamal M. removed_email_address@domain.invalid:

I’ve researched this but am still having trouble getting it right …
Can someone give me code that counts the number of words in a string via
RegExp and MatchData objects? I think I’d like a word to be defined as
contiguous characters surrounded by white space (or the start/end of the
string), though am open to other interpretations.

s.scan(/\w+/).size
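
Note that `\w` only matches letters, digits, and underscores, so this counts differently from a whitespace-based definition whenever a word contains punctuation. A minimal sketch with a made-up sample string:

```ruby
text = "don't stop"   # hypothetical sample with an apostrophe

p text.scan(/\w+/)  # => ["don", "t", "stop"]
p text.scan(/\S+/)  # => ["don't", "stop"]
```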


#6

One way is like this:

irb(main):020:0> a = "This is a test."
=> "This is a test."
irb(main):021:0> a.scan(/\b\S.*?\b/).size
=> 4
irb(main):022:0>

The Regexp in line 21 rewritten in a more readable form is:

a.scan(/
  \b    # a word boundary
  \S    # a character that is not whitespace
  .*?   # maybe (*) some more characters (.), but don't be greedy (?)
  \b    # a word boundary
/x).size

By the way, the multi-line form above only works because of the x at the
end, which makes it an extended regexp: whitespace and comments inside the
pattern are ignored.
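
A tiny sketch of what the x flag changes, using throwaway patterns of my own rather than the one above: without the flag a space in the pattern is a literal character, with it the space is dropped. This assumes a Ruby recent enough to have `Regexp#match?`.

```ruby
plain    = /a b/    # the space is a literal character to match
extended = /a b/x   # the space is ignored, so this behaves like /ab/

p plain.match?('a b')     # => true
p plain.match?('ab')      # => false
p extended.match?('ab')   # => true
p extended.match?('a b')  # => false
```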

Regards,
JJ

On Friday, April 28, 2006, at 04:35PM, Jamal M.
removed_email_address@domain.invalid wrote:

I’ve researched this but am still having trouble getting it right …
Can someone give me code that counts the number of words in a string via
RegExp and MatchData objects? I think I’d like a word to be defined as
contiguous characters surrounded by white space (or the start/end of the
string), though am open to other interpretations.

Jamal


Help everyone. If you can’t do that, then at least be nice.


#7

“Marcel Molina Jr.” removed_email_address@domain.invalid writes:

class String
  def words
    scan(/\b\S+\b/)
  end
end

And quite a bit more efficient, memory-wise, since no array of matches is built:

class String
  def count_words
    n = 0
    scan(/\b\S+\b/) { n += 1 }
    n
  end
end

Making String#count take regexps would be nice (same for #delete).
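
Since the original question asked specifically about MatchData objects, here is a rough sketch of the same counting idea done with `Regexp#match` and an explicit position, which also avoids building an array. The method name is hypothetical, and it uses the whitespace-delimited definition of a word (`\S+`) rather than the word-boundary pattern.

```ruby
# Hypothetical helper; counts whitespace-delimited words via MatchData.
def count_words_with_matchdata(text)
  count = 0
  pos = 0
  # Regexp#match(str, pos) starts searching at character offset pos and
  # returns a MatchData object, or nil once nothing is left to match.
  while (m = /\S+/.match(text, pos))
    count += 1
    pos = m.end(0)  # resume right after the current match
  end
  count
end

p count_words_with_matchdata('this is a sentence with some words')  # => 7
```

Each `m` is a full MatchData, so `m[0]` or `m.begin(0)` is available inside the loop if the word itself or its offset is needed.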


#8

Just plain string.split.length will work as well, and should handle line
breaks too:

irb(main):001:0> "these are some words".split.length
=> 4
irb(main):002:0> "these are \n some\nwords".split.length
=> 4
irb(main):003:0> "these are \n some\nwords".split
=> ["these", "are", "some", "words"]
irb(main):004:0>
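
As a quick sanity check, plain `split` appears to agree with scanning for runs of non-whitespace on a few edge cases (tabs, trailing spaces, empty and all-blank strings). The extra sample strings below are my own, not from the thread.

```ruby
# The first two samples are from the irb session above; the rest are hypothetical.
samples = [
  'these are some words',
  "these are \n some\nwords",
  "\ttabs\tand  spaces ",
  '',
  '   '
]

counts = samples.map { |s| [s.split.length, s.scan(/\S+/).size] }

# Print each sample with both counts side by side.
samples.zip(counts).each { |s, pair| p [s, *pair] }
```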

Hope that helps.

-Justin