Forum: Ruby Counting words

Jamal M. (Guest)
on 2006-04-28 21:47
(Received via mailing list)
I've researched this but am still having trouble getting it right ...
Can someone give me code that counts the number of words in a string via
Regexp and MatchData objects?  I think I'd like a word to be defined as
contiguous characters surrounded by white space (or the start/end of the
string), though am open to other interpretations.

Jamal
Marcel Molina Jr. (Guest)
on 2006-04-28 21:50
(Received via mailing list)
On Sat, Apr 29, 2006 at 02:43:30AM +0900, Jamal M. wrote:
> I've research this but am still having trouble getting it right ....
> Can someone give me code that counts the number of words in a string via
> RegExp and MatchData objects?  I think I'd like a word to be defined as
> contiguous characters surrounded by white space (or the start/end of the
> string), though am open to other interpretations.

Here is a naive implementation:

class String
  def words
    scan(/\b\S+\b/)
  end
end

'this is a sentence with some words'.words
=> ["this", "is", "a", "sentence", "with", "some", "words"]
'this is a sentence with some words'.words.size
=> 7
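
The question asks for Regexp and MatchData objects, while scan hands back plain strings. Below is a minimal sketch of a MatchData-based count, assuming a word is any run of non-whitespace characters (the definition given in the question). The method name count_words_with_matchdata is made up for illustration, and passing a start position to Regexp#match needs Ruby 1.9 or later:

```ruby
# Hypothetical helper (not from the thread): counts whitespace-delimited
# words by walking the string with Regexp#match, which returns a MatchData
# object, or nil once there is no further match.
def count_words_with_matchdata(string)
  count = 0
  position = 0
  while (match = /\S+/.match(string, position))
    count += 1
    position = match.end(0) # resume searching just after this word
  end
  count
end

puts count_words_with_matchdata('this is a sentence with some words') # 7
```

Each MatchData also exposes the word itself (match[0]) and its offsets, which is the main reason to prefer this over scan when positions matter.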

marcel
Bira (Guest)
on 2006-04-28 21:50
(Received via mailing list)
On 4/28/06, Jamal M. <removed_email_address@domain.invalid> wrote:
> I've research this but am still having trouble getting it right ....
> Can someone give me code that counts the number of words in a string via
> RegExp and MatchData objects?  I think I'd like a word to be defined as
> contiguous characters surrounded by white space (or the start/end of the
> string), though am open to other interpretations.
>
> Jamal
>

I'm a bit of a nuby, and this is my first post to the list, but I
think the following one-liner will do the job:

number_of_words = string.split(/\s/).length

I haven't tested it because I'm at work without access to a Ruby
interpreter :(.
Bira (Guest)
on 2006-04-28 21:53
(Received via mailing list)
On 4/28/06, Bira <removed_email_address@domain.invalid> wrote:
> number_of_words = string.split(/\s/).length

Eh, sorry. I meant to write:

number_of_words = string.split(/\s+/).length

The "+" is needed to cover words with more than one whitespace
character between them.
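
One caveat with splitting on a Regexp, sketched here with a made-up sample string: if the text starts with whitespace, the result begins with an empty string, so the count comes out one too high. Calling split with no argument skips leading whitespace and does not have this problem.

```ruby
text = "  these are   some words "

# Splitting on a Regexp keeps a leading empty string when the text starts
# with whitespace (trailing empty strings are dropped).
with_regexp = text.split(/\s+/)

# Calling split with no argument ignores leading whitespace.
without_argument = text.split

p with_regexp       # ["", "these", "are", "some", "words"]
p without_argument  # ["these", "are", "some", "words"]
```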
Robert K. (Guest)
on 2006-04-28 22:46
(Received via mailing list)
2006/4/28, Jamal M. <removed_email_address@domain.invalid>:
> I've research this but am still having trouble getting it right ....
> Can someone give me code that counts the number of words in a string via
> RegExp and MatchData objects?  I think I'd like a word to be defined as
> contiguous characters surrounded by white space (or the start/end of the
> string), though am open to other interpretations.

s.scan(/\w+/).size
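
Note that \w+ and the whitespace-based definition from the question can disagree. A small comparison, using a sample sentence chosen for illustration, shows that apostrophes and hyphens split a word under \w+ while stray punctuation such as "--" counts as a word under \S+:

```ruby
s = "don't stop -- it's a well-known test."

by_word_chars = s.scan(/\w+/) # runs of letters, digits and underscores
by_whitespace = s.scan(/\S+/) # runs of non-whitespace characters

p by_word_chars.size # 9
p by_whitespace.size # 7
```

Which count is right depends on what should count as a word.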
John J. (Guest)
on 2006-04-29 01:04
(Received via mailing list)
One way is like this:

irb(main):020:0> a="This is a test."
=> "This is a test."
irb(main):021:0> a.scan(/\b\S.*?\b/).size
=> 4
irb(main):022:0>

The Regexp in line 21 rewritten in a more readable form is:

a.scan(/
  \b        # a word boundary
  \S        # a character that is not whitespace
  .*?       # zero or more further characters, matched lazily
  \b        # a word boundary
  /x).size

By the way, the multi-line Regexp above only works because of the x at
the end, which makes it an extended regexp: whitespace and comments
inside the pattern are ignored.
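
A quick check of this pattern, using sample strings picked for illustration, suggests it is worth testing against punctuation: a comma followed by a space is itself picked up as a match, because the comma is a non-whitespace character that starts at a word boundary.

```ruby
pattern  = /\b\S.*?\b/
extended = /\b \S .*? \b/x # same pattern; x makes the spaces insignificant

p "This is a test.".scan(pattern)  # ["This", "is", "a", "test"]
p "This is a test.".scan(extended) # ["This", "is", "a", "test"]
p "hi, there".scan(pattern)        # ["hi", ", ", "there"]
```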

Regards,
  JJ


---
Help everyone. If you can't do that, then at least be nice.
Justin C. (Guest)
on 2006-04-29 01:10
(Received via mailing list)
Just plain string.split.length will work as well, and should handle line
breaks too:

> irb(main):001:0> "these     are     some     words".split.length
> => 4
> irb(main):002:0> "these are \n some\nwords".split.length
> => 4
> irb(main):003:0> "these are \n some\nwords".split
> => ["these", "are", "some", "words"]
> irb(main):004:0>


Hope that helps.

-Justin
Christian N. (Guest)
on 2006-05-03 21:08
(Received via mailing list)
"Marcel Molina Jr." <removed_email_address@domain.invalid> writes:

>   def words
>     scan(/\b\S+\b/)
>   end
> end

And quite a bit more efficient, memory-wise:

class String
  def count_words
    n = 0
    scan(/\b\S+\b/) { n += 1 }
    n
  end
end

Making String#count take regexps would be nice (same for #delete).
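
For context, String#count takes character-set strings rather than regexps, so it counts characters, not matches. A regexp-aware count can be sketched with the same block form of scan; count_matches is a hypothetical helper written for this example, not part of Ruby's String API:

```ruby
# Hypothetical helper, not part of Ruby's String API: counts the matches
# of any pattern without building an intermediate array of results.
def count_matches(string, pattern)
  n = 0
  string.scan(pattern) { n += 1 }
  n
end

puts "hello world".count("lo")               # 5 (occurrences of l and o)
puts count_matches("hello world", /\b\S+\b/) # 2
```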