Forum: Ruby Counting words

Jamal Mazrui (Guest)
on 2006-04-28 19:47
(Received via mailing list)
I've researched this but am still having trouble getting it right.
Can someone give me code that counts the number of words in a string via
Regexp and MatchData objects? I think I'd like a word to be defined as
contiguous characters surrounded by whitespace (or the start/end of the
string), though I am open to other interpretations.

Jamal
Marcel Molina Jr. (Guest)
on 2006-04-28 19:50
(Received via mailing list)
On Sat, Apr 29, 2006 at 02:43:30AM +0900, Jamal Mazrui wrote:
> I've researched this but am still having trouble getting it right.
> Can someone give me code that counts the number of words in a string via
> Regexp and MatchData objects? I think I'd like a word to be defined as
> contiguous characters surrounded by whitespace (or the start/end of the
> string), though I am open to other interpretations.

Here is a naive implementation:

class String
  def words
    scan(/\b\S+\b/)
  end
end

'this is a sentence with some words'.words
=> ["this", "is", "a", "sentence", "with", "some", "words"]
'this is a sentence with some words'.words.size
=> 7

marcel
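
To see how this pattern relates to the whitespace-delimited definition in the original question, here is a small comparison. It is only a sketch: the helper names and the sample sentence are made up for illustration, and /\S+/ is assumed as a stand-in for "contiguous characters surrounded by whitespace".

```ruby
# Hypothetical helpers for comparison; the names are illustrative only.

# Tokens delimited purely by whitespace: punctuation stays attached,
# and a token such as "--" counts as a word.
def whitespace_words(text)
  text.scan(/\S+/)
end

# The pattern from the post above: every match must start and end at a
# word boundary, so leading/trailing punctuation is trimmed and tokens
# without any word character are skipped.
def boundary_words(text)
  text.scan(/\b\S+\b/)
end

sample = "Hello, world! It's 5 o'clock -- really."
p whitespace_words(sample)  # ["Hello,", "world!", "It's", "5", "o'clock", "--", "really."]
p boundary_words(sample)    # ["Hello", "world", "It's", "5", "o'clock", "really"]
```

The two counts differ (7 versus 6 here) only because of the punctuation-only token, so which one is "right" depends on the definition of a word you settle on.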
Bira (Guest)
on 2006-04-28 19:50
(Received via mailing list)
On 4/28/06, Jamal Mazrui <Jamal.Mazrui@fcc.gov> wrote:
> I've researched this but am still having trouble getting it right.
> Can someone give me code that counts the number of words in a string via
> Regexp and MatchData objects? I think I'd like a word to be defined as
> contiguous characters surrounded by whitespace (or the start/end of the
> string), though I am open to other interpretations.
>
> Jamal
>

I'm a bit of a nuby, and this is my first post to the list, but I
think the following one-liner will do the job:

number_of_words = string.split(/\s/).length

I haven't tested it because I'm at work without access to a Ruby
interpreter :(.
Bira (Guest)
on 2006-04-28 19:53
(Received via mailing list)
On 4/28/06, Bira <u.alberton@gmail.com> wrote:
> number_of_words = string.split(/\s/).length

Eh, sorry. I meant to write:

number_of_words = string.split(/\s+/).length

The "+" is needed to cover words with more than one whitespace
character between them.
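
One edge case worth checking with the /\s+/ version is leading whitespace: split keeps an empty first element in that case, which inflates the count by one. The sketch below illustrates this; the helper names and the padded sample string are hypothetical, and stripping first is just one possible workaround.

```ruby
# Hypothetical helper names, for illustration only.

# Splitting on /\s+/ keeps an empty first element when the string
# starts with whitespace (trailing empty elements are dropped).
def count_with_regexp_split(text)
  text.split(/\s+/).length
end

# Stripping surrounding whitespace first avoids the leading empty element.
def count_with_strip(text)
  text.strip.split(/\s+/).length
end

padded = "  two  words "
p padded.split(/\s+/)              # ["", "two", "words"]
p count_with_regexp_split(padded)  # 3
p count_with_strip(padded)         # 2
```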
Robert Klemme (Guest)
on 2006-04-28 20:46
(Received via mailing list)
2006/4/28, Jamal Mazrui <Jamal.Mazrui@fcc.gov>:
> I've researched this but am still having trouble getting it right.
> Can someone give me code that counts the number of words in a string via
> Regexp and MatchData objects? I think I'd like a word to be defined as
> contiguous characters surrounded by whitespace (or the start/end of the
> string), though I am open to other interpretations.

s.scan(/\w+/).size
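
Note that \w+ counts runs of word characters, so apostrophes and hyphens split a token into several "words". The sketch below shows the effect and one possible, more lenient pattern; the helper names, the sample sentence, and the lenient pattern itself are assumptions for illustration, not something proposed in the thread.

```ruby
# Hypothetical helpers; names and the lenient pattern are illustrative.

# Runs of word characters only: "It's" becomes "It" and "s".
def word_char_runs(text)
  text.scan(/\w+/)
end

# Word characters, optionally joined by a single apostrophe or hyphen.
def lenient_words(text)
  text.scan(/\w+(?:['\-]\w+)*/)
end

sample = "It's a well-known fact"
p word_char_runs(sample)  # ["It", "s", "a", "well", "known", "fact"]
p lenient_words(sample)   # ["It's", "a", "well-known", "fact"]
```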
John Johnson (Guest)
on 2006-04-28 23:04
(Received via mailing list)
One way is like this:

irb(main):020:0> a="This is a test."
=> "This is a test."
irb(main):021:0> a.scan(/\b\S.*?\b/).size
=> 4
irb(main):022:0>

The Regexp in line 21, rewritten in a more readable form, is:

a.scan(/
  \b        # a word boundary
  \S        # a character that is not whitespace
  .*?       # zero or more further characters (.*), matched non-greedily (?)
  \b        # a word boundary
  /x).size

By the way, the multi-line form works because of the x at the end, which
makes it an extended regexp: whitespace and # comments inside the pattern
are ignored.

Regards,
  JJ
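
A caveat on the non-greedy pattern: because .*? stops at the first word boundary it finds, it can split a word at an apostrophe, and it can also match punctuation together with the whitespace that follows it. The sketch below compares it with the greedy /\b\S+\b/ from earlier in the thread; the helper names and sample strings are hypothetical.

```ruby
# Hypothetical helpers for comparison; names are illustrative only.

# The non-greedy pattern from the post above.
def lazy_matches(text)
  text.scan(/\b\S.*?\b/)
end

# The greedy pattern from earlier in the thread.
def greedy_matches(text)
  text.scan(/\b\S+\b/)
end

p lazy_matches("This is a test.")  # ["This", "is", "a", "test"]
p lazy_matches("don't stop")       # ["don", "'", "t", "stop"]
p greedy_matches("don't stop")     # ["don't", "stop"]
p lazy_matches("End. Next")        # ["End", ". ", "Next"]
p greedy_matches("End. Next")      # ["End", "Next"]
```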

---
Help everyone. If you can't do that, then at least be nice.
Justin Collins (Guest)
on 2006-04-28 23:10
(Received via mailing list)
Bira wrote:
> --
> Bira
> http://compexplicita.blogspot.com
> http://sinfoniaferida.blogspot.com
>

Just plain string.split.length will work as well, and should handle line
breaks too:

irb(main):001:0> "these     are     some     words".split.length
=> 4
irb(main):002:0> "these are \n some\nwords".split.length
=> 4
irb(main):003:0> "these are \n some\nwords".split
=> ["these", "are", "some", "words"]


Hope that helps.

-Justin
Christian Neukirchen (Guest)
on 2006-05-03 19:08
(Received via mailing list)
"Marcel Molina Jr." <marcel@vernix.org> writes:

>   def words
>     scan(/\b\S+\b/)
>   end
> end

And quite a bit more efficient, memory-wise:

class String
  def count_words
    n = 0
    scan(/\b\S+\b/) { n += 1}
    n
  end
end

Making String#count take regexps would be nice (same for #delete).
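
Until something like that exists, a generic counter can be sketched with an enumerator over scan, which should avoid collecting the matches into an array. The second helper goes back to the original request for MatchData objects and walks the string with String#match. Both method names are hypothetical, and /\S+/ is assumed as the word definition in the second one.

```ruby
# Hypothetical helper, not part of core String: counts the matches of
# any regexp by enumerating scan's block form instead of its array form.
def count_matches(text, pattern)
  text.enum_for(:scan, pattern).count
end

# Hypothetical helper: returns [word, start_offset] pairs taken from
# MatchData objects. Assumes a word is a run of non-whitespace (/\S+/),
# which never matches the empty string, so the loop always advances.
def word_offsets(text)
  offsets = []
  pos = 0
  while (m = text.match(/\S+/, pos))
    offsets << [m[0], m.begin(0)]
    pos = m.end(0)
  end
  offsets
end

p count_matches("this is a sentence with some words", /\b\S+\b/)  # 7
p word_offsets("  two  words ")  # [["two", 2], ["words", 7]]
```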