I’ve researched this but am still having trouble getting it right …
Can someone give me code that counts the number of words in a string via
RegExp and MatchData objects? I think I’d like a word to be defined as
contiguous characters surrounded by white space (or the start/end of the
string), though I am open to other interpretations.
Jamal
On Sat, Apr 29, 2006 at 02:43:30AM +0900, Jamal M. wrote:
Here is a naive implementation:
class String
  def words
    scan(/\b\S+\b/)
  end
end
'this is a sentence with some words'.words
=> ["this", "is", "a", "sentence", "with", "some", "words"]
'this is a sentence with some words'.words.size
=> 7
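For comparison, here is a rough sketch that takes the original definition literally (a word is any run of non-whitespace characters) next to the \b\S+\b pattern above. The helper name whitespace_words and the sample string are made up for illustration; they are not from the thread.

```ruby
# Hypothetical helper: a "word" is any run of non-whitespace characters.
def whitespace_words(text)
  text.scan(/\S+/)
end

sample = "I don't know - really"

by_whitespace = whitespace_words(sample)   # keeps the lone "-" as a word
by_boundary   = sample.scan(/\b\S+\b/)     # skips "-": no word boundary touches it

puts by_whitespace.size
puts by_boundary.size
```

The two patterns agree on ordinary text and differ only on tokens made up entirely of punctuation, so which one is "right" depends on whether such tokens should count.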
marcel
On 4/28/06, Jamal M. wrote:
I’m a bit of a nuby, and this is my first post to the list, but I
think the following one-liner will do the job:
number_of_words = string.split(/\s/).length
I haven’t tested it because I’m at work without access to a Ruby
interpreter :(.
On 4/28/06, Bira wrote:
number_of_words = string.split(/\s/).length
Eh, sorry. I meant to write:
number_of_words = string.split(/\s+/).length
The “+” is needed to cover words with more than one whitespace
character between them.
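One thing worth checking with the regexp form of split is leading whitespace, which produces an empty first element and inflates the count. A small sketch, using a made-up sample string:

```ruby
padded = "  two words"

with_regexp = padded.split(/\s+/)        # leading whitespace yields "" as the first element
no_argument = padded.split               # no argument: leading whitespace is ignored
stripped    = padded.strip.split(/\s+/)  # stripping first avoids the empty element

puts with_regexp.length
puts no_argument.length
puts stripped.length
```

Trailing whitespace is not a problem here, since split drops trailing empty strings by default.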
2006/4/28, Jamal M.:
s.scan(/\w+/).size
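Note that \w only matches letters, digits and underscore, so apostrophes and hyphens split a token into several matches. A quick sketch of the difference; the sample string and the alternative character class [\w'-] are my own illustration, not something proposed in the thread:

```ruby
s = "it's a well-known 2nd example"

word_chars = s.scan(/\w+/)       # "it's" and "well-known" each become two matches
with_punct = s.scan(/[\w'-]+/)   # hypothetical variant: also allow ' and - inside a word

puts word_chars.size
puts with_punct.size
```

Whether "well-known" is one word or two is a matter of definition, so pick the pattern that matches how you want to count.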
One way is like this:
irb(main):020:0> a="This is a test."
=> "This is a test."
irb(main):021:0> a.scan(/\b\S.*?\b/).size
=> 4
irb(main):022:0>
The Regexp in line 21, rewritten in a more readable form, is:
a.scan(/
  \b    # a word boundary
  \S    # a character that is not whitespace
  .*?   # maybe (*) some more characters (.), but don't be greedy (?)
  \b    # a word boundary
/x).size
btw, the multi-line Regexp above only works because of the x at the end,
which makes it an extended regexp: whitespace and comments inside the
pattern are ignored.
Regards,
JJ
On Friday, April 28, 2006, at 04:35PM, Jamal M. wrote:
Help everyone. If you can’t do that, then at least be nice.
"Marcel Molina Jr." writes:
class String
  def words
    scan(/\b\S+\b/)
  end
end
And quite a bit more efficient, memory-wise, since no array of matches is built:
class String
  def count_words
    n = 0
    scan(/\b\S+\b/) { n += 1 }
    n
  end
end
Making String#count take regexps would be nice (same for #delete).
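Since the original question asked about MatchData, the block form of scan can also hand back one MatchData per word, via Regexp.last_match. A minimal sketch; the helper name word_matches and the whitespace-based pattern are assumptions for illustration:

```ruby
# Hypothetical helper: returns one MatchData object per whitespace-delimited word.
def word_matches(text)
  matches = []
  # Inside the block, Regexp.last_match is the MatchData for the current match.
  text.scan(/\S+/) { matches << Regexp.last_match }
  matches
end

found   = word_matches("count these words")
count   = found.size                     # number of words
texts   = found.map { |m| m[0] }         # matched text of each word
offsets = found.map { |m| m.begin(0) }   # start offset of each word

puts count
```

This gives up the memory saving of the counter version, but it keeps positions available if you need them later.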
Bira wrote:
Just plain string.split.length will work as well, and should handle line
breaks too:
irb(main):001:0> "these are some words".split.length
=> 4
irb(main):002:0> "these are \n some\nwords".split.length
=> 4
irb(main):003:0> "these are \n some\nwords".split
=> ["these", "are", "some", "words"]
irb(main):004:0>
Hope that helps.
-Justin