Management of words in a string

aris · July 6, 2012, 5:52am

Hi All.

I’m trying to write a program that reads a string and
counts the number of words entered.

The problem is that I don’t know how to work with whole words in a
string, only with individual characters or letters.

How can I implement the above?

Thanks.

yankees26 · July 6, 2012, 8:49am

On Fri, Jul 6, 2012 at 5:52 AM, Joao S. wrote:

Hi All.

I’m trying to write a program that reads a string and
counts the number of words entered.

The problem is that I don’t know how to work with whole words in a
string, only with individual characters or letters.

How can I implement the above?

You can use the String#split method. You have to define precisely what
counts as a word for you. For example, consider things like “one-way
street” or “it’s raining”, and also be careful with punctuation. A
simplistic approach could be just to use the default split behaviour,
which splits on whitespace:

s = "this has words. how many? let's see"
[5] pry(main)> s.split
=> ["this", "has", "words.", "how", "many?", "let's", "see"]
[6] pry(main)> s.split.size
=> 7

You can pass a regular expression to the split method to tune how you
split.
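As a minimal sketch of that last point, the pattern below treats any run of characters other than letters, digits, underscores, apostrophes and hyphens as a separator. The helper name `split_words` and the pattern itself are assumptions chosen for illustration; they are one possible definition of a word, not the only one.

```ruby
# Hypothetical helper: split on runs of characters that are not
# word characters, apostrophes or hyphens.
def split_words(text)
  # reject(&:empty?) drops the empty string that split returns
  # when the text starts with a separator.
  text.split(/[^\w'-]+/).reject(&:empty?)
end

s = "this has words. how many? let's see"
p split_words(s)       # punctuation is no longer attached to the words
p split_words(s).size
```

With this pattern “one-way” and “let's” each stay a single word, which may or may not be what you want.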

Jesus.

yankees26 · July 6, 2012, 10:06am

Hi,

Joao S. wrote in post #1067618:

How can I implement the above?

For large text you may use String#scan, which has the advantage of not
collecting all words in an array like String#split does:

input_text = 'This is a sentence.'
word_count = input_text.strip.scan(/\s+/).size + 1

But like Jesus already said, this simple approach will not always work.
If the “words” in your text may contain whitespace, then looking for
whitespace will obviously fail. You’ll have to use a dictionary in this
case. This would also cover errors (missing or superfluous whitespace).
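One edge case worth checking in the separator-counting snippet above: because it always adds one, an empty or all-whitespace input is reported as one word. The sketch below puts it next to an alternative that counts runs of non-whitespace with a block, so no array of matches is built. Both method names are hypothetical, and the second method is an assumption added for comparison.

```ruby
# Counts whitespace runs and adds one, as in the snippet above.
def count_by_separators(text)
  text.strip.scan(/\s+/).size + 1
end

# Alternative sketch: count runs of non-whitespace with a block.
# With a block, String#scan yields each match instead of collecting them.
def count_by_tokens(text)
  count = 0
  text.scan(/\S+/) { count += 1 }
  count
end

["This is a sentence.", "", "   "].each do |text|
  puts "#{text.inspect}: separators=#{count_by_separators(text)} tokens=#{count_by_tokens(text)}"
end
```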

yankees26 · July 6, 2012, 11:50am

On Fri, Jul 6, 2012 at 10:06 AM, Jan E. wrote:

Hi,

Joao S. wrote in post #1067618:

How can I implement the above?

For large text you may use String#scan, which has the advantage of not
collecting all words in an array like String#split does:

word_count = 0
input_text.scan(/\w+/) { word_count += 1 }

input_text = 'This is a sentence.'
word_count = input_text.strip.scan(/\s+/).size + 1

I don’t think this usage of #scan is a good approach, because it will
yield totally wrong results:

irb(main):002:0> input_text = '. : & #'
=> ". : & #"
irb(main):003:0> input_text.strip.scan(/\s+/).size + 1
=> 4

Whereas positively matching sequences of word characters comes much
closer to reality:

irb(main):004:0> input_text.scan(/\w+/).size
=> 0

But like Jesus already said, this simple approach will not always work.
If the “words” in your text may contain whitespace, then looking for
whitespace will obviously fail. You’ll have to use a dictionary in this
case. This would also cover errors (missing or superfluous whitespace).

It’s crucial to clarify the definition of “word”, I agree.
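To make that concrete, here is a small sketch that counts the same text under three candidate definitions of “word”. The method names, the labels and the third pattern are assumptions for illustration only.

```ruby
# Three hypothetical definitions of "word", each as a pattern for String#scan.
def word_definitions
  {
    non_whitespace: /\S+/,     # anything between whitespace
    word_chars:     /\w+/,     # letters, digits and underscores only
    with_joiners:   /[\w'-]+/  # also keeps apostrophes and hyphens inside words
  }
end

def count_words(text, definition)
  text.scan(word_definitions.fetch(definition)).size
end

text = "one-way street, it's raining"
word_definitions.each_key do |name|
  puts "#{name}: #{count_words(text, name)}"
end
```

For this text the first and third definitions find 4 words, while `\w+` finds 6, because it splits “one-way” and “it's” in two.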

Kind regards

robert

yankees26 · July 6, 2012, 7:50pm

“Jesús Gabriel y Galán” wrote in post #1067632:

You can use the String#split method. You have to define precisely what
counts as a word for you. For example, consider things like “one-way
street” or “it’s raining”, and also be careful with punctuation. A
simplistic approach could be just to use the default split behaviour,
which splits on whitespace:

s = "this has words. how many? let's see"
[5] pry(main)> s.split
=> ["this", "has", "words.", "how", "many?", "let's", "see"]
[6] pry(main)> s.split.size
=> 7

You can pass a regular expression to the split method to tune how you
split.

Jesus.

And what if you want to count the words that begin with a particular
letter (for example “a”)?

##############################################
ct = 0

print "Enter a string: "
str = gets.chomp

puts "Words ==> #{str.split}"

# Check the first character of every word, not of the whole string
str.split.each do |word|
  ct = ct + 1 if word.chr == "a"
end

puts "Number of words that start with a: #{ct}"

#################################################

yankees26 · July 6, 2012, 9:31pm

str.scan(/a\w+/).size

yankees26 · July 7, 2012, 8:15am

Yeah, sorry, I was dumb.

str = "bag of bananas and one apple"
str.scan(/\Wa\w+/).size
=> 2

yankees26 · July 7, 2012, 9:41am

Still wrong, sorry Hans

str = "apple and banana"
str.scan(/\Wa\w+/).size
=> 1

A correct regex would be (I hope I don’t get it wrong now) /\ba\B/.

– Matma R.

yankees26 · July 6, 2012, 11:47pm

Hans M. wrote in post #1067740:

str.scan(/a\w+/).size

Clearly wrong.

str = "bag of bananas"
str.scan(/a\w+/).size
=> 2

yankees26 · July 7, 2012, 12:59pm

Hans M. wrote in post #1067783:

Hm, still wrong. The best I could do is this:

Try \ba\w*

(\b = word boundary)
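For comparison, this sketch runs the patterns proposed in this thread against one test string. Note that `/\ba\B/` misses the one-letter word “a”, since `\B` requires another word character right after it, while `/\ba\w*/` accepts it. The helper `words_starting_with` and its case-insensitive flag are assumptions added for illustration, and it only makes sense when `letter` is a word character.

```ruby
str = "a bag of bananas and one apple"

# Patterns suggested in the thread, in the order they came up.
[/a\w+/, /\Wa\w+/, /\ba\B/, /\ba\w*/].each do |pattern|
  puts "#{pattern.inspect} => #{str.scan(pattern).inspect}"
end

# Hypothetical helper: all words that begin with the given letter,
# ignoring case. Regexp.escape guards against special characters.
def words_starting_with(text, letter)
  text.scan(/\b#{Regexp.escape(letter)}\w*/i)
end

p words_starting_with(str, "a")
p words_starting_with(str, "a").size
```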

yankees26 · July 7, 2012, 10:54am

Hm, still wrong. The best I could do is this:

str = "a bag of bananas and one apple"
str.scan(Regexp.union(/^a\w*/,/\Wa\w*/))
=> ["a", " and", " apple"]