Splitting strings

Hi all,

I have a text file with phrases that I’m looking to split into chunks.

Given the following keyword list:

the brown fox jumped,
over the fence,

the script should produce the following output:

the,
the brown,
the brown fox,
the brown fox jumped,
brown fox,
brown fox jumped,
fox,
fox jumped,
jumped,
over,
over the,
over the fence,
the,
the fence,
fence

I’m currently using the following code which splits after each space:

def count_frequency
  the_file = 'D:/Ruby/projects/data.txt'
  h = Hash.new
  f = File.open(the_file, "r")
  f.each_line { |line|
    words = line.split
    words.each { |w|
      if h.has_key?(w)
        h[w] = h[w] + 1
      else
        h[w] = 1
      end
    }
  }

  # sort the hash by value, and then print it in this sorted order
  h.sort { |a, b| a[1] <=> b[1] }.each { |elem|
    puts "\"#{elem[0]}\" has #{elem[1]} occurrences"
  }
end

By the look of this I just need to append to the words array more words
with a different slice?

Many thanks in advance,

Ryan

By the look of this I just need to append to the words array
more words with a different slice?

Next time, just say: “I don’t have any idea how to program in ruby. I
look around the internet for scripts that other people wrote and try to
piece them together, but I don’t know how to do that very well because I
don’t know any programming. In other words, I am looking for free
programming services. Can anyone help?”

lines = [
  'the brown fox jumped,',
  'over the fence,'
]

results = []

lines.each do |line|
  line.chomp!(',')
  words = line.split(' ')

  words.inject('') do |accumulator, word|
    accumulator << "#{word} "
    results << "#{accumulator.chomp(' ')},"
    accumulator
  end
end

puts results

--output:--
the,
the brown,
the brown fox,
the brown fox jumped,
over,
over the,
over the fence,

You’re right. I’m still a beginner to Ruby, however I have still tried
researching what I’m looking for and come up with no results. I tried
manipulating the starting code but did not return any related results so
I asked a question… surely that’s what these forums are for! (I’m sure
you would have done something like this - learning by example when you
first started too!)

Although I don’t appreciate your tone and communication skills (perhaps
you need a lesson) thank you for your technical help.

Ryan

Ryan M. wrote in post #1011820:

You’re right. I’m still a beginner to Ruby, however I have still tried
researching what I’m looking for and come up with no results. I tried
manipulating the starting code but did not return any related results so
I asked a question… surely that’s what these forums are for! (I’m sure
you would have done something like this - learning by example when you
first started too!)

Although I don’t appreciate your tone and communication skills (perhaps
you need a lesson) thank you for your technical help.

Ryan, since you brought up communication skills: from your original
posting it is not entirely clear to me what you want to do. Do you want
to count word occurrences? Do you want to generate permutations of all
subsets of words found in a document? Or do you want to generate all
subsequences of each phrase (line) in the document?

A few remarks: the usual counting idiom is this

counters = Hash.new 0

counters[key] += 1

If you need to append to Array per key, you can do

lists = Hash.new {|h, k| h[k] = []}

lists[key] << item
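A minimal sketch of the two idioms in use. The word list and the choice to group by first letter are made up purely for illustration:

```ruby
# Hypothetical input, not taken from the original data file.
words = %w[the brown fox the fence the]

# Counting idiom: a missing key reads as 0, so += 1 needs no has_key? check.
counters = Hash.new(0)
words.each { |word| counters[word] += 1 }
p counters

# Grouping idiom: the block creates and stores a fresh Array for each new key.
lists = Hash.new { |h, k| h[k] = [] }
words.each { |word| lists[word[0]] << word }
p lists
```

Note that Hash.new(0) only returns the default; the key is stored by the assignment that += 1 performs.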

You open the file but do not close it (better use block form of
File.open or use File.foreach for even simpler code).
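Both ways of reading a file without leaking the handle can be sketched as follows. The temporary file and its contents are stand-ins for the real data file:

```ruby
require 'tempfile'

# A throwaway file stands in for the real data file (assumption for the demo).
sample = Tempfile.new('phrases')
sample.write("the brown fox jumped,\nover the fence,\n")
sample.close

# Block form of File.open: the file is closed when the block exits,
# even if an exception is raised inside it.
first_line = File.open(sample.path, 'r') { |f| f.gets }

# File.foreach opens the file, yields each line, and closes it afterwards.
line_count = 0
File.foreach(sample.path) { |_line| line_count += 1 }

puts first_line
puts line_count

sample.unlink
```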

Maybe this does what you want - maybe not

ARGF.each do |line|
  phrase = line.scan(/\w+/)
  limit = phrase.length - 1

  0.upto limit do |start|
    start.upto limit do |stop|
      puts phrase[start..stop].join(' ')
    end
  end
end
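The same nested loop can be wrapped in a method that returns the phrases instead of printing them, which makes it easier to test and to count later. The method name sub_phrases is hypothetical:

```ruby
# Returns every contiguous run of words in a line,
# in the order the nested loops produce them.
def sub_phrases(line)
  words = line.scan(/\w+/)
  limit = words.length - 1
  result = []

  0.upto(limit) do |start|
    start.upto(limit) do |stop|
      result << words[start..stop].join(' ')
    end
  end

  result
end

puts sub_phrases('the brown fox jumped,')
```

For a four-word line this yields ten phrases, including "brown" on its own, which the expected output in the original post leaves out.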

Kind regards

robert

On Wed, Jul 20, 2011 at 12:18 PM, 7stud wrote:

Posted via http://www.ruby-forum.com/.

Did you really post this through the forum? Interestingly there I
cannot see your sentence “As a beginner…”. How weird is that? Does
the forum → mailing list gateway add content? :-)

Kind regards

robert

On Wed, Jul 20, 2011 at 12:18 PM, 7stud wrote:

As a beginner to ruby programming, you should be writing all programs
from scratch–not trying to alter some program you found on the
internets.

Good artists create. Great artists steal.


Phillip G.

phgaw.posterous.com | twitter.com/phgaw | gplus.to/phgaw

A method of solution is perfect if we can foresee from the start,
and even prove, that following that method we shall attain our aim.
– Leibniz

I laughed out loud when I read this.

Altering programs found online is a fantastic way to learn. Thank god
for open source/free software. Of course it depends on your goals, but i
would not limit learning to just writing everything from scratch.

Lake

Ryan M. wrote in post #1011820:

You’re right. I’m still a beginner to Ruby, however I have still tried
researching what I’m looking for and come up with no results.

Ok, taking on board all that has been said so far… what I’m
hoping to achieve (short term help needed as I am a beginner) is to take a
list of strings from a text file, run through each string and split
it in as many combinations as possible, then count all the occurrences of
each new string that is split and print them to the console as
output.

Any help would be appreciated.

Regards,

Ryan

On Wed, Jul 20, 2011 at 8:16 PM, Ryan M. wrote:

Ok, taking on board all that has been said so far… what I’m
hoping to achieve (short term help needed as I am a beginner) is to take a
list of strings from a text file

Check File.read (either “ri File.read” on the command line, or on
ruby-doc.org). The gist:

data = File.read "myfile"

and run through each string and split it in as many combinations as possible

There’s a lot of splitting possible!

Though, I guess you want to split a sentence into its words, correct?

Either way, String#split is what you want (probably "A string".split(" "),
which splits the string at spaces).

, then count all the occurrences of each new string that is split and print
them to the console as
output.

Well, once you split your string, you get an Array of chunks (or
tokens, if you prefer): ["A", "string"]. So the question is: Do you
want to get every possible combination, or a subset of these
combinations (as in the example provided in your OP)?
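The difference between the two readings can be sketched like this, assuming a three-word sample sentence. each_cons keeps only adjacent words, while combination also pairs words that are not next to each other:

```ruby
tokens = "the brown fox".split(" ")

# Contiguous runs only: words must be adjacent in the sentence.
runs = (1..tokens.length).flat_map do |n|
  tokens.each_cons(n).map { |run| run.join(" ") }
end

# Every subset of the words in their original order,
# which also includes non-adjacent picks such as "the fox".
subsets = (1..tokens.length).flat_map do |n|
  tokens.combination(n).map { |pick| pick.join(" ") }
end

p runs
p subsets
```

The expected output in the original post contains only adjacent words, which suggests the first reading, though that is an inference rather than something the poster stated.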


Phillip G.


Hi Robert,

Your example works perfectly, thank you!

To incorporate the occurrence count for each keyword, do we need to put it
into a hash similar to the first example I gave, or is it possible to
directly link that up with the output?

The previous example I had was:

words.each { |w|
  w.lstrip
  if h.has_key?(w)
    h[w] = h[w] + 1
  else
    h[w] = 1
  end
}
}

# sort the hash by value, and then print it in this sorted order

h.sort{|a,b| a[1]<=>b[1]}.each { |elem|
  puts "\"#{elem[0]}\" has #{elem[1]} occurrences"
}

Many thanks again for your help.

Regards,

Ryan

On Wed, Jul 20, 2011 at 8:16 PM, Ryan M. wrote:

Ok, taking on board all what has been said so far… this is what I’m
hoping to achieve (short term help needed as I am a beginner) is take a

Unfortunately there is still a lot of room for interpretation left…

list of strings from a text file and run through each string and split

How do you obtain the list of strings? Is a string a line from the
text file? And, as Phillip asked, how do you want your strings to be
split?

it in as many combinations as possible,

Does order matter or not? Example: do you consider “a b” and “b a” to
be the same combination or two separate combinations?

then count all the occurrences of
each new string that is split and print them to the console as
output.

Do you want to count the parts of the original string (line) or the
combination, i.e do you want to count “a” and “b” or “a b”?
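The ordering question maps onto two different Array methods. A small sketch with the same "a" and "b" placeholders:

```ruby
words = %w[a b]

# Order ignored: "a b" and "b a" are the same pair, so only one is produced.
unordered = words.combination(2).map { |pair| pair.join(' ') }

# Order significant: both arrangements are produced.
ordered = words.permutation(2).map { |pair| pair.join(' ') }

p unordered
p ordered
```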

Btw, did you try out the code I sent earlier?

Kind regards

robert

Ryan M. wrote in post #1011936:

Hi Robert,

Your example works perfectly, thank you!

You’re welcome!

To incorporate the occurrence count for each keyword, do we need to put it
into a hash similar to the first example I gave, or is it possible to
directly link that up with the output?

Please see what I called “counting idiom” above.

# sort the hash by value, and then print it in this sorted order

h.sort{|a,b| a[1]<=>b[1]}.each { |elem|
  puts "\"#{elem[0]}\" has #{elem[1]} occurrences"
}

To print in descending order you can as well do

counts.sort_by {|w,c| -c}.each do |w,c|
  printf "%6d %s\n", c, w
end
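A quick way to see what the descending sort does, using a made-up counts hash:

```ruby
# Hypothetical counts, chosen so that no two phrases tie.
counts = { 'the' => 3, 'fence' => 1, 'brown fox' => 2 }

# Negating the count turns the ascending sort into a descending one.
sorted = counts.sort_by { |_phrase, count| -count }

# %6d right-aligns the count in a six-character column.
sorted.each { |phrase, count| printf("%6d %s\n", count, phrase) }
```

sort_by returns an array of [key, value] pairs rather than a hash. Phrases with equal counts come out in no guaranteed order unless a second sort key is added.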

Kind regards

robert

Ryan M. wrote in post #1012277:

Ok, so far at the minute then I have included the counter hash so I have
the following:

def count_frequency
the_file = 'D:/Rails/projects/data.txt'
h = Hash.new
words = Hash.new
f = File.open(the_file, "r")
counts = Hash.new 0

if.each do |line|
  phrase = line.scan /\w+/
  limit = phrase.length - 1


  0.upto limit do |start|
    start.upto limit do |stop|
      puts phrase[start..stop].join(' ')
    end
  end

   counts.sort_by {|w,c| -c}.each do |w,c|
     printf "%6d %s\n", c, w
   end
end

end

Would the counter hash with the key go underneath the ‘puts’ in the loop
so that it records each step? At the minute it still just outputs the
new strings without the ordering.

At the moment I would be surprised to see any output from counts because
you never update it. You also do not close the file properly (you could
make your life easier by using File.foreach) and I believe there is also
a spelling error (“if.each”). Did this program actually run and work?

Btw, the_file should rather be a method argument IMHO.
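Putting these remarks together, one possible shape for the program is sketched below. The method names count_phrases and print_counts and the temporary sample file are assumptions for illustration, not code from the thread:

```ruby
require 'tempfile'

# Counts every contiguous run of words on every line of the file at path.
def count_phrases(path)
  counts = Hash.new(0)

  # File.foreach closes the file when it is done.
  File.foreach(path) do |line|
    words = line.scan(/\w+/)
    limit = words.length - 1

    0.upto(limit) do |start|
      start.upto(limit) do |stop|
        # This is the update that was missing: count the phrase instead of printing it.
        counts[words[start..stop].join(' ')] += 1
      end
    end
  end

  counts
end

# Prints the most frequent phrases first; ties are ordered alphabetically.
def print_counts(counts)
  counts.sort_by { |phrase, count| [-count, phrase] }.each do |phrase, count|
    printf("%6d %s\n", count, phrase)
  end
end

# A throwaway file stands in for the real data file.
sample = Tempfile.new('phrases')
sample.write("the brown fox jumped,\nover the fence,\n")
sample.close

counts = count_phrases(sample.path)
print_counts(counts)

sample.unlink
```

Sorting and printing happen once, after the whole file has been read, rather than inside the per-line loop.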

Kind regards

robert

Hi Robert,

Sorry, that was a typo with the ‘if’. I changed that to ‘f’.

Unfortunately no, I could not get it working. I will update the file so
it closes as you mentioned. How would I go about integrating the count
with the phrase[start..stop] to insert those into the hash?

Sorry, it’s probably an extremely basic question…

Thanks again,

Ryan

Ryan M. wrote in post #1012878:

Unfortunately no, I could not get it working. I will update the file so
it closes as you mentioned. How would I go about integrating the count
with the phrase[start..stop] to insert those into the hash?

I think you got that information above (see
http://www.ruby-forum.com/topic/2176493?reply_to=1012878#1011844).

Kind regards

robert

Ah yes, I see what you mean now. Ok, so now I’m left with trying to use
the hash key to assign the phrases. Do I need to increment the h and k
each time or just one of them?

lists = Hash.new {|h, k| h[k] = []}

f.each do |line|
  phrase = line.scan /\w+/
  limit = phrase.length - 1

  0.upto limit do |start|
    start.upto limit do |stop|
      lists[h,k] << [start..stop].join(' ')
    end
  end

end

lists.sort_by {|w,c| -c}.each do |w,c|
   printf "%6d %s\n", c, w
end

Thanks again,

Ryan