Forum: Ruby Gathering ngrams with the highest probability

Minkoo Seo (pool007)
on 2006-04-02 11:15
(Received via mailing list)
Hi group.

I'm writing some scientific applications in Ruby, and I keep running
into a problem that I would like to solve in Ruby.

I have a large number of NGram instances, where NGram is defined as follows:

NGram = Struct.new :seq, :prob

I have a list of instances of NGram like:

....
#<struct NGram seq=["AO", "S"], prob=-139918.174804688>
#<struct NGram seq=["AY", "T"], prob=-46389.6875>
#<struct NGram seq=["HH", "IH"], prob=18983.1796875>
#<struct NGram seq=["OW", "Z", "AH"], prob=-326323.640625>
#<struct NGram seq=["OW", "Z", "AH"], prob=-35945.25>
#<struct NGram seq=["T", "AH", "L"], prob=20778.7421875>
#<struct NGram seq=["HH", "IH", "S"], prob=37747.3046875>
#<struct NGram seq=["IH", "S", "T"], prob=-17305.6640625>
#<struct NGram seq=["IH", "S", "T"], prob=-17477.390625>
#<struct NGram seq=["IH", "S", "T"], prob=34243.34375>
#<struct NGram seq=["IH", "S", "T"], prob=-2125.265625>
#<struct NGram seq=["IH", "S", "T"], prob=-9046.7890625>
#<struct NGram seq=["IH", "S", "T"], prob=-18200.265625>
#<struct NGram seq=["K", "L", "AH"], prob=-110206.140625>
#<struct NGram seq=["K", "L", "AH"], prob=-92664.984375>
....

What I want to derive from this data is a list of NGram instances
that is unique with regard to seq. For each seq, the NGram kept in
the list must be the one with the highest prob.

For example, from the ngram list shown above, I want to derive a
list like the following:

....
#<struct NGram seq=["AO", "S"], prob=-139918.174804688>
#<struct NGram seq=["AY", "T"], prob=-46389.6875>
#<struct NGram seq=["HH", "IH"], prob=18983.1796875>
#<struct NGram seq=["OW", "Z", "AH"], prob=-35945.25>
#<struct NGram seq=["T", "AH", "L"], prob=20778.7421875>
#<struct NGram seq=["HH", "IH", "S"], prob=37747.3046875>
#<struct NGram seq=["IH", "S", "T"], prob=34243.34375>
#<struct NGram seq=["K", "L", "AH"], prob=-92664.984375>
....

What I've written so far is

# Sort by prob in descending order
ngrams.sort_by { |ngram|

    # Compare seq

    # Then, compare prob
}

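result = []

# Collect unique ngrams with the highest prob.
# Assumes the list is already sorted so that, for each seq, the
# ngram with the highest prob comes first.
ngrams.inject(nil) { |prev, cur|
    if prev.nil? || prev.seq != cur.seq
        result << cur
    end
    cur  # the block's value becomes prev on the next iteration
}

return result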

And it does not look good even to me. Not to mention the unwritten
sort_by block, I use a result = [] statement that could probably be
eliminated.

Any idea for better code?

Sincerely,
Minkoo Seo
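
For reference, the sort-then-scan idea described in the post above could be completed roughly as follows. This is only a sketch under assumptions: the method name best_by_sort is hypothetical, and the sort key (seq first, then prob descending) is one possible way to fill in the unwritten sort_by block, not necessarily what the poster had in mind.

```ruby
NGram = Struct.new(:seq, :prob)

# Hypothetical helper: keep the highest-prob NGram for each distinct seq.
def best_by_sort(ngrams)
  # Order by seq first, then by prob descending, so that the best
  # ngram of each seq is the first one in its run.
  sorted = ngrams.sort_by { |ngram| [ngram.seq, -ngram.prob] }

  # Walk the sorted list and keep an ngram only when its seq differs
  # from the seq of the last ngram kept.
  sorted.inject([]) do |result, cur|
    result << cur if result.empty? || result.last.seq != cur.seq
    result
  end
end

ngrams = [
  NGram.new(%w[OW Z AH], -326323.640625),
  NGram.new(%w[OW Z AH], -35945.25),
  NGram.new(%w[IH S T], -17305.6640625),
  NGram.new(%w[IH S T], 34243.34375),
  NGram.new(%w[K L AH], -110206.140625),
  NGram.new(%w[K L AH], -92664.984375),
]

best = best_by_sort(ngrams)
best.each { |ngram| p ngram }
```

Because of the sort, the result comes back ordered by seq rather than in the order the ngrams first appeared in the input.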
Robert Feldt (Guest)
on 2006-04-02 11:24
(Received via mailing list)
ngrams.inject({}) do |highest, ngram|
  seq = ngram.seq
  best_now = highest[seq]
  highest[seq] = ngram unless (best_now && best_now.prob > ngram.prob)
  highest
end.values

/RF
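
To try the hash-based approach above on a few of the sample ngrams, something like the following should work. The wrapper name highest_prob_ngrams and the shortened sample list are assumptions made for illustration; the body of the inject block follows the reply.

```ruby
NGram = Struct.new(:seq, :prob)

# Hypothetical wrapper around the inject-based approach.
def highest_prob_ngrams(ngrams)
  ngrams.inject({}) do |highest, ngram|
    best_now = highest[ngram.seq]
    # Keep the stored ngram only when it is strictly better; on a tie
    # the later ngram replaces the earlier one.
    highest[ngram.seq] = ngram unless best_now && best_now.prob > ngram.prob
    highest
  end.values
end

ngrams = [
  NGram.new(%w[AY T], -46389.6875),
  NGram.new(%w[OW Z AH], -326323.640625),
  NGram.new(%w[OW Z AH], -35945.25),
  NGram.new(%w[IH S T], -17305.6640625),
  NGram.new(%w[IH S T], 34243.34375),
  NGram.new(%w[IH S T], -2125.265625),
]

winners = highest_prob_ngrams(ngrams)
winners.each { |ngram| p ngram }
```

This makes a single pass and needs no sort. On Ruby 1.9 and later a Hash preserves insertion order, so the winners come out in the order each seq was first seen; on older versions the order of values is not guaranteed.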
Sylvain Joyeux (Guest)
on 2006-04-02 11:30
(Received via mailing list)
ngrams.inject({}) do |table, ngram|
	if old = table[ngram.seq]
		table[ngram.seq] = ngram if ngram.prob > old.prob
	else
		table[ngram.seq] = ngram
	end
	table
end.values  # the table is keyed by seq; its values are the winning ngrams
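
On Ruby versions that provide Enumerable#group_by and Enumerable#max_by, the same selection can probably be written without an explicit accumulator. This is an alternative sketch, not something proposed in the thread, and the method name best_per_seq is hypothetical.

```ruby
NGram = Struct.new(:seq, :prob)

# Hypothetical helper: group the ngrams by seq, then pick the
# highest-prob member of each group.
def best_per_seq(ngrams)
  ngrams.group_by { |ngram| ngram.seq }
        .values
        .map { |group| group.max_by { |ngram| ngram.prob } }
end

ngrams = [
  NGram.new(%w[HH IH], 18983.1796875),
  NGram.new(%w[K L AH], -110206.140625),
  NGram.new(%w[K L AH], -92664.984375),
  NGram.new(%w[IH S T], -9046.7890625),
  NGram.new(%w[IH S T], 34243.34375),
]

best = best_per_seq(ngrams)
best.each { |ngram| p ngram }
```

Compared with the inject versions, this builds an intermediate array per seq, so it trades a little memory for brevity.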