Hey All,

I got me this fancy method for classifying documents that, at one point, basically does this:

  p = 1
  words.each do |w|
    p *= calc_prob(w)
  end
  chi = -2.0 * Math.log(p)

I'm finding that p is often going to 0.0 b/c the numbers returned by calc_prob are sometimes outlandishly small (or there are just so many words in the doc that the loop runs long enough to zero out p). This of course causes problems for the call to Math.log (e.g., Errno::EDOM).

I have tried two things. First, after some desperate flailing on Google, I added

  require 'rational'
  require 'mathn'

to my script, hoping that Ruby would read my mind WRT using rationals where possible, and that rationals would extend the reach of Ruby's arithmetic into the too-outlandishly-small-for-floats range.

When that didn't seem to help, I put this in my words.each loop:

  if p == 0.0 then p = Float::MIN end

That works, but it makes me wonder whether there's a smarter thing to do w/ those rational and mathn libs to really get the effect I hoped for just from including them in my script.

Is there?

Many thanks!

-Roy
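The underflow is easy to reproduce. This is a minimal sketch that assumes a hypothetical calc_prob giving every word probability 1e-4 (the real one isn't shown). It also applies the Float::MIN clamp once, after the loop, which simplifies the in-loop version above. The point it shows is that every document whose product underflows ends up with the same chi, about 1416.79.

```ruby
# Hypothetical stand-in for calc_prob: every word gets probability 1e-4.
def calc_prob(_word)
  1e-4
end

# Plain product of per-word probabilities, as in the loop above.
def product_of_probs(words)
  prod = 1.0
  words.each { |w| prod *= calc_prob(w) }
  prod
end

# Simplified Float::MIN workaround (assumption: clamp once, after the loop).
def chi_clamped(words)
  prod = product_of_probs(words)
  prod = Float::MIN if prod == 0.0
  -2.0 * Math.log(prod)
end

short_doc = Array.new(100, "word")    # true product is 1e-400
long_doc  = Array.new(1000, "word")   # true product is 1e-4000

p product_of_probs(short_doc)                      # prints 0.0
p Math.log(0.0)                                    # prints -Infinity
p chi_clamped(short_doc) == chi_clamped(long_doc)  # prints true
```

Both documents are well below the smallest positive double, so the clamp maps them to the same score. The workaround avoids the crash, but it discards the information that distinguishes the two documents.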

on 2009-03-16 19:01

on 2009-03-16 22:08

rpardee@gmail.com wrote:
> I'm finding that p is often going to 0.0 b/c the numbers returned by
> calc_prob are sometimes outlandishly small (...)
> That works, but makes me wonder if there's a smarter thing to do w/
> those rational and mathn libs to really get the effect I hoped for
> just from including them in my script.
>
> Is there?

Are you sure you required 'mathn' before defining your calc_prob method?

  big = 10**100
  small = 1/big
  p small.zero?    # true
  require 'mathn'
  small = 1/big
  p small.zero?    # false
  p small.class    # Rational
  p(-2.0 * Math.log(small))

hth,
Siep
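mathn was removed from Ruby's standard library in Ruby 2.5, so the sketch below builds the Rational explicitly instead of relying on mathn. The raising of small to the fourth power is a hypothetical stand-in for multiplying four word probabilities. It shows a catch: the exact Rational stays nonzero, but Math.log converts its argument to Float, and 1/10**400 becomes 0.0. Taking the log of the numerator and denominator separately avoids that conversion. This assumes a recent Ruby, whose Math.log accepts Integers larger than Float's range.

```ruby
big   = 10**100
small = Rational(1, big)    # exact 1/10**100, not rounded to zero
p small.zero?               # prints false

# Hypothetical: the product of four such probabilities stays exact.
tiny = small**4             # exactly 1/10**400
p tiny.zero?                # prints false
p tiny.to_f                 # prints 0.0; 1e-400 is outside Float's range

# Math.log(tiny) would see 0.0, so take log(numerator) - log(denominator).
log_tiny = Math.log(tiny.numerator) - Math.log(tiny.denominator)
chi = -2.0 * log_tiny       # roughly 1842.07
p chi
```

Exact rationals work, but their denominators grow with every multiplication. For a long document, this is far slower than Float arithmetic.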

on 2009-03-17 04:01

On Mar 16, 2:06 pm, Siep Korteling <s.kortel...@gmail.com> wrote:
> > just from including them in my script.
>
> hth,
>
> Siep

Thanks for the response! I think the issue may be that I'm not doing any division, just multiplication. Check it out:

  irb(main):001:0> require 'mathn'
  => true
  irb(main):002:0> x = 0.5
  => 0.5
  irb(main):003:0> 1000.times do
  irb(main):004:1*   x *= x
  irb(main):005:1> end
  => 1000
  irb(main):006:0> x
  => 0.0
  irb(main):007:0> x.class
  => Float

But the more I think about it, the more I think I'm fussing over nothing (ha ha!). If my p var goes to zero, I should just set it to Float::MIN and break out of the loop. My calc_prob method only ever returns values <= 1, so there's no sense in letting it keep spinning down the value of p (if you can tell what I'm trying to say).

Thanks!

-Roy
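The irb session fits how mathn behaves. mathn changes how Integer and Rational results are produced (for example, 1/3 becomes Rational(1, 3)), but it never turns a Float into a Rational. Because 0.5 is a Float literal, every x *= x stays in Float arithmetic. The sketch below runs 11 squarings instead of 1000 so the exact version stays small enough to compute. It compares a Float start with an explicit Rational start.

```ruby
# Float start: 0.5**(2**11) = 2**-2048, below the smallest positive double.
x = 0.5
11.times { x *= x }
p x                            # prints 0.0

# Rational start: the result stays exact and nonzero.
r = Rational(1, 2)
11.times { r *= r }
p r == Rational(1, 2**2048)    # prints true
```

With 1000 squarings, the exact denominator would be 2**(2**1000), which is far too large to compute. Exact arithmetic therefore can't rescue that particular loop.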

on 2009-03-17 19:15

On Mar 16, 10:56 pm, Roy Pardee <rpar...@gmail.com> wrote:
> only ever return values <= 1, so there's no sense in letting it
> continue to spin down the value of p (if you can tell what I'm trying
> to say).

Roy,

It all depends on how much range you need. If you need more granularity at the tiny end, you can always renormalize: just initialize p to 1e6 or something, rather than 1. Then, after taking the log, subtract the log of that starting constant (Math.log(1e6)) to get back to your original scale.

-t3ch.dude
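A fixed starting value such as 1e6 only buys six orders of magnitude. A variation of the same renormalization idea rescales p inside the loop whenever it gets small, and counts the rescalings so they can be subtracted after the log. In this sketch, calc_prob (every word gets 1e-5), SCALE, and chi_renormalized are all hypothetical names chosen for illustration.

```ruby
# Hypothetical stand-in for calc_prob: every word gets probability 1e-5.
def calc_prob(_word)
  1e-5
end

SCALE     = 1e100             # assumption: any large power of ten works
LOG_SCALE = Math.log(SCALE)

# Multiply probabilities, pulling the running product back up whenever it
# drops below 1/SCALE, and remember how many times that happened.
def chi_renormalized(words)
  prod = 1.0
  rescales = 0
  words.each do |w|
    prod *= calc_prob(w)
    if prod < 1.0 / SCALE
      prod *= SCALE
      rescales += 1
    end
  end
  # log(true product) = log(prod) - rescales * log(SCALE)
  -2.0 * (Math.log(prod) - rescales * LOG_SCALE)
end

words = Array.new(200, "word")   # true product is 1e-1000
chi = chi_renormalized(words)
p chi                            # roughly 4605.17
```

Since each probability is at most 1, the product only shrinks, so rescaling in one direction is enough.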

on 2009-03-17 23:42

I don't think you need more precision. Basic math can help you here:

  log(a*b) = log(a) + log(b)

so

  logp = 0
  words.each do |w|
    logp += Math.log(calc_prob(w))
  end
  chi = -2.0 * logp
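The log-sum version can be checked side by side with the plain product. This is a small sketch that assumes a hypothetical calc_prob giving every word probability 1e-3. It uses Array#sum, which needs Ruby 2.4 or later.

```ruby
# Hypothetical stand-in for calc_prob: every word gets probability 1e-3.
def calc_prob(_word)
  1e-3
end

words = Array.new(300, "word")    # true product is 1e-900

# Multiplying underflows long before the loop finishes.
naive = words.inject(1.0) { |prod, w| prod * calc_prob(w) }
p naive                           # prints 0.0

# Summing logs keeps the value comfortably inside Float's range.
logp = words.sum { |w| Math.log(calc_prob(w)) }   # roughly -2072.33
chi  = -2.0 * logp
p chi                             # roughly 4144.65
```

One caveat still applies: if calc_prob ever returns exactly 0.0, Math.log gives -Infinity and chi becomes infinite, so the probabilities themselves need to stay strictly positive.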