Forum: Ruby Re: Installation of Rb-GSL setup config problems

unknown (Guest)
on 2006-05-08 21:15
(Received via mailing list)
Dear Ara,

Well, thank you very much again.
Actually, I am trying to find parameter values for statistical distributions from data. That is, you have, say, a normal distribution and some measured data that are distributed according to it, but you don't know the mean and variance parameters. One can then solve the so-called maximum likelihood equations for those parameters.
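For the normal distribution these equations have a simple closed-form solution, which is what my script uses: the fitted mean is just the sample average, and the fitted variance is the average squared deviation from it,

  \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i ,   \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 .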
I was writing a Ruby script to do that, and I wanted it to be able to find parameters for several well-known distributions. One of these is a less well-known one, the Cauchy distribution, which is sometimes used to model random events where extremely high values have a higher probability than if the data were normally distributed. Some people claim that, e.g., natural disasters generate insurance claims that are distributed according to it, or that changes in stock market values are distributed like that (when a crash occurs and your money is lost, you know the famous random-walk pricing models based on the normal distribution are wrong).
This distribution has two parameters, and I took a set of combined likelihood equations out of a book, which are solved numerically.
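For reference, with location parameter \alpha and scale parameter \beta the Cauchy density is

  p(x; \alpha, \beta) = \frac{\beta}{\pi (\beta^2 + (x-\alpha)^2)} ,

and the two likelihood equations I solve numerically are

  \sum_{j=1}^{n} \frac{x_j - \alpha}{\beta^2 + (x_j - \alpha)^2} = 0 ,   \sum_{j=1}^{n} \frac{\beta^2}{\beta^2 + (x_j - \alpha)^2} = \frac{n}{2} .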

Now, thanks to you, I can solve them. Below is the code.

Best regards,

Axel


--------------------------------------------

require "gsl"
include Math
include MultiMin

class Hash
  def ml_fit
    fitted_params = {}
    # fit distribution parameters from data, using maximum likelihood or the
    # method of moments
    # self is a hash with the keys
    #   'distribution' - any of 'normal', 'bernoulli', 'binomial', 'poisson',
    #                    'uniform', 'gamma', 'exponential', 'cauchy'
    #   'data'         - an Array with the measured data

    n = self['data'].length
    if self.has_key?('distribution') == false
      raise 'No distribution specified'
    else
      if self['distribution'].downcase == 'normal' or
         self['distribution'].downcase == 'gaussian'
        mu_val = 0.0
        self['data'].each { |x| mu_val = mu_val + x }
        mu_val = mu_val / self['data'].length
        fitted_params.store('mu_fitted', mu_val)
        sigma_val = 0.0
        self['data'].each { |x| sigma_val = sigma_val + (x - mu_val) * (x - mu_val) }
        sigma_val = sigma_val / self['data'].length
        fitted_params.store('sigma_square_fitted', sigma_val)
      end

      if self['distribution'].downcase == 'bernoulli'
        # p(x)=p^x*(1-p)^(1-x), if x \in \{0,1\}; p(x)=0 otherwise
        p_fitted = 0.0
        self['data'].each { |x| p_fitted = p_fitted + x }
        p_fitted = p_fitted / self['data'].length
        fitted_params.store('p_fitted', p_fitted)
      end
      if self['distribution'].downcase == 'binomial'
        # needs parameter n!
        # p(x)=\choose{n,x} p^x (1-p)^(n-x), if x \in \{0,..,n\}
        if self.has_key?('params') == false
          raise('parameter n missing')
        end
        p_fitted = 0.0
        self['data'].each { |x| p_fitted = p_fitted + x }
        p_fitted = p_fitted / self['data'].length / self['params']['n']
        fitted_params.store('p_fitted', p_fitted)
      end
      if self['distribution'].downcase == 'poisson'
        # p(x)=\frac{\lambda^x}{x!} exp(-\lambda), if x \in \{0,1,..\}
        lambda_fitted = 0.0
        self['data'].each { |x| lambda_fitted = lambda_fitted + x }
        lambda_fitted = lambda_fitted / self['data'].length
        fitted_params.store('lambda_fitted', lambda_fitted)
      end
      if self['distribution'].downcase == 'uniform'
        # p(x)=1/(b-a), if x \in [a,b]
        a_fitted = self['data'].min.to_f
        b_fitted = self['data'].max.to_f
        fitted_params.store('a_fitted', a_fitted)
        fitted_params.store('b_fitted', b_fitted)
      end
      if self['distribution'].downcase == 'gamma'
        # p_{\alpha,\lambda}(x)=0, if x \leq 0;
        # p_{\alpha,\lambda}(x)=\frac{\lambda^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} exp(-\lambda*x), if x > 0;
        # use the method of moments to estimate the params:
        # mean = \alpha/\lambda and variance = \alpha/\lambda^2, so
        # \alpha = m_1^2/(m_2 - m_1^2) and \lambda = m_1/(m_2 - m_1^2)
        m_1_fitted = 0.0
        self['data'].each { |x| m_1_fitted = m_1_fitted + x }
        m_1_fitted = m_1_fitted / self['data'].length

        m_2_fitted = 0.0
        self['data'].each { |x| m_2_fitted = m_2_fitted + x * x }
        m_2_fitted = m_2_fitted / self['data'].length
        alpha_fitted = (m_1_fitted * m_1_fitted) / (m_2_fitted - m_1_fitted * m_1_fitted)
        lambda_fitted = m_1_fitted / (m_2_fitted - m_1_fitted * m_1_fitted)
        fitted_params.store('alpha_fitted', alpha_fitted)
        fitted_params.store('lambda_fitted', lambda_fitted)
      end
      if self['distribution'].downcase == 'exponential'
        # p(x)=\lambda*exp(-\lambda*x)
        lambda_fitted = 0.0
        self['data'].each { |x| lambda_fitted = lambda_fitted + x }
        lambda_fitted = lambda_fitted / self['data'].length
        if lambda_fitted == 0
          raise('error fitting exponential distribution parameter: lambda is infinite')
        else
          lambda_fitted = 1 / lambda_fitted
          fitted_params.store('lambda_fitted', lambda_fitted)
        end
      end
      if self['distribution'].downcase == 'cauchy'
        # p(x,\alpha,\beta)=\frac{\beta}{\pi*(\beta^2+(x-\alpha)^2)}
        # likelihood equations for the parameters are
        #
        # \frac{\partial L}{\partial \alpha}=0 <=>
        #   \sum_{j=1}^{n} \frac{x_j-\alpha}{\beta^2+(x_j-\alpha)^2}=0
        #
        # \frac{\partial L}{\partial \beta}=0 <=>
        #   \sum_{j=1}^{n} \frac{\beta^2}{\beta^2+(x_j-\alpha)^2}=n/2
        #
        # (see Johnson, Kotz, and Balakrishnan (1994), Continuous Univariate
        # Distributions, Volumes I and II, 2nd Ed., John Wiley and Sons,
        # chapter 16 for details)
        # Solve these equations using Nelder-Mead multidimensional minimization;
        # do this several times to choose a good minimum from the local minima.

        alpha_v = []
        beta_v = []

        # now find the parameters, repeating the numerical minimization;
        # stop after finding 5 parameter sets (in which the minimum search
        # converged) with suitable values (beta>0)
        repetitions = 2000
        temp_min_val = 100000000.0
        best_alpha = false
        best_beta = false
        repetitions.times do
          fmin_val, x, st = self.cauchy_minimize
          if st == 0 and x[1] > 0
            alpha_v << x[0]
            beta_v << x[1]
          end

          # keep the parameter set with the smallest residual found so far
          if fmin_val.abs < temp_min_val.abs and x[1] > 0
            best_alpha = x[0]
            best_beta = x[1]
            temp_min_val = fmin_val
          end

          # stop if five good parameter values are found
          if alpha_v.length > 4
            break
          end
        end
=begin
        p 'got back x'
        p alpha_v
        p beta_v
        p 'best values'
        p best_alpha
        p best_beta
=end
        fitted_params.store('alpha_fitted', best_alpha)
        fitted_params.store('beta_fitted', best_beta)
      end
    end
    return fitted_params
  end

  def cauchy_minimize
    n = self['data'].length.to_f
    np = 2 # the search space is two-dimensional: alpha and beta
    values = self['data']
    param_fn = Proc.new { |dist_params, data|
      # the params set via set_params are the measured values;
      # dist_params[0] is alpha (location), dist_params[1] is beta (scale)
      eq1 = 0.0
      eq2 = 0.0
      for x_k in data
        denom = dist_params[1] * dist_params[1] + (x_k - dist_params[0]) * (x_k - dist_params[0])
        eq1 = eq1 + (x_k - dist_params[0]) / denom
        eq2 = eq2 + (dist_params[1] * dist_params[1]) / denom
      end
      # both likelihood equations must hold at the maximum likelihood
      # solution, so minimize the sum of their absolute residuals
      eq1.abs + (eq2 - n / 2).abs
    }
    my_func = Function.alloc(param_fn, np)
    my_func.set_params(self['data'])
    x = Vector.alloc([rand] * np)
    ss = Vector.alloc(np)
    ss.set_all(1.0)
    minimizer = FMinimizer.alloc("nmsimplex", np)
    minimizer.set(my_func, x, ss)
    iter = 0
    begin
      iter += 1
      status = minimizer.iterate()
      status = minimizer.test_size(1e-2)
      if status == GSL::SUCCESS
        # puts("converged to minimum at")
      end
      x = minimizer.x
      # printf("%5d ", iter);
      # for i in 0...np do
      #   printf("%10.3e ", x[i])
      # end
      # printf("f() = %7.3f size = %.3f\n", minimizer.fval, minimizer.size);
    end while status == GSL::CONTINUE and iter < 100
    return minimizer.fval, x, status
  end
end

# test it
=begin
a = { 'distribution' => 'cauchy', 'data' => [-1, 1, 2, -2] }
# usually needs more data, of course, to get a good approximation
params = a.ml_fit
p params
=end
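In case it is useful, this is roughly how the method is meant to be called; the numbers below are made up just for illustration, and real data sets should of course be much larger:

normal_sample = { 'distribution' => 'normal',
                  'data' => [1.2, 0.8, 1.5, 0.9, 1.1, 1.3] }
p normal_sample.ml_fit   # prints {'mu_fitted'=>..., 'sigma_square_fitted'=>...}

cauchy_sample = { 'distribution' => 'cauchy',
                  'data' => [-1.0, 1.0, 2.0, -2.0, 0.5, 30.0] }
p cauchy_sample.ml_fit   # prints {'alpha_fitted'=>..., 'beta_fitted'=>...}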