Dear Ara,
Well, thank you very much again.
Actually, I am trying to find parameter values for statistical
distributions from data. That is, you have, say, a normal distribution
and some measured data that are distributed according to it, but you
don't know the mean and variance parameters. One can then solve some
equations called the maximum likelihood equations for that.
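For the normal distribution the maximum likelihood equations can even be
solved in closed form, which makes a nice sanity check. A minimal sketch
(the data values are made up for illustration):

```ruby
# Closed-form ML estimates for a normal distribution: the sample mean
# and the (biased, divide-by-n) sample variance.
data = [2.1, 1.9, 2.3, 1.7, 2.0] # made-up measurements
n = data.length
mu_hat = data.sum / n
sigma2_hat = data.sum { |x| (x - mu_hat)**2 } / n
puts "mu = #{mu_hat}, sigma^2 = #{sigma2_hat}"
```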
I was writing a Ruby script to do that. I wanted it to be able to find
parameters for several well-known distributions. One of these is a
lesser-known one, the Cauchy distribution, which is sometimes used to
model random events where extremely high values have a higher
probability than if the data were distributed according to the normal
distribution. Some people claim that, e.g., natural disasters generate
insurance claims that are distributed according to it, or that changes
in stock market values are distributed like that (when a crash occurs
and your money is lost, you know the famous random-walk pricing models
based on the normal distribution are wrong).
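The heavy tails are easy to see numerically: compare the probability of
a value more than three scale units above the center under a standard
Cauchy and under a standard normal. A quick sketch (separate from the
script below):

```ruby
include Math
# The Cauchy CDF is F(x) = 1/2 + atan(x)/pi, so the upper tail is:
cauchy_tail = 0.5 - atan(3.0) / PI
# Normal upper tail via the error function: P(Z > x) = (1 - erf(x/sqrt(2)))/2
normal_tail = 0.5 * (1.0 - erf(3.0 / sqrt(2.0)))
puts "P(X > 3): Cauchy #{cauchy_tail}, normal #{normal_tail}"
# The Cauchy tail (about 0.10) is roughly 75 times heavier than the
# normal one (about 0.0013).
```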
This distribution has two parameters, and I took a set of coupled
likelihood equations for them out of a book; the equations have to be
solved numerically.
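For concreteness, the two equations can be written as residuals that a
numerical solver drives to zero. This standalone sketch (with made-up
data, separate from the script below) checks that a symmetric sample
satisfies the location equation exactly at alpha = 0:

```ruby
include Math
# Residuals of the two Cauchy likelihood equations for location alpha
# and scale beta; both residuals are zero at a solution.
score = lambda do |alpha, beta, data|
  n = data.length
  r1 = data.sum { |x| (x - alpha) / (beta**2 + (x - alpha)**2) }
  r2 = data.sum { |x| beta**2 / (beta**2 + (x - alpha)**2) } - n / 2.0
  [r1, r2]
end
# A symmetric sample satisfies the alpha equation at alpha = 0:
r1, r2 = score.call(0.0, 1.0, [-2.0, -1.0, 1.0, 2.0])
```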
Now, thanks to you, I can solve them. Below is the code.
Best regards,
Axel
require 'gsl'
include Math
include GSL
include GSL::MultiMin

class Hash
  def ml_fit
    # Fit distribution parameters from data using maximum likelihood or
    # the method of moments.
    # self is a hash with the keys:
    #   'distribution' - any of 'normal', 'bernoulli', 'binomial',
    #                    'poisson', 'uniform', 'gamma', 'exponential',
    #                    'cauchy'
    #   'data'         - an Array with the measured data.
    fitted_params = {}
    n = self['data'].length
    if self.has_key?('distribution') == false
      raise 'No distribution specified'
    else
      if self['distribution'].downcase == 'normal' or
         self['distribution'].downcase == 'gaussian'
        mu_val = 0.0
        self['data'].each { |x| mu_val = mu_val + x }
        mu_val = mu_val / self['data'].length
        fitted_params.store('mu_fitted', mu_val)
        sigma_val = 0.0
        self['data'].each { |x| sigma_val = sigma_val + (x - mu_val) * (x - mu_val) }
        sigma_val = sigma_val / self['data'].length
        fitted_params.store('sigma_square_fitted', sigma_val)
      end
      if self['distribution'].downcase == 'bernoulli'
        # p(x) = p^x * (1-p)^(1-x), if x \in {0,1}; p(x) = 0 otherwise
        p_fitted = 0.0
        self['data'].each { |x| p_fitted = p_fitted + x }
        p_fitted = p_fitted / self['data'].length
        fitted_params.store('p_fitted', p_fitted)
      end
      if self['distribution'].downcase == 'binomial'
        # needs parameter n!
        # p(x) = \choose{n,x} p^x (1-p)^(n-x), if x \in {0,...,n}
        if self.has_key?('params') == false
          raise('parameter n missing')
        end
        p_fitted = 0.0
        self['data'].each { |x| p_fitted = p_fitted + x }
        p_fitted = p_fitted / self['data'].length / self['params']['n']
        fitted_params.store('p_fitted', p_fitted)
      end
      if self['distribution'].downcase == 'poisson'
        # p(x) = \frac{\lambda^x}{x!} exp(-\lambda), if x \in {0,1,...}
        lambda_fitted = 0.0
        self['data'].each { |x| lambda_fitted = lambda_fitted + x }
        lambda_fitted = lambda_fitted / self['data'].length
        fitted_params.store('lambda_fitted', lambda_fitted)
      end
      if self['distribution'].downcase == 'uniform'
        # p(x) = 1/(b-a), if x \in [a,b]
        a_fitted = self['data'].min.to_f
        b_fitted = self['data'].max.to_f
        fitted_params.store('a_fitted', a_fitted)
        fitted_params.store('b_fitted', b_fitted)
      end
      if self['distribution'].downcase == 'gamma'
        # p_{\alpha,\lambda}(x) = 0, if x <= 0;
        # p_{\alpha,\lambda}(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}
        #   x^{\alpha-1} exp(-\lambda*x), if x > 0
        # use the method of moments to estimate the parameters
        m_1_fitted = 0.0
        self['data'].each { |x| m_1_fitted = m_1_fitted + x }
        m_1_fitted = m_1_fitted / self['data'].length
        m_2_fitted = 0.0
        self['data'].each { |x| m_2_fitted = m_2_fitted + x * x }
        m_2_fitted = m_2_fitted / self['data'].length
        alpha_fitted = (m_1_fitted * m_1_fitted) / (m_2_fitted - m_1_fitted * m_1_fitted)
        lambda_fitted = m_1_fitted / (m_2_fitted - m_1_fitted * m_1_fitted)
        fitted_params.store('alpha_fitted', alpha_fitted)
        fitted_params.store('lambda_fitted', lambda_fitted)
      end
      if self['distribution'].downcase == 'exponential'
        # p(x) = \lambda * exp(-\lambda*x)
        lambda_fitted = 0.0
        self['data'].each { |x| lambda_fitted = lambda_fitted + x }
        lambda_fitted = lambda_fitted / self['data'].length
        if lambda_fitted == 0
          raise('error fitting exponential distribution parameter: lambda is infinite')
        else
          lambda_fitted = 1 / lambda_fitted
          fitted_params.store('lambda_fitted', lambda_fitted)
        end
      end
      if self['distribution'].downcase == 'cauchy'
        # p(x,\alpha,\beta) = \frac{\beta}{\pi*(\beta^2+(x-\alpha)^2)}
        # the likelihood equations for the parameters are
        # \frac{\partial L}{\partial \alpha} = 0 <=>
        #   \sum_{j=1}^{n} \frac{x_j-\alpha}{\beta^2+(x_j-\alpha)^2} = 0
        # \frac{\partial L}{\partial \beta} = 0 <=>
        #   \sum_{j=1}^{n} \frac{\beta^2}{\beta^2+(x_j-\alpha)^2} = n/2
        # (see Johnson, Kotz, and Balakrishnan (1994), Continuous
        # Univariate Distributions, Volumes I and II, 2nd ed., John
        # Wiley and Sons, chapter 16 for details)
        # Solve these equations using Nelder-Mead multidimensional
        # minimization; do this several times to choose a good minimum
        # from among the local minima.
        alpha_v = []
        beta_v = []
        # now find the parameters, repeating the numerical minimization
        repetitions = 2000 # but stop after finding 5 parameter sets (in
                           # which the minimum search converged) with
                           # suitable values (beta > 0)
        temp_min_val = 100000000.0
        best_alpha = false
        best_beta = false
        repetitions.times do
          fmin_val, x, st = self.cauchy_minimize
          if st == 0 and x[1] > 0
            alpha_v << x[0]
            beta_v << x[1]
          end
          if fmin_val.abs < temp_min_val.abs and x[1] > 0
            best_alpha = x[0]
            best_beta = x[1]
            temp_min_val = fmin_val
          end
          # stop if five good parameter values are found
          if alpha_v.length > 4
            break
          end
        end
=begin
        p 'got back x'
        p alpha_v
        p beta_v
        p 'best values'
        p best_alpha
        p best_beta
=end
        fitted_params.store('alpha_fitted', best_alpha)
        fitted_params.store('beta_fitted', best_beta)
      end
    end
    return fitted_params
  end
  def cauchy_minimize
    n = self['data'].length.to_f
    np = 2 # dimension of the parameter vector (alpha, beta)
    param_fn = Proc.new { |dist_params, data|
      # dist_params[0] = alpha, dist_params[1] = beta;
      # data are the measured values
      val1 = 0.0
      for x_k in data
        val1 = val1 + (x_k - dist_params[0]) /
               (dist_params[1] * dist_params[1] +
                (x_k - dist_params[0]) * (x_k - dist_params[0]))
      end
      val2 = 0.0
      for x_k in data
        val2 = val2 + (dist_params[1] * dist_params[1]) /
               (dist_params[1] * dist_params[1] +
                (x_k - dist_params[0]) * (x_k - dist_params[0]))
      end
      # sum of the absolute residuals of both likelihood equations;
      # this is zero exactly at a solution
      val1.abs + (val2 - n / 2).abs
    }
    my_func = Function.alloc(param_fn, np)
    my_func.set_params(self['data'])
    x = Vector.alloc([rand, rand]) # random starting point (alpha, beta)
    ss = Vector.alloc(np)
    ss.set_all(1.0)
    minimizer = FMinimizer.alloc("nmsimplex", np)
    minimizer.set(my_func, x, ss)
    iter = 0
    begin
      iter += 1
      status = minimizer.iterate()
      status = minimizer.test_size(1e-2)
      if status == GSL::SUCCESS
        puts("converged to minimum at")
      end
      x = minimizer.x
      printf("%5d ", iter)
      for i in 0...np do
        printf("%10.3e ", x[i])
      end
      printf("f() = %7.3f size = %.3f\n", minimizer.fval, minimizer.size)
    end while status == GSL::CONTINUE and iter < 100
    return minimizer.fval, x, status
  end
end
# test it
=begin
a = { 'distribution' => 'cauchy', 'data' => [-1, 1, 2, -2] }
# usually needs more data, of course, to get a good approximation
params = a.ml_fit
p params
=end