Dear Ara,
Well, thank you very much again.
Actually, I am trying to find parameter values for statistical
distributions from data. That is, you have, say, a normal distribution
and some measured data that are distributed according to it, but you
don't know the mean and variance parameters. One can then solve some
equations called the maximum likelihood equations for that.
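For the normal distribution the maximum likelihood equations can even be
solved in closed form, which makes a nice sanity check. A minimal sketch
(the data values are made up for illustration):

```ruby
# Closed-form ML estimates for a normal distribution: the sample mean
# and the (biased, divide-by-n) sample variance.
data = [2.1, 1.9, 2.3, 1.7, 2.0] # made-up measurements
n = data.length
mu_hat = data.sum / n
sigma2_hat = data.sum { |x| (x - mu_hat)**2 } / n
puts "mu = #{mu_hat}, sigma^2 = #{sigma2_hat}"
```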
I was writing a Ruby script to do that. I wanted it to be able to find
parameters for several well-known distributions. One of these is a
lesser-known one, the Cauchy distribution, which is sometimes used to
model random events where extremely high values have a higher
probability than if the data were distributed according to the normal
distribution. Some people claim that, e.g., natural disasters generate
insurance claims that are distributed according to it, or that changes
in stock market values are distributed like that (when a crash occurs
and your money is lost, you know the famous random-walk pricing models
based on the normal distribution are wrong).
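The heavy tails are easy to see numerically: compare the probability of
a value more than three scale units above the center under a standard
Cauchy and under a standard normal. A quick sketch (separate from the
script below):

```ruby
include Math
# The Cauchy CDF is F(x) = 1/2 + atan(x)/pi, so the upper tail is:
cauchy_tail = 0.5 - atan(3.0) / PI
# Normal upper tail via the error function: P(Z > x) = (1 - erf(x/sqrt(2)))/2
normal_tail = 0.5 * (1.0 - erf(3.0 / sqrt(2.0)))
puts "P(X > 3): Cauchy #{cauchy_tail}, normal #{normal_tail}"
# The Cauchy tail (about 0.10) is roughly 75 times heavier than the
# normal one (about 0.0013).
```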
This distribution has two parameters, and I took a set of coupled
likelihood equations for them out of a book; the equations have to be
solved numerically.
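For concreteness, the two equations can be written as residuals that a
numerical solver drives to zero. This standalone sketch (with made-up
data, separate from the script below) checks that a symmetric sample
satisfies the location equation exactly at alpha = 0:

```ruby
include Math
# Residuals of the two Cauchy likelihood equations for location alpha
# and scale beta; both residuals are zero at a solution.
score = lambda do |alpha, beta, data|
  n = data.length
  r1 = data.sum { |x| (x - alpha) / (beta**2 + (x - alpha)**2) }
  r2 = data.sum { |x| beta**2 / (beta**2 + (x - alpha)**2) } - n / 2.0
  [r1, r2]
end
# A symmetric sample satisfies the alpha equation at alpha = 0:
r1, r2 = score.call(0.0, 1.0, [-2.0, -1.0, 1.0, 2.0])
```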
Now, thanks to you, I can solve them. Below is the code.
Best regards,
Axel
require 'gsl'
include Math
include GSL
include GSL::MultiMin

class Hash
  def ml_fit
    # Fit distribution parameters from data using maximum likelihood or
    # the method of moments.
    # self is a hash with the keys:
    #   'distribution' - any of 'normal', 'bernoulli', 'binomial',
    #                    'poisson', 'uniform', 'gamma', 'exponential',
    #                    'cauchy'
    #   'data'         - an Array with the measured data.
    fitted_params = {}
    n = self['data'].length
    if self.has_key?('distribution') == false
      raise 'No distribution specified'
    else
      if self['distribution'].downcase == 'normal' or
         self['distribution'].downcase == 'gaussian'
        mu_val = 0.0
        self['data'].each { |x| mu_val = mu_val + x }
        mu_val = mu_val / self['data'].length
        fitted_params.store('mu_fitted', mu_val)
        sigma_val = 0.0
        self['data'].each { |x| sigma_val = sigma_val + (x - mu_val) * (x - mu_val) }
        sigma_val = sigma_val / self['data'].length
        fitted_params.store('sigma_square_fitted', sigma_val)
      end
      if self['distribution'].downcase == 'bernoulli'
        # p(x) = p^x * (1-p)^(1-x), if x \in {0,1}; p(x) = 0 otherwise
        p_fitted = 0.0
        self['data'].each { |x| p_fitted = p_fitted + x }
        p_fitted = p_fitted / self['data'].length
        fitted_params.store('p_fitted', p_fitted)
      end
      if self['distribution'].downcase == 'binomial'
        # needs parameter n!
        # p(x) = \choose{n,x} p^x (1-p)^(n-x), if x \in {0,...,n}
        if self.has_key?('params') == false
          raise('parameter n missing')
        end
        p_fitted = 0.0
        self['data'].each { |x| p_fitted = p_fitted + x }
        p_fitted = p_fitted / self['data'].length / self['params']['n']
        fitted_params.store('p_fitted', p_fitted)
      end
      if self['distribution'].downcase == 'poisson'
        # p(x) = \frac{\lambda^x}{x!} exp(-\lambda), if x \in {0,1,...}
        lambda_fitted = 0.0
        self['data'].each { |x| lambda_fitted = lambda_fitted + x }
        lambda_fitted = lambda_fitted / self['data'].length
        fitted_params.store('lambda_fitted', lambda_fitted)
      end
      if self['distribution'].downcase == 'uniform'
        # p(x) = 1/(b-a), if x \in [a,b]
        a_fitted = self['data'].min.to_f
        b_fitted = self['data'].max.to_f
        fitted_params.store('a_fitted', a_fitted)
        fitted_params.store('b_fitted', b_fitted)
      end
      if self['distribution'].downcase == 'gamma'
        # p_{\alpha,\lambda}(x) = 0, if x <= 0;
        # p_{\alpha,\lambda}(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}
        #   x^{\alpha-1} exp(-\lambda*x), if x > 0
        # use the method of moments to estimate the parameters
        m_1_fitted = 0.0
        self['data'].each { |x| m_1_fitted = m_1_fitted + x }
        m_1_fitted = m_1_fitted / self['data'].length
        m_2_fitted = 0.0
        self['data'].each { |x| m_2_fitted = m_2_fitted + x * x }
        m_2_fitted = m_2_fitted / self['data'].length
        alpha_fitted = (m_1_fitted * m_1_fitted) / (m_2_fitted - m_1_fitted * m_1_fitted)
        lambda_fitted = m_1_fitted / (m_2_fitted - m_1_fitted * m_1_fitted)
        fitted_params.store('alpha_fitted', alpha_fitted)
        fitted_params.store('lambda_fitted', lambda_fitted)
      end
      if self['distribution'].downcase == 'exponential'
        # p(x) = \lambda * exp(-\lambda*x)
        lambda_fitted = 0.0
        self['data'].each { |x| lambda_fitted = lambda_fitted + x }
        lambda_fitted = lambda_fitted / self['data'].length
        if lambda_fitted == 0
          raise('error fitting exponential distribution parameter: lambda is infinite')
        else
          lambda_fitted = 1 / lambda_fitted
          fitted_params.store('lambda_fitted', lambda_fitted)
        end
      end
      if self['distribution'].downcase == 'cauchy'
        # p(x,\alpha,\beta) = \frac{\beta}{\pi*(\beta^2+(x-\alpha)^2)}
        # the likelihood equations for the parameters are
        # \frac{\partial L}{\partial \alpha} = 0 <=>
        #   \sum_{j=1}^{n} \frac{x_j-\alpha}{\beta^2+(x_j-\alpha)^2} = 0
        # \frac{\partial L}{\partial \beta} = 0 <=>
        #   \sum_{j=1}^{n} \frac{\beta^2}{\beta^2+(x_j-\alpha)^2} = n/2
        # (see Johnson, Kotz, and Balakrishnan (1994), Continuous
        # Univariate Distributions, Volumes I and II, 2nd ed., John
        # Wiley and Sons, chapter 16 for details)
        # Solve these equations using Nelder-Mead multidimensional
        # minimization; do this several times to choose a good minimum
        # from among the local minima.
        alpha_v = []
        beta_v = []
        # now find the parameters, repeating the numerical minimization
        repetitions = 2000 # but stop after finding 5 parameter sets (in
                           # which the minimum search converged) with
                           # suitable values (beta > 0)
        temp_min_val = 100000000.0
        best_alpha = false
        best_beta = false
        repetitions.times do
          fmin_val, x, st = self.cauchy_minimize
          if st == 0 and x[1] > 0
            alpha_v << x[0]
            beta_v << x[1]
          end
          if fmin_val.abs < temp_min_val.abs and x[1] > 0
            best_alpha = x[0]
            best_beta = x[1]
            temp_min_val = fmin_val
          end
          # stop if five good parameter values are found
          if alpha_v.length > 4
            break
          end
        end
=begin
        p 'got back x'
        p alpha_v
        p beta_v
        p 'best values'
        p best_alpha
        p best_beta
=end
        fitted_params.store('alpha_fitted', best_alpha)
        fitted_params.store('beta_fitted', best_beta)
      end
    end
    return fitted_params
  end
  def cauchy_minimize
    n = self['data'].length.to_f
    np = 2 # dimension of the parameter vector (alpha, beta)
    param_fn = Proc.new { |dist_params, data|
      # dist_params[0] = alpha, dist_params[1] = beta;
      # data are the measured values
      val1 = 0.0
      for x_k in data
        val1 = val1 + (x_k - dist_params[0]) /
               (dist_params[1] * dist_params[1] +
                (x_k - dist_params[0]) * (x_k - dist_params[0]))
      end
      val2 = 0.0
      for x_k in data
        val2 = val2 + (dist_params[1] * dist_params[1]) /
               (dist_params[1] * dist_params[1] +
                (x_k - dist_params[0]) * (x_k - dist_params[0]))
      end
      # sum of the absolute residuals of both likelihood equations;
      # this is zero exactly at a solution
      val1.abs + (val2 - n / 2).abs
    }
    my_func = Function.alloc(param_fn, np)
    my_func.set_params(self['data'])
    x = Vector.alloc([rand, rand]) # random starting point (alpha, beta)
    ss = Vector.alloc(np)
    ss.set_all(1.0)
    minimizer = FMinimizer.alloc("nmsimplex", np)
    minimizer.set(my_func, x, ss)
    iter = 0
    begin
      iter += 1
      status = minimizer.iterate()
      status = minimizer.test_size(1e-2)
      if status == GSL::SUCCESS
        puts("converged to minimum at")
      end
      x = minimizer.x
      printf("%5d ", iter)
      for i in 0...np do
        printf("%10.3e ", x[i])
      end
      printf("f() = %7.3f size = %.3f\n", minimizer.fval, minimizer.size)
    end while status == GSL::CONTINUE and iter < 100
    return minimizer.fval, x, status
  end
end
# test it
=begin
a = { 'distribution' => 'cauchy', 'data' => [-1, 1, 2, -2] }
# usually needs more data, of course, to get a good approximation
params = a.ml_fit
p params
=end