Curve fitting to data

philippeqc · December 17, 2007, 12:43am

People,

Does anyone know of a Ruby app that will fit a curve to data eg fitting
a curve to:

-10 0
-9 19
-8 36
-7 51
-6 64
-5 75
-4 84
-3 91
-2 96
-1 99
0 100
1 99
2 96
3 91
4 84
5 75
6 64
7 51
8 36
9 19
10 0

Should give the formula for a parabola.

Thanks,

Phil.

–
Philip R.

Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411
Sydney NSW 2001
Australia
Fax: +61:(0)2-8221-9599
E-mail: [email protected]

philippeqc · December 17, 2007, 12:57am

On 16/12/2007, Phil R. [email protected] wrote:

People,

Does anyone know of a Ruby app that will fit a curve to data eg fitting
a curve to:

-10 0
-9 19

This looks like a prime candidate for GNUPlot – which ruby has bindings
for.

– Thomas A.

philippeqc · December 17, 2007, 1:08am

Thomas,

On Sun, 2007-12-16 at 23:55 +0000, Thomas A. wrote:

On 16/12/2007, Phil R. [email protected] wrote:

People,

Does anyone know of a Ruby app that will fit a curve to data eg fitting
a curve to:

-10 0
-9 19

This looks like a prime candidate for GNUPlot – which ruby has bindings for.

I think GNUPlot requires the knowledge of the type of fn you are trying
to fit - I want the software to TELL me what sort of fn it is eg for the
data:

-10 0
-9 19
-8 36
-7 51
-6 64
-5 75
-4 84
-3 91
-2 96
-1 99
0 100
1 99
2 96
3 91
4 84
5 75
6 64
7 51
8 36
9 19
10 0

http://www.zunzun.com tells me that the formula for this data is:

y = a( atan(x) ) + b( x2 ) + c( sinh(x) ) + offset

I would like to be able to do this myself with my own (preferably Ruby)
code.

Thanks,

Phil.


philippeqc · December 17, 2007, 2:31am

Whoa, that looks interesting… can’t say I know anything to help you,
but I will “bookmark” this thread to see if someone interesting emerges
here.

philippeqc · December 17, 2007, 4:48am

Alex F. wrote:

-9 19
I’ve found to have a steeper curve than I cared to climb.
But nothing I know of exactly in Ruby. You might take a look at the
source code for the website you linked to:
CVS Info for project pythonequations

Maybe a fun one for a Ruby Q. sometime?

alex

Well … let’s see:

1. There’s no such thing as a “universal curve fitting algorithm”, in
   the sense that you give it a set of points and it spits out a formula.
   The main reason is that given any finite set of points, there are an
   infinite number of possible curves that can be drawn through them
   exactly. Some of these are fairly well behaved outside the range of the
   input points, and some of them exhibit bizarre behavior.

2. There’s no such thing as a curve-fitting problem in a total vacuum,
   without any context whatsoever. In other words, a request to fit a curve
   to a set of points is meaningless without knowledge of how you will use
   that fit.

3. Many of the “naive” techniques, like polynomial regression, behave
   very badly. By “behave very badly”, I mean “much worse than plotting the
   points in Excel and making a wild guess.”
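
The first point can be made concrete with a small Ruby sketch. It is only an illustration, not code from anyone in the thread: the helper names `parabola` and `wiggly` and the scale factor `k` are made up for the example. It builds a second curve by adding to the parabola a term that is zero at every sample x, so both curves pass through all 21 points but differ everywhere else.

```ruby
# The 21 sample points from the start of the thread: y = 100 - x^2 for x = -10..10.
xs = (-10..10).to_a
ys = xs.map { |x| 100 - x**2 }

# The "obvious" curve through the data.
def parabola(x)
  100 - x**2
end

# A second curve (hypothetical, for illustration): the parabola plus a term
# that is zero at every sample x, so it passes through exactly the same points.
# The scale factor k is an arbitrary choice.
def wiggly(x, xs, k = 1.0e-12)
  parabola(x) + k * xs.reduce(1.0) { |product, xi| product * (x - xi) }
end

# Both curves reproduce every data point ...
both_fit = xs.zip(ys).all? { |x, y| parabola(x) == y && wiggly(x, xs) == y }
puts "both curves pass through all #{xs.size} points: #{both_fit}"

# ... yet they disagree between and beyond the samples.
[9.5, 10.5, 12.0].each do |x|
  puts format("x = %5.1f  parabola = %10.2f  wiggly = %14.2f",
              x, parabola(x), wiggly(x, xs))
end
```

Any other value of `k` gives yet another curve through the same points, which is why the data alone cannot pick out a formula.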

Given all that, I don’t see much hope of finding or writing a Ruby
program to do this. Conversely, if the original poster has a model that
the points should have come from, based on some theory, writing Ruby
code (or R or Python code) to fit the data to the model is easy. Just
about every reasonable fitting algorithm is already coded in R, so if
you just want an answer without all that pesky learning stuff, it’s
probably easier to do it in R.

philippeqc · December 17, 2007, 3:41am

Phil R. wrote:

I think GNUPlot requires the knowledge of the type of fn you are trying
to fit - I want the software to TELL me what sort of fn it is eg for the
data:

You can access the “R” statistical package via Ruby, which seems to have
curve fitting capabilities. But this would involve learning R, which
I’ve found to have a steeper curve than I cared to climb.

More generally, I think for most scientific purposes, it’s a good idea
to have an idea of the type of curve (power, quadratic, cubic etc) that
might underlie the observed data. Most applications enforce this. If you
know the general form of the equation that’s being fitted (eg ax^2 + bx
+ c for a quadratic), it would be possible to get estimates for a, b and
c by using eg ordinary least squares to find a solution that minimises
the difference between observed and predicted values. How you get to
that with acceptable time is an algorithmic question…
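
As a rough illustration of what “minimises the difference between observed and predicted values” means, here is a minimal Ruby sketch of the least-squares objective for the quadratic model. The method name `sum_squared_residuals` and the candidate parameter sets are assumptions chosen for the example; it only evaluates the objective and does not search for the minimum.

```ruby
# The 21 sample points from the start of the thread.
xs = (-10..10).to_a
ys = xs.map { |x| 100 - x**2 }

# Sum of squared differences between each observed y and the value
# predicted by the model a*x^2 + b*x + c.
def sum_squared_residuals(xs, ys, a, b, c)
  xs.zip(ys).sum { |x, y| (y - (a * x**2 + b * x + c))**2 }
end

# A few hand-picked candidates; least squares prefers the smallest value.
candidates = {
  "a=-1, b=0, c=100" => [-1, 0, 100],
  "a=-1, b=0, c=99"  => [-1, 0, 99],
  "a=-1, b=1, c=100" => [-1, 1, 100]
}

candidates.each do |label, (a, b, c)|
  puts "#{label}: SSE = #{sum_squared_residuals(xs, ys, a, b, c)}"
end
```

A fitting routine is then anything that finds the a, b and c making this quantity as small as possible.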

But nothing I know of exactly in Ruby. You might take a look at the
source code for the website you linked to:

Maybe a fun one for a Ruby Q. sometime?

alex

philippeqc · December 17, 2007, 7:17am

On Dec 16, 7:07 pm, Phil R. [email protected] wrote:

-4 84
7 51

Thanks,

Phil.

Isn’t the result zunzun spit out “wrong”? The data is most simply
described as an inverted parabola:

y = a x^2 + c

This points exactly to the problem others have mentioned. Given a
large enough space of functions, you can fit practically anything, but
what does it mean? Generally speaking, no one has any business
fitting 21 data points with 4 parameters.
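
For what it is worth, the two-parameter model above can be fitted in closed form, because y = a x^2 + c is a straight line in u = x^2. The following Ruby sketch is only an illustration under that assumption; the method name `fit_even_parabola` is made up, and Rational arithmetic is used so the result is exact for this integer data.

```ruby
# The 21 sample points from the start of the thread.
xs = (-10..10).to_a
ys = xs.map { |x| 100 - x**2 }

# Least-squares fit of y = a*x^2 + c, done as simple linear regression
# of y on u = x^2. Returns [a, c] as Rationals.
def fit_even_parabola(xs, ys)
  us = xs.map { |x| x**2 }
  n = xs.size
  mean_u = us.sum.to_r / n
  mean_y = ys.sum.to_r / n
  covariance = us.zip(ys).sum { |u, y| (u - mean_u) * (y - mean_y) }
  variance = us.sum { |u| (u - mean_u)**2 }
  a = covariance / variance
  [a, mean_y - a * mean_u]
end

a, c = fit_even_parabola(xs, ys)
puts format("y = %.3f * x^2 + %.3f", a.to_f, c.to_f)
```

Two parameters are enough to reproduce this data, which is the point being made about the four-parameter result.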

JM

philippeqc · December 17, 2007, 4:51am

Alex,

On Mon, 2007-12-17 at 11:40 +0900, Alex F. wrote:

-9 19

source code for the website you linked to:
CVS Info for project pythonequations

Maybe a fun one for a Ruby Q. sometime?

I installed grace and that does pretty much what you said, which is not
quite what I want but interesting . .

Yeh, I did have a look at the PythonEquations stuff but it looks too
tough for me to translate/make use of - I would certainly be happy if
someone wanted to make it a Ruby Q.!

Regards,

Phil.


philippeqc · December 17, 2007, 3:57pm

-------- Original Message --------

Date: Mon, 17 Dec 2007 15:31:03 +0900
From: Phil R. [email protected]
To: [email protected]
Subject: Re: Curve fitting to data

data:
-1 99
10 0

fitting 21 data points with 4 parameters.

Dear Phil,

it is of course preferable to have some idea about the underlying
relationship between data graphed, such as

y = ax^2 + bx + c,

and then fit that model. This can be done by solving a linear
system,

Matrix([x_0^2,x_0,1],…,[x_n^2,x_n,1])*([a,b,c]^transpose)=[y_0,…,y_n]^transpose

(numbering the data points as ((x_0,y_0),…,(x_n,y_n)), with
[…] indicating rows in the matrix or row vectors),

as the model is linear in the parameters a, b, c. With more data points
than parameters the system is overdetermined, so it is solved in the
least-squares sense.
You can do that with any software that solves linear or matrix
equations, e.g., rsruby or rb-gsl.
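
As a sketch of the same idea without rsruby or rb-gsl, the matrix equation can be solved in the least-squares sense with Ruby's own `matrix` library via the normal equations (X^T X)[a,b,c]^T = X^T y. This is one possible approach, not necessarily what those libraries do internally; the variable names are assumptions, and for badly conditioned data a QR-based solver would be a safer choice than inverting X^T X.

```ruby
require 'matrix'

# The 21 sample points from the start of the thread.
xs = (-10..10).to_a
ys = xs.map { |x| 100 - x**2 }

# Design matrix: one row [x^2, x, 1] per data point.
design = Matrix.rows(xs.map { |x| [x**2, x, 1] })
target = Vector.elements(ys)

# Normal equations: (X^T X) * [a, b, c]^T = X^T * y.
gram = design.transpose * design
rhs = design.transpose * target

# With integer input, Matrix#inverse works in exact rational arithmetic.
a, b, c = (gram.inverse * rhs).to_a

puts format("y = %.3f*x^2 + %.3f*x + %.3f", a.to_f, b.to_f, c.to_f)
```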

It is of course also true that one can basically draw arbitrary
curves to connect data points, if you don’t know that a model
like the above is “true”.

Now, one additional line of thought is pursued in the discipline
of “approximation theory” (see, e.g., Wikipedia, or for a deeper
insight,
Approximation Theory and Methods - M. J. D. Powell - Google Books).

Here, one starts with a set of points, such as yours, and asks:

Given a distance measure between the data and the curve (“norm”) and a
set of admissible model curves (e.g., all continuous curves on an
interval),
which curves will minimize that norm ?

There are indeed some results available, such as Chebyshev or
Remez(Remes) approximation procedures.
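
To make the “distance measure” concrete, here is a minimal Ruby sketch that evaluates two common norms of the residuals for a couple of candidate curves: the maximum (uniform) norm, which is the one minimax methods such as Remez target, and the Euclidean norm behind least squares. The method names and the cosine candidate are assumptions picked for the example; the sketch only measures given curves and does not search for the minimiser.

```ruby
# The 21 sample points from the start of the thread.
xs = (-10..10).to_a
ys = xs.map { |x| 100 - x**2 }

# Maximum (uniform) norm: the largest absolute residual.
def max_norm(xs, ys, curve)
  xs.zip(ys).map { |x, y| (y - curve.call(x)).abs }.max
end

# Euclidean norm: square root of the sum of squared residuals.
def euclidean_norm(xs, ys, curve)
  Math.sqrt(xs.zip(ys).sum { |x, y| (y - curve.call(x))**2 })
end

# Two candidate model curves (the second is a hypothetical alternative).
curves = {
  "parabola 100 - x^2"           => ->(x) { 100 - x**2 },
  "cosine arch 100*cos(pi*x/20)" => ->(x) { 100 * Math.cos(Math::PI * x / 20) }
}

curves.each do |name, curve|
  puts format("%-30s max norm = %8.3f  euclidean norm = %8.3f",
              name, max_norm(xs, ys, curve), euclidean_norm(xs, ys, curve))
end
```

Which curve counts as “best” depends on the norm chosen and on the set of curves allowed, which is exactly the question posed above.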

This kind of procedure can be recommended when the functional
relationship of your data is rather complicated or not enormously
interesting, or you distrust simple models; when you know something
about the general wiggliness of the underlying curve (see the Jackson
theorems in Powell’s book); and when you need information about what
you would have measured at some point you didn’t actually measure,
and the result should be not too far off …

Best regards,

Axel

philippeqc · December 17, 2007, 7:32am

JM,

On Mon, 2007-12-17 at 15:10 +0900, [email protected] wrote:

-6 64
5 75
I would like to be able to do this myself with my own (preferably Ruby)

This points exactly to the problem others have mentioned. Given a
large enough space of functions, you can fit practically anything, but
what does it mean? Generally speaking, no one has any business
fitting 21 data points with 4 parameters.

Sorry, my cut and paste had a typo - I left out the “^”, the formula
should have been:

     y = a( atan(x) ) + b( x^2 ) + c( sinh(x) ) + offset

which is closer to the parabola (and the zunzun display on the screen
seemed perfect) but yes, you are correct.

Thanks,

Phil.


philippeqc · December 18, 2007, 12:26pm

A quadratic indeed fits this data very well. The
zunzun.com function finder was run with the option
to try every possible function, rather than limit
to simple curves only. Limiting the search to simple
curves can be done on the site in two ways:

1. Use the function finder “Smoothness Control” to
   only allow functions with a few coefficients.
2. Use the function finder “Equation Family Inclusion”
   and disallow the polyfunctionals - these generate
   many thousands of basically random functions to test.

James Phillips
http://zunzun.com
2548 Vera Cruz Drive
Birmingham, AL 35235 USA
zunzun at zunzun.com