Curve Fitting Explained

A common statistical problem is to describe the relationship between two measurements that are not linearly related (that is, their relationship is not a straight line).

When such a relationship can be defined mathematically (e.g. y = x²), the variables can be transformed using the programs in the Numerical Transformation Program Page, so that the relatively simple linear relationship can still be used.

Often, however, a curved relationship appears regular and consistent, but no mathematical definition of that relationship is available. An empirical "best fit" algorithm, such as the polynomial curve fitting in the Curve Fitting Program Page, is then required.

The polynomial curve fit uses the formula y = a + b₁x + b₂x² + b₃x³ + b₄x⁴ + … . Each additional power allows the curve one more bend, so a combination of enough terms can produce a curve of almost any complexity. In bio-social science, however, curve fitting beyond the third power is seldom necessary or meaningful.

Curve fitting can easily be accomplished using multiple regression, as described in the Multiple Regression Explained Page: the single x variable is transformed into x², x³, x⁴ and so on, and the combination is subjected to multiple regression analysis.
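To make the multiple regression route concrete, here is a minimal Python sketch, assuming ordinary least squares solved through the normal equations. The function name `polyfit_ls` is hypothetical and is not part of the Curve Fitting Program; it builds the predictors 1, x, x², ... from the single x variable and solves for the coefficients. It is run on the example data used later on this page.

```python
def polyfit_ls(xs, ys, power):
    """Return [a, b1, ..., b_power] minimising sum((y - yhat)^2)."""
    k = power + 1
    # Design matrix rows: [1, x, x^2, ..., x^power]
    rows = [[x ** j for j in range(k)] for x in xs]
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = aug[i][k] - sum(aug[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = s / aug[i][i]
    return coef

xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
ys = [10, 11, 18, 22, 20, 30, 19, 31, 30, 45, 40, 60]
print([round(c, 4) for c in polyfit_ls(xs, ys, 3)])
# → [-7.0, 23.5317, -6.4881, 0.6944]
```

Solving the normal equations directly is adequate for the low powers used here; for higher powers, a least squares solver that avoids forming XᵀX (such as numpy.linalg.lstsq) is numerically safer.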

Curve fitting has been used successfully in laboratories to define the relationship between the result of a test (e.g. the depth of a color reaction) and the amount of a chemical (e.g. sugar) present.

The problem with curve fitting, when more than the mean values of the fit are required, is the difficulty of assigning a variance and a confidence interval to the fitted curve. Least squares statistics are seldom useful here, as each coefficient has its own variance, and these are difficult to combine. An even more difficult issue is that, for many biological measurements, the variance increases with the scale of measurement, so that the confidence interval around y widens as the x value increases.

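Altman (see reference) described a two-stage procedure that solves this problem. In the first stage, the standard curve fit for the mean value is carried out. In the second stage, the difference between the y of each data point and the mean y from the curve fit (the residual) is obtained, and its absolute value is used in another curve fit. As the mean absolute residual of a normal distribution is SD × √(2/π), the fitted absolute residuals are multiplied by √(π/2) ≈ 1.2533 to estimate the standard deviation, so that a confidence interval that varies with x can be defined.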
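The second stage can be sketched in Python for the simplest case, an SD that changes linearly with x (SD power 1). This is an illustrative reconstruction, not the program's source code: it assumes the stage-1 mean coefficients are already known (here, those reported for the example data in the Mean regression line table), regresses the absolute residuals on x, and applies the √(π/2) scaling. The helper names `poly` and `fit_sd_line` are hypothetical.

```python
import math

# Stage-1 mean curve for the example data (assumed known): a, b1, b2, b3
MEAN_COEF = [-7.0, 23.5317, -6.4881, 0.6944]

def poly(coef, x):
    """Evaluate coef[0] + coef[1]*x + coef[2]*x**2 + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coef):
        result = result * x + c
    return result

def fit_sd_line(xs, ys, mean_coef):
    """Stage 2: straight-line fit of scaled absolute residuals on x."""
    # For normal data E|y - mean| = SD * sqrt(2/pi), so scale by sqrt(pi/2)
    scale = math.sqrt(math.pi / 2)
    scaled = [abs(y - poly(mean_coef, x)) * scale for x, y in zip(xs, ys)]
    n = len(xs)
    mx = sum(xs) / n
    ms = sum(scaled) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxs = sum((x - mx) * (s - ms) for x, s in zip(xs, scaled))
    slope = sxs / sxx
    return ms - slope * mx, slope

xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
ys = [10, 11, 18, 22, 20, 30, 19, 31, 30, 45, 40, 60]
intercept, slope = fit_sd_line(xs, ys, MEAN_COEF)
print(round(intercept, 4), round(slope, 4))  # -1.6711 2.3276
```

With the example data this reproduces the Standard Deviation coefficients in the program's output, which suggests the program applies the same √(π/2) scaling.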

The program in the Curve Fitting Program Page uses Altman's algorithm, and it can be used as follows.

The data are entered as two columns separated by spaces or tabs, with each data point in a row. Col 1 is the independent (x) variable, and col 2 the dependent (y) variable.
The power used to curve fit the mean can be defined: 1 for a straight line, 2 for a curve with one hump, 3 for a curve with two humps, and so on. The power is capped at 5, as curves fitted beyond that are seldom meaningful in the bio-social sciences.
The power used to curve fit the standard deviation around the mean can also be defined. Unless there is a good theoretical reason, 0 (a constant SD) or 1 (an SD that changes linearly with x) is usually sufficient. The power is capped at 3.
The percentage confidence interval required by the user. The 95% confidence interval is the most commonly used, but the program allows users to change this to any percentage (such as 90% or 99%).
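The input conventions above can be sketched in Python as follows. These are hypothetical helpers, not the program's code: `parse_xy` assumes there is no header row (a line such as "x y" would be rejected), and the lower bounds in `check_options` (mean power at least 1, SD power at least 0) are assumptions based on the descriptions above.

```python
def parse_xy(text):
    """Parse two whitespace-separated columns: x in col 1, y in col 2."""
    xs, ys = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # any run of spaces or tabs separates columns
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(fields)}")
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys

def check_options(mean_power, sd_power, ci_percent):
    """Enforce the caps described above (lower bounds are assumptions)."""
    if not 1 <= mean_power <= 5:
        raise ValueError("power for the mean must be 1 to 5")
    if not 0 <= sd_power <= 3:
        raise ValueError("power for the SD must be 0 to 3")
    if not 0 < ci_percent < 100:
        raise ValueError("confidence percentage must lie between 0 and 100")

xs, ys = parse_xy("1\t10\n1 11\n2\t18\n")
check_options(mean_power=3, sd_power=1, ci_percent=95)
print(xs, ys)
```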

x	y
1	10
1	11
2	18
2	22
3	20
3	30
4	19
4	31
5	30
5	45
6	40
6	60

The example data in the table above, taken from the Curve Fitting Program Page, are computer generated so that x and y have a curved relationship and the variance of y increases with x.

We will fit the mean y value to the power of 3, and the standard deviation to the power of 1. We will require the program to draw the 95% confidence interval of the curve over the range of values.

The results are as follows.

Mean regression line

	Coeff
Cons	-7
x₁	23.5317
x₂	-6.4881
x₃	0.6944

Standard Deviation

	Coeff
Cons	-1.6711
x₁	2.3276

The output is shown above. The first table is the curve for the mean value, and here y = -7 + 23.53x - 6.49x² + 0.69x³. This is followed by the regression line for the standard deviation, SD = -1.67 + 2.33x, which defines the standard deviation around the curve-fitted mean for any x value.
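A short Python sketch of using the two fitted equations: it evaluates the mean curve and the SD line at any x with Horner's rule. The coefficient lists are copied from the two output tables; `poly` is a hypothetical helper name.

```python
# Coefficients copied from the output tables, lowest power first.
MEAN_COEF = [-7.0, 23.5317, -6.4881, 0.6944]   # y  = a + b1 x + b2 x^2 + b3 x^3
SD_COEF = [-1.6711, 2.3276]                     # SD = c + d x

def poly(coef, x):
    """Evaluate coef[0] + coef[1]*x + coef[2]*x**2 + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coef):
        result = result * x + c
    return result

for x in range(1, 7):
    print(x, round(poly(MEAN_COEF, x), 4), round(poly(SD_COEF, x), 4))
```

Because the printed coefficients are rounded to four decimals, the mean at larger x differs slightly from the program's own values (about 50.61 rather than 50.619 at x = 6). Note also that this SD line crosses zero near x ≈ 0.72, so it should not be extrapolated below the range of the data.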

Combining the two formulae gives the two equations that can be used to draw the 95% confidence interval lines.

From the first table, the mean curve is y = -7 + 23.5317x - 6.4881x² + 0.6944x³

95%CI lines

	Low	High
Cons	-3.7247	-10.2753
x₁	18.9697	28.0938
x₂	-6.4881	-6.4881
x₃	0.6944	0.6944

From the second table, the standard deviation around the mean curve is SD = -1.6711 + 2.3276x

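The 95% confidence interval is mean ± 1.96 SD, so by combining the two fitted lines we can obtain the upper and lower 95% CI lines, as shown in the 95%CI lines table. Because the SD line has only a constant and an x term, only the constant and x₁ coefficients differ from those of the mean curve. The lines are as follows.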

The lower line: y = -3.7247 + 18.9697x - 6.4881x² + 0.6944x³
The upper line: y = -10.2753 + 28.0938x - 6.4881x² + 0.6944x³

Please note that the coefficients for the CI lines will be different if a confidence level other than 95% is used (such as 90% or 99%), as the multiplier 1.96 is then replaced by 1.645 or 2.576 respectively.
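The CI-line coefficients for any confidence level can be sketched in Python as below, assuming (as the text does) that the interval is mean ± z × SD with z taken from the standard normal distribution. `ci_line_coefficients` is a hypothetical helper; `statistics.NormalDist` is from the Python standard library (3.8+).

```python
from statistics import NormalDist

# Coefficients from the output tables (lowest power first).
MEAN_COEF = [-7.0, 23.5317, -6.4881, 0.6944]
SD_COEF = [-1.6711, 2.3276]

def ci_line_coefficients(mean_coef, sd_coef, percent=95.0):
    """Return (lower, upper) coefficient lists for mean -/+ z * SD."""
    # Two-sided multiplier: 95 -> 1.96, 90 -> 1.645, 99 -> 2.576
    z = NormalDist().inv_cdf(0.5 + percent / 200.0)
    n = max(len(mean_coef), len(sd_coef))
    # Pad the shorter list with zeros so the powers line up term by term
    m = list(mean_coef) + [0.0] * (n - len(mean_coef))
    s = list(sd_coef) + [0.0] * (n - len(sd_coef))
    lower = [a - z * b for a, b in zip(m, s)]
    upper = [a + z * b for a, b in zip(m, s)]
    return lower, upper

for pct in (90, 95, 99):
    lower, upper = ci_line_coefficients(MEAN_COEF, SD_COEF, pct)
    print(pct, [round(c, 4) for c in lower], [round(c, 4) for c in upper])
```

At 95% this recovers the constant and x₁ coefficients of the 95%CI lines table to within rounding.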

Data points

X	Y	y_x	sd	z	Percentile
1	10	10.7381	0.6565	-1.1243	13.04
1	11	10.7381	0.6565	0.3989	65.50
2	18	19.6667	2.9841	-0.5585	28.82
2	22	19.6667	2.9841	0.7819	78.29
3	20	23.9524	5.3117	-0.7441	22.84
3	30	23.9524	5.3117	1.1386	87.26
4	19	27.7619	7.6392	-1.147	12.57
4	31	27.7619	7.6392	0.4239	66.42
5	30	35.2619	9.9668	-0.5279	29.88
5	45	35.2619	9.9668	0.9771	83.57
6	40	50.619	12.2944	-0.8637	19.39
6	60	50.619	12.2944	0.763	77.73

The data points and their deviations from the mean line are then presented, as in the Data points table. The abbreviations are:

X and Y are the original x and y values of the data point
y_x is the curve-fitted mean y at the x value X
sd is the standard deviation of y at the x value X
z = (Y - y_x)/sd, and represents the difference between Y and its curve fitted value y_x in standard deviation units.
Percentile is a transformation of z into probability percentile, assuming a normal distribution.
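The z and Percentile columns can be sketched in Python as follows, assuming normality as the text states. The coefficient lists are copied from the output tables; `poly` and `z_and_percentile` are hypothetical helper names. Because the coefficients are rounded, results may differ from the table in the third or fourth decimal.

```python
from statistics import NormalDist

MEAN_COEF = [-7.0, 23.5317, -6.4881, 0.6944]   # from the Mean regression line table
SD_COEF = [-1.6711, 2.3276]                     # from the Standard Deviation table

def poly(coef, x):
    """Evaluate a polynomial with coefficients listed lowest power first."""
    result = 0.0
    for c in reversed(coef):
        result = result * x + c
    return result

def z_and_percentile(x, y):
    """Return (z, percentile) of an observation y at x, assuming normality."""
    y_x = poly(MEAN_COEF, x)
    sd = poly(SD_COEF, x)
    if sd <= 0:
        raise ValueError("fitted SD is not positive at this x")
    z = (y - y_x) / sd
    return z, 100.0 * NormalDist().cdf(z)

for x, y in [(1, 10), (1, 11), (6, 60)]:
    z, pct = z_and_percentile(x, y)
    print(x, y, round(z, 4), round(pct, 2))
```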

These coefficients can now be used to transform any x value into a y value, using the polynomial transformation utility in the Numerical Transformation Program Page.

Finally, the curve fit bitmap is displayed, showing the original data points (black circles) and the three curves: the best-fit mean, the upper confidence limit and the lower confidence limit (in this example, the 95% confidence interval).

Altman DG (1993) Constructing age-related reference centiles using absolute residuals. Statistics in Medicine 12(10):917-924