StatTools : Curve Fitting Explained

Related link :
Multiple Regression Explained Page
Curve Fitting Program Page

A common statistical problem is to describe a relationship between two measurements that are not linearly related (in a straight line).

When such a relationship can be mathematically defined (e.g. y = x^2), the variables can be transformed using programs in the Numerical Transformation Program Page, so that the relatively simple linear relationship is retained.
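As a minimal sketch of this idea (not the Numerical Transformation Program itself), the Python below uses made-up data that follow y = x^2 exactly, transforms x, and then fits an ordinary straight line. The data values and variable names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical data that follow y = x^2 exactly (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 2

# Transform x so that its relationship with y becomes a straight line.
x_transformed = x ** 2

# Straight-line fit of y against the transformed x.
# np.polyfit returns the highest power first: [slope, intercept].
slope, intercept = np.polyfit(x_transformed, y, deg=1)
print("slope:", slope, "intercept:", intercept)
```

After the transformation the fitted slope is close to 1 and the intercept close to 0, so the usual linear regression tools apply to the transformed variable.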

Often, however, a curved relationship may appear regular and consistent, but no mathematical definition of that relationship is available, and an empirical "best fit" algorithm, such as the polynomial curve fitting from the Curve Fitting Program Page, is required.

The polynomial curve fit uses the formula y = a + b1*x + b2*x^2 + b3*x^3 + b4*x^4 + ... As each increase in power bends the relationship into a sharper curve, the combination of all the coefficients will be able to produce a curve of potentially any level of complexity. In bio-social science, however, curve fitting beyond the third power is seldom necessary or meaningful.

Curve fitting can be easily accomplished by using multiple regression, as described in the Multiple Regression Explained Page, where the single x variable is transformed into x^2, x^3, x^4, and so on, and the combination is subjected to multiple regression analysis.
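The multiple regression approach can be sketched in Python as follows. This is an illustration under assumptions, not the code behind the Curve Fitting Program Page: the function name fit_polynomial and the noise-free test data are hypothetical, and the regression is solved with NumPy's general least squares routine.

```python
import numpy as np

def fit_polynomial(x, y, power):
    """Fit y = a + b1*x + ... + b_power*x^power by least squares.

    Returns the coefficients in the order [a, b1, b2, ...].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x^2, ..., x^power: the single x
    # variable is expanded into several "independent variables".
    design = np.vander(x, N=power + 1, increasing=True)
    # Ordinary multiple regression on the expanded columns.
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Hypothetical noise-free data from y = 2 + 0.5x - 0.1x^2 + 0.01x^3.
x = np.arange(0.0, 10.0)
y = 2 + 0.5 * x - 0.1 * x**2 + 0.01 * x**3
coefs = fit_polynomial(x, y, power=3)
print(coefs)
```

Because the example data contain no noise, the recovered coefficients match the generating values up to rounding error; with real data they would be estimates.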

Curve fitting has been used successfully in laboratories to define the relationship between the result of a test (e.g. the depth of a color reaction) and the amount of a chemical (e.g. sugar) present.

The problem with curve fitting, when more than the mean values of the fit are required, is the difficulty of assigning a variance and a confidence interval to the fitted curve. Least squares statistics are seldom useful here, as each of the coefficients has its own variance, and it is difficult to integrate them. An even more difficult issue is that, for many biological measurements, variance increases with the scale of measurement, so that the confidence interval around y widens as the x value increases.

Altman (see reference) described a two-stage procedure that solves this problem. In the first stage, the standard curve fitting for the mean value is carried out. In the second stage, the distance between the y of each data point and the mean y from the curve fit (the residual) is obtained, and its absolute value is used to perform another curve fit, so that a variable confidence interval can be defined.
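The two stages can be sketched in Python as below. This is a hedged illustration, not the program's actual code: the function name two_stage_fit and the simulated data are hypothetical, and two details are assumptions not stated on this page. First, the fitted absolute residual is multiplied by sqrt(pi/2) to convert a mean absolute deviation into a standard deviation, which holds when the residuals are normally distributed. Second, the interval multiplier is taken from the standard normal distribution.

```python
import math
from statistics import NormalDist

import numpy as np
from numpy.polynomial import polynomial as P

def two_stage_fit(x, y, power_mean=2, power_sd=1, percent=95.0):
    """Return (lower, mean, upper) curves evaluated at each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Stage 1: ordinary polynomial fit for the mean of y.
    mean_coefs = P.polyfit(x, y, power_mean)
    fitted_mean = P.polyval(x, mean_coefs)

    # Stage 2: fit a second polynomial to the absolute residuals.
    abs_resid = np.abs(y - fitted_mean)
    sd_coefs = P.polyfit(x, abs_resid, power_sd)
    # Assumption: normal residuals, so SD = mean |residual| * sqrt(pi/2).
    # This sketch does not guard against a fitted SD that dips below zero.
    fitted_sd = P.polyval(x, sd_coefs) * math.sqrt(math.pi / 2)

    # Assumption: two-sided multiplier from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + percent / 200.0)
    return fitted_mean - z * fitted_sd, fitted_mean, fitted_mean + z * fitted_sd

# Hypothetical data whose scatter grows with x.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
y = 3 + 2 * x + rng.normal(0, 0.3 * x)
lower, mean, upper = two_stage_fit(x, y, power_mean=1, power_sd=1)
```

With power_sd=0 the interval has the same width everywhere; with power_sd=1 it can widen or narrow linearly along x, which is what allows the interval to follow a variance that changes with the scale of measurement.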

The program in the Curve Fitting Program Page uses Altman's algorithm, and it can be used as follows.

  • The data is entered as two columns separated by spaces or tabs. Col 1 is the independent (x) variable, and col 2 the dependent (y) variable. Each data point is in a row.
  • The power used to curve fit the mean can be defined: 1 gives a straight line, 2 a curve with one hump, 3 a curve with two humps, and so on. The power is capped at 5, as curves fitted beyond that are seldom meaningful in the biosocial sciences.
  • The power to curve fit the standard deviation around the mean can also be defined. Unless there is a good theoretical reason, 0 or 1 is usually sufficient. The power is capped at 3.
  • The percentage confidence interval required can be defined. The 95% confidence interval is the most commonly used, but the program allows users to change this to any percentage (such as 90% or 99%).
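The inputs listed above could be handled along the following lines. This is only a sketch of how such inputs might be read and checked, with hypothetical function names (parse_two_columns, check_powers, z_for_percent); it is not the program's own code, and the use of a standard normal multiplier for the chosen percentage is an assumption.

```python
from statistics import NormalDist

def parse_two_columns(text):
    """Parse rows of 'x y' separated by spaces or tabs into two lists."""
    xs, ys = [], []
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got {len(fields)}: {line!r}")
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys

def check_powers(power_mean, power_sd):
    """Enforce the caps described above: mean 1 to 5, SD 0 to 3."""
    if not 1 <= power_mean <= 5:
        raise ValueError("power for the mean must be between 1 and 5")
    if not 0 <= power_sd <= 3:
        raise ValueError("power for the standard deviation must be between 0 and 3")

def z_for_percent(percent):
    """Two-sided standard normal multiplier for a percentage interval."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 0 and 100")
    return NormalDist().inv_cdf(0.5 + percent / 200.0)

# Hypothetical input: one data point per row, x then y.
xs, ys = parse_two_columns("1 2.1\n2\t3.9\n3 6.2\n")
check_powers(power_mean=2, power_sd=1)
z95 = z_for_percent(95)  # approximately 1.96
```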