Introduction
Linear
Curve
Miscellaneous
All mathematics involves the transformation of one set of numbers to another set, but the transformations provided by StatTools are those commonly used in clinical statistics.
Data Input :
- The data is a single column of numbers to be transformed
- Where additional parameters are necessary, they are input either as values in text boxes, or by selecting alternatives with action buttons.
- Example data and parameters are available for each transformation procedure to guide the user. Specific requirements and pitfalls
for each procedure are described in the following panels.
Result output : One or more of the following are provided after transformation
- A table comparing the following parameters from the original and transformed data
- Mean and Standard Deviation
- Skewness and Kurtosis
- Chi Square (with 2 degrees of freedom) to test the combination of skewness and kurtosis against the null hypothesis
that the data is normally distributed. The critical value of chi square = 5.99 corresponds to a Type I Error α = 0.05,
so that higher values indicate a significant departure from the normal distribution
- A correlation coefficient ρ between the values and the corresponding theoretical values from a normal distribution. This provides
a numerical representation of how close the data is to the normal distribution.
- A table listing all values before and after transformation
- Two plots, for data before and after transformation, to allow visual inspection of the distribution of the data
- The data is converted into Standard Deviation (z) values, where z = (v-mean) / Standard Deviation
- The values are then grouped into bins at one Standard Deviation intervals
- The number of values in each bin is represented as a percent of the total number
- The final plot is the percent of the total in each Standard Deviation bin
- An x/y scatterplot, with x = before and y = after transformation, to demonstrate the shape of the relationship
produced by the transformation.
- Please note that the MacroPlot Program Page provides programs to produce publication quality plots,
should the user wish to replicate some of the plots produced by the Numerical Transformation Program Page.
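The statistics in the comparison table and the one Standard Deviation bins can be sketched in Python as below. This is a minimal sketch under assumptions, not the StatTools code: the helper names are hypothetical, skewness and kurtosis are the moment-based estimates (kurtosis as excess kurtosis, so 0 for a normal distribution), the chi square is the Jarque-Bera combination n/6 × (skewness² + kurtosis²/4) compared against 5.99, ρ correlates the sorted data with Blom normal scores, and each value is binned by floor(z).

```python
import math
from collections import Counter
from statistics import NormalDist, mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def normality_summary(values):
    """Hypothetical helper: mean, SD, skewness, kurtosis, chi square (2 df) and rho."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1 denominator)
    # Central moments with an n denominator, used for the shape statistics
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0  # excess kurtosis
    # Jarque-Bera style combination, approximately chi square with 2 df
    chi2 = n / 6.0 * (skew ** 2 + kurt ** 2 / 4.0)
    # Correlate sorted data with expected normal scores (Blom plotting positions)
    nd = NormalDist()
    scores = [nd.inv_cdf((i + 1 - 0.375) / (n + 0.25)) for i in range(n)]
    rho = pearson(sorted(values), scores)
    return {"mean": m, "sd": sd, "skew": skew, "kurt": kurt,
            "chi2": chi2, "significant": chi2 > 5.99, "rho": rho}


def percent_per_sd_interval(values):
    """Percent of values in each one-SD interval, keyed by floor(z)."""
    m, sd = mean(values), stdev(values)
    counts = Counter(math.floor((v - m) / sd) for v in values)
    return {k: 100.0 * c / len(values) for k, c in sorted(counts.items())}


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(normality_summary(data))
print(percent_per_sd_interval(data))  # {-2: 20.0, -1: 20.0, 0: 40.0, 1: 20.0}
```

For the five evenly spaced values the skewness is 0 and the excess kurtosis is −1.3, giving a chi square well below 5.99.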
Transformation to a new minimum (min) and maximum (max), as nominated by the user, uses the following formula
$V_{new} = (V_{old} - min_{old}) / (max_{old} - min_{old}) \times (max_{new} - min_{new}) + min_{new}$
Transformation to a new mean (mean) and Standard Deviation (sd), as nominated by the user, uses the following formula
$V_{new} = (V_{old} - mean_{old}) / sd_{old} \times sd_{new} + mean_{new}$
Transforming to a new minimum/maximum or a new mean/Standard Deviation provides a new numerical scale, but does not
change the relationship between the values.
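Both linear rescalings can be sketched in Python as below. This is a minimal sketch; the function names are hypothetical, and the sample Standard Deviation (n − 1 denominator) is an assumption, as the text does not say which form StatTools uses.

```python
from statistics import mean, stdev


def rescale_min_max(values, new_min, new_max):
    """Map values linearly so that the old min/max become new_min/new_max."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("all values are equal; the old range is zero")
    scale = (new_max - new_min) / (old_max - old_min)
    return [(v - old_min) * scale + new_min for v in values]


def rescale_mean_sd(values, new_mean, new_sd):
    """Map values linearly so that they have the nominated mean and SD."""
    old_mean, old_sd = mean(values), stdev(values)
    return [(v - old_mean) / old_sd * new_sd + new_mean for v in values]


print(rescale_min_max([2, 4, 6, 10], 0, 100))  # [0.0, 25.0, 50.0, 100.0]
print(rescale_mean_sd([2, 4, 6, 10], 50, 10))
```

Because both are straight-line mappings, the order and relative spacing of the values are unchanged.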
Ranking is frequently used in non-parametric statistical calculations, and in many instances it is used to constrain
outliers and bring the distribution closer to the normal distribution. The values are ranked in order, and tied
ranks are averaged. The ranks can be in ascending or descending order.
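Ranking with averaged ties can be sketched in Python as below; `rank_values` is a hypothetical helper name, not a StatTools function.

```python
def rank_values(values, descending=False):
    """Rank values (1 = first in the chosen order); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


print(rank_values([10, 20, 20, 30]))                   # [1.0, 2.5, 2.5, 4.0]
print(rank_values([10, 20, 20, 30], descending=True))  # [4.0, 2.5, 2.5, 1.0]
```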
Introduction
Logarithm
Power
Box Cox
Poisson Related
Proportion or Probability Related
Polynomial
Transformation may produce values which have a curvilinear relationship with the original. This is commonly required in
the handling of data in the biomedical domain.
A common reason for a curvilinear transformation is to follow the natural relationship between two measurements. For example :
- When using a color reaction to detect the presence of a chemical, the relationship between the depth of color and
the concentration of the substance being measured may be a curve and not a straight line. The relationship can be
established using curve fitting, as explained in the Curve Fitting Explained Page. Once established, the color value can
be transformed into the concentration value.
- A biological reaction may respond in a curvilinear fashion to the dosage of a stimulant. For example, uterine muscle contractility
responds to oxytocin stimulation in a logarithmic (log dosage) manner, so a linear relationship can only be demonstrated if
the log value of the oxytocin administration rate is used.
Another reason for a curvilinear transformation is for measurements with natural closed-ended constraints. For example, proportions
have a range of 0 to 1, Poisson distributed counts have a minimum value of 0, and ratios have values exceeding zero (0).
When the range of measurements in a particular analysis is distant from the natural constraints, such as a proportion around 0.5 or counts exceeding 30,
the measurements can be analysed as if they had an infinite range, as the confidence intervals would not impinge upon or overlap the
constraint values.
However, when the range is close to the natural constraints, such as a proportion <0.05 or >0.95, or a count close to 0,
the confidence intervals calculated may impinge upon or even overlap the constraint value, making any reasonable
By far the most common reason for a curvilinear transformation is to convert measurements that are not normally distributed into
ones that are, so that the powerful and user friendly parametric statistical procedures can be used for analysis. Four
types of distribution are common in biomedical analysis and difficult to handle. These are :
- Ratios have only positive values, and their variance ((Standard Deviation)²) increases with the value. Simple
ratio measurements such as weight and height have a log normal distribution, in that a simple logarithmic transformation will produce
a measurement that is normally distributed.
- Proportions or probabilities have a range of 0 to 1. Their variance decreases as they approach these extreme
values, so that the confidence interval is only symmetrical when the value is 0.5. If a proportion is to be used as a measurement,
it must first be transformed.
- Poisson distributed counts have a minimum value of 0, and their variance is both asymmetrical and increases with the value.
Counts must be analysed using the difficult statistical procedures associated with the Poisson distribution, or
transformed before being handled as parametric data.
- Mathematically complex distributions, such as the geometric and exponential distributions, require specialised procedures
and expertise to analyse. Unfortunately, a very common measurement in the biomedical domain, time to events, belongs to such
a complex distribution, and transformation is a common method of overcoming the difficulties of analysis.
The other panels present the different types of curvilinear transformation available in StatTools.
Natural logarithm ($y = \log_e(x)$, $x = e^y$) is commonly used in preparing data for parametric statistical analysis.
This transformation should be used for any measurement that is a ratio, when there is a positive skew in the data (long
tail on the right hand side). In fact, many biological measurements that are not initially normally distributed
are rendered normally distributed by a logarithmic transformation.
Values for logarithmic transformation must be >0.
Natural antilogarithm or exponential transformation ($y = \text{antilog}_e(x) = \exp(x) = e^x$) is the reverse of the natural
logarithm. This transformation is used at the end of an analysis using log transformed values,
to convert the results back to the original units of measurement.
Values for exponential transformation can be positive or negative. However, very large values may exceed
the limit of calculation and crash the program.
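The natural log and antilog, including both limits mentioned above, can be sketched in Python as below. The function names are hypothetical; in Python the "limit of calculation" shows up as an `OverflowError` from `math.exp` once the result exceeds the largest floating point number.

```python
import math


def log_transform(values):
    """Natural logarithm y = log_e(x); every value must be > 0."""
    bad = [v for v in values if v <= 0]
    if bad:
        raise ValueError(f"log transformation needs values > 0, got {bad}")
    return [math.log(v) for v in values]


def exp_transform(values):
    """Natural antilogarithm y = e**x; accepts positive and negative values.

    math.exp raises OverflowError when the result exceeds the largest float
    (x above roughly 709.78).
    """
    return [math.exp(v) for v in values]


logged = log_transform([1.0, 2.5, 100.0])
print(exp_transform(logged))  # back to approximately [1.0, 2.5, 100.0]
```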
Logarithm with a nominated base produces results similar to the natural logarithm, except that numerically they
are scaled differently. Based logarithm, especially base 10, is favoured by clinicians as the results are
intuitively easier to interpret (1 = 10, 2 = 100, 3 = 1000, etc.). The formula is
$y = \log_{base}(x) = \log_e(x) / \log_e(base)$
Values for based logarithmic transformation must be >0.
Antilogarithm with a nominated base is used at the end of an analysis that used based logarithmic transformed values,
to reverse transform the results to the original measurement units. The formula is
$y = \text{antilog}_{base}(x) = \exp(x \log_e(base))$
Values for based antilog transformation can be positive or negative. However, very large values may exceed
the limit of calculation and crash the program.
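The two formulae can be sketched in Python as below, using only the natural log and exponential exactly as written above. The function names and the default base of 10 are illustrative assumptions.

```python
import math


def log_base(values, base=10.0):
    """Based logarithm y = log_e(x) / log_e(base); values must be > 0."""
    if base <= 0 or base == 1:
        raise ValueError("base must be > 0 and not equal to 1")
    if any(v <= 0 for v in values):
        raise ValueError("based log transformation needs values > 0")
    return [math.log(v) / math.log(base) for v in values]


def antilog_base(values, base=10.0):
    """Based antilogarithm y = exp(x * log_e(base))."""
    return [math.exp(v * math.log(base)) for v in values]


logs = log_base([10.0, 100.0, 1000.0])
print(logs)                # approximately [1.0, 2.0, 3.0]
print(antilog_base(logs))  # approximately [10.0, 100.0, 1000.0]
```

Floating point rounding means the results are approximate (for example, log_e(1000) / log_e(10) is not exactly 3 in double precision).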
Power transformation ($y = x^{power}$) is a flexible and powerful tool to change the distribution of a set of data.
A power value <1 uses roots, so tends to move the skew to the left (shortening a long right tail), while a power value >1 moves the skew to the right.
The range of positive and negative power values produces different curved relationships between the original (x) and
transformed (y) values.
The square root transformation ($y = x^{0.5}$) is a special case of the power transformation, and it is sometimes used in data
that have a geometric distribution. A typical case is the measurement of discrete periods, such as the number of operations between
complications or the number of minutes of waiting time, especially if these periods are close to 1.
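A power transformation can be sketched in Python as below. This is a minimal sketch; the function name is hypothetical, and requiring all values to be >0 is a simplifying assumption so that roots and negative powers are always defined.

```python
def power_transform(values, power):
    """Power transformation y = x ** power (power = 0.5 gives the square root).

    This sketch requires every value to be > 0, so that roots and
    negative powers are always defined.
    """
    if any(v <= 0 for v in values):
        raise ValueError("this sketch needs values > 0")
    return [v ** power for v in values]


print(power_transform([1, 4, 9, 16], 0.5))  # [1.0, 2.0, 3.0, 4.0]
print(power_transform([1, 2, 4], -1))       # [1.0, 0.5, 0.25]
```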
The Box Cox transformation was originally devised to transform time to events into a normally distributed measurement for analysis.
The arguments surrounding its use are as follows.
- A very common problem in statistics is the measurement of counts of occurrences, as these have a Poisson distribution,
and are mathematically
complex to handle. Calculations often involve iterative approximations, and computer memory and time are often exhausted
when large values are encountered. Examples of Poisson distributed counts are the number of asthma attacks per 100 child months,
the number of adverse events per 100 bed days in a hospital, and so on. Lambda (λ) is used to represent Poisson
distributed counts.
- One way of handling the difficult mathematics is to use the inverse (1/λ). The question then becomes the number of child months
between asthma attacks, the number of bed days between adverse events, and so on. As these measurements usually
involve time, they are labelled time to event measurements.
- Time to event measurements have an exponential distribution, which is also difficult to handle. However, the measurement
is continuous and interval. When the sample size is sufficiently large and the range of measurements in a study is not close to 1,
the data can be handled as approximately normally distributed. Examples of time to event measurements that are handled as
normally distributed are maternal age, gestational age (term babies), and days on a waiting list.
- In many cases, a logarithmic transformation or a square root transformation renders the data indistinguishable from the normal
distribution. Examples of easily transformed data are gestational age (live births) and the duration of the first stage of labour.
- However, on occasions, when the data set is small and the range of measurements is close to 1, the data remains significantly
different from the normal distribution after the usual remedies. The Box Cox transformation is designed to deal with
this contingency.
The Box Cox transformation is a special case of the power transformation, based on a value lambda (λ). The formulae are
Forward transformation : $y = \log_e(x)$ if $\lambda = 0$, otherwise $y = (x^{\lambda} - 1) / \lambda$
Reverse transformation : $y = \exp(x)$ if $\lambda = 0$, otherwise $y = (\lambda x + 1)^{1/\lambda}$
The idea is that the data should be transformed using λ to become suitable for parametric analysis. At the end of the analysis,
the results are reverse transformed (using the same λ) to the original measurement units.
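The forward and reverse formulae can be sketched in Python as below. The function names and the example λ = 0.25 are hypothetical; the guard in the reverse function rejects values for which λx + 1 ≤ 0, which the forward transformation of positive data can never produce.

```python
import math


def box_cox(values, lam):
    """Forward Box Cox: log_e(x) if lam == 0, otherwise (x**lam - 1) / lam.

    Values must be > 0.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box Cox needs values > 0")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def box_cox_reverse(values, lam):
    """Reverse Box Cox: exp(y) if lam == 0, otherwise (lam * y + 1) ** (1 / lam)."""
    if lam == 0:
        return [math.exp(y) for y in values]
    out = []
    for y in values:
        base = lam * y + 1
        if base <= 0:
            # Outside the range the forward transformation can produce for this lam
            raise ValueError(f"lam * y + 1 must be > 0, got {base}")
        out.append(base ** (1 / lam))
    return out


lam = 0.25  # hypothetical value, e.g. taken from comparable published work
transformed = box_cox([2.0, 5.0, 30.0], lam)
print(box_cox_reverse(transformed, lam))  # approximately [2.0, 5.0, 30.0]
```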
This leaves the question of which λ to use. In many cases, the correct λ can be found in similar work published,
as comparison of results requires that the same transformation is applied.
When no suitable existing λ value is available, the value that best transforms the data to a normal distribution
should be used. The following criteria are used to determine how close a transformed data set is to the normal distribution.
- The Standard Deviation : the transformation producing the smallest Standard Deviation is closest to the normal distribution.
- The skewness of the distribution : a skewness value closest to zero (0) is closest to the normal distribution.
- The kurtosis of the distribution : a kurtosis value closest to zero (0) is closest to the normal distribution.
- The chi square (with degrees of freedom = 2) summarizes the combined skewness and kurtosis; the smaller the better.
A chi square value < 5.99 means the probability of Type I Error is greater than 0.05 (α > 0.05), so the data is not significantly
different from the normal distribution.
- The correlation coefficient ρ comparing the data against a set of theoretically normally distributed values. A ρ
value closest to 1 is closest to the normal distribution.
Depending on the nature of the data to be transformed, the optimal λ under one criterion does not necessarily produce optimal results for the other criteria. The program on the Numerical Transformation Program Page allows the user to choose the criterion
that is most suitable for their purposes.
Published work seems to cite using the smallest Standard Deviation more than other criteria, as the most important statistical
reason for transformation seems to be to stabilize variance, which means a small Standard Deviation.
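Choosing λ by a criterion can be sketched in Python as a grid search. This is a minimal sketch under assumptions, not the StatTools program: the grid (−3 to 3 in steps of 0.1), the helper names, and the use of moment-based skewness and a Jarque-Bera style chi square are illustrative choices. The Standard Deviation criterion is left out because (x^λ − 1)/λ changes scale as λ changes, so raw Standard Deviations from different λ values are not directly comparable.

```python
import math


def box_cox(values, lam):
    """Forward Box Cox transformation (values must be > 0)."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def shape(values):
    """Moment-based skewness and excess kurtosis."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def chi2_2df(values):
    """Jarque-Bera style chi square (2 df) from skewness and kurtosis."""
    s, k = shape(values)
    return len(values) / 6.0 * (s ** 2 + k ** 2 / 4.0)


CRITERIA = {
    "abs_skew": lambda t: abs(shape(t)[0]),  # skewness closest to 0
    "chi2": chi2_2df,                        # smallest chi square
}


def best_lambda(values, criterion="abs_skew", lo=-3.0, hi=3.0, step=0.1):
    """Return the lambda on the grid that minimises the chosen criterion."""
    score = CRITERIA[criterion]
    n_steps = round((hi - lo) / step)
    grid = [round(lo + i * step, 10) for i in range(n_steps + 1)]
    return min(grid, key=lambda lam: score(box_cox(values, lam)))


data = [1, 2, 4, 8, 16, 32, 64]  # geometric series: its log is evenly spaced
print(best_lambda(data))  # lambda = 0, i.e. the log transformation
```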
Please Note : The author came across the following transformations while researching the Box Cox transformation, and thought these may be of interest or useful to some users. The author has no experience in using these transformations.
When dealing with Poisson distributed counts, researchers may use the inverse of a Poisson count (1/λ) to analyse data,
as this is a continuous measurement that can be transformed into a normal distribution.
The impetus for developing a direct transformation for Poisson distributed counts comes from radioactive scanning,
particularly Positron Emission Tomography (PET), where the image received consists of pixels, each one of which is a count
of emissions. Given that the Poisson distribution is asymmetrical on the two sides of the mean, with a very long right tail,
there is a need to transform counts into a normally distributed measurement, so that de-noising algorithms can be applied.
The Freeman-Tukey Transformation is a variant of the square root transformation, with an adjustment for
low values. The formula is $y = \sqrt{x+1} + \sqrt{x}$. With high count values, x+1 is approximately the same as x, so the
transformation is effectively $2\sqrt{x}$.
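The Freeman-Tukey formula, and its convergence towards 2√x as the count grows, can be sketched in Python as below; `freeman_tukey` is a hypothetical helper name.

```python
import math


def freeman_tukey(counts):
    """Freeman-Tukey transformation y = sqrt(x + 1) + sqrt(x), for counts x >= 0."""
    if any(x < 0 for x in counts):
        raise ValueError("counts must be >= 0")
    return [math.sqrt(x + 1) + math.sqrt(x) for x in counts]


# Compare with 2 * sqrt(x): the two converge as the count increases
for x in (0, 1, 10, 1000):
    print(x, freeman_tukey([x])[0], 2 * math.sqrt(x))
```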
The Anscombe Transformation is also a variant of the square root transformation, where $y = 2\sqrt{x + 3/8}$. As the
value of x increases, the 3/8 addition becomes increasingly trivial, so that, with very large counts, the transformation
is approximately $2\sqrt{x}$.
The interesting part of Anscombe is the reverse of the transformation, as the asymmetrical and unstable variance returns
when the results are reverse transformed into counts. Three variants of the reverse transformation are therefore offered.
- The Algebraic reverse transformation uses the reverse of the forward formula, so that the original values can be
regained. The formula is $y = x^2/4 - 3/8$
- The Asymptotically Unbiased reverse transformation reduces the bias caused by the asymmetry of the variance,
particularly when the counts are high. It is less effective in removing the bias when the count is low. The formula is
$y = x^2/4 - 1/8$
- The Exact reverse transformation, which adjusts for the count value and remains unbiased regardless of count value,
is proposed as the optimal solution. The formula is
$y = \frac{x^2}{4} + \frac{1}{4}\sqrt{\frac{3}{2}}\,x^{-1} - \frac{11}{8}x^{-2} + \frac{5}{8}\sqrt{\frac{3}{2}}\,x^{-3} - \frac{1}{8}$
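The forward Anscombe transformation and the three reverse variants can be sketched in Python as below. The function names and the `method` labels are hypothetical; the "exact" branch follows the formula above, whose negative powers of x require the transformed value to be >0.

```python
import math

SQRT_3_2 = math.sqrt(3.0 / 2.0)


def anscombe(counts):
    """Forward Anscombe transformation y = 2 * sqrt(x + 3/8)."""
    return [2.0 * math.sqrt(x + 3.0 / 8.0) for x in counts]


def anscombe_inverse(values, method="exact"):
    """Reverse Anscombe transformation: 'algebraic', 'asymptotic' or 'exact'."""
    out = []
    for y in values:
        if method == "algebraic":
            out.append(y ** 2 / 4.0 - 3.0 / 8.0)
        elif method == "asymptotic":
            out.append(y ** 2 / 4.0 - 1.0 / 8.0)
        elif method == "exact":
            # Negative powers of y: this branch needs y > 0
            out.append(y ** 2 / 4.0
                       + 0.25 * SQRT_3_2 * y ** -1
                       - 11.0 / 8.0 * y ** -2
                       + 5.0 / 8.0 * SQRT_3_2 * y ** -3
                       - 1.0 / 8.0)
        else:
            raise ValueError(f"unknown method: {method}")
    return out


y = anscombe([0, 5, 100])
for method in ("algebraic", "asymptotic", "exact"):
    print(method, anscombe_inverse(y, method))
```

Only the algebraic variant returns the original counts exactly; the other two deliberately shift the result to reduce bias after analysis on the transformed scale.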

Proportions or probabilities are increasingly being used as measurements. Examples are the probability that a baby is genetically
abnormal, or the risk level of complication for a surgical procedure.
Although a proportion is a continuous measurement, it is closed-ended, with a minimum value of zero (0) and a maximum of 1.
It follows the binomial distribution, which is similar to the normal distribution when the sample size is large and the value
near 0.5. As the sample size decreases (so that its variance increases), or as the value approaches the extremes of 0 or 1,
its natural variance, measured as the Standard Deviation, becomes increasingly asymmetrical, as shown in the diagram to the left.
Many statistical procedures treat proportions as if they were normally distributed, and this works reasonably well in most cases.
However, if the sample size is small, or the value close to the extremes of 0 or 1, the confidence interval overlaps
the 0 or 1 value, creating a conceptually impossible situation and producing a flawed conclusion.
There are two commonly used transformations to create normally distributed measurements from proportions or probabilities.
- Arcsine transformation treats the square root of a probability as the sine of an angle, and transforms the value into radians,
so that the range of 0 to 1 becomes 0 to π/2. The formula is $y = \arcsin(\sqrt{x})$ and the reverse is $y = \sin^2(x)$.
For those who prefer to deal with angles in degrees, degree = 180 × (radian / π), and radian = (degree / 180) × π
- Increasingly, however, the Logit transformation and its reverse, the Logistic transformation, are used.
Conceptually, the proportion is transformed to odds,
which is a ratio and has a log normal distribution. The odds is then logarithmically transformed to produce a normally
distributed measurement. The logit transformation removes the bounds of 0 and 1, so that a
proportion of 0 becomes -∞, a proportion of 1 becomes +∞ and a proportion of 0.5 becomes 0
- The forward or logit transformation : $logit = \log(odds) = \log(proportion / (1 - proportion))$; $y = \log(x / (1 - x))$
- The reverse or logistic transformation :
- $odds = \text{antilog}(logit) = e^{logit} = \exp(logit)$
- $proportion = odds / (1 + odds)$
- $Logistic = proportion = odds / (1 + odds) = \exp(logit) / (1 + \exp(logit)) = 1 / (1 + \exp(-logit))$
- Logistic $y = 1 / (1 + \exp(-x))$
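The arcsine, logit and logistic formulae can be sketched in Python as below; the function names are hypothetical. The logistic function uses two algebraically equal branches so that `math.exp` does not overflow for large negative inputs.

```python
import math


def arcsine(p):
    """Arcsine transformation y = arcsin(sqrt(p)), in radians (0 to pi/2)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.asin(math.sqrt(p))


def arcsine_reverse(y):
    """Reverse of the arcsine transformation: p = sin(y) ** 2."""
    return math.sin(y) ** 2


def logit(p):
    """Logit transformation y = log_e(p / (1 - p)); needs 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit needs 0 < p < 1 (0 and 1 map to -inf and +inf)")
    return math.log(p / (1.0 - p))


def logistic(y):
    """Logistic (reverse logit) p = 1 / (1 + exp(-y))."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)  # equal to the form above, but avoids overflow for large negative y
    return e / (1.0 + e)


print(math.degrees(arcsine(0.5)))  # approximately 45 degrees
print(logit(0.5))                  # 0.0
print(logistic(logit(0.2)))        # approximately 0.2
```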
In the bio-medical domain, there are many curvilinear relationships that do not follow a precise mathematical formulation, and these
must be established by empirical observations and comparisons. A common algorithm to do so is polynomial curve fitting, where
$y = a + b_1x + b_2x^2 + b_3x^3 + \dots$ and so forth, although in biology,
polynomial coefficients beyond the third power have very little practical implication, and are therefore seldom used.
Biochemical laboratories use curve fitting extensively in translating observations in a test, such as radiation counts, color depth,
or light absorption, into the concentration of the substances they are measuring.
Curve fitting is also used clinically, often to predict outcome, a common use in obstetrics being to estimate the average
birthweight or an ultrasound measurement from the gestational age.
StatTools provides a curve fitting program in the Curve Fitting Program Page, which allows curve fitting
to the fifth power, far in excess of most requirements. Results from such curve fitting, or formulae obtained from published work,
can be used to transform data accordingly in the Numerical Transformation Program Page.
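Applying fitted polynomial coefficients to transform data can be sketched in Python as below. The function name and the example coefficients are hypothetical; the coefficients are given in the order a, b₁, b₂, b₃, ... and the polynomial is evaluated with Horner's rule.

```python
def polynomial_transform(values, coefficients):
    """y = a + b1*x + b2*x**2 + ... with coefficients = [a, b1, b2, ...].

    Evaluated with Horner's rule, starting from the highest power.
    """
    result = []
    for x in values:
        y = 0.0
        for c in reversed(coefficients):
            y = y * x + c
        result.append(y)
    return result


# Hypothetical coefficients: y = 1 + 2x + 3x^2
print(polynomial_transform([0, 1, 2], [1.0, 2.0, 3.0]))  # [1.0, 6.0, 17.0]
```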
Introduction
Fuzzy Logic
The miscellaneous section is used to contain all those transformations that may be useful, but do not belong to any specific
classification. Although these are few currently, it is expected that more will be added as StatTools continues
to expand in response to user needs.
The concept of fuzzy logic was explored by the department in the early 1990s, when there was an interest in using neural
networks to establish a system of machine based decision making. Because the fuzzy logic transformation is useful, it was retained
and continued to be used as a statistical transformation after the department's interest in machine based diagnosis waned.
Fuzzy logic uses the mathematics of probability to define truth. In other words, it sees true and false merely as extremes,
with reality somewhere in between. Fuzzy logic therefore uses the principle of
logistic transformation to convert any measurement into degrees of certainty with a value between zero (0) and one (1), where
0 is absolute certainty for negative, and 1 absolute certainty for positive.
Fuzzy logic transformation is also commonly used to transform a continuous measurement into a bimodally distributed conclusion.
Thyroxine levels are transformed into probability of thyrotoxicosis, temperature readings into fever, pH into acidosis or
alkalosis, PO₂ into hypoxaemia, and so on.
The basic logistic formula is $p = 1 / (1 + \exp(-v))$, where v is a value ranging from -∞ to +∞, and p is a probability
ranging from 0 to 1. There is no difficulty in interpreting probability (p) as a level of certainty, but v
ranges from -∞ to +∞, so the main issue in fuzzy logic is how to scale the measurement value x into a convenient
parameter v to be used in the logistic equation. The algorithm is as follows
- A low value ($v_{low}$) to represent negative and a high value ($v_{high}$) to represent positive are defined.
- A low probability value ($p_{low}$) to make a negative decision and a high probability value ($p_{high}$) to make a positive decision
are nominated. These p values should be the same distance on either side of the neutral p value of 0.5.
Commonly, 0.05 is used for $p_{low}$ and 0.95 for $p_{high}$.
- The midpoint measurement value, where no decision can be made (p = 0.5), is the average of the high and low values,
$m = (v_{low} + v_{high}) / 2$
- The distance from the midpoint at which the positive decision is to be made is $d_+ = v_{high} - m$
- This distance in value is then scaled to a logit value, so that $d_{logit} = d_+ / logit(p_{high})$
- All values (x) to be transformed are then calculated as $y = logistic((x - m) / d_{logit})$, or $z = (x - m) / d_{logit}$,
then $y = 1 / (1 + e^{-z})$
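The algorithm above can be sketched in Python as below. This is a minimal sketch, not the StatTools program; `fuzzy_transform` and its parameter names are hypothetical, and p_low is taken as 1 − p_high because the text requires the two to be symmetric about 0.5.

```python
import math


def logit(p):
    """log_e(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def fuzzy_transform(values, v_low, v_high, p_high=0.95):
    """Map measurements to degrees of certainty between 0 and 1.

    v_low  : measurement value given p = 1 - p_high (negative decision)
    v_high : measurement value given p = p_high (positive decision)
    """
    if not 0.5 < p_high < 1.0:
        raise ValueError("p_high must lie between 0.5 and 1")
    if v_low == v_high:
        raise ValueError("v_low and v_high must differ")
    m = (v_low + v_high) / 2.0        # midpoint, where p = 0.5
    d_plus = v_high - m               # distance from midpoint to the positive decision
    d_logit = d_plus / logit(p_high)  # measurement units per logit unit
    out = []
    for x in values:
        z = (x - m) / d_logit
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


# Cord blood pH example: v_high = 7.20 (acidosis), v_low = 7.35 (no acidosis)
ph = [7.10, 7.20, 7.25, 7.275, 7.35, 7.40]
print([round(p, 3) for p in fuzzy_transform(ph, v_low=7.35, v_high=7.20)])
```

Rounded to three decimals, the output agrees with the corresponding rows of the pH table.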
pH | p (acidosis) |
7.100 | 0.999 |
7.125 | 0.997 |
7.150 | 0.993 |
7.175 | 0.981 |
7.200 | 0.950 |
7.225 | 0.877 |
7.250 | 0.727 |
7.275 | 0.500 |
7.300 | 0.273 |
7.325 | 0.123 |
7.350 | 0.050 |
7.375 | 0.019 |
7.400 | 0.007 |
An example is as follows.
We wish to have an indicator of the health of the newborn by measuring its umbilical cord blood pH, and
set the following criteria
- We use p = 0.05 to decide an unlikely or negative diagnosis ($p_{low} = 0.05$) and p = 0.95 to decide a likely or positive
diagnosis ($p_{high} = 0.95$). From this, the logit for a positive decision is $logit(0.95) = \log_e(0.95 / (1 - 0.95)) = 2.9444$
- We decide that a pH < 7.2 constitutes acidosis ($v_{high} = 7.2$), and a pH > 7.35 should be considered normal
or no acidosis ($v_{low} = 7.35$). Please note that low and high refer to the probability of acidosis, not to the value of the measurement.
- Midpoint value $m = (v_{low} + v_{high}) / 2 = (7.35 + 7.20) / 2 = 7.275$
- The distance (in value) between the positive decision and the midpoint is $d_+ = 7.20 - 7.275 = -0.075$
- This difference is scaled to logit, so that $d_{logit} = -0.075 / 2.9444 = -0.0255$
- Transformation of pH to acidosis is then $p_{acidosis} = logistic((pH - 7.275) / -0.0255)$, or
$z = (pH - 7.275) / -0.0255$, then $p_{acidosis} = 1 / (1 + e^{-z})$
$p_{low}$, $p_{high}$ | $logit(p_{high})$ |
0.50,0.50 | 0.000000 |
0.40,0.60 | 0.405465 |
0.30,0.70 | 0.847298 |
0.20,0.80 | 1.386294 |
0.15,0.85 | 1.734601 |
0.10,0.90 | 2.197225 |
0.05,0.95 | 2.944439 |
0.01,0.99 | 4.595120 |
The table to the left shows how pH values from 7.1 to 7.4 are transformed into a probability of acidosis from 0 to 1, with p = 0.05 when pH is 7.35,
and p = 0.95 when pH is 7.20. The plot of the relationship is shown in the diagram above and to the right. It can be seen that a linear
measurement is transformed into a bimodal distribution: the probability of acidosis is low (close to or below 0.05) when pH is 7.35 or more, and
high (close to or above 0.95) when pH is 7.20 or less. The range of uncertainty is between 7.20 and 7.35.
By changing the values $v_{low}$ and $v_{high}$, users can define ranges of values that can be interpreted as
positive, negative, and varying levels of uncertainty. The program in the Numerical Transformation Program Page allows these
parameters and the data to be input by the user, but fixes $p_{low}$ at 0.05 and $p_{high}$ at 0.95.
The table to the right provides logit values for different p values, which users can use if they require decision
levels other than p = (0.05, 0.95).
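The logit column of that table can be regenerated with a few lines of Python. This is a minimal sketch; `logit_table` is a hypothetical helper name, and each p_high is taken as 1 − p_low.

```python
import math


def logit_table(p_lows):
    """Rows of (p_low, p_high, logit(p_high)) for symmetric decision levels."""
    rows = []
    for p_low in p_lows:
        p_high = 1.0 - p_low
        rows.append((p_low, p_high, math.log(p_high / (1.0 - p_high))))
    return rows


for p_low, p_high, lg in logit_table([0.50, 0.40, 0.30, 0.20, 0.15, 0.10, 0.05, 0.01]):
    print(f"{p_low:.2f},{p_high:.2f} | {lg:.6f} |")
```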