Related links:
Discriminant Analysis (Analysis Using Reference Data) Program Page
Discriminant Analysis (Use of Coefficients on New Data) Program Page
Discriminant Analysis (Plotting Discriminant Map) Program Page
Explanation
Analysis Using Reference Data
Use Model on other data sets
Plotting Discriminant Map
References
Introduction
The Discriminant Analysis programs in StatTools provide only the basic calculations that create and use the Discriminant model, and this explanation page discusses only those features that support the use of this basic set of programs. Users requiring more detail can follow the trail from the references provided. Those wishing to acquire an in-depth understanding of the subject should attend the appropriate courses, which usually require a one-semester study at the Masters level in a tertiary institution.
Mathematically, Discriminant Analysis has some similarity to Multiple Regression. Both are based on the Least Squares method, and both assume that the data are parametric (Normally distributed measurements).
- In Multiple Regression, the independent variables are binary or at least ordinal, and the dependent variable is a parametric measurement.
- In Discriminant Analysis, the independent variables are parametric, while the dependent variable represents groups that are not related to each other.
The use of Discriminant Analysis also differs from that of Multiple Regression.
- Multiple Regression is used to model how a particular measurement is affected by, and in many cases can be predicted from, its associated independent variables.
- Discriminant Analysis is usually used to classify individuals and separate them into pre-conceived groups.
Organization of Programs
All Discriminant Analysis procedures offered by StatTools are placed into 3 program pages according to how they may be used. These are briefly described here, and explored in greater detail in their own panels on this page.
The Discriminant Analysis (Analysis Using Reference Data) Program Page is provided for the initial analysis using a set of reference data. If successful, this produces a model by which future individuals can be correctly classified. This is used by researchers wishing to develop a clinical classification system.
The Discriminant Analysis (Use of Coefficients on New Data) Program Page is provided to apply the statistical results obtained from the reference data to future and independent sets of data. This is used by clinicians who, having accepted the validity of the model already produced, classify new cases as they present.
The Discriminant Analysis (Plotting Discriminant Map) Program Page is provided for the research assistant or secretary, to produce a graphic representation of the results produced by the two previous programs.
Technical Issues
The programs from StatTools are assembled from information available in the public domain (see references), and the results were tested against those produced by SPSS. Although the results are numerically the same as those from SPSS (other than small rounding errors), StatTools differs in the following ways:
- Many safety checks (e.g. whether the independent variables are truly parametric) and many intermediary results (e.g. covariance tables) are not presented by StatTools.
- StatTools uses the correlation matrix and not the covariance matrix. This means all measurements are converted to z values (z = (value - mean) / SD), so that each variable has mean = 0 and SD = 1 before calculation. The coefficients thus created are called the Standardized Discriminant Coefficients.
- The means and Standard Deviations for each variable used in the Discriminant Analysis (Analysis Using Reference Data) Program Page are calculated from the data.
- The means and Standard Deviations for each variable used in the Discriminant Analysis (Use of Coefficients on New Data) Program Page should be from the reference data that produced the coefficients. Users do not always adhere to this, and use either the data being processed or some estimate of the population means and Standard Deviations.
- The results are produced to 4 decimal places by default. This is usually more than necessary, as most reports of Discriminant Functions and probabilities use 2 decimal places.
- The function values, being created from normalized values, are also normalized (mean = 0 and SD = 1), and are not related to the units of measurement used in the input data.
- SPSS produces the probability estimates using the Maximum Likelihood method. This is also presented by StatTools, with the addition of the Bayesian model that incorporates the apriori probability and a loss function.
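The z conversion described in the list above can be sketched as follows. This is a minimal illustration and not StatTools' own code: the function names are hypothetical, the tannin values are copied from StatTools' default example data, the two new measurements are invented, and the use of the sample Standard Deviation (n-1 denominator) is an assumption.

```python
from statistics import mean, stdev

# Tannin (g/l) for the 16 reference wines in the default example data
reference_tannin = [1.2, 1.3, 1.1, 1.6, 1.5, 1.5, 1.7, 1.6,
                    1.1, 1.0, 0.9, 1.2, 1.4, 1.3, 1.1, 1.4]

def reference_stats(values):
    """Mean and SD of one variable in the reference data.
    Assumption: sample SD (n-1 denominator)."""
    return mean(values), stdev(values)

def to_z(values, ref_mean, ref_sd):
    """z = (value - mean) / SD, using the reference mean and SD."""
    return [(v - ref_mean) / ref_sd for v in values]

ref_mean, ref_sd = reference_stats(reference_tannin)
z_reference = to_z(reference_tannin, ref_mean, ref_sd)

# Hypothetical new cases: standardized with the REFERENCE statistics,
# not with the mean and SD of the new data themselves
new_tannin = [1.0, 1.8]
z_new = to_z(new_tannin, ref_mean, ref_sd)
print(z_new)
```

Passing the reference mean and SD explicitly keeps new cases on the same scale as the data that produced the coefficients.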
This panel discusses the procedure provided by the Discriminant Analysis (Analysis Using Reference Data) Program Page. The purpose of the procedure is to analyse a set of reference information to produce a classification model based on Discriminant Analysis.
Reference Data Set
The reference data set must be carefully selected, so that it is representative of the patterns observed in the community. Retrospective data are often used, but they have the disadvantage of not being representative of the future population to which the model will be applied. Another difficulty is that the sample sizes in different groups may not be similar, so that the model is dominated by patterns in the group with the larger sample size.
After selecting the representative samples, attention has to be paid to what to measure. Care should be taken not to select independent variables that are too closely correlated, as they represent a tautology: the same concept is measured twice and has an unwarranted influence on the final model.
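One way to screen for closely correlated variables is to inspect the pairwise correlation coefficients before running the analysis. The sketch below is an illustration only: the four columns are copied from StatTools' default example data, while the function name and the cut-off of 0.9 are assumptions chosen for demonstration, not a rule used by StatTools.

```python
from itertools import combinations
from math import sqrt

# Columns of the default example data (16 wines)
data = {
    "tannin":  [1.2, 1.3, 1.1, 1.6, 1.5, 1.5, 1.7, 1.6,
                1.1, 1.0, 0.9, 1.2, 1.4, 1.3, 1.1, 1.4],
    "color":   [45, 67, 48, 36, 47, 74, 47, 56,
                27, 53, 37, 23, 44, 34, 37, 55],
    "acidity": [3.16, 3.38, 3.61, 3.51, 3.20, 3.21, 3.39, 3.36,
                3.30, 3.55, 3.23, 3.07, 3.34, 3.24, 3.24, 3.35],
    "sugar":   [72.7, 102.4, 33.7, 58.2, 44.2, 91.8, 53.1, 88.5,
                36.3, 74.7, 94.2, 53.8, 20.7, 9.5, 17.8, 35.9],
}

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

correlations = {(a, b): pearson_r(data[a], data[b])
                for a, b in combinations(data, 2)}

THRESHOLD = 0.9  # arbitrary cut-off, an assumption for this sketch
too_close = [pair for pair, r in correlations.items() if abs(r) > THRESHOLD]

for (a, b), r in correlations.items():
    print(f"{a:8s} {b:8s} r = {r:+.2f}")
print("Pairs above threshold:", too_close)
```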
Default Example Data offered by StatTools
This is a computer-generated set of numbers produced to demonstrate how the program works, and it does not represent any real observation.
Wine | Group | tannin | color | acidity | sugar |
Sweet Red | 1 | 1.2 | 45 | 3.16 | 72.7 |
Sweet Red | 1 | 1.3 | 67 | 3.38 | 102.4 |
Sweet Red | 1 | 1.1 | 48 | 3.61 | 33.7 |
Sweet Red | 1 | 1.6 | 36 | 3.51 | 58.2 |
Dry Red | 2 | 1.5 | 47 | 3.20 | 44.2 |
Dry Red | 2 | 1.5 | 74 | 3.21 | 91.8 |
Dry Red | 2 | 1.7 | 47 | 3.39 | 53.1 |
Dry Red | 2 | 1.6 | 56 | 3.36 | 88.5 |
Sweet White | 3 | 1.1 | 27 | 3.30 | 36.3 |
Sweet White | 3 | 1.0 | 53 | 3.55 | 74.7 |
Sweet White | 3 | 0.9 | 37 | 3.23 | 94.2 |
Sweet White | 3 | 1.2 | 23 | 3.07 | 53.8 |
Dry White | 4 | 1.4 | 44 | 3.34 | 20.7 |
Dry White | 4 | 1.3 | 34 | 3.24 | 9.5 |
Dry White | 4 | 1.1 | 37 | 3.24 | 17.8 |
Dry White | 4 | 1.4 | 55 | 3.35 | 35.9 |
We wish to set up a model to classify wine into 4 categories, Sweet Red (1), Dry Red (2), Sweet White (3), and Dry White (4), using 4 measurements: tannin (g/l), color absorption (%), acidity (pH), and sugar concentration (mg/100ml). We selected 4 of each of these wines to build the model. Please note that a real exercise would use many more samples per group, and make many more measurements. This simplified data set is used to make understanding easier. The numerical part of this table is the default example data of the Discriminant Analysis (Analysis Using Reference Data) Program Page.
On clicking the Calculate button, the following results are produced:
Table of the data as entered. This is for error checking by the user, to make sure the intended data have been entered without error.
Table of mean and Standard Deviation for each variable, calculated using all the cases in the data set. This table is used to convert the data entered into standardized z values, z = (value - mean) / Standard Deviation.
Table of mean and Standard Deviation for every combination of group and variables, using z values. This allows the researcher to scan how the groups differ and contribute to the model
Table of Chi Square for the statistical significance of each function produced. The maximum number of functions is (number of groups - 1), or the number of variables if that is smaller. In this study it is 4-1=3, and the functions are presented in decreasing order of contribution (importance). In many cases, when the groups can be easily separated, as in this set of data, some of the functions are not statistically significant (p>0.05) and can be discarded (function 3 in this case).
Table of Standardized Canonical Discriminant Coefficients (cf). These are the values used to calculate the Function Scores for each case, where the Function Score is $FS = \sum_{\text{all variables}} z \times cf$. The distances between the Function Scores and the Centroids will ultimately be used to classify each case.
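The Function Score calculation above can be sketched as follows. The z values and coefficients here are invented for illustration and are not the output of StatTools for the default data; the function name is also hypothetical.

```python
def function_scores(z_values, coefficients):
    """Return one Function Score per function.

    z_values:     standardized values of one case, one per variable
    coefficients: one row per function, each row holding one
                  standardized coefficient (cf) per variable
    """
    return [sum(z * cf for z, cf in zip(z_values, row))
            for row in coefficients]

# Hypothetical case with 4 standardized measurements
z_case = [0.5, -1.0, 0.2, 1.5]

# Hypothetical coefficients for 2 functions x 4 variables
coefficients = [
    [0.8, 0.1, -0.3, 0.4],   # function 1
    [-0.2, 0.9, 0.5, 0.1],   # function 2
]

scores = function_scores(z_case, coefficients)
print([round(s, 4) for s in scores])
```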
Table of Centroid values. The Centroid is the mean Function Score for each function in each group, and represents the central value of that group. The distance between the Function Score of each individual and the Centroid indicates how closely the individual belongs to that group.
The 3 tables of Mean and Standard Deviation, Standardized Discriminant Coefficients, and Centroids represent the model produced by the Discriminant Analysis, and are used to allocate cases to groups.
Checking the results of analysis. The same data are used to check against the analysis produced. This is presented in 2 tables, because a single table would be too wide for a web page if there are multiple functions and groups.
Column 1 is the row number, used to identify the individuals in the data set.
The Function Scores are the scores of each function for each individual. Only 2 functions are statistically significant in this exercise, and only these are used.
The $D^2$→Centroid is the sum of squares of the differences between each Function Score and the Centroid value of the group, where $D^2 = \sum_{\text{all functions}} (\text{Function Score} - \text{Centroid Value})^2$. This represents the square of the distance between the individual and the centroid of each group. The individual is then allocated to the group where $D^2$ is the smallest, shown in bold font.
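The allocation by smallest squared distance can be sketched as follows. The Function Scores and Centroid values are hypothetical numbers chosen for illustration, not results from the default data, and the function name is an assumption.

```python
def squared_distances(scores, centroids):
    """D^2 from one case to the Centroid of every group.

    scores:    Function Scores of the case, one per function
    centroids: {group: [centroid value per function]}
    """
    return {group: sum((s - c) ** 2 for s, c in zip(scores, values))
            for group, values in centroids.items()}

# Hypothetical Centroids for 4 groups on 2 functions
centroids = {
    1: [2.0, 1.0],
    2: [1.0, -1.5],
    3: [-1.5, 1.0],
    4: [-1.5, -0.5],
}

scores = [0.84, -0.75]           # hypothetical Function Scores of one case
d2 = squared_distances(scores, centroids)
allocated = min(d2, key=d2.get)  # group with the smallest D^2
print(d2, "-> group", allocated)
```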
The Normal Density Probability→Group is an alternative expression of $D^2$ and represents the probability of that individual belonging to that group, assuming the Function Scores to be Normally distributed around the Centroid: $\text{Normal Density Probability} = \exp(-D^2/2)$. The individual is then allocated to the group with the highest probability, which should be the same group that had the lowest $D^2$. The reason for calculating the Normal Density Probability is that it is the value used to calculate the subsequent Maximum Likelihood Probabilities and, for future data, the Bayesian Probabilities.
The Maximum Likelihood Probability is an adjustment for the overlaps in the Normal Density Probabilities of the groups. Given the overlap, the sum of the Normal Density Probabilities may be >1, so $\text{Maximum Likelihood Probability}_{group} = \text{Normal Density Probability}_{group} / \sum_{\text{all groups}} \text{Normal Density Probability}$. The individual is allocated to the group with the highest Maximum Likelihood Probability. As this again is a transformation, the allocation should be the same as before.
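The two transformations above can be sketched as follows. The $D^2$ values are hypothetical and the function names are assumptions; only the two formulas come from the text.

```python
import math

def normal_density(d2):
    """Normal Density Probability = exp(-D^2 / 2) for each group."""
    return {group: math.exp(-value / 2) for group, value in d2.items()}

def maximum_likelihood(density):
    """Rescale the densities so that they sum to 1 across groups."""
    total = sum(density.values())
    return {group: p / total for group, p in density.items()}

# Hypothetical D^2 values of one case to the Centroids of 4 groups
d2 = {1: 4.4081, 2: 0.5881, 3: 8.5381, 4: 5.5381}

density = normal_density(d2)
ml = maximum_likelihood(density)

for group in d2:
    print(group, round(density[group], 4), round(ml[group], 4))
```

Because both steps preserve the ordering of the groups, the group with the smallest $D^2$ also has the highest Normal Density and Maximum Likelihood Probabilities.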
How an individual is allocated by Bayesian Probability, taking the apriori probability and the cost function into consideration, is discussed in the next panel.
This panel discusses the procedures involved in, and the results obtained from, the Discriminant Analysis (Use of Coefficients on New Data) Program Page.
The purpose of the program is to take a Discriminant model that has already been validated and apply it to a new set of data, allocating its cases to the groups already identified.
Data Input
- The 3 tables that represent the Discriminant model: the Mean and Standard Deviation for all variables, the Standardized Discriminant Coefficients, and the Centroid Values
- The data to be allocated, a table where the rows are individuals and the columns are the variables. It is meant for new independent sets of data, but for the exercise in StatTools the same data table (without the first column designating groups) is used, so that users can compare the results with those from the Discriminant Analysis (Analysis Using Reference Data) Program Page
- In addition, the program calculates Bayesian probabilities and requires the following additional parameters
  - The apriori probability for each group. The Maximum Likelihood Probability is calculated on the assumption that all groups are equally likely, but this may not be true when the model is used on an independent set of data. The groups may differ in how common they are in the population, a probability may already have been calculated from other data, or the user may wish to assign different probabilities to the groups before using the Discriminant model.
    An array of apriori probabilities, one for each group, is entered. The convention is to use probabilities (numbers between 0 and 1), but the program can handle any measurement of probability (percent, actual number of cases), as the values are converted into probabilities that sum to 1.
    If the apriori probabilities are unavailable or irrelevant, then an array of identical numbers should be entered.
  - The cost function is a measurement of the cost or disadvantage if a case is erroneously not allocated to that group.
    An array of costs, one for each group, is entered. Any measurement of cost can be used (number of dollars, death or complication rate), but the same unit of measurement must be used across the groups. The program then normalizes all the values into fractions that total 1.
    If costs are unavailable or irrelevant, then an array of identical numbers should be entered.
Results
The same set of results as described in the previous panel is produced. For each individual these include the Function Scores, the $D^2$→Centroid, the Normal Density Probabilities, and the Maximum Likelihood Probabilities. Explanations for these are provided in the previous panel and will not be repeated here.
In addition, the Bayesian Probabilities are calculated.
Bayesian Probability taking the apriori probability into consideration is calculated in two steps for each case:
- The weighted probability for each case in each group is calculated as $p_{group} = \text{Normal Density Probability}_{group} \times \text{Apriori Probability}_{group}$
- The Bayesian Probability for each group is then calculated as $p_{Bayesian} = p_{group} / \sum_{\text{all groups}} p_{group}$
Bayesian Probability taking both the apriori probability and the cost into consideration is calculated in two steps for each case:
- The weighted probability for each case in each group is calculated as $p_{group} = \text{Normal Density Probability}_{group} \times \text{Apriori Probability}_{group} \times \text{Cost}_{group}$
- The Bayesian Probability for each group is then calculated as $p_{Bayesian} = p_{group} / \sum_{\text{all groups}} p_{group}$
The Bayesian adjustments are necessary because group allocation does not occur in isolation. Most users have a preconceived apriori probability, based on an understanding of natural occurrences and the results of other statistical tests, and use Discriminant classification as an adjunct to adjust this apriori probability.
Also, in many cases groups differ in impact and risk, and wrongly failing to allocate a case to some groups has a greater impact than failing to allocate it to others, so this should also be taken into consideration.
Although the Function Scores remain the same, the Bayesian adjustment changes the allocation of marginal cases to different groups.
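The Bayesian calculation described in this panel can be sketched as follows. The Normal Density Probabilities, apriori values, and costs are invented to show a marginal case changing group; the function names are hypothetical, and this is an illustration of the two-step formula rather than StatTools' own code.

```python
def normalize(weights):
    """Convert any non-negative weights into fractions that sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def bayesian_probabilities(density, apriori, costs=None):
    """Bayesian Probability per group (lists are ordered by group).

    density: Normal Density Probability of the case for each group
    apriori: apriori weights (probabilities, percents or counts)
    costs:   optional cost weights; omitted means equal costs
    """
    prior = normalize(apriori)
    cost = normalize(costs) if costs is not None else [1.0] * len(density)
    weighted = [d * p * c for d, p, c in zip(density, prior, cost)]
    return normalize(weighted)

def allocate(probabilities):
    """Group number (1-based) with the highest probability."""
    return probabilities.index(max(probabilities)) + 1

# Hypothetical marginal case: groups 1 and 2 are almost equally close
density = [0.30, 0.35, 0.05, 0.02]

equal = bayesian_probabilities(density, [1, 1, 1, 1])
with_prior = bayesian_probabilities(density, [3, 1, 1, 1])
with_cost = bayesian_probabilities(density, [1, 1, 1, 1], [1, 1, 10, 1])

print(allocate(equal), allocate(with_prior), allocate(with_cost))
```

With equal apriori values and costs the result is the same as the Maximum Likelihood allocation; changing either array moves this marginal case to another group while its Function Scores stay the same.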
This panel supports the graphic program in the Discriminant Analysis (Plotting Discriminant Map) Program Page. The program takes results obtained in the Discriminant Analysis (Analysis Using Reference Data) Program Page or the Discriminant Analysis (Use of Coefficients on New Data) Program Page, and produces a graphic representation of the relationship between individuals and the groups.
The bitmap is a two-dimensional diagram, with each axis representing a Discriminant Function. Discriminant Analysis produces coordinates in a multi-dimensional Euclidean space, but a bitmap can show the relationship between only 2 functions at a time. Therefore, when 2 functions are produced, 1 bitmap is needed (AB). When 3 functions are available, 3 bitmaps are required (AB,AC,BC); when 4 functions are available, 6 bitmaps are required (AB,AC,AD,BC,BD,CD); and so on. The default example uses the results produced by the default data in the Discriminant Analysis (Use of Coefficients on New Data) Program Page, with two Functions, so only one bitmap is needed.
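The number of bitmaps needed is the number of ways of choosing 2 functions from those available. A short sketch, with a hypothetical helper name, that lists the pairs using the same A, B, C labelling as above:

```python
from itertools import combinations

def function_pairs(number_of_functions):
    """List every pair of functions that needs its own bitmap."""
    labels = [chr(ord("A") + i) for i in range(number_of_functions)]
    return ["".join(pair) for pair in combinations(labels, 2)]

for n in (2, 3, 4):
    pairs = function_pairs(n)
    print(n, "functions:", len(pairs), "bitmap(s)", pairs)
```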
Data input consists of
- The table of Centroid values which represents the model
- The table of Function Scores, which is the result produced from the data
- The two arrays of apriori probability and costs.

The results are two bitmaps of the same size (400 x 400 pixels). These can be used separately, or superimposed on each other using PowerPoint, depending on the user's preferences.
The first bitmap is a map of the territories occupied by each group, as shown in the bitmap to the right. This depends on the Centroid value of each group, with the territories adjusted by the Bayesian formulae. To avoid excessive labelling, up to 7 groups can be mapped on the bitmap, each in its own color: Group 1, Group 2, Group 3, Group 4, Group 5, Group 6, Group 7.
This bitmap can be copied and pasted into other applications, and the colors changed according to the user's preference. The bitmap can then be used to visually assign any set of Function Scores to a group.
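The idea behind the territory map can be sketched as a coarse text grid: every point on the two-function plane is given to the group with the largest weighted Normal Density. The Centroids, grid range, and function names below are assumptions for illustration, and the sketch does not reproduce how StatTools draws its bitmap.

```python
import math

def territory(point, centroids, priors, costs):
    """Group owning a point (function 1 score, function 2 score).

    The weight is exp(-D^2/2) x apriori x cost. Dividing by the sum
    over all groups would not change which group is largest.
    """
    best_group, best_weight = None, -1.0
    for group, (cx, cy) in centroids.items():
        d2 = (point[0] - cx) ** 2 + (point[1] - cy) ** 2
        weight = math.exp(-d2 / 2) * priors[group] * costs[group]
        if weight > best_weight:
            best_group, best_weight = group, weight
    return best_group

def territory_map(centroids, priors, costs, lo=-4.0, hi=4.0, size=9):
    """Rows of group numbers, top row = highest function 2 score."""
    step = (hi - lo) / (size - 1)
    return ["".join(str(territory((lo + c * step, hi - r * step),
                                  centroids, priors, costs))
                    for c in range(size))
            for r in range(size)]

# Hypothetical Centroids of 4 groups on 2 functions
centroids = {1: (2.0, 1.0), 2: (1.0, -1.5), 3: (-1.5, 1.0), 4: (-1.5, -0.5)}
equal = {1: 1, 2: 1, 3: 1, 4: 1}

print("\n".join(territory_map(centroids, equal, equal)))

# Changing apriori and costs moves the boundaries between territories
priors = {1: 1, 2: 2, 3: 1, 4: 3}
costs = {1: 1, 2: 1, 3: 5, 4: 1}
print("\n".join(territory_map(centroids, priors, costs)))
```

A point lying between two Centroids can change territory when the apriori or cost arrays change, which is why the map has to be redrawn whenever those arrays are altered.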
The second bitmap is a plot, with the scales of the functions marked at the border, and the Function Scores of each individual linked to the centroid of the group to which the individual is assigned. The plot is accompanied by MacroPlot codes, and users familiar with MacroPlot can alter the bitmap according to their preferences.
Both plots can be copied and pasted into PowerPoint or any other graphic software and further modified. The white background of the plot can be made transparent in Excel, and the plot placed in front of the map, so that the areas occupied by each group, the locations of the Centroids, and the locations of each individual can be visualized together. Further labelling can also be added. The result is shown in the bitmap to the left.
When the apriori and cost functions are changed, both the areas occupied by each group and the group to which each individual is assigned also change. If the apriori is changed from 4 4 4 4 to 1 2 1 3, because the dry wines are more common, and dry whites slightly more common than dry reds, and if the costs are changed to 1 1 5 1, because sweet whites are expensive and are therefore assigned a higher cost than the others, the resulting plot would be as shown to the right. Please note that the areas occupied by each group have now changed, and the wines with overlapping or marginal qualities are now assigned to a different group.
Overall JE and Klett CJ (1972) Applied Multivariate Analysis. McGraw Hill Series in Psychology.
McGraw Hill Book Company, New York. Library of Congress No. 73-147164. ISBN 07-047935-6
- Chapter 2 p.24-56 : Matrix math, particularly the Square Root method of matrix inversion.
Also calculations for between and within group Sum Product and Covariance matrices
- Chapter 10 p.280-306 : Multiple Discriminant Analysis, particularly the algorithm.
- Chapter 13 p.345-371 : Normal Probability Density Model for classification.
- Chapter 14 p.373-383 : Use of Canonical Correlates for Classification. Chapters 13 and 14
provided the algorithms for calculating Maximum Likelihood and the Bayesian Probabilities
Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1989) Numerical Recipes in Pascal.
Cambridge University Press. ISBN 0-521-37516-9 p.395-396 and p.402-404.
Jacobi method for finding Eigen values and Eigen vectors
Norusis MJ (1979) SPSS Statistical Algorithms Release 8. SPSS Inc, Chicago.
Chapter 23 : Discriminant p.69-83. Extensive algorithm provided by SPSS.
George D and Mallery P (1999) SPSS for Windows Step by Step. A Simple Guide and Reference.
Allyn and Bacon, Sydney. ISBN 0-205-28395-0 Chapter 26. The Discriminant Procedure p.313-328.
Wikipedia on Discriminant Analysis.