Factor Analysis Exp

StatTools : Factor Analysis Explained

Links : Home Index (Subjects) Contact StatTools

Introduction Basic Exploratory Factor Analysis Advanced Topics References

This page describes the suite of explanations, tables, and programs related to Factor Analysis that are available in StatTools.

Excellent statistical packages for Factor Analysis are widely available, in software packages such as SAS, STATA, SPSS, and LISREL. There are also excellent free packages available for download. (see references). All of these require users to set up options, then perform all the procedures in a single session.

StatTools does not duplicate these excellent packages, and instead provides algorithms performing individual procedures involved in Factor Analysis. This allows users to take intermediate results from their own or published analysis, and by using a different procedure, produce a different set of results.

There are two major types of Factor Analysis, Exploratory and Confirmatory. Confirmatory Factor Analysis is a very large and complex subject, briefly introduced in a subsequent panel on this page but otherwise not covered by StatTools.

The rest of this page, and the associated pages and procedures, concerns Exploratory Factor Analysis, particularly that of Principal Component Analysis.

Overview Component/Factor Extraction Factor Rotation Factor Scores

Exploratory Factor Analysis has no a priori theory or hypothesis, and is sometimes call unsupervised clustering. The variables are clustered according to how they correlated with each other.

Exploratory Factor Analysis is usually carried out for either of two purposes, each with their own sets of assumptions and procedures.

The first is the Principal Factor Analysis, with the intention to explore the relationships between a set of variables.

The Correlation matrix is used, but the values of the diagonal elements are replaced by communalities.
- An initial Principal Component Analysis may be used to obtain the communalities, which are then inserted into the diagonal cells of the correlation matrix
- Alternatively, the diagonal cells are replaced by the largest of the absolute values of the correlation coefficient in the same column or row
The resulting Factor pattern matrix describes the Factors as clusters of variables, and the coefficients the relationship between the variables to each factor.

The second is the Principal Component Analysis, with the intention of data reduction, where a large number of variables are summarised or compressed into a fewer number of Factors.

The Covariance or Correlation matrix can be used, with the diagonal elements unchanged.
- The covariance matrix is used if the intention is to enable the future creating of factor scores directly from measurements of the variables concern.
- The correlation matrix is the same as the covariance matrix, except that all measurements are transformed to that with mean value=0 and Standard Deviation=1. It is more useful to researchers wishing to examine how the variables contribute to the factors, but the creation of subsequent factor scores requires the variables to be firstly converted the z value (z = (value-mean)/Standard Deviation).
Although the resulting Factor pattern matrix can be used to describe the relationship between the variables to each factor, the main purpose of the Principal Component Analysis is to use the Factor Pattern to create Factor Scores for clinical use.

There are numerous methods of extracting factors/components from a correlation or covariance matrix, each with its own assumptions and procedures. Although they all produce approximately the same results, from the theoretical point of view, the results mean different things. StatTools does not cover most of the methods, only mention the Maximum Likelihood method, and describes the Jacobi Method.

The Maximum Likelihood method provides a good estimate of relationship between variables and their factors, and is usually used in Confirmatory Factor Analysis and in the Principal Factor model of Exploratory Factor Analysis.

The Jacobi method is essentially a mathematical rotation of the original matrix into a Principal Component matrix. The amount of information is retained, but the Principal Components contain a hierarchy of the variances, so that the major components can be retained for further analysis, while the minor ones contains only trivial information are discarded. This method is presented by StatTools as it is used as default by SPSS, and is the simplest, most commonly method of extraction used.

.	var_1	var_2	var_3	var_4	var_5	var_6
var_1	1.00	0.70	-0.26	-0.27	0.38	0.19
var_2	0.70	1.00	-0.19	-0.38	0.38	-0.03
var_3	-0.26	-0.19	1.00	0.50	0.00	0.05
var_4	-0.27	-0.38	0.50	1.00	0.13	0.28
var_5	0.38	0.38	0.00	0.13	1.00	0.57
var_6	0.19	-0.03	0.05	0.28	0.57	1.00

.	Comp_1	Comp_2	Comp_3	Comp_4	Comp_5	Comp_6
var_1	-0.86	-0.09	0.21	0.33	-0.23	0.22
var_2	-0.84	0.06	0.42	-0.03	0.10	-0.31
var_3	0.49	-0.46	0.66	-0.25	-0.22	0.06
var_4	0.52	-0.68	0.15	0.44	0.23	-0.08
var_5	-0.55	-0.69	-0.07	-0.26	0.35	0.17
var_6	-0.21	-0.79	-0.44	-0.05	-0.33	-0.17
Eigen Value	2.29	1.79	0.88	0.44	0.40	0.21

An example uses the default data from the Factor Analysis - Principal Component Extraction Program Page . The table to the left is the original correlation matrix, and that to the right the transformed Principal Component matrix. Please note that the same number of Principle Components as the number of variables are produced, but there is a hierarchy in the amount of information from left to right, represented by the Eigen value. The common approach is to retain those Principal Components that have Eigen values >=1 for further analysis, and rejecting the remainder as trivial. In this example, the first two components (Eigen values of 2.29 and 1.79) are retained.

Although the results of initial extraction are referred to as Principal Factors or Principal Components, depending on whether the diagonal of the matrix is replaced by the communalities, and the mathematical procedures used, the term Factor is usually used to represent both in subsequent calculations.

The mathematics of the initial Factor extraction procedure is based on extracting the maximum amount of information (variances and correlations) from the correlation matrix, but these initial Factors are often conceptually uninterpretable and require further processing in terms of rotation.

To enable interpretation, the Factor matrices must be rotated to achieve simple structure. This means that, as much as possible, each variable should have a high loading on one Factor and low loading on all the other Factors. By convention, a high loading is >=0.4 or <=-0.4, and a low loading is between -0.2 and 0.2.

The results of rotation are usually presented in 2 matrices. The Pattern matrix, where the coefficient represents how much each variable contributes to the factor, and the Structure matrix where the coefficients are the correlation coefficients between each variable and each factor

Types of Rotation : There are two types of rotation.

.	Comp_1	Comp_2	Comp_3
var_1	-0.86	-0.09	0.21
var_2	-0.84	0.06	0.42
var_3	0.49	-0.46	0.66
var_4	0.52	-0.68	0.15
var_5	-0.55	-0.69	-0.07
var_6	-0.21	-0.79	-0.44

.	F_1	F_2	F_3
var_1	0.83	0.23	-0.20
var_2	0.94	-0.01	-0.12
var_3	-0.05	-0.08	0.94
var_4	-0.35	0.34	0.71
var_5	0.43	0.77	0.09
var_6	-0.05	0.93	0.04

Orthogonal rotation obtains the best possible simple structure conditional that the factors are uncorrelated (thus orthogonal) with each other. The results are a number of independent Factors each containing measurements that are highly correlated with it. There are different numerical methods to perform orthogonal rotation, StatTools uses the Normalised Varimax Rotation, the default method used by most statistical packages.

The default values from the Factor Analysis - Factor Rotation Program Page is used as an example to demonstrate. The 3 largest Principal Components extracted in the previous panel is used, as shown in the table to the left.

The table to the right shows the results after orthogonal (Varimax) rotation. After orthogonal rotation, the loading is the same as the correlations, so that the Pattern matrix is the same as the Structure matrix, so only one matrix is produced. The structures of the factors are now more apparent, where variables 1 and 2 loaded strongly to Factor 1, variables 5 and 6 to Factor 2, 3 and 4 to Factor 3.

Oblique rotation removes the constraint that the Factors must be uncorrelated, and rotate the factors in such a manner to obtain the simplest structure possible even if the factors become correlated. There are a number of numerical methods available to perform oblique rotation, the algorithm in the Factor Analysis - Factor Rotation Program Page uses the Direct Oblimin rotation, which is most commonly used and the default method for SPSS. The degrees of correlation allowed is controlled by the coefficient δ, in a value between -1 and +1. Negative values de=crease the amount of correlation which is not used, and in most cases the default δ=0 is used.

.	F_1	F_2	F_3
var_1	0.82	0.17	-0.09
var_2	0.96	-0.09	0.03
var_3	0.08	-0.14	0.97
var_4	-0.29	0.33	0.65
var_5	0.40	0.74	0.08
var_6	-0.11	0.94	-0.06

.	F_1	F_2	F_3
var_1	0.87	0.26	-0.29
var_2	0.94	0.04	-0.24
var_3	-0.19	0.01	0.93
var_4	-0.42	0.39	0.77
var_5	0.47	0.80	0.08
var_6	0.02	0.92	0.10

As the factors now correlates, the loading from different factors overlap, so the coefficient in the Pattern (amount of loading) and the Structure (correlation coefficient) are no longer the same, and two tables are produced, as shown above. The table to the left is the Pattern matrix and on the right the Structure matrix.

Comparing the Pattern matrix of the oblique rotation against that from orthogonal rotation, it can be seen that the pattern is simpler, with the loading coefficient more polarised towards or away from 0 after oblique rotation. The Structure matrix also shows that the correlation between a variable and the factor it dominates is greater, further clarifying the overall structure.

.	F_1	F_2	F_3
F_1	1.00	0.13	-0.26
F_2	0.13	1.00	0.14
F_3	-0.26	0.14	1.00

The oblique rotation allows the factors to be correlated, and the table to the right shows the correlations between the factors.

In a complex and large Factor Analysis, the correlation matrix between the factors are subjected to further Factor Analysis, resulting in factors at a different hierarchical level.

An example is the development of a measurement for intelligence, the IQ Test. Many tests of abilities are analysed into factors of reading, arithmatic, and other basic skills, each a factor in a specified ability set. As skills tend to overlap, these factors are correlated, and the correlation matrix analysed at the next level into literary, numerical, abstract thinking skills, and so on. These also correlate, so additional Factor Analysis are carried out until the highest level of measurement, the IQ.

Orthogonal, Oblique, or both

The decision to use orthogonal or oblique rotation must primarily be determined by the theoretical construct underlying the exercise.

For example, in the development of the IQ Test to measure intelligence, it was clearly recognised that mental abilities in different activities inevitable overlap, and that whatever the test devise, the result will be influenced by more than a single skill. A model based on oblique rotation is a very logical approach to this problem

On the other hand, to develop a measurement for quality of a health care worker will recognise that the measurement is an umbrella under which a whole range of independent attribute such as diligence, intelligence, empathy, knowledge, manual skills, and so on contribute, mostly independent of each other. The model to develop such a set of measurements is more likely to use orthogonal rotation, so that areas of strength and weakness can be more clearly identified.

Where there is no guidance from theory, often both are tried to see which results in a better outcome. The question is then which one should be tried first.

One model is to recognise the theoretical advantages of having factors that are independent and uncorrelated with each other, so the orthogonal rotation (Varimax) is tested first. Only if an acceptably simple structure cannot be achieved, with too many variables have high loadings across multiple factors, should orthogonal rotation be abandoned, and the oblique (Oblimin) rotation used.
The other model is to recognise that, in any single Factor Analysis exercise, the object is to create a simplified structure encompassing many related and overlapping constructs, that correlations between variables and factors are inevitable, and that the priority should be given to creating as simple a structure as possible, where each variable loads high on only one factor, and low on all other factors. In this scenario, the oblique (Oblimin) rotation is tried first, and the correlation matrix between the factors examined. Only when the correlation coefficients between factors are so low as to be negligible, should the orthogonal rotation be used.
The algorithm in the Factor Analysis - Factor Rotation Program Page performs both rotations, allowing users to inspect and select the one more suitable to their needs.

In many cases, particularly in exercises using the Principal Component Analysis, the purpose of Factor Analysis is to create a multivariate instrument of measurement that can be used clinically with future data. An example is the Factor Analysis used to develop a measurement of depression.

Once the Factor Analysis is completed, and we have the Pattern matrix to represent the results, the next consideration is how to use this clinically in the future. In other words, once we have the Pattern matrix for depression, how do we measure depression with it.

The algorithm is the Factor Analysis - Produce Factor Scores Program Page allows the use of the Pattern matrix to create factor scores from future relevant data set. We will use the default example data in that page to demonstrate how this is done, step by stem.

The Pattern matrix produced after factor rotation is placed in the first text box, top left of the page. Orthogonal (Varimax) rotation produces only one matrix and this is used. Oblique rotation (Oblimin) produces two matrices, Patern matrix and Structure matrix, and the Pattern matrix must be used.

From the Pattern matrix, the algorithm creates the W Matrix, with columns for factors and rows for variables. For each subject, and each Factor, the Factor score is the sums of products between the z values of each variable and the W coefficient between that variable and that Factor.

The mean and Standard Deviation (SD) matrix is entered into the second test box, top right of the page. This is a 2 column matrix, each row represents a variable, column 1 is mean and column 2 Standard Deviation of that variable. This matrix is used by the algorithm to calculate z values (z=(value-mean)/SD)

The data matrix is entered into the third text box (second row). This is the data to be used to create the factor scores. Each row represents a subject, and each column a variable.

The algorithm firstly converts each measurement to its z value, then use the z values to calculate the factor scores.

The factor scores, being created using the z values, is also a z value with mean=0 and Standard Deviation=1. For clinical use, these values can be further transformed into scores that are intuitively easy to comprehend by users.

For example, the results of the Factor score for the intelligence test is firstly multiplied by 10 and have 100 added to it IQ=10z+100, to create the convenient mean IQ of 100 and SD of IQ 10.

Sample Size Parallel Analysis Confirmatory Factor Analysis

Currently there is no estimation for sample size for factor analysis that is based on any statistical theory. Recommendations from different sources vary greatly, and the commonly used rule of thumbs can be as follows

For each factor, 5 variables. For each variable, 5 subjects. In other words, 25 subjects per factor
Sample size should be 3 to 20 times the number of variables used, or absolute numbers of 100 to 1000.

Mundfrom (see references) and others in 2005 used empirical simulations to estimate minimal sample sizes that are likely to produce reproducible results. This page summarises the more commonly used parts of table 1 and 2 from this paper, allowing quick references to the minimal sample sizes required for factor analysis under the more usual clinical scenario. It is recommended that users read the original paper to gain a clearer understanding of how the sample sizes are derived, and to obtain the full tables of sample sizes suitable for a wider range of conditions.

Sample size based on the communality of the model

In general, sample size depends on two criteria, the ratio of the number of variables to the number of factors, and the communality of the factors extracted.

Communality is a value between 0 and 1, and represents the proportion of the total variance in the data that is extracted by the factor analysis.

This page summarises table 1 of the paper.

Note : columns are number of Factors and rows are p/f (parameters/factors) = ratio of number of variables to number of factors, and columns are number of factors

Where the communality is expected to be high (0.6 or more)

p/f	1	2	3	4	5	6
3	32	320	600	800	1000	1200
4	27	150	260	350	450	500
5	21	75	130	260	260	300
6	19	55	95	160	200	160
7	18	45	75	110	130	110
8	18	45	75	90	75	70
9	17	40	60	65	80	80
10	15	35	60	70	65	65
11	16	35	55	60	60	75
12	15	35	55	55	65	75

Where the communality is expected to be not so high (0.2 to 0.6)

p/f	1	2	3	4	5	6
3	110	710	1300	1400	1400	1600
4	65	220	350	700	900	900
5	50	130	200	300	300	350
6	50	95	140	180	200	180
7	40	75	105	160	150	130
8	36	65	90	90	130	110
9	33	55	70	85	90	100
10	32	55	75	80	85	95
11	36	50	65	75	85	95
12	30	50	70	75	85	95

Sample size where the ratio of variables to factors is 7 or more

A simpler approach is to use models where the ratios of variables to factors (p/f) are at least 7 and assuming that the model will have communalities usable in the clinical situation. In this case, minimal sample size required depends only on the number of factors in the model.

The sample sizes required can be more simply and conveniently presented. Based on table 2 of the paper, they are :

18-60 for 1 factor with 7 or more variables
45-80 for 2 factors with 14 or more variables
75-100 for 3 factor with 21 or more variables
110-180 for 4 factors with 28 or more variables
130-170 for 5 factors with 35 or more variables
110-140 for 6 factors with 42 or more variables

Where there are more than 6 factors, the minimum sample size required is 100. This applies until the number of factors is 15 with 105 or more variables, when the sample size should exceed the number of variables.

In short, the sample size of 180 can be used where the ratio of variables to factor is 7 or more, and if there are less than 15 factors.

Parallel Analysis is a procedure to determine the number of factors to retain, during the initial extraction of Principal Components in Exploratory Factor Analysis. As this is a large subject, details of explanation, tables of minimum Eigen Values, and a Javascript program for calculation, are provided in the Factor Analysis - Parallel Analysis Explained, Tables, and Program Page

StatTools does not offer any calculations for Confirmatory Factor Analysis, as the procedures are complex, the choices numerous, and pitfalls aplenty. StatTools takes the view that those undertaking Confirmatory Factor Analysis should have expertise not only in the subject being investigated, but also statistics at the professional level of expertise. The following is a brief introduction to the subject, based on the algorithms available in the statistical software package LISREL. The main purpose is to demonstrate the complexity involved.

Confirmatory factor analysis answers the question whether a set of data fits a prescribed factor pattern. It is usually used for two purposes.

The first is to test or confirm that a factor pattern, say from a survey tool, is stable and therefore can be confirmed by an independent set of data. In this, the factor pattern comes from an existing tool or a theoretical construct, and whether this fits with a set of data is then tested.

The second is in the development of a multivariate instrument or tool, such as questionnaire to evaluate racism. In this, the number of factors (concepts, constructs, or dimensions) are firstly defined, then a number of variables (questions or measurements) that may reflect each of these construct developed. Data are then collected, and tested against the factor-variable relationship. Those variables that do not fit neatly into a single factor are then replaced or changed, and new data are collected and tested. This process is repeated until a set of data collected fits the required pattern.

Confirmatory factor analysis uses the Maximum Likelihood method of extraction, because it is robust and allows for significance testing. In practice, however, statistical significance is difficult to interpret, as it is determined not only by how well the data fits in with the theoretical construct, but also by sample size, and the number of variables and factors.

Another problem is that Confirmatory Factor Analysis and maximum Likelihood method of extraction works best in the Principal Factor Model (where the correlation matrix has its diagonal elements replaced with the communality). To perform Confirmatory Factor Analysis on Factors developed from the Principal Component Model, or to use the Maximum Likelihood method on a correlation matrix without communality correction will produce strange looking and uninterpretable results, particularly being confronted with the Heywood situation where the factors cannot be comprehensively extracted.

Test of fit between data and construct

The Chi Square Test is the primary statistical test. However, Chi Squares tends to increase with sample size, with decreasing number of variables, and increasing number of factors. So, unless the theoretical construct was developed using Maximum Likelihood Factor Analysis that had exhausted all the correlations in the original matrix, and that the sample size of the testing data set is similar to the original data set used to develop the construct, the Chi Square may not truly reflect how well the data fits the construct.

As the Chi Square Test is statistical robust, yet problematical because of the complexity of confirmatory factor analysis, statisticians have developed an array of adaptation of the Chi Square to adjust for sample size and the relationship between the number of variables and factors.

From the clinical point of view however, the following decision making steps are recommended by some of the publications.

Step 1. Examine the critical number. This is the minimum sample size, below which the results cannot be validly interpreted. Only if the sample size exceeds the critical number can interpretation proceed.
Step 2. Look at the significance of the Chi Square. The Minimum Fit Chi Square can be used if one can be sure that all the variables are continuous measurement and normally distributed. If this cannot be assured, as is in almost all cases, then the Normal Theory Weighted Least Squares Chi-Square is used. If the Chi Square is not significant (p>=0.05), then a decision that a good fit exists between data and theory can be made. Subsequent steps are only necessary of the Chi Squares is significant (p<0.05) .
Step 3. Examine the Chi Square Degree of Freedom Ratio. This is obtained Ratio = Chi Sq / Deg Freedom. If the Ratio is 2 or less, then a decision that a reasonably good fit exists can be made. If the ratio exceeds 2 go to step 4.
Step 4. Examine the Goodness of Fit Index. If this index is 0.9 or more, then a decision that a reasonable good fit exists can be made. If not then the conclusion that the data fits poorly to theory should now be made.

Possible actions when data and theory do not fit.

If the conclusion is that the data and theory fits, then the statistical exercise ends. The researcher accepts the validity of the theory and moves on.

When the conclusion is that the data do not fit the theory, the actions that are possible then depends on the reasons for conduction the confirmatory test in the first place.

Option 1. Reject the factor construct and move on. This is of course the primary purpose of the exercise. The question is whether the data fits the theory, the answer is no, and the matter ends. Another reason for doing this is if the data itself is problematical, so that a fit can never be obtained. This occurs if the number of variables in a factor are too few, if the correlations are such that one or more variables load across more than 1 factor, or if the correlations within a factor vary greatly (as it would then not be possible to extract all the correlation out of the matrix).
Option 2. Why did it not fit and is it fixable. This option can be taken if the fit is nearly good enough, and only a few variables are problematic. Variables that do not fit well can be removed and the exercise repeated. This is doable if the exercise is part of developing a new statistical instrument of measurement, but not suitable in testing a set of data against an established Factor model.

Algorithms : It is difficult to find the algorithms for calculations associated with Factor Analysis, as most modern text book and technical manuals advise users to use one of the commercial packages. I have eventually found some useful algorithms in old text books, and they are as follows

Press WH, Flannery VP, Teukolsky SA, Vetterling WT (1989). Numerical Recipes in Pascal. Cambridge University Press IBSN 0-521-37516-9 p.395-396 and p.402-404. Jacobi method for finding Eigen values and Eigen vectors

Norusis MJ (1979) SPSS Statistical Algorithms Release 8. SPSS Inc Chicago

p. 86 for converting Eigen values and vectors to Principal Components
p. 91-93 for Varimax Rotation
p. 94-97 for Oblimin Rotation
p. 97-98 for Factor scores

Text books I learnt Factor Analysis some time ago, so all my text books are old, but they are adequate in explaining the basic concepts and provide the calculations used in these pages. Users should search for better and newer text books.

Thurston LL (1937) Multiple Factor Analysis. University of Chicago Press. I have not read this book, but this is quoted almost universally as it is the original Factor Analysis text book which set out the Principles and procedures
Gorsuch RL (1974) Factor Analysis. W. B. Saunders Company London 1974 ISBN 0-7216-4170-9 A standard text book teaching Factor Analysis at the Master's level. This is my copy and I believe there are later additions of this book available

Orthogonal Powered-Vector Factor Analysis
Overall JE and Klett CJ (1972) Applied Multivariate Analysis. McGraw Hill Series in Psychology. McGraw Hill Book Company New York. Library of Congress No. 73-14716407-047935-6 p.137-156

Sample Size
Mundfrom DJ, Shaw DG, Tian LK (2005) Minimum sample size recommendations for conducting factor analysis. International Journal of Testing 5:2:p 159-168

Free Factor Analysis software and its user manual can be downloaded from http://psico.fcep.urv.es/utilitats/factor/Download.html. T his is a package for Windows written by Drs. Lorezo-Seva and Ferrendo from Universitat Rovira i Virgili in Terragona in Spain. The presentation and options are very similar to that from SPSS, and the manual is excellent. The best part is that it is free and yes it is in English.

Teaching and discussion papers on the www There is an enormous list of discussion papers, technical notes and tutorials on the www that can be easily found by Google search. The following is a small sample of this.

http://en.wikipedia.org/wiki/Factor_analysis
http://www.psych.cornell.edu/darlington/factor.htm
http://www.hawaii.edu/powerkills/UFA.HTM
http://www.ats.ucla.edu/stat/spss/output/factor1.htm
http://www.stat-help.com/factor.pdf