StatTools : Analysis of Covariance Explained and Codes in R


Related Links:
R Programming Language Explained Page
Multiple Regression Explained Page

Introduction
This page repeats some of the discussion in the analysis of covariance panels of the Multiple Regression Explained Page and uses the same data set as an example. The essentials are repeated here, but the calculations are carried out in R.

For those not familiar with R, an introduction is provided in the R Programming Language Explained Page.

For those who do not have a clear understanding of Analysis of Covariance, the following minimal and very basic terms and descriptions may be useful.

  • The Analysis of Variance partitions the variance of the dependent variable according to those factors that influence it.
    • With only two groups, the analysis of variance reduces to the t test. For example, how is the variance in birth weight influenced by the sex (male or female) of the baby? This is a single comparison between the two sexes.
    • When there are more than two groups, the general model of One Way Analysis of Variance is used. For example, how do three different ethnic origins (say Greek, German, and Slav) influence the birth weight of the baby? With three groups there are 3 pairwise comparisons: Greek vs German, Greek vs Slav, and German vs Slav.
    • When two sets of influences (Factors) are involved (say sex and ethnicity), a Two Way Analysis of Variance is used; with more than two, a Multiway Analysis of Variance. However, factors may overlap through systematic or accidental imbalance (say Greeks have more girls than Germans), and the effect of one factor may differ across the levels of another (say the sex difference in birth weight is larger in one ethnic group than in another). The latter is called an Interaction between Factors. The Analysis of Variance that separates the variance unique to each factor from the variance shared between factors is known as the Factorial Model of Analysis of Variance.
  • If, on top of all this, as is usually the case, there are other influences to take into consideration (for example, differences in birth weight must be corrected for gestational age), then the variables used for these corrections are termed covariates, and the combined analysis becomes Covariance Analysis.
  • Things now start to become a bit complicated, because each covariate may act differently at different levels of a factor, say German babies grow faster than Slav babies near term. This is called an Interaction between a factor and a covariate.
  • The number of possible interactions therefore multiplies with the number of covariates and factors. As these increase, the model becomes complex and confusing.
  • To be correct, the results of a covariance analysis are only valid if all possible interactions are tested and found to be trivial (not statistically significant). In the literature, however, most authors do not bother and assume that interactions are either irrelevant or do not exist.
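
The terms above can be sketched in a few lines of R. This is only an illustration, not the analysis used on this page: the data are simulated, and the names babies, birthweight, sex, ethnicity and gestation are assumptions made for the example. It shows that the two-group analysis of variance matches the pooled-variance t test, how a factorial model is written, and how the interaction between a factor and a covariate can be tested before a covariance analysis is interpreted.

```r
# Simulated (hypothetical) data; NOT the data set used elsewhere on this page
set.seed(1)
n <- 90
ethnicity <- factor(rep(c("Greek", "German", "Slav"), each = n / 3))
sex       <- factor(rep(c("F", "M"), times = n / 2))
gestation <- round(runif(n, 37, 41), 1)             # covariate, in weeks
birthweight <- 3300 + 150 * (gestation - 39) +      # outcome, in grams
               100 * (sex == "M") + rnorm(n, sd = 300)
babies <- data.frame(birthweight, sex, ethnicity, gestation)

# Two groups: the pooled-variance t test and the one-way ANOVA agree (F = t^2)
tt      <- t.test(birthweight ~ sex, data = babies, var.equal = TRUE)
aov_sex <- anova(lm(birthweight ~ sex, data = babies))

# Factorial model: both main effects plus the sex-by-ethnicity interaction
aov_factorial <- anova(lm(birthweight ~ sex * ethnicity, data = babies))
print(aov_factorial)

# Covariance analysis: ethnicity adjusted for gestation, one common slope
fit_parallel <- lm(birthweight ~ gestation + ethnicity, data = babies)

# Same model plus the factor-by-covariate interaction (one slope per group)
fit_slopes <- lm(birthweight ~ gestation * ethnicity, data = babies)

# F test of the interaction: compares the two nested models
slope_test <- anova(fit_parallel, fit_slopes)
print(slope_test)
```

If the interaction row of slope_test is not statistically significant, the common-slope model fit_parallel is the one usually reported. With several factors and covariates the same comparison has to be repeated for each factor and covariate pair, which is where the complexity described above comes from.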