 Related links:
Multiple Regression Program Page
 
Sample Size for Multiple Regression Program Page
 
Sample Size for Multiple Regression Explained and Tables Page
 
Curve Fitting Explained Page
 
Path Analysis Explained Page
 
 
Multiple Regression
Complex Factorial Covariance Models
References 
 
 
Introduction
Multiple Regression
Example 
  
Multiple regression, available from the program in the Multiple Regression Program Page
, is one of the most flexible
and powerful statistical tools available to the researcher, as it allows the modelling of multiple influences on an outcome,
correcting for the overlapping influence of the independent variables.  For those who are familiar with the concepts, the algorithm
of multiple regression can be used to calculate a large number of other parametric statistical procedures.
 Most professional statistical packages provide large numbers of complex statistical procedures based on multiple
   regression, under the broad heading of the General Linear Model.  StatTools provides the following algorithms
   based on multiple regression.
  
This page has two sections.
 
- This section explains the use of multiple regression and its sample size, and provides an example
 - A much larger and more complex section on how to use the multiple regression algorithm to conduct complex
    factorial models of Covariance Analysis
      
 
Multiple regression consists of two or more independent variables (x1, x2, x3, etc.)
and a single dependent variable (y). The formula produced is y = a + b1x1 + b2x2 + b3x3 + ..., where
 
- y is the single dependent variable, assumed to be a parametric measurement (continuous and normally distributed)
 - The x's (x1, x2, x3, etc.) are the independent variables.  They need not
    be parametric, but they must be ordered (3>2>1).  Common independent variable types are :
    
    - Binary 0/1 (no/yes, false/true, negative/positive, etc.)

 - Ordinal, such as responses to pain (0=none, 1=some, 2=lots), or the Likert item (0=SD, 1=D, 2=N, 3=A, 4=SA)

 - Poisson distributed counts, such as the number of cells in a set volume, or the number of complaints per month

 - Discrete interval measurements with unstated distribution, such as height in cms, age in years, or time on a waiting list

 - Normally distributed measurements
    
 - Log-normally distributed measurements such as ratios.
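To make the regression formula above concrete, the sketch below fits such an equation by ordinary least squares with numpy.  It is a minimal illustration and not the StatTools program itself; it uses only the first 8 rows of the example data further down this page, so its coefficients will not match the full analysis.

    # Minimal sketch: fitting y = a + b1x1 + b2x2 + ... by ordinary least squares.
    # Illustration only; the rows are the first 8 subjects of the example data
    # on this page (age, height, gestation, sex -> birth weight).
    import numpy as np

    X = np.array([[24, 170, 37, 1],
                  [29, 161, 36, 0],
                  [29, 167, 41, 1],
                  [21, 165, 36, 1],
                  [35, 168, 35, 0],
                  [27, 161, 39, 0],
                  [26, 163, 40, 1],
                  [34, 167, 37, 0]], dtype=float)
    y = np.array([3048, 2813, 3622, 2706, 2581, 3442, 3453, 3172], dtype=float)

    X1 = np.column_stack([np.ones(len(y)), X])     # column of 1s gives the constant a
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    a, b = coef[0], coef[1:]
    print("constant a:", round(a, 1))
    print("coefficients b1..b4:", np.round(b, 2))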
    
  
  
Data Entry :  when using the program in the Multiple Regression Program Page
, the data consist of a multi-column
   table, where
    
   - Each row is data from a subject
   
 - Each column is a measurement of a variable
   
 - The last column is the dependent (y) variable
   
  
Terminology
- Partial Correlation Coefficient (PCor) is the correlation between an independent variable (x) with the dependent variable (y), 
    having corrected for inter-correlations between all the independent variables 
- Partial Standardised Regression Coefficient (PSReg) is the regression coefficient between an independent variable (x) and the
    dependent variable (y), having corrected for inter-correlations between all the independent variables, with both rescaled to a mean of 0
    and a Standard Deviation of 1.  This is free of measurement units, and is used for comparing the relative scale of influence
    from different independent variables
 - Partial Regression Coefficient (PReg or b) is the regression coefficient between an independent variable (x) and the
    dependent variable (y), having corrected for inter-correlations between all the independent variables.  This is the b used in
    the regression formula  y = a + b1x1+b2x2+b3x3...
 - Standard Error of the Partial Regression Coefficient (SE)    
 - t=b/SE, and α is the Probability of Type I Error (two tail) of t with residual degrees of freedom
 - Constant(a) is the a in the formula y = a + b1x1+b2x2+b3x3...
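As a worked illustration of how these quantities relate to each other, the short sketch below uses rounded values from the gestation row of the example further down this page, so its results agree only approximately with the table there.

    # How the coefficient summaries relate: b, SE, PSReg, t and alpha.
    # Rounded values taken from the gestation row of the example below.
    from scipy import stats

    b, se = 223.19, 17.92        # Partial Regression Coefficient (PReg) and its SE
    sd_x, sd_y = 2.1, 533.0      # SD of gestation and of birth weight
    df_resid = 17                # residual degrees of freedom

    psreg = b * sd_x / sd_y                    # standardised coefficient, roughly 0.88
    t = b / se                                 # roughly 12.45
    alpha = 2 * stats.t.sf(abs(t), df_resid)   # two-tailed Type I error probability
    print(psreg, t, alpha)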
  
Please note : in the table of analysis of variance, although the model Degrees of Freedom is the sum of the
   Degrees of Freedom of the individual regression coefficients, the model Sums of Squares is greater than the sum of the Sums of Squares from all the regression
   coefficients.
   This is because the individual Sums of Squares describe the pure influence on y from each x variable, while the model sums
   all of them, and adds on top those Sums of Squares that overlap between the independent x variables.  It is this difference
   that provides the very powerful analysis of variance in complex models, where multiple measurements often have various
   degrees of correlation with each other, and their pure influences and overlapping influences need to be separately accounted for.
 Multiple Regression as Entered, and with Stepwise deletion
The program in the Multiple Regression Program Page
 provides two options for conducting multiple regression
 
- The as Entered model calculates multiple regression once, using all the entered data.  This is the preferred model if the
    intention is to provide a description of the relationship between the variables, or if the calculation is used to obtain
    parameters for other complex statistical purposes.
 - The Stepwise Deletion model carries out repeated multiple regression analysis on the data entered, deleting the weakest
    independent variable after each cycle.   This is the preferred model when developing a predictive algorithm, where the
    researcher starts with a large number of plausible predictors, and eliminates the weaker ones serially to obtain the most
    powerful yet most parsimonious (fewest predictors) formula.
    
The algorithm from the program continues until only 1 independent variable is left, allowing the user to determine the number of
    independent variables to retain in the final formula.  This can be done arbitrarily by judgement, but in most cases
    the decision is to retain only those independent variables where the Partial Regression Coefficient (b) is
    statistically significant (α<0.05)
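A rough sketch of this backward-elimination idea is shown below.  It is a generic loop written for illustration, not the page's program, and assumes a numeric design matrix X (one column per predictor) and an outcome vector y.

    # Generic backward elimination sketch: refit, drop the predictor with the
    # largest two-tailed p value, and stop once all remaining p values are small.
    import numpy as np
    from scipy import stats

    def backward_eliminate(X, y, names, p_to_remove=0.05):
        names = list(names)
        while X.shape[1] > 1:
            X1 = np.column_stack([np.ones(len(y)), X])
            coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
            resid = y - X1 @ coef
            df = len(y) - X1.shape[1]
            s2 = resid @ resid / df                               # residual variance
            se = np.sqrt(np.diag(s2 * np.linalg.inv(X1.T @ X1)))  # SE of coefficients
            p = 2 * stats.t.sf(np.abs(coef / se), df)
            worst = int(np.argmax(p[1:]))                         # skip the constant
            if p[1:][worst] <= p_to_remove:
                break                                             # everything significant
            X = np.delete(X, worst, axis=1)                       # drop weakest predictor
            names.pop(worst)
        return names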
   
Sample size for Multiple Regression 
The sample size program for multiple regression in the Multiple Regression Program Page
 uses a modified version of that
for comparing multiple groups of measurement in the Sample Size for Unpaired Differences Tables Page
, but using the number of 
independent variables and Multiple Correlation Coefficient R to represent the number of groups and the residual variance. 
The calculations require multiple iterative approximations, so computation time increases exponentially with the number of
independent variables, and with decreasing value of R.  Users are encouraged to consult the tables in the
Sample Size for Multiple Regression Explained and Tables Page
 for their sample size needs.
  
Example 1 : Sample Size
We wish to study whether we can predict birthweight from maternal age, height
   and weight, as well as gestational age and the sex of the baby, 5 independent
   variables or predictors.   We want this model to be clinically useful, so 
   requires a moderate effect size of R=0.5 
 Using α=0.05, power=0.8, number of independent variables u=5, and anticipated
   effects size R=0.5, we can obtained from the Sample Size for Multiple Regression Explained and Tables Page
 that the sample size 
   required to be 56 pregnancies.
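For readers who want to reproduce this kind of calculation, the sketch below uses the common Cohen-style approach (noncentral F, with effect size f2 = R2/(1-R2)).  This is not the StatTools algorithm: the page's program uses a modified method, so the n it tabulates (56 here) may differ somewhat from this textbook calculation.

    # Hedged sketch of a Cohen-style sample size search for multiple regression.
    from scipy import stats

    def power_mult_reg(n, u, R, alpha=0.05):
        f2 = R**2 / (1 - R**2)               # Cohen's effect size f-squared
        v = n - u - 1                        # residual degrees of freedom
        lam = f2 * n                         # noncentrality parameter, f2*(u+v+1)
        f_crit = stats.f.ppf(1 - alpha, u, v)
        return stats.ncf.sf(f_crit, u, v, lam)

    n = 10
    while power_mult_reg(n, u=5, R=0.5) < 0.80:
        n += 1
    print("approximate sample size:", n)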
 Example 2 : Multiple Regression as Entered 
  
| age | Ht | Gest | Sex | BWt |  
| 24 | 170 | 37 | 1 | 3048 |  
| 29 | 161 | 36 | 0 | 2813 |  
| 29 | 167 | 41 | 1 | 3622 |  
| 21 | 165 | 36 | 1 | 2706 |  
| 35 | 168 | 35 | 0 | 2581 |  
| 27 | 161 | 39 | 0 | 3442 |  
| 26 | 163 | 40 | 1 | 3453 |  
| 34 | 167 | 37 | 0 | 3172 |  
| 25 | 165 | 35 | 1 | 2386 |  
| 28 | 170 | 39 | 0 | 3555 |  
| 32 | 167 | 37 | 1 | 3029 |  
| 31 | 169 | 37 | 0 | 3185 |  
| 26 | 161 | 36 | 1 | 2670 |  
| 21 | 165 | 38 | 0 | 3314 |  
| 21 | 166 | 41 | 1 | 3596 |  
| 24 | 164 | 38 | 0 | 3312 |  
| 34 | 169 | 38 | 0 | 3414 |  
| 25 | 161 | 41 | 0 | 3667 |  
| 26 | 167 | 40 | 0 | 3643 |  
| 27 | 162 | 33 | 1 | 1398 |  
| 27 | 160 | 38 | 1 | 3135 |  
| 21 | 167 | 39 | 1 | 3366 |  
 
We use the default example data from the Multiple Regression Program Page
 for this exercise.  The data
   were computer generated to demonstrate the procedure and are not real.
 We wish to explain factors that may influence the birth weight of babies, these being maternal age (years) and height (cms),
   the gestational age at birth (weeks), and whether the baby is a girl (1) or boy (0).  We collected 22 subjects, with the data as shown in the table.
 
| Var | mean | SD |  
| 1.age | 27.0 | 4.3 |  
| 2.Ht | 165.2 | 3.2 |  
| 3.Gest | 37.8 | 2.1 |  
| 4.Sex | 0.5 | 0.5 |  
| 5.BWt | 3114 | 533 |  
 	 
Please note : The data are in columns separated by spaces or tabs, and 
	 the dependent variable (BWt) is in the last column.
 Using the program from the Multiple Regression Program Page
 and taking the option of calculating the
data as entered, we obtain the following results.
 We first produce the means and standard deviations of all the variables, as shown above;
the last variable (5.BWt) is the dependent variable.
 
|  | 1.age | 2.Ht | 3.Gest | 4.Sex | 5.BWt |
| 1.age | 1 | 0.26 | -0.25 | -0.38 | -0.10 |
| 2.Ht | 0.26 | 1 | 0.08 | -0.13 | 0.24 |
| 3.Gest | -0.25 | 0.07 | 1 | -0.11 | 0.92 |
| 4.Sex | -0.38 | -0.13 | -0.11 | 1 | -0.32 |
| 5.BWt | -0.10 | 0.24 | 0.92 | -0.32 | 1 |
 	 
The correlation matrix is produced next, as shown on the right.
 The multiple regression analysis now takes place.   Please note abbreviations
   for the coefficients table are as follows. 
 PCor = Partial Correlation Coefficient.   This is the correlation between 
    the variable and the dependent variable after correction for inter-correlation 
		between the independent variables.
 PSReg = Partial Standardised Regression Coefficient.   This measures the
    influence of each independent variable on the dependent variable, using z or
		standardised units.  For example, for 1 SD of change in maternal age, 0.01 SD of
		change occurs in birthweight.   For 1 SD of change in gestation, 0.9 SD of 
		change occurs in birthweight.
 PReg = Partial Regression Coefficient.   This measures the change in
    the dependent variable for each unit of change in the independent variable.
		For example, for an increase of 1 year in age, the baby weighs 1.7g more.  For
		each week of maturing, the baby weighs 223g more.  Girls are 209g lighter. 
| var | PCor | PSReg | PReg | SE | t | α |  
| 1.age | 0.0418 | 0.0137 | 1.701 | 9.8641 | 0.1724 | 0.8653 |  
| 2.Ht | 0.4395 | 0.1417 | 23.6492 | 11.7243 | 2.0171 | 0.0608 |  
| 3.Gest | 0.9493 | 0.8952 | 223.1943 | 17.9205 | 12.4547 | <0.0001 |  
| 4.Sex | -0.5476 | -0.2009 | -209.15 | 77.5107 | -2.6983 | 0.0158 |  
 
Const = -9165.48   R = 0.961   R2 = 0.9236
SE = standard error of the Partial Regression Coefficient.
 t = t test for that Partial Regression Coefficient
 α (p) = the probability of Type I Error (α) for that Partial Regression Coefficient.
 Const = the constant of the equation.   In this case, BWt in g = -9165 + 1.7(age in years)
    + 23.6(height in cms) + 223.2(gestation in weeks), and 209g less if the baby is a girl.
 R = the Multiple Correlation Coefficient. This is the effect size of the equation.
    R Sq is R2, the proportion of the total variance that is explained
		by the regression.
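For the curious, the quantities in the coefficient table can be derived from an ordinary least-squares fit with standard formulas.  The sketch below is an illustration rather than the page's program; it assumes X holds the columns age, Ht, Gest and Sex, and y holds BWt.

    # Sketch of how PReg, SE, t, alpha and R-squared are obtained from a fit.
    import numpy as np
    from scipy import stats

    def ols_summary(X, y):
        n, k = X.shape
        X1 = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ coef
        df_resid = n - k - 1
        s2 = resid @ resid / df_resid                        # residual mean square
        se = np.sqrt(np.diag(s2 * np.linalg.inv(X1.T @ X1)))
        t = coef / se
        alpha = 2 * stats.t.sf(np.abs(t), df_resid)          # two-tailed p
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean())**2)
        return coef, se, t, alpha, r2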
This is followed by the analysis of variance
 
 | Var | df | SSq | MSq | F | α |
| Var1 | 1 | 797 | 797 | 0.0297 | 0.8652 |  
| Var2 | 1 | 109006 | 109006 | 4.0688 | 0.0598 |  
| Var3 | 1 | 4155814 | 4155814 | 155.1197 | <0.0001 |  
| Var4 | 1 | 195066 | 195066 | 7.281 | 0.0152 |  
| Model | 4 | 5503642 | 1375910 | 51.3572 | <0.0001 |  
| Res | 17 | 455447 | 26791 |  
| Tot | 21 | 5959089 |  
 
The abbreviations for the analysis of variance table are as follows
 Var = the source of variation
 df = degrees of freedom
 SSq = Sum of Squares
 MSq = mean Sums of Squares or variance
 F = Fisher's F, ratio of MSq of Reg and Res
 p = Probability of Type I error (α)
 Model = Contribution from all independent variables collectively
 Res = results related to the residual or random error
 Tot = total df and SSq.
 Var1-Var4 = individual contributions from each variable after corrections for correlation
 
It should be noted that, although the sum of degrees of freedom from all the independent variables equals that of the
   model as a whole (in this example both = 4),  this is not so for the Sums of Squares unless the independent variables are all uncorrelated with each other.
   Otherwise the sum of all the individual Sums of Squares is usually less than that of the model as a whole
   (in this example 4460683 and 5503642).  This is because,
   for each variable, the Sum of Squares tabulated is that unique to itself, excluding
   the part it shares by correlation with other independent variables.  The missing value, the difference between the model SSq and
   the sum of those from the individual variables (5503642-4460683=1042959), is that attributable to the overlaps and
   correlations between the independent variables.
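One common way to obtain such "unique" sums of squares is to refit the model with each x variable left out in turn and record the drop in the model Sum of Squares; the sketch below illustrates this idea (the program's exact convention may differ).

    # Unique SSq per variable = SSq(full model) - SSq(model without that variable).
    # The shortfall of their total against the model SSq is the overlap.
    import numpy as np

    def model_ssq(X, y):
        X1 = np.column_stack([np.ones(len(y)), X])
        coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
        fitted = X1 @ coef
        return np.sum((fitted - y.mean()) ** 2)

    def unique_ssq(X, y):
        full = model_ssq(X, y)
        return [full - model_ssq(np.delete(X, j, axis=1), y) for j in range(X.shape[1])]

    # overlap = model_ssq(X, y) - sum(unique_ssq(X, y))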
 Example 3 : Multiple Regression with Stepwise Deletion
 Instead of aiming to understand the relationship between independent and dependent variables, we wish to establish
   the most efficient formula to predict birthweight.  Efficiency is defined as the most accurate prediction with the
   fewest independent variables.  We decided to use α(p)>0.05 to delete those variables that are
   inefficient predictors.
 
| var | PCor | PSReg | PReg | SE | t | α |  
| 2.Ht | 0.46 | 0.14 | 24.18 | 11.0 | 2.1985 | 0.042 |  
| 3.Gest | 0.95 | 0.89 | 222.14 | 16.38 | 13.5577 | <0.0001 |  
| 4.Sex | -0.59 | -0.21 | -214.61 | 68.83 | -3.118 | 0.0063 |  
 
constant (a) = -9165.37
From the first cycle of calculation in the previous example, we determined that maternal age (PSReg=0.01, t=0.17, α=0.87)
   can be deleted. In the second cycle, we found the results as shown in the table above.
All 3 remaining predictors now have statistically significant Partial Regression Coefficients (α<0.05), so
no further deletion is necessary, and the final prediction formula is
  Birth weight (g) = -9165 + 24 (maternal height in cms) + 222 (gestation in weeks) for boys, and
  215g less for girls
Please note :  the program in the Multiple Regression Program Page
 progressively deletes
   the least significant variable at each cycle of calculation until only one variable is left in the equation.   The user however
   should examine the results at the end of each cycle, and decide when the stepwise deletion should stop.  In this example,
   stepwise deletion is stopped after the first deletion, with only maternal age removed, because the decision
   to delete was based on α>0.05
  
 
 
Concepts and Background
One Way Analysis of Variance and Covariance
Factorial Analysis of Variance and Covariance 
  
 
Introduction and Theoretical Considerations
Technical Considerations 
  
This section explains the relationship between multiple regression and the general model of analysis of variance and covariance.
This is done for the following reasons.
 
- To demonstrate the underlying principles of the least squares statistical approach to the analysis of variance
 - To provide an understanding of One Way Analysis of Variance, the Factorial model of Analysis of Variance, and the Analysis 
    of Covariance
 - To provide a guideline on how to conduct complex Analysis of Covariance, step by step, using the algorithm of
    multiple regression.   Although this may still be of interest to some, it is mostly superseded by the commercially
    available statistical packages, which will perform the procedures with check boxes for options and a click of a button.
  
For those who do not have a clear understanding of Analysis of Covariance, the following minimal and very basic
   terms and descriptions may be useful.
 
- Variance is the square of the Standard Deviation, and it measures variations in a measurement
 - The Analysis of Variance partitions the variance of the dependent variable according to those factors that influence it.  
    
    - In the simplest model, the analysis of variance is summarized as the t test.  For example, how is the variance in
        birth weight influenced by the sex (male or female) of the baby, a single comparison of the two sexes
    
 - When there are more than two groups, the general model of One Way Analysis of Variance is used.  For example, how do
        three different ethnic origins (say Greeks, Germans, and Slavs) influence the birth weight of the baby?  With three
        groups there are 3 comparisons, Greeks vs Germans, Greeks vs Slavs, and Germans vs Slavs.
    
 - When two sets of influences (Factors) are involved (say sex and ethnicity), then a Two Way Analysis of Variance is used.
        With more, a Multiway Analysis of Variance.   However, there may be systematic or accidental correlations between factors
        (say Greeks have more girls than Germans), as well as Interactions between Factors, where the effect of one factor differs across the levels of another.  The Analysis of Variance
        which separates those variances unique to each factor from those that overlap between factors is known as the
        Factorial Model of Analysis of Variance.
    
  
 - If, on top of all of this, as is usually the case, there are other influences to be taken into consideration, such as when
    differences in birth weight must be corrected for gestational age, then the variables used for these corrections are
    termed covariates, and the combined analysis becomes Covariance Analysis.
 - Things now start to become a bit complicated, because each covariate may act differently at different levels of a factor, say
    German babies grow faster than Slav babies near term.  This is called an Interaction between a factor and a covariate.
 - The total number of interactions is therefore a multiple of the numbers of covariates and factors.  As these increase, the model becomes
    complex and confusing.
 - To be correct, the results of a covariance analysis are only valid if all possible interactions are tested and found to be
    trivial (not statistically significant).   In a review of the literature however, most do not bother and assume that
    interactions are either irrelevant or do not exist.
  
 
This panel describes to the reader the organisation of the explanations, and the example data used, in the rest of this section.
 The rest of the sections are divided as follows
 
- One Way Analysis of Variance and covariance, with the following examples
    
    - Analysis using two groups (sex of the baby) and a covariate (gestation)
    
 - Analysis using three groups (ethnicity of the mother) and a covariate (gestation)
    
  
 - Factorial Analysis of Variance and Covariance, with two factors (sex and ethnicity) and a covariate (gestation).    
  
  
| Sex | Ethnicity | Gest | BWt |  
| Girl | Greek | 37 | 3048 |  
| Boy | German | 36 | 2813 |  
| Girl | French | 41 | 3622 |  
| Girl | Greek | 36 | 2706 |  
| Boy | German | 35 | 2581 |  
| Boy | French | 39 | 3442 |  
| Girl | Greek | 40 | 3453 |  
| Boy | German | 37 | 3172 |  
| Girl | French | 35 | 2386 |  
| Boy | Greek | 39 | 3555 |  
| Girl | German | 37 | 3029 |  
| Boy | French | 37 | 3185 |  
| Girl | Greek | 36 | 2670 |  
| Boy | German | 38 | 3314 |  
| Girl | French | 41 | 3596 |  
| Boy | Greek | 38 | 3312 |  
| Girl | German | 39 | 3200 |  
| Boy | French | 41 | 3667 |  
| Boy | Greek | 40 | 3643 |  
| Girl | German | 38 | 3212 |  
| Girl | French | 38 | 3135 |  
| Girl | Greek | 39 | 3366 |  
 
The algorithm used to obtain the results will be multiple regression (as entered model), as calculated in the
   Multiple Regression Program Page
.   Out of all the results produced, the useful parameters  used for
   Analysis of Variance and Covariance are
 
- The constant (a) and regression coefficients (b) of the regression equation
 - The degrees of freedom (df) and Sums of Squares (SSq) from the Analysis of Variance table
  
 
The dataset used for this exercise, as tabulated above and shown in the accompanying plot, was artificially generated by the computer to demonstrate the procedures, and does not represent reality.
   Users should also understand that real analysis requires a much larger volume of cases than that presented here.
 There are 4 German boys (red) and 3 German girls (maroon), 3 Greek boys (light green) and 5 Greek girls (dark green), 
   3 French boys (blue) and 4 French girls (navy).   All sex and ethnicity in subsequent plots will be identified by these colors.
  
 
 
Two Groups
Three Groups 
  
 
| Sex | Gest | BWt |  
| Boy | 36 | 2813 |  
| Boy | 35 | 2581 |  
| Boy | 37 | 3172 |  
| Boy | 38 | 3314 |  
| Boy | 39 | 3555 |  
| Boy | 38 | 3312 |  
| Boy | 40 | 3643 |  
| Boy | 39 | 3442 |  
| Boy | 37 | 3185 |  
| Boy | 41 | 3667 |  
| Girl | 37 | 3029 |  
| Girl | 39 | 3200 |  
| Girl | 38 | 3212 |  
| Girl | 37 | 3048 |  
| Girl | 36 | 2706 |  
| Girl | 40 | 3453 |  
| Girl | 36 | 2670 |  
| Girl | 39 | 3366 |  
| Girl | 41 | 3622 |  
| Girl | 35 | 2386 |  
| Girl | 41 | 3596 |  
| Girl | 38 | 3135 |  
 
We will use the data set and analyse the difference in birth weight between boys and girls, and for the moment forget the 
ethnicity.  The re-arranged data table is as shown to the right, and the plot as shown to the left.
 One way Analysis of Variance
 If we ignore the gestational age, then we can use the program in the Unpaired Difference Programs Page
.  The results would 
   be 
 
- For boys, n=10, mean=3268g, Standard Deviation=351g
 - For girls n=12  mean=3119g, Standard Deviation=380g
 - The difference = 149g,  t=0.95 df=20 p=0.35
  
However, if we were to use the regression model in the Multiple Regression Program Page
, using x=0 for boys and x=1
for girls, and y=birth weight, we would obtain the formula birth weight (y) = 3268 - 149(girls).  This means that the birth weight
is 3268g when x=0 (boys), and is reduced by 149g when sex is 1 (girl).  The t for the regression coefficient (-0.95) is also the same, apart from sign,
as that from the algorithm comparing the two groups.
 In other words, the regression algorithm produces the same results as the analysis of variance for two groups.
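This equivalence is easy to verify.  The sketch below uses the boy/girl birth weights from the table above and compares the classical unpaired t test with a regression on a 0/1 sex variable; it is an illustration using scipy, not the StatTools program.

    # Two-group comparison as regression: same t (up to sign) and same p value.
    import numpy as np
    from scipy import stats

    boys  = np.array([2813, 2581, 3172, 3314, 3555, 3312, 3643, 3442, 3185, 3667])
    girls = np.array([3029, 3200, 3212, 3048, 2706, 3453, 2670, 3366, 3622, 2386, 3596, 3135])

    ttest = stats.ttest_ind(boys, girls)            # classical unpaired t test
    x = np.concatenate([np.zeros(len(boys)), np.ones(len(girls))])
    y = np.concatenate([boys, girls])
    reg = stats.linregress(x, y)                    # slope is the girls - boys difference

    print(ttest.statistic, ttest.pvalue)
    print(reg.slope, reg.pvalue)                    # slope near -149, same p value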
 One way Analysis of Variance with a covariate
The One Way Analysis of Variance showed that there was no significant difference between the birth weights of boys and girls. This is because a much greater influence, the gestational age, obscured the difference, as can be seen in the diagram.
One method of correcting for the influence 
of gestational age is to draw two regression lines and compare them, using the program in the 
Compare Two Regression Lines (Covariance Analysis) Program Page
.  Submitting the data to that program will produce the following results.
 
- For girls, Birth weight (y in grams) = -2772 + 185(gestation in weeks)
 - For boys, Birth Weight (y in grams) = -3999 + 187(gestation in weeks)
 - Difference in slope = 185-187 = -2g per week, t = 0.07,   df = 18,   p = 0.95
 - Assumed common slope = 186g / week
 - Difference between sexes (girls - boys) adjusted for gestational age = -165g, t = 4.03,   df = 19,   p <0.001
 - In other words, the growth rates of boys and girls are not significantly different, at 186g/week.  Having
    corrected for growth rates, girls are 165g lighter than boys, which is statistically significant.
  
| Sex | Gestation | Ia | BWt |  
| 0 | 36 | 0 | 2813 |  
| 0 | 35 | 0 | 2581 |  
| 0 | 37 | 0 | 3172 |  
| 0 | 38 | 0 | 3314 |  
| 0 | 39 | 0 | 3555 |  
| 0 | 38 | 0 | 3312 |  
| 0 | 40 | 0 | 3643 |  
| 0 | 39 | 0 | 3442 |  
| 0 | 37 | 0 | 3185 |  
| 0 | 41 | 0 | 3667 |  
| 1 | 37 | 37 | 3029 |  
| 1 | 39 | 39 | 3200 |  
| 1 | 38 | 38 | 3212 |  
| 1 | 37 | 37 | 3048 |  
| 1 | 36 | 36 | 2706 |  
| 1 | 40 | 40 | 3453 |  
| 1 | 36 | 36 | 2670 |  
| 1 | 39 | 39 | 3366 |  
| 1 | 41 | 41 | 3622 |  
| 1 | 35 | 35 | 2386 |  
| 1 | 41 | 41 | 3596 |  
| 1 | 38 | 38 | 3135 |  
 
We will now use the multiple regression model, and introduce the concept of  interaction. Before we combine the influences
of gestational age and sex on birth weight, we must first assure ourselves that the influence of gestation is not different in
the two sexes, that boys do not grow faster or slower than girls near term.
 We therefore create a new variable, the interaction (Ia), where Ia = sex * Gestation, so that the data to be used are as shown in the table.  We then analyse this set of data using multiple regression and obtain the following results (rounded to the nearest whole number).
 
- Birth weight (g) = -3772 + (-227(girls)) + (185(Gestation in weeks)) + (2(Interaction))
 - The interaction coefficient (2, t = 0.07) is not statistically significant, and its size matches the difference between the two slopes in the previous calculation
  
Had there been significant interaction, we would not be able to proceed, as the adjustment for gestation would need to be different in the two sexes.  As there is no significant interaction, the multiple regression analysis can now be repeated without the interaction term, and the result is Birth Weight (g) = -3808 + (-165(girls)) + (186(Gestation in weeks)).  In other words, having corrected for the
influence of gestation, girls are 165g lighter than boys.
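A sketch of this two-step interaction check, using the same sex, gestation and birth-weight values tabulated above, might look like the following.  It is an illustration only, so expect small rounding differences from the text.

    # Fit with and without Ia = sex * gestation; drop Ia only if it is trivial.
    import numpy as np

    sex  = np.array([0]*10 + [1]*12)                            # 0 = boy, 1 = girl
    gest = np.array([36,35,37,38,39,38,40,39,37,41,             # boys
                     37,39,38,37,36,40,36,39,41,35,41,38])      # girls
    bwt  = np.array([2813,2581,3172,3314,3555,3312,3643,3442,3185,3667,
                     3029,3200,3212,3048,2706,3453,2670,3366,3622,2386,3596,3135])

    def fit(*cols):
        X = np.column_stack([np.ones(len(bwt))] + list(cols))
        coef, *_ = np.linalg.lstsq(X, bwt, rcond=None)
        return coef

    ia = sex * gest                                             # interaction variable
    a, b_sex, b_gest, b_ia = fit(sex, gest, ia)                 # with interaction
    print(round(b_ia, 1))       # tiny (about 2 g/week in size), so Ia can be dropped
    a2, b_sex2, b_gest2 = fit(sex, gest)                        # without interaction
    print(round(b_sex2), round(b_gest2))                        # about -165 g and 186 g/week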
 The whole point of this exercise, analysing the same data using the comparison of two regression lines and using multiple regression,
   is to demonstrate the principle underlying covariance analysis, and to demonstrate what an interaction in a multivariate set
   of calculations is all about.  To summarise
 
- Multiple regression can be used to analyse multivariate statistical data
 - In the multi-variate situation, there is a need to check for interaction, that the influence of one variable on the outcome
    is not affected by another influence. 
     
 
 
| Ethnicity | Gest | BWt |  
| German | 36 | 2813 |  
| German | 35 | 2581 |  
| German | 37 | 3172 |  
| German | 38 | 3314 |  
| German | 37 | 3029 |  
| German | 39 | 3200 |  
| German | 38 | 3212 |  
| Greek | 39 | 3555 |  
| Greek | 38 | 3312 |  
| Greek | 40 | 3643 |  
| Greek | 37 | 3048 |  
| Greek | 36 | 2706 |  
| Greek | 40 | 3453 |  
| Greek | 36 | 2670 |  
| Greek | 39 | 3366 |  
| French | 39 | 3442 |  
| French | 37 | 3185 |  
| French | 41 | 3667 |  
| French | 41 | 3622 |  
| French | 35 | 2386 |  
| French | 41 | 3596 |  
| French | 38 | 3135 |  
 
We will use the data set and analyse the difference in birth weight between ethnic origins, and for the moment forget sex of the 
baby.  The re-arranged data table is as shown to the right, and the plot as shown to the left.
 One way Analysis of Variance
 If we ignore the gestational age, then we can use the program in the Unpaired Difference Programs Page
.  The results would 
   be 
 
- For Germans, n=7, mean=3046g, Standard Deviation=261g
 - For Greeks, n=8, mean=3219g, Standard Deviation=373g
 - For French, n=7, mean=3290g, Standard Deviation=451g
 - In the analysis of variance, F=0.81, α=0.46; the groups are not significantly different from each other.
  
Multiple Regression : Introducing the dummy variable
Multiple regression requires that the independent variables be at least ordered (3>2>1).  When there are multiple
groups which are not ordered, there is a need to create ordered dummy variables to represent them, using the following procedure.

- The number of dummy variables = 1 less than the number of groups.  For the current data of 3 ethnic groups, we will create
    2 dummy variables EthnicDummy1 (ED1) and EthnicDummy2 (ED2)
 - Each group is assigned the value 1 on one of the dummy variables and 0 on the remaining ones, and the last group
    is assigned 0 on all the dummy variables.  It does not matter which group is assigned to which dummy variable, provided they are identified
    when the results are interpreted (a short coding sketch follows the list below).
    
    - For Germans, ED1=1, ED2 = 0 (German and not Greek)
    
 - For Greeks, ED1=0, ED2 = 1; (Greek and not German)
    
 - For French, ED1=0, ED2=0; (Not German and not Greek)
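A small sketch of this dummy coding, with French as the reference group that scores 0 on both columns, might look like this (the short ethnicity list is illustration only):

    # Dummy coding a 3-group factor into two 0/1 columns (French = reference group).
    import numpy as np

    ethnicity = ["German", "Greek", "French", "Greek", "German", "French"]  # illustration only
    ed1 = np.array([1 if e == "German" else 0 for e in ethnicity])          # ED1: German yes/no
    ed2 = np.array([1 if e == "Greek"  else 0 for e in ethnicity])          # ED2: Greek yes/no
    # French rows are 0 on both, so the regression constant describes French babies.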
    
      
  
| ED1 (German) | ED2 (Greek) | Birth Weight |
| 1 | 0 | 2813 |  
| 1 | 0 | 2581 |  
| 1 | 0 | 3172 |  
| 1 | 0 | 3314 |  
| 1 | 0 | 3029 |  
| 1 | 0 | 3200 |  
| 1 | 0 | 3212 |  
| 0 | 1 | 3555 |  
| 0 | 1 | 3312 |  
| 0 | 1 | 3643 |  
| 0 | 1 | 3048 |  
| 0 | 1 | 2706 |  
| 0 | 1 | 3453 |  
| 0 | 1 | 2670 |  
| 0 | 1 | 3366 |  
| 0 | 0 | 3442 |  
| 0 | 0 | 3185 |  
| 0 | 0 | 3667 |  
| 0 | 0 | 3622 |  
| 0 | 0 | 2386 |  
| 0 | 0 | 3596 |  
| 0 | 0 | 3135 |  
 
Multiple Regression now produces the formula Birth Weight (y) = 3290 + (-245ED1) + (-71ED2). This means :

- For German babies, where ED1=1 and ED2=0, the birth weight is 3290 - 245 = 3045g
 - For Greek babies, where ED1=0 and ED2=1, the birth weight is 3290 - 71 = 3219g
 - For French babies, where ED1=0 and ED2=0, the birth weight is 3290g
 - F for the model is 0.81, which is not statistically significant.
 - Except for the rounding error of 1g for German babies, these are the same results as those from the One Way Analysis of Variance
  
Analysis of Covariance for multiple groups.
| ED1 (German) | ED2 (Greek) | Gestation | ED1G | ED2G | Birth Weight |
| 1 | 0 | 36 | 36 | 0 | 2813 |  
| 1 | 0 | 35 | 35 | 0 | 2581 |  
| 1 | 0 | 37 | 37 | 0 | 3172 |  
| 1 | 0 | 38 | 38 | 0 | 3314 |  
| 1 | 0 | 37 | 37 | 0 | 3029 |  
| 1 | 0 | 39 | 39 | 0 | 3200 |  
| 1 | 0 | 38 | 38 | 0 | 3212 |  
| 0 | 1 | 39 | 0 | 39 | 3555 |  
| 0 | 1 | 38 | 0 | 38 | 3312 |  
| 0 | 1 | 40 | 0 | 40 | 3643 |  
| 0 | 1 | 37 | 0 | 37 | 3048 |  
| 0 | 1 | 36 | 0 | 36 | 2706 |  
| 0 | 1 | 40 | 0 | 40 | 3453 |  
| 0 | 1 | 36 | 0 | 36 | 2670 |  
| 0 | 1 | 39 | 0 | 39 | 3366 |  
| 0 | 0 | 39 | 0 | 0 | 3442 |  
| 0 | 0 | 37 | 0 | 0 | 3185 |  
| 0 | 0 | 41 | 0 | 0 | 3667 |  
| 0 | 0 | 41 | 0 | 0 | 3622 |  
| 0 | 0 | 35 | 0 | 0 | 2386 |  
| 0 | 0 | 41 | 0 | 0 | 3596 |  
| 0 | 0 | 38 | 0 | 0 | 3135 |  
 
The differences between ethnic groups have been found to be not statistically significant, but this may be caused by the 
much greater influence of gestational age on birth weight, as can be seen in the plot above.  The inclusion of gestational age
as a covariate is therefore necessary.
 As the three ethnic groups have been converted into two dummy variables ED1 and ED2, the interactions between gestation and both
ED variables now need to be constructed.  These are ED1G=ED1*Gest, and ED2G=ED2*Gest. The data are now as shown in the table,
and the analysis follows the steps below.
 Step 1 :  All 5 independent variables, ED1, ED2, Gestation,
    ED1G, ED2G, plus the dependent variable BWt, are subjected to multiple regression analysis.  Although the full data output
    is produced by the program, we are only interested in the model degrees of freedom (5) and Sums of Squares (2544655).
 Step 2  :  The exercise is repeated, excluding the two interaction terms ED1G and ED2G.  The 3 independent variables,
    ED1, ED2, Gestation, plus the dependent variable BWt, are subjected to multiple regression analysis.  Again, we are interested in
    the model degrees of freedom (3) and Sums of Squares (2527306)
 Step 3 : Analysis of Interaction.  Using the combined information from the two steps, we can now partition the
    Analysis of Variance obtained initially in Step 1, as shown in the table below. The Probability of Type I Error
    for F = 0.49, with 2 and 16 degrees of freedom, is α=0.63, and we can conclude at this point that
     no significant interaction exists between gestation and the ethnic origin of the babies.  In other words, the growth rates
     of babies near term are not different in the three ethnic groups.
|  | df | SSq | MSq | F |
| Inclusive of Interaction | 5 | 2544655 |  |  |
| Exclusive of Interaction | 3 | 2527306 |  |  |
| Attributable to Interaction | 5-3=2 | 2544655-2527306=17349 | 17349/2=8675 | 8675/17535=0.49 |
| Residual | 16 | 280560 | 280560/16=17535 |  |
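The two-model comparison in Steps 1 to 3 can be written generically as below: the interaction Sum of Squares is the drop in model SSq when the interaction columns are removed, tested against the residual of the fuller model.  This is a sketch of the idea, not the StatTools program.

    # Nested-model F test for interaction: compare model SSq with and without
    # the interaction columns, against the residual mean square of the full model.
    import numpy as np
    from scipy import stats

    def model_terms(X, y):
        X1 = np.column_stack([np.ones(len(y)), X])
        coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
        fitted = X1 @ coef
        ss_model = np.sum((fitted - y.mean()) ** 2)
        ss_resid = np.sum((y - fitted) ** 2)
        return ss_model, ss_resid, X.shape[1], len(y) - X.shape[1] - 1

    def interaction_test(X_full, X_reduced, y):
        ssm_f, sse_f, df_f, dfe_f = model_terms(X_full, y)
        ssm_r, _, df_r, _ = model_terms(X_reduced, y)
        f = ((ssm_f - ssm_r) / (df_f - df_r)) / (sse_f / dfe_f)
        alpha = stats.f.sf(f, df_f - df_r, dfe_f)
        return f, alpha

    # e.g. X_full holds ED1, ED2, Gest, ED1G, ED2G; X_reduced drops ED1G and ED2G.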
Step 4 : Covariance Analysis.  The Regression Formula obtained in Step 2, excluding interactions, can now be examined.
      
    - The formula is Birth weight (y in g) = -4166 + 84ED1 +69ED2 + 192Gestation (in weeks)
    
 - Birth weight increases by 192g per week near term (t=11.8, α<0.001, statistically significant)
    
 - A French baby, at term (40 weeks), averaged 40*192-4166 = 3514g
    
 - German babies (ED1) are 84g more than French babies (t=1.13, α=0.27, not statistically significant)
    
 - Greek babies are 69g more than French babies (t=1.02, α=0.32, not statistically significant)
    
  
Comments :  These simple steps demonstrate the mathematical sequence used to handle complex data using the multiple regression algorithm.
 
- The creation of binary dummy variables to replace variables with multiple groups
 - The creation of interaction variables between different factors, where Interaction value = Factor1 value multiplied by Factor 2 value
 - The double analysis of variance, with and without the interaction variables, to isolate the interaction effect. This is necessary,
    because some correlation (and therefore overlapping effect) exists between different factors, and this double procedure allows
    the overlap to remain with the main effect, so that the uncorrelated interactions can be isolated.
 - Only when there is no significant interaction, can the covariance analysis be interpreted.
   
Two very important concepts  involved when handling multivariate data are also demonstrated in this model.
 
- Interaction, where the influence of one factor on the dependent variable is altered by another factor.  Interactions can be
    helpful or unhelpful, but they need to be defined, isolated, and interpreted.  An example is that an interaction between sex and
    gestation means boys and girls have different growth rates
 - Confounding, caused by correlations between factors, so that it is difficult or even impossible to identify how much each
    factor affects the outcome.  Confounding is always bad as it results in misleading interpretations, and the greatest virtue of 
    multiple regression analysis is its ability to separate the unique and overlapping parts of effects from multiple factors.
    An example of correlation and confounding would be if girls are born earlier than boys, so that it is unclear 
    whether it is the sex or the gestation that affects birth weight.  
   
 
 
Factorial Analysis of Variance
Factorial Analysis of Covariance 
  
The Factorial model of Analysis of Variance was initially used in agriculture and animal laboratories, where subjects 
(plants or animals) are randomly allocated to groups, which are given a combination of two or more treatments.  Such a model has
many advantages 
 
- The same subject is used in a number of experiments simultaneously, thus greatly reducing the cost of research
 - In many cases, the combination of two treatments may have a greater (synergism) or lesser (antagonism) effect than the sum
of their individual effects.  These are called interactions and provide additional useful information
 - Mathematically, the Analysis of Variance calculates the effect of each treatment (single factors), then in groups of
    combined treatments (combined factors).  The difference between the combined effect and the sum of the single effects
    then represents the interaction, which can be numerically presented and statistically tested.
 - The two important underlying assumptions in this model are, firstly, that the treatment must be randomly and 
    independently allocated, so there is no correlation between treatments, and secondly, that all groups and subgroups 
    at different levels have the same sample size.    
  
The Factorial model is a powerful and efficient model of investigation, so it has gradually been adopted in all aspects of
psychosocial research and in the clinical area, and has moved from the controlled experiment to the epidemiological model.  In doing so,
the important assumptions of Factorial models often cannot be met, as independent variables are often not randomly allocated treatments,
but characteristics of the natural environment, and sample sizes in subgroups are seldom the same.
 
- The sample size in the groups can only be controlled to an extent.  For example, the numbers of boys and girls born are
    never exactly the same, and artificially creating equal numbers would require removing some cases arbitrarily, a
    process which itself introduces a bias.
 - The difference in birth weight between boys and girls amongst Germans may be different to that amongst Greeks (interaction).
    Although interactions can be useful information, in clinical investigations they often represent an unwanted distraction making
    interpretation of data difficult.
 - We cannot allocate sex at random to different groups, and a possibility of correlation occurs.  For example, the sex ratio may
    differ in different ethnic groups, so that the influences of ethnicity and sex cannot be separated (confounding).
  
When the assumptions of the Factorial model are violated, the results produced become misleading, and sometimes the numbers do not add up.  When there is extensive correlation between independent variables, the overlapping influences are counted repeatedly and thus inflated in the single effects, so that the combined effect is less than the sum of the single effects, resulting in a conceptually unacceptable negative interaction.
 The mathematics of multiple regression is able to resolve this difficulty, because it separates those influences (in terms of Sums of Squares) that are unique to each independent variable from those influences that overlap between the correlated variables.  In short, it treats every factor both as an independent variable and as a covariate.  In most modern statistical packages therefore, the multiple regression algorithm is used for calculation even though the user interface retains the Analysis of Variance format.
 
 
| Sex | Ethnicity | BWt |  
| Boy | German | 2813 |  
| Boy | German | 2581 |  
| Boy | German | 3172 |  
| Boy | German | 3314 |  
| Boy | Greek | 3555 |  
| Boy | Greek | 3312 |  
| Boy | Greek | 3643 |  
| Boy | French | 3442 |  
| Boy | French | 3185 |  
| Boy | French | 3667 |  
| Girl | German | 3029 |  
| Girl | German | 3200 |  
| Girl | German | 3212 |  
| Girl | Greek | 3048 |  
| Girl | Greek | 2706 |  
| Girl | Greek | 3453 |  
| Girl | Greek | 2670 |  
| Girl | Greek | 3366 |  
| Girl | French | 3622 |  
| Girl | French | 2386 |  
| Girl | French | 3596 |  
| Girl | French | 3135 |  
 
Factorial Model for Birth Weight
The data, as plotted, is shown in the diagram to the left, but for this analysis, we will ignore gestational age, and only
examine how the two factors, sex and ethnic origin, affect birth weight.  The data is as shown in the table to the right.
 To allow multiple regression, the 3 groups in the ethnicity factor are converted into two binary variables, as follows
 
- For Germans, ED1=1, ED2 = 0 (German and not Greek)
 - For Greeks, ED1=0, ED2 = 1; (Greek and not German)
 - For French, ED1=0, ED2=0; (Not German and not Greek)
      
To allow the estimation of interaction, two additional interaction variables are created
 
- Interaction between ED1 and sex ED1S = ED1 * sex
 - Interaction between ED2 and sex ED2S = ED2 * sex.
  
| Sex | ED1 | ED2 | ED1S | ED2S | BWt |
| 0 | 1 | 0 | 0 | 0 | 2813 |  
| 0 | 1 | 0 | 0 | 0 | 2581 |  
| 0 | 1 | 0 | 0 | 0 | 3172 |  
| 0 | 1 | 0 | 0 | 0 | 3314 |  
| 0 | 0 | 1 | 0 | 0 | 3555 |  
| 0 | 0 | 1 | 0 | 0 | 3312 |  
| 0 | 0 | 1 | 0 | 0 | 3643 |  
| 0 | 0 | 0 | 0 | 0 | 3442 |  
| 0 | 0 | 0 | 0 | 0 | 3185 |  
| 0 | 0 | 0 | 0 | 0 | 3667 |  
| 1 | 1 | 0 | 1 | 0 | 3029 |  
| 1 | 1 | 0 | 1 | 0 | 3200 |  
| 1 | 1 | 0 | 1 | 0 | 3212 |  
| 1 | 0 | 1 | 0 | 1 | 3048 |  
| 1 | 0 | 1 | 0 | 1 | 2706 |  
| 1 | 0 | 1 | 0 | 1 | 3453 |  
| 1 | 0 | 1 | 0 | 1 | 2670 |  
| 1 | 0 | 1 | 0 | 1 | 3366 |  
| 1 | 0 | 0 | 0 | 0 | 3622 |  
| 1 | 0 | 0 | 0 | 0 | 2386 |  
| 1 | 0 | 0 | 0 | 0 | 3596 |  
| 1 | 0 | 0 | 0 | 0 | 3135 |  
 
The data are then subjected to analysis using similar steps to those for covariance analysis.
 Step 1 : A two stage Analysis of Variance using the multiple regression algorithm, with and without the interaction
   variables, is carried out.  In these analyses, only the degrees of freedom and Sums of Squares for the model are of interest.

   - The first calculation uses the 5 independent variables Sex, ED1, ED2, ED1S, ED2S, and the outcome variable BWt.
       The degrees of freedom = 5, and the Sums of Squares = 768244

 - The second calculation excludes the two interaction variables (ED1S and ED2S).  Three independent variables Sex, ED1, ED2,
       and the outcome variable BWt are analysed.  The degrees of freedom = 3, and Sums of Squares = 400694

 - The Table of Analysis of Variance can now be restructured accordingly, as shown in the table below.

 - The Probability of Type I Error for F=1.43, with 2 and 16 degrees of freedom, is α=0.27, not statistically significant.
   
  
 |  | df | SSq | MSq | F |
| Inclusive of Interaction | 5 | 768244 |  
| Exclusive of Interaction | 3 | 400694 |  
| Attributable to Interaction | 5-3=2 | 768244-400694=367550 | 367550/2=183775 | 183775/128561=1.43 |  
| Residual | 16 | 2056971 | 2056971/16=128561 |    
 
At this point therefore, we can conclude that no significant interaction exists between sex and ethnicity.  In other words, the differences in birth weight between boys and girls are similar in the different ethnic groups.
 Step 2 : The regression formula obtained without the interaction variables can now be used to interpret the data.
 
- The formula is Birth weight (y in grams) = 3395 - 183(sex) - 270(ED1) - 61(ED2)
 - French (ED1=0 and ED2=0) boys (sex=0) averaged 3395g
 - French girls are 183g less (t = 1.15, α=0.27)
 - German babies are 270g less than French babies of the respective sexes (t = 1.37, α=0.19)
 - Greek babies are 61g less than French babies of the respective sexes (t = 0.32, α=0.75)
 - None of these differences are statistically significant
     
 
 
| Sex | Ethnicity | Gest | BWt |  
| Boy | German | 36 | 2813 |  
| Boy | German | 35 | 2581 |  
| Boy | German | 37 | 3172 |  
| Boy | German | 38 | 3314 |  
| Boy | Greek | 39 | 3555 |  
| Boy | Greek | 38 | 3312 |  
| Boy | Greek | 40 | 3643 |  
| Boy | French | 39 | 3442 |  
| Boy | French | 37 | 3185 |  
| Boy | French | 41 | 3667 |  
| Girl | German | 37 | 3029 |  
| Girl | German | 39 | 3200 |  
| Girl | German | 38 | 3212 |  
| Girl | Greek | 37 | 3048 |  
| Girl | Greek | 36 | 2706 |  
| Girl | Greek | 40 | 3453 |  
| Girl | Greek | 36 | 2670 |  
| Girl | Greek | 39 | 3366 |  
| Girl | French | 41 | 3622 |  
| Girl | French | 35 | 2386 |  
| Girl | French | 41 | 3596 |  
| Girl | French | 38 | 3135 |  
 
All previous discussions on Factorial Analysis of Variance and on Covariance Analysis are subsections of the full
Factorial Covariance Model, which is discussed in this section.  The data are as presented in the table,
and plotted in the accompanying figure.
 The aim is to analyse the influence of two factors, sex and ethnicity, on the birth weight of a baby, corrected for a
single covariate, the gestational age in weeks.  The algorithm to be used is multiple regression.
 As the reasons for the various procedures have already been covered in previous sections, only the various stages of computation are listed here.
 
| Sex | ED1 | ED2 | ED1S | ED2S | Gest | ED1G | ED2G | ED1SG | ED2SG | BWt |  
| 0 | 1 | 0 | 0 | 0 | 36 | 36 | 0 | 0 | 0 | 2813 |  
| 0 | 1 | 0 | 0 | 0 | 35 | 35 | 0 | 0 | 0 | 2581 |  
| 0 | 1 | 0 | 0 | 0 | 37 | 37 | 0 | 0 | 0 | 3172 |  
| 0 | 1 | 0 | 0 | 0 | 38 | 38 | 0 | 0 | 0 | 3314 |  
| 0 | 0 | 1 | 0 | 0 | 39 | 0 | 39 | 0 | 0 | 3555 |  
| 0 | 0 | 1 | 0 | 0 | 38 | 0 | 38 | 0 | 0 | 3312 |  
| 0 | 0 | 1 | 0 | 0 | 40 | 0 | 40 | 0 | 0 | 3643 |  
| 0 | 0 | 0 | 0 | 0 | 39 | 0 | 0 | 0 | 0 | 3442 |  
| 0 | 0 | 0 | 0 | 0 | 37 | 0 | 0 | 0 | 0 | 3185 |  
| 0 | 0 | 0 | 0 | 0 | 41 | 0 | 0 | 0 | 0 | 3667 |  
| 1 | 1 | 0 | 1 | 0 | 37 | 37 | 0 | 37 | 0 | 3029 |  
| 1 | 1 | 0 | 1 | 0 | 39 | 39 | 0 | 39 | 0 | 3200 |  
| 1 | 1 | 0 | 1 | 0 | 38 | 38 | 0 | 38 | 0 | 3212 |  
| 1 | 0 | 1 | 0 | 1 | 37 | 0 | 37 | 0 | 37 | 3048 |  
| 1 | 0 | 1 | 0 | 1 | 36 | 0 | 36 | 0 | 36 | 2706 |  
| 1 | 0 | 1 | 0 | 1 | 40 | 0 | 40 | 0 | 40 | 3453 |  
| 1 | 0 | 1 | 0 | 1 | 36 | 0 | 36 | 0 | 36 | 2670 |  
| 1 | 0 | 1 | 0 | 1 | 39 | 0 | 39 | 0 | 39 | 3366 |  
| 1 | 0 | 0 | 0 | 0 | 41 | 0 | 0 | 0 | 0 | 3622 |  
| 1 | 0 | 0 | 0 | 0 | 35 | 0 | 0 | 0 | 0 | 2386 |  
| 1 | 0 | 0 | 0 | 0 | 41 | 0 | 0 | 0 | 0 | 3596 |  
| 1 | 0 | 0 | 0 | 0 | 38 | 0 | 0 | 0 | 0 | 3135 |  
 
Step 1. Preparation of the data
 
- Sex : Boy=0, Girl=1
 - Creation of two dummy variables. ED1=0 for non-German and 1 for German,  and ED2=0 for non-Greek and 1 for Greek
 - Two interaction variables between Sex and the dummy variables. ED1S = Sex * ED1, and ED2S = Sex * ED2
 - Gest : Gestation in weeks 
 - 4 interaction variables involving Gestation. ED1G = Gest * ED1, ED2G = Gest * ED2, ED1SG = Gest * ED1S, ED2SG = Gest * ED2S
 - BWt : Birth weight in grams(g)
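A short sketch of this data preparation, building all the columns from the raw sex, ethnicity and gestation values, might look like this (illustration only, ready for the nested regressions in the following steps):

    # Build the full design matrix for the factorial covariance model.
    import numpy as np

    def build_design(sex, ed1, ed2, gest):
        ed1s, ed2s = ed1 * sex, ed2 * sex            # sex x ethnicity interactions
        ed1g, ed2g = ed1 * gest, ed2 * gest          # gestation x ethnicity interactions
        ed1sg, ed2sg = ed1s * gest, ed2s * gest      # three-way interactions
        return np.column_stack([sex, ed1, ed2, ed1s, ed2s,
                                gest, ed1g, ed2g, ed1sg, ed2sg])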
  
Step 2 : Interaction related to Gestation
Two analyses, inclusive and exclusive of the gestation interaction variables, are carried out to obtain the degrees
   of freedom and Sums of Squares of the two models.
 
- The first analysis includes the 10 independent variables of Sex, ED1, ED2, ED1S, ED2S, Gestation, ED1G, ED2G, ED1SG, ED2SG, 
    and the dependent variable Birth weight.  The degrees of Freedom = 10, and Sums of Squares = 2729846
 - The second analysis excludes the 4 gestation related interaction variables (ED1G, ED2G, ED1SG, ED2SG). Six independent variables
    of Sex, ED1, ED2, ED1S, ED2S, Gestation, and the dependent variable BWt are analysed.  The model degrees of freedom is now 6, 
    and Sums of Squares = 2682429
 - The Analysis of Variance Table can now be constructed, as shown below and to the right.  Probability of Type I Error for F=1.37,
    with 4 and 11 Degrees of Freedom  α = 0.31, not statistically significant.
  
 |  | df | SSq | MSq | F |
| Inclusive of Interaction | 10 | 2729846 |  |  |
| Exclusive of Interaction | 6 | 2682429 |  |  |
| Attributable to Interaction | 10-6=4 | 2729846-2682429=47417 | 47417/4=11854 | 11854/8670=1.37 |
| Residual | 11 | 95369 | 95369/11=8670 |  |
 
At this point, we can conclude that there is no significant interaction involving gestation.  In other words, growth rates in all groups are similar.
       
 Step 3 : Evaluating Interaction between sex and ethnicity
 As with gestation, consideration of the interaction between sex and ethnicity also involves two analyses.
 
- The first analysis includes 6 independent variables of Sex, Ed1, ED2, ED1S, ED2S, Gestation, and the dependent variable BWt,
    and these are now subjected to analysis of variance using the multiple regression algorithm. The model degrees of freedom is 6, 
    and Sums of Square = 2682429.
 - The second analysis excludes the two interaction variables between sex and ethnicity (ED1S, ED2S).  Four independent variables,
    Sex, ED1, ED2, Gestation, and the dependent variable BWt, are now subjected to analysis of variance using the multiple
    regression algorithm. The model degrees of freedom is 4, and Sums of Squares = 2673221.
 - The Analysis of Variance Table can now be constructed, as shown below and to the right.  Probability of Type I Error for F=0.48,
    with 2 and 15 Degrees of Freedom  α = 0.63, not statistically significant.
  
 |  | df | SSq | MSq | F |
| Inclusive of Interaction | 6 | 2682429 |  |  |
| Exclusive of Interaction | 4 | 2673221 |  |  |
| Attributable to Interaction | 6-4=2 | 2682429-2673221=9208 | 9208/2=4604 | 4604/9519=0.48 |
| Residual | 15 | 142785 | 142785/15=9519 |  |
 
At this point, we can conclude that there is no significant interaction between sex and ethnicity.  In other words, once corrected for
gestation, the differences in birth weight between boys and girls are similar in all ethnic groups.
 Step 4 : Final Analysis
The regression formula from the last analysis, free of any interaction terms, can now be interpreted.
 
- The formula is Birth Weight (y in g) = -4022 -166Sex + 58ED1 + 77ED2 + 191Gest(week)
 - Weight gain is 191g per week near term (t=15.94, α<0.0001, statistically highly significant)
 - A French Boy (Sex=0, ED1=0, ED2=0), at 40 weeks gestation, averaged 40*191-4022 = 3618g
 - A French girl is 166g lighter (t=4.04, α=0.0009, statistically highly significant)
 - German babies are 58g heavier than French babies with respective sex and gestation (t=1.06, α=0.30, not statistically significant)     
 - Greek babies are 77g heavier than French babies with respective sex and gestation (t=1.55, α=0.44, not statistically significant)
 - We have established that gestation and sex of the babies significantly affect birth weight, but ethnic origins do not.
      
 
 
 
Multiple Regression
 
Steel RGD, Torrie JH, Dickey DA (1997) Principles and procedures of statistics. 
A biomedical approach. 3rd Ed. McGraw-Hill Inc New York NY 10020 
ISBN 0-07-061028-2 p. 322-351 
 
Sample Size for Multiple Regression
 
Cohen J. (1988) Statistical Power Analysis for the Behavioural
              Sciences. Second Edition. Lawrence Erlbaum Associates Publishers.
              Hillsdale New Jersey USA. ISBN 0-8058-0283-5. p. 407-410; 551.
 
Covariance Models
 
Overall JE and Klett CJ (1972) Applied Multivariate Analysis. McGraw Hill Series in Psychology.
   McGraw Hill Book Company New York. Library of Congress No. 73-147164. ISBN 0-07-047935-6. p.415-440.
 This provides the template used on this page, of how to use multiple regression to carry out the analysis of covariance.   
 Steel RGD, Torrie JH, Dickey DA (1997) Principles and procedures of statistics. A biomedical approach. 3rd Ed. 
   McGraw-Hill Inc New York NY 10020 ISBN 0-07-061028-2 p. 322-351 
 This provides the mathematics of calculating the coefficients and the analysis of variance  for the multiple regression model.   
 Pedhazur E. (1997) Multiple Regression in Behavioral Research. Explanation and Prediction.(3rd. ed.) 
   Harcourt Brace College Publishers, Fort Worth, USA. ISBN 0-03-072831-2 p.181-196. 
 This is a very detailed textbook dealing with multiple regression and the many ways it can be used, and a very useful reference book.
   It is included here however because it provides an excellent discussion on dummy variables.
  
 
  