This page supports the programs in the Data Testing for Normal Distribution Program Page, which test the hypothesis that a data set is normally distributed. It also provides some basic description of the dataset, similar to the data description procedure in SPSS. The example used here is the default example data in the Data Testing for Normal Distribution Program Page.

Data Description : The following parameters are calculated and presented.
Simple Tests of Normal Distribution : The following were the most commonly used tests of normal distribution until more complex, computationally intensive algorithms became available.
The following are more formal tests of normal distribution.

The Chi Square Goodness of Fit : The data are divided into groups, each 1 standard deviation wide, and the chi square test is used to see whether the numbers in the groups differ significantly from those expected if the data were normally distributed. The program also produces a normal distribution plot so that users can visualize the actual distribution of the data.

The Kolmogorov-Smirnov test : This is the most commonly accepted test of whether a set of data violates the assumption of normality. The data are first placed in order of magnitude, and the cumulative probability for each data point is calculated and matched against the theoretical cumulative probability from a normal distribution. The largest difference between these two probabilities is tested against the sample size. The result of the test is whether the data deviate significantly from normality.

The Shapiro-Francia test : This also tests whether the data deviate significantly from normality, and has been argued by some to be a better test than the Kolmogorov-Smirnov test when the sample size is small. This is because, with smaller sample sizes (n<1000), the maximum difference between theoretical and actual cumulative probability may be more variable, so the Kolmogorov-Smirnov test can be less stable.

The P Plot : A plot of cumulative probability against the data. A dataset that has an exact normal distribution will plot along the diagonal, so this provides a visual description of the relationship.

The Correlation between actual and theoretical distribution : This also provides a measure of how close the data are to a normal distribution. The correlation is often used to optimise a transformation towards normal distribution, but has limited practical use in determining whether the assumption of normality is valid in a set of data. This is because the correlation coefficient tends to be high in any case, and it is difficult to determine a cut off point for decision.
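The Kolmogorov-Smirnov statistic and the correlation between actual and theoretical cumulative probability described above can be sketched in a few lines of Python. This is a minimal illustration and not the algorithm used by the program page. It makes several assumptions: the function names ks_statistic and pp_correlation are hypothetical, the normal distribution is fitted using the sample mean and sample standard deviation, and the actual cumulative probability of the i-th ordered value is taken as (i - 0.5) / n for the correlation. Only the statistics are computed. No significance level is given, because when the mean and standard deviation are estimated from the same data the standard Kolmogorov-Smirnov critical values no longer apply exactly.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ks_statistic(data):
    """Largest gap between actual and theoretical cumulative probability.

    Assumption: the theoretical normal distribution uses the sample mean
    and sample standard deviation of the data itself.
    """
    xs = sorted(data)  # place the data in order of magnitude
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    largest = 0.0
    for i, x in enumerate(xs, start=1):
        theoretical = fitted.cdf(x)
        # The actual cumulative probability steps from (i - 1) / n to i / n
        # at x, so the gap is checked on both sides of the step.
        largest = max(largest, i / n - theoretical, theoretical - (i - 1) / n)
    return largest


def pp_correlation(data):
    """Pearson correlation between actual and theoretical cumulative probability.

    Assumption: the actual cumulative probability of the i-th ordered value
    is (i - 0.5) / n; other plotting positions are also in common use.
    """
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    actual = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [fitted.cdf(x) for x in xs]
    mean_a, mean_t = mean(actual), mean(theoretical)
    cov = sum((a - mean_a) * (t - mean_t) for a, t in zip(actual, theoretical))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_t = sum((t - mean_t) ** 2 for t in theoretical)
    return cov / sqrt(var_a * var_t)


# Hypothetical sample values, for illustration only.
sample = [4.1, 5.3, 5.9, 6.2, 6.8, 7.0, 7.4, 8.1, 8.8, 10.2]
print("Largest difference:", ks_statistic(sample))
print("Correlation:", pp_correlation(sample))
```

Running both functions on a clearly skewed dataset and on a roughly symmetric one shows the point made above: the largest difference changes noticeably, while the correlation remains fairly high in both cases, which is why a cut off point is hard to choose.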