Sample Size for Population Parameters Explained

Evaluating a population parameter is probably the most common research activity anywhere. It is the basis of market research, political polling, disease prevalence surveys, and assessments of economic activity, the quality of schools, and so on.

Increasingly, the statistics involved are also used in quality control, where data obtained from sampling are compared against a benchmark value. The Quality Statistics Explained Page expands on this use of these statistics.

Designing a proper survey to evaluate a population parameter involves many technical issues, such as how to select the right population, which measurement or question is most appropriate, how to control bias, and so on.

This site addresses only the issue of estimating the sample size required at the planning stage and, once the data have been obtained, estimating the precision of the results.

Glossary : There are a number of terms used in all population parameter estimations.

Parameter value is what we set out to establish.
Error is the half-width of the uncertainty on each side of the value.
Confidence interval is the range within which the true parameter value is likely to be. It is Value ± Error.
Percent of confidence interval is the percent of the time that intervals calculated in this way would contain the true parameter value, should we repeat the survey many times with the same sample size. This represents how sure we can be that the true value is within the confidence interval. The most commonly used is the 95% confidence interval, which means that if we were to repeat the survey many times, about 95% of the intervals obtained would contain the true value (or 95% sure).

There are therefore two calculations.

At the planning stage, we define the confidence interval we require, and hence the tolerable error. From these we can estimate the sample size required to do the job.
When all the data have been collected, we know the central value, the variation, and the sample size. From these we can estimate the error, and hence the confidence interval of the results we have.

Establishing population means is a frequent research activity, particularly in the educational, social, and biomedical fields. Educational departments may wish to know the mathematical abilities of a cohort of school children, obstetricians need to know the normal birth weight, and so on.

Glossary

The mean is the mean value of the measurement of interest in the population.
The Standard Deviation is the Standard Deviation of the measurement in the population. This is mostly unknown and has to be estimated from previous studies or pilot studies.
The error is half the width of the confidence interval, the distance on each side of the mean.
Sample size is the number of subjects needed in the survey to achieve the required precision.
Percent confidence interval is the percent of time that the true mean will be within the confidence interval, if the survey is repeated many times.

Estimate sample size requirement :

At the planning stage, to estimate the sample size required, parameters needed are the percent confidence interval (usually 95%), the standard deviation (SD) to be expected in the population, and the error we will tolerate (er). The sample size required to achieve this level of precision can then be estimated.

An example

We wish to establish the mean IQ of first year university students. We expect the standard deviation to be 10, and we want a 95% confidence interval based on an error of ±2 IQ points (2 / 10 = 0.2 SDs). We can look up the Sample Size to Establish Population Means Explanation and Tables Page and find that the sample size required is 99 subjects.
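The table lookup in this example can be approximated with a short calculation. The sketch below assumes the large-sample Normal approximation, n = (z × SD / er)², which is not necessarily the method behind the tables; the function name sample_size_for_mean is hypothetical. For the IQ example it gives 97, slightly below the tabulated 99. That gap would be consistent with the tables applying a small-sample (t distribution) adjustment, although this is an inference and not something stated on this page.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sd, er, confidence=0.95):
    """Approximate sample size to estimate a mean to within +/- er.

    Assumption: large-sample Normal approximation, n = (z * sd / er) ** 2.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, because a fraction of a subject cannot be recruited.
    return math.ceil((z * sd / er) ** 2)


# IQ example from the text: expected SD = 10, tolerable error = 2 IQ points.
n_iq = sample_size_for_mean(sd=10, er=2)
print(n_iq)
```

Halving the tolerable error roughly quadruples the sample size, because er enters the formula squared.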

Determine error and confidence interval :

After data collection, we will know the sample size, and the mean and Standard Deviation of the measurement of interest. We can then nominate the percent confidence interval (usually 95%). With these, the error can be estimated, and from the error the actual confidence interval.

An example

We proceeded to measure the IQ of 97 university students, and found the mean and Standard Deviation of IQ in the group measured to be 110 and 12 respectively. The error for a 95% confidence interval as calculated from the program in the Sample Size for Population Mean Program Page is ±2.4. The 95% CI is therefore 110±2.4, or 107.6 to 112.4.
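The arithmetic behind this example can be sketched as follows, assuming the Normal approximation er = z × SD / √n. The program page may use a t value instead, which would give a marginally wider interval; the function name error_of_mean is hypothetical.

```python
import math
from statistics import NormalDist


def error_of_mean(sd, n, confidence=0.95):
    """Half-width of the confidence interval for a mean.

    Assumption: Normal approximation, error = z * sd / sqrt(n).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / math.sqrt(n)


# Values from the example: mean 110, SD 12, 97 students measured.
mean_iq, sd_iq, n_students = 110, 12, 97
er = error_of_mean(sd_iq, n_students)
ci_low, ci_high = mean_iq - er, mean_iq + er
print(f"error = {er:.1f}, 95% CI = {ci_low:.1f} to {ci_high:.1f}")
```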

Given that the validity of research results depends on adequate sample size, and sample size requirements depend on a correct estimate of the population Standard Deviation of the measurement of interest, an accurate estimation of the population Standard Deviation is a critical starting point in any research. Despite the urging of professional statisticians, however, it is surprising how infrequently one sees the precise estimation of Standard Deviations in the medical literature. Often, published Standard Deviations from small samples are used for sample size estimations as if they were representative.

Only in medical laboratories and in high precision engineering is much attention given to precise estimation of the Standard Deviation. In medical laboratories in particular, a precise estimation of the Standard Deviation is necessary in order to establish the normal range of measurements, from which abnormal values are defined.

Glossary

Error is measured as a percentage of the true Standard Deviation value.
Confidence interval is obtained from the actual Standard Deviation value: the width of the error is that value multiplied by the percentage error.
Sample size is the number of subjects needed in the survey to achieve the required precision.
Percent confidence interval is the percent of time that the true standard deviation will be within the confidence interval, if the survey is repeated many times.

Estimate sample size requirement :

At the planning stage, to estimate the sample size required, parameters needed are the percent confidence interval (usually 95%), and the tolerable error as a percent of the true Standard Deviation.

An example

We have just started a medical testing laboratory and, prior to establishing a normal range for blood sugar levels, we first wish to establish its Standard Deviation amongst normal subjects. We suspect the standard deviation to be around 1, and we want the 95% confidence interval to be ±0.1 (0.1 / 1 × 100 = 10%). We can look up the Sample Size to Establish Population Standard Deviation Explanations and Tables Page and find that the sample size required is 193 samples of blood.
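A rough check on this figure is possible with a large-sample approximation. The sketch below assumes Normally distributed measurements, for which the standard error of a sample SD is approximately SD / √(2n), so that n ≈ z² / (2 × e²) with e the tolerable error as a fraction. This is an assumption for illustration, not necessarily the calculation used for the tables, and the function name sample_size_for_sd is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_sd(pct_error, confidence=0.95):
    """Approximate sample size to estimate an SD to within +/- pct_error percent.

    Assumption: Normal data and large n, so SE(sd) is about sd / sqrt(2n).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    frac = pct_error / 100  # percent error expressed as a fraction
    return math.ceil(z ** 2 / (2 * frac ** 2))


# Blood sugar example: tolerable error of 10% of the true SD.
n_blood = sample_size_for_sd(pct_error=10)
print(n_blood)
```

Note that the expected SD itself does not enter the calculation; only the error as a percent of it does.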

Determine error and confidence interval :

After data collection, we will know the sample size, and the Standard Deviation of the measurement of interest in that sample. We can then nominate the percent confidence interval (usually 95%). With these, the error can be estimated, and from the error the actual confidence interval.

An example

We proceeded to measure 193 blood sugars, and found the standard deviation of blood sugar to be 1.2. The error for a 95% confidence interval as calculated from the Sample Size to Establish Population Standard Deviation Program Page is ±10%, and 10% of 1.2 is 0.12. The 95% CI of the standard deviation of blood sugar is therefore 1.2±0.12, or 1.08 to 1.32.
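The same numbers can be reproduced approximately as shown below. The sketch again assumes that, for Normal data and large n, the standard error of a sample SD is about SD / √(2n); the program page may use a different calculation, and the function name pct_error_of_sd is hypothetical.

```python
import math
from statistics import NormalDist


def pct_error_of_sd(n, confidence=0.95):
    """Error of a sample SD as a percent of the SD.

    Assumption: Normal data and large n, so SE(sd) is about sd / sqrt(2n).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 100 * z / math.sqrt(2 * n)


# Values from the example: SD 1.2 observed in 193 blood sugar measurements.
sd_obs, n_measured = 1.2, 193
pct = pct_error_of_sd(n_measured)
er = sd_obs * pct / 100
print(f"error = {pct:.0f}% = {er:.2f}, 95% CI = {sd_obs - er:.2f} to {sd_obs + er:.2f}")
```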

Establishing a population proportion is a common activity. A politician may ask what proportion of the population will vote for his party. A drug company may ask what proportion of a certain age group will have the illness that requires its medicine. A marketing manager may ask what proportion of teenagers think a particular clothing style is cool.

Glossary

Although percent is often correctly used, these pages will use the term proportion (prop = numbers positive / total number), a number between 0 (nobody) and 1 (everybody) in order to avoid confusion with the other percent, the 95% confidence interval.
The error (er) will also be a proportion, and represents half the range of the confidence interval. The confidence interval will therefore be prop±er.
Sample size is the number of subjects needed in the survey to achieve the required precision.
Percent confidence interval is the percent of time that the true proportion will be within the confidence interval, if the survey is repeated many times.

There are two common methods of calculation.

The first is to transform a proportion into a mean and a standard error that assumes an underlying Normal distribution. This is the most common method used. However, the assumption of a Normal distribution becomes increasingly erroneous when the proportion concerned is close to the extremes (0 or 1), or when the sample size is very small. The method becomes unacceptable if the confidence interval overlaps the ends (prop-er<0 or prop+er>1), or if the sample size is less than 10. When this happens, the more precise calculation based on the Binomial distribution is required.

The more precise method is to base calculations on the Binomial distribution, which truly reflects the behaviour of proportions. It calculates the probability of obtaining a given count of those with the positive attribute in a sample of defined size, drawn from a population with a defined proportion having that attribute. Because the Binomial distribution is skewed whenever the proportion is not 0.5, the confidence interval so calculated is asymmetrical. The calculation also involves repeated evaluation of the binomial coefficient, which consumes much computing time when the sample size becomes large.

A common compromise is to use the Normal distribution for calculation unless the sample size is less than 10 or the confidence interval overlaps 0 or 1, in which case the Binomial distribution is used.
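The compromise can be sketched in code as follows. The exact branch here is the Clopper-Pearson interval, one standard way of deriving limits from the Binomial distribution; whether this site's programs use that particular construction is an assumption. All function names are hypothetical, and the limits are found by simple bisection instead of a library routine.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def solve(f, iters=60):
    """Bisection on [0, 1] for a function whose sign differs at the two ends."""
    lo, hi = 0.0, 1.0
    sign_lo = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, confidence=0.95):
    """Clopper-Pearson interval for k positives out of n (assumed method)."""
    tail = (1 - confidence) / 2
    # Lower limit: the proportion at which P(X >= k) equals the tail area.
    low = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p) - tail)
    # Upper limit: the proportion at which P(X <= k) equals the tail area.
    high = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) - tail)
    return low, high


def proportion_ci(k, n, confidence=0.95):
    """Normal interval unless n < 10 or it reaches 0 or 1, then the exact one."""
    prop = k / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    er = z * math.sqrt(prop * (1 - prop) / n)
    # "<=" and ">=" also send counts of 0 or n to the exact branch, where
    # the Normal error would otherwise collapse to zero.
    if n < 10 or prop - er <= 0 or prop + er >= 1:
        return exact_ci(k, n, confidence)
    return prop - er, prop + er


print(proportion_ci(40, 100))  # Normal branch, symmetrical around 0.4
print(proportion_ci(1, 5))     # exact branch, asymmetrical around 0.2
```

The second call shows the asymmetry: with 1 positive out of 5, the lower limit sits much closer to 0.2 than the upper limit does.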

Estimate sample size requirement :

At the planning stage, to estimate the sample size required, parameters needed are the percent confidence interval (usually 95%), roughly the proportion we expect to find (prop), and the error we will tolerate (er). The sample size required to achieve this level of precision can then be estimated.

An example

We wish to establish the proportion of the population that will vote for the Labour Party at the next election. We suspect this may be 52% (prop=0.52), and we want a precision of ± 5% (er=0.05). We want to be able to get the same result 95% of the time (95% CI). We can look up the Sample Size to Establish Population Proportions Explanation, Calculations, and Tables Page and find that the sample size required is 381 subjects.
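For comparison, the textbook Normal approximation n = z² × prop × (1 − prop) / er² can be sketched as below; the function name sample_size_for_proportion is hypothetical. For this example it gives 384, close to but not identical with the tabulated 381, so the tables presumably rest on a somewhat different calculation that this sketch does not attempt to reproduce.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(prop, er, confidence=0.95):
    """Approximate sample size to estimate a proportion to within +/- er.

    Assumption: Normal approximation, n = z**2 * prop * (1 - prop) / er**2.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * prop * (1 - prop) / er ** 2)


# Voting example: expected proportion 0.52, tolerable error 0.05.
n_voters = sample_size_for_proportion(prop=0.52, er=0.05)
print(n_voters)
```

The product prop × (1 − prop) is largest at prop = 0.5, so an expected proportion near 0.5 demands the largest sample for a given error.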

Determine error and confidence interval :

After data collection, we will know the sample size, and the proportion of positives in that sample. We can then nominate the percent confidence interval (usually 95%). With these, the error can be estimated, and from the error the actual confidence interval.

An example

We proceeded to ask 381 people whether they will vote Labour, and found 150 who said they will (prop=150/381=0.39). The error for a 95% confidence interval as calculated from the Sample Size to Establish Population Proportions Explanation, Calculations, and Tables Page is ±0.049. The 95% CI is therefore 0.39±0.049, or 0.341 to 0.439 (34% to 44%).
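The error in this example follows from the Normal approximation er = z × √(prop × (1 − prop) / n), sketched below with a hypothetical function name. The code keeps the unrounded proportion 150/381, so its interval limits differ in the third decimal place from those in the text, which rounds the proportion to 0.39 first.

```python
import math
from statistics import NormalDist


def error_of_proportion(k, n, confidence=0.95):
    """Half-width of the Normal-approximation interval for k positives out of n."""
    prop = k / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(prop * (1 - prop) / n)


# Values from the example: 150 of 381 respondents said they will vote Labour.
k_yes, n_asked = 150, 381
prop = k_yes / n_asked
er = error_of_proportion(k_yes, n_asked)
print(f"prop = {prop:.3f}, error = {er:.3f}, 95% CI = {prop - er:.3f} to {prop + er:.3f}")
```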

Sample size for population proportions and population mean

Machin D, Campbell M, Fayers P, Pinol A (1997) Sample Size Tables for Clinical Studies. Second Ed. Blackwell Science. ISBN 0-86542-870-0. p. 131-135

Sample size for population standard deviation

Greenwood JA and Sandomire MM (1950) Journal of the American Statistical Association 45 (250) p. 257-260

Burnett RW (1975) Accurate estimation of standard deviations for quantitative methods used in clinical chemistry. Clin. Chem. 21 (13) p. 1935-1938