Related links:
Sample Size Introduction and Explanation Page
 
Sample Size to Establish Population Proportions Explanation, Calculations, and Tables Page
 
Sample Size to Establish Population Means Explanation and Tables Page
 
Sample Size to Establish Population Standard Deviation Program Page
 
Sample Size to Establish Population Standard Deviation Explanations and Tables Page
Introduction
Population Mean
Population Standard Deviation
Population Proportion
References
 
Estimating a population parameter is probably the most common research activity
   anywhere.   It is the basis of market research, political polling, disease prevalence
   surveys, measurements of economic activity, assessments of school quality, and so on.
 Increasingly, the same statistics are also used in quality control, where 
   data obtained from sampling are compared against a benchmark value. The Quality Statistics Explained Page
    expands on this use of these statistics.
 Designing a proper survey to estimate a population parameter involves many 
   technical issues, such as how to select the right population, which measurement 
   or question is most appropriate, how to control bias, and so on.
 This site addresses only two of them: estimating the sample size required
   during the planning stage and, once the data are obtained, estimating the precision
   of the results.
 Glossary : A number of terms are used in all population parameter
   estimations.
 
- Parameter value  is what we set out to establish.  
 - Error  is the margin of uncertainty on each side of the estimated value
 - Confidence interval  is the range within which the true parameter value 
    is likely to be.   It is Value ±Error
 - Percent of confidence interval is the percent of the time that the confidence
    interval we calculate would contain the true parameter value, should we repeat the survey
   many times with the same sample size.   This represents how sure we can be 
   that the true value is within the confidence interval.   The most commonly used is
   the 95% confidence interval, which means that, if we were to repeat the survey many times,
   95% of the intervals obtained would contain the true value (or 95% sure).
  
There are therefore two calculations.
 
- At the planning stage, we define the confidence interval we require, and 
    hence the error we can tolerate.   From these we can estimate the sample size required to do the job.
 - When all the data have been collected, we know the central value, the variation, 
    and the sample size.   From these we can estimate the error, and hence the 
   confidence interval of the results we have.    
  	  	 	  	 
 
Establishing population means is a frequent research activity, particularly 
in the educational, social, and biomedical fields.   Educational departments 
may wish to know the mathematical abilities of a cohort of school children, 
obstetricians need to know the normal birth weight, and so on.
 Glossary 
 
- The mean  is the mean value of the measurement of interest in the population
 - The Standard Deviation  is the Standard Deviation of the measurement in the population.
    This is mostly unknown and has to be estimated from previous studies or pilot studies.
 - The error  is half the width of the confidence interval, the distance from the mean to each limit
 - Sample size is the number of subjects needed in the survey to achieve the
    required precision.
 - Percent confidence interval is the percent of time that the true
    mean will be within the confidence interval, if the survey is repeated many times.
  	  
Estimate sample size requirement : 
At the planning stage, to estimate 
   the sample size required, parameters needed
   are the percent confidence interval (usually 95%), the standard deviation (SD) 
   to be expected in the population, and the error we will tolerate (er).   The 
   sample size required to achieve this level of precision can then be estimated.
 An example
 We wish to establish the mean IQ of first year university students.   We expect
   the standard deviation to be 10, and we want a 95% confidence interval with
   an error of ±2 IQ points (2 / 10 = 0.2 SDs).  We can look up the Sample Size to Establish Population Means Explanation and Tables Page
    and find that the 
   sample size required is 99 subjects.
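 The lookup can also be sketched in code. The following is a minimal sketch, not the method behind the tables: it assumes the requirement is the smallest n for which t(n-1) × SD / √n does not exceed the error, and it approximates the t quantile with a Cornish-Fisher expansion so that only the Python standard library is needed. The function names are illustrative only.

```python
import math
from statistics import NormalDist


def t_quantile(prob, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion).

    Accurate for moderate to large df; not reliable for very small df.
    """
    z = NormalDist().inv_cdf(prob)
    return (z
            + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))


def sample_size_for_mean(sd, er, confidence=0.95):
    """Smallest n with t(n-1) * sd / sqrt(n) <= er (assumed criterion)."""
    prob = 1 - (1 - confidence) / 2
    z = NormalDist().inv_cdf(prob)
    # Start from the Normal-theory answer, then step up until the
    # t-based requirement is met.
    n = max(2, math.ceil((z * sd / er) ** 2))
    while (t_quantile(prob, n - 1) * sd / er) ** 2 > n:
        n += 1
    return n


print(sample_size_for_mean(sd=10, er=2))  # IQ example: 99
```

 Under these assumptions the sketch reproduces the 99 subjects of the example; a purely Normal calculation, (1.96 × 10 / 2)², would give 97.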
 Determine error and confidence interval : 
 After data collection, we will know the sample size, and the mean and Standard Deviation of the 
   measurement of interest.  We can then nominate the percent confidence interval 
   (usually 95%).   With these, the error can be estimated, and from the error the 
   actual confidence interval.
 An example
 We proceeded to measure the IQ of 97 university students, and found the mean and 
   Standard Deviation of IQ in the group measured to be 110 and 12 respectively.
   The error for a 95% confidence interval as calculated from the program in the Sample Size for Population Mean Program Page
    is ±2.4.  The 95% CI is therefore 
   110±2.4, or 107.6 to 112.4
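 The same arithmetic can be sketched in code. This is a minimal sketch assuming the error is t(n-1) × SD / √n, with the t quantile approximated by a Cornish-Fisher expansion; the program page may calculate it differently, and the function names are illustrative only.

```python
import math
from statistics import NormalDist


def t_quantile(prob, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion)."""
    z = NormalDist().inv_cdf(prob)
    return (z
            + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))


def error_for_mean(sd, n, confidence=0.95):
    """Half-width of the confidence interval of a mean (assumed formula)."""
    prob = 1 - (1 - confidence) / 2
    return t_quantile(prob, n - 1) * sd / math.sqrt(n)


mean, sd, n = 110, 12, 97          # values from the IQ example
er = error_for_mean(sd, n)
print(round(er, 1), round(mean - er, 1), round(mean + er, 1))  # 2.4 107.6 112.4
```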
  
Given that the validity of research results depends on adequate sample size, and
sample size requirements depend on a correct estimate of the population Standard Deviation of the measurement of interest, 
an accurate estimate of the population Standard Deviation is a critical starting point in any research.
Despite the urging of professional statisticians, however, it is surprising how infrequently one sees 
precise estimation of Standard Deviations in the medical literature.   Often,
published Standard Deviations from small samples are used for sample size estimations as if they were
representative.
 Only in medical laboratories and in high precision engineering is much 
   attention given to precise estimation of the Standard Deviation.   In medical
   laboratories, particularly, a precise estimate of the Standard Deviation is 
   necessary in order to establish the normal range of measurements, from which
   abnormal values are defined.
 Glossary
 
- Error is measured as a percentage of the true Standard Deviation value
 - Confidence interval depends on the actual Standard Deviation value, from
    which the width of the error can be calculated using the percentage error.
 - Sample size is the number of subjects needed in the survey to achieve the
    required precision.
 - Percent confidence interval is the percent of time that the true
    standard deviation will be within the confidence interval, if the survey is repeated
    many times.
  	 	  
Estimate sample size requirement : 
At the planning stage, to estimate 
   the sample size required, parameters needed are the percent confidence 
   interval (usually 95%), and the error tolerable as a percent of the true Standard Deviation
 An example
 We have just started a medical testing laboratory, and prior to establishing
   a normal range for blood sugar levels, we wish first to establish its Standard Deviation
   amongst normal subjects.   We suspect the standard deviation to be around 1, and
   we want the error of the 95% confidence interval to be ±0.1 (0.1/1 *100 = 10%).
   We can look up the Sample Size to Establish Population Standard Deviation Explanations and Tables Page
   and find that the sample size required is 193 samples of blood.
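 A rough check of this figure can be sketched in code. The sketch below assumes the large-sample approximation that the standard error of a sample Standard Deviation is about σ / √(2n); this is a swapped-in simplification and not necessarily the calculation behind the tables. The function name is illustrative only.

```python
import math
from statistics import NormalDist


def sample_size_for_sd(pct_error, confidence=0.95):
    """Approximate n so that the error is pct_error % of the true SD.

    Assumes SE(s) ~ sigma / sqrt(2 n), a large-sample approximation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    frac = pct_error / 100
    return math.ceil(z**2 / (2 * frac**2))


print(sample_size_for_sd(10))  # blood sugar example: 193
```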
 Determine error and confidence interval : 
 After data collection, we will know the sample size, and the Standard Deviation of the 
   measurement of interest in that sample.  We can then nominate the percent confidence interval 
   (usually 95%).   With these, the error can be estimated, and from the error the 
   actual confidence interval.
 An example
 We proceeded to measure 193 blood sugars, and found the standard deviation of 
   blood sugar to be 1.2.  The error for a 95% confidence interval as calculated from the Sample Size to Establish Population Standard Deviation Program Page
   is ±10%, and 
   10% of 1.2 is 0.12.  The 95% CI of the standard deviation of blood sugar is therefore 1.2±0.12, or 1.08 to 1.32 
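 The percentage error and the interval can be sketched in code. As an assumption, the sketch uses the large-sample approximation SE(s) ≈ σ / √(2n), which may differ from the method used by the program page; the function name is illustrative only.

```python
import math
from statistics import NormalDist


def pct_error_for_sd(n, confidence=0.95):
    """Approximate error of a sample SD, as a percent of the SD.

    Assumes SE(s) ~ sigma / sqrt(2 n), a large-sample approximation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 100 * z / math.sqrt(2 * n)


sd, n = 1.2, 193                   # values from the blood sugar example
pct = pct_error_for_sd(n)          # percentage error
er = sd * pct / 100                # error in measurement units
print(round(pct, 1), round(sd - er, 2), round(sd + er, 2))  # 10.0 1.08 1.32
```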
  
Establishing a population proportion is a common activity.   A politician may ask what
proportion of the population will vote for his party.   A drug company may ask what
proportion of a certain age group will have the illness that requires its medicine.
A marketing manager may ask what proportion of teenagers think a particular 
clothing style is cool.
 Glossary
 
- Although percent is often correctly used, these pages will use the term 
   proportion (prop = number positive / total number), a number between 0 (nobody) and 
   1 (everybody), in order to avoid confusion with the other percent, the 95% 
   confidence interval.
 - The error (er) will also be a proportion, and represents half the range of the confidence 
   interval.   The confidence interval will therefore be prop±er
 - Sample size is the number of subjects needed in the survey to achieve the
    required precision.
 - Percent confidence interval is the percent of time that the true
    proportion will be within the confidence interval, if the survey is repeated
   many times.
  
There are two common methods of calculation.
 
The first is to transform a proportion into a mean and a standard error, 
   assuming an underlying Normal distribution.   This is the most common method used.
   However, the assumption of a Normal distribution becomes increasingly erroneous 
   when the proportion concerned is close to the extremes (0 or 1), or when the 
   sample size is very small.   The method becomes unacceptable if the confidence 
   interval overlaps the ends (prop-er<0 or prop+er>1), or if the sample size is 
   less than 10.   When this happens, the more precise calculations based on the 
   Binomial distribution are required.
 The more precise method is to base calculations on the Binomial distribution, 
   which truly reflects the behaviour of proportions.   It calculates the 
   probability of obtaining a given count of those with positive attributes in a sample 
   of defined size, drawn from a population with a defined proportion with positive attributes.
   As the Binomial distribution is skewed whenever the proportion is not 0.5, the confidence interval so
   calculated is asymmetrical.   The calculation also involves repeated evaluation
   of the binomial coefficient, which consumes much computing time when the sample
   size becomes large.
 A common compromise is to use the Normal distribution for calculation unless the 
   sample size is less than 10 or the confidence interval overlaps 0 or 1,
   in which case the Binomial distribution is used. 
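 To illustrate the Binomial approach, the following is a minimal sketch of one standard exact interval, the Clopper-Pearson interval, found here by bisection on the Binomial cumulative probability. It is offered as an assumption about what a Binomial calculation could look like, not as the method used by the programs on this site; the function names are illustrative only.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def solve_decreasing(f, target, iters=60):
    """Bisection on [0, 1] for f(p) = target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid   # cumulative probability still too high: p must be larger
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, confidence=0.95):
    """Clopper-Pearson interval for k positives out of n."""
    alpha = 1 - confidence
    # Lower limit: proportion at which P(X >= k) = alpha / 2
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper limit: proportion at which P(X <= k) = alpha / 2
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


lower, upper = exact_binomial_ci(1, 10)   # 1 positive out of 10
print(round(lower, 4), round(upper, 4))
```

 For 1 positive out of 10 (prop = 0.1), the upper limit lies much further from 0.1 than the lower limit does, which shows the asymmetry described above.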
  
Estimate sample size requirement : 
At the planning stage, to estimate 
   the sample size required, parameters needed
   are the percent confidence interval (usually 95%), roughly the proportion we 
   expect to find (prop), and the error we will tolerate (er).   The sample size 
   required to achieve this level of precision can then be estimated.
 An example
 We wish to establish the proportion of the population that will vote for the 
   Labour Party at the next election.   We suspect this may be 52% (prop=0.52), and we
   want a precision of ±5% (er=0.05).   We want a 95% confidence
   interval (95% CI).   We can look up the Sample Size to Establish Population Proportions Explanation, Calculations, and Tables Page
 and find that the sample size required is 381 subjects.
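 The Normal-distribution method can be sketched in code. This is a minimal sketch assuming the textbook formula n = z² × prop × (1 - prop) / er²; it is not necessarily the calculation behind the tables, and the function name is illustrative only.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(prop, er, confidence=0.95):
    """Normal-approximation sample size for estimating a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * prop * (1 - prop) / er**2)


print(sample_size_for_proportion(prop=0.52, er=0.05))  # 384
```

 This simple formula gives 384 for the Labour example, close to but not identical with the 381 read from the tables, which may rest on a somewhat different calculation.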
 Determine error and confidence interval : 
 After data collection, we will know the sample size, and the proportion of 
   positives in that sample.  We can then nominate the percent confidence interval 
   (usually 95%).   With these, the error can be estimated, and from the error the 
   actual confidence interval.
 An example
 We proceeded to ask 381 people whether they will vote Labour, and 
   found 150 who said they will (prop=150/381=0.39).  The error for a 95% confidence interval
   as calculated from the Sample Size to Establish Population Proportions Explanation, Calculations, and Tables Page
   is ±0.049. The 95% CI is therefore 0.39±0.049, or 0.341 to 0.439 (34% to 44%)
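 The error in this example can be reproduced with the Normal-distribution method. The sketch below assumes the usual formula er = z × √(prop × (1 - prop) / n); the function name is illustrative only.

```python
import math
from statistics import NormalDist


def error_for_proportion(prop, n, confidence=0.95):
    """Normal-approximation half-width of the CI of a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(prop * (1 - prop) / n)


n = 381
prop = round(150 / n, 2)           # 0.39, rounded as in the example
er = error_for_proportion(prop, n)
print(round(er, 3), round(prop - er, 3), round(prop + er, 3))  # 0.049 0.341 0.439
```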
  
Sample size for population proportions and population mean
 
Machin D, Campbell M, Fayers P, Pinol A (1997) Sample Size Tables for Clinical 
Studies. Second Ed. Blackwell Science. ISBN 0-86542-870-0. p. 131-135 
 
Sample size for population standard deviation
 
Greenwood JA and Sandomire MM (1950) Journal of the American Statistical 
   Association 45 (250) p. 257-260
 Burnett RW (1975) Accurate estimation of standard deviations for quantitative 
   methods used in clinical chemistry. Clin. Chem. 21 (13) p. 1935-1938
    
  