This page provides tables and a Javascript program for the sample size needed to estimate a population
proportion.
Calculating sample size and error for population proportions requires using the Binomial
Coefficient to produce an approximate number, then iterating to obtain results that are
increasingly precise until the answer converges. As the Binomial Coefficient requires
the calculation of factorials, the time required for computation increases rapidly with
the sample size involved. The PHP engine usually allows a maximum of 30 seconds for any single process, and
this is insufficient when the sample size exceeds 300.
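One common way to keep the Binomial Coefficient within floating point range is to build it up multiplicatively instead of dividing three separate factorials. The sketch below is illustrative only and is not taken from the program on this page; the function name `binomialCoefficient` is an assumption made for the example.

```javascript
// Illustrative sketch: C(n, k) without forming n!, k! or (n - k)! explicitly.
// In double precision 171! already overflows to Infinity, so the plain
// factorial form n! / (k! (n - k)!) breaks down once n exceeds 170.
function binomialCoefficient(n, k) {
  if (k < 0 || k > n) return 0;
  k = Math.min(k, n - k);            // symmetry: C(n, k) = C(n, n - k)
  let result = 1;
  for (let i = 1; i <= k; i++) {
    // After step i, result holds C(n - k + i, i)
    result = result * (n - k + i) / i;
  }
  return Math.round(result);         // remove tiny floating point drift
}

console.log(binomialCoefficient(10, 3)); // 120
```

Values above 2^53 are returned only approximately, which is usually acceptable when the coefficient is about to be multiplied by probabilities.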
This page provides tables of sample size and error estimates that should suffice for most research situations,
and users are encouraged to use these. In most cases the sample size can be read directly from the tables.
On the exceptional occasion that the tables are insufficient, for example when a sample size or confidence
interval is needed that the tables do not cover, a small Javascript program is provided for the calculation.
Please note, however, that some browsers limit how long a script may run, and when that limit is reached the browser asks the user whether to continue.
Long calculations can still be run, but the user must attend and repeatedly tell the browser to continue. The limits are as follows.
- Internet Explorer - 5 million statements
- Firefox - 10 secs
- Safari - 5 secs
- Chrome - no time limit
- Opera - no time limit
Normal Transform Distribution
The most common way to handle a proportion is to transform it into a Normally distributed measurement, where the
estimated proportion is the mean and the variation (error) around it is expressed as the Standard Error. Using this model,
the 95% confidence interval (proportion ± error) of a proportion can be easily calculated.
Calculations using the Normal Transformation are based on the two tail model. To calculate for the one tail model, the Type I Error needs to be doubled, so the 90% confidence interval is used in place of the 95%. The results of the 90% confidence interval (p=0.1) for the two tail model are the same as those of the 95% confidence interval (p=0.05) for the one tail model.
The transformed proportion, however, is only approximately Normal, and only when the sample size is large (over 30) and
the proportion is near the center (0.15-0.85). As the sample size decreases and the proportion approaches 0 or 1,
the assumption of normality becomes increasingly invalid.
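The Normal model described above can be sketched in a few lines. This is a minimal illustration, not the program supplied on this page. The names `Z_TWO_TAIL` and `normalConfidenceInterval` are assumptions, the Standard Error is taken as sqrt(p(1 - p) / n), and clamping the interval to the range 0 to 1 is a choice made for the example.

```javascript
// Two tail z values for common confidence levels (standard Normal quantiles).
const Z_TWO_TAIL = { 90: 1.6449, 95: 1.96, 99: 2.5758 };

// Confidence interval for a proportion under the Normal approximation.
// level is the two tail confidence level in percent (90, 95 or 99).
// For a one tail 95% interval, pass level = 90 as explained in the text.
function normalConfidenceInterval(proportion, sampleSize, level = 95) {
  const z = Z_TWO_TAIL[level];
  if (z === undefined) throw new Error("Unsupported confidence level: " + level);
  const se = Math.sqrt(proportion * (1 - proportion) / sampleSize);
  const error = z * se;
  return {
    se,
    error,
    lower: Math.max(0, proportion - error),   // assumption: clamp at 0
    upper: Math.min(1, proportion + error),   // assumption: clamp at 1
    // Rule of thumb quoted in the text: n over 30 and p within 0.15-0.85
    approximationReasonable:
      sampleSize > 30 && proportion >= 0.15 && proportion <= 0.85,
  };
}

const ci = normalConfidenceInterval(0.5, 100, 95);
console.log(ci.lower.toFixed(3), ci.upper.toFixed(3)); // 0.402 0.598
```

The `approximationReasonable` flag only repeats the rule of thumb given above; it is not a formal test of normality.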
Binomial Distribution
The more accurate approach in these difficult circumstances is to assume the Binomial distribution, which is the
distribution that proportions actually follow. Researchers are reluctant to use this for two reasons. Firstly, the calculation is
tedious, even with the use of computers, and a sample size exceeding 200 takes a long time to compute. Secondly,
the Binomial distribution is skewed rather than symmetrical, so the variation on the higher side is greater than on the lower side, and
two variations (errors) have to be taken into consideration.
Calculations using the Binomial distribution are based on the one tail model. As the sample sizes for the higher and lower confidence limits are asymmetrical, the two tail model is inappropriate.
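To make the asymmetry concrete, the sketch below computes one tail exact limits for an observed count by bisection on the Binomial cumulative probability (the Clopper-Pearson construction). This is a standard technique offered as an illustration; it is not claimed to be the algorithm behind the tables or the program on this page, and all function names are assumptions.

```javascript
// P(X <= k) for X ~ Binomial(n, p), accumulated in log space so that no
// factorial or large binomial coefficient is ever formed.
function binomialCdf(k, n, p) {
  if (k < 0) return 0;
  if (k >= n) return 1;
  if (p <= 0) return 1;
  if (p >= 1) return 0;
  const logOdds = Math.log(p) - Math.log(1 - p);
  let logTerm = n * Math.log(1 - p);            // log P(X = 0)
  let sum = Math.exp(logTerm);
  for (let i = 0; i < k; i++) {
    // P(X = i + 1) = P(X = i) * (n - i) / (i + 1) * p / (1 - p)
    logTerm += Math.log(n - i) - Math.log(i + 1) + logOdds;
    sum += Math.exp(logTerm);
  }
  return Math.min(sum, 1);
}

// Bisection on p in [0, 1]; isBelowRoot(p) must be true to the left of the root.
function bisect(isBelowRoot) {
  let lo = 0, hi = 1;
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (isBelowRoot(mid)) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// One tail upper limit: the p at which seeing k or fewer positives has probability alpha.
function exactUpperBound(k, n, alpha = 0.05) {
  if (k >= n) return 1;
  return bisect((p) => binomialCdf(k, n, p) > alpha);
}

// One tail lower limit: the p at which seeing k or more positives has probability alpha.
function exactLowerBound(k, n, alpha = 0.05) {
  if (k <= 0) return 0;
  return bisect((p) => 1 - binomialCdf(k - 1, n, p) < alpha);
}
```

For an observed proportion k / n, the two errors are `k / n - exactLowerBound(k, n)` and `exactUpperBound(k, n) - k / n`, and in general they are not equal.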
The tables and program are therefore both provided on this page, and users can decide which ones to use.
Nomenclature: The following terms are used in considering proportions
- Confidence level: usually expressed as a percentage, it represents how often the result would be replicated if
the research exercise were repeated many times. The common level is 95%.
- Proportion: prop = n_positive / (n_positive + n_negative), usually
expressed as a number between 0 and 1.
- Error: the margin of variation around the proportion. In the Normal distribution model, the error at the 95% confidence level is
± 1.96(SE). In the Binomial distribution model, the error on each side of the proportion is computed separately.
- Confidence Interval: the range within which the true proportion is likely to lie. In the Normal
distribution model it is proportion ± 1.96SE at the 95% confidence level. In the Binomial distribution model it runs from proportion -
error of the lower end to proportion + error of the upper end.
How the information produced by the tables and Javascript program is used
- In the planning of the study, the sample size required to allow a confident interpretation of
the results can be calculated.
- At the end of data collection, when the sample size used and proportion observed are available, the margin of error
such as the 95% confidence interval, can be calculated. This allows a confident interpretation of the results.
The 3 tables are for the 90%, 95%, and 99% confidence intervals, in that order. In each table, the rows represent the estimated proportion and the columns represent the tolerable error (also as a proportion). Both start at 0.05 (5%) and increase in steps of 0.05.
The sample size in each cell is in the format a(b:c), where
- a is the sample size calculated on the assumption that the proportion is transformed to a Normally distributed mean. It is calculated
for the two tail model.
- b and c are the sample sizes calculated on the assumption of the Binomial distribution, b for the lower end towards 0 (minus error) and c for the upper end towards 1 (plus error). They are calculated for the one tail model.
- * marks where the sample size cannot be calculated, mostly because it takes too long
- <5 is shown whenever the calculated sample size is less than 5
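When the table cells are copied into a script, the a(b:c) format can be split into its three parts as sketched below. The function name `parseCell` and the field names are assumptions made for this illustration; they are not part of the program on this page.

```javascript
// Parse one table cell such as "139(115:193)", "13(*:40)" or "<5(*:<5)".
// Each part becomes a number where one is given, the string "<5" where the
// table says so, and null where the table shows "*" (not calculated).
function parseCell(cell) {
  const match = /^\s*([^():\s]+)\(([^():\s]+):([^():\s]+)\)\s*$/.exec(cell);
  if (!match) throw new Error("Unrecognised cell: " + cell);
  const parsePart = (part) => {
    if (part === "*") return null;
    if (part === "<5") return "<5";
    const value = Number(part);
    if (!Number.isInteger(value)) throw new Error("Unrecognised value: " + part);
    return value;
  };
  return {
    normal: parsePart(match[1]),          // a: Normal model, two tail
    binomialLower: parsePart(match[2]),   // b: Binomial model, lower end
    binomialUpper: parsePart(match[3]),   // c: Binomial model, upper end
  };
}

console.log(parseCell("139(115:193)"));
// { normal: 139, binomialLower: 115, binomialUpper: 193 }
```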
Sample Size for 90%CI
| Proportion | error=±0.05 | error=±0.1 | error=±0.15 | error=±0.2 | error=±0.25 | error=±0.3 | error=±0.35 | error=±0.4 | error=±0.45 | error=±0.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| p=0.05 | 52(*:89) | 13(*:40) | 6(*:22) | <5(*:18) | <5(*:9) | <5(*:7) | <5(*:6) | <5(*:6) | <5(*:5) | <5(*:<5) |
| p=0.10 | 98(85:134) | 25(*:44) | 11(*:23) | 7(*:14) | <5(*:12) | <5(*:10) | <5(*:9) | <5(*:8) | <5(*:<5) | <5(*:<5) |
| p=0.15 | 138(137:176) | 35(30:55) | 16(*:28) | 9(*:16) | 6(*:14) | <5(*:9) | <5(*:8) | <5(*:7) | <5(*:6) | <5(*:<5) |
| p=0.20 | 174(173:202) | 44(43:57) | 20(18:31) | 11(*:17) | 7(*:12) | 5(*:11) | <5(*:7) | <5(*:6) | <5(*:6) | <5(*:5) |
| p=0.25 | 203(202:233) | 51(50:65) | 23(22:33) | 13(10:21) | 9(*:13) | 6(*:9) | 5(*:8) | <5(*:8) | <5(*:5) | <5(*:5) |
| p=0.30 | 228(229:248) | 57(59:68) | 26(25:34) | 15(12:18) | 10(9:14) | 7(*:8) | 5(*:8) | <5(*:7) | <5(*:6) | <5(*:<5) |
| p=0.35 | 247(248:264) | 62(62:75) | 28(28:35) | 16(16:21) | 10(10:15) | 7(5:12) | 6(*:7) | <5(*:6) | <5(*:<5) | <5(*:<5) |
| p=0.40 | 260(262:271) | 65(67:71) | 29(29:36) | 17(17:21) | 11(12:13) | 8(7:10) | 6(<5:6) | 5(*:6) | <5(*:5) | <5(*:<5) |
| p=0.45 | 268(268:281) | 67(68:72) | 30(30:32) | 17(17:21) | 11(13:12) | 8(8:9) | 6(6:7) | 5(<5:5) | <5(*:5) | <5(*:<5) |
| p=0.50 | 271(271:290) | 68(69:76) | 31(31:36) | 17(17:20) | 11(11:14) | 8(9:10) | 6(5:6) | 5(5:6) | <5(<5:<5) | <5(*:*) |
Sample Size for 95%CI
| Proportion | error=±0.05 | error=±0.1 | error=±0.15 | error=±0.2 | error=±0.25 | error=±0.3 | error=±0.35 | error=±0.4 | error=±0.45 | error=±0.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| p=0.05 | 73(*:127) | 19(*:46) | 9(*:26) | 5(*:20) | <5(*:17) | <5(*:9) | <5(*:8) | <5(*:7) | <5(*:6) | <5(*:5) |
| p=0.10 | 139(115:193) | 35(*:62) | 16(*:33) | 9(*:22) | 6(*:14) | <5(*:12) | <5(*:10) | <5(*:9) | <5(*:8) | <5(*:7) |
| p=0.15 | 196(184:242) | 49(37:75) | 22(*:36) | 13(*:22) | 8(*:16) | 6(*:14) | <5(*:9) | <5(*:8) | <5(*:7) | <5(*:6) |
| p=0.20 | 246(238:282) | 62(53:82) | 28(18:41) | 16(*:26) | 10(*:17) | 7(*:12) | 6(*:11) | <5(*:7) | <5(*:6) | <5(*:6) |
| p=0.25 | 289(286:325) | 73(66:89) | 33(26:45) | 19(10:28) | 12(*:17) | 9(*:13) | 6(*:12) | 5(*:9) | <5(*:8) | <5(*:5) |
| p=0.30 | 323(322:348) | 81(82:98) | 36(32:47) | 21(19:28) | 13(9:18) | 9(*:14) | 7(*:11) | 6(*:8) | <5(*:7) | <5(*:6) |
| p=0.35 | 350(351:367) | 88(88:98) | 39(39:44) | 22(22:27) | 14(13:18) | 10(8:14) | 8(*:11) | 6(*:7) | 5(*:6) | <5(*:<5) |
| p=0.40 | 369(369:386) | 93(94:101) | 41(42:46) | 24(24:26) | 15(14:16) | 11(9:11) | 8(7:10) | 6(*:6) | 5(*:6) | <5(*:5) |
| p=0.45 | 381(384:392) | 96(97:101) | 43(44:47) | 24(24:27) | 16(17:18) | 11(10:12) | 8(8:9) | 6(<5:7) | 5(*:5) | <5(*:<5) |
| p=0.50 | 385(385:402) | 97(97:104) | 43(43:48) | 25(25:28) | 16(17:18) | 11(11:12) | 8(7:8) | 7(5:8) | 5(<5:<5) | <5(*:*) |
Sample Size for 99%CI
| Proportion | error=±0.05 | error=±0.1 | error=±0.15 | error=±0.2 | error=±0.25 | error=±0.3 | error=±0.35 | error=±0.4 | error=±0.45 | error=±0.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| p=0.05 | 127(*:209) | 32(*:80) | 15(*:43) | 8(*:27) | 6(*:22) | <5(*:18) | <5(*:16) | <5(*:9) | <5(*:8) | <5(*:7) |
| p=0.10 | 239(185:314) | 60(*:102) | 27(*:52) | 15(*:33) | 10(*:23) | 7(*:20) | 5(*:13) | <5(*:12) | <5(*:10) | <5(*:9) |
| p=0.15 | 339(297:403) | 85(57:116) | 38(*:61) | 22(*:36) | 14(*:27) | 10(*:20) | 7(*:15) | 6(*:13) | 5(*:9) | <5(*:8) |
| p=0.20 | 425(393:482) | 107(88:137) | 48(28:67) | 27(*:41) | 17(*:27) | 12(*:21) | 9(*:16) | 7(*:12) | 6(*:10) | 5(*:7) |
| p=0.25 | 498(474:553) | 125(110:153) | 56(42:73) | 32(18:45) | 20(*:29) | 14(*:21) | 11(*:16) | 8(*:13) | 7(*:9) | 5(*:8) |
| p=0.30 | 558(549:598) | 140(129:158) | 62(52:77) | 35(25:44) | 23(12:28) | 16(*:21) | 12(*:17) | 9(*:11) | 7(*:8) | 6(*:7) |
| p=0.35 | 604(605:638) | 151(148:164) | 68(65:78) | 38(33:44) | 25(19:27) | 17(10:21) | 13(*:15) | 10(*:12) | 8(*:9) | 7(*:7) |
| p=0.40 | 637(637:661) | 160(159:171) | 71(72:76) | 40(37:46) | 26(22:28) | 18(14:21) | 13(9:15) | 10(*:11) | 8(*:8) | 7(*:6) |
| p=0.45 | 657(657:672) | 165(166:172) | 73(73:78) | 42(42:43) | 27(26:27) | 19(17:20) | 14(10:14) | 11(6:11) | 9(*:9) | 7(*:5) |
| p=0.50 | 664(665:680) | 166(167:172) | 74(75:78) | 42(43:44) | 27(27:28) | 19(17:20) | 14(11:14) | 11(9:10) | 9(5:6) | 7(*:*) |
Three parameters are needed for sample size calculations
- The level of confidence, usually 95%
- The anticipated proportion, expressed as a number between 0 and 1
- The tolerable error, also expressed in terms of a proportion.
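Under the Normal model, these three parameters are enough for a closed-form sample size. The sketch below assumes the usual formula n = z² p(1 - p) / error², rounded up, which reproduces the Normal model entries checked against the tables on this page. The names `Z_FOR_LEVEL` and `normalSampleSize` are assumptions, and this is not the code of the program itself.

```javascript
// Two tail z values for the three confidence levels covered by the tables.
const Z_FOR_LEVEL = { 90: 1.6449, 95: 1.96, 99: 2.5758 };

// Sample size for estimating a proportion under the Normal model (two tail).
// proportion: anticipated proportion, between 0 and 1
// error:      tolerable error, as a proportion
// level:      confidence level in percent (90, 95 or 99)
function normalSampleSize(proportion, error, level = 95) {
  const z = Z_FOR_LEVEL[level];
  if (z === undefined) throw new Error("Unsupported confidence level: " + level);
  if (!(error > 0)) throw new Error("Tolerable error must be greater than 0");
  return Math.ceil(z * z * proportion * (1 - proportion) / (error * error));
}

console.log(normalSampleSize(0.5, 0.05, 95));  // 385, as in the 95% table
console.log(normalSampleSize(0.25, 0.1, 95));  // 73, as in the 95% table
```

The Binomial model entries in the tables come from an iterative calculation and cannot be obtained from this formula.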
Three parameters are needed for confidence interval estimations
- The level of confidence, usually 95%
- The observed proportion, expressed as a number between 0 and 1
- The sample size collected.
Two models can be used.
- Based on the Normal distribution: the quickest and most commonly used. The calculation is based on a two tail model. For
the one tail model, the Type I Error needs to be doubled. For example, the results for sample size and precision for
the 90% confidence interval (p=0.1) in the two tail model are the same as those for the 95% confidence interval (p=0.05) in the
one tail model.
- Based on the Binomial distribution: greater precision, but it requires prolonged iterative calculation,
which may take many minutes if the sample size involved is over 200. The sample sizes and intervals
are also asymmetrical and require separate calculations. The one tail model is calculated.
References
Machin D, Campbell M, Fayers P, Pinol A (1997) Sample Size Tables for Clinical
Studies. Second Ed. Blackwell Science. ISBN 0-86542-870-0 p. 135