Prediction Exp

In StatTools, the discussion and algorithms of Prediction Statistics concern the quality of the relationship between a binary Test and a binary Outcome.

In reality, a much wider domain of tests and outcomes exists. Outcomes that are measurements, such as birth weight, or that contain multiple categories, such as mental illness classifications, require specific and complex multivariate methods of analysis, and are covered elsewhere. Tests that are measurements, such as maternal height and age, require different treatment and are considered in the Receiver Operator Characteristics (ROC) Explained Page.

	Outcome Positive	Outcome Negative
Test Positive	True Positive	False Positive
Test Negative	False Negative	True Negative

The framework under consideration for this page, in the Prediction Statistics Program Page, and in the Sample Size for Prediction Statistics Explained and Tables Page, is therefore as presented in the table above. Each case is classified as

True Positive (TP) if they are test positive and outcome positive
False Positive (FP) if they are test positive but outcome negative
False Negative (FN) if they are test negative but outcome positive
True Negative (TN) if they are test negative and outcome negative

The statistical evaluation of the relationship between tests and outcomes occurs under the following scenarios

The collection of reference data, to evaluate the quality of the relationship between a test and an outcome.
The development of parameters that can be generalized and used in future clinical situations.
In appropriate future situations, the use of these parameters to influence diagnostic decisions.

The terminology and formulae for these procedures are covered in the next panel.

	Outcome Positive (O+)	Outcome Negative (O-)
Test Positive (T+)	True Positive (TP)	False Positive (FP)
Test Negative (T-)	False Negative (FN)	True Negative (TN)

This panel presents the terminology, and formulae for calculation, for the parameters involved in Prediction

Step 1. Defining the research parameters

The Test and Outcome must be defined, as shown in the table above. For example, we may decide to use the observation of an unengaged head in early labour as the Test to predict the need for a Caesarean Section as the Outcome. In this scenario, an observed unengaged head is Test Positive (T+), and an engaged head is Test Negative (T-). A baby delivered by Caesarean Section is Outcome Positive (O+) and one delivered vaginally is Outcome Negative (O-). The combinations of these are as follows

A baby with unengaged head in early labour and subsequently delivered by Caesarean Section is Test Positive (T+) and Outcome Positive (O+), so it is a case of True Positive (TP)
A baby with unengaged head in early labour and subsequently delivered vaginally is Test Positive (T+) and Outcome Negative (O-), so it is a case of False Positive (FP)
A baby with engaged head in early labour and subsequently delivered by Caesarean Section is Test Negative (T-) and Outcome Positive (O+), so it is a case of False Negative (FN)
A baby with engaged head in early labour and subsequently delivered vaginally is Test Negative (T-) and Outcome Negative (O-), so it is a case of True Negative (TN)
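The four combinations above can be expressed as a small classification rule. The following is a minimal sketch in Python; the function name `classify` and its boolean arguments are illustrative assumptions and are not part of StatTools.

```python
def classify(test_positive: bool, outcome_positive: bool) -> str:
    """Return the 2x2 cell (TP, FP, FN or TN) for a single case."""
    if test_positive and outcome_positive:
        return "TP"  # e.g. unengaged head, delivered by Caesarean Section
    if test_positive:
        return "FP"  # e.g. unengaged head, delivered vaginally
    if outcome_positive:
        return "FN"  # e.g. engaged head, delivered by Caesarean Section
    return "TN"      # e.g. engaged head, delivered vaginally


# An unengaged head (T+) followed by vaginal delivery (O-)
print(classify(test_positive=True, outcome_positive=False))  # FP
```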

Step 2. Collecting reference Data for evaluation

Data are collected to enable the evaluation of the relationship between Test and Outcome. Although data can be collected prospectively, the common approach is the retrospective use of data already collected, for the following reasons

An easy access to greater volume of information
An ability to select representative samples
An ability to have similar number of Outcome Positives (O+) and Outcome Negative (O-) so that the statistical power for detecting both are similar

The sample size required is then determined with the help of tables in the Sample Size for Prediction Statistics Explained and Tables Page. The parameters required are

The Probability of Type I Error (α), the common value of 0.05 is usually used
The power to detect a significant predictor (1-β), the common value of 0.8 is usually used
A clinically useful prediction rate, which should be significantly better than the diagnostic equivalent of null at 0.5

Once the sample size required is calculated, this number of cases in Outcome Positive, and similar numbers of Outcome Negative, are collected for evaluation.

Step 3. Evaluating the quality of Prediction using the data collected.

Once the data are collected, the numbers can be arranged according to the table at the top of this panel.

O+ and O- are the numbers of cases which are Outcome Positive and Outcome Negative
T+ and T- are the number of cases which are Test Positive and Test Negative
TP is the number of cases that are True Positive (T+ and O+)
FP is the number of cases that are False Positive (T+ and O-)
FN is the number of cases that are False Negative (T- and O+)
TN is the number of cases that are True Negative (T- and O-)
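As an illustration of how individual records become these primary numbers, the sketch below tallies a list of cases into the 2x2 counts and their margins. The function name `tally` and the sample records are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter


def tally(cases):
    """Count (test_positive, outcome_positive) pairs into a 2x2 table."""
    counts = Counter()
    for test_positive, outcome_positive in cases:
        if test_positive:
            counts["TP" if outcome_positive else "FP"] += 1
        else:
            counts["FN" if outcome_positive else "TN"] += 1
    table = {key: counts[key] for key in ("TP", "FP", "FN", "TN")}
    # Margins: column totals (Outcome) and row totals (Test)
    table["O+"] = table["TP"] + table["FN"]
    table["O-"] = table["FP"] + table["TN"]
    table["T+"] = table["TP"] + table["FP"]
    table["T-"] = table["FN"] + table["TN"]
    return table


# Hypothetical records: (head unengaged?, delivered by Caesarean Section?)
records = [(True, True), (True, False), (False, True), (False, False), (False, False)]
print(tally(records))
```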

From these primary numbers two sets of parameters can be calculated. Please note that all calculations are performed by programs in the Prediction Statistics Program Page. The formulae listed in this panel are to assist understanding only.

Parameters of quality
- The True Positive Rate TPR = TP / O+ is the proportion of Outcome Positives that are Test Positive. TPR is also known as Sensitivity
- The True Negative Rate TNR = TN / O- is the proportion of Outcome Negatives that are Test Negative. TNR is also known as Specificity
- The False Positive Rate FPR = FP / O- is the proportion of Outcome Negatives that are Test Positive.
- The False Negative Rate FNR = FN / O+ is the proportion of Outcome Positives that are Test Negative.
- Mathematically, FPR = 1-TNR, and FNR = 1-TPR
- The statistical significance of each is assessed as follows
  - For TPR, Standard Error SE = sqrt(TPR(1-TPR)/O+), and the one-tailed 95% confidence interval is >TPR-1.65SE
  - For TNR, Standard Error SE = sqrt(TNR(1-TNR)/O-), and the one-tailed 95% confidence interval is >TNR-1.65SE
Parameters for future clinical usage
- The Likelihood Ratio for Test Positive LR+ = TPR / FPR is the ratio O+/O- when Test Positive
- The Likelihood Ratio for Test Negative LR- = FNR / TNR is the ratio O+/O- when Test Negative
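The formulae listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the Prediction Statistics Program Page: the function name, the dictionary keys and the example counts are hypothetical, and the sketch does not guard against a zero FPR or TNR, which would make a Likelihood Ratio undefined.

```python
import math


def prediction_statistics(tp, fp, fn, tn, z=1.65):
    """Quality parameters and Likelihood Ratios from the 2x2 counts."""
    o_pos = tp + fn            # O+ : all Outcome Positive cases
    o_neg = fp + tn            # O- : all Outcome Negative cases
    tpr = tp / o_pos           # Sensitivity
    tnr = tn / o_neg           # Specificity
    fpr = fp / o_neg           # equals 1 - TNR
    fnr = fn / o_pos           # equals 1 - TPR
    se_tpr = math.sqrt(tpr * (1 - tpr) / o_pos)
    se_tnr = math.sqrt(tnr * (1 - tnr) / o_neg)
    return {
        "TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr,
        "SE_TPR": se_tpr, "SE_TNR": se_tnr,
        "TPR_lower": tpr - z * se_tpr,  # one-tailed 95% lower bound
        "TNR_lower": tnr - z * se_tnr,
        "LR+": tpr / fpr,               # undefined if FPR is 0
        "LR-": fnr / tnr,               # undefined if TNR is 0
    }


# Illustrative counts only, not taken from any real study
stats = prediction_statistics(tp=40, fp=10, fn=10, tn=40)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts TPR and TNR are both 0.8, so LR+ is 4 and LR- is 0.25.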

Step 4. Use of Likelihood Ratio to make clinical decisions.

The Likelihood Ratio can be used to modify the perception of risk in applicable clinical situations, using a Bayesian Probability Algorithm

The risk or probability of an outcome before results of the Test is known is called Pre-test Probability
The risk can be converted to Pre-test Odd = Pre-test Probability / (1 - Pre-test Probability)
The Post-test Odd is then obtained Post-test Odd = Pre-test Odd * Likelihood Ratio
The Post-test Probability = Post-test Odd/ (1 + Post-test Odd)
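The four steps above translate directly into code. The sketch below is a minimal Python illustration; the function name `post_test_probability` is an assumption, and the example values are chosen only to keep the arithmetic easy to follow.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a Likelihood Ratio to a Pre-test Probability."""
    # Probability -> odds
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    # Odds are multiplied by the Likelihood Ratio
    post_test_odds = pre_test_odds * likelihood_ratio
    # Odds -> probability
    return post_test_odds / (1 + post_test_odds)


# Pre-test Probability 0.2 (odds 0.25) with a Likelihood Ratio of 4 gives odds 1
print(round(post_test_probability(0.2, 4.0), 2))  # 0.5
```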

If there is more than one Test for an Outcome, and provided the Tests are not tautological (so strongly correlated that they are repeats of the same test), the Post-test Probability after one Test becomes the Pre-test Probability for the next Test, so that the perception of risk is modified as information accumulates.
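Because each update multiplies the odds by a Likelihood Ratio, chaining several Tests is equivalent to multiplying the starting odds by the product of all the ratios. The sketch below illustrates this, assuming the Tests are independent predictors; the function name and the example ratios are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math


def sequential_post_test_probability(pre_test_probability, likelihood_ratios):
    """Combine several independent Tests by multiplying their Likelihood Ratios."""
    odds = pre_test_probability / (1 - pre_test_probability)
    # Same result as feeding each Post-test Probability into the next Test
    odds *= math.prod(likelihood_ratios)
    return odds / (1 + odds)


# Illustrative values: Pre-test Probability 0.2, two Tests with ratios 4.0 and 1.5
forward = sequential_post_test_probability(0.2, [4.0, 1.5])
reverse = sequential_post_test_probability(0.2, [1.5, 4.0])
print(round(forward, 2), round(reverse, 2))  # 0.6 0.6
```

Since multiplication is commutative, the order in which the Tests are applied does not affect the final probability.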

This panel provides examples to demonstrate the concepts and formulations described in the two previous panels. Please note that the numbers in these examples are entirely made up to demonstrate the procedures, and they do not reflect any real clinical information. Please also note that the numbers presented in this page are rounded to 2 decimal places, so may be slightly different from those produced with a different rounding precision.

We are midwives wishing to establish a method of assessing the risk or probability of Caesarean Section in women who are admitted to the labour ward in early labour. Outcome Positive (O+) is Caesarean Section (CS), and Outcome Negative (O-) is vaginal delivery (VD).

Study 1. Parity:

We wish to use the parity of the woman as the Test, as we know that women having their first baby are more likely to require a Caesarean Section. Test Positive (T+) is nulliparous pregnancy (NP), Test Negative (T-) is multiparous pregnancy (MP).

Although we are uncertain initially, we feel that a diagnostic accuracy of 70% and significantly greater than 50% for both True Positive Rate and True Negative Rate would be clinically useful. We use the commonly accepted parameter of α=0.05, and Power = 0.8.

Using α=0.05, Power (1-β)=0.8, and s=0.7 with the table in the Sample Size for Prediction Statistics Explained and Tables Page, we find that we need 29 cases of Caesarean Section and 29 cases of vaginal delivery to evaluate our Test/Outcome relationship.

	CS (O+)	VD (O-)
NP (T+)	TP=12	FP=3
MP (T-)	FN=18	TN=27

We reviewed our obstetric records of 30 women delivered by Caesarean Section, and 30 delivered vaginally, and obtained the data as shown in the table above

True Positive TP = 12, False Positive FP = 3, False Negative FN = 18, True Negative TN = 27
True Positive Rate TPR = 12 / 30 = 0.4, SE = sqrt(0.4(0.6)/30) = 0.09, 95% Confidence Interval = >0.4-1.65(0.09) = >0.25
True Negative Rate TNR = 27 / 30 = 0.9, SE = sqrt(0.9(0.1)/30) = 0.05 (0.0548 before rounding), 95% Confidence Interval = >0.9-1.65(0.0548) = >0.81
False Positive Rate FPR = 1-TNR = 0.1, False Negative Rate FNR = 1-TPR = 0.6
Likelihood Ratio Test Positive LR+ = TPR/FPR = 0.4/0.1 = 4.0
Likelihood Ratio Test Negative LR- = FNR/TNR = 0.6/0.9 = 0.67

We use the two Likelihood Ratios in a public hospital that has an overall Caesarean Section Rate of 20%

Pre-test Probability = 0.2, Pre-test Odd = (0.2/(1-0.2)) = 0.25
For nullipara LR+ = 4.0, Post-Test Odd = 0.25*4 = 1, Post-test Probability = 1/(1+1) = 0.5
For multipara LR- = 0.67, Post-Test Odd = 0.25*0.67 = 0.17, Post-test Probability = 0.17/(1+0.17) = 0.14
In this public hospital with overall CS rate of 20%, nullipara CS rate is 50%, multipara CS rate is 14%

We use the two Likelihood Ratios in a private hospital that has an overall Caesarean Section Rate of 35%

Pre-test Probability = 0.35, Pre-test Odd = (0.35/(1-0.35)) = 0.54
For nullipara LR+ = 4.0, Post-Test Odd = 0.54*4 = 2.15, Post-test Probability = 2.15/(1+2.15) = 0.68
For multipara LR- = 0.67, Post-Test Odd = 0.54*0.67 = 0.36, Post-test Probability = 0.36/(1+0.36) = 0.27
In this private hospital with overall CS rate of 35%, nullipara CS rate is 68%, multipara CS rate is 27%

Study 2. Head Engagement:

We wish to use whether the head is engaged when admitted in early labour as the Test, as we know that those with an unengaged head in early labour are more likely to require a Caesarean Section. Test Positive (T+) is head unengaged (HU), Test Negative (T-) is head engaged (HE).

Although we are uncertain initially, we feel that a diagnostic accuracy of 55% and significantly greater than 50% for both True Positive Rate and True Negative Rate would be clinically useful. We use the commonly accepted parameter of α=0.05, and Power = 0.8.

Using α=0.05, Power (1-β)=0.8, and s=0.55 with the table in the Sample Size for Prediction Statistics Explained and Tables Page, we find that we need 122 cases of Caesarean Section and 122 cases of vaginal delivery to evaluate our Test/Outcome relationship.

	CS (O+)	VD (O-)
HU (T+)	TP=39	FP=26
HE (T-)	FN=91	TN=104

We reviewed our obstetric records of 130 women delivered by Caesarean Section, and 130 delivered vaginally, and obtained the data as shown in the table above

True Positive TP = 39, False Positive FP = 26, False Negative FN = 91, True Negative TN = 104
True Positive Rate TPR = 39 / 130 = 0.3, SE = sqrt(0.3(0.7)/130) = 0.04, 95% Confidence Interval = >0.3-1.65(0.04) = >0.23
True Negative Rate TNR = 104 / 130 = 0.8, SE = sqrt(0.8(0.2)/130) = 0.04, 95% Confidence Interval = >0.8-1.65(0.04) = >0.74
False Positive Rate FPR = 1-TNR = 0.2, False Negative Rate FNR = 1-TPR = 0.7
Likelihood Ratio Test Positive LR+ = TPR/FPR = 0.3/0.2 = 1.5
Likelihood Ratio Test Negative LR- = FNR/TNR = 0.7/0.8 = 0.88

We use the two Likelihood Ratios in a public hospital that has an overall Caesarean Section Rate of 20%

Pre-test Probability = 0.2, Pre-test Odd = (0.2/(1-0.2)) = 0.25
For unengaged head LR+ = 1.5, Post-Test Odd = 0.25*1.5 = 0.38, Post-test Probability = 0.38/(1+0.38) = 0.27
For engaged head LR- = 0.88, Post-Test Odd = 0.25*0.88 = 0.22, Post-test Probability = 0.22/(1+0.22) = 0.18
In this public hospital with overall CS rate of 20%, those with unengaged head in early labour have CS rate of 27%, those with engaged head 18%

We use the two Likelihood Ratios in a private hospital that has an overall Caesarean Section Rate of 35%

Pre-test Probability = 0.35, Pre-test Odd = (0.35/(1-0.35)) = 0.54
For unengaged head LR+ = 1.5, Post-Test Odd = 0.54*1.5 = 0.81, Post-test Probability = 0.81/(1+0.81) = 0.45
For engaged head LR- = 0.88, Post-Test Odd = 0.54*0.88 = 0.47, Post-test Probability = 0.47/(1+0.47) = 0.32
In this private hospital with overall CS rate of 35%, those with unengaged head in early labour have CS rate of 45%, those with engaged head 32%

Study 3. Combining the two: From the previous two studies we know the following

For parity, LR+ (nullipara) = 4.0, LR- (multipara) = 0.67
For head engagement, LR+ (head unengaged) = 1.5, LR- (head engaged) = 0.88

In the public hospital with an overall Caesarean Section Rate of 20%

Pre-test Probability = 0.2, Pre-test Odd = (0.2/(1-0.2)) = 0.25
For nullipara Post-Test Odd = 1, Post-test Probability = 0.5. This can be used as the Pre-test Probability and Odd for the next stage
- For unengaged head LR+ = 1.5, Post-Test Odd = 1*1.5 = 1.5, Post-test Probability = 1.5/(1+1.5) = 0.60
- For engaged head LR- = 0.88, Post-Test Odd = 1*0.88 = 0.88, Post-test Probability = 0.88/(1+0.88) = 0.47
For multipara Post-Test Odd = 0.17, Post-test Probability = 0.14
- For unengaged head LR+ = 1.5, Post-Test Odd = 0.17*1.5 = 0.25, Post-test Probability = 0.25/(1+0.25) = 0.20
- For engaged head LR- = 0.88, Post-Test Odd = 0.17*0.88 = 0.15, Post-test Probability = 0.15/(1+0.15) = 0.13

In the private hospital with an overall Caesarean Section Rate of 35%

Pre-test Probability = 0.35, Pre-test Odd = (0.35/(1-0.35)) = 0.54
For nullipara Post-Test Odd = 2.15, Post-test Probability = 0.68. This can be used as the Pre-test Probability and Odd for the next stage
- For unengaged head LR+ = 1.5, Post-Test Odd = 2.15*1.5 = 3.23, Post-test Probability = 3.23/(1+3.23) = 0.76
- For engaged head LR- = 0.88, Post-Test Odd = 2.15*0.88 = 1.89, Post-test Probability = 1.89/(1+1.89) = 0.65
For multipara Post-Test Odd = 0.36, Post-test Probability = 0.27
- For unengaged head LR+ = 1.5, Post-Test Odd = 0.36*1.5 = 0.54, Post-test Probability = 0.54/(1+0.54) = 0.35
- For engaged head LR- = 0.88, Post-Test Odd = 0.36*0.88 = 0.32, Post-test Probability = 0.32/(1+0.32) = 0.24
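The calculations in Studies 1 to 3 can be reproduced programmatically. The sketch below derives the Likelihood Ratios from the counts in the two study tables and prints one row per combination for each hospital. The names `LR_PARITY`, `LR_HEAD`, `HOSPITALS` and `build_rows` are illustrative assumptions. Because the sketch works at full precision and rounds only when printing, a few values may differ in the second decimal place from those shown on this page, as the rounding note at the start of this panel anticipates.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Probability -> odds, multiply by the Likelihood Ratio, odds -> probability."""
    odds = pre_test_probability / (1 - pre_test_probability) * likelihood_ratio
    return odds / (1 + odds)


# Likelihood Ratios from the study counts: LR+ = TPR/FPR, LR- = FNR/TNR
LR_PARITY = {"Nullipara": (12 / 30) / (3 / 30), "Multipara": (18 / 30) / (27 / 30)}
LR_HEAD = {"Unengaged Head": (39 / 130) / (26 / 130), "Engaged Head": (91 / 130) / (104 / 130)}
HOSPITALS = {"Public Hospital": 0.20, "Private Hospital": 0.35}


def build_rows():
    """Return (hospital, label, pre-test, LR, post-test) tuples."""
    rows = []
    for hospital, base_rate in HOSPITALS.items():
        # Each Test used on its own
        for label, lr in {**LR_PARITY, **LR_HEAD}.items():
            rows.append((hospital, label, base_rate, lr,
                         post_test_probability(base_rate, lr)))
        # Parity first, then head engagement
        for parity, lr_parity in LR_PARITY.items():
            after_parity = post_test_probability(base_rate, lr_parity)
            for head, lr_head in LR_HEAD.items():
                rows.append((hospital, f"{parity}+{head}", after_parity, lr_head,
                             post_test_probability(after_parity, lr_head)))
    return rows


for hospital, label, pre, lr, post in build_rows():
    print(f"{hospital}\t{label}\t{pre:.2f}\t{lr:.2f}\t{post:.2f}")
```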

		Pre-test Probability	Likelihood Ratio	Post-test Probability
Public Hospital	Nullipara	0.20	4.00	0.50
	Multipara	0.20	0.67	0.14
	Unengaged Head	0.20	1.50	0.27
	Engaged Head	0.20	0.88	0.18
	Nullipara+Unengaged Head	0.50	1.50	0.60
	Nullipara+Engaged Head	0.50	0.88	0.47
	Multipara+Unengaged Head	0.14	1.50	0.20
	Multipara+Engaged Head	0.14	0.88	0.13
Private Hospital	Nullipara	0.35	4.00	0.68
	Multipara	0.35	0.67	0.27
	Unengaged Head	0.35	1.50	0.45
	Engaged Head	0.35	0.88	0.32
	Nullipara+Unengaged Head	0.68	1.50	0.76
	Nullipara+Engaged Head	0.68	0.88	0.65
	Multipara+Unengaged Head	0.27	1.50	0.35
	Multipara+Engaged Head	0.27	0.88	0.24

Summary

The table above shows how the Likelihood Ratios can be used to modify diagnosis, in terms of probability, and how multiple tests can be sequentially integrated, so that clinical decisions can be made and modified as additional information becomes available.

The end result is independent of how the sequence is arranged, so that using the unengaged head to modify decisions made with parity, or using parity to modify decisions made with unengaged head, will produce the same results. The only thing users need to be careful about is that, when multiple Tests are used, each should represent an independent predictor. Tests so closely correlated that they represent multiple versions of the same thing lead to inappropriate weighting of some predictors, and will produce misleading results.
