ROCs Exp

StatTools offers a suite of programs and explanations related to the statistics of predictive tests. As each of them contains a great deal of detail, aspects discussed elsewhere will not be repeated here. This page discusses only Receiver Operator Characteristics (ROC), and supports the program in the Receiver Operator Characteristics (ROC) Program Page.

Users are referred to the Prediction Statistics Explanation Page and the Meta-analysis for Predictions Explanation Page for other aspects of prediction, and to the Sample Size for Receiver Operator Characteristics (ROC) Explained and Tables Page for the sample sizes required when using the ROC.

Historical Aspects : During the Second World War, the British developed RADAR, which reflects radio waves off an object, and so were able to detect German bombers coming over the Channel. Early versions of RADAR were imprecise: the signal contained much electronic noise and was difficult to interpret. If the alarm was raised on a weak signal, there was a risk of a false alarm, wasting much fuel and effort in response. If the alarm was not raised until a clear signal was obtained, the risk was that the fighters would be scrambled too late and more bombs would be dropped on the cities. To analyse the signal and decide how to respond, scientists developed a statistical method of handling the signals, and named it the RADAR Receiver Operator Characteristics. The method became widely used in signal analysis in many areas of science and industry, and was adopted in the analysis of tests and diagnosis in Medicine under the term Receiver Operator Characteristics (ROC), with the RADAR part removed.

Theoretical Considerations : A scalar measurement, preferably normally distributed, is used as a Test to predict a binary Outcome. As an example, we will use maternal height (Test) to predict the need for Caesarean Section to deliver the baby (Outcome). The relationship between Test and Outcome is best demonstrated by the diagram to the right.

In the sample delivered without Caesarean Section (Outcome Negative O-, red), the heights follow a normal distribution curve, and a decision line (a chosen height) divides the cases into Test Positive and Test Negative.
- Values of Test less than the decision line (shorter women, maroon) represent the False Positives (FP)
- Values of Test greater than the decision line (taller women, red) represent the True Negatives (TN)
- True Negative Rate TNR = TN / (TN+FP), and False Positive Rate FPR = FP / (TN+FP) = 1-TNR
In the sample delivered by Caesarean Section (Outcome Positive O+, blue), the heights likewise follow a normal distribution curve, divided by the same decision line.
- Values of Test less than the decision line (shorter women, blue) represent the True Positives (TP)
- Values of Test greater than the decision line (taller women, navy) represent the False Negatives (FN)
- True Positive Rate TPR = TP / (TP+FN), and False Negative Rate FNR = FN / (TP+FN) = 1-TPR

If the value used to make the decision changes (in our example, the maternal height used as the cut off), then TP, FP, FN and TN also change, and TPR and FPR change accordingly.
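The four rates can be computed directly from the four counts at a single decision line. A minimal Python sketch follows; the helper name `rates` is hypothetical and not part of the StatTools program, and the counts are those for the 155cms decision line in the worked example on this page.

```python
def rates(tp: int, fp: int, fn: int, tn: int) -> dict:
    """TPR, FNR, TNR and FPR at one decision line (hypothetical helper)."""
    tpr = tp / (tp + fn)  # share of Outcome Positives that are Test Positive
    tnr = tn / (tn + fp)  # share of Outcome Negatives that are Test Negative
    return {"TPR": tpr, "FNR": 1 - tpr, "TNR": tnr, "FPR": 1 - tnr}


# Counts for a decision line at 155cms in the maternal height example
r = rates(tp=15, fp=5, fn=10, tn=20)
print({k: round(v, 2) for k, v in r.items()})
# {'TPR': 0.6, 'FNR': 0.4, 'TNR': 0.8, 'FPR': 0.2}
```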

The changing relationship between FPR and TPR forms a curve, as shown in the diagram to the left. This curve is called the Receiver Operator Characteristics Curve (ROC Curve), and the area under this curve is called the Area Under the ROC Curve, usually represented by the symbol θ.
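The sweep of the decision line can be sketched in Python, assuming (as in the maternal height example) that values strictly below the cut off are Test Positive. The function names and the eight heights are hypothetical; the area is taken with the trapezoidal rule, one common way to obtain θ from an empirical ROC curve, not necessarily the method used by the StatTools program.

```python
def roc_points(pos, neg):
    """Empirical ROC points (FPR, TPR); values strictly below the cut off are Test Positive."""
    values = sorted(set(pos) | set(neg))
    # Try every observed value as a cut off, plus one above the maximum, so the
    # curve runs from (0, 0) (no one Test Positive) to (1, 1) (everyone Test Positive)
    cutoffs = values + [values[-1] + 1]
    points = []
    for c in cutoffs:
        tpr = sum(x < c for x in pos) / len(pos)
        fpr = sum(x < c for x in neg) / len(neg)
        points.append((fpr, tpr))
    return points  # cut offs ascend, so FPR and TPR never decrease


def trapezoid_area(points):
    """Area under the piecewise-linear curve joining consecutive (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical heights (cms): Outcome Positives tend to be shorter
pos = [150, 152, 155, 158]
neg = [154, 157, 159, 161]
pts = roc_points(pos, neg)
print(pts)
print(trapezoid_area(pts))  # 0.8125
```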

In a perfect test, where the values of the test from Outcome Positives do not overlap those from Outcome Negatives, TPR will increase from 0 to 1 while FPR remains at 0. TPR will then remain at 1 while FPR increases from 0 to 1. In other words, the ROC curve hugs the left and top borders of the plot, and the area under it is θ=1.

In a completely useless test, the measured values from Outcome Positives and Outcome Negatives overlap completely, so that TPR and FPR increase together. The ROC curve is therefore the diagonal, and the area under it is θ=0.5.

In most cases there is partial overlap of values, and the ROC curve resembles that in the diagram above and to the left, with θ between 0.5 and 1.

The null value for θ is therefore 0.5, and a useful test has a 95% confidence interval of θ that lies entirely above 0.5. Where θ<0.5, the test predicts the opposite of the intended outcome.
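As Hanley and McNeil (1982) showed, the area under the empirical ROC curve equals the probability that a randomly chosen Outcome Positive case looks more "positive" than a randomly chosen Outcome Negative case, with ties counted as one half (the Mann-Whitney form of θ). The Python sketch below assumes low values are Test Positive and uses hypothetical heights to reproduce the perfect, useless and reversed cases described above.

```python
def auc_mann_whitney(pos, neg):
    """θ as the share of (positive, negative) pairs the positive case 'wins'.

    LOW values are Test Positive here, so a positive case wins a pair when its
    value is lower; tied pairs count as half a win.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Perfect test: every Outcome Positive is shorter than every Outcome Negative
print(auc_mann_whitney([150, 151, 152], [160, 161, 162]))  # 1.0
# Useless test: both groups have identical values
print(auc_mann_whitney([155, 156, 157], [155, 156, 157]))  # 0.5
# Reversed test: the Outcome Positives are the taller group
print(auc_mann_whitney([160, 161, 162], [150, 151, 152]))  # 0.0
```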

H	L
1	147.5
1	147.5
1	149.0
1	150.0
1	150.5
1	151.0
1	151.5
0	152.5
1	152.5
1	152.5
0	153.5
0	153.5
1	153.0
0	154.0
0	154.0
1	153.5
1	153.5
1	154.0
1	154.0
1	154.0
0	155.5
0	156.0
0	156.0
0	156.0
1	155.5
1	155.5
1	156.0
1	156.0
1	156.0
0	157.0
1	156.5
1	156.5
0	157.5
0	157.5
1	157.5
0	158.5
0	158.5
0	158.5
0	158.5
0	158.5
0	159.0
0	159.0
1	158.5
0	159.5
0	159.5
1	160.0
0	161.5
0	161.5
0	162.0
0	162.5

The first two columns of the default example data from the Receiver Operator Characteristics (ROC) Program Page will be used to demonstrate how a single ROC is calculated and interpreted. Please note that the data were generated by computer to demonstrate the method, and are not real clinical observations.

Reference Data

We wish to use maternal height as the Test to predict the need for Caesarean Section as the Outcome. We reviewed our medical records, sampled 25 cases delivered vaginally and 25 cases delivered by Caesarean Section, and recorded the maternal heights. The data are shown in the table to the right.

Column 1 shows the outcome, with 0 representing vaginal delivery (Outcome Negative, O-) and 1 Caesarean Section (Outcome Positive, O+). As the higher value (1) represents Outcome Positive (O+), the first row of column 1 is H.

Column 2 shows the test values, maternal height in cms. As a reduced height (lower value) is used to predict Caesarean Section (O+), the first row of column 2 is L for low values to be Test Positive.
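The H/L flags amount to a recoding rule. In the hypothetical Python sketch below (the function `orient` is illustrative, not the program's code), the data are recoded so that `True` marks Outcome Positive and higher scores are Test Positive. For an L test the values are simply negated; this reverses their order, and order is all the ROC uses.

```python
def orient(outcome_flag, test_flag, outcomes, values):
    """Recode so that True = Outcome Positive and a higher score = Test Positive.

    outcome_flag: "H" if the higher outcome code marks Outcome Positive, "L" if the lower one does
    test_flag:    "H" if higher test values are Test Positive, "L" if lower values are
    """
    high_code = max(outcomes)
    if outcome_flag == "H":
        is_positive = [o == high_code for o in outcomes]
    else:
        is_positive = [o != high_code for o in outcomes]
    sign = 1 if test_flag == "H" else -1
    scores = [sign * v for v in values]  # negation reverses the ordering of the values
    return is_positive, scores


# A few rows of the example: outcome flag H (1 = Caesarean Section), height flag L
is_pos, scores = orient("H", "L", [1, 1, 0, 0], [147.5, 149.0, 152.5, 153.5])
print(is_pos)  # [True, True, False, False]
print(scores)  # [-147.5, -149.0, -152.5, -153.5]
```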

Theoretical Discussions

The plot to the left shows the distribution of the data points. Those requiring Caesarean Section (O+) are on the left and those delivered vaginally (O-) on the right.

If we were to arbitrarily draw a decision line at a maternal height of 155cms, as shown in the diagram, the 15 cases below that line on the left would be True Positives (TP=15), as they are Test Positive and Outcome Positive. The 10 cases above the line on the left would be False Negatives (FN=10), as they are Test Negative but Outcome Positive. The 5 cases below the line on the right would be False Positives (FP=5), as they are Test Positive but Outcome Negative. The 20 cases above the line on the right would be True Negatives (TN=20), as they are Test Negative and Outcome Negative. From these counts, the True Positive Rate is TPR = TP / (TP + FN) = 15 / 25 = 0.6, and the False Positive Rate is FPR = FP / (FP + TN) = 5 / 25 = 0.2.

If the decision line is moved upward towards higher values, the number of positives, both true and false, increases and the number of negatives, both true and false, decreases. If the decision line is moved downward towards lower values, the reverse follows. TP, FP, FN, TN, and the derived TPR and FPR, therefore change as the decision line moves through the whole range of test values. The relationship between TPR and FPR over this range forms the Receiver Operator Characteristic Curve, as shown in the diagram to the left.

Using the ROC to set the decision cut off value

Looking at the diagram to the left, we can see that if we were to set the cut off value for decision at 160cms, 24 out of 25 cases delivered by Caesarean Section would be True Positives, but 21 out of 25 cases delivered vaginally would be False Positives, making the True Positive Rate TPR = 24/25 = 0.96 and the False Positive Rate FPR = 21/25 = 0.84.

At the other extreme, setting the cut off value at 153cms, 9 out of 25 Caesarean Sections are True Positives and 1 out of 25 vaginal deliveries is a False Positive, making the True Positive Rate TPR = 9/25 = 0.36 and the False Positive Rate FPR = 1/25 = 0.04.
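The counts above can be checked against the example data with a short Python sketch. The list and function names are hypothetical. Treating heights strictly below the cut off as Test Positive is an assumption, chosen because it reproduces the 24/25 at 160cms and 9/25 at 153cms quoted above; counting a height equal to the cut off as positive would give 25/25 and 10/25 instead.

```python
# Maternal heights (cms) from the example data, split by outcome
CS = [147.5, 147.5, 149.0, 150.0, 150.5, 151.0, 151.5, 152.5, 152.5, 153.0,
      153.5, 153.5, 154.0, 154.0, 154.0, 155.5, 155.5, 156.0, 156.0, 156.0,
      156.5, 156.5, 157.5, 158.5, 160.0]   # Caesarean Section (Outcome Positive, O+)
VAG = [152.5, 153.5, 153.5, 154.0, 154.0, 155.5, 156.0, 156.0, 156.0, 157.0,
       157.5, 157.5, 158.5, 158.5, 158.5, 158.5, 158.5, 159.0, 159.0, 159.5,
       159.5, 161.5, 161.5, 162.0, 162.5]  # vaginal delivery (Outcome Negative, O-)


def counts(cutoff):
    """Return (TP, FN, FP, TN) when heights strictly BELOW the cut off are Test Positive."""
    tp = sum(h < cutoff for h in CS)
    fp = sum(h < cutoff for h in VAG)
    return tp, len(CS) - tp, fp, len(VAG) - fp


for cut in (155.0, 160.0, 153.0):
    tp, fn, fp, tn = counts(cut)
    print(f"{cut}cms: TP={tp} FN={fn} FP={fp} TN={tn}  "
          f"TPR={tp / (tp + fn):.2f} FPR={fp / (fp + tn):.2f}")
# 155.0cms: TP=15 FN=10 FP=5 TN=20  TPR=0.60 FPR=0.20
# 160.0cms: TP=24 FN=1 FP=21 TN=4  TPR=0.96 FPR=0.84
# 153.0cms: TP=9 FN=16 FP=1 TN=24  TPR=0.36 FPR=0.04
```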

A clinician given this set of data may choose a cut off value anywhere between these extremes to suit his/her purposes. The common approach is to set the cut off according to the purpose for which the test is used, as follows:

The most efficient prediction point
- Using a measure of accuracy, the Youden Index, YI = TPR + TNR - 1 = TPR - FPR, and choosing the value of the test where YI is maximum
- Using the point on the ROC where TPR = TNR (that is, TPR = 1-FPR); the TPR at this point is called Q*, and the corresponding test value is the cut off
For epidemiological screening or early clinical alerts. This prioritizes TPR over TNR, so a cut off value is chosen where TPR is 2 or 3 times TNR
For decisions to take action, particularly if the action involves risks or costs. This prioritizes TNR over TPR, so a cut off value is chosen where TNR is 2 or 3 times TPR

Interpreting and Using Results of the ROC

Using the first 2 columns of the default example data from the Receiver Operator Characteristics (ROC) Program Page, with maternal height used to predict the need for Caesarean Section, the results are interpreted as follows.

Ht(cms)	TPR	FPR	FNR	TNR	YI	LR+	LR-
160.0	0.96	0.84	0.04	0.16	0.12	1.14	0.25
159.5	0.96	0.76	0.04	0.24	0.2	1.26	0.17
159.0	0.96	0.68	0.04	0.32	0.28	1.41	0.13
158.5	0.92	0.48	0.08	0.52	0.44	1.92	0.15
157.5	0.88	0.4	0.12	0.6	0.48	2.20	0.20
157.0	0.88	0.36	0.12	0.64	0.52	2.44	0.19
156.5	0.8	0.36	0.2	0.64	0.44	2.22	0.31
156.0	0.68	0.24	0.32	0.76	0.44	2.83	0.42
155.5	0.6	0.2	0.4	0.8	0.4	3.00	0.50
154.0	0.48	0.12	0.52	0.88	0.36	4.00	0.59
153.5	0.4	0.04	0.6	0.96	0.36	10.00	0.63
153.0	0.36	0.04	0.64	0.96	0.32	9.00	0.67
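Every row of the table can be regenerated from the raw heights. This Python sketch assumes heights strictly below the cut off are Test Positive (the rule that matches the TPR and FPR columns) and computes YI as TPR - FPR, LR+ as TPR/FPR and LR- as FNR/TNR, which agree with the table. Values are rounded half up to two decimals, as the table appears to do (for example LR- = 0.625 is shown as 0.63). Names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

CS = [147.5, 147.5, 149.0, 150.0, 150.5, 151.0, 151.5, 152.5, 152.5, 153.0,
      153.5, 153.5, 154.0, 154.0, 154.0, 155.5, 155.5, 156.0, 156.0, 156.0,
      156.5, 156.5, 157.5, 158.5, 160.0]   # Caesarean Section (O+)
VAG = [152.5, 153.5, 153.5, 154.0, 154.0, 155.5, 156.0, 156.0, 156.0, 157.0,
       157.5, 157.5, 158.5, 158.5, 158.5, 158.5, 158.5, 159.0, 159.0, 159.5,
       159.5, 161.5, 161.5, 162.0, 162.5]  # vaginal delivery (O-)
CUTOFFS = [160.0, 159.5, 159.0, 158.5, 157.5, 157.0,
           156.5, 156.0, 155.5, 154.0, 153.5, 153.0]


def table_row(cutoff):
    """TPR, FPR, FNR, TNR, YI, LR+, LR- with heights below the cut off Test Positive."""
    tpr = sum(h < cutoff for h in CS) / len(CS)
    fpr = sum(h < cutoff for h in VAG) / len(VAG)
    fnr, tnr = 1 - tpr, 1 - fpr
    yi = tpr - fpr      # Youden Index, TPR + TNR - 1
    lr_pos = tpr / fpr  # positive Likelihood Ratio (requires FPR > 0)
    lr_neg = fnr / tnr  # negative Likelihood Ratio (requires TNR > 0)
    return [tpr, fpr, fnr, tnr, yi, lr_pos, lr_neg]


def fmt(values):
    """Two decimals, rounding halves up (e.g. 0.625 -> 0.63)."""
    return [str(Decimal(repr(v)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
            for v in values]


print("Ht(cms)\tTPR\tFPR\tFNR\tTNR\tYI\tLR+\tLR-")
for c in CUTOFFS:
    print(c, *fmt(table_row(c)), sep="\t")
```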

The area under the ROC curve is θ = 0.81, with Standard Error = 0.06 and 95% confidence interval = 0.70 to 0.93. As the 95% confidence interval does not include the null value of 0.5, maternal height can be concluded to be a significant predictor of the need for Caesarean Section.
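θ can be computed from the raw heights as the Mann-Whitney probability, with ties counted as one half, and its standard error approximated with the Hanley and McNeil (1982) formula. In this Python sketch (function names hypothetical), the result is θ = 509/625 ≈ 0.814 and SE ≈ 0.061, consistent with the rounded values above. The StatTools program may estimate the SE by a different method, so its confidence limits can differ slightly in the second decimal.

```python
import math

CS = [147.5, 147.5, 149.0, 150.0, 150.5, 151.0, 151.5, 152.5, 152.5, 153.0,
      153.5, 153.5, 154.0, 154.0, 154.0, 155.5, 155.5, 156.0, 156.0, 156.0,
      156.5, 156.5, 157.5, 158.5, 160.0]   # Caesarean Section (O+)
VAG = [152.5, 153.5, 153.5, 154.0, 154.0, 155.5, 156.0, 156.0, 156.0, 157.0,
       157.5, 157.5, 158.5, 158.5, 158.5, 158.5, 158.5, 159.0, 159.0, 159.5,
       159.5, 161.5, 161.5, 162.0, 162.5]  # vaginal delivery (O-)


def auc_low_positive(pos, neg):
    """Mann-Whitney θ when LOWER values are Test Positive (ties count one half)."""
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hanley_mcneil_se(theta, n_pos, n_neg):
    """Standard error of θ from the Hanley and McNeil (1982) approximation."""
    q1 = theta / (2 - theta)           # two random positives both out-rank a negative
    q2 = 2 * theta ** 2 / (1 + theta)  # a random positive out-ranks two negatives
    var = (theta * (1 - theta)
           + (n_pos - 1) * (q1 - theta ** 2)
           + (n_neg - 1) * (q2 - theta ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


theta = auc_low_positive(CS, VAG)
se = hanley_mcneil_se(theta, len(CS), len(VAG))
print(f"θ = {theta:.4f}, SE = {se:.4f}")  # θ = 0.8144, SE = 0.0613
print(f"95% CI {theta - 1.96 * se:.2f} to {theta + 1.96 * se:.2f}")
```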
The parameters are as shown in the table to the right, from which the following decisions can be made:
- The maximum Youden Index is 0.52, where maternal height is 157cms
- Q* is where TPR = TNR = 0.75 (approximately), at a maternal height of 156.2cms
- Approximately, therefore, the most accurate cut off value to predict the need for Caesarean Section is a maternal height of 156cms to 157cms.
- At a maternal height of 159cms, the TPR is 0.96, which is 3 times the TNR (1-FPR) of 0.32. This can be taken as a cut off for alert. For example, junior staff should be required to consult someone more experienced before making clinical decisions when maternal height is less than 159cms
- At 153cms, the TPR is 0.36, against a TNR (1-FPR) of 0.96, so that TNR is roughly 3 times TPR (0.96/0.36 ≈ 2.7). This can be taken as a cut off level for action, for example, to proceed to an elective Caesarean Section.
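The decisions above can be reproduced from the table in Python. The sketch picks the row with the largest Youden Index (TPR - FPR) and locates Q* by linear interpolation between the two rows where TPR - TNR changes sign; interpolating linearly is an assumption, and it gives TPR = TNR ≈ 0.72 at about 156.2cms, close to the approximate figures quoted. It then prints the TPR/TNR and TNR/TPR ratios used for the alert and action cut offs. Names are hypothetical.

```python
# (cut off in cms, TPR, FPR) copied from the results table
ROWS = [(160.0, 0.96, 0.84), (159.5, 0.96, 0.76), (159.0, 0.96, 0.68),
        (158.5, 0.92, 0.48), (157.5, 0.88, 0.40), (157.0, 0.88, 0.36),
        (156.5, 0.80, 0.36), (156.0, 0.68, 0.24), (155.5, 0.60, 0.20),
        (154.0, 0.48, 0.12), (153.5, 0.40, 0.04), (153.0, 0.36, 0.04)]


def youden_best(rows):
    """Row with the largest Youden Index, YI = TPR - FPR."""
    return max(rows, key=lambda row: row[1] - row[2])


def q_star(rows):
    """Linearly interpolate where TPR = TNR (= 1 - FPR); return (cut off, Q*).

    Assumes rows are ordered by descending cut off, so TPR - TNR decreases.
    """
    for (c0, t0, f0), (c1, t1, f1) in zip(rows, rows[1:]):
        d0, d1 = t0 - (1 - f0), t1 - (1 - f1)  # TPR - TNR at the two rows
        if d0 >= 0 >= d1:                      # the crossing lies between them
            w = 0.0 if d0 == d1 else d0 / (d0 - d1)
            return c0 + w * (c1 - c0), t0 + w * (t1 - t0)
    return None


c, tpr, fpr = youden_best(ROWS)
print(f"max YI = {tpr - fpr:.2f} at {c}cms")  # max YI = 0.52 at 157.0cms
cut, q = q_star(ROWS)
print(f"Q* = {q:.2f} at about {cut:.1f}cms")  # Q* = 0.72 at about 156.2cms
for c, tpr, fpr in ROWS:
    tnr = 1 - fpr
    print(f"{c}cms: TPR/TNR = {tpr / tnr:.2f}, TNR/TPR = {tnr / tpr:.2f}")
```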
The table to the right also provides the Likelihood Ratios that can be used to calculate Bayesian post-test probabilities from pre-test probabilities under differing clinical situations. The procedures for doing so are discussed in the Prediction Statistics Explanation Page .
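As a sketch of that procedure, Bayes' theorem in odds form gives post-test odds = pre-test odds × LR. In the Python example below, the pre-test probability of 20% is hypothetical, chosen only for illustration, and the LRs are those for the 153.5cms row of the table.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical pre-test probability of needing Caesarean Section: 20%
pre = 0.20
# LR+ and LR- for a cut off of 153.5cms, from the table
print(f"Test Positive: {post_test_probability(pre, 10.00):.2f}")  # 0.71
print(f"Test Negative: {post_test_probability(pre, 0.63):.2f}")   # 0.14
```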

Unpaired Comparison of Two ROCs

Unpaired comparisons of ROCs are used to compare the predictive quality of a test under different conditions or in different populations. An example is comparing the ROC using maternal height to predict Caesarean Section in nullipara and multipara, or between women from urban and rural communities.

The comparison uses the two θs and their Standard Errors, where

Difference = θ₁ - θ₂
SE_Diff = sqrt(SE₁² + SE₂²)
95% Confidence Interval of the Difference = Diff ± z × SE_Diff, where z=1.65 for the one tail model, and 1.96 for the two tail model.
Sample size discussions and tables can be found in the Sample Size for Receiver Operator Characteristics (ROC) Explained and Tables Page.
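A minimal Python sketch of these formulas, using `statistics.NormalDist` for z (1.96 two tail, about 1.645 one tail). The function name and the two θs with their SEs are hypothetical, standing in for, say, nullipara and multipara; they are not results from this page.

```python
import math
from statistics import NormalDist


def compare_unpaired(theta1, se1, theta2, se2, two_tailed=True, conf=0.95):
    """Difference between θs from two independent samples, with its SE and CI."""
    diff = theta1 - theta2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # no covariance term: samples are independent
    alpha = 1 - conf
    # z = 1.96 for the two tail model, about 1.645 for the one tail model
    z = NormalDist().inv_cdf(1 - alpha / 2 if two_tailed else 1 - alpha)
    return diff, se_diff, (diff - z * se_diff, diff + z * se_diff)


# Hypothetical θs and SEs, e.g. nullipara vs multipara (not results from this page)
diff, se_diff, (lo, hi) = compare_unpaired(0.81, 0.06, 0.70, 0.07)
print(f"Diff={diff:.2f} SE_Diff={se_diff:.3f} 95%CI {lo:.3f} to {hi:.3f}")
```

Here the interval includes 0, so these two hypothetical θs would not be judged significantly different.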

Paired ROC comparisons allow different tests for the same outcome to be compared, by administering all the tests to the same individuals. Such a comparison is powerful because intra-subject comparisons are made, reducing the influence of between-subject variation. The procedure used is that described by DeLong et al. (see references). This algorithm is particularly attractive in that it is nonparametric, allowing comparison between measurements with different distributions, provided that they are at least ordinal.
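The DeLong et al. (1988) procedure can be sketched in pure Python. Each test's θ is split into per-case "structural components"; their covariances across cases give the covariance matrix S of the θs, from which the variance of a paired difference is S₁₁ + S₂₂ - 2S₁₂. Function names and the two small data sets are hypothetical, and the scores are assumed already oriented so that higher means more Test Positive (an L test such as height would be negated first). Only comparisons between values are used, which is why ordinal data suffice.

```python
import math


def components(pos, neg):
    """DeLong structural components for one test (higher score = more Test Positive).

    v10[i]: share of Outcome Negatives that positive case i out-scores (ties count 1/2)
    v01[j]: share of Outcome Positives that out-score negative case j (ties count 1/2)
    """
    def psi(x, y):
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def sample_cov(a, b):
    """Sample covariance of two equal-length lists (divisor n - 1)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong(pos_scores, neg_scores):
    """θ for each test and their covariance matrix S = S10/m + S01/n.

    pos_scores[r][i]: score of test r for Outcome Positive case i (same case order
    for every test); neg_scores[r][j] likewise for Outcome Negative case j.
    """
    comps = [components(p, q) for p, q in zip(pos_scores, neg_scores)]
    m, n = len(pos_scores[0]), len(neg_scores[0])
    thetas = [sum(v10) / m for v10, _ in comps]
    k = len(comps)
    S = [[sample_cov(comps[r][0], comps[s][0]) / m
          + sample_cov(comps[r][1], comps[s][1]) / n
          for s in range(k)] for r in range(k)]
    return thetas, S


# Hypothetical scores for two tests on the same 4 positive and 4 negative cases
thetas, S = delong(pos_scores=[[3, 5, 6, 8], [4, 4, 7, 6]],
                   neg_scores=[[1, 2, 4, 7], [2, 5, 3, 6]])
var_diff = S[0][0] + S[1][1] - 2 * S[0][1]
z = (thetas[0] - thetas[1]) / math.sqrt(var_diff)
print(thetas, round(var_diff, 5), round(z, 3))
```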

H	L	H	H
1	147.5	30	27
1	147.5	28	25
1	149.0	25	25
1	150.0	36	29
1	150.5	31	24
1	151.0	34	26
1	151.5	21	22
0	152.5	37	22
1	152.5	23	29
1	152.5	30	23
0	153.5	35	23
0	153.5	39	28
1	153.0	35	25
0	154.0	32	26
0	154.0	25	24
1	153.5	30	32
1	153.5	28	24
1	154.0	27	27
1	154.0	37	28
1	154.0	35	30
0	155.5	23	24
0	156.0	34	24
0	156.0	28	24
0	156.0	28	23
1	155.5	27	28
1	155.5	30	30
1	156.0	23	29
1	156.0	28	31
1	156.0	36	29
0	157.0	29	23
1	156.5	21	25
1	156.5	24	31
0	157.5	25	24
0	157.5	21	27
1	157.5	39	25
0	158.5	29	27
0	158.5	30	24
0	158.5	35	26
0	158.5	30	23
0	158.5	27	27
0	159.0	27	23
0	159.0	31	20
1	158.5	37	24
0	159.5	33	21
0	159.5	33	27
1	160.0	31	23
0	161.5	31	29
0	161.5	32	26
0	162.0	25	24
0	162.5	29	24

The data used for demonstration in this panel are the default data from the Receiver Operator Characteristics (ROC) Program Page. In this exercise, we wish to compare 3 Tests to predict the need for Caesarean Section, these being maternal height (cms), maternal age (years), and maternal BMI. We have collected data from 25 mothers who delivered by Caesarean Section and 25 who delivered vaginally; the data are shown in the table to the right.

Column 1 is the outcome, 0 for vaginal delivery (Outcome Negative, O-) and 1 for Caesarean Section (Outcome Positive, O+). As the higher value 1 is for O+, H is placed in the first row of column 1
Column 2 is Test 1, maternal height in cms. As a lower value (shorter) is used to predict O+, L is placed in the first row of column 2
Column 3 is Test 2, maternal age in years. As a higher value (older) is used to predict O+, H is placed in the first row of column 3
Column 4 is Test 3, maternal Body Mass Index (BMI). As a higher value (heavier for height) is used to predict O+, H is placed in the first row of column 4

Results

	Height	Age	BMI
Height	1	0.065	-0.004
Age	0.065	1	0.069
BMI	-0.004	0.069	1

The table to the left shows the correlations between the 3 tests, which must be taken into account when comparing the 3 θs. After this correction, the test for heterogeneity between the 3 θs gives chi square=10.39, df=2, p=0.006. This is statistically highly significant, indicating that the 3 θs are not all equal.
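Given the covariance matrix S of the three θs, a chi square test of equality can be built from two contrasts, θ₁-θ₂ and θ₂-θ₃, in the manner of DeLong et al.; with 2 degrees of freedom the p value is exactly exp(-χ²/2), so the reported χ²=10.39 gives p ≈ 0.0055. The usage below is only an illustration: it assumes the r values in the correlation table are correlations between the θ estimates, so that Cov(θᵢ, θⱼ) = r·SEᵢ·SEⱼ, and it uses the rounded θs and SEs, so it gives about 10.5 rather than the program's 10.39. Names are hypothetical.

```python
import math


def heterogeneity_chi2(thetas, S):
    """Chi square (df = 2) for H0: θ1 = θ2 = θ3, using contrasts θ1-θ2 and θ2-θ3."""
    L = [[1, -1, 0], [0, 1, -1]]
    d = [sum(row[i] * thetas[i] for i in range(3)) for row in L]
    # V = L S L^T: the 2 x 2 covariance matrix of the two contrasts
    V = [[sum(L[a][i] * S[i][j] * L[b][j] for i in range(3) for j in range(3))
          for b in range(2)] for a in range(2)]
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    V_inv = [[V[1][1] / det, -V[0][1] / det],
             [-V[1][0] / det, V[0][0] / det]]
    chi2 = sum(d[a] * V_inv[a][b] * d[b] for a in range(2) for b in range(2))
    return chi2, math.exp(-chi2 / 2)  # exact chi square upper tail for df = 2


# p value for the reported result
print(round(math.exp(-10.39 / 2), 4))  # 0.0055

# Illustration from rounded table values, ASSUMING r = correlation of θ estimates
theta = [0.81, 0.50, 0.73]
se = [0.06, 0.08, 0.07]
r = [[1, 0.065, -0.004], [0.065, 1, 0.069], [-0.004, 0.069, 1]]
S = [[r[i][j] * se[i] * se[j] for j in range(3)] for i in range(3)]
chi2, p = heterogeneity_chi2(theta, S)
print(round(chi2, 2), round(p, 4))
```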

ROC	θ	SE_θ	95%CI
1. Height	0.81	0.06	0.70 to 0.93
2. Age	0.50	0.08	0.33 to 0.66
3. BMI	0.73	0.07	0.60 to 0.87

The 3 ROC curves are plotted as shown in the diagram to the right, maternal height in red, age in green, and BMI in green. The θ values are shown in the table to the left.

The 3 ROCs can also be repeatedly compared pairwise, and the results shown in the table below and to the left.

Diff=difference between the two θs
r = correlation between the two ROCs
SE_Diff = Standard Error of the difference, corrected by correlation
95%CI = 95% confidence interval of the difference, Diff±2.41SE_Diff.
There are 3 comparisons, so the Bonferroni correction for p=0.05 is 0.05/3=0.017. For a two tail model, p=0.017/2=0.008, and z for a tail probability of 0.008 is 2.41. This differs from a single comparison, where the 95% confidence interval is Diff±1.96SE_Diff, 1.96 being z for 0.025 (0.05/2).

θ₁	SE₁	θ₂	SE₂	r	Diff	SE_Diff	95%CI
0.81	0.06	0.50	0.08	0.065	0.32	0.10	0.079 to 0.561
0.81	0.06	0.73	0.07	-0.004	0.08	0.097	-0.154 to 0.314
0.50	0.08	0.73	0.07	0.069	-0.239	0.106	-0.495 to 0.017
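The Bonferroni z and the pairwise intervals can be sketched in Python. This assumes SE_Diff uses the usual variance of a difference of correlated estimates, SE_Diff = sqrt(SE₁² + SE₂² - 2r·SE₁·SE₂), with r the correlation between the two θ estimates. With the rounded inputs from the table, the printed values differ slightly from the program's, but the same comparisons exclude or include 0. Without rounding p to 0.008 first, z is about 2.39 rather than 2.41.

```python
import math
from statistics import NormalDist


def bonferroni_z(n_comparisons, alpha=0.05):
    """Two tail critical z after dividing alpha among the comparisons."""
    return NormalDist().inv_cdf(1 - alpha / n_comparisons / 2)


def paired_diff_ci(t1, se1, t2, se2, r, z):
    """Difference of two correlated θs, its SE and a Diff ± z*SE_Diff interval.

    ASSUMES r is the correlation between the two θ estimates.
    """
    diff = t1 - t2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    return diff, se_diff, (diff - z * se_diff, diff + z * se_diff)


z = bonferroni_z(3)
print(round(z, 2))                                # 2.39 (unrounded p = 0.05/6)
print(round(NormalDist().inv_cdf(1 - 0.008), 2))  # 2.41 (p rounded to 0.008)

rows = [(0.81, 0.06, 0.50, 0.08, 0.065),   # height vs age
        (0.81, 0.06, 0.73, 0.07, -0.004),  # height vs BMI
        (0.50, 0.08, 0.73, 0.07, 0.069)]   # age vs BMI
for t1, se1, t2, se2, r in rows:
    diff, se_diff, (lo, hi) = paired_diff_ci(t1, se1, t2, se2, r, z)
    print(f"Diff={diff:.3f} SE_Diff={se_diff:.3f} 95%CI {lo:.3f} to {hi:.3f}")
```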

From these results, maternal height is the best predictor of the three, and significantly better than maternal age, as the 95% confidence interval of their difference (0.079 to 0.561) does not include 0.

Maternal BMI is second best, but its θ is not significantly different from that of maternal height or age, as both confidence intervals include 0.

Hanley JA, McNeil BJ (1982) The meaning and use of the Area Under a Receiver Operating Characteristic (ROC) curve. Radiology 143:29-36

DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44:837-845