Introduction
Unpaired ROC
Paired ROC
References
StatTools offers a suite of programs and explanations related to the statistics of predictive tests. As all of
them contain a great deal of detail, many aspects discussed elsewhere will not be repeated here. This page discusses
only Receiver Operator Characteristics (ROC), and supports the program in the Receiver Operator Characteristics (ROC) Program Page.
Users are referred to the Prediction Statistics Explanation Page and the Meta-analysis for Predictions Explanation Page
for other aspects of prediction, and to the Sample Size for Receiver Operator Characteristics (ROC) Explained and Tables Page
for the sample size required when using the ROC.
Historical Aspects : During the Second World War, the British invented RADAR, which reflects a radio wave
off an object, and so were able to detect the German bombers coming over the Channel. The early versions of RADAR
were imprecise, and the signal contained much electronic noise and was difficult to interpret. If the alarm was raised on
a weak signal, there was a risk of a false alarm, wasting much fuel and energy in the response. If the alarm was not raised until
a clear signal was obtained, the risk was that the fighters would be scrambled too late and more bombs would be dropped on the
cities. In order to analyse the signal and decide how to respond, scientists developed a statistical method of
handling the signals, and named it the RADAR Receiver Operator Characteristics. The method became widely used in signal
analysis in many areas of science and industry, and was adopted in the analysis of tests and diagnosis in Medicine, under the
term Receiver Operator Characteristics (ROC), with the RADAR part removed.
Theoretical Considerations :
A scalar, preferably normally distributed, measurement is used as a Test to predict a binary Outcome. As an example, we will
use maternal height (Test) to predict the need for Caesarean Section to deliver the baby (Outcome). The relationship between Test and Outcome is best demonstrated by the diagram to the right.
- In a sample that was delivered without Caesarean Section (Outcome Negative O-, red), the decision line drawn across the normal
distribution curve marks the height we will use to make a binary decision whether it is Test Positive or Negative.
  - Values of Test less than the decision line (shorter women, maroon) represent the number of False Positives (FP)
  - Values of Test greater than the decision line (taller women, red) represent the number of True Negatives (TN)
  - True Negative Rate TNR = TN / (TN+FP), and False Positive Rate FPR = FP / (TN+FP) = 1-TNR
- In a sample that was delivered by Caesarean Section (Outcome Positive O+, blue), the decision line drawn across the normal
distribution curve marks the height we will use to make a binary decision whether it is Test Positive or Negative.
  - Values of Test less than the decision line (shorter women, blue) represent the number of True Positives (TP)
  - Values of Test greater than the decision line (taller women, navy) represent the number of False Negatives (FN)
  - True Positive Rate TPR = TP / (TP+FN), and False Negative Rate FNR = FN / (TP+FN) = 1-TPR
If the value used to make the decision changes (in our example, the maternal height used as the cut off), then
TP, FP, FN, and TN also change, and the TPR and FPR change accordingly.
The changing relationship between FPR and TPR forms a curve, as shown in the diagram to the left. This curve is called the
Receiver Operator Characteristics Curve (ROC Curve), and the area under this curve is called the Area Under the ROC,
usually represented by the abbreviation θ.
In a perfect test, where the values of the test from Outcome Positives do not overlap those from Outcome Negatives, TPR
will increase from 0 to 1 while FPR remains at 0. TPR will then remain at 1 while FPR increases from 0 to 1. In other words,
the ROC curve hugs the left and top borders, and the area under it is θ=1.
In a completely useless test, the values of measurement from Outcome Positives and Outcome Negatives overlap completely, so that
the TPR and FPR increase together. The ROC curve is therefore the diagonal, and the area under it is θ=0.5.
In most cases, there is partial overlap of values, and the ROC curve resembles that in the diagram above and to
the left, with θ between 0.5 and 1.
The null value for θ is therefore 0.5, and a useful test has a 95% confidence interval of θ that lies wholly above
0.5. Where θ<0.5, the test predicts the opposite of the intended outcome.
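The three limiting cases above can be checked numerically. The following is a minimal sketch, assuming the well known equivalence between the area under the ROC and the proportion of (Outcome Positive, Outcome Negative) pairs that the test ranks correctly, with ties counted as half. The function name `auc` and the scores are hypothetical, chosen only for illustration, and a higher score is assumed to mean Test Positive.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC as the proportion of correctly ranked pairs.

    Assumes a higher score means Test Positive; ties count as half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores, chosen only to show the three limiting cases.
perfect = auc([7, 8, 9], [1, 2, 3])    # no overlap between the groups
useless = auc([1, 2, 3], [1, 2, 3])    # complete overlap
opposite = auc([1, 2, 3], [7, 8, 9])   # test predicts the opposite outcome

print(perfect, useless, opposite)      # 1.0 0.5 0.0
```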
H | L |
1 | 147.5 |
1 | 147.5 |
1 | 149.0 |
1 | 150.0 |
1 | 150.5 |
1 | 151.0 |
1 | 151.5 |
0 | 152.5 |
1 | 152.5 |
1 | 152.5 |
0 | 153.5 |
0 | 153.5 |
1 | 153.0 |
0 | 154.0 |
0 | 154.0 |
1 | 153.5 |
1 | 153.5 |
1 | 154.0 |
1 | 154.0 |
1 | 154.0 |
0 | 155.5 |
0 | 156.0 |
0 | 156.0 |
0 | 156.0 |
1 | 155.5 |
1 | 155.5 |
1 | 156.0 |
1 | 156.0 |
1 | 156.0 |
0 | 157.0 |
1 | 156.5 |
1 | 156.5 |
0 | 157.5 |
0 | 157.5 |
1 | 157.5 |
0 | 158.5 |
0 | 158.5 |
0 | 158.5 |
0 | 158.5 |
0 | 158.5 |
0 | 159.0 |
0 | 159.0 |
1 | 158.5 |
0 | 159.5 |
0 | 159.5 |
1 | 160.0 |
0 | 161.5 |
0 | 161.5 |
0 | 162.0 |
0 | 162.5 |
The first two columns of the default example data from the Receiver Operator Characteristics (ROC) Program Page
will be used to demonstrate how
a single ROC is calculated and interpreted. Please note that the data are computer generated to demonstrate the method,
and are not real clinical observations.
Reference Data
We wish to use maternal height as the Test to predict the need for Caesarean Section as the Outcome.
We reviewed our medical records and sampled 25 cases that were delivered vaginally and 25 cases delivered by Caesarean Section,
and recorded the maternal height. The data are as shown in the table to the right.
Column 1 shows the outcome, with 0 representing vaginal delivery (Outcome Negative, O-) and 1 representing Caesarean Section (Outcome Positive,
O+). As the higher value (1) represents Outcome Positive (O+), the first row of column 1 is H.
Column 2 shows the test values, maternal height in cms. As a reduced height (lower value) is used to predict Caesarean
Section (O+), the first row of column 2 is L, for low values to be Test Positive.
Theoretical Discussions
The plot to the left shows the distribution of the data points. Those requiring Caesarean Section (O+) are on
the left and those delivered vaginally (O-) on the right.
If we were to arbitrarily draw a decision line at a maternal height
of 155cms, as shown in the diagram, the 15 cases below that line on the left would be True Positives (TP=15), as they are
Test Positive and Outcome Positive. The 10 cases above the line on the left would be False Negatives (FN=10), as they are
Test Negative but Outcome Positive. The 5 cases below the line on the right would be False Positives (FP=5), as they are
Test Positive but Outcome Negative. The 20 cases above the line on the right would be True Negatives (TN=20), as they are
Test Negative and Outcome Negative. From these counts, the True Positive Rate is TPR = TP / (TP + FN) = 15 / 25
= 0.6, and the False Positive Rate is FPR = FP / (FP + TN) = 5 / 25 = 0.2.
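The counts at the 155cms decision line can be reproduced directly from the example data. This is a minimal sketch: the list and function names are illustrative, and it assumes that a height strictly below the cut off is Test Positive.

```python
# Maternal heights (cms) from the example table, split by outcome.
cs_heights = [147.5, 147.5, 149.0, 150.0, 150.5, 151.0, 151.5, 152.5, 152.5,
              153.0, 153.5, 153.5, 154.0, 154.0, 154.0, 155.5, 155.5, 156.0,
              156.0, 156.0, 156.5, 156.5, 157.5, 158.5, 160.0]       # O+
vaginal_heights = [152.5, 153.5, 153.5, 154.0, 154.0, 155.5, 156.0, 156.0,
                   156.0, 157.0, 157.5, 157.5, 158.5, 158.5, 158.5, 158.5,
                   158.5, 159.0, 159.0, 159.5, 159.5, 161.5, 161.5, 162.0,
                   162.5]                                             # O-


def confusion_counts(cutoff):
    """Return (TP, FN, FP, TN); heights below the cut off are Test Positive."""
    tp = sum(h < cutoff for h in cs_heights)
    fn = len(cs_heights) - tp
    fp = sum(h < cutoff for h in vaginal_heights)
    tn = len(vaginal_heights) - fp
    return tp, fn, fp, tn


tp, fn, fp, tn = confusion_counts(155.0)
tpr = tp / (tp + fn)   # True Positive Rate
fpr = fp / (fp + tn)   # False Positive Rate
print(tp, fn, fp, tn, tpr, fpr)   # 15 10 5 20 0.6 0.2
```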
If the decision line is moved upward towards higher values, the number of positives, both true and false, would increase and
the number of negatives, both true and false, would decrease. If the decision line is moved downwards towards lower values,
the reverse follows. The TP, FP, FN, TN, and their calculated derivatives TPR and FPR, therefore change as the decision line
moves throughout the whole range of the test values. The relationship between TPR and FPR over the range of the values forms
the Receiver Operator Characteristic Curve, as shown in the diagram to the left.
Using the ROC to set the decision cut off value
Looking at the diagram to the left, we can see that, if we were to set the cut off value for decision at 160cms, 24 out of 25
cases that had Caesarean Section would be True Positive, but 21 out of 25 cases delivered vaginally would become False Positive,
making the True Positive Rate TPR = 24/25 = 0.96 and the False Positive Rate FPR = 21/25 = 0.84.
At the other extreme of setting the cut off value at 153cms, 9 out of 25 are True Positives and 1 out of 25 False Positives, making
the True Positive Rate TPR = 9/25 = 0.36 and the False Positive Rate FPR = 1/25 = 0.04.
A clinician given this set of data may choose a cut off value anywhere between these extremes to suit his/her purposes. The common
approach is to set the decision cut off according to the purpose for using the test, as follows
- The most efficient prediction point
  - Using a measure of accuracy, the Youden Index, YI = TPR + TNR - 1 = TPR - FPR, and the value of the test where YI is maximum
  - Using the point on the ROC that results in Q*, where Q* = TPR at the point where TPR = TNR = 1-FPR
- For epidemiological screening or early clinical alerts. This prioritizes TPR over TNR, so a cut off value is chosen where TPR is 2
or 3 times that of TNR
- For decisions to take action, particularly if the action involves risks or costs. This prioritizes TNR over TPR, so a
cut off value is chosen where TNR is 2 or 3 times that of TPR
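The most efficient prediction point can be located by sweeping the cut off across every observed height. The sketch below is illustrative only: the names are hypothetical, heights strictly below the cut off are assumed to be Test Positive, and the Youden Index is taken as TPR - FPR, which is how the YI column on this page is tabulated. For Q* it reports the nearest observed cut off rather than interpolating between two heights.

```python
# Maternal heights (cms) from the example table, split by outcome.
cs_heights = [147.5, 147.5, 149.0, 150.0, 150.5, 151.0, 151.5, 152.5, 152.5,
              153.0, 153.5, 153.5, 154.0, 154.0, 154.0, 155.5, 155.5, 156.0,
              156.0, 156.0, 156.5, 156.5, 157.5, 158.5, 160.0]       # O+
vaginal_heights = [152.5, 153.5, 153.5, 154.0, 154.0, 155.5, 156.0, 156.0,
                   156.0, 157.0, 157.5, 157.5, 158.5, 158.5, 158.5, 158.5,
                   158.5, 159.0, 159.0, 159.5, 159.5, 161.5, 161.5, 162.0,
                   162.5]                                             # O-


def rates(cutoff):
    """Return (TPR, TNR) when heights below the cut off are Test Positive."""
    tpr = sum(h < cutoff for h in cs_heights) / len(cs_heights)
    tnr = sum(h >= cutoff for h in vaginal_heights) / len(vaginal_heights)
    return tpr, tnr


# Every observed height is a candidate cut off.
cutoffs = sorted(set(cs_heights) | set(vaginal_heights))

# Maximum Youden Index: TPR + TNR - 1, equivalently TPR - FPR.
best_youden = max(cutoffs, key=lambda c: sum(rates(c)) - 1)

# Observed cut off closest to Q*, the point where TPR equals TNR.
nearest_q = min(cutoffs, key=lambda c: abs(rates(c)[0] - rates(c)[1]))

print(best_youden, rates(best_youden))
print(nearest_q, rates(nearest_q))
```

With these data the Youden Index peaks at 157.0cms, and the observed cut off nearest to Q* is 156.0cms, where TPR and TNR lie on either side of the crossing point.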
Interpreting and Using Results of the ROC
Using the first 2 columns of the default example data from the Receiver Operator Characteristics (ROC) Program Page,
with maternal height used to
predict the need for Caesarean Section, the results are interpreted as follows.
Ht(cms) | TPR | FPR | FNR | TNR | YI | LR+ | LR- |
160.0 | 0.96 | 0.84 | 0.04 | 0.16 | 0.12 | 1.14 | 0.25 |
159.5 | 0.96 | 0.76 | 0.04 | 0.24 | 0.2 | 1.26 | 0.17 |
159.0 | 0.96 | 0.68 | 0.04 | 0.32 | 0.28 | 1.41 | 0.13 |
158.5 | 0.92 | 0.48 | 0.08 | 0.52 | 0.44 | 1.92 | 0.15 |
157.5 | 0.88 | 0.4 | 0.12 | 0.6 | 0.48 | 2.20 | 0.20 |
157.0 | 0.88 | 0.36 | 0.12 | 0.64 | 0.52 | 2.44 | 0.19 |
156.5 | 0.8 | 0.36 | 0.2 | 0.64 | 0.44 | 2.22 | 0.31 |
156.0 | 0.68 | 0.24 | 0.32 | 0.76 | 0.44 | 2.83 | 0.42 |
155.5 | 0.6 | 0.2 | 0.4 | 0.8 | 0.4 | 3.00 | 0.50 |
154.0 | 0.48 | 0.12 | 0.52 | 0.88 | 0.36 | 4.00 | 0.59 |
153.5 | 0.4 | 0.04 | 0.6 | 0.96 | 0.36 | 10.00 | 0.63 |
153.0 | 0.36 | 0.04 | 0.64 | 0.96 | 0.32 | 9.00 | 0.67 |
- The ROC value θ = 0.81, Standard Error = 0.06, 95% confidence interval = 0.70 to 0.93. As the 95% confidence interval
does not include the null value of 0.5, maternal height can be concluded to be a significant predictor of the need for Caesarean
Section.
- The parameters are as shown in the table to the right, from which the following decisions can be made
  - The maximum Youden Index is 0.52, where maternal height is 157cms
  - Q* is where TPR = TNR = 0.75 (approximately), where maternal height = 156.2cms
  - Approximately therefore, the most accurate cut off value to predict the need for Caesarean Section is a maternal height
of 156cms to 157cms.
  - At a maternal height of 159cms, the TPR is 0.96, which is 3 times the TNR (1-FPR) of 0.32. This can be taken as a cut off
for alert, for example, that junior staff should be required to consult someone more experienced to make clinical
decisions when maternal height is less than 159cms
  - At 153cms, the TPR is 0.36, against a TNR (1-FPR) of 0.96, so that TNR is roughly 3 times that of TPR. This can be taken as a
cut off level for action, for example, to proceed to an elective Caesarean Section.
- The table to the right also provides the Likelihood Ratios that can be used to calculate Bayesian post-test probabilities from
pre-test probabilities under differing clinical situations. The procedures for doing so are discussed in the
Prediction Statistics Explanation Page.
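The reported θ and its Standard Error can be approximated from the example data. The sketch below is hedged: it computes θ as the proportion of (O+, O-) pairs in which the Caesarean Section mother is the shorter, with ties counted as half, and it assumes the Standard Error approximation of Hanley and McNeil (1982). Whether the program uses exactly this Standard Error is not stated on this page, so confidence limits derived from it may differ in the second decimal place.

```python
import math

# Maternal heights (cms) from the example table, split by outcome.
cs_heights = [147.5, 147.5, 149.0, 150.0, 150.5, 151.0, 151.5, 152.5, 152.5,
              153.0, 153.5, 153.5, 154.0, 154.0, 154.0, 155.5, 155.5, 156.0,
              156.0, 156.0, 156.5, 156.5, 157.5, 158.5, 160.0]       # O+
vaginal_heights = [152.5, 153.5, 153.5, 154.0, 154.0, 155.5, 156.0, 156.0,
                   156.0, 157.0, 157.5, 157.5, 158.5, 158.5, 158.5, 158.5,
                   158.5, 159.0, 159.0, 159.5, 159.5, 161.5, 161.5, 162.0,
                   162.5]                                             # O-

# Shorter height is Test Positive, so a pair is ranked correctly when the
# Caesarean Section mother is shorter; ties count as half.
pairs = [(p, n) for p in cs_heights for n in vaginal_heights]
theta = sum(1.0 if p < n else 0.5 if p == n else 0.0
            for p, n in pairs) / len(pairs)

# Standard Error by the Hanley and McNeil (1982) approximation (an assumption).
n_pos, n_neg = len(cs_heights), len(vaginal_heights)
q1 = theta / (2 - theta)
q2 = 2 * theta ** 2 / (1 + theta)
se = math.sqrt((theta * (1 - theta)
                + (n_pos - 1) * (q1 - theta ** 2)
                + (n_neg - 1) * (q2 - theta ** 2)) / (n_pos * n_neg))

print(round(theta, 2), round(se, 2))   # 0.81 0.06
```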
Unpaired Comparison of Two ROCs
Unpaired comparisons of ROCs are used to compare the predictive quality of a test under different conditions or in different
populations. An example is comparing the ROC using maternal height to predict Caesarean Section in nullipara and multipara,
or between women from urban and rural communities.
The comparison uses the two θs and their Standard Errors.
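The formula itself is not reproduced on this page. A commonly used form for two independent samples, assumed here rather than taken from StatTools, is z = (θ1 - θ2) / sqrt(SE1² + SE2²), referred to the standard normal distribution. The function name and the second set of values in the sketch are hypothetical.

```python
import math
from statistics import NormalDist


def compare_unpaired(theta1, se1, theta2, se2):
    """z and two tail p for the difference between two independent ROC areas.

    Assumes the two samples are independent, so the variances simply add.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = (theta1 - theta2) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# The first pair of values is the maternal height result on this page;
# the second pair is hypothetical, standing in for another population.
z, p = compare_unpaired(0.81, 0.06, 0.65, 0.07)
print(round(z, 2), round(p, 3))
```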
Paired Comparison of ROCs
Paired ROC comparisons allow comparisons between different tests for the same outcome, by administering all the tests to the
same individuals. Such a comparison is powerful in that intra-subject comparisons are made, reducing the influence of between subject
variations. The procedure used is as described by DeLong et al. (see References). This algorithm is particularly attractive in
that it is nonparametric, so allowing for comparison between measurements with different distributions, provided that they are
at least ordinal.
H | L | H | H |
1 | 147.5 | 30 | 27 |
1 | 147.5 | 28 | 25 |
1 | 149.0 | 25 | 25 |
1 | 150.0 | 36 | 29 |
1 | 150.5 | 31 | 24 |
1 | 151.0 | 34 | 26 |
1 | 151.5 | 21 | 22 |
0 | 152.5 | 37 | 22 |
1 | 152.5 | 23 | 29 |
1 | 152.5 | 30 | 23 |
0 | 153.5 | 35 | 23 |
0 | 153.5 | 39 | 28 |
1 | 153.0 | 35 | 25 |
0 | 154.0 | 32 | 26 |
0 | 154.0 | 25 | 24 |
1 | 153.5 | 30 | 32 |
1 | 153.5 | 28 | 24 |
1 | 154.0 | 27 | 27 |
1 | 154.0 | 37 | 28 |
1 | 154.0 | 35 | 30 |
0 | 155.5 | 23 | 24 |
0 | 156.0 | 34 | 24 |
0 | 156.0 | 28 | 24 |
0 | 156.0 | 28 | 23 |
1 | 155.5 | 27 | 28 |
1 | 155.5 | 30 | 30 |
1 | 156.0 | 23 | 29 |
1 | 156.0 | 28 | 31 |
1 | 156.0 | 36 | 29 |
0 | 157.0 | 29 | 23 |
1 | 156.5 | 21 | 25 |
1 | 156.5 | 24 | 31 |
0 | 157.5 | 25 | 24 |
0 | 157.5 | 21 | 27 |
1 | 157.5 | 39 | 25 |
0 | 158.5 | 29 | 27 |
0 | 158.5 | 30 | 24 |
0 | 158.5 | 35 | 26 |
0 | 158.5 | 30 | 23 |
0 | 158.5 | 27 | 27 |
0 | 159.0 | 27 | 23 |
0 | 159.0 | 31 | 20 |
1 | 158.5 | 37 | 24 |
0 | 159.5 | 33 | 21 |
0 | 159.5 | 33 | 27 |
1 | 160.0 | 31 | 23 |
0 | 161.5 | 31 | 29 |
0 | 161.5 | 32 | 26 |
0 | 162.0 | 25 | 24 |
0 | 162.5 | 29 | 24 |
The data used for demonstration in this panel are the default data from the Receiver Operator Characteristics (ROC) Program Page.
In this exercise,
we wish to compare 3 Tests to predict the need for Caesarean Section, these being maternal height (cms), maternal age (years),
and maternal BMI. We have collected data from 25 mothers who delivered by Caesarean Section and 25 who delivered vaginally. The
data are as in the table to the right.
- Column 1 is the outcome, 0 for vaginal delivery (Outcome Negative, O-) and 1 for Caesarean Section (Outcome Positive, O+). As
the higher value 1 is for O+, H is placed in the first row of column 1
- Column 2 is Test 1, maternal height in cms. As a lower value (shorter) is used to predict O+, L is placed in the first row
of column 2
- Column 3 is Test 2, maternal age in years. As a higher value (older) is used to predict O+, H is placed in the first row
of column 3
- Column 4 is Test 3, maternal Body Mass Index (BMI). As a higher value (fatter) is used to predict O+, H is placed in the
first row of column 4
Results
 | Height | Age | BMI |
Height | 1 | 0.065 | -0.004 |
Age | 0.065 | 1 | 0.069 |
BMI | -0.004 | 0.069 | 1 |
The table to the left shows the correlations between the 3 tests, and these must be used to correct the differences between the 3 θs.
After correction, the test for heterogeneity between the 3 θs is chi square=10.39, df=2, p=0.006. This is statistically highly
significant, allowing an interpretation that the 3 θs differ from each other.
ROC | θ | SEθ | 95%CI |
1. Height | 0.81 | 0.06 | 0.70 to 0.93 |
2. Age | 0.50 | 0.08 | 0.33 to 0.66 |
3. BMI | 0.73 | 0.07 | 0.60 to 0.87 |
The 3 ROC curves are plotted as shown in the diagram to the right, maternal height in red, age in green, and BMI in green. The
θ values are shown in the table to the left.
The 3 ROCs can also be compared pairwise, with the results shown in the table below and to the left.
Diff = difference between the two θs
r = correlation between the two ROCs
SEDiff = Standard Error of the difference, corrected by correlation
95%CI = 95% confidence interval of the difference, Diff±2.41SEDiff.
There are 3 comparisons, so the Bonferroni correction for p=0.05 is 0.05/3=0.017. For a two tail model, p=0.017/2=0.008,
and z for a probability of 0.008 is 2.41. This differs from a single comparison, where the 95% confidence interval is
Diff±1.96SEDiff, where 1.96 is z for 0.025 (0.05/2).
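The Bonferroni adjusted interval can be sketched as follows, using Python's standard library for the normal quantile. The function name is illustrative. Note that the unrounded tail probability 0.05/3/2 = 0.0083 gives z of about 2.39, slightly smaller than the 2.41 obtained from the rounded value 0.008, so the limits differ a little in the third decimal place from those tabulated on this page.

```python
from statistics import NormalDist


def bonferroni_ci(diff, se_diff, comparisons, alpha=0.05):
    """Two tail confidence interval for a difference, Bonferroni adjusted."""
    tail = alpha / comparisons / 2          # probability in each tail
    z = NormalDist().inv_cdf(1 - tail)      # critical z for that tail
    return z, diff - z * se_diff, diff + z * se_diff


# Diff and SEDiff for the comparison of age against BMI, from this page.
z, low, high = bonferroni_ci(diff=-0.239, se_diff=0.106, comparisons=3)
print(round(z, 2), round(low, 3), round(high, 3))

# With a single comparison the familiar 1.96 is recovered.
z_single, _, _ = bonferroni_ci(diff=-0.239, se_diff=0.106, comparisons=1)
print(round(z_single, 2))   # 1.96
```

As the interval for age against BMI includes 0, the difference is not significant after correction, which agrees with the tabulated result.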
Comparison | θ1 | SE1 | θ2 | SE2 | r | Diff | SEDiff | 95%CI |
Height vs Age | 0.81 | 0.06 | 0.50 | 0.08 | 0.065 | 0.32 | 0.10 | 0.079 to 0.561 |
Height vs BMI | 0.81 | 0.06 | 0.73 | 0.07 | -0.004 | 0.08 | 0.097 | -0.154 to 0.314 |
Age vs BMI | 0.50 | 0.08 | 0.73 | 0.07 | 0.069 | -0.239 | 0.106 | -0.495 to 0.017 |
From these results, it can be seen that maternal height is the best predictor, significantly better than maternal age.
Maternal BMI is second best, but its θ is not significantly different from that of maternal height or age.
References
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36
DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44:837-845