meta prediction exp

This page explains meta-analysis of predictions that use binary Tests (Test Positive T+ and Test Negative T-) to predict binary Outcomes (Outcome Positive O+ and Outcome Negative O-), as calculated in the Meta-analysis for Predictions Program Page.

In predictive tests, a high True Positive Rate (TPR) is often attained at the expense of a high False Positive Rate (FPR). Published results of predictive tests vary accordingly, some with high TPR and high FPR, others with low TPR and low FPR. Meta-analysis integrates multiple published results to give an overview of the relationship between a particular Test and a particular Outcome.

The model proposed by Moses and others (see reference) uses the concept of the Receiver Operator Characteristic (ROC). It plots the Sensitivity and Specificity of each study on the ROC chart, and develops statistical methods to fit a ROC curve over the data.

From each study, True Positive Rate (TPR) and False Positive Rate (FPR) are calculated as follows.

Data entry uses the numbers of True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN).
If there are any zero (0) values in the data, or if the second button "Do Meta-analysis Adding 0.5 to All Values in Data" is clicked, 0.5 is added to all values in the data. This ensures that there is no zero value in the data, which would crash the program with a division by zero error. Even when there is no zero value in the data, the user may still choose this adjustment so that results are comparable, as it is suggested by Moses et al. (see reference) and is thus the default in many algorithms.
True Positive Rate TPR = TP / (TP+FN), False Positive Rate FPR = FP / (FP+TN). The table with data, with or without 0.5 adjustment, and the calculated FPR and TPR, is then presented.
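The adjustment and the two rates can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the program's own code; the function name `study_rates` and the `force_adjust` flag are assumptions made for the example.

```python
def study_rates(studies, force_adjust=False):
    """Return a list of (FPR, TPR) pairs, one per study.

    studies: list of (TP, FP, FN, TN) counts.
    0.5 is added to every cell of every study when any cell is zero,
    or when force_adjust is True (mirroring the second button).
    """
    adjust = force_adjust or any(0 in row for row in studies)
    results = []
    for tp, fp, fn, tn in studies:
        if adjust:
            tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        tpr = tp / (tp + fn)  # True Positive Rate
        fpr = fp / (fp + tn)  # False Positive Rate
        results.append((fpr, tpr))
    return results


# Two rows of the example data; the zero FN in the first row triggers the adjustment.
example = study_rates([(23, 15, 0, 12), (16, 4, 3, 14)])
```

With these two rows the adjusted rates agree with the first and third rows of the adjusted example table on this page.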

Curve Fitting TPR from FPR

Data points near the extremes are unstable, and for prediction purposes the data that count are those with TPR>=0.5 and FPR<=0.5, so only these data points are used in curve fitting. The procedures are as follows.

Using the data from each study (i)
- Logit(FPR_i) u_i = Log((FPR_i / (1-FPR_i)))
- Logit(TPR_i) v_i = Log((TPR_i / (1-TPR_i)))
- Log Odds Ratio Y_i = v_i-u_i
- Standard Error of Log Odds Ratio X_i = u_i+v_i
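A minimal Python sketch of these transformations for a single study follows. It takes the Log Odds Ratio as Logit(TPR) minus Logit(FPR), the sign convention that gives the positive values reported in the worked example on this page. The function names are illustrative assumptions.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p); p must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def regression_terms(fpr, tpr):
    """Return (Y, X) for one study.

    Y = logit(TPR) - logit(FPR), the Log Odds Ratio.
    X = logit(FPR) + logit(TPR), the independent variable of the regression.
    """
    u = logit(fpr)
    v = logit(tpr)
    return v - u, u + v


# Third study of the adjusted example data: FPR = 4.5/19, TPR = 16.5/20.
y, x = regression_terms(4.5 / 19, 16.5 / 20)
```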
A standard linear regression analysis is then carried out with X_i as independent variable x and Y_i as dependent variable y, so that y = a + bx. In this formula
- b represents changes to the Log Odds Ratio related to changes in Standard Error, a bias caused by different sample sizes in the data. If b deviates significantly from null (0), then the Log Odds Ratio (y) is unstable, and the results of curve fitting are difficult to interpret. If b does not deviate significantly from null, then the results of curve fitting can be taken to be free from this bias and interpreted with confidence.
- a represents the Log Odds Ratio when x (Standard Error) = 0. If b is not statistically significantly different from null, a is approximately the mean Log Odds Ratio.
- The curve fitting results between FPR and TPR are as follows
  - Mean value TPR = 1 / (1 + exp(-a / ( 1 - b)) * exp(log((1 - FPR) / FPR) * ((1 + b) / (1 - b))))
  - Given that b is supposed to be null (0), the term ((1 + b) / (1 - b)) is close or equal to 1, and is therefore optional. This term is included in the program in the Meta-analysis for Predictions Program Page, so the results may be marginally different from those of algorithms that omit this term.
  - For 95% confidence interval, a = a±1.96SE_y, where SE_y is the Standard Error of y
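The curve formula above translates directly into a small function. The sketch below is illustrative only (the name `sroc_tpr` is an assumption); it evaluates the fitted TPR at a given FPR for any intercept a and slope b.

```python
import math


def sroc_tpr(fpr, a, b):
    """TPR on the fitted summary curve at a given FPR (0 < FPR < 1).

    Implements TPR = 1 / (1 + exp(-a/(1-b)) * exp(log((1-FPR)/FPR) * (1+b)/(1-b))).
    """
    constant = math.exp(-a / (1.0 - b))
    exponent = (1.0 + b) / (1.0 - b)
    return 1.0 / (1.0 + constant * math.exp(math.log((1.0 - fpr) / fpr) * exponent))


# With the coefficients of the worked example on this page (a = 2.3519, b = -0.1838):
tpr_at_half = sroc_tpr(0.5, 2.3519, -0.1838)
```

When a = 0 and b = 0 the function returns TPR = FPR, the null diagonal, which is a quick sanity check on the formula.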
As the resulting curve is a mathematically created entity, it is inappropriate to label it as a Receiver Operator Characteristic (ROC). The overall effect size, the result of the meta-analysis, is represented by Q*, the point on the fitted curve where TPR equals the True Negative Rate (TPR = TNR = 1-FPR). The formula also allows an estimate of the Standard Error, and therefore the 95% confidence interval, of Q*, expressed in terms of the TPR at Q*.

TP	FP	FN	TN
23	15	0	12
18	10	0	6
16	4	3	14
10	6	7	7
15	5	6	15
13	7	2	24
15	2	5	14
7	2	3	24
20	8	2	10
15	4	2	11
25	7	5	9

This panel explains the results obtained from the example data used in the Meta-analysis for Predictions Program Page, as shown in the table above. Please note that the data are computer generated to demonstrate the procedure, and do not represent any real observations.

TP	FP	FN	TN	FPR	TPR
23.5	15.5	0.5	12.5	0.5536	0.9792
18.5	10.5	0.5	6.5	0.6176	0.9737
16.5	4.5	3.5	14.5	0.2368	0.825
10.5	6.5	7.5	7.5	0.4643	0.5833
15.5	5.5	6.5	15.5	0.2619	0.7045
13.5	7.5	2.5	24.5	0.2344	0.8438
15.5	2.5	5.5	14.5	0.1471	0.7381
7.5	2.5	3.5	24.5	0.0926	0.6818
20.5	8.5	2.5	10.5	0.4474	0.8913
15.5	4.5	2.5	11.5	0.2813	0.8611
25.5	7.5	5.5	9.5	0.4412	0.8226

0.5 is added to all items to avoid a zero value, which would cause the program to crash. The actual data used are therefore as shown in the table above, which also includes the calculated False and True Positive Rates (FPR, TPR).

The studies where FPR<=0.5 and TPR>=0.5 (9 of the 11 studies) are then used for curve fitting to produce the Summary ROC, with the following results.

The Mean Log Odds Ratio, calculated without regression
    - Mean = 2.2958
    - Standard Deviation = 0.8113
    - Standard Error = 0.2868

Linear regression y=(2.3519) + (-0.1838)x
    - a = 2.3519, Regressed Mean Log Odds Ratio
    - b = -0.1838, t = -0.6359 p = 0.6394, not statistically significant
    - As b is not statistically significant, we can rule out bias caused by Standard Errors.
    - Please note that, without bias, the actual and regressed mean Log Odds Ratio are very similar
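The following Python sketch retraces these steps from the example data: add 0.5 to every cell, keep the studies with FPR<=0.5 and TPR>=0.5, and run an ordinary least squares regression of the Log Odds Ratio (logit TPR minus logit FPR) on the sum of the two logits. It is an illustration under those assumptions, not the program's own code, and it does not attempt to reproduce the Standard Error or p value.

```python
import math
from statistics import mean, stdev

# Example data (TP, FP, FN, TN) copied from this page.
DATA = [
    (23, 15, 0, 12), (18, 10, 0, 6), (16, 4, 3, 14), (10, 6, 7, 7),
    (15, 5, 6, 15), (13, 7, 2, 24), (15, 2, 5, 14), (7, 2, 3, 24),
    (20, 8, 2, 10), (15, 4, 2, 11), (25, 7, 5, 9),
]


def logit(p):
    return math.log(p / (1.0 - p))


points = []  # (x, y) pairs for the studies kept for curve fitting
for tp, fp, fn, tn in DATA:
    tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
    fpr, tpr = fp / (fp + tn), tp / (tp + fn)
    if fpr <= 0.5 and tpr >= 0.5:
        u, v = logit(fpr), logit(tpr)
        points.append((u + v, v - u))

xs = [x for x, _ in points]
ys = [y for _, y in points]
mx, my = mean(xs), mean(ys)

# Ordinary least squares for y = a + b*x.
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in points)
b = sxy / sxx
a = my - b * mx

print(len(points), round(my, 4), round(stdev(ys), 4), round(a, 4), round(b, 4))
```

The printed study count, mean, standard deviation, intercept and slope should be close to the figures listed above.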

FPR	TPR
	Mean	95%CI		Q*	Q*CI	Data
0.01	0.2348	0.1603	0.3304
0.02	0.3326	0.2366	0.4448
0.03	0.3989	0.2922	0.5162
0.04	0.4491	0.3364	0.5672
0.05	0.4892	0.3733	0.6062
0.06	0.5224	0.4049	0.6375
0.07	0.5506	0.4325	0.6633
0.08	0.5751	0.4571	0.6852
0.09	0.5967	0.4792	0.704
0.1	0.6158	0.4992	0.7204
0.11	0.633	0.5176	0.735
0.12	0.6486	0.5345	0.748
0.13	0.6629	0.5501	0.7597
0.14	0.6759	0.5647	0.7703
0.15	0.688	0.5783	0.78
0.16	0.6992	0.5911	0.7889
0.17	0.7096	0.6031	0.7971
0.18	0.7194	0.6145	0.8047
0.19	0.7285	0.6253	0.8118
0.2	0.7371	0.6355	0.8184
0.21	0.7452	0.6453	0.8246
0.22	0.7529	0.6546	0.8305
0.23	0.7602	0.6635	0.836
0.24	0.7671	0.672	0.8412
0.25	0.7737	0.6801	0.8461
0.26	0.78	0.688	0.8507
0.27	0.786	0.6955	0.8552
0.28	0.7918	0.7028	0.8594
0.29	0.7973	0.7098	0.8635
0.3	0.8026	0.7166	0.8673
0.31	0.8077	0.7232	0.871
0.32	0.8126	0.7295	0.8746
0.33	0.8173	0.7357	0.878
0.34	0.8219	0.7416	0.8812
0.35	0.8263	0.7474	0.8844
0.36	0.8306	0.7531	0.8874
0.37	0.8348	0.7586	0.8904
0.38	0.8388	0.7639	0.8932
0.39	0.8427	0.7691	0.896
0.4	0.8465	0.7742	0.8986
0.41	0.8501	0.7792	0.9012
0.42	0.8537	0.784	0.9037
0.43	0.8572	0.7888	0.9061
0.44	0.8606	0.7934	0.9085
0.45	0.8639	0.7979	0.9108
0.46	0.8672	0.8024	0.913
0.47	0.8703	0.8067	0.9152
0.48	0.8734	0.811	0.9173
0.49	0.8764	0.8152	0.9194
0.5	0.8794	0.8193	0.9214
0.51	0.8823	0.8234	0.9234
0.52	0.8851	0.8274	0.9253
0.53	0.8879	0.8313	0.9272
0.54	0.8906	0.8351	0.929
0.55	0.8933	0.8389	0.9309
0.56	0.8959	0.8427	0.9326
0.57	0.8985	0.8463	0.9344
0.58	0.9011	0.85	0.9361
0.59	0.9036	0.8536	0.9378
0.6	0.906	0.8571	0.9394
0.61	0.9085	0.8606	0.941
0.62	0.9109	0.8641	0.9426
0.63	0.9132	0.8675	0.9442
0.64	0.9156	0.8709	0.9457
0.65	0.9179	0.8742	0.9473
0.66	0.9201	0.8775	0.9488
0.67	0.9224	0.8808	0.9503
0.68	0.9246	0.8841	0.9517
0.69	0.9268	0.8873	0.9532
0.7	0.929	0.8905	0.9546
0.71	0.9311	0.8937	0.956
0.72	0.9333	0.8969	0.9574
0.73	0.9354	0.9	0.9588
0.74	0.9375	0.9032	0.9602
0.75	0.9396	0.9063	0.9615
0.76	0.9417	0.9094	0.9629
0.77	0.9437	0.9125	0.9642
0.78	0.9458	0.9156	0.9656
0.79	0.9479	0.9187	0.9669
0.8	0.9499	0.9218	0.9682
0.81	0.952	0.9249	0.9696
0.82	0.954	0.9281	0.9709
0.83	0.9561	0.9312	0.9722
0.84	0.9581	0.9343	0.9735
0.85	0.9602	0.9375	0.9749
0.86	0.9622	0.9407	0.9762
0.87	0.9643	0.9439	0.9775
0.88	0.9664	0.9471	0.9789
0.89	0.9686	0.9504	0.9802
0.9	0.9707	0.9538	0.9816
0.91	0.9729	0.9572	0.983
0.92	0.9752	0.9607	0.9844
0.93	0.9775	0.9643	0.9859
0.94	0.9798	0.968	0.9874
0.95	0.9823	0.9719	0.9889
0.96	0.9849	0.9759	0.9906
0.97	0.9877	0.9803	0.9923
0.98	0.9907	0.9852	0.9942
0.99	0.9943	0.9908	0.9964
0.2358				0.7642
0.2358					0.7136
0.2358					0.8149
0.5536						0.9792
0.6176						0.9737
0.2368						0.825
0.4643						0.5833
0.2619						0.7045
0.2344						0.8438
0.1471						0.7381
0.0926						0.6818
0.4474						0.8913
0.2813						0.8611
0.4412						0.8226

Curve Fit formula
    - TPR = 1 / (1 + exp(-a / ( 1 - b)) * exp(log((1 - FPR) / FPR) * ((1 + b) / (1 - b))))
    - TPR = 1 / (1 + exp(-2.3519 / ( 1 - -0.1838)) * exp(log((1 - FPR) / FPR) * ((1 + -0.1838) / (1 - -0.1838))))
    - TPR = 1 / (1 + 0.1371 * exp(log((1 - FPR) / FPR) * 0.6895))
    - For the 95% confidence interval, a is replaced with a±1.96SE_y (2.3519±1.96×0.2868), so 0.1371 is replaced with 0.2205 and 0.0853
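The constants in the simplified formula can be checked with a short calculation. The sketch below takes a, b and SE_y as reported in this example and forms the interval as a ± 1.96 × SE_y, as stated in the curve-fitting description; the variable names are illustrative.

```python
import math

a, b, se_y = 2.3519, -0.1838, 0.2868  # values reported in this example


def curve_constant(a_value, b_value):
    """The multiplier exp(-a / (1 - b)) in the curve formula."""
    return math.exp(-a_value / (1.0 - b_value))


exponent = (1.0 + b) / (1.0 - b)                # power applied to (1 - FPR) / FPR
k_mean = curve_constant(a, b)                   # mean curve
k_lower = curve_constant(a - 1.96 * se_y, b)    # lower 95% curve (smaller a, lower TPR)
k_upper = curve_constant(a + 1.96 * se_y, b)    # upper 95% curve (larger a, higher TPR)

print(round(exponent, 4), round(k_mean, 4), round(k_lower, 4), round(k_upper, 4))
```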

Main results
    - Q* is the value where True Positive and Negative rates are equal (TPR=TNR, TPR=1-FPR)
    - In this example TPR=0.7642, and FPR=1-0.7642=0.2358
    - The Standard Error of Q*, SEQ* = 0.0258, and the 95%CI = 0.7136 to 0.8149.
    - In other words, at FPR=0.2358, 95% CI of TPR is 0.7136 to 0.8149
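This page does not spell out the formulas for Q* and its Standard Error. A minimal sketch, assuming the expressions usually attributed to Moses et al., Q* = 1 / (1 + exp(-a/2)) and SE(Q*) = SE(a) / (8 × cosh²(a/4)), with SE(a) taken as the 0.2868 reported above, gives values that agree with the figures in this list. The function name is an assumption for the example.

```python
import math


def q_star(a, se_a, z=1.96):
    """Return (Q*, SE of Q*, lower limit, upper limit).

    Assumed formulas: Q* = 1 / (1 + exp(-a/2)),
    SE(Q*) = SE(a) / (8 * cosh(a/4)**2).
    """
    q = 1.0 / (1.0 + math.exp(-a / 2.0))
    se_q = se_a / (8.0 * math.cosh(a / 4.0) ** 2)
    return q, se_q, q - z * se_q, q + z * se_q


q, se_q, lower, upper = q_star(2.3519, 0.2868)
print(round(q, 4), round(se_q, 4))
```

Because the sum of the two logits is zero at Q*, the slope b drops out and Q* depends only on the intercept a.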

Plotting Coordinates, as shown in the table above. The program calculates the coordinates for lines and data points in a table, in a format that allows the whole table to be imported into Excel to produce the Summary ROC Plot. The table includes the False Positive Rate (x) and the True Positive Rate (y) coordinates for the Summary ROC curve and its 95% Confidence Interval. These are tabulated at 0.01 intervals.

The table continues with the coordinates for Q* and its 95% confidence interval.

Finally, the table lists the coordinates for all the data points from the input data.

The program then creates a bitmap of all the data and results, using the MacroPlot algorithm, as shown to the left. The bitmap contains the following elements

The plot has False Positive Rate (FPR) as the x axis, and True Positive Rate (TPR) as the y axis.
The diagonal line marks the null relationship (TPR = FPR); the lower right triangle beneath it has the null area of 0.5.
The upper left quarter square (FPR<=0.5, TPR>=0.5) marks the area where there is a meaningful relationship between Test and Outcome. The data points within this area are used to carry out the curve fitting for the Summary ROC.
The solid round dots represent the coordinates of the input data.
The solid diamond and vertical line represent the Effect Q* and its 95% confidence interval.
The 3 curved lines represent the 3 Summary ROC lines: the mean and the two limits of the 95% confidence interval.

The Area under the Summary ROC (θ), mean and its 95% Confidence Intervals

Moses et al. (see reference) provide no algorithm for estimating the area. In fact the authors suggest that calculating θ may be inappropriate, given the different sample sizes and variances of the different studies, and given that estimating the Summary curve uses only a subset of the input data (those with TPR>=0.5 and FPR<=0.5).

The program from the Meta-analysis for Predictions Program Page nevertheless calculates the areas under the 3 estimated Summary ROCs (the mean and the two limits of the 95% Confidence Interval) by summing the area under each of 100 slices at 0.01 intervals of the FPR. The algorithm used is the same as that in the Area Under Coordinates Program Page. In this example, θ has a mean of 0.8297, and a 95% CI of 0.7677 to 0.8787.
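The slice-summing idea can be sketched as follows. The trapezoidal rule and the end points (TPR = 0 at FPR = 0, TPR = 1 at FPR = 1) are assumptions made for this illustration; the program's own routine may slice the curve differently, so the result is expected to be close to, but not necessarily identical with, the 0.8297 reported here.

```python
import math


def sroc_tpr(fpr, a, b):
    """TPR on the fitted curve; the end points are set to their limits (assumes |b| < 1)."""
    if fpr <= 0.0:
        return 0.0
    if fpr >= 1.0:
        return 1.0
    constant = math.exp(-a / (1.0 - b))
    exponent = (1.0 + b) / (1.0 - b)
    return 1.0 / (1.0 + constant * math.exp(math.log((1.0 - fpr) / fpr) * exponent))


def area_under_sroc(a, b, slices=100):
    """Sum trapezoids of width 1/slices under the fitted curve from FPR = 0 to 1."""
    width = 1.0 / slices
    total = 0.0
    for i in range(slices):
        left = sroc_tpr(i / slices, a, b)
        right = sroc_tpr((i + 1) / slices, a, b)
        total += 0.5 * (left + right) * width
    return total


theta = area_under_sroc(2.3519, -0.1838)
```

Replacing a with a ± 1.96 × SE_y in the call gives the areas under the two confidence limit curves in the same way.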

Moses LE, Shapiro D, Littenberg B (1993) Combining independent studies of a diagnostic test into a Summary ROC Curve: Data analytic approaches and some additional considerations. Statistics in Medicine 12:1293-1316.