StatTools : Meta-analysis for Predictions Explained

This page explains meta-analysis of predictions that use binary Tests (Test Positive T+ and Test Negative T-) to predict binary Outcomes (Outcome Positive O+ and Outcome Negative O-), as calculated in the Meta-analysis for Predictions Program Page.

In predictive tests, a high True Positive Rate (TPR) is often attained at the expense of a high False Positive Rate (FPR). Published results of predictive tests vary accordingly: some studies report high TPR with high FPR, others low TPR with low FPR. Meta-analysis integrates multiple published results to give an overview of the relationship between a particular Test and a particular Outcome.

The model proposed by Moses and others (see reference) uses the concept of the Receiver Operating Characteristic (ROC): the Sensitivity and Specificity of each study are plotted on the ROC chart, and statistical methods are developed to fit a ROC curve over the data.

From each study, True Positive Rate (TPR) and False Positive Rate (FPR) are calculated as follows.

  • Data entry uses the numbers of True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN)
  • If there are any zero (0) values in the data, or if the second button "Do Meta-analysis Adding 0.5 to All Values in Data" is clicked, 0.5 is added to all values in the data. This ensures that the data contain no zero value, which would otherwise crash the program with a division by zero error. Even when the data contain no zero value, the user may still choose this adjustment so that the results are comparable with those from other programs, as the adjustment is suggested by Moses et al. (see reference) and is therefore the default in many algorithms.
  • True Positive Rate TPR = TP / (TP+FN), False Positive Rate FPR = FP / (FP+TN). The table with data, with or without 0.5 adjustment, and the calculated FPR and TPR, is then presented.
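
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the program's actual code: the function name `tpr_fpr_table` and the `always_adjust` flag (standing in for the second button) are hypothetical, and the sketch assumes that a zero anywhere in the data triggers the 0.5 adjustment for every study.

```python
def tpr_fpr_table(studies, always_adjust=False):
    """Return a list of (TPR, FPR) pairs, one per study.

    studies: list of (TP, FP, FN, TN) counts.
    always_adjust: hypothetical flag standing in for the
        "Adding 0.5 to All Values" button.
    """
    # Assumption: one zero cell anywhere adjusts the whole data set.
    has_zero = any(cell == 0 for study in studies for cell in study)
    k = 0.5 if (always_adjust or has_zero) else 0.0

    table = []
    for tp, fp, fn, tn in studies:
        tp, fp, fn, tn = tp + k, fp + k, fn + k, tn + k
        tpr = tp / (tp + fn)  # True Positive Rate
        fpr = fp / (fp + tn)  # False Positive Rate
        table.append((tpr, fpr))
    return table
```

For example, `tpr_fpr_table([(90, 10, 10, 90)])` needs no adjustment and gives a TPR of 0.9 with an FPR of 0.1, while a study such as `(10, 0, 0, 10)` is adjusted to 10.5, 0.5, 0.5, 10.5 before the rates are computed.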
Curve Fitting TPR from FPR

As data points near the extremes are unstable, and as the data that matter for prediction purposes are those with TPR>=0.5 and FPR<=0.5, only these data points are used in curve fitting. The procedure is as follows.

  • Using the data from each study (i)
    • Logit(FPRi) ui = Log((FPRi / (1-FPRi)))
    • Logit(TPRi) vi = Log((TPRi / (1-TPRi)))
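    • Log Odds Ratio Yi = vi-ui (the logit of TPR minus the logit of FPR, the sign under which the curve fitting formula for TPR holds)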
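    • Xi = ui+vi, treated on this page as the Standard Error of the Log Odds Ratio (Moses et al. describe this sum as a measure of the test threshold)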
    A standard linear regression analysis is then carried out with Xi as independent variable x and Yi as dependent variable y, so that y = a + bx. In this formula
    • b represents changes to the Log Odds Ratio related to changes in Standard Error, a bias caused by different sample sizes in the data. If b deviates significantly from null (0), then the Log Odds Ratio (y) is unstable, and the results of curve fitting are difficult to interpret. If b does not deviate significantly from null, then the results of curve fitting can be taken to be free from bias and interpreted with confidence.
    • a represents the Log Odds Ratio when x (the Standard Error) is 0. If b is not significantly different from null, a also approximates the mean Log Odds Ratio.
    • The curve fitting results between FPR and TPR are as follows
      • Mean value TPR = 1 / (1 + exp(-a / ( 1 - b)) * exp(log((1 - FPR) / FPR) * ((1 + b) / (1 - b))))
      • Given that b is supposed to be null (0), the term ((1 + b) / (1 - b)) is close or equal to 1, and is therefore optional. This term is included in the program in the Meta-analysis for Predictions Program Page, so the results may be marginally different from those of algorithms that omit it.
      • For the 95% confidence interval, a is replaced by a±1.96SEy, where SEy is the Standard Error of y
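
The curve fitting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's code: the function names are hypothetical, the regression is plain ordinary least squares, y is taken as logit(TPR) minus logit(FPR) (the sign under which the TPR formula above holds), and SEy is passed in by the caller rather than estimated.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def fit_line(rates):
    """Least-squares fit of y = a + b*x from a list of (TPR, FPR) pairs."""
    # Only points with TPR >= 0.5 and FPR <= 0.5 are used.
    kept = [(t, f) for t, f in rates if t >= 0.5 and f <= 0.5]
    xs = [logit(t) + logit(f) for t, f in kept]  # Xi = ui + vi
    ys = [logit(t) - logit(f) for t, f in kept]  # Yi = vi - ui
    n = len(kept)  # needs at least two points with different x
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def predicted_tpr(fpr, a, b):
    """Mean TPR on the fitted curve at a given FPR (0 < FPR < 1)."""
    scale = (1.0 + b) / (1.0 - b)
    return 1.0 / (1.0 + math.exp(-a / (1.0 - b))
                  * math.exp(math.log((1.0 - fpr) / fpr) * scale))

def tpr_band(fpr, a, b, se_y):
    """95% band: substitute a - 1.96*SEy and a + 1.96*SEy for a."""
    return (predicted_tpr(fpr, a - 1.96 * se_y, b),
            predicted_tpr(fpr, a + 1.96 * se_y, b))
```

With b = 0 the scale term equals 1 and the curve reduces to logit(TPR) = a + logit(FPR), which is why the term is described as optional.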
    As the resulting curve is a mathematically created entity, it is inappropriate to label it as a Receiver Operating Characteristic (ROC). The overall effect size, the result of the meta-analysis, is represented by Q*, the point on the fitted curve where TPR equals the True Negative Rate (TNR = 1 - FPR). The formula also allows an estimate of the Standard Error, and therefore of the 95% confidence interval of Q*, expressed in terms of TPR at Q*.
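
Q* can be worked out directly from the fitted curve. Setting TPR = 1 - FPR means logit(TPR) = -logit(FPR); substituting this into the curve formula gives logit(TPR) = a / 2, so Q* = 1 / (1 + exp(-a / 2)), which does not depend on b. The sketch below illustrates this. The function names are hypothetical, and the Standard Error shown is a first-order (delta method) approximation that takes the Standard Error of a as input; the page does not state the formula the program uses, so this part is an assumption.

```python
import math

def q_star(a):
    """Point on the fitted curve where TPR = 1 - FPR."""
    return 1.0 / (1.0 + math.exp(-a / 2.0))

def q_star_se(a, se_a):
    """Delta-method Standard Error of Q* (an assumed approximation).

    dQ*/da = 1 / (8 * cosh(a / 4)**2), so SE(Q*) is se_a times that.
    """
    return se_a / (8.0 * math.cosh(a / 4.0) ** 2)

def q_star_ci(a, se_a):
    """Approximate 95% confidence interval for Q*."""
    q = q_star(a)
    half = 1.96 * q_star_se(a, se_a)
    return q - half, q + half
```

When a = 0 the Test carries no information and `q_star(0.0)` is 0.5; larger values of a move Q* towards 1.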