Abstract
In a thought-provoking paper, Efron (2011) investigated the merits and limitations of an empirical Bayes method for correcting selection bias based on Tweedie's formula, first reported by Robbins (1956). The virtue of Tweedie's formula lies in its representation of selection bias as a simple function of the derivative of the marginal log-likelihood. Since the marginal likelihood and its derivative can be estimated directly from the data without invoking prior information, bias correction can be carried out conveniently. We propose a Bayesian hierarchical model for chi-squared data such that the resulting Tweedie's formula has the same virtue. Because noncentral chi-squared distributions, the common alternative distributions for chi-squared tests, do not constitute an exponential family, our results cannot be obtained by extending existing results. Furthermore, the corresponding Tweedie's formula manifests new phenomena quite different from those for normal data and suggests new ways to analyze chi-squared data. We also discuss two real-data applications: differences in gene expression among ethnic groups and gene expression profiling for prediction of breast cancer metastasis. This is joint work with Lilun Du.
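For readers unfamiliar with the normal-data case discussed by Efron (2011), the following is a minimal sketch of how Tweedie's formula turns the derivative of the marginal log-likelihood into a bias correction. It assumes z | mu ~ N(mu, sigma2), for which E[mu | z] = z + sigma2 * d/dz log f(z), with f the marginal density of z. It does not implement the chi-squared formula proposed here, which the abstract does not state. The function names and the normal prior used for checking are illustrative assumptions; in practice the marginal density would be estimated from the data.

```python
import math


def tweedie_posterior_mean(z, log_marginal, sigma2=1.0, h=1e-4):
    """Tweedie's formula for normal data: z + sigma2 * d/dz log f(z).

    `log_marginal` is any callable returning log f(z); its derivative is
    approximated here by a central finite difference with step `h`.
    """
    score = (log_marginal(z + h) - log_marginal(z - h)) / (2.0 * h)
    return z + sigma2 * score


def normal_log_marginal(prior_var, sigma2=1.0):
    """Log marginal density of z when mu ~ N(0, prior_var) (assumed prior).

    Under this prior the marginal of z is N(0, prior_var + sigma2), so the
    exact posterior mean is known and can be used to check the formula.
    """
    total_var = prior_var + sigma2

    def log_f(z):
        return -0.5 * math.log(2.0 * math.pi * total_var) - z * z / (2.0 * total_var)

    return log_f


prior_var = 3.0   # hypothetical prior variance, for illustration only
z_obs = 4.0       # a large observed value, e.g. one selected for being extreme

log_f = normal_log_marginal(prior_var)
corrected = tweedie_posterior_mean(z_obs, log_f)

# Closed-form posterior mean under the assumed normal prior (linear shrinkage).
exact = z_obs * prior_var / (prior_var + 1.0)
```

Here `corrected` and `exact` agree up to numerical error: the observed value is pulled toward zero, and the correction needs only the marginal density of the data, not the prior itself.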