Sankhya: The Indian Journal of Statistics

1998, Volume 60, Series B, Pt. 1, 161-175

SAMPLE SIZE DETERMINATION USING POSTERIOR PREDICTIVE DISTRIBUTIONS

By

DONALD B. RUBIN, Harvard University, Cambridge
and
HAL S. STERN, Iowa State University, Ames

SUMMARY. A statistical model developed from scientific theory may "fail to fit" the available data if the scientific theory is incorrect or if the sample size is too small. The former point is obvious but the latter is more subtle. In the latter case, the hypothesized model may fail to fit in the sense that it is viewed as unnecessarily complicated, and so the investigators settle upon a simpler model that ignores structure hypothesized by scientific theory. We describe a simulation-based approach for determining the sample size that would be required for distinguishing between the simpler model and the hypothesized model assuming the latter is correct. Data are simulated assuming the hypothesized model is correct and compared to posterior predictive replications of the data, which are drawn assuming the simpler model is correct. This is repeated for a number of sample sizes. The Bayesian approach offers two especially nice features for addressing a problem of this type: first, we can average over a variety of plausible values for the parameters of the hypothesized model rather than fixing a single alternative; second, the approach does not require that we restrict attention to a limited class of regular models (e.g., t-tests or linear models). The posterior predictive approach to sample size determination is illustrated using an application of finite mixture models to psychological data.
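
The procedure described in the summary can be sketched in a few lines of Python. This is only one possible reading of the abstract, not the authors' implementation. Every specific choice is an assumption made for illustration: the hypothesized model is taken to be a two-component normal mixture, the simpler model a single normal distribution with the standard noninformative prior, the discrepancy measure is sample excess kurtosis, and the prior ranges, the 0.05 cutoff, and the function names are hypothetical.

```python
import numpy as np


def simulate_mixture(rng, n):
    """Draw one data set from the hypothesized model (assumed: 2-component normal mixture).

    The parameters are drawn from a hypothetical prior, so repeated calls
    average over plausible alternatives instead of fixing a single one.
    """
    weight = rng.uniform(0.4, 0.6)       # assumed prior on the mixing proportion
    separation = rng.uniform(3.0, 4.0)   # assumed prior on the distance between means
    in_second = rng.random(n) < weight
    return rng.normal(np.where(in_second, separation, 0.0), 1.0)


def excess_kurtosis(y):
    """Sample excess kurtosis along the last axis (illustrative discrepancy measure)."""
    centered = y - y.mean(axis=-1, keepdims=True)
    m2 = (centered ** 2).mean(axis=-1)
    m4 = (centered ** 4).mean(axis=-1)
    return m4 / m2 ** 2 - 3.0


def posterior_predictive_pvalue(rng, y, n_rep=200):
    """Posterior predictive p-value of y under the simpler model (a single normal).

    Uses the prior p(mu, sigma^2) proportional to 1/sigma^2, for which
    sigma^2 | y ~ (n-1) s^2 / chi^2_{n-1} and mu | sigma^2, y ~ N(ybar, sigma^2 / n).
    """
    n = y.size
    ybar, s2 = y.mean(), y.var(ddof=1)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_rep)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    # One replicated data set per posterior draw, each of the same size as y.
    y_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(n_rep, n))
    # A well-separated mixture is flatter than a normal, so small kurtosis is "extreme".
    return float(np.mean(excess_kurtosis(y_rep) <= excess_kurtosis(y)))


def estimated_power(n, n_sim=100, alpha=0.05, seed=0):
    """Fraction of simulated data sets of size n for which the simpler model looks inadequate."""
    rng = np.random.default_rng(seed)
    pvals = [posterior_predictive_pvalue(rng, simulate_mixture(rng, n))
             for _ in range(n_sim)]
    return float(np.mean(np.array(pvals) < alpha))


if __name__ == "__main__":
    # Repeat the comparison for several candidate sample sizes.
    for n in (25, 100, 400):
        print(n, estimated_power(n))
```

Under these assumptions, the smallest candidate sample size whose estimated proportion exceeds a target (say 0.8) would be the suggested study size. A different discrepancy measure or a different pair of models changes only `simulate_mixture`, `excess_kurtosis`, and the posterior draws in `posterior_predictive_pvalue`.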

AMS (1991) subject classification. 62F15, 62K99, 62P15.

Key words and phrases. Bayesian inference, finite mixture model, power calculation, Markov chain Monte Carlo, study design.
