Sankhya: The Indian Journal of Statistics

1998, Volume 60, Series B, Pt. 1, 161-175

SAMPLE SIZE DETERMINATION USING POSTERIOR PREDICTIVE DISTRIBUTIONS

By

DONALD B. RUBIN, *Harvard University, Cambridge*

and

HAL S. STERN, *Iowa State University, Ames*

*SUMMARY.* A statistical model developed from scientific theory may "fail to fit"
the available data if the scientific theory is incorrect or if the
sample size is too small. The former point is obvious, but the latter is more
subtle. In the latter case, the hypothesized model may fail to fit in the
sense that it is viewed as unnecessarily complicated, and so
the investigators settle upon a simpler model that ignores structure
hypothesized by scientific theory.
We describe a simulation-based approach
for determining the sample size that would be required for distinguishing
between the simpler model and the hypothesized model assuming the latter
is correct. Data are
simulated assuming the hypothesized model is correct and compared
to posterior predictive replications of the data, which are drawn assuming the
simpler model is correct. This is repeated for a number of sample sizes.
The Bayesian approach offers two particularly useful features for addressing
a problem of this type:
first, we can average over a variety of plausible values for the parameters
of the hypothesized model rather than fixing a single alternative;
second, the approach does not require that we restrict attention to
a limited class of regular models (e.g., *t*-tests or linear models).
The posterior predictive approach to sample
size determination is illustrated using an
application of finite mixture models to psychological data.
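The simulation procedure summarized above can be sketched in Python. This is a minimal illustration, not the authors' implementation, and it rests on assumptions made only for the sketch. The hypothesized model is a two-component normal mixture whose mean separation and mixing proportion are drawn from hypothetical "plausible" ranges; this corresponds to averaging over parameter values rather than fixing one alternative. The simpler model is a single normal with the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2. The discrepancy is the sample kurtosis, chosen here only for illustration. The helper names `kurtosis`, `simulate_hypothesized`, `ppp_value`, and `power` are hypothetical.

```python
import random
import statistics


def kurtosis(y):
    """Fourth standardized sample moment; close to 3 for large normal samples."""
    n = len(y)
    m = sum(y) / n
    m2 = sum((v - m) ** 2 for v in y) / n
    m4 = sum((v - m) ** 4 for v in y) / n
    return m4 / m2 ** 2


def simulate_hypothesized(n, rng):
    """Draw data from a hypothetical two-component normal mixture.

    Parameters are drawn from assumed plausible ranges, so repeated calls
    average over alternatives instead of fixing a single one.
    """
    delta = rng.uniform(3.0, 5.0)  # assumed range for the separation of component means
    lam = rng.uniform(0.4, 0.6)    # assumed range for the mixing proportion
    return [rng.gauss(delta if rng.random() < lam else 0.0, 1.0) for _ in range(n)]


def ppp_value(y, n_rep, rng):
    """Posterior predictive p-value of the kurtosis under the simpler N(mu, sigma^2) model."""
    n = len(y)
    ybar = statistics.fmean(y)
    s2 = statistics.variance(y)  # sample variance with n - 1 denominator
    t_obs = kurtosis(y)
    count = 0
    for _ in range(n_rep):
        # Posterior draw: sigma^2 | y ~ (n-1) s^2 / chi^2_{n-1}, where chi^2_k = Gamma(shape k/2, scale 2)
        sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2, 2.0)
        # mu | sigma^2, y ~ N(ybar, sigma^2 / n)
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # Posterior predictive replication of the data under the simpler model
        y_rep = [rng.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
        # A well-separated mixture is flatter than a normal, so look at the lower tail.
        if kurtosis(y_rep) <= t_obs:
            count += 1
    return count / n_rep


def power(n, n_datasets=50, n_rep=100, alpha=0.05, seed=0):
    """Fraction of data sets simulated under the hypothesized model flagged as misfitting the simpler one."""
    rng = random.Random(seed)
    hits = sum(
        ppp_value(simulate_hypothesized(n, rng), n_rep, rng) < alpha
        for _ in range(n_datasets)
    )
    return hits / n_datasets


if __name__ == "__main__":
    # Repeat the whole procedure for a number of candidate sample sizes.
    for n in (10, 25, 50, 100):
        print(n, power(n))
```

The smallest sample size at which the printed proportion reaches a chosen target would be the size suggested by this sketch. Because kurtosis is invariant to location and scale, the posterior draws of mu and sigma^2 do not change its replicated distribution here. With a discrepancy that depends on those parameters, they would.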

*AMS (1991) subject classification.* 62F15, 62K99, 62P15.

*Key words and phrases.* Bayesian inference, finite mixture model, power calculation,
Markov chain Monte Carlo, study design.