Sankhya: The Indian Journal of Statistics

2004, Volume 66, Pt. 4, pp. 756-778

Bayesian Methods for Variable Selection in Survival Models with Application to DNA Microarray Data

By

Kyeong Eun Lee and Bani K. Mallick, Texas A&M University, College Station, USA

SUMMARY. Selecting significant genes on the basis of their expression patterns is an important problem in microarray studies. Because the sample size is small and the number of variables (genes) is large, the selection process can be unstable. This paper considers a hierarchical Bayesian gene selection model for survival data. The popular models in survival analysis are usually well suited to data with few covariates and many observations (subjects). In contrast, in a typical gene expression dataset from a DNA microarray, the number of covariates $p$ exceeds the number of samples $n$. Given a vector of responses that are times to event (death times or censoring times) and $p$ gene expressions (covariates), we address how to reduce the dimension by selecting the significant genes. This approach enables us to estimate the survival curve when $n \ll p$. Rather than fixing the number of selected genes, we assign a prior distribution to this number. This creates additional flexibility by allowing constraints to be imposed, such as bounding the dimension via the prior, which in effect acts as a penalty. To implement the methodology, we use a Markov chain Monte Carlo (MCMC) method. We demonstrate the methodology on diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and on breast carcinoma data.
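The idea of placing a prior on the number of selected genes, so that it bounds and penalizes the model dimension, can be illustrated with a small sampler. This is a minimal sketch under stated assumptions, not the authors' model or algorithm: it assumes a Weibull proportional hazards model with a known shape `alpha` (hazard $\alpha t^{\alpha-1} e^{x^\top \beta_\gamma}$), a truncated Poisson prior with mean `lam` on the number of selected genes $k = \sum_j \gamma_j \le$ `k_max`, independent $N(0, \tau^2)$ priors on the coefficients, and a Kuo-Mallick-style indicator formulation sampled by Metropolis-within-Gibbs. All function and parameter names (`weibull_loglik`, `log_prior_gamma`, `sample`, `lam`, `k_max`, `tau`, `step`) are hypothetical.

```python
import math
import random


def weibull_loglik(times, events, X, beta, gamma, alpha):
    """Censored Weibull log-likelihood, hazard alpha * t**(alpha-1) * exp(eta).

    events[i] = 1 for an observed death, 0 for a censored time.
    Only genes with gamma[j] == 1 enter the linear predictor eta.
    """
    ll = 0.0
    for t, d, x in zip(times, events, X):
        eta = sum(xj * bj for xj, bj, gj in zip(x, beta, gamma) if gj)
        # log-hazard counts only for observed events; cumulative hazard for all
        ll += d * (math.log(alpha) + (alpha - 1) * math.log(t) + eta)
        ll -= t ** alpha * math.exp(eta)
    return ll


def log_prior_gamma(gamma, lam, k_max):
    """Assumed prior: truncated Poisson(lam) on k = sum(gamma), k <= k_max,
    spread uniformly over the C(p, k) subsets of size k (up to a constant)."""
    p, k = len(gamma), sum(gamma)
    if k > k_max:
        return -math.inf  # hard bound on the model dimension
    log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
    return log_poisson - math.log(math.comb(p, k))


def sample(times, events, X, n_iter=2000, alpha=1.0, lam=2.0, k_max=5,
           tau=2.0, step=0.3, seed=0):
    """Metropolis-within-Gibbs over (gamma, beta). Returns per-gene
    inclusion frequencies and the sampled model sizes."""
    rng = random.Random(seed)
    p = len(X[0])
    gamma, beta = [0] * p, [0.0] * p

    def log_post(g, b):
        lp = log_prior_gamma(g, lam, k_max)
        if lp == -math.inf:
            return lp  # proposal violates the dimension bound
        lp -= sum(bj * bj for bj in b) / (2 * tau * tau)  # N(0, tau^2) on beta
        return lp + weibull_loglik(times, events, X, b, g, alpha)

    def accept(log_ratio):
        # 1 - random() lies in (0, 1], so the log is always defined
        return math.log(1.0 - rng.random()) < log_ratio

    current = log_post(gamma, beta)
    counts, sizes = [0] * p, []
    for _ in range(n_iter):
        j = rng.randrange(p)  # symmetric proposal: flip one indicator
        g_new = gamma[:]
        g_new[j] = 1 - g_new[j]
        cand = log_post(g_new, beta)
        if accept(cand - current):
            gamma, current = g_new, cand
        j = rng.randrange(p)  # random-walk step on one coefficient
        b_new = beta[:]
        b_new[j] += rng.gauss(0.0, step)
        cand = log_post(gamma, b_new)
        if accept(cand - current):
            beta, current = b_new, cand
        counts = [c + g for c, g in zip(counts, gamma)]
        sizes.append(sum(gamma))
    return [c / n_iter for c in counts], sizes


if __name__ == "__main__":
    # Hypothetical synthetic data: only gene 0 affects the hazard.
    rng = random.Random(42)
    X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(60)]
    t_true = [rng.expovariate(math.exp(1.5 * x[0])) for x in X]
    cens = [rng.expovariate(0.3) for _ in X]
    times = [min(a, c) + 1e-6 for a, c in zip(t_true, cens)]
    events = [1 if a <= c else 0 for a, c in zip(t_true, cens)]
    freq, _ = sample(times, events, X, n_iter=2000, k_max=3)
    print([round(f, 2) for f in freq])
```

Running the demo prints per-gene inclusion frequencies; with this synthetic setup one would expect gene 0 to be included most often, though the exact values depend on the seed and run length. The indicator formulation keeps the coefficient vector at a fixed length, which avoids trans-dimensional moves in this toy version; the Poisson term together with the $\binom{p}{k}$ factor makes larger models less probable a priori, and `k_max` enforces the hard bound on the dimension.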

AMS (1991) subject classification. Primary 62F15, 62N99; Secondary 62P10.

Key words and phrases. Bayesian model, gene expression, proportional hazards regression, variable selection, Weibull regression
