Sankhya: The Indian Journal of Statistics
2004, Volume 66, Pt. 4, 756-778
Bayesian Methods for Variable Selection in Survival Models with Application to DNA Microarray Data
Kyeong Eun Lee and Bani K. Mallick, Texas A\&M University, College Station, USA
SUMMARY. Selection of significant genes via expression patterns is important in a microarray problem. Owing to the small sample size and the large number of variables (genes), the selection process can be unstable. This paper considers a hierarchical Bayesian gene selection model for survival data. In survival analysis, the popular models are usually well suited for data with few covariates and many observations (subjects). In contrast, for a typical setting of gene expression data from DNA microarrays, we need to consider the case where the number of covariates $p$ exceeds the number of samples $n$. For a given vector of response values, which are times to event (death or censored times), and $p$ gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the significant genes. This approach enables us to estimate the survival curve when $n \ll p$. In our approach, rather than fixing the number of selected genes, we assign a prior distribution to this number. This creates additional flexibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov chain Monte Carlo (MCMC) method. We demonstrate the methodology on diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and breast carcinoma data.
AMS (1991) subject classification. Primary 62F15, 62N99; Secondary 62P10.
Key words and phrases. Bayesian model, gene expression, proportional hazard regression, variable selection, Weibull regression
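
The summary states that a prior on the number of selected genes works, in effect, as a penalty. A minimal sketch of that idea follows. It assumes independent Bernoulli inclusion indicators with a common inclusion probability `pi`, which is one common way to induce such a prior and is not stated in the summary itself. The function names and the values of `p` and `pi` are hypothetical and chosen only for illustration.

```python
import math

def log_prior_inclusion(k, p, pi):
    """Log prior probability of one specific subset of k genes out of p,
    assuming independent Bernoulli(pi) inclusion indicators (an assumption)."""
    return k * math.log(pi) + (p - k) * math.log(1.0 - pi)

def penalty_per_gene(pi):
    """Decrease in log prior caused by including one additional gene."""
    return math.log((1.0 - pi) / pi)

p = 1000   # hypothetical number of genes
pi = 0.01  # hypothetical prior inclusion probability

# Each extra gene lowers the log prior by the same fixed amount,
# so the prior behaves like a penalty on model dimension.
for k in (0, 5, 10, 20):
    print(k, round(log_prior_inclusion(k, p, pi), 2))
print("penalty per gene:", round(penalty_per_gene(pi), 2))
```

Under this assumption, a small `pi` makes every added gene costly in the log posterior, while `pi = 0.5` imposes no penalty on dimension at all.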