Title: New Asymptotic Results in Principal Component Analysis

Author(s): Vladimir Koltchinskii and Karim Lounici
Issue: Volume 79 Series A Part 2 Year 2017
Pages: 254 -- 297
Let $X$ be a mean zero Gaussian random vector in a separable Hilbert space $\mathbb{H}$ with covariance operator $\Sigma := \mathbb{E}(X \otimes X)$. Let $\Sigma = \sum_{r \ge 1} \mu_r P_r$ be the spectral decomposition of $\Sigma$ with distinct eigenvalues $\mu_1 > \mu_2 > \ldots$ and the corresponding spectral projectors $P_1, P_2, \ldots$. Given a sample $X_1, \ldots, X_n$ of $n$ i.i.d. copies of $X$, the sample covariance operator is defined as $\hat{\Sigma}_n := n^{-1} \sum_{j=1}^n X_j \otimes X_j$. The main goal of principal component analysis is to estimate the spectral projectors $P_1, P_2, \ldots$ by their empirical counterparts $\hat{P}_1, \hat{P}_2, \ldots$, properly defined in terms of the spectral decomposition of the sample covariance operator $\hat{\Sigma}_n$. The aim of this paper is to study the asymptotic distributions of important statistics related to this problem, in particular of the statistic $\| \hat{P}_r - P_r \|_2^2$, where $\| \cdot \|_2^2$ is the squared Hilbert-Schmidt norm. This is done in a “high-complexity” asymptotic framework in which the so-called effective rank ${\bf r}(\Sigma) := \frac{\mbox{tr}(\Sigma)}{\| \Sigma \|_{\infty}}$ of the true covariance $\Sigma$ (with $\mbox{tr}(\cdot)$ denoting the trace and $\| \cdot \|_{\infty}$ the operator norm) becomes large simultaneously with the sample size $n$, but ${\bf r}(\Sigma) = o(n)$ as $n \to \infty$. In this setting, we prove that, in the case of a one-dimensional spectral projector $P_r$, the properly centered and normalized statistic $\| \hat{P}_r - P_r \|_2^2$ with {\em data-dependent} centering and normalization converges in distribution to a Cauchy-type limit. The proofs of this and other related results rely on perturbation analysis and Gaussian concentration.
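To make the quantities above concrete, the following is a minimal numerical sketch, not the authors' code. It assumes a finite-dimensional stand-in $\mathbb{R}^d$ for $\mathbb{H}$ and a diagonal covariance with distinct eigenvalues. The function names (`effective_rank`, `spectral_projector`, `projector_error`) and all parameter choices are hypothetical and introduced only for illustration. It computes the effective rank ${\bf r}(\Sigma)$, the sample covariance $\hat{\Sigma}_n$, and the statistic $\| \hat{P}_r - P_r \|_2^2$ for a one-dimensional projector.

```python
import numpy as np


def effective_rank(sigma):
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_inf (operator norm)."""
    eigvals = np.linalg.eigvalsh(sigma)  # ascending order for symmetric input
    return np.trace(sigma) / eigvals[-1]


def spectral_projector(sigma, r=0):
    """Rank-one projector onto the eigenvector of the (r+1)-th largest eigenvalue.

    Assumes that eigenvalue is simple (multiplicity one).
    """
    _, vecs = np.linalg.eigh(sigma)  # columns sorted by ascending eigenvalue
    v = vecs[:, -1 - r]
    return np.outer(v, v)


def projector_error(eigvals, n, r=0, seed=0):
    """Squared Hilbert-Schmidt norm ||P_hat_r - P_r||_2^2 for one simulated sample.

    Illustrative assumption: Sigma = diag(eigvals) with distinct entries.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    rng = np.random.default_rng(seed)
    sigma = np.diag(eigvals)
    # Rows are i.i.d. mean-zero Gaussian vectors with covariance Sigma.
    X = rng.standard_normal((n, eigvals.size)) * np.sqrt(eigvals)
    sigma_hat = X.T @ X / n  # sample covariance n^{-1} sum_j X_j (x) X_j
    diff = spectral_projector(sigma_hat, r) - spectral_projector(sigma, r)
    return float(np.sum(diff ** 2))  # squared Frobenius (Hilbert-Schmidt) norm


if __name__ == "__main__":
    spectrum = [10.0, 1.0, 1.0, 1.0, 1.0]
    print("effective rank:", effective_rank(np.diag(spectrum)))
    print("||P_hat - P||_2^2:", projector_error(spectrum, n=5000))
```

For rank-one projectors the statistic equals $2(1 - \langle \hat{v}, v \rangle^2)$ for the corresponding unit eigenvectors, so it always lies in $[0, 2]$. The sketch only evaluates the raw statistic and does not implement the data-dependent centering and normalization studied in the paper.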
AMS (2000) subject classification. Primary 62H25, 62H12; Secondary 60B20, 60G1.
Keywords and phrases: Sample covariance, Spectral projectors, Effective rank, Principal component analysis, Asymptotic distribution, Perturbation theory.