Abstract
We revisit the problem of finding principal components of multivariate datasets that lie on a nonlinear Riemannian manifold embedded in a higher-dimensional space. Our aim is to extend the geometric interpretation of PCA while capturing non-geodesic forms of variation in the data. We introduce the concept of a principal sub-manifold: a manifold that passes through the center of the data and that, at every point, moves in the direction of the highest curvature in the space spanned by the eigenvectors of the local tangent space PCA. Recent work (Panaretos et al. 2014) treats the case where the sub-manifold has dimension one, essentially a curve lying on the manifold that attempts to capture one-dimensional variation; the current setting is much more general. The principal sub-manifold is therefore an extension of the principal flow that can capture higher-dimensional variation in the data. We show that the principal sub-manifold yields the usual principal components in Euclidean space. By means of examples, we illustrate how to find, use and interpret a principal sub-manifold, with an extension to its use in shape analysis. (This is joint work with Tung Pham.)
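To make the phrase "local tangent space PCA" concrete, the following is a minimal sketch of that single ingredient, not the authors' algorithm. It assumes the manifold is the unit sphere in three dimensions, that the local covariance is a Gaussian-kernel-weighted second moment of log-mapped data, and that the bandwidth `h` is chosen by hand; the names `sphere_log` and `local_tangent_pca` are hypothetical.

```python
import numpy as np

def sphere_log(p, X):
    """Log map on the unit sphere: tangent vectors at p pointing toward rows of X."""
    cos_t = np.clip(X @ p, -1.0, 1.0)
    theta = np.arccos(cos_t)                    # geodesic distance from p
    V = X - cos_t[:, None] * p                  # component of X orthogonal to p
    norm_v = np.linalg.norm(V, axis=1)
    scale = np.where(norm_v > 1e-12, theta / np.maximum(norm_v, 1e-12), 0.0)
    return scale[:, None] * V

def local_tangent_pca(p, X, h=0.5):
    """Eigen-decomposition of a kernel-weighted covariance in the tangent space at p.

    The Gaussian kernel and bandwidth h are assumptions made for illustration.
    """
    V = sphere_log(p, X)
    dist = np.linalg.norm(V, axis=1)
    w = np.exp(-0.5 * (dist / h) ** 2)
    w /= w.sum()
    cov = (V * w[:, None]).T @ V                # weighted second moment about p
    evals, evecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

# Synthetic data: points near (1, 0, 0), spread mostly along the equator.
rng = np.random.default_rng(0)
phi = rng.normal(0.0, 0.3, size=500)            # longitude: large spread
psi = rng.normal(0.0, 0.05, size=500)           # latitude: small spread
X = np.column_stack([np.cos(phi) * np.cos(psi),
                     np.sin(phi) * np.cos(psi),
                     np.sin(psi)])
p = np.array([1.0, 0.0, 0.0])

evals, evecs = local_tangent_pca(p, X)
# The leading eigenvector should lie close to the y-axis (the equatorial direction);
# the last eigenvalue is numerically zero because tangent vectors are orthogonal to p.
print(evals)
print(evecs[:, 0])
```

In this sketch the first two eigenvectors span the tangent plane at `p`, which is the kind of local frame the abstract refers to; how a sub-manifold is then extended from point to point is not shown here.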