Abstract
A major challenge in the world of Big Data is heterogeneity, which often results from the aggregation of smaller data sets into larger ones. Such aggregation creates heterogeneity because different experimenters typically make different design choices; even when common designs are attempted, environmental or operator effects still often introduce heterogeneity. This motivates moving away from the classical conceptual model of Gaussian distributed data, in the direction of Gaussian mixtures. But classical mixture estimation methods are usually useless in Big Data contexts, because there are far too many parameters to estimate efficiently. There is therefore a strong need for statistical procedures that are robust against mixture distributions without requiring explicit estimation. Some early ideas in this important new direction are discussed.