Detecting a small number of outliers in a set of data observations is challenging. In this paper, we present an approach that exploits space transformation and uses spectral analysis in the transformed space for outlier detection. Unlike most existing techniques in the literature, which rely on notions of distance or density, this approach introduces a novel concept based on local quadratic entropy for evaluating the similarity of a data object to its neighbors. This information-theoretic quantity is used to regularize the closeness among data instances, which in turn benefits the mapping of the data into a typically lower-dimensional space. Outliers are then identified by spectral analysis of the eigenspace spanned by the leading eigenvectors derived from the mapping procedure. The proposed technique is purely data-driven and imposes no assumptions on the data distribution, making it particularly suitable for identifying outliers in irregular, non-convex distributions and in data with diverse, varying densities.
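The abstract does not define the local quadratic entropy, so the following is only an illustrative sketch and not the paper's formulation. It assumes one common estimator, a Parzen-window (Gaussian kernel) estimate of Rényi's quadratic entropy, evaluated over each point's k-nearest-neighbor neighborhood. The function name `local_quadratic_entropy` and the parameters `k` and `sigma` are hypothetical choices made for this example; the spectral mapping and the outlier decision rule are not covered.

```python
import numpy as np


def local_quadratic_entropy(X, k=3, sigma=1.0):
    """Parzen estimate of Renyi's quadratic entropy in each point's k-neighborhood.

    Assumed estimator (not taken from the paper): for a neighborhood of m points,
    H2 = -log( (1/m^2) * sum_{a,b} G(x_a - x_b) ),
    where G is an isotropic Gaussian with covariance 2 * sigma^2 * I.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Each neighborhood holds the point itself plus its k nearest neighbors.
    hoods = np.argsort(d2, axis=1)[:, : k + 1]
    # Normalizing constant of a d-dimensional Gaussian with covariance 2*sigma^2*I.
    norm = (4.0 * np.pi * sigma**2) ** (d / 2.0)
    entropies = np.empty(n)
    for i in range(n):
        idx = hoods[i]
        local_d2 = d2[np.ix_(idx, idx)]
        # Mean kernel value over all pairs in the neighborhood (information potential).
        potential = (np.exp(-local_d2 / (4.0 * sigma**2)) / norm).mean()
        entropies[i] = -np.log(potential)
    return entropies


# Toy data (hypothetical): a tight cluster of five points and one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [0.1, 0.1], [0.05, 0.05], [5.0, 5.0]])
H = local_quadratic_entropy(X, k=3, sigma=1.0)
```

On this toy data the isolated point receives the largest local entropy, because its neighborhood mixes it with distant cluster members. Under this assumed estimator, such a value could be used to down-weight affinities before a spectral embedding is computed.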
Proceedings of the 2013 SIAM International Conference on Data Mining (SDM), 2013, pp. 225-233
SIAM International Conference on Data Mining, 2013