Approximate calculation of Tukey's depth and median with high-dimensional data

Authors

  • Milica Bogićević School of Electrical Engineering, Belgrade
  • Milan Merkle School of Electrical Engineering, Belgrade

DOI:

https://doi.org/10.2298/YJOR180520022B

Keywords:

big data, multivariate medians, depth functions, computing Tukey's depth

Abstract

We present a new fast approximate algorithm for Tukey (halfspace) depth level sets and its implementation-ABCDepth. Given a d-dimensional data set for any d ≥ 1, the algorithm is based on a representation of level sets as intersections of balls in Rd. Our approach does not need calculations of projections of sample points to directions. This novel idea enables calculations of approximate level sets in very high dimensions with complexity that is linear in d, which provides a great advantage over all other approximate algorithms. Using different versions of this algorithm, we demonstrate approximate calculations of the deepest set of points ("Tukey median") and Tukey's depth of a sample point or out-of-sample point, all with a linear in d complexity. An additional theoretical advantage of this approach is that the data points are not assumed to be in "general position". Examples with real and synthetic data show that the executing time of the algorithm in all mentioned versions in high dimensions is much smaller than the time of other implemented algorithms. Also, our algorithms can be used with thousands of multidimensional observations.

References

Ahn, H.-K., Knauer, C., Scherfenberg, M., Schlipf, L., and Vigneron, A., Computing the discrete Frechet distance with imprecise input, in: Algorithms and Computation, ISAAC 2010, Lecture Notes in Computer Science, Ieju Island, Korea, (6507)(2016) 422-433.

Bogicevic, M., and Merkle, M., Multivariate medians and halfspace depth: algorithms and implementation, in: Proc. 1st International Conference on Electrical, Electronic and Computing Engineering (IcETRAN 2014), Vrnjacka Banja, Serbia, (1)(2014) 27-33. http://milanmerkle.etf.rs/wp-content/uploads/2016/11/Bogicevic-Merkle-2014.pdf.

Bogicevic, M., and Merkle, M., Data centrality computation: implementation and complexity calculation, in: Proc. 2nd International Conference on Electrical, Electronic and Computing Engineering (IcETRAN 2015), Srebrno Jezero, Serbia, (1)(2015) 23-29. http://milanmerkle.etf.rs/wp-content/uploads/2016/11/Bogicevic-Merkle-2015.pdf.

Bogicevic, M., and Merkle, M., ABCDepth: Efficient algorithm for Tukey depth, in: 9th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2016), 2016. http://cmstatistics.org/RegistrationsV2/CMStatistics2016/viewSubmission.php?in=1774&token=p2q8o69709n312568rs4p4n4spn97n57.

Bogicevic, M., and Merkle, M., arXiv:1603.05609, 2016. ABCDepth: efficient algorithm for Tukey depth.

Chan, T. M., An optimal randomized algorithm for maximum Tukey depth, in: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, (2004) 430-436.

Chen, D., Morin, P., and Wagner, U., Absolute approximation of Tukey depth: theory and experiments, Comput. Geom., (46) (2013) 566-573.

Ding, B., and Konig, A. C., A fast set intersection in memory, Proceedings of the VLDB Endowment, (4)(2011) 255-266.

Donoho, D. L., Breakdown properties of multivariate location estimators, Harvard University, Cambridge, Massachusetts, US, 1982.

Donoho, D. L., and Gasko, M., Multivariate generalizations of the median and trimmed mean, I, Technical report 133, Department of Statistics, University of California, Berkeley, December, 1987.

Donoho, D. L., and Gasko, M., Breakdown properties of location estimates based on halfspace depth and projected outlyingness, Ann. Statist., (20) (1992) 1803-1827.

Dutta, S., Ghosh, A. K., and Chaudhuri, P., Some intriguing properties of Tukey's halfspace depth, Bernoulli, (17) (2011) 1420-1434.

Dyckerho, R., and Mozharovskyi, P., Exact computation of the halfspace depth, Computational Statistics and Data Analysis, (98) (2016) 19-30.

Genest, M., Jean-Claude, and Plante, J.-F., Package depth, 2012. https://cran.r-project.org/web/packages/depth/index.html.

Gray, J., Graphics for regression diagnostics, ASA Proc. Statistical Computing Section, (1985) 102-107.

Hoare, C. A. R., Algorithm 64: Quicksort, Comm. ACM, (4) (1961) 321.

Liu, X., and Zuo, Y., Computing halfspace depth and regression depth, Communications in Statistics Simulation and Computation, (43) (2014) 969-985.

Merkle, M., Jensen's inequality for medians, Stat. Prob. Letters, (71) (2005) 277-281.

Merkle, M., Jensen's inequality for multivariate medians, J. Math. Anal. Appl., (370) (2010) 258-269.

Mozharovskyi, P., Contributions to depth-based classification and computation of the Tukey depth, PhD thesis, Faculty of Economics and Social Sciences, University of Cologne, France, 2014.

Rousseeuw, P. J., and Hubert, M., Statistical depth meets computational geometry: a short survey, arXiv: 1508.03828, 2015.

Rousseeuw, P. J., and Leroy, A. M., Robust regression and outlier detection, Wiley, (1997) 57.

Rousseeuw, P. J., and Ruts, I., Bivariate location depth, Journal of the Royal Statistical Society, Series C (Applied Statistics), (45) (1996) 516-526.

Rousseeuw, P. J., and Ruts, I., Constructing the bivariate Tukey median, Statistica Sinica, (8) (1998) 827-839.

Rousseeuw, P. J., and Ruts, I., The depth function of a population distribution, Metrika, (49) (1999) 213-244.

Rousseeuw, P. J., and Struyf, A., Computing location depth and regression depth in higher dimension, Statistics and Computing, (8) (1998) 193-203.

Ruts, I., and Rousseeuw, P. J., Computing depth contours of bivariate point clouds, Computational Statistics and Data Analysis, (23) (1996) 153-168.

Small, C. G., A survey of multidimensional medians, Internat. Statist. Inst. Rev., (58) (1990) 263-277.

Struyf, A., and Rousseeuw, P. J., High-dimensional computation of the deepest location, Comp. Statist. & Data Anal., (34)(2000) 415-426.

Tukey, J., Order statistics, In Mimeographed notes for Statistics 411, Princeton University, 1974.

Tukey, J., Mathematics and picturing data, in: Proc. International Congress of Mathematicians, Vancouver, 1974, (2) (1975) 523-531.

Zhou, Y., and Sering, R., Multivariate spatial U-quantiles: A Bahadur-Kiefer representation, a Theil-Sen estimator for multiple regression, and a robust dispersion estimator, J. Statist. Plann. Inference, (138)(2008) 1660-1678.

Zuo, Y., and Sering, R., General notions of statistical depth function, Ann. Stat., (28) (2000) 461-482.

Downloads

Published

2018-11-01

Issue

Section

Research Articles