Fast Approximation Algorithms for Some Maximin Clustering Problems

Authors

  • V. Khandeev Sobolev Institute of Mathematics, Novosibirsk, Russia
  • S. Neshchadim Novosibirsk State University, Novosibirsk, Russia

DOI:

https://doi.org/10.2298/YJOR2023021K

Keywords:

Euclidean space, clustering, max-min problem, NP-hardness, bounded scatter, approximation algorithm

Abstract

In this paper, we consider three cases of an intractable problem of searching for two subsets in a finite set of points of Euclidean space. In all three cases, it is required to maximize the minimum cluster’s cardinality under constraint on each cluster’s scatter. The scatter is the sum of the distances from the cluster elements to the center, which is defined differently in each of the three cases. In the first case, cluster centers are fixed points. In the second case, the centers are unknown points from the input set. In the third case, the centers are defined as the centroids (the arithmetic mean) of clusters. We propose a general scheme that allows us to build a polynomial 1/2-approximation algorithm for a generalized problem and can be used for constructing 1/2-approximation algorithms for the first two cases and for the one-dimensional third case. Also we show how, using precomputed general information, their time complexities can be reduced to the complexity of sorting. Finally, we present the results of computational experiments showing the accuracy of the proposed algorithms on randomly generated input data.

References

V. Khandeev and S. Neshchadim, “Approximate algorithms for some maximin clustering problems,” Communications in Computer and Information Science, vol. 1661, pp. 89-103, 2022. doi: 10.1007/978-3-031-16224-4 6

C. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.

C. Aggarwal, Data Mining: The Textbook. Switzerland: Springer, 2015.

J. Osborne, Best Practices in Data Cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data. London: SAGE, 2013.

A. Ageev, A. Kel’manov, A. Pyatkin, S. Khamidullin, and V. Shenmaier, “Approximation polynomial algorithm for the data editing and data cleaning problem,” Pattern Recognition and Image Analysis, vol. 27, no. 3, pp. 365-370, 2017. doi: 10.1134/S1054661817030038

A. Eremeev, A. Kel’manov, A. Pyatkin, and I. Ziegler, “On finding maximum cardinality subset of vectors with a constraint on normalized squared length of vectors sum,” Lecture Notes in Computer Science, vol. 10716, pp. 142-151, 2018. doi: 10.1007/978-3-319-73013-413

A. Kel’manov, A. Panasenko, and V. Khandeev, “Exact algorithms of search for a cluster of the largest size in two integer 2-clustering problems,” Numerical Analysis and Applications, vol. 12, no. 2, pp. 105-115, 2019. doi: 10.1134/S1995423919020010

A. Kel’manov, A. Pyatkin, S. Khamidullin, V. Khandeev, Y. Shamardin, and V. Shenmaier, “An approximation polynomial algorithm for a problem of searching for the longest subsequence in a finite sequence of points in euclidean space,” Communications in Computer and Information Science, vol. 871, pp. 120-130, 2018. doi: 10.1007/978-3-319-93800-4 10

A. Farcomeni and L. Greco, Robust methods for data reduction. CRC press, 2016.

V. Khandeev and S. Neshchadim, “Max-min problems of searching for two disjoint subsets,” Lecture Notes in Computer Science, vol. 13078, pp. 231-245, 2021. doi: 10.1007/978-3-030- 91059-4 17

A. Kel’manov, A. Panasenko, and V. Khandeev, “Exact algorithms of search for a cluster of the largest size in two integer 2-clustering problems,” Numerical Analysis and Applications, vol. 12, pp. 105-115, 2019. doi: 10.1134/S1995423919020010

Downloads

Published

2023-12-02

Issue

Section

Research Articles