1-Mean and 1-Medoid 2-Clustering Problem With Arbitrary Cluster Sizes: Complexity and Approximation

Authors

  • Artem V. Pyatkin Sobolev Institute of Mathematics, Russia, Novosibirsk + Novosibirsk State University, Russia, Novosibirsk

DOI:

https://doi.org/10.2298/YJOR211018008P

Keywords:

Euclidean Space, Mean, Medoid, 2-clustering, 2-approximation algorithm, Strong NP-hardness

Abstract

We consider the following 2-clustering problem. Given N points in Euclidean space, partition it into two subsets (clusters) so that the sum of squared distances between the elements of the clusters and their centers would be minimum. The center of the first cluster coincides with its centroid (mean) while the center of the second cluster should be chosen from the set of the initial points (medoid). It is known that this problem is NP-hard if the cardinalities of the clusters are given as a part of the input. In this paper we prove that the problem remains NP-hard in the case of arbitrary clusters sizes and suggest a 2-approximation polynomial-time algorithm for this problem.

References

A.V. Pyatkin, “NP-hardness of 1-Mean and 1-Medoid 2-Clustering Problem with Arbitrary Clusters Sizes”, in MOTOR 2021. Communications in Computer and Information Science, vol. 1476, pp. 248-256, 2021.

P. Berkhin, “A Survey of Clustering Data Mining Techniques”, in Kogan J., Nicholas C., Teboulle M. (eds) Grouping Multidimensional Data. Springer, Berlin, Heidelberg, 2006.

R. C. Dubes, and A. K. Jain, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, New Jersey, 07632, 1988.

S. Ghoreyshi, and J. Hosseinkhani, “Developing a Clustering Model based on K-Means Algorithm in order to Creating Different Policies for Policyholders”, International Journal of Advanced Computer Science and Information Technology, vol. 4, no. 2, pp. 46-53, 2015.

W. D. Fisher, “On Grouping for Maximum Homogeneity”, Journal of the American Statistical Association, vol. 53, no. 284, pp. 789-798, 1958.

J. MacQueen, “Some methods for classification and analysis of multivariate observations”, in Proceedings of the 5-th Berkeley Symposium on Mathematical Statistics and Probability, vol. 5.1, pp. 281-297, 1967.

L. Kaufman, and P. J. Rousseeuw, “Clustering by means of medoids”, in Y. Dodge (ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods. North-Holland, Amsterdam, 1987, pp. 405-416 .

D. Aloise, A. Deshpande, P. Hansen, and P. Popat, “NP-hardness of Euclidean sum-ofsquares clustering”, Machine Learning, vol. 75, no. 2, pp. 245-248, 2009.

M. Inaba, N. Katoh, and H. Imai “Applications of weighted Voronoi diagrams and randomization to variance-based clustering”, in Proceedings of the tenth annual symposium on Computational geometry, pp. 332-339, 1994.

M. Mahajan, P. Nimbhorkar, and K. Varadarajan, “The planar k-means problem is NPhard”, Theoretical Computer Science, vol. 442, pp. 13-21, 2012.

A. Kumar, Y. Sabharwal, and S. Sen, “A simple linear time (1+ε)-approximation algorithm for geometric k-means clustering in any dimensions”, in Proceedings - Annual Symposium on Foundations of Computer Science, pp. 454-462, 2004.

A. E. Baburin, E. Kh. Gimadi, N. I. Glebov, and A. V. Pyatkin, “The problem of finding a subset of vectors with the maximum total weight”, Journal of Applied and Industrial Mathematics, vol. 2, no. 1, pp. 32-38, 2008.

E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, “A posteriori detection of a quasi periodic fragment in numerical sequences with given number of recurrences”, Sibirskii Zhurnal Industrial’noi Matematiki, vol. 9, no. 1, pp. 55-74, 2006. (in Russian).

A. V. Kelmanov, and A. V. Pyatkin, “On the complexity of a search for a subset of ‘’similar” vectors”, Doklady Mathematics, vol. 78, no. 1, pp. 574-575, 2008.

A. V. Kel’manov, and A. V. Pyatkin, “On a Version of the Problem of Choosing a Vector Subset”, Journal of Applied and Industrial Mathematics, vol. 3, no. 4, pp. 447-455, 2009.

A. V. Dolgushev, and A. V. Kel’manov, “An approximation algorithm for solving a problem of cluster analysis”, Journal of Applied and Industrial Mathematics, vol. 5, no. 4, pp. 551- 558, 2011.

A. V. Kel’manov, and V. I. Khandeev, “A 2-approximation polynomial algorithm for a clustering problem”, Journal of Applied and Industrial Mathematics, vol. 7, no. 4, pp. 515-521, 2013.

E. Kh. Gimadi, A. V. Pyatkin, and I. A. Rykov, “On polynomial solvability of some problems of a vector subset choice in a Euclidean space of fixed dimension”, Journal of Applied and Industrial Mathematics, vol. 4, no. 1, pp. 48-53, 2010.

V.V. Shenmaier, “Solving some vector subset problems by Voronoi diagrams”, Journal of Applied and Industrial Mathematics, vol. 10, no. 4, pp. 560-566, 2016.

A. V. Kel’manov, A. V. Pyatkin, and V. I., Khandeev, “NP-Hardness of Quadratic Euclidean 1-Mean and 1-Median 2-Clustering Problem with Constraints on the Cluster Sizes”, Doklady Mathematics, vol. 100, no. 3, pp. 545-548, 2019.

A.V. Pyatkin, “Easy NP-hardness Proofs of Some Subset Choice Problems”, in Kochetov Y., Bykadorov I., Gruzdeva T. (eds) Mathematical Optimization Theory and Operations Research. MOTOR 2020. Communications in Computer and Information Science, vol. 1275, pp. 70-79, 2020.

M. J. Garey, and D. S. Johnson, Computers and Intractability. The Guide to the Theory of NP-Completeness, W. H. Freeman and Company, San Francisco, 1979.

Downloads

Published

2022-05-08

Issue

Section

Research Articles