We consider two closely related fundamental clustering problems in this paper. In the Min-Sum k-Clustering problem, one is given a metric space and has to partition the points into k clusters while minimizing the total pairwise distances between the points assigned to the same cluster. In the Balanced k-Median problem, the instance is the same and one has to obtain a partitioning into k clusters C_1,..., C_k, where each cluster C_i has a center c_i, while minimizing the total assignment costs for the points in the metric; here the cost of assigning a point j to a cluster C_i is equal to |C_i| times the distance between j and c_i in the metric. In this paper, we present an O(logn)-approximation for both these problems where n is the number of points in the metric that are to be served. This is an improvement over the O(∈~(-1) log~(1+∈) n)-approximation (for any constant ∈ > 0) obtained by Bartal, Charikar, and Raz [STOC 01]. We also obtain a quasi-PTAS for Balanced k-Median in metrics with constant doubling dimension. As in the work of Bartal et al., our approximation for general metrics uses embeddings into tree metrics. The main technical contribution in this paper is an O(1)-approximation for Balanced k-Median in hierarchically separated trees (HSTs). Our improvement comes from a more direct dynamic programming approach that heavily exploits properties of standard HSTs. In this way, we avoid the reduction to special types of HSTs that were considered by Bartal et al., thereby avoiding an additional O(∈~(-1) log~∈ n) loss.
展开▼