【24h】

Nested Mini-Batch K-Means

机译:嵌套小批量K均值

获取原文

摘要

A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) by using the distance bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already used data should preferentially be reused. To this end we propose using nested mini-batches, whereby data in a mini-batch at iteration t is automatically reused at iteration t + 1. Using nested mini-batches presents two difficulties. The first is that unbalanced use of data can bias estimates, which we resolve by ensuring that each data sample contributes exactly once to centroids. The second is in choosing mini-batch sizes, which we address by balancing premature fine-tuning of centroids with redundancy induced slow-down. Experiments show that the resulting nmbatch algorithm is very effective, often arriving within 1% of the empirical minimum 100 × earlier than the standard mini-batch algorithm.
机译:提出了一种新的算法,该算法通过使用Elkan(2003)的距离限制方法来加速Sculley(2010)的小批量k均值算法。我们认为,将距离限制合并到小批量算法中时,应该优先重用已使用的数据。为此,我们建议使用嵌套的迷你批处理,由此在迭代t处的微型批处理中的数据将在迭代t + 1处自动重用。使用嵌套的微型批处理存在两个困难。首先是数据的不均衡使用可能会使估计值产生偏差,我们通过确保每个数据样本对质心的贡献恰好一次来解决这一问题。第二个是选择小批量大小,我们通过平衡质心的过早微调与冗余导致的速度下降来解决。实验表明,所得的nmbatch算法非常有效,通常比标准的mini-batch算法提前100%达到经验最小值的1%之内。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号