Source: JMLR: Workshop and Conference Proceedings

RDEC: Integrating Regularization into Deep Embedded Clustering for Imbalanced Datasets

Abstract

Clustering is a fundamental machine learning task with many applications. With the development of deep neural networks (DNNs), combining DNN techniques with clustering has become a new research direction and has achieved some success. However, few studies have focused on the imbalanced-data problem, which commonly occurs in real-world applications. In this paper, we propose a clustering method, regularized deep embedding clustering (RDEC), that integrates virtual adversarial training (VAT), a network regularization technique, with a clustering method called deep embedding clustering (DEC). DEC optimizes cluster assignments by pushing data more densely around centroids in the latent space, but it is sometimes sensitive to the initial location of the centroids, especially in the case of imbalanced data, where a minor class has less chance of being assigned a good centroid. RDEC introduces regularization using VAT to ensure the model's robustness to local perturbations of the data. VAT pushes data that are similar in the original space closer together in the latent space, bunching together data from minor classes and thereby facilitating cluster identification by RDEC. Combining the advantages of DEC and VAT, RDEC attains state-of-the-art performance on both balanced and imbalanced benchmark and real-world datasets. For example, accuracies are as high as 98.41% on the MNIST dataset and 85.45% on a highly imbalanced dataset derived from MNIST, which is nearly 8% higher than the current best result.
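The abstract describes the two ingredients only in words, so the following is a minimal NumPy sketch of how such an objective might be assembled, not the authors' implementation. The soft assignment and sharpened target follow the commonly published DEC formulation (Student's t kernel, squared and renormalized targets); the abstract itself does not state these formulas. The function names, the weight `gamma`, the identity "encoder", and the use of a random perturbation in place of a true virtual adversarial one are all assumptions made for illustration.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment of embedded points to centroids (DEC-style)."""
    # Squared Euclidean distance from every point to every centroid: shape (n, k).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square q, divide by soft cluster size, renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Mean over rows of KL(p || q)."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

def rdec_style_loss(q, q_perturbed, gamma=1.0):
    """Clustering term plus a smoothness term (gamma is a hypothetical weight)."""
    cluster_loss = kl_divergence(target_distribution(q), q)
    # Penalize changes in the assignment when the input is slightly perturbed.
    smooth_loss = kl_divergence(q, q_perturbed)
    return cluster_loss + gamma * smooth_loss

# Toy imbalanced data: 200 points in a major class, 10 in a minor class.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(4.0, 1.0, size=(10, 2))])
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])

# Assumption: the identity map stands in for the encoder network, and a small
# random perturbation stands in for the adversarial direction that VAT would
# find with gradient information.
q = soft_assign(x, centroids)
q_perturbed = soft_assign(x + 0.1 * rng.normal(size=x.shape), centroids)
loss = rdec_style_loss(q, q_perturbed, gamma=1.0)
```

In a real training loop the encoder and centroids would be updated by gradient descent on this loss, and the perturbation would be chosen adversarially rather than at random; the sketch only shows how the clustering and regularization terms fit together.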
