A New Spark Based K-Means Clustering with Data Removing Strategy

Abstract

Clustering is an important technique in machine learning that organizes data into groups of similar data points, called clusters. Conventional clustering methods, however, are not suitable for large-scale data: their high computational cost means they need an unrealistic amount of time to build the grouping. In this work we propose a new Spark-based K-means clustering method with a data removing strategy, referred to as SKMDRS. The method reduces computational time by removing, at each iteration, the data points that are unlikely to change cluster thereafter. In addition, the clustering process is distributed through the Spark framework to enhance scalability. The conducted experiments show the efficiency of the proposed method compared to existing ones.
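
The abstract does not give the removal criterion, so the following is only a minimal single-machine sketch of the general idea, not the authors' SKMDRS algorithm. It assumes a point is "unlikely to change" once its cluster has stayed the same for `patience` consecutive iterations; `patience`, `kmeans_with_removal`, and the sample data are all hypothetical, and the Spark distribution step is left out.

```python
import math
import random


def kmeans_with_removal(points, k, max_iter=50, patience=3, seed=0):
    """K-means in which points with a stable assignment skip later passes.

    Assumption (not from the paper): a point is removed from the active set
    once its label has been unchanged for `patience` consecutive iterations.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    stable = [0] * len(points)         # consecutive iterations without a label change
    active = list(range(len(points)))  # indices that are still being reassigned
    dim = len(points[0])

    for _ in range(max_iter):
        changed = False
        # Assignment step: only active points pay for distance computations.
        for i in active:
            nearest = min(range(k), key=lambda c: math.dist(points[i], centroids[c]))
            if nearest != labels[i]:
                labels[i] = nearest
                stable[i] = 0
                changed = True
            else:
                stable[i] += 1

        # Update step: removed points keep their last label and still count.
        # For brevity the sums are rebuilt from all points on every iteration.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for i, p in enumerate(points):
            counts[labels[i]] += 1
            for d in range(dim):
                sums[labels[i]][d] += p[d]
        for c in range(k):
            if counts[c]:
                centroids[c] = [s / counts[c] for s in sums[c]]

        # Data removing step: drop points judged unlikely to change cluster.
        active = [i for i in active if stable[i] < patience]
        if not changed or not active:
            break

    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (10.0, 10.0), (10.1, 10.2), (10.2, 10.1)]
    final_centroids, final_labels = kmeans_with_removal(data, k=2)
    print(final_centroids, final_labels)
```

The saving comes from the assignment step, which is the costly part of K-means: each removed point no longer needs k distance computations per iteration. A lower `patience` removes points sooner but risks freezing a point in the wrong cluster.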

Bibliographic details

Source
International conference on digital economy | 2019 | xv, 412 p. | 16 pages
Authors
Kenza Rziga; Mohamed Aymen Ben HajKacem; Nadia Essoussi
Original format
PDF
Chinese Library Classification
Electronic trade; online trade
Keywords
Big data; Clustering; K-means; MapReduce; Spark

Similar literature

1. A new hybrid strategy for data clustering using cuckoo search based on Mantegna levy distribution, PSO and k-means [J]. Omid Tarkhaneh, Ayaz Isazadeh, Hossein Jabbari Khamnei. International Journal of Computer Applications in Technology. 2018, No. 2
2. Distance based k-means clustering algorithm for determining number of clusters for high dimensional data [J]. Alibuhtto M., Mahat N. Decision Science Letters. 2020, No. 1
3. Clustering of User Behaviour based on Web Log data using Improved K-Means Clustering Algorithm [J]. S. Padmaja, Dr. Ananthi Sheshasaayee. International Journal of Engineering and Technology. 2016, No. 1
4. A New Spark Based K-Means Clustering with Data Removing Strategy [C]. Kenza Rziga, Mohamed Aymen Ben HajKacem, Nadia Essoussi. International conference on digital economy. 2019
5. Electromagnetism Based K-Means Clustering for Big Data [D]. Eerlapati, Abhinav. 2017
6. Analysis of big data job requirements based on K-means text clustering in China [O]. Dai Debao, Ma Yinxia, Zhao Min. 2021
7. An Improved Clustering Algorithm for Big Data Based on K-Means with Optimized Clusters' Number [O]. 2015
