A Similar Duplicate Data Detection Method Based on Fuzzy Clustering for Topology Formation

Lejiang GUO; Wei WANG; Fangxin CHEN; Xiao TANG; Weijiang WANG

Abstract

Rapid changes in information technology are making data grow exponentially in all areas, and the quality of these huge amounts of data has become a core problem. Data cleaning is an effective technique for solving data quality problems. This paper focuses on duplicate data cleaning techniques. It examines data quality problems at the architectural and instance levels, in single-source and multi-source settings, as well as an application platform for cleaning duplicated records and the evaluation criteria. Building on these studies, an improved detection method combines a fuzzy clustering algorithm with the Levenshtein distance for data cleaning; it can detect and remove duplicate raw data accurately and quickly. The paper presents the similar duplicate record detection process, the main system framework design, the implementation of the system's functional modules, and an analysis of the results. The method achieves higher precision and recall than several other data cleaning methods, and these comparisons confirm its validity. The experimental results show that the proposed method is effective in the data detection and cleaning process.

Polish summary (translated): The article proposes new data cleaning methods that take into account the number of cases, multiple sources, duplicate records and other evaluation criteria. The improved detection method uses a fuzzy clustering algorithm with the Levenshtein distance. In this way, duplicate data rows are quickly detected and removed.
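
The abstract does not give the exact formulation, so the following is only a minimal sketch, under stated assumptions, of how a Levenshtein distance can feed a fuzzy clustering step for similar duplicate detection. It assumes a normalized similarity of 1 - d(a, b) / max(|a|, |b|) and swaps in the classic fuzzy-equivalence approach (max-min transitive closure of the similarity matrix, then a λ-cut) as the clustering algorithm; this is not necessarily the authors' method. The threshold λ = 0.8, the sample records and all function names are hypothetical. Precision and recall are computed over record pairs: the share of predicted duplicate pairs that are true duplicates, and the share of true duplicate pairs that were found.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character insertions, deletions and
    substitutions turning a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1] (assumed form: 1 - d / max length)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def fuzzy_similarity_matrix(records):
    """Reflexive, symmetric fuzzy relation R[i][j] = similarity(i, j)."""
    return [[similarity(x, y) for y in records] for x in records]


def max_min_compose(r):
    """Max-min composition (R o R)[i][j] = max_k min(R[i][k], R[k][j])."""
    n = len(r)
    return [[max(min(r[i][k], r[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square R repeatedly until it stops changing; the result is a fuzzy
    equivalence relation (reflexive, symmetric, max-min transitive)."""
    while True:
        r2 = max_min_compose(r)
        if r2 == r:  # entries are copies of existing values, so this terminates
            return r
        r = r2


def lambda_cut_clusters(closure, lam):
    """Group records i, j together when closure[i][j] >= lam."""
    n = len(closure)
    labels = [-1] * n
    clusters = []
    for i in range(n):
        if labels[i] == -1:
            members = [j for j in range(n) if closure[i][j] >= lam]
            for j in members:
                labels[j] = len(clusters)
            clusters.append(members)
    return clusters


def duplicate_pairs(clusters):
    """All unordered index pairs (i < j) that share a cluster."""
    return {pair for members in clusters
            for pair in combinations(sorted(members), 2)}


def precision_recall(predicted, truth):
    """Pairwise precision and recall; empty sets score 1.0 by convention."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall


if __name__ == "__main__":
    records = ["John Smith", "Jon Smith", "Jane Doe", "Jane Do", "Data cleaning"]
    closure = transitive_closure(fuzzy_similarity_matrix(records))
    clusters = lambda_cut_clusters(closure, lam=0.8)  # hypothetical threshold
    print([[records[i] for i in c] for c in clusters])
    truth = {(0, 1), (2, 3)}  # hand-labelled duplicate pairs for this demo
    print(precision_recall(duplicate_pairs(clusters), truth))
```

Taking the transitive closure before cutting guarantees that the λ-cut is a true equivalence relation, so every record lands in exactly one group; raising λ splits groups apart, lowering it merges more near-duplicates.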

Bibliographic details

Source
Przeglad Elektrotechniczny | 2012, No. 1b | pp. 26-30 | 5 pages
Author affiliations

The Department of Early Warning Surveillance Intelligence, Air Force Radar Academy, Wuhan, 430019, China;

School of Power and Mechanical Engineering, Wuhan University, Wuhan, 430072, China;

The Department of Early Warning Surveillance Intelligence, Air Force Radar Academy, Wuhan, 430019, China;

Air Force Radar Academy; Wuhan University;

Air Force Radar Academy;
Original format: PDF
Language: English
Keywords
approximate duplicate data; fuzzy clustering; data cleaning; Levenshtein distance

Similar documents

1. Detection and Elimination of Duplicate Data Using Token-Based Method for a Data Warehouse: A Clustering Based Approach [J]. J. Jebamalar Tamilselvi, V. Saravanan. International Journal of Dynamics of Fluids, 2009, No. 2.

2. Detection and Elimination of Duplicate Data Using Token-Based Method for a Data Warehouse: A Clustering Based Approach [J]. J. Jebamalar Tamilselvi, V. Saravanan. International Journal of Computational Intelligence Research, 2009, No. 2.

3. Improve the Quality of Statistical Method of Obtaining Representative Data Scheme for De-duplication Using Fuzzy Clustering and Genetic Algorithm [J]. M. Ravikanth, D. Vasumathi. Journal of Theoretical and Applied Information Technology, 2017, No. 8.

4. Evaluate clustering performance and computational efficiency for PSO based fuzzy clustering methods in processing big imbalanced data [C]. Jin Wang, Hua Fang, Bo Li. IEEE International Conference on Communications, 2017.

5. LMI-based controller synthesis for fuzzy control systems and clustering-based sensor fusion for UXO detection [D]. Jing Li, 2001.

6. AF-DHNN: Fuzzy Clustering and Inference-Based Node Fault Diagnosis Method for Fire Detection [O]. Shan Jin, Wen Cui, Zhigang Jin, 2015.

7. A hybrid mobile call fraud detection model using optimized fuzzy C-means clustering and group method of data handling-based network [O]. Sharmila Subudhi, Suvasini Panigrahi, 2018.
