Cleaning Framework for BigData: An Interactive Approach for Data Cleaning

Abstract

Data is a valuable resource. Proper use of high-quality data can help people make better predictions, analyses, and decisions. However, no matter how much effort we put into collecting a good dataset, errors will inevitably creep into the data, making data cleaning necessary. This is a particular concern when large-scale heterogeneous data from multiple sources are integrated for other purposes. Data cleaning can be complicated, time-consuming, and expensive. It is nonetheless a necessary step in any data-related system, since poor-quality data may not be suitable for the intended purposes. The core of our data cleaning system is data association and repairing. Association aims to identify records that refer to the same object and to link each object with its most closely associated objects. Repairing aims to make a database reliable by fixing errors in the data. For big data applications, we do not necessarily need to use all the data; in most situations, we only need a small subset of the most relevant data. The goal of association is therefore to convert big raw data into a small subset of the data most useful for a particular application. Once we obtain this small set of relevant data, we need to analyze it further to help people digest it and turn it into knowledge. We use a number of techniques to associate the data and obtain knowledge that is useful for data repairing. Our research shows that data association can effectively help with data repairing. To capture the interaction between the two, we provide a uniform framework that seamlessly unifies the association and repairing processes based on context patterns, usage patterns, metadata, and repairing rules.
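The abstract describes association and repairing only at a high level. What follows is a minimal Python sketch, not the authors' implementation. It assumes a toy record layout and approximates each step with a simple stand-in:

- Association groups records on a hypothetical normalized-name key.
- Selection of the "small subset of relevant data" uses a keyword test.
- Repairing applies a majority-vote rule within each linked group.

The names `match_key`, `associate`, `select_relevant`, and `repair`, and the sample records, are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical toy records from two sources; fields and values are illustrative only.
RECORDS = [
    {"id": 1, "name": "Acme Corp.", "city": "Tulsa"},
    {"id": 2, "name": "ACME corp", "city": "Tulsa"},
    {"id": 3, "name": "acme corp", "city": "Tusla"},  # injected typo to repair
    {"id": 4, "name": "Globex Inc", "city": "Dallas"},
]


def match_key(name):
    """Hypothetical matching key: lowercase and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def associate(records):
    """Association (sketch): group records whose keys match, i.e. records that
    appear to describe the same real-world object."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec["name"])].append(rec)
    return dict(groups)


def select_relevant(groups, keyword):
    """Keep only the groups relevant to the application: here, those whose
    key contains the normalized keyword (an assumed relevance test)."""
    kw = match_key(keyword)
    return {key: members for key, members in groups.items() if kw in key}


def repair(groups, field):
    """Repairing (sketch): inside each linked group, replace `field` with its
    majority value. Majority voting is an assumed stand-in for repairing rules.
    Returns new dicts and leaves the input records unchanged."""
    repaired = []
    for members in groups.values():
        majority, _count = Counter(rec[field] for rec in members).most_common(1)[0]
        repaired.extend({**rec, field: majority} for rec in members)
    return repaired


if __name__ == "__main__":
    groups = associate(RECORDS)
    subset = select_relevant(groups, "acme")  # keeps only the "acmecorp" group
    for rec in repair(subset, "city"):
        print(rec["id"], rec["name"], rec["city"])
```

Running the script links records 1 to 3 as one object and drops the unrelated Globex record. It then prints all three linked records with the city repaired to Tulsa. Majority voting is only one possible repairing rule. The framework described above also draws on context patterns, usage patterns, and metadata, which this sketch does not model.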

Bibliographic Details

Source
IEEE International Conference on Big Data Computing Service and Applications | 2016 | pp. 174-181 | 8 pages
Authors
Hong Liu; Ashwin Kumar Tk; Johnson P Thomas; Xiaofei Hou
Original format: PDF
Keywords
Data Association; Data Cleaning; Data Repairing

Similar Literature

1. A Unified Framework and Sequential Data Cleaning Approach for a Data Warehouse [J]. J. Jebamalar Tamilselvi, V. Saravanan. International Journal of Computer Science and Network Security, 2008, no. 5.

2. How clean is clean: a new approach to assess and enhance environmental cleaning and disinfection in an acute tertiary care facility [J]. Wai Khuan Ng. BMJ Open Quality, 2014, no. 1.

3. A Review of Data Cleaning Approaches in a Hydrographic Framework with a Focus on Bathymetric Multibeam Echosounder Datasets [J]. Julian Le Deunf, Nathalie Debese, Thierry Schmitt. Geosciences, 2020, no. 7.

4. Cleaning Framework for BigData: An Interactive Approach for Data Cleaning [C]. Hong Liu, Ashwin Kumar Tk, Johnson P Thomas. IEEE International Conference on Big Data Computing Service and Applications, 2016.

5. Cleaning Framework for Big Data [D]. Hong Liu. 2017.

6. How clean is clean: a new approach to assess and enhance environmental cleaning and disinfection in an acute tertiary care facility [O]. Wai Khuan Ng. 2014.

7. Data Cleaning Framework: An Extensible Approach to Data Cleaning [O]. Gu Randy S. 2010.

8. State Review Framework: Indiana. Clean Water Act, Clean Air Act, and Resource Conservation and Recovery Act Implementation in Federal Fiscal Year 2011 [R]. 2013.

