Conference: IEEE International Conference on Big Data Computing Service and Applications

Cleaning Framework for BigData: An Interactive Approach for Data Cleaning


Abstract

Data is a valuable resource. Used properly, high-quality data helps people make better predictions, analyses, and decisions. However, no matter how much effort goes into collecting a good dataset, errors inevitably creep into the data, which makes data cleaning necessary. This is a particular concern when large-scale heterogeneous data from multiple sources are integrated for other purposes. Data cleaning can be complicated, time-consuming, and expensive, but it is a necessary step in any data-related system, since poor-quality data may not be fit for the intended purpose. The core of our data cleaning system is data association and repairing. Association aims to identify the same object and link it with its most closely associated objects; repairing makes a database reliable by fixing errors in the data. Big data applications do not necessarily need all of the data: in most situations, a small subset of the most relevant data is enough. The goal of association is therefore to reduce big raw data to the small subset that is most useful for a particular application. Once this subset is obtained, the data must be analyzed further to help people digest it and turn it into knowledge. We use a number of techniques to associate the data and obtain knowledge that is useful for data repairing. Our research shows that data association can effectively help with data repairing. To capture the interaction between the two, we provide a uniform framework that seamlessly unifies the association and repairing processes based on context patterns, usage patterns, metadata, and repairing rules.
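The interplay between association and repairing can be illustrated with a minimal sketch. It is not the framework described in the abstract: the normalized-name matching key, the majority-vote repairing rule, and all function and field names (`normalize`, `associate`, `repair`, `name`, `city`) are assumptions chosen for illustration only.

```python
from collections import Counter, defaultdict


def normalize(name: str) -> str:
    """Hypothetical association key: lowercase, keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def associate(records):
    """Group records that appear to describe the same object (assumed rule)."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["name"])].append(rec)
    return list(groups.values())


def repair(group, field):
    """Repair one field in a group by majority vote over non-missing values."""
    values = [rec[field] for rec in group if rec.get(field)]
    if not values:
        # No associated evidence to learn from; return unchanged copies.
        return [dict(rec) for rec in group]
    consensus = Counter(values).most_common(1)[0][0]
    return [{**rec, field: consensus} for rec in group]


# Toy data, invented for this example.
records = [
    {"name": "Acme Corp.", "city": "Boston"},
    {"name": "ACME corp", "city": "Boston"},
    {"name": "acme-corp", "city": "Bostn"},  # typo to be repaired
    {"name": "Globex", "city": None},        # missing value, no evidence
]

cleaned = [rec for group in associate(records) for rec in repair(group, "city")]
```

In this sketch the three Acme records are associated first, and only then can the misspelled city be repaired from the values its associated records agree on. The Globex record has no associated evidence, so it is left untouched.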
