Conference: International conference on information knowledge engineering

Data Cleaning in Out-of-Core Column-Store Databases: An Index-Based Approach

Abstract

Write optimization in out-of-core (external memory) column-store databases is a well-known challenge. The Timestamped Binary Association Table (TBAT) and Asynchronous Out-of-Core Update (AOC Update) have shown significant improvements on this problem. However, as AOC updates accumulate over time, selection query performance on the TBAT gradually degrades. Data cleaning methods can merge the update records in a TBAT to speed up ad hoc searches, but cleaning can itself be time-consuming. In this work, we introduce multiple data cleaning methods that use an index structure called the offset B+-tree (OB-tree). When the OB-tree and the update records fit into system memory, an eager data cleaning approach provides fast cleaning. In a data-intensive environment, the OB-tree index or the update records may be too large to fit into memory; for this case, a progressive data cleaning approach divides the update records into small slips and cleans the data in a memory-economical manner.
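
The contrast between the eager and progressive approaches can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not describe the OB-tree's layout, so a plain dict stands in for the index, and the (timestamp, oid, value) row layout, the function names `eager_clean` and `progressive_clean`, and the `slip_size` parameter are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed TBAT row layout: (timestamp, oid, value). Hypothetical, for illustration.
Record = Tuple[int, int, str]


def eager_clean(body: List[Record], updates: List[Record]) -> List[Record]:
    """Merge every update in one pass, assuming index and updates fit in memory."""
    # A dict stands in for the OB-tree here: it maps oid -> position in the table.
    index: Dict[int, int] = {oid: pos for pos, (_, oid, _) in enumerate(body)}
    cleaned = list(body)
    for ts, oid, value in updates:
        pos = index.get(oid)
        if pos is None:
            # Unknown oid: append the record and index its position.
            index[oid] = len(cleaned)
            cleaned.append((ts, oid, value))
        elif ts >= cleaned[pos][0]:
            # Newer (or equally new) update wins over the stored record.
            cleaned[pos] = (ts, oid, value)
    return cleaned


def progressive_clean(
    body: List[Record], updates: List[Record], slip_size: int
) -> List[Record]:
    """Merge updates one slip at a time, keeping only a slip-sized lookup in memory."""
    cleaned = list(body)
    for start in range(0, len(updates), slip_size):
        slip = updates[start:start + slip_size]
        # Keep only the latest update per oid within this slip.
        latest: Dict[int, Record] = {}
        for rec in slip:
            ts, oid, _ = rec
            if oid not in latest or ts >= latest[oid][0]:
                latest[oid] = rec
        # One sequential pass over the table applies the slip.
        for pos, (ts, oid, _) in enumerate(cleaned):
            new = latest.pop(oid, None)
            if new is not None and new[0] >= ts:
                cleaned[pos] = new
        # Whatever remains refers to oids not yet in the table.
        cleaned.extend(latest.values())
    return cleaned


body = [(1, 1, "a"), (1, 2, "b"), (1, 3, "c")]
updates = [(2, 2, "b2"), (3, 2, "b3"), (2, 4, "d"), (4, 1, "a4")]
eager_result = eager_clean(body, updates)
progressive_result = progressive_clean(body, updates, slip_size=2)
```

In this sketch both functions produce the same cleaned table. The progressive variant trades extra sequential passes over the table for a lookup structure whose size is bounded by `slip_size`.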
