Finding needles in a hay stream: On persistent item lookup in data streams

Abstract

In a data stream composed of an ordered sequence of data items, persistent items are those that keep occurring over a long timespan. Compared with ordinary items, persistent ones, though not necessarily more frequent, typically convey more valuable information. Persistent item lookup, the task of identifying all persistent items, is a pivotal building block in many computing and network systems. In this paper, we devise a generic persistent item lookup algorithm that supports high-speed, high-accuracy lookup at limited memory cost. Our design rests on two key techniques. First, the algorithm attempts to record only the items that are persistent so far, based on the currently available information about the stream, which significantly reduces memory overhead, especially for highly skewed real-life data streams. Second, the algorithm balances the recording load in both the time and space domains: in the time domain, we partition persistent items into approximately equal-size subsets and record only one subset in each epoch; in the space domain, we apply a state-of-the-art load balancing technique to distribute recorded items evenly across the on-die memory. By integrating these components, we obtain a persistent item lookup algorithm that outperforms existing solutions in a wide range of practical settings.
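
To make the problem statement concrete, the following minimal sketch computes persistence exactly with unbounded memory. It is not the algorithm proposed in the paper, only a naive baseline. It assumes that persistence is measured as the number of distinct epochs in which an item appears and that an item counts as persistent once this number reaches a threshold; the abstract does not spell out this definition. The names `persistent_items`, `threshold`, and `stream` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set


def persistent_items(
    epochs: Iterable[Iterable[Hashable]], threshold: int
) -> Set[Hashable]:
    """Return the items that appear in at least `threshold` distinct epochs.

    Assumption: persistence = number of epochs containing the item.
    This exact baseline stores a counter for every item ever seen.
    """
    persistence: Dict[Hashable, int] = defaultdict(int)
    for epoch in epochs:
        # Count each item at most once per epoch, however often it repeats.
        for item in set(epoch):
            persistence[item] += 1
    return {item for item, count in persistence.items() if count >= threshold}


# Toy stream split into four epochs (hypothetical data).
stream = [
    ["a", "b", "b", "b", "b"],  # "b" is frequent here but bursty
    ["a", "c"],
    ["a", "d"],
    ["a", "b"],
]
result = persistent_items(stream, threshold=3)
```

In this toy stream, "b" occurs five times in total but in only two epochs, whereas "a" occurs four times, once in every epoch. With a threshold of 3, only "a" qualifies, which illustrates the abstract's point that persistent items are not necessarily the most frequent ones. The baseline keeps one counter per distinct item, which is the memory cost a limited-memory lookup algorithm aims to avoid.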

Bibliographic details

  • Source
    Computer Networks | 2020, Issue 9 | 107518.1-107518.11 | 11 pages
  • Author affiliations

    Sun Yat Sen Univ Sch Data & Comp Sci Guangzhou Peoples R China;

    Nanjing Univ State Key Lab Novel Software Technol Nanjing Peoples R China;

    Nanjing Univ State Key Lab Novel Software Technol Nanjing Peoples R China;

    Beijing Inst Technol Sch Informat & Elect Beijing Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Persistent item lookup; Data stream mining;
