首页> 外文会议>ACM SIGMETRICS/PERFORMANCE joint international conference on measurement and modeling of computer systems >Don't Let The Negatives Bring You Down: Sampling from Streams of Signed Updates
【24h】

Don't Let The Negatives Bring You Down: Sampling from Streams of Signed Updates

机译:不要让否定否定让你失望:从签名更新的溪流中抽样

获取原文

摘要

Random sampling has been proven time and time again to be a powerful tool for working with large data. Queries over the full dataset are replaced by approximate queries over the smaller (and hence easier to store and manipulate) sample. The sample constitutes a flexible summary that supports a wide class of queries. But in many applications, datasets are modified with time, and it is desirable to update samples without requiring access to the full underlying datasets. In this paper, we introduce and analyze novel techniques for sampling over dynamic data, modeled as a stream of modifications to weights associated with each key. While sampling schemes designed for stream applications can often readily accommodate positive updates to the dataset, much less is known for the case of negative updates, where weights are reduced or items deleted altogether. We primarily consider the turnstile model of streams, and extend classic schemes to incorporate negative updates. Perhaps surprisingly, the modifications to handle negative updates turn out to be natural and seamless extensions of the well-known positive update-only algorithms. We show that they produce unbiased estimators, and we relate their performance to the behavior of corresponding algorithms on insert-only streams with different parameters. A careful analysis is necessitated, in order to account for the fact that sampling choices for one key now depend on the choices made for other keys. In practice, our solutions turn out to be efficient and accurate. Compared to recent algorithms for L_p sampling which can be applied to this problem, they are significantly more reliable, and dramatically faster.
机译:随机采样已被证明是时间和时间再次成为使用大数据的强大工具。通过较小的近似查询替换完整数据集上的查询(并且因此更容易存储和操作)示例。该示例构成了一个灵活的摘要,支持广泛的查询。但在许多应用中,数据集随时间修改,并且希望更新样本而不需要访问完整的底层数据集。在本文中,我们介绍和分析了用于对动态数据采样采样的新技术,以与每个键相关联的重量的修改流建模。虽然为流应用程序设计的采样方案通常可以容易地适应数据集的正更新,但对于负更新的情况而言,较少的较低,其中减少了权重或完全删除的项目。我们主要考虑流的流模型,并扩展经典方案以结合负面更新。也许令人惊讶的是,处理负面更新的修改结果是众所周知的正更新算法的自然和无缝扩展。我们表明它们产生了无偏见的估计,我们将其性能与不同参数不同参数的Insert-Plustle上的相应算法的行为相关联。仔细分析是必要的,以便考虑到一个关键的采样选择现在取决于为其他键制造的选择。在实践中,我们的解决方案结果是高效准确的。与最近的L_P采样算法相比,可以应用于此问题,它们明显更可靠,并且剧烈地更快。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号