Pairs Covered by a Sequence of Sets

Abstract

Enumerating minimal new combinations of elements in a sequence of sets is of interest, e.g., for novelty detection in a stream of texts, where the sets are the bags of words occurring in the texts. We focus on new pairs of elements, as they are abundant. Using simple data structures, we can enumerate them in time quadratic in the size of the sets, but a large intersection with an earlier set rules out all pairs within it in linear time. The challenge is to use this observation efficiently. We give a greedy heuristic based on the twin graph, a succinct description of the pairs covered by a set family, and on finding good candidate sets by random sampling. The heuristic is motivated and supported by several related complexity results: sample size estimates, hardness of maximal coverage of pairs, and approximation guarantees when a few sets cover almost all pairs.
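To make the observation about intersections concrete, here is a minimal Python sketch. It is not the paper's heuristic: it uses neither the twin graph nor random sampling, and the function names `candidate_pairs` and `new_pairs_per_set` are hypothetical. It assumes a plain hash set of covered pairs and picks the earlier set with the largest overlap by a naive scan, which only illustrates why pairs inside an intersection need not be examined.

```python
from itertools import combinations, product


def candidate_pairs(current, earlier):
    """Yield the pairs of `current` that `earlier` does not cover."""
    common = current & earlier   # intersection, computed in linear time
    rest = current - common      # elements that `earlier` does not contain
    # Pairs with both elements outside the intersection.
    yield from combinations(sorted(rest), 2)
    # Pairs with exactly one element outside the intersection.
    for a, b in product(rest, common):
        yield (a, b) if a < b else (b, a)
    # Pairs inside `common` are skipped: `earlier` covers all of them.


def new_pairs_per_set(stream):
    """Return, for each set in the stream, the sorted list of its new pairs."""
    seen = set()     # every pair covered so far, stored as a sorted tuple
    history = []     # the earlier sets
    result = []
    for current in map(frozenset, stream):
        # Assumption: a naive scan stands in for the sampling step.
        best = max(history, key=lambda s: len(s & current), default=frozenset())
        fresh = sorted(p for p in candidate_pairs(current, best) if p not in seen)
        seen.update(fresh)
        history.append(current)
        result.append(fresh)
    return result
```

For the stream `{a, b, c}`, `{b, c, d}`, `{a, d}`, `{a, b, c}`, the second set shares `{b, c}` with the first, so only the two pairs involving `d` are checked, and the repeated fourth set produces no candidates at all.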
