首页> 外文会议>The First symposium on innovations in computer science >Space-Eficient Estimation of Robust Statistics and Distribution Testing
【24h】

Space-Eficient Estimation of Robust Statistics and Distribution Testing

机译:稳健统计和分布测试的空间高效估计

获取原文
获取原文并翻译 | 示例

摘要

The generic problem of estimation and inference given a sequence of i.i.d. samples has been extensively studied in the statistics, property testing, and learning communities. A natural quantity of interest is the sample complexity of the particular learning or estimation problem being considered. While sample complexity is an important component of the computational efficiency of the task, it is also natural to consider the space complexity: do we need to store all the samples as they are drawn, or is it sufficient to use memory that is significantly sublinear in the sample complexity? Surprisingly, this aspect of the complexity of estimation has received significantly less attention in all but a few specific cases. While space-bounded, sequential computation is the purview of the field of data-stream computation, almost all of the literature on the algorithmic theory of data-streams considers only "empirical problems", where the goal is to compute a function of the data present in the stream rather than to infer something about the source of the stream. Our contributions are two-fold. First, we provide results connecting space efficiency to the estimation of robust statistics from a sequence of i.i.d. samples. Robust statistics are a particularly interesting class of statistics in our setting because, by definition, they are resilient to noise or errors in the sampled data. We show that this property is enough to ensure that very space-efficient stream algorithms exist for their estimation. In contrast, the numerical value of a "non-robust" statistic can change dramatically with additional samples, and this limits the utility of any finite length sequence of samples. Second, we present a general result that captures a trade-off between sample and space complexity in the context of distributional property testing.
机译:给定序列i.i.d的估计和推理的一般问题。样本已在统计,属性测试和学习社区中进行了广泛的研究。感兴趣的自然数量是所考虑的特定学习或估计问题的样本复杂性。尽管样本复杂度是任务计算效率的重要组成部分,但考虑到空间复杂度也是很自然的:我们是否需要在绘制所有样本时存储所有样本,或者使用在内存中显着亚线性的内存是否足够样本复杂度?令人惊讶的是,除了少数特定情况外,估计复杂性的这一方面几乎没有受到太多关注。尽管空间有限的顺序计算是数据流计算领域的职责,但几乎所有有关数据流算法理论的文献都只考虑“经验问题”,其目的是计算数据的函数出现在流中,而不是推断出流的来源。我们的贡献是双重的。首先,我们提供了将空间效率与i.i.d序列的鲁棒统计估计相联系的结果。样品。健壮的统计数据在我们的环境中是一类特别有趣的统计数据,因为根据定义,它们可以抵抗采样数据中的噪声或错误。我们证明了该属性足以确保存在非常节省空间的流算法,以便对其进行估算。相反,“非稳健”统计量的数值会随其他样本的变化而急剧变化,这限制了任何有限长度样本序列的实用性。其次,我们提出一个总体结果,该结果在分布特性测试的背景下捕获了样本与空间复杂度之间的折衷。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号