Published in: Communications in Statistics. B, Simulation and Computation

A Modular CDF Approach for the Approximation of Percentiles

Abstract

This article describes a method for computing approximate statistics for large data sets, when exact computations may not be feasible. Such situations arise in applications such as climatology, data mining, and information retrieval (search engines). The key to our approach is a modular approximation to the cumulative distribution function (cdf) of the data. Approximate percentiles (as well as many other statistics) can be computed from this approximate cdf. This enables the reduction of a potentially overwhelming computational exercise into smaller, manageable modules. We illustrate the properties of this algorithm using a simulated data set. We also examine the approximation characteristics of the approximate percentiles, using a von Mises functional type approach. In particular, it is shown that the maximum error between the approximate cdf and the actual cdf of the data is never more than 1% (or any other preset level). We also show that under assumptions of underlying smoothness of the cdf, the approximation error is much lower in an expected sense. Finally, we derive bounds for the approximation error of the percentiles themselves. Simulation experiments show that these bounds can be quite tight in certain circumstances.
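The abstract does not spell out how the modular cdf is built, so the following is only a minimal sketch of one way the idea could work, not the authors' algorithm. It assumes each module (chunk of data) is summarized by its own quantiles on a 1% grid, and that the overall approximate cdf is the size-weighted average of the per-module piecewise-linear cdfs. The names `summarize_module`, `approx_cdf`, and `approx_percentile` are hypothetical.

```python
import numpy as np

def summarize_module(chunk, n_points=101):
    """Summarize one chunk by its quantiles on an even probability grid.

    With n_points=101 the grid step is 1% (an assumed preset level).
    """
    probs = np.linspace(0.0, 1.0, n_points)
    return np.quantile(chunk, probs), probs, len(chunk)

def approx_cdf(x, summaries):
    """Size-weighted average of the per-module piecewise-linear cdfs at x."""
    total = sum(n for _, _, n in summaries)
    return sum(
        (n / total) * np.interp(x, q, p, left=0.0, right=1.0)
        for q, p, n in summaries
    )

def approx_percentile(level, summaries, tol=1e-9):
    """Invert the approximate cdf by bisection; level is in [0, 1]."""
    lo = min(q[0] for q, _, _ in summaries)
    hi = max(q[-1] for q, _, _ in summaries)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if approx_cdf(mid, summaries) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Simulated data, processed one module at a time.
rng = np.random.default_rng(0)
data = rng.normal(size=100_000)
summaries = [summarize_module(chunk) for chunk in np.array_split(data, 10)]

# Compare against the exact empirical cdf on a grid of evaluation points.
grid = np.linspace(-4.0, 4.0, 801)
exact = np.searchsorted(np.sort(data), grid, side="right") / data.size
max_cdf_error = float(np.max(np.abs(approx_cdf(grid, summaries) - exact)))
approx_median = approx_percentile(0.5, summaries)
```

In this sketch only the 101 quantiles per module are retained, so the full data set never has to be sorted or held in memory at once. Because each per-module curve matches that module's empirical cdf at the grid quantiles (up to a term of order one over the module size), the weighted average should stay within roughly one grid step of the overall empirical cdf for data without heavy ties.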