Adaptive Stratified Sampling for Precision-Recall Estimation


Abstract

We propose a new algorithm for computing a constant-factor approximation of precision-recall (PR) curves for massive noisy datasets produced by generative models. Assessing the validity of items in such datasets requires human annotation, which is costly and must be minimized. Our algorithm, ADASTRAT, is the first data-aware method for this task. It adaptively chooses the next point to query on the PR curve, based on previous observations, and then selects specific items to annotate using stratified sampling. Under a mild monotonicity assumption, ADASTRAT outputs a guaranteed approximation of the underlying precision function, while using a number of annotations that scales very slowly with N, the dataset size. For example, when the minimum precision is bounded by a constant, it issues only log log N precision queries. In general, it has a regret of no more than log log N with respect to an oracle that issues queries at data-dependent (unknown) optimal points. On a scaled-up NLP dataset of 3.5M items, ADASTRAT achieves a remarkably close approximation of the true precision function using only 18 precision queries, 13× fewer than the best previous approaches.
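To make the notion of a single "precision query" concrete, the sketch below estimates the precision of the top k items of a ranked dataset by stratified sampling. It is not ADASTRAT: the abstract does not say how strata are formed or how the next query point is chosen, so the equal-width strata over ranks, the function name `estimate_precision_at_k`, the `annotate` callback (a stand-in for a human annotator), and all default parameters are assumptions made purely for illustration.

```python
import random


def estimate_precision_at_k(k, annotate, num_strata=10, per_stratum=20, seed=0):
    """Estimate the fraction of valid items among ranks 0..k-1.

    Assumptions (not taken from the paper): items are identified by rank,
    strata are contiguous rank ranges of near-equal width, and
    annotate(rank) returns True when the item at that rank is valid.
    Returns (precision_estimate, number_of_annotations_used).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    num_strata = min(num_strata, k)
    # Integer boundaries that split [0, k) into non-empty contiguous strata.
    bounds = [i * k // num_strata for i in range(num_strata + 1)]

    estimate = 0.0
    annotations = 0
    for lo, hi in zip(bounds, bounds[1:]):
        size = hi - lo
        # Sample without replacement inside the stratum.
        sample = rng.sample(range(lo, hi), min(per_stratum, size))
        valid = sum(1 for rank in sample if annotate(rank))
        # Weight each stratum's sample mean by its share of the top k.
        estimate += (size / k) * (valid / len(sample))
        annotations += len(sample)
    return estimate, annotations


if __name__ == "__main__":
    # Hypothetical annotator: every fourth rank is an invalid item.
    def toy_annotate(rank):
        return rank % 4 != 0

    est, used = estimate_precision_at_k(10_000, toy_annotate)
    print(f"estimated precision: {est:.3f} using {used} annotations")
```

Each call spends at most `num_strata * per_stratum` annotations regardless of k, which is why the number of such queries is the quantity worth minimizing. An adaptive method in the spirit of the abstract would decide where to place the next k based on the estimates already obtained.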


Bibliographic details

Source: Conference on Uncertainty in Artificial Intelligence | 2018 | pp. 540-1072 | 10 pages
Authors: Ashish Sabharwal; Yexiang Xue
Original format: PDF
Chinese Library Classification: TP18-53

