Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances in Imbalanced Datasets

Abstract

Crowdsourced data annotation is noisier than annotation from trained workers. Previous work has shown that redundant annotations can eliminate the agreement gap between crowdsource workers and trained workers. Redundant annotation is usually non-problematic because individual crowdsource judgments are inconsequentially cheap in a class-balanced dataset. However, redundant annotation on class-imbalanced datasets requires many more labels per instance. In this paper, using three class-imbalanced corpora, we show that annotation redundancy for noise reduction is very expensive on a class-imbalanced dataset, and should be discarded for instances receiving a single common-class label. We also show that this simple technique produces annotations at approximately the same cost as a metadata-trained, supervised cascading machine classifier, or about 70% cheaper than 5-vote majority-vote aggregation.
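The stopping rule described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (`annotate_with_early_stop`, `replay`), the class names `"common"` and `"rare"`, and the choice to apply the check to the first label only are all assumptions made for the example, and the vote lists are invented toy data rather than results from the paper.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def majority_vote(labels: Iterable[str]) -> str:
    """Return the most frequent label (no ties for binary labels and an odd count)."""
    return Counter(labels).most_common(1)[0][0]


def annotate_with_early_stop(
    get_label: Callable[[], str],
    common_class: str,
    max_votes: int = 5,
) -> Tuple[str, int]:
    """Buy labels for one instance and return (final_label, labels_bought).

    Assumption: if the first label is the common class, accept it and stop.
    Otherwise fall back to redundant annotation with majority voting.
    """
    first = get_label()
    if first == common_class:
        return first, 1  # no redundancy spent on a likely common-class instance
    labels = [first]
    while len(labels) < max_votes:
        labels.append(get_label())
    return majority_vote(labels), len(labels)


def replay(votes):
    """Hypothetical stand-in for a crowdsourcing request: replays canned votes."""
    it = iter(votes)
    return lambda: next(it)


# Toy data, invented for illustration only.
instances = [
    ["common", "common", "rare", "common", "common"],
    ["rare", "rare", "common", "rare", "rare"],
    ["common", "common", "common", "common", "common"],
]

total = 0
for votes in instances:
    label, cost = annotate_with_early_stop(replay(votes), "common")
    total += cost

# Plain 5-vote majority voting would buy 5 labels for every instance.
print(total, "labels instead of", 5 * len(instances))
```

On this toy input the rule buys 7 labels where 5-vote aggregation would buy 15, because only the instance whose first label is rare receives redundant annotation. Actual savings depend on the class distribution and on annotator accuracy.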

Bibliographic record

Source: 2015, pp. 244-253 (10 pages)
Authors: Emily K. Jamison; Iryna Gurevych
Original format: PDF

Similar literature

1. Finding a Needle in a QT Interval Big Data Haystack: The Role for Orthogonal Datasets [J]. Roden Dan M., Mosley Jonathan D., Denny Joshua C. Journal of the American College of Cardiology. 2016, No. 16
2. Under-sampling class imbalanced datasets by combining clustering analysis and instance selection [J]. Tsai Chih-Fong, Lin Wei-Chao, Hu Ya-Han. Information Sciences: An International Journal. 2019
3. An Improved Instance Based K-Nearest Neighbor (IIBK) Classification of Imbalanced Datasets with Enhanced Preprocessing [J]. Yun Cao. Advances in applied computational mechanics. 2018, No. 2
4. Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances in Imbalanced Datasets [C]. Emily K. Jamison, Iryna Gurevych. Pacific Asia Conference on Language, Information and Computation. 2015
5. Searching for Needles in the Cosmic Haystack [D]. Devine, Thomas Ryan. 2020
6. Finding a Needle in the Haystack: The Costs and Cost-Effectiveness of Syphilis Diagnosis and Treatment during Pregnancy to Prevent Congenital Syphilis in Kalomo District of Zambia [O]. Bruce A. Larson, Deophine Lembela-Bwalya, Rachael Bonawitz
7. Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances in Imbalanced Datasets [O]. Jamison Emily K., Gurevych Iryna. 2014
