Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Early Gains Matter: A Case for Preferring Generative over Discriminative Crowdsourcing Models



Abstract

In modern practice, labeling a dataset often involves aggregating annotator judgments obtained from crowdsourcing. State-of-the-art aggregation is performed via inference on probabilistic models, some of which are data-aware, meaning that they leverage features of the data (e.g., words in a document) in addition to annotator judgments. Previous work largely prefers discriminatively trained conditional models. This paper demonstrates that a data-aware crowdsourcing model incorporating a generative multinomial data model enjoys a strong competitive advantage over its discriminative log-linear counterpart in the typical crowdsourcing setting. That is, the generative approach is better except when the annotators are highly accurate, in which case simple majority vote is often sufficient. Additionally, we present a novel mean-field variational inference algorithm for the generative model that significantly improves on the previously reported state of the art for that model. We validate our conclusions on six text classification datasets with both human-generated and synthetic annotations.
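For orientation, the simple majority-vote baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function name `majority_vote`, the input layout (item id mapped to a list of annotator labels), the tie-breaking rule, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate crowd labels by picking the most frequent label per item.

    `annotations` maps an item id to the list of labels given by annotators.
    Items with no labels are skipped. Ties are broken by taking the smallest
    label in sort order, an arbitrary choice made only for determinism.
    """
    result = {}
    for item, labels in annotations.items():
        if not labels:
            continue
        counts = Counter(labels)
        top = max(counts.values())
        result[item] = min(label for label, c in counts.items() if c == top)
    return result


# Hypothetical toy annotations, invented for illustration.
toy = {
    "doc1": ["sports", "sports", "politics"],
    "doc2": ["politics", "sports"],
    "doc3": [],
}
labels = majority_vote(toy)
```

Majority vote uses only the annotator judgments. The data-aware models discussed in the abstract additionally use features of the items themselves, such as the words in a document.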
