Early Gains Matter: A Case for Preferring Generative over Discriminative Crowdsourcing Models

Abstract

In modern practice, labeling a dataset often involves aggregating annotator judgments obtained from crowdsourcing. State-of-the-art aggregation is performed via inference on probabilistic models, some of which are data-aware, meaning that they leverage features of the data (e.g., words in a document) in addition to annotator judgments. Previous work largely prefers discriminatively trained conditional models. This paper demonstrates that a data-aware crowdsourcing model incorporating a generative multinomial data model enjoys a strong competitive advantage over its discriminative log-linear counterpart in the typical crowdsourcing setting. That is, the generative approach is better except when the annotators are highly accurate, in which case simple majority vote is often sufficient. Additionally, we present a novel mean-field variational inference algorithm for the generative model that significantly improves on the previously reported state of the art for that model. We validate our conclusions on six text classification datasets with both human-generated and synthetic annotations.
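The simple majority-vote baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the function name `majority_vote`, the input layout (a dict from item id to a list of annotator labels), and the alphabetical tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate crowd labels by simple majority vote.

    `annotations` maps an item id to the list of labels its annotators gave
    (hypothetical layout, chosen for this sketch). Ties are broken by taking
    the smallest label in sort order, which is an arbitrary assumption.
    """
    aggregated = {}
    for item_id, labels in annotations.items():
        if not labels:
            continue  # items with no annotations get no aggregated label
        counts = Counter(labels)
        top_count = max(counts.values())
        tied = [label for label, count in counts.items() if count == top_count]
        aggregated[item_id] = min(tied)
    return aggregated


example = {
    "doc1": ["pos", "pos", "neg"],
    "doc2": ["neg", "pos"],
    "doc3": ["neg"],
}
result = majority_vote(example)
```

Majority vote ignores the features of the data (for example, the words in a document). That is the information the data-aware models described in the abstract additionally exploit.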

Bibliographic details

Source: Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015, pp. 882-891 (10 pages)
Authors: Paul Felt; Eric Ringger; Kevin Seppi; Kevin Black; Robbie Haertel
Original format: PDF

