A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging

Abstract

Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and using a tagging dictionary.
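The contrast between estimating a single set of parameters and integrating over all parameter values can be illustrated on one multinomial distribution, such as the words emitted by a single tag. The sketch below is not the paper's tagger. It assumes a symmetric Dirichlet prior with hyperparameter `beta`, for which integrating out the multinomial parameters gives the closed-form predictive probability (n_k + beta) / (N + K * beta). The function names, the toy counts, and the four-word vocabulary are hypothetical.

```python
from collections import Counter


def mle_prob(counts, item):
    """Maximum-likelihood estimate: relative frequency of `item`."""
    total = sum(counts.values())
    return counts[item] / total if total else 0.0


def dirichlet_predictive(counts, item, num_outcomes, beta):
    """Predictive probability of `item` with the multinomial parameters
    integrated out under a symmetric Dirichlet(beta) prior (an assumption
    made for this sketch)."""
    total = sum(counts.values())
    return (counts[item] + beta) / (total + num_outcomes * beta)


# Hypothetical counts of words emitted by one tag.
counts = Counter({"the": 8, "a": 2})
vocab = ["the", "a", "dog", "runs"]  # assumed toy vocabulary

for word in vocab:
    print(word,
          mle_prob(counts, word),
          dirichlet_predictive(counts, word, len(vocab), beta=0.1))
```

With `beta` below 1 the prior favors sparse distributions: unseen words receive a small but nonzero probability, while most of the mass stays on the few words already observed. The maximum-likelihood estimate instead assigns unseen words a probability of exactly zero.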

Bibliographic details

Source
Annual Meeting of the Association for Computational Linguistics | 2007 | 8 pages
Authors
Sharon Goldwater; Thomas L. Griffiths
Original format
PDF
Chinese Library Classification
Programming languages, algorithmic languages

Similar literature

1. Adaptive Bayesian HMM for Fully Unsupervised Chinese Part-of-Speech Induction [J]. Lidan Zhang, Kwop-Ping Chan. ACM Transactions on Asian Language Information Processing. 2012, No. 3
2. Effect of Data Imbalance on Unsupervised Domain Adaptation of Part-of-Speech Tagging and Pivot Selection Strategies [J]. Xia Cui, Frans Coenen, Danushka Bollegala. JMLR: Workshop and Conference Proceedings. 2017, No. 1
3. Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches [J]. Barzilay R., Eisenstein J., Naseem T. The Journal of Artificial Intelligence Research. 2009, No. 5
4. A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging [C]. Sharon Goldwater, Thomas L. Griffiths. Association for Computational Linguistics Annual Meeting; 23-30 June 2007; Prague (CZ). 2007
5. IITagger: Tagging Wall Street Journal text with part-of-speech information [D]. Kim, Yeongkwun. 1996
6. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text [O]. Ying Xiong, Zhongmin Wang, Dehuan Jiang. 2019
7. A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages [O]. Scherrer Yves, Sagot Benoît. 2014
