Annual Meeting of the Association for Computational Linguistics

A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging

Abstract

Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and using a tagging dictionary.
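
The contrast between estimating a single parameter set and integrating over all parameter values can be illustrated on one multinomial distribution, such as the word distribution of a single tag. The sketch below is only an illustration under stated assumptions: the abstract does not name the prior, so a symmetric Dirichlet prior with hyperparameter `beta` is assumed here, and the function names and toy counts are hypothetical. Under that assumption, integrating out the parameters gives a closed-form predictive probability, and a small `beta` expresses a preference for sparse distributions.

```python
from collections import Counter


def mle_prob(counts, outcome):
    """Maximum-likelihood estimate: relative frequency of the outcome."""
    total = sum(counts.values())
    return counts[outcome] / total if total else 0.0


def bayes_predictive(counts, outcome, num_outcomes, beta):
    """Predictive probability with the multinomial parameters integrated out.

    Assumes a symmetric Dirichlet(beta) prior over `num_outcomes` outcomes
    (an assumption of this sketch, not stated in the abstract).
    """
    total = sum(counts.values())
    return (counts[outcome] + beta) / (total + num_outcomes * beta)


# Hypothetical toy counts: words observed with one tag.
counts = Counter({"the": 8, "a": 2})
vocab = ["the", "a", "dog", "runs"]

for beta in (1.0, 0.01):
    probs = {w: bayes_predictive(counts, w, len(vocab), beta) for w in vocab}
    print(f"beta={beta}: {probs}")

print("MLE:", {w: mle_prob(counts, w) for w in vocab})
```

With these counts, the maximum-likelihood estimate assigns zero probability to any unseen word, whereas the integrated estimate keeps a small nonzero mass for it. Lowering `beta` shrinks that mass and concentrates probability on the words already observed, which is one way a prior can favor sparse distributions.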