Pacific Asia Conference on Language, Information and Computation

A Generalized Framework for Hierarchical Word Sequence Language Model

Abstract

Language modeling is a fundamental research problem with wide application across many NLP tasks. For estimating the probabilities of natural language sentences, most research on language modeling uses n-gram based approaches to factorize sentence probabilities. However, the assumption underlying n-gram models is not robust enough to cope with the data sparseness problem, which degrades the final performance of language models. Here, Hierarchical Word Sequence (abbreviated as HWS) language models can be viewed as an effective alternative to the standard n-gram method. In this paper, we generalize HWS models into a framework in which different assumptions can be adopted to rearrange word sequences in a fully unsupervised fashion, which greatly increases the extensibility of HWS models. For evaluation, we compare our rearranged word sequences to conventional n-gram word sequences. Both intrinsic and extrinsic experiments verify that our framework achieves better performance, showing that our method can be considered a better alternative to n-gram language models.
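To illustrate the contrast the abstract draws, the Python sketch below compares a conventional bigram factorization with a hierarchical rearrangement of a word sequence. The frequency-based splitting heuristic (hws_tree) and the toy corpus are illustrative assumptions only, not the exact HWS construction defined in the paper.

    # Minimal sketch, assuming a frequency-based splitting heuristic:
    # conventional linear bigram conditioning vs. a hierarchical
    # rearrangement of the same word sequence.
    from collections import Counter

    corpus = [
        "the cat sat on the mat".split(),
        "the dog sat on the rug".split(),
    ]

    unigrams = Counter(w for sent in corpus for w in sent)
    bigrams = Counter(p for sent in corpus for p in zip(sent, sent[1:]))
    vocab_size = len(unigrams)

    def bigram_prob(prev, word):
        # Conventional n-gram view: condition each word on its linear
        # predecessor, with add-one smoothing against data sparseness.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    def hws_tree(words):
        # Hypothetical hierarchical rearrangement: split the sequence at its
        # most frequent word and recurse, so conditioning contexts follow
        # tree branches instead of strict left-to-right adjacency.
        if not words:
            return None
        pivot = max(range(len(words)), key=lambda i: unigrams[words[i]])
        return (words[pivot], hws_tree(words[:pivot]), hws_tree(words[pivot + 1:]))

    print(bigram_prob("the", "cat"))                    # linear bigram probability
    print(hws_tree("the cat sat on the mat".split()))   # rearranged tree structure

The point of the sketch is only that the hierarchical view yields different conditioning contexts from the linear n-gram view; the paper's framework explores which rearrangement assumptions work best.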
