9th International Conference on Language Resources and Evaluation

Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks


Abstract

In this paper, we present improvements to the TED-LIUM corpus, which we released in 2012. These enhancements fall into two categories. First, we describe how we filtered publicly available monolingual data and used it, with open-source tools, to estimate well-suited language models (LMs). Second, we describe the selection process we applied to new acoustic data from TED talks, which extends our previously released corpus. Finally, we report experiments related to these improvements.
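The abstract does not say which criterion was used to filter the monolingual data. One widely used option for picking in-domain LM training text is cross-entropy difference scoring (Moore-Lewis): score each candidate sentence s by H_in(s) - H_gen(s) and keep the lowest-scoring ones. The sketch below illustrates that technique under stated assumptions: it uses add-one-smoothed unigram models instead of the n-gram LMs a real system would use, and the helper names (`train_unigram`, `select`) and the toy sentences are hypothetical, not the authors' tools or data.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return a log-probability function for an add-one-smoothed unigram model."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot reserves mass for unseen words

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab_size))

    return logprob


def cross_entropy(logprob, sentence):
    """Average negative log-probability per word of a non-empty sentence."""
    toks = sentence.split()
    return -sum(logprob(t) for t in toks) / len(toks)


def select(candidates, in_domain, keep):
    """Keep the `keep` candidates with the lowest H_in(s) - H_gen(s)."""
    candidates = [s for s in candidates if s.split()]  # drop empty lines
    lp_in = train_unigram(in_domain)
    # Moore-Lewis trains the general model on a general-domain sample of similar
    # size to the in-domain data; for brevity this sketch uses the whole pool.
    lp_gen = train_unigram(candidates)

    def score(s):
        return cross_entropy(lp_in, s) - cross_entropy(lp_gen, s)

    return sorted(candidates, key=score)[:keep]


in_domain = [  # hypothetical stand-in for in-domain TED transcripts
    "so today i want to talk about climate",
    "thank you so much",
    "i want to show you something",
]
candidates = [  # hypothetical stand-in for the public monolingual pool
    "the stock index fell two percent",
    "i want to talk to you about",
    "quarterly earnings beat estimates",
    "thank you",
]
print(select(candidates, in_domain, keep=2))
# → ['i want to talk to you about', 'thank you']
```

Sentences that resemble talk transcripts score below zero and are kept, while financial-news sentences score above zero and are dropped. In practice the cut-off is usually tuned by measuring the perplexity of the resulting LM on held-out in-domain text.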
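The abstract also leaves the acoustic selection criterion unspecified. A common approach for talks that come with approximate transcripts is to decode the audio with an existing recognizer and keep only segments whose hypothesis agrees closely with the provided transcript. The sketch below assumes that approach: it computes word error rate (WER) with a word-level Levenshtein distance and keeps segments under a threshold. The `Segment` type, the `max_wer` threshold and the sample segments are hypothetical illustrations, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: str
    reference: str   # transcript distributed with the talk
    hypothesis: str  # first-pass ASR output for the same audio (hypothetical)


def word_errors(ref, hyp):
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cur[j] = min(prev[j] + 1,               # delete rw
                         cur[j - 1] + 1,            # insert hw
                         prev[j - 1] + (rw != hw))  # substitute or match
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """WER of hyp against ref (empty ref: 0.0 if hyp is also empty, else 1.0)."""
    n = len(ref.split())
    if n == 0:
        return 0.0 if not hyp.split() else 1.0
    return word_errors(ref, hyp) / n


def select_segments(segments, max_wer=0.1):
    """Return ids of segments whose hypothesis agrees closely with the reference."""
    return [s.seg_id for s in segments if wer(s.reference, s.hypothesis) <= max_wer]


segments = [
    Segment("seg_001", "thank you very much", "thank you very much"),
    Segment("seg_002", "we recorded the talk last year", "we recorded talk last year"),
    Segment("seg_003", "music and applause", "so"),
]
print(select_segments(segments, max_wer=0.2))  # → ['seg_001', 'seg_002']
```

A stricter threshold trades corpus size for transcript accuracy: with `max_wer=0.1` only `seg_001` survives, because `seg_002` has one deletion in six reference words (WER of about 0.17).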
