INTERSPEECH 2012

Dynamic Conditional Random Fields for Joint Sentence Boundary and Punctuation Prediction



Abstract

The use of dynamic conditional random fields (DCRF) has been shown to outperform linear-chain conditional random fields (LCRF) for punctuation prediction on conversational speech texts [1]. In this paper, we combine lexical, prosodic, and modified n-gram score features into the DCRF framework for a joint sentence boundary and punctuation prediction task on TDT3 English broadcast news. We show that the joint prediction method outperforms the conventional two-stage method using LCRF or the maximum entropy model (MaxEnt). We show the importance of various features using DCRF, LCRF, MaxEnt, and the hidden-event n-gram model (HEN) respectively. In addition, we address the practical issue of feature explosion by introducing lexical pruning, which reduces model size and improves the F1-measure. We adopt incremental local training to overcome memory size limitations without incurring a significant performance penalty. Our results show that adding prosodic and n-gram score features gives about 20% relative error reduction in all cases. Overall, DCRF gives the best accuracy, followed by LCRF, MaxEnt, and HEN.
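The abstract casts boundary and punctuation prediction as a single joint sequence-labeling problem rather than two cascaded stages. As a rough illustration only (not the paper's DCRF, and with invented scores), the sketch below decodes joint punctuation tags with Viterbi, where a PERIOD tag doubles as a sentence boundary; the `<pause>` token standing in for a prosodic cue is likewise hypothetical.

```python
# Toy sketch of joint sentence-boundary and punctuation prediction as
# sequence labeling. NOT the paper's DCRF: a plain Viterbi decoder over
# joint tags, with hand-picked illustrative log-scores.

# Joint tags: punctuation event after each word; PERIOD also marks a
# sentence boundary, which is what makes the prediction "joint".
TAGS = ["NONE", "COMMA", "PERIOD"]

def viterbi(words, emit_score, trans_score):
    """Return the highest-scoring joint tag sequence for `words`."""
    n = len(words)
    best = [{t: emit_score(words[0], t) for t in TAGS}]  # best score per tag
    back = [{}]                                          # backpointers
    for i in range(1, n):
        best.append({})
        back.append({})
        for t in TAGS:
            prev_t, s = max(
                ((p, best[i - 1][p] + trans_score(p, t)) for p in TAGS),
                key=lambda x: x[1],
            )
            best[i][t] = s + emit_score(words[i], t)
            back[i][t] = prev_t
    # Backtrace from the best final tag.
    t = max(TAGS, key=lambda tag: best[-1][tag])
    tags = [t]
    for i in range(n - 1, 0, -1):
        t = back[i][t]
        tags.append(t)
    return list(reversed(tags))

# Toy scores standing in for the lexical / prosodic / n-gram features.
def emit_score(word, tag):
    if word == "<pause>":  # long pause: prosodic cue for a boundary
        return 0.0 if tag == "PERIOD" else -2.0
    return 0.0 if tag == "NONE" else -1.0

def trans_score(prev, cur):
    # Discourage two punctuation events in a row.
    return -1.0 if (prev != "NONE" and cur != "NONE") else 0.0

print(viterbi(["okay", "so", "<pause>", "we", "begin"],
              emit_score, trans_score))
# → ['NONE', 'NONE', 'PERIOD', 'NONE', 'NONE']
```

In the paper, the single hand-tuned score above is replaced by learned feature weights over lexical, prosodic, and n-gram score features, and the DCRF additionally factors the label into coupled boundary and punctuation chains instead of one flat joint tag set.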


