Conference: International Conference on Language Resources and Evaluation

Comparing performance of Different Set-Covering Strategies for Linguistic Content Optimization in Speech Corpora

Abstract

Set-covering algorithms are efficient tools for optimal linguistic corpus reduction. The optimality of such a process is directly related to the descriptive features of the sentences in a reference corpus. This article experimentally examines the behaviour of three algorithms: a greedy approach and a Lagrangian-relaxation-based approach, giving importance to rare events, and a third approach that considers the Kullback-Leibler divergence between a reference distribution and the ongoing distribution of events. Analysis of the content of the reduced corpora shows that the first two approaches remain the most effective at compressing a corpus while guaranteeing a minimal content. The variant that minimises the Kullback-Leibler divergence guarantees, as expected, a distribution of events close to the reference distribution; however, the price of this solution is a much larger corpus. The experiments also evaluate a mixed approach that adds a random complement to the smallest coverings.
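To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each sentence is already described as a list of linguistic units (for example phoneme or diphone labels). The function names `greedy_cover` and `kl_divergence`, the inverse-frequency weighting used to favour rare units, and the smoothing constant `eps` are all assumptions introduced here for illustration.

```python
from collections import Counter
from math import log


def greedy_cover(sentences):
    """Greedy set cover: return indices of sentences that together
    contain every unit observed in the corpus.

    Assumption: uncovered units are weighted by the inverse of the number
    of sentences containing them, so rare units count more.
    """
    unit_sets = [set(s) for s in sentences]
    doc_freq = Counter(u for units in unit_sets for u in units)
    uncovered = set(doc_freq)
    chosen = []
    while uncovered:
        # Pick the sentence whose still-uncovered units carry the most weight.
        best = max(
            range(len(unit_sets)),
            key=lambda i: sum(1.0 / doc_freq[u] for u in unit_sets[i] & uncovered),
        )
        chosen.append(best)
        uncovered -= unit_sets[best]
    return chosen


def kl_divergence(reference, selected, eps=1e-9):
    """D_KL(P || Q), where P comes from the reference unit counts and Q
    from the unit counts of the selected sub-corpus.

    Assumption: Q is smoothed with a small eps so that units missing
    from the selection do not produce a division by zero.
    """
    p_total = sum(reference.values())
    q_total = sum(selected.values()) + eps * len(reference)
    divergence = 0.0
    for unit, count in reference.items():
        p = count / p_total
        q = (selected.get(unit, 0) + eps) / q_total
        divergence += p * log(p / q)
    return divergence


# Toy corpus: each sentence is a list of hypothetical unit labels.
corpus = [["a", "b"], ["b", "c"], ["a", "b", "c", "d"], ["e"]]
cover = greedy_cover(corpus)  # -> [2, 3]

reference_counts = Counter(u for s in corpus for u in s)
cover_counts = Counter(u for i in cover for u in corpus[i])
divergence = kl_divergence(reference_counts, cover_counts)
```

On this toy corpus the greedy cover keeps two of the four sentences and still contains every unit, but its unit distribution differs from the full corpus, so `divergence` is positive. This mirrors the trade-off the abstract reports: a small covering guarantees minimal content, whereas keeping the distribution close to the reference requires retaining more sentences.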
