Published in: Computers and the Humanities

Analyzing and identifying multiword expressions in spoken language



Abstract

The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and to compare them with those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even whole syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs as for the other utterances, indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure nor frequency of co-occurrence alone is suitable for identifying MWEs. The best results are obtained with a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring "islands of pronunciation reduction" that contain (potential) MWEs can be identified in a large speech corpus.
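The abstract does not give the exact reduction measures or the frequency weighting, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each corpus token of a candidate word sequence comes with a canonical (dictionary) phone string and a realized (transcribed) phone string, measures reduction as the proportion of canonical phones deleted on a minimal Levenshtein alignment, and ranks candidates by mean reduction multiplied by log(1 + frequency). The function names `deletion_ratio` and `rank_candidates` and the log weighting are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def deletion_ratio(canonical, realized):
    """Share of canonical phones deleted on a minimal-cost Levenshtein alignment.

    Hypothetical reduction measure: ties in edit cost are broken by
    preferring the path with fewer deletions.
    """
    n, m = len(canonical), len(realized)
    if n == 0:
        return 0.0
    # cost[i][j]: minimal edit cost aligning canonical[:i] with realized[:j]
    # dels[i][j]: number of deletions on the chosen minimal path
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    dels = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], dels[i][0] = i, i  # every canonical phone deleted
    for j in range(1, m + 1):
        cost[0][j], dels[0][j] = j, 0  # every realized phone inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if canonical[i - 1] == realized[j - 1] else 1
            cost[i][j], dels[i][j] = min(
                (cost[i - 1][j - 1] + sub, dels[i - 1][j - 1]),  # match / substitution
                (cost[i - 1][j] + 1, dels[i - 1][j] + 1),        # deletion
                (cost[i][j - 1] + 1, dels[i][j - 1]),            # insertion
            )
    return dels[n][m] / n


def rank_candidates(tokens):
    """Rank candidate word sequences by mean reduction * log(1 + frequency).

    tokens: iterable of (sequence_id, canonical_phones, realized_phones).
    The log-frequency weighting is an assumption, not the paper's metric.
    """
    reductions = defaultdict(list)
    for seq, canon, real in tokens:
        reductions[seq].append(deletion_ratio(canon, real))
    scores = {
        seq: (sum(vals) / len(vals)) * math.log1p(len(vals))
        for seq, vals in reductions.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, a token whose canonical phones `a b c d e f` are realized as `a c f` has a deletion ratio of 0.5, so a frequent, heavily reduced sequence rises to the top of the ranking, while an unreduced one scores zero however often it occurs. This mirrors the abstract's point that reduction and frequency are informative only in combination.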