Journal: Expert Systems

Cascade convolutional neural network-long short-term memory recurrent neural networks for automatic tonal and nontonal preclassification-based Indian language identification



Abstract

This work presents an automatic tonal/nontonal preclassification-based Indian language identification (LID) system. Languages are first classified into tonal and nontonal categories, and individual languages are then identified within the respective category. The work proposes pitch Chroma and formant features for this task and investigates how Mel-frequency cepstral coefficients (MFCCs) complement them. It further explores block processing (BP), pitch synchronous analysis (PSA), and glottal closure region (GCR) based approaches to feature extraction, using syllables as the basic units. A cascade convolutional neural network (CNN)-long short-term memory (LSTM) model using syllable-level features has been developed. The National Institute of Technology Silchar language database (NITS-LD) and the OGI-Multilingual Telephone Speech Corpus (OGI-MLTS) are used for experimental validation. The proposed system, based on the score combination of cascade CNN-LSTM models of Chroma (extracted with the BP method), the first two formants, and MFCCs (both extracted with the GCR method), reports the highest accuracies. In the preclassification stage, the accuracies on NITS-LD are 91%, 87.3%, and 85.1% for 30 s, 10 s, and 3 s test data, respectively; on OGI-MLTS, the corresponding accuracies are 86.7%, 83.1%, and 80.6%. These amount to absolute improvements over the baseline system of 11.6%, 12.3%, and 13.9% on NITS-LD and 12.5%, 11.9%, and 12.6% on OGI-MLTS. The complete preclassification-based LID system improves on the baseline by 7.3%, 6.4%, and 7.4% on NITS-LD and by 6.1%, 6.7%, and 7.2% on OGI-MLTS for the three respective test conditions.
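The two-stage decision flow described in the abstract (score combination across feature-specific models, tonal/nontonal preclassification, then language identification within the chosen category) can be illustrated with a minimal sketch. This is not the authors' implementation: the equal-weight averaging rule, the function names `fuse_scores` and `two_stage_identify`, the toy score values, and the placeholder language labels are all assumptions made for illustration, and the per-model scores stand in for the outputs of the Chroma, formant, and MFCC CNN-LSTM models.

```python
import numpy as np


def fuse_scores(score_list, weights=None):
    """Combine per-model score matrices of shape (n_utterances, n_classes).

    Assumption: a weighted average is used as the score-combination rule;
    the abstract does not specify the rule the authors used.
    """
    scores = np.stack(score_list)  # (n_models, n_utterances, n_classes)
    if weights is None:
        weights = np.full(len(score_list), 1.0 / len(score_list))
    weights = np.asarray(weights, dtype=float)
    # Contract the model axis: result has shape (n_utterances, n_classes).
    return np.tensordot(weights, scores, axes=1)


def two_stage_identify(precls_scores, lang_scores_by_group, languages_by_group):
    """Stage 1 picks tonal vs. nontonal; stage 2 picks a language in that group."""
    group_names = ["tonal", "nontonal"]  # column order of precls_scores
    decisions = []
    for i, g_idx in enumerate(np.argmax(precls_scores, axis=1)):
        group = group_names[g_idx]
        lang_idx = int(np.argmax(lang_scores_by_group[group][i]))
        decisions.append((group, languages_by_group[group][lang_idx]))
    return decisions


# Toy stage-1 scores for two utterances from three hypothetical models.
chroma = np.array([[0.8, 0.2], [0.3, 0.7]])
formant = np.array([[0.6, 0.4], [0.4, 0.6]])
mfcc = np.array([[0.7, 0.3], [0.2, 0.8]])
fused = fuse_scores([chroma, formant, mfcc])

# Toy stage-2 scores; language labels are placeholders, not from the paper.
languages_by_group = {
    "tonal": ["tonal_lang_A", "tonal_lang_B"],
    "nontonal": ["nontonal_lang_A", "nontonal_lang_B", "nontonal_lang_C"],
}
lang_scores_by_group = {
    "tonal": np.array([[0.1, 0.9], [0.5, 0.5]]),
    "nontonal": np.array([[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]),
}
decisions = two_stage_identify(fused, lang_scores_by_group, languages_by_group)
print(decisions)
```

In this sketch an error in the first stage cannot be recovered in the second, which is why the preclassification accuracies reported above bound the performance of the full system.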
