Journal: Engineering Applications of Artificial Intelligence

MLP-based isolated phoneme classification using likelihood features extracted from reconstructed phase space


Abstract

Nonlinear properties of a complex signal can be represented in a reconstructed phase space (RPS). Researchers have previously developed RPS-based feature extraction approaches to capture these nonlinear properties. Typically, however, such approaches are more computationally demanding (longer run time) and less accurate than traditional techniques such as Mel-frequency cepstral coefficients (MFCCs), even though MFCCs fail to capture the nonlinear properties of signals. To overcome these issues, we propose a new RPS-based feature extraction approach that builds on a previously reported one. The proposed approach calculates the similarities between the embedded speech signals and a set of predefined speech attractor models in the RPS, and uses these similarities as input features for a final phonetic classifier. A set of Gaussian mixture models (GMMs) is trained to represent the attractors of all phonemes in the RPS. Using these GMMs, a feature vector of log-likelihoods is calculated for each embedded out-of-sample speech signal. An MLP-based classifier then estimates the posterior probabilities of the phoneme classes. To test the performance of the proposed approach, we apply it to a Persian speech corpus (FARSDAT). Results show an absolute improvement of 1.89% in classification accuracy over a baseline system that uses MFCC features. By combining classifiers that use the proposed RPS-based features with classifiers that use MFCC features, the combined system achieves its highest phoneme classification accuracy, 68.85%, an absolute improvement of 4.78% over the baseline system.
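The feature-extraction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a standard time-delay embedding with a hypothetical dimension `dim` and lag `lag`, and one GMM per phoneme with diagonal covariances whose parameters are already trained. The hand-set parameters in the usage block are placeholders, not models fit on FARSDAT. It also assumes that per-point log-likelihoods are averaged over the segment, since the abstract does not say how they are pooled. The names `embed`, `DiagonalGMM`, and `likelihood_features` are hypothetical.

```python
import math
from typing import List, Sequence


def embed(signal: Sequence[float], dim: int, lag: int) -> List[List[float]]:
    """Time-delay embedding into a reconstructed phase space.

    Row n is [x[n], x[n + lag], ..., x[n + (dim - 1) * lag]].
    """
    count = len(signal) - (dim - 1) * lag
    if dim < 1 or lag < 1 or count <= 0:
        raise ValueError("signal too short for the chosen dim/lag")
    return [[signal[n + k * lag] for k in range(dim)] for n in range(count)]


class DiagonalGMM:
    """Gaussian mixture with diagonal covariances (parameters assumed pre-trained)."""

    def __init__(self, weights, means, variances):
        self.weights = weights      # one mixing weight per component, summing to 1
        self.means = means          # one mean vector per component
        self.variances = variances  # one per-dimension variance vector per component

    def log_likelihood(self, point: Sequence[float]) -> float:
        """log sum_k w_k N(point; mu_k, diag(var_k)), computed with log-sum-exp."""
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_gauss = -0.5 * sum(
                math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
                for x, m, v in zip(point, mu, var)
            )
            terms.append(math.log(w) + log_gauss)
        top = max(terms)
        return top + math.log(sum(math.exp(t - top) for t in terms))


def likelihood_features(signal, models, dim, lag) -> List[float]:
    """One feature per phoneme model: the log-likelihood of the embedded segment.

    Assumption: per-point scores are mean-pooled; the abstract does not specify this.
    """
    points = embed(signal, dim, lag)
    return [
        sum(model.log_likelihood(p) for p in points) / len(points)
        for model in models
    ]


if __name__ == "__main__":
    # Two hypothetical 2-D phoneme attractor models with hand-set (untrained) parameters.
    model_a = DiagonalGMM([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
    model_b = DiagonalGMM([1.0], [[5.0, 5.0]], [[1.0, 1.0]])
    segment = [math.sin(0.3 * n) for n in range(50)]
    print(likelihood_features(segment, [model_a, model_b], dim=2, lag=3))
```

In the described system, this vector of log-likelihoods (one entry per phoneme model) is the input to an MLP that estimates the phoneme posteriors. The sketch omits that classifier and the EM training of the GMMs. Log-sum-exp keeps the mixture likelihood numerically stable when individual component densities underflow.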
