Representation, classification and information fusion for robust and efficient multimodal human states recognition.

Abstract

The goal of this work is to enhance the robustness and efficiency of the multimodal human states recognition task. Human states recognition can be considered a joint term for identifying or verifying various kinds of human-related states, such as biometric identity, language spoken, age, gender, emotion, intoxication level, physical activity, vocal tract patterns, ECG QT intervals and so on. I performed research on the aforementioned state recognition problems, and my focus is to increase performance while reducing the computational cost.

I start by extending the well-known total variability i-vector modeling (a factor analysis on the concatenated GMM mean supervectors) to the simplified supervised i-vector modeling to enhance robustness and efficiency. First, by concatenating the label vector and the linear classifier matrix at the end of the mean supervector and the i-vector factor loading matrix, respectively, the traditional i-vectors are extended to label-regularized supervised i-vectors. These supervised i-vectors are optimized not only to reconstruct the mean supervectors well but also to minimize the mean square error between the original and the reconstructed label vectors, which makes them more discriminative with respect to the regularized label information. Second, I perform the factor analysis (FA) on the pre-normalized GMM first-order statistics supervector to ensure that each Gaussian component's statistics sub-vector is treated equally in the FA, which reduces the computational cost by a factor of 25.

Inspired by the recent success of sparse representation in face recognition, I explored the possibility of adopting sparse representation for both representation and classification in this multimodal human states recognition problem. First, for classification, a sparse representation computed by l1-minimization (to approximate the l0-minimization) with quadratic constraints was proposed to replace the SVM on the GMM mean supervectors, and fusing the sparse representation based classification (SRC) method with the SVM improved the overall system performance. Second, by adding a redundant identity matrix at the end of the original over-complete dictionary, the sparse representation is made more robust to variability and noise. Third, both the l1 norm ratio and the background normalized (BNorm) l2 residual ratio are used and shown to outperform the conventional l2 residual ratio in the verification task.

I also present an automatic speaker affective state recognition approach that models the factor vectors in the latent factor analysis framework, improving upon the Gaussian Mixture Model (GMM) baseline performance. I consider the affective speech signal as the original normal average speech signal corrupted by affective channel effects. Rather than reducing the channel variability to enhance robustness, as in the speaker verification task, I directly model the speaker state on the channel factors under the factor analysis framework. Experimental results show that the proposed speaker state factor vector modeling system improved on the GMM baseline in unweighted accuracy on the intoxicated speech detection task and in weighted accuracy on the emotion recognition task.

To summarize the methods for representation, I propose a general optimization framework. The aforementioned methods, such as traditional factor analysis, i-vector, supervised i-vector, simplified i-vector and s-vectors, are all special cases of this general optimization problem. In the future, I plan to investigate other kinds of distance measures, cost functions and constraints in this unified general optimization problem. (Abstract shortened by UMI.)
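
The sparse-representation classification idea described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: it assumes the unconstrained Lagrangian form of the l1 problem (solved here by iterative soft thresholding) in place of the quadratically constrained l1-minimization the abstract mentions, and it works on toy vectors rather than GMM mean supervectors. The function names `soft_threshold`, `ista_l1` and `src_classify`, and the parameters `lam` and `n_iter`, are hypothetical. The dictionary is extended with identity columns, and the per-class l2 residual and l1 norm ratio are computed from the recovered coefficients.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def ista_l1(columns, y, lam=0.05, n_iter=500):
    """Approximately minimise 0.5 * ||y - B x||^2 + lam * ||x||_1.

    `columns` lists the columns of B. This Lagrangian form is an assumed
    stand-in for the quadratically constrained l1 problem in the abstract.
    """
    dim, n_cols = len(y), len(columns)
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz
    # constant, so 1 / lipschitz is a safe (if conservative) step size.
    lipschitz = sum(b * b for col in columns for b in col)
    x = [0.0] * n_cols
    for _ in range(n_iter):
        # Residual r = B x - y.
        r = [sum(columns[j][i] * x[j] for j in range(n_cols)) - y[i]
             for i in range(dim)]
        # Gradient step on the quadratic term, then shrinkage.
        x = [soft_threshold(
                 x[j] - sum(columns[j][i] * r[i] for i in range(dim)) / lipschitz,
                 lam / lipschitz)
             for j in range(n_cols)]
    return x


def src_classify(atoms, labels, y, lam=0.05):
    """Classify y by the smallest class-wise l2 residual.

    Identity columns are appended to the dictionary so that noise can be
    absorbed by a separate error term, as the abstract describes.
    """
    dim = len(y)
    identity = [[1.0 if i == k else 0.0 for i in range(dim)] for k in range(dim)]
    x = ista_l1(atoms + identity, y, lam)
    coef, noise = x[:len(atoms)], x[len(atoms):]
    total_l1 = sum(abs(a) for a in coef)
    residuals, l1_ratio = {}, {}
    for c in set(labels):
        # Reconstruct y from class-c atoms plus the shared noise term.
        recon = list(noise)
        for col, lab, a in zip(atoms, labels, coef):
            if lab == c:
                recon = [ri + a * ci for ri, ci in zip(recon, col)]
        residuals[c] = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
        class_l1 = sum(abs(a) for a, lab in zip(coef, labels) if lab == c)
        l1_ratio[c] = class_l1 / total_l1 if total_l1 > 0 else 0.0
    return min(residuals, key=residuals.get), residuals, l1_ratio


def unit(v):
    """Scale a vector to unit l2 norm."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


# Toy dictionary: two exemplars per class, each with unit norm.
atoms = [unit([1, 1, 0, 0]), unit([1, 0.8, 0, 0]),
         unit([0, 0, 1, 1]), unit([0, 0, 0.8, 1])]
labels = ["a", "a", "b", "b"]
predicted, residuals, l1_ratio = src_classify(atoms, labels, unit([1, 0.9, 0, 0.05]))
print(predicted, residuals, l1_ratio)
```

In a verification setting, the l1 norm ratio of the claimed class (or a residual ratio) would be compared against a threshold rather than taking the minimum over classes; the background normalization (BNorm) step named in the abstract is not sketched here.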

Bibliographic record

  • Author

    Li, Ming

  • Author affiliation

    University of Southern California

  • Degree-granting institution: University of Southern California
  • Subject: Engineering, Electronics and Electrical; Information Science; Computer Science
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 166 p.
  • Total pages: 166
  • Original format: PDF
  • Language of text: eng