VOICE ACTIVITY DETECTION METHOD BASED ON STATISTICAL MODEL EMPLOYING DEEP NEURAL NETWORK AND VOICE ACTIVITY DETECTION DEVICE PERFORMING THE SAME

Abstract

The present invention relates to a statistical model-based voice activity detection method using a deep neural network, and to a voice activity detection device that performs the method. More specifically, the method includes: (1) extracting a feature vector consisting of the a priori signal-to-noise ratio (SNR), the a posteriori SNR, and the likelihood ratio (LR), computed from the variance values of the input speech signal; (2) in a learning step, pre-training a deep neural network having a plurality of nonlinear hidden layers by initializing its weights and biases using the extracted feature vectors; (3) in the learning step, optimizing the deep neural network with a gradient-descent-based backpropagation algorithm, using the extracted feature vectors and labels marking the presence or absence of speech; and (4) in a classification step, classifying the input speech signal into a speech section or a noise section with a decision function applied to the output that the trained deep neural network produces for a feature vector obtained by the same feature extraction method.
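Step (1) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption rather than the patented procedure: the decision-directed a priori SNR estimate, the Gaussian-model log likelihood ratio, the function name `extract_features`, and the parameters `alpha` and `eps` are all chosen here for illustration, following formulations commonly used in statistical model-based voice activity detection.

```python
import numpy as np

def extract_features(power_spec, noise_var, alpha=0.98, eps=1e-10):
    """Per-frame features: a priori SNR, a posteriori SNR, log likelihood ratio.

    power_spec: (frames, bins) noisy-speech power spectrum |X|^2
    noise_var:  (bins,) noise variance estimate (assumed known here)
    """
    n_frames, n_bins = power_spec.shape
    feats = np.zeros((n_frames, 3 * n_bins))
    prev_clean = np.zeros(n_bins)  # clean-speech power estimate of the previous frame
    for t in range(n_frames):
        # A posteriori SNR: noisy power over noise variance.
        gamma = power_spec[t] / (noise_var + eps)
        # A priori SNR via a decision-directed estimate (assumption, not from the source).
        xi = alpha * prev_clean / (noise_var + eps) \
            + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
        xi = np.maximum(xi, eps)
        # Log likelihood ratio under a complex Gaussian model (assumption).
        log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)
        feats[t] = np.concatenate([xi, gamma, log_lr])
        # Wiener-gain estimate of the clean power, reused for the next frame.
        gain = xi / (1.0 + xi)
        prev_clean = gain ** 2 * power_spec[t]
    return feats

# Toy input: one frame at the noise level followed by one much louder frame.
noise_var = np.ones(4)
power_spec = np.array([[1.0] * 4, [100.0] * 4])
features = extract_features(power_spec, noise_var)
```

In this toy input the louder frame receives a larger log likelihood ratio than the frame at the noise level, which is the behaviour a downstream classifier would rely on.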
According to the statistical model-based voice activity detection method and the device performing it proposed in the present invention, a speech signal contaminated by ambient noise is input in the learning step, and a feature vector is extracted from its variance values using the a priori signal-to-noise ratio (SNR), the a posteriori SNR, and the likelihood ratio (LR). The extracted feature vectors are used to initialize the weights and biases of a deep neural network having a plurality of nonlinear hidden layers, thereby pre-training it, and the network is then optimized with a gradient-descent-based backpropagation algorithm using the extracted feature vectors and the speech presence/absence labels. In the classification step, the input speech signal is classified into a speech section or a noise section by a decision function applied to the output that the trained deep neural network produces for the extracted feature vector. Compared with not doing so, this models the distribution of the likelihood ratio more effectively, improves voice detection performance, and reduces computation time.
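The learning and classification steps can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the patented implementation: random initialization stands in for the pre-training step, the layer sizes, learning rate, and 0.5 threshold are arbitrary, the training data are synthetic, and the names `init_network`, `train`, and `classify` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_network(layer_sizes, rng):
    # Random initialization replaces the pre-training step here (assumption).
    return [(rng.normal(0.0, 0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    # Returns the activations of every layer, input included.
    acts = [x]
    for w, b in params:
        acts.append(sigmoid(acts[-1] @ w + b))
    return acts

def train(params, x, y, lr=1.0, epochs=1000):
    # Full-batch gradient descent with backpropagation on cross-entropy loss.
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        acts = forward(params, x)
        # Gradient of the loss with respect to the output pre-activation.
        delta = (acts[-1] - y) / len(x)
        for i in reversed(range(len(params))):
            w, b = params[i]
            grad_w = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Propagate the error through the sigmoid of the layer below.
                delta = (delta @ w.T) * acts[i] * (1.0 - acts[i])
            params[i] = (w - lr * grad_w, b - lr * grad_b)
    return params

def classify(params, x, threshold=0.5):
    # Decision function: 1 = speech, 0 = noise (threshold is an assumption).
    return (forward(params, x)[-1][:, 0] > threshold).astype(int)

# Synthetic stand-in features: "noise" frames around -1, "speech" frames around +1.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-1.0, 0.5, (100, 3)), rng.normal(1.0, 0.5, (100, 3))])
y = np.concatenate([np.zeros(100), np.ones(100)])
params = train(init_network([3, 8, 8, 1], rng), x, y)
accuracy = float((classify(params, x) == y).mean())
```

In practice the input would be the a priori SNR, a posteriori SNR, and LR features extracted from noisy speech, and the labels would come from annotated speech presence/absence rather than from synthetic clusters.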

Bibliographic data

Publication number: KR101640188B1

Patent type:
Publication date: 2016-07-15

Original format: PDF
Applicant/patentee: 서울대학교산학협력단; 한양대학교 산학협력단

Application number: KR20140182736
Inventors: 장준혁; 황인영; 김남수

Filing date: 2014-12-17
Classification: G10L25/78; G10L25/30
Country: KR
Date added to database: 2022-08-21 14:12:14
