Venue: International Conference on Speech and Computer

LipsID Using 3D Convolutional Neural Networks


Abstract

This paper proposes a method, inspired by iVectors, for improving visual speech recognition in a way similar to how iVectors are used to improve the recognition rate of audio speech recognition. It presents a neural network for feature extraction, together with its training parameters and evaluation. The network is trained as a classifier for a closed set of 64 speakers from the UWB-HSCAVC dataset; the last fully connected softmax layer is then removed to obtain a feature vector of size 256. The network takes sequences of 15 frames as input and outputs a softmax classification over 64 classes. The training data consists of approximately 20,000 sequences of grayscale images from the first 50 sentences, which are common to every speaker. The network is then evaluated on 60,000 sequences created from 150 sentences from each speaker. The testing sentences are different for each speaker.
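
The idea of training a speaker classifier and then dropping its softmax layer to obtain an embedding can be illustrated with a minimal sketch. Only the 15-frame input, the 256-dimensional feature vector, and the 64 speaker classes come from the abstract. The 8x8 frame size, the four 3x3x3 kernels, the single pooling step, the random untrained weights, and all function names are assumptions made for illustration, and this is not the authors' architecture.

```python
import math
import random

FRAMES, HEIGHT, WIDTH = 15, 8, 8    # 15 frames is from the abstract; 8x8 is a toy size (assumption)
N_KERNELS = 4                       # hypothetical number of 3D filters
FEATURE_DIM, N_SPEAKERS = 256, 64   # both values are from the abstract

rng = random.Random(0)              # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Random, untrained weights standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv3d_mean(volume, kernel):
    """Valid 3D cross-correlation with one kernel, ReLU, then global average pooling."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    total, count = 0.0, 0
    for t in range(len(volume) - kt + 1):
        for y in range(len(volume[0]) - kh + 1):
            for x in range(len(volume[0][0]) - kw + 1):
                s = sum(volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                        for dt in range(kt) for dy in range(kh) for dx in range(kw))
                total += max(s, 0.0)
                count += 1
    return total / count


def dense(vec, weights):
    """Fully connected layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def softmax(logits):
    m = max(logits)                 # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


kernels = [[[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(3)]
           for _ in range(N_KERNELS)]
w_feature = rand_matrix(FEATURE_DIM, N_KERNELS)   # pooled conv outputs -> 256 features
w_softmax = rand_matrix(N_SPEAKERS, FEATURE_DIM)  # 256 features -> 64 speaker logits


def embed(sequence):
    """Network with the softmax layer removed: returns the 256-dim feature vector."""
    pooled = [conv3d_mean(sequence, k) for k in kernels]
    return [max(v, 0.0) for v in dense(pooled, w_feature)]


def classify(sequence):
    """Full classifier used during training: softmax over the 64 speakers."""
    return softmax(dense(embed(sequence), w_softmax))


# One synthetic grayscale sequence: 15 frames of 8x8 pixel intensities in [0, 1).
sequence = [[[rng.random() for _ in range(WIDTH)] for _ in range(HEIGHT)]
            for _ in range(FRAMES)]
features = embed(sequence)
probs = classify(sequence)
print(len(features), len(probs))
```

In this sketch `classify` corresponds to the network as trained on the closed speaker set, while `embed` corresponds to the same network after the last layer is removed, so both share every weight except the final softmax projection.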
