LipsID Using 3D Convolutional Neural Networks

Abstract

This paper proposes a method, inspired by iVectors, for improving visual speech recognition in the same way that iVectors improve the recognition rate of audio speech recognition. A neural network for feature extraction is presented together with its training parameters and evaluation. The network is trained as a classifier on a closed set of 64 speakers from the UWB-HSCAVC dataset; the final fully connected softmax layer is then removed to obtain a feature vector of size 256. The network takes sequences of 15 frames as input and, during training, outputs a softmax classification over the 64 classes. The training data consists of approximately 20,000 sequences of grayscale images taken from the first 50 sentences, which are common to every speaker. The network is then evaluated on 60,000 sequences created from 150 sentences per speaker; these test sentences differ for each speaker.
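The idea of training a speaker classifier and then dropping its softmax layer can be illustrated with a minimal sketch. This is not the authors' network: the 3D convolutional backbone is replaced by a single random linear layer with ReLU, the 8x8 frame size is a hypothetical placeholder (the abstract does not state the image size), and the names `SpeakerNet`, `features`, and `classify` are assumptions made for illustration. Only the sizes 15, 256, and 64 come from the abstract.

```python
import math
import random

NUM_FRAMES = 15       # frames per input sequence (from the abstract)
FEATURE_DIM = 256     # size of the extracted feature vector (from the abstract)
NUM_SPEAKERS = 64     # closed set of speakers (from the abstract)
FRAME_PIXELS = 8 * 8  # hypothetical tiny grayscale frame; real size is not stated


def make_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SpeakerNet:
    """Stand-in for the 3D CNN: a backbone followed by a softmax speaker head."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Assumption: one linear layer replaces the 3D convolutional stack.
        self.backbone = make_matrix(FEATURE_DIM, NUM_FRAMES * FRAME_PIXELS, rng)
        self.head = make_matrix(NUM_SPEAKERS, FEATURE_DIM, rng)

    def features(self, sequence):
        """Network with the softmax layer removed: returns the 256-d vector."""
        flat = [pixel for frame in sequence for pixel in frame]
        return [max(0.0, v) for v in matvec(self.backbone, flat)]  # ReLU

    def classify(self, sequence):
        """Full network as used in training: softmax over the 64 speakers."""
        return softmax(matvec(self.head, self.features(sequence)))


rng = random.Random(1)
sequence = [[rng.random() for _ in range(FRAME_PIXELS)] for _ in range(NUM_FRAMES)]
net = SpeakerNet()
probs = net.classify(sequence)      # used only while training the classifier
embedding = net.features(sequence)  # what remains once the head is dropped
print(len(probs), len(embedding))
```

In this sketch the same backbone serves both purposes: `classify` is what a training loop would optimize against speaker labels, while `features` is what a downstream visual speech recognizer would consume as a speaker-dependent vector.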

Bibliographic details

Source
International Conference on Speech and Computer, 2018, pp. 209-214 (6 pages)
Authors
Miroslav Hlavac; Ivan Gruber; Milos Zelezny; Alexey Karpov
Original format: PDF
Keywords
Visual speech; Neural network; 3D convolution; Deep features
