International Conference on Electronics and Nanotechnology

Deep Learning Approaches for Understanding Simple Speech Commands

Abstract

Automatic classification of sound commands is becoming increasingly important, especially for embedded and mobile devices. Many of these devices contain both microphones and cameras, and the manufacturers that develop and produce them would like to use the same methodology for sound and image classification tasks. This is possible if sound commands are represented as images, so that convolutional neural networks can classify both images and sounds. In this research, we tried several approaches to sound classification, which we applied in the TensorFlow Speech Recognition Challenge organized by the Google Brain team on the Kaggle platform. We compare different sound representations (raw waveform frames, spectrograms, mel-spectrograms, and MFCCs) and apply several 1D and 2D convolutional neural networks to find the best-performing combination. As a novel contribution, we developed and trained from scratch two 1D network architectures that are topologically similar to the 2D VGG and ResNet network families. When the sound signal is represented as a melgram (mel-spectrogram), these 1D networks perform comparably to the 2D networks. Our experiments identified suitable sound representations and corresponding convolutional neural network architectures. As a result, we achieved a classification accuracy of 91.8%, which placed us 8th among 1,315 teams in the challenge.
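To make the melgram representation and the 1D-network idea more concrete, here is a minimal NumPy sketch under stated assumptions. None of its settings come from the paper: it assumes 1-second clips sampled at 16 kHz, 25 ms Hann-windowed frames with a 10 ms hop, a 512-point FFT, 40 HTK-style mel bands, and natural-log compression. It computes a log-mel spectrogram of shape (n_mels, n_frames), then treats the mel bands as input channels for a hypothetical VGG-style 1D block (`vgg_block_1d`: two "same"-padded convolutions with ReLU, then max-pooling by 2 over time). The weights are random placeholders, and this is not the authors' architecture.

```python
import numpy as np

# Assumed settings (not from the paper): 16 kHz audio, 25 ms frames, 10 ms hop.

def hz_to_mel(hz):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters mapping n_fft//2 + 1 FFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 points equally spaced on the mel scale give each band's edges
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sample_rate=16000, frame_len=400, hop=160,
                        n_fft=512, n_mels=40, eps=1e-10):
    """Return a (n_mels, n_frames) log-mel spectrogram ("melgram")."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix slicing overlapping frames: shape (n_frames, frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    # Power spectrum per frame (zero-padded to n_fft): (n_frames, n_fft//2 + 1)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + eps).T

def conv1d_same(x, weights, bias):
    """Cross-correlation over time (as in DL frameworks).
    x: (C_in, T), weights: (C_out, C_in, K) with odd K, bias: (C_out,)."""
    c_out, _, k = weights.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))      # zero padding keeps length T
    out = np.empty((c_out, x.shape[1]))
    for i in range(x.shape[1]):
        window = xp[:, i:i + k]               # (C_in, K)
        out[:, i] = np.tensordot(weights, window, axes=([1, 2], [0, 1])) + bias
    return out

def vgg_block_1d(x, params):
    """Hypothetical VGG-style block: (conv -> ReLU) per layer, then max-pool by 2 over time."""
    for w, b in params:
        x = np.maximum(0.0, conv1d_same(x, w, b))
    t = x.shape[1] // 2 * 2                   # drop an odd trailing frame
    return x[:, :t].reshape(x.shape[0], t // 2, 2).max(axis=2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    clip = 0.5 * np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(sr)
    melgram = log_mel_spectrogram(clip, sample_rate=sr)
    print(melgram.shape)  # (40, 98)
    # The 40 mel bands act as input channels; random placeholder weights
    params = [(0.01 * rng.standard_normal((64, 40, 3)), np.zeros(64)),
              (0.01 * rng.standard_normal((64, 64, 3)), np.zeros(64))]
    print(vgg_block_1d(melgram, params).shape)  # (64, 49)
```

Treating the mel bands as channels means the 1D kernels slide along time only. A 2D VGG or ResNet would instead view the same (40, 98) array as a single-channel image and convolve over frequency and time together.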