Personal and Ubiquitous Computing
Supervised machine learning for audio emotion recognition Enhancing film sound design using audio features, regression models and artificial neural networks

Abstract

The field of Music Emotion Recognition has become an established research sub-domain of Music Information Retrieval. Less attention has been directed towards the counterpart domain of Audio Emotion Recognition, which focuses on detecting emotional stimuli arising from non-musical sound. By better understanding how sounds provoke emotional responses in an audience, it may be possible to enhance the work of sound designers. The work in this paper uses the International Affective Digitized Sounds (IADS) set. A total of 76 features are extracted from the sounds, spanning the time and frequency domains. The features are first subjected to an initial analysis, measuring the similarity between pairs of features with Pearson's r correlation coefficient, and are then used as inputs to a multiple regression model to determine their weighting and relative importance. The features are subsequently fed to two machine learning approaches, regression modelling and artificial neural networks, to assess their ability to predict the emotional dimensions of arousal and valence. It was found that a small number of strong correlations exist between the features, and that a greater number of features contribute significantly to the predictive power for emotional valence than for arousal. Shallow neural networks perform significantly better than a range of regression models: the best performing networks accounted for 64.4% of the variance in predicted arousal and 65.4% in the case of valence. These findings are a major improvement over results reported in the literature. Several extensions of this research are discussed, including work on improving the data sets as well as the modelling processes.
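The pipeline the abstract describes — pairwise Pearson correlation between features, multiple regression for feature weighting, and a shallow neural network predicting arousal/valence — can be sketched as below. This is a minimal illustration, not the authors' implementation: the feature matrix and arousal ratings here are synthetic stand-ins for the 76 IADS-derived features, and the hidden-layer size and train/test split are assumed.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the 76 time- and frequency-domain features
# extracted from each sound in the real study.
n_sounds, n_features = 160, 76
X = rng.normal(size=(n_sounds, n_features))
# Hypothetical arousal ratings: a noisy linear mix of the features.
arousal = X @ rng.normal(scale=0.1, size=n_features) \
          + rng.normal(scale=0.5, size=n_sounds)

# Step 1: pairwise Pearson's r between features, to flag redundancy.
r01, _ = pearsonr(X[:, 0], X[:, 1])

X_tr, X_te, y_tr, y_te = train_test_split(X, arousal, random_state=0)

# Step 2: multiple regression to estimate feature weights/importance.
lin = LinearRegression().fit(X_tr, y_tr)

# Step 3: a shallow (single hidden layer) neural network on the same task.
mlp = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000,
                   random_state=0).fit(X_tr, y_tr)

print(f"Pearson r, feature 0 vs 1: {r01:.3f}")
print(f"linear regression R^2:     {lin.score(X_te, y_te):.3f}")
print(f"shallow MLP R^2:           {mlp.score(X_te, y_te):.3f}")
```

In the paper the same comparison is run twice, once with arousal and once with valence as the target; the reported figures (R² of 0.644 for arousal, 0.654 for valence) correspond to the `score` values of the best shallow networks.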
