Mathematical Problems in Engineering: Theory, Methods and Applications

A Multimodal Music Emotion Classification Method Based on Multifeature Combined Network Classifier

Abstract

To address the shortcomings of single-network classification models, this paper applies a combined CNN-LSTM (convolutional neural network and long short-term memory) network to music emotion classification and proposes a multifeature combined network classifier based on CNN-LSTM, which combines 2D (two-dimensional) feature input processed by the CNN-LSTM with 1D (one-dimensional) feature input processed by a DNN (deep neural network) to make up for the deficiencies of single-feature models. The model uses multiple convolution kernels in the CNN for 2D feature extraction and a BiLSTM (bidirectional LSTM) for sequence processing, and it produces the single-modal emotion classification outputs for audio and lyrics, respectively. For audio feature extraction, the music audio is finely segmented and the human voice is separated out to obtain pure background-sound clips, from which the spectrogram and LLDs (low-level descriptors) are extracted. For lyrics feature extraction, a chi-squared test vector and the word embeddings extracted by Word2vec are used, respectively, as the lyric feature representations. Combining the two types of heterogeneous features selected from audio and lyrics through the classification model improves classification performance. To fuse the emotional information of the two modalities, music audio and lyrics, this paper proposes a stacking-based multimodal ensemble learning method. Unlike existing feature-level and decision-level fusion methods, it avoids the information loss caused by direct dimensionality reduction: the original features are converted into label results before fusion, which effectively solves the problem of feature heterogeneity. Experiments on the Million Song Dataset show that the proposed multifeature combined network classifier reaches 68% classification accuracy on audio and 74% on lyrics, and the average multimodal classification accuracy reaches 78%, a significant improvement over the single-modal results.
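As a rough illustration of the multifeature combined network described above, the sketch below builds the 2D spectrogram branch (a CNN with multiple kernel sizes followed by a BiLSTM) and the 1D LLD branch (a plain DNN) in tf.keras and fuses them for classification. All input shapes, layer widths, kernel sizes, and the four-class emotion output are assumptions for illustration; the abstract does not give the paper's exact configuration.

```python
# Minimal sketch of the multifeature combined network classifier.
# Shapes, widths, kernel sizes, and NUM_CLASSES are assumed, not the
# paper's exact settings.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 4  # assumed number of emotion categories

# 2D branch: spectrogram -> CNN with multiple convolution kernels -> BiLSTM
spec_in = layers.Input(shape=(128, 128, 1), name="spectrogram")
convs = []
for k in (3, 5, 7):  # "multiple convolution kernels", per the abstract
    c = layers.Conv2D(32, (k, k), padding="same", activation="relu")(spec_in)
    c = layers.MaxPooling2D((4, 4))(c)
    convs.append(c)
x = layers.Concatenate()(convs)        # -> (32, 32, 96)
x = layers.Reshape((32, 32 * 96))(x)   # rows become a 32-step sequence
x = layers.Bidirectional(layers.LSTM(64))(x)

# 1D branch: LLD feature vector -> DNN
lld_in = layers.Input(shape=(260,), name="llds")  # assumed LLD dimension
y = layers.Dense(128, activation="relu")(lld_in)
y = layers.Dense(64, activation="relu")(y)

# Fuse both branches into one emotion classifier
z = layers.Concatenate()([x, y])
out = layers.Dense(NUM_CLASSES, activation="softmax")(z)

model = Model(inputs=[spec_in, lld_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The same two-branch layout can be trained once per modality, giving the separate audio and lyric single-modal outputs that the stacking stage then fuses.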
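The two lyric representations named in the abstract, a chi-squared test vector and Word2vec word embeddings, might look roughly like the following scikit-learn/gensim sketch; the toy corpus, the number of selected terms k, and the embedding size are illustrative assumptions.

```python
# Sketch of the two lyric feature representations: a chi-squared-selected
# bag-of-words vector and averaged Word2vec embeddings.
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

lyrics = ["love you tonight baby", "cold lonely rainy night",
          "dance all night long", "tears fall in the dark"]
labels = [1, 0, 1, 0]  # assumed binary emotion labels for the toy corpus

# Chi-squared test vector: keep the k terms most associated with the labels
bow = CountVectorizer().fit_transform(lyrics)
chi_features = SelectKBest(chi2, k=5).fit_transform(bow, labels)

# Word embeddings: average each song's Word2vec token vectors
tokens = [s.split() for s in lyrics]
w2v = Word2Vec(tokens, vector_size=100, window=5, min_count=1)
w2v_features = np.array(
    [np.mean([w2v.wv[t] for t in toks], axis=0) for toks in tokens])
```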

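The stacking-based fusion can likewise be sketched with scikit-learn: each modality's out-of-fold predicted label probabilities become the meta-features, so the heterogeneous audio and lyric features are never reduced into a common raw-feature space. The base and meta estimators below are illustrative stand-ins, not the paper's exact models.

```python
# Sketch of stacking-based multimodal fusion: raw features are converted
# into label probabilities per modality, then fused by a meta-learner.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

def stack_modalities(X_audio, X_lyrics, y):
    """Fuse audio and lyric features via stacked label probabilities."""
    audio_clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    lyric_clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)

    # Out-of-fold probabilities keep training labels from leaking into
    # the meta-learner's training set
    p_audio = cross_val_predict(audio_clf, X_audio, y, cv=5,
                                method="predict_proba")
    p_lyric = cross_val_predict(lyric_clf, X_lyrics, y, cv=5,
                                method="predict_proba")

    # Meta-features are label results, not raw features, so no direct
    # dimensionality reduction over heterogeneous inputs is needed
    meta_clf = LogisticRegression(max_iter=1000)
    meta_clf.fit(np.hstack([p_audio, p_lyric]), y)

    # Refit the base classifiers on all data for use at inference time
    audio_clf.fit(X_audio, y)
    lyric_clf.fit(X_lyrics, y)
    return audio_clf, lyric_clf, meta_clf
```

At inference time, a song's audio and lyric probability vectors are concatenated in the same order and passed to meta_clf.predict.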