
A system for acoustic chord transcription and key extraction from audio using hidden Markov models trained on synthesized audio.


Abstract

Extracting high-level information about musical attributes such as melody, harmony, key, or rhythm from the raw waveform is a critical process in Music Information Retrieval (MIR) systems. Using one or more such features in a front end, one can efficiently and effectively search, retrieve, and navigate through a large collection of musical audio. Among these musical attributes, harmony is a key element in Western tonal music. Harmony can be characterized by a set of rules stating how simultaneously sounding (or inferred) tones create a single entity (commonly known as a chord), how the elements of adjacent chords interact melodically, and how sequences of chords relate to one another in a functional hierarchy. Patterns of chord changes over time allow for the delineation of structural features such as phrases, sections, and movements. In addition to structural segmentation, harmony often plays a crucial role in projecting emotion and mood. This dissertation focuses on two aspects of harmony: chord labeling and chord progressions in diatonic functional tonal music.

Recognizing musical chords from raw audio is a challenging task. This dissertation describes a system that accomplishes this goal using hidden Markov models. To avoid the enormously time-consuming and laborious process of manual annotation, which must be done in advance to provide ground truth to supervised learning models, symbolic data such as MIDI files are used to obtain a large amount of labeled training data. To this end, harmonic analysis is first performed on the noise-free symbolic data to obtain chord labels with precise time boundaries. In parallel, a sample-based synthesizer is used to create audio files from the same symbolic files.
The feature vectors extracted from the synthesized audio are in perfect alignment with the chord labels and are used to train the models.

Sufficient training data allows for key- or genre-specific models, where each model is trained on music of a specific key or genre to estimate key- or genre-dependent model parameters. In other words, music of a certain key or genre reveals characteristics of its own, reflected in its chord progressions, which result in unique model parameters represented by the transition probability matrix. To extract the key or identify the genre of a given observation sequence, the forward-backward or Baum-Welch algorithm is used to efficiently compute the likelihood of each model, and the model with the maximum likelihood yields the key or genre information. The Viterbi decoder is then applied to the corresponding model to extract the optimal state path in a maximum-likelihood sense, which is identical to the frame-level chord sequence.

The experimental results show that the proposed system not only yields chord recognition performance comparable to or better than that of previously published systems, but also provides additional key and/or genre information without requiring any other algorithms or feature sets for those tasks. It is also demonstrated that the chord sequence with precise timing information can be successfully used to find cover songs from audio and to detect musical phrase boundaries by recognizing cadences or harmonic closures.

This dissertation makes a substantial contribution to the music information retrieval community in many aspects. First, it presents a probabilistic framework that combines two closely related musical tasks, chord recognition and key extraction from audio, and achieves state-of-the-art performance in both applications.
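The two decoding steps described above, model-likelihood scoring followed by Viterbi decoding, can be sketched in a few lines of NumPy. This is a generic textbook HMM sketch under assumed toy parameters, not the dissertation's implementation; the state space here has only two states for illustration, whereas chord models use one state per chord class.

```python
import numpy as np

def forward_loglik(A, B_log, pi):
    """Log-likelihood of an observation sequence under one HMM.

    A: (S, S) state transition matrix, pi: (S,) initial state probabilities,
    B_log: (T, S) per-frame log observation probabilities.
    """
    T, S = B_log.shape
    alpha = np.log(pi) + B_log[0]
    for t in range(1, T):
        m = alpha.max()  # log-sum-exp shift for numerical stability
        alpha = np.log(np.exp(alpha - m) @ A) + m + B_log[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def viterbi(A, B_log, pi):
    """Most likely state path (frame-level chord sequence) in the ML sense."""
    T, S = B_log.shape
    logA = np.log(A)
    delta = np.log(pi) + B_log[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA  # (previous state, current state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + B_log[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy sequence that clearly starts in state 0 and switches to state 1
A = np.array([[0.9, 0.1], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
B_log = np.log(np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]))
print(viterbi(A, B_log, pi))  # → [0, 0, 1, 1]
```

Key (or genre) extraction then amounts to evaluating `forward_loglik` once per key-specific model and keeping the argmax; `viterbi` is run only on the winning model to produce the chord sequence.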
Second, it suggests a solution to a bottleneck problem in machine learning approaches by demonstrating a method for automatically generating a large amount of labeled training data from symbolic music documents. This will help free researchers from the laborious task of manual annotation. Third, it makes use of a more efficient and robust feature vector called the tonal centroid and proves, via a thorough quantitative evaluation,
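For reference, the tonal centroid feature mentioned above maps a 12-bin chroma vector to six dimensions. The sketch below follows the formulation of Harte, Sandler, and Gasser (2006), which projects the pitch classes onto the circles of fifths, minor thirds, and major thirds; the specific angles and radii come from that paper and are assumed here, not taken from the dissertation itself.

```python
import numpy as np

def tonal_centroid(chroma):
    """Map a 12-bin chroma vector to a 6-D tonal centroid vector.

    Each (sin, cos) row pair places the 12 pitch classes on one circle:
    fifths, minor thirds, and major thirds, with radii r1 = r2 = 1 and
    r3 = 0.5 as in Harte et al. The result is normalized by the L1 norm
    of the chroma vector.
    """
    l = np.arange(12)
    phi = np.vstack([
        np.sin(l * 7 * np.pi / 6), np.cos(l * 7 * np.pi / 6),          # fifths
        np.sin(l * 3 * np.pi / 2), np.cos(l * 3 * np.pi / 2),          # minor thirds
        0.5 * np.sin(l * 2 * np.pi / 3), 0.5 * np.cos(l * 2 * np.pi / 3),  # major thirds
    ])
    norm = np.abs(chroma).sum()
    return phi @ chroma / norm if norm > 0 else np.zeros(6)

# C major triad (pitch classes C, E, G) as a binary chroma vector
c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1.0
print(tonal_centroid(c_major).round(3))
```

The appeal of this representation for chord modeling is that harmonically close pitch combinations land close together in the 6-D space, which smooths the observation distributions compared with raw chroma.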
