
A system for acoustic chord transcription and key extraction from audio using hidden Markov models trained on synthesized audio.


Abstract

Extracting high-level information about musical attributes such as melody, harmony, key, or rhythm from the raw waveform is a critical process in Music Information Retrieval (MIR) systems. Using one or more such features in a front end, one can efficiently and effectively search, retrieve, and navigate through a large collection of musical audio. Among these musical attributes, harmony is a key element in Western tonal music. Harmony can be characterized by a set of rules stating how simultaneously sounding (or inferred) tones create a single entity (commonly known as a chord), how the elements of adjacent chords interact melodically, and how sequences of chords relate to one another in a functional hierarchy. Patterns of chord changes over time allow for the delineation of structural features such as phrases, sections, and movements. In addition to structural segmentation, harmony often plays a crucial role in projecting emotion and mood. This dissertation focuses on two aspects of harmony: chord labeling and chord progressions in diatonic functional tonal music.

Recognizing musical chords from raw audio is a challenging task. This dissertation describes a system that accomplishes this goal using hidden Markov models. To avoid the enormously time-consuming and laborious process of manual annotation, which must be done in advance to provide ground truth to supervised learning models, symbolic data such as MIDI files are used to obtain a large amount of labeled training data. To this end, harmonic analysis is first performed on the noise-free symbolic data to obtain chord labels with precise time boundaries. In parallel, a sample-based synthesizer is used to create audio files from the same symbolic files.
The feature vectors extracted from the synthesized audio are in perfect alignment with the chord labels and are used to train the models.

Sufficient training data allows for key- or genre-specific models, where each model is trained on music of a specific key or genre to estimate key- or genre-dependent model parameters. In other words, music of a certain key or genre reveals characteristics of its own, reflected in its chord progressions, which result in unique model parameters represented by the transition probability matrix. To extract the key or identify the genre of a given observation sequence, the forward-backward or Baum-Welch algorithm is used to efficiently compute the likelihood of each model, and the model with the maximum likelihood yields the key or genre information. The Viterbi decoder is then applied to the corresponding model to extract the optimal state path in a maximum-likelihood sense, which is identical to the frame-level chord sequence.

The experimental results show that the proposed system not only yields chord recognition performance comparable to or better than that of previously published systems, but also provides additional key and/or genre information without requiring any other algorithms or feature sets for those tasks. It is also demonstrated that the chord sequence with precise timing information can be successfully used to find cover songs from audio and to detect musical phrase boundaries by recognizing cadences or harmonic closures.

This dissertation makes a substantial contribution to the music information retrieval community in many aspects. First, it presents a probabilistic framework that combines two closely related musical tasks, chord recognition and key extraction from audio, and achieves state-of-the-art performance in both applications.
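The two decoding steps described above, model-likelihood scoring followed by Viterbi decoding, can be sketched in a few lines of NumPy. This is a generic textbook HMM sketch under assumed toy parameters, not the dissertation's implementation; the state space here has only two states for illustration, whereas chord models use one state per chord class.

```python
import numpy as np

def forward_loglik(A, B_log, pi):
    """Log-likelihood of an observation sequence under one HMM.

    A: (S, S) state transition matrix, pi: (S,) initial state probabilities,
    B_log: (T, S) per-frame log observation probabilities.
    """
    T, S = B_log.shape
    alpha = np.log(pi) + B_log[0]
    for t in range(1, T):
        m = alpha.max()  # log-sum-exp shift for numerical stability
        alpha = np.log(np.exp(alpha - m) @ A) + m + B_log[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def viterbi(A, B_log, pi):
    """Most likely state path (frame-level chord sequence) in the ML sense."""
    T, S = B_log.shape
    logA = np.log(A)
    delta = np.log(pi) + B_log[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA  # (previous state, current state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + B_log[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy sequence that clearly starts in state 0 and switches to state 1
A = np.array([[0.9, 0.1], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
B_log = np.log(np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]))
print(viterbi(A, B_log, pi))  # → [0, 0, 1, 1]
```

Key (or genre) extraction then amounts to evaluating `forward_loglik` once per key-specific model and keeping the argmax; `viterbi` is run only on the winning model to produce the chord sequence.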
Second, it suggests a solution to a bottleneck problem in machine learning approaches by demonstrating a method for automatically generating a large amount of labeled training data from symbolic music documents. This will help free researchers from the laborious task of manual annotation. Third, it makes use of a more efficient and robust feature vector called the tonal centroid and proves, via a thorough quantitative evaluation,
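For reference, the tonal centroid feature mentioned above maps a 12-bin chroma vector to six dimensions. The sketch below follows the formulation of Harte, Sandler, and Gasser (2006), which projects the pitch classes onto the circles of fifths, minor thirds, and major thirds; the specific angles and radii come from that paper and are assumed here, not taken from the dissertation itself.

```python
import numpy as np

def tonal_centroid(chroma):
    """Map a 12-bin chroma vector to a 6-D tonal centroid vector.

    Each (sin, cos) row pair places the 12 pitch classes on one circle:
    fifths, minor thirds, and major thirds, with radii r1 = r2 = 1 and
    r3 = 0.5 as in Harte et al. The result is normalized by the L1 norm
    of the chroma vector.
    """
    l = np.arange(12)
    phi = np.vstack([
        np.sin(l * 7 * np.pi / 6), np.cos(l * 7 * np.pi / 6),          # fifths
        np.sin(l * 3 * np.pi / 2), np.cos(l * 3 * np.pi / 2),          # minor thirds
        0.5 * np.sin(l * 2 * np.pi / 3), 0.5 * np.cos(l * 2 * np.pi / 3),  # major thirds
    ])
    norm = np.abs(chroma).sum()
    return phi @ chroma / norm if norm > 0 else np.zeros(6)

# C major triad (pitch classes C, E, G) as a binary chroma vector
c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1.0
print(tonal_centroid(c_major).round(3))
```

The appeal of this representation for chord modeling is that harmonically close pitch combinations land close together in the 6-D space, which smooths the observation distributions compared with raw chroma.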
