THESIS
2016
Abstract
A melody is most commonly retrieved by using its metadata as a search query. However, with
the expanding volume of music databases available at our fingertips, finding other efficient music
retrieval methods has become imperative. Query-by-humming is a content-based music retrieval
method that retrieves melodies using a user's humming as the query. This allows users to find
a melody using only its tune, without any knowledge of its metadata or even its
lyrics.
In this thesis we focus on building a humming and monophonic singing transcription system
and a query-by-humming system.
We utilize deep learning for the humming transcription, which has not been done
before. We use a database of monophonic melodies to train a hybrid model that combines a Convolutional
Neural Network (CNN) with a Hidden Markov Model (HMM), which is then used to transcribe the
queries. We also use a note-based retrieval method for candidate melody retrieval. We evaluate both
our transcription system and the overall query-by-humming system on standard datasets and compare the results against other algorithms.
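To illustrate how a CNN and an HMM can be combined for transcription, the sketch below decodes a note-state sequence from frame-level scores with the Viterbi algorithm. It is a minimal example under stated assumptions, not the system built in this thesis: the two states, the transition probabilities, the prior, and the frame posteriors are all made up for illustration, and the posteriors are plain numbers standing in for the output of a trained CNN.

```python
import math


def viterbi(log_emissions, log_trans, log_prior):
    """Return the most likely state path for a sequence of frames.

    log_emissions[t][s]: log score of state s at frame t (assumed here to
                         come from a frame classifier such as a CNN).
    log_trans[i][j]:     log probability of moving from state i to state j.
    log_prior[s]:        log probability of starting in state s.
    """
    n_states = len(log_prior)
    score = [log_prior[s] + log_emissions[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_emissions[1:]:
        prev = score
        score, pointers = [], []
        for j in range(n_states):
            # Best predecessor of state j at this frame.
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + frame[j])
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def to_log(rows):
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical data: two note states; frame 2 is a noisy frame.
frame_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
                    [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]
transitions = [[0.9, 0.1], [0.1, 0.9]]  # notes tend to persist across frames
prior = [0.5, 0.5]

path = viterbi(to_log(frame_posteriors), to_log(transitions),
               [math.log(p) for p in prior])
print(path)  # [0, 0, 0, 0, 1, 1, 1]
```

In this toy example the HMM smooths over the single noisy frame that a frame-by-frame decision would have labelled as the second note. Hybrid systems often divide the network posteriors by the state priors before decoding; that step is omitted here for brevity.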
We also train the CNN model for humming transcription on raw audio data and compare
it against a model trained on features. We show that using raw audio data directly gives better results
overall and outperforms the system with features by 2
The proposed query-by-humming system first transcribes the notes of the music signals of both
the query and the melodies in the database using the proposed humming transcription system. The
notes of the query are then compared against those of the melodies in the database, and the melody most
similar to the query is retrieved.
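The comparison step can be sketched as follows. The abstract does not state which similarity measure is used, so this example substitutes a common choice: edit distance over pitch intervals, which makes the match independent of the key the user hums in. The function names, the MIDI pitch representation, and the toy database are assumptions made for illustration only.

```python
def intervals(pitches):
    """Convert a note sequence (MIDI pitches) to successive pitch intervals."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def rank_melodies(query_pitches, database):
    """Return melody names ordered from most to least similar to the query."""
    q = intervals(query_pitches)
    scored = [(edit_distance(q, intervals(p)), name)
              for name, p in database.items()]
    return [name for _, name in sorted(scored)]


# Hypothetical database of transcribed melodies.
database = {
    "scale_up": [60, 62, 64, 65, 67],
    "arpeggio": [60, 64, 67, 72, 67],
    "repeated": [60, 60, 62, 62, 64],
}

# Query hummed a fourth higher than "scale_up", with a wrong last note.
query = [65, 67, 69, 70, 73]
ranking = rank_melodies(query, database)
print(ranking)  # ['scale_up', 'repeated', 'arpeggio']
```

Working on intervals rather than absolute pitches tolerates transposition, and the edit distance tolerates a small number of inserted, dropped, or mistranscribed notes.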
Our transcription system achieves an F-measure of 54%, which is 11% higher than that of a
simple HMM-GMM system and is better than other state-of-the-art singing transcription systems.
The overall query-by-humming system achieves a mean reciprocal rank (MRR) of 0.92 on the standard MIREX
dataset, which is also an improvement over other note-based query-by-humming systems.
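For readers unfamiliar with the two metrics, the sketch below shows how an F-measure and an MRR are typically computed. The function names and all input numbers are illustrative assumptions and are not taken from the experiments reported here; in particular, the criterion for counting a transcribed note as correct is not specified in this abstract.

```python
def f_measure(n_correct, n_transcribed, n_reference):
    """Harmonic mean of precision and recall over transcribed notes."""
    if n_transcribed == 0 or n_reference == 0:
        return 0.0
    precision = n_correct / n_transcribed
    recall = n_correct / n_reference
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the correct melody over all queries.

    ranks holds the 1-based position of the correct melody in each
    query's result list; None means it was not retrieved at all.
    """
    return sum(1.0 / r if r else 0.0 for r in ranks) / len(ranks)


# Made-up numbers: 6 correct notes out of 10 transcribed, 12 in the reference.
print(round(f_measure(6, 10, 12), 3))       # 0.545

# Made-up ranks for four queries: correct melody ranked 1st, 1st, 2nd, 1st.
print(mean_reciprocal_rank([1, 1, 2, 1]))   # 0.875
```

An MRR close to 1 means the correct melody is returned at or near the top of the list for most queries.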