THESIS
2014
Abstract
In this thesis, we aim to classify "live" and "studio" versions of a song using audio features. We solve this problem using supervised machine learning techniques, and then address the issue of data scarcity with a co-training algorithm in a semi-supervised setting. This issue has rarely been addressed before, yet it is of paramount importance. Indeed, many online music collections, such as YouTube videos, are user-generated and therefore vary widely in quality, which adversely affects the listening experience of users of online streaming services. In this work, we are particularly interested in knowing whether the song a listener is about to play is the original studio version or a secondary live recording.
As manual labeling can be tedious and challenging in practice, we first propose to automatically classify a music data set using machine learning techniques in a supervised setting, relying only on the audio content of the songs. We show which segments of the songs are most relevant for distinguishing between "live" and "studio" versions and discuss the relative importance of audio, acoustic and music features for this classification task. We then build a more robust system through ensemble learning: exploiting the diversity of several different classifiers, we apply stacked generalization to our classification task and achieve up to 92.82% global accuracy on a 1066-song data set.
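The abstract does not detail the stacking configuration, so the following is only a minimal sketch of stacked generalization for this task using scikit-learn. The base learners, meta-learner, feature dimensionality and the placeholder feature matrix are all assumptions standing in for the per-song audio features described in the thesis.

```python
# Minimal sketch of stacked generalization for live/studio classification.
# Feature extraction and classifier choices are assumptions, not the
# thesis' exact configuration.
import numpy as np
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# X: one audio feature vector per song (e.g. spectral or MFCC statistics);
# y: 1 = live, 0 = studio. Random placeholder data here.
rng = np.random.default_rng(0)
X = rng.normal(size=(1066, 40))
y = rng.integers(0, 2, size=1066)

# Diverse base classifiers; a meta-learner combines their predictions.
stack = StackingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)

print(cross_val_score(stack, X, y, cv=5).mean())
```

In this sketch the meta-learner is fit on out-of-fold predictions of the base classifiers (cv=5), which is what lets stacking exploit their diversity without overfitting to the training labels.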
Finally, we tackle this classification problem in a semi-supervised setting. Specifically, we are interested in cases where very little annotated training data is available, and we demonstrate how an original co-training algorithm can alleviate the problem of data scarcity by exploiting a large unlabeled data set. This method is shown to give significantly better results (a 10%-absolute accuracy improvement when only 15 examples are initially annotated) than classifiers trained only on the initially annotated data.
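For illustration, here is a minimal co-training loop in the classic two-view style. The split into two feature views, the logistic-regression base learners, the confidence-based selection and the number of pseudo-labels added per round are assumptions, not necessarily the algorithm proposed in the thesis.

```python
# Minimal co-training sketch (two-view style): two classifiers trained on
# different feature views iteratively pseudo-label confident unlabeled songs
# into the shared labeled pool. All concrete choices here are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression


def co_train(X1, X2, y, labeled_idx, unlabeled_idx, rounds=10, per_round=5):
    """X1, X2: two feature views of the same songs; y: labels, valid only
    at labeled_idx. Returns the two co-trained classifiers."""
    labeled, unlabeled = list(labeled_idx), list(unlabeled_idx)
    y = np.asarray(y).copy()            # pseudo-labels go into a copy
    clf1 = LogisticRegression(max_iter=1000)
    clf2 = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf1.fit(X1[labeled], y[labeled])
        clf2.fit(X2[labeled], y[labeled])
        if not unlabeled:
            break
        for clf, X in ((clf1, X1), (clf2, X2)):
            # Each view labels its most confident unlabeled songs.
            proba = clf.predict_proba(X[unlabeled])
            picks = np.argsort(proba.max(axis=1))[-per_round:]
            for p in sorted(picks, reverse=True):
                idx = unlabeled[p]
                y[idx] = clf.classes_[proba[p].argmax()]
                labeled.append(idx)
                del unlabeled[p]
            if not unlabeled:
                break
    return clf1, clf2
```

Starting from a handful of annotated songs, each round enlarges the pool both classifiers are retrained on with confident pseudo-labels, which is how a large unlabeled set can compensate for scarce annotations.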