THESIS
2022
1 online resource (xv, 101 pages) : illustrations (some color)
Abstract
Humans can listen to others in noisy environments, but it is difficult for humanoid robots to separate the speech of individual persons from a mixture of these sounds. Known as the cocktail party problem, this challenge has intrigued scientists and engineers for more than half a century. Exploiting the sparsity of speech signals, this thesis improves three binaural audio separation algorithms for different scenarios: 1) weak target signal extraction, 2) fast separation, and 3) under-determined reverberant speech separation (i.e., binaural separation of more than two mixed sources in the presence of echoes).
First, the thesis improves a previously reported audio cancellation kernel to separate weak target signals. Thanks to our analytical solution, the new version of the cancellation kernel achieves comparable or even better results while running 3000 times faster. This solution stems from our observation that the whole extraction process can be performed in the time-frequency domain using the short-time Fourier transform (STFT).
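The abstract does not describe the improved kernel itself. As a minimal sketch of why an analytical STFT-domain solution avoids iterative optimization, assume a two-microphone mixture in which a strong interferer reaches the right microphone with an extra delay while a weak target arrives identically at both. The per-bin least-squares cancellation gain then has a closed form. The function name `cancellation_gain`, the synthetic STFT coefficients (the STFT itself is omitted), the 3-sample delay, and the 20 dB level difference are all illustrative assumptions, not the method reported in the thesis.

```python
import numpy as np

def cancellation_gain(left, right):
    """Closed-form per-bin gain H[f] minimizing sum_t |left - H * right|^2.

    Setting the derivative to zero gives
    H[f] = sum_t left[f, t] * conj(right[f, t]) / sum_t |right[f, t]|^2,
    so no iterative optimization is needed.
    """
    num = np.sum(left * np.conj(right), axis=1)
    den = np.sum(np.abs(right) ** 2, axis=1)
    return num / den

rng = np.random.default_rng(0)
n_freq, n_frames, delay = 257, 200, 3

def crandn(shape):
    """Circular complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Synthetic STFT coefficients with shape (frequency, frame); the STFT itself is omitted.
omega = np.linspace(0.0, np.pi, n_freq)        # bin frequencies in rad/sample
steer = np.exp(-1j * omega * delay)            # interferer arrives `delay` samples later at the right mic
interf = crandn((n_freq, n_frames))            # strong interferer
target = 0.1 * crandn((n_freq, n_frames))      # weak target, 20 dB below the interferer

left = interf + target                         # target reaches both mics identically
right = steer[:, None] * interf + target

h = cancellation_gain(left, right)             # dominated by the interferer, so h is close to conj(steer)
out = left - h[:, None] * right                # = interf * (1 - h * steer) + target * (1 - h)
residual = interf * (1 - h * steer)[:, None]   # interferer component that survives cancellation
```

Because the target is weak, the interferer dominates the least-squares fit and is almost entirely removed. The surviving target is colored by the factor `1 - h`, which vanishes near DC, so a practical extractor would still need to compensate for that spectral shaping.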
Second, the thesis improves the degenerate unmixing estimation technique (DUET), one of the fastest speech separation algorithms. Because DUET is a binary masking technique, it cannot completely separate speech signals, which limits its separation performance. We applied post-filtering with multiple linear spatial filters to refine the mask-based separation results and obtained significantly better separation performance.
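The limitation of binary masking can be illustrated with a minimal sketch, assuming two anechoic sources whose mixing parameters (relative attenuation and delay at the second microphone) are already known; in DUET these would come from peaks of the attenuation-delay histogram, which is skipped here. Each bin is assigned to the source whose mixing model explains it best, following the DUET assignment rule. For comparison, a single linear spatial filter that nulls the second source is applied. This null filter is only a hypothetical stand-in, not the multi-filter post-filtering developed in the thesis, and it is exact only in this idealized two-source, known-parameter setting. The names `sparse_source`, `params`, and `rel_error` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_freq, n_frames = 257, 100
omega = np.pi * np.arange(1, n_freq + 1) / n_freq  # bin frequencies in rad/sample, DC excluded

def sparse_source(p_active=0.3):
    """Complex STFT coefficients that are nonzero in only a fraction of bins."""
    shape = (n_freq, n_frames)
    coeffs = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return coeffs * (rng.random(shape) < p_active)

# (attenuation, delay in samples) of microphone 2 relative to microphone 1 for each source.
params = [(0.8, 0.4), (1.25, -0.6)]
steer = [a * np.exp(-1j * omega * d)[:, None] for a, d in params]

s1, s2 = sparse_source(), sparse_source()
x1 = s1 + s2
x2 = steer[0] * s1 + steer[1] * s2

# DUET assignment: each bin goes to the source whose mixing model fits it best.
cost = np.stack([np.abs(h * x1 - x2) ** 2 / (1 + a ** 2) for h, (a, _) in zip(steer, params)])
labels = np.argmin(cost, axis=0)
s1_mask = np.where(labels == 0, x1, 0)  # binary-mask estimate of source 1 (mic-1 reference)

# Linear spatial filter that cancels source 2 and passes source 1 (mic-1 reference).
s1_null = (steer[1] * x1 - x2) / (steer[1] - steer[0])

def rel_error(est, ref):
    """Energy of the estimation error relative to the reference energy."""
    return np.sum(np.abs(est - ref) ** 2) / np.sum(np.abs(ref) ** 2)
```

The masked estimate is exact in bins where only one source is active but has a clearly nonzero error in bins where both sources overlap, since a binary mask must hand each such bin entirely to one source. The null filter recovers `s1` to within rounding error here; with reverberation or more sources than microphones, no single linear filter is exact, which is why combining masks with spatial filtering is attractive.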
Third, the thesis improves the $\ell_1$ minimization commonly used in audio separation algorithms. Speech separation can be cast as an $\ell_1$ minimization problem that minimizes the $\ell_1$ norm of the reconstructed signal. We derived and tested a new weighted $\ell_1$ norm and showed that it can outperform the unweighted $\ell_1$ norm. The new problem can be solved with the same $\ell_1$ minimization solver but converges faster than unweighted $\ell_1$ minimization. The improved $\ell_1$ minimization algorithms have been shown to work in the presence of reverberation and with more than two mixed speech sources.
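In a common sparse-recovery formulation, each time-frequency bin of an under-determined two-microphone mixture is modeled as $x = A s$ with a $2 \times N$ mixing matrix $A$ ($N > 2$ sources), and the separated coefficients are taken as the solution of $\min_s \sum_i w_i |s_i|$ subject to $A s = x$. The abstract does not state how the thesis derives its weights, so the sketch below only shows how a positive weight vector $w$ can change the solution. It assumes real-valued coefficients (actual STFT coefficients are complex, which turns the problem into a second-order cone program) and replaces a general $\ell_1$ solver with exhaustive vertex enumeration, which is exact for the real $2 \times N$ case. The function name `weighted_l1_2xn`, the matrix, and the weights are hypothetical.

```python
import itertools
import numpy as np

def weighted_l1_2xn(A, x, w):
    """Exact minimizer of sum_i w_i |s_i| subject to A @ s == x for a real 2 x N matrix A.

    With two equality constraints, the equivalent linear program attains its optimum at a
    vertex with at most two nonzero entries, so trying every column pair suffices.
    All weights must be positive.
    """
    n = A.shape[1]
    best, best_cost = None, np.inf
    for i, j in itertools.combinations(range(n), 2):
        sub = A[:, [i, j]]
        if abs(np.linalg.det(sub)) < 1e-12:  # skip (nearly) parallel column pairs
            continue
        s = np.zeros(n)
        s[[i, j]] = np.linalg.solve(sub, x)
        cost = np.sum(w * np.abs(s))
        if cost < best_cost:
            best, best_cost = s, cost
    return best

c = np.sqrt(0.5)
A = np.array([[1.0, c, 0.0],
              [0.0, c, 1.0]])  # unit-norm steering columns at 0, 45 and 90 degrees
x = A @ np.array([1.0, 0.0, 1.0])  # sources 1 and 3 are active in this bin

s_plain = weighted_l1_2xn(A, x, np.ones(3))                    # unweighted l1
s_weighted = weighted_l1_2xn(A, x, np.array([1.0, 2.0, 1.0]))  # hypothetical weights
```

With uniform weights, the minimum-$\ell_1$ solution explains the bin with the middle source alone ($\sqrt{2} < 2$), which is the wrong source assignment. Doubling the weight on that source makes the true two-source explanation cheaper ($2 < 2\sqrt{2}$), so the weighted problem returns $s = (1, 0, 1)$.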