THESIS
2022
1 online resource (x, 31 pages) : illustrations (chiefly color)
Abstract
Human localization is a widely explored problem because of practical applications such as
autonomous driving and home and campus security. Most prior work relies mainly on visual cues
for localization, while some also uses cues from other domains (e.g., radio frequency, speech).
However, each type of localization cue has its own blind spots (such as someone walking quietly
in a dark environment), so exploring cues from different domains that complement each other is
of great importance to robust human localization. Furthermore, the representations of different
localization cues are not easily compatible: Direction-Of-Arrival (DOA) and the visual reference
frame are the common representations for audio-cue-based and visual-cue-based localization,
respectively.
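To show why the two representations do not line up, here is a minimal sketch (not the thesis's method) of how a Direction-Of-Arrival (DOA) estimate from a stereo pair relates to a visual reference frame. It assumes a far-field source, two microphones with a known spacing, and a pinhole camera collocated with the microphones and facing the same direction with a known horizontal field of view. The delay comes from the peak of a plain cross-correlation, a common and simple time-difference-of-arrival estimator. The function names `estimate_doa` and `azimuth_to_pixel_column` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at about 20 degrees C


def estimate_doa(left, right, sample_rate, mic_spacing):
    """Estimate azimuth in radians from a stereo pair (positive = toward the right mic).

    Hypothetical helper; assumes a far-field source and two microphones
    `mic_spacing` metres apart.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Full cross-correlation; output index i corresponds to lag i - (len(right) - 1).
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)
    # Positive lag: the left channel is delayed, so the sound reached the right mic first.
    tdoa = lag / sample_rate
    # Far-field geometry: path difference = mic_spacing * sin(azimuth).
    sin_az = np.clip(SPEED_OF_SOUND * tdoa / mic_spacing, -1.0, 1.0)
    return float(np.arcsin(sin_az))


def azimuth_to_pixel_column(azimuth, image_width, horizontal_fov):
    """Map an azimuth (radians) to a horizontal pixel coordinate with a pinhole model.

    Hypothetical helper; assumes the camera shares the microphones' position and
    forward axis. Values outside [0, image_width) lie outside the camera's view.
    """
    focal_px = (image_width / 2.0) / np.tan(horizontal_fov / 2.0)
    return float(image_width / 2.0 + focal_px * np.tan(azimuth))


# Usage with a synthetic signal: the left channel lags the right one by 10 samples.
rng = np.random.default_rng(0)
source = rng.standard_normal(4810)
right = source[10:]   # right[n] = source[n + 10]
left = source[:-10]   # left[n] = source[n] = right[n - 10]
az = estimate_doa(left, right, sample_rate=48_000, mic_spacing=0.2)
u = azimuth_to_pixel_column(az, image_width=640, horizontal_fov=np.deg2rad(90.0))
print(f"azimuth = {np.degrees(az):.1f} deg, pixel column = {u:.1f}")
```

Under these assumptions a DOA yields only a horizontal pixel coordinate. The person's vertical position and depth in the image remain unresolved, which is one reason an audio cue does not translate directly into coordinates on a visual reference frame.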
In this thesis, we propose a new task that explores the feasibility of using stereo footstep
sound as a human localization cue on a visual reference frame. Footstep sound is relatively
little explored as a localization cue, and even less so with a visual reference frame as the
representation. Compared with other audio cues such as music or speech, footstep sound typically
has a much lower signal-to-noise ratio (SNR), which makes localization considerably more
challenging. As the first attempt at human localization with stereo footstep sound on a visual
reference frame, we have not only verified the feasibility of the new task but also designed an
MHSA-SE module that consistently improves the human localization results. Furthermore, we built
the Stereo Footstep Dataset, a new dataset dedicated to this task, which contains single-person
and two-person audio together with localization coordinates on a visual reference frame across
27 unique individuals.
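The abstract contrasts footstep sound with music or speech in terms of SNR. As a reference point only, the sketch below uses the standard power-ratio definition, SNR_dB = 10 log10(P_signal / P_noise), with mean-square power. The thesis does not state how SNR was measured for its data, and the helpers `snr_db` and `mix_at_snr` are hypothetical. Scaling a noise recording to a target SNR is a common way to build controlled low-SNR test conditions.

```python
import numpy as np


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, using the mean-square power of each array."""
    p_signal = np.mean(np.square(signal))
    p_noise = np.mean(np.square(noise))
    return float(10.0 * np.log10(p_signal / p_noise))


def mix_at_snr(signal, noise, target_db):
    """Scale `noise` so the signal-to-noise ratio equals `target_db`, then add it.

    Hypothetical helper; returns the mixture and the scaled noise.
    """
    p_signal = np.mean(np.square(signal))
    p_noise = np.mean(np.square(noise))
    # Required noise power is p_signal / 10**(target_db / 10); amplitude scales with its sqrt.
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (target_db / 10.0)))
    scaled_noise = scale * noise
    return signal + scaled_noise, scaled_noise


# Usage with synthetic data: a 440 Hz sine as a stand-in signal, mixed at -5 dB SNR.
rng = np.random.default_rng(0)
t = np.arange(4800) / 48_000
tone = np.sin(2 * np.pi * 440.0 * t)
mixture, noise = mix_at_snr(tone, rng.standard_normal(t.size), target_db=-5.0)
print(f"SNR = {snr_db(tone, noise):.2f} dB")
```

A negative SNR means the noise carries more power than the signal. This is the regime in which locating a source from its waveform becomes hard.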