THESIS
2022
1 online resource (xiii, 125 pages) : illustrations (some color)
Abstract
People often communicate messages through verbal and non-verbal language expressions, including
voice, words, facial expressions, and body language. Interpreting the multimodal human behavior
in communication has great value for many applications, such as business, healthcare, and
education. For example, if students show signs of boredom or confusion during the courses, teachers
can adjust the teaching methods to improve students’ engagement. With the rapid development
of digital technology and social media, a huge amount of multimodal human communication data
(e.g., opinion videos) is generated and collected. To facilitate the analysis of human communication
data, researchers adopt computational approaches to quantify human behavior with multimodal
features. However, it is still demanding and inefficient to manually extract insights (e.g.,
social meanings of the features) from the large and complex feature space. Furthermore, it remains
challenging to utilize the knowledge distilled from the computational features to enhance human
communication skills. Meanwhile, interactive visual analytics combines computational algorithms
with human-centered visualization to effectively support information representation, knowledge
discovery, and skill acquisition. It demonstrates great potential to address these challenges.
In this thesis, we focus on visual analytics of multimodal human language for conveying messages
based on communication videos (e.g., public speaking and opinion videos). We design
and build novel interactive visual analytics systems to 1) help users discover valuable patterns of
speakers' multimodal communication behavior in videos and 2) provide end-users with visual
feedback and guidance to improve their communication skills. In the first work, we present
DeHumor, a visual analytics system that visually decomposes humor speeches into quantifiable
multimodal features and enables humor researchers and communication coaches to systematically
explore humorous verbal content and vocal delivery. In the second work, we further characterize
and investigate the intra- and inter-modal interactions between visual, acoustic, and language
modalities, including dominance, complement, and conflict. Then, we develop M2Lens, a visual
analytics system that helps model developers and users conduct multi-level and multi-faceted exploration
of the influences of individual modalities and their interplay on model predictions for
multimodal sentiment analysis. Beyond understanding multimodal human communication behavior,
in the third work we present VoiceCoach, a visual analytics system that evaluates speakers' voice
modulation skills regarding volume, pitch, speed, and pause, and recommends good learning examples
of voice modulation in TED Talks to follow.
modulation in TED Talks to follow. Moreover, during the practice, the system can provide immediate
visual feedback to speakers for self-awareness and performance improvement.