THESIS
2018
xvii, 103 pages : illustrations ; 30 cm
Abstract
Temporal event sequences are becoming increasingly important in many application domains
such as website click streams, user interaction logs, electronic health records and car service
records. However, a real-world dataset with a large number of event sequences of varying
lengths is complex and difficult to analyze. Visual analytics has proven to be an effective
approach to understanding such large amounts of data. For example, by visually highlighting
the common behaviors of website click streams, usability issues and user behavior patterns can
be identified to inform better designs of the interface. In this thesis, we follow the research in
the area of event sequence visualization and report three works in developing visual analytics
techniques for temporal event data from various application domains.
In the first work, we propose a novel visualization technique based on the minimum description length (MDL) principle to construct a coarse-level overview of event sequence data
while balancing the information loss in it. The method addresses a fundamental trade-off in
visualization design: reducing visual clutter vs. increasing the information content in a visualization. The method enables simultaneous sequence clustering and pattern extraction and is
highly tolerant of noise such as missing or additional events in the data. Based on this approach,
we propose a visual analytics framework with multiple levels-of-detail to facilitate interactive
data exploration. We demonstrate the usability and effectiveness of our approach through case
studies with two real-world datasets. One dataset showcases a new application domain for event
sequence visualization, i.e., fault development path analysis in vehicles for predictive maintenance. We also discuss the strengths and limitations of the proposed method based on user
feedback.
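The trade-off described above can be illustrated with a minimal two-part description length sketch. This is not the formulation used in the thesis: the cost model (one unit per event stored in a pattern, one unit per inserted or deleted event needed to recover a sequence from its pattern), the function names, and the toy data are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def edit_ops(pattern, seq):
    """Approximate count of events to delete from or insert into `pattern` to obtain `seq`."""
    matcher = SequenceMatcher(None, pattern, seq, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return (len(pattern) - matched) + (len(seq) - matched)


def description_length(clusters, pattern_cost=1.0, edit_cost=1.0):
    """Two-part cost (hypothetical): events stored in patterns plus per-sequence corrections."""
    total = 0.0
    for pattern, sequences in clusters:
        total += pattern_cost * len(pattern)  # cost of showing the pattern
        total += edit_cost * sum(edit_ops(pattern, s) for s in sequences)  # information loss
    return total


# Toy data (assumed): five short event sequences.
sequences = [list("ABCD"), list("ABD"), list("ABCD"), list("XYZ"), list("XYZW")]

# Option 1: no summarization, every sequence is displayed as its own pattern.
no_summary = [(s, [s]) for s in sequences]

# Option 2: two patterns, with deviations recorded as edits.
two_patterns = [(list("ABCD"), sequences[:3]), (list("XYZ"), sequences[3:])]

dl_none = description_length(no_summary)
dl_two = description_length(two_patterns)
print(dl_none, dl_two)
```

Under these assumed unit costs, the two-pattern summary has the shorter total description, so it would be preferred: it shows fewer events while losing only two events' worth of detail. Raising `edit_cost` relative to `pattern_cost` shifts the balance toward more, finer-grained patterns.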
The second work focuses on the stage, that is, a frequently occurring subsequence in the
dataset. We introduce a novel visualization technique to summarize event sequence data into
a set of stage progression patterns. The resulting overview is more concise compared with
event-level summarization and supports level-of-detail exploration. We further present a visual
analytics system with four linked views (stage view, overview, tree view and
sequences view) to help users explore the data. We also present quantitative experimental results
as well as case studies where the system is used in two different domains and discuss advantages
and limitations of applying StageMap to various application scenarios.
In the third work, we study the temporal event data related to a specific application domain,
i.e., the web click streams in Massive Open Online Courses (MOOCs). To be more specific,
we try to understand the dropout behavior in such data. To tackle this problem, we introduce a
comprehensive visual analytics system which not only helps instructors and education experts
understand the reasons for the dropout, but also allows researchers to identify crucial features
which can further improve the performance of the models. Both the heterogeneous data extracted
from three different kinds of learner activity logs (i.e., clickstream, forum posts and
homework records) and the predicted results are visualized in the proposed system.