THESIS
2018
xiv, 92 pages : illustrations ; 30 cm
Abstract
With the explosive growth of available multi-dimensional data, many machine learning and data
mining algorithms have been developed to analyze and utilize these data. However, most of these
algorithms are black boxes, which hinders users from understanding and trusting the decisions
they make. By taking advantage of humans' strong visual perception capability,
visualization techniques can facilitate the interpretation of these algorithms and their decisions. In this thesis, we propose several visualization techniques to tackle various black-box algorithms.
In the first work, we focus on explaining skyline queries, which are widely applied to facilitate
multi-criteria decision making. By automatically removing uncompetitive candidates, skyline
queries allow users to focus on a subset of superior data items (i.e., the skyline). However,
users are still required to interpret and compare these superior items manually before making a
final choice. We therefore propose SkyLens, a visual analytics system that reveals
the superiority of skyline points from different perspectives and at different scales to aid users in their decision making. We present two usage scenarios and one user study to demonstrate the
effectiveness of our system.
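To make the notion of a skyline concrete, the following is a minimal sketch, not taken from the thesis. It assumes that larger values are better on every criterion and uses a simple quadratic scan; the names `dominates` and `skyline` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every criterion
    and strictly better on at least one (assumption: larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates scored on two criteria.
candidates = [(3, 9), (5, 5), (9, 2), (4, 4), (2, 8)]
superior = skyline(candidates)
```

Here `(4, 4)` is dominated by `(5, 5)` and `(2, 8)` by `(3, 9)`, so only three candidates remain. None of the three dominates another, which is why the user still has to compare them manually.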
The second work studies the explanation of random forest algorithms. As an ensemble model
that consists of many independent decision trees, a random forest generates predictions by feeding
the input to its internal trees and summarizing their outputs. However, random forests suffer from
poor interpretability, which significantly hinders their use in fields
that require transparent and explainable predictions, such as medical diagnosis and financial fraud detection. To address this issue, we propose an interactive visualization system for interpreting random forest models and predictions. We carried out two usage scenarios and one user study to evaluate the usefulness of the proposed technique.
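The ensemble prediction described above can be sketched as follows. This is an illustration only: the abstract does not say how the tree outputs are summarized, so majority voting is an assumption, and the `Tree` type, the `forest_predict` function, and the toy trees are hypothetical stand-ins for trained decision trees.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained decision tree: maps a feature vector to a label.
Tree = Callable[[Sequence[float]], str]


def forest_predict(trees: List[Tree], x: Sequence[float]) -> str:
    """Feed x to every tree and return the most frequent label
    (assumption: outputs are summarized by majority vote)."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Three toy "trees", each a single hand-written rule.
toy_trees: List[Tree] = [
    lambda x: "fraud" if x[0] > 100 else "ok",
    lambda x: "fraud" if x[1] > 3 else "ok",
    lambda x: "ok",
]

label = forest_predict(toy_trees, [150, 5])
```

Even in this toy case the final label hides which trees voted for it and why. That gap becomes much larger when a forest contains hundreds of deep trees.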
The third work investigates the interpretation of outliers, the data instances that do not conform to normal patterns in a dataset. As different domains usually have different considerations about outliers, understanding the defining characteristics of outliers is essential for users to select and filter appropriate outliers based on their domain requirements. However, most existing work focuses on the efficiency and accuracy of outlier detection while neglecting the importance of outlier interpretation. Hence, we propose a visual analytics system that helps users understand,
interpret, and select the outliers detected by various algorithms. We carried out one usage scenario and one user study to assess the effectiveness of the proposed system.