THESIS
2021
1 online resource (xiv, 93 pages) : illustrations (some color)
Abstract
The mission of data mining is to discover the knowledge behind the data. Three typical
kinds of knowledge are trend, cluster, and change, which give rise to three typical data
mining tasks: regression, clustering, and detection. Numerous approaches, ranging from
mathematical models to deep learning frameworks, have been proposed. However, a pure data
mining model without domain or human knowledge might produce results that deviate from reality.
This thesis proposes that the combination of “Data + Domain/Human Knowledge”
could potentially offer a better solution. Two major frameworks are proposed:
(1) a Data-to-Data knowledge collaborating framework, and (2) a Human-to-Data knowledge
incorporating framework, demonstrated through three projects.
The first project learns the “change” in smart manufacturing, specifically detecting
anomalies in a cold-start process. A decomposition-based hybrid transfer learning
framework is proposed to transfer knowledge from experienced domains to the cold-start
domain. The knowledge transfer increases the anomaly detection accuracy on cold-start
data by 20%.
The second project learns the “trend” in smart transportation, specifically predicting
the passenger flow in a metro system. Human knowledge about the distances and the
functional similarities between stations is formulated as graphs and incorporated
into the proposed low-rank tensor completion model. The incorporated graphs improve
the prediction results by more than 30%.
The third project learns the “cluster” in smart transportation, specifically learning
the multiple clusters of origin, destination, time, and passengers from individual
trajectory data. A tensor Latent Dirichlet Allocation (LDA) model is proposed that
incorporates external knowledge graphs about the locations and functions of stations.
The graph structure enhances the interpretability of the learned clusters by more than 20%.
These essays provide a comprehensive solution for coupling analytical data models
with domain and human knowledge, with detailed implementations in real case studies
that demonstrate the increased model accuracy, efficiency, and interpretability.