THESIS
2018
ix, 33 pages : illustrations ; 30 cm
Abstract
The Hierarchical Latent Tree Analysis (HLTA) is a recently proposed algorithm for hierarchical topic detection. It takes a collection of unlabeled and unstructured text documents as input and outputs a hierarchy of topics where each topic is a subset of documents. In this thesis, we present an automated document indexing system that automatically builds an index structure for a corpus of documents using the topic hierarchy obtained by HLTA. It also provides tools for visualizing various facts and
relationships that can be extracted from the topic hierarchy. We demonstrate the usefulness of the system on three datasets: (1) a collection of research papers published at major AI conferences and journals from 2000 to 2018, (2) two collections of research outputs from researchers at the Hon...[
Read more ]
The Hierarchical Latent Tree Analysis (HLTA) is a recently proposed algorithm for hierarchical topic detection. It takes a collection of unlabeled and unstructured text documents as input and outputs a hierarchy of topics where each topic is a subset of documents. In this thesis, we present an automated document indexing system that automatically builds an index structure for a corpus of documents using the topic hierarchy obtained by HLTA. It also provides tools for visualizing various facts and
relationships that can be extracted from the topic hierarchy. We demonstrate the usefulness of the system on three datasets: (1) a collection of research papers published at major AI conferences and journals from 2000 to 2018, (2) two collections of research outputs from researchers at the Hong Kong University of Science and Technology, and (3) a collection of Chinese web posts related to migration posted on the social media and e-commerce platform that was collected by the Internal Organization for Migration.
Post a Comment