Spatio-textual data analysis via co-location mining and collective spatial keyword queries

HKUST Electronic Theses


by Kai Ho Chan

THESIS 2019

Ph.D. Computer Science and Engineering

xiii, 143 pages : illustrations ; 30 cm

Abstract

With the proliferation of geo-positioning and geo-tagging techniques, spatio-textual data, which possess both a geographical location and a textual description, are becoming increasingly prevalent. This development gives prominence to spatio-textual data analysis, an emerging research field with both real-world and scientific applications. Research on spatio-textual data analysis spans many areas, such as spatial data mining (i.e., knowledge discovery in large spatial databases) and spatial keyword query processing. In spatial data mining, the goal is to discover interesting, previously unknown, and potentially useful patterns in large spatial databases. For example, spatial association mining, one type of spatial data mining, finds patterns and rules describing how one feature or set of features in a spatial database implies another set of features. In spatial keyword query processing, the goal is to process a query and return relevant objects as results. A typical query takes a location and a set of keywords as arguments and returns the single spatio-textual object that best matches the keywords and is close to the specified location.
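The typical query described above can be illustrated with a minimal Python sketch. The ranking function is an assumption made only for illustration: each object is scored by a weighted sum of its normalised Euclidean distance to the query location and the fraction of query keywords it lacks, controlled by a hypothetical weight `alpha`. The names `SpatioTextualObject` and `top1_spatial_keyword_query` are likewise hypothetical, and the linear scan stands in for the spatio-textual indexes that practical systems typically use.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatioTextualObject:
    name: str
    x: float
    y: float
    keywords: frozenset


def top1_spatial_keyword_query(objects, qx, qy, query_keywords, alpha=0.5):
    """Return the object with the lowest (hypothetical) combined cost, or None.

    cost = alpha * normalised_distance + (1 - alpha) * (1 - textual_match),
    where textual_match is the fraction of query keywords the object contains.
    """
    query_keywords = frozenset(query_keywords)
    if not query_keywords:
        raise ValueError("query must contain at least one keyword")
    if not objects:
        return None
    dists = [math.hypot(o.x - qx, o.y - qy) for o in objects]
    max_dist = max(dists) or 1.0  # all objects lie on q: avoid dividing by zero
    best, best_cost = None, math.inf
    for o, d in zip(objects, dists):
        match = len(o.keywords & query_keywords) / len(query_keywords)
        cost = alpha * (d / max_dist) + (1 - alpha) * (1 - match)
        if cost < best_cost:
            best, best_cost = o, cost
    return best


if __name__ == "__main__":
    objs = [
        SpatioTextualObject("cafe_a", 1.0, 1.0, frozenset({"coffee", "wifi"})),
        SpatioTextualObject("cafe_b", 0.5, 0.0, frozenset({"coffee"})),
        SpatioTextualObject("bar_c", 5.0, 5.0, frozenset({"beer", "wifi"})),
    ]
    print(top1_spatial_keyword_query(objs, 0.0, 0.0, {"coffee", "wifi"}).name)  # cafe_a
```

With `alpha=1.0` the score ignores keywords and the nearest object (`cafe_b`) wins; the weighted default prefers `cafe_a`, which matches both keywords while still being fairly close.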

In this thesis, we study co-location pattern mining, which is one type of spatial data mining, and the collective spatial keyword query (CoSKQ), which is one type of spatial keyword query. Both problems find their results in a spatio-textual database and are built on the concept of an object set. For the co-location pattern mining problem, we develop a new support measure called Fraction-Score that overcomes the weaknesses of existing support measures for defining co-location patterns. To solve the problem based on Fraction-Score, we develop efficient algorithms that are significantly faster than a baseline adapted from the state-of-the-art approach.
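To show what a support measure for co-location patterns computes, the Python sketch below implements the participation index, a standard existing measure from the co-location mining literature, for a pattern of two features. It is not Fraction-Score, whose definition the abstract does not give. Objects are assumed to be `(object_id, feature, x, y)` tuples, two objects are assumed to be neighbours when their Euclidean distance is at most a threshold `d`, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def participation_index(objects, pattern, d):
    """Participation index of a size-2 co-location pattern.

    objects: iterable of (object_id, feature, x, y) tuples
    pattern: pair of distinct features (fa, fb)
    d: neighbour distance threshold
    """
    fa, fb = pattern
    by_feature = defaultdict(list)
    for oid, feat, x, y in objects:
        by_feature[feat].append((oid, x, y))

    # A row instance of {fa, fb} is a pair (a, b) of objects with features fa
    # and fb whose distance is at most d; record every object in such a pair.
    participating = {fa: set(), fb: set()}
    for ida, xa, ya in by_feature[fa]:
        for idb, xb, yb in by_feature[fb]:
            if math.hypot(xa - xb, ya - yb) <= d:
                participating[fa].add(ida)
                participating[fb].add(idb)

    # Participation ratio of f: share of f's objects occurring in a row instance.
    ratios = []
    for f in pattern:
        total = len(by_feature[f])
        ratios.append(len(participating[f]) / total if total else 0.0)
    # The participation index is the minimum participation ratio.
    return min(ratios)
```

For example, with objects A1 at (0, 0), A2 at (10, 0), B1 at (1, 0), B2 at (0, 1) and B3 at (50, 50) and `d = 2`, only A1 participates among the A objects (ratio 1/2) while B1 and B2 participate among the B objects (ratio 2/3), so the index of {A, B} is 0.5. The nested loop is a brute-force choice made for clarity.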

For the CoSKQ problem, we pursue two directions. First, we design a unified cost function that generalizes most existing cost functions for CoSKQ, and we develop a unified approach that performs as well as (and sometimes better than) the best-known approaches designed for the individual cost functions. Second, we propose a new cost function, called the maximum dot size cost, which captures both the distances among the objects in a set and the query, as existing cost functions do, and the inherent costs of the objects themselves. We present an exact algorithm and an approximation algorithm with a provable approximation bound for this problem. Extensive experiments on both real and synthetic datasets verified all of our proposed approaches and algorithms.
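To make the CoSKQ setting concrete, the Python sketch below finds, by exhaustive search, a set of objects that together cover all query keywords at minimum cost. It is not the unified approach or either algorithm from the thesis, whose details the abstract does not give. The two cost functions shown are assumed examples of existing CoSKQ cost functions: one adds the largest object-to-query distance to the largest pairwise distance in the set (called MaxSum in parts of the literature), and the other takes the diameter of the set together with the query location (called Dia). The maximum dot size cost is not implemented because the abstract does not define it, and all names here are hypothetical.

```python
import math
from itertools import combinations
from typing import FrozenSet, NamedTuple, Tuple


class Obj(NamedTuple):
    name: str
    loc: Tuple[float, float]
    kw: FrozenSet[str]


def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def _query_and_pair_dists(S, q):
    d_q = max(_dist(o.loc, q) for o in S)  # farthest object from the query
    d_pair = max((_dist(a.loc, b.loc) for a, b in combinations(S, 2)), default=0.0)
    return d_q, d_pair


def max_sum_cost(S, q):
    d_q, d_pair = _query_and_pair_dists(S, q)
    return d_q + d_pair


def dia_cost(S, q):
    # Diameter of S together with the query location.
    d_q, d_pair = _query_and_pair_dists(S, q)
    return max(d_q, d_pair)


def coskq_exhaustive(objects, q, keywords, cost):
    """Return (best_set, best_cost) over sets covering all query keywords.

    Both costs above never decrease when an object is added, so some optimal
    set has at most len(keywords) objects; larger sets are not enumerated.
    """
    keywords = frozenset(keywords)
    candidates = [o for o in objects if o.kw & keywords]
    best, best_cost = None, math.inf
    for k in range(1, len(keywords) + 1):
        for S in combinations(candidates, k):
            covered = frozenset().union(*(o.kw for o in S))
            if keywords <= covered:
                c = cost(S, q)
                if c < best_cost:
                    best, best_cost = S, c
    return best, best_cost
```

The enumeration grows combinatorially with the number of candidates and query keywords, so a sketch like this is only practical for tiny inputs; it is useful mainly as a correctness reference when testing faster exact or approximate methods.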


Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection: HKUST Electronic Theses
Degree: Ph.D.
Department: Computer Science and Engineering
Supervisors: WONG, Raymond Chi-Wing
Authors: Chan, Kai Ho
Subjects: Spatial data mining; Querying (Computer science); Keyword searching
Language: English
Call number: Thesis CSED 2019 Chan
DOI: 10.14711/thesis-991012757468703412
