Analyzing data with regret minimization query

HKUST Electronic Theses

Analyzing data with regret minimization query

by Min Xie

THESIS 2019

Ph.D. Computer Science and Engineering

xv, 170 pages : illustrations ; 30 cm

Abstract

Extracting interesting tuples from a large database is an important problem in multi-criteria decision making. Two representative queries were proposed in the literature: top-k queries and skyline queries. A top-k query requires users to specify their utility functions beforehand and then returns k tuples to the users. A skyline query does not require any utility function from users but it puts no control on the number of tuples returned to users. Recently, a regret minimization query was proposed and received attention from the community because it does not require any utility function from users and the output size is controllable, and thus, it avoids those deficiencies of top-k queries and skyline queries. Specifically, it returns a small set of tuples such that any user’s favorite tuple in this subset is guaranteed to be not much worse than his/her favorite tuple in the whole database. In a sense, this query finds a small set of tuples that makes the user happy (i.e., not regretful) even if s/he gets the best tuple in the selected set but not the best tuple among all tuples in the database.

In this thesis, we first study two versions of the regret minimization query: (1) the min-error version: we want to minimize the regret level of each user while guaranteeing the output size is at most k where k is the maximum output size. This problem is also known as the k-regret query, and (2) the min-size version: we want to determine the minimum number of tuples needed to keep users happy at a given level. We term this problem as the α-happiness query where α represents a happiness threshold. In both versions, we quantify the user’s regret level (and happiness level) by a criterion, called the regret ratio (and the happiness ratio). A small regret ratio (i.e., a large happiness ratio) indicates the user is satisfied with the set returned. In addition, we study how to enhance the regret minimization query with user interactions: when presented with a small number of tuples (which can be artificial tuples or true tuples inside the database), a user is asked to indicate the tuple s/he favors the most among them. In particular, we are also interested in the special case of determining the favorite tuple for a user in the entire database with a small amount of interaction, measured by the number of questions we ask the user.

Finding the optimal solution for a regret minimization query is an NP-hard problem. We derive efficient approximate solutions for both versions of regret minimization. We also present a generic framework for interactive regret minimization, under which we propose algorithms that ask an asymptotically optimal number of questions in 2-dimensional spaces and algorithms with provable performance guarantees in d-dimensional spaces (d ≥ 2) where each dimension corresponds to a description of a tuple. We performed extensive experiments using both real and synthetic datasets. Our evaluations show that our algorithms outperform the best-known previous approaches in three ways: (i) they answer the regret minimization queries by guaranteeing a smaller regret ratio and returning fewer tuples to users, (ii) they do so much faster (up to two orders of magnitude times improvement), and (iii) they outperform the existing interactive regret minimization algorithms by locating the favorite tuple and guaranteeing a small regret ratio with much fewer questions.

[ Hide abstract ]

View Copyrighted to the author. Reproduction is prohibited without the author’s prior written consent.

Details

Collection HKUST Electronic Theses Degree Ph.D. Department Computer Science and Engineering Supervisors Wong, Raymond Chi-Wing Authors Xie, Min Subjects Data mining Mathematical models Query languages (Computer science) Electronic data processing Language English Call number Thesis CSED 2019 Xie DOI 10.14711/thesis-991012757567603412

Full record

Analyzing data with regret minimization query

by Min Xie

Post a Comment Cancel reply