THESIS
2012
xii, 100 p. : ill. ; 30 cm
Abstract
Over the past decade, hashing-based methods for large-scale similarity search have sparked considerable research interest in the database, data mining and information retrieval communities. These methods achieve very fast search by indexing data with binary codes. Although many hash functions for various similarity metrics have been proposed, they often generate very long codes because of their data-independent nature. In recent years, machine learning techniques have been applied to learn hash functions from data, forming a new research topic called hash function learning.
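To make the indexing idea concrete, the following minimal Python sketch (not taken from the thesis) implements a representative data-independent scheme: random-hyperplane (SimHash-style) hash functions for cosine similarity, with a brute-force Hamming-distance lookup over the resulting binary codes. All function names and parameters are illustrative.

import numpy as np

def make_random_hash(dim, n_bits, seed=0):
    """Sample n_bits random hyperplanes; each contributes one bit of the code."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, dim))
    def hash_fn(x):
        # Bit b is 1 iff x lies on the positive side of hyperplane b.
        return (planes @ x > 0).astype(np.uint8)
    return hash_fn

def hamming_search(query_code, codes, k=5):
    """Return indices of the k database codes closest to the query in Hamming distance."""
    dists = (codes != query_code).sum(axis=1)
    return np.argsort(dists)[:k]

# Toy usage: index 1000 random 64-dimensional points with 32-bit codes.
rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 64))
h = make_random_hash(dim=64, n_bits=32)
codes = np.stack([h(x) for x in data])
query = rng.standard_normal(64)
print(hamming_search(h(query), codes, k=5))

Because the hyperplanes are drawn independently of the data, such schemes typically need many bits to reach a given accuracy, which is the long-code problem that data-dependent hash function learning addresses.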
In this thesis, we study two important issues in hash function learning. On one hand, existing supervised or semi-supervised hash function learning methods, which learn hash functions from labeled data, can be regarded as passive because they assume that the labeled data are provided in advance. Given that labeling data can be very costly in practice and that different labeled data points can contribute quite differently to hash function learning, it may be more cost-effective for hash function learning methods to select which data to label and learn from. To this end, we propose a novel framework, termed active hashing, to actively select the most informative data to label for hash function learning. Under this framework, we develop a simple method that queries the data labels the current hash functions are most uncertain about (a sketch of this selection step appears below). Experiments conducted on two real data sets show that our active hashing algorithm clearly improves over previous passive hashing methods.
On the other hand, most existing hash function learning methods work only on unimodal data, whereas many applications, e.g., multimedia retrieval and cross-lingual document analysis, involve data of multiple modalities. To apply hash function learning to multimodal data, we develop three methods under the framework of multimodal hashing, which hashes data points of multiple modalities into one common Hamming space. For aligned data, the first method is based on spectral analysis of the correlations among the modalities. For graph data, the second method falls into the category of latent feature models, and its hash codes can be obtained through Bayesian inference. For general data, we propose a boosted co-regularization model that can be learned efficiently by stochastic gradient-based algorithms. The effectiveness of our models is validated through extensive comparative studies on cross-modal multimedia retrieval.
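The active selection step referred to above, querying the labels that the current hash functions are most uncertain about, can be sketched as follows. The specific uncertainty measure used here (a point's distance to its nearest hashing hyperplane, assuming linear hash functions of the form sign(Wx)) is an illustrative assumption, not necessarily the exact criterion of the proposed method.

import numpy as np

def select_most_uncertain(unlabeled, W, batch_size=10):
    """Pick the unlabeled points the current hash functions are least certain about."""
    projections = unlabeled @ W.T              # shape: (n_points, n_bits)
    margins = np.abs(projections).min(axis=1)  # proxy for distance to the nearest hyperplane
    return np.argsort(margins)[:batch_size]    # smallest margin = most uncertain

# Toy usage: 500 unlabeled 64-dimensional points, 16 learned hash hyperplanes.
rng = np.random.default_rng(2)
unlabeled = rng.standard_normal((500, 64))
W = rng.standard_normal((16, 64))
to_label = select_most_uncertain(unlabeled, W, batch_size=10)
# The selected points would be sent for labeling, and the hash functions retrained.
print(to_label)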