THESIS
2018
xvi, 79 pages : illustrations ; 30 cm
Abstract
Context plays a critical role in perceptual inference, as it provides useful guidance for solving numerous tasks in both the spatial and temporal domains (Divvala et al., 2009; Galleguillos et al., 2008; Mottaghi et al., 2014). In this dissertation, we study several fundamental computer vision problems, namely object detection, image generation, and high-level image understanding, by exploiting different forms of spatio-temporal context to boost their performance.
Driven by recent developments in deep neural networks, we propose deep contextual modeling in the spatial and temporal domains. Context here refers to one of the following application scenarios: (1) temporal coherence and consistency for object detection in video frames; (2) spatial constraints for conditional image synthesis, i.e., generating an image from a sketch; (3) domain-specific knowledge such as facial attributes for natural face image generation.
We first study the problem of exploiting temporal context for object detection in video, where applying a single-frame object detector directly to a video sequence tends to produce high temporal variation in the frame-level output. Building on recent advances in sequential modeling, we exploit long-range visual context for temporal coherence and consistency by proposing a novel association LSTM framework, which solves the regression and association tasks in video simultaneously.
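To make the joint formulation concrete, below is a minimal, hypothetical PyTorch sketch (not the dissertation's implementation): an LSTM consumes per-object appearance features and detector boxes over time, and jointly emits box-regression offsets for temporal smoothing and a normalized association embedding for linking detections across frames. All module names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AssociationLSTM(nn.Module):
    """Toy association-LSTM-style module: joint regression + association."""
    def __init__(self, feat_dim=512, hidden_dim=256, assoc_dim=128):
        super().__init__()
        # Input per time step: appearance feature concatenated with a 4-d box.
        self.lstm = nn.LSTM(feat_dim + 4, hidden_dim, batch_first=True)
        self.regress = nn.Linear(hidden_dim, 4)            # box offsets (dx, dy, dw, dh)
        self.associate = nn.Linear(hidden_dim, assoc_dim)  # cross-frame matching embedding

    def forward(self, feats, boxes, state=None):
        # feats: (B, T, feat_dim) per-object appearance features over T frames
        # boxes: (B, T, 4) detector boxes for the same object across frames
        x = torch.cat([feats, boxes], dim=-1)
        h, state = self.lstm(x, state)
        deltas = self.regress(h)                           # temporally smoothed refinement
        emb = F.normalize(self.associate(h), dim=-1)
        return deltas, emb, state
```

Association between consecutive frames can then be scored by the dot product of the embeddings, so the regression and association tasks are trained jointly from the same recurrent state.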
Next, we investigate image generation guided by a hand-drawn sketch in the spatial domain. We design a joint image representation for learning the joint distribution and correspondence of sketch-image pairs. A contextual GAN framework is proposed to pose image generation as a constrained image completion problem, where the sketch serves as a weak spatial context. As a result, the output images remain realistic even when they do not strictly follow a poorly drawn sketch.
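As an illustration of the completion view, the following hedged sketch assumes a generator G trained on joint canvases with the sketch and image concatenated side by side (an assumed layout): synthesis is cast as searching the latent space so that the sketch half of the generated canvas resembles the input sketch, after which the image half is read out as the result. The function name and hyperparameters are hypothetical.

```python
import torch

def complete_from_sketch(G, sketch, z_dim=100, steps=200, lr=0.05):
    # sketch: (1, C, H, W); G(z) produces a joint canvas of shape (1, C, H, 2*W)
    # whose left half is a sketch and whose right half is the paired image.
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    W = sketch.shape[-1]
    for _ in range(steps):
        joint = G(z)
        # Weak contextual loss: the generated sketch half should resemble
        # the input sketch, but it is not forced to match it exactly.
        loss = torch.mean(torch.abs(joint[..., :W] - sketch))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G(z)[..., W:].detach()  # image half of the completed joint canvas
```

Because the sketch term acts only as a soft penalty, the optimum can deviate from a badly drawn sketch while staying on the generator's natural-image manifold.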
Finally, we explore domain-specific context, i.e., facial attributes, for attribute-guided face generation: we condition CycleGAN and propose conditional CycleGAN, which is designed to allow easy control of the appearance of the generated face via the facial attribute or identity context. We demonstrate three applications of identity-guided face generation.
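One simple way to realize such conditioning, shown in the hedged sketch below, is to broadcast the attribute (or identity) vector to a spatial map and concatenate it with the input image channels before the generator's first convolution. This is an illustrative assumption, not necessarily the dissertation's exact architecture, and the interior blocks are elided.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy attribute-conditioned generator in the CycleGAN style."""
    def __init__(self, img_ch=3, attr_dim=40, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + attr_dim, base, 7, padding=3),
            nn.ReLU(inplace=True),
            # ... downsampling / residual / upsampling blocks would go here ...
            nn.Conv2d(base, img_ch, 7, padding=3),
            nn.Tanh(),
        )

    def forward(self, img, attrs):
        # img: (B, 3, H, W); attrs: (B, attr_dim) facial attribute vector,
        # or an identity feature of the same width for identity guidance.
        B, _, H, W = img.shape
        attr_map = attrs.view(B, -1, 1, 1).expand(B, attrs.shape[1], H, W)
        return self.net(torch.cat([img, attr_map], dim=1))
```

Feeding the same image with different attribute vectors then steers the generated face's appearance (e.g., hair color or gender) without retraining the generator.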
As future research directions, we will study deep networks for jointly learning spatial and temporal context and explore the possibility of solving all of these applications with one single model.