THESIS
2018
xiv, 71 pages : illustrations ; 30 cm
Abstract
One of the most important problems in artificial intelligence is learning to solve control problems without human supervision. Recent advances in deep reinforcement learning have achieved significant progress in this domain: researchers have solved a number of problems, including a subset of Atari games [3], the game of Go [4], and several simple robot control environments [5]. However, a general solution to more realistic problems is still missing. Most real-world robot control problems have multi-modal state spaces, which usually consist of both a low-dimensional motion sensor input and a high-dimensional image sensor input. In addition, a smooth and informative reward signal is usually unavailable; the agent receives only a reward signal that is sparse and discrete.
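The multi-modal state described above can be pictured as follows. This is only an illustrative sketch, not the thesis's architecture: the dimensions, the random projection standing in for a learned image encoder, and the function name `encode_state` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: an 8-D motion sensor reading and a 32x32 image.
motion = rng.standard_normal(8)
image = rng.standard_normal((32, 32))

# Random projection standing in for a learned image encoder (e.g. a CNN).
w_img = rng.standard_normal((32 * 32, 16)) / np.sqrt(32 * 32)

def encode_state(motion, image, w_img):
    """Fuse a low-dimensional motion input with a high-dimensional image
    input by projecting the image to a compact feature vector and
    concatenating it with the motion features."""
    img_feat = np.tanh(image.reshape(-1) @ w_img)
    return np.concatenate([motion, img_feat])

state = encode_state(motion, image, w_img)
print(state.shape)  # (24,): 8 motion features + 16 image features
```

In practice, the projection would be a trained convolutional network, but the fusion step, reducing the image to a compact feature vector and concatenating it with the motion features, is the same.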
We study a set of continuous control problems with multi-modal state spaces; a subset of these problems also have sparse reward functions. We propose several techniques to improve the performance of flat reinforcement learning methods on the multi-modal state-space problems: the Wasserstein actor-critic trust-region method (W-KTR), the exceptional advantage regularization method, and the robust concentric Gaussian mixture policy model. Experimental results show that the proposed techniques, especially the exceptional advantage regularization method, lead to considerable performance improvements. For the challenging problems with both multi-modal state spaces and sparse rewards, we propose a hierarchical reinforcement learning method, the flexible-scheduling hierarchical method. Experimental results show that it can solve these problems without domain-specific knowledge, given a set of pre-defined source tasks.
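The abstract names but does not define the concentric Gaussian mixture policy. One plausible reading, sketched below under that assumption, is a mixture of Gaussians sharing a single mean ("concentric") while differing in scale, so that a narrow component yields precise actions while wider components keep exploration robust. All names and parameters here are hypothetical.

```python
import numpy as np

def sample_concentric_gaussian_mixture(mean, scales, weights, rng):
    """Sample an action from a mixture of Gaussians that share one mean
    but differ in standard deviation: first pick a component by its
    mixture weight, then draw from that component's Gaussian."""
    k = rng.choice(len(scales), p=weights)
    return rng.normal(loc=mean, scale=scales[k], size=mean.shape)

rng = np.random.default_rng(42)
mean = np.zeros(4)                   # hypothetical 4-D action mean
scales = np.array([0.05, 0.3, 1.0])  # concentric component scales
weights = np.array([0.7, 0.2, 0.1])  # mixture weights (sum to 1)

action = sample_concentric_gaussian_mixture(mean, scales, weights, rng)
print(action.shape)  # (4,)
```

Because every component is centered on the same mean, the mixture stays unimodal in its mode while its heavy tails can make the policy less brittle under poor value estimates, which may be the "robust" property the abstract refers to.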