THESIS
2020
xiv, 131 pages : illustrations (chiefly color) ; 30 cm
Abstract
Deep learning has achieved significant success in numerous areas of artificial intelligence. However, problems arise when deep learning is employed in real-world autonomous
robotic systems. In this thesis, we try to address these problems in both perception and
motion planning in robotic manipulation, to bring a practical, fully automated intelligent
robotic system into reality.
For perception, we first design a multiscopic vision system to acquire accurate depth
estimation of environments. Images in this system are captured with parallel, coplanar,
and same-parallax cameras, such that more constraints can be enforced to optimize the
depth estimation. A heuristic optimization method and a learning method are proposed
to fuse multiple cost volumes in this structure, which outperform traditional stereo methods, especially in occluded, reflective, and featureless regions. Second, we propose a self-supervised framework by employing the supervision from multiscopic images, where a
stereo network can be trained end-to-end without ground-truth depth information. By
enforcing proposed multiscopic losses, the network outperforms previous self-supervised
methods. To train this framework and to promote more work in multiscopic vision, we
build a new dataset with synthetic images rendered by 3D engines and real pictures captured by our multiscopic camera. Third, to locate the target objects for manipulation,
we exploit an end-to-end Siamese network in a cycle-consistent self-supervised framework
for object tracking. Self-supervision can be performed by taking advantage of the cycle
consistency in the forward and backward tracking. To better leverage the end-to-end
learning of deep networks, we propose to integrate a Siamese region proposal and mask
regression network into our tracking framework so that a fast and accurate tracker can
be learned without the annotation of each frame.
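The forward-backward supervision described above can be sketched with a toy point tracker; the `track` function below is a stand-in for the Siamese network's motion regression, and the function names and 2D setup are illustrative assumptions, not the thesis' exact formulation:

```python
import numpy as np

def track(position, motion):
    """Hypothetical point tracker: shifts a position by an estimated motion.
    A real Siamese tracker would regress this motion from image patches."""
    return position + motion

def cycle_consistency_loss(start_pos, forward_motions, backward_motions):
    """Track a point forward through a clip, then backward over the same
    frames; the self-supervised loss is the distance between the starting
    point and the round-trip end point (zero for a consistent tracker)."""
    pos = start_pos
    for m in forward_motions:    # forward pass through the clip
        pos = track(pos, m)
    for m in backward_motions:   # backward pass over the same frames
        pos = track(pos, m)
    return float(np.linalg.norm(pos - start_pos))

# With perfectly inverse backward motions the trajectory closes exactly.
start = np.array([10.0, 20.0])
fwd = [np.array([1.0, 0.5]), np.array([-0.5, 2.0])]
bwd = [-m for m in reversed(fwd)]
assert cycle_consistency_loss(start, fwd, bwd) == 0.0
```

Because no frame annotation appears in the loss, the same signal can supervise a full tracking network: inconsistent round trips are penalized regardless of where the ground-truth object actually is.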
For motion planning, we try to address three problems of deep reinforcement learning in robotics: how to represent the interaction between the robot and its environment before
policy learning, how to better explore the state space during policy learning, and how
to close the reality gap between the simulator and the real world after policy learning. For the state representation issue, we propose to represent global properties of the robot-environment interaction with topology-based coordinates: the writhe matrix and the Laplacian coordinates. For the exploration issue, we introduce a potential field to
guide heuristic action sampling, which helps the robot explore the action space more efficiently instead of getting stuck in suboptimal regions. For the reality gap issue,
we propose a new transfer learning method in Q-space to transfer the policy learned in
simulation to reality, with a small dataset of real-world episode pairs as supervision.
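The topology-based state representation above can be sketched as a writhe matrix computed between two piecewise-linear curves (e.g. the robot arm's links and a strand of the environment). The one-point quadrature of the Gauss linking integral below is a coarse illustrative discretization, not the thesis' exact formulation:

```python
import numpy as np

def writhe_matrix(arm, obstacle):
    """Coarse writhe matrix between two piecewise-linear 3D curves.

    `arm` is an (N, 3) array of joint positions and `obstacle` an (M, 3)
    array of points on an environment strand (names are illustrative).
    Entry (i, j) approximates the Gauss linking integral over segment pair
    (i, j) with a single midpoint quadrature sample.
    """
    W = np.zeros((len(arm) - 1, len(obstacle) - 1))
    for i in range(len(arm) - 1):
        for j in range(len(obstacle) - 1):
            t1 = arm[i + 1] - arm[i]            # segment tangent (length ds)
            t2 = obstacle[j + 1] - obstacle[j]  # segment tangent (length dt)
            m1 = 0.5 * (arm[i] + arm[i + 1])    # segment midpoints
            m2 = 0.5 * (obstacle[j] + obstacle[j + 1])
            r = m1 - m2
            d = np.linalg.norm(r)
            # Integrand of the Gauss linking integral at the midpoints.
            W[i, j] = np.dot(np.cross(t1, t2), r) / (4.0 * np.pi * d**3)
    return W
```

Because each entry depends only on the relative geometry of a segment pair, the matrix captures a global winding relationship between arm and environment that stays stable under local shape changes, which is what makes it attractive as a state representation for policy learning.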
To evaluate our planning approaches, for state representation we simulate a whole-arm manipulation scenario and show that the robot quickly learns an interaction task in topology space and that the learned policy generalizes well to unseen scenarios. For the exploration and reality-gap issues, we evaluate our methods in both simulated and real settings on a robotic rearrangement task, showing that the proposed approaches effectively improve the training process in simulation and efficiently adapt the learned policy to real-world application, even when the camera pose differs from that in simulation.