THESIS
2019
ix, 56 pages : illustrations ; 30 cm
Abstract
In this thesis, we study Policy Prediction Network and Policy Tree Network, two
deep reinforcement learning architectures that offer ways to improve sample complexity
and performance on continuous control problems. Furthermore, Policy Tree Network
offers the ability to trade extra computation at test time for improved performance via
decision-time planning. Performance gains are still observed even without
decision-time planning (i.e., at no extra computation cost relative to the model-free baseline).
Our approach combines model-free and model-based reinforcement learning.
Policy Prediction Network is the first to introduce an implicit model-based approach
to Policy Gradient algorithms in continuous action space. Policy Tree Network is the
first to leverage an implicit model for decision-time planning in continuous action space.
Learning the implicit model is made possible by an empirically justified clipping scheme
and depth-based objectives. Leveraging the implicit model for decision-time planning is
feasible as a result of our tree expansion and backup algorithm. Our experiments focus
on the MuJoCo environments so that they can be compared with similar work in this area.
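
The abstract does not spell out the tree expansion and backup algorithm, so the following is only a generic sketch of decision-time planning in a continuous action space, not the method of the thesis. It assumes that candidate actions are sampled from the policy at each node, that a learned model predicts a reward and a next state, and that the best child value is backed up. All names (`plan`, `toy_policy`, `toy_model`, `toy_value`) and the one-dimensional toy problem are hypothetical.

```python
import random


def plan(state, depth, branching, policy, model, value, gamma=0.99):
    """Return (best_action, backed_up_value) for `state`.

    Generic sketch: expand `branching` sampled actions per node down to
    `depth`, then back up the maximum of reward + gamma * child value.
    """
    if depth == 0:
        # Leaf node: fall back on the value estimate.
        return None, value(state)
    best_action, best_q = None, float("-inf")
    for _ in range(branching):
        action = policy(state)                     # sample a continuous action
        reward, next_state = model(state, action)  # model prediction
        _, next_value = plan(next_state, depth - 1, branching,
                             policy, model, value, gamma)
        q = reward + gamma * next_value            # backup step
        if q > best_q:
            best_action, best_q = action, q
    return best_action, best_q


# Hypothetical toy problem: move a point on a line toward the origin.
rng = random.Random(0)


def toy_policy(state):
    # Stand-in for a stochastic policy: uniform actions in [-1, 1].
    return rng.uniform(-1.0, 1.0)


def toy_model(state, action):
    # Stand-in for a learned model: returns (reward, next_state).
    next_state = state + action
    return -abs(next_state), next_state


def toy_value(state):
    # Stand-in for a learned value function.
    return -abs(state)


action, q = plan(2.0, depth=2, branching=4,
                 policy=toy_policy, model=toy_model, value=toy_value)
```

With `depth=0` the function returns the value estimate without expanding any nodes. Raising `depth` or `branching` spends more computation on planning at decision time, which is the trade-off the abstract describes.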