THESIS
2011
xx, 150 p. : ill. (chiefly col.) ; 30 cm
Abstract
This thesis is devoted to the extension of the recently developed direct comparison approach from the performance optimization of finite Markov decision processes (MDPs) to the optimization of continuous-time continuous-state (CTCS) MDPs and partially observable Markov decision processes (POMDPs). Besides the theoretical contributions, we apply the approach to solve some portfolio management problems.
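For readers unfamiliar with the finite-MDP starting point, the following is a minimal sketch of the average-reward performance difference formula on which the direct comparison approach is commonly based. It is an illustration under stated assumptions, not the thesis's own derivation: the chains are assumed ergodic and aperiodic, and the function names, matrices, and reward vectors are hypothetical examples. With performance potential g solving the Poisson equation (I - P)g + eta*e = f, the formula reads eta' - eta = pi'[(f' - f) + (P' - P)g].

```python
# Sketch only: assumes finite, ergodic, aperiodic Markov chains.
# All names and numbers below are hypothetical illustrations.

def stationary(P, iters=5000):
    """Approximate the stationary distribution pi (pi P = pi) by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def average_reward(P, f):
    """Long-run average reward eta = pi f."""
    return sum(p * r for p, r in zip(stationary(P), f))

def potentials(P, f, iters=5000):
    """Performance potentials g = sum_k P^k (f - eta e), truncated after `iters` terms.

    This g satisfies the Poisson equation (I - P) g + eta e = f.
    """
    n = len(P)
    eta = average_reward(P, f)
    term = [f[i] - eta for i in range(n)]
    g = term[:]
    for _ in range(iters):
        term = [sum(P[i][j] * term[j] for j in range(n)) for i in range(n)]
        g = [g[i] + term[i] for i in range(n)]
    return g

def performance_difference(P, f, P_new, f_new):
    """Right-hand side of eta' - eta = pi' [ (f' - f) + (P' - P) g ]."""
    n = len(P)
    g = potentials(P, f)
    pi_new = stationary(P_new)
    bracket = [
        (f_new[i] - f[i]) + sum((P_new[i][j] - P[i][j]) * g[j] for j in range(n))
        for i in range(n)
    ]
    return sum(pi_new[i] * bracket[i] for i in range(n))

# Two hypothetical policies on a 3-state chain (each row sums to 1).
P_old = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]
f_old = [1.0, 0.0, 2.0]
P_new = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3], [0.4, 0.1, 0.5]]
f_new = [0.5, 2.0, 1.0]

lhs = average_reward(P_new, f_new) - average_reward(P_old, f_old)
rhs = performance_difference(P_old, f_old, P_new, f_new)
print(abs(lhs - rhs) < 1e-9)
```

Because the entries of pi' are nonnegative, a policy whose bracketed term is nonnegative in every state cannot have a lower average reward. Comparing two policies directly in this way leads to an optimality condition without going through dynamic programming, which is the idea the thesis extends to CTCS MDPs and POMDPs.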
First, by revisiting the completion-of-squares technique for the linear quadratic Gaussian problem, we interpret this technique from a new angle, based on which we extend the direct comparison approach to CTCS MDPs. Without invoking dynamic programming, we derive the optimality equation for the long-run average gain-optimal policy. This approach is simple and direct, since the derivation of the gain-optimal policy does not depend on results for either discounted MDPs or finite-horizon MDPs.
Second, we propose a practical method for obtaining a sub-optimal policy for POMDPs. Based on the internal state, we construct a global-state POMDP whose optimal policy is optimal for the original POMDP within a reduced policy space. We then solve this global-state POMDP by the direct comparison approach. We find that including more information in the internal state brings the resulting sub-optimal policy closer to the true optimal one. The approach therefore provides a tradeoff between policy precision and computational cost.
Furthermore, we apply the approach to portfolio management in financial engineering. We first consider a market with deterministic parameters and find that the explicit solution of the mean-variance portfolio selection problem in a continuous-time setting can be easily derived by the direct comparison approach. We also consider a more practical market in which the parameters are stochastic and unobservable. We formulate portfolio management in such an environment as a POMDP optimization problem and apply the direct comparison approach to solve it.