THESIS
2001
xv, 108 leaves : ill. ; 30 cm
Abstract
Markov decision process is usually used as an underlying model for decision-theoretic planning. It models the world as a number of states and represents the uncertainty in effects of actions by transition probabilities. It also incorporates the idea of utility by specifying the reward or cost of taking an action at a certain state. Partially observable Markov decision process (POMDP) extends this model and allows modeling incomplete information about the current state....[
Read more ]
Markov decision process is usually used as an underlying model for decision-theoretic planning. It models the world as a number of states and represents the uncertainty in effects of actions by transition probabilities. It also incorporates the idea of utility by specifying the reward or cost of taking an action at a certain state. Partially observable Markov decision process (POMDP) extends this model and allows modeling incomplete information about the current state.
POMDP is computationally more complex due to the uncertainty of the current state. Exact algorithms have been proposed, but they are capable of solving POMDPs with only small number of states. Research efforts have been put into alleviating this situation mainly using two approaches: (1) using heuristic algorithms to find approximate solutions, and (2) factorizing the state space into a more compact representation.
This thesis focuses on the first approach and proposes a fast heuristic algorithm, called point-based update approximation (PBUA) . Experimental results show that PBUA allows solving POMDPs in much shorter time than exact algorithms, without compromising the quality of the solution much. With the addition of an expansion mechanism, it gives policies that are of similar quality as the optimal policies found by exact algorithms in the smaller POMDP problems. In the larger problems, which are out of the capability of the exact algorithms, PBUA computes policies among the best found in the literature, with solving times not worse than most of the other heuristic algorithms.
Post a Comment