THESIS
2006
xx, 163 leaves : ill. ; 30 cm
Abstract
This thesis is devoted to the study on performance optimization of Markov systems. As is well known, performance optimization plays an important role in both applied and theoretical research. Many practical systems arisen in the physical, biological, and social sciences as well as in engineering and commerce fit Markov models. Hence, performance optimization of Markov systems attracts wide attention in many fields....[
Read more ]
This thesis is devoted to the study on performance optimization of Markov systems. As is well known, performance optimization plays an important role in both applied and theoretical research. Many practical systems arisen in the physical, biological, and social sciences as well as in engineering and commerce fit Markov models. Hence, performance optimization of Markov systems attracts wide attention in many fields.
First, we propose a new sensitivity-based performance optimization approach to theory of finite Markov decision processes (MDPs), including ergodic and multi-chain systems. The criteria to be optimized in this thesis are average opti-mality, bias optimality, and high-order bias optimality. Based on average reward and bias difference formulas derived in this thesis, we develop a complete opti-mization theory for average optimality and nth-bias optimality. We not only show the existence of average optimal policies and nth-bias optimal policies, but also give one-phase policy iteration algorithms leading to nth-bias optimal policies, in a unified way, which are more efficient than the two-phase algorithms in the previous literature. Our approach is simple, direct, natural and intuitive since it depends on neither Laurent series expansion nor discounted MDPs.
Furthermore, we obtain some very interesting properties for high-order bias optimality. More precisely, we establish high-order bias optimality equations and prove that each nth-bias optimal policy has to satisfy the first (n+1) bias optimality equations. We also prove that a policy which satisfies the first (n+2) bias optimality equations must be an nth-bias optimal policy.
With sensitivity point of view of optimization and fiexible construction of sen-sitivity formulas, we also propose an event-based optimization approach. This approach utilizes special feature of a system and illustrates how potentials can be aggregated based on the special feature. Aggregated potentials can be used to build two performance sensitivity formulas which lead to gradient based opti-mization and, with some conditions, event-based policy iteration. This approach applies to many practical problems that do not fit the standard MDP formu-lation very well. It provides structural insights and overcomes some difficulties associated with determining transition probability matrices in large state spaces.
Post a Comment