Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

Author: Bertsekas, Dimitri P.; Yu, Huizhen
Date: 2010-06-15
Language: en
Series: Report LIDS-2831; also as: Department of Computer Science Series of Publications C, Report C-2010-10
URI: http://hdl.handle.net/10138/17117
Abstract: We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal Q-factors. Instead of policy evaluation by solving a linear system of equations, our algorithm requires (possibly inexact) solution of a nonlinear system of equations, involving estimates of state costs as well as Q-factors. This is Bellman's equation for an optimal stopping problem that can be solved with simple Q-learning iterations, in the case where a lookup table representation is used; it can also be solved with the Q-learning algorithm of Tsitsiklis and Van Roy [TsV99], in the case where feature-based Q-factor approximations are used. In exact/lookup table representation form, our algorithm admits asynchronous and stochastic iterative implementations, in the spirit of asynchronous/modified policy iteration, with lower overhead and more reliable convergence advantages over existing Q-learning schemes. Furthermore, for large-scale problems, where linear basis function approximations and simulation-based temporal difference implementations are used, our algorithm resolves effectively the inherent difficulties of existing schemes due to inadequate exploration.
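To make the lookup-table case concrete, the following is a minimal sketch of the idea the abstract describes: each "policy evaluation" step solves an optimal-stopping Bellman equation of the form Q(i,u) = Σ_j p_ij(u)(g(i,u,j) + α·min{J(j), Q(j,μ(j))}) by fixed-point sweeps, followed by policy improvement. The toy MDP, its numbers, and the exact iteration schedule are illustrative assumptions, not taken from the paper, which additionally covers asynchronous, stochastic, and approximation-based implementations.

```python
import numpy as np

# Toy 2-state, 2-action discounted MDP (all numbers are illustrative,
# not from the paper).
n_states, n_actions = 2, 2
alpha = 0.9  # discount factor

# P[u][i][j]: transition probability i -> j under action u
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])
# g[u][i][j]: stage cost of transition i -> j under action u
g = np.array([[[1.0, 2.0], [0.5, 3.0]],
              [[2.0, 0.5], [1.0, 1.0]]])

Q = np.zeros((n_states, n_actions))   # Q-factor estimates
J = np.zeros(n_states)                # state-cost estimates
mu = np.zeros(n_states, dtype=int)    # current policy

for _ in range(200):  # outer policy-iteration-like loop
    # "Policy evaluation": solve the optimal-stopping Bellman equation
    #   Q(i,u) = sum_j P(i,j,u) * (g(i,u,j) + alpha * min(J(j), Q(j, mu(j))))
    # by fixed-point sweeps (lookup-table analogue of Q-learning iterations).
    for _ in range(50):
        cont = np.minimum(J, Q[np.arange(n_states), mu])  # min{J(j), Q(j,mu(j))}
        for u in range(n_actions):
            Q[:, u] = (P[u] * (g[u] + alpha * cont[None, :])).sum(axis=1)
    # Policy improvement and cost update from the computed Q-factors.
    J = Q.min(axis=1)
    mu = Q.argmin(axis=1)

print("J:", J)
print("mu:", mu)
```

At convergence, min{J(j), Q(j, μ(j))} equals min_u Q(j, u), so the fixed point coincides with the Bellman optimality equation for the Q-factors; the stopping term J acts as the cost-estimate coupling the abstract refers to.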

Files in this item

Enhanced_Policy_Iteration_BY.pdf (610.3 KB, PDF)
