Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming (Revised)

dc.contributor.author Bertsekas, Dimitri P.
dc.contributor.author Yu, Huizhen
dc.date.accessioned 2010-10-20T12:10:04Z
dc.date.available 2010-10-20T12:10:04Z
dc.date.issued 2010-10-20T12:10:04Z
dc.identifier.uri http://hdl.handle.net/10138/17851
dc.description The revised technical report C-2010-10 en
dc.description.abstract We consider the classical finite-state discounted Markovian decision problem and introduce a new policy iteration-like algorithm for finding the optimal Q-factors. Instead of evaluating a policy by solving a linear system of equations, our algorithm requires the (possibly inexact) solution of a nonlinear system of equations involving estimates of state costs as well as Q-factors. This system is Bellman's equation for an optimal stopping problem. With a lookup table representation, it can be solved with simple Q-learning iterations; with feature-based Q-factor approximations, it can be solved with the Q-learning algorithm of Tsitsiklis and Van Roy [TsV99]. In exact/lookup table form, our algorithm admits asynchronous and stochastic iterative implementations, in the spirit of asynchronous/modified policy iteration, with lower overhead and/or more reliable convergence than existing Q-learning schemes. Furthermore, for large-scale problems, where linear basis function approximations and simulation-based temporal difference implementations are used, our algorithm effectively resolves the inherent difficulties that existing schemes face due to inadequate exploration. en
dc.language.iso en en
dc.relation.ispartofseries Report LIDS - 2831 en
dc.relation.ispartofseries Also as: Department of Computer Science Series of Publications C Report C-2010-10 en
dc.title Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming (Revised) en
dc.type Technical report en
dc.identifier.laitoskoodi Department of Computer Science en
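
The abstract describes replacing linear policy evaluation with an optimal stopping problem over Q-factors and state cost estimates. The following is a minimal sketch of that idea in lookup table form, not the report's own implementation. It assumes the stopping mapping has the form Q(i,u) = g(i,u) + α Σ_j p_ij(u) min{J(j), Q(j,μ(j))}, which is one reading of the abstract; the report's exact formulation may differ. The two-state MDP, its numbers, and all function names are hypothetical.

```python
# Hypothetical 2-state, 2-action discounted MDP; every number here is made up.
ALPHA = 0.9  # discount factor
P = [  # P[i][u][j]: probability of moving from state i to state j under action u
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.9, 0.1]],
]
G = [  # G[i][u]: expected one-stage cost of action u in state i
    [2.0, 0.5],
    [1.0, 3.0],
]
STATES = range(2)
ACTIONS = range(2)


def stopping_sweep(Q, J, mu):
    """One sweep of the ASSUMED stopping-problem mapping for policy mu.

    At each next state j we pay the cheaper of "stop" (cost estimate J[j])
    and "continue with mu" (Q[j][mu[j]]).
    """
    return [
        [
            G[i][u]
            + ALPHA * sum(P[i][u][j] * min(J[j], Q[j][mu[j]]) for j in STATES)
            for u in ACTIONS
        ]
        for i in STATES
    ]


def enhanced_policy_iteration(n_outer=500, n_inner=3):
    """Alternate a few stopping sweeps (inexact evaluation) with improvement."""
    Q = [[0.0 for _ in ACTIONS] for _ in STATES]
    J = [0.0 for _ in STATES]
    mu = [0 for _ in STATES]
    for _ in range(n_outer):
        for _ in range(n_inner):  # inexact solution of the stopping problem
            Q = stopping_sweep(Q, J, mu)
        J = [min(row) for row in Q]               # updated state cost estimates
        mu = [row.index(min(row)) for row in Q]   # greedy policy w.r.t. Q
    return Q, J, mu


def q_value_iteration(n_iter=2000):
    """Plain Q-factor value iteration, used only as a reference solution."""
    Q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(n_iter):
        Q = [
            [
                G[i][u] + ALPHA * sum(P[i][u][j] * min(Q[j]) for j in STATES)
                for u in ACTIONS
            ]
            for i in STATES
        ]
    return Q


if __name__ == "__main__":
    Q, J, mu = enhanced_policy_iteration()
    print("Q-factors:", Q)
    print("State costs:", J)
    print("Greedy policy:", mu)
```

On this toy problem the result can be checked against `q_value_iteration()`. The sketch covers only the synchronous, exact lookup table case; the asynchronous, stochastic, and feature-based variants mentioned in the abstract are not shown.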

Files in this item

File: Enhanced_Policy_Iteration_rev_BY.pdf (PDF, 2.642 Mb)