Convergence of Least Squares Temporal Difference Methods Under General Conditions

Title: Convergence of Least Squares Temporal Difference Methods Under General Conditions
Author: Yu, Huizhen
Belongs to series: Department of Computer Science Series of Publications C Report C-2010-1
Abstract: We consider approximate policy evaluation for finite-state and finite-action Markov decision processes (MDPs) in the off-policy learning context, using the simulation-based least squares temporal difference algorithm LSTD(λ). We establish, for the discounted cost criterion, that off-policy LSTD(λ) converges almost surely under mild, minimal conditions. We also analyze other convergence and boundedness properties of the iterates involved in the algorithm and, based on them, suggest a modification to its practical implementation. Our analysis uses the theories of both finite-space Markov chains and Markov chains on topological spaces, in particular e-chains.
URI: http://hdl.handle.net/10138/17116
Date: 2010-06-15
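
To make the setting of the abstract concrete, below is a minimal Python sketch of off-policy LSTD(λ) with importance-sampling-weighted eligibility traces, in the spirit of the algorithm the report analyzes. The toy MDP, the feature matrix, the averaging step sizes, and all names are illustrative assumptions, not the report's implementation; note also that the trace iterates involve products of importance sampling ratios, whose boundedness behavior is part of what the report studies.

```python
# Hypothetical sketch of off-policy LSTD(lambda): evaluate a target
# policy pi from a trajectory generated under a behavior policy mu.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_feats = 5, 2, 3
gamma, lam = 0.9, 0.5          # discount factor and trace parameter
T = 100_000                    # trajectory length (illustrative)

# Random finite MDP: P[a, x, x'] transition probs, g[x, a] one-stage costs.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
g = rng.random((n_states, n_actions))

# Target policy pi and behavior policy mu (mu must cover pi's actions).
pi = rng.random((n_states, n_actions))
pi /= pi.sum(axis=1, keepdims=True)
mu = np.full((n_states, n_actions), 1.0 / n_actions)

Phi = rng.random((n_states, n_feats))   # feature matrix, rows = phi(x)

A = np.zeros((n_feats, n_feats))        # LSTD matrix iterate
b = np.zeros(n_feats)                   # LSTD vector iterate
z = np.zeros(n_feats)                   # eligibility trace
rho_prev, x = 1.0, 0

for t in range(T):
    a = rng.choice(n_actions, p=mu[x])
    x_next = rng.choice(n_states, p=P[a, x])
    rho = pi[x, a] / mu[x, a]           # importance sampling ratio

    # Off-policy trace: scaled by the previous step's ratio.
    z = gamma * lam * rho_prev * z + Phi[x]

    # Running averages of the matrix and vector iterates.
    A += (np.outer(z, Phi[x] - gamma * rho * Phi[x_next]) - A) / (t + 1)
    b += (z * rho * g[x, a] - b) / (t + 1)

    rho_prev, x = rho, x_next

# Solve A theta = b; Phi @ theta approximates the discounted cost of pi.
theta = np.linalg.lstsq(A, b, rcond=None)[0]
print("approximate cost of pi:", Phi @ theta)
```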


Files in this item

lstd_offpolicy_Y.pdf (332.3 KB, PDF)
