Least Squares Temporal Difference Methods: An Analysis Under General Conditions

Title: Least Squares Temporal Difference Methods: An Analysis Under General Conditions
Author: Yu, Huizhen
Date: 2010-10-15
Belongs to series: Department of Computer Science Series of Publications C Report C-2010-39
URI: http://hdl.handle.net/10138/17799
Abstract: We consider approximate policy evaluation for finite state and action Markov decision processes (MDPs) with the least squares temporal difference algorithm, LSTD(λ), in an exploration-enhanced off-policy learning context. We establish for the discounted cost criterion that the off-policy LSTD(λ) converges almost surely under mild, minimal conditions. We also analyze other convergence and boundedness properties of the iterates involved in the algorithm. Our analysis draws on theories of both finite space Markov chains and weak Feller Markov chains on topological spaces. Our results can be applied to other temporal difference algorithms and MDP models. As examples, we give a convergence analysis of an off-policy TD(λ) algorithm and extensions to MDPs with compact action and state spaces.
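The off-policy LSTD(λ) iterates described in the abstract can be illustrated with a minimal sketch. This is not the report's own formulation; it assumes one common form of the algorithm, in which an eligibility trace is discounted by the importance-sampling ratio of the previous step, and the matrix-vector pair (A, b) is accumulated along a trajectory generated by a behavior policy. All names (phi, mu, pi, etc.) and the tiny synthetic MDP are illustrative, not taken from the report.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic MDP: 3 states, 2 actions, linear features in R^2 (illustrative only).
n_states, n_actions, d = 3, 2, 2
gamma, lam = 0.9, 0.5
phi = rng.standard_normal((n_states, d))          # feature vectors phi(s)
P = rng.random((n_actions, n_states, n_states))   # transition kernels P(s' | s, a)
P /= P.sum(axis=2, keepdims=True)
reward = rng.random((n_states, n_actions))

# Behavior policy mu generates the data; target policy pi is the one evaluated.
mu = np.full((n_states, n_actions), 1.0 / n_actions)
pi = rng.random((n_states, n_actions))
pi /= pi.sum(axis=1, keepdims=True)

A = np.zeros((d, d))
b = np.zeros(d)
z = np.zeros(d)       # eligibility trace
rho_prev = 1.0        # importance ratio from the previous step (no effect while z = 0)
s = 0
T = 20000
for t in range(T):
    # Trace update: discount the old trace by gamma * lambda * rho_{t-1}, add phi(s_t).
    z = gamma * lam * rho_prev * z + phi[s]
    a = rng.choice(n_actions, p=mu[s])
    rho = pi[s, a] / mu[s, a]                     # importance-sampling ratio
    s_next = rng.choice(n_states, p=P[a, s])
    r = reward[s, a]
    # Accumulate the LSTD statistics along the behavior-policy trajectory.
    A += np.outer(z, phi[s] - gamma * rho * phi[s_next])
    b += z * rho * r
    rho_prev = rho
    s = s_next

# Solve (A/T) theta = (b/T) for the approximate value-function weights.
theta = np.linalg.lstsq(A / T, b / T, rcond=None)[0]
print(theta)
```

The report's analysis concerns precisely the almost-sure convergence of averages like A/T and b/T above, which is nontrivial off-policy because the ratios rho can make the trace z unbounded in variance.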
Description: This technical report is a revised and extended version of the technical report C-2010-1. It contains simplified and improved proofs, as well as extensions of some of the earlier results.
Subject: Markov decision processes
approximate dynamic programming
temporal difference methods
importance sampling
Markov chains

Files in this item

lstd_rev_Y.pdf (652.4 KB, PDF)
