Browsing by Author "Yu, Huizhen"

  • Yu, Huizhen; Rousu, Juho (2008)
    We consider structured prediction problems with a parametrized linear prediction function, and the associated parameter optimization problems in large-margin discriminative training. We propose a dual optimization approach that uses the restricted simplicial decomposition method to optimize a reparametrized dual problem. Our reparametrization reduces the dimension of the dual space to one that is linear in the number of parameters and training examples, and hence independent of the dimensionality of the prediction outputs. This, in conjunction with simplicial decomposition, makes our approach efficient. We discuss connections between our approach and related earlier work, and we show its advantages. (A generic max-margin primal/dual of the kind such reparametrizations act on is sketched after this list.)
  • Yu, Huizhen (2010)
    We consider approximate policy evaluation for finite state and action Markov decision processes (MDPs) in the off-policy learning context, using the simulation-based least-squares temporal difference algorithm LSTD(λ). We establish for the discounted cost criterion that off-policy LSTD(λ) converges almost surely under mild, minimal conditions. We also analyze other convergence and boundedness properties of the iterates involved in the algorithm and, based on them, suggest a modification to its practical implementation. Our analysis uses theories of both finite-space Markov chains and Markov chains on topological spaces, in particular e-chains. (A sketch of an importance-weighted LSTD(λ) iteration follows this list.)
  • Yu, Huizhen; Bertsekas, Dimitri P. (2008)
    We consider linear fixed point equations and their approximation by projection on a low-dimensional subspace. We derive new bounds on the approximation error of the solution, which are expressed in terms of low-dimensional matrices and can be computed by simulation. When the fixed point mapping is a contraction, as is typically the case in Markov decision processes (MDPs), one of our bounds is always sharper than the standard contraction-based bounds, and another is often sharper. The former bound is also tight in a worst-case sense. Our bounds also apply to the non-contraction case, including policy evaluation in MDPs with nonstandard projections that enhance exploration; to our knowledge, no error bounds were previously available for this case. (The standard contraction-based bounds referred to here are recalled after this list.)
  • Bertsekas, Dimitri P.; Yu, Huizhen (2010)
    We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy-iteration-like algorithm for finding the optimal Q-factors. Instead of policy evaluation by solving a linear system of equations, our algorithm requires the (possibly inexact) solution of a nonlinear system of equations, involving estimates of state costs as well as Q-factors. This is Bellman's equation for an optimal stopping problem, which can be solved with simple Q-learning iterations when a lookup table representation is used, or with the Q-learning algorithm of Tsitsiklis and Van Roy [TsV99] when feature-based Q-factor approximations are used. In exact/lookup table form, our algorithm admits asynchronous and stochastic iterative implementations, in the spirit of asynchronous/modified policy iteration, with lower overhead and more reliable convergence than existing Q-learning schemes. Furthermore, for large-scale problems, where linear basis function approximations and simulation-based temporal difference implementations are used, our algorithm effectively resolves the inherent difficulties of existing schemes due to inadequate exploration. (A lookup-table sketch of the nonlinear evaluation step follows this list.)
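
For context on the Yu and Rousu (2008) entry: below is a generic margin-rescaled structured SVM primal and dual, one standard formulation of large-margin structured prediction. It is not the paper's reparametrized problem; it only illustrates why the naive dual has one multiplier per (example, candidate output) pair, which is the dimensionality issue the reparametrization addresses.

```latex
% Generic margin-rescaled structured SVM (a standard formulation, shown for
% context; the paper optimizes a reparametrized form of such a dual).
% Training pairs (x_i, y_i), joint feature map \phi, structured loss \ell.
\begin{align*}
  \text{Primal:}\quad
  \min_{w,\;\xi \ge 0}\ & \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i \\
  \text{s.t.}\ & \big\langle w,\ \phi(x_i,y_i) - \phi(x_i,y) \big\rangle
      \;\ge\; \ell(y_i,y) - \xi_i \qquad \forall i,\ \forall y \ne y_i.
\end{align*}
\begin{align*}
  \text{Dual:}\quad
  \max_{\alpha \ge 0}\ & \sum_{i,y} \alpha_{i,y}\,\ell(y_i,y)
    - \tfrac{1}{2}\Big\| \sum_{i,y} \alpha_{i,y}
        \big(\phi(x_i,y_i) - \phi(x_i,y)\big) \Big\|^2 \\
  \text{s.t.}\ & \sum_{y} \alpha_{i,y} \le C \qquad \forall i.
\end{align*}
% The dual has one variable \alpha_{i,y} per (example, output) pair, and the
% number of candidate outputs y is typically exponential in the size of the
% output structure.
```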
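For the Yu (2010) entry: the following is a minimal sketch of an importance-weighted (off-policy) LSTD(λ) iteration for policy evaluation with linear features. It follows one common per-decision weighting convention; the exact placement of the importance ratios and the conditions in the paper may differ. The `phi` and `rho` interfaces are hypothetical.

```python
import numpy as np

def off_policy_lstd_lambda(transitions, phi, rho, gamma, lam, reg=1e-8):
    """Sketch of importance-weighted LSTD(lambda) (one common formulation;
    details of the weighting analyzed in Yu (2010) may differ).

    transitions : iterable of (s, r, s_next) generated under a behavior policy
    phi         : feature map, phi(s) -> 1-D numpy array of length d (assumed)
    rho         : rho(s, s_next) -> ratio of target to behavior transition
                  probability for that transition (assumed interface)
    Returns theta such that phi(s) @ theta approximates the target-policy value.
    """
    transitions = list(transitions)
    d = phi(transitions[0][0]).shape[0]
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)          # eligibility trace
    w_prev = 1.0             # importance ratio of the previous transition
    for (s, r, s_next) in transitions:
        w = rho(s, s_next)
        f, f_next = phi(s), phi(s_next)
        z = gamma * lam * w_prev * z + f            # weighted trace recursion
        A += np.outer(z, f - gamma * w * f_next)    # accumulate A
        b += z * (w * r)                            # accumulate b
        w_prev = w
    # Solve A theta = b; a small regularizer guards against early ill-conditioning.
    return np.linalg.solve(A + reg * np.eye(d), b)
```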
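For the Yu and Bertsekas (2008) entry: the "standard contraction-based bounds" mentioned in the abstract are, in their commonly cited form, the following. Here x* solves x = Tx, T is an α-contraction with respect to a (weighted) Euclidean norm, Π is the orthogonal projection onto the approximation subspace in that norm, and x̄ solves the projected equation x = ΠTx. These are recalled only as context for the comparison made in the abstract.

```latex
% Classical error bounds for the projected fixed point \bar{x} of x = \Pi T x,
% when T is an \alpha-contraction in the projection norm \|\cdot\|:
\[
  \|x^* - \bar{x}\| \;\le\; \frac{1}{1-\alpha}\,\|x^* - \Pi x^*\|,
  \qquad
  \|x^* - \bar{x}\| \;\le\; \frac{1}{\sqrt{1-\alpha^2}}\,\|x^* - \Pi x^*\|.
\]
% The paper's bounds are expressed in terms of low-dimensional, simulation-
% computable matrices and are compared against these.
```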
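For the Bertsekas and Yu (2010) entry: below is a lookup-table sketch of the nonlinear "policy evaluation" step described in the abstract, written as a synchronous fixed-point iteration on an optimal-stopping Bellman equation that couples state-cost estimates J with Q-factors. The precise mapping used in the paper may differ; the form below (stop at the successor state and receive J, or continue under the current policy) is an assumption based on the abstract's description.

```python
import numpy as np

def evaluate_policy_via_stopping(P, g, mu, J, gamma, n_iters=1000, tol=1e-10):
    """Sketch of the nonlinear evaluation step in exact/lookup-table form.

    Assumed mapping (based on the abstract; details may differ):
        Q(i,u) <- sum_j P[i,u,j] * ( g[i,u,j] + gamma * min(J[j], Q[j, mu[j]]) )

    P  : (n, m, n) transition probabilities P[i, u, j]
    g  : (n, m, n) stage costs g[i, u, j]
    mu : (n,) current policy, an action index per state
    J  : (n,) current state-cost estimates
    """
    n, m, _ = P.shape
    Q = np.zeros((n, m))
    for _ in range(n_iters):
        # Value at successor j: either "stop" and take J[j], or continue under mu.
        cont = np.minimum(J, Q[np.arange(n), mu])                 # shape (n,)
        Q_new = np.einsum('iuj,iuj->iu', P, g) + gamma * (P @ cont)
        if np.max(np.abs(Q_new - Q)) < tol:
            Q = Q_new
            break
        Q = Q_new
    return Q
```

Since the mapping above is a γ-contraction in the sup norm, the synchronous iteration converges; the abstract's asynchronous and sampled (Q-learning) variants replace it with stochastic updates of individual entries.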