Incremental Multi-Step Q-Learning

Jing Peng, Ronald J. Williams

Research output: Chapter in Book/Report/Conference proceeding · Conference contribution · peer-review

100 Scopus citations

Abstract

This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic programming-based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic programming-based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm is demonstrated through computer simulations of the standard benchmark control problem of learning to balance a pole on a cart.
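The core idea the abstract describes, propagating TD(λ)-style multi-step credit through a Q-learning value table via eligibility traces, can be illustrated with a small sketch. This is not the exact Peng-Williams update from the paper; it is a Watkins-style Q(λ) variant on a hypothetical toy chain MDP, with all names and parameter values chosen here for illustration:

```python
import random

def q_lambda_chain(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                   lam=0.8, epsilon=0.1, seed=0):
    """Tabular Q(lambda) on a toy chain MDP (illustrative sketch only).

    States 0..n_states-1, actions 0 (left) / 1 (right); reaching the
    rightmost state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def select(s):
        # epsilon-greedy; returns (action taken, greedy action)
        greedy = 0 if Q[s][0] > Q[s][1] else 1
        if rng.random() < epsilon:
            return rng.randrange(2), greedy
        return greedy, greedy

    for _ in range(episodes):
        e = [[0.0, 0.0] for _ in range(n_states)]  # eligibility traces
        s = 0
        a, _ = select(s)
        done = False
        while not done:
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = (s2 == n_states - 1)
            r = 1.0 if done else 0.0
            a2, greedy2 = select(s2)
            # one-step TD error toward the greedy successor value
            target = r + (0.0 if done else gamma * Q[s2][greedy2])
            delta = target - Q[s][a]
            e[s][a] += 1.0  # accumulating trace for the visited pair
            # lambda decays the traces; Watkins' variant cuts them
            # whenever an exploratory (non-greedy) action is taken
            decay = gamma * lam if a2 == greedy2 else 0.0
            for st in range(n_states):
                for ac in range(2):
                    Q[st][ac] += alpha * delta * e[st][ac]
                    e[st][ac] *= decay
            s, a = s2, a2
    return Q
```

With λ = 0, every trace decays to zero immediately and the update reduces to one-step Q-learning; with λ near 1, a single TD error adjusts all recently visited state-action pairs at once, which is the faster credit assignment the abstract refers to. Peng and Williams' own variant differs in how it treats the greedy versus behavior-policy returns.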

Original language: English
Title of host publication: Proceedings of the 11th International Conference on Machine Learning, ICML 1994
Editors: William W. Cohen, Haym Hirsh
Publisher: Morgan Kaufmann Publishers, Inc.
Pages: 226-232
Number of pages: 7
ISBN (Electronic): 1558603352, 9781558603356
State: Published - 1994
Event: 11th International Conference on Machine Learning, ICML 1994 - New Brunswick, United States
Duration: 10 Jul 1994 - 13 Jul 1994

Publication series

Name: Proceedings of the 11th International Conference on Machine Learning, ICML 1994

Conference

Conference: 11th International Conference on Machine Learning, ICML 1994
Country/Territory: United States
City: New Brunswick
Period: 10/07/94 - 13/07/94
