Incremental multi-step Q-learning

Jing Peng, Ronald J. Williams

Research output: Contribution to journal › Article › Research › peer-review

145 Citations (Scopus)

Abstract

This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
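As an illustration of the update the abstract describes, the following is a minimal tabular Python sketch of the Q(λ) backward view, reconstructed from standard secondary descriptions of Peng and Williams's algorithm rather than from the paper's own pseudocode. The Gymnasium-style env.reset()/env.step() interface, the function name peng_q_lambda, and all hyperparameter defaults are illustrative assumptions.

import numpy as np

def peng_q_lambda(env, n_states, n_actions, episodes=500,
                  alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1, seed=0):
    # Tabular Q(lambda) sketch (assumed form, not the paper's verbatim pseudocode).
    # Two TD errors per step:
    #   delta   - error w.r.t. greedy state values, spread along the trace
    #   delta_a - ordinary one-step error for the pair actually taken
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        e = np.zeros_like(Q)                   # eligibility traces, reset each episode
        s, _ = env.reset()
        done = False
        while not done:
            if rng.random() < epsilon:         # epsilon-greedy exploration
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            target = r + (0.0 if terminated else gamma * np.max(Q[s2]))
            delta = target - np.max(Q[s])      # error for previously visited pairs
            delta_a = target - Q[s, a]         # error for the current pair
            e *= gamma * lam                   # decay every trace by gamma * lambda
            Q += alpha * delta * e             # multi-step credit along the trace
            Q[s, a] += alpha * delta_a         # one-step Q-learning update
            e[s, a] += 1.0                     # add the current pair to the trace
            s = s2
    return Q

Unlike Watkins's Q(λ), the traces in this sketch are not zeroed after exploratory actions, which is what lets credit flow along the whole trajectory; the trade-off, as discussed in later textbook treatments, is that the updates mix on-policy and greedy targets rather than performing exact greedy backups.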

Original language: English
Pages (from-to): 283-290
Number of pages: 8
Journal: Machine Learning
Volume: 22
Issue number: 1-3
DOIs: 10.1007/BF00114731
State: Published - 1 Jan 1996

Fingerprint

  • Reinforcement learning
  • Dynamic programming
  • Computer simulation

Keywords

  • Reinforcement learning
  • Temporal difference learning

Cite this

Peng, Jing; Williams, Ronald J. / Incremental multi-step Q-learning. In: Machine Learning. 1996; Vol. 22, No. 1-3. pp. 283-290.
@article{9ffc417796a34db1b348d33f471f0c05,
title = "Incremental multi-step Q-learning",
abstract = "This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.",
keywords = "Reinforcement learning, Temporal difference learning",
author = "Jing Peng and Williams, {Ronald J.}",
year = "1996",
month = "1",
day = "1",
doi = "10.1007/BF00114731",
language = "English",
volume = "22",
pages = "283--290",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer Netherlands",
number = "1-3",
}

Incremental multi-step Q-learning. / Peng, Jing; Williams, Ronald J.

In: Machine Learning, Vol. 22, No. 1-3, 01.01.1996, p. 283-290.

TY - JOUR
T1 - Incremental multi-step Q-learning
AU - Peng, Jing
AU - Williams, Ronald J.
PY - 1996/1/1
Y1 - 1996/1/1
N2 - This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
AB - This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
KW - Reinforcement learning
KW - Temporal difference learning
UR - http://www.scopus.com/inward/record.url?scp=0000955979&partnerID=8YFLogxK
U2 - 10.1007/BF00114731
DO - 10.1007/BF00114731
M3 - Article
VL - 22
SP - 283
EP - 290
JO - Machine Learning
JF - Machine Learning
SN - 0885-6125
IS - 1-3
ER -