TY - GEN
T1 - Incremental Multi-Step Q-Learning
AU - Peng, Jing
AU - Williams, Ronald J.
N1 - Publisher Copyright:
© 1994 Proceedings of the 11th International Conference on Machine Learning, ICML 1994. All rights reserved.
PY - 1994
Y1 - 1994
N2 - This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic programming-based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic programming-based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm is demonstrated through computer simulations of the standard benchmark control problem of learning to balance a pole on a cart.
AB - This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic programming-based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic programming-based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm is demonstrated through computer simulations of the standard benchmark control problem of learning to balance a pole on a cart.
UR - http://www.scopus.com/inward/record.url?scp=85152551400&partnerID=8YFLogxK
U2 - 10.1016/B978-1-55860-335-6.50035-0
DO - 10.1016/B978-1-55860-335-6.50035-0
M3 - Conference contribution
AN - SCOPUS:85152551400
T3 - Proceedings of the 11th International Conference on Machine Learning, ICML 1994
SP - 226
EP - 232
BT - Proceedings of the 11th International Conference on Machine Learning, ICML 1994
A2 - Cohen, William W.
A2 - Hirsh, Haym
PB - Morgan Kaufmann Publishers, Inc.
T2 - 11th International Conference on Machine Learning, ICML 1994
Y2 - 10 July 1994 through 13 July 1994
ER -