TY - JOUR
T1 - On-policy concurrent reinforcement learning
AU - Banerjee, Bikramjit
AU - Sen, Sandip
AU - Peng, Jing
PY - 2004/10
Y1 - 2004/10
N2 - When an agent learns in a multi-agent environment, the payoff it receives is dependent on the behaviour of the other agents. If the other agents are also learning, its reward distribution becomes non-stationary. This makes learning in multi-agent systems more difficult than single-agent learning. Prior attempts at value-function-based learning in such domains have used off-policy Q-learning, which does not scale well, as the cornerstone, with restricted success. This paper studies on-policy modifications of such algorithms, with the promise of scalability and efficiency. In particular, it is proven that these hybrid techniques are guaranteed to converge to their desired fixed points under some restrictions. It is also shown, experimentally, that the new techniques can learn (from self-play) better policies than the previous algorithms (also in self-play) during some phases of the exploration.
AB - When an agent learns in a multi-agent environment, the payoff it receives is dependent on the behaviour of the other agents. If the other agents are also learning, its reward distribution becomes non-stationary. This makes learning in multi-agent systems more difficult than single-agent learning. Prior attempts at value-function-based learning in such domains have used off-policy Q-learning, which does not scale well, as the cornerstone, with restricted success. This paper studies on-policy modifications of such algorithms, with the promise of scalability and efficiency. In particular, it is proven that these hybrid techniques are guaranteed to converge to their desired fixed points under some restrictions. It is also shown, experimentally, that the new techniques can learn (from self-play) better policies than the previous algorithms (also in self-play) during some phases of the exploration.
KW - Game theory
KW - Multi-agent learning
KW - On-policy reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=9144256373&partnerID=8YFLogxK
U2 - 10.1080/09528130412331297956
DO - 10.1080/09528130412331297956
M3 - Article
AN - SCOPUS:9144256373
SN - 0952-813X
VL - 16
SP - 245
EP - 260
JO - Journal of Experimental and Theoretical Artificial Intelligence
JF - Journal of Experimental and Theoretical Artificial Intelligence
IS - 4
ER -