Fast concurrent reinforcement learners

Bikramjit Banerjee, Sandip Sen, Jing Peng

Research output: Contribution to journal › Conference article › Research › peer-review

23 Citations (Scopus)

Abstract

When several agents learn concurrently, the payoff received by an agent depends on the behavior of the other agents. As the other agents learn, the reward of any one agent becomes non-stationary, which makes learning in multiagent systems more difficult than single-agent learning. A few methods, however, are known to guarantee convergence to equilibrium in the limit in such systems. In this paper we experimentally study one such technique, minimax-Q, in a competitive domain and prove its equivalence with another well-known method for competitive domains. We study the rate of convergence of minimax-Q and investigate possible ways of increasing it. We also present a variant of the algorithm, minimax-SARSA, and prove its convergence to minimax-Q values under appropriate conditions. Finally, we show that this new algorithm performs better than simple minimax-Q in a general-sum domain as well.
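To make the abstract's terms concrete, here is a minimal sketch (not the authors' code) of the two updates involved, assuming a two-player zero-sum game with Q-values stored per state as a matrix indexed by (own action, opponent action). Minimax-Q bootstraps from the minimax value of the next state, obtained by solving a small linear program; the SARSA-style variant shown instead bootstraps from the Q-value of the joint action actually taken next. The function names, the scipy-based linear program, and the exact form of the minimax-SARSA update are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.optimize import linprog

def minimax_value(Q_s):
    # Solve max_pi min_o sum_a pi(a) * Q_s[a, o] as a linear program.
    # Q_s: (n_actions, n_opponent_actions) matrix of Q-values at one state.
    n_a, n_o = Q_s.shape
    # Decision variables: [pi(a_1), ..., pi(a_n), v]; linprog minimizes, so use -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every opponent action o: v - sum_a pi(a) * Q_s[a, o] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # The policy probabilities must sum to one.
    A_eq = np.append(np.ones(n_a), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:-1]  # minimax value and maximin mixed policy

def minimax_q_update(Q, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    # Standard minimax-Q backup: bootstrap from the minimax value of s_next.
    v_next, _ = minimax_value(Q[s_next])
    Q[s][a, o] += alpha * (r + gamma * v_next - Q[s][a, o])

def minimax_sarsa_update(Q, s, a, o, r, s_next, a_next, o_next, alpha=0.1, gamma=0.9):
    # Hypothetical SARSA-style backup: bootstrap from the joint action actually
    # taken at s_next, avoiding a linear program on every step.
    Q[s][a, o] += alpha * (r + gamma * Q[s_next][a_next, o_next] - Q[s][a, o])

The practical appeal of the SARSA-style backup is that it replaces a linear program per update with a single table lookup; whether its estimates still converge to the minimax-Q values is exactly the kind of condition the paper sets out to establish.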

Original language: English
Pages (from-to): 825-830
Number of pages: 6
Journal: IJCAI International Joint Conference on Artificial Intelligence
ISSN: 1045-0823
State: Published - 1 Dec 2001
Event: 17th International Joint Conference on Artificial Intelligence, IJCAI 2001 - Seattle, WA, United States
Duration: 4 Aug 2001 - 10 Aug 2001

Fingerprint

Reinforcement
Multi-agent systems

Cite this

Banerjee, Bikramjit; Sen, Sandip; Peng, Jing. Fast concurrent reinforcement learners. In: IJCAI International Joint Conference on Artificial Intelligence, 1 Dec 2001, p. 825-830.

Scopus record: http://www.scopus.com/inward/record.url?scp=84880876200&partnerID=8YFLogxK
