Abstract
We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier algorithm, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. However, it makes two strong assumptions: (1) that it can distinguish between self-play and otherwise non-stationary agents, and (2) that all agents know their portions of the same equilibrium in self-play. We show that the adaptive learning rate of RVσ(t), which is explicitly dependent on time, can overcome both of these assumptions. Consequently, RVσ(t) theoretically achieves (a') convergence to near-best response against eventually stationary opponents, (b') no-regret payoff against arbitrary opponents, and (c') convergence to some Nash equilibrium policy in some classes of games in self-play. Each agent now needs to know only its own portion of any equilibrium, and does not need to distinguish among non-stationary opponent types. To our knowledge, this is also the first successful demonstration of a no-regret algorithm converging in the Shapley game.
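The record gives only the abstract, so the sketch below is a hypothetical illustration, not the actual RVσ(t) or ReDVaLeR update from the paper. It shows the one mechanism the abstract emphasizes: a policy update whose step size is governed by a learning rate σ(t) that depends explicitly on time. The `sigma` schedule, the `update_policy` gradient step, the simplex projection helper, and the matching-pennies payoffs are all assumptions made for this example.

```python
import numpy as np

def sigma(t):
    # Hypothetical time-dependent learning rate; the paper's actual schedule may differ.
    return 1.0 / (1.0 + t)

def project_to_simplex(v):
    # Euclidean projection onto the probability simplex, so the policy stays a valid mixture.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def update_policy(policy, payoff_matrix, opponent_mix, t, step=0.1):
    # Gradient-style step on the expected payoff, scaled by the decaying rate sigma(t),
    # then projected back onto the simplex.
    grad = payoff_matrix @ opponent_mix   # payoff of each pure action against the opponent mix
    return project_to_simplex(policy + step * sigma(t) * grad)

# Toy usage: matching pennies, row player, against a stationary opponent playing (0.7, 0.3).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
pi = np.array([0.5, 0.5])
for t in range(200):
    pi = update_policy(pi, A, np.array([0.7, 0.3]), t)
print(pi)
```

In this toy run the mixed policy drifts toward the pure best response as the decaying steps accumulate, loosely mirroring the abstract's claim (a'); the behaviour of the actual algorithm in self-play and against arbitrary opponents is analysed in the full text.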
Original language | English |
---|---|
Title of host publication | Learning and Adaption in Multi-Agent Systems - First International Workshop, LAMAS 2005, Revised Selected Papers |
Pages | 100-114 |
Number of pages | 15 |
DOIs | https://doi.org/10.1007/11691839_5 |
State | Published - 10 Jul 2006 |
Event | 1st International Workshop on Learning and Adaption in Multi-Agent Systems, LAMAS 2005 - Utrecht, Netherlands. Duration: 25 Jul 2005 → 25 Jul 2005 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 3898 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Other
Other | 1st International Workshop on Learning and Adaption in Multi-Agent Systems, LAMAS 2005 |
---|---|
Country | Netherlands |
City | Utrecht |
Period | 25/07/05 → 25/07/05 |
Cite this
Unifying convergence and no-regret in multiagent learning. / Banerjee, Bikramjit; Peng, Jing.
Learning and Adaption in Multi-Agent Systems - First International Workshop, LAMAS 2005, Revised Selected Papers. 2006. p. 100-114 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3898 LNAI).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Unifying convergence and no-regret in multiagent learning
AU - Banerjee, Bikramjit
AU - Peng, Jing
PY - 2006/7/10
Y1 - 2006/7/10
N2 - We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier algorithm, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. However, it makes two strong assumptions: (1) that it can distinguish between self-play and otherwise non-stationary agents, and (2) that all agents know their portions of the same equilibrium in self-play. We show that the adaptive learning rate of RVσ(t), which is explicitly dependent on time, can overcome both of these assumptions. Consequently, RVσ(t) theoretically achieves (a') convergence to near-best response against eventually stationary opponents, (b') no-regret payoff against arbitrary opponents, and (c') convergence to some Nash equilibrium policy in some classes of games in self-play. Each agent now needs to know only its own portion of any equilibrium, and does not need to distinguish among non-stationary opponent types. To our knowledge, this is also the first successful demonstration of a no-regret algorithm converging in the Shapley game.
AB - We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier algorithm, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. However, it makes two strong assumptions: (1) that it can distinguish between self-play and otherwise non-stationary agents, and (2) that all agents know their portions of the same equilibrium in self-play. We show that the adaptive learning rate of RVσ(t), which is explicitly dependent on time, can overcome both of these assumptions. Consequently, RVσ(t) theoretically achieves (a') convergence to near-best response against eventually stationary opponents, (b') no-regret payoff against arbitrary opponents, and (c') convergence to some Nash equilibrium policy in some classes of games in self-play. Each agent now needs to know only its own portion of any equilibrium, and does not need to distinguish among non-stationary opponent types. To our knowledge, this is also the first successful demonstration of a no-regret algorithm converging in the Shapley game.
UR - http://www.scopus.com/inward/record.url?scp=33745635029&partnerID=8YFLogxK
U2 - 10.1007/11691839_5
DO - 10.1007/11691839_5
M3 - Conference contribution
AN - SCOPUS:33745635029
SN - 3540330534
SN - 9783540330530
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 100
EP - 114
BT - Learning and Adaption in Multi-Agent Systems - First International Workshop, LAMAS 2005, Revised Selected Papers
ER -