Unifying convergence and no-regret in multiagent learning

Bikramjit Banerjee, Jing Peng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a new multiagent learning algorithm, RV σ(t), that builds on an earlier version, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. But it makes two strong assumptions: (1) that it can distinguish between self-play and otherwise non-stationary agents, and (2) that all agents know their portions of the same equilibrium in self-play. We show that the adaptive learning rate of RV σ(t), which is explicitly dependent on time, can overcome both of these assumptions. Consequently, RV σ(t) theoretically achieves (a') convergence to near-best response against eventually stationary opponents, (b') no-regret payoff against arbitrary opponents, and (c') convergence, in self-play, to some Nash equilibrium policy in some classes of games. Each agent now needs to know only its own portion of any equilibrium, and does not need to distinguish among non-stationary opponent types. This is also, to our knowledge, the first successful attempt at convergence of a no-regret algorithm in the Shapley game.
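
The key mechanism highlighted above is a learning rate that is an explicit function of time. The sketch below is not the authors' RV σ(t) update (whose exact form is given in the paper); it is only a minimal, hypothetical illustration of a gradient-style policy update for one player of a matrix game, with a step size σ(t) that decays as play proceeds. The 1/(1 + t) schedule, the function names, and the update form are all assumptions made for illustration.

```python
import numpy as np

def sigma(t):
    # Hypothetical time-dependent learning rate schedule. The actual
    # schedule used by RV sigma(t) is defined in the paper; this
    # 1/(1 + t) decay is only an illustrative assumption.
    return 1.0 / (1.0 + t)

def policy_step(policy, action_payoffs, t):
    """One illustrative update of a player's mixed strategy in a matrix game.

    policy         -- current probability vector over the player's actions
    action_payoffs -- estimated payoff of each action against the opponents'
                      recent play (estimated elsewhere)
    t              -- current time step
    """
    lr = sigma(t)
    # Advantage of each action over the current expected payoff.
    advantage = action_payoffs - action_payoffs @ policy
    # Step toward better actions with a step size that shrinks over time;
    # this is a stand-in for the "adaptive learning rate explicitly
    # dependent on time" that the abstract credits for removing
    # ReDVaLeR's extra assumptions.
    new_policy = policy + lr * advantage
    # Project back onto the probability simplex (clip and renormalise).
    new_policy = np.clip(new_policy, 1e-8, None)
    return new_policy / new_policy.sum()

# Example: a player with three actions updating at time step t = 10.
p = np.array([1 / 3, 1 / 3, 1 / 3])
u = np.array([0.2, 0.5, 0.1])
print(policy_step(p, u, t=10))
```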

Original language: English
Title of host publication: Learning and Adaption in Multi-Agent Systems - First International Workshop, LAMAS 2005, Revised Selected Papers
Pages: 100-114
Number of pages: 15
ISBN (Print): 3540330534, 9783540330530
DOIs: https://doi.org/10.1007/11691839_5
State: Published - 10 Jul 2006
Event: 1st International Workshop on Learning and Adaption in Multi-Agent Systems, LAMAS 2005 - Utrecht, Netherlands
Duration: 25 Jul 2005 – 25 Jul 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3898 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 1st International Workshop on Learning and Adaption in Multi-Agent Systems, LAMAS 2005
Country: Netherlands
City: Utrecht
Period: 25/07/05 – 25/07/05

Cite this

Banerjee, B., & Peng, J. (2006). Unifying convergence and no-regret in multiagent learning. In Learning and Adaption in Multi-Agent Systems - First International Workshop, LAMAS 2005, Revised Selected Papers (pp. 100-114). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3898 LNAI). https://doi.org/10.1007/11691839_5