Unifying convergence and no-regret in multiagent learning

Bikramjit Banerjee, Jing Peng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


We present a new multiagent learning algorithm, RV σ(t), that builds on an earlier version, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents, and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. But it makes two strong assumptions: (1) that it can distinguish between self-play and otherwise non-stationary agents, and (2) that all agents know their portions of the same equilibrium in self-play. We show that the adaptive learning rate of RV σ(t), which is explicitly dependent on time, can overcome both of these assumptions. Consequently, RV σ(t) theoretically achieves (a') convergence to near-best response against eventually stationary opponents, (b') no-regret payoff against arbitrary opponents, and (c') convergence to some Nash equilibrium policy in some classes of games in self-play. Each agent now needs to know only its portion of any equilibrium, and does not need to distinguish among non-stationary opponent types. This is also, to our knowledge, the first successful demonstration of convergence of a no-regret algorithm in the Shapley game.
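The abstract does not give the update rule itself, but the key idea it describes, a gradient-style policy update whose learning rate σ(t) depends explicitly on time, can be illustrated with a minimal sketch. Everything below (the function names, the 1/t schedule, the matching-pennies payoff, the starting policies) is a hypothetical stand-in for exposition, not the actual RV σ(t) algorithm:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex,
    so updated policies remain valid mixed strategies."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def time_dependent_gradient_play(payoff, T=5000):
    """Sketch of gradient-based policy updating in a zero-sum matrix game
    with a learning rate sigma(t) ~ 1/t that is explicit in time t
    (an illustrative schedule, not the one from the paper)."""
    # Start both players off-equilibrium so the dynamics are visible.
    x = np.array([0.9, 0.1])   # row player's mixed policy
    y = np.array([0.2, 0.8])   # column player's mixed policy
    for t in range(1, T + 1):
        sigma = 1.0 / t            # decaying, time-dependent learning rate
        gx = payoff @ y            # row player's payoff gradient
        gy = -(payoff.T @ x)       # zero-sum: column player minimizes
        x = project_simplex(x + sigma * gx)
        y = project_simplex(y + sigma * gy)
    return x, y

# Matching pennies: the unique Nash equilibrium mixes 50/50 for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = time_dependent_gradient_play(A)
```

The design point the sketch makes is the one the abstract emphasizes: because σ(t) is a function of time alone, the agent needs no mechanism for classifying its opponent as self-play versus non-stationary; the same schedule is applied regardless of opponent type.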

Original language: English
Title of host publication: Learning and Adaption in Multi-Agent Systems - First International Workshop, LAMAS 2005, Revised Selected Papers
Number of pages: 15
State: Published - 2006
Event: 1st International Workshop on Learning and Adaption in Multi-Agent Systems, LAMAS 2005 - Utrecht, Netherlands
Duration: 25 Jul 2005 – 25 Jul 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3898 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
