Adaptive Policy Gradient in Multiagent Learning

Bikramjit Banerjee, Jing Peng

Research output: Contribution to conference › Paper › peer-review

30 Citations (Scopus)

Abstract

Inspired by recent results in policy gradient learning in general-sum games, in the form of the IGA and WoLF-IGA algorithms, we explore an alternative version of WoLF. We show that our new WoLF criterion (PDWoLF) is also accurate in 2 × 2 games, while remaining exactly computable in games with more than 2 actions, unlike WoLF, which relies on estimation there. In particular, we show that this difference in accuracy in games with more than 2 actions translates into faster convergence to Nash equilibrium policies in self-play when PDWoLF is combined with the general Policy Hill Climbing algorithm. Interestingly, this speedup becomes more pronounced as the learning rate ratio increases, and we offer an explanation for this behavior. We also show experimentally that learning faster with PDWoLF can entail learning better policies earlier in self-play. Finally, we present a scalable version of PDWoLF and show that even in domains requiring generalization and approximation, PDWoLF can dominate WoLF in performance.
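
The abstract builds on Policy Hill Climbing (PHC) with a win-or-learn-fast (WoLF) style variable learning rate. As a rough, illustrative sketch only (not the authors' implementation), the Python snippet below shows a stateless PHC learner for a repeated matrix game whose policy step size switches between a "winning" and a "losing" value. The class name `PHCLearner`, the parameters `delta_win`/`delta_lose`, and the `is_losing` test shown (the standard WoLF comparison against an average policy) are assumptions for illustration; the paper's actual PDWoLF criterion is not reproduced here.

```python
# Minimal sketch (assumed names, not the authors' code) of a Policy Hill
# Climbing learner for a single-state matrix game, with a variable policy
# step size chosen by a pluggable win/lose test -- the WoLF idea the
# abstract builds on.
import random

class PHCLearner:
    def __init__(self, n_actions, alpha=0.1, delta_win=0.01, delta_lose=0.04):
        self.n = n_actions
        self.alpha = alpha                        # Q-value learning rate
        self.delta_win = delta_win                # small policy step when "winning"
        self.delta_lose = delta_lose              # large policy step when "losing"
        self.q = [0.0] * n_actions                # action-value estimates
        self.pi = [1.0 / n_actions] * n_actions   # current mixed policy
        self.avg_pi = list(self.pi)               # running-average policy
        self.count = 0

    def act(self):
        # Sample an action from the current mixed policy.
        r, cum = random.random(), 0.0
        for a, p in enumerate(self.pi):
            cum += p
            if r <= cum:
                return a
        return self.n - 1

    def is_losing(self):
        # WoLF-style test: "losing" if the current policy earns less
        # (w.r.t. the Q estimates) than the average policy.  A PDWoLF
        # learner would replace this test; that rule is omitted here.
        cur = sum(p * q for p, q in zip(self.pi, self.q))
        avg = sum(p * q for p, q in zip(self.avg_pi, self.q))
        return cur < avg

    def update(self, action, reward):
        # Q-learning update for a repeated (stateless) game.
        self.q[action] += self.alpha * (reward - self.q[action])

        # Maintain the running-average policy.
        self.count += 1
        for a in range(self.n):
            self.avg_pi[a] += (self.pi[a] - self.avg_pi[a]) / self.count

        # Hill-climb toward the greedy action with a variable step size.
        delta = self.delta_lose if self.is_losing() else self.delta_win
        best = max(range(self.n), key=lambda a: self.q[a])
        for a in range(self.n):
            if a == best:
                self.pi[a] = min(1.0, self.pi[a] + delta)
            else:
                self.pi[a] = max(0.0, self.pi[a] - delta / (self.n - 1))
        total = sum(self.pi)
        self.pi = [p / total for p in self.pi]    # renormalize to a distribution
```

A PDWoLF learner would differ only in the `is_losing` test, which is exactly where the accuracy argument in the abstract applies.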

Original language: English
Pages: 686-692
Number of pages: 7
State: Published - 1 Dec 2003
Event: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 03 - Melbourne, Vic., Australia
Duration: 14 Jul 2003 - 18 Jul 2003

Other

Other: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 03
Country: Australia
City: Melbourne, Vic.
Period: 14/07/03 - 18/07/03

Keywords

  • Game Theory
  • Gradient Ascent Learning
  • Nash Equilibria

Cite this

Banerjee, B., & Peng, J. (2003). Adaptive Policy Gradient in Multiagent Learning. 686-692. Paper presented at Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 03, Melbourne, Vic., Australia.