Convergent gradient ascent in general-sum games

Bikramjit Banerjee, Jing Peng

Research output: Chapter in Book/Report/Conference proceedingConference contributionResearchpeer-review

5 Citations (Scopus)

Abstract

In this work we look at the recent results in policy gradient learning in a general-sum game scenario, in the form of two algorithms, IGA and WoLF-IGA. We address the drawbacks in convergence properties of these algorithms, and propose a more accurate version of WoLF-IGA that is guaranteed to converge to Nash Equilibrium policies in self-play (or against an IGA learner). We also present a control theoretic interpretation of variable learning rate which not only justifies WoLF-IGA, but also shows it to achieve fastest convergence under some constraints. Finally we derive optimal learning rates for fastest convergence in practical simulations.

Original languageEnglish
Title of host publicationMachine Learning
Subtitle of host publicationECML 2002 - 13th European Conference on Machine Learning, Proceedings
EditorsTapio Elomaa, Heikki Mannila, Hannu Toivonen
PublisherSpringer Verlag
Pages1-9
Number of pages9
ISBN (Print)9783540440369
StatePublished - 1 Jan 2002
Event13th European Conference on Machine Learning, ECML 2002 - Helsinki, Finland
Duration: 19 Aug 200223 Aug 2002

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume2430
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other13th European Conference on Machine Learning, ECML 2002
CountryFinland
CityHelsinki
Period19/08/0223/08/02

Fingerprint

Learning Rate
Ascent
Game
Gradient
Nash Equilibrium
Convergence Properties
Justify
Converge
Scenarios
Simulation
Policy
Interpretation
Learning
Form

Cite this

Banerjee, B., & Peng, J. (2002). Convergent gradient ascent in general-sum games. In T. Elomaa, H. Mannila, & H. Toivonen (Eds.), Machine Learning: ECML 2002 - 13th European Conference on Machine Learning, Proceedings (pp. 1-9). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2430). Springer Verlag.
Banerjee, Bikramjit ; Peng, Jing. / Convergent gradient ascent in general-sum games. Machine Learning: ECML 2002 - 13th European Conference on Machine Learning, Proceedings. editor / Tapio Elomaa ; Heikki Mannila ; Hannu Toivonen. Springer Verlag, 2002. pp. 1-9 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{35b265c525b844c38715d9e932c118e8,
title = "Convergent gradient ascent in general-sum games",
abstract = "In this work we look at the recent results in policy gradient learning in a general-sum game scenario, in the form of two algorithms, IGA and WoLF-IGA. We address the drawbacks in convergence properties of these algorithms, and propose a more accurate version of WoLF-IGA that is guaranteed to converge to Nash Equilibrium policies in self-play (or against an IGA learner). We also present a control theoretic interpretation of variable learning rate which not only justifies WoLF-IGA, but also shows it to achieve fastest convergence under some constraints. Finally we derive optimal learning rates for fastest convergence in practical simulations.",
author = "Bikramjit Banerjee and Jing Peng",
year = "2002",
month = "1",
day = "1",
language = "English",
isbn = "9783540440369",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "1--9",
editor = "Tapio Elomaa and Heikki Mannila and Hannu Toivonen",
booktitle = "Machine Learning",

}

Banerjee, B & Peng, J 2002, Convergent gradient ascent in general-sum games. in T Elomaa, H Mannila & H Toivonen (eds), Machine Learning: ECML 2002 - 13th European Conference on Machine Learning, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2430, Springer Verlag, pp. 1-9, 13th European Conference on Machine Learning, ECML 2002, Helsinki, Finland, 19/08/02.

Convergent gradient ascent in general-sum games. / Banerjee, Bikramjit; Peng, Jing.

Machine Learning: ECML 2002 - 13th European Conference on Machine Learning, Proceedings. ed. / Tapio Elomaa; Heikki Mannila; Hannu Toivonen. Springer Verlag, 2002. p. 1-9 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2430).

Research output: Chapter in Book/Report/Conference proceedingConference contributionResearchpeer-review

TY - GEN

T1 - Convergent gradient ascent in general-sum games

AU - Banerjee, Bikramjit

AU - Peng, Jing

PY - 2002/1/1

Y1 - 2002/1/1

N2 - In this work we look at the recent results in policy gradient learning in a general-sum game scenario, in the form of two algorithms, IGA and WoLF-IGA. We address the drawbacks in convergence properties of these algorithms, and propose a more accurate version of WoLF-IGA that is guaranteed to converge to Nash Equilibrium policies in self-play (or against an IGA learner). We also present a control theoretic interpretation of variable learning rate which not only justifies WoLF-IGA, but also shows it to achieve fastest convergence under some constraints. Finally we derive optimal learning rates for fastest convergence in practical simulations.

AB - In this work we look at the recent results in policy gradient learning in a general-sum game scenario, in the form of two algorithms, IGA and WoLF-IGA. We address the drawbacks in convergence properties of these algorithms, and propose a more accurate version of WoLF-IGA that is guaranteed to converge to Nash Equilibrium policies in self-play (or against an IGA learner). We also present a control theoretic interpretation of variable learning rate which not only justifies WoLF-IGA, but also shows it to achieve fastest convergence under some constraints. Finally we derive optimal learning rates for fastest convergence in practical simulations.

UR - http://www.scopus.com/inward/record.url?scp=84945262235&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9783540440369

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 1

EP - 9

BT - Machine Learning

A2 - Elomaa, Tapio

A2 - Mannila, Heikki

A2 - Toivonen, Hannu

PB - Springer Verlag

ER -

Banerjee B, Peng J. Convergent gradient ascent in general-sum games. In Elomaa T, Mannila H, Toivonen H, editors, Machine Learning: ECML 2002 - 13th European Conference on Machine Learning, Proceedings. Springer Verlag. 2002. p. 1-9. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).