Convergent gradient ascent in general-sum games

Bikramjit Banerjee, Jing Peng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

In this work we examine recent results on policy gradient learning in general-sum games, in the form of two algorithms, IGA and WoLF-IGA. We address drawbacks in the convergence properties of these algorithms and propose a more accurate version of WoLF-IGA that is guaranteed to converge to Nash equilibrium policies in self-play (or against an IGA learner). We also present a control-theoretic interpretation of the variable learning rate, which not only justifies WoLF-IGA but also shows that it achieves the fastest convergence under some constraints. Finally, we derive optimal learning rates for fastest convergence in practical simulations.
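The IGA/WoLF-IGA update described in the abstract can be sketched as follows. This is a minimal, hypothetical self-play simulation, not the paper's implementation: the 2x2 coordination game payoffs, the starting strategies, and the use of the game's pure equilibrium (1, 1) as the WoLF "win/lose" reference point are all illustrative assumptions. Each player performs gradient ascent on its own expected payoff, stepping faster when losing than when winning.

```python
import numpy as np

# Hypothetical 2x2 general-sum game (row player R, column player C).
# Any payoff matrices would do; these give a coordination game.
R = np.array([[3.0, 0.0], [0.0, 2.0]])
C = np.array([[2.0, 0.0], [0.0, 3.0]])

def expected(M, a, b):
    """Expected payoff under mixed strategies (a, 1-a) and (b, 1-b)."""
    return np.array([a, 1 - a]) @ M @ np.array([b, 1 - b])

def wolf_iga(alpha, beta, steps=200_000, eta=1e-4, l_win=1.0, l_lose=2.0):
    """WoLF-IGA in self-play: gradient ascent on each player's own payoff,
    with a larger learning rate when 'losing', i.e. doing worse than the
    player's equilibrium strategy would do against the current opponent."""
    # WoLF reference strategies: the pure equilibrium (1, 1) of the
    # example game above (an assumption for this sketch).
    a_eq, b_eq = 1.0, 1.0
    for _ in range(steps):
        # Gradient of each player's expected payoff wrt its own
        # probability of playing action 1.
        da = (R[0] - R[1]) @ np.array([beta, 1 - beta])
        db = np.array([alpha, 1 - alpha]) @ (C[:, 0] - C[:, 1])
        # Variable learning rate: "learn fast while losing".
        la = l_lose if expected(R, alpha, beta) < expected(R, a_eq, beta) else l_win
        lb = l_lose if expected(C, alpha, beta) < expected(C, alpha, b_eq) else l_win
        # Projected gradient step, keeping probabilities in [0, 1].
        alpha = min(1.0, max(0.0, alpha + eta * la * da))
        beta = min(1.0, max(0.0, beta + eta * lb * db))
    return alpha, beta
```

Starting from strategies such as (0.8, 0.8), both players' gradients point toward action 1 and the dynamics settle at the pure Nash equilibrium (1, 1); plain IGA would follow the same gradient field but with a single fixed rate, which is what can cause the non-convergent cycling the paper addresses.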

Original language: English
Title of host publication: Machine Learning
Subtitle of host publication: ECML 2002 - 13th European Conference on Machine Learning, Proceedings
Editors: Tapio Elomaa, Heikki Mannila, Hannu Toivonen
Publisher: Springer Verlag
Pages: 1-9
Number of pages: 9
ISBN (Print): 9783540440369
State: Published - 1 Jan 2002
Event: 13th European Conference on Machine Learning, ECML 2002 - Helsinki, Finland
Duration: 19 Aug 2002 - 23 Aug 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2430
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th European Conference on Machine Learning, ECML 2002
Country: Finland
City: Helsinki
Period: 19/08/02 - 23/08/02

