Efficient learning of multi-step best response

Bikramjit Banerjee, Jing Peng

Research output: Contribution to conference › Paper › peer-review

5 Citations (Scopus)

Abstract

We provide a uniform framework for learning against a recent-history adversary in arbitrary repeated bimatrix games, by modeling such an agent as a Markov Decision Process (MDP). We focus on learning an optimal non-stationary policy in such an MDP over a finite horizon, and adapt an existing efficient Monte Carlo-based algorithm for learning optimal policies in such MDPs. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Though this improvement comes at the cost of increased domain knowledge, a simple experiment in the Prisoner's Dilemma game shows that even when no extra domain knowledge is assumed (beyond knowing the opponent's memory size), the error can still be small.
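The key construction in the abstract is that an opponent who conditions only on recent history turns the repeated game into an MDP from the learner's perspective: the recent history is the state, and the learner's action determines both the immediate payoff and the next state. Below is a minimal illustrative sketch of that reduction in Python, not the paper's Monte Carlo algorithm: it assumes a known, deterministic memory-1 opponent (tit-for-tat in the iterated Prisoner's Dilemma, with the standard 3/0/5/1 payoffs) and computes the optimal non-stationary finite-horizon policy by exact backward induction, whereas the paper learns such a policy from sampled play.

```python
# Minimal sketch (assumptions: known memory-1 opponent, standard PD payoffs).
# A recent-history adversary induces an MDP: here the state is just the
# learner's previous action, since that is all a memory-1 tit-for-tat
# opponent conditions on. The paper learns in this MDP from samples; this
# sketch instead solves the induced MDP exactly, for illustration only.

C, D = 0, 1  # cooperate, defect

# Row player's payoff matrix: payoff[my_action][opponent_action]
payoff = [[3, 0],   # I cooperate: (C,C)=3, (C,D)=0
          [5, 1]]   # I defect:    (D,C)=5, (D,D)=1

def tit_for_tat(my_prev_action):
    """Memory-1 adversary: plays whatever the learner played last step."""
    return my_prev_action

def finite_horizon_best_response(horizon, opponent=tit_for_tat):
    """Optimal non-stationary policy via backward induction.

    V[t][s] is the best total payoff from step t onward when the learner's
    previous action was s; plan[t][s] is the action that achieves it.
    """
    V = [[0.0, 0.0] for _ in range(horizon + 1)]   # V[horizon] = 0 (terminal)
    plan = [[C, C] for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in (C, D):                  # state: learner's previous action
            opp = opponent(s)             # opponent's move this step
            q = [payoff[a][opp] + V[t + 1][a] for a in (C, D)]
            plan[t][s] = max((C, D), key=lambda a: q[a])
            V[t][s] = q[plan[t][s]]
    return V, plan

V, plan = finite_horizon_best_response(horizon=5)
print("value from mutual cooperation:", V[0][C])                     # 17.0
print("plan along the play path:", [plan[t][C] for t in range(5)])   # [0, 0, 0, 0, 1]
```

Against tit-for-tat the resulting plan cooperates until the final step and then defects, which illustrates why the optimal finite-horizon policy against a recent-history adversary is non-stationary.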

Original language: English
Pages: 209-215
Number of pages: 7
State: Published - 1 Dec 2005
Event: 4th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 05 - Utrecht, Netherlands
Duration: 25 Jul 2005 - 29 Jul 2005

Keywords

  • Efficient Learning
  • Game Theory
  • Multiagent Learning

Cite this

Banerjee, B., & Peng, J. (2005). Efficient learning of multi-step best response. 209-215. Paper presented at 4th International Conference on Autonomous Agents and Multi agent Systems, AAMAS 05, Utrecht, Netherlands.