Efficient no-regret multiagent learning

Bikramjit Banerjee, Jing Peng

Research output: Contribution to conferencePaperResearchpeer-review

15 Citations (Scopus)

Abstract

We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms to allow stochastic sampling of actions and observation of scalar reward of only the action played. We show that the average actual payoffs of the resulting learner gets (1) close to the best response against (eventually) stationary opponents, (2) close to the asymptotic optimal payoff against opponents that play a converging sequence of policies, and (3) close to at least a dynamic variant of minimax payoff against arbitrary opponents, with a high probability in polynomial time. In addition the polynomial bounds are shown to be significantly better than previously known bounds. Furthermore, we do not need to assume that the learner knows the game matrices and can observe the opponents' actions, unlike previous work.

Original languageEnglish
Pages41-46
Number of pages6
StatePublished - 1 Dec 2005
Event20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05 - Pittsburgh, PA, United States
Duration: 9 Jul 200513 Jul 2005

Other

Other20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05
CountryUnited States
CityPittsburgh, PA
Period9/07/0513/07/05

Fingerprint

Polynomials
Sampling

Cite this

Banerjee, B., & Peng, J. (2005). Efficient no-regret multiagent learning. 41-46. Paper presented at 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05, Pittsburgh, PA, United States.
Banerjee, Bikramjit ; Peng, Jing. / Efficient no-regret multiagent learning. Paper presented at 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05, Pittsburgh, PA, United States.6 p.
@conference{22bf73ba88f34302933800ba867e7623,
title = "Efficient no-regret multiagent learning",
abstract = "We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms to allow stochastic sampling of actions and observation of scalar reward of only the action played. We show that the average actual payoffs of the resulting learner gets (1) close to the best response against (eventually) stationary opponents, (2) close to the asymptotic optimal payoff against opponents that play a converging sequence of policies, and (3) close to at least a dynamic variant of minimax payoff against arbitrary opponents, with a high probability in polynomial time. In addition the polynomial bounds are shown to be significantly better than previously known bounds. Furthermore, we do not need to assume that the learner knows the game matrices and can observe the opponents' actions, unlike previous work.",
author = "Bikramjit Banerjee and Jing Peng",
year = "2005",
month = "12",
day = "1",
language = "English",
pages = "41--46",
note = "null ; Conference date: 09-07-2005 Through 13-07-2005",

}

Banerjee, B & Peng, J 2005, 'Efficient no-regret multiagent learning' Paper presented at 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05, Pittsburgh, PA, United States, 9/07/05 - 13/07/05, pp. 41-46.

Efficient no-regret multiagent learning. / Banerjee, Bikramjit; Peng, Jing.

2005. 41-46 Paper presented at 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05, Pittsburgh, PA, United States.

Research output: Contribution to conferencePaperResearchpeer-review

TY - CONF

T1 - Efficient no-regret multiagent learning

AU - Banerjee, Bikramjit

AU - Peng, Jing

PY - 2005/12/1

Y1 - 2005/12/1

N2 - We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms to allow stochastic sampling of actions and observation of scalar reward of only the action played. We show that the average actual payoffs of the resulting learner gets (1) close to the best response against (eventually) stationary opponents, (2) close to the asymptotic optimal payoff against opponents that play a converging sequence of policies, and (3) close to at least a dynamic variant of minimax payoff against arbitrary opponents, with a high probability in polynomial time. In addition the polynomial bounds are shown to be significantly better than previously known bounds. Furthermore, we do not need to assume that the learner knows the game matrices and can observe the opponents' actions, unlike previous work.

AB - We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms to allow stochastic sampling of actions and observation of scalar reward of only the action played. We show that the average actual payoffs of the resulting learner gets (1) close to the best response against (eventually) stationary opponents, (2) close to the asymptotic optimal payoff against opponents that play a converging sequence of policies, and (3) close to at least a dynamic variant of minimax payoff against arbitrary opponents, with a high probability in polynomial time. In addition the polynomial bounds are shown to be significantly better than previously known bounds. Furthermore, we do not need to assume that the learner knows the game matrices and can observe the opponents' actions, unlike previous work.

UR - http://www.scopus.com/inward/record.url?scp=29344441750&partnerID=8YFLogxK

M3 - Paper

SP - 41

EP - 46

ER -

Banerjee B, Peng J. Efficient no-regret multiagent learning. 2005. Paper presented at 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05, Pittsburgh, PA, United States.