TY - GEN
T1 - The role of reactivity in multiagent learning
AU - Banerjee, Bikramjit
AU - Peng, Jing
PY - 2004
Y1 - 2004
AB - In this paper we take a closer look at a recently proposed classification scheme for multiagent learning algorithms. Based on this scheme, an exploitation mechanism (which we call the Exploiter) was developed that could beat various Policy Hill Climbers (PHC) and other fair opponents in some repeated matrix games. We show, on the contrary, that some fair opponents may actually beat the Exploiter in repeated games. This clearly indicates a deficiency in the original classification scheme, which we address. Specifically, we introduce a new measure called Reactivity that measures how fast a learner can adapt to an unexpected hypothetical change in the opponent's policy. We show that in some games this new measure can approximately predict the performance of a player, and based on this measure we explain the behaviors of various algorithms in the Matching Pennies game, which were inexplicable under the original scheme. Finally, we show that under certain restrictions, a player that consciously tries to avoid exploitation may be unable to do so.
UR - http://www.scopus.com/inward/record.url?scp=4544357282&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:4544357282
SN - 1581138644
T3 - Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004
SP - 538
EP - 545
BT - Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004
A2 - Jennings, N.R.
A2 - Sierra, C.
A2 - Sonenberg, L.
A2 - Tambe, M.
T2 - Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004
Y2 - 19 July 2004 through 23 July 2004
ER -