Reactivity and safe learning in multi-agent systems

Bikramjit Banerjee, Jing Peng

Research output: Contribution to journal › Article

Abstract

Multi-agent reinforcement learning (MRL) is a growing area of research. What makes it particularly challenging is that multiple learners render each other's environments non-stationary. In addition to adapting their behaviors to other learning agents, online learners must also provide assurances about their online performance in order to promote user trust of adaptive agent systems deployed in real world applications. In this article, instead of developing new algorithms with such assurances, we study the question of safety in online performance of some existing MRL algorithms. We identify the key notion of reactivity of a learner by analyzing how an algorithm (PHC-Exploiter), designed to exploit some simpler opponents, can itself be exploited by them. We quantify and analyze this concept of reactivity in the context of these algorithms to explain their experimental behaviors. We argue that no learner can be designed that can deliberately avoid exploitation. We also show that any attempt to optimize reactivity must take into account a tradeoff with sensitivity to noise, and devise an adaptive method (based on environmental feedback) designed to maximize the learner's safety and minimize its sensitivity to noise.
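
The abstract only names the algorithms involved; as a rough orientation, the sketch below shows the standard policy hill-climbing (PHC) update that PHC-Exploiter builds on, restricted to a single-state repeated game. The class name, the step sizes alpha and delta, and the simplified value update are illustrative assumptions for this page, not the authors' implementation from the article.

import numpy as np

class PHCAgent:
    """Minimal policy hill-climbing (PHC) learner for a repeated matrix game.

    The learner keeps Q-value estimates for its actions and a mixed policy
    that is nudged toward the currently greedy action by a small step delta.
    How quickly the policy can shift is, informally, its reactivity.
    """

    def __init__(self, n_actions, alpha=0.1, delta=0.01, rng=None):
        self.q = np.zeros(n_actions)                   # action-value estimates
        self.pi = np.full(n_actions, 1.0 / n_actions)  # mixed policy
        self.alpha = alpha                             # value-learning rate
        self.delta = delta                             # policy step size
        self.rng = rng or np.random.default_rng()

    def act(self):
        # Sample an action from the current mixed policy.
        return self.rng.choice(len(self.pi), p=self.pi)

    def update(self, action, reward):
        # Value update for the action just played (single-state game, no bootstrap term).
        self.q[action] += self.alpha * (reward - self.q[action])
        # Hill-climb: shift probability mass toward the greedy action by delta.
        greedy = int(np.argmax(self.q))
        step = np.where(np.arange(len(self.pi)) == greedy,
                        self.delta, -self.delta / (len(self.pi) - 1))
        self.pi = np.clip(self.pi + step, 0.0, 1.0)
        self.pi /= self.pi.sum()                       # renormalize to a distribution

A PHC-Exploiter-style opponent, roughly speaking, estimates such a learner's policy and its rate of change in order to anticipate where the policy is heading. The article's observation is that the same reactivity (governed here by delta) that lets the learner adapt also makes it exploitable, and that increasing it to react faster raises sensitivity to noisy payoffs.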

Original language: English
Pages (from-to): 339-356
Number of pages: 18
Journal: Adaptive Behavior
Volume: 14
Issue number: 4
DOIs: 10.1177/1059712306072334
State: Published - 1 Dec 2006

Fingerprint

Learning
Noise
Safety
Research
Reinforcement (Psychology)

Keywords

  • Game theory
  • Multi-agent systems
  • Reinforcement learning

Cite this

Banerjee, Bikramjit; Peng, Jing. / Reactivity and safe learning in multi-agent systems. In: Adaptive Behavior. 2006; Vol. 14, No. 4. pp. 339-356.
@article{1b2fdabd98284ca2ae275ed58084abe9,
title = "Reactivity and safe learning in multi-agent systems",
abstract = "Multi-agent reinforcement learning (MRL) is a growing area of research. What makes it particularly challenging is that multiple learners render each other's environments non-stationary. In addition to adapting their behaviors to other learning agents, online learners must also provide assurances about their online performance in order to promote user trust of adaptive agent systems deployed in real world applications. In this article, instead of developing new algorithms with such assurances, we study the question of safety in online performance of some existing MRL algorithms. We identify the key notion of reactivity of a learner by analyzing how an algorithm (PHC-Exploiter), designed to exploit some simpler opponents, can itself be exploited by them. We quantify and analyze this concept of reactivity in the context of these algorithms to explain their experimental behaviors. We argue that no learner can be designed that can deliberately avoid exploitation. We also show that any attempt to optimize reactivity must take into account a tradeoff with sensitivity to noise, and devise an adaptive method (based on environmental feedback) designed to maximize the learner's safety and minimize its sensitivity to noise.",
keywords = "Game theory, Multi-agent systems, Reinforcement learning",
author = "Bikramjit Banerjee and Jing Peng",
year = "2006",
month = "12",
day = "1",
doi = "10.1177/1059712306072334",
language = "English",
volume = "14",
pages = "339--356",
journal = "Adaptive Behavior",
issn = "1059-7123",
publisher = "SAGE Publications Ltd",
number = "4",

}


TY - JOUR

T1 - Reactivity and safe learning in multi-agent systems

AU - Banerjee, Bikramjit

AU - Peng, Jing

PY - 2006/12/1

Y1 - 2006/12/1

N2 - Multi-agent reinforcement learning (MRL) is a growing area of research. What makes it particularly challenging is that multiple learners render each other's environments non-stationary. In addition to adapting their behaviors to other learning agents, online learners must also provide assurances about their online performance in order to promote user trust of adaptive agent systems deployed in real world applications. In this article, instead of developing new algorithms with such assurances, we study the question of safety in online performance of some existing MRL algorithms. We identify the key notion of reactivity of a learner by analyzing how an algorithm (PHC-Exploiter), designed to exploit some simpler opponents, can itself be exploited by them. We quantify and analyze this concept of reactivity in the context of these algorithms to explain their experimental behaviors. We argue that no learner can be designed that can deliberately avoid exploitation. We also show that any attempt to optimize reactivity must take into account a tradeoff with sensitivity to noise, and devise an adaptive method (based on environmental feedback) designed to maximize the learner's safety and minimize its sensitivity to noise.

AB - Multi-agent reinforcement learning (MRL) is a growing area of research. What makes it particularly challenging is that multiple learners render each other's environments non-stationary. In addition to adapting their behaviors to other learning agents, online learners must also provide assurances about their online performance in order to promote user trust of adaptive agent systems deployed in real world applications. In this article, instead of developing new algorithms with such assurances, we study the question of safety in online performance of some existing MRL algorithms. We identify the key notion of reactivity of a learner by analyzing how an algorithm (PHC-Exploiter), designed to exploit some simpler opponents, can itself be exploited by them. We quantify and analyze this concept of reactivity in the context of these algorithms to explain their experimental behaviors. We argue that no learner can be designed that can deliberately avoid exploitation. We also show that any attempt to optimize reactivity must take into account a tradeoff with sensitivity to noise, and devise an adaptive method (based on environmental feedback) designed to maximize the learner's safety and minimize its sensitivity to noise.

KW - Game theory

KW - Multi-agent systems

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=33750696671&partnerID=8YFLogxK

U2 - 10.1177/1059712306072334

DO - 10.1177/1059712306072334

M3 - Article

AN - SCOPUS:33750696671

VL - 14

SP - 339

EP - 356

JO - Adaptive Behavior

JF - Adaptive Behavior

SN - 1059-7123

IS - 4

ER -