Function Optimization using Connectionist Reinforcement Learning Algorithms

Ronald J. Williams, Jing Peng

Research output: Contribution to journal › Article

60 Citations (Scopus)

Abstract

Any non-associative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley were sought using variants of REINFORCE algorithms. Some of the algorithms used here incorporated additional heuristic features resembling certain aspects of some of the algorithms used in Ackley's studies. Differing levels of performance were achieved by the various algorithms investigated, but a number of them performed at a level comparable to the best found in Ackley's studies on a number of the tasks, in spite of their simplicity. One of these variants, called REINFORCE/MENT, represents a novel but principled approach to reinforcement learning in nontrivial networks which incorporates an entropy maximization strategy. This was found to perform especially well on more hierarchically organized tasks.
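The non-associative setting the abstract describes can be sketched as a team of Bernoulli units whose sampling probabilities are adapted from (possibly noisy) function values. The following is a minimal illustrative sketch, not the authors' code: the function and parameter names are hypothetical, and it omits the heuristic features and the entropy-maximization term of the REINFORCE/MENT variant.

```python
import math
import random

def reinforce_optimize(f, n_bits, iters=2000, alpha=0.1, seed=0):
    """REINFORCE-style bit-string optimizer (illustrative sketch).

    Each bit is drawn from an independent Bernoulli unit with
    probability sigmoid(w[i]); weights follow the standard REINFORCE
    update dw[i] = alpha * (r - baseline) * (x[i] - p[i]).
    """
    rng = random.Random(seed)
    w = [0.0] * n_bits            # logits, so all units start at p = 0.5
    baseline = 0.0                # running-average reinforcement baseline
    best_x, best_r = None, float("-inf")
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-wi)) for wi in w]
        x = [1 if rng.random() < pi else 0 for pi in p]
        r = f(x)                  # sampled function value acts as reward
        if r > best_r:
            best_x, best_r = x, r
        for i in range(n_bits):
            w[i] += alpha * (r - baseline) * (x[i] - p[i])
        baseline += 0.1 * (r - baseline)
    return best_x, best_r

# Toy task: maximize the number of ones ("one-max")
best_x, best_r = reinforce_optimize(lambda x: sum(x), n_bits=20)
```

Note that only sampled values of f are used, never gradients of f, which is what lets such algorithms treat optimization as a pure reinforcement learning problem.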

Original language: English
Pages (from-to): 241-268
Number of pages: 28
Journal: Connection Science
Volume: 3
Issue number: 3
DOI: 10.1080/09540099108946587
State: Published - 1 Jan 1991

Keywords

  • Reinforcement learning
  • combinatorial optimization
  • entropy maximization
  • genetic algorithms

Cite this

@article{98489486af244206b256dc2aaedf2478,
title = "Function Optimization using Connectionist Reinforcement Learning Algorithms",
abstract = "Any non-associative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley were sought using variants of REINFORCE algorithms. Some of the algorithms used here incorporated additional heuristic features resembling certain aspects of some of the algorithms used in Ackley's studies. Differing levels of performance were achieved by the various algorithms investigated, but a number of them performed at a level comparable to the best found in Ackley's studies on a number of the tasks, in spite of their simplicity. One of these variants, called REINFORCE/MENT, represents a novel but principled approach to reinforcement learning in nontrivial networks which incorporates an entropy maximization strategy. This was found to perform especially well on more hierarchically organized tasks.",
keywords = "Reinforcement learning, combinatorial optimization, entropy maximization, genetic algorithms",
author = "Williams, {Ronald J.} and Jing Peng",
year = "1991",
month = "1",
day = "1",
doi = "10.1080/09540099108946587",
language = "English",
volume = "3",
pages = "241--268",
journal = "Connection Science",
issn = "0954-0091",
publisher = "Taylor and Francis AS",
number = "3",
}

Function Optimization using Connectionist Reinforcement Learning Algorithms. / Williams, Ronald J.; Peng, Jing.

In: Connection Science, Vol. 3, No. 3, 01.01.1991, p. 241-268.

TY - JOUR

T1 - Function Optimization using Connectionist Reinforcement Learning Algorithms

AU - Williams, Ronald J.

AU - Peng, Jing

PY - 1991/1/1

Y1 - 1991/1/1

N2 - Any non-associative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley were sought using variants of REINFORCE algorithms. Some of the algorithms used here incorporated additional heuristic features resembling certain aspects of some of the algorithms used in Ackley's studies. Differing levels of performance were achieved by the various algorithms investigated, but a number of them performed at a level comparable to the best found in Ackley's studies on a number of the tasks, in spite of their simplicity. One of these variants, called REINFORCE/MENT, represents a novel but principled approach to reinforcement learning in nontrivial networks which incorporates an entropy maximization strategy. This was found to perform especially well on more hierarchically organized tasks.

AB - Any non-associative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley were sought using variants of REINFORCE algorithms. Some of the algorithms used here incorporated additional heuristic features resembling certain aspects of some of the algorithms used in Ackley's studies. Differing levels of performance were achieved by the various algorithms investigated, but a number of them performed at a level comparable to the best found in Ackley's studies on a number of the tasks, in spite of their simplicity. One of these variants, called REINFORCE/MENT, represents a novel but principled approach to reinforcement learning in nontrivial networks which incorporates an entropy maximization strategy. This was found to perform especially well on more hierarchically organized tasks.

KW - Reinforcement learning

KW - combinatorial optimization

KW - entropy maximization

KW - genetic algorithms

UR - http://www.scopus.com/inward/record.url?scp=0041154467&partnerID=8YFLogxK

U2 - 10.1080/09540099108946587

DO - 10.1080/09540099108946587

M3 - Article

AN - SCOPUS:0041154467

VL - 3

SP - 241

EP - 268

JO - Connection Science

JF - Connection Science

SN - 0954-0091

IS - 3

ER -