Optimizing Reinforcement Learning Using Failure Data

Suzeyu George Cui, Jesse Parron, Garrett Modery, Weitian Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Learning from both successes and failures is key to developing robust and efficient policies in reinforcement learning (RL). Traditional RL excels at learning from rewards but often neglects non-rewarding states, especially those leading to negative outcomes. This paper introduces a novel approach that integrates a modified Gaussian distribution into a Deep Q-Network (DQN) framework to learn from failures. By penalizing state-action pairs near historical failure points, the model guides the agent away from pitfalls. The optimized DQN shows improved learning speed and stability, achieving higher and more consistent scores than a standard DQN. This approach highlights the potential of hybrid RL models that combine value-based methods with failure-aware mechanisms to accelerate learning and enhance decision-making.
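The abstract does not include implementation details, but the core idea of penalizing state-action pairs near historical failure points with a Gaussian-shaped term can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the function and parameter names (failure_penalty, shaped_reward, failure_points, sigma, weight) are assumptions, and the exact form of the modified Gaussian used in the paper may differ.

```python
import numpy as np

# Sketch (assumed, not the published implementation): a Gaussian-shaped penalty
# centered on previously recorded failure state-action pairs. Subtracting it
# from the environment reward before forming the usual DQN target
# r + gamma * max_a' Q(s', a') steers the agent away from regions that
# historically led to failure.

def failure_penalty(state, action, failure_points, sigma=0.5, weight=1.0):
    """Return a penalty that grows as (state, action) approaches a past failure.

    failure_points: list of (state, action) pairs recorded when episodes failed.
    sigma, weight: assumed hyperparameters controlling the width and magnitude
    of the Gaussian bump.
    """
    if not failure_points:
        return 0.0
    sa = np.append(np.asarray(state, dtype=float), float(action))
    bumps = []
    for f_state, f_action in failure_points:
        f_sa = np.append(np.asarray(f_state, dtype=float), float(f_action))
        dist_sq = float(np.sum((sa - f_sa) ** 2))
        bumps.append(np.exp(-dist_sq / (2.0 * sigma ** 2)))
    return weight * max(bumps)

def shaped_reward(reward, state, action, failure_points):
    """Environment reward minus the failure penalty, used in the DQN target."""
    return reward - failure_penalty(state, action, failure_points)

# Toy usage with a 2-D state and a discrete action index:
failures = [((0.9, -0.1), 1)]
print(shaped_reward(1.0, (0.8, 0.0), 1, failures))
```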

Original language: English
Title of host publication: URTC 2024 - 2024 IEEE MIT Undergraduate Research Technology Conference, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331531003
DOIs
State: Published - 2024
Event: 2024 IEEE MIT Undergraduate Research Technology Conference, URTC 2024 - Hybrid, Cambridge, United States
Duration: 11 Oct 2024 - 13 Oct 2024

Publication series

Name: URTC 2024 - 2024 IEEE MIT Undergraduate Research Technology Conference, Proceedings

Conference

Conference: 2024 IEEE MIT Undergraduate Research Technology Conference, URTC 2024
Country/Territory: United States
City: Hybrid, Cambridge
Period: 11/10/24 - 13/10/24

Keywords

  • learning from failure
  • machine learning
  • reinforcement learning
  • robotics
