Abstract
Deep reinforcement learning has emerged as a powerful tool for a variety of learning tasks; however, deep nets typically exhibit forgetting when learning multiple tasks in sequence. To mitigate forgetting, we propose an experience replay process that augments the standard FIFO buffer and selectively stores experiences in a long-term memory. We explore four strategies for selecting which experiences to store: favoring surprise, favoring reward, matching the global training distribution, and maximizing coverage of the state space. We show that distribution matching successfully prevents catastrophic forgetting and is consistently the best approach across all domains tested. While distribution matching has better and more consistent performance, we identify one case in which coverage maximization is beneficial: when tasks that receive less training are more important. Overall, our results show that selective experience replay, when suitable selection algorithms are employed, can prevent catastrophic forgetting.
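The abstract's central mechanism is a long-term memory populated by a selection strategy rather than pure FIFO eviction. The sketch below is only a rough illustration (the class and method names are hypothetical, not the published implementation): it pairs a FIFO buffer with a reservoir-sampled long-term store, which is one simple way to realise the distribution-matching selection strategy described above.

```python
import random


class SelectiveReplayBuffer:
    """Illustrative sketch of a FIFO buffer augmented with a long-term memory.

    The long-term memory here uses reservoir sampling, one simple way to keep
    its contents approximately matched to the global training distribution
    across tasks. This is an assumption for illustration, not the authors' code.
    """

    def __init__(self, fifo_capacity=10_000, long_term_capacity=10_000):
        self.fifo_capacity = fifo_capacity
        self.long_term_capacity = long_term_capacity
        self.fifo = []        # short-term FIFO buffer of recent transitions
        self.long_term = []   # selectively retained long-term memory
        self.num_seen = 0     # total transitions observed so far

    def add(self, transition):
        # Recent experience always enters the FIFO buffer.
        self.fifo.append(transition)
        if len(self.fifo) > self.fifo_capacity:
            self.fifo.pop(0)

        # Reservoir sampling: every transition ever seen has an equal chance
        # of residing in the long-term memory, so its contents approximate
        # the global distribution of experience across all tasks.
        self.num_seen += 1
        if len(self.long_term) < self.long_term_capacity:
            self.long_term.append(transition)
        else:
            idx = random.randrange(self.num_seen)
            if idx < self.long_term_capacity:
                self.long_term[idx] = transition

    def sample(self, batch_size):
        # Mix recent and long-term experience in the training batch
        # (an even 50/50 split is a simplifying choice for this sketch).
        half = batch_size // 2
        recent = random.sample(self.fifo, min(half, len(self.fifo)))
        remaining = batch_size - len(recent)
        old = random.sample(self.long_term, min(remaining, len(self.long_term)))
        return recent + old
```

The other selection strategies named in the abstract (favoring surprise, favoring reward, maximizing state-space coverage) would replace the reservoir-sampling acceptance rule with a score-based retention criterion, but the surrounding buffer structure would stay the same.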
Original language | English |
---|---|
Title of host publication | The Thirty-Second AAAI Conference on Artificial Intelligence |
Editors | Sheila McIlraith, Kilian Weinberger |
Place of Publication | Palo Alto CA USA |
Publisher | Association for the Advancement of Artificial Intelligence (AAAI) |
Pages | 3302-3309 |
Number of pages | 8 |
ISBN (Electronic) | 9781577358008 |
Publication status | Published - 1 Jan 2018 |
Externally published | Yes |
Event | AAAI Conference on Artificial Intelligence 2018 - New Orleans, United States of America. Duration: 2 Feb 2018 → 7 Feb 2018. Conference number: 32nd. https://aaai.org/Conferences/AAAI-18/ |
Conference
Conference | AAAI Conference on Artificial Intelligence 2018 |
---|---|
Abbreviated title | AAAI 2018 |
Country/Territory | United States of America |
City | New Orleans |
Period | 2/02/18 → 7/02/18 |
Internet address | https://aaai.org/Conferences/AAAI-18/ |