Abstract
Multi-step (also called n-step) methods in Reinforcement Learning (RL) have been shown, both theoretically and empirically, to be more efficient than 1-step methods in tasks that use a tabular representation of the value function, owing to faster propagation of the reward signal. Recent research in Deep Reinforcement Learning (DRL) likewise shows that multi-step methods improve learning speed and final performance in applications where the value function and policy are represented with deep neural networks. However, there is little understanding of what actually contributes to this performance boost. In this work, we analyze the effect of multi-step methods on alleviating the overestimation problem in DRL, where multi-step experiences are sampled from a replay buffer. Specifically, building on Deep Deterministic Policy Gradient (DDPG), we propose Multi-step DDPG (MDDPG), in which the backup step size is set manually, and a variant called Mixed Multi-step DDPG (MMDDPG), in which an average over several multi-step backups is used as the update target for the Q-value function. Empirically, we show that both MDDPG and MMDDPG are significantly less affected by overestimation than DDPG with 1-step backup, which in turn yields better final performance and faster learning. We also discuss the advantages and disadvantages of different ways of performing the multi-step expansion to reduce approximation error, and we expose the tradeoff between overestimation and underestimation that underlies offline multi-step methods. Finally, since they achieve comparable final performance and learning speed, we compare the computational cost of MDDPG and MMDDPG with that of Twin Delayed Deep Deterministic Policy Gradient (TD3), a state-of-the-art algorithm proposed to address overestimation in actor-critic methods.
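As a rough sketch of the two update targets described in the abstract (the notation here is the standard DDPG one and is our assumption, not copied from the paper: target critic $Q'$, target actor $\mu'$, discount factor $\gamma$, and transitions $(s_t, a_t, r_t, s_{t+1})$ drawn from the replay buffer), the $n$-step backup target used by MDDPG and the averaged target used by MMDDPG can be written as:

\[
y_t^{(n)} \;=\; \sum_{i=0}^{n-1} \gamma^{i}\, r_{t+i} \;+\; \gamma^{n}\, Q'\!\big(s_{t+n},\, \mu'(s_{t+n})\big) \qquad \text{(MDDPG, fixed } n\text{)}
\]

\[
y_t^{\mathrm{mix}} \;=\; \frac{1}{N} \sum_{n=1}^{N} y_t^{(n)} \qquad \text{(MMDDPG, average over backups up to } N\text{)}
\]

Intuitively, a larger $n$ relies less on the bootstrapped $Q'$ term, which is where overestimation enters, at the cost of using more off-policy rewards from replayed trajectories; averaging over several values of $n$, as in $y_t^{\mathrm{mix}}$, hedges between these two sources of error.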
Original language | English |
---|---|
Title of host publication | Proceedings of ICPR 2020, 25th International Conference on Pattern Recognition |
Editors | Kim Boyer, Brian C. Lovell, Marcello Pelillo, Nicu Sebe, René Vidal, Jingyi Yu |
Place of Publication | Piscataway NJ USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 347-353 |
Number of pages | 7 |
ISBN (Electronic) | 9781728188089 |
ISBN (Print) | 9781728188096 |
DOIs | |
Publication status | Published - 2021 |
Event | International Conference on Pattern Recognition 2020 - Virtual, Milano, Italy. Duration: 10 Jan 2021 → 15 Jan 2021. Conference number: 25th. https://ieeexplore.ieee.org/xpl/conhome/9411940/proceeding (Proceedings); https://www.micc.unifi.it/icpr2020/ (Website) |
Publication series
Name | Proceedings - International Conference on Pattern Recognition |
---|---|
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
ISSN (Print) | 1051-4651 |
Conference
Conference | International Conference on Pattern Recognition 2020 |
---|---|
Abbreviated title | ICPR 2020 |
Country/Territory | Italy |
City | Milano |
Period | 10/01/21 → 15/01/21 |
Internet address |