TY - JOUR
T1 - End-to-end nonprehensile rearrangement with deep reinforcement learning and simulation-to-reality transfer
AU - Yuan, Weihao
AU - Hang, Kaiyu
AU - Kragic, Danica
AU - Wang, Michael Y.
AU - Stork, Johannes A.
N1 - Funding Information:
This work was supported by the HKUST SSTSP project RoMRO (FP802), HKUST IGN project (16EG09), HKUST PGS Fund of Office of Vice-President (Research & Graduate Studies), Knut and Alice Wallenberg Foundation (2014.0011 and 2017.0426), Foundation for Strategic Research (GMT14-0082) and Örebro University Vice-chancellor's Fellowship Development Program. The authors declare that there are no known conflicts of interest associated with this article.
Publisher Copyright:
© 2019 Elsevier B.V.
PY - 2019/9
Y1 - 2019/9
N2 - Nonprehensile rearrangement is the problem of controlling a robot to interact with objects through pushing actions in order to reconfigure the objects into a predefined goal pose. In this work, we rearrange one object at a time in an environment with obstacles using an end-to-end policy that maps raw pixels as visual input to control actions without any form of engineered feature extraction. To reduce the amount of training data that needs to be collected using a real robot, we propose a simulation-to-reality transfer approach. In the first step, we model the nonprehensile rearrangement task in simulation and use deep reinforcement learning to learn a suitable rearrangement policy, which requires on the order of hundreds of thousands of example actions for training. Thereafter, we collect a small dataset of only 70 episodes of real-world actions as supervised examples for adapting the learned rearrangement policy to real-world input data. In this process, we make use of newly proposed strategies for improving the reinforcement learning process, such as heuristic exploration and the curation of a balanced set of experiences. We evaluate our method in both simulation and a real setting using a Baxter robot to show that the proposed approach can effectively improve the training process in simulation, as well as efficiently adapt the learned policy to the real world, even when the camera pose differs from the one used in simulation. Additionally, we show that the learned system not only provides adaptive behavior to handle unforeseen events during execution, such as distracting objects, sudden changes in the positions of objects, and obstacles, but also deals with obstacle shapes that were not present during training.
KW - Deep reinforcement learning
KW - Nonprehensile rearrangement
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85068467713&partnerID=8YFLogxK
U2 - 10.1016/j.robot.2019.06.007
DO - 10.1016/j.robot.2019.06.007
M3 - Article
AN - SCOPUS:85068467713
SN - 0921-8890
VL - 119
SP - 119
EP - 134
JO - Robotics and Autonomous Systems
JF - Robotics and Autonomous Systems
ER -