TY - JOUR
T1 - JRDB: A dataset and benchmark of egocentric robot visual perception of humans in built environments
T2 - IEEE Transactions on Pattern Analysis and Machine Intelligence
AU - Martín-Martín, Roberto
AU - Patel, Mihir
AU - Rezatofighi, Hamid
AU - Shenoi, Abhijeet
AU - Gwak, Junyoung
AU - Frankel, Eric
AU - Sadeghian, Amir
AU - Savarese, Silvio
N1 - Publisher Copyright: IEEE
PY - 2023/6/1
Y1 - 2023/6/1
N2 - We present JRDB, a novel egocentric dataset collected from our social mobile manipulator JackRabbot. The dataset includes 64 minutes of annotated multimodal sensor data, including stereo cylindrical 360° RGB video at 15 fps, 3D point clouds from two Velodyne 16 Lidars, line 3D point clouds from two Sick Lidars, audio signal, RGB-D video at 30 fps, a 360° spherical image from a fisheye camera, and encoder values from the robot's wheels. Our dataset incorporates data from traditionally underrepresented scenes such as indoor environments and pedestrian areas, all from the ego-perspective of the robot, both stationary and navigating. The dataset has been annotated with over 2.4 million bounding boxes spread over 5 individual cameras and 1.8 million associated 3D cuboids around all people in the scenes, totaling over 3500 time-consistent trajectories. Together with our dataset and the annotations, we launch a benchmark and metrics for 2D and 3D person detection and tracking. With this dataset, which we plan on extending with further types of annotation in the future, we hope to provide a new source of data and a test-bench for research in the areas of egocentric robot vision, autonomous navigation, and all perceptual tasks around social robotics in human environments.
AB - We present JRDB, a novel egocentric dataset collected from our social mobile manipulator JackRabbot. The dataset includes 64 minutes of annotated multimodal sensor data, including stereo cylindrical 360° RGB video at 15 fps, 3D point clouds from two Velodyne 16 Lidars, line 3D point clouds from two Sick Lidars, audio signal, RGB-D video at 30 fps, a 360° spherical image from a fisheye camera, and encoder values from the robot's wheels. Our dataset incorporates data from traditionally underrepresented scenes such as indoor environments and pedestrian areas, all from the ego-perspective of the robot, both stationary and navigating. The dataset has been annotated with over 2.4 million bounding boxes spread over 5 individual cameras and 1.8 million associated 3D cuboids around all people in the scenes, totaling over 3500 time-consistent trajectories. Together with our dataset and the annotations, we launch a benchmark and metrics for 2D and 3D person detection and tracking. With this dataset, which we plan on extending with further types of annotation in the future, we hope to provide a new source of data and a test-bench for research in the areas of egocentric robot vision, autonomous navigation, and all perceptual tasks around social robotics in human environments.
KW - Person Detection
KW - Person Tracking
KW - Robot Navigation
KW - Social Robotics
UR - http://www.scopus.com/inward/record.url?scp=85103769454&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2021.3070543
DO - 10.1109/TPAMI.2021.3070543
M3 - Article
C2 - 33798067
AN - SCOPUS:85103769454
SN - 0162-8828
VL - 45
SP - 6748
EP - 6765
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 6
ER -