TY - JOUR
T1 - Human Action Recognition From Various Data Modalities: A Review
AU - Sun, Zehua
AU - Ke, Qiuhong
AU - Rahmani, Hossein
AU - Bennamoun, Mohammed
AU - Wang, Gang
AU - Liu, Jun
N1 - Funding Information:
This work was supported in part by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No. AISG-100E-2020-065), and in part by the SUTD SRG. This work was also supported in part by TAILOR, a project funded by the EU Horizon 2020 research and innovation programme under Grant 952215.
Publisher Copyright:
© 2022 IEEE.
PY - 2023/3/1
Y1 - 2023/3/1
AB - Human Action Recognition (HAR) aims to understand human behavior and assign a label to each action. It has a wide range of applications and has therefore been attracting increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signals, which encode different sources of useful yet distinct information and offer various advantages depending on the application scenario. Consequently, many existing works have investigated different types of approaches for HAR using various modalities. In this article, we present a comprehensive survey of recent progress in deep learning methods for HAR based on the type of input data modality. Specifically, we review the current mainstream deep learning methods for single and multiple data modalities, including fusion-based and co-learning-based frameworks. We also present comparative results on several benchmark datasets for HAR, together with insightful observations and inspiring future research directions.
KW - data modality
KW - deep learning
KW - human action recognition
KW - multi-modality
KW - single modality
UR - http://www.scopus.com/inward/record.url?scp=85136334360&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2022.3183112
DO - 10.1109/TPAMI.2022.3183112
M3 - Review Article
C2 - 35700242
AN - SCOPUS:85136334360
SN - 0162-8828
VL - 45
SP - 3200
EP - 3225
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 3
ER -