Weakly supervised pain localization using multiple instance learning

Karan Sikka, Abhinav Dhall, Marian Bartlett

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

32 Citations (Scopus)

Abstract

Automatic pain recognition from videos is a vital clinical application and, owing to the spontaneous nature of pain, poses interesting challenges to automatic facial expression recognition (AFER) research. Previous pain vs. no-pain systems have highlighted two major challenges: (1) ground truth is provided only for the whole sequence, so the presence or absence of the target expression in any given frame is unknown, and (2) the time point and duration of the pain expression event(s) in each video are unknown. To address these issues we propose a novel framework, referred to as MS-MIL, in which each sequence is represented as a bag containing multiple segments, and multiple instance learning (MIL) is employed to handle the weakly labeled data, i.e. sequence-level ground truth. The segments are generated either by multiple clusterings of a sequence or by running a multi-scale temporal scanning window, and are represented using a state-of-the-art Bag of Words (BoW) representation. This work extends the idea of detecting facial expressions through 'concept frames' to 'concept segments', and argues through extensive experiments that algorithms such as MIL are needed to reap the benefits of such a representation. The key advantages of our approach are: (1) joint detection and localization of painful frames using only sequence-level ground truth, (2) incorporation of temporal dynamics by representing the data as segments rather than individual frames, and (3) extraction of multiple segments, which is well suited to signals with uncertain temporal location and duration in the video. Experiments on the UNBC-McMaster Shoulder Pain dataset highlight the effectiveness of our approach, which achieves promising results on the problem of pain detection in videos.
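The bag-of-segments idea can be illustrated with a small sketch. It is not the authors' implementation: it assumes hypothetical per-frame feature vectors, a pre-computed BoW codebook, a linear segment classifier with hand-set weights, and illustrative window lengths and stride. It covers only the multi-scale scanning-window variant (not the clustering-based one), and it scores a video with the standard MIL assumption that a bag is positive if at least one instance is, taking the bag score as the maximum segment score. Each frame is localized by the best score among the segments that cover it.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def multiscale_windows(n_frames: int, scales: Sequence[int],
                       stride_frac: float = 0.5) -> List[Segment]:
    """Slide windows of several (hypothetical) lengths; each window is one instance."""
    segments: List[Segment] = []
    for length in scales:
        if length > n_frames:
            continue  # window longer than the video: skip this scale
        stride = max(1, int(length * stride_frac))
        for start in range(0, n_frames - length + 1, stride):
            segments.append((start, start + length))
    return segments or [(0, n_frames)]  # fall back to the whole sequence


def nearest_codeword(x: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to x."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))


def bow_histogram(frames, seg: Segment, codebook) -> List[float]:
    """L1-normalized histogram of codeword assignments for the frames in seg."""
    counts = [0.0] * len(codebook)
    start, end = seg
    for t in range(start, end):
        counts[nearest_codeword(frames[t], codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def score_bag(frames, codebook, weights, bias, scales):
    """Score one video (bag) with the MIL max rule and localize it.

    Returns (bag_score, best_segment, per_frame_scores); each frame gets the
    highest score of any segment covering it.
    """
    segments = multiscale_windows(len(frames), scales)
    scores = [sum(w * h for w, h in zip(weights, bow_histogram(frames, s, codebook))) + bias
              for s in segments]
    best = max(range(len(scores)), key=scores.__getitem__)  # first highest-scoring segment
    per_frame = [-math.inf] * len(frames)
    for (start, end), sc in zip(segments, scores):
        for t in range(start, end):
            per_frame[t] = max(per_frame[t], sc)
    return scores[best], segments[best], per_frame


if __name__ == "__main__":
    # Toy 1-D "features": codeword 1 stands in for a hypothetical pain-like appearance.
    frames = [[0.0], [0.1], [0.9], [1.0], [0.95], [0.05]]
    print(score_bag(frames, [[0.0], [1.0]], [-1.0, 1.0], 0.0, scales=(2, 4)))
```

Learning the segment classifier from sequence-level labels alone would require an MIL learner (for example MI-SVM or MILBoost); that step is omitted here, and the weights are set by hand purely for illustration.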

Original language: English
Title of host publication: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013
DOIs: https://doi.org/10.1109/FG.2013.6553762
Publication status: Published, 20 Aug 2013
Externally published: Yes
Event: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013, Shanghai, China
Duration: 22 Apr 2013 to 26 Apr 2013

Publication series

Name: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013

Conference

Conference: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013
Country: China
City: Shanghai
Period: 22/04/13 to 26/04/13

Cite this

Sikka, K., Dhall, A., & Bartlett, M. (2013). Weakly supervised pain localization using multiple instance learning. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013 [6553762] (2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013). https://doi.org/10.1109/FG.2013.6553762
Sikka, Karan ; Dhall, Abhinav ; Bartlett, Marian. / Weakly supervised pain localization using multiple instance learning. 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013. 2013. (2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013).
@inproceedings{6290d5ca7a8b45c29452e1214ad60416,
title = "Weakly supervised pain localization using multiple instance learning",
abstract = "Automatic pain recognition from videos is a vital clinical application and, owing to its spontaneous nature, poses interesting challenges to automatic facial expression recognition (AFER) research. Previous pain vs no-pain systems have highlighted two major challenges: (1) ground truth is provided for the sequence, but the presence or absence of the target expression for a given frame is unknown, and (2) the time point and the duration of the pain expression event(s) in each video are unknown. To address these issues we propose a novel framework (referred to as MS-MIL) where each sequence is represented as a bag containing multiple segments, and multiple instance learning (MIL) is employed to handle this weakly labeled data in the form of sequence level ground-truth. These segments are generated via multiple clustering of a sequence or running a multi-scale temporal scanning window, and are represented using a state-of-the-art Bag of Words (BoW) representation. This work extends the idea of detecting facial expressions through 'concept frames' to 'concept segments' and argues through extensive experiments that algorithms like MIL are needed to reap the benefits of such representation. The key advantages of our approach are: (1) joint detection and localization of painful frames using only sequence-level ground-truth, (2) incorporation of temporal dynamics by representing the data not as individual frames but as segments, and (3) extraction of multiple segments, which is well suited to signals with uncertain temporal location and duration in the video. Experiments on UNBC-McMaster Shoulder Pain dataset highlight the effectiveness of our approach by achieving promising results on the problem of pain detection in videos.",
author = "Karan Sikka and Abhinav Dhall and Marian Bartlett",
year = "2013",
month = "8",
day = "20",
doi = "10.1109/FG.2013.6553762",
language = "English",
isbn = "9781467355452",
series = "2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013",
booktitle = "2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013",

}

Sikka, K, Dhall, A & Bartlett, M 2013, Weakly supervised pain localization using multiple instance learning. in 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013, 6553762, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013, Shanghai, China, 22/04/13. https://doi.org/10.1109/FG.2013.6553762

Weakly supervised pain localization using multiple instance learning. / Sikka, Karan; Dhall, Abhinav; Bartlett, Marian.

2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013. 2013. 6553762 (2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013).


TY  - GEN

T1  - Weakly supervised pain localization using multiple instance learning

AU  - Sikka, Karan

AU  - Dhall, Abhinav

AU  - Bartlett, Marian

PY  - 2013/8/20

Y1  - 2013/8/20

AB  - Automatic pain recognition from videos is a vital clinical application and, owing to its spontaneous nature, poses interesting challenges to automatic facial expression recognition (AFER) research. Previous pain vs no-pain systems have highlighted two major challenges: (1) ground truth is provided for the sequence, but the presence or absence of the target expression for a given frame is unknown, and (2) the time point and the duration of the pain expression event(s) in each video are unknown. To address these issues we propose a novel framework (referred to as MS-MIL) where each sequence is represented as a bag containing multiple segments, and multiple instance learning (MIL) is employed to handle this weakly labeled data in the form of sequence level ground-truth. These segments are generated via multiple clustering of a sequence or running a multi-scale temporal scanning window, and are represented using a state-of-the-art Bag of Words (BoW) representation. This work extends the idea of detecting facial expressions through 'concept frames' to 'concept segments' and argues through extensive experiments that algorithms like MIL are needed to reap the benefits of such representation. The key advantages of our approach are: (1) joint detection and localization of painful frames using only sequence-level ground-truth, (2) incorporation of temporal dynamics by representing the data not as individual frames but as segments, and (3) extraction of multiple segments, which is well suited to signals with uncertain temporal location and duration in the video. Experiments on UNBC-McMaster Shoulder Pain dataset highlight the effectiveness of our approach by achieving promising results on the problem of pain detection in videos.

UR  - http://www.scopus.com/inward/record.url?scp=84881511003&partnerID=8YFLogxK

U2  - 10.1109/FG.2013.6553762

DO  - 10.1109/FG.2013.6553762

M3  - Conference Paper

AN  - SCOPUS:84881511003

SN  - 9781467355452

T3  - 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013

BT  - 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013

ER  - 

Sikka K, Dhall A, Bartlett M. Weakly supervised pain localization using multiple instance learning. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013. 2013. 6553762. (2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013). https://doi.org/10.1109/FG.2013.6553762