TY - JOUR
T1 - Arbitrarily-oriented multi-lingual text detection in video
AU - Khare, Vijeta
AU - Shivakumara, Palaiahnakote
AU - Paramesran, Raveendran
AU - Blumenstein, Michael
N1 - Funding Information:
The work is also partly supported by the University of Malaya HIR under Grant No: UM.C/625/1/HIR/MOHE/ENG/42. The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which helped us to improve the quality and to clarify the paper significantly.
Publisher Copyright:
© 2016, Springer Science+Business Media New York.
PY - 2017/8/1
Y1 - 2017/8/1
AB - Text detection in arbitrarily-oriented multi-lingual video is an emerging area of research because it plays a vital role in developing real-time indexing and retrieval systems. In this paper, we propose to explore moments for identifying text candidates. We introduce a novel idea for automatically determining windows, based on stroke width information, to extract moments for tackling multi-font and multi-sized text in video. Temporal information is explored iteratively to find deviations between moving and non-moving pixels in successive frames, which results in static clusters containing caption text and dynamic clusters containing scene text as well as background pixels. The gradient directions of pixels in the static and dynamic clusters are analyzed to identify potential text candidates. Furthermore, a boundary growing technique is proposed that expands the boundary of each potential text candidate until it finds neighboring components, based on the nearest neighbor criterion. This process outputs the text lines appearing in the video. Experimental results on standard video datasets, namely ICDAR 2013, ICDAR 2015, and YVT, as well as on our own English and multi-lingual videos, demonstrate that the proposed method outperforms the state-of-the-art methods.
KW - Arbitrarily-oriented text detection
KW - Caption text
KW - Higher order moments
KW - Multi-lingual text detection
KW - Region growing
KW - Stroke width distance
KW - Dynamic window
UR - http://www.scopus.com/inward/record.url?scp=84988662262&partnerID=8YFLogxK
DO - 10.1007/s11042-016-3941-x
M3 - Article
AN - SCOPUS:84988662262
SN - 1380-7501
VL - 76
SP - 16625
EP - 16655
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 15
ER -