A Survey of Text Detection and Recognition in Natural Scenes

A Survey of Text Detection and Recognition in Natural Scenes. 张树业 (Zhang Shuye), 2014.6.6

ICDAR 2003 Robust Reading Competition
• The problem is partitioned into three sub-problems: Text Locating, Word Recognition, and Character Recognition.
• Text Locating: Figure 1 (a) Original image; (b) Location result.
• Word Recognition: Figure 2 An image containing a word.
• Character Recognition: Figure 3 An image containing a character.
[1] http://www.iapr-tc11.org/mediawiki/index.php/ICDAR_2003_Robust_Reading_Competitions

ICDAR 2003 Robust Reading Competition
• Dataset
• Basic information for each task
• Quick impression of the data
Table 1 ICDAR 2003 Text Locating Competition Dataset
Table 2 Measures of interest in each problem.
Figure 4 Sample image in Task 1.

ICDAR 2003 Robust Reading Competition
Evaluation Scheme: Equations (1)-(4)
Figure 5 Different positional relationships between two rectangles.
Competition Result: Table 3 ICDAR 2003 Text Locating Competition Results
[1] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young, “ICDAR 2003 robust reading competitions,” in Proceedings of the Seventh International Conference on Document Analysis and Recognition, vol. 2, 2003, pp. 682–687.
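The evaluation scheme named on this slide can be sketched in code. This is a minimal sketch, assuming the definitions in Lucas et al. [1] as they are commonly summarized: the match between two rectangles is their intersection area divided by the area of the smallest rectangle containing both, each rectangle is scored by its best match in the other set, and precision and recall are the averaged best matches, combined into f with weight alpha = 0.5. The function names and the (x_min, y_min, x_max, y_max) rectangle layout are illustrative choices, not taken from the slides.

```python
from typing import List, Tuple

# Assumed rectangle layout: (x_min, y_min, x_max, y_max)
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    """Area of a rectangle; degenerate or inverted rectangles count as 0."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def match(a: Rect, b: Rect) -> float:
    """Intersection area divided by the area of the smallest box containing both."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    hull = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    hull_area = area(hull)
    return area(inter) / hull_area if hull_area > 0 else 0.0


def best_match(r: Rect, rects: List[Rect]) -> float:
    """Best match of r against a set of rectangles (0 if the set is empty)."""
    return max((match(r, other) for other in rects), default=0.0)


def evaluate(targets: List[Rect], estimates: List[Rect], alpha: float = 0.5):
    """Return (precision, recall, f) for estimated boxes against ground truth."""
    precision = (
        sum(best_match(e, targets) for e in estimates) / len(estimates)
        if estimates else 0.0
    )
    recall = (
        sum(best_match(t, estimates) for t in targets) / len(targets)
        if targets else 0.0
    )
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0
    f = 1.0 / (alpha / precision + (1.0 - alpha) / recall)
    return precision, recall, f
```

Because the match is a ratio rather than a hit-or-miss decision, a detection that is slightly too large or too small still earns partial credit under this sketch.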

ICDAR 2005 Robust Reading Competition
• The tasks, dataset, and evaluation scheme are the same as in ICDAR 2003.
• Competition Results: Table 4 Text locating results for the 2005 (top) and the 2003 (bottom) entries.

ICDAR 2011 Robust Reading Competition
• Tasks: Text Localization, Word Recognition
• Dataset: Table 5 Some drawbacks of the previous dataset and their improvements
• Evaluation Scheme: Wolf’s scheme [2] and DetEval
[1] A. Shahab, F. Shafait, and A. Dengel, “ICDAR 2011 robust reading competition challenge 2: Reading text in scene images,” in Document Analysis and Recognition (ICDAR), 2011 International Conference on. IEEE, 2011.
[2] C. Wolf and J. Jolion, “Object count/area graphs for the evaluation of object detection and segmentation algorithms,” Int. Jour. on Document Analysis and Recognition, vol. 8, no. 4, pp. 280–296, Sep. 2006.
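To give a rough feel for how an area-threshold scheme such as Wolf's works, here is a deliberately simplified sketch. It assumes a detection matches a ground-truth box when the overlap covers at least t_r of the ground-truth area and at least t_p of the detection area; the defaults t_r = 0.8 and t_p = 0.4 are the values usually quoted for DetEval and are an assumption here, since the slide does not state them. The sketch handles only greedy one-to-one matches, whereas the full scheme also scores one-to-many and many-to-one matches, so its numbers will not reproduce official results. All function names are hypothetical.

```python
from typing import List, Tuple

# Assumed rectangle layout: (x_min, y_min, x_max, y_max)
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    """Area of a rectangle; degenerate rectangles count as 0."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection_area(a: Rect, b: Rect) -> float:
    """Area of the overlap between two axis-aligned rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def one_to_one_scores(ground_truth: List[Rect], detections: List[Rect],
                      t_r: float = 0.8, t_p: float = 0.4):
    """Greedy one-to-one matching only (simplification of the full scheme).

    Returns (precision, recall) as fractions of matched detections and
    matched ground-truth boxes.
    """
    used = set()  # indices of detections already matched
    matched = 0
    for g in ground_truth:
        for j, d in enumerate(detections):
            if j in used or area(g) == 0 or area(d) == 0:
                continue
            inter = intersection_area(g, d)
            area_recall = inter / area(g)      # share of ground truth covered
            area_precision = inter / area(d)   # share of detection that is text
            if area_recall >= t_r and area_precision >= t_p:
                used.add(j)
                matched += 1
                break
    precision = matched / len(detections) if detections else 0.0
    recall = matched / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

The asymmetric thresholds mean a detection may be somewhat larger than the ground-truth box and still count, but it must cover most of the text.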

ICDAR 2011 Robust Reading Competition
Table 6 Text Localization Results
Brief introduction to Kim’s method: First, blobs are extracted from the image with the MSER approach, and neighboring blobs are merged. Next, false positives are reduced using gradient features. Then, the AdaBoost learning method is used to decide the location and size of the rectangle in the oriented gradient image. Finally, a cascade classifier is used to discriminate text from non-text.
[1] D. Karatzas, S. R. Mestre, J. Mas, F. Nourbakhsh, and P. P. Roy, “ICDAR 2011 robust reading competition-challenge 1: reading text in born-digital images (web and email),” in Document Analysis and Recognition (ICDAR), 2011 International Conference on. IEEE, 2011, pp. 1485–1490.
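The blob-merging step in the description of Kim's method can be illustrated with a small sketch. The slide does not say how neighbors are defined, so the rule below (boxes that overlap vertically and whose horizontal gap is at most max_gap pixels) is purely an assumption for illustration, as are the function names and the max_gap default. The input stands for bounding boxes of blobs that an MSER detector would have produced; the extraction itself is not shown.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def are_neighbors(a: Box, b: Box, max_gap: float) -> bool:
    """Hypothetical neighbor rule: vertical overlap and a small horizontal gap."""
    h_gap = max(a[0], b[0]) - min(a[2], b[2])      # negative if they overlap in x
    v_overlap = min(a[3], b[3]) - max(a[1], b[1])  # positive if they overlap in y
    return h_gap <= max_gap and v_overlap > 0


def merge_blobs(boxes: List[Box], max_gap: float = 5) -> List[Box]:
    """Repeatedly merge neighboring boxes until no pair qualifies."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if are_neighbors(merged[i], merged[j], max_gap):
                    a, b = merged[i], merged[j]
                    # Replace the pair with their common bounding box.
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Two character-sized blobs on one line plus a distant blob.
blobs = [(0, 0, 10, 10), (12, 1, 20, 9), (100, 0, 110, 10)]
candidates = merge_blobs(blobs, max_gap=5)
```

In this usage example the first two blobs sit 2 pixels apart on the same line and are merged into one candidate region, while the distant blob stays separate.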

ICDAR 2013 Robust Reading Competition, Challenge 2: “Reading Text in Scene Images”
• Task 1, Text Localization: the objective is to obtain a rough estimation of the text areas in the image, in terms of bounding boxes that correspond to parts of text (words or text lines).
• Task 2, Text Segmentation: the objective is the pixel-level separation of text from the background.
• Task 3, Word Recognition: the locations (bounding boxes) of words in the image are assumed to be known and the corresponding text transcriptions are sought. Note that there are many short words and even single letters in this dataset.
[1] http://dag.cvc.uab.es/icdar2013competition/?ch=2&com=tasks

ICDAR 2013 Robust Reading Competition
Table 7 Results for the ICDAR 2013 Robust Reading Competition (Challenge 2: Text Localization in Real Scenes).
Table 8 Performance (%) comparison of text localization algorithms for the multilingual dataset.
Table 9 Performance (%) comparison of text localization algorithms for the ICDAR 2011 Robust Reading Competition dataset.
[1] http://prir.ustb.edu.cn/TexStar/scene-text-detection/
[2] Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, “Robust text detection in natural scene images,” IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), to appear, 2013.

Demo Neumann L., Matas J.: A Real-Time Scene Text to Speech System, Demo @ ECCV 2012.

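Innovation around scene text detection
• Applications: (1) assistive tools for people with disabilities; (2) image understanding and retrieval; (3) real-time scene translation systems; (4) autonomous driving; (5) map annotation…
• Algorithms: (1) building detection and recognition into a single end-to-end system; (2) using a language model as post-processing to further improve system performance…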
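The idea of language-based post-processing can be pictured with a much simpler stand-in: lexicon correction. The sketch below is not a language model; it only snaps a recognized word to the closest entry of a word list by edit distance, which is one common way to illustrate the principle. The lexicon, the max_dist threshold, and the function names are all hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def correct_word(word: str, lexicon: List[str], max_dist: int = 2) -> str:
    """Return the nearest lexicon entry if it is close enough, else the input."""
    if not lexicon:
        return word
    best = min(lexicon, key=lambda w: edit_distance(word.lower(), w.lower()))
    if edit_distance(word.lower(), best.lower()) <= max_dist:
        return best
    return word


# A recognizer that confuses "I" with "1" is repaired by the lexicon.
signs = ["EXIT", "OPEN", "STOP"]
fixed = correct_word("EX1T", signs)
```

A real system would typically weight candidates by word frequency or by the recognizer's per-character confidence rather than by raw edit distance alone.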

Thank you!
