Aks: A Database for Detection and Extraction of Devanagari Text in Camera Based Images

International Journal of Computer Trends and Technology (IJCTT)          
© 2016 by IJCTT Journal
Volume-39 Number-1
Year of Publication : 2016
Authors : Ganesh K Sethi, Rajesh K Bawa


Ganesh K Sethi, Rajesh K Bawa "Aks: A Database for Detection and Extraction of Devanagari Text in Camera Based Images". International Journal of Computer Trends and Technology (IJCTT) V39(1):32-39, September 2016. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
With the advent of digital cameras and other handheld imaging devices, a new type of text-containing image has emerged that traditional optical character recognition (OCR) technology cannot handle. These camera-based images pose a number of challenges that are absent in scanned or born-digital images. To detect and extract text from camera-based images, a Text Information Extraction (TIE) process is carried out, which detects the presence of text in an image and separates it from the background. This paper presents a detailed comparison of camera-based, born-digital, and scanned images. A database of camera-based images containing text, particularly in Devanagari script, is created. Various available benchmark databases are surveyed and, keeping in view the challenges of camera-based images, an exhaustive dataset of images is prepared. The paper also discusses the evaluation metrics used to compute the accuracy of text detection and extraction from camera-based images.

Keywords -
Born Digital Images, Camera Images, Scanned Images, Text Information Extraction (TIE), Optical Character Recognition (OCR).