An Efficient Approach for Script Identification

International Journal of Computer Trends and Technology (IJCTT)          
© - June Issue 2013 by IJCTT Journal
Volume-4 Issue-6                           
Year of Publication : 2013
Authors: Om Prakash, Mr. Vineet Shrivastava, Dr. Ashish Kumar


Om Prakash, Mr. Vineet Shrivastava, Dr. Ashish Kumar, "An Efficient Approach for Script Identification", International Journal of Computer Trends and Technology (IJCTT), V4(6):1626-1631, June Issue 2013. ISSN Published by Seventh Sense Research Group.

Abstract: Many different approaches to script recognition are currently available in OCR systems. In this report we address script identification in multilingual documents. The proposed script identification system considers four Indian languages: Hindi (Devanagari), Bangla, Telugu, and Kannada. The system allows document images to be scanned with higher accuracy. In this context, we model script identification of multilingual documents using horizontal projection profile based analysis with headline features. A database of 450 text words each of Hindi, Bangla, Telugu, and Kannada is used for experimentation. The proposed system yields an accuracy of 97.83% on the four specified languages. Script identification plays an important role in analyzing printed documents.
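
As an illustration of the idea named in the abstract, the following is a minimal sketch of a horizontal projection profile and a headline test on a binarized word image. It is not the authors' implementation: the function names, the `min_fill` threshold of 0.8, and the toy images are assumptions made for this example. It relies only on the well-known property that Devanagari and Bangla words usually carry a continuous headline along the top, whereas Telugu and Kannada words do not.

```python
def horizontal_projection_profile(image):
    """Count foreground pixels (value 1) in each row of a binarized image."""
    return [sum(row) for row in image]


def has_headline(image, min_fill=0.8):
    """Heuristic headline test (assumed threshold, not from the paper).

    Reports True when some row in the upper half of the word image is
    filled across at least `min_fill` of the image width.
    """
    profile = horizontal_projection_profile(image)
    width = len(image[0])
    upper_half = profile[: max(1, len(profile) // 2)]
    return max(upper_half) >= min_fill * width


# Toy 6x6 word images: 1 = ink, 0 = background (hypothetical data).
with_headline = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],  # continuous headline row
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
without_headline = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print(horizontal_projection_profile(with_headline))  # [0, 6, 2, 2, 3, 0]
print(has_headline(with_headline))                   # True
print(has_headline(without_headline))                # False
```

A test of this kind can only separate headline scripts (Hindi, Bangla) from non-headline scripts (Telugu, Kannada); distinguishing the scripts within each group would need further features that the abstract does not specify.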



Keywords — OCR, Multi-script recognition, Binarization, Line Segmentation, Horizontal projection profile.