An Efficient Approach for Script Identification

Om Prakash; Mr. Vineet Shrivastava; Dr. Ashish Kumar

doi:10.14445/22312803/IJCTT-V4I6P121

Research Article | Open Access

Volume 4 | Issue 6 | Year 2013 | Article Id. IJCTT-V4I6P121 | DOI : https://doi.org/10.14445/22312803/IJCTT-V4I6P121

Citation :

Om Prakash, Mr. Vineet Shrivastava, Dr. Ashish Kumar, "An Efficient Approach for Script Identification," International Journal of Computer Trends and Technology (IJCTT), vol. 4, no. 6, pp. 1626-1631, 2013. Crossref, https://doi.org/10.14445/22312803/IJCTT-V4I6P121

Abstract

A large number of approaches to script recognition are currently available in OCR systems. In this report, we aim to identify the script of text in multilingual documents. The proposed script identification system considers four Indian languages: Hindi (Devanagari), Bangla, Telugu, and Kannada. Identifying the script first is intended to let document images be scanned and recognized with higher accuracy. In this context, we model script identification of multilingual documents using horizontal projection profile based analysis with headline features. A database of 450 text words each of Hindi, Bangla, Telugu, and Kannada is used for experimentation. The proposed system yields an accuracy of 97.83% on the four specified languages. Script identification plays an important role in analyzing printed documents.
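The idea of a horizontal projection profile with a headline feature can be illustrated with a minimal sketch. It assumes a word image that has already been binarized (1 = ink, 0 = background) and uses the well-known property that Devanagari and Bangla words usually carry a long horizontal stroke near the top, while Telugu and Kannada words do not. The function names, the `fill_ratio` and `top_fraction` thresholds, and the toy images are hypothetical choices made for illustration; the abstract does not state the features or thresholds actually used.

```python
def horizontal_projection(binary_word):
    """Count foreground (1) pixels in each row of a binarized word image."""
    return [sum(row) for row in binary_word]


def has_headline(binary_word, fill_ratio=0.8, top_fraction=0.5):
    """Guess whether a word carries a headline.

    Assumed rule (not taken from the paper): a row in the upper part of
    the word whose ink covers at least `fill_ratio` of the word width
    counts as a headline.
    """
    profile = horizontal_projection(binary_word)
    width = len(binary_word[0])
    top_rows = max(1, int(len(profile) * top_fraction))
    return any(count >= fill_ratio * width for count in profile[:top_rows])


# Toy 6x8 word images: 1 = ink, 0 = background.
with_headline = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],  # long horizontal stroke near the top
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
without_headline = [
    [0, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(horizontal_projection(with_headline))
print(has_headline(with_headline), has_headline(without_headline))
```

A headline test of this kind can only split the four scripts into two groups (Hindi and Bangla versus Telugu and Kannada); separating the scripts within each group would need further profile features, which this sketch does not attempt.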

Keywords

OCR, Multi-script recognition, Binarization, Line Segmentation, Horizontal projection profile.
