An Efficient Approach for Script Identification

Authors: Om Prakash, Mr. Vineet Shrivastava, Dr. Ashish Kumar


Abstract: A large number of approaches to script recognition are currently available in OCR systems. In this report we address script identification in multilingual documents. The proposed script identification system considers four Indian languages: Hindi (Devanagari), Bangla, Telugu, and Kannada. Script identification plays an important role in analyzing printed documents, and the system is intended to let document images be scanned and recognized with higher accuracy. In this context, we model script identification of multilingual documents using horizontal projection profile based analysis with head line features. A database of 450 text words each of Hindi, Bangla, Telugu, and Kannada is used for experimentation. The proposed system yields an accuracy of 97.83% on the four specified languages.


Keywords — OCR, Multi-script recognition, Binarization, Line Segmentation, Horizontal projection profile.
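
The horizontal projection profile and head line feature named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0/1 list-of-lists image format, the restriction to the upper half of the word, and the `min_fill` threshold of 0.8 are all assumptions made for illustration. The idea rests on the fact that Devanagari and Bangla words carry a horizontal head line along the top, while Telugu and Kannada words do not, so a near full-width row in the profile hints at the former group.

```python
def horizontal_projection_profile(binary_image):
    """Count foreground (1) pixels in each row of a binarized word image."""
    return [sum(row) for row in binary_image]


def has_head_line(binary_image, min_fill=0.8):
    """Guess whether a word image has a head line.

    Assumption (not from the paper): a head line shows up as a row in the
    upper half of the word whose foreground pixels cover at least
    `min_fill` of the image width.
    """
    if not binary_image or not binary_image[0]:
        return False
    width = len(binary_image[0])
    profile = horizontal_projection_profile(binary_image)
    upper_half = profile[: max(1, len(profile) // 2)]
    return max(upper_half) >= min_fill * width


# Toy 6x10 word image with a full-width stroke near the top (head-line-like).
with_head_line = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

# Toy word image made of rounded strokes with no dominant top row.
without_head_line = [
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

print(horizontal_projection_profile(with_head_line))  # [0, 10, 3, 3, 6, 0]
print(has_head_line(with_head_line), has_head_line(without_head_line))  # True False
```

A head line test of this kind can only separate the Hindi/Bangla group from the Telugu/Kannada group; telling the scripts apart within each group would need further features, which this sketch does not attempt.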