State of the art in Nastaleeq Script Recognition

International Journal of Computer Trends and Technology (IJCTT)          
© 2016 by IJCTT Journal
Volume-39 Number-1
Year of Publication : 2016
Authors : Harmohan Sharma, Dharam Veer Sharma


Harmohan Sharma, Dharam Veer Sharma "State of the art in Nastaleeq Script Recognition". International Journal of Computer Trends and Technology (IJCTT) V39(1):40-46, September 2016. ISSN:2231-2803. Published by Seventh Sense Research Group.

Abstract -
OCR of the Nastaleeq script has gained importance in the recent past, owing to the need to preserve historic manuscripts and make them searchable, among other applications of OCR. Nastaleeq, being a complex script, has so far remained largely untouched by automation, and the little work done to date has proved insufficient to meet these needs. Developing OCR for Urdu-script-based languages is more complex than for scripts such as Latin and Chinese because of the complexities of the Urdu script: the cursive nature of Urdu writing, context-sensitive character shapes, overlap between ligatures, the use of joiners, the formation of ligatures within words, and the spacing between ligatures. This paper analyses the Urdu language, the characteristics of the Nastaleeq script, and the complexities involved in developing an Urdu OCR.

Keywords -
Optical Character Recognition, Nastaleeq, Ligature Recognition.