Analyzing Word Error Rate on Optical Character Recognition (OCR) for Myanmar Printed Document Image

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2019 by IJCTT Journal
Volume-67 Issue-8
Year of Publication : 2019
Authors : Thin Thin Hlaing, May Phyo Oo, Thaint Zarli Myint
DOI :  10.14445/22312803/IJCTT-V67I8P109

MLA Style: Thin Thin Hlaing, May Phyo Oo, Thaint Zarli Myint. "Analyzing Word Error Rate on Optical Character Recognition (OCR) for Myanmar Printed Document Image." International Journal of Computer Trends and Technology 67.8 (2019): 51-57.

APA Style: Thin Thin Hlaing, May Phyo Oo, Thaint Zarli Myint. (2019). Analyzing Word Error Rate on Optical Character Recognition (OCR) for Myanmar Printed Document Image. International Journal of Computer Trends and Technology, 67(8), 51-57.

Abstract
Printed documents in Myanmar are written in the Myanmar language, and it is often useful to convert such documents into editable text. This paper describes an effective recognition method for Myanmar printed document images and calculates its error rate. The Myanmar language contains many words, most of which look similar, so the accuracy of an Optical Character Recognition (OCR) system for Myanmar may be low, especially for small fonts. To obtain a more accurate system, the input image is enhanced by removing noise and correcting variants. A method for isolating character images is proposed that applies connected component analysis to characters wrongly segmented by projection alone. The paper also proposes a method for obtaining more detail about the actual translation errors in the generated output by using word error rate (WER), based on a neural network classifier for recognition of the character image. We investigate the use of WER for automatic error analysis using a dynamic programming algorithm, such as Levenshtein distance, over the segmentation output. This gives a better overview of the nature of translation errors. Finally, the proposed algorithms have been tested on a variety of Myanmar printed documents, and the experimental results indicate that the methods can reduce the segmentation error rate as well as the translation error rate.
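
As an illustration of the WER measure mentioned above, the following is a minimal sketch and not the authors' implementation. It computes the Levenshtein distance between a reference word sequence and an OCR output word sequence by dynamic programming, then divides by the number of reference words. The function names and the whitespace tokenization are assumptions made for this example; Myanmar text is not normally written with spaces between words, so the sketch assumes the text has already been segmented into space-separated words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    # prev[j] is the distance between the first i-1 items of ref
    # and the first j items of hyp; only two rows are kept in memory.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes both strings are already segmented into space-separated words
    (a hypothetical preprocessing step, not described in the abstract).
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

For a four-word reference in which the OCR output substitutes one word and drops another, the distance is 2 and the WER is 2/4 = 0.5. Because insertions are counted, WER can exceed 1 when the output contains many more words than the reference.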

Keywords
Neural Network, OCR, Printed Document, WER