A Comparative Study on Various Deep Learning Techniques for Arabic NLP Syntactic Tasks

© 2022 by IJCTT Journal
Volume-70 Issue-1
Year of Publication : 2022
Authors : Shaima A Abushaala, Mohammed M Elsheh
DOI :  10.14445/22312803/IJCTT-V70I1P101

How to Cite?

Shaima A Abushaala, Mohammed M Elsheh, "A Comparative Study on Various Deep Learning Techniques for Arabic NLP Syntactic Tasks," International Journal of Computer Trends and Technology, vol. 70, no. 1, pp. 1-3, 2022. Crossref, https://doi.org/10.14445/22312803/IJCTT-V70I1P101

Abstract
It is well known that there are three basic tasks in natural language processing (NLP): tokenization, part-of-speech tagging, and named entity recognition. These tasks can be grouped into two levels, lexical and syntactic. The former includes tokenization, while the latter includes the part-of-speech (POS) tagging and named entity recognition (NER) tasks. Recently, deep learning has been shown to perform well on various NLP tasks such as POS tagging, NER, sentiment analysis, and language modelling. In addition, it performs well without the need for manually designed external resources or time-consuming feature engineering. This study focuses on applying four deep learning techniques, Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BLSTM), Long Short-Term Memory with a Conditional Random Field (LSTM-CRF), and Bidirectional Long Short-Term Memory with a Conditional Random Field (BLSTM-CRF), to the syntactic-level tasks and comparing their performance. The models are trained and tested on the KALIMAT corpus. The obtained results show that the BLSTM-CRF model outperformed the other models on the NER task. For the POS tagging task, the BLSTM-CRF model also achieved the highest F1-score.
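
To make the architecture comparison concrete, the sketch below shows one possible BLSTM-CRF sequence tagger of the kind evaluated in the paper. It is written in PyTorch with the third-party pytorch-crf package supplying the CRF layer; the vocabulary size, tag count, layer dimensions, and the toy batch are illustrative assumptions, not the settings or data used in the study.

# Minimal BLSTM-CRF tagger sketch (assumed hyperparameters, not the paper's).
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf


class BLSTMCRFTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Bidirectional LSTM encoder over the token embeddings.
        self.blstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        # Project BLSTM states to per-token tag scores (CRF emissions).
        self.emissions = nn.Linear(2 * hidden_dim, num_tags)
        # Linear-chain CRF scores whole tag sequences jointly.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, token_ids, tags=None, mask=None):
        states, _ = self.blstm(self.embedding(token_ids))
        emissions = self.emissions(states)
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(emissions, tags, mask=mask, reduction='mean')
        # Inference: Viterbi-decode the most likely tag sequence.
        return self.crf.decode(emissions, mask=mask)


# Toy usage with random ids standing in for tokens and POS/NER tag labels.
model = BLSTMCRFTagger(vocab_size=5000, num_tags=20)
tokens = torch.randint(1, 5000, (2, 12))    # batch of 2 sentences, length 12
tags = torch.randint(0, 20, (2, 12))        # gold tag ids
mask = torch.ones(2, 12, dtype=torch.bool)  # no padding in this toy batch
loss = model(tokens, tags, mask)            # training loss
loss.backward()
predicted = model(tokens, mask=mask)        # list of predicted tag-id sequences

Dropping the CRF decoder in favour of a per-token softmax turns the same sketch into a plain BLSTM tagger, and setting bidirectional=False gives the LSTM and LSTM-CRF variants, which is essentially how the four compared models differ from one another.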

Keywords
Natural Language Processing, Deep learning, Part-of-Speech tagging, Named-Entity Recognition.
