International Journal of Computer Trends and Technology

Research Article | Open Access

Volume 70 | Issue 1 | Year 2022 | Article Id. IJCTT-V70I1P101 | DOI : https://doi.org/10.14445/22312803/IJCTT-V70I1P101

A Comparative Study on Various Deep Learning Techniques for Arabic NLP Syntactic Tasks


Shaima A Abushaala, Mohammed M Elsheh

Received: 07 Dec 2021 | Revised: 09 Jan 2022 | Accepted: 21 Jan 2022

Citation:

Shaima A Abushaala, Mohammed M Elsheh, "A Comparative Study on Various Deep Learning Techniques for Arabic NLP Syntactic Tasks," International Journal of Computer Trends and Technology (IJCTT), vol. 70, no. 1, pp. 1-3, 2022. Crossref, https://doi.org/10.14445/22312803/IJCTT-V70I1P101

Abstract

It is well known that there are three basic tasks in natural language processing (NLP): tokenization, part-of-speech tagging, and named entity recognition. These can in turn be divided into two levels, lexical and syntactic. The former includes tokenization; the latter includes the part-of-speech (POS) tagging and named entity recognition (NER) tasks. Recently, deep learning has been shown to perform well in various NLP tasks such as POS tagging, NER, sentiment analysis, and language modelling, and it does so without the need for manually designed external resources or time-consuming feature engineering. This study focuses on applying the Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BLSTM), Bidirectional LSTM with Conditional Random Field (BLSTM-CRF), and LSTM with Conditional Random Field (LSTM-CRF) deep learning techniques to the syntactic-level tasks and comparing their performance. The models are trained and tested on the KALIMAT corpus. The obtained results show that the BLSTM-CRF model outperformed the other models on the NER task. For the POS tagging task, the BLSTM-CRF model likewise obtained the highest F1-score compared to the other models.
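To make the BLSTM-CRF architecture mentioned above concrete, the sketch below shows the Viterbi decoding step that a CRF layer performs on top of the per-token emission scores produced by a BLSTM encoder. This is a minimal pure-Python illustration of the general technique, not code from the paper; the tag set, emission scores, and transition weights are invented for the example, and a real model would learn them during training.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   list of dicts, one per token, mapping tag -> score
                 (in a BLSTM-CRF these come from the BLSTM output layer)
    transitions: dict mapping (prev_tag, tag) -> score
                 (learned by the CRF layer)
    tags:        list of all possible tags
    """
    # Best score of any path ending in each tag at the first token.
    scores = {t: emissions[0][t] for t in tags}
    backpointers = []

    for emit in emissions[1:]:
        new_scores, bp = {}, {}
        for t in tags:
            # Best previous tag to transition from into tag t.
            prev = max(tags, key=lambda p: scores[p] + transitions[(p, t)])
            new_scores[t] = scores[prev] + transitions[(prev, t)] + emit[t]
            bp[t] = prev
        scores = new_scores
        backpointers.append(bp)

    # Backtrack from the best final tag to recover the full path.
    best = max(tags, key=lambda t: scores[t])
    path = [best]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))


# Toy NER example with BIO-style tags (illustrative values only).
tags = ["O", "B-PER", "I-PER"]
transitions = {(p, t): 0.0 for p in tags for t in tags}
transitions[("O", "I-PER")] = -10.0   # I-PER may not follow O
transitions[("B-PER", "I-PER")] = 1.0  # I-PER naturally follows B-PER

emissions = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 0.0},
    {"O": 0.5, "B-PER": 0.0, "I-PER": 1.0},
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.0},
]

print(viterbi_decode(emissions, transitions, tags))
# -> ['B-PER', 'I-PER', 'O']
```

The transition scores are what distinguish a (B)LSTM-CRF from a plain (B)LSTM tagger: the CRF layer decodes the whole sequence jointly, so invalid tag transitions (such as I-PER directly after O) are penalized even when a single token's emission score would favor them.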

Keywords

Natural Language Processing, Deep learning, Part-of-Speech tagging, Named-Entity Recognition.
