International Journal of Computer Trends and Technology

Research Article | Open Access

Volume 70 | Issue 1 | Year 2022 | Article Id. IJCTT-V70I1P101 | DOI : https://doi.org/10.14445/22312803/IJCTT-V70I1P101

A Comparative Study on Various Deep Learning Techniques for Arabic NLP Syntactic Tasks


Shaima A Abushaala, Mohammed M Elsheh

Received: 07 Dec 2021 | Revised: 09 Jan 2022 | Accepted: 21 Jan 2022

Citation:

Shaima A Abushaala, Mohammed M Elsheh, "A Comparative Study on Various Deep Learning Techniques for Arabic NLP Syntactic Tasks," International Journal of Computer Trends and Technology (IJCTT), vol. 70, no. 1, pp. 1-3, 2022. Crossref, https://doi.org/10.14445/22312803/IJCTT-V70I1P101

Abstract

It is well known that there are three basic tasks in natural language processing (NLP): tokenization, part-of-speech tagging, and named entity recognition. These can in turn be divided into two levels, lexical and syntactic. The former includes tokenization; the latter includes the part-of-speech (POS) tagging and named entity recognition (NER) tasks. Recently, deep learning has been shown to perform well in various NLP tasks such as POS tagging, NER, sentiment analysis, and language modelling, and it does so without the need for manually designed external resources or time-consuming feature engineering. This study focuses on applying the Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BLSTM), Bidirectional LSTM with Conditional Random Field (BLSTM-CRF), and LSTM with Conditional Random Field (LSTM-CRF) deep learning techniques to the syntactic-level tasks and comparing their performance. The models are trained and tested on the KALIMAT corpus. The obtained results show that the BLSTM-CRF model outperformed the other models on the NER task. For the POS tagging task, the BLSTM-CRF model likewise obtained the highest F1-score compared to the other models.
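To make the BLSTM-CRF architecture mentioned above concrete, the sketch below shows the Viterbi decoding step that a CRF layer performs on top of the per-token emission scores produced by a BLSTM encoder. This is a minimal pure-Python illustration of the general technique, not code from the paper; the tag set, emission scores, and transition weights are invented for the example, and a real model would learn them during training.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   list of dicts, one per token, mapping tag -> score
                 (in a BLSTM-CRF these come from the BLSTM output layer)
    transitions: dict mapping (prev_tag, tag) -> score
                 (learned by the CRF layer)
    tags:        list of all possible tags
    """
    # Best score of any path ending in each tag at the first token.
    scores = {t: emissions[0][t] for t in tags}
    backpointers = []

    for emit in emissions[1:]:
        new_scores, bp = {}, {}
        for t in tags:
            # Best previous tag to transition from into tag t.
            prev = max(tags, key=lambda p: scores[p] + transitions[(p, t)])
            new_scores[t] = scores[prev] + transitions[(prev, t)] + emit[t]
            bp[t] = prev
        scores = new_scores
        backpointers.append(bp)

    # Backtrack from the best final tag to recover the full path.
    best = max(tags, key=lambda t: scores[t])
    path = [best]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))


# Toy NER example with BIO-style tags (illustrative values only).
tags = ["O", "B-PER", "I-PER"]
transitions = {(p, t): 0.0 for p in tags for t in tags}
transitions[("O", "I-PER")] = -10.0   # I-PER may not follow O
transitions[("B-PER", "I-PER")] = 1.0  # I-PER naturally follows B-PER

emissions = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 0.0},
    {"O": 0.5, "B-PER": 0.0, "I-PER": 1.0},
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.0},
]

print(viterbi_decode(emissions, transitions, tags))
# -> ['B-PER', 'I-PER', 'O']
```

The transition scores are what distinguish a (B)LSTM-CRF from a plain (B)LSTM tagger: the CRF layer decodes the whole sequence jointly, so invalid tag transitions (such as I-PER directly after O) are penalized even when a single token's emission score would favor them.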

Keywords

Natural Language Processing, Deep learning, Part-of-Speech tagging, Named-Entity Recognition.
