Extractive Summarization Method for Arabic Text - ESMAT

Mohammed Salem Binwahlan

DOI: https://doi.org/10.14445/22312803/IJCTT-V21P119

Research Article | Open Access

Volume 21 | Number 1 | Year 2015 | Article Id. IJCTT-V21P119

Citation:

Mohammed Salem Binwahlan, "Extractive Summarization Method for Arabic Text - ESMAT," International Journal of Computer Trends and Technology (IJCTT), vol. 21, no. 1, pp. 103-109, 2015. Crossref, https://doi.org/10.14445/22312803/IJCTT-V21P119

Abstract

Because of the huge and rapid growth of online data, searching such massive data collections and finding relevant information has become a difficult and time-consuming task. For this reason, research on automatic summarization techniques has received much attention from industry and academia. Unlike English text, which has received much attention from researchers in this field, Arabic text still lacks such serious investigation. This gap gave the author strong motivation to help bring the Arabic language into the focus of automatic text summarization research by proposing an extractive summarization method. The proposed method generates a summary of an original document based on a linear combination of text features with different structures. Five summarizers (AQBTSS, Gen–Summ, LSA–Summ, Sakhr and Baseline–1) are used in this study as benchmarks. The proposed method and the benchmarks are evaluated using EASC, the Essex Arabic Summaries Corpus. The results show that the proposed method outperforms the five benchmarks in terms of recall, precision and average scores. The good performance of the proposed method indicates that focusing on more complex features, rather than simple ones, can lead to the most important content of a document.
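
The abstract states that sentences are scored by a linear combination of text features, but it does not name the features or their weights. The following is a minimal sketch of that general idea under stated assumptions. The three features (a term-frequency score, the average cosine similarity of a sentence to the other sentences, and sentence position) and the weights in DEFAULT_WEIGHTS are hypothetical choices for illustration; they are not the features or weights used by ESMAT. The sketch also includes a simple unigram-overlap recall and precision. This follows the spirit of the recall and precision scores mentioned above, but it is not the exact evaluation protocol used with EASC.

```python
import math
import re
from collections import Counter

# Hypothetical feature weights, for illustration only (not ESMAT's actual weights).
DEFAULT_WEIGHTS = {"tf": 0.4, "similarity": 0.4, "position": 0.2}


def tokenize(text: str) -> list[str]:
    # \w is Unicode-aware for str patterns, so Arabic letters are matched too.
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sentence_features(sentences: list[str]) -> list[dict[str, float]]:
    # Compute three hypothetical features per sentence, each roughly in [0, 1].
    bags = [Counter(tokenize(s)) for s in sentences]
    doc_tf = Counter()
    for bag in bags:
        doc_tf.update(bag)
    max_tf = max(doc_tf.values(), default=1)
    n = len(sentences)
    feats = []
    for i, bag in enumerate(bags):
        words = list(bag.elements())
        # Average document-level frequency of the sentence's words, normalized by the max.
        tf = sum(doc_tf[w] for w in words) / (len(words) * max_tf) if words else 0.0
        # Average similarity to every other sentence (a centrality signal).
        sim = (
            sum(cosine(bag, other) for j, other in enumerate(bags) if j != i) / (n - 1)
            if n > 1
            else 0.0
        )
        # Earlier sentences get a higher position score.
        pos = 1.0 - i / n
        feats.append({"tf": tf, "similarity": sim, "position": pos})
    return feats


def summarize(sentences: list[str], k: int, weights=DEFAULT_WEIGHTS) -> list[str]:
    # Score = weighted linear combination of the features; keep the top-k sentences.
    feats = sentence_features(sentences)
    scores = [sum(weights[name] * value for name, value in f.items()) for f in feats]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(top)]


def unigram_recall_precision(candidate: str, reference: str) -> tuple[float, float]:
    # Clipped unigram overlap between a system summary and a reference summary.
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    return recall, precision
```

Selected sentences are returned in their original order so that the extract reads coherently. Changing `weights` shifts the balance between the features without changing the selection procedure.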

Keywords

Automatic text summarization, summary, sentence similarity, term frequency, text feature.
