Automatic Document Summarization System Based on Natural Language Processing and Artificial Intelligent Techniques
M. I. Elalami, A. E. Amin, M. G. Doweidar, "Automatic Document Summarization System Based on Natural Language Processing and Artificial Intelligent Techniques". International Journal of Computer Trends and Technology (IJCTT) V58(1):46-57, April 2018. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract
Extractive summary optimization is the process of creating a shorter version of an original text that satisfies user requirements. The extraction approach selects the most important sentences in a document: a score is calculated for each sentence, and the top n sentences, where n follows from a user-defined summary ratio, are selected as the summary. Selecting informative sentences is a key challenge for researchers in extraction-based automatic text summarization. This research applies an extraction-based automatic single-document text summarization method that uses the particle swarm optimization (PSO) algorithm to find the feature weights that best differentiate between important and unimportant features. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit was used to measure performance in terms of F-measure. The DUC 2007 data sets provided by the Document Understanding Conference 2007 were used in the evaluation. The summaries generated by the PSO algorithm were benchmarked against those of other algorithms, namely Latent Semantic Analysis, Gong & Liu, and the Vector Space Model. Experimental results showed that the summaries produced by the PSO algorithm are better than those produced by the other algorithms.
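The extraction step described in the abstract (score every sentence, then keep the top n according to a user-defined summary ratio) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three features (term frequency, sentence position, sentence length), the naive sentence splitter, and the names `summarize` and `sentence_features` are hypothetical choices; the abstract does not list the features the paper actually uses.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", sentence.lower())


def sentence_features(sentences: list[str]) -> list[list[float]]:
    # Hypothetical feature set, each scaled to [0, 1]:
    # [term-frequency score, position score, length score].
    doc_tf = Counter(w for s in sentences for w in tokenize(s))
    max_tf = max(doc_tf.values(), default=1)
    lengths = [len(tokenize(s)) for s in sentences]
    max_len = max(lengths, default=0)
    n = len(sentences)
    features = []
    for i, s in enumerate(sentences):
        words = tokenize(s)
        tf = sum(doc_tf[w] for w in words) / (len(words) * max_tf) if words else 0.0
        position = 1.0 - i / n  # earlier sentences score higher
        length = lengths[i] / max_len if max_len else 0.0
        features.append([tf, position, length])
    return features


def summarize(text: str, weights: list[float], ratio: float) -> list[str]:
    sentences = split_sentences(text)
    if not sentences:
        return []
    # Sentence score = weighted sum of its features.
    scores = [sum(w * f for w, f in zip(weights, fv)) for fv in sentence_features(sentences)]
    # Number of sentences to keep, derived from the user-defined summary ratio.
    k = max(1, math.ceil(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize(text, [0.5, 0.3, 0.2], ratio=0.3)` keeps about 30% of the sentences, rounded up, and returns them in document order so the summary stays readable. The weight vector is exactly what an optimizer such as PSO would tune.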
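The abstract states that particle swarm optimization is used to find the feature weights. A minimal global-best PSO over weights in the unit box [0, 1] is sketched below, assuming a fitness function to maximize. In a summarization setting that fitness would presumably be an evaluation score (for example a ROUGE F-measure) of summaries produced with the candidate weights, compared against reference summaries; here a toy quadratic stands in for it. The inertia and acceleration coefficients, swarm size, iteration count, and the name `pso_maximize` are illustrative assumptions, not parameters reported by the paper.

```python
import random
from typing import Callable, Sequence


def pso_maximize(
    fitness: Callable[[Sequence[float]], float],
    dim: int,
    n_particles: int = 20,
    iterations: int = 100,
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_social: float = 1.5,
    seed: int = 0,
) -> tuple[list[float], float]:
    """Global-best PSO over [0, 1]^dim; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # swarm's best position

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp so every weight stays inside [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy stand-in fitness (assumption): peaks at the hypothetical weights below.
    target = [0.2, 0.5, 0.9]
    best, fit = pso_maximize(lambda w: -sum((a - b) ** 2 for a, b in zip(w, target)), dim=3)
    print(best, fit)
```

Clamping positions to [0, 1] keeps the weights directly usable as a sentence-scoring vector; other bound-handling schemes (reflection, velocity limits) are equally plausible choices.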
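The evaluation relies on ROUGE F-measure scores. The core of ROUGE-N for a single reference summary can be approximated as below: the clipped n-gram overlap is divided by the number of reference n-grams (recall) and by the number of candidate n-grams (precision), and the two are combined into their harmonic mean. This is a simplified re-implementation for illustration only; the official ROUGE toolkit adds options such as stemming, stopword removal, multiple references, and bootstrap confidence intervals, so its scores will not match this sketch exactly.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> Counter:
    # Lowercased alphanumeric tokens; simplified tokenization (assumption).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped matches: min count per n-gram
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "f": f}
```

For instance, `rouge_n("the cat sat on the mat", "the cat is on the mat")` yields recall, precision, and F-measure of 5/6 for unigrams, while with `n=2` only three of the five bigrams match, giving 0.6.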
Keywords
Artificial Intelligence, Natural Language Processing, automatic text summarization techniques, particle swarm optimization.