Extracting Conjunction Patterns in Relation Triplets from Complex Requirement Sentence
Veera Prathap Reddy M, Prasad P.V.R.D, Manjunath Chikkamath, Karthikeyan Ponnalagu "Extracting Conjunction Patterns in Relation Triplets from Complex Requirement Sentence". International Journal of Computer Trends and Technology (IJCTT) V60(3):133-143 June 2018. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract
Automatically extracting knowledge from complex, unstructured software requirement sentences is an important research challenge. The objective is to reduce human interpretation errors, which contribute to more than 50% of overall software defects. In this paper, we propose a pattern-based open information extraction (OIE) approach to address this challenge. The approach extracts meaningful relations from natural language sentences made complex by conjunctive (correlative, coordinating, and subordinating) structures. It exploits linguistic knowledge of English grammar to identify patterns in a requirement sentence, and then extracts information according to the grammatical function of each constituent. We propose MRAlgo, an automated, multiple-relation, verb-centric information extraction algorithm for the software requirements engineering domain that can detect every action, subject, and object linked by conjunctions. We evaluated MRAlgo on a random sample of conjunctive sentences drawn from a public dataset of requirement sentences, together with a few sentences from the web, and obtained high precision and recall compared with other open information extraction approaches.
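To make the idea of conjunction patterns in relation triplets concrete, the following Python sketch shows one simple way conjoined subjects, verbs, and objects can be distributed into separate (subject, relation, object) triplets, and how extracted triplets can be scored for precision and recall against a hand-labeled set. It is not MRAlgo: the names `split_conjuncts`, `expand_triplets`, and `precision_recall` are hypothetical, the sketch assumes the subject, verb, and object phrases of a clause have already been isolated (for example, by a dependency parser), and it handles only coordinating "and"/"or" plus the correlative openers "both"/"either".

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from itertools import product

# Hypothetical sketch, not MRAlgo. Assumes the subject, verb and object phrases
# of one clause were already isolated upstream (e.g. by a dependency parser).


@dataclass(frozen=True)
class Triplet:
    subject: str
    relation: str
    obj: str


# Correlative openers ("both A and B", "either A or B") are dropped before splitting.
_CORRELATIVE_PREFIX = re.compile(r"^(?:both|either)\s+", re.IGNORECASE)
# Split on ", and" / ", or", on bare commas, or on whitespace-delimited "and" / "or".
_CONJUNCTION_SPLIT = re.compile(
    r",\s*(?:and|or)\s+|,\s*|\s+(?:and|or)\s+", re.IGNORECASE
)


def split_conjuncts(phrase: str) -> list[str]:
    """Split a coordinated phrase such as 'create, update and delete' into conjuncts."""
    phrase = _CORRELATIVE_PREFIX.sub("", phrase.strip())
    return [part.strip() for part in _CONJUNCTION_SPLIT.split(phrase) if part.strip()]


def expand_triplets(subject: str, verb: str, obj: str) -> list[Triplet]:
    """Distribute conjoined subjects, verbs and objects: one triplet per combination."""
    return [
        Triplet(s, v, o)
        for s, v, o in product(
            split_conjuncts(subject), split_conjuncts(verb), split_conjuncts(obj)
        )
    ]


def precision_recall(predicted, gold) -> tuple[float, float]:
    """Set-based precision and recall of predicted triplets against gold triplets."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # "Both the administrator and the manager shall view and edit the user profile."
    for triplet in expand_triplets(
        "both the administrator and the manager", "view and edit", "the user profile"
    ):
        print(triplet)
```

The distributive reading (every subject performs every action on every object) is a simplifying assumption: it is wrong for collective readings such as "A and B shall be merged", and subordinate clauses would need a separate clause-splitting step before this expansion.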
Keywords
Multiple-relation extraction, Natural Language Processing (NLP), dependency parser, verb-based algorithm.