Meanings are more than just words: A Cross-Domain Question Answering Tool based on Unsupervised Semantic Feature Learning
Nripa Chetry, Debanjan Choudhary, Arindam Chatterjee, Dr. Gopichand Agnihotram "Meanings are more than just words: A Cross-Domain Question Answering Tool based on Unsupervised Semantic Feature Learning". International Journal of Computer Trends and Technology (IJCTT) V53(2):64-67, November 2017. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
State-of-the-art Question Answering (QA) systems focus either on similarity in the occurrences of words or on similarity between passages. In this paper, we present a novel QA system that balances word-based similarity against context similarity computed over short texts rather than entire passages. Our system can be loosely categorized as a cross-domain, FAQ-based QA system. It tracks word-based similarity through Latent Semantic Indexing (LSI) and then re-ranks the LSI results using an eXtreme Gradient Boosting (XGBoost) classifier model. Several features derived from word embedding vectors, which are learned from the domain corpus, are fed into the XGBoost classifier. These features capture the semantic content of the questions or headings/sub-headings in our knowledge base. Our experiments show that retrieving with latent semantic indexing and re-ranking the results with the classifier gives a better Mean Reciprocal Rank (MRR) at top 3 than information retrieval techniques alone. Its performance is comparable to, if not better than, most state-of-the-art QA systems across domains.
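The evaluation measure named in the abstract, MRR at top 3, can be illustrated with a minimal sketch. It assumes one correct knowledge-base entry per query; the function name `mrr_at_k`, the FAQ ids, and the toy rankings are hypothetical and are not taken from the paper. For each query, the score is the reciprocal of the rank at which the correct entry first appears within the top k results (0 if it is absent), averaged over all queries.

```python
from typing import Hashable, Sequence


def mrr_at_k(rankings: Sequence[Sequence[Hashable]],
             gold: Sequence[Hashable],
             k: int = 3) -> float:
    """Mean Reciprocal Rank, counting only hits within the top k results."""
    if len(rankings) != len(gold):
        raise ValueError("need exactly one gold answer per ranked list")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, answer in zip(rankings, gold):
        for rank, candidate in enumerate(ranked[:k], start=1):
            if candidate == answer:
                total += 1.0 / rank
                break  # only the first correct hit contributes
    return total / len(rankings)


# Hypothetical FAQ ids for three queries (toy data, not results from the paper).
first_stage = [["q7", "q2", "q9"], ["q4", "q1", "q5"], ["q3", "q8", "q6"]]
reranked = [["q2", "q7", "q9"], ["q1", "q4", "q5"], ["q3", "q8", "q6"]]
gold = ["q2", "q1", "q3"]

print("first stage MRR@3:", mrr_at_k(first_stage, gold))
print("re-ranked MRR@3:", mrr_at_k(reranked, gold))
```

In this toy data, the re-ranked lists move two correct entries from rank 2 to rank 1, which is the kind of improvement a re-ranking stage is meant to produce; the numbers say nothing about the system's actual results.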
1) Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. JASIS, 41(6):391–407.
2) Tom Kenter and Maarten de Rijke. 2015. Short text similarity with word embeddings. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM '15, pages 1411–1420, New York, NY, USA. ACM.
3) Zhiguo Wang and Abraham Ittycheriah. 2015. FAQ-based question answering via word alignment. CoRR, abs/1507.02628.