Meanings are more than just words: A Cross-Domain Question Answering Tool based on Unsupervised Semantic Feature Learning

Nripa Chetry; Debanjan Choudhary; Arindam Chatterjee; Dr. Gopichand Agnihotram

doi:10.14445/22312803/IJCTT-V53P113

Research Article | Open Access

Volume 53 | Number 1 | Year 2017 | Article Id. IJCTT-V53P113 | DOI : https://doi.org/10.14445/22312803/IJCTT-V53P113

Citation :

Nripa Chetry, Debanjan Choudhary, Arindam Chatterjee, Dr. Gopichand Agnihotram, "Meanings are more than just words: A Cross-Domain Question Answering Tool based on Unsupervised Semantic Feature Learning," International Journal of Computer Trends and Technology (IJCTT), vol. 53, no. 1, pp. 64-67, 2017. Crossref, https://doi.org/10.14445/22312803/IJCTT-V53P113

Abstract

State-of-the-art Question Answering (QA) systems focus either on similarity in the occurrences of words or on the similarity of passages. In this paper, we present a novel QA system that strikes a balance between word-based similarity and context similarity computed over short texts rather than entire passages. Our system can be loosely categorized as a cross-domain, FAQ-based QA system. It captures word-based similarity through Latent Semantic Indexing (LSI) and then re-ranks the LSI results using an eXtreme Gradient Boosting (xgboost) classifier. Several features derived from word embedding vectors, which are learned from the domain corpus, are fed into the xgboost classifier. These features capture the semantics of the questions or headings/sub-headings in our knowledge base. In our experiments, latent semantic indexing followed by classifier-based re-ranking gives a better Mean Reciprocal Rank (MRR) at top 3 than information retrieval techniques. Its performance is comparable to, if not better than, that of most state-of-the-art QA systems across domains.
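The retrieve-then-re-rank flow and the MRR-at-top-3 metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rerank` and `mrr_at_k`, the toy document ids, and the `toy_scores` table are all hypothetical. The first-stage lists stand in for LSI results, and the score lookup stands in for the probability that a trained xgboost classifier might assign to each candidate.

```python
from typing import Callable, Dict, List, Sequence


def rerank(candidates: Sequence[str],
           score: Callable[[str], float],
           top_n: int = 3) -> List[str]:
    """Re-order first-stage candidates by a second-stage score, keep top_n."""
    # sorted() is stable, so tied candidates keep their first-stage order.
    return sorted(candidates, key=score, reverse=True)[:top_n]


def mrr_at_k(rankings: Dict[str, List[str]],
             gold: Dict[str, str],
             k: int = 3) -> float:
    """Mean reciprocal rank, counting only correct answers within the top k."""
    total = 0.0
    for query, ranked in rankings.items():
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id == gold[query]:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(rankings) if rankings else 0.0


# Hypothetical first-stage output (stand-in for LSI rankings per query).
first_stage = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y", "z"]}
# Hypothetical gold answer id for each query.
gold = {"q1": "c", "q2": "x"}
# Hypothetical second-stage scores (stand-in for classifier probabilities).
toy_scores = {"a": 0.1, "b": 0.2, "c": 0.9, "d": 0.3,
              "x": 0.8, "y": 0.4, "z": 0.1}

reranked = {q: rerank(docs, toy_scores.__getitem__)
            for q, docs in first_stage.items()}

mrr_before = mrr_at_k(first_stage, gold, k=3)
mrr_after = mrr_at_k(reranked, gold, k=3)
```

In this toy setup, the gold answer for `q1` moves from rank 3 to rank 1 after re-ranking, which is the kind of improvement MRR at top 3 is meant to measure. A real system would replace `toy_scores` with predictions from a model trained on embedding-based features.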
