Comparison of Apache SOLR Search Spellcheck String Distance Measure – Levenshtein, Jaro Winkler, and N-Gram

© 2021 by IJCTT Journal
Volume-69 Issue-3
Year of Publication : 2021
Authors : Parameswara Rao Kandregula
DOI :  10.14445/22312803/IJCTT-V69I3P101

How to Cite?

Parameswara Rao Kandregula, "Comparison of Apache SOLR Search Spellcheck String Distance Measure – Levenshtein, Jaro Winkler, and N-Gram," International Journal of Computer Trends and Technology, vol. 69, no. 3, pp. 1-4, 2021. Crossref, 10.14445/22312803/IJCTT-V69I3P101

Abstract

String distance is one of the key metrics for string comparison in spell correction, and Levenshtein, Jaro-Winkler, and N-Gram are well-known algorithms for measuring string distance and similarity. When a typical user types into a website's search box, spelling mistakes usually involve no more than two or three characters. In this article, in the context of an e-commerce website, we test and compare the results of the three spellcheck distance measure implementations provided by Apache SOLR search: Levenshtein, Jaro-Winkler, and N-Gram.
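As a rough illustration of how the three families of measures score a short typo, the following is a minimal, self-contained Java sketch (Java being the language of Lucene and SOLR). It assumes the convention of Lucene's StringDistance interface, where each score lies in [0, 1] and 1.0 means the strings are identical, but it does not reproduce the Lucene classes. In particular, the N-gram score here is a simple bigram Dice coefficient swapped in for brevity; it is not the padded n-gram edit distance (after Kondrak) that Lucene's NGramDistance implements. The class name StringSimilaritySketch and the sample query pairs are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: three string similarity measures normalized to [0, 1],
 * where 1.0 means identical. Not the Lucene/SOLR implementation.
 */
public class StringSimilaritySketch {

    /** Levenshtein similarity: 1 - editDistance / max(len(a), len(b)). */
    static double levenshtein(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0;
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // Cheapest of insertion, deletion, or substitution/match.
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return 1.0 - (double) prev[b.length()] / maxLen;
    }

    /** Standard Jaro-Winkler similarity: prefix scale 0.1, common prefix capped at 4. */
    static double jaroWinkler(String a, String b) {
        if (a.equals(b)) return 1.0;
        int lenA = a.length(), lenB = b.length();
        if (lenA == 0 || lenB == 0) return 0.0;
        int window = Math.max(0, Math.max(lenA, lenB) / 2 - 1);
        boolean[] matchedA = new boolean[lenA];
        boolean[] matchedB = new boolean[lenB];
        int matches = 0;
        // Characters match if equal and within the search window of each other.
        for (int i = 0; i < lenA; i++) {
            int start = Math.max(0, i - window);
            int end = Math.min(lenB - 1, i + window);
            for (int j = start; j <= end; j++) {
                if (!matchedB[j] && a.charAt(i) == b.charAt(j)) {
                    matchedA[i] = true;
                    matchedB[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Matched characters that appear in a different order count as half-transpositions.
        int k = 0, halfTranspositions = 0;
        for (int i = 0; i < lenA; i++) {
            if (!matchedA[i]) continue;
            while (!matchedB[k]) k++;
            if (a.charAt(i) != b.charAt(k)) halfTranspositions++;
            k++;
        }
        double m = matches;
        double jaro = (m / lenA + m / lenB + (m - halfTranspositions / 2.0) / m) / 3.0;
        // Winkler boost for a shared prefix of up to 4 characters.
        int prefix = 0;
        int maxPrefix = Math.min(4, Math.min(lenA, lenB));
        while (prefix < maxPrefix && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }

    /** Stand-in N-gram measure: Dice coefficient over character bigrams. */
    static double bigramDice(String a, String b) {
        if (a.equals(b)) return 1.0;
        if (a.length() < 2 || b.length() < 2) return 0.0;
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 <= a.length(); i++) counts.merge(a.substring(i, i + 2), 1, Integer::sum);
        int shared = 0;
        for (int i = 0; i + 2 <= b.length(); i++) {
            String gram = b.substring(i, i + 2);
            Integer c = counts.get(gram);
            if (c != null && c > 0) { shared++; counts.put(gram, c - 1); }
        }
        return 2.0 * shared / ((a.length() - 1) + (b.length() - 1));
    }

    public static void main(String[] args) {
        // Hypothetical e-commerce queries with one- or two-character typos.
        String[][] pairs = {{"iphone", "iphnoe"}, {"samsung", "samsnug"}, {"laptop", "labtop"}};
        for (String[] p : pairs) {
            System.out.printf("%-8s vs %-8s  lev=%.3f  jw=%.3f  dice=%.3f%n",
                p[0], p[1], levenshtein(p[0], p[1]), jaroWinkler(p[0], p[1]), bigramDice(p[0], p[1]));
        }
    }
}
```

Running main prints one line per pair, so the three scores can be compared side by side for the same typo. Because Jaro-Winkler rewards a shared prefix, it tends to rate typos near the end of a word more leniently than Levenshtein does, which is one reason the measures can rank spellcheck suggestions differently.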

Keywords
Search engine, SOLR, Natural language processing, String distance, Levenshtein.
