Comparison of Apache SOLR Search Spellcheck String Distance Measure – Levenshtein, Jaro Winkler, and N-Gram
© 2021 by IJCTT Journal
Year of Publication: 2021
Authors: Parameswara Rao Kandregula
DOI: 10.14445/22312803/IJCTT-V69I3P101
How to Cite?
Parameswara Rao Kandregula, "Comparison of Apache SOLR Search Spellcheck String Distance Measure – Levenshtein, Jaro Winkler, and N-Gram," International Journal of Computer Trends and Technology, vol. 69, no. 3, pp. 1-4, 2021. Crossref, 10.14445/22312803/IJCTT-V69I3P101
String distance is one of the key metrics for string comparison in spell correction, and Levenshtein, Jaro-Winkler, and N-Gram are well-known string distance and similarity algorithms. When a typical user types into a website's search box, a misspelled query usually differs from the intended word by no more than two or three characters. In this article, in the context of an e-commerce website, we test and compare the results of the spellcheck distance measure implementations provided by Apache SOLR search: Levenshtein, Jaro-Winkler, and N-Gram.
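To make the three measures concrete, here is a small Java sketch that scores a typo against its intended word, with every score scaled so that 1.0 means identical strings (the convention of Lucene's `StringDistance` interface, which Solr's spellchecker builds on). It is an illustrative reimplementation, not Solr's code. The Levenshtein score assumes normalization by the longer string's length; the Jaro-Winkler score assumes the common defaults (prefix scale 0.1, prefix capped at 4 characters, boost applied only when the Jaro score is at least 0.7). The third measure is deliberately swapped for a simpler bigram Dice coefficient: Lucene's `NGramDistance` instead computes an n-gram version of edit distance, so its numbers will differ. The class name `SpellDistanceSketch` and the sample typo pairs are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative reimplementation; scores follow the "1.0 = identical" convention, not Solr's exact code.
final class SpellDistanceSketch {

    /** Classic Levenshtein edit distance: insertions, deletions, substitutions each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + sub);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two DP rows
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1 - distance / length of the longer string. */
    static double levenshteinSimilarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    /** Jaro-Winkler with assumed defaults: scale 0.1, prefix cap 4, boost only when Jaro >= 0.7. */
    static double jaroWinkler(String a, String b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] aHit = new boolean[a.length()];
        boolean[] bHit = new boolean[b.length()];
        int matches = 0;
        for (int i = 0; i < a.length(); i++) { // match an unused equal char within the window
            int hi = Math.min(b.length() - 1, i + window);
            for (int j = Math.max(0, i - window); j <= hi; j++) {
                if (!bHit[j] && a.charAt(i) == b.charAt(j)) {
                    aHit[i] = bHit[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        int outOfOrder = 0; // matched chars that line up differently in the two strings
        for (int i = 0, j = 0; i < a.length(); i++) {
            if (!aHit[i]) continue;
            while (!bHit[j]) j++;
            if (a.charAt(i) != b.charAt(j)) outOfOrder++;
            j++;
        }
        double m = matches;
        double jaro = (m / a.length() + m / b.length() + (m - outOfOrder / 2) / m) / 3.0;
        if (jaro < 0.7) return jaro;
        int prefix = 0;
        int cap = Math.min(4, Math.min(a.length(), b.length()));
        while (prefix < cap && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro); // reward a shared prefix
    }

    /** Bigram Dice coefficient; a simpler stand-in, NOT Lucene's NGramDistance algorithm. */
    static double bigramDice(String a, String b) {
        if (a.equals(b)) return 1.0;
        if (a.length() < 2 || b.length() < 2) return 0.0;
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 <= a.length(); i++) counts.merge(a.substring(i, i + 2), 1, Integer::sum);
        int shared = 0;
        for (int i = 0; i + 2 <= b.length(); i++) {
            String gram = b.substring(i, i + 2);
            int left = counts.getOrDefault(gram, 0);
            if (left > 0) { // each bigram of a can be shared at most once
                shared++;
                counts.put(gram, left - 1);
            }
        }
        return 2.0 * shared / ((a.length() - 1) + (b.length() - 1));
    }

    public static void main(String[] args) {
        // Hypothetical typo -> intended-word pairs, as a shopper might type them.
        String[][] pairs = {{"recieve", "receive"}, {"mobil", "mobile"}, {"shoos", "shoes"}};
        for (String[] p : pairs) {
            System.out.printf("%-8s -> %-8s lev=%.3f jw=%.3f dice=%.3f%n", p[0], p[1],
                    levenshteinSimilarity(p[0], p[1]), jaroWinkler(p[0], p[1]), bigramDice(p[0], p[1]));
        }
    }
}
```

Because each measure rewards something different (Levenshtein counts edits, Jaro-Winkler favors a shared prefix, the bigram score counts shared letter pairs), the same typo can rank differently under each one. That difference is what a side-by-side comparison of spellcheck distance measures has to capture.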
Keywords: Search engine, SOLR, Natural language processing, String distance, Levenshtein.