A Comprehensive Analysis of Machine Learning Techniques for Churn Prediction in E-Commerce: A Comparative Study

Saurabh Kumar; Suman Deep; Pourush Kalra

doi:10.14445/22312803/IJCTT-V72I5P119

Research Article | Open Access

Volume 72 | Issue 5 | Year 2024 | Article Id. IJCTT-V72I5P119 | DOI : https://doi.org/10.14445/22312803/IJCTT-V72I5P119

Received: 20 Mar 2024 | Revised: 23 Apr 2024 | Accepted: 08 May 2024 | Published: 17 May 2024

Citation:

Saurabh Kumar, Suman Deep, Pourush Kalra, "A Comprehensive Analysis of Machine Learning Techniques for Churn Prediction in E-Commerce: A Comparative Study," International Journal of Computer Trends and Technology (IJCTT), vol. 72, no. 5, pp. 163-170, 2024. Crossref, https://doi.org/10.14445/22312803/IJCTT-V72I5P119

Abstract

In the fiercely competitive landscape of e-commerce, understanding and mitigating customer churn has become paramount for sustainable business growth. This paper investigates the application of machine learning techniques to churn prediction in e-commerce, with the aim of providing actionable insights for businesses seeking to improve customer retention. We conduct a comparative study of several machine learning algorithms, including traditional statistical methods and ensemble techniques, using a dataset sourced from Kaggle. We assess the predictive performance, interpretability, and scalability of each method and describe their respective strengths and limitations in capturing the dynamics of customer churn. The XGBoost classifier was the best-performing model. Our findings offer practical guidelines for selecting suitable modeling approaches and contribute to the broader understanding of customer behavior in the e-commerce domain. Ultimately, this research equips businesses with the knowledge and tools to proactively identify and address churn, thereby fostering long-term customer relationships and sustaining competitive advantage.
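
To make the idea of comparing classifiers on predictive performance concrete, the following is a minimal sketch, assuming binary churn labels (1 = churned, 0 = retained) and hard class predictions from each candidate model. The function names, the toy labels, and the model names `model_a` and `model_b` are illustrative assumptions, not data or results from the study. F1 on the churn class is used here only as an assumed ranking metric; the abstract does not state which metric the comparison relied on.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score(y_true, y_pred):
    """Precision, recall and F1 for the positive (churn = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


def rank_models(y_true, predictions):
    """Return (model name, Scores) pairs sorted by F1, best first."""
    scored = [(name, score(y_true, y_pred)) for name, y_pred in predictions.items()]
    return sorted(scored, key=lambda item: item[1].f1, reverse=True)


# Made-up toy labels: churners are the minority class.
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]

# Made-up predictions from two hypothetical models (not results from the paper).
predictions = {
    "model_a": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # cautious: misses most churners
    "model_b": [1, 0, 1, 0, 1, 0, 0, 1, 0, 0],  # finds all churners, one false alarm
}

ranking = rank_models(y_true, predictions)
for name, s in ranking:
    print(f"{name}: precision={s.precision:.2f} recall={s.recall:.2f} f1={s.f1:.2f}")
```

In this toy setup both models classify 8 and 9 of the 10 customers correctly, yet they differ sharply in how many churners they recover, which is why a class-specific metric is a common choice when churners are rare.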

Keywords

Customer churn, E-commerce, Machine learning techniques, Predictive performance, Sustainable business growth.
