Predicting Used Car Prices with Regression Techniques

© 2024 by IJCTT Journal
Volume-72 Issue-6
Year of Publication : 2024
Authors : Saurabh Kumar, Avinash Sinha
DOI :  10.14445/22312803/IJCTT-V72I6P118

How to Cite?

Saurabh Kumar, Avinash Sinha, "Predicting Used Car Prices with Regression Techniques," International Journal of Computer Trends and Technology, vol. 72, no. 6, pp. 132-141, 2024. Crossref, https://doi.org/10.14445/22312803/IJCTT-V72I6P118

Abstract
This paper explores predictive modeling of used car prices with regression techniques, focusing on the Indian automotive market. Using historical data from CarDekho.com, it aims to identify the key predictors of used car prices and to develop a robust multiple linear regression model. The dataset includes features such as model, year of manufacture, kilometers driven, fuel type, seller type, transmission type, number of previous owners, mileage, engine size, and maximum power. Data preprocessing involved converting values recorded with units into numerical values and calculating each car’s age. Exploratory data analysis revealed that car age, brand, and power are significant determinants of price, while the number of seats and engine size had less impact. Multiple models were tested, including variable transformations and variable selection methods. The final model, fitted with the Weighted Least Squares (WLS) method, explained 90% of the variation in used car prices. Model validation showed a high correlation between actual and predicted prices, with a mean absolute percentage error (MAPE) of approximately 20%. The results highlight the efficacy of regression techniques for price prediction and provide useful insights for consumers and sellers in the used car market. The study also demonstrates the value of data-driven approaches for understanding market dynamics and optimizing pricing strategies.
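
The two quantities named in the abstract, a WLS fit and the MAPE, can be illustrated with a minimal sketch. It is not the authors' code. The data are synthetic, the variable names (`age`, `power`), the log-price response, and the inverse-variance weights are all assumptions made for illustration, and the abstract does not state which transformation or weighting scheme the final model used.

```python
import numpy as np


def fit_wls(X, y, weights):
    """Weighted least squares: minimise sum_i w_i * (y_i - x_i . beta)^2."""
    sw = np.sqrt(weights)
    # Scaling each row by sqrt(w_i) turns WLS into an ordinary least-squares problem.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


# Synthetic stand-in for the car data (assumption: not the CarDekho dataset).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(1, 15, n)       # car age in years (hypothetical feature)
power = rng.uniform(40, 200, n)   # maximum power (hypothetical feature)
X = np.column_stack([np.ones(n), age, power])

# Assumed model: log(price) is linear in the features, and the noise
# standard deviation grows with age (heteroscedastic errors).
true_beta = np.array([12.0, -0.10, 0.01])
sigma = 0.05 + 0.02 * age
log_price = X @ true_beta + rng.normal(0.0, sigma)

# Assumed weighting: inverse of the (here known) error variance.
weights = 1.0 / sigma**2
beta_hat = fit_wls(X, log_price, weights)

# Evaluate on the original price scale.
actual_price = np.exp(log_price)
predicted_price = np.exp(X @ beta_hat)
error = mape(actual_price, predicted_price)
print("estimated coefficients:", beta_hat)
print("MAPE (%):", error)
```

In practice the error variances are unknown, so the weights would have to be estimated, for example from the residuals of an initial unweighted fit.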

Keywords
Predictive modeling, Artificial intelligence, Used car prices, Regression analysis, Machine learning.
