Customer Churn Analysis in the Telecom Sector: Prediction and Evaluation with a Machine Learning and Data Science Approach
Abstract
This study presents a data analysis pipeline for predicting customer churn in the telecom sector, built on IBM’s TELCO dataset with several machine learning libraries. Three models (Logistic Regression, Random Forest, and XGBoost) were trained on data from 7,043 customers and compared with one another and with a hybrid ensemble approach. Class imbalance was addressed with the SMOTE oversampling technique, and the ensemble model achieved the best performance (accuracy: 0.8042, F1 score: 0.6344). In addition, more than 15 feature engineering techniques and several feature selection algorithms were applied to improve model performance. The experimental results include a detailed analysis of the hybrid system’s behavior under different conditions and constraints.
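To make two ideas from the abstract concrete, the following is a minimal, standard-library-only sketch of the interpolation step behind SMOTE and of how accuracy and F1 score are computed. It is an illustration under stated assumptions, not the pipeline used in the study: the function names `smote_sketch` and `accuracy_and_f1`, the toy data, and the parameter defaults are all hypothetical, and a real analysis would more plausibly rely on a maintained library implementation.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows (hypothetical helper).

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours, which is the
    core idea of SMOTE. Requires at least two minority rows.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row, excluding the row itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy minority class (churners) described by two made-up numeric features
    churners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote_sketch(churners, n_new=3, k=2))

    # A model that never predicts churn on an imbalanced toy sample:
    # accuracy looks acceptable while F1 for the churn class is zero.
    y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    y_pred = [0] * 10
    print(accuracy_and_f1(y_true, y_pred))
```

The second toy case shows why an F1 score is worth reporting next to accuracy when churners are the minority class: a classifier that ignores churn entirely can still score well on accuracy alone.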