Telecom Churn Prediction using Machine Learning



Introduction
In many countries, the telecommunications industry has become a major sector of the economy. Competition has intensified as a result of both technological advancement and a growing number of operators, and businesses are working hard to survive in this cutthroat market using a variety of techniques. The rapid expansion of data transmission networks and improvements in information technology have also made a vast amount of data available. Because customers are the main source of profit, customer churn is an important factor in the survival and growth of a telecom business. To survive and flourish, businesses need to put new plans and procedures in place for expanding their consumer base globally. The marketing team of an organization uses a variety of tactics to draw in new clients, encourage current clients to upgrade to new services offered by the same business, and ultimately retain clients for an extended period of time.
Due to intense market competition, customers who are inclined to leave their operator are a significant problem. On the other hand, identifying early the consumers who are most likely to leave could offer a sizable extra revenue source. Numerous studies have demonstrated the effectiveness of machine learning algorithms in predicting churning and non-churning events by learning from historical corporate data. The data used in this analysis includes all consumer data collected over time. This paper primarily focuses on model-based and regression-based machine learning methods and algorithms for predicting churn in the telecom industry.

Understanding and Defining Churn
Customers in the telecom sector have access to a variety of service providers and can actively switch from one operator to another. In this fiercely competitive market, the telecom business has an average annual churn rate of 15 to 25 percent. Customer retention has now surpassed customer acquisition in importance because it is 5 to 10 times more expensive to gain a new customer than to keep an existing one. For many business owners, retaining highly profitable consumers is the top priority.

Prepaid and Postpaid
In the telecom sector, there are two primary payment methods: postpaid (where consumers pay a monthly or annual fee after using the services) and prepaid (where users pay/recharge with a set amount in advance and then use the services).
In the postpaid model, consumers who desire to migrate to another operator typically notify the current operator to cancel the services, which might be identified as a case of churn.
In the prepaid model, users who desire to migrate to another network can abruptly stop using the services, making it difficult to determine if a client has genuinely left or is just temporarily not using services.

Build Predictive Models
Telecom businesses must identify the consumers who are most likely to migrate in order to reduce customer churn. Because prepaid users can stop using the services without notice, churn prediction is typically more important and more complex for prepaid clients. This paper examines customer-level data from a top telecom company, builds predictive models to find customers who are likely to leave, and identifies the key churn factors. This paper is focused on the telecom industry in India.

Revenue-based Churn Customers
Revenue-based churn refers to customers who, for a certain amount of time, have not used any revenue-generating services, such as mobile internet, outgoing calls, or SMS. Aggregate metrics are another option, for example, "customers who have generated less than INR 3 per month in total/average/median revenue." The critical problem with this definition is that some customers only receive calls or SMS from their wage-earning counterparts. Such customers only consume the services and do not generate revenue.

Usage-based Churn
Usage-based churn refers to customers who have had no incoming or outgoing usage, either calls or internet, for some time.

High-value Churn
About 80 [...] This paper builds its definition of high-value consumers on specific criteria and limits its churn prediction to high-value clients. A definition based on a period of zero usage may have the drawback that it is too late to take corrective measures to retain the client once they have stopped using the services for that period. Predicting churn may be ineffective if it is predicated on a "two-month zero usage" timeframe, because by then the client will already have moved to another operator.

Motivation and Goals
The objective of this paper is to provide a technique for telecom industry churn prediction. The expected outcome is an algorithm that pinpoints the customers who are most likely to switch operators. Although numerous methods have been tested in recent years, the existing studies and methodology for predicting churn in the telecommunications industry still leave considerable scope for improvement.

Related Work
The paper [1] describes the steps involved in creating a decision support system using data mining. Qureshi et al. [2] and Lazarov et al. [3] presented widely used data mining techniques for churn prediction. The work in [4] suggests a new set of features to increase the recognition rates of potential churners.
The work in [5] focuses on data mining strategies for reducing error ratio and customer churn. [6] analyses the many customer data classifications accessible in open datasets, as well as the performance measures and predictive models used for churn prediction in the telecom business. [7] established a research model for customer churn based on customer classification and misclassification cost, and used this model to analyze a telecom company's customer behavior data. Further information on churn prediction can be found in the studies [8], [9], [10], [11].
In [12], Kiran et al. presented a state-of-the-art review of the methods and research involved in churn prediction and assessed frequently used data mining procedures for categorizing customer churn patterns in the telecom industry. The contemporary literature on predictive data mining techniques for customer churn behavior is reviewed and categorized by the method used, and a discussion of future research directions is presented. One of the latest papers, [13], worked on classification models using techniques such as Logistic Regression, Random Forest, and Lazy Learning for predicting customer churn. Another recent work on churn prediction, [14], explained how to recommend bank customers using AI and ANN. They supported prior research that highlights the possibility of client loyalty.

Proposed Work
The proposed method entails a thorough examination and analysis of telecom datasets. The likelihood that a given proportion of customers will churn is exceedingly difficult to estimate. Our solution makes it easier to understand why customers churn, and it displays the data as bar plots and pie charts. The telecommunications business will benefit from a study that foresees and identifies who is going to leave the network. Prediction performance is measured for the techniques Logistic Regression, Random Forest, Gradient Boosted Machine Tree, Extreme Gradient Boosting, and Decision Tree. The proposed approach explains the system's workflow and the procedures involved.

Logistic Regression
Logistic regression (LR) is one of the most important statistical methods used in data mining for data analysis. It is a generalized linear model that extends linear regression to classification: a supervised learning algorithm that estimates the probability of a target variable. LR belongs to the family of regression analysis techniques, which are generally employed to identify and quantify relationships among dataset features. When the dependent variable is binary, logistic regression is the appropriate regression model. It is a predictive analysis used to explain the relationship between a set of independent variables and a binary dependent variable. For customer churn, logistic regression has been used to calculate the likelihood of churn as a function of customers' traits or characteristics. It is based on a mathematically oriented approach to studying how variables affect one another.
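As a minimal sketch of this idea, logistic regression models the churn probability as p = 1 / (1 + exp(-(b + w·x))), where x holds the customer's features. The pure-Python example below fits w and b by gradient descent on a toy dataset. The feature meanings, data values, learning rate, and function names are illustrative assumptions, not the paper's implementation.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def churn_probability(x, weights, bias):
    """Predicted probability that a customer with features x churns."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical, already-scaled features: [usage, recharge count]; label 1 = churned.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.15, 0.1], [0.85, 0.95]]
y = [1, 1, 0, 0, 1, 0]

weights, bias = fit_logistic(X, y)
low_usage = churn_probability([0.1, 0.1], weights, bias)
high_usage = churn_probability([0.9, 0.9], weights, bias)
```

On this toy data the low-usage customer receives a churn probability above 0.5 and the high-usage customer one below 0.5.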

Decision Tree
A Decision Tree is a supervised learning method that can be used to solve classification and regression problems, although it is typically preferred for classification. Tests on the features of the given dataset are used to make the decisions. The tree is a graphical representation of all the possible solutions to a problem based on given conditions. It is constructed using the CART algorithm, which stands for Classification and Regression Tree.
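To illustrate how a feature test is chosen, CART for classification typically picks the split that minimizes the weighted Gini impurity of the two child nodes. The sketch below searches the thresholds of a single numeric feature. The feature, the data, and the function names are hypothetical, and a full tree would apply the search recursively over all features.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    best = (None, float("inf"))
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical feature: minutes of usage in the latest month; label 1 = churned.
minutes = [0, 5, 10, 200, 250, 300]
churned = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(minutes, churned)
```

Here the split at 105 minutes separates the two classes perfectly, so the weighted impurity is 0.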

Random Forest
Random Forest is used to forecast whether a consumer will terminate their subscription. A random forest is made up of many different decision trees. Each decision tree predicts a particular class, and the class with the most votes becomes the prediction for a specific client. Decision trees are sensitive to the data they are trained on, and bagging mitigates this. Bagging is the technique of training each decision tree on a random sample drawn, with replacement, from the dataset.
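The two ingredients described above, bagging and majority voting, can be sketched in a few lines of standard-library Python. The record stand-ins, the seed, and the tree votes are made up for illustration. Training the individual trees is omitted.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
rows = list(range(10))   # stand-ins for ten customer records
samples = [bootstrap_sample(rows, rng) for _ in range(3)]  # one sample per tree

# Hypothetical votes from five trees for a single customer (1 = churn).
tree_votes = [1, 0, 1, 1, 0]
forest_prediction = majority_vote(tree_votes)
```

Each bootstrap sample has the same size as the original data but may repeat some records and omit others, which is what makes the trees differ from one another.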

XGBoost
XGBoost stands for Extreme Gradient Boosting. The main reasons for adopting XGBoost are its execution speed and its model performance. XGBoost employs ensemble learning, which combines the predictions of many models (here, decision trees) into a single model. It makes efficient use of memory while supporting distributed and parallel processing.
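The boosting idea behind this family of models can be sketched without the XGBoost library. The example below is plain gradient boosting with squared-error loss and one-split regression trees (stumps): each round fits a stump to the residuals of the current ensemble. It is not XGBoost itself, which adds regularization, second-order gradient information, and systems-level optimizations. The data, the number of rounds, and the learning rate are assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals (needs 2+ distinct xs)."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Build an additive model: base value plus shrunken stump corrections."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # what is still unexplained
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Hypothetical feature: minutes of usage in the latest month; label 1 = churned.
minutes = [0, 5, 10, 200, 250, 300]
churned = [1, 1, 1, 0, 0, 0]
model = boost(minutes, churned)
score_low_usage = model(5)
score_high_usage = model(250)
```

Each round moves the predictions a fraction of the way toward the labels, so the score for the low-usage customer approaches 1 and the score for the high-usage customer approaches 0.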

Data Preprocessing
The Telecom Churn dataset contains 226 columns and 99,999 rows in the CSV file, and it contains many missing values. The dataset available for this research is too large and therefore needed to be sampled in a way that prevents the loss of valuable information and keeps the final results accurate. Train and test sets are built by randomly splitting the data in the ratio 70:30.
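A random 70:30 split can be sketched as follows. The function name, the seed, and the use of plain Python lists are assumptions, and the paper does not state which tool performed the split.

```python
import random

def train_test_split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-ins for customer records
train, test = train_test_split_rows(rows)
```

Fixing the seed keeps the split reproducible across runs, which matters when several models are compared on the same test set.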

Elimination of Unique Values
Check the columns for unique values and remove any columns that have them, because they are not needed for analysis.
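One possible reading of this step, stated here as an assumption, is that columns holding a single distinct value are dropped because a constant column cannot help separate churners from non-churners. The sketch below uses a dictionary of column lists, and both column names are hypothetical.

```python
def drop_constant_columns(table):
    """Return a copy of the table without columns holding one distinct value."""
    return {name: values for name, values in table.items()
            if len(set(values)) > 1}

table = {
    "circle_id": [109, 109, 109],     # hypothetical constant column
    "total_rech_amt": [120, 0, 560],  # hypothetical informative column
}
cleaned = drop_constant_columns(table)
```

If the intended rule were instead to drop identifier columns in which every value is distinct, the condition would compare `len(set(values))` with `len(values)`.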

Handling of Missing Values
The columns with missing values fall into three types:
Boolean columns: must be filled with 1 or 0. The columns with Boolean values are night pack and fbuser.
Date columns: must be filled with dates. The columns with dates are dateoflastrecharge.
Numerical columns: must be filled with numbers. These are all the remaining columns with missing values.
Further, derivation of features is one of the most crucial steps in data preprocessing, since good features frequently make the difference between good and bad models. Business knowledge is used to derive features that are thought to be key churn indicators.

Filtration of high-value customers
As already mentioned, churn prediction is performed only for high-value clients. High-value clients are defined as those who recharged with an amount greater than or equal to X, where X is the 70th percentile of the average recharge amount during the first two months, which form the favorable phase. About 29.9k rows were obtained after filtering for high-value clients. Next, churners are tagged and the churn-phase attributes are removed. Based on the fourth month, churned consumers are tagged as follows: those who have neither placed nor received any calls AND have not accessed mobile internet even once during the churn phase. The attributes used to tag churners are totalicmou9, totalogmou9, vol2gmb9, and vol3gmb9. After tagging churners, all the attributes corresponding to the churn phase are removed.
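The filtering and tagging rules above can be sketched as follows. The four usage column names come from the text. The column `avg_rech_good_phase`, the helper names, the toy rows, and the nearest-rank percentile rule are assumptions, and a library that interpolates between values could give a slightly different threshold.

```python
import math

CHURN_PHASE_COLUMNS = ("totalicmou9", "totalogmou9", "vol2gmb9", "vol3gmb9")

def percentile_nearest_rank(values, q):
    """q-th percentile (integer q) using the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def tag_churn(row):
    """Return 1 when every churn-phase usage column is zero, otherwise 0."""
    return int(all(row[col] == 0 for col in CHURN_PHASE_COLUMNS))

def make_row(avg_rech, ic, og, g2, g3):
    # avg_rech_good_phase is a hypothetical name for the average recharge amount.
    return {"avg_rech_good_phase": avg_rech, "totalicmou9": ic,
            "totalogmou9": og, "vol2gmb9": g2, "vol3gmb9": g3}

customers = [
    make_row(100, 10.0, 5.0, 0.0, 0.0),
    make_row(200, 0.0, 0.0, 0.0, 0.0),
    make_row(300, 3.0, 0.0, 1.0, 0.0),
    make_row(400, 0.0, 0.0, 0.0, 0.0),
    make_row(500, 20.0, 30.0, 0.0, 50.0),
]

# Keep customers at or above the 70th percentile of the average recharge.
threshold = percentile_nearest_rank(
    [c["avg_rech_good_phase"] for c in customers], 70)
high_value = [c for c in customers if c["avg_rech_good_phase"] >= threshold]

# Tag churners first, then drop the churn-phase columns used for tagging.
model_rows = []
for row in high_value:
    features = {k: v for k, v in row.items() if k not in CHURN_PHASE_COLUMNS}
    features["churn"] = tag_churn(row)
    model_rows.append(features)
```

Tagging has to happen before the churn-phase columns are dropped, since the label is computed from exactly those columns. Leaving them in the feature set would leak the answer to the model.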

Churn data Exploratory Analysis
The target label to be predicted must be determined before the analysis of the columns can continue. The objective here is to predict client churn for the ninth month (September). Therefore, a user is considered to have churned if they do not make any calls or consume any data in the ninth month. To support exploratory data analysis, the columns are partitioned and a churn column is added. Pair plots were examined to determine which of them could distinguish between churned and non-churned customers. Understanding-1: From Figure 3, there is no clear way of distinguishing the two groups in these plots, because there is a significant amount of overlap.

Observation of Features
According to the histograms in the pair plots, the revenue generated by churned clients in the eighth month is typically very low. The distribution plot and the box plot both show the same thing. The box plots show that the range of values for the eighth month is distinguishable between the two groups. However, this is not true of the other months: a significant share of the values for the sixth and seventh months overlap. Understanding-2: From Figure 4, the features of the dataset seem much more distinguishable, especially those that involve local calls in the eighth month.

Evaluation Metrics
The performance of traditional classification algorithms is evaluated by accuracy, defined as the percentage of examples that are correctly classified. This metric is not suitable for imbalanced datasets, in which the minority class has far fewer samples. In fact, misclassifying all minority samples while correctly classifying all majority-class samples still gives a very high accuracy. The performance of a classifier is calculated from the confusion matrix.

Sensitivity and Specificity
Sensitivity is the percentage of positives correctly classified. Specificity is the percentage of negatives correctly classified and denotes the accuracy on the negative class. True Negative Rate (TN Rate, TNR) is another name for specificity.

False Positive Rate
False Positive Rate is the percentage of negatives wrongly classified.

False Negative Rate
False Negative Rate is the percentage of positives wrongly classified. In this paper, the accuracy measure is used to assess the performance of the models.
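The metrics above all follow from the four cells of the confusion matrix. The sketch below computes them for a deliberately imbalanced toy example in which a model predicts "no churn" for every customer. The data and function names are illustrative assumptions.

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

def rates(tp, tn, fp, fn):
    """Derive the evaluation metrics from the confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }

# Nine non-churners and one churner; the model never predicts churn.
actual = [0] * 9 + [1]
predicted = [0] * 10
metrics = rates(*confusion_counts(actual, predicted))
```

The accuracy is 0.9 even though the only churner is missed (sensitivity 0.0, false negative rate 1.0), which is the weakness of accuracy on imbalanced data noted earlier.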

Model Building and Results
After a thorough examination of all the important features of the dataset, the most prevalent finding is that the number of churned customers is large in the sixth and seventh months but declines in the eighth month. This is typical for the majority of churned consumers, but not all. There are significant links between the sixth and eighth months. Recharge amounts are easier to discern than the other features. For the majority of the features the points overlap, so a clear distinction was not possible. The majority of churned consumers have low values, in particular during the eighth month. The pair plots demonstrate that the majority of churned customers have values close to 0 in either of the two columns. Figures 5, 6, 7, and 8 depict the results obtained from the models: the confusion matrix, accuracy, and AUC curve. Table 2 compares the accuracy of the machine learning models implemented in this paper and shows that XGBoost outperforms the other models in predicting telecom churn.

Conclusion
The issue of client turnover has become much worse as the telecommunications sector has continued to expand. In the telecommunications sector, customer retention is a crucial concern, since it lowers customer churn by raising customer satisfaction. Identifying the clients who are about to leave the network will be advantageous to the telecommunications industry. This problem can be addressed using machine learning algorithms and predictive analytics. The study examined machine learning techniques and applied them to a dataset. Through modelling and testing, the assessment metrics for the Logistic Regression, Random Forest, Gradient Boosted Machine Tree, Extreme Gradient Boosting, and Decision Tree models were improved. In future work, this study can be extended to other telecom datasets and to further methodologies in order to achieve better results.