A Comparative Study of Supervised Methods with Application to Cold Call Insurance Prediction

Vannaroth Lim

Sana Bahari

Objective

To build prediction models that accurately predict whether a client will purchase car insurance

Description

●The dataset comes from a bank that also provides a car insurance service

●The bank holds information on prospective customers it can call to advertise car insurance

●The bank wants to save money and time by forecasting each customer’s response

●Our objective is to build prediction models that identify potential customers

Methodology Flowchart

1.jpg

Methodology

  1. Business Understanding - Identify why this type of business needs a data-mining approach to decision making. In this project, competition between insurance companies is substantial, so organizations need to adopt new methods to gain a competitive advantage.

  2. Data Understanding - It is crucial to gain insight into the nature of the dataset before processing and modeling. Understanding each variable makes it easier to choose the correct data-mining approach.

  3. Data Preparation - Apply different techniques to transform the raw data into a clean, usable dataset for building data-mining models. This step identifies missing values and deletes or replaces them, finds outliers, and examines the correlation between variables so that only independent predictors are used in modeling.

  4. Modeling - After the dataset is cleaned and arranged, suitable data-mining models are used to make predictions. The purpose of the models is to reveal patterns that have not been discovered before.

  5. Evaluation - Compare the results of the various models to decide which is the most accurate, and analyze variable importance based on each model’s sensitivity to its inputs. Finally, information fusion-based sensitivity analysis is used to choose the most significant predictors among the variables.
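The Data Preparation step above can be sketched in a few lines of Python. This is only an illustrative outline, not the procedure actually used in the study: the column names and values are hypothetical toy data, median imputation is one possible way of "replacing" missing values, and the 1.5 IQR multiplier and 0.9 correlation cutoff are assumed thresholds.

```python
from statistics import median, quantiles

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < low or v > high]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

# Hypothetical toy data: one missing age, one extreme balance,
# and one column that is a rescaled copy of another.
raw = {
    "Age": [25, 40, None, 33, 58, 47],
    "Balance": [100, 250, 180, 90, 9000, 300],
    "BalanceInCents": [10000, 25000, 18000, 9000, 900000, 30000],
}

clean = {name: impute_median(col) for name, col in raw.items()}
outlier_rows = iqr_outliers(clean["Balance"])   # rows to inspect or remove
predictors = drop_correlated(clean)             # redundant column is dropped
```

Whether a flagged row is deleted or kept, and which of two correlated columns survives, remains a judgment call for the analyst; the sketch simply keeps whichever column it sees first.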

Data Preprocessing

2.jpg
Picture1.jpg
3.jpg
4.jpg
5.jpg

Modeling

Model 1 - Decision Tree

Picture3.jpg
 

Model 2 - K-Nearest Neighbors (KNN)

KNN

Model 3 - Support Vector Machine (SVM)

SVM

Model 4 - Neural Networks

Neural Network

Result Summary

Picture7.jpg

The KNN model performs the worst among the four models. Its accuracy is 66.93%, which is too low for this classification task. Its sensitivity (68.51%) and specificity (36.68%) are not acceptable either. KNN does a poor job of limiting false negatives, which have a high impact on business performance: in the 5th fold, for instance, only 155 true negatives are observed against 174 false negatives. The KNN model is therefore not recommended for predicting insurance call outcomes. Its weak performance may be related to the high dimensionality of the problem.
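For reference, the three metrics quoted for each model in this section can be computed from confusion-matrix counts as sketched below. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's fold results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all calls predicted correctly
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
    }

# Hypothetical counts for a single validation fold (not taken from the study).
metrics = classification_metrics(tp=320, fp=130, tn=270, fn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A model can have reasonable accuracy and sensitivity while its specificity is poor, which is why all three figures are reported for each model.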

Support Vector Machine (SVM) also performs poorly: it is only slightly better than KNN and remains inferior to the Decision Tree and Neural Network methods. Its accuracy is 68.30%, almost identical to that of KNN, and its sensitivity is 71.72%, slightly better than that of KNN. Its specificity, however, is an unacceptable 31.37%. For comparison, the 5th fold contains 153 false negatives and 176 true negatives. Hence, just like KNN, SVM is not a good choice for the application at hand.

The C5 Decision Tree achieves 73.99% accuracy. Its sensitivity (the true positive rate) is 73.82% and its specificity (the true negative rate) is 74.36%. This makes the method acceptable for predicting car insurance call success, with room for improvement. As a remark, the C5 algorithm was the winner of the competition for which this dataset was used. While we confirm the successful performance of this data-mining method, we observe that the Neural Network outperforms C5, as explained below.

The last and best model is the Neural Network, which outperforms the other three data-mining models. It obtains 79.3% accuracy, the highest of the four. It also provides good sensitivity and specificity, at 80.6% and 77.1%, respectively. We therefore choose the Neural Network as the best model. Table 3 below summarizes all the methods used in this study.

Information Fusion-Based Sensitivity Analysis

Information fusion is a method that combines information from several sources to produce new evidence. Here it is used to determine which variables play an important role in the prediction models. To do so, the accuracy of each model is standardized first, and variable importance is then calculated based on the standardized accuracies. For this project, information fusion-based sensitivity analysis (IFBSA) was performed and the following results were obtained. The results indicate that the “LastContactMonth” variable is the most important predictor in the prediction models.
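One common way to carry out this kind of fusion is to turn each model's accuracy into a weight and take the weighted sum of the per-model variable importances. The sketch below assumes that formulation; the source does not give the exact formula used. The accuracies are the ones reported in the Result Summary, while the variable names and per-model importance scores are made-up placeholders, not the study's results.

```python
def fuse_importance(accuracies, importances):
    """Weight each model's variable importances by its normalized accuracy.

    accuracies:  {model: accuracy}
    importances: {model: {variable: importance}}
    Returns (weights, fused), where fused[variable] is the weighted sum.
    """
    total = sum(accuracies.values())
    weights = {model: acc / total for model, acc in accuracies.items()}
    variables = next(iter(importances.values())).keys()
    fused = {
        var: sum(weights[model] * importances[model][var] for model in accuracies)
        for var in variables
    }
    return weights, fused

# Accuracies (in %) as reported in the Result Summary.
accuracies = {"KNN": 66.93, "SVM": 68.30, "DecisionTree": 73.99, "NeuralNetwork": 79.3}

# Hypothetical per-model importance scores for three placeholder variables.
importances = {
    "KNN":           {"VariableA": 0.4, "VariableB": 0.4, "VariableC": 0.2},
    "SVM":           {"VariableA": 0.3, "VariableB": 0.5, "VariableC": 0.2},
    "DecisionTree":  {"VariableA": 0.5, "VariableB": 0.3, "VariableC": 0.2},
    "NeuralNetwork": {"VariableA": 0.6, "VariableB": 0.2, "VariableC": 0.2},
}

weights, fused = fuse_importance(accuracies, importances)
ranking = sorted(fused, key=fused.get, reverse=True)  # most important first
```

Because the weights sum to one, a more accurate model contributes more to the fused ranking than a less accurate one, without any single model deciding it alone.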


Managerial implications

 

There are several potential managerial implications of this study. Managers of car insurance companies should be able to accurately predict their insurance sales through cold calls. By examining the variables that affect whether a cold call is successful, this study identifies which variables are more important than others. Its value is that it can greatly narrow a manager’s focus when allocating resources: instead of spending time and money on many factors that may or may not affect car insurance sales, managers can focus on a few important ones.

For example, the information fusion-based analysis shows that the top 4 factors influencing the outcome of a cold call are previous attempts, communication, whether the household is insured, and last contact month. This makes sense from both a logical and a business standpoint. The number of previous attempts can affect whether an individual purchases the current car insurance policy: when the company persists with a customer, he or she is more encouraged to buy because of familiarity and the abundance of information provided. The form of communication is also an important factor, since how receptive an individual is to a cold call depends on the type of communication device they are using. A mobile phone can suggest that the individual was not contacted at a time when they were open to hearing new information. Last but not least, the last time the individual was contacted about car insurance has the highest influence on the result. This could be because the longer it has been since a customer was contacted, the more likely they are to have forgotten the information.

Managers should utilize this information and capitalize on it. Investing in the right factors can help their companies gain new customers quickly and effectively. Furthermore, it is recommended that managers continue to gather new data cases and add in more var
