Highlights
A telephone company wants to determine which customer characteristics are useful for predicting churn, that is, which customers will leave its service. Your task is to uncover patterns in the customer data that will help the company identify which types of customers are most, and least, likely to churn.
Some of these tasks may already have been covered earlier. Even so, you are unlikely to complete this project in a day or two. I recommend that you "move in" with this project and get really into it as early as you can. I encourage you to go beyond the project, but only after you have covered everything that is required.
Data Preparation and Exploration
1. Perform data preparation on the data set, if needed. Give evidence that there are no problems with data quality or missing data. Which variables show anomalous behavior, and how should we deal with them? Which field acts as a surrogate for the ID field and will therefore yield no usable statistical or graphical information?
2. Examine the variables graphically.
3. Examine the variables statistically.
(i) z-scores, or
(ii) min-max normalization [ (value-min)/range].
4. Relationships between variables.
5. Data Manipulation
6. With a view to uncovering customer churn patterns, investigate how each relevant variable is associated with Churn.
7. Find a pair of numeric variables which are interesting with respect to churn. That is, for a pair of variables, construct a scatter plot with a churn overlay. If things look uniform, then this is not particularly interesting. We are looking for differences within the scatter plot (churn vs. non-churn), which can help us understand the relationship between the two variables with churn. Now, if there seems to be a horizontal or vertical differentiation, then this is not interesting, as the churn behavior is altering only along one of the axes. We want to find churn behavior changing simultaneously along both axes.
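As a rough numeric companion to the scatter plot in step 7, the sketch below standardizes two variables with the z-score from step 3 (with min-max, as defined there, shown as the alternative) and then tabulates the churn rate in each quadrant formed by splitting both variables at their means. The function names, the variable names day_minutes and service_calls, and the toy data are all hypothetical; the real column names and values come from your data set. If the churn rate stands out in only one quadrant, churn is changing along both axes at once, which is the pattern step 7 asks for.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize: (value - mean) / standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale to [0, 1]: (value - min) / range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def churn_rate_grid(x, y, churned):
    """Churn rate per quadrant after splitting both z-scored variables at 0.

    Keys are (x_above_mean, y_above_mean); values are churn proportions.
    """
    counts = {}  # quadrant -> [number churned, number of customers]
    for zx, zy, c in zip(z_scores(x), z_scores(y), churned):
        cell = counts.setdefault((zx > 0, zy > 0), [0, 0])
        cell[0] += int(c)
        cell[1] += 1
    return {quad: n_churn / n for quad, (n_churn, n) in counts.items()}

# Made-up illustration data, not taken from the assignment's data set.
day_minutes = [100, 120, 140, 160, 260, 280, 300, 320]
service_calls = [1, 5, 1, 5, 1, 5, 1, 5]
churn = [0, 0, 0, 0, 0, 1, 0, 1]

for quadrant, rate in sorted(churn_rate_grid(day_minutes, service_calls, churn).items()):
    print(quadrant, rate)
```

In this toy data only the quadrant where both variables are above their means shows churn. A two-by-two split is coarse, so treat it as a screening aid for choosing pairs to plot, not as a replacement for the plot itself.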
Model Building
8. Which variables are you including in your models to predict Churn? Choose carefully to balance accuracy, generality, and interpretability.
Important: Provide a table of ALL the variables in the original data set, ranked by your judgment of their importance in predicting churn based on your work so far (show most important to least important). Also, provide a brief justification for either including or discarding the variable for your working models.
9. Develop a model of your choice (e.g., K-NN or Decision Tree) for predicting Churn. Use cross validation to measure the performance of the model. Explain the measures of validation in the confusion matrix.
10. Report your findings in an Executive Summary, along with supporting evidence and a list of Recommendations (or reflections) for the company executives.
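The cross validation and confusion-matrix measures asked for in step 9 can be sketched as follows. This is a minimal illustration, not a prescribed solution: it assumes a binary Churn label coded 1 (churn) and 0 (no churn), uses a small hand-written K-NN classifier, and forms folds by taking every folds-th row. All function names and the toy data are hypothetical. In practice you would shuffle the rows first and normalize the predictors, since K-NN depends on distances.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, point, k=3):
    """Majority label among the k training points nearest to `point`."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_validated_confusion(X, y, folds=5, k=3):
    """Pool the held-out predictions from every fold into one confusion matrix."""
    cm = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))  # rows f, f+folds, f+2*folds, ...
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        for i in test_idx:
            pred = knn_predict(train_X, train_y, X[i], k)
            if pred == 1:
                cm["TP" if y[i] == 1 else "FP"] += 1
            else:
                cm["FN" if y[i] == 1 else "TN"] += 1
    return cm

def ratio(numerator, denominator):
    """Return a proportion, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def validation_measures(cm):
    tp, fp, tn, fn = cm["TP"], cm["FP"], cm["TN"], cm["FN"]
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),  # share classified correctly
        "sensitivity": ratio(tp, tp + fn),  # share of actual churners caught (recall)
        "specificity": ratio(tn, tn + fp),  # share of non-churners correctly kept
        "precision": ratio(tp, tp + fp),    # share of predicted churners who churned
    }

# Made-up, well-separated toy data: five non-churners, then five churners.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = [0] * 5 + [1] * 5

cm = cross_validated_confusion(X, y, folds=5, k=3)
print(cm)
print(validation_measures(cm))
```

Because churners are usually a minority of customers, accuracy alone can look good even when few churners are caught, so sensitivity and precision are worth reporting alongside it.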