Improving Business-to-Business Sales Using Machine Learning

Assignment Task

This question focuses on the case study "Champo Carpets: Improving business-to-business sales using machine learning algorithms". You can purchase this case from the Harvard Business Publishing website.

This case study includes multiple sample sets that you can use for the different tasks requested in the questions below. For instance, you can use the "SAMPLE ONLY" set to construct your predictive models and the "Data for Clustering" set for your k-means model. If necessary, incorporate additional features from the raw data into your models. Do not forget the pre-processing steps for your analysis. In particular, ensure that:

  • The data is as free of noise and outliers as possible. K-means, for example, is sensitive to outliers and noisy data, and real data always contains outliers. Transforming the data toward a normal distribution helps reduce the impact of outliers and noise.
  • Variables are on the same scale: either standardized (mean 0 and variance 1) or normalized to a fixed range, usually 0.0 to 1.0. Some of your predictive and clustering methods should be run on normalized data.
  • There is no collinearity (a high level of correlation between two variables).
  • The number of variables is not too large. As the number of variables increases, distance-based classification and clustering methods become less reliable. For example, a distance-based similarity measure tends toward the same value for any pair of examples when the number of variables is very large, so the more variables there are, the harder it is to find clear differences between instances.
  • All variables in your clustering analysis are numerical (unless you plan to use a specific package that handles categorical variables).
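
As a rough illustration of the checks listed above, the following sketch uses only the Python standard library. The column names and values are hypothetical toy data, not taken from the case files, and the helper names (standardize, min_max, pearson) are made up for this example. It applies a log transform to dampen an outlier, rescales a column, and measures the correlation between two columns to flag collinearity.

```python
import math
import statistics


def standardize(values):
    """Rescale to mean 0 and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly to the range 0.0 to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation, computed as the mean product of z-scores."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical toy columns (assumption: NOT from the case data).
quantity = [10.0, 12.0, 11.0, 13.0, 400.0]      # last value is an outlier
amount = [100.0, 120.0, 110.0, 130.0, 4000.0]   # exactly 10 x quantity

# A log transform compresses the outlier before scaling.
log_quantity = [math.log1p(v) for v in quantity]
scaled_quantity = min_max(log_quantity)
z_quantity = standardize(quantity)

# A correlation close to +1 or -1 suggests dropping one of the two columns.
r = pearson(quantity, amount)
print("scaled:", [round(v, 3) for v in scaled_quantity])
print("correlation between quantity and amount:", round(r, 3))
```

In practice a data-analysis library would normally do this work; the point here is only to make each check concrete.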

After cleaning the data, feel free to take any suitable approach to answer the following questions.

(a) With the help of data visualization, provide key insights using exploratory data analysis.

(b) What kinds of analytics and machine learning algorithms (e.g. classification, regression, clustering, recommender systems, etc.) can be used by Champo Carpets to solve their problems, and in general for value creation? Justify your choices. Hint: This is just a conceptual question. You do not need to run any of these models for this question. Constructing models is done in the next question.

(c) Develop ML models (e.g. logistic regression, decision trees, random forests, neural networks, and boosting) to help identify features contributing to conversion (or non-conversion) of samples sent to customers. Hint: For each model, discuss how you select features and tune different parameters. How do you evaluate the performance of each model? How do you select the best model(s)? Run all your models on both balanced and imbalanced data and check the difference. Please note that your binary target is the "order conversion" variable in the sample data. You can obtain this variable from the information provided in the raw data.
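
One possible way to approach the balanced versus imbalanced comparison is sketched below in plain Python. It uses random oversampling of the minority class, which is only one of several balancing techniques and is an assumption of this example, not a method prescribed by the case. The rows and labels are hypothetical toy data. The sketch also shows why accuracy alone is misleading on imbalanced data: a model that always predicts "not converted" scores high accuracy but zero recall.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Random oversampling: duplicate minority rows until classes are equal."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (converted) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: 8 non-converted samples (0) and 2 converted (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = oversample_minority(rows, labels)

# Baseline that always predicts the majority class.
always_zero = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_zero)) / len(labels)
metrics = precision_recall_f1(labels, always_zero)
print("accuracy:", accuracy, "precision/recall/F1:", metrics)
```

Balancing should be applied to the training split only, so that the test split keeps the original class proportions.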

(d) Discuss the data strategy for building customer segmentation using clustering. What are the benefits Champo Carpets can expect from clustering? Hint: Data strategy should clearly identify the data that should be used and how it should be used, including any feature engineering that may be performed before the model building.

(e) Discuss clustering algorithms that can be used for segmenting Champo Carpets' customers. Please justify your choices. Discuss what distance and similarity measures are suitable in this case. (Again, this is a conceptual question where you need to discuss which clustering method seems proper for this application and why.)

(f) Develop customer segmentation using k-means clustering. Discuss the optimal number of clusters, significant variables, and cluster characteristics. Notice that when the scree plot does not provide a clear choice of k for the number of clusters, you can look at other measures that we have discussed, such as the Silhouette measure. In many clustering applications, you need to consider more than one measure to obtain the desirable number of clusters.
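
The idea of comparing several values of k with both the within-cluster sum of squares (the quantity plotted in a scree or elbow plot) and the Silhouette measure can be sketched as follows. This is a minimal pure-Python illustration on hypothetical 2-D toy points, assuming the features are already scaled. To keep the run reproducible it seeds the centers with a deterministic farthest-first rule, which is a substitution made for this example; standard k-means uses random or k-means++ seeding with several restarts.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption made for reproducibility)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns labels, centers and inertia."""
    centers = farthest_first(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(math.dist(p, centers[lab]) ** 2
                  for p, lab in zip(points, labels))
    return labels, centers, inertia


def silhouette(points, labels):
    """Mean silhouette; points in singleton clusters score 0. Needs k >= 2."""
    scores = []
    for i, p in enumerate(points):
        sums, counts = {}, {}
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in counts if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three tight groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

inertia_by_k, silhouette_by_k = {}, {}
for k in range(2, 6):
    labels, centers, inertia = kmeans(points, k)
    inertia_by_k[k] = inertia
    silhouette_by_k[k] = silhouette(points, labels)

best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print("inertia:", inertia_by_k)
print("silhouette:", silhouette_by_k)
print("k with the highest silhouette:", best_k)
```

On real customer data the two measures may disagree, which is why the question asks you to consider more than one of them and to check that the resulting clusters can be interpreted.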

(g) Write your own collaborative filtering function as a recommender system. Hint: The collaborative filtering technique is based on an aggregation of customer purchase history. For each customer, you can use various measures such as Pearson correlation, Euclidean distance, or cosine similarity to find the nearest neighbors. You can then use the nearest neighbors to recommend products. For example, suppose that, using cosine similarity, you find that the closest customer to customer H-2 is customer T-5. Customer T-5 has purchased carpet type double black and gray color, which have not been purchased by H-2. Hence these products can be recommended.
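
A minimal sketch of the user-based collaborative filtering idea in the hint is given below. The purchase quantities are hypothetical and were chosen only so that T-5 comes out as the nearest neighbor of H-2, mirroring the example in the hint. Customer A-9, the item labels, and the function names (cosine, nearest_neighbors, recommend) are likewise assumptions of this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse purchase vectors (dicts)."""
    dot = sum(qty * v.get(item, 0) for item, qty in u.items())
    norm_u = math.sqrt(sum(q * q for q in u.values()))
    norm_v = math.sqrt(sum(q * q for q in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(purchases, customer, n_neighbors=1):
    """Other customers ranked by similarity to the given customer."""
    target = purchases[customer]
    others = [c for c in purchases if c != customer]
    others.sort(key=lambda c: cosine(target, purchases[c]), reverse=True)
    return others[:n_neighbors]


def recommend(purchases, customer, n_neighbors=1):
    """Items bought by the nearest neighbors but not by the customer."""
    target = purchases[customer]
    scores = {}
    for neighbor in nearest_neighbors(purchases, customer, n_neighbors):
        similarity = cosine(target, purchases[neighbor])
        for item, qty in purchases[neighbor].items():
            if item not in target:
                # Weight each candidate by neighbor similarity and quantity.
                scores[item] = scores.get(item, 0.0) + similarity * qty
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical purchase history: customer -> {item: quantity}.
purchases = {
    "H-2": {"type: hand tufted": 5, "type: durry": 3},
    "T-5": {"type: hand tufted": 4, "type: durry": 2,
            "type: double black": 3, "color: gray": 1},
    "A-9": {"type: knotted": 6, "color: gray": 2},
}

print(nearest_neighbors(purchases, "H-2"))
print(recommend(purchases, "H-2"))
```

Swapping cosine for Pearson correlation or a Euclidean-distance-based similarity only requires replacing the similarity function; the ranking and recommendation steps stay the same.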

(h) What will be your final recommendation to Champo Carpets?
