Task
Learning Outcomes to be assessed
1. Critique data mining tools and methodologies used to interrogate data and yield actionable insights.
2. Differentiate between, and analyse, structured, semi-structured and unstructured datasets.
3. Critically analyse a problem domain and select data mining tools and techniques to implement a business intelligence solution.
4. Formulate a requirements analysis of an organization’s goals and propose data mining solutions that will meet those requirements.
5. Demonstrate and critically evaluate a proposed data mining solution to meet a set of requirements.
6. Formulate, use and evaluate advanced analytical models for organizational problem solving.
Part 1: Supervised Data Mining
Dataset
The dataset Car_Reviews.csv contains 10,678 customer reviews of various car models manufactured by Hyundai, Kia, Ford, and Toyota over several years. It also contains labels indicating whether or not each customer recommended the car to others.
Task
Construct a classification model in Python that can automatically label a customer review to indicate whether or not the customer would recommend the car to others. You should implement Support Vector Machine (SVM) and Naïve Bayes classification algorithms and select the best-performing model for deployment on new customer reviews.
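As a rough illustration of the model comparison described above, the sketch below trains a multinomial Naïve Bayes classifier and a linear SVM on bag-of-words counts and keeps whichever achieves the higher F1 score on held-out reviews. It uses only the Python standard library so that each step is visible; in practice scikit-learn's `MultinomialNB` and `LinearSVC` would normally be used instead. The helper names, the toy reviews, the 0/1 labels (1 = recommended) and the hyperparameter values are all hypothetical and not taken from Car_Reviews.csv. The SVM also omits an intercept term for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical helper: lowercase and split on whitespace only.
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = {t for d in docs for t in tokenize(d)}
    model = {}
    for c in sorted(set(labels)):
        counts = Counter(t for d, y in zip(docs, labels) if y == c for t in tokenize(d))
        total = sum(counts.values()) + alpha * len(vocab)
        log_prior = math.log(labels.count(c) / len(labels))
        model[c] = (log_prior, {t: math.log((counts[t] + alpha) / total) for t in vocab})
    return model


def predict_nb(model, doc):
    def log_posterior(c):
        log_prior, log_lik = model[c]
        # Words never seen in training carry no evidence and are skipped.
        return log_prior + sum(log_lik[t] for t in tokenize(doc) if t in log_lik)
    return max(model, key=log_posterior)


def train_svm(docs, labels, epochs=50, lam=0.01, lr=0.1):
    """Linear SVM (hinge loss + L2 penalty, no intercept) via subgradient descent."""
    w = dict.fromkeys(sorted({t for d in docs for t in tokenize(d)}), 0.0)
    data = [(Counter(tokenize(d)), 1 if y == 1 else -1) for d, y in zip(docs, labels)]
    for _ in range(epochs):
        for x, y in data:
            margin = y * sum(w[t] * n for t, n in x.items())
            w = {t: wt * (1 - lr * lam) for t, wt in w.items()}  # L2 shrinkage
            if margin < 1:  # hinge loss is active: push towards the correct side
                for t, n in x.items():
                    w[t] += lr * y * n
    return w


def predict_svm(w, doc):
    # Unseen words get weight 0; the sign of the score decides the label.
    return 1 if sum(w.get(t, 0.0) for t in tokenize(doc)) > 0 else 0


def f1_score(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


train = [  # hypothetical reviews; 1 = recommended, 0 = not recommended
    ("great car reliable and comfortable", 1),
    ("love this car great mileage", 1),
    ("comfortable ride would recommend", 1),
    ("reliable engine great value", 1),
    ("terrible car constant problems", 0),
    ("poor mileage would not recommend", 0),
    ("engine problems and poor service", 0),
    ("uncomfortable seats terrible value", 0),
]
test = [("great reliable", 1), ("terrible problems", 0), ("comfortable ride", 1), ("poor service", 0)]

docs, labels = [d for d, _ in train], [y for _, y in train]
nb, svm = train_nb(docs, labels), train_svm(docs, labels)
y_true = [y for _, y in test]
scores = {
    "naive_bayes": f1_score(y_true, [predict_nb(nb, d) for d, _ in test]),
    "linear_svm": f1_score(y_true, [predict_svm(svm, d) for d, _ in test]),
}
best = max(scores, key=scores.get)  # on a tie, the first entry wins
```

With only four test reviews both models score perfectly here and the tie goes to the first entry. On the real dataset, the comparison would typically rely on a proper train/test split or cross-validation.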
In addition to the Python code file, you must provide, in a PDF report, a critical analysis of the following points in the context of the given task:
1. Data Cleaning – Explain all the steps taken to clean the data.
2. Creation of a Document-Term or TF-IDF Matrix – Discuss the tuning of the hyperparameters ‘ngram_range’ and ‘min_df’.
3. Model Creation, Evaluation and Selection – Discuss why the two algorithms are suitable for text classification. State the model evaluation and selection criteria.
4. Model Deployment – Discuss possible limitations of the selected model when deployed on new customer reviews.
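For points 1 and 2, the sketch below shows one possible cleaning step (lowercasing, replacing non-letters with spaces, dropping a small stop-word list) followed by a TF-IDF matrix built by hand, so that the effect of `ngram_range` and `min_df` on the vocabulary is visible. It assumes an integer `min_df` (a minimum document count) and uses the smoothed idf, ln((1 + n) / (1 + df)) + 1, and the L2 row normalization that scikit-learn's `TfidfVectorizer` applies by default. The stop-word list, helper names and three sample reviews are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "was", "to", "of"}


def clean(text):
    """Lowercase, replace anything that is not a letter with a space, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [t for t in text.split() if t not in STOP_WORDS]


def ngrams(tokens, ngram_range):
    """All n-grams with lo <= n <= hi, as space-joined strings."""
    lo, hi = ngram_range
    return [" ".join(tokens[i:i + n]) for n in range(lo, hi + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf(docs, ngram_range=(1, 1), min_df=1):
    """Return (vocabulary, L2-normalized TF-IDF rows) for the given documents."""
    grams = [ngrams(clean(d), ngram_range) for d in docs]
    df = Counter(g for doc in grams for g in set(doc))  # document frequency
    vocab = sorted(g for g, count in df.items() if count >= min_df)
    n = len(docs)
    idf = {g: math.log((1 + n) / (1 + df[g])) + 1 for g in vocab}  # smoothed idf
    rows = []
    for doc in grams:
        tf = Counter(g for g in doc if g in idf)  # raw term counts
        vec = [tf[g] * idf[g] for g in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


docs = [  # hypothetical reviews, not from Car_Reviews.csv
    "The engine is great!",
    "Great engine, poor brakes.",
    "Poor brakes and poor seats.",
]
sizes = {(rng, m): len(tfidf(docs, rng, m)[0]) for rng in [(1, 1), (1, 2)] for m in [1, 2]}
vocab, rows = tfidf(docs, ngram_range=(1, 2), min_df=2)
```

On these three reviews, widening `ngram_range` from (1, 1) to (1, 2) more than doubles the vocabulary (5 to 11 terms), while raising `min_df` to 2 prunes every term that occurs in only one review, including all but one of the bigrams.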
Part 2: Unsupervised Data Mining
Task
Using the TF-IDF matrix created in Part 1, visualize the customer reviews contained in Car_Reviews.csv on an interactive scatter plot. Hovering over any data point in the plot should display the corresponding review in a hover box. [Note: You can increase the width and height of the plot within the code to enlarge the hover box, if needed.]
In addition to the Python code file, you must provide, in a PDF report, a critical analysis of the following points in the context of the given task:
1. Visualizing clusters – Briefly discuss the approach used to visualize the data on the scatter plot. Can you identify two clusters of customer reviews based on positive and negative feedback?
2. Visualizing sub-clusters – Can you identify any sub-clusters within the two clusters identified in the previous point?
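One possible approach for Part 2 is to reduce each TF-IDF row to two coordinates and draw the points with the review text attached as a tooltip. The sketch below does this without third-party packages: a truncated SVD computed by power iteration (without mean-centring, as with scikit-learn's `TruncatedSVD`) supplies the x and y coordinates, and an SVG `<title>` element on each point provides the hover box. The L2-normalized term counts stand in for the Part 1 TF-IDF matrix, and the reviews, helper names and plot size are hypothetical. For 10,678 reviews, a library such as scikit-learn combined with Plotly or Bokeh would be much more practical.

```python
import html
import math
import random
import re
from collections import Counter


def vectorize(texts):
    """Stand-in for the Part 1 TF-IDF matrix: L2-normalized term counts."""
    tokens = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    vocab = sorted({w for ts in tokens for w in ts})
    rows = []
    for ts in tokens:
        counts = Counter(ts)
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return rows


def top_components(rows, k=2, iters=200, seed=0):
    """Top-k right singular vectors of the matrix by power iteration (no centring)."""
    n, d = len(rows), len(rows[0])
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            u = [sum(r[j] * v[j] for j in range(d)) for r in rows]  # u = X v
            v = [sum(rows[i][j] * u[i] for i in range(n)) for j in range(d)]  # v = X^T u
            for c in comps:  # Gram-Schmidt: remove components already found
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            norm = math.sqrt(sum(a * a for a in v)) or 1.0
            v = [a / norm for a in v]
        comps.append(v)
    return comps


def scatter_svg(coords, texts, width=600, height=400, pad=20):
    """SVG scatter plot; browsers show each point's <title> on hover."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]

    def scale(val, lo, hi, size):
        return pad + (val - lo) / ((hi - lo) or 1.0) * (size - 2 * pad)

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for (x, y), text in zip(coords, texts):
        cx = scale(x, min(xs), max(xs), width)
        cy = height - scale(y, min(ys), max(ys), height)  # flip so larger y is higher
        parts.append(f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="5" fill="steelblue">'
                     f"<title>{html.escape(text)}</title></circle>")
    parts.append("</svg>")
    return "\n".join(parts)


reviews = [  # hypothetical reviews, not from Car_Reviews.csv
    "Great engine, great value",
    "Reliable & comfortable, great value",
    "Poor brakes <again>",
    "Terrible seats, poor brakes",
]
rows = vectorize(reviews)
comps = top_components(rows)
coords = [tuple(sum(a * b for a, b in zip(r, c)) for c in comps) for r in rows]
svg = scatter_svg(coords, reviews)
```

Saving `svg` to an `.html` or `.svg` file and opening it in a browser shows each review when the pointer rests on its point; `width` and `height` control the size of the plot area.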