SIT720: Perform K-Means Clustering on the Complete Dataset and Determine the Purity of Clusters Formed by the Number of Principal Components

Assignment Task:

Learning Outcomes
This assessment assesses the following Unit Learning Outcomes:

1 - Perform unsupervised learning of data such as clustering and dimensionality reduction.

Purpose
This assessment task requires students to apply data clustering and dimensionality reduction skills. Students must demonstrate ability in data representation and competency in applying suitable clustering and dimensionality reduction techniques to a real-world scenario.

 

Part-1: Clustering  
The dataset and the .ipynb files are provided as a zip archive (available via the ‘Assessment 2 – T2 2020 - Dataset’ link) in the assessment section (Assessment -> Assessment 2) of the unit site.
1. Download the attached clustering.csv file. Read the file and separate the class labels from the feature matrix.
2. Determine the number of clusters from the dataset. Is this the same as the actual number of classes in the dataset?
3. Perform K-Means clustering on the complete dataset and report the purity score.
4. There are several distance metrics that can be used with K-Means, such as Euclidean, squared Euclidean, Manhattan, Chebyshev and Minkowski.

Your job is to compare the purity score of k-means clustering for different distance metrics. 
Select the best distance metric and explain why this distance metric is best for the given dataset.
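The four Part-1 steps can be sketched together in Python as below. This is a minimal sketch under stated assumptions, not a reference solution: the class column of clustering.csv is assumed to be the last column (`label_col` is a hypothetical parameter), the number of clusters is estimated with the mean silhouette score (one common heuristic; the elbow method is another), and because scikit-learn's `KMeans` only supports Euclidean distance, step 4 uses a small hand-written Lloyd-style loop built on `scipy.spatial.distance.cdist`. All helper names are hypothetical, and the demo runs on synthetic data rather than the assignment dataset.

```python
import io

import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.metrics.cluster import contingency_matrix


def load_features_and_labels(source, label_col=None):
    """Step 1: split a CSV into features X and class labels y.

    Assumption: the class column is not named in the task, so the last
    column is used unless `label_col` is given.
    """
    df = pd.read_csv(source)
    label_col = df.columns[-1] if label_col is None else label_col
    return df.drop(columns=[label_col]).to_numpy(dtype=float), df[label_col].to_numpy()


def choose_k_by_silhouette(X, k_range=range(2, 11), seed=0):
    """Step 2: pick the k whose K-Means labelling has the highest mean silhouette."""
    scores = {
        k: silhouette_score(X, KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X))
        for k in k_range
    }
    return max(scores, key=scores.get), scores


def purity_score(y_true, y_pred):
    """Step 3: purity = (1/N) * sum over clusters of the majority-class count."""
    cm = contingency_matrix(y_true, y_pred)  # rows: classes, columns: clusters
    return cm.max(axis=0).sum() / cm.sum()


def kmeans_with_metric(X, k, metric="euclidean", max_iter=100, seed=0, **metric_kwargs):
    """Step 4: Lloyd-style K-Means whose assignment step uses a scipy cdist metric.

    Centroids are updated as cluster means. That exactly minimises the objective
    only for squared Euclidean distance; for the other metrics it is a common
    heuristic (Manhattan, for example, would strictly call for medians).
    Initialisation is farthest-first, starting from one random point.
    """
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        # Add the point farthest from every centroid chosen so far.
        idx.append(int(cdist(X, X[idx], metric=metric, **metric_kwargs).min(axis=1).argmax()))
    centroids = X[idx].astype(float)
    for _ in range(max_iter):
        labels = cdist(X, centroids, metric=metric, **metric_kwargs).argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    return cdist(X, centroids, metric=metric, **metric_kwargs).argmin(axis=1), centroids


METRICS = {  # display name -> (scipy cdist metric name, extra keyword arguments)
    "Euclidean": ("euclidean", {}),
    "Squared Euclidean": ("sqeuclidean", {}),
    "Manhattan": ("cityblock", {}),
    "Chebyshev": ("chebyshev", {}),
    "Minkowski (p=3)": ("minkowski", {"p": 3}),
}

if __name__ == "__main__":
    # For the assignment: X, y = load_features_and_labels("clustering.csv")
    # Synthetic stand-in (three 2-D blobs) round-tripped through CSV text.
    rng = np.random.default_rng(0)
    classes = np.repeat([0, 1, 2], 100)
    df = pd.DataFrame(np.array([[0, 0], [10, 10], [-10, 10]])[classes]
                      + rng.normal(size=(300, 2)), columns=["f1", "f2"])
    df["class"] = classes
    X, y = load_features_and_labels(io.StringIO(df.to_csv(index=False)))

    best_k, _ = choose_k_by_silhouette(X, k_range=range(2, 7))
    print("estimated k:", best_k, "| actual classes:", len(np.unique(y)))
    km_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
    print(f"scikit-learn K-Means purity: {purity_score(y, km_labels):.3f}")
    for name, (metric, kwargs) in METRICS.items():
        labels, _ = kmeans_with_metric(X, best_k, metric=metric, **kwargs)
        print(f"{name:>18}: purity = {purity_score(y, labels):.3f}")
```

Because squared Euclidean is a monotone transform of Euclidean, the two should give identical assignments in this loop and therefore identical purity scores; differences between metrics can only show up for Manhattan, Chebyshev and Minkowski.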

 

Part-2: Dimensionality Reduction using PCA/SVD
1. For the dataset (clustering.csv), perform PCA.

  •  Plot the captured variance with respect to increasing latent dimensionality.

What is the minimum dimension that captures:

  •  at least 89% variance? 
  •  at least 99% variance? 
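A minimal sketch of the PCA step in Python, assuming the feature matrix `X` from Part 1 is used as-is (standardising it first with scikit-learn's `StandardScaler` is a common alternative when features are on different scales, and would change the answer). The helper names `captured_variance`, `min_components` and `plot_captured_variance` are hypothetical, and the demo uses synthetic data, not clustering.csv.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA


def captured_variance(X):
    """Fit PCA with all components; return the cumulative explained-variance ratio."""
    return np.cumsum(PCA().fit(X).explained_variance_ratio_)


def min_components(cum_var, threshold):
    """Smallest number of components whose cumulative variance is >= threshold."""
    # searchsorted returns the first index i with cum_var[i] >= threshold.
    return int(np.searchsorted(cum_var, threshold) + 1)


def plot_captured_variance(cum_var, thresholds=(0.89, 0.99)):
    """Line plot of cumulative captured variance against latent dimensionality."""
    dims = np.arange(1, len(cum_var) + 1)
    fig, ax = plt.subplots()
    ax.plot(dims, cum_var, marker="o")
    for t in thresholds:
        ax.axhline(t, linestyle="--", linewidth=0.8, label=f"{t:.0%} variance")
    ax.set_xlabel("Number of principal components")
    ax.set_ylabel("Cumulative captured variance")
    ax.legend()
    return fig


if __name__ == "__main__":
    # Synthetic stand-in: 4 independent features with very different spreads.
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(5000, 4)) * np.array([10.0, 3.0, 0.5, 0.1])
    cum_var = captured_variance(X_demo)
    for t in (0.89, 0.99):
        print(f"minimum dimensions for {t:.0%} variance: {min_components(cum_var, t)}")
    plot_captured_variance(cum_var).savefig("captured_variance.png")
```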

2. Determine the purity of clusters formed by the number of principal components which captured 89% and 99% variances respectively. Plot a line graph of the purity scores against the captured variances. Discuss your findings. 
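The purity-versus-variance comparison can be sketched as follows. Assumptions: K-Means is run with k equal to the number of true classes (substitute the k estimated in Part 1 if it differs), PCA is fitted on the unscaled features, and `purity_vs_variance` is a hypothetical helper name. The demo uses synthetic `make_blobs` data rather than the assignment dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics.cluster import contingency_matrix


def purity_score(y_true, y_pred):
    """Purity = (1/N) * sum over clusters of the majority-class count."""
    cm = contingency_matrix(y_true, y_pred)  # rows: classes, columns: clusters
    return cm.max(axis=0).sum() / cm.sum()


def purity_vs_variance(X, y, thresholds=(0.89, 0.99), random_state=0):
    """For each variance threshold, cluster the PCA projection and score its purity.

    Returns a list of (threshold, n_components, purity) tuples.
    """
    cum_var = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    k = len(np.unique(y))  # assumption: k = number of true classes
    results = []
    for t in thresholds:
        n_comp = int(np.searchsorted(cum_var, t) + 1)  # fewest components reaching t
        X_red = PCA(n_components=n_comp).fit_transform(X)
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X_red)
        results.append((t, n_comp, purity_score(y, labels)))
    return results


if __name__ == "__main__":
    X_demo, y_demo = make_blobs(n_samples=500, n_features=10, centers=4, random_state=0)
    results = purity_vs_variance(X_demo, y_demo)
    for t, n_comp, purity in results:
        print(f"{t:.0%} variance -> {n_comp} components, purity = {purity:.3f}")
    fig, ax = plt.subplots()
    ax.plot([t for t, _, _ in results], [p for _, _, p in results], marker="o")
    ax.set_xlabel("Captured variance")
    ax.set_ylabel("K-Means purity")
    fig.savefig("purity_vs_variance.png")
```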

3. Let's assume you have two datasets: one with linear structure and the other with curved (non-linear) structure.

  • Can we apply PCA to these datasets? Justify your answer.
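One way to support the justification is to compare how much variance the first principal component captures on each kind of data. PCA can be run on any numeric dataset, but because it is a linear projection it only gives a compact representation when the structure is (approximately) linear. The sketch below uses two hypothetical synthetic datasets, a noisy straight line and a circle: on the line one component should capture almost all the variance, while on the circle no single direction does. For curved data, non-linear methods such as kernel PCA (`sklearn.decomposition.KernelPCA`) are a common alternative.

```python
import numpy as np
from sklearn.decomposition import PCA


def make_line_and_circle(n=1000, seed=0):
    """Hypothetical toy data: a noisy 2-D line and a 2-D circle."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 1.0, n)
    line = np.column_stack([t, 2.0 * t]) + rng.normal(scale=0.01, size=(n, 2))
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    circle = np.column_stack([np.cos(theta), np.sin(theta)])
    return line, circle


def first_pc_share(X):
    """Fraction of total variance captured by the first principal component."""
    return PCA().fit(X).explained_variance_ratio_[0]


if __name__ == "__main__":
    line, circle = make_line_and_circle()
    print(f"line:   first PC captures {first_pc_share(line):.1%} of the variance")
    print(f"circle: first PC captures {first_pc_share(circle):.1%} of the variance")
```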

 
