COMP11069 - Data Mining and Visualization - IT Assignment Help

Assignment Task

 

1. Consider the diagrams in Figure 1. They show, respectively, a histogram and a boxplot of a single variable from a dataset.
a) Use the figures to discuss the data distribution. Specifically, describe which type of information (e.g., nature of the data, outliers, descriptive statistics) can be inferred from each type of figure.
b) Critically discuss the appropriateness of using the mean and standard deviation to account for the central tendency and spread of these data, respectively. Computed on this dataset, they are 0.62 (mean) and 2.82 (standard deviation). Suggest alternative measures of central tendency and spread.
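As a point of reference for part b), the sketch below contrasts the mean and standard deviation with three common robust alternatives: the median, the interquartile range (IQR), and the median absolute deviation (MAD). The function name `summarize` and the sample values are hypothetical; the sample is an invented skewed list with one extreme value, not the data behind Figure 1.

```python
import statistics


def summarize(values):
    """Return classical and robust summaries of a numeric sample."""
    data = sorted(values)
    # Quartiles via linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    median = statistics.median(data)
    # MAD: median of the absolute deviations from the median.
    mad = statistics.median([abs(v - median) for v in data])
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),
        "median": median,
        "iqr": q3 - q1,
        "mad": mad,
    }


# Hypothetical right-skewed sample with one extreme value (an assumption,
# chosen only to show how a single outlier affects each statistic).
sample = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 12.0]
for name, value in summarize(sample).items():
    print(f"{name:>6}: {value:.3f}")
```

In this invented sample the single large value pulls the mean well above the median and inflates the standard deviation, while the median, IQR, and MAD stay close to the bulk of the data.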

2. Consider a binary supervised classification problem where feature pre-filtering has been performed using hypothesis testing. 4 out of 10 variables yield p-values lower than 0.05 in the hypothesis tests for the difference of the means. In addition, 3 of those 4 variables were shown to be highly correlated (pairwise correlation coefficient greater than 0.9).
a) If the amount of data available suggests that 3D feature vectors would be appropriate for your classifier, discuss whether a selection process over the pre-filtered data is likely to yield the best-performing feature set.

b) On the basis of your previous answer, discuss how you would implement a pre-filtering stage using hypothesis testing and correlation analysis to pre-select m out of d possible features.
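One possible shape for such a pre-filtering stage is sketched below; it is an illustration under stated assumptions, not a model answer. It tests each feature for a difference in class means, keeps the features below a significance threshold, ranks them by p-value, and then skips any feature whose absolute Pearson correlation with an already kept feature exceeds a limit, stopping once m features are kept. A permutation test stands in for the hypothesis test so the code needs only the standard library; a t-test could be substituted. All function names, parameters, and default values (`alpha`, `max_corr`, `n_perm`, `seed`) are hypothetical.

```python
import random


def mean_gap(column, labels):
    """Absolute difference between the two class means of one feature."""
    g0 = [v for v, lab in zip(column, labels) if lab == 0]
    g1 = [v for v, lab in zip(column, labels) if lab == 1]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def permutation_p_values(columns, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value per feature for a difference in means."""
    rng = random.Random(seed)
    shuffles = []
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)  # same shuffled labels are reused for every feature
        shuffles.append(perm)
    p_values = []
    for column in columns:
        observed = mean_gap(column, labels)
        hits = sum(mean_gap(column, perm) >= observed for perm in shuffles)
        p_values.append((hits + 1) / (n_perm + 1))
    return p_values


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def prefilter(columns, labels, m, alpha=0.05, max_corr=0.9, n_perm=2000, seed=0):
    """Return indices of at most m significant, mutually non-redundant features."""
    p_values = permutation_p_values(columns, labels, n_perm, seed)
    # Significant features, most significant first (ties keep index order).
    candidates = sorted(
        (j for j, p in enumerate(p_values) if p < alpha),
        key=lambda j: p_values[j],
    )
    selected = []
    for j in candidates:
        if all(abs(pearson(columns[j], columns[k])) <= max_corr for k in selected):
            selected.append(j)
        if len(selected) == m:
            break
    return selected
```

Here `columns` is a list of per-feature value lists and `labels` is a list of 0/1 class labels. The greedy redundancy step means the result depends on the ranking order, which is a design choice of this sketch rather than a requirement of the task.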

3. A data scientist is asked to train and evaluate a classifier to solve a multi-class classification problem. She is given a dataset of 100 samples (about one third per class) with 10 variables. The data scientist decides to use 5 variables for classification. To perform feature selection, she splits the dataset into training (80%) and test (20%) sets, and performs an exhaustive selection based on the classifier performance obtained on the training set.
a) Critically discuss the appropriateness of this approach. (4)
b) Suggest alternatives to the above methodology to overcome the problems you identified. If you found the approach had no flaws, describe how you would carry out ROC analysis for evaluation in such a problem.
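For orientation, one commonly used alternative is to score each candidate feature subset by cross-validation inside the training set and to touch the test set only once, for the final estimate. The sketch below illustrates that idea under assumptions: the task does not name a classifier, so a simple nearest-centroid rule is swapped in, and every function name and the fold scheme (every k-th row forms a fold) are hypothetical. With 10 variables and subsets of 5, an exhaustive search would score 252 subsets.

```python
from itertools import combinations


def predict_nearest_centroid(train_X, train_y, test_X, features):
    """Classify rows of test_X by the closest class centroid on `features`."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def sq_dist(x, center):
        return sum((x[f] - c) ** 2 for f, c in zip(features, center))

    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) for x in test_X]


def cv_accuracy(X, y, features, k=5):
    """k-fold accuracy computed only on the rows passed in (the training set)."""
    correct = 0
    for fold in range(k):
        val = set(range(fold, len(X), k))  # every k-th row forms one fold
        tr_X = [x for i, x in enumerate(X) if i not in val]
        tr_y = [t for i, t in enumerate(y) if i not in val]
        va_X = [x for i, x in enumerate(X) if i in val]
        va_y = [t for i, t in enumerate(y) if i in val]
        preds = predict_nearest_centroid(tr_X, tr_y, va_X, features)
        correct += sum(p == t for p, t in zip(preds, va_y))
    return correct / len(X)


def select_features(X, y, n_keep, k=5):
    """Exhaustive subset search scored by cross-validation on training data."""
    subsets = combinations(range(len(X[0])), n_keep)
    return max(subsets, key=lambda fs: cv_accuracy(X, y, fs, k))


def holdout_accuracy(train_X, train_y, test_X, test_y, features):
    """Final estimate: fit on all training rows, score once on the test rows."""
    preds = predict_nearest_centroid(train_X, train_y, test_X, features)
    return sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
```

A typical call sequence would be `best = select_features(train_X, train_y, n_keep=5)` followed by a single `holdout_accuracy(train_X, train_y, test_X, test_y, best)`. The fold scheme assumes the rows are not ordered by class; shuffled or stratified folds would be a safer choice on real data.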

 

