Data Mining Algorithms & Techniques - Classification - IT Assignment Help

Assignment Task:

Objective of the Assignment
To apply the data mining skills taught in lectures and lab sessions to a previously unseen dataset using Weka, to achieve knowledge discovery, and to produce a written report in technical paper format.
Deliverables
A single zip file called FirstName_LastName_StudentNumber._ass1.zip, to be uploaded to Moodle, containing the following files:
• A report file based on this file. The only accepted format is PDF.
• A set of supporting files, including but not limited to the following, which should be clearly referenced from your documentation:
o Original dataset
o dataset.arff
o trainingSet.arff
o testingSet.arff
o j48tree.arff
o kmeans.arff
o dbscan.arff
Choosing Your Dataset
1. Your dataset should concern a real-world problem that your classmates can easily understand.
2. It should ideally have more than 1000 instances (tuples/rows).
3. It should ideally have at least 6 attributes.
4. It should have attributes that can serve as class labels, so that the accuracy of your data analysis can be determined.

Part 1 – Classification
1. Description of your dataset – 10%

• Title: Brief title to capture the data and objective of your assignment
• Objective: What you want to uncover by examining the data in this assignment. You can revise this as your project progresses, making it more specific.
• Data description: A detailed description of the data under the following subheadings:
o The problem domain
o The source of the data
o The agencies working with the data
o The intended use of the data
o The attribute types of the data
Please include screenshots of the dataset and of the data summaries and graphs available in Weka, each with one or two sentences of summary.

2. Preprocessing – 15%

In this section you should:
1. Identify the set of preprocessing techniques that can be applied to your data and clearly indicate which techniques are appropriate and which are not.
2. Provide screenshots showing the effects of preprocessing the data, along with a short explanation.
3. Generate a file called dataset.arff which is the outcome of the preprocessing.
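
The sketch below illustrates two common preprocessing steps on a single numeric attribute: replacing missing values with the attribute mean, then rescaling the values to [0, 1]. It is plain Python on a hypothetical column of values, not Weka code; in Weka, the unsupervised attribute filters ReplaceMissingValues and Normalize perform comparable operations on numeric attributes.

```python
from typing import List, Optional


def replace_missing_with_mean(column: List[Optional[float]]) -> List[float]:
    """Replace missing (None) entries with the mean of the present values."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]


def min_max_normalize(column: List[float]) -> List[float]:
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant attribute: avoid dividing by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical numeric attribute with one missing value
ages = [20.0, None, 40.0, 60.0]
filled = replace_missing_with_mean(ages)  # mean of 20, 40, 60 is 40
print(filled)                     # [20.0, 40.0, 40.0, 60.0]
print(min_max_normalize(filled))  # [0.0, 0.5, 0.5, 1.0]
```

Whichever filters you apply, the before/after summaries in Weka's Preprocess tab are what the screenshots should capture.
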
3. Divide your dataset into training and test sets – 5%

Divide the dataset into training and testing sets in a 9:1 ratio. Links to additional resources are on Moodle. Save the files generated by this process and submit them as:

- trainingSet.arff and
- testingSet.arff

Screenshots of these files should be included.
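
A 9:1 split amounts to shuffling the instances with a fixed random seed and cutting the list at 90%. The minimal Python sketch below assumes the instances are already in memory (the integers stand in for rows). In Weka, a comparable result can be obtained with the Randomize filter followed by RemovePercentage at 10%, applied once normally and once with the selection inverted. This split is not stratified, so with a skewed class attribute it is worth checking that both sets contain every class.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], train_ratio: float = 0.9,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows with a fixed seed, then cut at train_ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(1000))  # stand-in for 1000 instances
train, test = train_test_split(rows)
print(len(train), len(test))  # 900 100
```

Fixing the seed means the same trainingSet.arff and testingSet.arff can be regenerated if the preprocessing step is revisited.
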
4. Experiment - Classification: J48 Tree – 15%

For each of the following classification techniques:

1. Train your model using trainingSet.arff
2. Test your model using testingSet.arff
3. Write a few paragraphs analyzing the results. Be sure to vary the parameters at least three times in each case. Support this analysis with screenshots of the following:
a. The model or a visualization of the model
b. The results of the model
c. Any additional output of the model including but not limited to
i. Rules
ii. Confidence Values
iii. Confusion Matrices
iv. Etc.
d. Simple references to the notes or URL links to online resources complete with a sentence or two of explanation.
4. You may iterate over the preprocessing step.
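
When analyzing the confusion matrix that Weka reports for J48, recall that rows are the actual classes and columns are the predicted ones (the "classified as" header). The sketch below builds such a matrix and the overall accuracy from hypothetical actual and predicted labels, to show how the summary figures relate to the matrix. For the required parameter variations, J48's pruning confidence factor (-C, default 0.25) and minimum number of instances per leaf (-M, default 2) are natural candidates.

```python
from typing import Dict, List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes."""
    index: Dict[str, int] = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical test-set labels for a two-class problem
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]
cm = confusion_matrix(actual, predicted, ["yes", "no"])
print(cm)            # [[2, 1], [1, 2]]
print(accuracy(cm))  # 0.6666666666666666
```

The off-diagonal cells are the errors worth discussing: here one "yes" was predicted as "no" and one "no" as "yes".
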

 

Experiments
For each of the following two clustering techniques (k-means and DBSCAN, matching the kmeans.arff and dbscan.arff deliverables):
1. Use dataset.arff as input. If adaptations are necessary, indicate them clearly.
2. Write one or two paragraphs analyzing the results of the clustering. Be sure to vary the parameters at least three times in each case. Support this analysis with screenshots of the following:
a. The clusters and/or a visualization of the clusters
b. The results of the clusters
c. Any additional output of the clustering process
d. Simple references to the notes or URL links to online resources complete with a sentence or two of explanation.
e. Evaluate the clusters using Weka's “classes to clusters evaluation” mode. A worked example can be found here:

http://www.cs.ccsu.edu/~markov/ccsu_courses/datamining-ex3.html
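
Classes-to-clusters evaluation clusters the data with the class attribute ignored, then assigns each cluster a class and counts the instances whose class differs from their cluster's class. The Python sketch below is a simplified illustration under stated assumptions: a bare one-dimensional k-means (Lloyd's algorithm) with hand-picked initial centroids, and a mapping of each cluster to its majority class. Weka's SimpleKMeans chooses initial centroids itself (randomly by default, controlled by a seed) and works on all attributes, and Weka's evaluation may constrain the class-to-cluster mapping further, so its figures can differ. Cluster ids from DBSCAN could be passed to the same evaluation function, although noise points would need separate handling.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def kmeans_1d(points: Sequence[float], initial: Sequence[float],
              iterations: int = 10) -> List[int]:
    """Lloyd's k-means on 1-D data from given initial centroids; returns cluster ids."""
    centroids = list(initial)
    assign = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        assign = [min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return assign


def classes_to_clusters(clusters: Sequence[int], classes: Sequence[str]
                        ) -> Tuple[Dict[int, str], float]:
    """Map each cluster to its majority class; return the mapping and error rate."""
    mapping: Dict[int, str] = {}
    errors = 0
    for c in sorted(set(clusters)):
        counts = Counter(cls for cl, cls in zip(clusters, classes) if cl == c)
        majority, hits = counts.most_common(1)[0]
        mapping[c] = majority
        errors += sum(counts.values()) - hits  # members not in the majority class
    return mapping, errors / len(classes)


# Hypothetical 1-D attribute plus a class label the clusterer never sees
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 1.1, 5.1]
labels = ["low", "low", "low", "high", "high", "low", "low", "high"]
clusters = kmeans_1d(values, initial=[0.0, 6.0])
print(clusters)                                # [0, 0, 0, 1, 1, 1, 0, 1]
print(classes_to_clusters(clusters, labels))   # ({0: 'low', 1: 'high'}, 0.125)
```

The single "low" instance at 4.9 falls into the "high" cluster, which is exactly the kind of misassignment the incorrectly clustered instances figure in Weka summarizes.
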
