Task
Introduction
This is an individual assignment and is worth 15% of your final grade. It is intended to evaluate your understanding of, and practical skills in, the first few steps of a typical data science process. In this assignment, you are provided with three data files, i.e., “data1.csv”, “data2.csv” and “data3.csv”, which together form a dataset collected at a higher education institution about students enrolled in different undergraduate degrees [1].
The files “data1.csv” and “data2.csv” contain the same set of students but distinct sets of attributes describing each student, and each student has a unique ID. The file “data3.csv” contains a different set of students, each described by all attributes from both “data1.csv” and “data2.csv”.
You are asked to carry out data acquisition, preparation and exploration on the three data sources according to the given instructions. For example, you need to develop and implement appropriate steps to load and merge the data from the three data files, clean the data, perform exploratory data analysis, and report your findings. A discussion forum and further announcements for the assignment will be available in Canvas. You are responsible for checking Canvas regularly to stay informed of any updates about the assignment.
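The following is a minimal sketch of how this file layout maps onto Pandas operations, using tiny made-up DataFrames. The column names “ID”, “Age”, “Course” and “Target” and all values are hypothetical stand-ins for the real 38 attributes. Because “data1.csv” and “data2.csv” describe the same students, they are joined column-wise on the ID. Because “data3.csv” holds different students with all attributes, it is stacked underneath row-wise.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only,
# not the assignment's real attributes.
data1 = pd.DataFrame({"ID": [1, 2, 3], "Age": [19, 22, 20]})
data2 = pd.DataFrame({"ID": [3, 1, 2], "Course": [33, 171, 9500],
                      "Target": ["Graduate", "Dropout", "Enrolled"]})
data3 = pd.DataFrame({"ID": [4, 5], "Age": [25, 18], "Course": [171, 33],
                      "Target": ["Graduate", "Dropout"]})

# Same students, different attributes -> join column-wise on the shared ID.
# validate="one_to_one" raises a MergeError if an ID repeats in either frame.
combined_12 = pd.merge(data1, data2, on="ID", how="outer", validate="one_to_one")

# Different students, all attributes -> stack row-wise, aligning columns by name.
merged = pd.concat([combined_12, data3], ignore_index=True)

# After merging, every student should appear exactly once.
assert merged["ID"].is_unique
print(merged.sort_values("ID").to_string(index=False))
```

The `validate` argument gives a cheap early warning of duplicate IDs. Using `how="outer"` keeps a student who appears in only one of the first two files; that student would then show missing values for the other file's attributes.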
Task 1 – Data Acquisition and Preparation
First, acquire the three data files “data1.csv”, “data2.csv” and “data3.csv”, which are included in a single .zip file named “assignment1_data.zip” under the menu “Assignments → Assignment 1” in Canvas, and place them in your Jupyter Notebook working folder. These data files are adapted from the “Student Dropout and Academic Success” data set in the UCI repository [2]. They contain records of students, each describing one student in terms of various attributes. The files “data1.csv” and “data2.csv” contain the same set of students but two distinct sets of attributes for describing a student. In contrast, the file “data3.csv” contains a different set of students, where each student record consists of all attributes from both “data1.csv” and “data2.csv”.
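If you prefer to unpack the archive from within the notebook rather than by hand, you could use a small helper based on Python's standard `zipfile` module, as sketched below. The function name `extract_assignment_data` is hypothetical. The sketch assumes the archive is reachable from the notebook's current directory.

```python
import zipfile
from pathlib import Path


def extract_assignment_data(zip_path, target_dir="."):
    """Extract every member of the archive into target_dir and return the CSV paths found there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # rglob also finds CSV files stored inside a sub-folder of the archive;
    # sorting gives a stable order such as data1.csv, data2.csv, data3.csv
    return sorted(target.rglob("*.csv"))
```

A typical call would be `csv_paths = extract_assignment_data("assignment1_data.zip")`, after which `csv_paths` lists the extracted files.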
The set of 38 possible attributes for a student record and their corresponding value ranges are given below:
As a data scientist, you are asked to analyse the data from the three data files. Before doing so, however, you need to carry out some data preparation operations, e.g., merging and cleaning the data. In this task, you are asked to utilise the Python package “Pandas” to do the following:
1.1. Loading the data from the three data files into three Pandas DataFrames and checking whether the loaded data are equivalent to the data contained in the raw data files.
1.2. Merging the three DataFrames into a single DataFrame that contains all students from the three DataFrames, where each student has a unique ID and is described by the 38 attributes listed above.
1.3. Cleaning the data using the techniques you have learned.
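Steps 1.1 and 1.3 could be approached as sketched below, though this is not the required method. `load_and_verify` re-reads each file with Python's `csv` module and compares the column names, row count and every cell (as text) with what Pandas loaded. It assumes UTF-8 encoded, comma-separated files. `basic_clean` is a hypothetical helper that drops duplicate records and blanks out values outside allowed ranges. The `valid_ranges` dictionary is an assumption you would fill in from the attribute table, and keeping the first record per ID is only one possible policy.

```python
import csv

import pandas as pd


def load_and_verify(path):
    """Load a CSV with pandas and check it against the raw file read by the csv module."""
    df = pd.read_csv(path)
    # utf-8-sig also strips a byte-order mark, if the file has one
    with open(path, newline="", encoding="utf-8-sig") as f:
        raw_rows = [row for row in csv.reader(f) if row]  # skip blank lines, as pandas does
    header, body = raw_rows[0], raw_rows[1:]
    assert list(df.columns) == header, f"{path}: column names differ"
    assert len(df) == len(body), f"{path}: row count differs"
    # Re-read every cell as untouched text (no type or NaN conversion) and compare
    as_text = pd.read_csv(path, dtype=str, keep_default_na=False)
    assert as_text.values.tolist() == body, f"{path}: cell values differ"
    return df


def basic_clean(df, id_col="ID", valid_ranges=None):
    """Drop duplicate records and set out-of-range values to NaN.

    valid_ranges: hypothetical {column: (low, high)} dict taken from the attribute table.
    """
    out = df.drop_duplicates()  # exact duplicate rows
    # Assumed policy: the first record for each ID wins
    out = out.drop_duplicates(subset=id_col, keep="first").copy()
    for col, (low, high) in (valid_ranges or {}).items():
        values = pd.to_numeric(out[col], errors="coerce")  # non-numeric entries become NaN
        out[col] = values.where(values.between(low, high))  # out-of-range entries become NaN
    return out.reset_index(drop=True)
```

Typical usage would be `df1 = load_and_verify("data1.csv")` for each file. After cleaning, `cleaned.isna().sum()` shows how many values per column are missing or were blanked out, which helps you decide whether to drop or impute them.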
Task 2 – Data Exploration
Having completed Task 1, you now have a DataFrame of cleaned data. You can start to explore the data by carrying out the following steps:
2.1. Choosing two columns, one with categorical values and one with numerical values, and visualising each of them in an appropriate way. Note that you need to explore and identify potentially important columns (and be able to justify your choice) rather than choosing at random.
2.2. Choosing three pairs of columns and exploring the relationship between the two columns in each pair using appropriate descriptive statistics and visualisation tools. Your choice of column pairs should aim to address some “plausible hypotheses” about the data.
2.3. Building a scatter matrix for all numerical columns.
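The sketch below shows one way to produce the kinds of plots asked for in 2.1 to 2.3 with Pandas and Matplotlib. It runs on randomly generated stand-in data. The column names “Target”, “Admission grade” and “Age at enrollment” are assumptions loosely modelled on the UCI data set, and the values carry no meaning. The pair in 2.2 illustrates one example hypothesis (whether admission grade differs by outcome). To use it, you would replace `df` with your cleaned DataFrame and choose your own columns.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical stand-in data; replace with the cleaned DataFrame from Task 1.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Target": rng.choice(["Dropout", "Enrolled", "Graduate"], size=n),
    "Admission grade": rng.normal(127, 14, size=n).round(1),
    "Age at enrollment": rng.integers(17, 45, size=n),
})

# 2.1 categorical column: bar chart of category counts
fig1, ax1 = plt.subplots()
df["Target"].value_counts().sort_index().plot.bar(ax=ax1)
ax1.set(title="Students by outcome", xlabel="Outcome", ylabel="Number of students")

# 2.1 numerical column: histogram
fig2, ax2 = plt.subplots()
df["Admission grade"].plot.hist(bins=20, ax=ax2)
ax2.set(title="Distribution of admission grade", xlabel="Admission grade",
        ylabel="Number of students")

# 2.2 one column pair: descriptive statistics per category, then a box plot
summary = df.groupby("Target")["Admission grade"].describe()
print(summary)
fig3, ax3 = plt.subplots()
df.boxplot(column="Admission grade", by="Target", ax=ax3)
ax3.set(title="Admission grade by outcome", xlabel="Outcome", ylabel="Admission grade")
fig3.suptitle("")  # remove the automatic "Boxplot grouped by ..." super-title

# 2.3 scatter matrix over every numerical column
numeric = df.select_dtypes(include="number")
axes = scatter_matrix(numeric, figsize=(6, 6), diagonal="hist")
axes[0, 0].figure.suptitle("Scatter matrix of numerical columns")
```

For a pair of two categorical columns, `pd.crosstab(df[a], df[b])` with a stacked bar chart plays the role that `groupby(...).describe()` and the box plot play here. For two numerical columns, `df[a].corr(df[b])` with a scatter plot is a common choice. In Jupyter, figures created in a cell are displayed automatically; in a plain script, call `plt.show()`.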
Note: Graphs must contain appropriate titles, axis labels, etc. so that they are self-explanatory, and they should be clear enough for readers to read easily. You may need to research appropriate text labels for the categorical columns, as the data set does not include text descriptions of the numerical codes.
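One way to meet this requirement is to replace numeric codes with readable labels before plotting. The helper `label_codes` and the 0/1 gender mapping below are hypothetical illustrations only. Confirm the real code descriptions (for example, in the UCI data set documentation) before relying on any mapping.

```python
import pandas as pd


def label_codes(series, mapping, unknown="Unknown"):
    """Replace numeric codes with text labels; codes missing from mapping (and NaN) become `unknown`."""
    return series.map(mapping).fillna(unknown)


# Hypothetical mapping for illustration only; confirm the real codes first.
gender_labels = {0: "Female", 1: "Male"}
codes = pd.Series([1, 0, 1, 2], name="Gender")
labelled = label_codes(codes, gender_labels)
print(labelled.tolist())  # ['Male', 'Female', 'Male', 'Unknown']
```

You can then plot the labelled Series directly, e.g. `labelled.value_counts().plot.bar()`, so that the tick labels show text instead of codes. Mapping unexpected codes to an explicit label, rather than silently leaving NaN, also makes data errors visible in the chart.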
Task 3 – Report
In this task, you are asked to write a report elaborating on your analyses and findings in Tasks 1 and 2. You should:
3.1. Create a sub-heading titled “Task 1: Data Acquisition & Preparation” in your report, under which you should:
3.2. Create a sub-heading titled “Task 2: Data Exploration” in your report, under which you need to: