Overview
In this assignment, you will examine a data file and carry out the first steps of the data science process, including cleaning and exploring the data. You will need to develop and implement appropriate steps in IPython to load a data file into memory and then clean, process, and analyse it.
This assignment provides practical experience with the typical first steps of the data science process.
Assessment of Learning Outcomes
This assessment will measure your ability to:
• prepare the provided data for analysis
• explore the provided data and build a scatter matrix for all numerical columns
• write a report to explain and justify how you dealt with different kinds of errors
Learning Outcomes
This assessment is relevant to the following Course Learning Outcomes:
• CLO1 Use industry and evidence-based tools and approaches to transform raw data into a format suitable for a data science pipeline
• CLO3 Extract an interpretation and visualisation of data using exploratory data analysis in Python
• CLO4 Construct and document an experimental methodology for analysis of data
• CLO5 Select appropriate models, and apply simple machine learning tools and feature selection strategy for a defined data science problem
• CLO6 Apply professional standards to allow reproducibility of analysis
Assessment details
General Requirements
This section contains information about the general requirements that your assignment must meet. Please read all requirements carefully before you start.
• You must do the analysis in IPython.
• Parts of this assignment will include a written report; this must be in PDF format.
• Please ensure that your submission follows the file naming rules specified in the tasks below. File names are case sensitive, i.e. if it is specified that the file name is gryphon, then that is exactly the file name you should submit; Gryphon, GRYPHON, griffin, and anything else but gryphon will be rejected.
Task 1: Data Preparation (10%)
Have a look at the file Automobile.csv, which is available under the Assignments/Assignment 1 section of the course Canvas.
This Automobile dataset describes each car in terms of its various characteristics, its assigned insurance risk rating, and its normalised losses in use as compared to other cars. The original dataset was created by Jeffrey C. Schlimmer and donated to the UCI repository.
Below is a description of the attributes.
• Symboling: Insurance risk rating (+3 indicates high-risk auto; -3 indicates safe).
• Normalised-losses: Normalised losses in use as compared to other cars.
• Make: Make of the car.
• Fuel-type: Fuel type of the car.
• Aspiration: Aspiration of the car.
• Num-of-doors: Number of doors of the car.
• Body-style: Body of the car.
• Drive-wheels: Drive-type of the car.
• Engine-location: Location of the engine.
• Wheel-base: Measurement of wheel-base.
• Length: Length of the car.
• Width: Width of the car.
• Height: Height of the car.
• Curb-weight: The curb-weight of the car.
• Engine-type: The type of engine used in the car.
• Num-of-cylinders: Number of cylinders the engine has.
• Engine-size: The size of the engine.
• Fuel-system: The fuel system of the car.
• Bore: The bore of the cylinder.
• Stroke: The stroke of the cylinder (the distance the piston travels).
• Compression-ratio: Compression ratio of the car.
• Horsepower: Engine power.
• Peak-rpm: Peak Revolutions Per Minute.
• City-mpg: Miles Per Gallon for city-drive.
• Highway-mpg: Miles Per Gallon for highway-drive.
• Price: Price of the car.
Being a careful data scientist, you know that it is vital to check any available data carefully before starting to analyse it. Your task is to prepare the provided data for analysis. You will begin by loading the CSV data from the file (using appropriate pandas functions) and checking whether the loaded data is equivalent to the data in the source CSV file. Then, you need to clean the data using the techniques taught in the lectures. You need to deal appropriately with all potential issues and errors in the data, such as typos, extra whitespace, impossible values (found through sanity checks), and missing values.
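The loading and cleaning steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a prescribed solution: the `SAMPLE_CSV` text is a made-up stand-in for a few columns of Automobile.csv, the helper names (`load_raw`, `count_data_lines`, `clean`) are hypothetical, and the choices to treat `?` and empty strings as missing, to lower-case all text, and to blank out values that fail a sanity check are assumptions you would need to verify against the real file and justify in your report.

```python
import io

import pandas as pd

# Hypothetical sample imitating a few Automobile.csv columns. With the real
# file you would call load_raw("Automobile.csv") instead.
SAMPLE_CSV = """symboling,make,fuel-type,num-of-doors,price
3,  toyota ,gas,two,13495
1,Toyota,GAS,four,?
-2,volvo,diesel,four,-100
"""

# Assumed placeholders for missing values; confirm against the real data.
MISSING_TOKENS = ["", "?"]


def load_raw(source):
    """Read every column as text so nothing is silently converted on load."""
    return pd.read_csv(source, dtype=str, keep_default_na=False)


def count_data_lines(text):
    """Count non-empty lines after the header (assumes no multi-line fields)."""
    lines = [line for line in text.splitlines() if line.strip()]
    return len(lines) - 1


def clean(raw, numeric_columns=("symboling", "price")):
    """Return a cleaned copy of the raw text-only DataFrame."""
    df = raw.copy()
    for col in df.columns:
        # Remove extra whitespace and unify case so 'Toyota' == ' toyota '.
        values = df[col].str.strip().str.lower()
        # Replace the assumed missing-value placeholders with NaN.
        df[col] = values.mask(values.isin(MISSING_TOKENS))
    for col in numeric_columns:
        # Anything that still is not a number becomes NaN.
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Sanity checks: symboling must lie in [-3, 3]; price must be positive.
    df["symboling"] = df["symboling"].where(df["symboling"].between(-3, 3))
    df["price"] = df["price"].where(df["price"] > 0)
    return df


raw = load_raw(io.StringIO(SAMPLE_CSV))
# Equivalence check: the loaded row count should match the source file.
print(count_data_lines(SAMPLE_CSV) == len(raw))
cleaned = clean(raw)
print(cleaned.isna().sum())  # remaining missing values per column
```

Reading everything as text first makes it easier to compare the loaded frame with the source file before any type conversion hides a problem. How the remaining missing values are handled (dropping rows, imputing, or leaving them) is a separate decision that this sketch deliberately leaves open.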
Task 2: Data Exploration (10%)
Explore the provided data based on the following steps:
1. Choose 1 column with nominal values, 1 column with ordinal values, and 1 column with numerical values. (Please explore columns/attributes of potential importance to the analysis rather than choosing at random.) Then, create a visualisation for each of them.
2. Explore the relationships between columns. You need to choose three pairs of columns to focus on, and you need to generate one visualisation for each pair. Each pair of columns that you choose should address a plausible hypothesis for the data concerned.
3. Build a scatter matrix for all numerical columns.
Note: each visualisation (graph) should be complete and informative in itself, and clear enough for readers to extract the information it conveys.
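The three exploration steps above might be approached along the following lines. This is a minimal sketch, assuming pandas and matplotlib are available: the `demo` frame holds made-up values rather than rows from Automobile.csv, the helper names (`bar_of_counts`, `scatter_pair`, `numeric_scatter_matrix`) are hypothetical, and the chart types shown are just one reasonable choice among several.

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up values for illustration only; use your cleaned DataFrame instead.
demo = pd.DataFrame({
    "num-of-doors": ["two", "four", "four", "two", "four"],
    "engine-size": [130, 109, 136, 152, 108],
    "horsepower": [111, 102, 115, 154, 101],
    "price": [13495, 13950, 17450, 16500, 16430],
})


def bar_of_counts(df, column, order=None):
    """Bar chart of category frequencies.

    Pass `order` for an ordinal column so the bars follow the natural
    ordering instead of the frequency ordering.
    """
    counts = df[column].value_counts()
    if order is not None:
        counts = counts.reindex(order, fill_value=0)
    fig, ax = plt.subplots()
    counts.plot(kind="bar", ax=ax)
    ax.set_title(f"Number of cars by {column}")
    ax.set_xlabel(column)
    ax.set_ylabel("Number of cars")
    return ax


def scatter_pair(df, x, y):
    """Scatter plot of one column pair, labelled so it stands on its own."""
    fig, ax = plt.subplots()
    ax.scatter(df[x], df[y])
    ax.set_title(f"{y} against {x}")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return ax


def numeric_scatter_matrix(df):
    """Scatter matrix over every numerical column of the DataFrame."""
    numeric = df.select_dtypes(include="number")
    return scatter_matrix(numeric, figsize=(8, 8), diagonal="hist")
```

For example, `bar_of_counts(demo, "num-of-doors", order=["two", "four"])` draws the ordinal column in its natural order, `scatter_pair(demo, "engine-size", "price")` addresses a hypothesis such as "larger engines cost more", and `numeric_scatter_matrix(demo)` covers step 3. Each helper returns the matplotlib axes so that titles, units, or legends can be refined further.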