Assessed Course Learning Outcomes
CLO2: Analyse and apply strategies and technologies for effective data management that supports evidence-based decisions.
CLO3: Research organisational and societal problems using descriptive, predictive and prescriptive analytics models drawing on both internal and external data sources to generate insight, create value and support evidence-based decision making.
CLO5: Communicate effectively in a clear and concise written manner for both senior and middle management with correct and appropriate acknowledgment of the main ideas presented and discussed.
Task Rationale
Students will critically analyse the role of a data engineer and learn about the strategies and technologies that a data engineer can apply for effective data management to support the roles of data scientists and machine learning engineers. Students will research organisational and societal problems using descriptive, predictive and prescriptive models based on regression analysis and decision trees that draw on both internal and external data sources to generate insight, create value, and support evidence-based decision making.
Completing this Assessment 2 Report will enable students to develop themselves as well-informed individuals, critical and creative thinkers, and employable and enterprising professionals. They will be able to analyse and apply strategies and technologies for effective data management and research, build and evaluate descriptive, predictive, and prescriptive analytics models which can generate insight, create value, and support evidence-based decision making.
Task Instructions
Task 1 – Role of a Data Engineer (20 Marks – 1000 Words)
Assessed CLO2 and CLO5
Task 1.1: Discuss why a data engineer plays an essential role in designing, building, and maintaining the data infrastructure within an organization. (10 Marks – 500 Words)
Task 1.2: Discuss two key challenges of managing data in an organisation that a data engineer plays a critical role in addressing through possessing key technical skills and capabilities. (10 Marks – 500 Words)
Task 2 – Exploratory Data Analysis and Linear Regression Analysis (40 Marks)
Assessed CLO3 and CLO5
Ensure you use Altair AI Studio (previously RapidMiner Studio) for Task 2. Failure to do so may result in Task 2 not being marked, and zero marks will be awarded.
Carefully study the academic-salaries.csv data set and accompanying description of each variable. Each record in the dataset contains five independent variables that determine academic salaries, and one outcome dependent variable (salary).
Variables Description:
rank: Academic level of rank (AsstProf, AssocProf, Prof) – Nominal
discipline: A = theoretical departments, B = applied departments – Nominal
yrs.since.phd: Years since graduated with a PhD – Integer
yrs.service: Years of service at current University/College – Integer
sex: Female or Male – Nominal
salary: Nine-month salary in dollars (US) – Outcome dependent variable – Integer
Note: You should conduct desktop research to identify determinants/drivers of academic salary in order to fully understand and interpret the key findings of the exploratory data analysis (EDA) and of the subsequent Linear Regression Model predicting academic salaries using the academic-salaries.csv data set.
Briefly discuss the key findings of your exploratory data analysis summarised in Table 2.1 and the data preparation process you have undertaken, and provide justification for the variables that are most likely to predict academic salaries (salary) (20 marks 500 words).
Include all appropriate outputs such as Altair AI Studio Processes, Graphs and Tables that support key aspects of the exploratory data analysis and linear regression model analysis of the academic-salaries.csv data set in your Assignment 2 report. Note: export Processes and Graphs from Altair AI Studio using the File/Print/Export Image option, and include them in the Task 2 section or in Appendix 2 of the Assessment 2 report.
Task 3 – Predictive Analytics Case Study (40 Marks)
Assessed CLO3 and CLO5
Ensure you use Altair AI Studio (formerly RapidMiner Studio) for Task 3. Failure to do so may result in Task 3 not being marked, and zero marks will be awarded.
The goal of the Predictive Analytics Case Study is to predict whether a patient is likely to have a stroke or not (see Table 2 Data Dictionary for stroke-data.csv data set below).
Table 2 Data dictionary for stroke-data.csv
| Variable Name | Description | Data Type |
| --- | --- | --- |
| id | unique identifier | Numeric |
| gender | gender of the patient | Categorical: "Male", "Female" or "Other" |
| age | age of the patient | Numeric |
| hypertension | patient has hypertension | Binary: 0 = No (the patient does not have hypertension); 1 = Yes (the patient has hypertension) |
| heart_disease | patient has heart disease | Binary: 0 = No (the patient does not have heart disease); 1 = Yes (the patient has heart disease) |
| ever_married | patient has ever married | Categorical: "No" or "Yes" |
| work_type | patient's work type | Categorical: "children", "Govt_job", "Never_worked", "Private" or "Self-employed" |
| Residence_type | patient's type of residence | Categorical: "Rural" or "Urban" |
| avg_glucose_level | average glucose level in blood | Numeric |
| bmi | body mass index | Numeric |
| smoking_status | patient's smoking status | Categorical: "formerly smoked", "never smoked", "smokes" or "Unknown" ("Unknown" means that the information is unavailable for the patient) |
| stroke | patient had a stroke | Binary: 1 if the patient had a stroke, 0 if not |
In completing Task 3 you will apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process. It is important that you understand this data set in order to complete Task 3 and its three sub-tasks.
Task 3.1: Conduct an exploratory data analysis and data preparation of the stroke-data.csv data set using Altair AI Studio to understand the characteristics of each variable and its relationship to the other variables. Summarise your findings in a table named Table 3.1 Results of Exploratory Data Analysis and Data Preparation. For each variable, describe its key characteristics (such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values), its relationships with other variables, any transformation of existing variables, and any new variables created.
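The summary statistics requested for Table 3.1 come from Altair AI Studio, but it can help to see how they are defined. The following is a minimal Python sketch, not part of the assessment workflow. The function name `summarise_numeric` and the sample bmi values are hypothetical and are not taken from stroke-data.csv.

```python
import statistics


def summarise_numeric(values):
    """Summarise one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "std_dev": statistics.stdev(present),  # sample standard deviation
        "mode": statistics.mode(present),      # most frequent value
        "missing": len(values) - len(present),
    }


# Hypothetical bmi values for illustration only
bmi_sample = [22.0, 28.0, None, 28.0, 34.0]
summary = summarise_numeric(bmi_sample)
print(summary)
```

Missing values are excluded before the statistics are computed and counted separately, which mirrors the distinction Table 3.1 asks you to report.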
Hint: The Statistics Tab and Visualisations Tab in Altair AI Studio provide a lot of descriptive statistical information and useful charts, such as bar charts and scatterplots, required for Task 3.1. You might also like to run some correlations and/or chi-square tests, depending on whether a variable is categorical or numeric. Indicate in Table 3.1 which variables contribute most to predicting whether a patient is likely to have a stroke or not. You could also consider transforming some variables, creating new variables, and converting the target/label variable (stroke) into a binominal variable to facilitate analysis in Tasks 3.2 and 3.3.
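To illustrate what a chi-square test of independence measures for two categorical variables, here is a minimal Python sketch of the Pearson chi-square statistic. It is a cross-check under stated assumptions, not a replacement for the analysis in Altair AI Studio. The function name `chi_square_statistic` and the counts are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, degrees_of_freedom


# Hypothetical counts: rows = hypertension (0, 1), columns = stroke (0, 1)
counts = [[90, 10], [30, 20]]
chi2, dof = chi_square_statistic(counts)
print(chi2, dof)
```

The statistic is then compared with the chi-square distribution for the given degrees of freedom. With one degree of freedom, the 5% critical value is about 3.84, so a larger statistic suggests the two variables are associated.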
Briefly discuss the key findings of your exploratory data analysis and data preparation and justification for variables most likely to predict whether a patient is likely to have a stroke or not (20 marks 500 words).
Task 3.2: Build a Decision Tree model for predicting whether a patient is likely to have a stroke or not on the stroke-data.csv data set, using Altair AI Studio and a set of data mining operators determined in part by your exploratory data analysis in Task 3.1. Provide these outputs from Altair AI Studio: (1) the Final Decision Tree Model process, (2) the Final Decision Tree diagram and (3) the Decision Tree rules. Briefly explain your Final Decision Tree Model process, and discuss the results of the Final Decision Tree Model, drawing on the key outputs (Decision Tree diagram, Decision Tree rules), for predicting whether a patient is likely to have a stroke or not based on key contributing variables and relevant supporting literature on the interpretation of decision trees (10 marks 250 words).
Task 3.3: Validate your Final Decision Tree Model using the Cross-Validation Operator, Apply Model Operator and Performance Operator in your data mining process. Discuss the performance of the Final Decision Tree Model for predicting whether a patient is likely to have a stroke or not, based on the key results of the confusion matrix presented in Table 3.3 Model Performance Metrics.
Table 3.3 will summarise the following model performance metrics: (1) accuracy, (2) sensitivity, (3) specificity and (4) F1 score (10 marks 250 words).
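The four metrics in Table 3.3 are all derived from the cells of the confusion matrix. The Python sketch below shows the standard definitions. The function name `classification_metrics` and the counts are hypothetical, and the sketch assumes no denominator is zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive accuracy, sensitivity, specificity and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts in which the positive class (stroke = 1) is rare
metrics = classification_metrics(tp=30, fp=20, tn=930, fn=20)
print(metrics)
```

The hypothetical counts are chosen so that accuracy is 0.96 while sensitivity is only 0.60, which is why the task asks for all four metrics rather than accuracy alone.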
Note 1: the important outputs from the data mining analyses conducted in Altair AI Studio for Task 3 must be included in your Report 3 to provide support for your conclusions reached regarding each analysis conducted for 3.1, 3.2 and 3.3. Note you can export important outputs from Altair AI Studio as jpg image files and include these screenshots in the relevant Task 3 parts of your Assessment 3 Report.
Note 2: you will find the Altair AI Studio Tutorials useful references for the data mining process activities conducted in Task 3 in relation to the exploratory data analysis and data preparation, decision tree analysis and evaluation of the performance of the Final Decision Tree model. These concepts are covered in the Module Altair AI Studio Practicals and Altair AI Studio Tutorials contained within Altair AI Studio.
This assessment is designed to enable students to develop advanced knowledge and skills in data engineering, exploratory data analysis (EDA), predictive analytics, and decision tree modeling using Altair AI Studio. The assessment is structured into three key tasks aligned with specific Course Learning Outcomes (CLOs):
Task 1 – Role of a Data Engineer (20 Marks)
Focus:
Explain the essential role of a data engineer in designing, building, and maintaining data infrastructure.
Discuss two major challenges in managing data and how data engineers address them using their technical skills.
Task 2 – Exploratory Data Analysis and Linear Regression Analysis (40 Marks)
Focus:
Use the academic-salaries.csv dataset to conduct EDA and linear regression analysis to predict academic salaries.
Capture processes, summarize key findings, and justify variable selection.
Build a final linear regression model and analyze its performance based on statistical outputs (coefficients, p-values, etc.).
Task 3 – Predictive Analytics Case Study (40 Marks)
Focus:
Conduct exploratory data analysis and data preparation on the stroke-data.csv dataset.
Build a decision tree model to predict the likelihood of stroke occurrence.
Validate the model using cross-validation and performance metrics such as accuracy, sensitivity, specificity, and F1 score.
The overall objective is to develop critical thinking, analytical skills, and technical competency in data mining tools for evidence-based decision making, supporting the roles of data scientists and machine learning engineers.
The academic mentor began by carefully explaining the assessment structure and its learning objectives to the student. Emphasis was placed on how each task contributes to developing skills in data management, data analysis, and decision-making.
Approach:
The mentor advised the student to research the responsibilities of a data engineer using academic sources and industry reports. The student was guided to explore real-world examples to justify the data engineer's critical role in data infrastructure design, including data pipeline development and storage solutions.
For the challenges, the mentor encouraged a deep dive into common industry problems such as handling data quality issues and ensuring data security.
Outcome:
The student successfully outlined the data engineer's essential contributions and addressed two challenges (data quality and security), explaining how key technical skills (e.g., ETL processes, data warehousing technologies) solve these challenges.
EDA Process:
The mentor provided step-by-step guidance on using Altair AI Studio, beginning with loading the academic-salaries.csv dataset. The student was taught to explore variable distributions, check for missing values, and visualize relationships using bar charts and scatterplots.
Key Results & Variable Justification:
The mentor helped the student summarize statistics such as mean, mode, minimum, and maximum values, and identify potential predictors of academic salary, focusing on variables like rank, years since PhD, and years of service.
Linear Regression Model Process:
The student was guided to preprocess the data by converting nominal variables into numeric types, ensuring compatibility with the regression model. The mentor walked the student through setting up the regression process in Altair AI Studio, adding operators, and configuring parameters.
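One common way to convert a nominal variable into numeric form for regression is dummy (indicator) coding, sketched below in Python. This is an illustration of the idea only; the source does not state which conversion the student used in Altair AI Studio. The function name `dummy_code` and the choice of baseline level are assumptions.

```python
def dummy_code(values, baseline):
    """Turn a nominal column into 0/1 indicator columns, omitting the baseline level."""
    levels = sorted(set(values) - {baseline})
    return {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Levels of the rank variable from academic-salaries.csv; the rows are hypothetical
ranks = ["AsstProf", "Prof", "AssocProf", "Prof"]
encoded = dummy_code(ranks, baseline="AsstProf")
print(encoded)
```

Omitting one level as the baseline avoids perfectly collinear columns, and each remaining coefficient is then read as the difference from that baseline level.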
Outcome:
The student presented a comprehensive process flow screenshot, summarized the model outputs in a table, and explained key statistical results, such as the significance of coefficients and model fit (e.g., R² score). Relevant academic references supported the interpretation of findings.
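To make the coefficient and R² outputs concrete, the following Python sketch fits a one-predictor least-squares line and computes R². It is a simplified illustration: the assessment model has several predictors and is built in Altair AI Studio. The function name `fit_simple_regression` and the data (years of service against salary in thousands of dollars) are hypothetical.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for one predictor; returns slope, intercept and R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical values, not taken from academic-salaries.csv
years = [1, 2, 3, 4]
salary_k = [70, 80, 85, 105]
slope, intercept, r_squared = fit_simple_regression(years, salary_k)
print(slope, intercept, round(r_squared, 3))
```

Here the slope is the estimated change in salary for each additional year, and R² is the share of the variation in salary that the fitted line accounts for.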
Exploratory Data Analysis & Data Preparation:
Under the mentor’s supervision, the student explored the stroke-data.csv dataset using Altair’s Statistics and Visualizations tabs, performing correlation analysis and handling missing or invalid values. The student was encouraged to create new features and justify why variables like age, BMI, and hypertension are critical predictors.
Decision Tree Model Construction:
The mentor provided practical examples of using the Decision Tree operator in Altair AI Studio. The student learned to design the modeling process, generate the decision tree diagram, and extract decision rules.
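A decision tree grows by choosing, at each node, the split that best separates the classes according to some criterion. The Python sketch below uses information gain purely as an illustration. The criterion actually applied depends on how the Decision Tree operator is configured, which the source does not state. The function names and the labels are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, groups):
    """Entropy reduction when `labels` is partitioned into `groups` by a candidate split."""
    total = len(labels)
    weighted = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - weighted


# Hypothetical stroke labels and two candidate splits
labels = [1, 1, 0, 0]
gain_perfect = information_gain(labels, [[1, 1], [0, 0]])  # split separates the classes
gain_useless = information_gain(labels, [[1, 0], [1, 0]])  # split changes nothing
print(gain_perfect, gain_useless)
```

A split that separates the classes completely has the highest gain, and one that leaves the class mix unchanged has none. Variables chosen near the root of the tree are the ones that score best on the configured criterion.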
Model Validation:
The student applied cross-validation and used performance metrics (accuracy, sensitivity, specificity, F1 score) to assess model effectiveness, guided by the mentor’s explanations on interpreting the confusion matrix.
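The idea behind cross-validation can be sketched in a few lines of Python: the rows are divided into k folds, and each fold is held out once for testing while the model is trained on the rest. This is a simplified illustration with no shuffling or stratification, not a description of how the Cross-Validation Operator is implemented. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists so that every row is tested exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_rows=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

The performance metrics are computed on each held-out fold and then combined, so the reported result reflects rows the model did not see during training.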
Outcome:
The student captured and documented the process flow, model diagram, rules, and performance metrics. Key variables contributing to stroke prediction (such as age and hypertension) were discussed and justified with supporting literature.
By following the academic mentor’s structured approach, the student achieved the following outcomes:
Comprehensive Understanding of Data Engineer Role (CLO2, CLO5):
The student successfully explained the importance of data infrastructure and key challenges data engineers face, supported by technical examples.
Proficiency in Data Analysis (CLO3, CLO5):
The student conducted exploratory data analysis and regression modeling using Altair AI Studio, providing statistical insights and justified variable selection. The process was well-documented with clear visuals and explanations.
Development of Predictive Models (CLO3, CLO5):
Through the case study, the student applied the CRISP-DM methodology, built a decision tree model, validated its performance, and demonstrated an understanding of model interpretation and accuracy.
Effective Communication:
The final report was clear, concise, and structured, using appropriate tables, figures, and academic referencing as per assessment expectations.
This assessment enabled the student to develop critical analytical skills and technical competence required for effective data management and predictive analytics. The structured guidance provided by the academic mentor ensured the student could approach the assessment systematically, achieving all learning outcomes while producing a high-quality, evidence-based report.