CS6008: Business Intelligence Assessment 2


Task Overview

Assessed Course Learning Outcomes

  • CLO2: Analyse and apply strategies and technologies for effective data management that supports evidence-based decisions.

  • CLO3: Research organisational and societal problems using descriptive, predictive and prescriptive analytics models drawing on both internal and external data sources to generate insight, create value and support evidence-based decision making.

  • CLO5: Communicate effectively in a clear and concise written manner for both senior and middle management with correct and appropriate acknowledgment of the main ideas presented and discussed.

Task Rationale

Students will critically analyse the role of a data engineer and learn about the strategies and technologies that a data engineer can apply for effective data management to support the roles of data scientists and machine learning engineers. Students will research organisational and societal problems using descriptive, predictive and prescriptive models based on regression analysis and decision trees that draw on both internal and external data sources to generate insight, create value, and support evidence-based decision making.

Completing this Assessment 2 Report will enable students to develop themselves as well-informed individuals, critical and creative thinkers, and employable and enterprising professionals. They will be able to analyse and apply strategies and technologies for effective data management and research, build and evaluate descriptive, predictive, and prescriptive analytics models which can generate insight, create value, and support evidence-based decision making.

Task Instructions

Task 1 – Role of a Data Engineer (20 Marks – 1000 Words)
Assessed CLO2 and CLO5

  • Task 1.1: Discuss why a data engineer plays an essential role in designing, building, and maintaining the data infrastructure within an organization. (10 Marks – 500 Words)

  • Task 1.2: Discuss two key challenges of managing data in an organisation that a data engineer plays a critical role in addressing through possessing key technical skills and capabilities. (10 Marks – 500 Words)

Task 2 – Exploratory Data Analysis and Linear Regression Analysis (40 Marks)
Assessed CLO3 and CLO5

Ensure you use Altair AI Studio (previously RapidMiner Studio) for Task 2. Failure to do so may result in Task 2 not being marked, and zero marks will be awarded.

Carefully study the academic-salaries.csv data set and accompanying description of each variable. Each record in the dataset contains five independent variables that determine academic salaries, and one outcome dependent variable (salary).

Variables Description:

  • rank: Academic level of rank (AsstProf, AssocProf, Prof) – Nominal

  • discipline: A = theoretical departments, B = applied departments – Nominal

  • yrs.since.phd: Years since graduated with a PhD – Integer

  • yrs.service: Years of service at current University/College – Integer

  • sex: Female or Male – Nominal

  • salary: Nine-month salary in dollars (US) – Outcome dependent variable – Integer

Note: You should conduct desktop research to identify the determinants/drivers of academic salary in order to fully understand and interpret the key findings of the exploratory data analysis (EDA) and of the subsequent Linear Regression Model predicting academic salaries using the academic-salaries.csv data set.

  • Conduct and report on an exploratory data analysis (EDA) of the academic-salaries.csv data set using the Altair AI Studio (formerly RapidMiner Studio) data mining tool. Note that this will require the use of a number of data mining operators. Provide the following for Task 2.1:
    • A screen capture of your final EDA process, with a brief description of your EDA process
    • A summary of the key results of your exploratory data analysis in Table 2.1 Results of Exploratory Data Analysis for academic-salaries.csv. Table 2.1 should include the key characteristics of each variable.

Briefly discuss the key findings of your exploratory data analysis summarised in Table 2.1 and the data preparation process you have undertaken, and justify which variables are most likely to predict academic salaries (salary) (20 marks, 500 words).

  • Build and report on a final Linear Regression model for predicting academic salaries (salary) using an Altair AI Studio (formerly RapidMiner Studio) data mining process and an appropriate set of data mining operators for the academic-salaries.csv data set, as determined by your exploratory data analysis in Task 2.1. Note that this will require the use of a number of data mining operators, and all variables need to be of numeric type to run a linear regression model. Provide the following for Task 2.2:
    • A screen capture of your Final Linear Regression Model process, with a brief description of that process
    • Table 2 Results of Final Linear Regression Model for academic-salaries.csv data set
    • A discussion of the results and performance of the Final Linear Regression Model for the academic-salaries.csv data set, drawing on key outputs (coefficients, standardised coefficients, t-statistic values, p-values, significance levels, etc.) for predicting academic salaries (salary) and on relevant supporting literature on the interpretation of a Linear Regression Model (20 marks, 500 words)
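The requirement that all variables be numeric is commonly met by dummy (one-hot) coding the nominal variables rank, discipline and sex. In Altair AI Studio this is done with an operator (for example Nominal to Numerical) rather than with code, so the Python sketch below only illustrates the idea. The records are invented, the helper `dummy_code` is hypothetical, and the choice of reference levels is an assumption made for the example.

```python
# Illustrative records only; these values are invented, not taken from academic-salaries.csv.
records = [
    {"rank": "Prof", "discipline": "B", "sex": "Male", "yrs.service": 18},
    {"rank": "AsstProf", "discipline": "A", "sex": "Female", "yrs.service": 3},
    {"rank": "AssocProf", "discipline": "B", "sex": "Male", "yrs.service": 9},
]


def dummy_code(rows, column, reference):
    """Replace a nominal column with 0/1 indicator columns, omitting the reference level."""
    levels = sorted({row[column] for row in rows} - {reference})
    coded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for level in levels:
            new_row[f"{column}={level}"] = 1 if row[column] == level else 0
        coded.append(new_row)
    return coded


# Assumed reference levels: AsstProf, discipline A and Female.
numeric_rows = records
for column, reference in [("rank", "AsstProf"), ("discipline", "A"), ("sex", "Female")]:
    numeric_rows = dummy_code(numeric_rows, column, reference)

print(numeric_rows[0])
```

Dropping one level per variable avoids perfect collinearity with the intercept. Each remaining coefficient is then read as the salary difference relative to the omitted reference level, holding the other variables constant.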

Include all appropriate outputs such as Altair AI Studio Processes, Graphs and Tables that support key aspects of exploratory data analysis and linear regression model analysis of the academic-salaries.csv data set in your Assignment 2 report. Note: export Processes and Graphs from Altair AI Studio using File/Print/Export Image option, include in Task 2 section or in Appendix 2 of Assessment 2 report.

Task 3 – Predictive Analytics Case Study (40 Marks)
Assessed CLO3 and CLO5

Ensure you use Altair AI Studio (formerly RapidMiner Studio) for Task 3. Failure to do so may result in Task 3 not being marked, and zero marks will be awarded.

The goal of the Predictive Analytics Case Study is to predict whether a patient is likely to have a stroke or not (see Table 2 Data Dictionary for stroke-data.csv data set below).

Table 2 Data dictionary for stroke-data.csv

| Variable Name | Description | Data Type |
| --- | --- | --- |
| id | unique identifier | Numeric |
| gender | gender of the patient | Categorical: "Male", "Female" or "Other" |
| age | age of the patient | Numeric |
| hypertension | patient has hypertension | Binary: 0 = No (the patient does not have hypertension), 1 = Yes (the patient has hypertension) |
| heart_disease | patient has heart disease | Binary: 0 = No (the patient does not have heart disease), 1 = Yes (the patient has heart disease) |
| ever_married | patient has ever married | Categorical: "No" or "Yes" |
| work_type | patient's work type | Categorical: "children", "Govt_job", "Never_worked", "Private" or "Self-employed" |
| Residence_type | patient's type of residence | Categorical: "Rural" or "Urban" |
| avg_glucose_level | average glucose level in blood | Numeric |
| bmi | body mass index | Numeric |
| smoking_status | patient's smoking status | Categorical: "formerly smoked", "never smoked", "smokes" or "Unknown" ("Unknown" means that information is unavailable for the patient) |
| stroke | patient had a stroke | Binary: 1 if the patient had a stroke, 0 if not |

In completing Task 3 you will apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process. It is important that you understand this data set in order to complete Task 3 and its three sub-tasks.

3.1 Exploratory data analysis and data preparation

Conduct an exploratory data analysis and data preparation of the stroke-data.csv data set using Altair AI Studio to understand the characteristics of each variable and the relationship of each variable to the other variables. Summarise your findings in a table named Table 3.1 Results of Exploratory Data Analysis and Data Preparation, describing the key characteristics of each variable in the stroke-data.csv data set (such as maximum and minimum values, average, standard deviation, most frequent values (mode), missing values and invalid values), its relationships with other variables, any transformation of existing variables, and any creation of new variables.
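As a rough illustration of what the per-variable summary in Table 3.1 contains, the Python sketch below computes the same kinds of statistics for one numeric and one nominal column. It does not replace the required Altair AI Studio analysis. The helper names `summarise_numeric` and `summarise_nominal` are hypothetical, the values are invented, and missing entries are assumed to be represented as `None`.

```python
import statistics
from collections import Counter


def summarise_numeric(values):
    """Return basic descriptive statistics, treating None as a missing value."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "std_dev": statistics.stdev(present),  # sample standard deviation
        "missing": len(values) - len(present),
    }


def summarise_nominal(values):
    """Return the most frequent value (mode), its count, and the number of missing values."""
    present = [v for v in values if v is not None]
    mode, count = Counter(present).most_common(1)[0]
    return {"mode": mode, "mode_count": count, "missing": len(values) - len(present)}


# Invented values for illustration; not real rows from stroke-data.csv.
bmi = [22.0, 30.0, None, 26.0, 26.0]
smoking_status = ["never smoked", "smokes", "never smoked", "Unknown", None]

bmi_summary = summarise_numeric(bmi)
smoking_summary = summarise_nominal(smoking_status)
print(bmi_summary)
print(smoking_summary)
```

Note that a value such as "Unknown" counts as a valid category here. Whether to treat it as missing is a data preparation decision that should be justified in the report.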

Hint: The Statistics tab and Visualisations tab in Altair AI Studio provide a lot of descriptive statistical information and useful charts, such as bar charts and scatterplots, required for Task 3.1. You might also like to run some correlations and/or chi-square tests, depending on whether a variable is categorical or numeric. Indicate in Table 3.1 which variables contribute most to predicting whether a patient is likely to have a stroke or not. You could also consider transforming some variables, creating new variables, and converting the target/label variable (stroke) into a binominal variable to facilitate analysis in Tasks 3.2 and 3.3.
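To make the chi-square hint concrete, the sketch below computes the Pearson chi-square statistic for a contingency table of two categorical variables. It is an illustration in plain Python, not the Altair AI Studio procedure. The function name `chi_square_statistic` is hypothetical and the counts are invented.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected
    return statistic


# Invented counts: rows = hypertension (0, 1), columns = stroke (0, 1).
observed = [[90, 10], [30, 20]]
chi2 = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), dof)
```

With one degree of freedom, a statistic above 3.841 indicates an association at the 5% significance level. The test assumes that the expected counts are not too small.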

Briefly discuss the key findings of your exploratory data analysis and data preparation and justification for variables most likely to predict whether a patient is likely to have a stroke or not (20 marks 500 words).

3.2 Decision Tree Model

Build a Decision Tree model for predicting whether a patient is likely to have a stroke or not, on the stroke-data.csv data set using Altair AI Studio and a set of data mining operators in part determined by your exploratory data analysis in Task 3.1. Provide these outputs from Altair AI Studio (1) Final Decision Tree Model process, (2) Final Decision Tree diagram and (3) Decision tree rules. Briefly explain your final Decision Tree Model Process, and discuss the results of Final Decision Tree Model drawing on key outputs (Decision Tree Diagram, Decision Tree Rules) for predicting whether a patient is likely to have a stroke or not based on key contributing variables and relevant supporting literature on interpretation of decision trees (10 marks 250 words).

3.3 Final Decision Tree Model Validation and Performance

You will need to validate your Final Decision Tree Model using the Cross-Validation Operator, Apply Model Operator and Performance Operator in your data mining process. Discuss the performance of the Final Decision Tree Model for predicting whether a patient is likely to have a stroke or not based on key results of the confusion matrix presented in Table 3.3 Model Performance Metrics.
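The idea behind cross-validation can be sketched in a few lines of Python: the rows are divided into k folds, and each fold in turn is held out for testing while the model is trained on the rest. This is a plain, unshuffled k-fold split for illustration only. How the Cross-Validation operator samples rows (for example shuffled or stratified) depends on its settings, and the function names here are hypothetical.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_rows, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_rows, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(train_test_splits(n_rows=10, k=3))
for train, test in splits:
    print(len(train), len(test))
```

Every row is used for testing exactly once, so the performance figures are averaged over predictions on data the model did not see during training.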

Table 3.3 will summarise the following model performance metrics: (1) accuracy, (2) sensitivity, (3) specificity and (4) F1 score (10 marks, 250 words).
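All four metrics follow directly from the counts in the confusion matrix. The sketch below shows the standard definitions, assuming stroke = 1 is treated as the positive class and that no denominator is zero. The function name `classification_metrics` is hypothetical and the counts are invented.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # recall, or true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Invented counts, with stroke = 1 assumed to be the positive class.
metrics = classification_metrics(tp=30, fp=20, tn=930, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When one class is rare, accuracy can be high even though sensitivity is modest, as in these invented counts. This is why the four metrics should be discussed together rather than relying on accuracy alone.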

Note 1: The important outputs from the data mining analyses conducted in Altair AI Studio for Task 3 must be included in your Assessment 2 Report to support the conclusions reached for each analysis conducted in 3.1, 3.2 and 3.3. You can export important outputs from Altair AI Studio as jpg image files and include these screenshots in the relevant Task 3 parts of your Assessment 2 Report.

Note 2: You will find the Altair AI Studio Tutorials useful references for the data mining process activities conducted in Task 3 in relation to the exploratory data analysis and data preparation, decision tree analysis and evaluation of the performance of the Final Decision Tree model. These concepts are covered in the Module Altair AI Studio Practicals and Altair AI Studio Tutorials contained within Altair AI Studio.

Summary of Assessment Requirements

This assessment is designed to enable students to develop advanced knowledge and skills in data engineering, exploratory data analysis (EDA), predictive analytics, and decision tree modeling using Altair AI Studio. The assessment is structured into three key tasks aligned with specific Course Learning Outcomes (CLOs):

  1. Task 1 – Role of a Data Engineer (20 Marks)
    Focus:

    • Explain the essential role of a data engineer in designing, building, and maintaining data infrastructure.

    • Discuss two major challenges in managing data and how data engineers address them using their technical skills.

  2. Task 2 – Exploratory Data Analysis and Linear Regression Analysis (40 Marks)
    Focus:

    • Use the academic-salaries.csv dataset to conduct EDA and linear regression analysis to predict academic salaries.

    • Capture processes, summarize key findings, and justify variable selection.

    • Build a final linear regression model and analyze its performance based on statistical outputs (coefficients, p-values, etc.).

  3. Task 3 – Predictive Analytics Case Study (40 Marks)
    Focus:

    • Conduct exploratory data analysis and data preparation on the stroke-data.csv dataset.

    • Build a decision tree model to predict the likelihood of stroke occurrence.

    • Validate the model using cross-validation and performance metrics such as accuracy, sensitivity, specificity, and F1 score.

The overall objective is to develop critical thinking, analytical skills, and technical competency in data mining tools for evidence-based decision making, supporting the roles of data scientists and machine learning engineers.

Step-by-Step Process Guided by the Academic Mentor

Step 1: Understanding Assessment Requirements

The academic mentor began by carefully explaining the assessment structure and its learning objectives to the student. Emphasis was placed on how each task contributes to developing skills in data management, data analysis, and decision-making.

Step 2: Task 1 – Role of a Data Engineer

  • Approach:
    The mentor advised the student to research the responsibilities of a data engineer using academic sources and industry reports. The student was guided to explore real-world examples to justify the data engineer's critical role in data infrastructure design, including data pipeline development and storage solutions.
    For the challenges, the mentor encouraged a deep dive into common industry problems such as handling data quality issues and ensuring data security.

  • Outcome:
    The student successfully outlined the data engineer's essential contributions and addressed two challenges (data quality and security), explaining how key technical skills (e.g., ETL processes, data warehousing technologies) solve these challenges.

Step 3: Task 2 – Exploratory Data Analysis (EDA) and Linear Regression

  • EDA Process:
    The mentor provided step-by-step guidance on using Altair AI Studio, beginning with loading the academic-salaries.csv dataset. The student was taught to explore variable distributions, check for missing values, and visualize relationships using bar charts and scatterplots.

  • Key Results & Variable Justification:
    The mentor helped the student summarize statistics such as mean, mode, minimum, and maximum values, and identify potential predictors of academic salary, focusing on variables like rank, years since PhD, and years of service.

  • Linear Regression Model Process:
    The student was guided to preprocess the data by converting nominal variables into numeric types, ensuring compatibility with the regression model. The mentor walked the student through setting up the regression process in Altair AI Studio, adding operators, and configuring parameters.

  • Outcome:
    The student presented a comprehensive process flow screenshot, summarized the model outputs in a table, and explained key statistical results, such as the significance of coefficients and model fit (e.g., R² score). Relevant academic references supported the interpretation of findings.
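For readers unfamiliar with the model fit measure mentioned above, R² is one minus the ratio of the residual sum of squares to the total sum of squares. The Python sketch below shows that calculation on invented salary figures. The function name `r_squared` is hypothetical, and none of the numbers come from the student's model.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Invented salaries and predictions, for illustration only.
actual = [80000, 95000, 120000, 105000]
predicted = [85000, 90000, 115000, 110000]
r2 = r_squared(actual, predicted)
print(round(r2, 3))
```

A value close to 1 means the model accounts for most of the variation in salary, while a value near 0 means it does little better than predicting the mean salary for everyone.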

Step 4: Task 3 – Predictive Analytics Case Study

  • Exploratory Data Analysis & Data Preparation:
    Under the mentor’s supervision, the student explored the stroke-data.csv dataset using Altair’s Statistics and Visualizations tabs, performing correlation analysis and handling missing or invalid values. The student was encouraged to create new features and justify why variables like age, BMI, and hypertension are critical predictors.

  • Decision Tree Model Construction:
    The mentor provided practical examples of using the Decision Tree operator in Altair AI Studio. The student learned to design the modeling process, generate the decision tree diagram, and extract decision rules.

  • Model Validation:
    The student applied cross-validation and used performance metrics (accuracy, sensitivity, specificity, F1 score) to assess model effectiveness, guided by the mentor’s explanations on interpreting the confusion matrix.

  • Outcome:
    The student captured and documented the process flow, model diagram, rules, and performance metrics. Key variables contributing to stroke prediction (such as age and hypertension) were discussed and justified with supporting literature.

Final Outcome and Learning Objectives Achieved

By following the academic mentor’s structured approach, the student achieved the following outcomes:

  • Comprehensive Understanding of Data Engineer Role (CLO2, CLO5):
    The student successfully explained the importance of data infrastructure and key challenges data engineers face, supported by technical examples.

  • Proficiency in Data Analysis (CLO3, CLO5):
    The student conducted exploratory data analysis and regression modeling using Altair AI Studio, providing statistical insights and justified variable selection. The process was well-documented with clear visuals and explanations.

  • Development of Predictive Models (CLO3, CLO5):
    Through the case study, the student applied the CRISP-DM methodology, built a decision tree model, validated its performance, and demonstrated an understanding of model interpretation and accuracy.

  • Effective Communication:
    The final report was clear, concise, and structured, using appropriate tables, figures, and academic referencing as per assessment expectations.

Conclusion

This assessment enabled the student to develop critical analytical skills and technical competence required for effective data management and predictive analytics. The structured guidance provided by the academic mentor ensured the student could approach the assessment systematically, achieving all learning outcomes while producing a high-quality, evidence-based report.
