Introduction
Australia is formally defined by more than “Statistical Area Level 2” (SA2) distinct geographical regions, designed to represent communities of between 3,000 and 25,000 people “that interact together socially and economically”. In this assignment, we’ll focus on the 350+ SA2s within the Greater Sydney area, and you will be tasked with spatially integrating several datasets of various formats to calculate a score for how “well-resourced” each region is.
Task 1
Import all datasets (clean if required) into your PostgreSQL server, using a well-defined data schema. These sources include:
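As one illustration of the clean-then-import step, the sketch below pairs a PostGIS table definition with a small CSV cleaning function. Everything named here is hypothetical: the table and column names (`sa2_regions`, `population`, `sa2_code`, `total_people`), the SRID 4326, and the sample codes are assumptions for illustration, not part of the supplied datasets. The actual load would go through a PostgreSQL driver, which is left out so that the example stays self-contained.

```python
import csv
import io

# Hypothetical PostGIS table for SA2 boundaries. The column names and the
# SRID (4326) are assumptions, not taken from the assignment data.
SA2_DDL = """
CREATE TABLE IF NOT EXISTS sa2_regions (
    sa2_code   VARCHAR(9) PRIMARY KEY,
    sa2_name   VARCHAR(100) NOT NULL,
    area_sqkm  NUMERIC,
    geom       GEOMETRY(MULTIPOLYGON, 4326)
);
"""

# Parameterised statement in the placeholder style used by common
# PostgreSQL drivers; the table name is hypothetical.
INSERT_POPULATION = (
    "INSERT INTO population (sa2_code, total_people) VALUES (%s, %s);"
)


def clean_population_rows(csv_text):
    """Parse CSV text and return (sa2_code, total_people) tuples.

    Rows with a missing code or a non-numeric total are dropped, and
    thousands separators such as "7,530" are removed before conversion.
    """
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = (row.get("sa2_code") or "").strip()
        raw_total = (row.get("total_people") or "").replace(",", "").strip()
        if not code or not raw_total.isdigit():
            continue  # skip rows that cannot be trusted
        cleaned.append((code, int(raw_total)))
    return cleaned


# Made-up sample: one valid row, one with no code, one with a bad total.
sample = 'sa2_code,total_people\n100000001,"7,530"\n,100\n100000002,n/a\n'
rows = clean_population_rows(sample)
```

Each tuple in `rows` could then be passed as parameters to `INSERT_POPULATION`, so that values are never spliced into the SQL text by hand.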
Task 2
Compute a score for how “well-resourced” each individual neighbourhood is according to the following formula, where S is the sigmoid function, z is the normalised z-score, and ‘young people’ are defined as anyone aged 0-19. Feel free to only calculate scores for SA2 regions with a population of at least 100, and you are welcome to extend the scoring function however you deem necessary, so long as a rational explanation is provided (e.g. other mathematical standardisation techniques, mitigating the impact of outliers, calculating some metrics per-capita or per-sqkm, etc).
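A minimal sketch of the scoring step follows. It assumes the score is the sigmoid of a plain sum of per-metric z-scores, i.e. something of the form S(z_retail + z_health + z_stops + z_polls + z_schools), which is inferred from the Task 4 description rather than stated here. The metric names, the sample values, and the use of the population standard deviation are all assumptions.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every region has the same value, so the metric carries no signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def resource_scores(metrics):
    """Return one score per region.

    `metrics` maps a metric name to a list of per-region values, with
    every list in the same region order.
    """
    columns = [z_scores(values) for values in metrics.values()]
    return [sigmoid(sum(region_zs)) for region_zs in zip(*columns)]


# Made-up values for three regions, already expressed per capita.
example = {"stops": [10, 20, 30], "schools": [1, 2, 3]}
scores = resource_scores(example)
```

A region that sits exactly at the mean on every metric receives a score of 0.5, which gives a natural reference point when interpreting the results.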
Task 3
Extend the score by sourcing one additional dataset for each group member, and then incorporating all new datasets into your scoring function. For full marks, at least one dataset should be of spatial data, and at least one should be of a type not used so far in this assignment (e.g. JSON, XML, or collated via web scraping). As an example of subject matter for your additional datasets, they could focus on positive aspects for a region such as public facilities or other census statistics, or negative impacts such as crime rates or car accidents.
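One possible way to fold in both positive and negative datasets is to give each metric a sign before summing its z-scores, so that a negative impact lowers the score. The sketch below is only an illustration of that idea: the metric names (`libraries`, `crime`), the sample values, and the choice of a simple +1/-1 sign rather than tuned weights are assumptions.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def extended_scores(metrics, signs):
    """Sigmoid of a signed z-score sum, one score per region.

    `metrics` maps metric name -> per-region values (same region order).
    `signs` maps metric name -> +1 or -1; metrics not listed count as +1.
    """
    n_regions = len(next(iter(metrics.values()), []))
    totals = [0.0] * n_regions
    for name, values in metrics.items():
        sign = signs.get(name, 1)
        for i, z in enumerate(z_scores(values)):
            totals[i] += sign * z
    return [1.0 / (1.0 + math.exp(-t)) for t in totals]


# Made-up values: the third region has the most libraries and least crime.
example = {"libraries": [1, 2, 3], "crime": [3, 2, 1]}
with_penalty = extended_scores(example, {"crime": -1})
without_penalty = extended_scores(example, {})
```

Without the sign, the two made-up metrics cancel and every region scores 0.5; with crime counted negatively, the third region ranks highest.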
For either version of your scoring function (or both!), the following subtasks should also be achieved:
Task 4: Advanced Class Only
There are two additional components for DATA2901 students.
1. Create a new version of your score using ranks (r) rather than z-scores (z). As a theoretical example, rather than considering a particular SA2 to have 42 public transport stops, you would use the fact that this would rank it 14th of the This will require a new standardisation technique other than the simple sigmoid z-score summation of before, so additionally consider how to convert these values into a comparable, interpretable score. Compare this new score to your previous one from Task 2 - discuss their differences, and conclude which (if any) is more reliable.
$\text{Score}_{\text{adv}} = f(r_{\text{retail}}, r_{\text{health}}, r_{\text{stops}}, r_{\text{polls}}, r_{\text{schools}})$
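The function f is left open, so the sketch below shows just one candidate: convert each rank to a value between 0 and 1 (best rank gives 1, worst gives 0) and average across metrics. The tie rule (tied regions share the best rank), the averaging, and the sample values are assumptions made for illustration.

```python
def ranks_desc(values):
    """Rank 1 is the largest value; tied values share the best rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def rank_to_unit(rank, n):
    """Map a rank in 1..n onto [0, 1], with rank 1 -> 1 and rank n -> 0."""
    if n <= 1:
        return 0.5  # a single region has no meaningful rank
    return (n - rank) / (n - 1)


def rank_scores(metrics):
    """Average the rescaled ranks across metrics, one score per region.

    `metrics` maps metric name -> per-region values (same region order).
    """
    columns = []
    for values in metrics.values():
        n = len(values)
        columns.append([rank_to_unit(r, n) for r in ranks_desc(values)])
    return [sum(col) / len(col) for col in zip(*columns)]


# Made-up values: the third region is an extreme outlier on stops.
outlier_example = rank_scores({"stops": [5, 50, 5000]})
tie_example = ranks_desc([42, 10, 42, 5])
```

Because only the ordering matters, the outlier of 5000 stops is treated the same as any other first-placed value, which is one of the differences from the z-score version worth discussing in the comparison.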
2. Use a supervised or unsupervised machine learning technique to add further depth to your results. This task is intentionally broad to allow creative applications, but some examples could include:
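As one unsupervised possibility, SA2s could be grouped by their standardised metrics with k-means clustering. The sketch below is a plain-Python version of Lloyd's algorithm, offered only as an illustration: the choice of k-means, the two made-up features per region, and the caller-supplied starting centroids are all assumptions, and in practice a library implementation would normally be preferred.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm with caller-supplied initial centroids.

    Returns (labels, centroids), where labels[i] is the index of the
    centroid nearest to points[i].
    """
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Made-up (z_stops, z_schools) pairs for four regions.
regions = [(-1.0, -1.2), (-0.8, -1.0), (1.0, 0.9), (1.2, 1.1)]
labels, centres = kmeans(regions, [regions[0], regions[2]])
```

The resulting cluster labels could be mapped back onto SA2 boundaries to see whether similarly resourced regions are also geographically close.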