Saharan Africa, Europe and Central Asia,Data - IT Assignment Help

Download Solution Order New Solution
Assignment Task

 

Task

 For all programming, please use Python; tasks for data processing should use PySpark. Please answer the questions in a single Python file, which you can return to us.Setup:Visit the World Bank databank’s In the top of the panel on the right, click on “Add Country”, “Add Series” and “Add Time” and, for each of those, click “Select All”, exit out of that window, and click on “Apply Changes”. You should have 242 countries, 166 economic series and 27 years.2. Programmatically or manually, download the table as a CSV file. Additionally, download the metadata for Country, Series, Country-Series and Series-Time.Questions:Using PySpark, please answer the following questions about the data set (a Python function for each question that returns the answer would be best):1. For low-income countries (as defined in the Country metadata), which are the top 5 in terms of having experienced the largest use of electricity access since 2010?2. For each country region (e.g., Sub-Saharan Africa, Europe and Central Asia, etc.), compute the population-weighted percentage of female employment.3. How many new internet users were there between 2000 and 2015?4. Which are the top 10 countries that increased their GDP’s reliance on agriculture between 2005 and 2012?5. For all countries and all years in the data set, which experienced the 20 largest contractions from year-to-year in foreign direct investment net inflows (FDINI) as % of GDP? The answer should be a list (Country, Year, FDINI change) triples.6. Same as question 5, but rather than percentage changes, use the GDP per capita in 2010 constant US$ and the total population in order to rank the decrease in absolute FDINI (i.e., return a list of the 20 largest 2010 constant US$ contractions in FDINI).7. Describe a scheme to store and persist the series data that is designed/optimized for the queries in the previous questions, assuming there is a much, much greater amount of data. Which fields would you select as your primary/partition key?8. Let’s define ‘knowledge date’ as the date that a particular value was available to be queried from your database system, and assume each annual data series is assembled by the 7th day of the following year (i.e., the knowledge date for any value in our dataset for 2003 is compiled on 01/07/2004). However, also assume any data point could be revised or corrected at any time; i.e., effectively resulting in multiple knowledge dates (one for each value) for the same data date for the corrected item.Make the necessary changes to the table to handle this bi-temporality, and introduce two simulated corrections to the life expectancy at birth (Females) in the Czech Republic for 2003:a. Changing it from the original value of 78.5 to 78.7, with an effective correction date of 06/15/2005.b. Changing it again to 78.9, with an effective correction date of 07/11/2005.9. Think how you’d write a generic query to obtain the same results with a given as-of date (i.e., the query should return the same results you would have obtained if you walked back in time to that as-of date and ran it with the most recent data available then). For example, if the as-of date was 03/24/2004, a query for the female life expectancy of European countries should return the 78.5 value for CZK. Of course, the query cannot know whether a specific value has been corrected.Use this to return the 5-year population-weighted moving average of female life expectancy in countries that belong to the Europe and Central Asia region between 2000 and 2010, assuming the following as-of dates: 03/24/2004, 06/20/2005, and 06/20/2015.9 questions
 


This IT Assignment has been solved by our IT Experts at My Uni Paper. Our Assignment Writing Experts are efficient to provide a fresh solution to this question. We are serving more than 10000+Students in Australia, UK & US by helping them to score HD in their academics. Our Experts are well trained to follow all marking rubrics & referencing style.
    
Be it a used or new solution, the quality of the work submitted by our assignment Experts remains unhampered. You may continue to expect the same or even better quality with the used and new assignment solution files respectively. There’s one thing to be noticed that you could choose one between the two and acquire an HD either way. You could choose a new assignment solution file to get yourself an exclusive, plagiarism (with free Turnitin file), expert quality assignment or order an old solution file that was considered worthy of the highest distinction.

Get It Done! Today

Country
Applicable Time Zone is AEST [Sydney, NSW] (GMT+11)
+

Every Assignment. Every Solution. Instantly. Deadline Ahead? Grab Your Sample Now.