Exam Conditions:
This is an open book exam.
Instructions to Students:
• Write your answers after each question. Please do not change the order of the questions.
• Your submission is to be made as a single Microsoft Word document – no other format is acceptable.
• Answer all questions and submit your Microsoft Word document on Blackboard in the Assessment area (Task 2).
• Please rename this file to “ICT707_T2 _FirstName_LastName_ID.docx” for submission.
Submission Declaration:
The submission will be checked by SafeAssign. By submitting this assessment item, you declare that your submission is your own work and is in accordance with the University’s Student Academic Integrity Policy.
Section A – Short Answer Questions (20 marks total, 5 marks each)
1. What are the 4 categories of data analytics? In your own words, discuss the differences between them. Give one example for each category.
Answer:
2. In your own words, explain the features of an RDD and compare it with a Python list.
Answer:
3. The Hadoop Distributed File System (HDFS) provides robust and reliable file management for the Hadoop ecosystem. Suppose you are building an online text editing platform that allows users to edit text articles with HTML-style formatting. Would you recommend using HDFS to store users’ files? Why or why not?
Answer:
4. Amdahl’s Law sets an upper limit on the speed improvement that can be expected from running parts of an algorithm in parallel.
a) Assume 60% of the program can be parallelised and the speedup for this part is 10. What is the speedup for the whole program? Please explain your calculation.
b) Assume 60% of the program can be parallelised and there are unlimited resources to boost the speed. What is the theoretical speedup for the whole program? Please explain your calculation.
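For reference, the general form of Amdahl’s Law can be sketched in Python as below. This is a minimal illustration, not part of the question: the function names `amdahl_speedup` and `amdahl_limit` are hypothetical, and the sample values (50% parallelisable, speedup of 4) are deliberately different from those in parts a) and b).

```python
import math


def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when `parallel_fraction` of the runtime is made
    `parallel_speedup` times faster and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    # New runtime (relative to 1.0) = serial part + shortened parallel part.
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


def amdahl_limit(parallel_fraction: float) -> float:
    """Theoretical upper bound as the parallel speedup grows without limit:
    only the serial fraction of the runtime remains."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return math.inf if serial_fraction == 0.0 else 1.0 / serial_fraction


# Illustrative values only (assumed, not taken from the question).
example_speedup = amdahl_speedup(0.5, 4)   # 1 / (0.5 + 0.5 / 4)
example_limit = amdahl_limit(0.5)          # 1 / 0.5
```

With these sample values, the serial half of the program dominates: even an unlimited speedup of the parallel half cannot make the whole program more than twice as fast.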
Section B – Coding Questions (total 20 marks)
1. [10 marks] The following code implements the Word Count application.
Please answer the following questions based on the given code (2 marks each).
a) Explain why we use flatMap() in Line 5.
b) Complete Line 7 to only keep the words with more than 4 characters.
c) Explain the purpose of Line 9.
d) Explain the purpose of Line 11.
e) Write the content/value of “words” after executing Line 13.
2. [10 marks] We have a csv file “flights.csv” consisting of the following data:
ID,Start,Destination,Date,Airline,Distance
0001,Brisbane,Sydney,15/01/2000,Qantas,100
(many other records)