1. Introduction
The primary purpose of this project is to introduce students to the different technical aspects involved in this course and in subsequent projects. By the end of this project, a student will have gained experience in the following:
• Setting up an AWS account and a simple EC2 instance.
• Learning about the Twitter API: querying Twitter using keywords and language filters, and retrieving tweets from user timelines. Make sure to store your data, as you will be using it in the final project.
• Setting up a Solr instance (Solr is a fully functional text search engine written in Java) and understanding basic Solr terminology and concepts.
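Querying Twitter with keywords and language filters largely comes down to composing a query string. The following is a minimal sketch, assuming the Twitter API v2 recent-search operator syntax (space-separated clauses are ANDed, OR combines alternatives, and lang: and -is:retweet are operators); the function name build_query and its parameters are hypothetical, and the older v1.1 search endpoint uses different conventions (for example, a separate lang request parameter).

```python
def build_query(topic_terms, policy_terms, lang, exclude_retweets=True):
    """Build a search query requiring one topic term AND one policy term.

    Assumes Twitter API v2 recent-search syntax: space-separated clauses
    are ANDed, OR combines alternatives, and lang:/-is:retweet are operators.
    """
    def group(terms):
        # OR the alternatives together and wrap them in parentheses
        return "(" + " OR ".join(terms) + ")"

    clauses = [group(topic_terms), group(policy_terms), f"lang:{lang}"]
    if exclude_retweets:
        # Dropping retweets at query time helps stay under the 15% retweet cap
        clauses.append("-is:retweet")
    return " ".join(clauses)


query = build_query(["covid", "#COVID19"], ["government", "lockdown"], "en")
print(query)  # prints: (covid OR #COVID19) (government OR lockdown) lang:en -is:retweet
```

Keeping the topic terms and the policy terms in separate OR-groups means every matching tweet must mention both, which mirrors the ‘covid AND government’ tip; swapping the lang argument reuses the same term lists for Hindi or Italian once they are translated.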
The specific challenges in completing this project are as follows:
• Curating the Twitter personalities who hold significant influence over their followers, and writing a script to retrieve replies to their tweets.
• Curating keywords and hashtags for effective data collection, so that the collected data can be used for downstream tasks such as classification and sentiment analysis.
• Working with multilingual data, and learning to index and tokenize it for search.
• Setting up the Solr instance correctly to accommodate language-specific and Twitter-specific requirements.
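One common way to handle multilingual text in Solr is to route each tweet's text into a language-specific field whose field type applies an analyzer suited to that language (tokenizer, stopwords, stemming). The sketch below is illustrative only: the field names text_en, text_hi, text_it, tweet_lang, country, tweet_date and hashtags are hypothetical and would have to be declared in your Solr schema with appropriate field types, and the shape of the input tweet dict is likewise an assumption.

```python
# Hypothetical mapping from tweet language code to a language-specific text field.
# Each field is assumed to be declared in the Solr schema with an analyzer
# suited to that language.
LANG_FIELDS = {"en": "text_en", "hi": "text_hi", "it": "text_it"}


def to_solr_doc(tweet):
    """Convert a (hypothetically shaped) tweet dict into a Solr document dict."""
    lang = tweet["lang"]
    doc = {
        "id": str(tweet["id"]),  # store the tweet id as a string key
        "tweet_lang": lang,
        "country": tweet["country"],
        "tweet_date": tweet["created_at"],
        # Lower-case hashtags so that #COVID19 and #covid19 become the same value
        "hashtags": [h.lower() for h in tweet.get("hashtags", [])],
    }
    # Route the text to the field whose analyzer matches the tweet's language;
    # tweets in any other language keep their metadata but get no text field.
    field = LANG_FIELDS.get(lang)
    if field is not None:
        doc[field] = tweet["text"]
    return doc


example = {
    "id": 1,
    "lang": "it",
    "country": "Italy",
    "created_at": "2021-09-10T12:00:00Z",
    "text": "Nuove regole del governo sul green pass",
    "hashtags": ["GreenPass"],
}
print(to_solr_doc(example))
```

A list of such dicts can be serialized as a JSON array and sent to the collection's /update handler. Using separate per-language fields, rather than one shared text field, lets Hindi, English and Italian each be tokenized by an analyzer that understands that language.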
Task 1: Content Curation and Ingestion
Curate the required set of query terms, language filters, persons of interest and combinations thereof to crawl and index tweets, subject to the following requirements:
1. At least 40,000 tweets in total, of which no more than 15% are retweets.
2. At least 5,000 tweets per language, i.e., English, Hindi and Italian.
3. At least 5,000 tweets per country.
4. At least 500 tweets per person of interest (POI), taken from the POI’s timeline. Note that Twitter allows at most the 3,200 most recent tweets to be retrieved from a user’s timeline.
5. For a window of at least 5 days in each country:
a. At least 1 tweet per day by each of the POIs.
b. At least 3,000 tweets collected per day that capture the general public’s reactions to the government’s COVID policies. Tip: curate query terms (hashtags, keywords, mentions) that extract tweets relevant to both COVID and government policy (e.g., ‘covid AND government’).
c. This means that, in the collected data, the tweet date must take at least 5 distinct values, and for each such day there must be at least 3,000 tweets.
d. In other words, you cannot collect, say, 20,000 tweets on one day and split the rest between two other days.
e. Keep in mind that the search index has a 7-day limit: no tweets will be found for dates older than one week.
f. Additionally, you cannot collect tweets that merely talk about COVID in general (e.g., “I hope #COVID19 goes away before The Batman is released next year!!!” is not a reaction to the government’s policy).
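These requirements are easy to miscount by hand, so it may help to check the stored data programmatically before indexing. The sketch below assumes each stored tweet is a dict with hypothetical keys lang, country, date ("YYYY-MM-DD"), is_retweet and poi (None for general-public tweets), that every general-public tweet was gathered by a COVID-policy query, and that each country has at least one POI. It cannot judge whether a tweet is genuinely a reaction to government policy (item f); that still needs manual curation of the query terms.

```python
from collections import Counter, defaultdict

# Thresholds taken from the Task 1 requirements.
MIN_TOTAL, MAX_RETWEET_SHARE = 40_000, 0.15
MIN_PER_LANG, MIN_PER_COUNTRY, MIN_PER_POI = 5_000, 5_000, 500
MIN_GENERAL_PER_DAY, MIN_DAYS = 3_000, 5
LANGS = ("en", "hi", "it")


def check_collection(tweets, poi_country):
    """Return a list of violated requirements (an empty list means all pass).

    tweets: dicts with hypothetical keys lang, country, date ("YYYY-MM-DD"),
            is_retweet (bool) and poi (POI screen name, or None).
    poi_country: maps each POI screen name to that POI's country.
    """
    problems = []
    countries = set(poi_country.values())

    total = len(tweets)
    if total < MIN_TOTAL:
        problems.append(f"total: {total} tweets (need {MIN_TOTAL})")
    retweets = sum(1 for t in tweets if t["is_retweet"])
    if total and retweets / total > MAX_RETWEET_SHARE:
        problems.append(f"retweets: {retweets / total:.1%} of tweets (max 15%)")

    by_lang = Counter(t["lang"] for t in tweets)
    for lang in LANGS:
        if by_lang[lang] < MIN_PER_LANG:
            problems.append(f"language {lang}: {by_lang[lang]} tweets")

    by_country = Counter(t["country"] for t in tweets)
    for country in sorted(countries):
        if by_country[country] < MIN_PER_COUNTRY:
            problems.append(f"country {country}: {by_country[country]} tweets")

    by_poi = Counter(t["poi"] for t in tweets if t["poi"] is not None)
    for poi in sorted(poi_country):
        if by_poi[poi] < MIN_PER_POI:
            problems.append(f"POI {poi}: {by_poi[poi]} timeline tweets")

    # Requirement 5: per country, count the distinct dates on which every POI
    # of that country tweeted AND at least 3,000 general-public tweets exist.
    general = Counter()          # (country, date) -> general-public tweet count
    poi_days = defaultdict(set)  # (country, date) -> POIs who tweeted that day
    for t in tweets:
        key = (t["country"], t["date"])
        if t["poi"] is None:
            general[key] += 1
        else:
            poi_days[key].add(t["poi"])
    for country in sorted(countries):
        required = {p for p, c in poi_country.items() if c == country}
        good_days = {
            date
            for (c, date), n in general.items()
            if c == country
            and n >= MIN_GENERAL_PER_DAY
            and required <= poi_days[(c, date)]
        }
        if len(good_days) < MIN_DAYS:
            problems.append(f"country {country}: {len(good_days)} qualifying days")
    return problems
```

Following item c, the sketch counts distinct qualifying dates rather than requiring consecutive days. Running it after each crawl shows which languages, countries, POIs or days still need more tweets while they are still inside the 7-day search window.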