CSE 435/535: Information Retrieval - Content Curation, Ingestion, Tokenization and Indexing - IT Assignment Help

Assignment Task:

1. Introduction

The primary purpose of this project is to introduce students to the technical aspects involved in this course and its subsequent projects. By the end of this project, a student will have achieved the following:

• Setting up an AWS account and a simple EC2 instance in AWS

• Learning about the Twitter API, querying Twitter using keywords and language filters, and retrieving tweets from user timelines. Make sure to store your data, as you will use it in the final project.

• Setting up a Solr instance (Solr is a fully functional text search engine written in Java) and understanding basic Solr terminology and concepts.
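The keyword and language-filter querying mentioned above can be prepared offline before any API call is made. The following is a minimal sketch, assuming hypothetical seed term lists and a hypothetical helper named `build_queries` (neither comes from the assignment). It only produces boolean query strings and language codes; how a particular Twitter client accepts them is left open.

```python
from itertools import product


def build_queries(topic_terms, policy_terms, languages):
    """Pair every topic term with every policy term, once per language.

    Returns a list of (query_string, language_code) tuples. Passing them
    to an actual Twitter client is left to the caller.
    """
    queries = []
    for lang, topic, policy in product(languages, topic_terms, policy_terms):
        queries.append((f"{topic} AND {policy}", lang))
    return queries


# Hypothetical seed lists; real ones would be curated per language and country.
queries = build_queries(
    ["covid", "#COVID19"],
    ["government", "lockdown"],
    ["en", "hi", "it"],  # ISO 639-1 codes for English, Hindi, Italian
)
print(len(queries))
print(queries[0])
```

Generating the combinations up front makes it easy to review the curated terms and to log which query produced which tweets.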

The specific challenges in completing this project are as follows:

• Curate the Twitter personalities who hold significant influence over their followers and write a script to retrieve replies to their tweets.

• Curate keywords and hashtags for effective data collection that can be used for downstream tasks such as classification and sentiment analysis.

• Work with multilingual data and learn to index and tokenize it for search.

• Correctly set up the Solr instance to accommodate language-specific and Twitter-specific requirements.
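Because Solr configures analysis (tokenizers and filters) per field type, one common way to handle multilingual tweets is to copy each tweet's text into a field dedicated to its language. The sketch below illustrates that idea only; the field names (`text_en`, `text_hi`, `text_it`, `tweet_lang`, `tweet_text`) and the input keys are assumptions for illustration, not names prescribed by the assignment.

```python
# Hypothetical mapping from a tweet's language code to a language-specific
# text field; the field names are illustrative, not prescribed by the task.
LANG_FIELDS = {"en": "text_en", "hi": "text_hi", "it": "text_it"}


def to_index_doc(tweet):
    """Build a flat document dict from a tweet dict with 'id', 'lang', 'text'."""
    doc = {
        "id": tweet["id"],
        "tweet_lang": tweet["lang"],
        "tweet_text": tweet["text"],
    }
    field = LANG_FIELDS.get(tweet["lang"])
    if field is not None:
        # Copy the text into the field whose analyzer matches the language.
        doc[field] = tweet["text"]
    return doc


doc = to_index_doc({"id": "1", "lang": "it", "text": "Il governo annuncia nuove misure"})
print(sorted(doc))
```

Tweets in a language outside the mapping keep only the generic fields, so they can still be stored without being analyzed by the wrong language chain.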

Task 1: Content Curation and Ingestion

Curate the required set of query terms, language filters, persons of interest (POIs) and combinations thereof to crawl and index tweets, subject to the following requirements:

1. At least 40,000 tweets in total with not more than 15% being retweets.

2. At least 5,000 tweets per language, i.e., English, Hindi and Italian.

3. At least 5,000 tweets per country

4. At least 500 tweets per person of interest (timeline). Note that Twitter allows a maximum of 3,200 recent tweets to be extracted from a person’s account.

5. During a window of at least 5 days in each country:

a. At least 1 tweet by each of the POIs per day

b. At least 3,000 tweets collected per day that are reactions of the general public to the government’s policies on COVID. Tip: Curate query terms (hashtags/keywords/mentions) that can extract tweets relevant to both COVID and government policy (e.g., ‘covid AND government’).

c. This means that for the collected data, the tweet date must have at least 5 distinct values, and for each such day there must be at least 3,000 tweets.

d. Essentially, you cannot collect, say, 20,000 tweets on one day and split the rest between two other days.

e. Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week.

f. Additionally, you cannot collect tweets that talk about COVID only in general terms (e.g., “I hope #COVID19 goes away before The Batman is released next year!!!” is not a reaction to the government’s policy).
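The numeric requirements above can be checked mechanically before submission. The following is a simplified sketch, assuming each collected tweet has been reduced to a dict with the illustrative keys `lang`, `date` (a `YYYY-MM-DD` string) and `is_retweet`; the function name `check_corpus` is hypothetical. It covers the total count, the retweet share, the per-language minimum and the daily-volume rule from item 5c, and deliberately ignores the per-country, per-POI and topical-relevance requirements.

```python
from collections import Counter


def check_corpus(tweets, min_total=40_000, max_retweet_share=0.15,
                 min_per_lang=5_000, min_per_day=3_000, min_days=5,
                 languages=("en", "hi", "it")):
    """Return a dict of requirement name -> bool for a list of tweet dicts.

    Each tweet dict is assumed to carry 'lang', 'date' (YYYY-MM-DD string)
    and 'is_retweet' (bool); these keys are illustrative.
    """
    total = len(tweets)
    retweets = sum(1 for t in tweets if t["is_retweet"])
    per_lang = Counter(t["lang"] for t in tweets)
    per_day = Counter(t["date"] for t in tweets)
    # Days that individually reach the daily minimum.
    full_days = [d for d, n in per_day.items() if n >= min_per_day]
    return {
        "total": total >= min_total,
        "retweet_share": total > 0 and retweets / total <= max_retweet_share,
        "per_language": all(per_lang[lang] >= min_per_lang for lang in languages),
        "daily_volume": len(full_days) >= min_days,
    }


# Tiny made-up sample with lowered thresholds, just to show the call shape.
sample = [
    {"lang": "en", "date": "2021-09-01", "is_retweet": False},
    {"lang": "hi", "date": "2021-09-02", "is_retweet": False},
    {"lang": "it", "date": "2021-09-02", "is_retweet": True},
]
report = check_corpus(sample, min_total=3, max_retweet_share=0.5,
                      min_per_lang=1, min_per_day=1, min_days=2)
print(report)
```

Counting days that each reach the minimum, rather than averaging over the window, reflects item 5d: a single very large day cannot compensate for thin days.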
