SIT772: Database and Information Retrieval - Vector & Boolean Model - IT Assignment Help

Internal Code: 1HBBJ

IT Assignment Help:

Task:

Question 1

Suppose you have joined a search engine development team to design a search algorithm based on both the Vector model and the Boolean model. You have collected the following (unstructured) documents and plan to apply an indexing technique to convert them into an inverted index.

Doc 1: Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing.

Doc 2: Information retrieval is finding material of an unstructured nature that satisfies an information need from within large collections.

Doc 3: Information systems is the study of complementary networks of hardware and software that people and organizations use to collect, filter, process, create, and distribute data.

In the process of creating the inverted index, complete the following steps:

a. Remove all stop words and punctuation, and then apply Porter’s stemming algorithm to the documents. The list of stop words for this task is as follows: Is, The, Of, To, An, A, From, Can, Be, On, Or, That, Within, And, Use
b. Create a merged inverted list that includes the within-document frequency of each term.
c. Use the index created in part (b) to create a dictionary and the related posting file.
d. Test the inverted index using the following keywords: information, system, index
e. Design three Boolean queries (for example, web AND search) and list the relevant documents for each query.
f. Use the Vector model to query the inverted index, and compare the result with that of the Boolean model. (Hint: you can use cosine similarity and set a similarity threshold.)
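Steps (a) to (f) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a model answer. All function names are hypothetical. The `stem` argument defaults to an identity placeholder, so Porter's stemming is not actually applied here. Hyphens are treated as punctuation, so "full-text" becomes two tokens. The tf × log(N/df) weighting is one common choice that the task does not prescribe.

```python
import math
import re
from collections import Counter, defaultdict

# Stop-word list from step (a), lowercased.
STOP_WORDS = {"is", "the", "of", "to", "an", "a", "from", "can", "be",
              "on", "or", "that", "within", "and", "use"}
DOCS = {
    1: ("Information retrieval is the activity of obtaining information "
        "resources relevant to an information need from a collection of "
        "information resources. Searches can be based on full-text or "
        "other content-based indexing."),
    2: ("Information retrieval is finding material of an unstructured nature "
        "that satisfies an information need from within large collections."),
    3: ("Information systems is the study of complementary networks of "
        "hardware and software that people and organizations use to collect, "
        "filter, process, create, and distribute data."),
}

def identity(token):
    """Placeholder stemmer (assumption): returns the token unchanged."""
    return token

def tokenize(text, stem=identity):
    """Step (a): lowercase, drop punctuation and stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def build_index(docs, stem=identity):
    """Step (b): map each term to {doc_id: within-document frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text, stem)).items():
            index[term][doc_id] = tf
    return dict(index)

def dictionary_and_postings(index):
    """Step (c): dictionary of term -> (df, offset) plus a flat posting list."""
    dictionary, postings = {}, []
    for term in sorted(index):
        dictionary[term] = (len(index[term]), len(postings))
        postings.extend(sorted(index[term].items()))
    return dictionary, postings

def boolean_and(index, *terms):
    """Step (e): documents containing every term."""
    sets = [set(index.get(t, {})) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *terms):
    """Step (e): documents containing at least one term."""
    return set().union(*(index.get(t, {}) for t in terms))

def cosine_scores(index, n_docs, query_terms):
    """Step (f): cosine similarity between tf*idf query and document vectors."""
    idf = {t: math.log(n_docs / len(p)) for t, p in index.items()}
    doc_vecs = defaultdict(dict)
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            doc_vecs[doc_id][term] = tf * idf[term]
    q_vec = {t: c * idf[t] for t, c in Counter(query_terms).items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    scores = {}
    for doc_id, vec in doc_vecs.items():
        dot = sum(w * vec.get(t, 0.0) for t, w in q_vec.items())
        d_norm = math.sqrt(sum(w * w for w in vec.values()))
        scores[doc_id] = dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
    return scores

index = build_index(DOCS)
print(index["information"])                            # {1: 4, 2: 2, 3: 1}
print(boolean_and(index, "information", "retrieval"))  # {1, 2}
print(cosine_scores(index, len(DOCS), ["retrieval"]))
```

To apply real Porter stemming, a stemmer can be passed in as `stem`, for example `PorterStemmer().stem` from `nltk.stem` (an external dependency). Query terms must then be stemmed the same way. Note that "information" occurs in all three documents, so its idf is log(3/3) = 0 under this weighting. A Boolean query for it returns every document, while the vector model scores every document zero, which is one concrete difference to discuss in step (f).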
