Question
1. This question provides multiple CSV files (in the zipped folder), each with ‘large texts’ in one of its columns. Your job is to use open-source NLP (Natural Language Processing) libraries to perform the tasks below.
Task
i. Extract the ‘text’ in all the CSV files and store them into a single ‘.txt file’.
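One way to approach task i is sketched below using only the standard library. The brief does not name the text column, so the sketch assumes it is the first header containing "text"; `find_text_column`, `combine_csv_text` and the paths passed to them are hypothetical names chosen for illustration.

```python
import csv
from pathlib import Path


def find_text_column(fieldnames):
    """Return the first header whose lower-cased name contains 'text', else None.

    Assumption: the brief does not give the column name, so this is a guess.
    """
    for name in fieldnames or []:
        if "text" in name.lower():
            return name
    return None


def combine_csv_text(csv_dir, out_path):
    """Write the text column of every CSV in csv_dir to out_path, one cell per line.

    Returns the number of non-empty cells written.
    """
    # Large text cells can exceed the csv module's default field size limit.
    csv.field_size_limit(2**31 - 1)
    rows_written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for csv_path in sorted(Path(csv_dir).glob("*.csv")):
            with open(csv_path, newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                column = find_text_column(reader.fieldnames)
                if column is None:
                    continue  # no recognisable text column in this file
                for row in reader:  # stream rows so big files stay out of memory
                    text = (row.get(column) or "").strip()
                    if text:
                        out.write(text + "\n")
                        rows_written += 1
    return rows_written
```

A call such as `combine_csv_text("unzipped_folder", "combined.txt")` would then produce the single ‘.txt file’; both names are placeholders for wherever the zipped folder was extracted.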
ii. Research: Install the libraries spaCy and scispaCy (models ‘en_core_sci_sm’ / ‘en_ner_bc5cdr_md’) and Transformers (Hugging Face), together with any biomedical model (such as BioBERT) that can detect drugs, diseases, etc. in the text.
iii. Programming and Research
Using any built-in Python library, count the occurrences of each word in the text (.txt) file and report the ‘Top 30’ most common words. Store these ‘Top 30’ words and their counts in a CSV file.
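A minimal sketch of the word count, assuming that a "word" is a run of lower-cased ASCII letters with an optional internal apostrophe (the brief does not define it). `top_words` and `save_counts` are hypothetical helper names.

```python
import csv
import re
from collections import Counter

# Assumption: a word is letters only, optionally with one apostrophe inside it.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def top_words(txt_path, n=30):
    """Return the n most common words in txt_path as (word, count) pairs."""
    counts = Counter()
    with open(txt_path, encoding="utf-8") as f:
        for line in f:  # read line by line so a large file is never fully in memory
            counts.update(WORD_RE.findall(line.lower()))
    return counts.most_common(n)


def save_counts(pairs, csv_path):
    """Write (word, count) pairs to csv_path under a 'word,count' header."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        writer.writerows(pairs)
```

Typical use would be `save_counts(top_words("combined.txt"), "top30_words.csv")`, with both file names as placeholders. Stop words such as "the" will dominate unless they are filtered out first, which the brief leaves open.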
Using the ‘AutoTokenizer’ class in the ‘Transformers’ library, write a function that counts the unique tokens in the text (.txt) file and gives the ‘Top 30’ tokens.
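The token count can be sketched along the same lines. The brief does not name a checkpoint, so the model name is left as a parameter; `load_tokenizer` and `top_tokens` are hypothetical names, and `transformers` is imported lazily so the counting helper works with any object that exposes a `tokenize` method.

```python
from collections import Counter


def load_tokenizer(model_name):
    """Load a Hugging Face tokenizer by checkpoint name (requires `transformers`)."""
    from transformers import AutoTokenizer  # lazy import: optional dependency

    return AutoTokenizer.from_pretrained(model_name)


def top_tokens(txt_path, tokenizer, n=30):
    """Return the n most common tokens in txt_path as (token, count) pairs.

    `tokenizer` is any object with a `tokenize(str) -> list[str]` method,
    such as the object returned by load_tokenizer.
    """
    counts = Counter()
    with open(txt_path, encoding="utf-8") as f:
        for line in f:  # tokenize line by line to keep memory use flat
            counts.update(tokenizer.tokenize(line))
    return counts.most_common(n)
```

With a WordPiece tokenizer such as the one shipped with `bert-base-uncased`, rare words are split into pieces prefixed with `##`, so the top tokens will not match the plain word count exactly. A call might look like `top_tokens("combined.txt", load_tokenizer("bert-base-uncased"))`, where the file name and the checkpoint are assumptions.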
iv. Named-Entity Recognition (NER): Extract the ‘diseases’ and ‘drugs’ entities from the ‘.txt file’ separately using ‘en_core_sci_sm’ / ‘en_ner_bc5cdr_md’ and BioBERT, then compare the two models (for example: the total entities detected by each, how the results differ, and the most common words each one finds).
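The comparison in task iv could be sketched as follows. All function names are hypothetical, and the transformer checkpoint is left as a parameter because the brief does not specify one; it would need to be a BioBERT-style model fine-tuned for token classification, since a base checkpoint has no trained NER head. The sketch compares lower-cased entity strings because label names differ between models.

```python
from collections import Counter


def spacy_entities(text, model_name="en_ner_bc5cdr_md"):
    """Return (entity text, label) pairs from a scispaCy model.

    en_ner_bc5cdr_md labels entities as DISEASE or CHEMICAL.
    """
    import spacy  # lazy import: requires spaCy and the model package

    nlp = spacy.load(model_name)
    return [(ent.text.lower(), ent.label_) for ent in nlp(text).ents]


def transformer_entities(text, model_name):
    """Return (entity text, label) pairs from a token-classification checkpoint."""
    from transformers import pipeline  # lazy import: optional dependency

    ner = pipeline("ner", model=model_name, aggregation_strategy="simple")
    return [(e["word"].lower(), e["entity_group"]) for e in ner(text)]


def by_label(entities, label):
    """Keep only the pairs whose label matches, ignoring case."""
    return [(t, l) for t, l in entities if l.lower() == label.lower()]


def compare_entities(ents_a, ents_b, top_n=10):
    """Summarise two entity lists: totals, unique counts, overlap, most common."""
    counts_a = Counter(text for text, _ in ents_a)
    counts_b = Counter(text for text, _ in ents_b)
    return {
        "total_a": sum(counts_a.values()),
        "total_b": sum(counts_b.values()),
        "unique_a": len(counts_a),
        "unique_b": len(counts_b),
        "shared": sorted(set(counts_a) & set(counts_b)),
        "only_a": sorted(set(counts_a) - set(counts_b)),
        "only_b": sorted(set(counts_b) - set(counts_a)),
        "top_a": counts_a.most_common(top_n),
        "top_b": counts_b.most_common(top_n),
    }
```

Two practical caveats apply. spaCy rejects inputs longer than `nlp.max_length` (1,000,000 characters by default) and BERT-style models accept a limited number of tokens per call, so the combined file would have to be processed in chunks. Also, ‘en_core_sci_sm’ marks generic entity spans without disease or drug types, so ‘en_ner_bc5cdr_md’ is the model that can separate the two.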
2. Here’s an adventurous story intertwined with Python programming questions that involve nested for loops, conditional statements, string manipulations, and more.