Objective
Clean and standardize entity names (sponsoring government agencies and awarded companies) from SAM.gov (USA) and the E-Procurement Government of India databases.
Part 1: Data Cleaning and Standardization
Manually clean and standardize a subset of 100 entity-name records from the provided datasets.
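One way to make the manual cleaning repeatable is to write each rule applied by hand as deterministic code, so the 100 cleaned records also serve as a rule specification. The sketch below assumes a small abbreviation table; the table contents and the function name `normalize_name` are hypothetical and would in practice be derived from patterns seen in the cleaned subset.

```python
import re
import unicodedata

# Hypothetical abbreviation table; a real one would come from the manually cleaned subset.
ABBREVIATIONS = {
    "DEPT": "DEPARTMENT",
    "GOVT": "GOVERNMENT",
    "CORP": "CORPORATION",
    "PVT": "PRIVATE",
    "LTD": "LIMITED",
    "INC": "INCORPORATED",
}


def normalize_name(raw: str) -> str:
    """Apply deterministic cleaning rules to one entity name."""
    # Fold accented characters to ASCII (e.g. "é" -> "e") and drop anything unmappable.
    text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    text = text.upper()
    # Keep the meaning of "&" before punctuation is stripped.
    text = text.replace("&", " AND ")
    # Replace every run of non-alphanumeric characters with a single space.
    text = re.sub(r"[^A-Z0-9]+", " ", text)
    # Expand known abbreviations token by token and collapse whitespace.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


print(normalize_name("Tata Consultancy Services Pvt. Ltd."))
```

Expanding suffixes (LTD to LIMITED) rather than abbreviating them is an arbitrary choice here; what matters for consistency is that one direction is picked and applied everywhere.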
Part 2: Automation Proposal and Script Development
Develop a basic automation script or method using Python and language models (e.g., the OpenAI API or Llama 2) to standardize entity names in the datasets.
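A common design for this kind of script is a tiered pipeline: cheap exact and fuzzy matching against a list of canonical names first, with a language model consulted only for the names those steps cannot resolve. The sketch below assumes such a canonical list exists and uses the standard-library `difflib` for fuzzy matching; `standardize`, `basic_normalize`, and the `llm_fallback` parameter are hypothetical names, and the fallback is left as a plain callable so any model (OpenAI API, Llama 2, or another) could be plugged in without this code depending on a specific client library.

```python
import difflib
import re
from typing import Callable, Optional


def basic_normalize(raw: str) -> str:
    """Uppercase, strip punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^A-Z0-9]+", " ", raw.upper()).split())


def standardize(
    raw: str,
    canonical_names: list[str],
    llm_fallback: Optional[Callable[[str, list[str]], str]] = None,
    cutoff: float = 0.85,
) -> tuple[str, str]:
    """Return (standardized_name, method) for one raw entity name."""
    cleaned = basic_normalize(raw)
    # Map each normalized canonical form back to its preferred spelling.
    lookup = {basic_normalize(name): name for name in canonical_names}
    if cleaned in lookup:
        return lookup[cleaned], "exact"
    # Fuzzy match on normalized strings; difflib scores with SequenceMatcher.ratio().
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    if matches:
        return lookup[matches[0]], "fuzzy"
    if llm_fallback is not None:
        # Only hard cases reach the model; its answer should still be validated downstream.
        return llm_fallback(raw, canonical_names), "llm"
    return cleaned, "unmatched"


canonical = ["Department of Defense", "Tata Consultancy Services Limited"]
print(standardize("Tata Consultancy Services Ltd.", canonical))
```

Returning the method alongside the name keeps an audit trail, so records resolved by the model or left unmatched can be routed to manual review.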
Part 3: Scalability and Production Readiness
Document how the proposed method can be scaled and implemented in a production environment.
Details:
Include considerations for continuous data updates and for processing large volumes of data. Explain how the method maintains data quality and adheres to standards.
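For continuous updates, one approach is to process incoming records in fixed-size batches and keep a cache of names already standardized, so each distinct raw name is resolved only once across runs. The sketch below assumes an in-memory `dict` standing in for what would likely be a persistent key-value store or database table in production; `chunked`, `process_incrementally`, and `standardize_batch` are hypothetical names, with `standardize_batch` representing whatever rule-based or model-based step is used.

```python
from typing import Callable, Iterable, Iterator


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` items."""
    batch: list[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def process_incrementally(
    names: Iterable[str],
    cache: dict[str, str],
    standardize_batch: Callable[[list[str]], list[str]],
    batch_size: int = 1000,
) -> dict[str, int]:
    """Standardize only names missing from `cache`; return simple counters."""
    stats = {"seen": 0, "new": 0}
    for batch in chunked(names, batch_size):
        stats["seen"] += len(batch)
        # Deduplicate within the batch (preserving order) and skip cached names.
        todo = [name for name in dict.fromkeys(batch) if name not in cache]
        if todo:
            results = standardize_batch(todo)
            cache.update(zip(todo, results))
        stats["new"] += len(todo)
    return stats
```

Because `names` can be any iterable, records can be streamed from a file or query without loading the full dataset into memory, and the counters give a cheap signal for monitoring each update run.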
Evaluation Criteria
Standards & Quality: Accuracy and consistency of the final cleaned and standardized data.
Scalability: The potential of the method to handle large datasets efficiently in a production environment.
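Accuracy and consistency can both be measured against the manually cleaned subset from Part 1, used as a gold standard. In the sketch below, accuracy is the share of covered records whose prediction equals the gold name, and consistency is the share of gold entities whose raw variants all received one single output; these metric definitions and the function name `evaluate` are assumptions for illustration, not criteria stated in the brief.

```python
from collections import defaultdict


def evaluate(predicted: dict[str, str], gold: dict[str, str]) -> dict[str, float]:
    """Compare predictions with a manually cleaned gold set keyed by raw name."""
    keys = [raw for raw in gold if raw in predicted]
    if not keys:
        return {"coverage": 0.0, "accuracy": 0.0, "consistency": 0.0}
    correct = sum(predicted[raw] == gold[raw] for raw in keys)
    # Collect every distinct output produced for each gold canonical entity.
    outputs: dict[str, set[str]] = defaultdict(set)
    for raw in keys:
        outputs[gold[raw]].add(predicted[raw])
    consistent = sum(len(values) == 1 for values in outputs.values())
    return {
        "coverage": len(keys) / len(gold),
        "accuracy": correct / len(keys),
        "consistency": consistent / len(outputs),
    }
```

Reporting coverage separately keeps accuracy from looking inflated when the pipeline silently skips difficult records.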
Documentation: Clarity and comprehensiveness of the documentation, including reasoning for scaling the solution.
Deliverables
Candidates should submit a Google Drive folder containing:
Documentation