COMP2550/COMP4450/COMP6445: Data Science and Applied Machine Learning Assignment Help



Task

Starting from the sample of 100,000 users from #DebateNight given in the Quantitative Methods tutorial, your task is to predict as accurately as possible the botness feature (a regression task) and the is_bot feature (a classification task). Note that is_bot is a binary feature (TRUE or FALSE) constructed as shown in the Data Science and Machine Learning tutorial: is_bot = TRUE when botness > 0.5, is_bot = FALSE when botness ≤ 0.5. Note also that you cannot use botness when predicting is_bot.

A successful outcome of your assignment is two machine learning processing pipelines (one classifier and one regressor) which outperform the examples in the tutorial in terms of the following measures:
• RMSE, R-squared, MAE and MARE for the regressor;
• Balanced Accuracy, Precision, Recall and F-score for the classifier.

Here are a number of things that you can try to achieve your task (for both the classifier and the regressor):
• stratified sampling: make sure the folds on which you learn contain both classes in the same percentage as the entire dataset;
• oversampling, undersampling: both are strategies to re-balance the dataset and improve the prediction performance on the minority class;
• try out other classifiers: you can explore more complex and/or more powerful classifiers. One technique you might want to look into is bagging (ensemble methods), which takes a set of weak classifiers and outputs one strong classifier through a voting system;
• feature preprocessing: in the tutorial we only scratched the surface of all the ways you can preprocess your dataset to improve prediction performance. For example, taking percentiles of a long-tailed feature is only one way to correct skewness; taking the log is another;
• more data: in the tutorial, we constructed a numerical dataset and threw away half of the features because they were not numeric. But maybe there is information in there that is crucial to detecting a bot. It might be worth looking into that;
• external data: you can use anything out there to improve the training of your classifier/regressor. Tweeting patterns and the graph of other users might be indicative.

The outcomes

There are two outcomes of this assignment:
• written document: each group of students will produce a written report (of maximum 5 pages), in which you will describe every technique that you employed to construct the predictors. You need to detail what pre-processing you used and why, what feature analysis and feature construction you performed, which machine learning algorithms you employed, etc. Feel free to describe techniques that did not work and your explanation of why. Upon reading this document, the evaluator should get a clear idea of why this problem is difficult, which solutions you tried, what works, what doesn’t, and why. Please include graphs and diagrams (an R notebook is recommended for this task). It is particularly important that the written document clearly states the contributions of each team member to the project. Note also that each member will submit a confidential statement of the contributions of each team member (see the end of this document). Failure to do so will lead to a grade penalty.
• a practical implementation: you are required to construct the code to train one classifier (for is_bot) and one regressor (for botness), and to test them on a testing dataset. We provide on Wattle a testing set which does not contain the class variable botness. You will be evaluated on your predictions on this dataset (see Grading scheme). We recommend implementing your ML algorithms in R with the caret package, but Python is also acceptable (in which case you need to justify the need for Python in your written document). Your implementation is required to output, for a given testing set, the performance measures indicated above.
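As a rough illustration of the performance measures the implementation has to report, here is a minimal sketch in plain Python. The function names `regression_metrics` and `classification_metrics` are hypothetical, and the MARE definition used here (mean of |error| / |true value|) is an assumption; check it against the definition given in the tutorial before relying on it.

```python
import math


def regression_metrics(y_true, y_pred, eps=1e-9):
    """Return RMSE, R-squared, MAE and MARE for paired lists of floats."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "mae": sum(abs(e) for e in errors) / n,
        # Assumed definition: mean absolute error relative to the true value;
        # eps guards against division by zero when a true value is 0.
        "mare": sum(abs(e) / max(abs(t), eps)
                    for e, t in zip(errors, y_true)) / n,
    }


def classification_metrics(y_true, y_pred):
    """Return balanced accuracy, precision, recall and F-score for booleans."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    return {
        # Balanced accuracy is the mean of the per-class recalls.
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(regression_metrics([0.2, 0.8, 0.6, 0.4], [0.3, 0.7, 0.6, 0.2]))
    print(classification_metrics([True, True, False, False],
                                 [True, False, True, False]))
```

Balanced accuracy is reported rather than plain accuracy because the classes are imbalanced: a classifier that always answers FALSE would score a high plain accuracy but only 0.5 balanced accuracy.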
A minimum requirement to achieve any points for the practical implementation is to achieve a better prediction performance than the predictors constructed in the tutorial. We encourage you to construct boxplots and bar-plots to show the superiority of your predictors and include them in the written document.
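Some of the suggestions in the task list (building is_bot from botness, log-transforming a long-tailed feature, stratified folds, and oversampling the minority class) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: all function names are hypothetical, the log transform assumes non-negative feature values, and in practice a library such as caret or scikit-learn would normally supply these steps.

```python
import math
import random
from collections import defaultdict


def make_labels(botness, threshold=0.5):
    """is_bot = TRUE when botness > 0.5, FALSE otherwise (rule from the task)."""
    return [b > threshold for b in botness]


def log_transform(values):
    """Compress a long-tailed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]


def stratified_folds(labels, k, seed=0):
    """Split row indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def oversample_minority(train_idx, labels, seed=0):
    """Duplicate random minority rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i]]
    neg = [i for i in train_idx if not labels[i]]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(train_idx)  # nothing to duplicate
    extra = [rng.choice(minority)
             for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


if __name__ == "__main__":
    # Made-up botness scores, for illustration only.
    labels = make_labels([0.9, 0.1, 0.2, 0.7, 0.3, 0.4, 0.05, 0.15])
    folds = stratified_folds(labels, k=2)
    train = oversample_minority(folds[0], labels)
    print(folds, train)
```

Oversampling is applied to the training indices only. Re-balancing the held-out fold as well would leak duplicated rows into the evaluation and make the reported scores look better than they are.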