LPM Probabilities Sensible And AIC Function - Finance Assignment Help

Assignment Task 

 

1. Describe this data set. You can use the help() function for this; a short paragraph will be enough. Pay special attention to the variable Purchase, which will be your response variable. Why is this a classification problem, and why might someone find this exercise useful?

2. What fraction of people purchased insurance? Explain how you computed this in R. What would your error rate be if you always predicted that someone would not buy insurance? Explain how you computed this in R.
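Both quantities in question 2 reduce to simple counts. The sketch below uses a hypothetical ten-person response coded "No"/"Yes". Under the rule "always predict No", every actual "Yes" is an error. So the null error rate equals the fraction of buyers. In R this would presumably be `mean(y == "Yes")`, assuming the response is a factor with those two levels.

```python
# Hypothetical toy response; in the assignment this would be the Purchase column.
purchase = ["No", "No", "Yes", "No", "No", "No", "No", "No", "No", "No"]

n = len(purchase)

# Fraction of people who bought insurance.
frac_yes = sum(1 for p in purchase if p == "Yes") / n

# Error rate of the naive rule "always predict No": every actual "Yes" is a miss.
null_error = sum(1 for p in purchase if p != "No") / n

print(f"fraction purchased   = {frac_yes:.3f}")
print(f"always-No error rate = {null_error:.3f}")
```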

3. Use the full data set to perform two LPM regressions: one with a constant only and one with all 85 predictors. Which is best according to the SER? What is the value of the AIC for the full regression? You can use the AIC() function; then replicate this value from scratch. In your best model, how many coefficients are statistically significant at the 6% and 17% significance levels? Use the p-value approach and a loop that cycles through all your p-values. You may need to use the vcov() function for this.
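The from-scratch pieces of question 3 can be sketched on hypothetical data. The formulas assume R's conventions for an `lm` fit:

- SER: sqrt(RSS / (n − k)), where k counts the coefficients. A smaller SER means a better in-sample fit.
- Log-likelihood: the Gaussian log-likelihood evaluated at σ̂² = RSS / n.
- AIC: the error variance counts as one extra parameter, so AIC = −2·logLik + 2(k + 1). For the full model, k would be 86 (85 slopes plus the intercept).

The p-value loop below uses a normal approximation via `math.erfc`. R itself would use the t distribution, for example `2 * pt(-abs(t), df)`. The coefficient and standard-error values are made up.

```python
import math

# --- SER and AIC for a constant-only LPM (hypothetical data) ---
y = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # 1 = bought insurance
n = len(y)
k = 1  # number of regression coefficients (intercept only)

beta0 = sum(y) / n  # OLS on a constant alone returns the sample mean
rss = sum((yi - beta0) ** 2 for yi in y)
ser = math.sqrt(rss / (n - k))  # standard error of the regression

# Gaussian log-likelihood at the ML variance estimate RSS / n,
# the form R's logLik() uses for an lm fit.
loglik = -0.5 * n * (math.log(2 * math.pi) + 1 + math.log(rss / n))
# The error variance is counted as a parameter too, so df = k + 1.
aic = -2 * loglik + 2 * (k + 1)
print(f"SER = {ser:.4f}, AIC = {aic:.4f}")

# --- Counting significant coefficients with a loop over p-values ---
coefs = [0.060, 0.015, -0.004, 0.030]  # hypothetical estimates
ses = [0.010, 0.010, 0.010, 0.010]     # in R: sqrt(diag(vcov(fit)))

def two_sided_p(t):
    # Normal approximation to the t distribution; close when n - k is large.
    return math.erfc(abs(t) / math.sqrt(2))

p_values = [two_sided_p(b / s) for b, s in zip(coefs, ses)]

counts = {}
for alpha in (0.06, 0.17):
    counts[alpha] = 0
    for p in p_values:  # cycle through every p-value
        if p < alpha:
            counts[alpha] += 1
print(counts)
```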

4. For the full regression, compute the predicted probabilities and obtain the following features: min, max, mean. Are the LPM probabilities sensible? Compute the confusion matrix and the overall fraction of incorrect predictions. Explain what the confusion matrix tells you about the types of mistakes the LPM makes.
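A minimal sketch of question 4 follows, using hypothetical LPM fitted values. An LPM does not constrain fitted values to [0, 1], so counting values outside that interval is one direct check on whether the probabilities are sensible. The question gives no cut-off for the LPM; 0.5 is assumed here.

```python
# Hypothetical LPM fitted values and true outcomes (1 = bought insurance).
probs = [-0.05, 0.02, 0.10, 0.65, 0.30, 0.08, 0.55, 0.01]
actual = [0, 0, 0, 1, 1, 0, 0, 0]

summary = {"min": min(probs), "max": max(probs), "mean": sum(probs) / len(probs)}

# LPM fitted values can fall outside [0, 1]; count those that do.
outside = sum(1 for p in probs if p < 0 or p > 1)

# Assumed rule: classify as a buyer when the fitted value exceeds 0.5.
predicted = [1 if p > 0.5 else 0 for p in probs]

# confusion[(predicted, actual)] -> count
confusion = {(pr, ac): 0 for pr in (0, 1) for ac in (0, 1)}
for pr, ac in zip(predicted, actual):
    confusion[(pr, ac)] += 1

error_rate = sum(1 for pr, ac in zip(predicted, actual) if pr != ac) / len(actual)

print(summary, "outside [0, 1]:", outside)
print("        actual=0  actual=1")
for pr in (0, 1):
    print(f"pred={pr}  {confusion[(pr, 0)]:8d}  {confusion[(pr, 1)]:8d}")
print(f"overall error rate = {error_rate:.3f}")
```

The off-diagonal cells separate the two kinds of mistake. `(1, 0)` counts false alarms. `(0, 1)` counts buyers the model missed.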

5. If the insurance company tried to sell insurance at random, the best it could do is a success rate of 6%. This could be very costly, especially if the broker needs to visit everyone he/she tries to sell insurance to. Argue that the company would like to try to sell insurance only to customers who are likely to buy it, so the overall error rate is not of interest. Instead, the fraction of individuals who are correctly predicted to buy insurance is of interest. The rest of the questions will have you create a training set and a test set by random row selection: the test set will have 1000 random rows and the training set will have all other rows. When using KNN, standardize your predictors using the function scale().
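The set-up in question 5 involves two mechanical steps: standardizing the predictors and drawing test rows at random. R's `scale()` centers each column and divides by its sample standard deviation (n − 1 denominator), which `statistics.stdev` also uses. The helper names `scale_columns` and `split_rows`, and the seed, are hypothetical. In the assignment, `n_test` would be 1000.

```python
import random
import statistics

def scale_columns(rows):
    """Standardize each column to mean 0 and sample SD 1, like R's scale()."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # n - 1 denominator
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def split_rows(n_rows, n_test, seed=1):
    """Pick n_test random row indices for the test set; all others train."""
    rng = random.Random(seed)
    test_idx = sorted(rng.sample(range(n_rows), n_test))
    test_set = set(test_idx)
    train_idx = [i for i in range(n_rows) if i not in test_set]
    return train_idx, test_idx

X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
Z = scale_columns(X)
train, test = split_rows(20, 5)
print(Z)
print("train:", train)
print("test: ", test)
```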

6. Fit a logistic regression using the training set and compute the predicted classes over the test set. Use a cut-off of 0.5 for the classifier. Report the confusion matrix in a nicely formatted table. Repeat (using a loop) for a cut-off of 0.25. For each estimation, report the confusion matrix and compute the fraction of individuals who are correctly predicted to buy insurance. Is this better than the random-guessing success rate of 6%?
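The cut-off loop in question 6 can be sketched independently of the model fit. The probabilities below are hypothetical stand-ins for test-set output from a fitted logistic regression. In R, such output would come from something like `predict(fit, newdata, type = "response")`.

"Fraction correctly predicted to buy" is read here as TP / (TP + FP): among the people predicted to buy, the share who actually did. This is the number to compare with the 6% random-guess rate. That reading of the question is an assumption.

```python
def confusion_matrix(probs, actual, cutoff):
    """Counts keyed by (predicted, actual) for a given probability cut-off."""
    cm = {(p, a): 0 for p in (0, 1) for a in (0, 1)}
    for prob, a in zip(probs, actual):
        cm[(1 if prob > cutoff else 0, a)] += 1
    return cm

def buy_precision(cm):
    """Fraction of predicted buyers who actually bought: TP / (TP + FP)."""
    predicted_yes = cm[(1, 0)] + cm[(1, 1)]
    return cm[(1, 1)] / predicted_yes if predicted_yes else float("nan")

# Hypothetical test-set probabilities and outcomes (1 = bought insurance).
probs = [0.05, 0.30, 0.60, 0.27, 0.10, 0.55, 0.02, 0.40]
actual = [0, 1, 1, 0, 0, 0, 0, 1]

results = {}
for cutoff in (0.5, 0.25):
    cm = confusion_matrix(probs, actual, cutoff)
    results[cutoff] = (cm, buy_precision(cm))
    print(f"cut-off {cutoff}")
    print("          actual=No  actual=Yes")
    print(f"pred=No   {cm[(0, 0)]:9d}  {cm[(0, 1)]:10d}")
    print(f"pred=Yes  {cm[(1, 0)]:9d}  {cm[(1, 1)]:10d}")
    print(f"fraction of predicted buyers who bought: {results[cutoff][1]:.3f}")
```

Lowering the cut-off flags more people as likely buyers. Whether that raises or lowers the precision depends on the data.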

7. Now use KNN with K = 1, 3, 5. For each K, report the confusion matrix in a nicely formatted table and compute the fraction of individuals who are correctly predicted to buy insurance. Is this better than the random-guessing success rate of 6%?
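The sketch below shows KNN classification by majority vote among the K nearest training rows. It uses Euclidean distance on predictors that are assumed to be already standardized. The data are hypothetical. Ties in distance are broken by label order here; R's `class::knn` breaks ties at random, so results can differ slightly. `buy_precision` uses the same TP / (TP + FP) reading of "correctly predicted to buy" as the other sketches.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, test_X, k):
    """Classify each test row by majority vote among its k nearest training rows."""
    preds = []
    for x in test_X:
        # Sort training rows by Euclidean distance (ties broken by label).
        dists = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_X, train_y))
        votes = Counter(label for _, label in dists[:k])
        preds.append(votes.most_common(1)[0][0])
    return preds

def buy_precision(preds, actual):
    """Fraction of predicted buyers (label 1) who actually bought."""
    hits = [a for p, a in zip(preds, actual) if p == 1]
    return sum(hits) / len(hits) if hits else float("nan")

# Hypothetical standardized predictors and labels (1 = bought insurance).
train_X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.2, 0.1]]
train_y = [0, 0, 1, 1, 0]
test_X = [[0.05, 0.05], [1.0, 0.9]]
test_y = [0, 1]

for k in (1, 3, 5):
    preds = knn_predict(train_X, train_y, test_X, k)
    print(f"K={k}: predictions {preds}, precision {buy_precision(preds, test_y)}")
```

With K = 5 every training row votes, so the majority class wins everywhere. This toy case shows how a large K can stop the classifier from ever predicting the rare "buy" class.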

8. Finally, use LDA, report the confusion matrix in a nicely formatted table, and compute the fraction of individuals who are correctly predicted to buy insurance. Is this better than the random-guessing success rate of 6%?
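To show the mechanics of question 8, here is a one-predictor sketch of linear discriminant analysis. Each class has its own mean, all classes share a pooled variance, and each class gets a prior. A point is assigned to the class with the largest linear score, δ_k(x) = x·μ_k/σ² − μ_k²/(2σ²) + log π_k. With all 85 predictors the same idea uses a pooled covariance matrix; in R, that fit is commonly done with `lda()` from the MASS package. The data and function names below are hypothetical.

```python
import math
import statistics

def fit_lda_1d(x, y):
    """Two-or-more-class LDA with one predictor: class means, pooled variance, priors."""
    classes = sorted(set(y))
    groups = {c: [xi for xi, yi in zip(x, y) if yi == c] for c in classes}
    n, n_classes = len(x), len(classes)
    means = {c: statistics.mean(g) for c, g in groups.items()}
    # Pooled within-class variance with an n - K denominator.
    pooled = sum((xi - means[c]) ** 2 for c, g in groups.items() for xi in g) / (n - n_classes)
    priors = {c: len(g) / n for c, g in groups.items()}
    return means, pooled, priors

def predict_lda_1d(model, x_new):
    """Assign each point to the class with the largest linear discriminant score."""
    means, var, priors = model

    def delta(x, c):
        return x * means[c] / var - means[c] ** 2 / (2 * var) + math.log(priors[c])

    return [max(means, key=lambda c: delta(x, c)) for x in x_new]

# Hypothetical single predictor and labels (1 = bought insurance).
x = [1.0, 2.0, 1.5, 4.0, 5.0, 4.5]
y = [0, 0, 0, 1, 1, 1]
model = fit_lda_1d(x, y)
print(model)
print(predict_lda_1d(model, [2.0, 3.2, 4.0]))
```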

 

 
