Business Analytics: Imbalanced Data in Classification Techniques - Toy Info - Report Writing Assignment Help

Internal Code: 1HGEJ

Report Writing Assignment Help:

Task: SECTION A: Discussion Questions
  • Explain the concept of imbalanced data in classification techniques and how it should be treated when developing classification models.
  • Explain the concept of overfitting. How can overfitting be avoided?
  • Give two examples of how logistic regression can be used. You only need to explain the problem. One example is a bank that uses logistic regression to classify its new customers for loan approval: the bank wants to identify customers who are more likely to default on their loans. Explain why you cannot use linear regression in your examples.
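As a minimal sketch of why imbalanced data matters, consider the bank example with a hypothetical split of 95 non-defaulters and 5 defaulters (the counts, function names and the choice of random oversampling are assumptions for illustration, not part of the assignment). A model that always predicts "no default" looks accurate but finds no defaulters; random oversampling of the minority class is one common way to rebalance the training data.

```python
import random

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model identified."""
    actual_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in actual_pos) / len(actual_pos)

def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority_rows = [r for r, l in zip(rows, labels) if l == minority]
    majority_count = sum(1 for l in labels if l != minority)
    extra = [rng.choice(minority_rows)
             for _ in range(majority_count - len(minority_rows))]
    return rows + extra, labels + [minority] * len(extra)

# Hypothetical data: 1 = defaulted, 0 = repaid.
labels = [0] * 95 + [1] * 5
rows = list(range(100))            # stand-ins for customer records

always_majority = [0] * 100        # a "model" that never predicts default
acc = accuracy(labels, always_majority)
rec = recall(labels, always_majority)

balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Here accuracy is high while recall on defaulters is zero, which is why accuracy alone is a poor measure on imbalanced data. Oversampling should be applied to the training partition only, so that validation results still reflect the real class mix.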
SECTION B: QUANTITATIVE QUESTIONS
  1. There are 500 client records in the first sheet of the file Toy-Info, covering clients who have bought special toys from an e-business website. Each record includes data on types of product purchased (between 1 and 5), purchase amount ($), age, gender, marital status, whether the client has a membership, and whether the customer has a discount card. A business analyst has applied the k-means clustering method to all seven variables. The analyst increased the number of clusters to recommend a proper value of k. The resulting tests for k=5 and k=6, shown in the following sheets of the file, revealed the best k as k=6.
    1. Explain how the analyst found that k=6 is a proper number of clusters. Refer to the relevant sheet name, table name and the values you compared.
    2. Describe all 6 clusters by their average characteristics.
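The Toy-Info sheets are not reproduced here, so the statistic the analyst actually compared is unknown. One common way to compare candidate values of k is to run k-means for each k and compare the within-cluster sum of squares. The sketch below does this in plain Python on hypothetical two-dimensional points; the data, function names and parameters are all assumptions. Real use on the seven variables would also need categorical fields encoded and all variables normalized first.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Basic Lloyd's algorithm; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # start from k random points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:           # converged
            break
        centers = new_centers
    return centers, clusters

def within_ss(centers, clusters):
    """Total squared distance from each point to its cluster center."""
    return sum(math.dist(p, c) ** 2
               for c, cl in zip(centers, clusters) for p in cl)

# Hypothetical data: three loose groups of 30 points each.
rng = random.Random(42)
blob_centers = [(0, 0), (5, 5), (0, 5)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in blob_centers for _ in range(30)]

wss_by_k = {k: within_ss(*kmeans(points, k)) for k in range(1, 7)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

A value of k is usually judged proper when adding another cluster no longer reduces the within-cluster spread by much, or when clusters are well separated relative to their internal spread.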
  2. A company provides maintenance service for washing machines in Victoria. The analyst of the company aims to estimate the repair time and the service cost for each maintenance job. He assumes that repair time is the dependent variable, which can be related to the number of months since the last service, the type of repair and the repairperson. The following table reports 10 sample maintenance jobs.
Repair time (hours)   Months since last service   Type of repair   Repairperson
2.1                   2                           Mechanical       John
2.8                   2                           Electrical       John
1.6                   3                           Mechanical       John
3.9                   4                           Electrical       Bob
2.5                   6                           Mechanical       John
3.1                   6                           Electrical       John
4.5                   7                           Electrical       Bob
4.7                   8                           Electrical       Bob
3.8                   9                           Mechanical       Bob
4.6                   9                           Electrical       Bob
  • Create an estimated simple regression model for this data where months since last service is the independent variable. What does the model indicate about the relationship between months since last service and repair time? How strong is the relationship? Report the accuracy measures and the equation.
  • Calculate the residual error for each repair in the table and interpret the meaning of positive and negative residuals in this analysis. Which type of repair (electrical or mechanical) is more desirable, and which repairperson (John or Bob) has worked more efficiently?
  • Create a scatter chart with months since last service on the x axis in which the points representing electrical and mechanical repairs are shown in different colors. Create a similar chart of months since last service and repair time in which the points representing repairs by John and Bob are shown in different colors. Do these charts suggest any potential modifications to your simple linear regression model? Why?
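As a sketch of the calculations these bullets ask for, the ten rows of the table can be fitted by ordinary least squares in plain Python. The variable and function names are illustrative and not part of the assignment; a spreadsheet or statistics package would normally be used instead.

```python
# Data copied from the repair table above.
months = [2, 2, 3, 4, 6, 6, 7, 8, 9, 9]
hours = [2.1, 2.8, 1.6, 3.9, 2.5, 3.1, 4.5, 4.7, 3.8, 4.6]
repair_type = ["Mechanical", "Electrical", "Mechanical", "Electrical",
               "Mechanical", "Electrical", "Electrical", "Electrical",
               "Mechanical", "Electrical"]
person = ["John", "John", "John", "Bob", "John",
          "John", "Bob", "Bob", "Bob", "Bob"]

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_residual_by(labels, residuals):
    """Average residual within each group (e.g. repair type or person)."""
    groups = {}
    for label, r in zip(labels, residuals):
        groups.setdefault(label, []).append(r)
    return {label: sum(rs) / len(rs) for label, rs in groups.items()}

intercept, slope = fit_simple_ols(months, hours)
fitted = [intercept + slope * m for m in months]
residuals = [h - f for h, f in zip(hours, fitted)]   # actual minus predicted

mean_hours = sum(hours) / len(hours)
sse = sum(r ** 2 for r in residuals)                 # unexplained variation
sst = sum((h - mean_hours) ** 2 for h in hours)      # total variation
r_squared = 1 - sse / sst

by_type = mean_residual_by(repair_type, residuals)
by_person = mean_residual_by(person, residuals)

print(f"hours = {intercept:.4f} + {slope:.4f} * months, R^2 = {r_squared:.4f}")
print(by_type, by_person)
```

A positive residual means the repair took longer than the fitted line predicts for that number of months; a negative residual means it was quicker. Averaging residuals by repair type or by repairperson is one simple way to approach the comparison the second bullet asks about.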
  3. The following data are the results of a 4-year study conducted to assess how age, weight, and gender influence the risk of diabetes. Risk is interpreted as the probability (times 100) that the patient will have diabetes over the next 4-year period.
    1. Develop a multiple regression model that relates risk of diabetes to the person’s age, weight and gender. Present the regression formula as a mathematical equation. Interpret the coefficients of the regression and comment on the strength of the regression.
    2. Develop an estimated multiple regression model that relates risk of diabetes to the person’s age, weight, gender and lifestyle. Present the regression formula as a mathematical equation. Interpret the coefficients of the regression and comment on the strength of the regression.
    3. What is the risk percentage of diabetes over the next 4 years for a 55-year-old man who weighs 70 kg and lives in a big city? Use both models to estimate the risk and compare the results.
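The data table for this question is not reproduced here, so no coefficients can be estimated. As a sketch of the form the two models might take, assuming gender and lifestyle are each coded as 0/1 dummy variables (the coding and the coefficient symbols $b_0,\dots,b_3$ and $c_0,\dots,c_4$ are assumptions, not taken from the data):

```latex
\begin{align*}
\widehat{\text{Risk}}_{1} &= b_0 + b_1\,\text{Age} + b_2\,\text{Weight} + b_3\,\text{Gender} \\
\widehat{\text{Risk}}_{2} &= c_0 + c_1\,\text{Age} + c_2\,\text{Weight} + c_3\,\text{Gender} + c_4\,\text{Lifestyle}
\end{align*}
```

Under this coding, each prediction is obtained by substituting Age = 55, Weight = 70 and the dummy values that represent a man and a big-city lifestyle into the fitted equation.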
 
