Bank Telemarketing Campaign Analysis Assignment

Assignment Task

In this question you will analyse data from a telemarketing campaign. In this campaign, a Portuguese retail bank attempted to get its customers to subscribe to a long-term deposit. These customers were either pro-actively contacted by the bank through a call-centre, or were customers that got in contact with the call centre for other reasons. All the contacts were by telephone, either using a land-line or using a mobile.

Data relating to calls taking place between May 2008 and December 2008 are given in bank08, which is stored in the file bank08.RData. The variables are:

  • employment: the employment status of the customer, taking the values active (is employed or self-employed) or inactive (student, retired or unemployed)
  • marital: the marital status of the customer, taking the values single, married or single-again (that is, divorced or widowed)
  • education: the highest level of education the customer has received, taking the values primary, secondary or tertiary
  • mortgage: whether the customer has a housing loan, taking the values yes or no
  • personalLoan: whether the customer has a personal loan, taking the values yes or no
  • contact: the type of telephone used for the contact, taking the values landline or mobile
  • month: the month in which the call took place, taking the values mar, apr, may, jun, jul, aug, sep, oct, nov, dec
  • day: the day of the week on which the call took place, taking the values mon, tue, wed, thu or fri
  • previousContact: the number of contacts with the customer performed before this campaign
  • previousOutcome: the outcome of contact from the previous campaign, taking the values failure, nonexistent (because the customer was not contacted)
  • nMissing: for each call, the number of missing values in the variables age to previousOutcome
  • duration: the duration of the call, in seconds

The aim in this question will be to identify a good model that predicts, based on information known before the call is placed, whether a customer is likely to take out a long-term deposit.

The marks for this question are divided between parts (a) to (e) as follows.

(a) In this part, you will carry out an exploratory data analysis of the data in bank08.

(i) Explain why it does not make sense in this context to treat duration as an explanatory variable.

(ii) Through the use of suitable plots, comment concisely on the distribution of values in each variable age to nMissing individually. Although you should obtain a visual summary of each variable whilst working on your EMA, include just two plots in your answer.

(iii) For the rest of this question you should treat age as a covariate and the other explanatory variables employment, marital, education, personalLoan, contact, month, day and previousContact as factors. Give a disadvantage of treating age as a covariate instead of as a factor. For which, if any, of these explanatory variables does it not matter whether they are treated as a covariate or a factor? Justify your choice.

(iv) Create a new data frame, called bank08NonMissing that contains only the calls for which there are no missing values. For the rest of this question you should work with the data in bank08NonMissing instead of bank08. (Hint: If you have attached bank08 you are therefore advised to detach it at this point and attach bank08NonMissing.) What type of data analysis does using bank08NonMissing correspond to? In your opinion is such a data analysis likely to be valid? Justify your opinion.
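As a rough illustration of the filtering step in part (a)(iv), the sketch below keeps only the records with no missing values. It is written in Python rather than the R the assignment expects, and the records, the field subset and the names `calls`, `count_missing` and `calls_non_missing` are all hypothetical stand-ins, not the real bank08 data.

```python
# Hypothetical toy records; None marks a missing value. Not real bank08 data.
calls = [
    {"age": 41, "education": "secondary", "mortgage": "yes"},
    {"age": None, "education": "tertiary", "mortgage": "no"},
    {"age": 29, "education": None, "mortgage": None},
    {"age": 57, "education": "primary", "mortgage": "no"},
]


def count_missing(record):
    """Number of missing (None) values in one call record."""
    return sum(value is None for value in record.values())


# Analogue of the nMissing variable: one count per call.
n_missing = [count_missing(record) for record in calls]

# Keep only the calls for which nothing is missing.
calls_non_missing = [r for r, n in zip(calls, n_missing) if n == 0]

print(n_missing)
print(len(calls_non_missing), "of", len(calls), "calls retained")
```

Comparing how many calls are retained with how many were dropped is a natural first check when judging whether an analysis restricted to the retained calls is likely to be valid.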

(v) In the next part you will consider duration (or a transformation of it) as the response instead of outcome. With the help of a suitable plot, explain why this seems reasonable.

(vi) Based on just the variable itself, which transformation of duration (which may include the transformation raising duration to the power 1, equivalent to not transforming it) do you feel makes it most suitable to be used as the response variable in a linear model? Justify your choice.
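One way to compare candidate power transformations in part (a)(vi) is to see how symmetric the transformed values are. The sketch below is in Python rather than R and uses made-up durations. It assumes the common convention that power 0 stands for the natural log. The helper names `skewness` and `power_transform` are illustrative. A plot such as a histogram would normally accompany a numerical summary like this.

```python
import math
import statistics

# Hypothetical right-skewed call durations in seconds (illustrative only).
durations = [35, 60, 75, 90, 120, 150, 180, 240, 300, 420, 600, 900, 1500]


def skewness(values):
    """Moment-based skewness: mean cubed z-score, using the population sd."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def power_transform(values, power):
    """Ladder-of-powers transform; power 0 is taken to mean the natural log."""
    if power == 0:
        return [math.log(v) for v in values]
    return [v ** power for v in values]


# Candidate powers: 1 (no transformation), 1/2, 0 (log) and -1/2.
for power in (1, 0.5, 0, -0.5):
    skew = skewness(power_transform(durations, power))
    print(f"power {power:>4}: skewness = {skew:.2f}")
```

A skewness close to zero suggests a roughly symmetric distribution. Negative powers reverse the ordering of the values, so compare magnitudes rather than signs.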

(vii) Using the transformation you selected in part (a) (vi), graphically explore the relationship between the age of the customer and the duration of the call. If the age of the customer is to be included in a linear model of the duration of the call, which transformation of age (which may be the transformation raising age to the power 1) appears to be most suitable? Justify your choice.
(Hint: In doing this part you may like to adapt the code given in TMA05 Q1(a)(ii).)

(b) In this part you will try modelling the (transformed) duration using the variables (transformed) age, employment, marital, education, mortgage, personalLoan, contact, month, day, previousContact and previousOutcome.

(i) Through the use of simple linear regression and/or 1-way ANOVA, identify the variables that seem to be strongly related to the transformation of duration you selected in part (a) (vi). For each of these variables include the associated p-value.
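To make the 1-way ANOVA in part (b)(i) concrete, the sketch below computes the F statistic for one factor from made-up transformed durations. The Python standard library has no F distribution, so the p-value here comes from a permutation test, which is a swapped-in technique and not the F-distribution p-value a standard ANOVA table reports. The data and the names `groups`, `f_statistic` and `permutation_p_value` are hypothetical.

```python
import random
import statistics

# Hypothetical transformed durations grouped by one factor (illustrative only).
groups = {
    "landline": [4.8, 5.1, 5.0, 4.6, 5.3],
    "mobile": [5.6, 5.9, 5.4, 6.1, 5.7],
}


def f_statistic(samples):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [v for s in samples for v in s]
    grand_mean = statistics.fmean(all_values)
    ss_between = sum(len(s) * (statistics.fmean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((v - statistics.fmean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p_value(samples, n_permutations=2000, seed=1):
    """Share of random regroupings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(samples)
    pooled = [v for s in samples for v in s]
    sizes = [len(s) for s in samples]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


samples = list(groups.values())
print("F statistic:", round(f_statistic(samples), 2))
print("permutation p-value:", round(permutation_p_value(samples), 4))
```

Repeating the same calculation for each candidate variable gives one p-value per variable, which is the comparison this part asks for.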

(ii) Find the model that best fits the (transformed) duration. In your solution you should describe the process you have used to find this model.

(iii) Check the final model you obtained in part (b)(ii). Do the assumptions seem reasonable?

(c) In this part you will try modelling outcome using the variables (transformed) age, employment, marital, education, mortgage, personalLoan, contact, month, day, previousContact and previousOutcome.

(iv) Considering the months over which calls were made in 2008 and 2009, why might this cause a problem when using the models based on the 2008 data to predict successful calls made in 2009?
