Course: Applied Statistics and Big Data - Module A1
Instructor: Dimitris Fouskakis 02/09/2020
1. We will use the Salaries data-set (available on Blackboard under the name Salaries.csv), which contains the 2008-09 nine-month academic salaries of Assistant Professors (AsstProf), Associate Professors (AssocProf) and Professors (Prof) in a college in the U.S. The data were collected as part of the ongoing effort of the college’s administration to monitor salary differences between male and female faculty members.
The data-set contains the following variables:
i. Load the data into R.
ii. Fit a simple linear regression model using salary as the response variable and yrs.service as the explanatory variable. Construct the plot of the least squares fitted line.
iii. Check the assumptions of the above model.
iv. Give the interpretation of the estimated coefficients of the above linear model.
v. Fit a simple linear regression model using salary as the response variable and sex as the explanatory variable.
vi. Give the interpretation of the estimated coefficients of the above linear model and perform statistical inference about them.
vii. Fit a multiple linear regression model using salary as the response variable and all the remaining variables as explanatory variables.
viii. Give the interpretation of the estimated coefficients of your multiple regression model and perform statistical inference about them. Give an estimate of the standard deviation of the errors and comment on its value. Interpret the value of the coefficient of determination for this model.
ix. Based on the results of your last model, produce a 99% confidence interval for the average nine-month salary, in dollars, of a male Assistant Professor in an “applied” department who received his PhD 8 years ago and has 6 years of service.
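For reference when answering items iii, vi, viii and ix, the standard normal-theory results for the linear model are summarised below. This is a sketch under the usual assumptions (independent errors with constant variance and a normal distribution). Here p denotes the number of non-intercept columns of the design matrix X, so each categorical variable (rank, sex, department type) contributes one dummy column per non-reference level. In R, calling predict() on an lm fit with a one-row newdata data frame, interval = "confidence" and level = 0.99 should return the interval in the last line.

```latex
% Linear model with p explanatory columns; item iii checks these error assumptions
\[ \text{salary}_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i,
   \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2) \]

% Item vi: with a single dummy D (1 = non-reference sex, 0 = reference sex),
% least squares reproduces the two group means
\[ \hat\beta_0 = \bar{y}_{\text{ref}}, \qquad
   \hat\beta_1 = \bar{y}_{\text{other}} - \bar{y}_{\text{ref}} \]

% Item viii: estimated error standard deviation and coefficient of determination,
% with SSE = \sum_i (y_i - \hat{y}_i)^2 and SST = \sum_i (y_i - \bar{y})^2
\[ s = \sqrt{\frac{\mathrm{SSE}}{n - p - 1}}, \qquad
   R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} \]

% Item ix: 99% confidence interval for the mean response at covariate vector x_0,
% where t_{n-p-1, 0.995} is the 0.995 quantile of the t distribution
\[ \mathbf{x}_0^\top \hat{\boldsymbol\beta} \;\pm\; t_{n-p-1,\,0.995}\; s\,
   \sqrt{\mathbf{x}_0^\top (X^\top X)^{-1} \mathbf{x}_0} \]
```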
2. Consider the data of Exercise 1. Create a new categorical variable, called salary cat, which takes the value 0 (low) when salary is less than 71,000 dollars and the value 1 (high) otherwise.
i. Fit a simple logistic regression model with salary cat as the response variable and sex as the explanatory variable.
ii. Produce the summary of the logit model you have fitted and interpret the estimated coefficients in terms of odds and odds ratios.
iii. Fit a multiple logistic regression model with salary cat as the response variable and yrs.service and sex as the explanatory variables.
iv. Produce the summary of the logit model you have fitted and interpret the estimated coefficients in terms of odds and odds ratios.
v. Based on the results of your last model, estimate the probability of a low salary for a female academic with 6 years of service.
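The logistic-regression quantities needed in Exercise 2 can be written out as follows. This sketch assumes salary cat is coded exactly as defined in the exercise (1 = high), so the fitted model describes the probability π of a high salary. It also assumes sex enters as a dummy D equal to 1 for males and 0 for females, which is the treatment coding R applies by default when the levels are Female and Male. Under a different coding, the sign of the sex coefficient and the value plugged in for D change accordingly.

```latex
% Response: 1 (high) unless salary is below 71000
\[ \text{salary cat}_i =
   \begin{cases} 0 & \text{if } \text{salary}_i < 71000,\\ 1 & \text{otherwise.} \end{cases} \]

% Items i-ii: simple logit model, D = 1 for males, 0 for females (assumed coding)
\[ \log\frac{\pi}{1-\pi} = \beta_0 + \beta_1 D, \qquad
   e^{\beta_0} = \text{odds of a high salary for females}, \quad
   e^{\beta_1} = \text{odds ratio, males vs.\ females} \]

% Items iii-iv: e^{\beta_1} is the multiplicative change in the odds of a high
% salary for one extra year of service, with sex held fixed
\[ \log\frac{\pi}{1-\pi} = \beta_0 + \beta_1\,\text{yrs.service} + \beta_2 D \]

% Item v: the model gives P(high), so the low-salary probability is the complement
% (female, so D = 0; 6 years of service)
\[ P(\text{low}) = 1 - \hat\pi = \frac{1}{1 + e^{\hat\beta_0 + 6\hat\beta_1}} \]
```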
Instructions
i. Assignment submission deadline: 09 September 2020 at 13:00 (Italian time). Please send your paper to fouskakis@math.ntua.gr. Please note that no assignment will be accepted after this date and time.
ii. Your paper should be in PDF format. The file should be named using the format SURNAME-NAME.pdf (replace with your details). The file should start with a cover page that includes your details (title of the assignment, your name, your surname, your email and your student number).
iii. In the PDF file you should present the solutions of the exercises compactly and explain the interpretation of your findings as simply as possible. You should also include the R code and the R output.
iv. The paper must be typed on a computer (not scanned) and must not exceed 20 pages. You can use any word processor you wish, but you must send the PDF file at the end. All questions are compulsory.
v. It is important that the coursework reflects your knowledge rather than simply being an accumulation of information. The assignment should be well structured and easy to read.