Linear Regression - Statistical Analysis Report - Statistics Assignment Help

Assignment Task:

Description:

You will use R to conduct a series of analytics on a data set. There are three data sets to pick from: one on housing sales in Brooklyn, another on heart health, and a third on air strikes in World War II. You will perform analytics based on the last two modules, Modules 3 and 4. Generate your report with an Rmd file. Submit the Rmd file and the output file produced by knitting it.

Series of analytics to conduct:

1.  Conduct EDA (exploratory data analysis) by applying statistical analysis to at least ten variables. Be selective about which ones make sense to analyze. In addition to the basic analysis, conduct a t-test between two classes of one variable. For instance, assuming you had the data, you might compare customer expenditures on one product between two stores. This requires identifying a variable that has at least two classes. Which variable you use is up to you, but it should be meaningful. Additionally, provide plots such as scatter plots and histograms. Provide at least ten plots, including one that plots two variables and uses color to distinguish classes. For example, if your housing sales data includes longitude and latitude, you can plot these and color the dots by four classes of house price, which you may need to define yourself.

Deliverables from this step:

   statistical analysis, including correlation analysis

   t-tests of two groups

   plots - scatter plots and histograms
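
The EDA deliverables above could be sketched in R roughly as follows. This is a minimal illustration on simulated data, not the assignment's data: the data frame `sales` and its columns (`price`, `sqft`, `longitude`, `latitude`, `zone`) are hypothetical names chosen for the example, and the quartile-based price classes are just one possible way to define four classes.

```r
# Hypothetical stand-in data: every column name and distribution below is an
# assumption for illustration, not part of the assignment data sets.
set.seed(42)
n <- 1000
sales <- data.frame(
  price     = rlnorm(n, meanlog = 13, sdlog = 0.5),
  sqft      = rnorm(n, mean = 1500, sd = 400),
  longitude = runif(n, min = -74.05, max = -73.85),
  latitude  = runif(n, min = 40.57, max = 40.74),
  zone      = sample(c("A", "B"), n, replace = TRUE)
)

# Basic statistics and a correlation matrix over the numeric columns.
print(summary(sales))
numeric_cols <- sapply(sales, is.numeric)
cor_matrix <- cor(sales[, numeric_cols])
print(round(cor_matrix, 3))

# Welch two-sample t-test: does mean price differ between the two zones?
price_test <- t.test(price ~ zone, data = sales)
print(price_test)

# Derive four price classes from the quartiles of price.
sales$price_class <- cut(
  sales$price,
  breaks = quantile(sales$price, probs = seq(0, 1, by = 0.25)),
  include.lowest = TRUE,
  labels = c("Q1", "Q2", "Q3", "Q4")
)

# Histogram of a single variable.
hist(sales$price, main = "Sale price", xlab = "Price")

# Scatter plot of two variables, colored by price class.
class_colors <- c("blue", "darkgreen", "orange", "red")
plot(sales$longitude, sales$latitude,
     col = class_colors[as.integer(sales$price_class)], pch = 19,
     xlab = "Longitude", ylab = "Latitude", main = "Sales by price class")
legend("topright", legend = levels(sales$price_class),
       col = class_colors, pch = 19, title = "Price class")
```

With real data, the simulated `sales` frame would be replaced by the chosen data set, and the grouping variable for the t-test should be one whose two classes are meaningful to compare.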

2.  Linear regression:

Identify a likely target variable and conduct linear regression using at least ten predictor variables. Generate the regression model. Generate twenty target values using the model and unseen variable values (values not used to fit the model). Compare these with the actual values and provide a table of the residuals. Since the data sets are quite large, work with a sample of about 1000 records. Provide the output generated from the regression.

Deliverables from this step:

   regression using at least ten variables

   table of twenty calculated regression results

   output from regression
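
One possible shape for this step, sketched on simulated data: draw about 1000 records, hold out twenty of them as the unseen values, fit `lm` on the rest, and tabulate actual values, predictions, and residuals. The data frame `full`, the predictor names `x1` to `x10`, and the target `y` are hypothetical placeholders, and the coefficients used to simulate `y` are arbitrary.

```r
# Hypothetical stand-in for a large data set with ten numeric predictors.
set.seed(1)
n_full <- 5000
full <- as.data.frame(matrix(rnorm(n_full * 10), ncol = 10))
names(full) <- paste0("x", 1:10)
# Simulated target; the coefficients here are arbitrary assumptions.
full$y <- 3 + 2 * full$x1 - 1.5 * full$x2 + rnorm(n_full)

# Work with a random sample of about 1000 records.
dat <- full[sample(nrow(full), 1000), ]

# Hold out twenty records so the model never sees them during fitting.
holdout_rows <- sample(nrow(dat), 20)
train   <- dat[-holdout_rows, ]
holdout <- dat[holdout_rows, ]

# Linear regression of y on all ten predictors.
fit <- lm(y ~ ., data = train)
print(summary(fit))  # regression output for the report

# Twenty calculated values compared with the actual values.
predicted <- predict(fit, newdata = holdout)
results <- data.frame(
  actual    = holdout$y,
  predicted = predicted,
  residual  = holdout$y - predicted
)
print(results)
```

Holding the twenty rows out before fitting is what makes them unseen. Predicting on rows that were also used for training would understate the residuals.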

3.  Logistic regression:

Identify a variable with at least two classes (categories) and conduct logistic regression using at least ten predictor variables. Generate the logistic regression model. Generate twenty classifications using the model and unseen variable values (values not used to fit the model). Compare these with the actual values and provide a table of the residuals. Since the data sets are quite large, work with a sample of about 1000 records. Provide the output generated from the logistic regression.
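
A comparable sketch for the logistic step, again on simulated data with hypothetical names (`dat`, `x1` to `x10`, and a binary target `cls`). It assumes a two-class target coded 0/1, a 0.5 probability cutoff for classification, and residuals taken as actual class minus predicted probability. These are illustrative choices rather than requirements stated in the assignment.

```r
# Hypothetical sample of 1000 records with ten numeric predictors.
set.seed(2)
n <- 1000
dat <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
names(dat) <- paste0("x", 1:10)
# Simulated binary target (0/1); the coefficients are arbitrary assumptions.
dat$cls <- rbinom(n, size = 1, prob = plogis(0.5 + 1.2 * dat$x1 - 0.8 * dat$x2))

# Hold out twenty records as the unseen values.
holdout_rows <- sample(nrow(dat), 20)
train   <- dat[-holdout_rows, ]
holdout <- dat[holdout_rows, ]

# Logistic regression of the class on all ten predictors.
fit <- glm(cls ~ ., data = train, family = binomial)
print(summary(fit))  # logistic regression output for the report

# Predicted probabilities, classifications at a 0.5 cutoff, and residuals.
probability <- predict(fit, newdata = holdout, type = "response")
results <- data.frame(
  actual      = holdout$cls,
  probability = probability,
  predicted   = as.integer(probability > 0.5),
  residual    = holdout$cls - probability
)
print(results)

# Cross-tabulation of actual against predicted classes.
print(table(actual = results$actual, predicted = results$predicted))
```

If the chosen variable has more than two classes, it would first need to be collapsed to two, or a multinomial model used instead, since `family = binomial` expects a binary response.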

 

