Statistical Analysis of the Data on Diamonds: Carat, Cut, Colour & Clarity - ggplot2 - R Programming Assignment

Internal Code: MAS7690

R Programming Assignment:

Background Information and Data

Anyone who has ever thought about buying a diamond knows about the four C’s: Carat, Cut, Colour and Clarity. The relationship of Price with each "C" is obvious: bigger diamonds are more expensive; colourless diamonds are more expensive (D is the most expensive, J the least); clarity runs from "Internally Flawless" (most expensive) to "Included" (least expensive); and cut runs from "Ideal" (most expensive) to "Fair" (least expensive).

The dataset diamonds from the package ggplot2 has prices and other attributes of almost 54,000 round cut diamonds. The aim of the assignment is to determine which "C" matters the most when buying a diamond, using exploratory and regression analyses. For your report use Price (in US dollars), Carat (weight of the diamond), Cut (Ideal, Premium, Very Good, Good, Fair), Colour (from D to J), and Clarity (IF, VVS1 to I1).

Task:

1. Summary statistics and visual graphs

Use appropriate data summary methods (descriptive statistics) to describe the variables of interest. If you decide to subset the dataset, compare the summary statistics among the subsamples of the variable. If you decide to eliminate any cases from the analysis, include a brief explanation of why you made that decision.

Provide appropriate visual graphs to investigate which "C", namely Carat (weight of the diamond), Cut (Ideal, Premium, Very Good, Good, Fair), Colour (from D to J) or Clarity (IF, VVS1 to I1), has the strongest influence on the price of diamonds, and explain why. If you decide to subset the dataset, compare the features of the graphs among the subsamples of the variable.

2. Create a model for diamond price prediction using least squares regression analysis

Discuss which variables you would consider significant predictors of a diamond's price. You may refer to the summary statistics and graphs from part 1. Provide additional scatterplots and correlation coefficients if applicable. If you are using a scatterplot, describe the strength of the relationship between the variables. Interpret the intercept and slope coefficients of the estimated regression model. Rate the four "C"s in terms of which "C" matters the most. Explain the reasoning behind the rating strategy.

3. Introduction

The introduction is a summary of what is contained in the report. State the purpose of the report, provide information on the problem that required investigation, and state how the investigation was carried out, including which approaches were used to summarise the variables and to create the prediction model.

4. Conclusion

The conclusion is the summary of the findings. It may also include the limitations of your analysis. Avoid introducing new material in your conclusion.
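As a starting point for Tasks 1 and 2, the sketch below shows one way the summaries, graphs and regression models might be set up in R. It is an illustration under stated assumptions, not a model answer: it assumes the ggplot2 package is installed, the object names (d, p_carat, p_cut, fit_carat, fit_all, single_r2) are made up for this example, and comparing single-predictor R-squared values is only one possible rating strategy among several. Note that the dataset spells the colour column as color.

```r
# Illustrative sketch only; assumes the ggplot2 package is installed.
library(ggplot2)

data(diamonds, package = "ggplot2")

# Keep price and the four "C"s (the dataset names the colour column "color").
d <- diamonds[, c("price", "carat", "cut", "color", "clarity")]

# Task 1: descriptive statistics for every variable of interest.
summary(d)

# Median price within each level of the categorical "C"s.
aggregate(price ~ cut, data = d, FUN = median)
aggregate(price ~ color, data = d, FUN = median)
aggregate(price ~ clarity, data = d, FUN = median)

# Task 1: example graphs (print the objects to draw them).
p_carat <- ggplot(d, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1)           # scatterplot of price against weight
p_cut <- ggplot(d, aes(x = cut, y = price)) +
  geom_boxplot()                    # price distribution per cut grade

# Task 2: correlation between the two numeric variables.
r_carat <- cor(d$carat, d$price)

# cut, color and clarity are stored as ordered factors. Converting them to
# unordered factors makes lm() report one coefficient per level relative to
# a baseline level, which is usually easier to interpret.
d$cut <- factor(d$cut, ordered = FALSE)
d$color <- factor(d$color, ordered = FALSE)
d$clarity <- factor(d$clarity, ordered = FALSE)

# Simple least squares model: intercept and slope for carat.
fit_carat <- lm(price ~ carat, data = d)
coef(fit_carat)

# Model with all four "C"s.
fit_all <- lm(price ~ carat + cut + color + clarity, data = d)
summary(fit_all)

# One possible (assumed) rating strategy: R-squared of each "C" on its own.
single_r2 <- sapply(c("carat", "cut", "color", "clarity"), function(v) {
  summary(lm(reformulate(v, response = "price"), data = d))$r.squared
})
sort(single_r2, decreasing = TRUE)
```

Printing p_carat or p_cut draws the corresponding plot. The ranking in single_r2 ignores how the "C"s interact (for example, larger stones may tend to have lower clarity), so a report would need to justify whichever rating strategy it adopts.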
