Ecological Data Module Assignment


Assignment Task

Complete the questions and record the answers as you go. When you are confident, input the answers using the quiz in LMS. As always, remember that plagiarism is considered a serious offence at UWA and some of the questions in assignments are designed to test for this – you are welcome to discuss and compare your work with other people but you must write your own script and do all the questions yourself. There is quite a lot of work involved in this assignment and some parts are challenging: allow yourself plenty of time and don’t drive yourself crazy thinking you have to get 100%!

Question

1. Effect of time since fire on species abundance

This question is similar to the first example from the labs, so make sure you’ve done and understood that one first. A researcher was investigating the effect of fire on the abundance of a rare lizard species. She identified sites within a reserve that had last been burned at different times. She then used pit traps to survey lizard abundance, recording the number of lizards caught at each site. The data is in the file ‘timesincefiredata_assignment.xlsx’.

Get the data into R and have a look at it (including a plot). Does it look like there is a relationship between time since fire and the abundance of the lizard? Do you think it looks linear, or does it look like it might be more exponential? Does it go up and down like the data in the lab example? Do you think you’ll need a parabola to model it, like we did in the lab?

We’ll use a Poisson family glm to model the data. Why is this more appropriate than a standard linear model with normal (Gaussian) errors, or a glm with binomial errors or Gamma errors?

Fit a series of models to the data, predicting abundance in terms of time since fire. Try models with the following link functions: identity, log and square root. Looking at help(family) in R will tell you how to use different link functions. Next fit a further three models using the same three link functions, but with an additional explanatory variable - time since fire squared, to see if a quadratic model is justified. Plot the predictions of the six fitted models. Which one looks like it fits the data best?
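
As a rough illustration of how the link function is passed to the family, here is a minimal sketch on simulated data. The data frame `fire` and its columns `time` and `abundance` are hypothetical stand-ins, not the contents of the assignment file, and the simulated counts say nothing about which model suits the real data.

```r
# Simulated stand-in data (hypothetical names; NOT the assignment data)
set.seed(1)
fire <- data.frame(time = rep(1:20, each = 2))
fire$abundance <- rpois(nrow(fire), lambda = exp(1.5 + 0.05 * fire$time))

# The link function is an argument to the family; see help(family)
m_id   <- glm(abundance ~ time, family = poisson(link = "identity"), data = fire)
m_log  <- glm(abundance ~ time, family = poisson(link = "log"),      data = fire)
m_sqrt <- glm(abundance ~ time, family = poisson(link = "sqrt"),     data = fire)

# Quadratic versions: I() makes time^2 mean "time squared" inside a formula
m_id2   <- glm(abundance ~ time + I(time^2), family = poisson(link = "identity"), data = fire)
m_log2  <- glm(abundance ~ time + I(time^2), family = poisson(link = "log"),      data = fire)
m_sqrt2 <- glm(abundance ~ time + I(time^2), family = poisson(link = "sqrt"),     data = fire)

# Predictions on the response (count) scale over a fine grid of times
grid <- data.frame(time = seq(1, 20, length.out = 100))
pred_log <- predict(m_log, newdata = grid, type = "response")

# AIC for all six models in one table
aic_table <- AIC(m_id, m_log, m_sqrt, m_id2, m_log2, m_sqrt2)
print(aic_table)
```

After plotting the raw data, a call such as `lines(grid$time, pred_log)` adds one fitted curve to the plot. With real data the identity link can fail to converge if the fitted means approach zero; in that case `glm` asks for starting values.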

Compare the AIC values of the six models to decide which one is best. (Record the AIC values).

Check this best model for evidence of overdispersion. If needed, fit an appropriate model to account for overdispersion.
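
One common way to check for overdispersion is to divide the sum of squared Pearson residuals by the residual degrees of freedom; values well above 1 suggest overdispersion. The sketch below uses simulated, deliberately overdispersed counts (hypothetical names `fire`, `time`, `abundance`) and a log link chosen only for illustration.

```r
# Simulated overdispersed counts (negative binomial); NOT the assignment data
set.seed(2)
fire <- data.frame(time = rep(1:20, each = 2))
fire$abundance <- rnbinom(nrow(fire), mu = exp(1.5 + 0.05 * fire$time), size = 2)

m_pois <- glm(abundance ~ time, family = poisson, data = fire)

# Dispersion estimate: Pearson chi-square over residual degrees of freedom
dispersion <- sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
print(dispersion)

# A quasipoisson fit has the same coefficients but scales the standard errors
m_quasi <- glm(abundance ~ time, family = quasipoisson, data = fire)
print(summary(m_quasi)$dispersion)   # same quantity, as estimated by glm

# With an estimated dispersion, an F test is the usual choice
print(anova(m_quasi, test = "F"))
```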

Record the p-value that you would report for significance of the effect of time since fire on the abundance of the lizard based on this best model.

What would you conclude from this analysis?

2. Drought experiments

Find and open the 'drought' excel file. This data shows the results from a drought experiment for four different species. There were ten big pots for each species. In each case, about the same amount of seed was sown in each pot, but due to variability in germination, the number of plants in each pot was quite variable.

The number of plants in each pot was counted and then a six week drought was applied. After six weeks the pots were watered again. The number of plants that survived the drought was then counted in each of the pots.

The research question was whether there were differences between species in drought tolerance (ie survival).

First put the data into an appropriate format within Excel for entry into R (a separate row for each rep, or pot in this case), convert to .csv and read it in.

(Or convert within R if you prefer)

Next calculate the percentage survival for each pot in R.

Use a boxplot to plot the percentage survival by species – what does this show?

Use a standard linear ANOVA to test for differences between species. What do you find?

There are several major problems with using a standard linear ANOVA in this case. Binomial data, like survival data, is not likely to be normally distributed, especially if there are values close to 0% or 100%. Traditionally this was dealt with by arcsin transformation. You could do this to the data easily in R, then try the ANOVA again. However, we have another problem because there are different numbers of plants in each pot. We should therefore give the pots with more plants more weight in the analysis. This can be done using the ‘lm’ function and you can look up how to do it, but it is much easier to use the ‘glm’ function with binomial error distribution, which will handle all these issues automatically.

Use a binomial glm analysis to test whether there are differences between species. Don’t forget to check whether there is evidence of overdispersion in the data, and account for it if needed.
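
A minimal sketch of the binomial approach, assuming the data are already in long format with one row per pot. The column names `species`, `n_before` and `n_survived` and the simulated values are assumptions for illustration only. The response is a two-column matrix of successes and failures, which is how `glm` learns how many plants each pot had.

```r
# Simulated stand-in data: 4 species x 10 pots (NOT the assignment data)
set.seed(3)
drought <- data.frame(
  species  = factor(rep(c("A", "B", "C", "D"), each = 10)),
  n_before = sample(15:40, 40, replace = TRUE)
)
p_true <- c(A = 0.30, B = 0.35, C = 0.60, D = 0.80)[as.character(drought$species)]
drought$n_survived <- rbinom(40, size = drought$n_before, prob = p_true)

# Percentage survival per pot (for the boxplot and the linear ANOVA)
drought$pct_survival <- 100 * drought$n_survived / drought$n_before

# Binomial glm: response is cbind(successes, failures)
m_bin <- glm(cbind(n_survived, n_before - n_survived) ~ species,
             family = binomial, data = drought)
print(anova(m_bin, test = "Chisq"))

# Rough overdispersion check: residual deviance over residual df
disp <- deviance(m_bin) / df.residual(m_bin)
print(disp)

# If overdispersed, refit with quasibinomial and use an F test
m_qbin <- glm(cbind(n_survived, n_before - n_survived) ~ species,
              family = quasibinomial, data = drought)
print(anova(m_qbin, test = "F"))
```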

If you find a significant effect of species overall, then how to do pair-wise comparisons is always a good question… there is not really any super easy approach with a binomial glm. One possibility in this case to test whether two similar species are significantly different is to do a glm on a subset of the data containing only these species. You can get the subset by using the R function ‘subset’.

Another approach might be to relabel the two most similar species with one name (so they are then the same level for the species factor), then fit another model (give it another name), then test whether the two models are different. (This is like what we did in labs for the germination example). If they are not significantly different, then the relabelling is ok, which means the species are not significantly different. You can continue trying to group species in this way until you know that all ungrouped combinations are significantly different.
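
The two approaches described above could be sketched as follows. The data are simulated and the names (`drought`, `species`, `n_before`, `n_survived`, `species_merged`) are hypothetical; which species are "most similar" in the real data is for you to judge from your own plots.

```r
# Simulated stand-in data (NOT the assignment data)
set.seed(4)
drought <- data.frame(
  species  = factor(rep(c("A", "B", "C", "D"), each = 10)),
  n_before = sample(15:40, 40, replace = TRUE)
)
p_true <- c(A = 0.30, B = 0.35, C = 0.60, D = 0.80)[as.character(drought$species)]
drought$n_survived <- rbinom(40, size = drought$n_before, prob = p_true)

# Approach 1: refit on a subset holding only the two species of interest
ab <- droplevels(subset(drought, species %in% c("A", "B")))
m_ab <- glm(cbind(n_survived, n_before - n_survived) ~ species,
            family = binomial, data = ab)
print(anova(m_ab, test = "Chisq"))

# Approach 2: give the two species the same label, then compare nested models
drought$species_merged <- drought$species
levels(drought$species_merged)[levels(drought$species_merged) %in% c("A", "B")] <- "AB"

m_full   <- glm(cbind(n_survived, n_before - n_survived) ~ species,
                family = binomial, data = drought)
m_merged <- glm(cbind(n_survived, n_before - n_survived) ~ species_merged,
                family = binomial, data = drought)

# A non-significant result here means merging A and B loses little
cmp <- anova(m_merged, m_full, test = "Chisq")
print(cmp)
```

If the data turn out to be overdispersed, the same comparisons can be made with `family = quasibinomial` and `test = "F"`.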

Test for differences between species using either of these two methods above.

Another issue in this case is that plant density may have had an effect on survival, and that this effect could have depended on species as well. Fit another glm with pre-drought plant number as a covariate, and determine whether there is evidence that plant density affected survival, and whether this effect depended on species. If you find evidence that plant density has had an effect on survival then you will need to test for differences between species again, while also accounting for this effect of plant density. In this case, plotting percentage survival against initial plant density with different colours/characters for different species will help a lot, especially if you then plot the model predictions for the different species as well.
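
A sketch of one way to include density as a covariate, again on simulated data with hypothetical column names. The `species * n_before` formula expands to both main effects plus their interaction, so the interaction row of the sequential analysis of deviance addresses whether the density effect depends on species.

```r
# Simulated stand-in data with a built-in density effect (NOT the assignment data)
set.seed(5)
drought <- data.frame(
  species  = factor(rep(c("A", "B", "C", "D"), each = 10)),
  n_before = sample(15:40, 40, replace = TRUE)
)
sp_shift <- c(A = -0.5, B = -0.3, C = 0.6, D = 1.4)[as.character(drought$species)]
p_true <- plogis(1.5 - 0.05 * drought$n_before + sp_shift)
drought$n_survived <- rbinom(40, size = drought$n_before, prob = p_true)

# Species, density, and their interaction
m_int <- glm(cbind(n_survived, n_before - n_survived) ~ species * n_before,
             family = binomial, data = drought)
tab <- anova(m_int, test = "Chisq")   # terms are tested sequentially
print(tab)

# Predicted survival probability for each species across a range of densities
newd <- expand.grid(species = levels(drought$species),
                    n_before = seq(15, 40, by = 5))
newd$p_hat <- predict(m_int, newdata = newd, type = "response")
print(head(newd))
```

Because the tests are sequential, the order of terms in the formula matters; fitting `n_before` before `species` tests species after adjusting for density.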

Write down your conclusions about this experiment based on your full analysis.

Another Drought Experiment

The same drought experiment is repeated with the same species, but in a different soil, and you are happy you did the analysis of the first one in R, because now you have all the code you need to do the analysis of the second. The data for this experiment can be found in the 'drought2' excel file.

Redo the analysis above for the second experiment, making sure you check for overdispersion and the effect of plant density again, and accounting for them as needed.

Write down your conclusions about this second drought experiment based on your full analysis.

3. Mountain diversity example

Researchers want to test the hypothesis that plant diversity increases with altitude. They find different mountains where national parks or other protected areas cover a reasonably wide range of altitudes going up the side of the mountain – wide enough to enable them to sample sites at different altitudes along a transect up the side of the mountain. They record abundances of all plant species at each site on each mountain, and then calculate the diversity of each site. The resulting data is in the file "mountaindiversity.csv".

How many different mountains did they sample?

Did they sample the same number of sites on each mountain?

Were the sites sampled on each particular mountain evenly spaced in altitude?

Were the sites sampled on a particular mountain always at the same altitude as the sites sampled on every other mountain?

Plot the height of all the sites against their diversity. Does it look like there is a relationship? Positive or negative? Fit a simple linear model predicting diversity from height. Is it significant? What is the p-value? Does the fitted model indicate a positive or negative relationship (whether significant or not)? Is this test valid? Why/why not?

Make a boxplot of diversity by mountain. Does it look like there are differences between mountains? Fit a simple linear model predicting diversity from mountain. Is it significant? What is the p-value? Is this test valid? Why/why not?

Calculate the mean diversity for each mountain and the mean height for each mountain. Plot these mean heights (on the x-axis) against these mean diversities (on the y-axis). Does it look like there is a relationship? Positive or negative? Fit a simple linear model predicting mean diversity from mean height. Is it significant? What is the p-value? Does the fitted model indicate a positive or negative relationship (whether significant or not)? Is this test valid? Why/why not?

Use the ‘aov’ function with an ‘Error’ term in the formula to fit a linear model predicting diversity in terms of height, but with a random effect for mountain. Is height a significant predictor of diversity? What is the p-value? Does the fitted model indicate a positive or negative relationship (whether significant or not)? Is this test valid? Why/why not?
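
For reference, the general shape of such a call is sketched below on simulated data. The data frame `mtn`, its columns `mountain`, `height` and `diversity`, and the simulated relationship are all assumptions and do not reflect the real file.

```r
# Simulated stand-in data: 6 mountains x 8 sites (NOT the assignment data)
set.seed(6)
n_mtn <- 6
n_site <- 8
mtn <- data.frame(mountain = factor(rep(paste0("M", 1:n_mtn), each = n_site)))
base_height <- rep(seq(500, 1500, length.out = n_mtn), each = n_site)
mtn$height <- base_height + runif(nrow(mtn), min = 0, max = 600)
mtn_effect <- rep(rnorm(n_mtn, sd = 3), each = n_site)   # mountain-level variation
mtn$diversity <- 20 + 0.004 * mtn$height + mtn_effect + rnorm(nrow(mtn), sd = 1)

# Error(mountain) declares mountain as an error stratum (random effect)
fit_aov <- aov(diversity ~ height + Error(mountain), data = mtn)

# The summary reports height separately in the between-mountain
# and within-mountain strata
print(summary(fit_aov))
```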

Use the ‘lme’ function from the ‘nlme’ library to fit a linear model predicting diversity in terms of height, but with a random effect for mountain. Is height a significant predictor of diversity? What is the p-value? Does the fitted model indicate a positive or negative relationship (whether significant or not)? Is this test valid? Why/why not?
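
A minimal sketch of a random-intercept model with `lme`, using the same kind of simulated data as an assumption (hypothetical names `mtn`, `mountain`, `height`, `diversity`). The `random = ~ 1 | mountain` argument gives each mountain its own intercept.

```r
library(nlme)

# Simulated stand-in data: 6 mountains x 8 sites (NOT the assignment data)
set.seed(7)
n_mtn <- 6
n_site <- 8
mtn <- data.frame(mountain = factor(rep(paste0("M", 1:n_mtn), each = n_site)))
base_height <- rep(seq(500, 1500, length.out = n_mtn), each = n_site)
mtn$height <- base_height + runif(nrow(mtn), min = 0, max = 600)
mtn_effect <- rep(rnorm(n_mtn, sd = 3), each = n_site)
mtn$diversity <- 20 + 0.004 * mtn$height + mtn_effect + rnorm(nrow(mtn), sd = 1)

# Fixed effect of height, random intercept for each mountain
fit_lme <- lme(diversity ~ height, random = ~ 1 | mountain, data = mtn)

# Table of fixed effects: estimate, standard error, df, t-value, p-value
tt <- summary(fit_lme)$tTable
print(tt)

slope   <- tt["height", "Value"]     # sign gives the direction of the relationship
p_value <- tt["height", "p-value"]
```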

Plot the height of all the sites against their diversity again, but this time using a different colour and/or symbol for each mountain. Hopefully this will help you understand the results of the previous tests better. Make any other plots you think will help. Write down your conclusions from the analysis and plotting that you have done.

4. Germination Data Revisited

In this question we go back to germination data, like in the lab. Lots of what you’ll need was covered in the lab examples, but there are also some fairly challenging parts to this question. If you are struggling and short of time, you may decide to skip some of the more challenging parts towards the end, and focus on doing all the other questions well. The last few percent may not be worth compromising your mental health!

The data set comes from an experiment on germinating white sapote seeds (Casimiroa edulis, also known as Aztec fruit or cochitzapotl). Sapote growers would like seeds to germinate faster, so researchers have trialled a new treatment they hope will speed up germination. They had 100 pots, each with a white sapote seed in it. 50 randomly selected pots were treated and the remainder acted as controls. The researchers checked every day and recorded the day that each seed germinated. The data is recorded in a file called 'white sapote time to germination.csv'.

Get the data into R and call the data frame that is read in ‘ws’. Have a look at ‘ws’. The two columns correspond to the times to germination for the 50 seeds for the control/treatment as indicated.

You’ll need to get the data into standard format for the next bit ie one variable for all the times and another indicating which treatment. You can do this in R or Excel as you prefer (quicker in R if you can work it out – use the ‘c’ function to make a variable with all the times and factor(rep(c('c','t'),each=50)) will make a variable with the treatments.) Make a boxplot showing time to germination for the two treatments (ie treatment/control). Does it look like the treatment has an effect? What kind of effect?
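
The reshaping hinted at above might look like this. The `ws` data frame here is simulated, and the column name `treatment` is an assumption (check the real names with `names(ws)`); `long`, `time` and `trt` are hypothetical names for the result.

```r
# Simulated stand-in for 'ws': two columns of 50 germination times each
# (NOT the assignment data; the column name 'treatment' is an assumption)
set.seed(8)
ws <- data.frame(control   = ceiling(rgamma(50, shape = 3, rate = 0.10)),
                 treatment = ceiling(rgamma(50, shape = 3, rate = 0.20)))

# Stack the two columns into one variable, with a factor saying which is which
long <- data.frame(
  time = c(ws$control, ws$treatment),
  trt  = factor(rep(c("c", "t"), each = 50))
)

print(head(long))
print(table(long$trt))
```

With the data in this format, `boxplot(time ~ trt, data = long)` draws the boxplot.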

Fit a linear model predicting time to germination by treatment. Plot the residuals by treatment (and/or fitted value). Does it look like we have homogeneity of variance? Plot a histogram of the residuals. Does it look like we have normal residuals/errors? Apply a Bartlett test and a Shapiro test to check these… you should find that they clearly indicate that there is a problem with both heterogeneity of variance and non-normal residuals/errors. In what way do the residuals/errors appear non-normal?
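
A short sketch of the two tests on simulated, deliberately skewed data (hypothetical names `long`, `time`, `trt`). The p-values it prints belong to the simulation, not to the assignment data.

```r
# Simulated long-format stand-in data (NOT the assignment data)
set.seed(9)
long <- data.frame(
  time = c(ceiling(rgamma(50, shape = 3, rate = 0.10)),
           ceiling(rgamma(50, shape = 3, rate = 0.20))),
  trt  = factor(rep(c("c", "t"), each = 50))
)

m_lm <- lm(time ~ trt, data = long)

# Bartlett test: null hypothesis of equal variances across treatments
bt <- bartlett.test(time ~ trt, data = long)
print(bt)

# Shapiro-Wilk test: null hypothesis that the residuals are normal
sw <- shapiro.test(residuals(m_lm))
print(sw)
```

For the plots, `boxplot(residuals(m_lm) ~ long$trt)` and `hist(residuals(m_lm))` are one option.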

These data are actually ‘survival data’, in that they represent the time until something happened for a sample of individuals. A linear model is often likely to be inappropriate for such data, as we see above. A Poisson model is a possible option for such data in this case, because the days are all whole numbers, like counts. Fit a Poisson glm predicting time to germination by treatment, with default link function, and look at the results. Is over-dispersion a problem? How do you know? Fit a quasipoisson model to deal with this. Does this quasipoisson model indicate that the treatment has a significant impact?

A glm with a Gamma distribution for errors is often used to model ‘survival data’, at least in simple cases. Fit a Gamma glm predicting time to germination by treatment, with default link function, and look at the results. Does this model indicate that the treatment has a significant impact? Is the level of significance indicated by the Gamma glm very different to that indicated by the quasipoisson model? (remember anything between 0.01 and 0.05 gets one star * indicating significant, whereas less than 0.01 gets ** indicating highly significant, so 0.015 vs 0.0015 would be very different; 0.015 vs 0.025 would not be very different). Record the AIC of all models fitted so far… which one appears to be the best?
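
The sketch below fits the four model types side by side on simulated data (hypothetical names `long`, `time`, `trt`). Note that the family is spelled `Gamma` with a capital G, and that quasi families have no true likelihood, so R reports their AIC as `NA`.

```r
# Simulated long-format stand-in data (NOT the assignment data)
set.seed(10)
long <- data.frame(
  time = c(ceiling(rgamma(50, shape = 3, rate = 0.10)),
           ceiling(rgamma(50, shape = 3, rate = 0.20))),
  trt  = factor(rep(c("c", "t"), each = 50))
)

m_lm    <- lm(time ~ trt, data = long)
m_pois  <- glm(time ~ trt, family = poisson,      data = long)  # default link: log
m_quasi <- glm(time ~ trt, family = quasipoisson, data = long)
m_gamma <- glm(time ~ trt, family = Gamma,        data = long)  # default link: inverse

# Coefficient table for the Gamma model, including the p-value for treatment
print(summary(m_gamma)$coefficients)

# AIC of each model; the quasipoisson entry is NA
aics <- sapply(list(lm = m_lm, poisson = m_pois,
                    quasipoisson = m_quasi, Gamma = m_gamma), AIC)
print(aics)
```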

Now we want to go back to do something similar to the lab - using a binomial model. For that we need to modify the data to get the total number of seeds germinated at each time, for each treatment. You could probably do that in Excel, if you had nothing better to do on a Saturday afternoon. But let’s do it in R instead – much faster. Create a histogram of the times to germination for the control only, with ‘bins’ or ‘breaks’ of size 1. This code should work: hist(ws$control,breaks=0:100). This makes a plot but also creates an R object in the process. Give this object a name, say ‘histo’, and then look at it. Note that it has a sub-object within it called ‘counts’, which is the number in each bin of the histogram, or in this case, the number of seeds germinating on each day. This sub-object can be pulled out using histo$counts, just like pulling a variable out of a data frame. Applying the ‘cumsum’ function to these counts then provides the cumulative sum of seeds that have germinated by each day, which is what we want. It should now be pretty easy to calculate this for each treatment. You’ll then need to stick these two variables together to get one long list of the total number of seeds germinated at each time, for each treatment. Remember that the ‘c’ function sticks numbers together. And then you’ll need to create a second variable for the times and a third variable for the treatment. (All of this could be done in Excel if you prefer of course.) When you have the data sorted, you should be able to plot cumulative germination by time and get a plot like the one on the next page.
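
The steps above can be sketched as follows on simulated data. The column name `treatment` and the result names `cum`, `day` and `germinated` are assumptions. `plot = FALSE` is used here only to skip drawing the histogram; the returned object is the same either way.

```r
# Simulated stand-in for 'ws' (NOT the assignment data)
set.seed(11)
ws <- data.frame(control   = sample(10:60, 50, replace = TRUE),
                 treatment = sample(5:40,  50, replace = TRUE))

# One bin per day: bin d counts the seeds that germinated on day d
histo_c <- hist(ws$control,   breaks = 0:100, plot = FALSE)
histo_t <- hist(ws$treatment, breaks = 0:100, plot = FALSE)

# Cumulative number germinated by each day, stacked for the two treatments
cum <- data.frame(
  day        = rep(1:100, times = 2),
  treatment  = factor(rep(c("c", "t"), each = 100)),
  germinated = c(cumsum(histo_c$counts), cumsum(histo_t$counts))
)

print(head(cum))
```

A call such as `plot(germinated ~ day, data = cum, col = as.integer(treatment))` then shows cumulative germination by time for both treatments.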
