Atop Darien

Bee Curiosity

Essentials of Research Planning: Data Collection

Massasoit Land Use Research and Data Collection Planning

Date: October, 2014
Author: Sean Kent

Using R to plan out the collection of aquatic invertebrates and water samples

Overarching research question: How do land use and sustainable landscaping influence ecosystem services and native biodiversity?

Last week, we examined how to create a mock dataset to plan an experiment on how strongly sustainable landscaping affects essential pollinators, specifically native bees, by asking: “Does native bee diversity, richness, and abundance decline with distance from the native plantings?” Check out this great page for more background on using R in data planning techniques. This week, you will need to create the following variables (if necessary, create other variables that are not on this list):

  1. Study Site
  2. Date
  3. Replicate
  4. Variable(s) for the water quality parameters you will be testing
  5. Variable(s) for the aquatic and soil invertebrates that you will be collecting

Review: How do you create a variable?

Recall from last week that the following R code was used to create a variable for the location of bee bowls in the distance experiment. We had 10 different locations and 15 bowls placed at each location.

Location <- rep(c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)", "20m", "40m", "60m", "80m", "100m","120m"), c(15,15,15,15,15,15,15,15,15,15))
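A quick check, offered here as a sketch and not as part of the original lesson, is to confirm that the vector has the length and balance you expect. It uses only the base R functions length() and table().

```r
# Recreate the Location vector from above: 10 locations, 15 bowls each
Location <- rep(c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)",
                  "20m", "40m", "60m", "80m", "100m", "120m"),
                c(15, 15, 15, 15, 15, 15, 15, 15, 15, 15))

# Total number of entries (10 locations x 15 bowls)
n_total <- length(Location)

# Number of entries per location; every count should be 15
counts <- table(Location)

print(n_total)
print(counts)
```

If any count differs from 15, the labels or the repeat counts were mistyped.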

How do you create a variable filled with random numbers?

Let’s take a look at how to create a vector filled with random numbers.

WaterTemperature <- rnorm(150, mean = 25, sd = 10)

The “WaterTemperature” vector will have a length of 150, with values drawn from a normal distribution with a mean of 25 and a standard deviation of 10. Use this example code to create and fill vectors to plan out the water quality and biodiversity data collection.
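Because rnorm() draws new random values on every run, a mock dataset will change each time the script is executed. One way to make it repeatable, sketched here as a suggestion and not as part of the original lesson, is to call set.seed() first. The seed value 2014 and the name WaterTemperatureAgain are arbitrary choices made for illustration.

```r
# Fix the random number generator so the mock values are repeatable
set.seed(2014)  # arbitrary seed, chosen only for illustration
WaterTemperature <- rnorm(150, mean = 25, sd = 10)

# Re-running with the same seed reproduces exactly the same values
set.seed(2014)
WaterTemperatureAgain <- rnorm(150, mean = 25, sd = 10)

length(WaterTemperature)                            # number of mock measurements
identical(WaterTemperature, WaterTemperatureAgain)  # TRUE when the seed matches
summary(WaterTemperature)                           # minimum, quartiles, mean, maximum
```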

Review: How do you create an empty vector?

Let’s say you want to create a vector that is empty (no real values yet; R fills a numeric vector with zeros as placeholders). You can use the following code example. Notice how the length of the vector isn’t given directly, i.e. “length = 150”, but as “length = length(Location)”, which makes sure that this vector is exactly as long as another vector that you are using.

WaterTemperature1 <- vector(mode = 'numeric', length = length(Location))
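If a zero placeholder could later be mistaken for a real measurement, one alternative (a suggestion, not the lesson's method) is to fill the vector with NA instead. In the sketch below, the Location labels and the name WaterTemperatureNA are hypothetical stand-ins so that the example runs on its own.

```r
# Stand-in for the Location vector: 10 locations x 15 bowls = 150 entries
Location <- rep(paste0("Location", 1:10), each = 15)  # hypothetical labels

# Placeholder as in the lesson: a numeric vector filled with zeros
WaterTemperature1 <- vector(mode = "numeric", length = length(Location))

# Alternative placeholder: missing values, so unfilled rows stand out
WaterTemperatureNA <- rep(NA_real_, length(Location))

head(WaterTemperature1)   # zeros
head(WaterTemperatureNA)  # NA values
```

Either version works for planning; NA simply makes it harder to confuse an unfilled row with a measured value of zero.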

Next, we need to combine the separate vectors into a dataframe. To do that, we will use the data.frame() function. Here is a quick example of how to create a dataframe from two vectors.

a <- c(1,2,3,4)
b <- c("Yes", "Yes", "No", "No")
DataFrame <- data.frame(a,b)
# Print each object to see its contents
a
b
DataFrame
## [1] 1 2 3 4
## [1] "Yes" "Yes" "No"  "No"
##   a   b
## 1 1 Yes
## 2 2 Yes
## 3 3  No
## 4 4  No

To export the file, we use the write.csv() function.

DataFrame <- data.frame(a,b)
write.csv(DataFrame, "DataFrameExample.csv")
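Putting these pieces together for the water and invertebrate sampling, a planning dataframe might look like the sketch below. Everything specific in it is an assumption made up for illustration: the three site names, the two dates, the five replicates, the choice of pH and dissolved oxygen as water quality variables, and the file name. Substitute the sites and parameters you actually plan to sample.

```r
# Hypothetical design: 3 sites x 2 dates x 5 replicates = 30 rows
sites <- c("SiteA", "SiteB", "SiteC")    # made-up site names
dates <- c("2014-10-15", "2014-10-29")   # made-up sampling dates

StudySite <- rep(sites, each = 10)                  # 10 rows per site
Date      <- rep(rep(dates, each = 5), times = 3)   # 5 rows per date within each site
Replicate <- rep(1:5, times = 6)                    # replicates 1-5 for every site/date

# Placeholder columns to be filled in during field and lab work
pH                    <- rep(NA_real_, length(StudySite))
DissolvedOxygen       <- rep(NA_real_, length(StudySite))
InvertebrateAbundance <- rep(NA_real_, length(StudySite))

WaterPlan <- data.frame(StudySite, Date, Replicate,
                        pH, DissolvedOxygen, InvertebrateAbundance)

# Check that every site/date combination has 5 replicates
table(WaterPlan$StudySite, WaterPlan$Date)

# Write to a temporary folder; row.names = FALSE leaves out the row-number column
out_file <- file.path(tempdir(), "WaterPlanExample.csv")
write.csv(WaterPlan, out_file, row.names = FALSE)
```

The table() call is the planning step that matters most here: it shows at a glance whether the design is balanced before any samples are collected.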

Essentials of Research Planning

Creating mock datasets in R and exporting them as a csv file for use in Excel

Date: October, 2014
Author: Sean Kent

Why would you want to create a mock dataset in the research planning stage?

Properly planning an experiment is hard. It takes time and forces you to really think about the question you are asking, the hypothesis you are testing, and the data you will collect as evidence that will end up supporting or refuting the hypothesis. However, the thought and time needed to plan how you will collect and analyze the data are often overlooked, which causes problems once you start the analysis. Check out this great page for more background on using R in data planning techniques.

Let’s start by running through the planning process for data collection.

Remember, this research is asking whether native bee diversity, richness, and abundance decline with distance from native plantings.

Let’s create vectors for the location, the replicate, and the method of collection (bee bowl).


Since we are measuring whether bee diversity declines with distance from native plantings, let’s create a distance vector. It is good to use simple and descriptive names, so let’s call the vector Distance and populate it with the different distances used in the experiment. Bee bowls were placed at the edge of the meadow, inside the meadow, and at 20 meter intervals away from the meadow, up to 120 meters away. Here is a picture of the study site.


To combine different values (numbers, strings, etc.) we use the c() function. Think of it as combining or concatenating.

Distance <- c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)", "20m", "40m", "60m", "80m", "100m","120m")
Distance  # print the vector to check its contents
##  [1] "EdgeAdmin(0m)"    "MeadowAdmin(20m)" "FarMeadow(20m)"  
##  [4] "FarEdge(0m)"      "20m"              "40m"             
##  [7] "60m"              "80m"              "100m"            
## [10] "120m"

Wait, that’s not exactly what we wanted. There are only 10 values in the Distance vector, but to properly test our hypothesis we need adequate replication, not just a single sample for each distance. We are placing 15 bee bowls at each location, so we need 15 entries for each distance. Typing in each distance 15 times (150 entries in total) is labor intensive, tedious, and prone to errors, so let’s use the rep() function to speed up the process. I’ve created the vector in two ways, giving the number of repeats with c() for Distance and with rep() for Distance1.

Distance <- rep(c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)", "20m", "40m", "60m", "80m", "100m","120m"), c(15,15,15,15,15,15,15,15,15,15))

# rep(15, 10) builds the same ten counts of 15 that are typed out for Distance
Distance1 <- rep(c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)", "20m", "40m", "60m", "80m", "100m","120m"), rep(15, 10))
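Another option, shown here as a sketch and not taken from the original lesson, is the each argument of rep(), which repeats every element a fixed number of times. The names DistanceLabels, Distance2, and DistanceCycled are introduced only for this illustration.

```r
DistanceLabels <- c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)",
                    "20m", "40m", "60m", "80m", "100m", "120m")

# Explicit counts, one per label
Distance <- rep(DistanceLabels, c(15, 15, 15, 15, 15, 15, 15, 15, 15, 15))

# Same result using each =
Distance2 <- rep(DistanceLabels, each = 15)
identical(Distance, Distance2)       # TRUE: 15 consecutive copies of each label

# For comparison: a single number as the second argument repeats the whole set,
# so the labels cycle through all 10 values instead of forming blocks of 15
DistanceCycled <- rep(DistanceLabels, 15)
identical(Distance, DistanceCycled)  # FALSE: same length, different order
```

The order matters because the other columns of the dataframe are matched to Distance row by row.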

Bee Bowls

Good, now we have 150 values in the distance vector. Next, we need to create a vector for our method of collection, bee bowls, specifically the color of the bee bowl.

BowlColor <- rep(c("Blue", "Yellow", "White"), 50) #Create the bowl color variable, 150 records
BowlColor[1:15] #Looks good, notice how it repeats in groups of three.
##  [1] "Blue"   "Yellow" "White"  "Blue"   "Yellow" "White"  "Blue"  
##  [8] "Yellow" "White"  "Blue"   "Yellow" "White"  "Blue"   "Yellow"
## [15] "White"
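To check that the colors line up evenly with the distances, the two vectors can be cross-tabulated. This is a sketch using base R's table(); the name DesignCheck is made up for the example. With 15 bowls per location and three colors, each cell should hold 5.

```r
Distance <- rep(c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)",
                  "20m", "40m", "60m", "80m", "100m", "120m"), each = 15)
BowlColor <- rep(c("Blue", "Yellow", "White"), 50)

# Rows are distances, columns are bowl colors, cells are numbers of bowls
DesignCheck <- table(Distance, BowlColor)
print(DesignCheck)
```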

Looks good. Now let’s create the vector for replicate. For each transect, we have 5 total replicates, but we have replicated the color at each location. That is, there is a replicate 1 for the blue, yellow, and white bee bowls. Therefore, we need 3 values for replicate 1 instead of just 1. Plus, we need a replicate 1 for each distance treatment (20m, 40m, and so on), so the replicates need to be repeated 10 times. I’ve created the Replicate vector in two different ways.

Replicate <- rep(c("1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "4", "5", "5", "5"), 10)
Replicate1 <- rep(c(rep(1,3), rep(2,3), rep(3,3), rep(4,3), rep(5,3)),10)
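One difference between the two versions is worth noticing: Replicate holds character strings because its values are quoted, while Replicate1 holds numbers. The sketch below shows this, along with a more compact form that nests rep() calls using each and times. The name Replicate2 is introduced here only for illustration.

```r
Replicate  <- rep(c("1", "1", "1", "2", "2", "2", "3", "3", "3",
                    "4", "4", "4", "5", "5", "5"), 10)
Replicate1 <- rep(c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3), rep(5, 3)), 10)

# Compact alternative: each replicate number 3 times, then the whole block 10 times
Replicate2 <- rep(rep(1:5, each = 3), times = 10)

class(Replicate)    # "character"
class(Replicate1)   # "numeric"
class(Replicate2)   # "integer"

# The values agree once the types are aligned
all(Replicate == as.character(Replicate1))
all(Replicate1 == Replicate2)
```

Whether replicate should be stored as text or as a number depends on how it will be used later; as a grouping label, either works.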

Bee Abundance

Let’s create an empty (zero-filled) vector for bee abundance.

BeeAbundance <- vector(mode = 'numeric', length = length(Distance))

Now we need to combine the separate vectors into a dataframe, using the data.frame() function. Then let’s create a comma-separated file (which can be opened as a spreadsheet in Excel). To export the file, we use the write.csv() function.

DistanceExperiment <- data.frame(Distance, BowlColor, Replicate,BeeAbundance)
write.csv(DistanceExperiment, "BeeAbundanceDistanceExperiment.csv")

Tada, a nice clean spreadsheet!
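As a final check, the exported file can be read back into R to confirm that it has the expected shape. The sketch below writes to a temporary folder so that it does not overwrite anything, and it passes row.names = FALSE, an optional argument not used above, to leave out the extra column of row numbers that write.csv() adds by default. The names out_file and CheckData are made up for the example.

```r
Distance <- rep(c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)",
                  "20m", "40m", "60m", "80m", "100m", "120m"), each = 15)
BowlColor    <- rep(c("Blue", "Yellow", "White"), 50)
Replicate    <- rep(rep(1:5, each = 3), times = 10)
BeeAbundance <- vector(mode = "numeric", length = length(Distance))

DistanceExperiment <- data.frame(Distance, BowlColor, Replicate, BeeAbundance)

# Export to a temporary location without the row-number column
out_file <- file.path(tempdir(), "BeeAbundanceDistanceExperiment.csv")
write.csv(DistanceExperiment, out_file, row.names = FALSE)

# Read the file back in and inspect its shape and column names
CheckData <- read.csv(out_file)
dim(CheckData)
names(CheckData)
```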