Essentials of Research Planning

Creating mock datasets in R and exporting them as a csv file for use in Excel

Date: October, 2014
Author: Sean Kent

Why would you want to create a mock dataset in the research planning stage?

Properly planning an experiment is hard. It takes time and forces you to think carefully about the question you are asking, the hypothesis you are testing, and the data you will collect as evidence to support or refute that hypothesis. However, the time and thought needed to plan how you will collect and analyze the data are often overlooked, which causes problems once you start the analysis. Check out this great page for more background on using R in data planning techniques.

Let’s start by running through the planning process for data collection.

Remember, this research asks whether native bee diversity, richness, and abundance decline with distance from native plantings.

Let’s create vectors for the location, the replicate, and the method of collection (bee bowl).


Since we are measuring whether bee diversity declines with distance from native plantings, let’s create a distance vector. It is good to use simple and descriptive names, so let’s call the vector Distance and populate it with the different distances used in the experiment. Bee bowls are placed at the edge of the meadow, inside the meadow, and at 20 meter intervals away from the meadow, up to 120 meters away. Here is a picture of the study site.


To combine different values (numbers, strings, etc.) we use the c() function; think of it as combining or concatenating.

Distance <- c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)", "20m", "40m", "60m", "80m", "100m", "120m")
Distance #Print the vector to check its contents
##  [1] "EdgeAdmin(0m)"    "MeadowAdmin(20m)" "FarMeadow(20m)"  
##  [4] "FarEdge(0m)"      "20m"              "40m"             
##  [7] "60m"              "80m"              "100m"            
## [10] "120m"
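
As a small aside, c() also joins existing vectors end to end, and length() reports how many values the result holds. The sketch below assumes the ten locations are split into two vectors; the names `meadow`, `transect`, and `DistanceCombined` are made up for illustration and are not part of the original script.

```r
# Hypothetical split of the ten sampling locations into two vectors
meadow   <- c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)")
transect <- c("20m", "40m", "60m", "80m", "100m", "120m")

# c() concatenates the two vectors into one flat character vector
DistanceCombined <- c(meadow, transect)

# length() counts the values, one per sampling location
length(DistanceCombined)
```

Checking length() after every step is a cheap way to catch a vector that is shorter or longer than the design calls for.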

Wait, that’s not exactly what we wanted. There are only 10 values in the Distance vector, but to properly test our hypothesis we need adequate replication, not just a single sample for each distance. We are placing 15 bee bowls at each location, so we need 15 entries for each distance. Typing in each distance 15 times (150 entries in total) is labor intensive, tedious, and prone to errors, so let’s use the rep() function to speed up the process. I’ve made the vector in two ways: Distance builds the vector of repeat counts with c(), and Distance1 builds it with rep().

Distance <- rep(c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)", "20m", "40m", "60m", "80m", "100m", "120m"), c(15,15,15,15,15,15,15,15,15,15))

Distance1 <- rep(c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)", "20m", "40m", "60m", "80m", "100m", "120m"), rep(15, 10)) #rep(15, 10) gives ten 15s, one count per location
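
It is worth checking that the repeated vector has the layout we expect. The following is a minimal sketch, not part of the original script: it uses the `each` argument of rep() as a third way to repeat every location 15 times in a row, and the names `sites`, `DistanceEach`, and `DistanceCycled` are introduced here only for illustration.

```r
# The ten sampling locations (hypothetical helper vector)
sites <- c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)",
           "20m", "40m", "60m", "80m", "100m", "120m")

# each = 15 repeats every element 15 times before moving to the next one
DistanceEach <- rep(sites, each = 15)

length(DistanceEach)  # total number of bee bowls
table(DistanceEach)   # number of bowls recorded for each location

# Same result as passing one repeat count per element
identical(DistanceEach, rep(sites, rep(15, 10)))

# By contrast, a single count repeats the whole vector, cycling through the sites
DistanceCycled <- rep(sites, 15)
head(DistanceCycled, 3)
```

Both layouts hold 150 values, so length() alone will not tell them apart; looking at the first few values or comparing with identical() will.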

Bee Bowls

Good, now we have 150 values in the distance vector. Next, we need to create a vector for our method of collection, bee bowls, specifically the color of the bee bowl.

BowlColor <- rep(c("Blue", "Yellow", "White"), 50) #Create the bowl color variable, 150 records
BowlColor[1:15] #Looks good, notice how it repeats in groups of three.
##  [1] "Blue"   "Yellow" "White"  "Blue"   "Yellow" "White"  "Blue"  
##  [8] "Yellow" "White"  "Blue"   "Yellow" "White"  "Blue"   "Yellow"
## [15] "White"

Looks good, now let’s create the vector for replicate. For each transect, we have 5 total replicates, but we have replicated the color at each location. What I mean is that there is a replicate 1 for the blue, yellow, and white bee bowls. Therefore, we need 3 values for replicate 1 instead of just 1. Plus, we need a replicate 1 for each distance treatment (20m, 40m, and so on), so the replicates need to be repeated 10 times. I’ve created the replicate vector in two ways to show you two different ways to do it. Note that Replicate stores the labels as text, while Replicate1 stores them as numbers.

Replicate <- rep(c("1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "4", "5", "5", "5"), 10)
Replicate1 <- rep(c(rep(1,3), rep(2,3), rep(3,3), rep(4,3), rep(5,3)),10)
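
Before moving on, it can help to confirm that the three vectors line up, so that every combination of distance, bowl color, and replicate appears exactly once. The sketch below is one possible check and rebuilds the vectors so it can run on its own; `sites` and `design` are names introduced here, and the vectors are built with the `each` argument of rep() as a compact alternative.

```r
# Rebuild the three design vectors (150 values each)
sites <- c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)",
           "20m", "40m", "60m", "80m", "100m", "120m")
Distance  <- rep(sites, each = 15)
BowlColor <- rep(c("Blue", "Yellow", "White"), 50)
Replicate <- rep(rep(1:5, each = 3), 10)  # 1,1,1,2,2,2,...,5,5,5 repeated 10 times

# Cross-tabulate the design: one cell per distance x color x replicate
design <- table(Distance, BowlColor, Replicate)

dim(design)       # number of levels of each factor
all(design == 1)  # TRUE only if every combination occurs exactly once
```

If any cell were 0 or 2, the vectors would be out of step with each other, which is much easier to fix now than after the field data have been typed in.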

Bee Abundance

Let’s create a placeholder vector for bee abundance. It is filled with zeros and has the same length as Distance.

BeeAbundance <- vector(mode = 'numeric', length = length(Distance))
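
Note that vector(mode = 'numeric', ...) fills the vector with zeros, which in a finished spreadsheet could be mistaken for bowls that caught no bees. One option, shown here only as a sketch, is to use missing values as the placeholder. The names `n_bowls`, `ZeroFilled`, and `NotYetCounted` are hypothetical.

```r
n_bowls <- 150  # 10 locations x 15 bowls, as in the design above

# What vector() produces: a numeric vector of zeros
ZeroFilled <- vector(mode = "numeric", length = n_bowls)
head(ZeroFilled)

# Alternative placeholder: NA marks "not yet counted"
NotYetCounted <- rep(NA_real_, n_bowls)
head(NotYetCounted)

# NA values are easy to count later, to see how many bowls still need data
sum(is.na(NotYetCounted))
```

Either choice works for a mock dataset; the point is to decide in advance how an unrecorded bowl differs from an empty one.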

Next, we need to take the separate vectors and combine them into a data frame. To do that, we will use the data.frame() function. Then let’s create a comma-separated file (which can be opened as a spreadsheet in Excel). To export the file, we use the write.csv() function.

DistanceExperiment <- data.frame(Distance, BowlColor, Replicate, BeeAbundance)
write.csv(DistanceExperiment, "BeeAbundanceDistanceExperiment.csv", row.names = FALSE) #row.names = FALSE leaves out the extra column of row numbers

Tada, a nice clean spreadsheet!
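
To be sure the file will open the way you expect, you can read it back into R and inspect it. This is a minimal sketch under a few assumptions: it rebuilds the data frame so it runs on its own, writes to a temporary file via tempfile() so nothing in your working directory is overwritten, and passes row.names = FALSE so no column of row numbers is written. The names `MockData`, `csv_path`, and `CheckImport` are hypothetical.

```r
# Rebuild the mock dataset (150 rows, 4 columns)
sites <- c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)",
           "20m", "40m", "60m", "80m", "100m", "120m")
MockData <- data.frame(
  Distance     = rep(sites, each = 15),
  BowlColor    = rep(c("Blue", "Yellow", "White"), 50),
  Replicate    = rep(rep(1:5, each = 3), 10),
  BeeAbundance = numeric(150)  # zero-filled placeholder
)

# Write to a temporary location, without the row-number column
csv_path <- tempfile(fileext = ".csv")
write.csv(MockData, csv_path, row.names = FALSE)

# Read the file back and check its shape and column names
CheckImport <- read.csv(csv_path)
dim(CheckImport)
names(CheckImport)
head(CheckImport)
```

If the dimensions and column names match the design, the spreadsheet is ready to be filled in during data collection.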

