Atop Darien

Bee Curiosity

Summarizing the Native Bee Dataset using the PLYR Package in R: Part 1


Using the plyr package in R to summarize the native bee dataset

Summarizing the Native Bee Dataset from the Massasoit Summer Research Undergraduate Program

Over several posts, I will cover how to summarize the native bee data so that we can create meaningful graphs and later conduct appropriate statistical analyses. In the last post, I went over how to examine the dataset for errors; we will use the corrected dataset here.

As the dataset is currently organized, we cannot easily summarize, graph, or analyze the data. To build summary datasets, we will use the plyr package in R. Before we begin, it’s important to keep in mind the overarching question of this research:

What is the influence of land use on ecosystem health (in this case native bees)?

Based on this question, it would be helpful to have a dataset that summarizes the abundance of each native bee genus at each site. Let’s use the ddply() function from the plyr package to accomplish this task. First, we should develop a plan for how to organize the summary data.

How is the native bee dataset organized?

For the native bee data, each native bee that was collected was given a unique identifying number (SpeciesID).

You can see the unique identifying number in the photo below.

[Photo: pinned bee specimen]

The following pieces of data were also recorded in the dataset:

  1. Collector: who collected the specimen
  2. Locality data:
    • State
    • County
    • City
    • Specific Location
    • Latitude & Longitude
  3. Date: Date the specimen was collected
  4. Collection Method
    • The collection method is either aerial netting of flying bees or collecting bees in bee bowls
  5. Bowl_Color
  6. Genus
  7. Species
Here is the actual R output of the column names:
##  [1] "SpeciesID"        "Replicate"        "Collector"       
##  [4] "State"            "County"           "City"            
##  [7] "Location"         "Latitude"         "Longitude"       
## [10] "Date"             "CollectionMethod" "Bowl_Color"      
## [13] "Genus"            "Species"          "Notes"
Here are the first several rows of the dataset:
##   SpeciesID Replicate                        Collector         State
## 1   2014037         5 Schoener, D. & Massasoit Interns Massachusetts
## 2   2014043         5 Schoener, D. & Massasoit Interns Massachusetts
## 3   2014045         2 Schoener, D. & Massasoit Interns Massachusetts
## 4   2014046         2 Schoener, D. & Massasoit Interns Massachusetts
## 5   2014047         5 Schoener, D. & Massasoit Interns Massachusetts
## 6   2014033         6 Schoener, D. & Massasoit Interns Massachusetts
##     County     City    Location Latitude Longitude    Date
## 1 Plymouth Brockton BeaverBrook    42.08     70.99  7/8/14
## 2 Plymouth Brockton BeaverBrook    42.08     70.99  7/8/14
## 3 Plymouth Brockton BeaverBrook    42.08     70.99  7/8/14
## 4 Plymouth Brockton BeaverBrook    42.08     70.99  7/8/14
## 5 Plymouth Brockton BeaverBrook    42.08     70.99  7/8/14
## 6 Plymouth Brockton BeaverBrook    42.08     70.99 7/23/14
##   CollectionMethod Bowl_Color          Genus   Species
## 1          PanTrap      White    Agapostemon Virescens
## 2          PanTrap       Blue    Agapostemon Virescens
## 3          PanTrap      White    Agapostemon Virescens
## 4          PanTrap       Blue    Agapostemon          
## 5          PanTrap       Blue    Agapostemon Virescens
## 6          PanTrap     Yellow Augochlorella     aurata
##                  Notes
## 1                     
## 2                     
## 3                     
## 4                     
## 5                     
## 6 unsure need to check
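Output like the listings above is what names() and head() print for a data frame. As a minimal sketch, assuming a small made-up data frame (toy.df is a hypothetical stand-in for the real dataset and carries only three of its columns), the same inspection steps look like this:

```r
# Hypothetical stand-in for the real dataset; only three of its columns,
# with a few values copied from the rows shown above.
toy.df <- data.frame(
  SpeciesID = c(2014037, 2014043, 2014033),
  Location  = c("BeaverBrook", "BeaverBrook", "BeaverBrook"),
  Genus     = c("Agapostemon", "Agapostemon", "Augochlorella"),
  stringsAsFactors = TRUE
)

names(toy.df)  # column names
head(toy.df)   # first rows (up to six)
str(toy.df)    # column types and factor levels
```

Running str() early is a quick way to confirm that columns such as Location and Genus are factors, which is how they appear in the summaries later in this post.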
Using the PLYR package and ddply() function to summarize the native bee data
  • First, let’s create a dataset that contains the number of bees collected at each location, independent of collection method. For the initial analyses, we will only examine whether there are differences between the genera of native bees found at each study site, rather than differences between species.
library(plyr)
# Summarize the total abundance of bees in each genus at each site
Bees.site <- ddply(Bees.df, .(Location, Genus), summarise,
                   TotalBees = length(Genus))
  • As you can see, we used the ddply() function in the plyr package to summarize the data. The R code above creates a data frame (Bees.site) with three columns: Location, Genus (each genus present at that location), and TotalBees (the number of individuals of that genus found at that location). The length() function simply counts the rows, one per specimen, for each genus at each site. Now, let’s take a look at the dataset we created.
Bees.site
##           Location          Genus TotalBees
## 1      BeaverBrook    Agapostemon         5
## 2      BeaverBrook           Apis         1
## 3      BeaverBrook Augochlorella          3
## 4      BeaverBrook         Bombus        10
## 5      BeaverBrook     Calliopsis         1
## 6      BeaverBrook       Halictus         1
## 7      BeaverBrook   Lasioglossum         4
## 8      BeaverBrook      Megachile         1
## 9      BeaverBrook          Osmia         1
## 10     BeaverBrook       Xylocopa         1
## 11 MassasoitMeadow    Agapostemon         3
## 12 MassasoitMeadow        Andrena         2
## 13 MassasoitMeadow     Anthophora         2
## 14 MassasoitMeadow           Apis         7
## 15 MassasoitMeadow Augochlorella          2
## 16 MassasoitMeadow         Bombus        34
## 17 MassasoitMeadow     Calliopsis         4
## 18 MassasoitMeadow       Halictus        16
## 19 MassasoitMeadow        Hylaeus         2
## 20 MassasoitMeadow   Lasioglossum        18
## 21 MassasoitMeadow      Megachile         5
## 22 MassasoitMeadow    Melissacler         1
## 23 MassasoitMeadow       Xylocopa         1
## 24 PoorMeadowBrook    Agapostemon         1
## 25 PoorMeadowBrook Augochlorella          3
## 26    SheepPasture Augochlorella          2
## 27    SheepPasture       Halictus         1
## 28    SheepPasture   Lasioglossum        13
## 29    SheepPasture          Osmia         1
## 30    WestgateMall           Apis         9
## 31    WestgateMall         Bombus         2
## 32    WestgateMall       Halictus         2
## 33    WestgateMall   Lasioglossum         7
## 34    WestgateMall      Megachile         1
## 35    WindsorTrail        Andrena         1
str(Bees.site)
## 'data.frame':    35 obs. of  3 variables:
##  $ Location : Factor w/ 6 levels "BeaverBrook",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Genus    : Factor w/ 14 levels "Agapostemon",..: 1 4 5 6 7 8 10 11 13 14 ...
##  $ TotalBees: int  5 1 3 10 1 1 4 1 1 1 ...
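To see how the grouping works on something small enough to check by hand, here is a minimal sketch of the same ddply() pattern. The data frame toy.df and its site names (SiteA, SiteB) are made up for illustration and are not part of the real dataset; the base R table() call is my own cross-check, not a step from the analysis above.

```r
library(plyr)

# Hypothetical toy data: one row per collected bee
toy.df <- data.frame(
  Location = c("SiteA", "SiteA", "SiteA", "SiteB", "SiteB"),
  Genus    = c("Bombus", "Bombus", "Apis", "Bombus", "Halictus"),
  stringsAsFactors = TRUE
)

# Same pattern as the summary above: one row per Location/Genus
# combination that occurs, with TotalBees counting the rows in each group
toy.site <- ddply(toy.df, .(Location, Genus), summarise,
                  TotalBees = length(Genus))
toy.site

# Cross-check with base R: a Location-by-Genus table of counts
table(toy.df$Location, toy.df$Genus)
```

Combinations that never occur (for example Apis at SiteB here) do not get a row in the ddply() result, whereas table() shows them as zeros. That is why the real summary has 35 rows rather than one row for every site and genus pairing.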

That looks great, but we should also export the summary as a .csv file that can be read by spreadsheet software like Excel. This is easy using the write.csv() function in R, as you can see below.

# Export a new spreadsheet file that summarizes the abundance of bees in each genus for each site
write.csv(Bees.site, "BeesPerSiteSummarySummer2014.csv")
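By default, write.csv() also writes the row numbers as an unnamed first column, which can look odd when the file is opened in a spreadsheet. As an optional variation (a sketch of my own, not the export used above), the row.names = FALSE argument leaves that column out. The toy.site table and the temporary file name below are made up for illustration.

```r
# Hypothetical summary table standing in for the real one
toy.site <- data.frame(
  Location  = c("SiteA", "SiteA", "SiteB"),
  Genus     = c("Apis", "Bombus", "Bombus"),
  TotalBees = c(1L, 2L, 1L)
)

# Temporary file with an illustrative name
out.file <- file.path(tempdir(), "toy_summary.csv")

# row.names = FALSE leaves out the extra column of row numbers
write.csv(toy.site, out.file, row.names = FALSE)

# Read the file back to confirm it round-trips
check <- read.csv(out.file)
names(check)
nrow(check)
```

Reading the file back with read.csv() is a cheap sanity check that the exported columns and row count match what was written.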

In the next post, I will show you how to break up the dataset so we can examine only the bees that were collected in bee bowls. We will use the plyr package and the subset() function to create the summarized datasets.


Author: Sean Kent

I am a naturalist, nature photographer, field ecologist, and educator. I have an eternal fondness for all creatures great and small, especially native bees and flowering plants.
