Atop Darien

Bee Curiosity


Leave a comment

Native Bee Resources

Want to learn more about our amazing diversity of native bee species. Here is a nice video providing a brief overview of native bees.

Native Bee Pocket Identification Guide: This is an identification guide created by Xerces for the state of Pennsylvannia. It will be quite useful for most bees that you will encounter in Massachusetts. Link: http://www.xerces.org/download/pdf/PA_Pocket_Guide.pdf

Bee Basics – An introduction to out native bees: This is an absolutely fantastic and free document that gives an overview of bee biology, conservation, and It also has beautiful artwork and is easy to understand. Link: http://www.fs.usda.gov/Internet/FSE_DOCUMENTS/stelprdb5306468.pdf

Louie Schwartzberg’s incredible film on pollination.

 

Xerces Foundation: This is a nice page about ways to conserve pollinators.
Link: http://www.xerces.org/pollinator-conservation

Xerces Foundation: This is a nice overview of native bee biology.

The Buzz on Native Bees. This is a nice blog post about native bees from 2013. It is from the USGS and their staff scientist Dr. Hannah Hamilton.

Native bee identification adaptive key from Discoverlife: This is an excellent resource to key out native bees to genus or species. With a little bit of knowledge of bee morphology, you can identify most bees in the Northeast US to genus. Link: http://www.discoverlife.org/mp/20q?search=Apoidea#Identification

USGS Bee Inventory and Monitoring Lab: This site has amazing photographs of native bees and other insects. Sam Droege is also an amazing biologist, scientist, and person. Link: https://www.flickr.com/photos/usgsbiml/

Here is an example of a few photos that you can find on the website. These photos are all part of the public domain and can be reused for any purpose. See Sam’s flickr page for more details.

Handy Dandy Bee Manual: Fantastic resource and guide useful when doing conservation, research, or monitoring project on native bees. The Handy Dandy Bee manual has the protocols and background material needed to do most types of bee monitoring, including how to process and store specimens. Link: https://www.pwrc.usgs.gov/nativebees/Handy%20Bee%20Manual/The%20Very%20Handy%20Manual%20-%202015.pdf

Bug Guide (Native Bee Photos): This is a terrific website that has a strong community base of naturalists that post and identify photographs of insects taken in the United States.
Link: http://bugguide.net/node/view/475348

Books:

Wilson, J.S. & Carril O.M. The Bees in your backyard: A Guide to North American’s Bee. Princeton University Press. 2016.

Xerces Foundation. Pollinator Conservation Handbook

Michener, C. Bees of the World. Johns Hopkins University Press. 2000


Leave a comment

Essentials of Research Planning

Essentials of Research Planning

Creating mock datasets in R and exporting them as a csv file for use in Excel

Date: October, 2014
Author: Sean Kent

Why would you want to create a mock dataset in the research planning stage?

Properly planning for an experiment is hard, it takes time and forces you to really think about the question you are asking, the hypothesis you are testing, and the data you will be collecting as evidence that will end up supporting or refuting the hypothesis. However, many times the thought and time necessary for properly planning how you will collect and analyze the data is overlooked, causing problems when you start to try to analyze the data. Check out this great page for more background on using R in data planning techniques.

Let’s start by running through the planning process for data collection.

Remember this research is asking if native bee diversity, richness, and abundance decline with distance from native plantings?

Let’s create vectors for the location, replicate, method of collection (bee bowl)

Distance

Since we are measuring whether bee diversity declines with distance from native plantings, let create a distance vector. It is good to use simple and descriptive names, so let’s call the vector distance and populate it with the different distances used in the experiments. Bee bowls placed at the edge of the meadow, inside the meadow, and at 20 meter intervals away from the meadow, up to 120 meters away. Here is a picture of the study site.

DistanceExperiment

To combined different values (numbers, strings, etc) we use the c() function, think of it as combining or concatenating.

Distance <- c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)", "20m", "40m", "60m", "80m", "100m","120m")
Distance
##  [1] "EdgeAdmin(0m)"    "MeadowAdmin(20m)" "FarMeadow(20m)"  
##  [4] "FarEdge(0m)"      "20m"              "40m"             
##  [7] "60m"              "80m"              "100m"            
## [10] "120m"

Wait, that’s not exactly what we wanted, there are only 10 values in the distance vector, but to properly test the validity of our hypothesis we need to make sure we have adequate replication, not just a single sample for each distance. However, we are not just creating a single category for each distance, but we are placing 15 bee bowls down at each location, meaning we are creating 15 entries for each distance. Typing in each distance 15 times (150 times total) is labor intensive, tedious, and prone to errors, so lets use the rep() function to speed up the process. I’ve also created two ways to make the vector as well, using the c() and rep() for Distance and Distance 1 respectively.

“r
Distance <- rep(c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)", "20m", "40m", "60m", "80m", "100m","120m"), c(15,15,15,15,15,15,15,15,15,15))

Distance1 <- rep(c("EdgeAdmin(0m)", "MeadowAdmin(20m)", "FarMeadow(20m)", "FarEdge(0m)", "20m", "40m", "60m", "80m", "100m","120m"), rep(15))
“`

Bee Bowls

Good, now we have 150 values in the distance vector. Next, we need to create a vector for our method of collection, bee bowls, specifically the color of the bee bowl.

BowlColor <- rep(c("Blue", "Yellow", "White"), 50) #Create the bowl color variable, 150 records
BowlColor[1:15] #Looks good, notice how it repeats in groups of three.
##  [1] "Blue"   "Yellow" "White"  "Blue"   "Yellow" "White"  "Blue"  
##  [8] "Yellow" "White"  "Blue"   "Yellow" "White"  "Blue"   "Yellow"
## [15] "White"
Replicate

Looks good, now lets create the vector for replicate. For each transect, we have 5 total replicates, but we have replicated the color at each location. What I mean is that there is a replicate 1 for the blue, yellow, and white bee bowl. Therefore, we need to have a 3 values for replicate 1, instead of just 1 value. Plus, we need to have replicate 1 for each distance treatment (20m, 40m and so on), so the replicates need to be repeated 10 times. I’ve create the replicate vector in two ways to show you two different ways to do it.

Replicate <- rep(c("1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "4", "5", "5", "5"), 10)
Replicate1 <- rep(c(rep(1,3), rep(2,3), rep(3,3), rep(4,3), rep(5,3)),10)
Bee Abundance

Let’s create an empty vector for bee abundance

BeeAbundance <- vector(mode = 'numeric', length = length(Distance))

We need to take each separate vector and combine them into a dataframe. To do that, we will use the data.frame() function. Great, now lets create a comma separated file (can be used as a spreadsheet in excel). To export the file, we use the write.csv() function.

DistanceExperiment <- data.frame(Distance, BowlColor, Replicate,BeeAbundance)
write.csv(DistanceExperiment, "BeeAbundanceDistanceExperiment.csv")
Tada a nice clean spreadsheet

DistanceExperimentSpreadsheet


Leave a comment

Summarizing the Native Bee Dataset using the PLYR Package in R: Part 1

Using PLYR package in R to summarive the native bee dataset

Summarizing the Native Bee Dataset from the Massasoit Summer Research Undergraduate Program

In several posts, I will cover how to summarize the native bee data so we can create meaningful graphs and later conduct appropriate statistical analyzes. In the last post, I went over how to examine the dataset for errors and we will be using the corrected dataset here.

As the dataset is currently constituted, we cannot easily summarize the data, create graphs, or analyze the data. To create dataset summaries, we will use the plyr package in R to create summary datasets. Before we begin, it’s important to always keep in mind the major overarching question of this research:

What is the influence of land use on ecosystem health (in this case native bees)?

Based on this question, it would be helpful to have a dataset that summarizes the abundance of each native bee genera at each site. Let’s use the plyr package and the ddply() function to accomplish this task. First, we should develop a plan on how to organize the summary data.

How is the native bee dataset organized?

For the native bee data, each native bee that was collected was given a unique identifying number (SpeciesID).

You can see the unique identifying number in the photo below

PinnedBeeSpecimen

Here are the following pieces of data that were also recorded in the dataset.

  1. Collector: who collected the data
  2. Locality data:
    • State
    • County
    • City
    • Specific Location
    • Latitude & Longitude
  3. Date: Date the specimen was collected
  4. Collection Method
    • The collection method is either aerial netting of flying bees or collecting bees in bee bowls
  5. Bowl_Color
  6. Genus
  7. Species
Here is the actual R output of the column names
##  [1] "SpeciesID"        "Replicate"        "Collector"       
##  [4] "State"            "County"           "City"            
##  [7] "Location"         "Latitude"         "Longitude"       
## [10] "Date"             "CollectionMethod" "Bowl_Color"      
## [13] "Genus"            "Species"          "Notes"
Here is the first several rows of the dataset
##   SpeciesID Replicate                        Collector         State
## 1   2014037         5 Schoener, D. & Massasoit Interns Massachusetts
## 2   2014043         5 Schoener, D. & Massasoit Interns Massachusetts
## 3   2014045         2 Schoener, D. & Massasoit Interns Massachusetts
## 4   2014046         2 Schoener, D. & Massasoit Interns Massachusetts
## 5   2014047         5 Schoener, D. & Massasoit Interns Massachusetts
## 6   2014033         6 Schoener, D. & Massasoit Interns Massachusetts
##     County     City    Location Latitude Longitude    Date
## 1 Plymouth Brockton BeaverBrook    42.08     70.99  7/8/14
## 2 Plymouth Brockton BeaverBrook    42.08     70.99  7/8/14
## 3 Plymouth Brockton BeaverBrook    42.08     70.99  7/8/14
## 4 Plymouth Brockton BeaverBrook    42.08     70.99  7/8/14
## 5 Plymouth Brockton BeaverBrook    42.08     70.99  7/8/14
## 6 Plymouth Brockton BeaverBrook    42.08     70.99 7/23/14
##   CollectionMethod Bowl_Color          Genus   Species
## 1          PanTrap      White    Agapostemon Virescens
## 2          PanTrap       Blue    Agapostemon Virescens
## 3          PanTrap      White    Agapostemon Virescens
## 4          PanTrap       Blue    Agapostemon          
## 5          PanTrap       Blue    Agapostemon Virescens
## 6          PanTrap     Yellow Augochlorella     aurata
##                  Notes
## 1                     
## 2                     
## 3                     
## 4                     
## 5                     
## 6 unsure need to check
Using the PLYR package and ddply() function to summarize the native bee data
  • First, let’s create a data set that contains the number of bees collected at each location, independent of collection method. For the initial analyses, we will only examine whether or not differences between the genera of native bees found at each study site, as opposed to looking at differences between species.
library(plyr)
Bees.site <- ddply(Bees.df, .(Location, Genus), summarise, # summarize total abundance of bees for each genera at each site
                  TotalBees = length(Genus)) 
  • As you can see, we used the ddply() function in the plyr package to summarize the data. If you examine the R code above, I’ve created a dataframe (Bees.site) that will have three columns: Location, Genus present at each location, and the total number of individuals in each genera found at each location. The length() function just counts the occurance of each genera for each site. Now, let’s take a look at the actual dataset we created.
Bees.site
##           Location          Genus TotalBees
## 1      BeaverBrook    Agapostemon         5
## 2      BeaverBrook           Apis         1
## 3      BeaverBrook Augochlorella          3
## 4      BeaverBrook         Bombus        10
## 5      BeaverBrook     Calliopsis         1
## 6      BeaverBrook       Halictus         1
## 7      BeaverBrook   Lasioglossum         4
## 8      BeaverBrook      Megachile         1
## 9      BeaverBrook          Osmia         1
## 10     BeaverBrook       Xylocopa         1
## 11 MassasoitMeadow    Agapostemon         3
## 12 MassasoitMeadow        Andrena         2
## 13 MassasoitMeadow     Anthophora         2
## 14 MassasoitMeadow           Apis         7
## 15 MassasoitMeadow Augochlorella          2
## 16 MassasoitMeadow         Bombus        34
## 17 MassasoitMeadow     Calliopsis         4
## 18 MassasoitMeadow       Halictus        16
## 19 MassasoitMeadow        Hylaeus         2
## 20 MassasoitMeadow   Lasioglossum        18
## 21 MassasoitMeadow      Megachile         5
## 22 MassasoitMeadow    Melissacler         1
## 23 MassasoitMeadow       Xylocopa         1
## 24 PoorMeadowBrook    Agapostemon         1
## 25 PoorMeadowBrook Augochlorella          3
## 26    SheepPasture Augochlorella          2
## 27    SheepPasture       Halictus         1
## 28    SheepPasture   Lasioglossum        13
## 29    SheepPasture          Osmia         1
## 30    WestgateMall           Apis         9
## 31    WestgateMall         Bombus         2
## 32    WestgateMall       Halictus         2
## 33    WestgateMall   Lasioglossum         7
## 34    WestgateMall      Megachile         1
## 35    WindsorTrail        Andrena         1
str(Bees.site)
## 'data.frame':    35 obs. of  3 variables:
##  $ Location : Factor w/ 6 levels "BeaverBrook",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Genus    : Factor w/ 14 levels "Agapostemon",..: 1 4 5 6 7 8 10 11 13 14 ...
##  $ TotalBees: int  5 1 3 10 1 1 4 1 1 1 ...

That looks great, but we should export the file as a .csv that can be read by spreadsheet software like excel. This is really easy using the write.csv() function in R, as you can see below.

write.csv(Bees.site, "BeesPerSiteSummarySummer2014.csv") #export a new spreadsheet file that summarizes the abundance of bees in each genera for each site

In the next post, I will show you how to break up the dataset so we can just examine the bees that were collected by bee bowls. We will use the plyr package and the subset() function to create the summarized datasets.


Leave a comment

Using R for data validation: Checking the native bee dataset from the Massasoit’s summer undergraduate research project

Using R for data validation: Checking the native bee dataset

This is post walking you through the very important, but often overlooked, process of data validation. I will show you how to use R to check for errors in the dataset on native bees from the summer research project.

Examining the structure of the bee data

We use the str() function to examine the structure of the dataset. This will tell us what type of variables we have, the type of data we have in each variable, and is a great first way to check for any errors in the dataset.

str(Bees.df)
## 'data.frame':    170 obs. of  14 variables:
##  $ SpeciesID                              : int  2014096 2014100 2014108 2014109 2014115 2014116 2014117 2014118 2014119 2014120 ...
##  $ Collector                              : Factor w/ 14 levels " Haskell, D. ",..: 7 4 4 4 6 14 14 14 13 2 ...
##  $ State                                  : Factor w/ 1 level "Massachusetts": 1 1 1 1 1 1 1 1 1 1 ...
##  $ County                                 : Factor w/ 2 levels "Bristol","Plymouth": 2 2 2 2 2 2 2 2 2 2 ...
##  $ City                                   : Factor w/ 3 levels "Brockton","E. Bridgewater",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Location                               : Factor w/ 7 levels "BeaverBrook",..: 2 2 2 2 1 2 2 2 5 5 ...
##  $ Latitude                               : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ Longitude                              : num  70.6 70.6 70.6 70.6 70.6 ...
##  $ Date                                   : Factor w/ 5 levels "7/17/14","7/18/14",..: 1 1 1 1 4 1 2 3 4 4 ...
##  $ CollectionMethod..PanTrap.or.AerialNet.: Factor w/ 2 levels "AerialNet","PanTrap": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Bowl_Color                             : Factor w/ 6 levels "Blue","None",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Genus                                  : Factor w/ 14 levels "Agapostemon",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ Species                                : Factor w/ 10 levels "","Andreniformis",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ Notes                                  : Factor w/ 7 levels "","#1","#2","Female",..: 1 1 1 1 1 1 1 1 1 1 ...

Let’s examine the SpeciesID column to make sure that each bee has a unique identifier and that there has been no duplicate or skipped numbers. There should be 170 unique identifying numbers, meaning we should see a SpeciesID range from 2014001 to 2014170, so let’s check the range first to see if we get 2014001 to 2014170.
The reason I know there are 170 observations is because at the top of the structure output you see ## ‘data.frame’: 170 obs. of 14 variables:

range(Bees.df$SpeciesID)
## [1] 2014001 2014166

Oh no, that’s a problem we have a range that only goes to 2014166, when it should go to 2014170. This means one of two things, either the data was entered incorrectly for a few specimens or that some species have the same number.

Before we try to figure out how to fix this problem, I want you to think about how this would be extremely difficult to catch if you had to do it either in Excel or actually by looking at each individual bee in the collection.

Ok, so R recognizes SpeciesID as an integer, like the phosphate data for example, but SpeciesID is really a unique character for each bee, not an integar, so lets change it to a factor first.

SpeciesID <- as.factor(Bees.df$SpeciesID) #use the as.factor to change an integar variable to a factor
summary(SpeciesID)
## 2014064 2014065 2014072 2014080 2014085 2014086 2014122 2014123 2014001 
##       2       2       2       2       2       2       2       2       1 
## 2014002 2014003 2014004 2014005 2014006 2014007 2014008 2014009 2014010 
##       1       1       1       1       1       1       1       1       1 
## 2014011 2014012 2014013 2014014 2014015 2014016 2014017 2014018 2014019 
##       1       1       1       1       1       1       1       1       1 
## 2014020 2014021 2014022 2014023 2014024 2014025 2014026 2014027 2014028 
##       1       1       1       1       1       1       1       1       1 
## 2014029 2014030 2014031 2014032 2014033 2014034 2014035 2014036 2014037 
##       1       1       1       1       1       1       1       1       1 
## 2014038 2014039 2014040 2014041 2014042 2014043 2014044 2014045 2014046 
##       1       1       1       1       1       1       1       1       1 
## 2014047 2014048 2014049 2014050 2014052 2014053 2014054 2014055 2014056 
##       1       1       1       1       1       1       1       1       1 
## 2014057 2014058 2014059 2014060 2014061 2014062 2014063 2014066 2014067 
##       1       1       1       1       1       1       1       1       1 
## 2014068 2014069 2014070 2014071 2014073 2014074 2014075 2014076 2014077 
##       1       1       1       1       1       1       1       1       1 
## 2014078 2014079 2014081 2014082 2014083 2014084 2014087 2014088 2014089 
##       1       1       1       1       1       1       1       1       1 
## 2014090 2014091 2014092 2014093 2014094 2014095 2014096 2014097 2014098 
##       1       1       1       1       1       1       1       1       1 
## (Other) 
##      63

The output for the summary file is quite long, let’s just look at the first part of it.

x <- summary(SpeciesID) #I created a variable to easier explore the dupicates in the dataset
x[1:8] #Look at the summary(SpeciesID) above, you will notice that the first 8 SpeciesID values have a 2 associated with it, meaning there are two entries for each unique SpeciesID
## 2014064 2014065 2014072 2014080 2014085 2014086 2014122 2014123 
##       2       2       2       2       2       2       2       2

In the dataset, there eight species identification numbers that are assigned to two bees. SpeciesID
2014064, 2014065, 2014072, 2014080, 2014085, 2014086, 2014122, 2014123. Here is the uncorrected dataset. After determining the problems with the SpeciesID numbers, Adam and I went back to the dataset and found out the bees that had duplicated entries.

Screenshot of the Excel Dataset

I’ve highlighted four of the eight errors in the dataset and Adam and I corrected each issue that has been identified. 4 bees had the same number, 2014164 and 2014165 were entered as 2014064 and 2014065, so that was fixed as well. Lastly, one entry was entered twice. Thus, instead of having 170 collected bees, we have 169 collected bees.

Now that we’ve corrected the SpeciesID errors, let’s check to make sure there are not anymore problems with that column.

Let’s look at the structure of the corrected dataset and also the range of SpeciesID, which should be from 2014001 to 2014169.

str(Corrected.df)
## 'data.frame':    168 obs. of  15 variables:
##  $ SpeciesID       : int  2014037 2014043 2014045 2014046 2014047 2014059 2014048 2014022 2014028 2014033 ...
##  $ Replicate       : int  5 5 2 2 5 5 5 6 5 6 ...
##  $ Collector       : Factor w/ 13 levels " Haskell, D. ",..: 11 11 11 11 11 11 11 11 11 11 ...
##  $ State           : Factor w/ 1 level "Massachusetts": 1 1 1 1 1 1 1 1 1 1 ...
##  $ County          : Factor w/ 2 levels "Bristol","Plymouth": 2 2 2 2 2 2 2 2 2 2 ...
##  $ City            : Factor w/ 3 levels "Brockton","E. Bridgewater",..: 1 1 1 1 1 2 1 1 1 1 ...
##  $ Location        : Factor w/ 7 levels "BeaverBrook",..: 1 1 1 1 1 3 7 2 2 1 ...
##  $ Latitude        : num  42.1 42.1 42.1 42.1 42.1 ...
##  $ Longitude       : num  71 71 71 71 71 ...
##  $ Date            : Factor w/ 5 levels "7/17/14","7/18/14",..: 5 5 5 5 5 5 5 4 4 4 ...
##  $ CollectionMethod: Factor w/ 2 levels "AerialNet","PanTrap": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Bowl_Color      : Factor w/ 5 levels "Blue","None",..: 4 1 4 1 1 1 4 1 1 5 ...
##  $ Genus           : Factor w/ 14 levels "Agapostemon",..: 1 1 1 1 1 1 2 5 5 5 ...
##  $ Species         : Factor w/ 10 levels "","Andreniformis",..: 8 8 8 1 8 8 1 3 3 3 ...
##  $ Notes           : Factor w/ 7 levels "","#1","#2","Female",..: 1 1 1 1 1 1 3 7 7 7 ...
range(Corrected.df$SpeciesID)
## [1] 2014001 2014169
check1 <- as.factor(Corrected.df$SpeciesID)
x <- summary(check1)
x[1:10]
## 2014001 2014002 2014003 2014004 2014005 2014006 2014007 2014008 2014009 
##       1       1       1       1       1       1       1       1       1 
## 2014010 
##       1

Looks like all the problems with the speciesID column have been corrected. Now we need to check the other columns to see if there are any problems there as well.

str(Bees.df)
## 'data.frame':    170 obs. of  14 variables:
##  $ SpeciesID                              : int  2014096 2014100 2014108 2014109 2014115 2014116 2014117 2014118 2014119 2014120 ...
##  $ Collector                              : Factor w/ 14 levels " Haskell, D. ",..: 7 4 4 4 6 14 14 14 13 2 ...
##  $ State                                  : Factor w/ 1 level "Massachusetts": 1 1 1 1 1 1 1 1 1 1 ...
##  $ County                                 : Factor w/ 2 levels "Bristol","Plymouth": 2 2 2 2 2 2 2 2 2 2 ...
##  $ City                                   : Factor w/ 3 levels "Brockton","E. Bridgewater",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Location                               : Factor w/ 7 levels "BeaverBrook",..: 2 2 2 2 1 2 2 2 5 5 ...
##  $ Latitude                               : num  42.4 42.4 42.4 42.4 42.4 ...
##  $ Longitude                              : num  70.6 70.6 70.6 70.6 70.6 ...
##  $ Date                                   : Factor w/ 5 levels "7/17/14","7/18/14",..: 1 1 1 1 4 1 2 3 4 4 ...
##  $ CollectionMethod..PanTrap.or.AerialNet.: Factor w/ 2 levels "AerialNet","PanTrap": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Bowl_Color                             : Factor w/ 6 levels "Blue","None",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Genus                                  : Factor w/ 14 levels "Agapostemon",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ Species                                : Factor w/ 10 levels "","Andreniformis",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ Notes                                  : Factor w/ 7 levels "","#1","#2","Female",..: 1 1 1 1 1 1 1 1 1 1 ...

Let’s look at the other columns:

  1. State: Factor w/ 1 level “Massachusetts”
    – One state, that’s correct
  2. County: Factor w/ 2 levels “Bristol”,“Plymouth”
    – Two counties, that’s correct
  3. City: Factor w/ 3 levels “Brockton”,“E. Bridgewater”
    – Three cities, that’s correct
  4. Location: Factor w/ 7 levels “BeaverBrook”
    – Seven field sites, that’s correct
  5. CollectionMethod: Factor w/ 2 levels “AerialNet”,“PanTrap”:
    – Two collection methods, that’s correct
  6. Bowl_Color: Factor w/ 6 levels “Blue”,“None”
    – Six bowl colors, that’s wrong. There should be 5 bowl colors: none, Yellow, Blue, Orange, and White.
levels(Bees.df$Bowl_Color)
## [1] "Blue"   "None"   "Orange" "White"  "yellow" "Yellow"
Bees.df$Bowl_Color[85]
## [1] yellow
## Levels: Blue None Orange White yellow Yellow

Notice, that by typing in a capital and lower case for Yellow, R recognizes that as two factors instead of what we were actually trying to show with a single factor. In row 85, the bowl color was typed as “yellow” instead of “Yellow”. Bees.df$Bowl_Color[85] That’s easy to fix.

Bees.df$Bowl_Color[Bees.df$Bowl_Color == "yellow"] <- "Yellow"
Bees.df$Bowl_Color[85]
## [1] Yellow
## Levels: Blue None Orange White yellow Yellow

Let’s check the corrected dataset one last time to make sure that the data all appears to be valid.

## 'data.frame':    168 obs. of  15 variables:
##  $ SpeciesID       : Factor w/ 168 levels "2014001","2014002",..: 37 43 45 46 47 59 48 22 28 33 ...
##  $ Replicate       : Factor w/ 6 levels "1","2","3","4",..: 5 5 2 2 5 5 5 6 5 6 ...
##  $ Collector       : Factor w/ 13 levels " Haskell, D. ",..: 11 11 11 11 11 11 11 11 11 11 ...
##  $ State           : Factor w/ 1 level "Massachusetts": 1 1 1 1 1 1 1 1 1 1 ...
##  $ County          : Factor w/ 2 levels "Bristol","Plymouth": 2 2 2 2 2 2 2 2 2 2 ...
##  $ City            : Factor w/ 3 levels "Brockton","E. Bridgewater",..: 1 1 1 1 1 2 1 1 1 1 ...
##  $ Location        : Factor w/ 7 levels "BeaverBrook",..: 1 1 1 1 1 3 7 2 2 1 ...
##  $ Latitude        : num  42.1 42.1 42.1 42.1 42.1 ...
##  $ Longitude       : num  71 71 71 71 71 ...
##  $ Date            : Factor w/ 5 levels "7/17/14","7/18/14",..: 5 5 5 5 5 5 5 4 4 4 ...
##  $ CollectionMethod: Factor w/ 2 levels "AerialNet","PanTrap": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Bowl_Color      : Factor w/ 5 levels "Blue","None",..: 4 1 4 1 1 1 4 1 1 5 ...
##  $ Genus           : Factor w/ 14 levels "Agapostemon",..: 1 1 1 1 1 1 2 5 5 5 ...
##  $ Species         : Factor w/ 10 levels "","Andreniformis",..: 8 8 8 1 8 8 1 3 3 3 ...
##  $ Notes           : Factor w/ 7 levels "","#1","#2","Female",..: 1 1 1 1 1 1 3 7 7 7 ...