rabm
rabm makes extracting data from the Africa Bird Map easier and safer. It is built on an API developed and maintained by Michael Brooks.
The package has three primary extraction functions:
extract_observers helps to extract records using observers as the filtering unit.
extract_species helps to extract records using species as the filtering unit.
extract_all is meant to extract data without filtering for specific observers and species.
For all three functions, a time frame and geographical region must be specified.
In the following sections, we go through all three functions, describing their inputs and behavior and providing one case study each. If you find errors, please contact David Clarance or make a pull request on GitHub.
To submit to the Africa Bird Map, an ‘observer’ must register an account. On doing so, an observer number (also called a citizen science number) is allocated. This number is the basis for querying the database for records pertaining to an observer.
We begin by taking a quick dive into the function:
extract_observers( <----- This is the function name. It doesn't change.
observer_numbers, <----- Input the observer number or numbers. You can find these on the website.
start_date, <----- What date do you want to draw records starting from? The format is YYYY-MM-DD.
end_date, <----- What date do you want to draw records ending on? The format is YYYY-MM-DD.
region_type, <----- Here you specify what kind of regions you're working with (eg: country, pentad).
region_id, <----- Here you specify the value the region takes (eg: kenya, 0310_4000)
return_type, <----- For more advanced use. You don't need to specify this.
return_format <----- For more advanced use. You don't need to specify this.
)
Generally, users need to specify the observer number or numbers, the date range they want to pull records for, and the regions for which they’d like to pull records.
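Because both dates must be strings in the YYYY-MM-DD format, it can help to check them before sending a query. The following is a minimal sketch in base R; is_valid_date is a hypothetical helper written for this illustration and is not part of rabm.

```r
# Hypothetical helper (not part of rabm): TRUE if x is a valid YYYY-MM-DD string.
is_valid_date <- function(x) {
  # Check the overall shape first: four digits, two digits, two digits.
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) {
    return(FALSE)
  }
  # as.Date() returns NA for impossible dates such as 30 February.
  !is.na(as.Date(x, format = "%Y-%m-%d"))
}

is_valid_date("2019-01-26")  # well-formed date
is_valid_date("26/01/2019")  # wrong format
is_valid_date("2019-02-30")  # right shape, impossible date
```

Checking the shape and the calendar validity separately catches both mistyped separators and dates that do not exist.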
The definitions of the arguments are given below in detail:
observer_numbers
A character or a vector of characters that represent the ADU number or numbers that you want to query the database for. Example: observer_numbers = ‘10723’ or observer_numbers = c(‘10723’, ‘40147’).
start_date
A character representing the start date from which you want to query the database. It must be in the YYYY-MM-DD format. Defaults to 1 Jan 2019. Example: ‘2019-01-26’.
end_date
A character representing the end date to which you want to query the database. It must be in the YYYY-MM-DD format. Defaults to 1 Feb 2019. Example: ‘2019-02-26’.
region_type
Can take one of the following types: country, pentad, group, qdgc. Default is ‘country’.
region_id
Has to be co-defined with region_type. That is, it is an instantiation of the region_type. For instance, if region_type = ‘country’, then region_id = ‘kenya’. Or if region_type = ‘pentad’, then region_id = ‘3355_1825’. Default is ‘kenya’.
return_type
There are two options, ‘data’ and ‘count’. ‘data’ returns the actual records submitted to the Africa Bird Map. ‘count’ returns a count of the rows. Default is ‘data’.
return_format
Default is ‘CSV’, though the API default is ‘JSON’.
Depending on the return_type, the function will return a dataframe with the following:
return_type = data
A tidy dataframe is returned with the records that fall within the specification defined.
return_type = count
A tidy dataframe is returned with the number of rows that would be produced if the return_type were specified as ‘data’.
How would this work in practice?
Doris Schaule is one of Kenya’s most prolific mappers. Let’s explore some of her contributions to the Atlas in Kenya. Doris has also contributed records outside Kenya (even in Germany!), but for this exercise we focus only on the records generated in Kenya.
The first step is to obtain Doris’ observer number. I did this by asking her. In theory, it is possible to build a function to get this number, however for privacy reasons I have refrained from doing so. Her number is 12939.
We begin by loading the required packages.
# first load the package
library(rabm) # to pull data from the Africa Bird Atlas
library(dplyr) # to analyze some of the data pulled
library(magrittr) # to pipe commands easily
library(ggplot2) # to plot some of the data
library(lubridate) # for working with dates
library(tidyr) # more data manipulation functions
Once that’s done, we’ll draw the raw data (each row is a record) for Doris. We’ll use this data to do some exploratory analysis.
# now get all records from 2007 to present date for Doris
doris_raw_records <- extract_observers(
observer_numbers = '12939',
start_date = '2007-01-01',
end_date = '2019-07-01',
region_type = 'country',
region_id = 'kenya'
)
#> Pulling data for 12939
Now that we have the data, let’s take a look at how Doris has contributed to the bird atlas.
How many records has Doris contributed?
nrow(doris_raw_records)
#> [1] 20228
As we can see above, Doris has contributed 20228 records. In reality, the number may be higher, since some records might not have been vetted at the time of publishing this document.
How many full protocol cards has Doris contributed?
n_distinct(doris_raw_records$CardNo)
#> [1] 446
How many pentads has Doris submitted full protocol (FP) cards for?
n_distinct(doris_raw_records$Pentad)
#> [1] 68
Again, it is possible that Doris has contributed to more pentads. However, cards only enter the database once they have been vetted.
How many distinct species has Doris recorded in Kenya?
n_distinct(doris_raw_records$Spp)
#> [1] 531
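To make clear what each of these counts measures, here is a small sketch on made-up records that use the same column names (CardNo, Pentad, Spp). The values are invented for illustration, and the counts are computed in base R so the block runs without extra packages; length(unique(x)) gives the same result as n_distinct(x) here.

```r
# Made-up records: one row per species seen on a card (values are illustrative only).
toy_records <- data.frame(
  CardNo = c("c1", "c1", "c2", "c2", "c3"),
  Pentad = c("0115_3645", "0115_3645", "0115_3645", "0115_3645", "0355_3940"),
  Spp    = c(61, 682, 61, 311, 61),
  stringsAsFactors = FALSE
)

n_records <- nrow(toy_records)                   # every row is one record
n_cards   <- length(unique(toy_records$CardNo))  # distinct cards
n_pentads <- length(unique(toy_records$Pentad))  # distinct pentads
n_species <- length(unique(toy_records$Spp))     # distinct species

c(records = n_records, cards = n_cards, pentads = n_pentads, species = n_species)
```

A single card contributes several records (one per species), which is why the record count is much larger than the card count.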
How have Doris’ contributions evolved over time?
doris_fp_per_year <- doris_raw_records %>%
# first find the number of cards recorded per year
mutate(YearOfRecord = year(StartDate)) %>%
group_by(YearOfRecord) %>%
summarize(
CardsRecorded = n_distinct(CardNo)
) %>%
ungroup()
# put the information into a nice graph
doris_fp_per_year %>%
ggplot(aes(x = YearOfRecord, y = CardsRecorded)) +
theme_light() +
geom_col() +
labs(title = "FP cards submitted by Doris by year",
caption = "Note: This graph does not contain FP cards that have not been vetted")
If this isn’t impressive enough, let’s see how Doris’ contribution compares to all the other contributors in Kenya.
Let’s first get all records for Kenya. Don’t worry if you don’t recognize this function. We’ll get back to it later in the article.
all_kenya_records <- extract_all(
start_date = "2007-01-01",
end_date = "2019-07-01",
region_type = "country",
region_id = "kenya"
)
Now let’s get the number of FP cards submitted each year for all of Kenya, across all observers.
all_fp_per_year <- all_kenya_records %>%
mutate(YearOfRecord = year(StartDate)) %>%
group_by(YearOfRecord) %>%
summarize(
CardsRecorded = n_distinct(CardNo)
) %>%
ungroup()
How do Doris’ records figure in Kenya’s contributions?
doris_fp_per_year %>%
rename("DorisFPCards" = "CardsRecorded") %>%
left_join(all_fp_per_year, by = "YearOfRecord") %>%
rename("AllFPCards" = "CardsRecorded") %>%
mutate(OtherFPCards = AllFPCards - DorisFPCards) %>%
mutate(PercentageContributed = round((DorisFPCards / AllFPCards)*100, 2)) %>%
ggplot(aes(x = YearOfRecord, y = PercentageContributed)) +
geom_col() +
ggtitle("Percentage of Doris' contributions to Kenyan records",
subtitle = "There are 310 recorded observers in Kenya")
Doris has single-handedly contributed ~10% of all Kenyan records.
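The percentage itself is a join followed by a division. As a small worked sketch with invented yearly counts (not real Atlas figures), the same calculation in base R looks like this, with merge(..., all.x = TRUE) playing the role of left_join():

```r
# Invented yearly card counts, for illustration only.
one_observer <- data.frame(YearOfRecord = c(2017, 2018), DorisFPCards = c(10, 30))
all_observers <- data.frame(YearOfRecord = c(2016, 2017, 2018), AllFPCards = c(50, 100, 120))

# Keep every year the observer was active and attach the national totals.
shares <- merge(one_observer, all_observers, by = "YearOfRecord", all.x = TRUE)

# Percentage of all cards in each year that came from this observer.
shares$PercentageContributed <- round(shares$DorisFPCards / shares$AllFPCards * 100, 2)
shares
```

Years in which the observer submitted nothing (2016 in this toy data) drop out of the result, because the join keeps only the rows of the left-hand table.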
Each species recorded in the Africa Bird Atlas is assigned a species ID, referred to in the data as Spp. To pull data for a species, you’ll need to know its species ID. There are two ways to figure out the Spp for a particular species.
To find IDs for multiple species, the first step is to pull the species list for a particular country. This can be done using the get_species_list() function. This function takes only one argument: the name of the country.
# get species list for nigeria
nigeria_species_list <- get_species_list("nigeria")
nigeria_species_list
#> # A tibble: 923 x 5
#> Spp Common_group Common_species Genus Species
#> <int> <chr> <chr> <chr> <chr>
#> 1 1109 Alethe Brown-chested Alethe poliocephala
#> 2 2540 Alethe Fire-crested Pseudalethe castanea
#> 3 2660 Ant-thrush White-tailed Neocossyphus poensis
#> 4 2650 Anteating-chat Northern Myrmecocichla aethiops
#> 5 1144 Antpecker Woodhouse's (Red-headed) Parmoptila woodhousei
#> 6 622 Apalis Bar-throated Apalis thoracica
#> 7 3078 Apalis Black-capped Apalis nigriceps
#> 8 3084 Apalis Black-collared Apalis pulchra
#> 9 3074 Apalis Black-throated Apalis jacksoni
#> 10 1099 Apalis Buff-throated Apalis rufogularis
#> # ... with 913 more rows
Now let’s say we wanted to find the species IDs for a few common birds (sorry, I don’t know my Nigerian birds well enough!). We’re interested in finding the cattle egret, laughing dove, and speckled pigeon.
To do this we’ll use the find_species() function. This function takes the species list and species names. You can use both common and scientific names.
interesting_species_names <- c("ibis bubulcus", "laughing dove", "speckled pigeon")
interesting_species_ids <- find_species(species_list = nigeria_species_list,
species_names = interesting_species_names)
interesting_species_ids
#> # A tibble: 3 x 6
#> Spp Common_group Common_species Genus Species SearchColumn
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 61 Egret Cattle Bubulc~ ibis egret cattle bubulcus~
#> 2 317 Dove Laughing Strept~ senegal~ dove laughing strepto~
#> 3 311 Pigeon Speckled Columba guinea pigeon speckled colum~
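The SearchColumn in the output above looks like a lower-cased concatenation of the name columns, which would explain why a query such as "ibis bubulcus" still finds the cattle egret. As a rough illustration of how a lookup of this kind could work, here is a sketch in base R. It is not rabm's actual implementation: lookup_spp is a hypothetical helper, and the toy list only reuses rows whose full names appear in this article.

```r
# Toy species list built from rows shown in this article.
toy_species_list <- data.frame(
  Spp            = c(61L, 311L, 1109L),
  Common_group   = c("Egret", "Pigeon", "Alethe"),
  Common_species = c("Cattle", "Speckled", "Brown-chested"),
  Genus          = c("Bubulcus", "Columba", "Alethe"),
  Species        = c("ibis", "guinea", "poliocephala"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the Spp of every row whose names contain all words of the query.
lookup_spp <- function(species_list, query) {
  # Build one lower-case search string per species.
  search_col <- tolower(paste(
    species_list$Common_group, species_list$Common_species,
    species_list$Genus, species_list$Species
  ))
  # Split the query into words so that word order does not matter.
  words <- strsplit(tolower(query), "[[:space:]]+")[[1]]
  hits <- vapply(
    search_col,
    function(s) all(vapply(words, function(w) grepl(w, s, fixed = TRUE), logical(1))),
    logical(1)
  )
  species_list$Spp[hits]
}

lookup_spp(toy_species_list, "ibis bubulcus")    # scientific name, reversed order
lookup_spp(toy_species_list, "speckled pigeon")  # common name
```

Matching on individual words rather than the whole phrase is a design choice made for this sketch; it lets common and scientific names be typed in either order, at the cost of occasionally matching more than one species.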
We begin by taking a quick dive into the function:
extract_species( <----- This is the function name. It doesn't change.
species_ids, <----- Input the Spp. You can find these using get_species_list() and find_species().
start_date, <----- What date do you want to draw records starting from? The format is YYYY-MM-DD.
end_date, <----- What date do you want to draw records ending on? The format is YYYY-MM-DD.
region_type, <----- Here you specify what kind of regions you're working with (eg: country, pentad).
region_id, <----- Here you specify the value the region takes (eg: kenya, 0310_4000)
return_type, <----- For more advanced use. You don't need to specify this.
return_format <----- For more advanced use. You don't need to specify this.
)
The definitions of the arguments are detailed below:
species_ids
A character or a vector of characters that represent the ID or IDs of the species. Example: species_ids = ‘61’ or species_ids = c(‘61’, ‘628’).
start_date
A character representing the start date from which you want to query the database. It must be in the YYYY-MM-DD format. Defaults to 1 Jan 2019. Example: ‘2019-01-26’.
end_date
A character representing the end date to which you want to query the database. It must be in the YYYY-MM-DD format. Defaults to 1 Feb 2019. Example: ‘2019-02-26’.
region_type
Can take one of the following types: country, pentad, group, qdgc. Default is ‘country’.
region_id
Has to be co-defined with region_type. That is, it is an instantiation of the region_type. For instance, if region_type = ‘country’, then region_id = ‘kenya’. Or if region_type = ‘pentad’, then region_id = ‘3355_1825’. Default is ‘kenya’.
return_type
There are two options, ‘data’ and ‘count’. ‘data’ returns the actual records submitted to the Africa Bird Map. ‘count’ returns a count of the rows. Default is ‘data’.
return_format
Default is ‘CSV’, though the API default is ‘JSON’.
Depending on the return_type, the function will return a dataframe with the following:
1. return_type = data
A tidy dataframe is returned with the records that fall within the specification defined.
2. return_type = count
A tidy dataframe is returned with the number of rows that would be produced if the return_type were specified as ‘data’.
Cattle Egrets are commonly found across Nigeria. They are easy to identify and often appear on full protocol lists.
The required packages were already loaded in the previous section, so we won’t load them again.
Let’s start by getting all raw records for cattle egrets.
cegret_raw_records <- extract_species(
species_ids = 61,
start_date = '2007-01-01',
end_date = '2019-07-01',
region_type = 'country',
region_id = 'nigeria'
)
#> Pulling data for 61
How many times has the cattle egret been recorded in Nigeria?
nrow(cegret_raw_records)
#> [1] 1445
How many pentads has the cattle egret been recorded in?
n_distinct(cegret_raw_records$Pentad)
#> [1] 962
How many FP cards does the cattle egret appear in?
n_distinct(cegret_raw_records$CardNo)
#> [1] 1441
By how many observers has the cattle egret been recorded?
n_distinct(cegret_raw_records$ObserverNo)
#> [1] 111
How does the cattle egret compare with the Grey-backed Camaroptera?
First, let’s get the species IDs for both.
common_species <- find_species(species_list = nigeria_species_list, species_names = c("cattle egret", "grey-backed camaroptera"))
common_species
#> # A tibble: 2 x 6
#> Spp Common_group Common_species Genus Species SearchColumn
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 61 Egret Cattle Bubulc~ ibis egret cattle bubulcus ~
#> 2 628 Camaroptera Grey-backed Camaro~ brevic~ camaroptera grey-backe~
Now let’s pull data for both across all of Nigeria.
common_raw_records <- extract_species(
species_ids = common_species$Spp,
start_date = '2007-01-01',
end_date = '2019-07-01',
region_type = 'country',
region_id = 'nigeria'
)
#> Pulling data for 61
#> Pulling data for 628
Let’s look at absolute occurrence across years in Nigeria.
common_raw_records %>%
# first create a few columns needed
mutate(YearOfRecording = year(StartDate)) %>%
mutate(CommonName = glue::glue("{Common_species} {Common_group}")) %>%
# find aggregates by year
group_by(Spp, CommonName, YearOfRecording) %>%
summarize(
AbsoluteOccurence = n_distinct(CardNo)
) %>%
ungroup() %>%
# make a plot
ggplot(aes(x = YearOfRecording, y = AbsoluteOccurence, fill = CommonName)) +
theme_light() +
geom_col(position = "dodge") +
ggtitle("Absolute number of records for two common species in Nigeria")
Very often you will require data across species and observers. This is where the extract_all function comes in useful. This function pulls all data for a specified date range and a specified region.
We begin by taking a quick dive into the function:
extract_all( <----- This is the function name. It doesn't change.
start_date, <----- What date do you want to draw records starting from? The format is YYYY-MM-DD.
end_date, <----- What date do you want to draw records ending on? The format is YYYY-MM-DD.
region_type, <----- Here you specify what kind of regions you're working with (eg: country, pentad).
region_id, <----- Here you specify the value the region takes (eg: kenya, 0310_4000)
return_type, <----- For more advanced use. You don't need to specify this.
return_format <----- For more advanced use. You don't need to specify this.
)
The definitions of the arguments are detailed below:
start_date
A character representing the start date from which you want to query the database. It must be in the YYYY-MM-DD format. Defaults to 1 Jan 2019. Example: ‘2019-01-26’.
end_date
A character representing the end date to which you want to query the database. It must be in the YYYY-MM-DD format. Defaults to 1 Feb 2019. Example: ‘2019-02-26’.
region_type
Can take one of the following types: country, pentad, group, qdgc. Default is ‘country’.
region_id
Has to be co-defined with region_type. That is, it is an instantiation of the region_type. For instance, if region_type = ‘country’, then region_id = ‘kenya’. Or if region_type = ‘pentad’, then region_id = ‘3355_1825’. Default is ‘kenya’.
return_type
There are two options, ‘data’ and ‘count’. ‘data’ returns the actual records submitted to the Africa Bird Map. ‘count’ returns a count of the rows. Default is ‘data’.
return_format
Default is ‘CSV’, though the API default is ‘JSON’.
Depending on the return_type, the function will return a dataframe with the following:
1. return_type = data
A tidy dataframe is returned with the records that fall within the specification defined.
2. return_type = count
A tidy dataframe is returned with the number of rows that would be produced if the return_type were specified as ‘data’.
African Paradise Flycatchers are known to have both a resident population and a migratory population in Kenya. The question we ask here is: are these two groups equally distributed across the country, or does the migratory population only move through specific parts of the country?
To do this, we’ll look at data from two very well mapped pentads.
Pentad 0115_3645 is in the heart of Nairobi and contains the National Museums of Kenya. We call this the museum pentad.
Pentad 0355_3940 is on the coast and is Doris’ home pentad. We call this the coast pentad.
Both pentads have more than 100 full protocol lists submitted.
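Pentad codes such as 0115_3645 start with a zero, so they should be kept as character strings rather than numbers. A quick shape check can catch a dropped leading zero before a query is sent. This is a sketch only: looks_like_pentad is a hypothetical helper, not part of rabm, and it assumes the four-digits, underscore, four-digits shape of the codes used in this article; other valid code shapes may exist.

```r
# Hypothetical helper (not part of rabm): checks the "dddd_dddd" shape only.
looks_like_pentad <- function(x) {
  grepl("^[0-9]{4}_[0-9]{4}$", x)
}

# The last code has lost its leading zero and fails the check.
looks_like_pentad(c("0115_3645", "0355_3940", "115_3645"))
```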
We start by extracting data for the two pentads.
raw_museum <- extract_all(
start_date = '2007-01-01',
end_date = '2018-12-31',
region_type = 'pentad',
region_id = '0115_3645'
)
raw_coast <- extract_all(
start_date = '2007-01-01',
end_date = '2018-12-31',
region_type = 'pentad',
region_id = '0355_3940'
)
An alternative way to do the same would be to pull all data for Kenya using region_type = 'country' and region_id = 'kenya' and then filter for the two pentads. The method above is faster.
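For completeness, the filtering step of the country-wide alternative could look like the sketch below. It uses a made-up data frame in place of the real country-wide pull, and assumes only the Pentad column name seen in the data above.

```r
# Made-up stand-in for a country-wide pull (values are illustrative only).
toy_kenya <- data.frame(
  CardNo = c("c1", "c2", "c3", "c4"),
  Pentad = c("0115_3645", "0355_3940", "0120_3650", "0115_3645"),
  Spp    = c(682, 682, 61, 311),
  stringsAsFactors = FALSE
)

# Keep only the rows that belong to the two pentads of interest.
target_pentads <- c("0115_3645", "0355_3940")
two_pentads <- toy_kenya[toy_kenya$Pentad %in% target_pentads, ]

# Split into one data frame per pentad.
toy_museum <- two_pentads[two_pentads$Pentad == "0115_3645", ]
toy_coast  <- two_pentads[two_pentads$Pentad == "0355_3940", ]
```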
Next we find the reporting rates for both pentads for each month.
apf_museum_monthly <- raw_museum %>%
mutate(MonthOfRecording = month(StartDate, label = TRUE)) %>%
group_by(MonthOfRecording) %>%
summarize(
TotalCards = n_distinct(CardNo),
TotalPresent = sum(Spp == 682),
RecordingRateMuseum = round((TotalPresent / TotalCards)*100, 2)
) %>%
ungroup() %>%
select(MonthOfRecording, Museum = RecordingRateMuseum)
apf_coast_monthly <- raw_coast %>%
mutate(MonthOfRecording = month(StartDate, label= TRUE)) %>%
group_by(MonthOfRecording) %>%
summarize(
TotalCards = n_distinct(CardNo),
TotalPresent = sum(Spp == 682),
RecordingRateCoast = round((TotalPresent / TotalCards)*100, 2)
) %>%
ungroup() %>%
select(MonthOfRecording, Coast = RecordingRateCoast)
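The reporting rate computed above is the percentage of cards in a month on which the species appears. A small check on made-up records shows the arithmetic; reporting_rate is a hypothetical helper written for this sketch, and it counts distinct cards containing the species, which matches sum(Spp == 682) as long as a species is listed at most once per card.

```r
# Made-up records: one row per species seen on a card (values are illustrative only).
toy_cards <- data.frame(
  CardNo = c("c1", "c1", "c2", "c3", "c3", "c4"),
  Month  = c("Jan", "Jan", "Jan", "Jul", "Jul", "Jul"),
  Spp    = c(682, 61, 61, 682, 311, 682),
  stringsAsFactors = FALSE
)

# Hypothetical helper: percentage of cards on which species `spp` was recorded.
reporting_rate <- function(records, spp) {
  total_cards    <- length(unique(records$CardNo))
  cards_with_spp <- length(unique(records$CardNo[records$Spp == spp]))
  round(cards_with_spp / total_cards * 100, 2)
}

# One rate per month: the species is on 1 of 2 cards in January and 2 of 2 in July.
rates <- sapply(split(toy_cards, toy_cards$Month), reporting_rate, spp = 682)
rates
```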
Finally we produce a graph that displays these two.
apf_museum_monthly %>%
left_join(., apf_coast_monthly, by = "MonthOfRecording") %>%
gather(Location, ReportingRate, Museum:Coast) %>%
ggplot(aes(x = MonthOfRecording, y = ReportingRate, color = Location, group = Location)) +
theme_light() +
geom_line() +
ggtitle("Reporting rates for African Paradise Flycatcher in two pentads")
This graph gives a solid indication that the migratory population passes through the coast. This comes with the caveat that the analysis is only meant to introduce the user to data extraction and is not meant to be a statistically rigorous investigation.