Appendix D — Joins
If you work with biodiversity data, it is likely that you will need to join two separate datasets at some point to analyse how spatial, temporal, or environmental factors influence species. This chapter provides a brief overview of several common types of joins in dplyr to help you get started.
For a comprehensive introduction to joins, check out the Joins chapter in R for Data Science.
D.0.1 Prerequisites
In this chapter, we will use starling occurrence data from September 2015 in the Atlas of Living Australia (ALA), along with taxonomic information for the starling family, Sturnidae.
# packages
library(galah)
library(dplyr)
library(here)
library(ggplot2)
galah_config(email = "your-email-here") # ALA-registered email
# occurrence records, downloaded via their DOI
starlings <- galah_call() |>
  filter(doi == "https://doi.org/10.26197/ala.98d038d3-2058-4294-b683-fcb51a11f018") |>
  atlas_occurrences()
# species list with full taxonomy for Sturnidae
starlings_taxonomy <- galah_call() |>
  identify("Sturnidae") |>
  atlas_species()
D.1 Keys
Joining dataframes relies on a key: one or more columns in a primary table that correspond to one or more columns in a secondary table. Rows in the two datasets are matched according to the values in these key columns.
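To make the idea of a key concrete, here is a minimal sketch with two small, hypothetical tables (`occurrences` and `genus_lookup`, with a made-up `genus_code` column). It assumes dplyr 1.1.0 or later for `join_by()`. The key columns do not need to share a name, and rows are matched on equal values rather than on row position.

```r
library(dplyr)

# Hypothetical primary table: one row per occurrence record
occurrences <- tibble(
  record_id = 1:3,
  genus     = c("Sturnus", "Acridotheres", "Sturnus")
)

# Hypothetical secondary table; its key column has a different name
# and its rows are in a different order
genus_lookup <- tibble(
  genus_name = c("Acridotheres", "Sturnus"),
  genus_code = c("ACR", "STU")  # made-up codes for illustration
)

# The key pairs `genus` in x with `genus_name` in y;
# each occurrence picks up the lookup row with the same genus value
keyed <- left_join(occurrences, genus_lookup, join_by(genus == genus_name))
keyed
```

By default, the secondary table's key column (`genus_name`) is dropped from the result, because its values duplicate `genus`.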
As a simple example, let’s say we want to add complete taxonomic information to our starlings dataframe, which contains occurrence records with some, but not all, levels of taxonomic information. The starlings_taxonomy dataframe contains complete taxonomic information for the family Sturnidae.
starlings
# A tibble: 3,944 × 8
genus species scientificName cl22 year month decimalLatitude
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Acridotheres Acridotheres t… Acridotheres … Quee… 2015 9 -16.9
2 Acridotheres Acridotheres t… Acridotheres … Quee… 2015 9 -16.9
3 Acridotheres Acridotheres t… Acridotheres … New … 2015 9 -33.8
4 Acridotheres Acridotheres t… Acridotheres … Vict… 2015 9 -37.7
5 Acridotheres Acridotheres t… Acridotheres … Aust… 2015 9 -35.2
6 Acridotheres Acridotheres t… Acridotheres … Quee… 2015 9 -27.2
7 Acridotheres Acridotheres t… Acridotheres … Vict… 2015 9 -38.0
8 Acridotheres Acridotheres t… Acridotheres … Quee… 2015 9 -16.9
9 Acridotheres Acridotheres t… Acridotheres … New … 2015 9 -33.9
10 Acridotheres Acridotheres t… Acridotheres … Aust… 2015 9 -35.3
# ℹ 3,934 more rows
# ℹ 1 more variable: decimalLongitude <dbl>
starlings_taxonomy
# A tibble: 5 × 11
taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom phylum
<chr> <chr> <chr> <chr> <chr> <chr>
1 https://biodive… Sturnus (St… Linnaeus, 1758 species Animal… Chord…
2 https://biodive… Acridothere… (Linnaeus, 1766) species Animal… Chord…
3 https://biodive… Aplonis (La… (Temminck, 1824) species Animal… Chord…
4 https://biodive… Aplonis (Ap… (G.R. Gray, 1861) species Animal… Chord…
5 https://biodive… Aplonis (Ap… Gould, 1836 species Animal… Chord…
# ℹ abbreviated name: ¹scientific_name_authorship
# ℹ 5 more variables: class <chr>, order <chr>, family <chr>, genus <chr>,
# vernacular_name <chr>
Let’s join our starlings dataframe with starlings_taxonomy. The genus column in starlings appears to contain the same information as the genus column in starlings_taxonomy.
starlings |>
  select(genus) |>
  distinct()
# A tibble: 3 × 1
genus
<chr>
1 Acridotheres
2 Aplonis
3 Sturnus
starlings_taxonomy |>
  select(genus)
# A tibble: 5 × 1
genus
<chr>
1 Sturnus
2 Acridotheres
3 Aplonis
4 Aplonis
5 Aplonis
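Notice that Aplonis appears three times in starlings_taxonomy, so genus is not a unique key in that table. A quick way to check a candidate key before joining is to count its values and keep any that occur more than once. The sketch below rebuilds only the genus column printed above as a stand-in tibble (`taxonomy_genera`, a hypothetical name), so it runs without an ALA download.

```r
library(dplyr)

# Stand-in for starlings_taxonomy: the genus values printed above
taxonomy_genera <- tibble(
  genus = c("Sturnus", "Acridotheres", "Aplonis", "Aplonis", "Aplonis")
)

# Key values that appear more than once in the secondary table
duplicated_keys <- taxonomy_genera |>
  count(genus) |>
  filter(n > 1)
duplicated_keys
```

Any rows returned here are keys that will match more than one row in the secondary table, which duplicates the matching rows of the primary table during a join.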
We can use this genus column as a key to add the extra levels of taxonomic information to the table containing starling occurrence records¹.
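Before running the join on the real data, it helps to see how left_join() treats matches. The sketch below uses two hypothetical tables (`records` and `lookup`, with placeholder species labels) and assumes dplyr 1.1.0 or later. Every row of the left table is kept, a row with no match gets NA in the new columns, and a row that matches several rows in the right table is repeated once per match.

```r
library(dplyr)

# Hypothetical occurrence records (x)
records <- tibble(
  record_id = 1:3,
  genus     = c("Acridotheres", "Aplonis", "Corvus")
)

# Hypothetical lookup (y) in which one genus appears twice
lookup <- tibble(
  genus   = c("Acridotheres", "Aplonis", "Aplonis"),
  species = c("species A", "species B", "species C")  # placeholder labels
)

# left_join() keeps every row of `records`:
#   record 1 matches one lookup row  -> 1 output row
#   record 2 matches two lookup rows -> 2 output rows
#   record 3 (Corvus) has no match   -> 1 row with NA in `species`
result <- left_join(records, lookup, join_by(genus))
result
```

The repeated record 2 is the same kind of duplication that the Aplonis rows in starlings_taxonomy cause in the join below.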
starlings |>
  left_join(starlings_taxonomy,
            join_by(genus)) |>
  rmarkdown::paged_table() # paged output
Warning in left_join(starlings, starlings_taxonomy, join_by(genus)): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1805 of `x` matches multiple rows in `y`.
ℹ Row 2 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.