Appendix G — Packages

These are some of our favourite packages, with most appearing many times throughout the book. We thought it would be nice to feature them more prominently here.

G.0.1 galah

galah is an R and Python interface to biodiversity data hosted by the Atlas of Living Australia (ALA). It enables users to query and download species occurrence records (observations, specimens, eDNA records, etc.), taxonomic information, or associated media such as images or sounds.

G.0.2 here

here is a package for easy file referencing in project-oriented workflows. The here package builds file paths relative to the top-level directory of a project, which makes them more robust than paths that depend on the current working directory. Artwork by allison_horst.
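As a minimal sketch of how this looks in practice, assume a project that contains a data folder holding a file called birds.csv. Both names are hypothetical and used only for illustration.

```r
library(here)

# Build a path from the project root rather than the working directory.
# "data" and "birds.csv" are hypothetical names, not files from the book.
bird_path <- here("data", "birds.csv")

# The result is a full path ending in data/birds.csv
print(bird_path)
```

Because the path is anchored to the project root, the same line should work whether the script is run from the root or from a subfolder.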

G.0.3 skimr

skimr is a package that provides rich summaries of the variables in a dataset in R. Users can call skim() on a data frame to quickly get an overview of their data.
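A short sketch of a typical call, using the iris dataset that ships with R (our choice of example data, not one taken from the book):

```r
library(skimr)

# Summarise every column of a built-in dataset
iris_summary <- skim(iris)

# Printing shows summaries grouped by variable type (numeric, factor, ...)
iris_summary
```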

G.0.4 tidyverse

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structure. Some notable packages within the tidyverse are dplyr, tidyr, and ggplot2.

G.0.5 ozmaps

ozmaps is a package that hosts a collection of Australian spatial object data: coastlines, state outlines, local government areas, and electoral boundaries. It streamlines the process of downloading shapefiles of Australia. Data in ozmaps are derived from the Australian Bureau of Statistics (2016).

G.0.6 sf

sf allows users to encode spatial vector data. The package brings simple features, a standard way of representing points, lines, and polygons, to R, so that users can run geometric operations and manipulate spatial data with explicit coordinate reference systems.
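The sketch below shows one way to encode plain coordinates as spatial points and then reproject them. The records are made up for illustration, and the choice of target projection (EPSG:3577, GDA94 / Australian Albers) is our assumption rather than a recommendation from the book.

```r
library(sf)

# Hypothetical occurrence coordinates, for illustration only
obs <- data.frame(
  species = c("A", "B"),
  longitude = c(149.13, 151.21),
  latitude = c(-35.28, -33.87)
)

# Encode the coordinate columns as point geometries in WGS84 (EPSG:4326)
obs_sf <- st_as_sf(obs, coords = c("longitude", "latitude"), crs = 4326)

# Reproject to a projected CRS (EPSG:3577, GDA94 / Australian Albers)
obs_albers <- st_transform(obs_sf, crs = 3577)

# Inspect the coordinate reference system of the result
st_crs(obs_albers)
```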

G.1 General cleaning

G.1.1 dplyr

dplyr is a popular package for data manipulation, providing a consistent set of verbs (e.g., filter(), select(), mutate(), group_by()) to restructure your data.
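The verbs are usually chained into a pipeline. The following is a small sketch on made-up records; the table, its column names, and the year threshold are all hypothetical.

```r
library(dplyr)

# Hypothetical occurrence records, for illustration only
records <- tibble(
  species = c("Perameles nasuta", "Perameles nasuta",
              "Isoodon obesulus", "Isoodon obesulus"),
  year = c(2015, 2021, 2019, 2022),
  count = c(2, 5, 1, 3)
)

summary_tbl <- records |>
  filter(year >= 2019) |>              # keep recent rows only
  select(species, count) |>            # keep the columns we need
  mutate(is_multiple = count > 1) |>   # add a derived column
  group_by(species) |>                 # one group per species
  summarise(
    total = sum(count),
    n_multiple = sum(is_multiple)
  )

summary_tbl
```

Each verb takes a data frame and returns a data frame, which is what makes the steps easy to reorder or extend.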

G.1.2 tidyr

tidyr is a package to help you make “tidy data”, where each variable is a column, each observation is a row, and each value is a cell.
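As an illustration, the sketch below reshapes a small "wide" table, where each year has its own column, into tidy form. The table and its column names are hypothetical.

```r
library(tidyr)

# Hypothetical counts per site, one column per year (not tidy)
wide <- data.frame(
  site = c("A", "B"),
  y2020 = c(4, 7),
  y2021 = c(6, 2)
)

# Move the year columns into a single "year" variable,
# with their values in a single "count" variable
long <- pivot_longer(
  wide,
  cols = c(y2020, y2021),
  names_to = "year",
  values_to = "count"
)

long
```

After reshaping, each row is one observation (a site in a year) and each column is one variable.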

G.1.3 stringr

stringr is a package with a cohesive set of functions to filter, match, prepare, and correct strings.
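A brief sketch of the kind of correction this enables, using made-up species names with inconsistent spacing and capitalisation:

```r
library(stringr)

# Hypothetical messy names, for illustration only
names_raw <- c("  perameles  nasuta", "Isoodon obesulus ", "PERAMELES NASUTA")

names_clean <- names_raw |>
  str_squish() |>       # trim ends and collapse repeated spaces
  str_to_sentence()     # capitalise the first letter, lower-case the rest

# Match names that start with a given genus
is_perameles <- str_detect(names_clean, "^Perameles")

names_clean
is_perameles
```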

G.1.4 janitor

janitor is a package with a collection of helper functions for initial data exploration, summarising, and data cleaning.
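One commonly used helper is clean_names(), which standardises column names. The sketch below applies it to a made-up table whose column names are hypothetical.

```r
library(janitor)

# Hypothetical table with inconsistent column names
messy <- data.frame(
  `Scientific Name` = c("Perameles nasuta", "Isoodon obesulus"),
  `Record ID` = c(1, 2),
  decimalLatitude = c(-33.9, -37.8),
  check.names = FALSE
)

# Convert all column names to a consistent snake_case style
tidy <- clean_names(messy)

names(tidy)
```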

G.1.5 ggplot2

ggplot2 is a package for visualising data based on The Grammar of Graphics, where users specify which variables to display, the type of graphic to use, and any additional aesthetics or theme formatting.
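Those three parts map onto the layers of a plot. The following is a minimal sketch using the built-in iris dataset (our choice of example data):

```r
library(ggplot2)

p <- ggplot(
  iris,
  aes(x = Sepal.Length, y = Petal.Length, colour = Species)  # variables to display
) +
  geom_point() +      # type of graphic
  theme_minimal()     # theme formatting

# Printing the object draws the plot
p
```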

G.2 Taxonomic validation

G.2.1 taxize

taxize lets users search for taxonomic names across many taxonomic data sources, using either scientific or common names, to resolve synonyms in their data.