The ERA dataset is a synthesis of peer-reviewed research conducted in sub-Saharan Africa. It provides valuable information on livestock diets, focusing on:
This Quarto notebook demonstrates how to download and subset the ERA dataset, and how to extract tables containing data on:
Links for reference:
- ERA
Publication
- ERA Codebase
A convenient shortcut to processed data is also provided within a
local diet_data/
folder, which stores:
- feeds_nutrition.csv
- feeds_digestibility.csv
- bibliography.csv
- field_descriptions.csv
These files can be reused for your own analyses or as a starting point for further data integration projects.
Install and load packages and create directory for downloaded data.
# Install and load pacman if not already installed
if (!require("pacman", character.only = TRUE)) {
install.packages("pacman")
library(pacman)
}
# Load or install required packages
pacman::p_load(
data.table,
s3fs,
clipr,
miceadds,
httr,
readxl,
DT
)
# Set a directory for downloaded data
dl_dir <- "downloaded_data"
# Create the directory if it doesn't already exist
if(!dir.exists(dl_dir)){
dir.create(dl_dir, showWarnings = FALSE)
}
The ERA dataset is stored in an Amazon S3 bucket with open/anonymous access. Below, we demonstrate how to list files in this bucket, download the most recent version of the dataset, and load the relevant tables into R.
We are importing data from a 2022 project that extracted livestock data using an excel template called skinny_cow_2022. Publications were identified using the era search strings as described in Rosenstock et. al. (2024).
# Create an S3 filesystem handle
s3 <- s3fs::S3FileSystem$new(anonymous = TRUE)
era_s3 <- "s3://digital-atlas/era"
# List files in the s3 bucket
all_files <- s3$dir_ls(file.path(era_s3, "data"))
# Filter for the "skinny_cow_2022" RData file, selecting the most recent version
target_file <- tail(
grep(".RData", grep("skinny_cow_2022", all_files, value = TRUE), value = TRUE),
1
)
# Define a local path for saving the downloaded file
save_path <- file.path(getwd(), dl_dir, basename(target_file))
# Download the file if it does not exist already
if (!file.exists(save_path)) {
s3$file_download(target_file, save_path, overwrite = TRUE)
}
# Load the livestock data using the miceadds package
livestock_data <- miceadds::load.Rdata2(
file = basename(save_path),
path = dirname(save_path)
)
These tables are available in the dataset: Pub.Out, Site.Out, Soil.Out, ExpD.Out, Times.Out, Prod.Out, Var.Out, Chems.Code, Chems.Out, AF.Out, Animals.Out, Animals.Diet, Animals.Diet.Comp, Animals.Diet.Digest, Other.Out, MT.Out, Out.Out, Data.Out.
How to access information about these tables is discussed in Section 3.1.
The ERA project maintains a “controlled vocabulary” that describes the fields, tables, and terminology used in the dataset. Below, we download the relevant Excel file from GitHub and load these vocab tables.
The controlled vocabularly is available here.
# Define the URL and local path for the vocab file
era_vocab_url <- "https://github.com/peetmate/era_codes/raw/main/era_master_sheet.xlsx"
era_vocab_local <- file.path(getwd(), dl_dir, "era_master_sheet.xlsx")
# Download the file (set 'update=TRUE' to refresh the local copy)
update <- F
if (update) {
download.file(era_vocab_url, era_vocab_local, mode = "wb") # Write in binary mode
}
# Identify sheet names in the workbook (excluding placeholder sheets)
sheet_names <- readxl::excel_sheets(era_vocab_local)
sheet_names <- sheet_names[!grepl("sheet|Sheet", sheet_names)]
# Read each sheet into a list of tables
master_codes <- suppressWarnings(
suppressMessages(
sapply(sheet_names,
FUN = function(x) {
data.table(readxl::read_excel(era_vocab_local, sheet = x))
},
USE.NAMES = TRUE
)
)
)
master_codes$era_fields_v2
)field_descriptions <- data.table::copy(master_codes$era_fields_v2)
The master_codes$era_fields_v2
table contains
descriptions of field in the ERA data model and is used subsequent
sections to describe the contents of the tables we will focus on.
master_codes$AOM
)table_name<-"AOM"
aom<-copy(master_codes[[table_name]])
descrip_aom<-field_descriptions[Table==table_name,unique(Table_Description)]
fields_aom<-field_descriptions[Table==table_name,.(Field,Field_Description)]
NA
Fields in the AOM table:
DT::datatable(fields_aom, options = list(pageLength = 5))
master_codes$ani_diet
)table_name <- "ani_diet"
ani_diet <- copy(master_codes[[table_name]])
descrip_ani <- field_descriptions[Table == table_name, unique(Table_Description)]
fields_ani <- field_descriptions[Table == table_name, .(Field, Field_Description)]
# Remove depreciated columns
depreciated <- fields_ani[grepl("Depreciated legacy column", Field_Description), Field]
ani_diet[, (depreciated) := NULL]
The ani_diet table of the era_master_sheet describes and harmonizes the names of animal diet ingredients. These harmonized names are used to replace free text entries from the extraction templates and are used to create entries in the AOM vocabularly for feed ingredients.
Fields in the animal diet harmonization (ani_diet
)
table:
DT::datatable(fields_ani, options = list(pageLength = 5))
The ERA dataset is divided into multiple tables. We focus on those describing feed nutritional composition, feed digestibility, and bibliographic references.
mastercodes$Animals.Diet.Comp
)We are primarily interested in feed ingredients’ nutritional
composition, stored in the Animals.Diet.Comp
table.
table_name <- "Animals.Diet.Comp"
diet_nutrition <- livestock_data[[table_name]]
descrip_nut <- field_descriptions[Table == table_name, unique(Table_Description)]
fields_nut <- field_descriptions[Table == table_name, .(Field, Field_Description)]
This table contains information about the nutritional composition of entire animal diets, ingredient groups or individual ingredients.
Fields in the diet_nutrition (Animals.Diet.Comp
)
table:
DT::datatable(fields_nut, options = list(pageLength = 5))
We want data only for individual feed ingredients (not entire diets or composite items), and we only want measured values (not estimated).
# Remove records for entire diets
diet_nutrition <- diet_nutrition[is_entire_diet == FALSE]
diet_nutrition[, is_entire_diet := NULL]
# Remove records for composite diet items
diet_nutrition <- diet_nutrition[is_group == FALSE]
diet_nutrition[, is_group := NULL]
# Keep only measured (non-estimated) values
diet_nutrition <- diet_nutrition[DC.Method != "Estimated"]
diet_nutrition[, DC.Method := NULL]
# Drop unneeded columns; rename main diet item column
diet_nutrition <- diet_nutrition[, !c("D.Item", "D.Item.Root.Other.Comp.Proc_All", "D.Item.raw")]
setnames(diet_nutrition, "D.Item.Root.Comp.Proc_Major", "D.Item")
DT::datatable(diet_nutrition, options = list(pageLength = 5))
If you want to find out more information about variables see AOM
elements under:
Management/Livestock Management/Feed Characteristic/Feed Chemical Composition
aom_nut_vars<-aom[grepl("Feed Characteristic/Feed Chemical Composition/",Path),.(AOM,Edge_Value,Description)]
DT::datatable(aom_nut_vars[!is.na(Description)], options = list(pageLength = 5))
We can merge in taxonomic or classification info (scientific names, CPC codes, etc.) from the AOM vocabulary to enhance our feed ingredient details.
aom <- copy(master_codes[["AOM"]])
descrip_aom <- field_descriptions[Table == "AOM", unique(Table_Description)]
fields_aom <- field_descriptions[Table == "AOM", .(Field, Field_Description)]
# Merge AOM data into the diet_nutrition table
diet_nutrition <- merge(
diet_nutrition,
aom[, .(AOM, `Scientific Name`, NCBI, WFO, Feedipedia, CPC_Code_Product, CPC_Code_Component)],
by.x = "D.Item.AOM",
by.y = "AOM",
all.x = TRUE
)
DT::datatable(diet_nutrition[!grepl("No Match", D.Item.AOM) &
!is.na(D.Item) &
!is.na(`Scientific Name`),.(B.Code,D.Item.AOM,D.Item,`Scientific Name`,NCBI,WFO,Feedipedia)],
options = list(pageLength = 5))
Some items might have "No Match"
for
D.Item.AOM
—these are typically composite items or other
special cases.
master_codes$Animals.Diet.Digest
)Digestibility data are found in the Animals.Diet.Digest
table.
table_name <- "Animals.Diet.Digest"
diet_digest <- livestock_data[[table_name]]
descrip_digest <- field_descriptions[Table == table_name, unique(Table_Description)]
fields_digest <- field_descriptions[Table == table_name, .(Field, Field_Description)]
This table contains information about the digestibility of entire animal diets, ingredient groups or individual ingredients.
Fields in the diet_digest (Animals.Diet.Digest
)
table:
DT::datatable(fields_digest, options = list(pageLength = 5))
Similar to nutrition, we remove entries for entire diets, composite items, and estimated values.
full_length <- nrow(diet_digest)
# Remove entire diets
diet_digest <- diet_digest[is_entire_diet == FALSE]
diet_digest[, is_entire_diet := NULL]
sub_length <- nrow(diet_digest)
# Remove composite items
diet_digest <- diet_digest[is_group == FALSE]
diet_digest[, is_group := NULL]
# Remove estimated values
diet_digest <- diet_digest[!grepl("Estimated", DD.Method)]
# Drop unneeded columns, rename for consistency
diet_digest <- diet_digest[, !c("D.Item", "D.Item.Root.Other.Comp.Proc_All", "D.Item.raw")]
setnames(diet_digest, "D.Item.Root.Comp.Proc_Major", "D.Item")
DT::datatable(diet_digest[!is.na(D.Item)], options = list(pageLength = 5))
Most digestibility records in ERA are for entire diets, accounting for 88.3 % of records.
If you want to find out more information about digestibility
measurement methods see AOM elements under:
Management/Livestock Management/Feed Characteristic/Feed Digestibility/Digestibility Measurement Method/
aom_dig_methods<-aom[grepl("Digestibility Measurement Method/",Path),.(AOM,Edge_Value,Description)]
DT::datatable(aom_dig_methods[!is.na(Description)], options = list(pageLength = 2))
If you want to find out more information about digestibility
variables see AOM elements under:
Management/Livestock Management/Feed Characteristic/Feed Digestibility/
aom_dig_vars<-aom[!grepl("Digestibility Measurement Method/",Path) & grepl("Feed Digestibility",Path),.(AOM,Edge_Value,Description)]
DT::datatable(aom_dig_vars[!is.na(Description)], options = list(pageLength = 2))
We merge again with the AOM data to get classification details.
diet_digest <- merge(
diet_digest,
aom[, .(AOM, `Scientific Name`, NCBI, WFO, Feedipedia, CPC_Code_Product, CPC_Code_Component)],
by.x = "D.Item.AOM",
by.y = "AOM",
all.x = TRUE
)
DT::datatable(diet_digest[!grepl("No Match", D.Item.AOM) &
!is.na(D.Item) &
!is.na(`Scientific Name`),.(B.Code,D.Item.AOM,D.Item,`Scientific Name`,NCBI,WFO,Feedipedia)],
options = list(pageLength = 5))
Here we present additional tables that do not contain information about feed items, but provide you with insights as to how animal diet experiment data is structured in ERA. If you are interested in this we also suggest you explore the data extraction template skinny_cow_2022.
master_codes$Animals.Out
)table_name <- "Animals.Out"
diet_overview <- livestock_data[[table_name]]
descrip_diet_overview <- field_descriptions[Table == table_name, unique(Table_Description)]
fields_diet_overview <- field_descriptions[Table == table_name, .(Field, Field_Description)]
Each row of this table lists an unqiue animal diet fed in an experiment. In early versions of ERA each diet is annotated with experiemental practices that relate feed addition, substitution, processing. In early version the table also indicates the presence of practices such as improved manure management, improved pasture management, aquasilvaculture.
Fields in the diet_overview (Animals.Out
) table:
DT::datatable(fields_diet_overview, options = list(pageLength = 5))
Notes on experimental diets:
DT::datatable(diet_overview[!is.na(A.Notes),.(B.Code,A.Level.Name,A.Notes)], options = list(pageLength = 5))
master_codes$Animals.Diet
)table_name <- "Animals.Diet"
diet_ingredients <- livestock_data[[table_name]]
descrip_diet_ing <- field_descriptions[Table == table_name, unique(Table_Description)]
fields_diet_ing <- field_descriptions[Table == table_name, .(Field, Field_Description)]
This table describes the amount of a diet or diet ingredient fed to animals and/or the proportions of ingredients that compose a diet.
You can explore the fields interactively:
DT::datatable(fields_diet_ing, options = list(pageLength = 5))
Notes on experimental diets:
DT::datatable(diet_ingredients[,.(B.Code,D.Item.AOM,D.Item.Root.Comp.Proc_Major,D.Amount,D.Unit.Amount,D.Unit.Time,D.Unit.Animals)], options = list(pageLength = 5))
Note: The diet ingredients table
contains: - The total amount of the entire diet (where
Diet.Item == "Entire Diet"
) - The amount and identity of
each ingredient within a diet - Composite items that are themselves made
of multiple ingredients
Although not strictly necessary for extracting feed item composition and digestibility, it provides insights into ERA’s structure of animal diets.
All tables in ERA contain a field B.Code
that identifies
the source publication. We can link to the Pub.Out
table to retrieve bibliographic info for these studies.
table_name <- "Pub.Out"
bibliography <- livestock_data[[table_name]]
descrip_bib <- field_descriptions[Table == table_name, unique(Table_Description)]
fields_bib <- field_descriptions[Table == table_name, .(Field, Field_Description)]
This table captures bibliographic information. Each publication is assigned a unique code in the B.Code field that acts a key field in all other extraction template tables.
Fields in the bibliography (Pub.Out
) table:
DT::datatable(fields_bib, options = list(pageLength = 5))
We subset the bibliographic data to only those entries appearing in our nutrition or digestibility tables and remove uneeded fields:
bibliography <- bibliography[B.Code %in% unique(c(diet_nutrition[, B.Code], diet_digest[, B.Code]))]
bibliography <- bibliography[,!c("B.Link1","B.Link2","B.Link3","B.Link4","B.New","B.Project")]
DT::datatable(bibliography, options = list(pageLength = 5))
To make repeated analyses easier, we will save our cleaned tables to
CSV files in a local diet_data/
folder. These can be loaded
quickly in future sessions.
save_dir <- "diet_data"
if(!dir.exists(save_dir)){
dir.create(save_dir)
}
# Save the nutritional composition and digestibility tables
fwrite(diet_nutrition, file.path(save_dir, "feeds_nutrition.csv"))
fwrite(diet_digest, file.path(save_dir, "feeds_digestibility.csv"))
# Save the bibliography subset
fwrite(bibliography, file.path(save_dir, "bibliography.csv"))
# Prepare a small field description subset for reference
fields_subset <- field_descriptions[
Table %in% c("Animals.Diet.Comp", "Animals.Diet.Digest", "Pub.Out"),
.(Table, Field, Field_Description)
]
# Add an alternate table name for clarity when reading this file
fields_subset[Table == "Animals.Diet.Comp", Table_Alt := "feeds_nutrition"]
fields_subset[Table == "Animals.Diet.Digest", Table_Alt := "feeds_digestibility"]
fields_subset[Table == "Pub.Out", Table_Alt := "bibliography"]
fwrite(fields_subset, file.path(save_dir, "field_descriptions.csv"))
This work is funded under the CGIAR Livestock & Climate Initiative.