This vignette explores the ERA data available for legumes. It covers the following:
- Downloading the data
- Sub-setting the data to agronomy and legumes
- Exploring geographic locations of studies
- Exploring crop types (products) in the data
- Exploring agronomic practices used in the studies
- Exploring the outcomes reported
This section downloads the most recent ERA agronomy bundle from the ERA S3 bucket, extracts it locally, and loads the ERA.Compiled comparisons table from it.
# Set S3 path and initialize
s3 <- s3fs::S3FileSystem$new(anonymous = TRUE)
era_s3 <- "s3://digital-atlas/era"
bundle_dir <- file.path(era_s3, "data", "packaged")
# Get the latest bundle
all_files <- s3$dir_ls(bundle_dir)
latest_bundle <- tail(sort(grep("era_agronomy_bundle.*\\.tar\\.gz$", all_files, value = TRUE)), 1)
# Define download and extraction paths
dl_dir <- "downloaded_data"
dir.create(dl_dir, showWarnings = FALSE)
bundle_local <- file.path(dl_dir, basename(latest_bundle))
extract_dir <- file.path(dl_dir, tools::file_path_sans_ext(tools::file_path_sans_ext(basename(latest_bundle))))
# Download and extract
if (!file.exists(bundle_local)) {
  s3$file_download(latest_bundle, bundle_local, overwrite = TRUE)
}
if (!dir.exists(extract_dir)) {
  dir.create(extract_dir)
  utils::untar(bundle_local, exdir = extract_dir)
}
# Locate files
json_agronomic <- list.files(extract_dir, pattern = "^agronomic_.*\\.json$", full.names = TRUE)
json_master <- list.files(extract_dir, pattern = "^era_master_codes.*\\.json$", full.names = TRUE)
parquet_file <- list.files(extract_dir, pattern = "^era_compiled.*\\.parquet$", full.names = TRUE)
# Load the compiled comparisons table
ERA_Compiled <- arrow::read_parquet(parquet_file)
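The file lookups above assume that each pattern matches exactly one file in the extracted bundle. A minimal sketch of a guard for that assumption follows. `find_single_file()` is a hypothetical helper, not part of ERA or any package, and the demonstration uses a temporary directory with a made-up file name rather than the real bundle.

```r
# Hypothetical helper: return the single file in `dir` matching `pattern`,
# or stop with an informative error if there are zero or several matches.
find_single_file <- function(dir, pattern) {
  hits <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(hits) != 1L) {
    stop(
      sprintf(
        "Expected exactly one file matching '%s' in '%s', found %d.",
        pattern, dir, length(hits)
      ),
      call. = FALSE
    )
  }
  hits
}

# Demonstration on a temporary directory (the file name is invented)
demo_dir <- file.path(tempdir(), "era_bundle_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "era_compiled_example.parquet"))

parquet_demo <- find_single_file(demo_dir, "^era_compiled.*\\.parquet$")
basename(parquet_demo)
```

In the workflow above, a helper like this could wrap each `list.files()` call so that a missing or duplicated file is reported before `arrow::read_parquet()` is reached.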
ERA covers a diverse range of practices, from agronomy to livestock, plus a few papers on postharvest storage. This User Guide therefore focuses on agronomy and legumes. If you are interested in any other crops, subset them in the Products tab. For livestock practices, please explore our Livestock User Guide.
# Define the list of products you want
products_of_interest <- c(
"Cowpea", "Soybean", "Lablab", "Kersting's groundnut",
"Sesame", "Bambara Nut", "Groundnut", "Pigeon Pea"
)
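# Subset the data
# (arrow::read_parquet() returns a tibble, so the column is referenced explicitly;
# this form also works if ERA_Compiled is a data.table)
ERA_Compiled_subset <- ERA_Compiled[ERA_Compiled$Product.Simple %in% products_of_interest, ]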
# Optional: check the result
DT::datatable(
  ERA_Compiled_subset,
  options = list(
    scrollY = "400px",
    scrollX = TRUE,
    pageLength = 20,
    fixedHeader = FALSE
  )
)
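After subsetting, it can help to check how many rows each product contributes and whether any requested product name matched nothing. The sketch below uses a small made-up data frame in place of the ERA table. Only the `Product.Simple` column name comes from the code above; the `Value` column and all rows are invented for illustration and are not ERA data.

```r
# Toy stand-in for the compiled table (rows are invented for illustration)
toy <- data.frame(
  Product.Simple = c("Cowpea", "Soybean", "Maize", "Cowpea", "Groundnut"),
  Value = c(1.2, 0.8, 2.5, 1.0, 0.6),
  stringsAsFactors = FALSE
)
products_demo <- c("Cowpea", "Soybean", "Groundnut", "Pigeon Pea")

# Keep only rows whose product is in the list of interest
toy_subset <- toy[toy$Product.Simple %in% products_demo, ]

# Rows per product, most frequent first
product_counts <- sort(table(toy_subset$Product.Simple), decreasing = TRUE)
product_counts

# Requested products with no matching rows (helps catch spelling mismatches)
missing_products <- setdiff(products_demo, unique(toy_subset$Product.Simple))
missing_products
```

Running the same two checks on the real subset would show whether every name in `products_of_interest` matches the spelling used in `Product.Simple`.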