3  Database access

The OSSL is distributed through a Google Cloud Storage bucket, MongoDB (deprecated), and an API (experimental).

The Google Cloud Storage bucket hosts static files in two formats: compressed csv (.csv.gz) and qs (from the qs R package). The csv.gz format is intended to work across platforms, while qs is the preferred format for use within R.

MongoDB (deprecated) and the API can be used to construct requests that fetch data with specific filters (e.g. region, dataset source, etc.), unlike the static files, which require downloading the whole database.

Tip

In the OSSL, an original dataset may share a common id across the VisNIR and MIR ranges. Some ids, however, are represented in only one range (either VisNIR or MIR). The OSSL is a tabular database that keeps every soil sample with at least one spectral range, so a row can have missing spectra for the other range. Before using the database, run a filter operation to remove the observations (rows) with missing spectra for the range you need.
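The filter step described in this tip can be sketched in base R as follows. This is a minimal illustration on a toy table, not the OSSL itself: the column prefixes `scan_mir.` and `scan_visnir.`, the toy values, and the helper name `keep_rows_with_spectra` are all assumptions made for the example, so check the column names of the file you actually downloaded before reusing it.

```r
# Toy stand-in for an OSSL table. Column names and values are illustrative
# assumptions, not taken from the real database.
ossl_toy <- data.frame(
  id.layer_uuid_txt   = c("a", "b", "c"),
  scan_mir.600_abs    = c(0.51, NA, 0.47),
  scan_mir.602_abs    = c(0.52, NA, 0.48),
  scan_visnir.350_ref = c(NA, 0.12, 0.10),
  scan_visnir.352_ref = c(NA, 0.13, 0.11),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the rows whose spectral columns
# (selected by a column-name prefix) contain no missing values.
keep_rows_with_spectra <- function(data, prefix) {
  spectral_cols <- startsWith(names(data), prefix)
  if (!any(spectral_cols)) {
    stop("No columns start with the prefix: ", prefix)
  }
  has_spectra <- stats::complete.cases(data[, spectral_cols, drop = FALSE])
  data[has_spectra, , drop = FALSE]
}

mir_only    <- keep_rows_with_spectra(ossl_toy, "scan_mir.")
visnir_only <- keep_rows_with_spectra(ossl_toy, "scan_visnir.")
```

Selecting columns by prefix avoids hard-coding hundreds of wavenumber columns, but it only works if the prefix matches the spectral columns and nothing else.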

3.1 Google Cloud Storage

The datasets in the public bucket can be updated without notice. You can either open a link in a browser (or just click on it) to download a file, or pass the URLs to a programming language to fetch the files automatically.

Use the following URLs to access the whole database at each level (L0 and L1):

Compressed csv
https://storage.googleapis.com/soilspec4gg-public/ossl_all_L0_v1.2.csv.gz
https://storage.googleapis.com/soilspec4gg-public/ossl_all_L1_v1.2.csv.gz

qs format (preferred in R)
https://storage.googleapis.com/soilspec4gg-public/ossl_all_L0_v1.2.qs
https://storage.googleapis.com/soilspec4gg-public/ossl_all_L1_v1.2.qs
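For users who prefer the csv.gz files, one possible way to fetch and read them uses base R only. The helper name `fetch_csv_gz` is hypothetical and the function is a sketch rather than official OSSL tooling. The files are large, so the download call is left commented out.

```r
# Hypothetical helper: download a gzip-compressed csv and read it with base R.
fetch_csv_gz <- function(url, destfile = tempfile(fileext = ".csv.gz")) {
  # mode = "wb" keeps the binary gzip stream intact on Windows.
  status <- utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
  if (status != 0) {
    stop("Download failed for: ", url)
  }
  # read.csv() decompresses transparently through a gzfile() connection.
  utils::read.csv(gzfile(destfile), check.names = FALSE,
                  stringsAsFactors = FALSE)
}

# Usage (large download, so left commented out).
# download.file() times out after 60 seconds by default; raising the limit
# with options(timeout = ...) may be necessary for files of this size.
# options(timeout = 3600)
# ossl_l0 <- fetch_csv_gz(
#   "https://storage.googleapis.com/soilspec4gg-public/ossl_all_L0_v1.2.csv.gz"
# )
```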

Use these alternative URLs to access the OSSL as separate files:

Compressed csv
https://storage.googleapis.com/soilspec4gg-public/ossl_soilsite_L0_v1.2.csv.gz
https://storage.googleapis.com/soilspec4gg-public/ossl_soillab_L0_v1.2.csv.gz
https://storage.googleapis.com/soilspec4gg-public/ossl_soillab_L1_v1.2.csv.gz
https://storage.googleapis.com/soilspec4gg-public/ossl_mir_L0_v1.2.csv.gz
https://storage.googleapis.com/soilspec4gg-public/ossl_visnir_L0_v1.2.csv.gz

qs format (preferred in R)
https://storage.googleapis.com/soilspec4gg-public/ossl_soilsite_L0_v1.2.qs
https://storage.googleapis.com/soilspec4gg-public/ossl_soillab_L0_v1.2.qs
https://storage.googleapis.com/soilspec4gg-public/ossl_soillab_L1_v1.2.qs
https://storage.googleapis.com/soilspec4gg-public/ossl_mir_L0_v1.2.qs
https://storage.googleapis.com/soilspec4gg-public/ossl_visnir_L0_v1.2.qs
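The URLs listed above follow a regular naming pattern, which a small helper can reproduce. `ossl_url` is a hypothetical function written for this example, and the pattern is inferred only from the URLs in this section. Not every combination of table and level is listed above (for instance, the spectra are listed only at L0), so a generated URL is not guaranteed to exist.

```r
# Hypothetical helper: build a bucket URL from its parts.
# The pattern is inferred from the URLs listed in this section.
ossl_url <- function(table, level, format = c("qs", "csv.gz"),
                     version = "v1.2") {
  format <- match.arg(format)
  base_url <- "https://storage.googleapis.com/soilspec4gg-public"
  sprintf("%s/ossl_%s_%s_%s.%s", base_url, table, level, version, format)
}

soillab_qs <- ossl_url("soillab", "L1")
all_csv    <- ossl_url("all", "L0", format = "csv.gz")
```

Keeping the version as an argument makes it explicit which release a script depends on, which matters because the bucket contents can change without notice.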

Example with R. Use dataset.code_ascii_txt and id.layer_uuid_txt as the joining columns:

## Packages
library("tidyverse") # provides left_join() through dplyr
library("curl")
library("qs") # >=0.25.5, provides qread_url()

## Separate files: soil laboratory data (L1) and MIR spectra (L0)
soil_url <- "https://storage.googleapis.com/soilspec4gg-public/ossl_soillab_L1_v1.2.qs"
soil <- qread_url(soil_url)

mir_url <- "https://storage.googleapis.com/soilspec4gg-public/ossl_mir_L0_v1.2.qs"
mir <- qread_url(mir_url)

## Join: keep every MIR row and attach the matching soil laboratory data
ossl <- left_join(mir, soil, by = c("dataset.code_ascii_txt", "id.layer_uuid_txt"))
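To see what the left join does to rows without a match, here is a small base R sketch on toy tables. The values and the `oc_pct` column are made up for illustration and do not come from the OSSL. `merge(..., all.x = TRUE)` behaves like `left_join()` here, except that it sorts the result by the key columns.

```r
key_cols <- c("dataset.code_ascii_txt", "id.layer_uuid_txt")

# Toy spectra table: three samples (values are illustrative).
mir_toy <- data.frame(
  dataset.code_ascii_txt = c("DS1", "DS1", "DS2"),
  id.layer_uuid_txt      = c("u1", "u2", "u3"),
  scan_mir.600_abs       = c(0.51, 0.49, 0.47),
  stringsAsFactors = FALSE
)

# Toy laboratory table: sample "u2" has no laboratory record.
soil_toy <- data.frame(
  dataset.code_ascii_txt = c("DS1", "DS2"),
  id.layer_uuid_txt      = c("u1", "u3"),
  oc_pct                 = c(1.2, 3.4), # hypothetical soil property
  stringsAsFactors = FALSE
)

# Duplicated keys in the right-hand table would multiply rows in the join.
stopifnot(!anyDuplicated(soil_toy[key_cols]))

# Left join: every spectra row is kept; unmatched rows get NA lab values.
joined <- merge(mir_toy, soil_toy, by = key_cols, all.x = TRUE)

# Spectra rows that found no laboratory data.
unmatched <- joined[is.na(joined$oc_pct), ]
```

Checking the unmatched rows after the join is a quick way to decide whether to drop them or keep them for tasks that need spectra only.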

3.2 Python package

SoilSpecData is a Python package for handling soil spectroscopy data, with a focus on the Open Soil Spectral Library (OSSL).

Please follow the instructions available on the official package website.