1 Soil spectroscopy
1.1 What is soil spectroscopy
Soil spectroscopy is the measurement of how light interacts with soil. Various methods exist to measure this interaction, with the goal of understanding the mineral and organic composition of soils. The most common technique measures the absorption that occurs when a non-collimated flux of light from the visible and infrared regions of the electromagnetic spectrum is applied to a soil surface. The proportion of the incident radiation reflected by the soil, relative to a reference material, is measured; this forms the basis of diffuse reflectance spectroscopy. The resulting characteristic spectra can then be used to estimate various soil properties, including particle size distribution, minerals, and organic compounds. The OSSL is composed of diffuse reflectance spectra collected across the visible (Vis), near-infrared (NIR), and mid-infrared (MIR) regions.
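The reflectance ratio described above can be sketched in a few lines. This is a minimal illustration, assuming detector signals for the sample and the reference material are available band by band; the function names and toy numbers are hypothetical. The conversion to apparent absorbance, log10(1/R), is a common convention in the field and is not specific to the OSSL.

```python
import math

def relative_reflectance(sample_signal, reference_signal):
    """Reflectance of the soil relative to a reference material, band by band."""
    return [s / r for s, r in zip(sample_signal, reference_signal)]

def apparent_absorbance(reflectance):
    """Apparent absorbance, A = log10(1 / R), for each band."""
    return [math.log10(1.0 / r) for r in reflectance]

# Toy detector readings (arbitrary units, made up for illustration)
sample = [20.0, 50.0, 10.0]
reference = [100.0, 100.0, 100.0]

reflectance = relative_reflectance(sample, reference)
absorbance = apparent_absorbance(reflectance)
```

Lower reflectance in a band corresponds to higher apparent absorbance in that band.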
The visible and near-infrared (VisNIR) range is typically taken as 350–2500 nm, while the mid-infrared (MIR) range covers 2500–25000 nm (conventionally expressed in wavenumbers as 4000–400 cm⁻¹). The OSSL also includes a dedicated near-infrared library built with the Neospectra scanner (Si-Ware), which covers the 1350–2550 nm region.
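The two unit systems are related by wavenumber (cm⁻¹) = 10⁷ / wavelength (nm). The short sketch below applies that relation to the range limits quoted above; the function names are illustrative and do not come from any OSSL package.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    return 1e7 / wavelength_nm

def wavenumber_to_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nanometres."""
    return 1e7 / wavenumber_cm1

# MIR range limits quoted in the text
mir_start = nm_to_wavenumber(2500)    # 4000 cm^-1
mir_end = nm_to_wavenumber(25000)     # 400 cm^-1
```

Because the relation is reciprocal, a spectrum sampled evenly in wavenumbers is not evenly sampled in wavelength, which matters when comparing or resampling VisNIR and MIR data.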

With a collection of soil spectra and corresponding soil properties measured by conventional methods, one can fit predictive models that quantitatively estimate various soil properties for newly scanned soil samples. Compared with traditional soil analysis techniques, this approach is cost-effective, faster, more scalable, and produces no chemical residues. A soil spectral library, such as the OSSL, provides the training data for building and testing predictive models, which are typically developed using machine learning or chemometric algorithms. The effectiveness of these models depends heavily on how well the library represents the variability and diversity of the new soil samples to be predicted.
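To make the idea concrete, here is a deliberately small sketch of prediction from a spectral library using a k-nearest-neighbour average. It is not the modeling framework used with the OSSL; the four-band "spectra", the reference values, and the function names are all made up for illustration. The distance to the closest library spectrum is returned as a rough indicator of how well the library represents the new sample.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(library, new_spectrum, k=2):
    """Predict a soil property as the mean reference value of the k most
    similar library spectra; also return the distance to the closest one."""
    ranked = sorted(library, key=lambda item: euclidean(item[0], new_spectrum))
    nearest = ranked[:k]
    prediction = sum(value for _, value in nearest) / len(nearest)
    nearest_distance = euclidean(nearest[0][0], new_spectrum)
    return prediction, nearest_distance

# Toy library: (spectrum, reference value measured by a conventional method)
toy_library = [
    ([0.10, 0.20, 0.30, 0.40], 1.0),
    ([0.12, 0.22, 0.31, 0.41], 1.2),
    ([0.50, 0.55, 0.60, 0.65], 3.0),
    ([0.52, 0.56, 0.62, 0.66], 3.4),
]

# A new sample that resembles the first two library entries
pred, dist = knn_predict(toy_library, [0.11, 0.21, 0.30, 0.40], k=2)

# A new sample unlike anything in the library: the large distance is a warning
_, far_dist = knn_predict(toy_library, [0.90, 0.95, 0.97, 0.99], k=2)
```

A large nearest-neighbour distance suggests the library does not cover that kind of sample, so the prediction should be treated with caution.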
1.2 Estimating soil properties
Each soil spectral library imported into the OSSL used distinct procedures for analytically determining reference values, and the resulting incompatibility has been a subject of internal discussion in this project. Some global initiatives face the same issue in their soil databases, but there is still no clear consensus on how to harmonize the different methods. The topic is under active discussion and research at the Global Soil Partnership’s Global Soil Laboratory Network (GLOSOLAN).
To maximize transparency, we have decided, for now, to produce two levels of the OSSL database. Level 0 retains the original methods employed in each dataset but first attempts to match them to two reference lists: KSSL Guidance – Laboratory Methods and Manuals, and ISO standards. A copy of the KSSL procedures and coding scheme is archived in ossl-imports.
If a reference method does not match any method in those lists, we create a new variable that shares at least a common property and unit. Final harmonization takes place in OSSL Level 1, where common properties measured with different methods are converted to a target method using a publicly available transformation rule or, in the worst case, are either naively bound together or kept separate so that each gets its own model. All implementations are documented in ossl-import/ossl_level0_to_level1_soillab_harmonization.csv.
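The three Level 1 outcomes (convert to a target method, bind naively, or keep separate) can be pictured as a small rule table. The sketch below is hypothetical: the variable names, the rule table, and the use of the conventional 1.724 organic matter to organic carbon factor are assumptions made for illustration, not the rules stored in the harmonization CSV file.

```python
# Illustrative only: variable names, rules, and the 1.724 factor are
# assumptions for this sketch, not the contents of the OSSL harmonization file.
VAN_BEMMELEN = 1.724  # conventional organic matter / organic carbon ratio

# Level 0 variable -> (action, Level 1 target variable, transformation)
RULES = {
    "om_loi_pct": ("convert", "oc_pct", lambda v: v / VAN_BEMMELEN),
    "oc_drycomb_pct": ("bind", "oc_pct", lambda v: v),
    "ph_cacl2": ("separate", "ph_cacl2", lambda v: v),
}

def harmonize(level0_records):
    """Map (variable, value) pairs from Level 0 onto Level 1 variables."""
    level1 = []
    for variable, value in level0_records:
        action, target, transform = RULES[variable]
        level1.append(
            {"variable": target, "value": transform(value), "action": action}
        )
    return level1

level0 = [("om_loi_pct", 3.448), ("oc_drycomb_pct", 2.1), ("ph_cacl2", 5.6)]
level1 = harmonize(level0)
```

Keeping the action alongside each harmonized value preserves a record of which values were converted and which were simply merged.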
In addition, GLOSOLAN’s Standard Operating Procedures (SOPs) list four groups of soil variables of interest to international soil spectroscopy projects:
Soil chemical variables:
- pH,
- Carbon,
- Phosphorus,
- Potassium,
- Nitrogen,
- Exchangeable cations and CEC,
- Extractable microelements,
- Trace and major element analyses,
- Gypsum,
- Electrical conductivity and total soluble salt content,
- Soluble sulfate and chloride analysis,
- Special analysis for peats, mineral and organic soils, agriculture and forest.
Soil physical variables:
- Bulk density,
- Coarse fragments,
- Particle-size distribution,
- Water retention curve,
- Porosity,
- Hydraulic conductivity function,
- Aggregate stability,
- Moisture content.
Soil biological variables:
- Microbial biomass,
- Soil respiration,
- Enzyme activity,
- Microbial identification.
Soil contaminants:
- Heavy metal elements: As, Hg, Cu, Cd, Pb and similar,
- Other soil pollutants.
1.3 Predictive modeling
From initial findings with the OSSL [1], the MIR range appears, on average, to be the best spectral region for developing spectral prediction models, followed by VisNIR and NIR (Neospectra). This is because the MIR contains several fundamental, well-resolved absorption features from mineral and organic functional groups, which translate into better prediction capacity despite the interpretation challenges that stem from chemical heterogeneity. VisNIR and NIR spectra, in turn, are made up of overtones of the fundamental vibrations of the MIR range; they are therefore less sensitive to soil constituents and may give inferior performance. In addition, we found good performance for some soil properties that may not directly affect soil spectra but can be indirectly inferred and quantified (secondary properties), such as cation exchange capacity, pH, and soil contaminants. Understanding both primary and secondary soil components helps to identify the factors that improve spectral predictive models within the complex context of soil systems, and to select the spectral range where those components are most pronounced.
We recommend visiting the dedicated tutorial, where we walk through the modeling framework employed with the OSSL. Its introduction section explains why soil spectroscopy is a fit-for-purpose technology, why good predictions flow only from good data, and which best practices apply to model calibration.
The OSSL paper [1] also describes and validates the OSSL models in detail.
1.4 Instruments
A Global Soil Spectroscopy Assessment was published by FAO in 2021 [2], identifying a number of instruments in use in soil spectroscopy laboratories. Within the SS4GG initiative, we published a paper on the spectral dissimilarity of 20 MIR instruments [3], with clear recommendations for overcoming that dissimilarity. A second analysis, covering VisNIR instruments, is being developed in collaboration with the IEEE P4005 working group.

1.5 Software
Many proprietary and open-source software tools are used to process spectral measurements. Below is a non-exhaustive list of the most common packages and modules available in R or Python (both FOSS) for importing, preprocessing, and analyzing soil spectral data:
1.5.1 SoilSpecData
- 📛 Name: SoilSpecData
- 💼 Specialty: A Python package for handling soil spectroscopy data, with a focus on the Open Soil Spectral Library (OSSL).
- 💻 Programming language: Python
- 🔗 Homepage: Website, GitHub
- 📕 Albinet, F. SoilSpecData: A Python package for handling soil spectroscopy data, with a focus on the Open Soil Spectral Library (OSSL). Python package. Available at: https://fr.anckalbi.net/soilspecdata/.
- ©️ License: Apache-2.0 license
- 📧 Maintainer: Franck Albinet
1.5.2 SoilSpecTfm
- 📛 Name: SoilSpecTfm
- 💼 Specialty: Scikit-Learn compatible transforms for spectroscopic data preprocessing.
- 💻 Programming language: Python
- 🔗 Homepage: Website, GitHub
- 📕 Albinet, F. SoilSpecTfm: Provides Scikit-Learn compatible transforms for spectroscopic data preprocessing. Python package. Available at: https://github.com/franckalbinet/soilspectfm.
- ©️ License: Apache-2.0 license
- 📧 Maintainer: Franck Albinet
1.5.3 opusreader2
- 📛 Name: opusreader2
- 💼 Specialty: Read OPUS binary files from Fourier-Transform Infrared (FT-IR) spectrometers of the company Bruker Optics GmbH & Co. in R
- 💻 Programming language: R
- 🔗 Homepage: GitHub
- 📕 Baumann P, Knecht T, Roudier P (2023). opusreader2: Read spectroscopic data from Bruker OPUS binary Files. R package version 0.6.3.
- ©️ License: MIT
- 📧 Maintainer: spectral-cockpit
1.5.4 asdreader
- 📛 Name: asdreader
- 💼 Specialty: Reading ASD binary files in R
- 💻 Programming language: R
- 🔗 Homepage: GitHub
- 📕 Roudier, P. (2020). asdreader: reading ASD binary files in R. R package version 0.1-3 CRAN.
- ©️ License: GPL
- 📧 Maintainer: Pierre Roudier
1.5.5 prospectr
- 📛 Name: prospectr
- 💼 Specialty: Signal processing, resampling
- 💻 Programming language: R
- 🔗 Homepage: GitHub
- 📕 Stevens, A., & Ramirez-Lopez, L. (2022). An introduction to the prospectr package. R Package Vignette. R Package Version 0.2.6.
- ©️ License: MIT + file LICENSE
- 📧 Maintainer: Leonardo Ramirez-Lopez
1.5.6 simplerspec
- 📛 Name: simplerspec
- 💼 Specialty: Soil and plant spectroscopic model building and prediction
- 💻 Programming language: R
- 🔗 Homepage: GitHub
- 📕 Baumann, P. (2020). simplerspec: Soil and plant spectroscopic model building and prediction. R package.
- ©️ License: GNU General Public License v3.0
- 📧 Maintainer: Philipp Baumann
1.5.7 resemble
- 📛 Name: resemble
- 💼 Specialty: Memory-based learning in spectral chemometrics
- 💻 Programming language: R
- 🔗 Homepage: GitHub
- 📕 Ramirez-Lopez, L., Stevens, A., Viscarra Rossel, R., Lobsey, C., Wadoux, A., & Breure, T. (2022). resemble: Regression and similarity evaluation for memory-based learning in spectral chemometrics. R package vignette. R package version 2.2.1.
- ©️ License: MIT + file LICENSE
- 📧 Maintainer: Leonardo Ramirez-Lopez
1.5.8 mdatools
- 📛 Name: mdatools
- 💼 Specialty: A package for preprocessing, exploring and analysis of multivariate data. The package provides methods mostly common for Chemometrics.
- 💻 Programming language: R
- 🔗 Homepage: Website, GitHub
- 📕 Kucheryavskiy, S. (2020). mdatools – R package for chemometrics. Chemometrics and Intelligent Laboratory Systems: An International Journal Sponsored by the Chemometrics Society, 198(103937), 103937. doi:10.1016/j.chemolab.2020.103937.
- ©️ License: MIT
- 📧 Maintainer: Sergey Kucheryavskiy
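
Independently of the packages listed in this section, the kind of preprocessing step they automate can be illustrated with the standard library alone. The sketch below applies a standard normal variate (SNV) transform, one common scatter-correction technique chosen here as an example; it is not taken from the OSSL workflow, and the function name and toy spectrum are hypothetical.

```python
import statistics

def standard_normal_variate(spectrum):
    """Centre a single spectrum on its own mean and scale it by its own
    sample standard deviation (SNV)."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(value - mean) / sd for value in spectrum]

# Toy three-band spectrum (made-up reflectance values)
raw = [0.2, 0.4, 0.6]
snv = standard_normal_variate(raw)
```

After the transform each spectrum has zero mean and unit standard deviation, which reduces offsets and scaling differences between samples. In practice, the dedicated packages above are the better choice for real data.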