Our paper has been accepted for publication in Nature Scientific Data! This work is a collaboration with the Universidad Industrial de Santander in Colombia.

Abstract

Assessing cocoa bean quality using spectral information offers a non-invasive and objective alternative to traditional, often subjective and destructive, methods. However, progress has been limited by the lack of comprehensive datasets across multiple spectral resolutions. This work presents a new dataset capturing the spectral properties of cocoa beans at different spatiospectral resolutions, enabling non-invasive quality assessment and scalable evaluation methodologies. It comprises 19 scenes acquired with four imaging devices under both open (invasive) and closed (non-invasive) conditions, along with corresponding physicochemical measurements. Data collection follows the Colombian standard NTC 1252:2021, which labels beans as well, partially, or poorly fermented. Global physicochemical properties (moisture, polyphenols, and cadmium) were measured using gravimetric analysis, UV-visible spectroscopy, and atomic absorption spectroscopy with microwave digestion. Hyperspectral images were obtained using four devices covering up to the 350–1000 nm spectral range. Statistical analysis shows the dataset distinguishes between cocoa quality levels under both open and closed conditions, supporting the development of automated classification methods.


Acknowledgements

The authors thank Dr. Bernard Schmitt from the Institut de Planétologie et d'Astrophysique de Grenoble (IPAG) for providing the Specim IQ camera. Data acquisitions were carried out with the support of the Multi-camera Imaging Research and Acquisition (MIRA) Platform of GIPSA-lab.

This work has been partially supported by the ECOS Nord project no. C24M01; by the French National Research Agency (ANR) under grants ANR-15-IDEX-02, ANR-20-ASTR-0006, and ANR-23-IACL-0006; and by the Institut Carnot Logiciels et Systèmes Intelligents and LabEx PERSYVAL.

In addition, the authors thank cocoa field expert Miguel Beltran, whose expertise contributed to the contextualization and documentation of the cocoa quality parameters in this dataset. Additional thanks go to Juan Daniel Suarez Jaimes and Juan Sebastian Espinosa Espinosa for their part in the labeling process, and to the optics laboratory of the High Dimensional Signal Processing (HDSP) group for the acquisition process.