Local-based chemometric methods as a solution for “big NIR data”
Near-infrared (NIR) spectroscopy is widely used in many fields because of its speed, non-destructive nature, environmental friendliness and simplicity. As new NIR technology becomes more accessible, NIR spectra are increasingly part of the “big data” world.
Recent improvements include portable instruments that allow data to be collected outside the lab, as well as imaging systems that allow even larger quantities of data to be collected. The objective is to use such large NIR datasets quickly and effectively, directly on-line, through a cloud service open to potential users. The challenge is to provide a fast and precise prediction service through the cloud while protecting the raw data. At the CRA-W, in collaboration with several foreign institutes, different “local-based” chemometric tools applied to NIR data have been proposed to speed up the modeling and prediction processes: Local Partial Least Squares (LPLS) and Local Partial Least Squares using Scores (LPLS-S). These “local-based” approaches have been tested on real data sets and compared with the classical global PLS method. The studies concerned the quantification of characteristic quality parameters in corn seeds and the prediction of the total β-carotene content of cassava roots.
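The local idea can be illustrated with a minimal sketch: for each new spectrum, select its nearest neighbours in the library and fit a small PLS model on those neighbours only. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names (`pls1_fit`, `local_pls_predict`), the Euclidean neighbour criterion, and the values of `k` and `n_components` are all hypothetical choices, and the PLS step is a plain single-response NIPALS written in NumPy.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS with deflation).

    Returns (beta, x_mean, y_mean) so that a prediction is
    (x - x_mean) @ beta + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc                      # weight vector (covariance direction)
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xc @ w                         # scores
        tt = t @ t
        p = Xc.T @ t / tt                  # X loadings
        c = yc @ t / tt                    # y loading
        Xc = Xc - np.outer(t, p)           # deflate X
        yc = yc - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W = np.array(W).T
    P = np.array(P).T
    q = np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X space
    return beta, x_mean, y_mean


def local_pls_predict(X_lib, y_lib, x_query, k=50, n_components=5):
    """Predict one sample from a PLS model built on its k nearest neighbours.

    Assumption: neighbours are chosen by Euclidean distance between spectra.
    """
    X_lib = np.asarray(X_lib, dtype=float)
    y_lib = np.asarray(y_lib, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    dist = np.linalg.norm(X_lib - x_query, axis=1)
    idx = np.argsort(dist)[:k]
    beta, x_mean, y_mean = pls1_fit(X_lib[idx], y_lib[idx], n_components)
    return float((x_query - x_mean) @ beta + y_mean)


if __name__ == "__main__":
    # Synthetic stand-in data (not real NIR spectra), for illustration only.
    rng = np.random.default_rng(0)
    X_lib = rng.normal(size=(500, 20))
    y_lib = np.sin(X_lib[:, 0]) + 0.1 * X_lib[:, 1]   # non-linear response
    x_new = rng.normal(size=20)
    print(local_pls_predict(X_lib, y_lib, x_new, k=60, n_components=4))
```

Because each local model is fitted on only `k` library samples, the cost per prediction depends mainly on the neighbour search rather than on a model built over the whole library, and a locally linear fit can follow a relationship that is non-linear over the full data set.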
In all cases, these strategies proved to be an efficient alternative to classical global models for optimizing predictions. The results showed that local approaches could handle the non-linearity problem while drastically reducing calculation time without losing prediction accuracy.
These methods make it possible not only to obtain quantitative predictions with improved performance compared to classical regression methods, but also to extend prediction to more than one product from a single large data set. The spectral library can therefore be multi-product, which can lead to the development of unified predictions, with consequent savings in the time and effort required to develop and maintain individual calibration models. Last but not least, the proposed methods work not with the original NIR spectra but with compressed data, which helps protect the raw data.
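Working on compressed data rather than raw spectra can be sketched as follows. This is a hedged illustration only: the abstract does not specify how the compression is done, so principal component scores computed by SVD are used here as a stand-in, and an ordinary least-squares fit on the neighbours' scores stands in for the local regression step. The names `compress_library` and `predict_from_scores` and the parameters `n_scores` and `k` are hypothetical.

```python
import numpy as np


def compress_library(X_lib, n_scores):
    """Compress spectra into a few scores per sample.

    Assumption: PCA via SVD is used as the compression; the source does not
    state which decomposition the authors' methods rely on.
    """
    X_lib = np.asarray(X_lib, dtype=float)
    x_mean = X_lib.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_lib - x_mean, full_matrices=False)
    loadings = Vt[:n_scores].T                  # (n_wavelengths, n_scores)
    scores = (X_lib - x_mean) @ loadings        # (n_samples, n_scores)
    return scores, loadings, x_mean


def predict_from_scores(scores_lib, y_lib, t_query, k=50):
    """Local prediction using only scores: neighbours are found in score
    space and a least-squares model with intercept is fitted on them."""
    scores_lib = np.asarray(scores_lib, dtype=float)
    y_lib = np.asarray(y_lib, dtype=float)
    t_query = np.asarray(t_query, dtype=float)
    dist = np.linalg.norm(scores_lib - t_query, axis=1)
    idx = np.argsort(dist)[:k]
    A = np.column_stack([np.ones(len(idx)), scores_lib[idx]])
    coef, *_ = np.linalg.lstsq(A, y_lib[idx], rcond=None)
    return float(coef[0] + t_query @ coef[1:])


if __name__ == "__main__":
    # Synthetic stand-in data (not real NIR spectra), for illustration only.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(400, 5))
    X_lib = latent @ rng.normal(size=(5, 100))
    y_lib = latent[:, 0] ** 2 + latent[:, 1]
    scores, loadings, x_mean = compress_library(X_lib, n_scores=5)
    x_new = rng.normal(size=5) @ rng.normal(size=(5, 100))
    t_new = (x_new - x_mean) @ loadings         # project the query spectrum
    print(predict_from_scores(scores, y_lib, t_new, k=60))
```

In this sketch the prediction step touches only the score matrix and the reference values; the original library spectra are needed once, at compression time, which is one way a service could avoid exposing the raw data.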