Fibrous plant biomasses have great potential as a source of renewable fuels and chemicals because they are widely available and sustainable. They therefore represent an important biomass resource for a bio-based economy [1,2]. Fibrous plant biomasses need to be analyzed to reliably estimate the available amounts and quality of their chemical components, to assess their suitability for conversion into biofuels and to optimize these conversion processes [1,2]. Currently, standard wet chemical methods are used to determine the chemical characteristics of plant biomasses. These methods are reliable, but they are often tedious, resource- and time-consuming, expensive and/or reliant on hazardous chemicals [3,4]. Near infrared (NIR) spectroscopy is a simple, fast, cheap, clean, non-destructive and reliable alternative to wet chemical methods, and it is widely used for the quantitative and qualitative analysis of pharmaceutical, food, feed and plant products [3,4]. NIR predictions are generally based on linear multivariate models such as partial least squares (PLS) regression. To achieve realistic prediction performances in terms of prediction error, these models require a large number of samples that are representative of the variability of the whole population and thus cover its spectral space. To improve the accuracy of a model built on such a large dataset (e.g. multiproduct), the dataset has to be split into small specific datasets (e.g. per type of product or species), which reduces the non-linearity present in the large dataset. A local (specific regression, non-linear) method, such as the local method of Shenk et al., removes the need to split a dataset into small specific datasets.
For each sample to be predicted, this local method selects its most similar spectral neighbors in the library, i.e. the library spectra most highly correlated with its spectrum, and builds a specific PLS regression from this small number of samples. Because a specific regression is built for each sample, the local method can be accurate in terms of prediction error on a large dataset (e.g. multiproduct) while keeping realistic prediction performances. It can thus cope with the non-linearity and non-homogeneity of a large dataset [4,5,6].
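The neighbor-selection-then-regression loop described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the LOCAL implementation of Shenk et al.: it uses a plain NIPALS PLS1 regression (not the modified PLS algorithm), Pearson correlation between spectra as the similarity criterion, and fixed, hypothetical values for the number of neighbors and PLS components. The helper names `pls1_fit` and `local_predict` are ours, and the toy data below has only five "wavelengths".

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS regression (NIPALS); return (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centred working copies, deflated below
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)               # weight vector
        t = Xk @ w                           # scores
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        q = (yk @ t) / tt                    # y loading
        Xk = Xk - np.outer(t, p)             # deflate X and y
        yk = yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # B = W (P'W)^-1 q
    return coef, y_mean - x_mean @ coef


def local_predict(X_lib, y_lib, x_new, n_neighbors=30, n_components=5):
    """Predict one sample from a PLS model fitted on its most correlated library spectra."""
    X_lib, y_lib, x_new = (np.asarray(a, dtype=float) for a in (X_lib, y_lib, x_new))
    # Pearson correlation between the new spectrum and every library spectrum
    Xc = X_lib - X_lib.mean(axis=1, keepdims=True)
    xc = x_new - x_new.mean()
    r = (Xc @ xc) / (np.linalg.norm(Xc, axis=1) * np.linalg.norm(xc))
    neighbors = np.argsort(r)[::-1][:n_neighbors]   # highest correlations first
    coef, intercept = pls1_fit(X_lib[neighbors], y_lib[neighbors], n_components)
    return float(x_new @ coef + intercept)


# Toy check: y is exactly linear in X, so a full-rank local PLS recovers it
rng = np.random.default_rng(1)
X_lib = rng.normal(size=(100, 5))
beta = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y_lib = X_lib @ beta + 1.0
x_new = rng.normal(size=5)
y_hat = local_predict(X_lib, y_lib, x_new, n_neighbors=20, n_components=5)
```

Because a new regression is fitted for every predicted sample, prediction costs more computation than applying one global PLS model. In practice `n_neighbors` and `n_components` would be tuned; the values above are placeholders, not those used in this study.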
The aim of this study was to compare the reliability of local (specific regression, non-linear) and PLS (linear regression) NIR models for predicting the main chemical characteristics of fibrous plant biomasses from multispecies datasets.
2 Material and methods
The analyzed biomasses consisted of fiber corn, fiber sorghum, grasses (tall fescue, cocksfoot, immature rye, immature spelt), hemp, Jerusalem artichoke leaves and stalks, Miscanthus × giganteus, spelt straw and switchgrass, sampled from trials performed at different sites and with different harvest periods, cultivars and/or nitrogen fertilization levels. The harvested samples were dried at 60°C for 72 h and milled to pass through a 1 mm screen.
The chemical characteristics analyzed in the biomasses were NDF (neutral detergent fiber residue; n=1169; median=66.77; median SD=18.04; unit=g 100 g⁻¹ DM), ADF (acid detergent fiber residue; n=1167; median=42.31; median SD=15.08; SEL=0.30; unit=g 100 g⁻¹ DM), ADL (acid detergent lignin; n=1167; median=6.54; median SD=3.91; SEL=0.15; unit=g 100 g⁻¹ DM) and mineral compounds (n=1377; median=6.53; median SD=3.38; SEL=0.10; unit=g 100 g⁻¹ DM), where SEL is the standard error of the laboratory reference method. NDF, ADF and ADL were determined by the Van Soest (VS) method. These residues can be used to estimate the cellulose VS (ADF-ADL), hemicelluloses VS (NDF-ADF) and lignin VS (ADL) contents of plant biomasses. The mineral compounds content was determined by ashing in a muffle furnace at 550°C for 3 h, and the dry matter (DM) content by drying at 103°C for 4 h.
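The Van Soest fractions follow directly from the three detergent residues. A minimal sketch (the function name is hypothetical), applied here to the three reported medians purely as illustrative numbers, since the median of a difference is not in general the difference of the medians:

```python
def van_soest_fractions(ndf, adf, adl):
    """Estimate Van Soest (VS) fractions from detergent residues.

    All inputs and outputs share one unit, e.g. g 100 g-1 DM.
    """
    return {
        "cellulose_VS": adf - adl,       # ADF - ADL
        "hemicelluloses_VS": ndf - adf,  # NDF - ADF
        "lignin_VS": adl,                # ADL
    }


fractions = van_soest_fractions(ndf=66.77, adf=42.31, adl=6.54)
print({name: round(value, 2) for name, value in fractions.items()})
# {'cellulose_VS': 35.77, 'hemicelluloses_VS': 24.46, 'lignin_VS': 6.54}
```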
NIR reflectance spectra were acquired with a NIRSystems 5000 spectrometer (FOSS, Hillerød, Denmark). Each spectrum covered the 1100-2498 nm range and was the average of 32 scans. The spectra were normalized by a standard normal variate (SNV) transformation followed by a first-order derivative (1, 4, 4, 1: first derivative, 4 nm gap, 4-point first smoothing, 1-point second smoothing).
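The pretreatment chain can be sketched with NumPy. Assumptions: spectra are stored one per row; SNV uses the sample SD (ddof=1); smoothing is a simple moving average; and the derivative is the difference between points `gap` data points apart. This is one common reading of the "1, 4, 4, 1" notation, not FOSS's exact software implementation. How the gap stated in nm maps to data points depends on the sampling interval, so the example simply passes the codes as data-point counts on an assumed 2 nm grid.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / sd


def moving_average(spectra, window):
    """Boxcar smoothing along the wavelength axis; window=1 leaves the data unchanged."""
    kernel = np.ones(window) / window
    return np.array([np.convolve(row, kernel, mode="valid") for row in spectra])


def gap_derivative(spectra, gap, smooth1, smooth2):
    """First-order gap derivative: smooth, subtract points `gap` apart, smooth again."""
    smoothed = moving_average(spectra, smooth1)
    diff = smoothed[:, gap:] - smoothed[:, :-gap]
    return moving_average(diff, smooth2)


# Toy example: three random "spectra" on an assumed 2 nm grid (1100-2498 nm, 700 points)
wavelengths = np.arange(1100, 2500, 2)
rng = np.random.default_rng(0)
raw = rng.random((3, wavelengths.size))
pretreated = gap_derivative(snv(raw), gap=4, smooth1=4, smooth2=1)
print(pretreated.shape)  # (3, 693)
```

Each smoothing and differencing step shortens the spectrum (here 700 → 697 → 693 points), so wavelength bookkeeping has to follow the same trimming.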
Prediction models were developed with the local method (specific regression, non-linear procedure) and with partial least squares (PLS) regression (modified PLS algorithm; linear regression model).
The prediction performances of the models were evaluated with highly independent validation (V) datasets in addition to a full leave-one-out cross-validation (CV-LOO). CV-LOO assesses the prediction error for the samples of the library. Validation V1 assesses the prediction error for future new samples of plant species contained in the library. Validation V2 assesses the prediction error for future new samples of plant species not contained in the library but similar to those that are.
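The three validation schemes can be illustrated by how samples would be assigned to them. The sketch below is hypothetical: the `Sample` record, its fields and the V1 rule (the sample's cropping site, year and harvest period combination does not occur in the calibration set) are our interpretation of the text, not the authors' procedure, and the similarity of V2 species to the library species is not checked here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical metadata record for one biomass sample."""
    species_group: str
    site: str
    year: int
    harvest: str


def loo_splits(n):
    """CV-LOO: yield (training indices, test indices); each sample is left out once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def validation_set(calibration, candidate):
    """Return 'V1', 'V2', or None if the candidate is not independent of the calibration set."""
    if candidate.species_group not in {s.species_group for s in calibration}:
        return "V2"  # species group absent from the library
    origins = {(s.site, s.year, s.harvest) for s in calibration}
    if (candidate.site, candidate.year, candidate.harvest) not in origins:
        return "V1"  # known species group, unseen site/year/harvest combination
    return None      # same trial origin as calibration samples: not independent


calibration = [Sample("miscanthus", "A", 2011, "autumn"), Sample("hemp", "B", 2011, "summer")]
print(validation_set(calibration, Sample("miscanthus", "C", 2012, "autumn")))   # V1
print(validation_set(calibration, Sample("switchgrass", "A", 2011, "autumn")))  # V2
```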
To evaluate the prediction performances of the models, the following parameters were determined: the coefficient of determination of prediction based on medians (R2Med); the median standard residual error of prediction (MedRE); the ratio of the median standard deviation of the variable (SDMed) to the MedRE (RPDMed); and the median Mahalanobis spectral distance (GHMed). These parameters were based on medians so that they are robust to outliers and so that samples with high residuals did not have to be deleted subjectively.
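The exact formulas of these median-based statistics are not given here, so the following sketch uses assumed definitions: MedRE as the median absolute residual and SDMed as the median absolute deviation of the reference values, both scaled by 1.4826 so that they approximate SDs for normally distributed data; RPDMed = SDMed / MedRE; R2Med = 1 − (MedRE / SDMed)²; and GH as the squared Mahalanobis distance in the space of the first principal components of the calibration spectra, divided by the number of components so that it averages about 1 over the calibration samples. GHMed is then the median of these GH values.

```python
import numpy as np

ROBUST_SD = 1.4826  # scales a median absolute deviation to an SD for normal data (assumption)


def median_metrics(y_ref, y_pred):
    """Assumed median-based analogues of R2, SEP and RPD."""
    y_ref, y_pred = np.asarray(y_ref, dtype=float), np.asarray(y_pred, dtype=float)
    med_re = ROBUST_SD * np.median(np.abs(y_ref - y_pred))
    sd_med = ROBUST_SD * np.median(np.abs(y_ref - np.median(y_ref)))
    return {
        "R2Med": 1.0 - (med_re / sd_med) ** 2,
        "MedRE": med_re,
        "RPDMed": sd_med / med_re,
    }


def gh_distances(X_cal, X_new, n_components):
    """Squared Mahalanobis distance in calibration PCA score space, divided by n_components."""
    X_cal = np.asarray(X_cal, dtype=float)
    mean = X_cal.mean(axis=0)
    _, s, vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    loadings = vt[:n_components].T
    score_var = s[:n_components] ** 2 / (len(X_cal) - 1)  # variance of each PC score
    scores = (np.asarray(X_new, dtype=float) - mean) @ loadings
    return (scores ** 2 / score_var).sum(axis=1) / n_components


rng = np.random.default_rng(0)
X_cal = rng.normal(size=(50, 10))
X_val = rng.normal(size=(20, 10))
gh_med = float(np.median(gh_distances(X_cal, X_val, n_components=3)))
metrics = median_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1])
```

Under these definitions R2Med and RPDMed are linked by R2Med = 1 − 1/RPDMed², mirroring the usual relation between R² and RPD for the mean-based statistics.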
3 Results and discussion
The NIR models developed from multispecies datasets were reliable for predicting the main chemical characteristics of the various fibrous plant species (Table 1).
The local models were more reliable than the PLS models in terms of prediction error (Table 1) because the local method can cope with the non-linearity and non-homogeneity of a large multispecies dataset.
The degree of independence of the validation set with regard to the calibration set had a major impact on the prediction performances, especially for the local method (Table 1). The local method was more affected because its specific regressions use fewer samples. The reliability of both local and PLS models decreased as the degree of independence of the validation set increased, i.e. as the predicted samples became less similar to the calibration samples (Table 1). The degree of independence is lowest for the cross-validation CV-LOO, higher for validation V1 and highest for validation V2. Cross-validation CV-LOO gives an over-optimistic (overestimated) estimate of accuracy compared with a more usual, independent validation set such as V1, especially for the local method (Table 1). An independent validation such as V1 should therefore be used to determine the prediction performances of NIR models for agricultural products. The V1 set consists of independent, representative samples that do not come from the same cropping site, year or harvest period as the samples of the calibration set. Given their degree of independence, the prediction performance of a validation such as V1 (calibration containing samples of the predicted plant species group) should be considered for future new samples of plant species contained in the library, whereas that of a validation such as V2 (calibration not containing samples of the predicted plant species group) should be considered for future new samples of plant species not contained in the library but similar to those that are.
Adding a few independent samples of the predicted plant species group to the V2 calibration set (which initially contains no samples of that group) improved the prediction performances of the multispecies models, especially for the local method (Table 2). However, these performances began to stabilize with the last additions (20 and 25 samples) (Table 2).
The type of NIR models developed in the present study, especially the local models, can be used for screening, ranking and quantitative analysis of the main chemical component contents of the fibrous biomasses in the library, and for assessing their suitability for conversion into biofuels. The local method is also useful for predicting a given plant species when only a few samples of that species are present in a large multispecies dataset of similar plant species. This allows fast, cost-effective NIR screening, ranking and quantitative analysis of the main chemical characteristics of new plant biomasses that are similar to those of the library.
[1] Kamm, B. & Kamm, M. Principles of biorefineries. Appl. Microbiol. Biotechnol. 64, 137-145, 2004.
[2] Godin, B., Lamaudière, S., Agneessens, R., Schmit, T., Goffart, J.-P., Stilmants, D., Gerin, P. & Delcarte, J. Chemical composition and biofuel potentials of a wide diversity of plant biomasses. Energy and Fuels 27, 2588-2598, 2013.
[3] Bertrand, D. & Dufour, E. La spectroscopie infrarouge et ses applications analytiques, 2nd edition. Lavoisier, Paris, France, 2006.
[4] Godin, B., Mayer, F., Agneessens, R., Gerin, P., Dardenne, P., Delfosse, P. & Delcarte, J. Biochemical methane potential prediction of plant biomasses: comparing chemical composition versus near infrared methods and linear versus non-linear models. Bioresour. Technol. doi: 10.1016/j.biortech.2014.10.115, 2014.
[5] Berzaghi, P., Shenk, J. & Westerhaus, M. Local prediction with near infrared multi-product databases. J. Near Infrared Spectrosc. 8, 1-9, 2000.
[6] Shenk, J., Westerhaus, M. & Berzaghi, P. Investigation of a local calibration procedure for near infrared instruments. J. Near Infrared Spectrosc. 5, 223-232, 1997.