Validation and verification of regression in small data sets

Martens, H. & Dardenne, P. (1998). Validation and verification of regression in small data sets. Chemom. Intell. Lab. Syst. 44(1-2), 99-121.

Type	Journal Article
Year	1998
Title	Validation and verification of regression in small data sets
Journal	Chemom. Intell. Lab. Syst.
Label	U15-0454-Dardenne-1998
Recnumber	430
Volume	44
Issue	1-2
Pages	99-121
Date	14/12/1998
Endnote keywords	Pr-STRATFEED Do-Animal meal Do-Animal feed Th-Quality, traceability and food safety
Previous label	Stratfeed-26
Endnote Keywords	Small data sets | Multivariate modelling | Monte Carlo | Multivariate calibration | PLS | Regression | Small sample statistics
Abstract	Four different methods of using small data sets in multivariate modelling are compared with respect to their predictive precision in the long run. The modelling in this case concerns multivariate calibration: y = f(X). The study consists of a Monte Carlo simulation within a large database of real data: X = NIR reflectance spectra and y = protein percentage, measured in 922 whole maize plant samples. Small data sets (40 to 120 objects) were repeatedly selected at random from the database, each time simulating the situation of having only a small set of samples available for estimating, optimizing and assessing the calibration model. The 'true' apparent prediction error was each time checked against the remaining database. This was replicated 100 times in order to study the statistical performance of the four different validation methods. In each Monte Carlo replicate, splitting the available data set into a calibration set and a test set was compared to full cross validation. The results demonstrated that removing samples from an already limited set of available samples to form an independent VALIDATION TEST SET seriously reduced the predictive performance of the calibrated models, and at the same time gave an uncertain, systematically over-optimistic assessment of the models' predictive performance. Full CROSS VALIDATION gave improved predictive performance, and only a slightly over-optimistic assessment of this predictive performance. Removing even more of the available samples to form an independent VERIFICATION TEST SET gave estimates of the models' predictive performance that were correct in the long run, although uncertain, but the performance level itself had seriously deteriorated. Alternative verification of the models' predictive performance by the method of CROSS VERIFICATION gave results very similar to those of cross validation. These results from real data correspond closely to previous findings for artificially simulated data.
It appears that full cross validation is superior both to the use of an independent validation test set and to the use of an independent verification test set.
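The comparison described in the abstract (repeatedly drawing a small random subset from a large database, calibrating on it, and checking the error on the remaining samples) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the data are a hypothetical synthetic stand-in for the maize NIR database, "full cross validation" is taken to mean leave-one-out, the test-set split (one third of the small set) and the helper names (`pls1_coefs`, `loo_rmsecv`, `monte_carlo`, `make_database`) are illustrative choices, and PLS1 is implemented directly with NIPALS in NumPy. Only two of the four methods (full cross validation and an independent validation test set) are shown.

```python
import numpy as np


def pls1_coefs(X, y, max_comp):
    """PLS1 via NIPALS. Returns coefficient vectors for 1..max_comp components, plus means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(max_comp):
        w = E.T @ f
        w = w / np.linalg.norm(w)          # weight vector
        t = E @ w                          # score vector
        tt = t @ t
        P.append(E.T @ t / tt)             # X loading
        Q.append(f @ t / tt)               # y loading
        W.append(w)
        E = E - np.outer(t, P[-1])         # deflate X
        f = f - Q[-1] * t                  # deflate y
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # Regression coefficients B_a = W_a (P_a' W_a)^-1 q_a for each model size a
    coefs = [W[:, :a] @ np.linalg.solve(P[:, :a].T @ W[:, :a], Q[:a])
             for a in range(1, max_comp + 1)]
    return coefs, x_mean, y_mean


def predict(X, coef, x_mean, y_mean):
    return (X - x_mean) @ coef + y_mean


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def loo_rmsecv(X, y, max_comp):
    """Leave-one-out ('full') cross-validation RMSE for 1..max_comp components."""
    n = len(y)
    press = np.zeros(max_comp)
    for i in range(n):
        keep = np.arange(n) != i
        coefs, xm, ym = pls1_coefs(X[keep], y[keep], max_comp)
        for a, c in enumerate(coefs):
            press[a] += (y[i] - predict(X[i:i + 1], c, xm, ym)[0]) ** 2
    return np.sqrt(press / n)


def make_database(n=922, p=60, n_latent=5, noise=0.3, rng=None):
    """Hypothetical synthetic 'database' (NOT the paper's maize NIR data)."""
    rng = np.random.default_rng(rng)
    T = rng.normal(size=(n, n_latent))
    X = T @ rng.normal(size=(n_latent, p)) + 0.1 * rng.normal(size=(n, p))
    y = T @ rng.normal(size=n_latent) + noise * rng.normal(size=n)
    return X, y


def monte_carlo(X, y, n_small=60, test_frac=1 / 3, max_comp=8, n_rep=20, seed=0):
    """Average assessed ('est') and actual ('true') RMSE for two strategies."""
    rng = np.random.default_rng(seed)
    out = {"cv_true": [], "cv_est": [], "ts_true": [], "ts_est": []}
    for _ in range(n_rep):
        idx = rng.permutation(len(y))
        small, rest = idx[:n_small], idx[n_small:]   # small set vs remaining database
        Xs, ys = X[small], y[small]
        # (a) Full cross validation: all n_small samples calibrate the final model
        cv = loo_rmsecv(Xs, ys, max_comp)
        a = int(np.argmin(cv))
        coefs, xm, ym = pls1_coefs(Xs, ys, max_comp)
        out["cv_est"].append(cv[a])
        out["cv_true"].append(rmse(y[rest], predict(X[rest], coefs[a], xm, ym)))
        # (b) Independent validation test set: fewer samples remain for calibration
        n_test = int(round(test_frac * n_small))
        val, cal = np.arange(n_test), np.arange(n_test, n_small)
        coefs, xm, ym = pls1_coefs(Xs[cal], ys[cal], max_comp)
        ts = [rmse(ys[val], predict(Xs[val], c, xm, ym)) for c in coefs]
        a = int(np.argmin(ts))                       # same test set picks and assesses
        out["ts_est"].append(ts[a])
        out["ts_true"].append(rmse(y[rest], predict(X[rest], coefs[a], xm, ym)))
    return {k: float(np.mean(v)) for k, v in out.items()}


if __name__ == "__main__":
    X, y = make_database(rng=1)
    print(monte_carlo(X, y))
```

Comparing `cv_est` with `cv_true`, and `ts_est` with `ts_true`, shows how optimistic each assessment is, while comparing `cv_true` with `ts_true` shows the cost of withholding samples from calibration. Because the data are synthetic, the numbers are only a qualitative illustration of the kind of experiment the paper reports, not a reproduction of its results.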
Author address	Dardenne Pierre, Quality Department of Agro-food Products, Walloon Agricultural Research Centre (CRA-W), Chaussée de Namur, 24, B-5030 Gembloux, dardenne@cra.wallonie.be
File	Validation and verification of regression in small data sets
Link	http://dx.doi.org/10.1016/S0169-7439(98)00167-1
Authors	Martens, H., Dardenne, P.
