Today I'm going to talk about how to develop a chemometric model based on ground samples. This process involves certain spectral considerations, namely that a sample contains multiple components and that many physical phenomena contribute to the spectra.
NIR is a secondary method of sample analysis, meaning that you must have a reference for the spectrum produced by NIR. The reference method must be well controlled, with the lowest possible error, and must be referenced to a high-quality assay. To do this, we test assays with blind samples to document the Standard Error of Laboratory (SEL), and we perturb the reference data with added noise to understand the effect on the results. Then we must address spectral collection issues: the ASD systems need to be validated with wavelength standards, and spectra need to be collected under controlled conditions (temperature, moisture, particle size, etc.) using the same sample presentation methods.
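As a rough illustration of the two reference-data checks just described, here is a minimal Python sketch. It assumes the blind samples are run as duplicate pairs and takes SEL as the pooled standard deviation of those pairs, sqrt(sum(d^2) / (2n)); the talk does not say how SEL was calculated, so that formula is an assumption. The function names and the numbers are invented for illustration.

```python
import math
import random


def standard_error_of_laboratory(duplicate_pairs):
    """Pooled standard deviation of duplicate assays: sqrt(sum(d^2) / (2n)).

    Assumes each blind sample was assayed exactly twice.
    """
    if not duplicate_pairs:
        raise ValueError("need at least one duplicate pair")
    sum_sq = sum((a - b) ** 2 for a, b in duplicate_pairs)
    return math.sqrt(sum_sq / (2 * len(duplicate_pairs)))


def perturb_reference(values, noise_sd, seed=0):
    """Return a copy of the reference values with Gaussian noise added."""
    rng = random.Random(seed)  # seeded so the perturbation is repeatable
    return [v + rng.gauss(0.0, noise_sd) for v in values]


# Hypothetical blind duplicates (for example, % protein), not real data.
pairs = [(12.1, 12.3), (10.4, 10.2), (14.0, 14.2), (11.5, 11.5)]
sel = standard_error_of_laboratory(pairs)

# Perturb the reference values with noise the size of the SEL; the model
# would then be rebuilt on noisy_reference to see how its statistics change.
reference = [a for a, _ in pairs]
noisy_reference = perturb_reference(reference, noise_sd=sel)

print(f"SEL = {sel:.3f}")
```

Rebuilding the model on the perturbed values and comparing the error statistics gives a feel for how sensitive the calibration is to reference error.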
The number of samples needed for test and calibration varies, but we typically start with 20% of the samples for this phase. For a feasibility study we like to see a sample population of 60-90 and for a starting model a population of 120-180. For a production model the sample number is typically greater than 180. Calibration and validation sets must contain the diversity, both spectral and compositional, that we would expect to encounter in routine samples. Samples can be calibrated and validated as long as the range of composition is provided, the samples are scanned in the same form and the samples are composed of natural blends.
A test set needs to be representative (not collected later); it should cover the same range of composition, and it should be collected in a simple way. Then the samples are ranked and every 5th sample is withheld for further testing.
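The rank-and-withhold step can be sketched as follows. This assumes the samples are ranked by their reference value, which the talk does not state explicitly; the function name, sample IDs and values are hypothetical.

```python
def split_every_nth(samples, key, n=5):
    """Rank samples by `key`, then withhold every n-th one as the test set."""
    ranked = sorted(samples, key=key)
    test = ranked[n - 1 :: n]  # 5th, 10th, 15th, ... in rank order
    calibration = [s for i, s in enumerate(ranked) if (i + 1) % n != 0]
    return calibration, test


# Hypothetical (sample id, reference value) pairs, not real data.
samples = [
    ("s01", 9.8), ("s02", 12.4), ("s03", 10.1), ("s04", 14.9), ("s05", 11.3),
    ("s06", 13.0), ("s07", 10.7), ("s08", 12.9), ("s09", 11.8), ("s10", 14.1),
]

calibration, test = split_every_nth(samples, key=lambda s: s[1])

print("withheld for testing:", [sample_id for sample_id, _ in test])
```

Because the samples are ranked before they are split, the withheld set spans the same composition range as the calibration set instead of clustering at one end.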
Commercial chemometric programs can be used to create multivariate models. The key to creating a good multivariate model is to understand the statistics, especially the Standard Error of Cross Validation (SECV), as R-squared can be misleading. It’s possible to use R-squared in this process, but SECV is more accurate. During probationary use of the model, we monitor early results and cross-check them against reference methods to validate, but we don’t use a different lab or lab technique, as such a change could introduce error into the model.
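To make the point about R-squared concrete, here is a small sketch. It uses a single-predictor least-squares line as a stand-in for a real multivariate model, and takes SECV as the root mean square of the leave-one-out residuals; commercial packages may use a different cross-validation scheme or denominator. The data are invented: both sets carry exactly the same errors, but one spans a wider range.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Coefficient of determination of the fit on all samples."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def secv_leave_one_out(x, y):
    """Root mean square of leave-one-out prediction residuals."""
    sq = 0.0
    for i in range(len(x)):
        # Refit without sample i, then predict sample i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        sq += (y[i] - (slope * x[i] + intercept)) ** 2
    return math.sqrt(sq / len(x))


# Invented data: identical errors, different composition ranges.
noise = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3]
x_narrow = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
x_wide = [5.0, 8.0, 11.0, 14.0, 17.0, 20.0]
y_narrow = [xi + e for xi, e in zip(x_narrow, noise)]
y_wide = [xi + e for xi, e in zip(x_wide, noise)]

r2_narrow, r2_wide = r_squared(x_narrow, y_narrow), r_squared(x_wide, y_wide)
secv_narrow = secv_leave_one_out(x_narrow, y_narrow)
secv_wide = secv_leave_one_out(x_wide, y_wide)

print(f"narrow range: R2 = {r2_narrow:.3f}, SECV = {secv_narrow:.3f}")
print(f"wide range:   R2 = {r2_wide:.3f}, SECV = {secv_wide:.3f}")
```

In this toy case R-squared rises from about 0.88 to above 0.99 simply because the range is wider, while SECV stays the same, which is why SECV is the better guide to actual prediction error.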
After the probationary period, we review the model, watching for high residuals or samples at the extreme limits of composition. If you change labs, you need to re-validate. We also define appropriate criteria for revising the model (e.g., a time-based or validation error criterion).
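A review along these lines might be scripted as below. This is only a sketch: the residual cut-off (three times SECV), the use of RMS error as the validation statistic, and the example limits are all assumptions chosen for illustration, not values from the talk.

```python
import math


def flag_samples(predicted, reference, cal_min, cal_max, secv, k=3.0):
    """List samples with a high residual or at the edge of the calibration range."""
    flags = []
    for i, (p, r) in enumerate(zip(predicted, reference)):
        reasons = []
        if abs(p - r) > k * secv:  # assumed cut-off: k times SECV
            reasons.append("high residual")
        if r <= cal_min or r >= cal_max:
            reasons.append("at or beyond calibration limits")
        if reasons:
            flags.append((i, reasons))
    return flags


def needs_revision(days_in_use, predicted, reference, max_days, max_error):
    """True if either the time criterion or the validation error criterion is hit."""
    n = len(predicted)
    rms_error = math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)
    return days_in_use > max_days or rms_error > max_error


# Hypothetical probationary results (NIR prediction vs. reference), not real data.
predicted = [12.0, 10.5, 15.9]
reference = [12.1, 11.4, 16.0]

flags = flag_samples(predicted, reference, cal_min=10.0, cal_max=15.0, secv=0.25)
revise = needs_revision(30, predicted, reference, max_days=180, max_error=0.4)

for index, reasons in flags:
    print(f"sample {index}: {', '.join(reasons)}")
print("revise model:", revise)
```

Here the second sample is flagged for its residual and the third for sitting beyond the calibration range; the model is marked for revision because the validation error exceeds the assumed limit, even though the time limit has not been reached.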
While multivariate models can be quantitative, they require careful calibration and test sets and proper monitoring to ensure the model is well maintained. Good NIR models do work; they just require a little more effort on our part to succeed.