Author ORCID Identifier

https://orcid.org/0009-0001-6372-5071

Date Available

8-1-2026

Year of Publication

2025

Document Type

Master's Thesis

Degree Name

Master of Science in Biosystems and Agricultural Engineering (MSBiosyAgE)

College

Agriculture; Engineering

Department/School/Program

Biosystems and Agricultural Engineering

Faculty

Dr. Akinbode Adedeji

Faculty

Dr. Michael Sama

Abstract

Cross-contamination of food with gluten-rich grains such as wheat, barley, and rye remains a significant challenge in gluten-free food production, posing health risks to individuals with gluten-related disorders. Traditional detection methods are often time-consuming and labor-intensive, highlighting the need for rapid and reliable alternatives. This study investigates the use of hyperspectral imaging (HSI) combined with machine learning (ML) and deep learning (DL) techniques for the detection and quantification of gluten contamination in grain-based foods through three main approaches: (i) classification of gluten source contaminants in gluten-free flour, (ii) quantification of gluten content from wheat, rye, and barley flours, and (iii) reconstruction of hyperspectral data from RGB images to enhance accessibility and consumer-level application.

Corn flour (CF) was contaminated with three gluten sources, wheat flour (WF), rye flour (RF), and barley flour (BF), at varying contamination levels (CL): 0–2.4% (in 0.1% increments) for low-level contamination and 2.5–10% (in 0.5% increments) for high-level contamination. Spectral data were acquired using a visible–near-infrared HSI system (400–1000 nm), and RGB images were captured using a Samsung Galaxy Tab S9 Ultra. Preprocessing techniques, including Standard Normal Variate (SNV), Savitzky-Golay (SG), Multiplicative Scatter Correction (MSC), First Derivative (FD), and Second Derivative (SD), were applied to enhance spectral data quality. Feature selection methods were employed to reduce dimensionality and identify the most informative wavelengths for the detection and quantification tasks.
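
As an illustration of what the scatter-correction and derivative steps named above do, the following is a minimal sketch in plain Python. It is not the thesis's implementation: the function names, the use of the sample standard deviation in SNV, the mean spectrum as the MSC reference, and the unsmoothed central difference standing in for a first derivative are all assumptions made for this example.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: center and scale one spectrum by its own statistics."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # assumption: sample standard deviation
    return [(v - mu) / sigma for v in spectrum]


def msc(spectra):
    """Multiplicative Scatter Correction against the mean spectrum (assumed reference)."""
    n_bands = len(spectra[0])
    ref = [mean(s[i] for s in spectra) for i in range(n_bands)]
    ref_mu = mean(ref)
    ref_ss = sum((r - ref_mu) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mu = mean(s)
        # Least-squares fit of s = intercept + slope * ref
        slope = sum((r - ref_mu) * (v - s_mu) for r, v in zip(ref, s)) / ref_ss
        intercept = s_mu - slope * ref_mu
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def first_derivative(spectrum, step=1.0):
    """Central finite difference; a simplified stand-in without smoothing."""
    return [
        (spectrum[i + 1] - spectrum[i - 1]) / (2 * step)
        for i in range(1, len(spectrum) - 1)
    ]


# Hypothetical toy spectra: the second is the first with a gain and an offset,
# mimicking a pure scatter effect.
base = [0.2, 0.4, 0.5, 0.3]
scaled = [2 * v + 0.1 for v in base]
snv_base, snv_scaled = snv(base), snv(scaled)
msc_base, msc_scaled = msc([base, scaled])
```

In this toy case the two spectra differ only by a gain and an offset, so both SNV and MSC map them onto the same corrected spectrum, which is the kind of scatter variation these steps are meant to suppress.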

In the classification task, random forest (RF), linear discriminant analysis (LDA), and K-nearest neighbors (KNN) models were developed to detect the presence and source of gluten contamination. For binary classification, RF with FD preprocessing achieved 98.2% accuracy at low contamination levels, while LDA with SG preprocessing reached 98.5% accuracy across the full contamination range. For multiclass classification, RF with FD preprocessing achieved 97.8% accuracy, and RF with SG achieved 98.5% accuracy in differentiating among the three gluten sources.

Regression models, including partial least squares regression (PLSR), adaptive boosting (ADA), and RF regression, were developed to quantify each gluten source. The best results for rye contamination were achieved using PLSR with SG preprocessing at low contamination (R²P = 0.98, RMSEP = 0.07) and RF regression with SNV for the full contamination range (R²P = 0.99, RMSEP = 0.10). For wheat, SNV with PLSR gave the best performance at low contamination (R²P = 0.94, RMSEP = 0.16), while PLSR with SG was optimal across the full range (R²P = 0.99, RMSEP = 0.22). For barley, SNV with PLSR was best at low contamination (R²P = 0.88, RMSEP = 0.29), and SG with PLSR performed best at higher levels (R²P = 0.98, RMSEP = 0.40).

To enhance accessibility, a deep learning-based pipeline was developed to reconstruct HSI data from RGB images. A Hierarchical Regression Network (HRNET) and an enhanced deep super resolution (EDSR) network were trained using RGB inputs and ground-truth hyperspectral cubes as labels. For detection, the HRNET model achieved an MRAE of 0.152, an RMSE of 0.0307, and a PSNR of 28.06, while EDSR achieved an MRAE of 0.204, an RMSE of 0.205, and a PSNR of 30.10. For quantification, HRNET achieved an MRAE of 0.332, an RMSE of 0.062, and a PSNR of 24.25, while EDSR achieved an MRAE of 0.342, an RMSE of 0.050, and a PSNR of 26.19.

ML models trained on the reconstructed data showed good performance, particularly for gluten detection. On spectra reconstructed by HRNET, the random forest model outperformed KNN in detection, achieving 94.6% accuracy and recall on the test set. Similarly, on spectra reconstructed by EDSR, random forest outperformed KNN with a testing accuracy and recall of 93.1%. For quantification, RF regression achieved an R²P of 0.49 and an RMSEP of 2.09 from HRNET spectra, and an R²P of 0.45 and an RMSEP of 2.16 from EDSR spectra.
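
For readers unfamiliar with the reconstruction metrics reported above, the sketch below shows how MRAE, RMSE, and PSNR are commonly computed between a reconstructed spectrum and its ground truth. The abstract does not state the exact formulas or averaging scheme used in the thesis, so the definitions here, the `eps` guard, and the peak value of 1.0 (reflectance scaled to [0, 1]) are assumptions, and the function names are hypothetical.

```python
import math


def mrae(pred, truth, eps=1e-8):
    """Mean relative absolute error; eps (an assumption) guards against division by zero."""
    return sum(abs(p - t) / (t + eps) for p, t in zip(pred, truth)) / len(truth)


def rmse(pred, truth):
    """Root mean squared error over all values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def psnr(pred, truth, peak=1.0):
    """Peak signal-to-noise ratio in dB; peak=1.0 assumes data scaled to [0, 1]."""
    err = rmse(pred, truth)
    return float("inf") if err == 0 else 20 * math.log10(peak / err)


# Hypothetical four-band example, not data from the thesis.
truth = [0.5, 0.5, 0.5, 0.5]
pred = [0.6, 0.4, 0.6, 0.4]
scores = {
    "MRAE": mrae(pred, truth),
    "RMSE": rmse(pred, truth),
    "PSNR": psnr(pred, truth),
}
```

Lower MRAE and RMSE and higher PSNR indicate a closer reconstruction. Because PSNR depends on the chosen peak value and on whether it is averaged per image or computed globally, values from this sketch are not expected to match the reported figures exactly.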

This study demonstrates the effectiveness of combining HSI, ML, and DL for gluten detection and quantification in industrial and consumer-level applications. Reconstructing HSI data from RGB images enables non-destructive, real-time gluten analysis using portable devices, supporting food safety and regulatory compliance on gluten-free production lines.

KEYWORDS: Gluten, Machine Learning, Deep Learning, Hyperspectral Reconstruction, Non-Destructive Methods, Feature Selection

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2025.374

Funding Information

Funds awarded to the research lab:

1. UKY Provost Institutional Multidisciplinary Paradigm to Accelerate Collaboration and Transformation (IMPACT) Grant

2. United States Department of Agriculture's National Institute of Food and Agriculture Hatch and Multistate Grants

Available for download on Saturday, August 01, 2026
