Multispectral imaging (MSI) combined with machine learning has quietly become a powerhouse for conservators trying to untangle the visible surface from hidden histories, and the detection of overpaint and historical retouches is a perfect use case. At its simplest, MSI captures how materials reflect and fluoresce across multiple wavelengths—ultraviolet, visible, near‑infrared, and sometimes shortwave infrared—while machine learning learns the patterns that distinguish original pigment layers, varnishes, and later interventions. Together they form a non‑invasive, full‑field diagnostic pipeline that can flag suspicious areas for targeted microsampling, prioritize conservation interventions, and reduce needless invasive testing. The key is building a reproducible workflow that starts with disciplined capture and calibration, feeds clean, aligned multispectral stacks into feature pipelines, and uses robust training strategies so the models generalize across artists, media, and studios. This article lays out an end‑to‑end approach: how to choose modalities, acquire and calibrate data, preprocess and engineer features, select and train machine learning models, validate outputs with real-world sampling, and integrate results into conservation decision‑making. The goal is practical: give conservators and imaging specialists a usable roadmap to detect overpaints effectively while preserving the integrity of the artwork.

Scope and objectives: defining overpaint detection goals, sensitivity thresholds, and conservation decision criteria
Start by clarifying what you want the MSI+ML system to do, because overpaint detection isn’t one-size-fits-all. Are you seeking to flag any later addition, even subtle varnish retouches, or do you only care about substantial overpaints that mask lost original work? Define sensitivity and specificity thresholds up front—too sensitive and you’ll drown conservators in false positives; too conservative and you’ll miss small but important retouches. Set concrete decision criteria tied to conservation outcomes: for instance, prioritize areas where a positive flag changes the microsampling plan or where removing overpaint would reveal significant iconographic information. Document what constitutes operational success—percentage of true positives recovered in a test set, maximum acceptable false positive rate, and minimum area size of detectable interventions. Also consider who will use the output: a conservator needs interpretable heatmaps and confidence scores, while a curator might want a ranked list of intervention priorities. Finally, consider legal and ethical constraints: some retouches are historically significant and intentionally left in place, so detection should inform, not dictate, treatment. Framing your project with these scope elements keeps the technical work focused on conservation-relevant outcomes rather than abstract algorithmic performance.
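As a concrete illustration, the minimal sketch below encodes such acceptance criteria as an explicit, testable check; the threshold values and the `evaluate_acceptance` helper are hypothetical placeholders to be agreed with the conservation team, not prescriptions.
```python
# Hypothetical acceptance criteria for an overpaint-detection project.
# All numbers are illustrative placeholders, to be agreed with conservators.
ACCEPTANCE = {
    "min_recall": 0.80,               # fraction of known retouches that must be flagged
    "max_false_positive_rate": 0.10,  # tolerated rate of false alarms on clean regions
    "min_detectable_area_mm2": 4.0,   # applied when filtering detection maps, not checked here
}

def evaluate_acceptance(tp, fn, fp, tn, acceptance=ACCEPTANCE):
    """Return True if results on a labelled test set meet the agreed thresholds."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall >= acceptance["min_recall"] and fpr <= acceptance["max_false_positive_rate"]

# Example: 42 of 50 confirmed retouch regions recovered, 12 false alarms among 150 clean regions.
print(evaluate_acceptance(tp=42, fn=8, fp=12, tn=138))
```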
Multispectral imaging modalities and selection: UV, visible, NIR, SWIR, push‑broom vs snapshot hyperspectral tradeoffs
Choosing the right spectral range and capture hardware drives what you can see. UV reflectance and UV‑induced visible fluorescence imaging are excellent for revealing organic varnishes and resins used in retouching, while visible bands capture colorimetric differences and NIR often penetrates varnishes to show underpaint or carbon-based underdrawing that can hint at original versus later layers. SWIR expands material discrimination further, particularly for certain organics and pigments that show distinct absorption features beyond the NIR. Hardware choices matter: snapshot hyperspectral cameras capture wide spectral cubes quickly and are convenient for in‑situ work, but they may lack spectral resolution or SNR compared with push‑broom systems that achieve higher spectral fidelity at the cost of longer capture times and more complex motion control. Choose modalities based on accessibility, object sensitivity, and the spectral signatures you expect: pigments like ultramarine, lead white, or modern organic dyes exhibit different behaviors across bands, and if you expect varnish or proteinaceous retouches, UV and NIR are essential. Practical projects often blend modalities—UV, VIS, NIR—and accept that perfect hyperspectral resolution is less important than consistent, calibrated multispectral stacks that you can compare over time and across objects.
Acquisition protocols and standardized capture: illumination geometry, filter sets, exposure bracketing, tiling and focus strategies
Consistent, repeatable acquisition beats fancy algorithms when it comes to real-world conservation. Establish a capture protocol covering illumination geometry (diffuse, cross‑polarized, and angled raking as needed), filter band selection (narrowband UV, standardized VIS bands, and NIR/SWIR bands if available), and exposure bracketing to protect highlight and shadow detail. Use cross‑polarized illumination for gloss minimization, and consider raking light to add texture information helpful for distinguishing surface retouch textures from underlying paint. For large canvases a tiled capture strategy with overlap is essential; ensure consistent focus and exposure across tiles so stitching and registration later aren’t compromised. Record every parameter—lamp type, spectral power distribution, camera settings, filter IDs, distance—and include calibration targets (reflectance standards and fluorescence references) in every session. That metadata is the backbone of radiometric normalization and ensures your learned models don’t overfit to idiosyncratic lighting conditions. In short: the more consistent your capture, the more reliable your subsequent machine learning inference will be, and the more defensible your recommendations to conservators who will act on those results.
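A lightweight way to make that metadata discipline routine is to write a machine-readable sidecar for every capture; the sketch below assumes a simple JSON record whose field names and example values are purely illustrative rather than an established schema.
```python
# Minimal sketch of per-capture metadata stored as a JSON sidecar.
# Field names and values are illustrative, not a standard schema.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class CaptureRecord:
    object_id: str
    lamp_type: str                  # e.g. filtered xenon, UV LED
    filter_id: str                  # narrowband filter identifier
    center_wavelength_nm: float
    exposure_s: float
    camera_to_object_mm: float
    polarization: str               # "cross", "parallel", or "none"
    calibration_targets: list = field(default_factory=list)

record = CaptureRecord(
    object_id="PNT-0042", lamp_type="UV LED 365 nm", filter_id="F-450-40",
    center_wavelength_nm=450.0, exposure_s=2.5, camera_to_object_mm=850.0,
    polarization="cross",
    calibration_targets=["reflectance panel", "fluorescence reference"],
)

# Write the record alongside the raw frames so later normalization can find it.
with open("PNT-0042_capture_450nm.json", "w") as f:
    json.dump(asdict(record), f, indent=2)
```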
Radiometric and geometric calibration: reflectance/fluorescence standards, color targets, dark‑frame correction, and mosaicking
Calibration turns raw camera outputs into meaningful spectral reflectance data and removes session-specific artifacts. Start with radiometric calibration: capture dark frames to subtract sensor bias and use reflectance standards (e.g., Spectralon panels) to convert raw counts to relative reflectance across bands. For fluorescence captures, include fluorescence standards so intensity is comparable across sessions. Geometric calibration ensures that each spectral band lines up pixel-for-pixel; use fiducial markers or robust feature-matching algorithms to correct parallax and lens distortion before stacking. For tiled captures, mosaicking needs careful overlap management and blending to avoid spectral discontinuities at seams—record tile coordinates and use software that preserves radiometric consistency during stitching. Proper calibration also lets you construct derived products—spectral indices, normalized difference maps, and reflectance-converted cubes—that feed directly into your feature pipelines without band-to-band bias. Calibration is not optional: models trained on uncalibrated data often learn session artifacts rather than material contrasts, producing brittle performance. Invest time in calibration protocols, automate metadata capture, and bake calibration into routine imaging workflows so datasets remain comparable across projects and years.
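For the radiometric step, a minimal per-band conversion might look like the following; it assumes arrays of matching shape, an illustrative panel reflectance of 0.99, and is a sketch rather than a full calibration routine.
```python
# Minimal radiometric calibration sketch: dark-frame subtraction and conversion
# to relative reflectance against a standard of known reflectance.
# Array names and the 0.99 panel value are assumptions for illustration.
import numpy as np

def to_reflectance(raw, dark, white, panel_reflectance=0.99, eps=1e-6):
    """Convert raw counts to relative reflectance for one spectral band.

    raw, dark, white: 2D arrays of identical shape (scene frame, dark frame,
    and the averaged frame captured over the reflectance standard).
    """
    signal = raw.astype(np.float64) - dark
    reference = np.clip(white.astype(np.float64) - dark, eps, None)
    return np.clip(signal / reference, 0.0, None) * panel_reflectance

# Example with small synthetic frames.
rng = np.random.default_rng(0)
raw = rng.uniform(500, 4000, (4, 4))
dark = np.full((4, 4), 100.0)
white = np.full((4, 4), 3800.0)
print(to_reflectance(raw, dark, white).round(3))
```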
Preprocessing pipeline: band registration, denoising, stray light correction, and spectral normalization
Once images are captured and calibrated, preprocessing prepares the stacks for meaningful analysis. Band registration aligns each spectral channel to sub‑pixel accuracy—critical because machine learning models expect consistent spatial correspondence across wavelengths. Apply denoising algorithms adapted to the noise model of your sensor—variance-stabilizing transforms followed by wavelet denoising or BM3D variants work well for low-light bands. Stray light correction and vignetting compensation prevent false spectral features at edges; measure these effects with calibration frames and correct accordingly. Spectral normalization—using methods like continuum removal, multiplicative scatter correction, or per-band z‑scoring—reduces illumination and surface scattering variance so that material signatures dominate. If fluorescence captures are included, normalize fluorescence intensity by excitation energy and reference signal to compare across sessions. Keep a clear provenance of preprocessing steps—versioned pipelines and logs—so outputs are reproducible and audit trails exist for conservators. Good preprocessing reduces false positives and allows machine learning to focus on real material differences rather than noise and capture artifacts.
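The sketch below illustrates the registration and normalization steps under simplifying assumptions: bands differ only by a small translation (so phase correlation suffices), the stack is shaped (bands, height, width), and per-band z-scoring stands in for the richer normalization options mentioned above.
```python
# Sub-pixel band-to-band registration plus per-band z-scoring for a calibrated
# multispectral stack. A sketch under the assumption that misalignment is a
# pure translation; rotation and lens distortion need a fuller model.
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

def register_and_normalize(stack, reference_band=0, upsample_factor=20):
    """stack: array shaped (bands, height, width)."""
    ref = stack[reference_band]
    aligned = np.empty_like(stack, dtype=np.float64)
    for b in range(stack.shape[0]):
        # Estimate the sub-pixel translation of band b relative to the reference band.
        offset, _, _ = phase_cross_correlation(
            ref, stack[b], upsample_factor=upsample_factor)
        aligned[b] = nd_shift(stack[b].astype(np.float64), shift=offset)
    # Per-band z-scoring so material contrasts, not illumination level, dominate.
    mean = aligned.mean(axis=(1, 2), keepdims=True)
    std = aligned.std(axis=(1, 2), keepdims=True) + 1e-9
    return (aligned - mean) / std
```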
Derived spectral products and feature engineering: PCA, continuum removal, spectral indices, texture, and fluorescence lifetime proxies
Raw spectral stacks are rich but noisy; derived products distill discriminative information for models. Principal component analysis (PCA) compresses correlated bands into orthogonal components that often isolate varnish effects or pigment contrasts. Continuum removal and derivative spectra emphasize absorption features useful for pigment discrimination. Spectral indices—ratios or normalized differences between bands sensitive to specific pigments or varnish features—can be crafted empirically or learned from training data. Don’t forget spatial features: texture measures (local variance, co-occurrence matrices, multi-scale Gabor filters) capture brushwork and retouch texture differences, while morphological measures highlight blistering or brushstroke mismatch. For fluorescence data, lifetime proxies or intensity decay measures can indicate organic retouches versus original varnish. Combining spectral and spatial features gives machine learning models a richer palette to distinguish overpaints from originals. Feature engineering is iterative: exploratory data analysis on labeled regions often reveals simple indices that outperform complex black-box features, so mix domain knowledge with data-driven discovery for robust pipelines.
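Several of these derived products can be computed with very little code; the sketch below shows PCA scores, a normalized-difference index between two illustrative bands, and a local-variance texture map, all assuming a calibrated stack shaped (bands, height, width).
```python
# Feature-engineering sketch: PCA component maps, a normalized-difference index,
# and a local-variance texture measure. Band indices are illustrative choices.
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.decomposition import PCA

def pca_components(stack, n_components=3):
    """Project each pixel's spectrum onto the first principal components."""
    bands, h, w = stack.shape
    pixels = stack.reshape(bands, -1).T              # (pixels, bands)
    scores = PCA(n_components=n_components).fit_transform(pixels)
    return scores.T.reshape(n_components, h, w)

def normalized_difference(stack, band_a, band_b, eps=1e-9):
    """Normalized difference between two bands, analogous to remote-sensing indices."""
    a, b = stack[band_a], stack[band_b]
    return (a - b) / (a + b + eps)

def local_variance(band, size=7):
    """Simple texture map: variance within a sliding window."""
    mean = uniform_filter(band, size)
    mean_sq = uniform_filter(band * band, size)
    return mean_sq - mean * mean
```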
Spectral unmixing and endmember extraction: linear, non‑linear, and sparse unmixing to separate varnish, original paint, and overpaints
Spectral unmixing tries to decompose each pixel into contributions from pure materials—endmembers—such as varnish, original pigment, ground, or later retouch medium. Linear unmixing assumes additive mixing and is computationally efficient; it works well when materials form thin, separate layers or when scattering is limited. Non‑linear unmixing models multiple scattering and layered interactions, which better reflect paint systems but are computationally heavier and require more parametric assumptions. Sparse unmixing emphasizes a small set of endmembers per pixel and helps when you expect only a few contributors locally—useful for detecting small overpaints against complex originals. Endmember extraction can be supervised, using reference spectra from lab samples, or unsupervised, using algorithms like N-FINDR or VCA to find candidate signatures in the scene. The practical payoff is clear: unmixed abundance maps often reveal subtle retouches invisible in RGB, and they provide interpretable features for classifiers. Validate unmixing results against known cross-sections and ensure unmixing artifacts aren’t mistaken for retouches.
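Under the linear mixing assumption, per-pixel abundances can be estimated with non-negative least squares against a small endmember library; the sketch below is a minimal illustration in which the endmember matrix is assumed to come from reference spectra or an extraction algorithm such as those named above.
```python
# Linear unmixing sketch: per-pixel non-negative least squares against a small
# endmember library (e.g. varnish, original pigment, retouch medium).
import numpy as np
from scipy.optimize import nnls

def unmix(stack, endmembers):
    """stack: (bands, height, width); endmembers: (bands, n_materials).

    Returns abundance maps shaped (n_materials, height, width).
    """
    bands, h, w = stack.shape
    pixels = stack.reshape(bands, -1)
    abundances = np.empty((endmembers.shape[1], pixels.shape[1]))
    for i in range(pixels.shape[1]):
        abundances[:, i], _ = nnls(endmembers, pixels[:, i])
    # Optional sum-to-one normalization for easier visual interpretation.
    abundances /= abundances.sum(axis=0, keepdims=True) + 1e-9
    return abundances.reshape(-1, h, w)
```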
Machine learning architectures for overpaint detection: supervised classifiers, CNNs on multispectral stacks, and unsupervised anomaly detection
Selecting the right ML model depends on labeled data availability and the complexity of the task. Classic supervised classifiers—SVM, random forest, gradient boosting—work well when you have curated labeled patches and engineered features. Convolutional neural networks (CNNs) that consume multispectral stacks can learn joint spectral-spatial features automatically and often excel when plenty of labeled data exists; 3D CNNs or spectral‑spatial architectures handle spectral cubes elegantly. When labeled data is scarce, unsupervised anomaly detection methods—autoencoders, one‑class SVMs, or clustering on embeddings—can flag regions that deviate from the majority material signature, surfacing candidate overpaints for expert review. Hybrid approaches often yield the best practicality: use unsupervised models to propose candidates, then iteratively label and train supervised models that refine detection. Make explainability a priority: simple classifiers with interpretable features may be preferable to black-box deep nets when conservators need clear rationales for sampling decisions.
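The hybrid strategy can be prototyped with off-the-shelf components; the sketch below uses an isolation forest to propose anomalous pixels and a random forest trained on conservator-labelled pixels to refine them, assuming per-pixel feature arrays as input.
```python
# Hybrid sketch: an unsupervised anomaly detector proposes candidate overpaint
# pixels, and a supervised classifier is trained once a subset has been labelled
# by conservators. `features` is assumed shaped (pixels, n_features).
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

def propose_candidates(features, contamination=0.05, random_state=0):
    """Flag pixels whose features deviate from the dominant material signature."""
    iso = IsolationForest(contamination=contamination, random_state=random_state)
    return iso.fit_predict(features) == -1          # True where anomalous

def train_classifier(features, labels, random_state=0):
    """labels: 1 for confirmed retouch, 0 for confirmed original."""
    clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                                 random_state=random_state)
    clf.fit(features, labels)
    return clf
```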
Training data strategy and augmentation: curated labeled patches, synthetic overpaint simulations, transfer learning, and cross‑institutional datasets
Good training data is the linchpin. Start with curated labeled patches identified by conservators across varied artworks to capture painterly diversity—different pigments, varnishes, and retouch materials. When real labeled examples are scarce, synthetic augmentation helps: simulate overpaints by digitally layering plausible retouch textures and spectral shifts onto originals, or create physical mockups with historically accurate retouch materials and image them with the same MSI setup. Transfer learning from large hyperspectral datasets or pretrained vision networks can accelerate convergence, especially for deep models. Cross‑institutional datasets multiply diversity and robustness but require harmonized capture and calibration protocols so models don’t learn instrument-specific quirks. Maintain a validation set from unseen works and continually add new labeled samples in an active‑learning loop where models request human labeling for high‑uncertainty regions. This iterative strategy keeps models current and reduces bias toward a single studio or medium.
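Digital simulation of overpaints can start very simply; the sketch below blends an assumed retouch spectrum into a random elliptical blob of an original patch and returns the ground-truth mask, with all spectra, blob sizes, and mixing weights treated as illustrative placeholders.
```python
# Synthetic-augmentation sketch: paste a plausible "retouch" spectrum over a
# random elliptical blob of an original patch, mixing it with the underlying
# spectrum. Intended for reasonably sized patches (tens of pixels per side).
import numpy as np

def simulate_overpaint(patch, retouch_spectrum, rng, alpha_range=(0.4, 0.9)):
    """patch: (bands, h, w); retouch_spectrum: (bands,). Returns (patch, mask)."""
    bands, h, w = patch.shape
    # Random elliptical blob as the simulated retouch footprint.
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    ry = rng.integers(max(h // 8, 2), max(h // 3, 3))
    rx = rng.integers(max(w // 8, 2), max(w // 3, 3))
    mask = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
    # Blend the retouch spectrum over the masked pixels.
    alpha = rng.uniform(*alpha_range)
    out = patch.copy()
    out[:, mask] = alpha * retouch_spectrum[:, None] + (1 - alpha) * patch[:, mask]
    return out, mask
```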
Multimodal fusion strategies: integrating MSI with IRR, XRF, raking light, and archival imagery for robust inference
MSI shines, but combining modalities enhances confidence and reduces ambiguity. Infrared reflectography (IRR) reveals underdrawings and dense carbon-based features; if MSI flags a discrepancy where IRR shows original underdrawing alignment, that strengthens the hypothesis of overpaint. XRF elemental maps identify inorganic pigments and confirm the presence or absence of particular elements—helpful when overpaint uses modern pigments absent from the original palette. Raking light exposes surface texture differences like brushwork mismatch, complementing spectral differences. Historical photographs and conservation reports provide temporal context—a retouch that appears between successive photographs can help ground-truth algorithm outputs. Fusion strategies range from early fusion—concatenating multimodal features into a single model—to late fusion—combining modality-specific confidences into a final decision. Weight modalities by reliability for the object: if XRF coverage is sparse, rely more on MSI; if IRR clearly reveals an underdrawing, use that as a strong prior. Multimodal fusion yields robust, interpretable recommendations rather than single-modality guesses.
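A late-fusion scheme can be as simple as a reliability-weighted average of per-pixel confidence maps; the sketch below assumes each modality has already been registered to a common grid and scaled to [0, 1], and the modality names and weights shown are illustrative.
```python
# Late-fusion sketch: combine per-pixel confidence maps from several modalities
# with per-object reliability weights. Names and weights are illustrative.
import numpy as np

def late_fusion(confidence_maps, weights):
    """confidence_maps: dict name -> 2D array in [0, 1];
    weights: dict name -> non-negative reliability weight."""
    total = sum(weights[name] for name in confidence_maps)
    fused = sum(weights[name] * confidence_maps[name] for name in confidence_maps)
    return fused / total

maps = {
    "msi": np.random.rand(64, 64),
    "irr": np.random.rand(64, 64),
    "xrf": np.random.rand(64, 64),
}
# Down-weight XRF where coverage is sparse, as discussed above.
fused = late_fusion(maps, weights={"msi": 1.0, "irr": 0.7, "xrf": 0.3})
```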
Explainability and uncertainty quantification: saliency/attention maps, confidence scoring, and interpretable outputs for conservators
Conservators need to trust and interrogate model outputs, so build explainability into the pipeline. Provide saliency maps or attention heatmaps that show which bands or spatial regions drove a positive detection. Offer per‑pixel confidence scores and thresholded heatmaps so conservators can choose conservative or exploratory modes: high-confidence maps for sampling and lower thresholds for survey purposes. Quantify uncertainty with Bayesian methods or ensemble variance measures and present these metrics alongside visual results. Include provenance metadata: which training examples influenced the decision, and what preprocessing steps were applied. Present recommendations as ranked candidates with clear reasons—spectral mismatch in NIR, fluorescence anomaly, texture discordance—rather than cryptic binary labels. This transparency helps conservators weigh model output against their own inspection and reduces the chance of over-relying on algorithmic suggestions for invasive sampling.
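One inexpensive route to such uncertainty estimates is ensemble disagreement; the sketch below assumes a list of independently trained classifiers exposing scikit-learn's predict_proba interface and reports the mean retouch probability and its spread per pixel.
```python
# Uncertainty sketch: mean prediction and disagreement (standard deviation)
# across an ensemble of independently trained classifiers. High spread marks
# pixels whose flags should be treated cautiously.
import numpy as np

def ensemble_confidence(models, features):
    """Return (mean probability of 'retouch', ensemble std) per pixel.

    models: classifiers with a scikit-learn-style predict_proba method,
    where column 1 is assumed to be the retouch class.
    """
    probs = np.stack([m.predict_proba(features)[:, 1] for m in models])
    return probs.mean(axis=0), probs.std(axis=0)
```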
Prioritization framework for microsampling and intervention: ranking likely overpaint zones, risk vs information gain heuristics, and sampling constraints
Not every flagged pixel warrants a micro‑sample. Build prioritization logic that balances risk (potential damage from sampling), information gain (how likely a sample is to answer a conservation question), and practical constraints (access, funding, legal). Rank zones by combined score: detection confidence × estimated information gain ÷ sampling cost. Incorporate constraints like minimum sampling distance from previous interventions and consideration for culturally sensitive areas. Provide conservators with prioritized pick lists and suggested sampling strategies—surface swabs for organic retouch identification, cross‑section sampling for stratigraphy, or targeted FTIR/Raman analysis. This triage approach minimizes invasive action while maximizing the value of each micro‑analysis. It also allows institutions to plan sampling campaigns efficiently and justify sampling choices with data-driven reasoning when reporting to stakeholders or funders.
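The scoring rule above translates directly into code; the sketch below ranks candidate zones by confidence times information gain divided by sampling cost, excludes culturally sensitive areas outright, and uses illustrative field names and numbers.
```python
# Prioritization sketch: confidence * information gain / sampling cost,
# with a hard exclusion for culturally sensitive zones.
def rank_zones(zones):
    """zones: list of dicts with keys 'id', 'confidence', 'info_gain',
    'sampling_cost', and 'sensitive'."""
    eligible = [z for z in zones if not z["sensitive"]]
    for z in eligible:
        z["priority"] = z["confidence"] * z["info_gain"] / max(z["sampling_cost"], 1e-9)
    return sorted(eligible, key=lambda z: z["priority"], reverse=True)

candidates = [
    {"id": "A1", "confidence": 0.92, "info_gain": 0.8, "sampling_cost": 1.0, "sensitive": False},
    {"id": "B3", "confidence": 0.75, "info_gain": 0.9, "sampling_cost": 2.0, "sensitive": False},
    {"id": "C2", "confidence": 0.88, "info_gain": 0.6, "sampling_cost": 0.5, "sensitive": True},
]
for zone in rank_zones(candidates):
    print(zone["id"], round(zone["priority"], 2))
```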
Validation and ground‑truthing: micro‑sample analysis workflows, cross‑validation metrics, and iterative model refinement
Models must be validated against ground truth before they guide irreversible actions. Implement a validation loop where a subset of high‑priority flagged regions receives microsampling for cross‑section microscopy, Raman, FTIR, or GC‑MS to confirm original vs retouch layers. Use cross‑validation with held‑out works to estimate generalization error, and report metrics that matter to conservators—precision at top K candidates, spatial overlap with true retouch areas, and false negative rates on historically confirmed spots. Use discrepancies as training data to refine feature sets and adjust capture or preprocessing steps. Track model drift and retrain periodically as new data accumulates. Ultimately, validation closes the loop: it shows conservators where the model succeeds and where it needs human judgment, and it turns an initial experimental tool into an integrated part of the conservation workflow.
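Metrics such as precision at top K are straightforward to compute once micro-sample results come back; the sketch below assumes a list of ground-truth labels ordered by descending model confidence.
```python
# Validation sketch: precision among the top-K ranked candidates, using
# micro-sample results as ground truth. Inputs are illustrative.
def precision_at_k(ranked_labels, k):
    """ranked_labels: ground-truth labels (1 = confirmed retouch) ordered by
    descending model confidence."""
    top = ranked_labels[:k]
    return sum(top) / len(top) if top else 0.0

# Example: the first 10 flagged regions were sampled; 7 proved to be retouches.
print(precision_at_k([1, 1, 0, 1, 1, 1, 0, 1, 1, 0], k=10))  # 0.7
```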
Integration into conservation workflows and documentation: reporting formats, metadata standards, provenance of derived maps, and stakeholder communication
To be useful, MSI+ML outputs must plug into existing conservation workflows seamlessly. Standardize reporting formats—spatially registered heatmaps, calibrated reflectance cubes, and human‑readable summaries—and attach metadata describing capture settings, preprocessing steps, model version, and confidence thresholds. Store derived maps alongside original images in the object’s conservation record with checksums and version histories so future teams can reproduce or re-evaluate findings. Provide conservators with clear interpretive guidance: how to read heatmaps, suggested sampling tactics, and recommended follow-up imaging modalities. Communicate results to curators, stakeholders, and funders with balanced language that pairs algorithmic insights with human expertise. When done right, MSI and machine learning become decision support tools that increase the precision of conservation actions while respecting the conservator’s final authority.
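As one possible implementation of that record-keeping, the sketch below writes a derived heatmap together with a JSON sidecar containing a SHA-256 checksum and provenance fields; the file names, model-version string, and pipeline identifier are all illustrative placeholders.
```python
# Documentation sketch: save a derived heatmap with a JSON sidecar recording
# model version, thresholds, and a SHA-256 checksum so future teams can verify
# and reproduce the result. Names and fields are illustrative placeholders.
import hashlib
import json
import numpy as np

heatmap = np.random.rand(256, 256).astype(np.float32)   # stand-in for a real detection map
np.save("PNT-0042_overpaint_heatmap.npy", heatmap)

with open("PNT-0042_overpaint_heatmap.npy", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

sidecar = {
    "object_id": "PNT-0042",
    "product": "overpaint probability heatmap",
    "model_version": "rf-2024-07-baseline",              # placeholder identifier
    "confidence_threshold": 0.8,
    "preprocessing_pipeline": "calib-v3/register-v2/zscore",
    "sha256": digest,
}
with open("PNT-0042_overpaint_heatmap.json", "w") as f:
    json.dump(sidecar, f, indent=2)
```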