Regression

Regression predicts a continuous value instead of a class label. Typical HSI examples are moisture, fat, protein, sugar content, dilution, concentration, or any lab-measured quality value.

For problems with a chemical concentration basis, such as fat content in milk, absorbance is often a better modelling space than reflectance because it is closer to the linear Beer-Lambert relationship between absorbance and concentration. Still validate both on held-out samples; scattering, path length, surface roughness, and illumination differences can all change what works best.
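The Beer-Lambert law writes absorbance as A = ε·l·c (absorptivity, path length, concentration), so A scales linearly with concentration when the other two factors stay fixed. The following is a minimal NumPy sketch of the reflectance-to-absorbance step, mirroring the -log10 conversion used in the examples on this page. The function name and the sample values are illustrative assumptions, and reflectance measurements only approximate the transmission setting the law describes.

```python
import numpy as np


def reflectance_to_absorbance(reflectance, floor=1e-6):
    """Convert reflectance in (0, 1] to apparent absorbance, A = -log10(R)."""
    # Clip first so zero or negative reflectance (noise, dead pixels) cannot
    # produce inf or NaN in the logarithm.
    clipped = np.clip(np.asarray(reflectance, dtype=np.float32), floor, 1.0)
    return -np.log10(clipped)


# Hypothetical reflectance values for one pixel across four bands.
reflectance = np.array([1.0, 0.5, 0.1, 0.0])
absorbance = reflectance_to_absorbance(reflectance)
print(absorbance)
```

The floor of 1e-6 caps absorbance at about 6, which keeps a few saturated or dead pixels from dominating a later scaling step.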

Data requirements

Use the same preprocessing, band count, band order, and wavelength calibration for training, prediction maps, and camera pipelines. Regression models can extrapolate far outside the training range on background pixels or unfamiliar samples, so validate on independent scans or physical samples before using the numbers quantitatively.
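One way to keep extrapolated values out of quantitative results is to flag predictions that fall outside the target range seen during training. The sketch below assumes a hypothetical helper, flag_extrapolated, and a 10 % margin; neither is part of the downloaded scripts, and the right margin depends on the application.

```python
import numpy as np


def flag_extrapolated(predictions, train_min, train_max, margin=0.1):
    """Return a boolean mask of predictions outside the padded training range."""
    span = train_max - train_min
    low = train_min - margin * span
    high = train_max + margin * span
    predictions = np.asarray(predictions)
    return (predictions < low) | (predictions > high)


# Hypothetical prediction map: two plausible pixels and two background pixels.
prediction_map = np.array([[3.5, 18.0], [-40.0, 95.0]])
extrapolated = flag_extrapolated(prediction_map, train_min=0.0, train_max=36.0)

# Replace flagged pixels with NaN so they are excluded from later statistics.
masked_map = np.where(extrapolated, np.nan, prediction_map)
print(f"Extrapolated pixels: {int(extrapolated.sum())} of {extrapolated.size}")
```

A range check on the output only catches the obvious cases. A background pixel can still receive a plausible-looking value, so a separate sample mask remains useful.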

Downloaded scripts

The downloadable .py files can be run without editing paths by setting environment variables such as HSI_EXAMPLE_BASE_DIR. See Running Downloaded Scripts for the full list of supported overrides.

For the milk-fat examples, set HSI_EXAMPLE_BASE_DIR to the folder containing milk.pam, milk_fat_roi.json, and the matching reference captures: dark_ref.pam and white_ref.pam.
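The override pattern can be sketched with os.environ.get. The variable names and the defaults for the target property, model file, and display range are the ones mentioned on this page; the helper functions, the current-directory fallback for the base folder, and the file check are assumptions made for illustration and may differ from what the downloaded scripts do.

```python
import os
from pathlib import Path


def env_path(name, default):
    """Return a Path from an environment variable, or the default if unset."""
    return Path(os.environ.get(name, default))


def env_float(name, default):
    """Return a float from an environment variable, or the default if unset."""
    return float(os.environ.get(name, default))


# Assumption: fall back to the current directory when no base folder is set.
BASE_DIR = env_path("HSI_EXAMPLE_BASE_DIR", ".")
TARGET_PROPERTY = os.environ.get("HSI_EXAMPLE_TARGET_PROPERTY", "fat")
REGRESSION_MODEL_PATH = env_path("HSI_EXAMPLE_REGRESSION_MODEL", "regression_model.joblib")
DISPLAY_MIN = env_float("HSI_EXAMPLE_REGRESSION_VMIN", 0.5)
DISPLAY_MAX = env_float("HSI_EXAMPLE_REGRESSION_VMAX", 36.0)

# Report missing milk-fat example files before doing any heavy work.
expected_files = ["milk.pam", "milk_fat_roi.json", "dark_ref.pam", "white_ref.pam"]
missing = [name for name in expected_files if not (BASE_DIR / name).exists()]
if missing:
    print(f"Missing in {BASE_DIR}: {', '.join(missing)}")
```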

Train a Regression Model From ROIs

This example assumes you have an annotated milk datacube. The training annotations contain numeric target values in properties["fat"] and class labels in properties["type"]. Set HSI_EXAMPLE_TARGET_PROPERTY if your annotations use another numeric property, such as concentration or value. Background ROIs are ignored; only ROIs with type == "milk" are used for regression.

For reusable annotation loading, plotting, and ROI extraction patterns, see Annotations and ROIs.

Saved regression model

Creates: regression_model.joblib, or HSI_EXAMPLE_REGRESSION_MODEL if set.
Used by: the training-fit check, prediction-map example, and streaming regressor example.
Run first when testing the downloaded regression scripts.

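import joblib
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# hs, make_references, required_data_path, reflectance_calibration,
# annotation_value, and the upper-case constants are assumed to come from the
# downloaded script's setup code, which this excerpt does not show.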


dark_ref, white_ref = make_references()


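def open_absorbance_cube(cube_name):
    img = hs.open(str(required_data_path(cube_name, "milk datacube")))
    # Calibrate against the white and dark references to get reflectance.
    reflectance = reflectance_calibration(img, white_ref, dark_ref, clip=True)
    # Keep reflectance strictly positive so the logarithm below stays finite.
    reflectance = reflectance.ensure_dtype(hs.float32).clip(1e-6, 1.0)
    # Absorbance A = -log10(R); the inner clip repeats the same bounds defensively.
    return reflectance.ufunc(lambda meta, plane: -np.log10(np.clip(plane, 1e-6, 1.0)))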


def load_milk_annotations():
    annotations_path = required_data_path(MILK_ANNOTATIONS, "milk annotations JSON")
    return hs.annotations.open(str(annotations_path))


def extract_regression_pixels(cube, ann_file, target_property=TARGET_PROPERTY):
    pixels_list = []
    targets_list = []

    for annot in ann_file.annotations:
        properties = annot.properties
        if annotation_value(properties.get("type")) != SAMPLE_TYPE:
            continue
        if target_property not in properties:
            continue

        selected = cube.select_mask_from_descriptor(annot.descriptor)
        spectra = selected.to_numpy_with_interleave(hs.bip)[:, 0, :]
        target = float(annotation_value(properties[target_property]))

        pixels_list.append(spectra)
        targets_list.append(np.full(spectra.shape[0], target, dtype=np.float32))

    if not pixels_list:
        raise SystemExit("No labelled milk ROIs found in the annotations file.")
    return np.concatenate(pixels_list), np.concatenate(targets_list)


milk_annotations = load_milk_annotations()
train_absorbance = open_absorbance_cube(MILK_TRAIN_CUBE)
reg_pixels, reg_targets = extract_regression_pixels(train_absorbance, milk_annotations)

reg = make_pipeline(
    StandardScaler(),
    PLSRegression(n_components=8),
)
reg.fit(reg_pixels, reg_targets)

joblib.dump(reg, REGRESSION_MODEL_PATH)

print(f"PLS training pixels: {reg_pixels.shape[0]}")
print(f"{TARGET_PROPERTY} range: {reg_targets.min():.2f} to {reg_targets.max():.2f}")
print(f"Saved regressor to {REGRESSION_MODEL_PATH}")

Download this script

Example output:

PLS training pixels: 12500
fat range: 0.00 to 36.00
Saved regressor to regression_model.joblib

Validation

Do not validate by randomly splitting pixels from the same ROI into train and test sets. Pixels from one physical sample are highly similar, so that split can make the model look better than it really is. Use a separate annotated test cube or independent lab measurements when you need quantitative error metrics.

Other Regression Models

The downloaded script uses PLS because it is a common chemometrics choice for spectral regression. The ROI extraction gives ordinary scikit-learn training data: reg_pixels has shape (n_pixels, n_bands) and reg_targets has one numeric value per pixel. You can swap in another regressor without changing the prediction-map or streaming examples, as long as the saved object provides a scikit-learn-style predict() method.

Other useful starting points:

pls is a standard chemometrics choice for spectral regression.
ridge is a simple linear baseline and is usually fast per line.
random_forest can model non-linear relationships, but often needs more data and can be slower during prediction.

Save and load with joblib

Use joblib to save the whole scikit-learn regressor or pipeline, including any preprocessing steps:

from pathlib import Path

import joblib


def save_regression_model(reg, model_path):
    joblib.dump(reg, model_path)


def load_regression_model(model_path):
    # Accept both str and Path so the exists() check works for either.
    model_path = Path(model_path)
    if not model_path.exists():
        raise SystemExit(
            f"Regression model not found at {model_path}. "
            "Run the first regression example first."
        )
    return joblib.load(model_path)

Evaluate Training Fit on ROIs

Before interpreting a prediction map, check whether the saved model from the training example fits the ROIs it was trained on. This example loads the model from regression_model.joblib, predicts the annotated training ROIs, and prints fit metrics.

The ROI-mean metrics are often the most meaningful summary when the target value comes from a lab measurement of the whole physical sample. Pixel-level metrics are still useful for showing how noisy the prediction is inside each ROI.

Caution

This is a training-fit sanity check, not an independent validation. Good numbers here only show that the model can reproduce the annotated ROIs it was trained with. Use held-out scans or independent physical samples for real performance estimates.

import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score


dark_ref, white_ref = make_references()


def open_absorbance_cube(cube_name):
    img = hs.open(str(required_data_path(cube_name, "milk datacube")))
    reflectance = reflectance_calibration(img, white_ref, dark_ref, clip=True)
    reflectance = reflectance.ensure_dtype(hs.float32).clip(1e-6, 1.0)
    return reflectance.ufunc(lambda meta, plane: -np.log10(np.clip(plane, 1e-6, 1.0)))


def load_milk_annotations():
    annotations_path = required_data_path(MILK_ANNOTATIONS, "milk annotations JSON")
    return hs.annotations.open(str(annotations_path))


def extract_roi_records(cube, ann_file, target_property=TARGET_PROPERTY):
    roi_records = []

    for roi_index, annot in enumerate(ann_file.annotations):
        properties = annot.properties
        if annotation_value(properties.get("type")) != SAMPLE_TYPE:
            continue
        if target_property not in properties:
            continue

        selected = cube.select_mask_from_descriptor(annot.descriptor)
        spectra = selected.to_numpy_with_interleave(hs.bip)[:, 0, :]
        target = float(annotation_value(properties[target_property]))

        roi_records.append(
            {
                "name": f"ROI {roi_index}",
                "spectra": spectra,
                "target": target,
            }
        )

    if not roi_records:
        raise SystemExit("No labelled milk ROIs found in the annotations file.")
    return roi_records


reg = load_regression_model(REGRESSION_MODEL_PATH)
milk_annotations = load_milk_annotations()
train_absorbance = open_absorbance_cube(MILK_TRAIN_CUBE)
roi_records = extract_roi_records(train_absorbance, milk_annotations)

# This is a training-fit check: it measures how well the saved model predicts
# the same labelled ROIs it was trained from. Use separate scans/ROIs for a real
# validation score.
pixel_true = []
pixel_pred = []
roi_true = []
roi_pred = []

for record in roi_records:
    predictions = reg.predict(record["spectra"]).ravel()
    target_values = np.full(predictions.shape, record["target"], dtype=np.float32)

    pixel_true.append(target_values)
    pixel_pred.append(predictions)
    roi_true.append(record["target"])
    roi_pred.append(float(predictions.mean()))

    print(
        f"{record['name']}: target={record['target']:.2f}, "
        f"predicted mean={predictions.mean():.2f}, "
        f"pixel range={predictions.min():.2f} to {predictions.max():.2f}"
    )

pixel_true = np.concatenate(pixel_true)
pixel_pred = np.concatenate(pixel_pred)
roi_true = np.asarray(roi_true)
roi_pred = np.asarray(roi_pred)

pixel_rmse = np.sqrt(mean_squared_error(pixel_true, pixel_pred))
roi_rmse = np.sqrt(mean_squared_error(roi_true, roi_pred))

print("\nPixel-level training-fit metrics")
print(f"MAE:  {mean_absolute_error(pixel_true, pixel_pred):.2f}")
print(f"RMSE: {pixel_rmse:.2f}")
print(f"R2:   {r2_score(pixel_true, pixel_pred):.3f}")

print("\nROI-mean training-fit metrics")
print(f"MAE:  {mean_absolute_error(roi_true, roi_pred):.2f}")
print(f"RMSE: {roi_rmse:.2f}")
print(f"R2:   {r2_score(roi_true, roi_pred):.3f}")

Download this script

Example output:

Pixel-level training-fit metrics
MAE:  1.86
RMSE: 2.38
R2:   0.971

ROI-mean training-fit metrics
MAE:  0.59
RMSE: 0.80
R2:   0.997

Predict a Regression Map

After training, wrap the regression model with hs.util.predictor() and apply it lazily to a test cube. The result is an Image with one numeric prediction per spatial pixel. Use the same preprocessing used during training; here that means absorbance.

The visualization is clipped to the expected milk-fat range (0.5 to 36.0 by default) so a few extrapolated pixels do not flatten the color scale. Override this with HSI_EXAMPLE_REGRESSION_VMIN and HSI_EXAMPLE_REGRESSION_VMAX when using another target range.

import matplotlib.pyplot as plt
import numpy as np
from qtec_hv_sdk.util import predictor


reg = load_regression_model(REGRESSION_MODEL_PATH)
test_absorbance = open_absorbance_cube(MILK_TEST_CUBE)

start_line = 0
n_lines = 800
test_crop = test_absorbance[start_line:start_line + n_lines, :, :]

hs_regressor = predictor(reg)
prediction = hs_regressor(test_crop)
prediction_map = prediction.to_numpy_with_interleave(hs.bip)[:, :, 0]
display_map = np.clip(prediction_map, DISPLAY_MIN, DISPLAY_MAX)

print(
    f"Raw predicted {TARGET_PROPERTY}: "
    f"{prediction_map.min():.2f} to {prediction_map.max():.2f}"
)

image = plt.imshow(
    display_map,
    cmap="viridis",
    vmin=DISPLAY_MIN,
    vmax=DISPLAY_MAX,
)
plt.colorbar(image, label=f"Predicted {TARGET_PROPERTY}", extend="both")
plt.title(f"{TARGET_PROPERTY} prediction, clipped for display")
plt.axis("off")
plt.show()

Download this script

Camera pipeline

For camera deployment, use the saved regressor as described in the streaming chapter. The stream with a saved regressor example there shows how to combine cam.to_hs_image(), absorbance preprocessing, and hs.util.predictor() in one lazy camera pipeline.

For quantitative work, validate against independent samples not used for training.
