Regression
Regression predicts a continuous value instead of a class label. Typical HSI examples are moisture, fat, protein, sugar content, dilution, concentration, or any lab-measured quality value.
For problems with a chemical concentration basis — such as fat content in milk — absorbance is often a better modelling space than reflectance because it is closer to the linear Beer-Lambert relationship used in spectroscopy. Still validate both on held-out samples; scattering, path length, surface roughness, and illumination differences can all change what works best.
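As a rough illustration of why absorbance can be the more linear space, the sketch below simulates an idealized, scatter-free Beer-Lambert sample with NumPy. The absorptivity, path length, and concentration values are made-up assumptions, not milk measurements, and real samples will deviate from this ideal for the reasons listed above.

```python
import numpy as np

# Hypothetical values for illustration only (not measured data).
absorptivity = 0.02  # absorbance per unit concentration per unit path length
path_length = 1.0
concentration = np.array([0.0, 5.0, 10.0, 20.0, 40.0])

# Idealized Beer-Lambert: A = absorptivity * path_length * concentration,
# and the measured light fraction is R = 10 ** (-A).
reflectance = 10.0 ** (-absorptivity * path_length * concentration)

# Convert back the same way the examples on this page do: clip, then -log10.
absorbance = -np.log10(np.clip(reflectance, 1e-6, 1.0))

# Pearson correlation with concentration: absorbance is linear in this ideal
# case, reflectance follows an exponential curve instead.
corr_absorbance = np.corrcoef(concentration, absorbance)[0, 1]
corr_reflectance = np.corrcoef(concentration, reflectance)[0, 1]
print(f"|corr| absorbance:  {abs(corr_absorbance):.4f}")
print(f"|corr| reflectance: {abs(corr_reflectance):.4f}")
```

In this toy setting a linear model fits absorbance exactly, while reflectance would need a non-linear model or more components to reach the same fit.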
Use the same preprocessing, band count, band order, and wavelength calibration for training, prediction maps, and camera pipelines. Regression models can extrapolate far outside the training range on background pixels or unfamiliar samples, so validate on independent scans or physical samples before using the numbers quantitatively.
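One simple guard against reading extrapolated numbers as measurements is to flag predictions that fall outside the target range seen during training. The sketch below is a minimal NumPy version; `flag_extrapolated` and its `margin` parameter are hypothetical names, and the 0 to 36 range is taken from the example training output on this page.

```python
import numpy as np


def flag_extrapolated(predictions, train_min, train_max, margin=0.0):
    """Return a boolean mask that is True where a prediction lies outside
    the training target range, optionally widened by margin."""
    predictions = np.asarray(predictions, dtype=float)
    return (predictions < train_min - margin) | (predictions > train_max + margin)


# Made-up prediction map: the last column mimics background pixels.
prediction_map = np.array([[3.5, 12.0, -40.0],
                           [0.2, 35.9, 250.0]])
outside = flag_extrapolated(prediction_map, train_min=0.0, train_max=36.0)
print(outside)
print(f"{outside.mean():.0%} of pixels are outside the training range")
```

A mask like this only catches out-of-range outputs. A pixel from an unfamiliar material can still produce an in-range but meaningless value, so it does not replace validation on independent samples.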
The downloadable .py files can be run without editing paths by setting environment variables such as HSI_EXAMPLE_BASE_DIR. See Running Downloaded Scripts for the full list of supported overrides.
For the milk-fat examples, set HSI_EXAMPLE_BASE_DIR to the folder containing milk.pam, milk_fat_roi.json, and the matching reference captures, dark_ref.pam and white_ref.pam.
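The override pattern can be sketched with the standard library as follows. This is an illustration of the idea only: `example_path` is a hypothetical helper, `/data/milk` is a placeholder folder, and the downloaded scripts may resolve paths differently.

```python
import os
from pathlib import Path


def example_path(filename, env_var="HSI_EXAMPLE_BASE_DIR", default_dir="."):
    """Resolve filename inside the directory named by env_var, falling back
    to default_dir when the variable is not set."""
    base_dir = Path(os.environ.get(env_var, default_dir))
    return base_dir / filename


# Normally this is set in the shell before running the script.
os.environ["HSI_EXAMPLE_BASE_DIR"] = "/data/milk"

for name in ("milk.pam", "milk_fat_roi.json", "dark_ref.pam", "white_ref.pam"):
    print(example_path(name))
```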
Train a Regression Model From ROIs
This example assumes you have an annotated milk datacube. The training
annotations contain numeric target values in properties["fat"] and class
labels in properties["type"]. Set HSI_EXAMPLE_TARGET_PROPERTY if your
annotations use another numeric property, such as concentration or value.
Background ROIs are ignored; only ROIs with type == "milk" are used for
regression.
For reusable annotation loading, plotting, and ROI extraction patterns, see Annotations and ROIs.
- Creates: regression_model.joblib, or HSI_EXAMPLE_REGRESSION_MODEL if set.
- Used by: the training-fit check, prediction-map example, and streaming regressor example.
- Run first when testing the downloaded regression scripts.
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

dark_ref, white_ref = make_references()


def open_absorbance_cube(cube_name):
    img = hs.open(str(required_data_path(cube_name, "milk datacube")))
    reflectance = reflectance_calibration(img, white_ref, dark_ref, clip=True)
    # Keep reflectance strictly positive so the logarithm below stays finite.
    reflectance = reflectance.ensure_dtype(hs.float32).clip(1e-6, 1.0)
    # Absorbance = -log10(reflectance).
    return reflectance.ufunc(lambda meta, plane: -np.log10(np.clip(plane, 1e-6, 1.0)))


def load_milk_annotations():
    annotations_path = required_data_path(MILK_ANNOTATIONS, "milk annotations JSON")
    return hs.annotations.open(str(annotations_path))


def extract_regression_pixels(cube, ann_file, target_property=TARGET_PROPERTY):
    pixels_list = []
    targets_list = []
    for annot in ann_file.annotations:
        properties = annot.properties
        # Skip background ROIs and ROIs without a numeric target.
        if annotation_value(properties.get("type")) != SAMPLE_TYPE:
            continue
        if target_property not in properties:
            continue
        selected = cube.select_mask_from_descriptor(annot.descriptor)
        spectra = selected.to_numpy_with_interleave(hs.bip)[:, 0, :]
        target = float(annotation_value(properties[target_property]))
        pixels_list.append(spectra)
        # Every pixel in the ROI gets the ROI's target value.
        targets_list.append(np.full(spectra.shape[0], target, dtype=np.float32))
    if not pixels_list:
        raise SystemExit("No labelled milk ROIs found in the annotations file.")
    return np.concatenate(pixels_list), np.concatenate(targets_list)


milk_annotations = load_milk_annotations()
train_absorbance = open_absorbance_cube(MILK_TRAIN_CUBE)
reg_pixels, reg_targets = extract_regression_pixels(train_absorbance, milk_annotations)

reg = make_pipeline(
    StandardScaler(),
    PLSRegression(n_components=8),
)
reg.fit(reg_pixels, reg_targets)
joblib.dump(reg, REGRESSION_MODEL_PATH)

print(f"PLS training pixels: {reg_pixels.shape[0]}")
print(f"{TARGET_PROPERTY} range: {reg_targets.min():.2f} to {reg_targets.max():.2f}")
print(f"Saved regressor to {REGRESSION_MODEL_PATH}")
Example output:
PLS training pixels: 12500
fat range: 0.00 to 36.00
Saved regressor to regression_model.joblib
Do not validate by randomly splitting pixels from the same ROI into train and test sets. Pixels from one physical sample are highly similar, so that split can make the model look better than it really is. Use a separate annotated test cube or independent lab measurements when you need quantitative error metrics.
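When a separate test cube is not yet available, holding out whole ROIs is a weaker interim check that at least avoids mixing pixels from one physical sample across both sides of the split. The sketch below is a minimal NumPy version; `split_by_roi` and the ROI ids are hypothetical, and scikit-learn's `GroupKFold` offers a cross-validated variant of the same idea.

```python
import numpy as np


def split_by_roi(roi_ids, test_fraction=0.25, seed=0):
    """Return boolean (train_mask, test_mask) over pixels so that every ROI
    lands entirely on one side of the split."""
    roi_ids = np.asarray(roi_ids)
    unique_rois = np.unique(roi_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique_rois)
    n_test = max(1, int(round(test_fraction * len(unique_rois))))
    test_mask = np.isin(roi_ids, shuffled[:n_test])
    return ~test_mask, test_mask


# Made-up example: 4 ROIs with 3 pixels each.
roi_ids = np.repeat([0, 1, 2, 3], 3)
train_mask, test_mask = split_by_roi(roi_ids)
print("train ROIs:", np.unique(roi_ids[train_mask]))
print("test ROIs: ", np.unique(roi_ids[test_mask]))
```

ROIs from the same scan still share illumination and calibration, so numbers from this split remain optimistic compared with a separate test cube.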
The downloaded script uses PLS because it is a common chemometrics choice for
spectral regression. The ROI extraction gives ordinary scikit-learn training
data: reg_pixels has shape (n_pixels, n_bands) and reg_targets has one
numeric value per pixel. You can swap in another regressor without changing the
prediction-map or streaming examples, as long as the saved object provides a
scikit-learn-style predict() method.
Other useful starting points:
- pls is a standard chemometrics choice for spectral regression.
- ridge is a simple linear baseline and is usually fast per line.
- random_forest can model non-linear relationships, but often needs more data and can be slower during prediction.
Save and load with joblib
Use joblib to save the whole scikit-learn regressor or pipeline, including
any preprocessing steps:
from pathlib import Path

import joblib


def save_regression_model(reg, model_path):
    joblib.dump(reg, model_path)


def load_regression_model(model_path):
    model_path = Path(model_path)  # accept both str and Path
    if not model_path.exists():
        raise SystemExit(
            f"Regression model not found at {model_path}. "
            "Run the first regression example first."
        )
    return joblib.load(model_path)
Evaluate Training Fit on ROIs
Before interpreting a prediction map, first check whether the saved model from
the training example fits the ROIs it was trained on. This example loads the
model from regression_model.joblib, predicts the annotated training ROIs, and
prints fit metrics.
The ROI-mean metrics are often the most meaningful summary when the target value comes from a lab measurement of the whole physical sample. Pixel-level metrics are still useful for showing how noisy the prediction is inside each ROI.
This is a training-fit sanity check, not an independent validation. Good numbers here only show that the model can reproduce the annotated ROIs it was trained with. Use held-out scans or independent physical samples for real performance estimates.
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

dark_ref, white_ref = make_references()


def open_absorbance_cube(cube_name):
    img = hs.open(str(required_data_path(cube_name, "milk datacube")))
    reflectance = reflectance_calibration(img, white_ref, dark_ref, clip=True)
    reflectance = reflectance.ensure_dtype(hs.float32).clip(1e-6, 1.0)
    return reflectance.ufunc(lambda meta, plane: -np.log10(np.clip(plane, 1e-6, 1.0)))


def load_milk_annotations():
    annotations_path = required_data_path(MILK_ANNOTATIONS, "milk annotations JSON")
    return hs.annotations.open(str(annotations_path))


def extract_roi_records(cube, ann_file, target_property=TARGET_PROPERTY):
    roi_records = []
    for roi_index, annot in enumerate(ann_file.annotations):
        properties = annot.properties
        if annotation_value(properties.get("type")) != SAMPLE_TYPE:
            continue
        if target_property not in properties:
            continue
        selected = cube.select_mask_from_descriptor(annot.descriptor)
        spectra = selected.to_numpy_with_interleave(hs.bip)[:, 0, :]
        target = float(annotation_value(properties[target_property]))
        roi_records.append(
            {
                "name": f"ROI {roi_index}",
                "spectra": spectra,
                "target": target,
            }
        )
    if not roi_records:
        raise SystemExit("No labelled milk ROIs found in the annotations file.")
    return roi_records


reg = load_regression_model(REGRESSION_MODEL_PATH)
milk_annotations = load_milk_annotations()
train_absorbance = open_absorbance_cube(MILK_TRAIN_CUBE)
roi_records = extract_roi_records(train_absorbance, milk_annotations)

# This is a training-fit check: it measures how well the saved model predicts
# the same labelled ROIs it was trained from. Use separate scans/ROIs for a real
# validation score.
pixel_true = []
pixel_pred = []
roi_true = []
roi_pred = []
for record in roi_records:
    predictions = reg.predict(record["spectra"]).ravel()
    target_values = np.full(predictions.shape, record["target"], dtype=np.float32)
    pixel_true.append(target_values)
    pixel_pred.append(predictions)
    roi_true.append(record["target"])
    roi_pred.append(float(predictions.mean()))
    print(
        f"{record['name']}: target={record['target']:.2f}, "
        f"predicted mean={predictions.mean():.2f}, "
        f"pixel range={predictions.min():.2f} to {predictions.max():.2f}"
    )

pixel_true = np.concatenate(pixel_true)
pixel_pred = np.concatenate(pixel_pred)
roi_true = np.asarray(roi_true)
roi_pred = np.asarray(roi_pred)

pixel_rmse = np.sqrt(mean_squared_error(pixel_true, pixel_pred))
roi_rmse = np.sqrt(mean_squared_error(roi_true, roi_pred))

print("\nPixel-level training-fit metrics")
print(f"MAE: {mean_absolute_error(pixel_true, pixel_pred):.2f}")
print(f"RMSE: {pixel_rmse:.2f}")
print(f"R2: {r2_score(pixel_true, pixel_pred):.3f}")

print("\nROI-mean training-fit metrics")
print(f"MAE: {mean_absolute_error(roi_true, roi_pred):.2f}")
print(f"RMSE: {roi_rmse:.2f}")
print(f"R2: {r2_score(roi_true, roi_pred):.3f}")
Example output:
Pixel-level training-fit metrics
MAE: 1.86
RMSE: 2.38
R2: 0.971
ROI-mean training-fit metrics
MAE: 0.59
RMSE: 0.80
R2: 0.997
Predict a Regression Map
After training, wrap the regression model with hs.util.predictor() and apply
it lazily to a test cube. The result is an Image with one numeric prediction
per spatial pixel. Use the same preprocessing used during training; here that
means absorbance.
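A cheap sanity check before predicting is to compare the cube's band count with the number of features the model was fitted on. The sketch below assumes the saved object exposes scikit-learn's `n_features_in_` attribute, which fitted scikit-learn estimators and pipelines normally do. `check_band_count`, the stand-in class, and the 224-band figure are hypothetical.

```python
import numpy as np


def check_band_count(model, spectra):
    """Raise if spectra do not have the band count the model was fitted on."""
    spectra = np.asarray(spectra)
    expected = getattr(model, "n_features_in_", None)
    if expected is None:
        return  # model does not record its input width; nothing to compare
    if spectra.shape[-1] != expected:
        raise ValueError(
            f"Model was fitted on {expected} bands, got {spectra.shape[-1]}."
        )


class FittedStandIn:
    """Stand-in for a fitted scikit-learn estimator (illustration only)."""
    n_features_in_ = 224


check_band_count(FittedStandIn(), np.zeros((10, 224)))  # passes silently
try:
    check_band_count(FittedStandIn(), np.zeros((10, 200)))  # wrong band count
except ValueError as error:
    print(error)
```

This only catches a mismatched band count. It cannot detect a changed band order, a shifted wavelength calibration, or reflectance passed where absorbance was expected.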
The visualization is clipped to the expected milk-fat range (0.5 to 36.0 by
default) so a few extrapolated pixels do not flatten the color scale. Override
this with HSI_EXAMPLE_REGRESSION_VMIN and HSI_EXAMPLE_REGRESSION_VMAX when
using another target range.
import matplotlib.pyplot as plt
from qtec_hv_sdk.util import predictor

reg = load_regression_model(REGRESSION_MODEL_PATH)
test_absorbance = open_absorbance_cube(MILK_TEST_CUBE)

# Predict on a crop of the test cube: n_lines lines starting at start_line.
start_line = 0
n_lines = 800
test_crop = test_absorbance[start_line:start_line + n_lines, :, :]

hs_regressor = predictor(reg)
prediction = hs_regressor(test_crop)
prediction_map = prediction.to_numpy_with_interleave(hs.bip)[:, :, 0]

# Clip only the displayed copy; the raw range is still reported below.
display_map = np.clip(prediction_map, DISPLAY_MIN, DISPLAY_MAX)
print(
    f"Raw predicted {TARGET_PROPERTY}: "
    f"{prediction_map.min():.2f} to {prediction_map.max():.2f}"
)

image = plt.imshow(
    display_map,
    cmap="viridis",
    vmin=DISPLAY_MIN,
    vmax=DISPLAY_MAX,
)
plt.colorbar(image, label=f"Predicted {TARGET_PROPERTY}", extend="both")
plt.title(f"{TARGET_PROPERTY} prediction, clipped for display")
plt.axis("off")
plt.show()
For camera deployment, see the streaming chapter. Its "stream with a saved regressor" example shows how to combine cam.to_hs_image(), absorbance preprocessing, and hs.util.predictor() in one lazy camera pipeline.
For quantitative work, validate against independent samples not used for training.