
Classification

Classification workflows use labeled spectra to assign a class to every pixel, or only to the pixels that belong to foreground objects.

Downloaded scripts

The downloadable .py files can be run without editing paths by setting environment variables such as HSI_EXAMPLE_BASE_DIR. See Running Downloaded Scripts for the full list of supported overrides.

Lazy Loading

The SDK uses lazy processing. Opening a cube or slicing an Image only describes the work to do; data is not read until you export to NumPy, write a file, display an image, or otherwise request actual values.

Data requirements

Train and predict in the same data space. The training cube, test cube, and camera pipeline should use the same reflectance or absorbance preprocessing, band count, band order, and wavelength calibration. Use independent scans, held-out ROIs, or separate physical samples when you need a real performance estimate.
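
A minimal sketch of a pre-flight check for the rule above, assuming you can obtain each cube's wavelength axis as a 1-D array. The function name check_same_data_space and the tol_nm tolerance are hypothetical and not part of the SDK. The check only covers band count, band order, and wavelength calibration; matching reflectance or absorbance preprocessing still has to be confirmed separately.

```python
import numpy as np


def check_same_data_space(train_wavelengths, test_wavelengths, tol_nm=0.5):
    """Return a list of problems found; an empty list means the axes match.

    Hypothetical helper, not part of the SDK. tol_nm is an assumed tolerance.
    """
    train = np.asarray(train_wavelengths, dtype=float)
    test = np.asarray(test_wavelengths, dtype=float)
    problems = []

    if train.shape != test.shape:
        problems.append(f"band count differs: {train.size} vs {test.size}")
        return problems  # element-wise checks need equal lengths

    # Band order: both axes should run in the same direction.
    if np.any(np.sign(np.diff(train)) != np.sign(np.diff(test))):
        problems.append("band order differs")

    # Wavelength calibration: band centers should agree within tolerance.
    max_shift = float(np.max(np.abs(train - test)))
    if max_shift > tol_nm:
        problems.append(f"wavelengths differ by up to {max_shift:.2f} nm")

    return problems


train_axis = np.linspace(430.0, 1000.0, 200)
print(check_same_data_space(train_axis, train_axis))        # []
print(check_same_data_space(train_axis, train_axis[::-1]))  # order and shift problems
print(check_same_data_space(train_axis, train_axis[:150]))  # band count problem
```

Running a check like this before prediction turns a silent mismatch into an explicit message, which is usually easier to debug than a classifier that quietly produces poor labels.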

Train a Pixel Classifier From ROIs

The classification workflow is:

  1. Calibrate the training cube.
  2. Extract pixels from annotated ROIs.
  3. Fit a classifier on (n_pixels, n_bands) spectra.
  4. Save the trained model for later examples.

This example loads annotations exported from HV Explorer. Rectangle, polygon, and ellipse ROIs are supported. The SDK annotation loader provides descriptors that can be passed directly to select_mask_from_descriptor() to extract ROI spectra.

For reusable annotation loading, plotting, and ROI extraction patterns, see Annotations and ROIs.

Saved classifier model
  • Creates: pixel_classifier.joblib, or HSI_EXAMPLE_CLASSIFIER_MODEL if set.
  • Used by: the training-ROI check, scan-line classification, result visualization, foreground examples, and classifier streaming examples.
  • Run first when testing the downloaded classification scripts.

reflectance = open_reflectance_cube()
ann_file = load_annotations()

pixels_list = []
labels_list = []

for annot in ann_file.annotations:
    class_name = annotation_value(annot.properties["type"])
    selected = reflectance.select_mask_from_descriptor(annot.descriptor)
    spectra = selected.to_numpy_with_interleave(hs.bip)[:, 0, :]
    pixels_list.append(spectra)
    labels_list.append(np.full(spectra.shape[0], class_name))

pixels = np.concatenate(pixels_list)
labels = np.concatenate(labels_list)

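model_name = "nearest_centroid"
clf = NearestCentroid()
clf.fit(pixels, labels)
# Report the training set size, as shown in the example output.
print(f"{model_name} training pixels: {pixels.shape[0]}")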

# Save model
joblib.dump(clf, CLASSIFIER_MODEL_PATH)

print(f"Classes: {list(clf.classes_)}")
print(f"Saved classifier to {CLASSIFIER_MODEL_PATH}")

Download this script

Example output:

nearest_centroid training pixels: 15000
Classes: ['almond' 'background' 'hazelnut' ...]
Saved classifier to pixel_classifier.joblib

Save and load with joblib

Use joblib to save the whole scikit-learn classifier or pipeline, including any preprocessing steps:

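from pathlib import Path

import joblib


# CLASSIFIER_MODEL_PATH is the path constant used by the training example.
def save_classifier_model(clf, model_path=CLASSIFIER_MODEL_PATH):
    joblib.dump(clf, model_path)


def load_classifier_model(model_path=CLASSIFIER_MODEL_PATH):
    # Accept str or Path; the later examples call this without arguments.
    model_path = Path(model_path)
    if not model_path.exists():
        raise SystemExit(
            f"Classifier model not found at {model_path}. "
            "Run the first classification example first."
        )
    return joblib.load(model_path)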
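
The following sketch illustrates the point about saving a whole pipeline. It uses synthetic spectra and a temporary directory, both of which are assumptions made for the example, and checks that the restored object still contains the scaler and predicts the same labels.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for ROI spectra: 2 classes, 20 bands.
rng = np.random.default_rng(0)
pixels = np.vstack([
    rng.normal(0.2, 0.02, size=(50, 20)),
    rng.normal(0.6, 0.02, size=(50, 20)),
])
labels = np.array(["background"] * 50 + ["almond"] * 50)

pipeline = make_pipeline(StandardScaler(), NearestCentroid())
pipeline.fit(pixels, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "pixel_classifier.joblib"
    joblib.dump(pipeline, model_path)
    restored = joblib.load(model_path)

# The restored object is the full pipeline, scaler included.
step_names = [name for name, _ in restored.steps]
same_predictions = np.array_equal(pipeline.predict(pixels), restored.predict(pixels))
print(step_names)
print("Predictions match:", same_predictions)
```

scikit-learn does not guarantee that a model saved with one version loads correctly in another, so it can help to record the scikit-learn version next to the saved file.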

Load a model exported from HV Explorer

HV Explorer exports classification models as .pkl files. Each file is a pickled Python dict with four keys: model (the scikit-learn estimator), property_name (the annotation property used as class label, e.g. "type"), name, and type. The companion annotations .json file contains a property_desc field whose labels list maps each numeric class id back to the original class name: labels[i] is the name for clf.classes_[i].

Set HSI_EXAMPLE_HV_EXPLORER_MODEL to your .pkl file and HSI_EXAMPLE_HV_EXPLORER_ANNOTATIONS to the matching .json before running. Relative paths are resolved under HSI_EXAMPLE_BASE_DIR; absolute paths can be used when the exported model or annotations live somewhere else.

import pickle

from qtec_hv_sdk.util import predictor


def load_hv_explorer_model():
    model_path = path_from_base(HV_EXPLORER_MODEL)
    if not model_path.exists():
        raise SystemExit(
            f"HV Explorer model not found at {model_path}. "
            "Export a classifier from HV Explorer and set HSI_EXAMPLE_HV_EXPLORER_MODEL."
        )
    with open(model_path, "rb") as f:
        return pickle.load(f)


hv_model = load_hv_explorer_model()

# HV Explorer stores the fitted sklearn-compatible model together with the
# annotation property that was used as the class label.
clf = hv_model["model"]
property_name = hv_model["property_name"]

ann_file = load_hv_explorer_annotations()

# The exported model predicts numeric class IDs. The annotation file keeps the
# human-readable label names in the same order.
labels = ann_file.property_desc[property_name].labels
id_to_label = dict(enumerate(labels))

print(f"Model: {hv_model['name']!r} type: {hv_model['type']}")
print(f"Property: {property_name!r}")
print(f"Classes: {id_to_label}")

start_line = 0
n_lines = 300

test_reflectance = open_reflectance_cube()
crop = test_reflectance[start_line:start_line + n_lines, :, :]

# predictor() lets the model run directly on the lazy Image crop.
hs_clf = predictor(clf)
prediction = hs_clf(crop)
label_map = prediction.to_numpy_with_interleave(hs.bip)[:, :, 0].astype(int)

ids, counts = np.unique(label_map, return_counts=True)
print("\nPer-class pixel counts:")
for class_id, count in zip(ids, counts):
    label = id_to_label.get(int(class_id), str(class_id))
    print(f" {label}: {count}")

Download this script

Evaluate Training ROI Predictions

After fitting the classifier, evaluate the saved model on the same ROI pixels used for training. This is not an independent validation score, but it is a useful sanity check: it catches mismatched preprocessing, missing model files, label mapping mistakes, and annotations that do not match the cube.

Training-set check

High accuracy here does not prove the model generalizes. Use a separate scan, held-out ROIs, or cross-validation when you need a real performance estimate. This example is mainly a quick sanity check that the saved model, preprocessing, and annotation labels are aligned.

reflectance = open_reflectance_cube()
clf = load_classifier_model()
ann_file = load_annotations()

expected_list = []
predicted_list = []

for annot in ann_file.annotations:
    expected_class = annotation_value(annot.properties["type"])
    selected = reflectance.select_mask_from_descriptor(annot.descriptor)
    spectra = selected.to_numpy_with_interleave(hs.bip)[:, 0, :]

    expected_list.append(np.full(spectra.shape[0], expected_class))
    predicted_list.append(clf.predict(spectra))

expected = np.concatenate(expected_list)
predicted = np.concatenate(predicted_list)

classes = np.array(clf.classes_)
accuracy = np.mean(predicted == expected)
matrix = confusion_matrix(expected, predicted, labels=classes)
per_class_accuracy = matrix.diagonal() / np.maximum(matrix.sum(axis=1), 1)

print(f"Training ROI pixel accuracy: {accuracy:.3f}")
for class_name, class_accuracy in zip(classes, per_class_accuracy):
    print(f"{class_name}: {class_accuracy:.3f}")

print("Confusion matrix rows=true, columns=predicted")
print(classes)
print(matrix)

Download this script
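
When a separate scan is not available, held-out ROIs are one way to get closer to a real performance estimate. The sketch below uses scikit-learn's GroupKFold so that all pixels from one ROI stay in the same fold. The data is synthetic, and the per-ROI group ids are an assumption: in the ROI loop above you would have to record one id per annotation yourself.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.neighbors import NearestCentroid

# Synthetic stand-in: 6 ROIs, 100 pixels each, 30 bands, 2 classes.
rng = np.random.default_rng(0)
n_rois, pixels_per_roi, n_bands = 6, 100, 30
roi_classes = ["almond", "hazelnut"] * 3
class_level = {"almond": 0.3, "hazelnut": 0.5}

pixels_list, labels_list, groups_list = [], [], []
for roi_id, class_name in enumerate(roi_classes):
    # Each ROI gets its own small offset to mimic ROI-to-ROI variation.
    roi_offset = rng.normal(0.0, 0.02)
    spectra = rng.normal(
        class_level[class_name] + roi_offset, 0.03, size=(pixels_per_roi, n_bands)
    )
    pixels_list.append(spectra)
    labels_list.append(np.full(pixels_per_roi, class_name))
    groups_list.append(np.full(pixels_per_roi, roi_id))

pixels = np.concatenate(pixels_list)
labels = np.concatenate(labels_list)
groups = np.concatenate(groups_list)

# GroupKFold keeps all pixels of one ROI in the same fold, so every
# test fold consists of ROIs the classifier has not seen.
cv = GroupKFold(n_splits=3)
scores = cross_val_score(NearestCentroid(), pixels, labels, groups=groups, cv=cv)
print("Held-out ROI accuracy per fold:", np.round(scores, 3))
```

Splitting by ROI rather than by pixel matters because neighboring pixels in one ROI are strongly correlated; a random pixel split would leak near-duplicates into the test fold and inflate the score.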

Nearest centroid

Nearest centroid is a useful baseline because it stores one average spectrum per class and is fast to apply.
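
To make the "one average spectrum per class" idea concrete, here is a NumPy-only sketch of what a nearest-centroid classifier computes with the Euclidean metric, which is the default metric of scikit-learn's NearestCentroid. The spectra are synthetic and the variable names are chosen for this illustration only.

```python
import numpy as np

# Synthetic stand-in for ROI spectra: 3 classes, 40 pixels each, 25 bands.
rng = np.random.default_rng(0)
class_names = np.array(["almond", "background", "hazelnut"])
levels = [0.3, 0.1, 0.5]
pixels = np.vstack([rng.normal(level, 0.02, size=(40, 25)) for level in levels])
labels = np.repeat(class_names, 40)

# "Training": one mean spectrum per class, shape (n_classes, n_bands).
classes = np.unique(labels)
centroids = np.stack([pixels[labels == c].mean(axis=0) for c in classes])

# "Prediction": assign each pixel to the class with the closest centroid.
# distances has shape (n_pixels, n_classes).
distances = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
predicted = classes[np.argmin(distances, axis=1)]

print("Centroid array shape:", centroids.shape)
print("Training accuracy:", np.mean(predicted == labels))
```

The stored model is just the centroid array, which is why the classifier is small on disk and cheap to apply per pixel.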

Other Pixel Classifiers

The ROI extraction above gives ordinary scikit-learn training data: pixels has shape (n_pixels, n_bands) and labels has one class name per pixel. You can swap in different classifiers without changing the later line classification or visualization code. For example, a linear SVM:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

model_name = "linear_svm"
clf = make_pipeline(
    StandardScaler(),
    LinearSVC(C=1.0, class_weight="balanced", dual=False, max_iter=10000),
)
clf.fit(pixels, labels)

Using Mean Spectra Instead of ROI Pixels

HV Explorer can export the average spectrum for each ROI from the spectra analysis view. The examples in this section use that export as a compact training set, similar to using a small external spectral library.

Set HSI_EXAMPLE_MEAN_SPECTRA to a CSV spectra export from HV Explorer. The file must use the wide layout: one row per ROI and one numeric column per wavelength. It must also include a class property column (default: type, override with HSI_EXAMPLE_CLASS_PROPERTY). The examples do not load a separate annotations file, so class labels must be present in the export itself.

name,file,type,430.0,431.412,432.825,...
almond1,mix1.pam,almond,0.135,0.138,0.141,...

Use mean spectra when you want to classify from a small set of representative ROI spectra instead of every pixel in every ROI. The exported spectra and the datacube you classify must be in the same preprocessing space. These examples open the cube as reflectance, so export the mean spectra from reflectance-calibrated data. If you export absorbance spectra instead, classify absorbance pixels.

Spectral Libraries

This is the same basic pattern you would use with a spectral library: load reference spectra, align them to the cube wavelengths, make sure the library and cube use the same preprocessing, and then train or match against those references.
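
The alignment step can be sketched with linear interpolation. The helper name resample_to_cube is hypothetical, and linear interpolation is an assumption that suits library spectra sampled at least as finely as the cube. np.interp holds the edge value outside the library range instead of raising an error, so the sketch also reports which cube bands the library actually covers.

```python
import numpy as np


def resample_to_cube(library_wavelengths, library_spectra, cube_wavelengths):
    """Linearly resample library spectra onto the cube's band centers.

    Hypothetical helper. library_spectra has shape (n_spectra, n_library_bands).
    Returns the resampled spectra and a boolean coverage mask per cube band.
    """
    library_wavelengths = np.asarray(library_wavelengths, dtype=float)
    cube_wavelengths = np.asarray(cube_wavelengths, dtype=float)
    library_spectra = np.asarray(library_spectra, dtype=float)

    # np.interp needs increasing x values, so sort the library axis first.
    order = np.argsort(library_wavelengths)
    library_wavelengths = library_wavelengths[order]
    library_spectra = library_spectra[:, order]

    # Flag cube bands that fall outside the library's wavelength range.
    covered = (
        (cube_wavelengths >= library_wavelengths[0])
        & (cube_wavelengths <= library_wavelengths[-1])
    )

    resampled = np.vstack([
        np.interp(cube_wavelengths, library_wavelengths, spectrum)
        for spectrum in library_spectra
    ])
    return resampled.astype(np.float32), covered


library_axis = np.array([430.0, 440.0, 450.0, 460.0])
library = np.array([[0.10, 0.20, 0.30, 0.40]])
cube_axis = np.array([425.0, 435.0, 455.0])

resampled, covered = resample_to_cube(library_axis, library, cube_axis)
print(resampled)  # approximately [[0.10, 0.15, 0.35]]
print(covered)    # [False  True  True]
```

Bands flagged as not covered carry a held edge value rather than a measurement, so it is usually safer to drop them from both the references and the cube pixels before matching.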

Train a Classifier From Mean Spectra

This example uses the mean spectra as compact training data for a simple scikit-learn classifier, then applies it to one line from a test cube.

def load_mean_spectra(csv_path, n_bands):
    csv_path = Path(csv_path) if Path(csv_path).is_absolute() else BASE_DIR / csv_path
    if not csv_path.exists():
        raise SystemExit("Set HSI_EXAMPLE_MEAN_SPECTRA to the exported mean spectra CSV file.")

    table = pd.read_csv(csv_path)
    wavelength_columns = sorted(
        [c for c in table.columns if c.replace(".", "", 1).isdigit()], key=float
    )
    if "name" not in table.columns or not wavelength_columns:
        raise SystemExit(
            "Mean spectra CSV must have a 'name' column and numeric wavelength columns."
        )

    spectra = table[["name", *wavelength_columns]].groupby("name", sort=True).mean()
    roi_names = spectra.index.to_numpy()
    csv_wavelengths = np.array([float(c) for c in wavelength_columns])
    cube_wavelengths = wavelengths_for_bands(n_bands)
    values = np.vstack([
        np.interp(cube_wavelengths, csv_wavelengths, row)
        for row in spectra.to_numpy()
    ]).astype(np.float32)
    return roi_names, values


test_reflectance = open_reflectance_cube(TEST_CUBE)

roi_names, mean_spectra = load_mean_spectra(
MEAN_SPECTRA_CSV,
n_bands=test_reflectance.shape.bands,
)

table = pd.read_csv(BASE_DIR / MEAN_SPECTRA_CSV)
if CLASS_PROPERTY not in table.columns:
    raise SystemExit(
        f"The exported mean spectra file must include a '{CLASS_PROPERTY}' column. "
        "Export spectra from HV Explorer with properties included."
    )

class_by_name = dict(zip(table["name"], table[CLASS_PROPERTY]))
class_labels = np.array([class_by_name[name] for name in roi_names])

clf = NearestCentroid()
clf.fit(mean_spectra, class_labels)

line_index = 100
frame_arr = test_reflectance.array_plane(line_index, hs.lines)
pixels_frame = frame_arr.T

predicted_class = clf.predict(pixels_frame)
classes, counts = np.unique(predicted_class, return_counts=True)
print(dict(zip(classes, counts.astype(int))))

Download this script

Mean spectra vs ROI

Mean spectra are compact and easy to inspect, but they do not capture within-ROI variation. Pixel-level ROI training is usually better when texture, gradients, or mixed pixels matter.

Classify With Spectral Angle Mapper

Mean spectra can also be used directly as reference spectra. Spectral angle mapper compares the shape of each pixel spectrum with each reference spectrum and assigns the closest match. This is reference matching, not a trained classifier, so it is useful as a simple baseline or when you only have a small set of representative spectra.

The same preprocessing rule still applies: the reference spectra and cube pixels must be in the same space. If the mean spectra were exported from reflectance images, compare them with reflectance pixels.

When to prefer SAM

Use spectral angle mapper when the class shape matters more than absolute brightness. Use the ROI-trained classifier when you have enough labeled pixels and need the model to learn within-class variation.

class SpectralAngleMapper:
    def __init__(self, reference_spectra, reference_labels):
        self.classes_ = np.unique(reference_labels)
        self.class_to_id_ = {
            class_name: class_id
            for class_id, class_name in enumerate(self.classes_)
        }
        self.reference_labels_ = np.array(reference_labels)
        self.reference_class_ids_ = np.array([
            self.class_to_id_[label]
            for label in self.reference_labels_
        ], dtype=np.uint8)

        reference_spectra = np.asarray(reference_spectra, dtype=np.float32)
        reference_norm = np.linalg.norm(reference_spectra, axis=1, keepdims=True)
        self.reference_spectra_ = reference_spectra / np.maximum(reference_norm, 1e-12)

    def predict(self, pixels):
        pixels = np.asarray(pixels, dtype=np.float32)
        pixel_norm = np.linalg.norm(pixels, axis=1, keepdims=True)
        normalized_pixels = pixels / np.maximum(pixel_norm, 1e-12)

        # Maximizing cosine similarity is equivalent to minimizing spectral angle.
        cosine = normalized_pixels @ self.reference_spectra_.T
        best_reference = np.argmax(cosine, axis=1)
        return self.reference_class_ids_[best_reference]


test_reflectance = open_reflectance_cube(TEST_CUBE)

roi_names, mean_spectra = load_mean_spectra(
MEAN_SPECTRA_CSV,
n_bands=test_reflectance.shape.bands,
)

table = pd.read_csv(BASE_DIR / MEAN_SPECTRA_CSV)
if CLASS_PROPERTY not in table.columns:
    raise SystemExit(
        f"The exported mean spectra file must include a '{CLASS_PROPERTY}' column. "
        "Export spectra from HV Explorer with properties included."
    )

class_by_name = dict(zip(table["name"], table[CLASS_PROPERTY]))
reference_labels = np.array([class_by_name[name] for name in roi_names])

sam = SpectralAngleMapper(mean_spectra, reference_labels)

start_line = 0
n_lines = 300
overlay_alpha = 0.5

stop_line = min(start_line + n_lines, test_reflectance.shape.lines)
preview_crop = test_reflectance[start_line:stop_line, :, :]

classified = predictor(sam)(preview_crop)
label_map = classified.to_numpy_with_interleave(hs.bip)[:, :, 0].astype(np.uint8)

color_map = np.zeros((len(sam.classes_), 3), dtype=np.uint8)
cmap = plt.get_cmap("tab10", len(sam.classes_))
for class_id, _class_name in enumerate(sam.classes_):
    color_map[class_id] = (np.array(cmap(class_id)[:3]) * 255).astype(np.uint8)

class_rgb = color_map[label_map]

gray = preview_crop.mean_axis(hs.bands).to_numpy_with_interleave(hs.bip)
gray = np.clip(gray[:, :, 0] * 255, 0, 255).astype(np.uint8)
gray_rgb = np.stack([gray, gray, gray], axis=-1)

preview = (overlay_alpha * class_rgb + (1 - overlay_alpha) * gray_rgb).astype(np.uint8)

legend_handles = [
    Patch(
        color=color_map[class_id] / 255.0,
        label=str(class_name),
    )
    for class_id, class_name in enumerate(sam.classes_)
]

fig, ax = plt.subplots()
ax.imshow(preview)
ax.set_title("Spectral angle mapper from mean spectra")
ax.axis("off")
ax.legend(
    handles=legend_handles,
    loc="upper left",
    bbox_to_anchor=(1.02, 1.0),
    borderaxespad=0,
    title="Class",
)
fig.tight_layout()
plt.show()

Download this script
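
The claim that spectral angle depends on spectral shape rather than absolute brightness can be checked with a few lines of NumPy. This is a standalone illustration with made-up spectra; the spectral_angle function is defined here for the example and is separate from the SpectralAngleMapper class above.

```python
import numpy as np


def spectral_angle(a, b):
    """Angle in radians between two spectra."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cosine, -1.0, 1.0)))


reference = np.array([0.10, 0.30, 0.50, 0.40, 0.20])
darker_same_shape = 0.5 * reference  # same shape, half the brightness
flat_spectrum = np.full(5, 0.20)     # different shape

angle_darker = spectral_angle(reference, darker_same_shape)
angle_flat = spectral_angle(reference, flat_spectrum)
euclidean_darker = float(np.linalg.norm(reference - darker_same_shape))

print("Angle to darker copy:    ", round(angle_darker, 6))
print("Euclidean to darker copy:", round(euclidean_darker, 3))
print("Angle to flat spectrum:  ", round(angle_flat, 3))
```

The angle to the darker copy is effectively zero while its Euclidean distance is not, which is the behavior that makes SAM tolerant of multiplicative brightness changes. An additive offset does change the angle, so this tolerance does not replace proper reflectance calibration.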

Visualize Classification Results

After training, first apply the saved classifier to the same cube that provided the ROIs. This is a quick sanity check: the overlay should agree with the annotated regions before you trust the model on new data.

A classification preview is easiest to read as an overlay. Use hs.util.predictor() to classify the preview crop lazily, use mean reflectance as a grayscale background, then blend class colors on top of the predicted labels.

class NumericLabelClassifier:
    # hs.util.predictor() returns an Image, so the prediction output must be
    # numeric image data. This adapter keeps the original string-label
    # classifier, but maps class names to uint8 ids before returning labels.
    def __init__(self, clf):
        self.clf = clf
        self.classes_ = np.array(clf.classes_)
        self.class_to_id_ = {
            class_name: class_id
            for class_id, class_name in enumerate(self.classes_)
        }

    def predict(self, pixels):
        labels = self.clf.predict(pixels)
        return np.array([self.class_to_id_[label] for label in labels], dtype=np.uint8)


ann_file = load_annotations()
clf = load_classifier_model()

train_reflectance = open_reflectance_cube()

class_colors_rgb = {
    annotation_value(annot.properties["type"]): hex_to_rgb(annot.color)
    for annot in ann_file.annotations
}

start_line = 0
n_lines = 300
overlay_alpha = 0.5

stop_line = min(start_line + n_lines, train_reflectance.shape.lines)
preview_crop = train_reflectance[start_line:stop_line, :, :]

numeric_clf = NumericLabelClassifier(clf)
classified = predictor(numeric_clf)(preview_crop)
label_map = classified.to_numpy_with_interleave(hs.bip)[:, :, 0].astype(np.uint8)

color_map = np.zeros((len(numeric_clf.classes_), 3), dtype=np.uint8)
for class_id, class_name in enumerate(numeric_clf.classes_):
    color_map[class_id] = class_colors_rgb[class_name]

class_rgb = color_map[label_map]

gray = preview_crop.mean_axis(hs.bands).to_numpy_with_interleave(hs.bip)
gray = np.clip(gray[:, :, 0] * 255, 0, 255).astype(np.uint8)
gray_rgb = np.stack([gray, gray, gray], axis=-1)

preview = (overlay_alpha * class_rgb + (1 - overlay_alpha) * gray_rgb).astype(np.uint8)

legend_handles = [
    Patch(
        color=color_map[class_id] / 255.0,
        label=str(class_name),
    )
    for class_id, class_name in enumerate(numeric_clf.classes_)
]

fig, ax = plt.subplots()
ax.imshow(preview)
ax.set_title("Training cube classification overlay")
ax.axis("off")
ax.legend(
    handles=legend_handles,
    loc="upper left",
    bbox_to_anchor=(1.02, 1.0),
    borderaxespad=0,
    title="Class",
)
fig.tight_layout()
plt.show()

Download this script

Apply the same model to another cube

The same lazy predictor() pattern can be used on a test cube or any other cube with the same preprocessing and wavelength order. Open that cube with open_reflectance_cube(TEST_CUBE), take the crop you want, and pass it through the same saved classifier before building the overlay.

Classify One Scan Line

To classify one line, convert the line from (bands, samples) to (samples, bands) and pass it to the classifier.

clf = load_classifier_model()

test_reflectance = open_reflectance_cube(TEST_CUBE)

line_index = 100
frame_arr = test_reflectance.array_plane(line_index, hs.lines)
pixels_frame = frame_arr.T

predicted = clf.predict(pixels_frame)
classes, counts = np.unique(predicted, return_counts=True)

print(dict(zip(classes, counts.astype(int))))

Download this script
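
The layout change from (bands, samples) to (samples, bands) can be tried without a cube. The sketch below builds a synthetic scan line and a stand-in NearestCentroid classifier; the array sizes and reflectance levels are assumptions made for the illustration.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

n_bands, n_samples = 20, 64
rng = np.random.default_rng(0)

# Train a stand-in classifier on synthetic (n_pixels, n_bands) spectra.
train_pixels = np.vstack([
    rng.normal(0.1, 0.02, size=(100, n_bands)),
    rng.normal(0.5, 0.02, size=(100, n_bands)),
])
train_labels = np.array(["background"] * 100 + ["almond"] * 100)
clf = NearestCentroid().fit(train_pixels, train_labels)

# Synthetic scan line in (bands, samples) layout:
# left half background, right half almond.
frame_arr = np.empty((n_bands, n_samples), dtype=np.float32)
frame_arr[:, : n_samples // 2] = 0.1
frame_arr[:, n_samples // 2 :] = 0.5

pixels_frame = frame_arr.T  # (samples, bands): one row per pixel
predicted = clf.predict(pixels_frame)

classes, counts = np.unique(predicted, return_counts=True)
line_counts = dict(zip(classes.tolist(), counts.tolist()))
print(pixels_frame.shape)
print(line_counts)
```

scikit-learn estimators expect one row per sample, so passing the untransposed (bands, samples) array would either raise a feature-count error or, if the two dimensions happen to be equal, silently classify bands instead of pixels.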

Scan lines and camera pipelines

This is the same data shape you get when classifying frames from a push-broom stream. Use this direct NumPy pattern for one-off inspection or microbenchmarks.

For camera deployment, use cam.to_hs_image() and hs.util.predictor() to build a lazy camera pipeline. See the stream with a saved classifier example for reference.

Foreground First, Then Classify Objects

For sorting or inspection tasks, it is often better to separate background removal from object classification. First decide whether each pixel is object or background, then classify only object pixels. This example visualizes a test cube preview after both stages.

Training

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


class NumericForegroundClassifier:
    def __init__(self, clf):
        self.clf = clf

    def predict(self, pixels):
        return self.clf.predict(pixels).astype(np.uint8)


reflectance = open_reflectance_cube()
ann_file = load_annotations()

pixels_list = []
labels_list = []

for annot in ann_file.annotations:
    class_name = annotation_value(annot.properties["type"])
    selected = reflectance.select_mask_from_descriptor(annot.descriptor)
    spectra = selected.to_numpy_with_interleave(hs.bip)[:, 0, :]
    pixels_list.append(spectra)
    labels_list.append(np.full(spectra.shape[0], class_name))

pixels = np.concatenate(pixels_list)
labels = np.concatenate(labels_list)

foreground_labels = labels != "background"

foreground_clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=1.0, class_weight="balanced", max_iter=2000),
)
foreground_clf.fit(pixels, foreground_labels)

object_pixels = pixels[foreground_labels]
object_labels = labels[foreground_labels]

object_clf = make_pipeline(
    StandardScaler(),
    LinearSVC(C=1.0, class_weight="balanced", dual=False, max_iter=10000),
)
object_clf.fit(object_pixels, object_labels)

Testing

test_reflectance = open_reflectance_cube(TEST_CUBE)

start_line = 0
n_lines = 300
stop_line = min(start_line + n_lines, test_reflectance.shape.lines)
preview_crop = test_reflectance[start_line:stop_line, :, :]

foreground_mask = predictor(NumericForegroundClassifier(foreground_clf))(preview_crop)
foreground_mask = foreground_mask.to_numpy_with_interleave(hs.bip)[:, :, 0].astype(bool)

numeric_object_clf = NumericLabelClassifier(object_clf)
object_label_map = predictor(numeric_object_clf)(preview_crop)
object_label_map = object_label_map.to_numpy_with_interleave(hs.bip)[:, :, 0].astype(np.uint8)

label_map = np.zeros_like(object_label_map, dtype=np.uint8)
label_map[foreground_mask] = object_label_map[foreground_mask] + 1
classes = np.array(["background", *numeric_object_clf.classes_])

plt.imshow(label_map, aspect="auto", interpolation="nearest")
plt.ylabel("Line")
plt.xlabel("Sample")
plt.title("Foreground first classification")
plt.colorbar(ticks=np.arange(len(classes)), label="Class id")
plt.show()

Download this script

Separating foreground detection from object classification can improve both speed and results when most pixels in a line are background.
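
The preview above runs both predictors over the whole crop and combines the results afterwards. One way to realize the speed benefit is to run the object classifier only on the pixels the foreground stage kept. The sketch below shows that variant in plain NumPy on synthetic data; it does not use predictor(), and the reflectance levels, crop size, and the NearestCentroid stand-in for the object classifier are assumptions made for the illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_bands = 20
levels = {"background": 0.05, "almond": 0.3, "hazelnut": 0.5}

# Synthetic training spectra, 200 pixels per class.
pixels = np.vstack([rng.normal(v, 0.02, size=(200, n_bands)) for v in levels.values()])
labels = np.repeat(list(levels), 200)

is_object = labels != "background"
foreground_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
foreground_clf.fit(pixels, is_object)
object_clf = NearestCentroid().fit(pixels[is_object], labels[is_object])

# Synthetic (lines, samples, bands) crop that is mostly background.
crop = rng.normal(0.05, 0.02, size=(50, 80, n_bands))
crop[10:20, 30:40] = rng.normal(0.3, 0.02, size=(10, 10, n_bands))
crop[30:35, 50:60] = rng.normal(0.5, 0.02, size=(5, 10, n_bands))

flat = crop.reshape(-1, n_bands)
foreground = foreground_clf.predict(flat).astype(bool)

# Stage 2 only sees the pixels that stage 1 kept.
label_flat = np.full(flat.shape[0], "background", dtype=labels.dtype)
if foreground.any():
    label_flat[foreground] = object_clf.predict(flat[foreground])
label_map = label_flat.reshape(crop.shape[:2])

print(f"Object classifier ran on {foreground.sum()} of {flat.shape[0]} pixels")
```

The saving grows with the background fraction, since the second and usually more expensive model is applied to a small subset of each line.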

Simple Foreground Threshold

If objects are clearly brighter or darker than the background, a mean-reflectance threshold can be enough. This example visualizes the foreground mask for a test cube preview.

brightness = pixels.mean(axis=1)
foreground_labels = labels != "background"

background_mean = brightness[~foreground_labels].mean()
foreground_mean = brightness[foreground_labels].mean()
threshold = (background_mean + foreground_mean) / 2
foreground_is_brighter = foreground_mean >= background_mean

test_reflectance = open_reflectance_cube(TEST_CUBE)

start_line = 0
n_lines = 300
stop_line = min(start_line + n_lines, test_reflectance.shape.lines)
preview_crop = test_reflectance[start_line:stop_line, :, :]
brightness_map = preview_crop.mean_axis(hs.bands).to_numpy_with_interleave(hs.bip)[:, :, 0]

if foreground_is_brighter:
    foreground_mask = brightness_map >= threshold
else:
    foreground_mask = brightness_map <= threshold

plt.imshow(foreground_mask.astype(np.uint8), aspect="auto", interpolation="nearest", cmap="gray")
plt.xlabel("Sample")
plt.ylabel("Line")
plt.title("Foreground mask from mean-reflectance threshold")
plt.show()

Download this script

A brightness threshold is faster than a trained foreground classifier, but it only works when brightness reliably separates foreground from background.