Models Overview

This section covers several of Kornia’s built-in models for key computer vision tasks. Each model is documented with its respective API and example usage.

RTDETRDetectorBuilder

The RTDETRDetectorBuilder class is a builder for constructing a detection model based on the RT-DETR architecture, which is designed for real-time object detection. It is capable of detecting multiple objects within an image and provides efficient inference suitable for real-world applications.

Key Methods:

  • build: Constructs and returns an instance of the RTDETR detection model.

  • save: Runs the built detector on an image and saves the visualized detections.

class kornia.models.detection.rtdetr.RTDETRDetectorBuilder

Bases: object

A builder class for constructing RT-DETR object detection models.

This class provides static methods to:
  • Build an object detection model from a model name or configuration.

  • Export the model to ONNX format for inference.

import kornia
from kornia.models.detection.rtdetr import RTDETRDetectorBuilder

images = kornia.utils.sample.get_sample_images()
model = RTDETRDetectorBuilder.build()
model.save(images)

Example

The following code demonstrates how to use RTDETRDetectorBuilder to detect objects in an image:

import kornia
image = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.detection.rtdetr.RTDETRDetectorBuilder.build()
model.save(image)
static build(model_name=None, config=None, pretrained=True, image_size=None, confidence_threshold=None, confidence_filtering=None)

Builds and returns an RT-DETR object detector model.

Either model_name or config can be provided. If neither is provided, a default pretrained model (rtdetr_r18vd) will be built.

Parameters:
  • model_name (Optional[str], optional) – Name of the RT-DETR model to load. Can be one of the available pretrained models, including ‘rtdetr_r18vd’, ‘rtdetr_r34vd’, ‘rtdetr_r50vd_m’, ‘rtdetr_r50vd’, and ‘rtdetr_r101vd’. Default: None

  • config (Optional[RTDETRConfig], optional) – A custom configuration object for building the RT-DETR model. Default: None

  • pretrained (bool, optional) – Whether to load a pretrained version of the model (applies when model_name is provided). Default: True

  • image_size (Optional[int], optional) – The size to which input images will be resized during preprocessing. If None, the size is inferred from the config file. Recommended scales include [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]. Default: None

  • confidence_threshold (Optional[float], optional) – Confidence threshold used during post-processing to filter out low-confidence detections. Default: None

  • confidence_filtering (Optional[bool], optional) – Whether to filter detections by the confidence threshold during post-processing. Default: None

Return type:

ObjectDetector

Returns:

ObjectDetector

An object detector instance initialized with the specified model, preprocessor, and post-processor.
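
For instance, a specific variant and input size can be requested at build time. The sketch below assumes the returned ObjectDetector can be called directly on a batched image tensor and that the rtdetr_r50vd pretrained weights are available for download:

import kornia
from kornia.models.detection.rtdetr import RTDETRDetectorBuilder

# Build a larger RT-DETR variant; inputs are resized to 640x640 during preprocessing
detector = RTDETRDetectorBuilder.build(model_name="rtdetr_r50vd", image_size=640)

image = kornia.utils.sample.get_sample_images()[0][None]  # batched image (1, C, H, W)
detections = detector(image)  # detection results for the batch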

DexiNedBuilder

The DexiNedBuilder class builds an edge detection model based on DexiNed, a state-of-the-art architecture that excels at detecting fine-grained edges in images. This model is well-suited for tasks such as medical imaging, object contour detection, and more.

Key Methods:

  • build: Builds and returns an instance of the DexiNed edge detection model.

  • save: Runs the built edge detector on an image and saves the resulting edge map for further processing or visualization.

class kornia.models.edge_detection.dexined.DexiNedBuilder

Bases: object

DexiNedBuilder is a class that builds a DexiNed model.

import kornia
from kornia.models.edge_detection.dexined import DexiNedBuilder

images = kornia.utils.sample.get_sample_images()
model = DexiNedBuilder.build()
model.save(images)

Example

The following code shows how to use the DexiNedBuilder to detect edges in an image:

import kornia
image = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.edge_detection.dexined.DexiNedBuilder.build()
model.save(image)
static build(model_name='dexined', pretrained=True, image_size=352)

Builds and returns a DexiNed edge detector model.

Return type:

EdgeDetector
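
As a sketch, the builder can also be called with explicit arguments; the returned EdgeDetector is assumed to be callable on a batched image tensor:

import kornia
from kornia.models.edge_detection.dexined import DexiNedBuilder

# Build the DexiNed edge detector; inputs are resized to 352x352 during preprocessing
edge_detector = DexiNedBuilder.build(model_name="dexined", pretrained=True, image_size=352)

image = kornia.utils.sample.get_sample_images()[0][None]  # batched image (1, C, H, W)
edges = edge_detector(image)  # edge-map output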

SegmentationModelsBuilder

The SegmentationModelsBuilder class offers a flexible API for constructing and running segmentation models. It supports a variety of architectures, such as Unet, FPN, and DeepLabV3, making it adaptable to a wide range of semantic segmentation tasks.

Key Methods:

  • build: Constructs a segmentation model based on the chosen architecture (e.g., Unet, DeepLabV3, etc.).

  • forward (on the built model): Runs inference on an input tensor and returns the segmented output.

Parameters:

  • model_name: (str) Name of the segmentation architecture to use, e.g., “Unet”, “DeepLabV3”.

  • classes: (int) The number of output classes for segmentation.

class kornia.models.segmentation.segmentation_models.SegmentationModelsBuilder

Bases: object

Example

Here’s an example of how to use SegmentationModelsBuilder for binary segmentation:

import kornia
input_tensor = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.segmentation.segmentation_models.SegmentationModelsBuilder.build()
segmented_output = model(input_tensor)
print(segmented_output.shape)
static build(model_name='Unet', encoder_name='resnet34', encoder_weights='imagenet', in_channels=3, classes=1, activation='softmax', **kwargs)

Builds and returns a semantic segmentation model.

This module wraps models from the segmentation_models_pytorch library.

Parameters:
  • model_name (str, optional) – Name of the model to use. Valid options are: “Unet”, “UnetPlusPlus”, “MAnet”, “LinkNet”, “FPN”, “PSPNet”, “PAN”, “DeepLabV3”, “DeepLabV3Plus”. Default: "Unet"

  • encoder_name (str, optional) – Name of the encoder to use. Default: "resnet34"

  • encoder_depth – Depth of the encoder.

  • encoder_weights (Optional[str], optional) – Weights of the encoder. Default: "imagenet"

  • decoder_channels – Number of channels in the decoder.

  • in_channels (int, optional) – Number of channels in the input. Default: 3

  • classes (int, optional) – Number of classes to predict. Default: 1

  • **kwargs (Any) – Additional arguments to pass to the model. Detailed arguments can be found at: https://github.com/qubvel-org/segmentation_models.pytorch/tree/main/segmentation_models_pytorch/decoders

Return type:

SemanticSegmentation

Note

Only encoder weights are available. Pretrained weights for the whole model are not available.
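
A custom architecture and class count can also be requested at build time. The following sketch assumes segmentation_models_pytorch is installed and that the returned model is callable on a batched image tensor, as in the example above:

import kornia
from kornia.models.segmentation.segmentation_models import SegmentationModelsBuilder

# Build an FPN model with a ResNet-34 encoder and three output classes
model = SegmentationModelsBuilder.build(
    model_name="FPN",
    encoder_name="resnet34",
    encoder_weights="imagenet",
    classes=3,
)

input_tensor = kornia.utils.sample.get_sample_images()[0][None]
segmented_output = model(input_tensor)  # per-class scores, one channel per class
print(segmented_output.shape)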

static get_preprocessing_pipeline(preproc_params)

Builds an image preprocessing pipeline from the given preprocessing parameters.

Return type:

ImageSequential

BoxMotTracker

The BoxMotTracker class is used for multi-object tracking in video streams. It is designed to track bounding boxes of objects across multiple frames, supporting various tracking algorithms for object detection and tracking continuity.

Key Methods:

  • __init__: Initializes the multi-object tracker.

  • update: Updates the tracker with a new image frame.

  • save: Saves the tracked object data or visualization for post-processing.

Parameters:

  • detector: (str or ObjectDetector) The detection model used to produce bounding boxes for tracking.

  • track_buffer: (int) The number of frames to keep a track alive after the object was last detected.

class kornia.models.tracking.boxmot_tracker.BoxMotTracker(detector='rtdetr_r18vd', tracker_model_name='DeepOCSORT', tracker_model_weights='osnet_x0_25_msmt17.pt', device='cpu', fp16=False, **kwargs)

Bases: object

BoxMotTracker is a module that wraps a detector and a tracker model.

This module uses the BoxMot library for tracking.

Parameters:
  • detector (Union[ObjectDetector, str], optional) – The detector model, or the name of an RT-DETR variant to build. Default: "rtdetr_r18vd"

  • tracker_model_name (str, optional) – The name of the tracker model. Valid options are: “BoTSORT”, “DeepOCSORT”, “OCSORT”, “HybridSORT”, “ByteTrack”, “StrongSORT”, “ImprAssoc”. Default: "DeepOCSORT"

  • tracker_model_weights (str, optional) – Path to the model weights for ReID (Re-Identification). Default: "osnet_x0_25_msmt17.pt"

  • device (str, optional) – Device on which to run the model (e.g., ‘cpu’ or ‘cuda’). Default: "cpu"

  • fp16 (bool, optional) – Whether to use half-precision (fp16) for faster inference on compatible devices. Default: False

  • per_class – Whether to perform per-class tracking

  • track_high_thresh – High threshold for detection confidence. Detections above this threshold are used in the first association round.

  • track_low_thresh – Low threshold for detection confidence. Detections below this threshold are ignored.

  • new_track_thresh – Threshold for creating a new track. Detections above this threshold will be considered as potential new tracks.

  • track_buffer – Number of frames to keep a track alive after it was last detected.

  • match_thresh – Threshold for the matching step in data association.

  • proximity_thresh – Threshold for IoU (Intersection over Union) distance in first-round association.

  • appearance_thresh – Threshold for appearance embedding distance in the ReID module.

  • cmc_method – Method for correcting camera motion. Options include “sof” (simple optical flow).

  • frame_rate – Frame rate of the video being processed. Used to scale the track buffer size.

  • fuse_first_associate – Whether to fuse appearance and motion information during the first association step.

  • with_reid – Whether to use ReID (Re-Identification) features for association.

import kornia
from kornia.models.tracking.boxmot_tracker import BoxMotTracker

image = kornia.utils.sample.get_sample_images()[0][None]
model = BoxMotTracker()
for i in range(4):  # At least 4 frames are needed to initialize the tracking position
    model.update(image)
model.save(image)

Note

At least 4 frames are needed to initialize the tracking position.

Example

The following example demonstrates how to track objects across multiple frames using BoxMotTracker:

import kornia
image = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.tracking.boxmot_tracker.BoxMotTracker()
for i in range(4):
    model.update(image)  # Update the tracker with new frames
model.save(image)       # Save the tracking result
name: str = 'boxmot_tracker'
save(image, show_trajectories=True, directory=None)

Saves the visualized tracking results for the given image.

Parameters:
  • image (Tensor) – The input image.

  • show_trajectories (bool, optional) – Whether to draw the tracked trajectories. Default: True

  • directory (Optional[str], optional) – Directory in which to save the visualization. Default: None

Return type:

None
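
As a sketch, the output location can be directed through the directory argument; where and how files are written is determined by the tracker implementation (the path below is only an example):

import kornia
from kornia.models.tracking.boxmot_tracker import BoxMotTracker

tracker = BoxMotTracker()
image = kornia.utils.sample.get_sample_images()[0][None]
for _ in range(4):  # feed a few frames so tracks are initialized
    tracker.update(image)
tracker.save(image, show_trajectories=True, directory="./tracking_results")  # hypothetical output directory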

update(image)

Update the tracker with a new image.

Parameters:

image (Tensor) – The input image.

Return type:

None

visualize(image, show_trajectories=True)

Visualize the results of the tracker.

Parameters:
  • image (Tensor) – The input image.

  • show_trajectories (bool, optional) – Whether to show the trajectories. Default: True

Return type:

Tensor

Returns:

The image with the results of the tracker.
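
In contrast to save, visualize returns the annotated image as a tensor, which can be inspected or written out manually. A minimal sketch:

import kornia
from kornia.models.tracking.boxmot_tracker import BoxMotTracker

tracker = BoxMotTracker()
image = kornia.utils.sample.get_sample_images()[0][None]
for _ in range(4):  # feed a few frames so tracks are initialized
    tracker.update(image)
annotated = tracker.visualize(image, show_trajectories=True)  # Tensor with drawn results
print(annotated.shape)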

Note

This documentation provides detailed information about each model class, its methods, and usage examples. For further details on individual methods and arguments, refer to the respective code documentation.