Models Overview

This section covers several of Kornia’s built-in models for key computer vision tasks. Each model is documented with its respective API and example usage.

RTDETRDetectorBuilder

The RTDETRDetectorBuilder class is a builder for constructing a detection model based on the RT-DETR architecture, which is designed for real-time object detection. It is capable of detecting multiple objects within an image and provides efficient inference suitable for real-world applications.

Key Methods:

  • build: Constructs and returns an instance of the RTDETR detection model.

  • save: Runs the built detector on an image and saves the visualized detections.

class kornia.models.detection.rtdetr.RTDETRDetectorBuilder

Bases: object

A builder class for constructing RT-DETR object detection models.

This class provides static methods to:
  • Build an object detection model from a model name or configuration.

  • Export the model to ONNX format for inference.

import kornia
from kornia.models.detection.rtdetr import RTDETRDetectorBuilder

images = kornia.utils.sample.get_sample_images()
model = RTDETRDetectorBuilder.build()
model.save(images)

Example

The following code demonstrates how to use RTDETRDetectorBuilder to detect objects in an image:

import kornia
image = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.detection.rtdetr.RTDETRDetectorBuilder.build()
model.save(image)
static build(model_name=None, config=None, pretrained=True, image_size=None, confidence_threshold=None, confidence_filtering=None)

Builds and returns an RT-DETR object detector model.

Either model_name or config can be provided. If neither is provided, a default pretrained model (rtdetr_r18vd) will be built.

Parameters:
  • model_name (Optional[str], optional) – Name of the RT-DETR model to load. Can be one of the available pretrained models, including ‘rtdetr_r18vd’, ‘rtdetr_r34vd’, ‘rtdetr_r50vd_m’, ‘rtdetr_r50vd’, and ‘rtdetr_r101vd’. Default: None

  • config (Optional[RTDETRConfig], optional) – A custom configuration object for building the RT-DETR model. Default: None

  • pretrained (bool, optional) – Whether to load a pretrained version of the model (applies when model_name is provided). Default: True

  • image_size (Optional[int], optional) – The size to which input images will be resized during preprocessing. If None, the size is inferred from the config file. Recommended scales include [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]. Default: None

  • confidence_threshold (Optional[float], optional) – Confidence threshold used during post-processing to filter out low-confidence detections. Default: None

  • confidence_filtering (Optional[bool], optional) – Whether to filter detections by the confidence threshold during post-processing. Default: None

Return type:

ObjectDetector

Returns:

ObjectDetector

An object detector instance initialized with the specified model, preprocessor, and post-processor.
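
For instance, a specific variant and input size can be requested at build time. The sketch below assumes the returned ObjectDetector can be called directly on a batched image tensor and that the rtdetr_r50vd pretrained weights are available for download:

import kornia
from kornia.models.detection.rtdetr import RTDETRDetectorBuilder

# Build a larger RT-DETR variant; inputs are resized to 640x640 during preprocessing
detector = RTDETRDetectorBuilder.build(model_name="rtdetr_r50vd", image_size=640)

image = kornia.utils.sample.get_sample_images()[0][None]  # batched image (1, C, H, W)
detections = detector(image)  # detection results for the batch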

DexiNedBuilder

The DexiNedBuilder class builds an edge detection model based on DexiNed, a state-of-the-art architecture that excels at detecting fine-grained edges in images. This model is well-suited for tasks such as medical imaging, object contour detection, and more.

Key Methods:

  • build: Builds and returns an instance of the DexiNed edge detection model.

  • save: Runs the built edge detector on an image and saves the resulting edge map for further processing or visualization.

class kornia.models.edge_detection.dexined.DexiNedBuilder

Bases: object

DexiNedBuilder is a class that builds a DexiNed model.

import kornia
from kornia.models.edge_detection.dexined import DexiNedBuilder

images = kornia.utils.sample.get_sample_images()
model = DexiNedBuilder.build()
model.save(images)

Example

The following code shows how to use the DexiNedBuilder to detect edges in an image:

import kornia
image = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.edge_detection.dexined.DexiNedBuilder.build()
model.save(image)
static build(model_name='dexined', pretrained=True, image_size=352)

Builds and returns a DexiNed edge detector model.

Return type:

EdgeDetector
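
As a sketch, the builder can also be called with explicit arguments; the returned EdgeDetector is assumed to be callable on a batched image tensor:

import kornia
from kornia.models.edge_detection.dexined import DexiNedBuilder

# Build the DexiNed edge detector; inputs are resized to 352x352 during preprocessing
edge_detector = DexiNedBuilder.build(model_name="dexined", pretrained=True, image_size=352)

image = kornia.utils.sample.get_sample_images()[0][None]  # batched image (1, C, H, W)
edges = edge_detector(image)  # edge-map output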

SegmentationModelsBuilder

The SegmentationModelsBuilder class offers a flexible API for constructing and running segmentation models. It supports a variety of architectures, such as Unet, FPN, and DeepLabV3, making it adaptable to a wide range of semantic segmentation tasks.

Key Methods:

  • build: Constructs a segmentation model based on the chosen architecture (e.g., Unet, DeepLabV3, etc.).

  • forward (on the built model): Runs inference on an input tensor and returns the segmented output.

Parameters:

  • model_name: (str) Name of the segmentation architecture to use, e.g., “Unet”, “DeepLabV3”.

  • classes: (int) The number of output classes for segmentation.

class kornia.models.segmentation.segmentation_models.SegmentationModelsBuilder

Bases: object

Example

Here’s an example of how to use SegmentationModelsBuilder for binary segmentation:

import kornia
input_tensor = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.segmentation.segmentation_models.SegmentationModelsBuilder.build()
segmented_output = model(input_tensor)
print(segmented_output.shape)
static build(model_name='Unet', encoder_name='resnet34', encoder_weights='imagenet', in_channels=3, classes=1, activation='softmax', **kwargs)

Builds and returns a semantic segmentation model.

This module wraps models from the segmentation_models_pytorch library.

Parameters:
  • model_name (str, optional) – Name of the model to use. Valid options are: “Unet”, “UnetPlusPlus”, “MAnet”, “LinkNet”, “FPN”, “PSPNet”, “PAN”, “DeepLabV3”, “DeepLabV3Plus”. Default: "Unet"

  • encoder_name (str, optional) – Name of the encoder to use. Default: "resnet34"

  • encoder_depth – Depth of the encoder.

  • encoder_weights (Optional[str], optional) – Weights of the encoder. Default: "imagenet"

  • decoder_channels – Number of channels in the decoder.

  • in_channels (int, optional) – Number of channels in the input. Default: 3

  • classes (int, optional) – Number of classes to predict. Default: 1

  • **kwargs (Any) – Additional arguments to pass to the model. Detailed arguments can be found at: https://github.com/qubvel-org/segmentation_models.pytorch/tree/main/segmentation_models_pytorch/decoders

Return type:

SemanticSegmentation

Note

Only encoder weights are available. Pretrained weights for the whole model are not available.
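
A custom architecture and class count can also be requested at build time. The following sketch assumes segmentation_models_pytorch is installed and that the returned model is callable on a batched image tensor, as in the example above:

import kornia
from kornia.models.segmentation.segmentation_models import SegmentationModelsBuilder

# Build an FPN model with a ResNet-34 encoder and three output classes
model = SegmentationModelsBuilder.build(
    model_name="FPN",
    encoder_name="resnet34",
    encoder_weights="imagenet",
    classes=3,
)

input_tensor = kornia.utils.sample.get_sample_images()[0][None]
segmented_output = model(input_tensor)  # per-class scores, one channel per class
print(segmented_output.shape)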

static get_preprocessing_pipeline(preproc_params)

Builds an image preprocessing pipeline from the given preprocessing parameters.

Return type:

ImageSequential

BoxMotTracker

The BoxMotTracker class is used for multi-object tracking in video streams. It is designed to track bounding boxes of objects across multiple frames, supporting various tracking algorithms for object detection and tracking continuity.

Key Methods:

  • __init__: Initializes the multi-object tracker.

  • update: Updates the tracker with a new image frame.

  • save: Saves the tracked object data or visualization for post-processing.

Parameters:

  • detector: (str or ObjectDetector) The detection model used to produce bounding boxes for tracking.

  • track_buffer: (int) The number of frames to keep a track alive after the object was last detected.

class kornia.models.tracking.boxmot_tracker.BoxMotTracker(detector='rtdetr_r18vd', tracker_model_name='DeepOCSORT', tracker_model_weights='osnet_x0_25_msmt17.pt', device='cpu', fp16=False, **kwargs)

Bases: object

BoxMotTracker is a module that wraps a detector and a tracker model.

This module uses the BoxMot library for tracking.

Parameters:
  • detector (Union[ObjectDetector, str], optional) – The detector model, or the name of an RT-DETR variant to build. Default: "rtdetr_r18vd"

  • tracker_model_name (str, optional) – The name of the tracker model. Valid options are: “BoTSORT”, “DeepOCSORT”, “OCSORT”, “HybridSORT”, “ByteTrack”, “StrongSORT”, “ImprAssoc”. Default: "DeepOCSORT"

  • tracker_model_weights (str, optional) – Path to the model weights for ReID (Re-Identification). Default: "osnet_x0_25_msmt17.pt"

  • device (str, optional) – Device on which to run the model (e.g., ‘cpu’ or ‘cuda’). Default: "cpu"

  • fp16 (bool, optional) – Whether to use half-precision (fp16) for faster inference on compatible devices. Default: False

  • per_class – Whether to perform per-class tracking

  • track_high_thresh – High threshold for detection confidence. Detections above this threshold are used in the first association round.

  • track_low_thresh – Low threshold for detection confidence. Detections below this threshold are ignored.

  • new_track_thresh – Threshold for creating a new track. Detections above this threshold will be considered as potential new tracks.

  • track_buffer – Number of frames to keep a track alive after it was last detected.

  • match_thresh – Threshold for the matching step in data association.

  • proximity_thresh – Threshold for IoU (Intersection over Union) distance in first-round association.

  • appearance_thresh – Threshold for appearance embedding distance in the ReID module.

  • cmc_method – Method for correcting camera motion. Options include “sof” (simple optical flow).

  • frame_rate – Frame rate of the video being processed. Used to scale the track buffer size.

  • fuse_first_associate – Whether to fuse appearance and motion information during the first association step.

  • with_reid – Whether to use ReID (Re-Identification) features for association.

import kornia
from kornia.models.tracking.boxmot_tracker import BoxMotTracker

image = kornia.utils.sample.get_sample_images()[0][None]
model = BoxMotTracker()
for i in range(4):  # At least 4 frames are needed to initialize the tracking position
    model.update(image)
model.save(image)

Note

At least 4 frames are needed to initialize the tracking position.

Example

The following example demonstrates how to track objects across multiple frames using BoxMotTracker:

import kornia
image = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.tracking.boxmot_tracker.BoxMotTracker()
for i in range(4):
    model.update(image)  # Update the tracker with new frames
model.save(image)       # Save the tracking result
name: str = 'boxmot_tracker'
save(image, show_trajectories=True, directory=None)

Saves the visualized tracking results for the given image.

Parameters:
  • image (Tensor) – The input image.

  • show_trajectories (bool, optional) – Whether to draw the tracked trajectories. Default: True

  • directory (Optional[str], optional) – Directory in which to save the visualization. Default: None

Return type:

None
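
As a sketch, the output location can be directed through the directory argument; where and how files are written is determined by the tracker implementation (the path below is only an example):

import kornia
from kornia.models.tracking.boxmot_tracker import BoxMotTracker

tracker = BoxMotTracker()
image = kornia.utils.sample.get_sample_images()[0][None]
for _ in range(4):  # feed a few frames so tracks are initialized
    tracker.update(image)
tracker.save(image, show_trajectories=True, directory="./tracking_results")  # hypothetical output directory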

update(image)

Update the tracker with a new image.

Parameters:

image (Tensor) – The input image.

Return type:

None

visualize(image, show_trajectories=True)

Visualize the results of the tracker.

Parameters:
  • image (Tensor) – The input image.

  • show_trajectories (bool, optional) – Whether to show the trajectories. Default: True

Return type:

Tensor

Returns:

The image with the results of the tracker.
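
In contrast to save, visualize returns the annotated image as a tensor, which can be inspected or written out manually. A minimal sketch:

import kornia
from kornia.models.tracking.boxmot_tracker import BoxMotTracker

tracker = BoxMotTracker()
image = kornia.utils.sample.get_sample_images()[0][None]
for _ in range(4):  # feed a few frames so tracks are initialized
    tracker.update(image)
annotated = tracker.visualize(image, show_trajectories=True)  # Tensor with drawn results
print(annotated.shape)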

Note

This documentation provides detailed information about each model class, its methods, and usage examples. For further details on individual methods and arguments, refer to the respective code documentation.