Models Overview¶
This section covers several of Kornia’s built-in models for key computer vision tasks. Each model is documented with its respective API and example usage.
RTDETRDetectorBuilder¶
The RTDETRDetectorBuilder class is a builder for constructing a detection model based on the RT-DETR architecture, which is designed for real-time object detection. It is capable of detecting multiple objects within an image and provides efficient inference suitable for real-world applications.
Key Methods:
build: Constructs and returns an instance of the RTDETR detection model.
save: Saves the processed image or results after applying the detection model.
- class kornia.models.detection.rtdetr.RTDETRDetectorBuilder¶
Bases:
object
A builder class for constructing RT-DETR object detection models.
- This class provides static methods to:
Build an object detection model from a model name or configuration.
Export the model to ONNX format for inference.
images = kornia.utils.sample.get_sample_images()
model = RTDETRDetectorBuilder.build()
model.save(images)
Example
The following code demonstrates how to use RTDETRDetectorBuilder to detect objects in an image:
import kornia

image = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.detection.rtdetr.RTDETRDetectorBuilder.build()
model.save(image)
- static build(model_name=None, config=None, pretrained=True, image_size=None, confidence_threshold=None, confidence_filtering=None)¶
Builds and returns an RT-DETR object detector model.
Either model_name or config must be provided. If neither is provided, a default pretrained model (rtdetr_r18vd) will be built.
- Parameters:
model_name (Optional[str], optional) – Name of the RT-DETR model to load. One of the available pretrained models: 'rtdetr_r18vd', 'rtdetr_r34vd', 'rtdetr_r50vd_m', 'rtdetr_r50vd', 'rtdetr_r101vd'. Default: None
config (Optional[RTDETRConfig], optional) – A custom configuration object for building the RT-DETR model. Default: None
pretrained (bool, optional) – Whether to load a pretrained version of the model (applies when model_name is provided). Default: True
image_size (Optional[int], optional) – The size to which input images will be resized during preprocessing. If None, the resizing behavior is inferred from the config file. Recommended scales include [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]. Default: None
confidence_threshold (Optional[float], optional) – Confidence score below which detections are filtered out during post-processing. Default: None
confidence_filtering (Optional[bool], optional) – Whether to apply confidence filtering to the raw detections. Default: None
- Return type:
ObjectDetector
- Returns:
An object detector instance initialized with the specified model, preprocessor, and post-processor.
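For example, the documented arguments can be combined to build a larger backbone at a fixed input scale. The snippet below is a minimal sketch using only the parameters listed above; the exact filtering semantics of confidence_threshold are assumed from its name and the build signature:

import kornia
from kornia.models.detection.rtdetr import RTDETRDetectorBuilder

image = kornia.utils.sample.get_sample_images()[0][None]
# Build a larger RT-DETR variant at a recommended input scale,
# dropping detections with scores below 0.5 (assumed behavior).
model = RTDETRDetectorBuilder.build(
    model_name="rtdetr_r50vd",
    image_size=640,
    confidence_threshold=0.5,
)
model.save(image)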
DexiNedBuilder¶
The DexiNedBuilder class implements a state-of-the-art edge detection model based on DexiNed, which excels at detecting fine-grained edges in images. This model is well-suited for tasks like medical imaging, object contour detection, and more.
Key Methods:
build: Builds and returns an instance of the DexiNed edge detection model.
save: Saves the detected edges for further processing or visualization.
- class kornia.models.edge_detection.dexined.DexiNedBuilder¶
Bases:
object
DexiNedBuilder is a class that builds a DexiNed model.
images = kornia.utils.sample.get_sample_images()
model = DexiNedBuilder.build()
model.save(images)
Example
The following code shows how to use the DexiNedBuilder to detect edges in an image:
import kornia

image = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.edge_detection.dexined.DexiNedBuilder.build()
model.save(image)
- static build(model_name='dexined', pretrained=True, image_size=352)¶
Builds and returns a DexiNed edge detection model.
- Return type:
EdgeDetector
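Since every argument has a default, a typical override is the input resolution. The following minimal sketch is based solely on the signature above:

import kornia
from kornia.models.edge_detection.dexined import DexiNedBuilder

image = kornia.utils.sample.get_sample_images()[0][None]
# Build the pretrained DexiNed model with a non-default input size.
model = DexiNedBuilder.build(model_name="dexined", pretrained=True, image_size=512)
model.save(image)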
SegmentationModelsBuilder¶
The SegmentationModelsBuilder class offers a flexible API for constructing and running segmentation models. It supports a variety of architectures, such as Unet, FPN, and DeepLabV3, making it adaptable to a wide range of semantic segmentation tasks.
Key Methods:
build: Constructs a segmentation model based on the chosen architecture (e.g., Unet, DeepLabV3, etc.).
forward: Runs inference on an input tensor and returns segmented output.
Parameters:
model_name: (str) Name of the segmentation architecture to use, e.g., “Unet”, “DeepLabV3”.
classes: (int) The number of output classes for segmentation.
- class kornia.models.segmentation.segmentation_models.SegmentationModelsBuilder¶
Bases:
object
Example
Here’s an example of how to use SegmentationModelsBuilder for binary segmentation:
import kornia

input_tensor = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.segmentation.segmentation_models.SegmentationModelsBuilder.build()
segmented_output = model(input_tensor)
print(segmented_output.shape)
- static build(model_name='Unet', encoder_name='resnet34', encoder_weights='imagenet', in_channels=3, classes=1, activation='softmax', **kwargs)¶
Builds and returns a semantic segmentation model. The returned module wraps a model from the segmentation_models.pytorch library.
- Parameters:
model_name (str, optional) – Name of the model to use. Valid options are: "Unet", "UnetPlusPlus", "MAnet", "LinkNet", "FPN", "PSPNet", "PAN", "DeepLabV3", "DeepLabV3Plus". Default: "Unet"
encoder_name (str, optional) – Name of the encoder to use. Default: "resnet34"
encoder_depth – Depth of the encoder.
encoder_weights (Optional[str], optional) – Weights of the encoder. Default: "imagenet"
decoder_channels – Number of channels in the decoder.
in_channels (int, optional) – Number of channels in the input. Default: 3
classes (int, optional) – Number of classes to predict. Default: 1
activation (str, optional) – Activation function applied to the model output. Default: "softmax"
**kwargs (Any) – Additional arguments to pass to the model. Detailed arguments can be found at: https://github.com/qubvel-org/segmentation_models.pytorch/tree/main/segmentation_models_pytorch/decoders
- Return type:
SemanticSegmentation
Note
Only encoder weights are available. Pretrained weights for the whole model are not available.
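Combining the arguments above, the sketch below builds a multi-class DeepLabV3 instead of the default Unet. The (batch, classes, height, width) output shape is the usual segmentation_models.pytorch convention, assumed here rather than stated on this page:

import kornia
from kornia.models.segmentation.segmentation_models import SegmentationModelsBuilder

input_tensor = kornia.utils.sample.get_sample_images()[0][None]
# Build a 5-class DeepLabV3 with an ImageNet-pretrained ResNet-50 encoder.
model = SegmentationModelsBuilder.build(
    model_name="DeepLabV3",
    encoder_name="resnet50",
    encoder_weights="imagenet",
    classes=5,
)
output = model(input_tensor)
print(output.shape)  # expected: (1, 5, H, W), assuming the usual convention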
- static get_preprocessing_pipeline(preproc_params)¶
BoxMotTracker¶
The BoxMotTracker class is used for multi-object tracking in video streams. It is designed to track bounding boxes of objects across multiple frames, supporting various tracking algorithms for object detection and tracking continuity.
Key Methods:
__init__: Initializes the multi-object tracker.
update: Updates the tracker with a new image frame.
save: Saves the tracked object data or visualization for post-processing.
Parameters:
max_lost: (int) The maximum number of frames where an object can be lost before it is removed from the tracker.
- class kornia.models.tracking.boxmot_tracker.BoxMotTracker(detector='rtdetr_r18vd', tracker_model_name='DeepOCSORT', tracker_model_weights='osnet_x0_25_msmt17.pt', device='cpu', fp16=False, **kwargs)¶
Bases:
object
BoxMotTracker is a module that wraps a detector and a tracker model. It uses the BoxMot library for tracking.
- Parameters:
detector (Union[ObjectDetector, str], optional) – The detector model. Default: "rtdetr_r18vd"
tracker_model_name (str, optional) – The name of the tracker model. Valid options are: "BoTSORT", "DeepOCSORT", "OCSORT", "HybridSORT", "ByteTrack", "StrongSORT", "ImprAssoc". Default: "DeepOCSORT"
tracker_model_weights (str, optional) – Path to the model weights for ReID (Re-Identification). Default: "osnet_x0_25_msmt17.pt"
device (str, optional) – Device on which to run the model (e.g., 'cpu' or 'cuda'). Default: "cpu"
fp16 (bool, optional) – Whether to use half-precision (fp16) for faster inference on compatible devices. Default: False
per_class – Whether to perform per-class tracking
track_high_thresh – High threshold for detection confidence. Detections above this threshold are used in the first association round.
track_low_thresh – Low threshold for detection confidence. Detections below this threshold are ignored.
new_track_thresh – Threshold for creating a new track. Detections above this threshold will be considered as potential new tracks.
track_buffer – Number of frames to keep a track alive after it was last detected.
match_thresh – Threshold for the matching step in data association.
proximity_thresh – Threshold for IoU (Intersection over Union) distance in first-round association.
appearance_thresh – Threshold for appearance embedding distance in the ReID module.
cmc_method – Method for correcting camera motion. Options include “sof” (simple optical flow).
frame_rate – Frame rate of the video being processed. Used to scale the track buffer size.
fuse_first_associate – Whether to fuse appearance and motion information during the first association step.
with_reid – Whether to use ReID (Re-Identification) features for association.
import kornia

image = kornia.utils.sample.get_sample_images()[0][None]
model = BoxMotTracker()
for i in range(4):  # At least 4 frames are needed to initialize the tracking position
    model.update(image)
model.save(image)
Note
At least 4 frames are needed to initialize the tracking position.
Example
The following example demonstrates how to track objects across multiple frames using BoxMotTracker:
import kornia

image = kornia.utils.sample.get_sample_images()[0][None]
model = kornia.models.tracking.boxmot_tracker.BoxMotTracker()
for i in range(4):
    model.update(image)  # Update the tracker with new frames
model.save(image)  # Save the tracking result
- save(image, show_trajectories=True, directory=None)¶
Save the image with the tracked bounding boxes (and, optionally, their trajectories) drawn on it.
- update(image)¶
Update the tracker with a new image.
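In a real pipeline, update is called once per decoded frame. The sketch below reuses the sample image as a stand-in for a video stream (this page does not ship a video loader) and relies only on the constructor and method signatures documented above:

import kornia
from kornia.models.tracking.boxmot_tracker import BoxMotTracker

# Stand-in for frames decoded from a video stream.
frames = [kornia.utils.sample.get_sample_images()[0][None] for _ in range(8)]
tracker = BoxMotTracker(tracker_model_name="ByteTrack", device="cpu")
for frame in frames:
    tracker.update(frame)  # one call per frame; track identities persist across calls
tracker.save(frames[-1], show_trajectories=True)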
Note
This documentation provides detailed information about each model class, its methods, and usage examples. For further details on individual methods and arguments, refer to the respective code documentation.