kornia.contrib#

Models#

Base#

class kornia.contrib.models.base.ModelBase[source]#

Abstract model class with some utility functions.

compile(*, fullgraph=False, dynamic=False, backend='inductor', mode=None, options={}, disable=False)[source]#
Return type:

ModelBase[ModelConfig]

abstract static from_config(config)[source]#

This function should build/load the model.

Parameters:

config (TypeVar(ModelConfig)) – The specifications for the model to be built/loaded

Return type:

ModelBase[TypeVar(ModelConfig)]

load_checkpoint(checkpoint, device=None)[source]#

Load checkpoint from a given url or file.

Parameters:
  • checkpoint (str) – The url or filepath for the respective checkpoint

  • device (torch.device | None, optional) – The desired device to load the weights onto and move the model to. Default: None

Return type:

None

training: bool#
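
Example (a hedged usage sketch using the Sam subclass and SamConfig documented below; the checkpoint path is a hypothetical placeholder):

>>> # from kornia.contrib.models.sam import Sam, SamConfig
>>> # model = Sam.from_config(SamConfig('vit_b'))       # build the model from its config
>>> # model.load_checkpoint('path/to/checkpoint.pth')   # hypothetical local path; a URL also works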

Structures#

class kornia.contrib.models.SegmentationResults(logits, scores, mask_threshold=0.0)[source]#

Encapsulate the results obtained by a Segmentation model.

Parameters:
  • logits (Tensor) – Results logits with shape \((B, C, H, W)\), where \(C\) refers to the number of predicted masks

  • scores (Tensor) – The scores from the logits. Shape \((B,)\)

  • mask_threshold (float, optional) – The threshold value to generate the binary_masks from the logits. Default: 0.0

property binary_masks: Tensor#

Binary mask generated from logits considering the mask_threshold.

Shape will be the same as logits \((B, C, H, W)\), where \(C\) is the number of predicted masks.

logits: Tensor#
mask_threshold: float = 0.0#
original_res_logits(input_size, original_size, image_size_encoder)[source]#

Remove padding and upscale the logits to the original image size.

Resize to the image encoder input size -> remove padding (bottom and right) -> resize to the original size.

Parameters:
  • input_size – The size of the image input to the model, in (H, W) format. Used to remove padding.

  • original_size – The original size of the image before resizing for input to the model, in (H, W) format.

  • image_size_encoder – The size of the input image for the image encoder, in (H, W) format. Used to resize the logits back to the encoder resolution before removing the padding.

Returns:

Batched logits in \((K, C, H, W)\) format, where (H, W) is given by original_size.

scores: Tensor#
squeeze(dim=0)[source]#

Apply squeeze over the given dim for all properties.

Return type:

SegmentationResults
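
Example (a minimal construction sketch; shapes follow the descriptions above):

>>> logits = torch.rand(1, 3, 256, 256)
>>> scores = torch.rand(1)
>>> results = SegmentationResults(logits=logits, scores=scores, mask_threshold=0.5)
>>> results.binary_masks.shape
torch.Size([1, 3, 256, 256])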

class kornia.contrib.models.Prompts(points=None, boxes=None, masks=None)[source]#

Encapsulate the prompt inputs for a model.

Parameters:
  • points (optional) – A tuple with the keypoints (x, y coordinates) and their respective labels. Shape \((K, N, 2)\) for the keypoints and \((K, N)\) for the labels. Default: None

  • boxes (optional) – Batched box inputs, with shape \((K, 4)\). Expected to be in xyxy format. Default: None

  • masks (optional) – Batched mask prompts to the model with shape \((K, 1, H, W)\) Default: None

boxes: Tensor | None = None#
property keypoints: Tensor | None#

The keypoints from the points tuple.

property keypoints_labels: Tensor | None#

The keypoint labels from the points tuple.

masks: Tensor | None = None#
points: tuple[Tensor, Tensor] | None = None#
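
Example (a minimal construction sketch following the shapes above):

>>> keypoints = torch.rand(1, 2, 2)  # (K, N, 2) point coordinates
>>> labels = torch.ones(1, 2)        # (K, N) labels, 1 = foreground
>>> prompts = Prompts(points=(keypoints, labels))
>>> prompts.keypoints.shape
torch.Size([1, 2, 2])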

VisualPrompter#

class kornia.contrib.visual_prompter.VisualPrompter(config=SamConfig(model_type='vit_h', checkpoint='https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth'), device=None, dtype=None)[source]#

This class allows the user to run multiple queries with multiple prompts for a model.

At the moment, only the SAM model is supported. The model is loaded based on the given config.

By default, images are resized so that their long side matches image_encoder.img_size. This prompter class transforms both the images and the prompts before prediction. The image is also passed automatically through the preprocess_image method, which normalizes the image and pads it to the size expected by the SAM model, (image_encoder.img_size, image_encoder.img_size). By default, the image is normalized with the mean and standard deviation of the SAM dataset values.

Parameters:
  • config (SamConfig, optional) – A model config used to build the model. Currently only the SAM model is supported. Default: SamConfig(model_type='vit_h', checkpoint='https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth')

  • device (torch.device | None, optional) – The desired device to run the model on. Default: None

  • dtype (torch.dtype | None, optional) – The desired dtype for the model. Default: None

Example

>>> # prompter = VisualPrompter() # Will load the vit_h by default
>>> # You can load a custom SAM type for modifying the config
>>> prompter = VisualPrompter(SamConfig('vit_b'))
>>> image = torch.rand(3, 25, 30)
>>> prompter.set_image(image)
>>> boxes = Boxes(
...    torch.tensor(
...         [[[[0, 0], [0, 10], [10, 0], [10, 10]]]],
...         device=prompter.device,
...         dtype=torch.float32
...    ),
...    mode='xyxy'
... )
>>> prediction = prompter.predict(boxes=boxes)
>>> prediction.logits.shape
torch.Size([1, 3, 256, 256])
compile(*, fullgraph=False, dynamic=False, backend='inductor', mode=None, options={}, disable=False)[source]#

Applies the torch.compile() / dynamo API to the VisualPrompter API.

Note

For more information about the dynamo API, check the official docs: https://pytorch.org/docs/stable/generated/torch.compile.html

Parameters:
  • fullgraph (bool, optional) – Whether it is OK to break the model into several subgraphs. Default: False

  • dynamic (bool, optional) – Use dynamic shape tracing Default: False

  • backend (str, optional) – backend to be used Default: 'inductor'

  • mode (str | None, optional) – Can be either “default”, “reduce-overhead” or “max-autotune” Default: None

  • options (dict[Any, Any], optional) – A dictionary of options to pass to the backend. Default: {}

  • disable (bool, optional) – Turn torch.compile() into a no-op for testing Default: False

Return type:

None

Example

>>> # prompter = VisualPrompter()
>>> # prompter.compile() # You should have torch >= 2.0.0 installed
>>> # Use the prompter methods ...
predict(keypoints=None, keypoints_labels=None, boxes=None, masks=None, multimask_output=True, output_original_size=True)[source]#

Predict masks for the given image based on the input prompts.

Parameters:
  • keypoints (Keypoints | Tensor | None, optional) – Point prompts to the model. Each point is in (X, Y) pixel coordinates. Shape \((K, N, 2)\), where N is the number of points and K the number of prompts. Default: None

  • keypoints_labels – Labels for the point prompts. 1 indicates a foreground point and 0 indicates a background point. Shape \((K, N)\), where N is the number of points and K the number of prompts. Default: None

  • boxes (Boxes | Tensor | None, optional) – A box prompt to the model. If a tensor, it should be in xyxy format. Shape \((K, 4)\) Default: None

  • masks (Tensor | None, optional) – A low resolution mask input to the model, typically coming from a previous prediction iteration. Has shape \((K, 1, H, W)\), where for SAM, H=W=256. Default: None

  • multimask_output (bool, optional) – If true, the model will return three masks. For ambiguous input prompts (such as a single click), this will often produce better masks than a single prediction. If only a single mask is needed, the model’s predicted quality score can be used to select the best mask. For non-ambiguous prompts, such as multiple input prompts, multimask_output=False can give better results. Default: True

  • output_original_size (bool, optional) – If true, the logits of SegmentationResults will be post-processed to match the original input image size. Default: True

Return type:

SegmentationResults

Returns:

A prediction with the logits and scores (IoU of each predicted mask)
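
Example (a hedged sketch of a keypoint-prompt call, assuming a prompter with an image already set as in the class example above):

>>> # keypoints = torch.tensor([[[5.0, 5.0]]], device=prompter.device)  # (K=1, N=1, 2)
>>> # labels = torch.ones(1, 1, device=prompter.device)                 # (K=1, N=1), 1 = foreground
>>> # prediction = prompter.predict(keypoints=keypoints, keypoints_labels=labels)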

preprocess_image(x, mean=None, std=None)[source]#

Normalize and pad a tensor.

To normalize the tensor: the mean and std passed as arguments take priority; if they are None, the default SAM dataset values are used.

To pad the tensor: the tensor is padded on the right and bottom to match the size of self.model.image_encoder.img_size.

Parameters:
  • x (Tensor) – The image to be preprocessed

  • mean (Tensor | None, optional) – Mean for each channel. Default: None

  • std (Tensor | None, optional) – Standard deviations for each channel. Default: None

Return type:

Tensor

Returns:

The preprocessed image (normalized if mean and std are available, and padded to the encoder size)

preprocess_prompts(keypoints=None, keypoints_labels=None, boxes=None, masks=None)[source]#

Validate and preprocess the given prompts to be aligned with the input image.

Return type:

Prompts

set_image(image, mean=None, std=None)[source]#

Set the embeddings from the given image with the image_encoder of the model.

Prepare the given image with the selected transforms and the preprocess method.

Parameters:

image (Tensor) – RGB image. Normally images are in the range [0, 1]; the model preprocessing normalizes the pixel values with the mean and std defined at its initialization. Expected to be of float32 dtype. Shape \((3, H, W)\).

Return type:

None

Edge Detection#

class kornia.contrib.EdgeDetector[source]#

Detect edges in a given image using a CNN.

By default, it uses the method described in [SRS20].

Returns:

A tensor of shape \((B,1,H,W)\).

Example

>>> img = torch.rand(1, 3, 320, 320)
>>> detect = EdgeDetector()
>>> out = detect(img)
>>> out.shape
torch.Size([1, 1, 320, 320])

Face Detection#

class kornia.contrib.FaceDetector(top_k=5000, confidence_threshold=0.3, nms_threshold=0.3, keep_top_k=750)[source]#

Detect faces in a given image using a CNN.

By default, it uses the method described in [FYP+21].

Parameters:
  • top_k (int, optional) – the maximum number of detections to return before the nms. Default: 5000

  • confidence_threshold (float, optional) – the threshold used to discard detections. Default: 0.3

  • nms_threshold (float, optional) – the threshold used by the nms for iou. Default: 0.3

  • keep_top_k (int, optional) – the maximum number of detections to return after the nms. Default: 750

Returns:

A list of B tensors with shape \((N,15)\) to be used with kornia.contrib.FaceDetectorResult.

Example

>>> img = torch.rand(1, 3, 320, 320)
>>> detect = FaceDetector()
>>> res = detect(img)
class kornia.contrib.FaceKeypoint(value)[source]#

Define the keypoints detected in a face.

The left/right convention is based on the screen viewer.

EYE_LEFT = 0#
EYE_RIGHT = 1#
MOUTH_LEFT = 3#
MOUTH_RIGHT = 4#
NOSE = 2#
class kornia.contrib.FaceDetectorResult(data)[source]#

Encapsulate the results obtained by the kornia.contrib.FaceDetector.

Parameters:

data (Tensor) – the encoded results coming from the feature detector with shape \((14,)\).

property bottom_left: Tensor#

The [x y] position of the bottom-left coordinate of the bounding box.

property bottom_right: Tensor#

The [x y] position of the bottom-right coordinate of the bounding box.

get_keypoint(keypoint)[source]#

The [x y] position of a given facial keypoint.

Parameters:

keypoint (FaceKeypoint) – the keypoint type to return the position.

Return type:

Tensor

property height: Tensor#

The bounding box height.

property score: Tensor#

The detection score.

to(device=None, dtype=None)[source]#

Like torch.nn.Module.to() method.

Return type:

FaceDetectorResult

property top_left: Tensor#

The [x y] position of the top-left coordinate of the bounding box.

property top_right: Tensor#

The [x y] position of the top-right coordinate of the bounding box.

property width: Tensor#

The bounding box width.

property xmax: Tensor#

The bounding box bottom-right x-coordinate.

property xmin: Tensor#

The bounding box top-left x-coordinate.

property ymax: Tensor#

The bounding box bottom-right y-coordinate.

property ymin: Tensor#

The bounding box top-left y-coordinate.
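
Example (a hedged decoding sketch combining FaceDetector and FaceDetectorResult; the exact indexing into the detector output list is an assumption):

>>> # img = torch.rand(1, 3, 320, 320)
>>> # detections = FaceDetector()(img)                          # list of B tensors, each (N, 15)
>>> # results = [FaceDetectorResult(d) for d in detections[0]]  # wrap each detection of the first image
>>> # boxes = [(r.xmin, r.ymin, r.xmax, r.ymax) for r in results]
>>> # scores = [r.score for r in results]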

Interactive Demo#

Visit the Kornia face detection demo on Hugging Face Spaces.

Image Segmentation#

kornia.contrib.connected_components(image, num_iterations=100)[source]#

Computes the connected-component labelling (CCL) of a binarized image.

Example of a segmented image: https://github.com/kornia/data/raw/main/cells_segmented.png

The implementation is an adaptation of the following repository:

https://gist.github.com/efirdc/5d8bd66859e574c683a504a4690ae8bc

Warning

This is an experimental API subject to changes and optimization improvements.

Note

See a working example here.

Parameters:
  • image (Tensor) – the binarized input image with shape \((*, 1, H, W)\). The image must be in floating point with range [0, 1].

  • num_iterations (int, optional) – the number of iterations to make the algorithm to converge. Default: 100

Return type:

Tensor

Returns:

The labels image with the same shape as the input image.

Example

>>> img = torch.rand(2, 1, 4, 5)
>>> img_labels = connected_components(img, num_iterations=100)

Segment Anything (SAM)#

class kornia.contrib.models.sam.SamModelType(value)[source]#

Map the SAM model types.

vit_b = 2#
vit_h = 0#
vit_l = 1#
class kornia.contrib.models.sam.SamConfig(model_type=None, checkpoint=None, encoder_embed_dim=None, encoder_depth=None, encoder_num_heads=None, encoder_global_attn_indexes=None)[source]#

Encapsulate the Config to build a SAM model.

Parameters:
  • model_type (str | int | SamModelType | None, optional) –

    the available models are: Default: None

    • 0, ‘vit_h’ or kornia.contrib.sam.SamModelType.vit_h

    • 1, ‘vit_l’ or kornia.contrib.sam.SamModelType.vit_l

    • 2, ‘vit_b’ or kornia.contrib.sam.SamModelType.vit_b

  • checkpoint (str | None, optional) – URL or a path for a file with the weights of the model Default: None

  • encoder_embed_dim (int | None, optional) – Patch embedding dimension. Default: None

  • encoder_depth (int | None, optional) – Depth of ViT. Default: None

  • encoder_num_heads (int | None, optional) – Number of attention heads in each ViT block. Default: None

  • encoder_global_attn_indexes (tuple[int, ...] | None, optional) – Encoder indexes for blocks using global attention. Default: None

checkpoint: str | None = None#
encoder_depth: int | None = None#
encoder_embed_dim: int | None = None#
encoder_global_attn_indexes: tuple[int, ...] | None = None#
encoder_num_heads: int | None = None#
model_type: str | int | SamModelType | None = None#
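
Example (a minimal construction sketch):

>>> cfg = SamConfig('vit_b')
>>> cfg.model_type
'vit_b'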
class kornia.contrib.models.sam.Sam(image_encoder, prompt_encoder, mask_decoder)[source]#
__init__(image_encoder, prompt_encoder, mask_decoder)[source]#

SAM predicts object masks from an image and input prompts.

Parameters:
  • image_encoder (ImageEncoderViT) – The backbone used to encode the image into image embeddings that allow for efficient mask prediction.

  • prompt_encoder (PromptEncoder) – Encodes various types of input prompts.

  • mask_decoder (MaskDecoder) – Predicts masks from the image embeddings and encoded prompts.

forward(images, batched_prompts, multimask_output)[source]#

Predicts masks end-to-end from provided images and prompts.

This method expects that the images have already been pre-processed: at minimum normalized, resized, and padded to be compatible with self.image_encoder.

Parameters:
  • images – The image as a torch tensor in \((B, 3, H, W)\) format, already transformed for input to the model.

  • batched_prompts

    A list over the batch of images (list length should be \(B\)), where each element is a dictionary with the following keys. If an image does not have a given prompt, the corresponding key should be omitted from its dictionary (a construction sketch is given after the Returns section below). The options are:

    • ”points”: tuple of (Tensor, Tensor) with the coordinate keypoints and their respective labels. The tuple should look like (keypoints, labels), where:

      • The keypoints (a tensor) are batched point prompts for this image, with shape \((K, N, 2)\), already transformed to the input frame of the model.

      • The labels (a tensor) are batched labels for the point prompts, with shape \((K, N)\), where 1 indicates a foreground point and 0 indicates a background point.

    • ”boxes”: (Tensor) Batched box inputs, with shape \((K, 4)\), already transformed to the input frame of the model.

    • ”mask_inputs”: (Tensor) Batched mask inputs to the model, in the form \((K, 1, H, W)\).

  • multimask_output – Whether the model should predict multiple disambiguating masks, or return a single mask.

Returns:

A list over the input images, where each element is a SegmentationResults with:
  • logits: Low resolution logits with shape \((K, C, H, W)\), which can be passed as mask input to subsequent iterations of prediction. \(K\) is the number of input prompts, \(C\) is determined by multimask_output, and \(H=W=256\) is the model output size.

  • scores: The model’s predictions of mask quality (IoU prediction), with shape \((K, C)\).
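
A hedged sketch of the batched_prompts structure for a single image with a box prompt (tensors are assumed to be already preprocessed and transformed to the model input frame; sam_model as built in the from_config example below):

>>> # images = torch.rand(1, 3, 1024, 1024)               # assumed already preprocessed for the encoder
>>> # boxes = torch.tensor([[10.0, 10.0, 100.0, 100.0]])  # (K, 4) boxes in xyxy, already in the input frame
>>> # batched_prompts = [{"boxes": boxes}]                 # one dict per image; omit keys for absent prompts
>>> # results = sam_model(images, batched_prompts, multimask_output=True)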

static from_config(config)[source]#

Build/load the SAM model based on its config.

Parameters:

config (SamConfig) – The SamConfig data structure. If model_type is available, the model is built from it; otherwise the explicitly set parameters are used.

Return type:

Sam

Returns:

The respective SAM model

Example

>>> from kornia.contrib.models.sam import SamConfig
>>> sam_model = Sam.from_config(SamConfig('vit_b'))
load_checkpoint(checkpoint, device=None)#

Load checkpoint from a given url or file.

Parameters:
  • checkpoint (str) – The url or filepath for the respective checkpoint

  • device (torch.device | None, optional) – The desired device to load the weights onto and move the model to. Default: None

Return type:

None

Image Patches#

kornia.contrib.compute_padding(original_size, window_size)[source]#

Compute the required padding to ensure that chaining extract_tensor_patches() and combine_tensor_patches() produces the expected result.

Parameters:
  • original_size (Union[int, Tuple[int, int]]) – the size of the original tensor.

  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window used while extracting patches.

Return type:

Tuple[int, int, int, int]

Returns:

The required padding for (top, bottom, left, right) as a tuple of 4 ints.

Example

>>> image = torch.arange(12).view(1, 1, 4, 3)
>>> padding = compute_padding((4,3), (3,3))
>>> out = extract_tensor_patches(image, window_size=(3, 3), stride=(3, 3), padding=padding)
>>> combine_tensor_patches(out, original_size=(4, 3), window_size=(3, 3), stride=(3, 3), unpadding=padding)
tensor([[[[ 0,  1,  2],
          [ 3,  4,  5],
          [ 6,  7,  8],
          [ 9, 10, 11]]]])

Note

This function is supposed to be used in conjunction with extract_tensor_patches() and combine_tensor_patches().

kornia.contrib.extract_tensor_patches(input, window_size, stride=1, padding=0)[source]#

Function that extracts patches from tensors and stacks them.

See ExtractTensorPatches for details.

Parameters:
  • input (Tensor) – tensor image where to extract the patches with shape \((B, C, H, W)\).

  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window and the output patch size.

  • stride (Union[int, Tuple[int, int]], optional) – stride of the sliding window. Default: 1

  • padding (Union[int, Tuple[int, int], Tuple[int, int, int, int]], optional) – Zero-padding added to both sides of the input. Default: 0

Return type:

Tensor

Returns:

the tensor with the extracted patches with shape \((B, N, C, H_{out}, W_{out})\).

Examples

>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, (2, 3))
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> patches[:, -1]
tensor([[[[3., 4., 5.],
          [6., 7., 8.]]]])
kornia.contrib.combine_tensor_patches(patches, original_size, window_size, stride, unpadding=0)[source]#

Restore input from patches.

See CombineTensorPatches for details.

Parameters:
  • patches (Tensor) – patched tensor with shape \((B, N, C, H_{out}, W_{out})\).

  • original_size (Union[int, Tuple[int, int]]) – the size of the original tensor and the output patch size.

  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window used while extracting patches.

  • stride (Union[int, Tuple[int, int]]) – stride of the sliding window.

  • unpadding (Union[int, Tuple[int, int], Tuple[int, int, int, int]], optional) – remove the padding added to both sides of the input. Default: 0

Return type:

Tensor

Returns:

The combined patches in an image tensor with shape \((B, C, H, W)\).

Example

>>> out = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine_tensor_patches(out, original_size=(4, 4), window_size=(2, 2), stride=(2, 2))
tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])

Note

This function is supposed to be used in conjunction with extract_tensor_patches().

class kornia.contrib.ExtractTensorPatches(window_size, stride=1, padding=0)[source]#

Module that extracts patches from tensors and stacks them.

In the simplest case, the output value of the operator with input size \((B, C, H, W)\) is \((B, N, C, H_{out}, W_{out})\).

where
  • \(B\) is the batch size.

  • \(N\) denotes the total number of extracted patches, stacked in left-right and top-bottom order.

  • \(C\) denotes the number of input channels.

  • \(H\), \(W\) denote the height and width of the input in pixels.

  • \(H_{out}\), \(W_{out}\) denote the patch size defined in the function signature.

  • window_size is the size of the sliding window; it controls the shape of the output tensor and defines the shape of the output patch.

  • stride controls the stride of the sliding window and regulates the overlap between the extracted patches.

  • padding controls the amount of implicit zero-padding on both sides for each dimension.

The parameters window_size, stride and padding can be either:

  • a single int – in which case the same value is used for the height and width dimension.

  • a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.

padding can also be a tuple of four ints – in which case, the first two ints are for the height dimension while the last two ints are for the width dimension.

Parameters:
  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window and the output patch size.

  • stride (Union[int, Tuple[int, int]], optional) – stride of the sliding window. Default: 1

  • padding (Union[int, Tuple[int, int], Tuple[int, int, int, int]], optional) – zero-padding added to both sides of the input. Default: 0

Shape:
  • Input: \((B, C, H, W)\)

  • Output: \((B, N, C, H_{out}, W_{out})\)

Returns:

the tensor with the extracted patches.

Examples

>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, (2, 3))
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> patches[:, -1]
tensor([[[[3., 4., 5.],
          [6., 7., 8.]]]])
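
Example (a module-form sketch mirroring the functional example above):

>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> extractor = ExtractTensorPatches(window_size=(2, 3))
>>> extractor(input).shape
torch.Size([1, 2, 1, 2, 3])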
class kornia.contrib.CombineTensorPatches(original_size, window_size, unpadding=0)[source]#

Module that combines patches from tensors.

In the simplest case, the output value of the operator with input size \((B, N, C, H_{out}, W_{out})\) is \((B, C, H, W)\).

where
  • \(B\) is the batch size.

  • \(N\) denotes the total number of extracted patches, stacked in left-right and top-bottom order.

  • \(C\) denotes the number of input channels.

  • \(H\), \(W\) denote the height and width of the input in pixels.

  • \(H_{out}\), \(W_{out}\) denote the patch size defined in the function signature.

  • original_size is the size of the original image prior to extracting tensor patches; it defines the shape of the output.

  • window_size is the size of the sliding window used while extracting tensor patches.

  • unpadding is the amount of padding to be removed. This value must be the same as the padding used while extracting tensor patches.

The parameters original_size, window_size, and unpadding can be either:

  • a single int – in which case the same value is used for the height and width dimension.

  • a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.

unpadding can also be a tuple of four ints – in which case, the first two ints are for the height dimension while the last two ints are for the width dimension.

Parameters:
  • original_size (Union[int, Tuple[int, int]]) – the size of the original tensor.

  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window used while extracting patches.

  • unpadding (Union[int, Tuple[int, int], Tuple[int, int, int, int]], optional) – remove the padding added to both sides of the input. Default: 0

Shape:
  • Input: \((B, N, C, H_{out}, W_{out})\)

  • Output: \((B, C, H, W)\)

Example

>>> out = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine_tensor_patches(out, original_size=(4, 4), window_size=(2, 2), stride=(2, 2))
tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])

Note

This module is supposed to be used in conjunction with ExtractTensorPatches.
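
Example (a module-form sketch mirroring the functional example above; assumes the default stride behavior matches window_size):

>>> patches = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combiner = CombineTensorPatches(original_size=(4, 4), window_size=(2, 2))
>>> combiner(patches).shape
torch.Size([1, 1, 4, 4])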

Image Classification#

class kornia.contrib.VisionTransformer(image_size=224, patch_size=16, in_channels=3, embed_dim=768, depth=12, num_heads=12, dropout_rate=0.0, dropout_attn=0.0, backbone=None)[source]#

Vision transformer (ViT) module.

The module is expected to be used as operator for different vision tasks.

The method is inspired from existing implementations of the paper [DBK+21].

Warning

This is an experimental API subject to changes in favor of flexibility.

Parameters:
  • image_size (int, optional) – the size of the input image. Default: 224

  • patch_size (int, optional) – the size of the patch to compute the embedding. Default: 16

  • in_channels (int, optional) – the number of channels for the input. Default: 3

  • embed_dim (int, optional) – the embedding dimension inside the transformer encoder. Default: 768

  • depth (int, optional) – the depth of the transformer. Default: 12

  • num_heads (int, optional) – the number of attention heads. Default: 12

  • dropout_rate (float, optional) – dropout rate. Default: 0.0

  • dropout_attn (float, optional) – attention dropout rate. Default: 0.0

  • backbone (Optional[Module], optional) – an nn.Module to compute the image patches embeddings. Default: None

Example

>>> img = torch.rand(1, 3, 224, 224)
>>> vit = VisionTransformer(image_size=224, patch_size=16)
>>> vit(img).shape
torch.Size([1, 197, 768])
class kornia.contrib.MobileViT(mode='xxs', in_channels=3, patch_size=(2, 2), dropout=0.0)[source]#

MobileViT module. The default arguments are for MobileViT XXS.

Paper: https://arxiv.org/abs/2110.02178 Based on: https://github.com/chinhsuanwu/mobilevit-pytorch

Parameters:
  • mode (str, optional) – ‘xxs’, ‘xs’ or ‘s’. Default: 'xxs'

  • in_channels (int, optional) – the number of channels for the input image. Default: 3

  • patch_size (Tuple[int, int], optional) – image_size must be divisible by patch_size. Default: (2, 2)

  • dropout (float, optional) – dropout ratio in Transformer. Default: 0.0

Example

>>> img = torch.rand(1, 3, 256, 256)
>>> mvit = MobileViT(mode='xxs')
>>> mvit(img).shape
torch.Size([1, 320, 8, 8])
class kornia.contrib.ClassificationHead(embed_size=768, num_classes=10)[source]#

Module to be used as a classification head.

Parameters:
  • embed_size (int, optional) – the embedding dimension of the features coming from the network. Default: 768

  • num_classes (int, optional) – the number of classes to classify. Default: 10

Example

>>> feat = torch.rand(1, 256, 256)
>>> head = ClassificationHead(256, 10)
>>> head(feat).shape
torch.Size([1, 10])

Image Stitching#

class kornia.contrib.ImageStitcher(matcher, estimator='ransac', blending_method='naive')[source]#

Stitch two images with overlapping fields of view.

Parameters:
  • matcher (Module) – image feature matching module.

  • estimator (str, optional) – method to compute homography, either “vanilla” or “ransac”. “ransac” is slower with a better accuracy. Default: 'ransac'

  • blending_method (str, optional) – method to blend two images together. Only “naive” is currently supported. Default: 'naive'

Note

Current implementation requires strict image ordering from left to right.

import torch
import matplotlib.pyplot as plt
import kornia as K
import kornia.feature as KF
from kornia.contrib import ImageStitcher

# img_left, img_right: (1, 3, H, W) float tensors with overlapping views, assumed loaded beforehand
IS = ImageStitcher(KF.LoFTR(pretrained='outdoor'), estimator='ransac').cuda()
with torch.inference_mode():  # compute the stitched result with less GPU memory cost
    out = IS(img_left, img_right)
plt.imshow(K.tensor_to_image(out))  # show the result

Lambda#

class kornia.contrib.Lambda(func)[source]#

Applies user-defined lambda as a transform.

Parameters:

func (Callable[..., Tensor]) – Callable function.

Returns:

The output of the user-defined lambda.

Example

>>> import kornia
>>> x = torch.rand(1, 3, 5, 5)
>>> f = Lambda(lambda x: kornia.color.rgb_to_grayscale(x))
>>> f(x).shape
torch.Size([1, 1, 5, 5])

Distance Transform#

kornia.contrib.distance_transform(image, kernel_size=3, h=0.35)[source]#

Approximates the Manhattan distance transform of images using cascaded convolution operations.

The value at each pixel in the output represents the distance to the nearest non-zero pixel in the image. It uses the method described in [PDP20]. The transformation is applied independently across the channel dimension of the images.

Parameters:
  • image (Tensor) – Image with shape \((B,C,H,W)\).

  • kernel_size (int, optional) – size of the convolution kernel. Default: 3

  • h (float, optional) – value that influences the approximation of the min function. Default: 0.35

Return type:

Tensor

Returns:

tensor with shape \((B,C,H,W)\).

Example

>>> tensor = torch.zeros(1, 1, 5, 5)
>>> tensor[:,:, 1, 2] = 1
>>> dt = kornia.contrib.distance_transform(tensor)
kornia.contrib.diamond_square(output_size, roughness=0.5, random_scale=1.0, random_fn=torch.rand, normalize_range=None, device=None, dtype=None)[source]#

Generates plasma fractal images using the diamond-square algorithm.

See: https://en.wikipedia.org/wiki/Diamond-square_algorithm

Parameters:
  • output_size (Tuple[int, int, int, int]) – a tuple of integers with the BxCxHxW of the image to be generated.

  • roughness (Union[float, Tensor], optional) – the scale value to apply at each recursion step. Default: 0.5

  • random_scale (Union[float, Tensor], optional) – the initial value of the scale for recursion. Default: 1.0

  • random_fn (Callable[..., Tensor], optional) – the callable function to use to sample a random tensor. Default: torch.rand

  • normalize_range (Optional[Tuple[float, float]], optional) – whether to apply min-max normalization to the output map. If a range is specified, min-max normalization is applied between the provided range. Default: None

  • device (Optional[device], optional) – the torch device to place the output map. Default: None

  • dtype (Optional[dtype], optional) – the torch dtype to place the output map. Default: None

Return type:

Tensor

Returns:

A tensor with shape \((B,C,H,W)\) containing the fractal image.
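
Example (a minimal usage sketch):

>>> img = diamond_square((1, 1, 8, 8), normalize_range=(0.0, 1.0))
>>> img.shape
torch.Size([1, 1, 8, 8])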

class kornia.contrib.DistanceTransform(kernel_size=3, h=0.35)[source]#

Module that approximates the Manhattan (city block) distance transform of images using convolutions.

Parameters:
  • kernel_size (int, optional) – size of the convolution kernel. Default: 3

  • h (float, optional) – value that influences the approximation of the min function. Default: 0.35
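
Example (a module-form sketch mirroring the functional distance_transform example above):

>>> tensor = torch.zeros(1, 1, 5, 5)
>>> tensor[:, :, 1, 2] = 1
>>> dt = DistanceTransform()
>>> dt(tensor).shape
torch.Size([1, 1, 5, 5])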