kornia.contrib

Face Detection

class kornia.contrib.FaceDetector(top_k=5000, confidence_threshold=0.3, nms_threshold=0.3, keep_top_k=750)[source]

Detect faces in a given image using a CNN.

By default, it uses the method described in [FYP+21].

Parameters
  • top_k (int, optional) – the maximum number of detections to return before the non-maximum suppression (NMS). Default: 5000

  • confidence_threshold (float, optional) – the threshold used to discard detections. Default: 0.3

  • nms_threshold (float, optional) – the IoU threshold used by the NMS. Default: 0.3

  • keep_top_k (int, optional) – the maximum number of detections to return after the NMS. Default: 750

Returns

A tensor of shape \((N,15)\) to be used with kornia.contrib.FaceDetectorResult.

Example

>>> img = torch.rand(1, 3, 320, 320)
>>> detect = FaceDetector()
>>> res = detect(img)
class kornia.contrib.FaceKeypoint(value)[source]

Define the keypoints detected in a face.

The left/right convention is based on the screen viewer.

EYE_LEFT = 0
EYE_RIGHT = 1
MOUTH_LEFT = 3
MOUTH_RIGHT = 4
NOSE = 2
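
Example

FaceKeypoint maps keypoint names to the indices listed above; a minimal sketch, assuming standard Python Enum semantics:

>>> FaceKeypoint.NOSE.value
2
>>> FaceKeypoint.MOUTH_RIGHT.name
'MOUTH_RIGHT'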
class kornia.contrib.FaceDetectorResult(data)[source]

Encapsulate the results obtained by the kornia.contrib.FaceDetector.

Parameters

data (Tensor) – the encoded results coming from the feature detector with shape \((14,)\).

property bottom_left: torch.Tensor

The [x y] position of the bottom-left coordinate of the bounding box.

Return type

Tensor

property bottom_right: torch.Tensor

The [x y] position of the bottom-right coordinate of the bounding box.

Return type

Tensor

get_keypoint(keypoint)[source]

The [x y] position of a given facial keypoint.

Parameters

keypoint (FaceKeypoint) – the keypoint type to return the position.

Return type

Tensor

property height: torch.Tensor

The bounding box height.

Return type

Tensor

property score: torch.Tensor

The detection score.

Return type

Tensor

to(device=None, dtype=None)[source]

Like the torch.nn.Module.to() method.

Return type

FaceDetectorResult

property top_left: torch.Tensor

The [x y] position of the top-left coordinate of the bounding box.

Return type

Tensor

property top_right: torch.Tensor

The [x y] position of the top-right coordinate of the bounding box.

Return type

Tensor

property width: torch.Tensor

The bounding box width.

Return type

Tensor

property xmax: torch.Tensor

The bounding box bottom-right x-coordinate.

Return type

Tensor

property xmin: torch.Tensor

The bounding box top-left x-coordinate.

Return type

Tensor

property ymax: torch.Tensor

The bounding box bottom-right y-coordinate.

Return type

Tensor

property ymin: torch.Tensor

The bounding box top-left y-coordinate.

Return type

Tensor
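
Example

A hedged sketch of wrapping the raw detector output and reading box geometry and keypoints; it assumes the tensor returned by FaceDetector can be passed directly to FaceDetectorResult (exact shapes may vary between versions):

>>> img = torch.rand(1, 3, 320, 320)
>>> detect = FaceDetector()
>>> res = FaceDetectorResult(detect(img))
>>> corners = torch.stack([res.top_left, res.bottom_right], -2)  # box corners
>>> nose = res.get_keypoint(FaceKeypoint.NOSE)  # [x y] of the nose keypoint
>>> keep = res.score > 0.5  # boolean mask to filter detections by score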

Image Segmentation

kornia.contrib.connected_components(image, num_iterations=100)[source]

Computes connected components of a binarized image using the Connected-component labelling (CCL) algorithm.

[Figure: example of connected-component labelling on a cell image – https://github.com/kornia/data/raw/main/cells_segmented.png]

The implementation is an adaptation of the following repository:

https://gist.github.com/efirdc/5d8bd66859e574c683a504a4690ae8bc

Warning

This is an experimental API subject to changes and optimization improvements.


Parameters
  • image (Tensor) – the binarized input image with shape \((*, 1, H, W)\). The image must be in floating point with range [0, 1].

  • num_iterations (int, optional) – the number of iterations to run for the algorithm to converge. Default: 100

Return type

Tensor

Returns

The labels image with the same shape as the input image.

Example

>>> img = torch.rand(2, 1, 4, 5)
>>> img_labels = connected_components(img, num_iterations=100)
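
Since the input is expected to be binarized in the range [0, 1], a typical call thresholds the image first; a brief sketch with a hypothetical 0.5 threshold:

>>> img_bin = (torch.rand(2, 1, 4, 5) > 0.5).float()  # binarize to {0., 1.}
>>> labels = connected_components(img_bin, num_iterations=100)
>>> labels.shape
torch.Size([2, 1, 4, 5])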

Image Patches

kornia.contrib.extract_tensor_patches(input, window_size, stride=1, padding=0)[source]

Function that extracts patches from tensors and stacks them.

See ExtractTensorPatches for details.

Parameters
  • input (Tensor) – the input tensor from which to extract patches, with shape \((B, C, H, W)\).

  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window and the output patch size.

  • stride (Union[int, Tuple[int, int]], optional) – stride of the sliding window. Default: 1

  • padding (Union[int, Tuple[int, int]], optional) – Zero-padding added to both sides of the input. Default: 0

Return type

Tensor

Returns

the tensor with the extracted patches with shape \((B, N, C, H_{out}, W_{out})\).

Examples

>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, (2, 3))
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> patches[:, -1]
tensor([[[[3., 4., 5.],
          [6., 7., 8.]]]])
kornia.contrib.combine_tensor_patches(patches, window_size=(4, 4), stride=(4, 4), unpadding=None)[source]

Restore input from patches.

Parameters
  • patches (Tensor) – patched tensor with shape \((B, N, C, H_{out}, W_{out})\).

  • window_size (Tuple[int, int], optional) – the size of the sliding window and the output patch size. Default: (4, 4)

  • stride (Tuple[int, int], optional) – stride of the sliding window. Default: (4, 4)

  • unpadding (Optional[Tuple[int, int, int, int]], optional) – remove the padding added to both sides of the input. Default: None

Return type

Tensor

Returns

The combined patches in an image tensor with shape \((B, C, H, W)\).

Example

>>> out = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine_tensor_patches(out, window_size=(2, 2), stride=(2, 2))
tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])
class kornia.contrib.ExtractTensorPatches(window_size, stride=1, padding=0)[source]

Module that extracts patches from tensors and stacks them.

In the simplest case, the output value of the operator with input size \((B, C, H, W)\) is \((B, N, C, H_{out}, W_{out})\).

where
  • \(B\) is the batch size.

  • \(N\) denotes the total number of extracted patches stacked along the second dimension.

  • \(C\) denotes the number of input channels.

  • \(H\), \(W\) denote the height and width of the input in pixels.

  • \(H_{out}\), \(W_{out}\) denote the patch size defined in the function signature. Patches are extracted in left-right, top-bottom order.

  • window_size is the size of the sliding window and defines the shape of the output patch.

  • stride controls the stride of the sliding window and regulates the overlap between extracted patches.

  • padding controls the amount of implicit zero-padding added to both sides of each dimension.

The parameters window_size, stride and padding can be either:

  • a single int – in which case the same value is used for the height and width dimension.

  • a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.

Parameters
  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window and the output patch size.

  • stride (Union[int, Tuple[int, int]], optional) – stride of the sliding window. Default: 1

  • padding (Union[int, Tuple[int, int]], optional) – Zero-padding added to both sides of the input. Default: 0

Shape:
  • Input: \((B, C, H, W)\)

  • Output: \((B, N, C, H_{out}, W_{out})\)

Returns

the tensor with the extracted patches.

Examples

>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, (2, 3))
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> patches[:, -1]
tensor([[[[3., 4., 5.],
          [6., 7., 8.]]]])
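
The same patches can be produced with the module form; a brief sketch (the output shape follows from the functional example above):

>>> patcher = ExtractTensorPatches(window_size=(2, 3), stride=1, padding=0)
>>> patcher(torch.arange(9.).view(1, 1, 3, 3)).shape
torch.Size([1, 2, 1, 2, 3])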
class kornia.contrib.CombineTensorPatches(window_size, unpadding=0)[source]

Module that combines patches from tensors.

In the simplest case, the output value of the operator with input size \((B, N, C, H_{out}, W_{out})\) is \((B, C, H, W)\).

where
  • \(B\) is the batch size.

  • \(N\) denotes the total number of extracted patches stacked along the second dimension.

  • \(C\) denotes the number of input channels.

  • \(H\), \(W\) denote the height and width of the input in pixels.

  • \(H_{out}\), \(W_{out}\) denote the patch size defined in the function signature. Patches are ordered left-right, top-bottom.

  • window_size is the size of the sliding window and defines the shape of each patch.

  • stride controls the stride of the sliding window and regulates the overlap between the patches.

  • padding controls the amount of implicit zero-padding added to both sides of each dimension.

The parameters window_size and unpadding can be either:

  • a single int – in which case the same value is used for the height and width dimension.

  • a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.

Parameters
  • patches – patched tensor.

  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window and the output patch size.

  • unpadding (Union[int, Tuple[int, int]], optional) – remove the padding added to both sides of the input. Default: 0

Shape:
  • Input: \((B, N, C, H_{out}, W_{out})\)

  • Output: \((B, C, H, W)\)

Example

>>> out = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine_tensor_patches(out, window_size=(2, 2), stride=(2, 2))
tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])
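
A brief module-form sketch of the same round trip (it assumes CombineTensorPatches reconstructs with a stride equal to window_size, matching the functional example above):

>>> patches = ExtractTensorPatches(window_size=(2, 2), stride=(2, 2))(torch.arange(16).view(1, 1, 4, 4))
>>> CombineTensorPatches(window_size=(2, 2))(patches).shape
torch.Size([1, 1, 4, 4])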

Image Classification

class kornia.contrib.VisionTransformer(image_size=224, patch_size=16, in_channels=3, embed_dim=768, depth=12, num_heads=12, dropout_rate=0.0, dropout_attn=0.0, backbone=None)[source]

Vision transformer (ViT) module.

The module is expected to be used as an operator for different vision tasks.

The method is inspired by existing implementations of the paper [DBK+21].

Warning

This is an experimental API subject to changes in favor of flexibility.

Parameters
  • image_size (int, optional) – the size of the input image. Default: 224

  • patch_size (int, optional) – the size of the patch to compute the embedding. Default: 16

  • in_channels (int, optional) – the number of channels for the input. Default: 3

  • embed_dim (int, optional) – the embedding dimension inside the transformer encoder. Default: 768

  • depth (int, optional) – the depth of the transformer. Default: 12

  • num_heads (int, optional) – the number of attention heads. Default: 12

  • dropout_rate (float, optional) – dropout rate. Default: 0.0

  • dropout_attn (float, optional) – attention dropout rate. Default: 0.0

  • backbone (Optional[Module], optional) – an nn.Module to compute the image patches embeddings. Default: None

Example

>>> img = torch.rand(1, 3, 224, 224)
>>> vit = VisionTransformer(image_size=224, patch_size=16)
>>> vit(img).shape
torch.Size([1, 197, 768])
class kornia.contrib.MobileViT(mode='xxs', in_channels=3, patch_size=(2, 2), dropout=0.0)[source]

MobileViT module. The default arguments are for MobileViT XXS.

Paper: https://arxiv.org/abs/2110.02178
Based on: https://github.com/chinhsuanwu/mobilevit-pytorch

Parameters
  • mode (str, optional) – one of ‘xxs’, ‘xs’ or ‘s’. Default: 'xxs'

  • in_channels (int, optional) – the number of channels for the input image. Default: 3

  • patch_size (Tuple[int, int], optional) – the patch size; image_size must be divisible by patch_size. Default: (2, 2)

  • dropout (float, optional) – dropout ratio in Transformer. Default: 0.0

Example

>>> img = torch.rand(1, 3, 256, 256)
>>> mvit = MobileViT(mode='xxs')
>>> mvit(img).shape
torch.Size([1, 320, 8, 8])
class kornia.contrib.ClassificationHead(embed_size=768, num_classes=10)[source]

Module to be used as a classification head.

Parameters
  • embed_size (int, optional) – the embedding (feature) size of the input tensor. Default: 768

  • num_classes (int, optional) – an integer representing the number of classes to classify. Default: 10

Example

>>> feat = torch.rand(1, 256, 256)
>>> head = ClassificationHead(256, 10)
>>> head(feat).shape
torch.Size([1, 10])
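
The head is typically chained after a feature extractor such as VisionTransformer; a hedged sketch, assuming the head accepts the (B, N, D) transformer output as its own example above suggests:

>>> img = torch.rand(1, 3, 224, 224)
>>> classifier = torch.nn.Sequential(
...     VisionTransformer(image_size=224, patch_size=16, embed_dim=768),
...     ClassificationHead(embed_size=768, num_classes=10),
... )
>>> classifier(img).shape
torch.Size([1, 10])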

Image Stitching

class kornia.contrib.ImageStitcher(matcher, estimator='ransac', blending_method='naive')[source]

Stitch two images with overlapping fields of view.

Parameters
  • matcher (Module) – image feature matching module.

  • estimator (str, optional) – method to compute homography, either “vanilla” or “ransac”. “ransac” is slower but more accurate. Default: 'ransac'

  • blending_method (str, optional) – method to blend two images together. Only “naive” is currently supported. Default: 'naive'

Note

Current implementation requires strict image ordering from left to right.

import kornia as K
import kornia.feature as KF
import matplotlib.pyplot as plt
import torch

# img_left and img_right are (B, 3, H, W) image tensors loaded beforehand.
IS = ImageStitcher(KF.LoFTR(pretrained='outdoor'), estimator='ransac').cuda()
# Compute the stitched result with less GPU memory cost.
with torch.inference_mode():
    out = IS(img_left, img_right)
# Show the result
plt.imshow(K.tensor_to_image(out))

Lambda

class kornia.contrib.Lambda(func)[source]

Applies a user-defined lambda as a transform.

Parameters

func (Callable) – the callable function to apply.

Returns

The output of the user-defined lambda.

Example

>>> import kornia
>>> x = torch.rand(1, 3, 5, 5)
>>> f = Lambda(lambda x: kornia.color.rgb_to_grayscale(x))
>>> f(x).shape
torch.Size([1, 1, 5, 5])