kornia.contrib#

Face Detection#

class kornia.contrib.FaceDetector(top_k=5000, confidence_threshold=0.3, nms_threshold=0.3, keep_top_k=750)[source]#

Detect faces in a given image using a CNN.

By default, it uses the method described in [FYP+21].

Parameters
  • top_k (int, optional) – the maximum number of detections to return before the nms. Default: 5000

  • confidence_threshold (float, optional) – the threshold used to discard detections. Default: 0.3

  • nms_threshold (float, optional) – the IoU threshold used by the nms. Default: 0.3

  • keep_top_k (int, optional) – the maximum number of detections to return after the nms. Default: 750

Returns

A tensor of shape \((N,15)\) to be used with kornia.contrib.FaceDetectorResult.

Example

>>> img = torch.rand(1, 3, 320, 320)
>>> detect = FaceDetector()
>>> res = detect(img)
class kornia.contrib.FaceKeypoint(value)[source]#

Define the keypoints detected in a face.

The left/right convention is based on the screen viewer.

EYE_LEFT = 0#
EYE_RIGHT = 1#
NOSE = 2#
MOUTH_LEFT = 3#
MOUTH_RIGHT = 4#
class kornia.contrib.FaceDetectorResult(data)[source]#

Encapsulate the results obtained by the kornia.contrib.FaceDetector.

Parameters

data (Tensor) – the encoded results coming from the face detector with shape \((15,)\).

property bottom_left: Tensor#

The [x y] position of the bottom-left coordinate of the bounding box.

Return type

Tensor

property bottom_right: Tensor#

The [x y] position of the bottom-right coordinate of the bounding box.

Return type

Tensor

get_keypoint(keypoint)[source]#

The [x y] position of a given facial keypoint.

Parameters

keypoint (FaceKeypoint) – the keypoint type whose position to return.

Return type

Tensor

property height: Tensor#

The bounding box height.

Return type

Tensor

property score: Tensor#

The detection score.

Return type

Tensor

to(device=None, dtype=None)[source]#

Like the torch.nn.Module.to() method.

Return type

FaceDetectorResult

property top_left: Tensor#

The [x y] position of the top-left coordinate of the bounding box.

Return type

Tensor

property top_right: Tensor#

The [x y] position of the top-right coordinate of the bounding box.

Return type

Tensor

property width: Tensor#

The bounding box width.

Return type

Tensor

property xmax: Tensor#

The bounding box bottom-right x-coordinate.

Return type

Tensor

property xmin: Tensor#

The bounding box top-left x-coordinate.

Return type

Tensor

property ymax: Tensor#

The bounding box bottom-right y-coordinate.

Return type

Tensor

property ymin: Tensor#

The bounding box top-left y-coordinate.

Return type

Tensor
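
A minimal usage sketch combining FaceDetector, FaceDetectorResult and FaceKeypoint. Wrapping each detection row in a FaceDetectorResult is an assumption based on the \((N,15)\) detector output described above; no output values are asserted.

>>> img = torch.rand(1, 3, 320, 320)
>>> detect = FaceDetector()
>>> dets = detect(img)
>>> results = [FaceDetectorResult(d) for d in dets]
>>> for r in results:
...     box = (r.top_left, r.bottom_right)
...     confidence = r.score
...     nose = r.get_keypoint(FaceKeypoint.NOSE)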

Image Segmentation#

kornia.contrib.connected_components(image, num_iterations=100)[source]#

Compute connected components using the connected-component labelling (CCL) algorithm.

Example output (segmented cells): https://github.com/kornia/data/raw/main/cells_segmented.png

The implementation is an adaptation of the following repository:

https://gist.github.com/efirdc/5d8bd66859e574c683a504a4690ae8bc

Warning

This is an experimental API subject to changes and optimization improvements.

Note

See a working example here.

Parameters
  • image (Tensor) – the binarized input image with shape \((*, 1, H, W)\). The image must be in floating point with range [0, 1].

  • num_iterations (int, optional) – the number of iterations for the algorithm to converge. Default: 100

Return type

Tensor

Returns

The labels image with the same shape as the input image.

Example

>>> img = torch.rand(2, 1, 4, 5)
>>> img_labels = connected_components(img, num_iterations=100)
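
Since the input is expected to be a binarized image in the range [0, 1], a typical pipeline thresholds a grayscale image first. A minimal sketch (the 0.5 threshold is an arbitrary illustrative choice):

>>> img = torch.rand(1, 1, 6, 6)
>>> mask = (img > 0.5).float()  # binarize to {0, 1} as required
>>> labels = connected_components(mask, num_iterations=100)
>>> labels.shape
torch.Size([1, 1, 6, 6])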

Image Patches#

kornia.contrib.compute_padding(original_size, window_size)[source]#

Compute the padding required so that chaining extract_tensor_patches() and combine_tensor_patches() produces the expected result.

Parameters
  • original_size (Union[int, Tuple[int, int]]) – the size of the original tensor.

  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window used while extracting patches.

Return type

Tuple[int, int, int, int]

Returns

The required padding for (top, bottom, left, right) as a tuple of 4 ints.

Example

>>> image = torch.arange(12).view(1, 1, 4, 3)
>>> padding = compute_padding((4,3), (3,3))
>>> out = extract_tensor_patches(image, window_size=(3, 3), stride=(3, 3), padding=padding)
>>> combine_tensor_patches(out, original_size=(4, 3), window_size=(3, 3), stride=(3, 3), unpadding=padding)
tensor([[[[ 0,  1,  2],
          [ 3,  4,  5],
          [ 6,  7,  8],
          [ 9, 10, 11]]]])

Note

This function is supposed to be used in conjunction with extract_tensor_patches() and combine_tensor_patches().

kornia.contrib.extract_tensor_patches(input, window_size, stride=1, padding=0)[source]#

Function that extracts patches from tensors and stacks them.

See ExtractTensorPatches for details.

Parameters
  • input (Tensor) – tensor image where to extract the patches with shape \((B, C, H, W)\).

  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window and the output patch size.

  • stride (Union[int, Tuple[int, int]], optional) – stride of the sliding window. Default: 1

  • padding (Union[int, Tuple[int, int], Tuple[int, int, int, int]], optional) – Zero-padding added to both sides of the input. Default: 0

Return type

Tensor

Returns

the tensor with the extracted patches with shape \((B, N, C, H_{out}, W_{out})\).

Examples

>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, (2, 3))
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> patches[:, -1]
tensor([[[[3., 4., 5.],
          [6., 7., 8.]]]])
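
The stride controls how much neighbouring patches overlap. A small sketch reusing a 3x3 input: with a 2x2 window and unit stride there are four valid window positions, so four patches are returned.

>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, window_size=(2, 2), stride=(1, 1))
>>> patches.shape
torch.Size([1, 4, 1, 2, 2])
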
kornia.contrib.combine_tensor_patches(patches, original_size, window_size, stride, unpadding=0)[source]#

Restore input from patches.

See CombineTensorPatches for details.

Parameters
  • patches (Tensor) – patched tensor with shape \((B, N, C, H_{out}, W_{out})\).

  • original_size (Union[int, Tuple[int, int]]) – the size of the original tensor and the output size of the combined tensor.

  • window_size (Union[int, Tuple[int, int]]) – the size of the sliding window used while extracting patches.

  • stride (Union[int, Tuple[int, int]]) – stride of the sliding window.

  • unpadding (Union[int, Tuple[int, int], Tuple[int, int, int, int]], optional) – remove the padding added to both sides of the input. Default: 0

Return type

Tensor

Returns

The combined patches in an image tensor with shape \((B, C, H, W)\).

Example

>>> out = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine_tensor_patches(out, original_size=(4, 4), window_size=(2, 2), stride=(2, 2))
tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])

Note

This function is supposed to be used in conjunction with extract_tensor_patches().

class kornia.contrib.ExtractTensorPatches(window_size, stride=1, padding=0)[source]#

Module that extracts patches from tensors and stacks them.

In the simplest case, the output of the operator for an input of size \((B, C, H, W)\) has size \((B, N, C, H_{out}, W_{out})\).

where
  • \(B\) is the batch size.

  • \(N\) denotes the total number of extracted patches, stacked in left-right and top-bottom order.

  • \(C\) denotes the number of input channels.

  • \(H\), \(W\) denote the height and width of the input in pixels.

  • \(H_{out}\), \(W_{out}\) denote the patch size defined in the function signature.

  • window_size is the size of the sliding window and defines the shape of each output patch.

  • stride controls the stride to apply to the sliding window and regulates the overlapping between the extracted patches.

  • padding controls the amount of implicit zero-padding on both sides of each dimension.

The parameters window_size, stride and padding can be either:

  • a single int – in which case the same value is used for the height and width dimension.

  • a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.

padding can also be a tuple of four ints – in which case, the first two ints are for the height dimension while the last two ints are for the width dimension.

Shape:
  • Input: \((B, C, H, W)\)

  • Output: \((B, N, C, H_{out}, W_{out})\)

Returns

the tensor with the extracted patches.

Examples

>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, (2, 3))
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> patches[:, -1]
tensor([[[[3., 4., 5.],
          [6., 7., 8.]]]])
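
A minimal sketch of the module-based API, equivalent to the functional call shown above:

>>> input = torch.arange(16.).view(1, 1, 4, 4)
>>> extractor = ExtractTensorPatches(window_size=(2, 2), stride=(2, 2))
>>> extractor(input).shape
torch.Size([1, 4, 1, 2, 2])
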
class kornia.contrib.CombineTensorPatches(original_size, window_size, unpadding=0)[source]#

Module that combines patches back into the original tensor.

In the simplest case, the output of the operator for an input of size \((B, N, C, H_{out}, W_{out})\) has size \((B, C, H, W)\).

where
  • \(B\) is the batch size.

  • \(N\) denotes the total number of patches, stacked in left-right and top-bottom order.

  • \(C\) denotes the number of input channels.

  • \(H\), \(W\) denote the height and width of the combined (output) image in pixels.

  • \(H_{out}\), \(W_{out}\) denote the patch size defined in the function signature.

  • original_size is the size of the original image prior to extracting tensor patches and defines the shape of the output tensor.

  • window_size is the size of the sliding window used while extracting tensor patches.

  • unpadding is the amount of padding to be removed. This value must be the same as padding used while extracting tensor patches.

The parameters original_size, window_size, and unpadding can be either:

  • a single int – in which case the same value is used for the height and width dimension.

  • a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.

unpadding can also be a tuple of four ints – in which case, the first two ints are for the height dimension while the last two ints are for the width dimension.

Shape:
  • Input: \((B, N, C, H_{out}, W_{out})\)

  • Output: \((B, C, H, W)\)

Example

>>> out = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine_tensor_patches(out, original_size=(4, 4), window_size=(2, 2), stride=(2, 2))
tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])

Note

This module is supposed to be used in conjunction with ExtractTensorPatches.
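
A minimal sketch of the module-based API, assuming (as in the example above) that the patches were extracted with a stride equal to the window size:

>>> patches = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine = CombineTensorPatches(original_size=(4, 4), window_size=(2, 2))
>>> combine(patches).shape
torch.Size([1, 1, 4, 4])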

Image Classification#

class kornia.contrib.VisionTransformer(image_size=224, patch_size=16, in_channels=3, embed_dim=768, depth=12, num_heads=12, dropout_rate=0.0, dropout_attn=0.0, backbone=None)[source]#

Vision transformer (ViT) module.

The module is expected to be used as an operator for different vision tasks.

The method is inspired by existing implementations of the paper [DBK+21].

Warning

This is an experimental API subject to changes in favor of flexibility.

Parameters
  • image_size (int, optional) – the size of the input image. Default: 224

  • patch_size (int, optional) – the size of the patch to compute the embedding. Default: 16

  • in_channels (int, optional) – the number of channels for the input. Default: 3

  • embed_dim (int, optional) – the embedding dimension inside the transformer encoder. Default: 768

  • depth (int, optional) – the depth of the transformer. Default: 12

  • num_heads (int, optional) – the number of attention heads. Default: 12

  • dropout_rate (float, optional) – dropout rate. Default: 0.0

  • dropout_attn (float, optional) – attention dropout rate. Default: 0.0

  • backbone (Optional[Module], optional) – an nn.Module to compute the image patches embeddings. Default: None

Example

>>> img = torch.rand(1, 3, 224, 224)
>>> vit = VisionTransformer(image_size=224, patch_size=16)
>>> vit(img).shape
torch.Size([1, 197, 768])
class kornia.contrib.MobileViT(mode='xxs', in_channels=3, patch_size=(2, 2), dropout=0.0)[source]#

The MobileViT module. The default arguments correspond to MobileViT XXS.

Paper: https://arxiv.org/abs/2110.02178
Based on: https://github.com/chinhsuanwu/mobilevit-pytorch

Parameters
  • mode (str, optional) – one of ‘xxs’, ‘xs’ or ‘s’. Default: 'xxs'

  • in_channels (int, optional) – the number of channels for the input image. Default: 3

  • patch_size (Tuple[int, int], optional) – the patch size; the input image size must be divisible by patch_size. Default: (2, 2)

  • dropout (float, optional) – dropout ratio in Transformer. Default: 0.0

Example

>>> img = torch.rand(1, 3, 256, 256)
>>> mvit = MobileViT(mode='xxs')
>>> mvit(img).shape
torch.Size([1, 320, 8, 8])
class kornia.contrib.ClassificationHead(embed_size=768, num_classes=10)[source]#

Module to be used as a classification head.

Parameters
  • embed_size (int, optional) – the embedding dimension of the feature tensor coming from the network. Default: 768

  • num_classes (int, optional) – the number of classes to classify. Default: 10

Example

>>> feat = torch.rand(1, 256, 256)
>>> head = ClassificationHead(256, 10)
>>> head(feat).shape
torch.Size([1, 10])
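
A sketch of chaining VisionTransformer with ClassificationHead into a simple classification pipeline. The embedding dimension 768 matches the transformer default above; this is an illustrative, untrained composition.

>>> img = torch.rand(1, 3, 224, 224)
>>> classifier = torch.nn.Sequential(
...     VisionTransformer(image_size=224, patch_size=16),
...     ClassificationHead(embed_size=768, num_classes=10),
... )
>>> classifier(img).shape
torch.Size([1, 10])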

Image Stitching#

class kornia.contrib.ImageStitcher(matcher, estimator='ransac', blending_method='naive')[source]#

Stitch two images with overlapping fields of view.

Parameters
  • matcher (Module) – image feature matching module.

  • estimator (str, optional) – method to compute homography, either “vanilla” or “ransac”. “ransac” is slower but more accurate. Default: 'ransac'

  • blending_method (str, optional) – method to blend two images together. Only “naive” is currently supported. Default: 'naive'

Note

Current implementation requires strict image ordering from left to right.

import torch
import kornia as K
import kornia.feature as KF
import matplotlib.pyplot as plt
from kornia.contrib import ImageStitcher

IS = ImageStitcher(KF.LoFTR(pretrained='outdoor'), estimator='ransac').cuda()
# Compute the stitched result with less GPU memory cost.
with torch.inference_mode():
    out = IS(img_left, img_right)  # img_left, img_right: (B, C, H, W) image tensors on the GPU
# Show the result
plt.imshow(K.tensor_to_image(out))

Lambda#

class kornia.contrib.Lambda(func)[source]#

Applies a user-defined lambda as a transform.

Parameters

func (Callable) – the callable function to apply.

Returns

The output of the user-defined lambda.

Example

>>> import kornia
>>> x = torch.rand(1, 3, 5, 5)
>>> f = Lambda(lambda x: kornia.color.rgb_to_grayscale(x))
>>> f(x).shape
torch.Size([1, 1, 5, 5])

Distance Transform#

kornia.contrib.distance_transform(image, kernel_size=3, h=0.35)[source]#

Approximates the Manhattan distance transform of images using cascaded convolution operations.

The value at each pixel in the output represents the distance to the nearest non-zero pixel in the image. It uses the method described in [PDP20]. The transformation is applied independently across the channel dimension of the images.

Parameters
  • image (Tensor) – Image with shape \((B,C,H,W)\).

  • kernel_size (int, optional) – size of the convolution kernel. Default: 3

  • h (float, optional) – value that influences the approximation of the min function. Default: 0.35

Return type

Tensor

Returns

tensor with shape \((B,C,H,W)\).

Example

>>> import kornia
>>> tensor = torch.zeros(1, 1, 5, 5)
>>> tensor[:, :, 1, 2] = 1
>>> dt = kornia.contrib.distance_transform(tensor)
kornia.contrib.diamond_square(output_size, roughness=0.5, random_scale=1.0, random_fn=torch.rand, normalize_range=None, device=None, dtype=None)[source]#

Generate plasma fractal images using the diamond-square algorithm.

See: https://en.wikipedia.org/wiki/Diamond-square_algorithm

Parameters
  • output_size (Tuple[int, int, int, int]) – a tuple of integers with the \((B, C, H, W)\) shape of the image to be generated.

  • roughness (Union[float, Tensor], optional) – the scale value to apply at each recursion step. Default: 0.5

  • random_scale (Union[float, Tensor], optional) – the initial value of the scale for recursion. Default: 1.0

  • random_fn (Callable, optional) – the callable function to use to sample a random tensor. Default: torch.rand

  • normalize_range (Optional[Tuple[int, int]], optional) – the range used to min-max normalize the output map. If a range is specified, min-max normalization is applied between the provided bounds. Default: None

  • device (Optional[device], optional) – the torch device to place the output map. Default: None

  • dtype (Optional[dtype], optional) – the torch dtype to place the output map. Default: None

Return type

Tensor

Returns

A tensor with shape \((B,C,H,W)\) containing the fractal image.
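
A minimal usage sketch. The 9x9 spatial size and the normalize_range value are illustrative choices; the output shape follows the \((B,C,H,W)\) convention above.

>>> img = diamond_square((1, 1, 9, 9), roughness=0.5, normalize_range=(0, 1))
>>> img.shape
torch.Size([1, 1, 9, 9])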

class kornia.contrib.DistanceTransform(kernel_size=3, h=0.35)[source]#

Module that approximates the Manhattan (city block) distance transform of images using convolutions.

Parameters
  • kernel_size (int, optional) – size of the convolution kernel. Default: 3

  • h (float, optional) – value that influences the approximation of the min function. Default: 0.35
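
A minimal sketch of the module form, mirroring the functional distance_transform example above:

>>> image = torch.zeros(1, 1, 5, 5)
>>> image[:, :, 1, 2] = 1.0
>>> dt = DistanceTransform(kernel_size=3, h=0.35)
>>> dt(image).shape
torch.Size([1, 1, 5, 5])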