# kornia.contrib#

## Face Detection#

class kornia.contrib.FaceDetector(top_k=5000, confidence_threshold=0.3, nms_threshold=0.3, keep_top_k=750)[source]#

Detect faces in a given image using a CNN.

By default, it uses the method described in [FYP+21].

Parameters
• top_k (int, optional) – the maximum number of detections to keep before non-maximum suppression (NMS). Default: 5000

• confidence_threshold (float, optional) – the confidence threshold used to discard low-scoring detections. Default: 0.3

• nms_threshold (float, optional) – the IoU threshold used by NMS. Default: 0.3

• keep_top_k (int, optional) – the maximum number of detections to return after NMS. Default: 750

Returns

A tensor of shape $$(N,15)$$ to be used with kornia.contrib.FaceDetectorResult.

Example

>>> img = torch.rand(1, 3, 320, 320)
>>> detect = FaceDetector()
>>> res = detect(img)
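
The four constructor parameters describe a filtering pipeline. The following is a minimal pure-Python sketch of how such a pipeline typically works, not the detector's actual implementation: `iou` and `filter_detections` are hypothetical helpers, and the exact comparison operators (strict or non-strict) are an assumption.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    top_k: int = 5000,
    confidence_threshold: float = 0.3,
    nms_threshold: float = 0.3,
    keep_top_k: int = 750,
) -> List[int]:
    """Return the indices of the surviving detections, best score first."""
    # 1. Discard low-confidence candidates.
    order = [i for i, s in enumerate(scores) if s > confidence_threshold]
    # 2. Keep at most top_k candidates, sorted by descending score.
    order = sorted(order, key=lambda i: scores[i], reverse=True)[:top_k]
    # 3. Greedy NMS: drop a box if it overlaps an already kept box too much.
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    # 4. Cap the number of detections returned.
    return kept[:keep_top_k]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
print(filter_detections(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because it overlaps box 0 with an IoU of about 0.68, and box 3 is dropped by the confidence threshold.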

class kornia.contrib.FaceKeypoint(value)[source]#

Define the keypoints detected in a face.

The left/right convention follows the viewer's perspective, that is, the face as it appears on screen.

EYE_LEFT = 0#
EYE_RIGHT = 1#
MOUTH_LEFT = 3#
MOUTH_RIGHT = 4#
NOSE = 2#

class kornia.contrib.FaceDetectorResult(data)[source]#

Encapsulate the results obtained by the kornia.contrib.FaceDetector.

Parameters

data (Tensor) – the encoded results coming from the face detector with shape $$(15,)$$.

property bottom_left: torch.Tensor#

The [x y] position of the bottom-left coordinate of the bounding box.

Return type

Tensor

property bottom_right: torch.Tensor#

The [x y] position of the bottom-right coordinate of the bounding box.

Return type

Tensor

get_keypoint(keypoint)[source]#

The [x y] position of a given facial keypoint.

Parameters

keypoint (FaceKeypoint) – the keypoint whose position is returned.

Return type

Tensor

property height: torch.Tensor#

The bounding box height.

Return type

Tensor

property score: torch.Tensor#

The detection score.

Return type

Tensor

to(device=None, dtype=None)[source]#

Like torch.nn.Module.to() method.

Return type

FaceDetectorResult

property top_left: torch.Tensor#

The [x y] position of the top-left coordinate of the bounding box.

Return type

Tensor

property top_right: torch.Tensor#

The [x y] position of the top-right coordinate of the bounding box.

Return type

Tensor

property width: torch.Tensor#

The bounding box width.

Return type

Tensor

property xmax: torch.Tensor#

The bounding box bottom-right x-coordinate.

Return type

Tensor

property xmin: torch.Tensor#

The bounding box top-left x-coordinate.

Return type

Tensor

property ymax: torch.Tensor#

The bounding box bottom-right y-coordinate.

Return type

Tensor

property ymin: torch.Tensor#

The bounding box top-left y-coordinate.

Return type

Tensor
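
The properties above are all views onto one encoded vector. The sketch below shows how they relate to each other, assuming each detection row holds 15 values laid out as four box coordinates, five (x, y) keypoints in FaceKeypoint order, and one score. That layout, and the `Keypoint` and `DecodedFace` names, are assumptions made for illustration. Image coordinates are assumed to have y growing downwards, so the top edge is at ymin.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence, Tuple

Point = Tuple[float, float]


class Keypoint(Enum):
    """Same names and values as the FaceKeypoint members listed above."""
    EYE_LEFT = 0
    EYE_RIGHT = 1
    NOSE = 2
    MOUTH_LEFT = 3
    MOUTH_RIGHT = 4


@dataclass
class DecodedFace:
    # Assumed layout: [xmin, ymin, xmax, ymax, 5 x (x, y) keypoints, score].
    data: Sequence[float]

    @property
    def width(self) -> float:
        return self.data[2] - self.data[0]

    @property
    def height(self) -> float:
        return self.data[3] - self.data[1]

    @property
    def top_left(self) -> Point:
        return (self.data[0], self.data[1])

    @property
    def top_right(self) -> Point:
        return (self.data[2], self.data[1])

    @property
    def bottom_left(self) -> Point:
        return (self.data[0], self.data[3])

    @property
    def bottom_right(self) -> Point:
        return (self.data[2], self.data[3])

    @property
    def score(self) -> float:
        return self.data[14]

    def get_keypoint(self, keypoint: Keypoint) -> Point:
        start = 4 + 2 * keypoint.value  # keypoints follow the 4 box values
        return (self.data[start], self.data[start + 1])


face = DecodedFace([10, 20, 50, 80, 20, 35, 40, 35, 30, 50, 22, 65, 38, 65, 0.97])
print(face.top_right, face.bottom_left)  # (50, 20) (10, 80)
print(face.width, face.height)           # 40 60
print(face.get_keypoint(Keypoint.NOSE))  # (30, 50)
```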

## Image Segmentation#

kornia.contrib.connected_components(image, num_iterations=100)[source]#

Computes the connected-component labelling (CCL) of a binarized image.

The implementation is an adaptation of the following repository:

https://gist.github.com/efirdc/5d8bd66859e574c683a504a4690ae8bc

Warning

This is an experimental API subject to changes and optimization improvements.

Note

See a working example here.

Parameters
• image (Tensor) – the binarized input image with shape $$(*, 1, H, W)$$. The image must be in floating point with range [0, 1].

• num_iterations (int, optional) – the number of iterations to run so that the algorithm converges. Default: 100

Return type

Tensor

Returns

The labels image with the same shape as the input image.

Example

>>> img = torch.rand(2, 1, 4, 5)
>>> img_labels = connected_components(img, num_iterations=100)
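
To see why num_iterations matters, here is a pure-Python sketch of iterative label propagation: every foreground pixel starts with a unique label and repeatedly takes the maximum label in its 3x3 neighbourhood. This illustrates the general idea only. The `label_components` helper and the use of 8-connectivity are assumptions, not a description of the library's code.

```python
from typing import List


def label_components(mask: List[List[int]], num_iterations: int = 100) -> List[List[int]]:
    """Label the connected foreground regions of a binary 2D mask."""
    h, w = len(mask), len(mask[0])
    # Unique positive label per foreground pixel, 0 for the background.
    labels = [[(r * w + c + 1) if mask[r][c] else 0 for c in range(w)] for r in range(h)]
    for _ in range(num_iterations):
        new = [[0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                if not mask[r][c]:
                    continue  # background pixels always stay 0
                # Maximum label over the 3x3 neighbourhood, clipped at the borders.
                new[r][c] = max(
                    labels[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))
                )
        labels = new
    return labels


mask = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
for row in label_components(mask):
    print(row)
```

Labels travel one pixel per iteration, so a long, thin component needs roughly as many iterations as its length in pixels before all of its pixels agree on one label.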


## Image Patches#

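kornia.contrib.compute_padding(original_size, window_size)[source]#

Compute the padding required to ensure that chaining extract_tensor_patches() and combine_tensor_patches() produces the expected result.

Parameters
• original_size – the size of the original image, as (height, width).

• window_size – the size of the sliding window used to extract the patches, as (height, width).

Return type

Tuple[int, int, int, int]

Returns

The required padding for (top, bottom, left, right) as a tuple of 4 ints.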

Example

>>> image = torch.arange(12).view(1, 1, 4, 3)
>>> padding = compute_padding((4, 3), (3, 3))
>>> out = extract_tensor_patches(image, window_size=(3, 3), stride=(3, 3), padding=padding)
>>> combine_tensor_patches(out, original_size=(4, 3), window_size=(3, 3), stride=(3, 3), unpadding=padding)
tensor([[[[ 0,  1,  2],
          [ 3,  4,  5],
          [ 6,  7,  8],
          [ 9, 10, 11]]]])


Note

This function is supposed to be used in conjunction with extract_tensor_patches() and combine_tensor_patches().
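
One plausible way to obtain such a padding is to pad each dimension up to the next multiple of the window size and split the amount between the two sides. The sketch below assumes non-overlapping windows (stride equal to the window size). `padding_for_full_windows` is a hypothetical helper, and the way the padding is split between the two sides is an assumption.

```python
from typing import Tuple


def padding_for_full_windows(
    original_size: Tuple[int, int], window_size: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) padding so windows tile the image exactly."""

    def split(size: int, window: int) -> Tuple[int, int]:
        remainder = size % window
        total = 0 if remainder == 0 else window - remainder
        # Put the extra pixel, if any, on the trailing side.
        return total // 2, total - total // 2

    top, bottom = split(original_size[0], window_size[0])
    left, right = split(original_size[1], window_size[1])
    return top, bottom, left, right


print(padding_for_full_windows((4, 3), (3, 3)))  # (1, 1, 0, 0)
print(padding_for_full_windows((5, 7), (2, 4)))  # (0, 1, 0, 1)
```

With a 4x3 image and 3x3 windows, the height is padded from 4 to 6 so that two full windows fit, while the width already matches.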

kornia.contrib.extract_tensor_patches(input, window_size, stride=1, padding=0)[source]#

Function that extracts patches from tensors and stacks them.

See ExtractTensorPatches for details.

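Parameters
• input (Tensor) – the input tensor with shape $$(B, C, H, W)$$.

• window_size – the size of the sliding window, which is also the size of each output patch.

• stride – the stride of the sliding window. Default: 1

• padding – the amount of zero-padding added on both sides of each dimension. Default: 0
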
Return type

Tensor

Returns

the tensor with the extracted patches with shape $$(B, N, C, H_{out}, W_{out})$$.

Examples

>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, (2, 3))
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> patches[:, -1]
tensor([[[[3., 4., 5.],
          [6., 7., 8.]]]])
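
The example above yields two patches because a 2x3 window fits a 3x3 input in two vertical positions and one horizontal position. A minimal pure-Python sketch of the sliding window, without padding, makes the count and the left-right, top-bottom ordering explicit. `extract_patches` is a hypothetical helper operating on one 2D channel, not the library function.

```python
from typing import List, Tuple

Grid = List[List[float]]


def extract_patches(
    image: Grid, window_size: Tuple[int, int], stride: Tuple[int, int] = (1, 1)
) -> List[Grid]:
    """Slide a window over a 2D grid and collect the patches in raster order."""
    h, w = len(image), len(image[0])
    win_h, win_w = window_size
    stride_h, stride_w = stride
    patches = []
    for top in range(0, h - win_h + 1, stride_h):        # top-bottom
        for left in range(0, w - win_w + 1, stride_w):   # left-right
            patches.append([row[left:left + win_w] for row in image[top:top + win_h]])
    return patches


image = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
patches = extract_patches(image, window_size=(2, 3))
print(len(patches))  # 2
print(patches[-1])   # [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
```

Without padding, the number of patches per dimension is `(size - window) // stride + 1`, so here N = 2 * 1 = 2.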

kornia.contrib.combine_tensor_patches(patches, original_size, window_size, stride, unpadding=0)[source]#

Restore input from patches.

See CombineTensorPatches for details.

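Parameters
• patches (Tensor) – the patches to combine with shape $$(B, N, C, H_{out}, W_{out})$$.

• original_size – the size of the original image prior to extracting the patches.

• window_size – the size of the sliding window used while extracting the patches.

• stride – the stride of the sliding window used while extracting the patches.

• unpadding – the amount of padding to remove; it must match the padding used while extracting the patches. Default: 0
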
Return type

Tensor

Returns

The combined patches in an image tensor with shape $$(B, C, H, W)$$.

Example

>>> out = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine_tensor_patches(out, original_size=(4, 4), window_size=(2, 2), stride=(2, 2))
tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])


Note

This function is supposed to be used in conjunction with extract_tensor_patches().
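
For non-overlapping windows, combining is the inverse bookkeeping of extraction: patch number i goes back to row `i // patches_per_row` and column `i % patches_per_row` of the patch grid. The sketch below is a pure-Python illustration for a single 2D channel under that assumption (stride equal to the window size, no padding). `combine_patches` is a hypothetical helper.

```python
from typing import List, Tuple

Grid = List[List[int]]


def combine_patches(
    patches: List[Grid], original_size: Tuple[int, int], window_size: Tuple[int, int]
) -> Grid:
    """Reassemble non-overlapping patches, given in raster order, into one grid."""
    h, w = original_size
    win_h, win_w = window_size
    patches_per_row = w // win_w
    image = [[0] * w for _ in range(h)]
    for index, patch in enumerate(patches):
        top = (index // patches_per_row) * win_h
        left = (index % patches_per_row) * win_w
        for r in range(win_h):
            for c in range(win_w):
                image[top + r][left + c] = patch[r][c]
    return image


original = [[r * 4 + c for c in range(4)] for r in range(4)]
# Four 2x2 patches in left-right, top-bottom order.
patches = [
    [row[left:left + 2] for row in original[top:top + 2]]
    for top in (0, 2)
    for left in (0, 2)
]
restored = combine_patches(patches, original_size=(4, 4), window_size=(2, 2))
print(restored == original)  # True
```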

class kornia.contrib.ExtractTensorPatches(window_size, stride=1, padding=0)[source]#

Module that extracts patches from tensors and stacks them.

In the simplest case, the output value of the operator with input size $$(B, C, H, W)$$ is $$(B, N, C, H_{out}, W_{out})$$.

where
• $$B$$ is the batch size.

• $$N$$ denotes the total number of extracted patches, stacked in left-right and top-bottom order.

• $$C$$ denotes the number of input channels.

• $$H$$, $$W$$ denote the height and width of the input in pixels.

• $$H_{out}$$, $$W_{out}$$ denote the patch size defined in the function signature.

• window_size is the size of the sliding window; it defines the shape of each output patch and therefore of the output tensor.

• stride controls the stride of the sliding window and regulates the overlap between the extracted patches.

• padding controls the amount of implicit zero-padding on both sides of each dimension.

The parameters window_size, stride and padding can be either:

• a single int – in which case the same value is used for the height and width dimension.

• a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.

padding can also be a tuple of four ints – in which case, the first two ints are for the height dimension while the last two ints are for the width dimension.
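
The accepted forms can be normalized to one canonical shape before use. The helpers below are a hypothetical sketch of that normalization, following the ordering described above (height values first, then width values). They are not part of the library.

```python
from typing import Sequence, Tuple, Union

IntOrSeq = Union[int, Sequence[int]]


def to_pair(value: IntOrSeq) -> Tuple[int, int]:
    """Normalize an int or a 2-tuple to (height, width)."""
    if isinstance(value, int):
        return (value, value)
    if len(value) == 2:
        return (value[0], value[1])
    raise ValueError(f"expected an int or two ints, got {value!r}")


def to_padding4(padding: IntOrSeq) -> Tuple[int, int, int, int]:
    """Normalize padding to four ints: two for the height, then two for the width."""
    if not isinstance(padding, int) and len(padding) == 4:
        return (padding[0], padding[1], padding[2], padding[3])
    pad_h, pad_w = to_pair(padding)
    # The same amount is applied on both sides of each dimension.
    return (pad_h, pad_h, pad_w, pad_w)


print(to_pair(3))                 # (3, 3)
print(to_padding4((1, 2)))        # (1, 1, 2, 2)
print(to_padding4((1, 0, 2, 3)))  # (1, 0, 2, 3)
```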

Parameters
Shape:
• Input: $$(B, C, H, W)$$

• Output: $$(B, N, C, H_{out}, W_{out})$$

Returns

the tensor with the extracted patches.

Examples

>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, (2, 3))
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> patches[:, -1]
tensor([[[[3., 4., 5.],
          [6., 7., 8.]]]])

class kornia.contrib.CombineTensorPatches(original_size, window_size, unpadding=0)[source]#

Module that combines patches from tensors.

In the simplest case, the output value of the operator with input size $$(B, N, C, H_{out}, W_{out})$$ is $$(B, C, H, W)$$.

where
• $$B$$ is the batch size.

• $$N$$ denotes the total number of extracted patches, stacked in left-right and top-bottom order.

• $$C$$ denotes the number of input channels.

• $$H$$, $$W$$ denote the height and width of the original input in pixels.

• $$H_{out}$$, $$W_{out}$$ denote the patch size defined in the function signature.

• original_size is the size of the original image prior to extracting tensor patches and defines the shape of the output tensor.

• window_size is the size of the sliding window used while extracting tensor patches.

• unpadding is the amount of padding to be removed. This value must be the same as the padding used while extracting tensor patches.

The parameters original_size, window_size, and unpadding can be either:

• a single int – in which case the same value is used for the height and width dimension.

• a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.

unpadding can also be a tuple of four ints – in which case, the first two ints are for the height dimension while the last two ints are for the width dimension.

Parameters
Shape:
• Input: $$(B, N, C, H_{out}, W_{out})$$

• Output: $$(B, C, H, W)$$

Example

>>> out = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine_tensor_patches(out, original_size=(4, 4), window_size=(2, 2), stride=(2, 2))
tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])


Note

This module is supposed to be used in conjunction with ExtractTensorPatches.

## Image Classification#

class kornia.contrib.VisionTransformer(image_size=224, patch_size=16, in_channels=3, embed_dim=768, depth=12, num_heads=12, dropout_rate=0.0, dropout_attn=0.0, backbone=None)[source]#

Vision transformer (ViT) module.

The module is expected to be used as operator for different vision tasks.

The method is inspired by existing implementations of the paper [DBK+21].

Warning

This is an experimental API subject to changes in favor of flexibility.

Parameters
• image_size (int, optional) – the size of the input image. Default: 224

• patch_size (int, optional) – the size of the patch to compute the embedding. Default: 16

• in_channels (int, optional) – the number of channels for the input. Default: 3

• embed_dim (int, optional) – the embedding dimension inside the transformer encoder. Default: 768

• depth (int, optional) – the depth of the transformer. Default: 12

• num_heads (int, optional) – the number of attention heads. Default: 12

• dropout_rate (float, optional) – dropout rate. Default: 0.0

• dropout_attn (float, optional) – attention dropout rate. Default: 0.0

• backbone (Optional[Module], optional) – an nn.Module to compute the image patches embeddings. Default: None

Example

>>> img = torch.rand(1, 3, 224, 224)
>>> vit = VisionTransformer(image_size=224, patch_size=16)
>>> vit(img).shape
torch.Size([1, 197, 768])
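
The 197 in the output shape follows from the patch grid. A 224x224 image split into 16x16 patches gives 14 x 14 = 196 patch tokens, and one extra token brings the total to 197. The sketch below reproduces that arithmetic. `vit_output_shape` is a hypothetical helper, and attributing the extra token to a class token, as in the ViT paper, is an assumption.

```python
from typing import Tuple


def vit_output_shape(
    batch_size: int, image_size: int = 224, patch_size: int = 16, embed_dim: int = 768
) -> Tuple[int, int, int]:
    """Expected (batch, tokens, embedding) shape of the encoder output."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_tokens = patches_per_side ** 2 + 1  # +1 for the assumed class token
    return (batch_size, num_tokens, embed_dim)


print(vit_output_shape(1))  # (1, 197, 768)
```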

class kornia.contrib.MobileViT(mode='xxs', in_channels=3, patch_size=(2, 2), dropout=0.0)[source]#

MobileViT module. The default arguments are for MobileViT XXS.

Parameters

Example

>>> img = torch.rand(1, 3, 256, 256)
>>> mvit = MobileViT(mode='xxs')
>>> mvit(img).shape
torch.Size([1, 320, 8, 8])


class kornia.contrib.ClassificationHead(embed_size=768, num_classes=10)[source]#

Module to be used as a classification head.

Parameters
• embed_size (int, optional) – the size of the embedding dimension of the features coming from the network. Default: 768

• num_classes (int, optional) – an integer representing the numbers of classes to classify. Default: 10

Example

>>> feat = torch.rand(1, 256, 256)
>>> head = ClassificationHead(256, 10)
>>> head(feat).shape
torch.Size([1, 10])


## Image Stitching#

class kornia.contrib.ImageStitcher(matcher, estimator='ransac', blending_method='naive')[source]#

Stitch two images with overlapping fields of view.

Parameters
• matcher (Module) – image feature matching module.

• estimator (str, optional) – method to compute the homography, either “vanilla” or “ransac”. “ransac” is slower but more accurate. Default: 'ransac'

• blending_method (str, optional) – method to blend two images together. Only “naive” is currently supported. Default: 'naive'

Note

Current implementation requires strict image ordering from left to right.

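import matplotlib.pyplot as plt
import torch
import kornia as K
import kornia.feature as KF
from kornia.contrib import ImageStitcher
# img_left and img_right are assumed to be image tensors already on the GPU.
IS = ImageStitcher(KF.LoFTR(pretrained='outdoor'), estimator='ransac').cuda()
# Compute the stitched result with less GPU memory cost.
with torch.inference_mode():
    out = IS(img_left, img_right)
# Show the result
plt.imshow(K.tensor_to_image(out))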


## Lambda#

class kornia.contrib.Lambda(func)[source]#

Applies user-defined lambda as a transform.

Parameters

func (Callable) – Callable function.

Returns

The output of the user-defined lambda.

Example

>>> import kornia
>>> x = torch.rand(1, 3, 5, 5)
>>> f = Lambda(lambda x: kornia.color.rgb_to_grayscale(x))
>>> f(x).shape
torch.Size([1, 1, 5, 5])


## Distance Transform#

kornia.contrib.distance_transform(image, kernel_size=3, h=0.35)[source]#

Approximates the Manhattan distance transform of images using cascaded convolution operations.

The value at each pixel in the output represents the distance to the nearest non-zero pixel in the input image. It uses the method described in [PDP20]. The transformation is applied independently across the channel dimension of the images.

Parameters
• image (Tensor) – Image with shape $$(B,C,H,W)$$.

• kernel_size (int, optional) – size of the convolution kernel. Default: 3

• h (float, optional) – value that influences the approximation of the min function. Default: 0.35

Return type

Tensor

Returns

tensor with shape $$(B,C,H,W)$$.

Example

>>> tensor = torch.zeros(1, 1, 5, 5)
>>> tensor[:,:, 1, 2] = 1
>>> dt = kornia.contrib.distance_transform(tensor)
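
For comparison, the exact Manhattan distance transform of a small image can be computed with a multi-source breadth-first search. The pure-Python sketch below is a reference for what the function approximates, not the convolution-based method itself, and the approximate output need not match it exactly. `manhattan_distance_transform` is a hypothetical helper.

```python
from collections import deque
from typing import List, Optional


def manhattan_distance_transform(image: List[List[float]]) -> List[List[Optional[int]]]:
    """Distance from every pixel to the nearest non-zero pixel, in 4-connected steps."""
    h, w = len(image), len(image[0])
    dist: List[List[Optional[int]]] = [[None] * w for _ in range(h)]
    queue = deque()
    # Every non-zero pixel is a source at distance 0.
    for r in range(h):
        for c in range(w):
            if image[r][c] != 0:
                dist[r][c] = 0
                queue.append((r, c))
    # Breadth-first search visits pixels in order of increasing distance.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and dist[rr][cc] is None:
                dist[rr][cc] = dist[r][c] + 1
                queue.append((rr, cc))
    return dist  # entries stay None if the image has no non-zero pixel


image = [[0.0] * 5 for _ in range(5)]
image[1][2] = 1.0  # same single non-zero pixel as in the example above
for row in manhattan_distance_transform(image):
    print(row)
```

Each entry equals `|r - 1| + |c - 2|`, the city-block distance to the single non-zero pixel.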

kornia.contrib.diamond_square(output_size, roughness=0.5, random_scale=1.0, random_fn=torch.rand, normalize_range=None, device=None, dtype=None)[source]#

Generates Plasma Fractal Images using the diamond square algorithm.

Parameters
Return type

Tensor

Returns

A tensor with shape $$(B,C,H,W)$$ containing the fractal image.
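
The diamond-square algorithm starts from four random corner values and repeatedly fills in midpoints, each as the average of its neighbours plus noise that shrinks at every level. The following is a minimal pure-Python sketch of the textbook algorithm on a single (2^n + 1) x (2^n + 1) grid. `diamond_square_sketch`, its `n` and `seed` arguments, and the exact noise schedule are assumptions for illustration and do not mirror the function's signature or implementation.

```python
import random
from typing import List


def diamond_square_sketch(
    n: int, roughness: float = 0.5, random_scale: float = 1.0, seed: int = 0
) -> List[List[float]]:
    """Return a (2**n + 1) x (2**n + 1) fractal height map as nested lists."""
    rng = random.Random(seed)
    size = 2 ** n + 1
    grid = [[0.0] * size for _ in range(size)]
    # Seed the four corners with random values.
    for r in (0, size - 1):
        for c in (0, size - 1):
            grid[r][c] = rng.uniform(-1, 1) * random_scale
    step, scale = size - 1, random_scale
    while step > 1:
        half = step // 2
        scale *= roughness  # the noise amplitude shrinks at every level
        # Diamond step: the centre of each square gets the mean of its 4 corners.
        for r in range(half, size, step):
            for c in range(half, size, step):
                mean = (grid[r - half][c - half] + grid[r - half][c + half]
                        + grid[r + half][c - half] + grid[r + half][c + half]) / 4
                grid[r][c] = mean + rng.uniform(-1, 1) * scale
        # Square step: each edge midpoint gets the mean of its in-bounds neighbours.
        for r in range(0, size, half):
            start = half if (r // half) % 2 == 0 else 0
            for c in range(start, size, step):
                neighbours = [
                    grid[rr][cc]
                    for rr, cc in ((r - half, c), (r + half, c), (r, c - half), (r, c + half))
                    if 0 <= rr < size and 0 <= cc < size
                ]
                grid[r][c] = sum(neighbours) / len(neighbours) + rng.uniform(-1, 1) * scale
        step = half
    return grid


height_map = diamond_square_sketch(3)
print(len(height_map), len(height_map[0]))  # 9 9
```

A smaller roughness makes the noise decay faster between levels, which gives a smoother image.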

class kornia.contrib.DistanceTransform(kernel_size=3, h=0.35)[source]#

Module that approximates the Manhattan (city block) distance transform of images using convolutions.

Parameters
• kernel_size (int, optional) – size of the convolution kernel. Default: 3

• h (float, optional) – value that influences the approximation of the min function. Default: 0.35
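
The role of h can be illustrated with a common smooth approximation of the minimum, the negative log-sum-exp. The sketch below assumes this form, which is an assumption about the approximation rather than a statement of the module's code. `soft_min` is a hypothetical helper.

```python
import math
from typing import Sequence


def soft_min(values: Sequence[float], h: float = 0.35) -> float:
    """Smooth approximation of min(values): -h * log(sum(exp(-v / h)))."""
    return -h * math.log(sum(math.exp(-v / h) for v in values))


values = [1.0, 2.0, 3.0]
for h in (1.0, 0.35, 0.05):
    # The result approaches min(values) = 1.0 from below as h decreases.
    print(h, soft_min(values, h))
```

A smaller h gives a tighter approximation of the true minimum, but very small values make `exp(-v / h)` underflow for large distances, so h trades accuracy against numerical range.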