kornia.contrib#
Edge Detection#
- class kornia.contrib.EdgeDetector[source]#
Detect edges in a given image using a CNN.
By default, it uses the method described in [SRS20].
- Returns
A tensor of shape \((B,1,H,W)\).
Example
>>> img = torch.rand(1, 3, 320, 320)
>>> detect = EdgeDetector()
>>> out = detect(img)
>>> out.shape
torch.Size([1, 1, 320, 320])
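Beyond the doctest above, a minimal sketch of a typical inference call; the min-max normalization at the end is an assumption for visualization purposes, not part of the EdgeDetector API:

import torch
from kornia.contrib import EdgeDetector

detector = EdgeDetector()
img = torch.rand(1, 3, 320, 320)   # stand-in for a real RGB image in [0, 1]
with torch.no_grad():              # inference only, no gradients needed
    edges = detector(img)          # (1, 1, 320, 320)
# Assumption: min-max normalize the raw edge map to [0, 1] for display.
edges = (edges - edges.min()) / (edges.max() - edges.min() + 1e-8)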
Face Detection#
- class kornia.contrib.FaceDetector(top_k=5000, confidence_threshold=0.3, nms_threshold=0.3, keep_top_k=750)[source]#
Detect faces in a given image using a CNN.
By default, it uses the method described in [FYP+21].
- Parameters
top_k (int, optional) – the maximum number of detections to return before the NMS. Default: 5000
confidence_threshold (float, optional) – the threshold used to discard detections. Default: 0.3
nms_threshold (float, optional) – the IoU threshold used by the NMS. Default: 0.3
keep_top_k (int, optional) – the maximum number of detections to return after the NMS. Default: 750
- Returns
A list of B tensors with shape \((N,15)\) to be used with kornia.contrib.FaceDetectorResult.
Example
>>> img = torch.rand(1, 3, 320, 320)
>>> detect = FaceDetector()
>>> res = detect(img)
- class kornia.contrib.FaceKeypoint(value)[source]#
Define the keypoints detected in a face.
The left/right convention is based on the screen viewer.
- EYE_LEFT = 0#
- EYE_RIGHT = 1#
- MOUTH_LEFT = 3#
- MOUTH_RIGHT = 4#
- NOSE = 2#
- class kornia.contrib.FaceDetectorResult(data)[source]#
Encapsulate the results obtained by kornia.contrib.FaceDetector.
- Parameters
data (Tensor) – the encoded results coming from the feature detector with shape \((14,)\).
- property bottom_right: Tensor#
The [x y] position of the bottom-right coordinate of the bounding box.
- get_keypoint(keypoint)[source]#
The [x y] position of a given facial keypoint.
- Parameters
keypoint (FaceKeypoint) – the keypoint type whose position to return.
- Return type
Tensor
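A minimal end-to-end sketch tying the detector and the result wrapper together; wrapping each detection row of the first batch element in a FaceDetectorResult is an assumption about how the \((N,15)\) output is consumed:

import torch
from kornia.contrib import FaceDetector, FaceDetectorResult, FaceKeypoint

img = torch.rand(1, 3, 320, 320)
detector = FaceDetector()
with torch.no_grad():
    dets = detector(img)                        # list of B tensors, each (N, 15)
# Assumption: wrap each detection row of the first image in a result object.
results = [FaceDetectorResult(d) for d in dets[0]]
for res in results:
    corner = res.bottom_right                   # [x, y] of the bounding-box corner
    nose = res.get_keypoint(FaceKeypoint.NOSE)  # [x, y] of the nose keypoint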
Interactive Demo#
Visit the Kornia face detection demo on Hugging Face Spaces.
Image Segmentation#
- kornia.contrib.connected_components(image, num_iterations=100)[source]#
Compute the connected-component labelling (CCL) of a binary image.
The implementation is an adaptation of the following repository:
https://gist.github.com/efirdc/5d8bd66859e574c683a504a4690ae8bc
Warning
This is an experimental API subject to changes and optimization improvements.
Note
See a working example here.
- Parameters
image (Tensor) – the binarized input image with shape \((*, 1, H, W)\).
num_iterations (int, optional) – the number of iterations for the labelling to converge. Default: 100
- Return type
Tensor
- Returns
The labels image with the same shape as the input image.
Example
>>> img = torch.rand(2, 1, 4, 5)
>>> img_labels = connected_components(img, num_iterations=100)
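A short sketch of a more typical call, where a grayscale image is binarized before labelling; the 0.5 threshold is an arbitrary assumption:

import torch
from kornia.contrib import connected_components

img = torch.rand(1, 1, 64, 64)
mask = (img > 0.5).float()        # assumption: binarize with a 0.5 threshold
labels = connected_components(mask, num_iterations=150)
print(labels.unique())            # one label id per connected region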
Image Patches#
- kornia.contrib.compute_padding(original_size, window_size)[source]#
Compute the padding required so that chaining extract_tensor_patches() and combine_tensor_patches() produces the expected result.
- Parameters
original_size (Tuple[int, int]) – the height and width of the original tensor.
window_size (Tuple[int, int]) – the size of the sliding window used while extracting patches.
- Return type
Tuple[int, int, int, int]
- Returns
The required padding for (top, bottom, left, right) as a tuple of 4 ints.
Example
>>> image = torch.arange(12).view(1, 1, 4, 3)
>>> padding = compute_padding((4,3), (3,3))
>>> out = extract_tensor_patches(image, window_size=(3, 3), stride=(3, 3), padding=padding)
>>> combine_tensor_patches(out, original_size=(4, 3), window_size=(3, 3), stride=(3, 3), unpadding=padding)
tensor([[[[ 0,  1,  2],
          [ 3,  4,  5],
          [ 6,  7,  8],
          [ 9, 10, 11]]]])
Note
This function is supposed to be used in conjunction with extract_tensor_patches() and combine_tensor_patches().
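A sketch of the intended round trip on a size that is not evenly divisible by the window, assuming stride equal to the window size as in the example above:

import torch
from kornia.contrib import compute_padding, extract_tensor_patches, combine_tensor_patches

image = torch.rand(1, 1, 5, 7)   # 5x7 is not divisible by a 3x3 window
pad = compute_padding((5, 7), (3, 3))
patches = extract_tensor_patches(image, window_size=(3, 3), stride=(3, 3), padding=pad)
restored = combine_tensor_patches(patches, original_size=(5, 7), window_size=(3, 3),
                                  stride=(3, 3), unpadding=pad)
assert torch.allclose(restored, image)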
- kornia.contrib.extract_tensor_patches(input, window_size, stride=1, padding=0)[source]#
Function that extracts patches from tensors and stacks them.
See ExtractTensorPatches for details.
- Parameters
input (Tensor) – tensor image where to extract the patches with shape \((B, C, H, W)\).
window_size (Union[int, Tuple[int, int]]) – the size of the sliding window and the output patch size.
stride (Union[int, Tuple[int, int]], optional) – stride of the sliding window. Default: 1
padding (Union[int, Tuple[int, int], Tuple[int, int, int, int]], optional) – zero-padding added to both sides of the input. Default: 0
- Return type
Tensor
- Returns
The tensor with the extracted patches with shape \((B, N, C, H_{out}, W_{out})\).
Examples
>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, (2, 3))
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> patches[:, -1]
tensor([[[[3., 4., 5.],
          [6., 7., 8.]]]])
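A small sketch of overlapping extraction; with stride 1 on a 4x4 input and a 2x2 window, \(N = (4-2+1)^2 = 9\) patches are produced:

import torch
from kornia.contrib import extract_tensor_patches

x = torch.arange(16.).view(1, 1, 4, 4)
patches = extract_tensor_patches(x, window_size=2, stride=1)
print(patches.shape)  # torch.Size([1, 9, 1, 2, 2])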
- kornia.contrib.combine_tensor_patches(patches, original_size, window_size, stride, unpadding=0)[source]#
Restore input from patches.
See CombineTensorPatches for details.
- Parameters
patches (Tensor) – patched tensor with shape \((B, N, C, H_{out}, W_{out})\).
original_size (Union[int, Tuple[int, int]]) – the size of the original tensor and of the output.
window_size (Union[int, Tuple[int, int]]) – the size of the sliding window used while extracting patches.
stride (Union[int, Tuple[int, int]]) – stride of the sliding window.
unpadding (Union[int, Tuple[int, int], Tuple[int, int, int, int]], optional) – remove the padding added to both sides of the input. Default: 0
- Return type
Tensor
- Returns
The combined patches in an image tensor with shape \((B, C, H, W)\).
Example
>>> out = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine_tensor_patches(out, original_size=(4, 4), window_size=(2, 2), stride=(2, 2))
tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])
Note
This function is supposed to be used in conjunction with extract_tensor_patches().
- class kornia.contrib.ExtractTensorPatches(window_size, stride=1, padding=0)[source]#
Module that extracts patches from tensors and stacks them.
In the simplest case, the output value of the operator with input size \((B, C, H, W)\) is \((B, N, C, H_{out}, W_{out})\).
- where
\(B\) is the batch size.
\(N\) denotes the total number of extracted patches stacked in left-right and top-bottom order.
\(C\) denotes the number of input channels.
\(H\), \(W\) denote the input height and width in pixels.
\(H_{out}\), \(W_{out}\) denote the patch size defined in the function signature.
window_size is the size of the sliding window and defines the shape of the output patches.
stride controls the stride of the sliding window and regulates the overlap between the extracted patches.
padding controls the amount of implicit zero-padding on both sides of each dimension.
The parameters window_size, stride and padding can be either:
a single int – in which case the same value is used for the height and width dimension.
a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.
padding can also be a tuple of four ints – in which case, the first two ints are for the height dimension while the last two ints are for the width dimension.
- Parameters
window_size (Union[int, Tuple[int, int]]) – the size of the sliding window and the output patch size.
stride (Union[int, Tuple[int, int], None], optional) – stride of the sliding window. Default: 1
padding (Union[int, Tuple[int, int], Tuple[int, int, int, int], None], optional) – zero-padding added to both sides of the input. Default: 0
- Shape:
Input: \((B, C, H, W)\)
Output: \((B, N, C, H_{out}, W_{out})\)
- Returns
the tensor with the extracted patches.
Examples
>>> input = torch.arange(9.).view(1, 1, 3, 3)
>>> patches = extract_tensor_patches(input, (2, 3))
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> patches[:, -1]
tensor([[[[3., 4., 5.],
          [6., 7., 8.]]]])
- class kornia.contrib.CombineTensorPatches(original_size, window_size, unpadding=0)[source]#
Module that combines patches back into tensors.
In the simplest case, the output value of the operator with input size \((B, N, C, H_{out}, W_{out})\) is \((B, C, H, W)\).
- where
\(B\) is the batch size.
\(N\) denotes the total number of extracted patches stacked in left-right and top-bottom order.
\(C\) denotes the number of input channels.
\(H\), \(W\) denote the height and width of the original input in pixels.
\(H_{out}\), \(W_{out}\) denote the patch size defined in the function signature.
original_size is the size of the original image prior to extracting tensor patches and defines the shape of the output.
window_size is the size of the sliding window used while extracting tensor patches.
unpadding is the amount of padding to be removed. This value must be the same as the padding used while extracting tensor patches.
The parameters original_size, window_size, and unpadding can be either:
a single int – in which case the same value is used for the height and width dimension.
a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.
unpadding can also be a tuple of four ints – in which case, the first two ints are for the height dimension while the last two ints are for the width dimension.
- Parameters
patches – patched tensor with shape \((B, N, C, H_{out}, W_{out})\).
original_size (Union[int, Tuple[int, int]]) – the size of the original tensor and of the output.
window_size (Union[int, Tuple[int, int]]) – the size of the sliding window used.
unpadding (Union[int, Tuple[int, int], Tuple[int, int, int, int]], optional) – remove the padding added to both sides of the input. Default: 0
- Shape:
Input: \((B, N, C, H_{out}, W_{out})\)
Output: \((B, C, H, W)\)
Example
>>> out = extract_tensor_patches(torch.arange(16).view(1, 1, 4, 4), window_size=(2, 2), stride=(2, 2))
>>> combine_tensor_patches(out, original_size=(4, 4), window_size=(2, 2), stride=(2, 2))
tensor([[[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]]]])
Note
This module is supposed to be used in conjunction with ExtractTensorPatches.
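A minimal sketch of the module form of the round trip; note that, unlike the functional combine_tensor_patches(), the class signature above takes no stride, so non-overlapping patches (stride equal to window size) are assumed:

import torch
from kornia.contrib import ExtractTensorPatches, CombineTensorPatches

image = torch.arange(16.).view(1, 1, 4, 4)
extract = ExtractTensorPatches(window_size=(2, 2), stride=(2, 2))
combine = CombineTensorPatches(original_size=(4, 4), window_size=(2, 2))
patches = extract(image)     # (1, 4, 1, 2, 2)
restored = combine(patches)  # (1, 1, 4, 4)
assert torch.equal(restored, image)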
Image Classification#
- class kornia.contrib.VisionTransformer(image_size=224, patch_size=16, in_channels=3, embed_dim=768, depth=12, num_heads=12, dropout_rate=0.0, dropout_attn=0.0, backbone=None)[source]#
Vision transformer (ViT) module.
The module is expected to be used as operator for different vision tasks.
The method is inspired from existing implementations of the paper [DBK+21].
Warning
This is an experimental API subject to changes in favor of flexibility.
- Parameters
image_size (int, optional) – the size of the input image. Default: 224
patch_size (int, optional) – the size of the patch to compute the embedding. Default: 16
in_channels (int, optional) – the number of channels for the input. Default: 3
embed_dim (int, optional) – the embedding dimension inside the transformer encoder. Default: 768
depth (int, optional) – the depth of the transformer. Default: 12
num_heads (int, optional) – the number of attention heads. Default: 12
dropout_rate (float, optional) – dropout rate. Default: 0.0
dropout_attn (float, optional) – attention dropout rate. Default: 0.0
backbone (Optional[Module], optional) – an nn.Module to compute the image patches embeddings. Default: None
Example
>>> img = torch.rand(1, 3, 224, 224)
>>> vit = VisionTransformer(image_size=224, patch_size=16)
>>> vit(img).shape
torch.Size([1, 197, 768])
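The 197 output tokens are the 196 patch embeddings (14 x 14 for a 224 image with patch size 16) plus one class token, as in [DBK+21]. A sketch of feeding the class token to a downstream classifier, assuming the class token is the first of the 197 tokens; the linear head is a hypothetical addition, not part of VisionTransformer:

import torch
import torch.nn as nn
from kornia.contrib import VisionTransformer

vit = VisionTransformer(image_size=224, patch_size=16)
head = nn.Linear(768, 10)         # hypothetical 10-class classifier head
img = torch.rand(2, 3, 224, 224)
feats = vit(img)                  # (2, 197, 768)
logits = head(feats[:, 0])        # class token -> (2, 10)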
- class kornia.contrib.MobileViT(mode='xxs', in_channels=3, patch_size=(2, 2), dropout=0.0)[source]#
MobileViT module. The default arguments correspond to MobileViT XXS.
Paper: https://arxiv.org/abs/2110.02178
Based on: https://github.com/chinhsuanwu/mobilevit-pytorch
- Parameters
mode (str, optional) – one of 'xxs', 'xs' or 's'. Default: 'xxs'
in_channels (int, optional) – the number of channels for the input image. Default: 3
patch_size (Tuple[int, int], optional) – the patch size; image_size must be divisible by patch_size. Default: (2, 2)
dropout (float, optional) – dropout ratio in the Transformer. Default: 0.0
Example
>>> img = torch.rand(1, 3, 256, 256)
>>> mvit = MobileViT(mode='xxs')
>>> mvit(img).shape
torch.Size([1, 320, 8, 8])
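A short sketch comparing the three variants; only the 320-channel output for 'xxs' is documented above, so the other channel counts are left for the loop to print:

import torch
from kornia.contrib import MobileViT

img = torch.rand(1, 3, 256, 256)
for mode in ('xxs', 'xs', 's'):
    out = MobileViT(mode=mode)(img)
    print(mode, tuple(out.shape))  # channel count grows with the variant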
Image Stitching#
- class kornia.contrib.ImageStitcher(matcher, estimator='ransac', blending_method='naive')[source]#
Stitch two images with overlapping fields of view.
- Parameters
matcher (Module) – image feature matching module.
estimator (str, optional) – method to compute homography, either “vanilla” or “ransac”. “ransac” is slower but more accurate. Default: 'ransac'
blending_method (str, optional) – method to blend the two images together. Only “naive” is currently supported. Default: 'naive'
Note
Current implementation requires strict image ordering from left to right.
import torch
import kornia as K
import kornia.feature as KF
import matplotlib.pyplot as plt
from kornia.contrib import ImageStitcher

# img_left, img_right: (B, 3, H, W) image tensors on the GPU.
IS = ImageStitcher(KF.LoFTR(pretrained='outdoor'), estimator='ransac').cuda()
# Compute the stitched result with less GPU memory cost.
with torch.inference_mode():
    out = IS(img_left, img_right)
# Show the result
plt.imshow(K.tensor_to_image(out))
Lambda#
- class kornia.contrib.Lambda(func)[source]#
Applies a user-defined lambda as a transform.
- Parameters
func (Callable) – the user-defined lambda to apply as a transform.
- Returns
The output of the user-defined lambda.
Example
>>> import kornia
>>> x = torch.rand(1, 3, 5, 5)
>>> f = Lambda(lambda x: kornia.color.rgb_to_grayscale(x))
>>> f(x).shape
torch.Size([1, 1, 5, 5])
Distance Transform#
- kornia.contrib.distance_transform(image, kernel_size=3, h=0.35)[source]#
Approximates the Manhattan distance transform of images using cascaded convolution operations.
The value at each pixel in the output represents the distance to the nearest non-zero pixel in the input image. It uses the method described in [PDP20]. The transformation is applied independently across the channel dimension of the images.
- Parameters
image (Tensor) – the input image with shape \((B,C,H,W)\).
kernel_size (int, optional) – size of the convolution kernel. Default: 3
h (float, optional) – value that influences the approximation of the min function. Default: 0.35
- Return type
Tensor
- Returns
A tensor with shape \((B,C,H,W)\).
Example
>>> tensor = torch.zeros(1, 1, 5, 5)
>>> tensor[:, :, 1, 2] = 1
>>> dt = kornia.contrib.distance_transform(tensor)
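A sketch of reading the result from the doctest above; since the transform is a convolutional approximation of the Manhattan distance, the printed values are approximate rather than exact integers:

import torch
import kornia

tensor = torch.zeros(1, 1, 5, 5)
tensor[:, :, 1, 2] = 1                  # single non-zero "seed" pixel
dt = kornia.contrib.distance_transform(tensor)
print(dt[0, 0, 1, 2].item())            # ~0: distance at the seed itself
print(dt[0, 0, 4, 2].item())            # ~3: three rows below the seed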
- kornia.contrib.diamond_square(output_size, roughness=0.5, random_scale=1.0, random_fn=torch.rand, normalize_range=None, device=None, dtype=None)[source]#
Generate plasma fractal images using the diamond-square algorithm.
See: https://en.wikipedia.org/wiki/Diamond-square_algorithm
- Parameters
output_size (Tuple[int, int, int, int]) – a tuple of integers with the \(B \times C \times H \times W\) shape of the image to be generated.
roughness (Union[float, Tensor], optional) – the scale value to apply at each recursion step. Default: 0.5
random_scale (Union[float, Tensor], optional) – the initial value of the scale for recursion. Default: 1.0
random_fn (Callable[..., Tensor], optional) – the callable function used to sample a random tensor. Default: torch.rand
normalize_range (Optional[Tuple[int, int]], optional) – whether to min-max normalize the output map; if a range is specified, min-max normalization is applied between the provided bounds. Default: None
device (Optional[device], optional) – the torch device on which to place the output map. Default: None
dtype (Optional[dtype], optional) – the torch dtype of the output map. Default: None
- Return type
Tensor
- Returns
A tensor with shape \((B,C,H,W)\) containing the fractal image.
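A minimal usage sketch, normalizing the fractal into [0, 1] via the normalize_range argument described above:

import torch
from kornia.contrib import diamond_square

# Generate a batch of two single-channel 64x64 plasma fractals in [0, 1].
plasma = diamond_square((2, 1, 64, 64), roughness=0.6, normalize_range=(0, 1))
print(plasma.shape)  # torch.Size([2, 1, 64, 64])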