# kornia.feature

## Non Maxima Suppression

non_maxima_suppression2d(input: torch.Tensor, kernel_size: Tuple[int, int], mask_only: bool = False) → torch.Tensor

Applies non maxima suppression to filter.

See NonMaximaSuppression2d for details.

non_maxima_suppression3d(input: torch.Tensor, kernel_size: Tuple[int, int, int], mask_only: bool = False) → torch.Tensor

Applies non maxima suppression to filter.

See NonMaximaSuppression3d for details.

nms2d(input: torch.Tensor, kernel_size: Tuple[int, int], mask_only: bool = False) → torch.Tensor

Applies non maxima suppression to filter.

See NonMaximaSuppression2d for details.

nms3d(input: torch.Tensor, kernel_size: Tuple[int, int, int], mask_only: bool = False) → torch.Tensor

Applies non maxima suppression to filter.

See NonMaximaSuppression3d for details.
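As an illustration of what non-maxima suppression does, here is a minimal pure-PyTorch sketch. It is not kornia's implementation: the helper name `nms2d_sketch` is hypothetical, it assumes odd kernel sizes, and it keeps ties on plateaus, which a strict-maximum implementation may not do.

```python
import torch
import torch.nn.functional as F


def nms2d_sketch(x: torch.Tensor, kernel_size=(3, 3), mask_only: bool = False) -> torch.Tensor:
    """Keep only values that equal the maximum of their local window (hypothetical helper)."""
    ky, kx = kernel_size
    # Max over each (ky, kx) neighbourhood; this padding keeps the spatial size for odd kernels.
    local_max = F.max_pool2d(x, kernel_size=(ky, kx), stride=1, padding=(ky // 2, kx // 2))
    mask = x == local_max
    return mask if mask_only else x * mask.to(x.dtype)


response = torch.tensor([[[[0.1, 0.2, 0.1],
                           [0.3, 0.9, 0.2],
                           [0.1, 0.4, 0.1]]]])  # 1x1x3x3
suppressed = nms2d_sketch(response)               # same shape, non-maxima set to zero
mask = nms2d_sketch(response, mask_only=True)     # boolean mask of local maxima
```

With `mask_only=True` the sketch returns the boolean mask instead of the masked response, mirroring the `mask_only` flag in the signatures above.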

## Detectors

gftt_response(input: torch.Tensor, grads_mode: str = 'sobel', sigmas: Optional[torch.Tensor] = None) → torch.Tensor

Computes the Shi-Tomasi cornerness function. The function does not perform any normalization or non-maxima suppression (NMS). The response map is computed according to the following formulation:

$R = min(eig(M))$

where:

$\begin{split}M = \sum_{(x,y) \in W} \begin{bmatrix} I^{2}_x & I_x I_y \\ I_x I_y & I^{2}_y \\ \end{bmatrix}\end{split}$

Parameters
• input (torch.Tensor) – 4d tensor

• grads_mode (string) – can be ‘sobel’ for standalone use or ‘diff’ for use on Gaussian pyramid

• sigmas (optional, torch.Tensor) – coefficients to be multiplied by the multichannel response. Should have shape $$(B)$$. It is necessary for performing non-maxima suppression across different scale pyramid levels. See vlfeat.

Returns

the response map per channel.

Return type

torch.Tensor

Shape:
• Input: $$(B, C, H, W)$$

• Output: $$(B, C, H, W)$$

Examples

>>> input = torch.tensor([[[
...    [0., 0., 0., 0., 0., 0., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 0., 0., 0., 0., 0., 0.],
... ]]])  # 1x1x7x7
>>> # compute the response map
>>> gftt_response(input)
tensor([[[[0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155],
          [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
          [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
          [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
          [0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155]]]])
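For a symmetric 2x2 matrix, the smallest eigenvalue in $R = min(eig(M))$ has a closed form, so no eigendecomposition is needed. The sketch below assumes the structure-tensor entries have already been computed and summed over the window; the helper name and the sample values are illustrative only and are not taken from kornia.

```python
import torch


def min_eig_2x2(a: torch.Tensor, b: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]], elementwise."""
    half_trace = 0.5 * (a + c)
    radius = torch.sqrt((0.5 * (a - c)) ** 2 + b ** 2)
    return half_trace - radius


# Hypothetical structure-tensor entries for three pixels: flat region, edge, corner.
Ixx = torch.tensor([0.0, 4.0, 4.0])
Ixy = torch.tensor([0.0, 0.0, 0.0])
Iyy = torch.tensor([0.0, 0.0, 3.0])
R = min_eig_2x2(Ixx, Ixy, Iyy)  # zero for flat and edge pixels, positive for the corner
```

Only the pixel with strong gradients in both directions gets a non-zero response, which is what makes the minimum eigenvalue usable as a cornerness score.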

harris_response(input: torch.Tensor, k: Union[torch.Tensor, float] = 0.04, grads_mode: str = 'sobel', sigmas: Optional[torch.Tensor] = None) → torch.Tensor

Computes the Harris cornerness function. The function does not perform any normalization or non-maxima suppression (NMS). The response map is computed according to the following formulation:

$R = max(0, det(M) - k \cdot trace(M)^2)$

where:

$\begin{split}M = \sum_{(x,y) \in W} \begin{bmatrix} I^{2}_x & I_x I_y \\ I_x I_y & I^{2}_y \\ \end{bmatrix}\end{split}$

and $$k$$ is an empirically determined constant, $$k \in [0.04, 0.06]$$.

Parameters
• input (torch.Tensor) – 4d tensor

• k (torch.Tensor or float) – the Harris detector free parameter.

• grads_mode (string) – can be ‘sobel’ for standalone use or ‘diff’ for use on Gaussian pyramid

• sigmas (optional, torch.Tensor) – coefficients to be multiplied by the multichannel response. Should have shape $$(B)$$. It is necessary for performing non-maxima suppression across different scale pyramid levels. See vlfeat.

Returns

the response map per channel.

Return type

torch.Tensor

Shape:
• Input: $$(B, C, H, W)$$

• Output: $$(B, C, H, W)$$

Examples

>>> input = torch.tensor([[[
...    [0., 0., 0., 0., 0., 0., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 0., 0., 0., 0., 0., 0.],
... ]]])  # 1x1x7x7
>>> # compute the response map
>>> harris_response(input, 0.04)
tensor([[[[0.0012, 0.0039, 0.0020, 0.0000, 0.0020, 0.0039, 0.0012],
          [0.0039, 0.0065, 0.0040, 0.0000, 0.0040, 0.0065, 0.0039],
          [0.0020, 0.0040, 0.0029, 0.0000, 0.0029, 0.0040, 0.0020],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [0.0020, 0.0040, 0.0029, 0.0000, 0.0029, 0.0040, 0.0020],
          [0.0039, 0.0065, 0.0040, 0.0000, 0.0040, 0.0065, 0.0039],
          [0.0012, 0.0039, 0.0020, 0.0000, 0.0020, 0.0039, 0.0012]]]])
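The Harris formula $R = max(0, det(M) - k \cdot trace(M)^2)$ can be evaluated elementwise once the entries of $M$ are known. The following sketch assumes precomputed, window-summed structure-tensor entries; the helper name and the sample values are hypothetical and are not the library's code.

```python
import torch


def harris_from_structure_tensor(a: torch.Tensor, b: torch.Tensor, c: torch.Tensor,
                                 k: float = 0.04) -> torch.Tensor:
    """R = max(0, det(M) - k * trace(M)^2) for M = [[a, b], [b, c]], elementwise."""
    det = a * c - b * b
    trace = a + c
    return torch.clamp(det - k * trace ** 2, min=0.0)


# Hypothetical structure-tensor entries for three pixels: flat region, edge, corner.
Ixx = torch.tensor([0.0, 4.0, 4.0])
Ixy = torch.tensor([0.0, 0.0, 0.0])
Iyy = torch.tensor([0.0, 0.0, 3.0])
R = harris_from_structure_tensor(Ixx, Ixy, Iyy, k=0.04)
```

For the edge pixel the determinant is zero, so the raw score is negative and gets clamped to zero; only the corner pixel keeps a positive response.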

hessian_response(input: torch.Tensor, grads_mode: str = 'sobel', sigmas: Optional[torch.Tensor] = None) → torch.Tensor

Computes the absolute value of the determinant of the Hessian matrix. The function does not perform any normalization or non-maxima suppression (NMS). The response map is computed according to the following formulation:

$R = det(H)$

where:

$\begin{split}H = \sum_{(x,y) \in W} \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \\ \end{bmatrix}\end{split}$

Parameters
• input (torch.Tensor) – 4d tensor

• grads_mode (string) – can be ‘sobel’ for standalone use or ‘diff’ for use on Gaussian pyramid

• sigmas (optional, torch.Tensor) – coefficients to be multiplied by the multichannel response. Should have shape $$(B)$$. It is necessary for performing non-maxima suppression across different scale pyramid levels. See vlfeat.

Returns

the response map per channel.

Return type

torch.Tensor

Shape:
• Input: $$(B, C, H, W)$$

• Output: $$(B, C, H, W)$$

Examples

>>> input = torch.tensor([[[
...    [0., 0., 0., 0., 0., 0., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 1., 1., 1., 1., 1., 0.],
...    [0., 0., 0., 0., 0., 0., 0.],
... ]]])  # 1x1x7x7
>>> # compute the response map
>>> hessian_response(input)
tensor([[[[0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155],
          [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
          [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
          [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
          [0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155]]]])

dog_response(input: torch.Tensor) → torch.Tensor

Computes the Difference-of-Gaussian response given the 5d input of Gaussian-blurred images.

Parameters

input (torch.Tensor) – 5d tensor

Returns

the response map per channel.

Return type

torch.Tensor

Shape:
• Input: $$(B, C, D, H, W)$$

• Output: $$(B, C, D-1, H, W)$$
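The shape change from $D$ to $D-1$ follows from subtracting adjacent levels along the depth (scale) dimension. A minimal sketch, assuming the response is "next level minus current level" (the sign convention is an assumption here, and `dog_sketch` is a hypothetical name):

```python
import torch


def dog_sketch(pyr: torch.Tensor) -> torch.Tensor:
    """Difference of adjacent levels along dim 2 of a (B, C, D, H, W) tensor."""
    return pyr[:, :, 1:] - pyr[:, :, :-1]


pyr = torch.rand(2, 1, 5, 8, 8)  # B=2, C=1, D=5 blur levels, 8x8 images
dog = dog_sketch(pyr)            # (2, 1, 4, 8, 8)
```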

## Descriptors

class SIFTDescriptor(patch_size: int = 41, num_ang_bins: int = 8, num_spatial_bins: int = 4, rootsift: bool = True, clipval: float = 0.2)

Module that computes SIFT descriptors of the given patches.

Parameters
• patch_size – (int) Input patch size in pixels (41 is default)

• num_ang_bins – (int) Number of angular bins. (8 is default)

• num_spatial_bins – (int) Number of spatial bins (4 is default)

• clipval – (float) default 0.2

• rootsift – (bool) if True, RootSIFT (Arandjelović et al., 2012) is computed

Returns

SIFT descriptor of the patches

Return type

torch.Tensor

Shape:
• Input: (B, 1, patch_size, patch_size)

• Output: (B, num_ang_bins * num_spatial_bins ** 2)

Examples

>>> input = torch.rand(23, 1, 32, 32)
>>> SIFT = SIFTDescriptor(32, 8, 4)
>>> descs = SIFT(input) # 23x128

class MKDDescriptor(patch_size: int = 32, kernel_type: str = 'concat', whitening: str = 'pcawt', training_set: str = 'liberty', output_dims: int = 128)

Module that computes Multiple Kernel local descriptors.

This is based on the paper “Understanding and Improving Kernel Local Descriptors”. See [MTB+19] for more details.

Parameters
• patch_size – (int) Input patch size in pixels (32 is default).

• kernel_type – (str) Parametrization of kernel ‘concat’, ‘cart’, ‘polar’ (‘concat’ is default).

• whitening – (str) Whitening transform to apply None, ‘lw’, ‘pca’, ‘pcawt’, ‘pcaws’ (‘pcawt’ is default).

• training_set – (str) Set that model was trained on ‘liberty’, ‘notredame’, ‘yosemite’ (‘liberty’ is default).

• output_dims – (int) Dimensionality reduction (128 is default).

Returns

Explicit cartesian or polar embedding.

Return type

torch.Tensor

Shape:
• Input: $$(B, 1, patch\_size, patch\_size)$$

• Output: $$(B, output\_dims)$$

Examples

>>> patches = torch.rand(23, 1, 32, 32)
>>> mkd = MKDDescriptor(patch_size=32,
...                     kernel_type='concat',
...                     whitening='pcawt',
...                     training_set='liberty',
...                     output_dims=128)
>>> desc = mkd(patches) # 23x128

class HardNet(pretrained: bool = False)

Module that computes HardNet descriptors of the given 32x32 grayscale patches.

This is based on the original code from paper “Working hard to know your neighbor’s margins: Local descriptor learning loss”. See [MMRM17] for more details.

Parameters

pretrained – (bool) Download and set pretrained weights to the model. Default: false.

Returns

HardNet descriptor of the patches.

Return type

torch.Tensor

Shape:
• Input: (B, 1, 32, 32)

• Output: (B, 128)

Examples

>>> input = torch.rand(16, 1, 32, 32)
>>> hardnet = HardNet()
>>> descs = hardnet(input) # 16x128

class HardNet8(pretrained: bool = False)

Module that computes HardNet8 descriptors of the given 32x32 grayscale patches.

This is based on the original code from paper “Improving the HardNet Descriptor”. See [Pul20] for more details.

Parameters

pretrained – (bool) Download and set pretrained weights to the model. Default: false.

Returns

HardNet8 descriptor of the patches.

Return type

torch.Tensor

Shape:
• Input: (B, 1, 32, 32)

• Output: (B, 128)

Examples

>>> input = torch.rand(16, 1, 32, 32)
>>> hardnet = HardNet8()
>>> descs = hardnet(input) # 16x128

class TFeat(pretrained: bool = False)

Module that computes TFeat descriptors of the given 32x32 grayscale patches.

This is based on the original code from paper “Learning local feature descriptors with triplets and shallow convolutional neural networks”. See [BRPM16] for more details.

Parameters

pretrained – (bool) Download and set pretrained weights to the model. Default: false.

Returns

TFeat descriptor of the patches.

Return type

torch.Tensor

Shape:
• Input: (B, 1, 32, 32)

• Output: (B, 128)

Examples

>>> input = torch.rand(16, 1, 32, 32)
>>> tfeat = TFeat()
>>> descs = tfeat(input) # 16x128

class SOSNet(pretrained: bool = False)

128-dimensional SOSNet model definition for 32x32 patches.

This is based on the original code from paper “SOSNet: Second Order Similarity Regularization for Local Descriptor Learning”.

Parameters

pretrained (bool) – Download and set pretrained weights to the model. Default: false.

Shape:
• Input: (B, 1, 32, 32)

• Output: (B, 128)

Examples

>>> input = torch.rand(8, 1, 32, 32)
>>> sosnet = SOSNet()
>>> descs = sosnet(input) # 8x128


## Matching

match_nn(desc1: torch.Tensor, desc2: torch.Tensor, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]

Finds the nearest neighbor in desc2 for each vector in desc1.

If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.

Parameters
• desc1 (torch.Tensor) – Batch of descriptors of a shape $$(B1, D)$$.

• desc2 (torch.Tensor) – Batch of descriptors of a shape $$(B2, D)$$.

• dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of $$(B1, B2)$$.

Returns

• Descriptor distance of matching descriptors, shape of $$(B1, 1)$$.

• Long tensor indexes of matching descriptors in desc1 and desc2, shape of $$(B1, 2)$$.

Return type

Tuple[torch.Tensor, torch.Tensor]
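The nearest-neighbor search described above reduces to a row-wise minimum over the distance matrix. This is a minimal sketch built on `torch.cdist`, not the library's code; `match_nn_sketch` and the toy descriptors are made up for illustration.

```python
import torch


def match_nn_sketch(desc1: torch.Tensor, desc2: torch.Tensor):
    """Nearest neighbor in desc2 for every row of desc1 (hypothetical helper)."""
    dm = torch.cdist(desc1, desc2)                    # (B1, B2) pairwise distances
    dists, idx2 = dm.min(dim=1)                       # best match per desc1 row
    idx1 = torch.arange(desc1.shape[0], device=desc1.device)
    return dists.view(-1, 1), torch.stack([idx1, idx2], dim=1)


desc1 = torch.tensor([[0.0, 0.0], [1.0, 1.0]])
desc2 = torch.tensor([[1.0, 0.9], [0.1, 0.0], [5.0, 5.0]])
dists, idxs = match_nn_sketch(desc1, desc2)  # dists: (B1, 1), idxs: (B1, 2)
```

Every row of desc1 gets exactly one match, so the output always has B1 rows, even when the best match is poor.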

match_mnn(desc1: torch.Tensor, desc2: torch.Tensor, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]

Finds mutual nearest neighbors in desc2 for each vector in desc1.

If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.

Parameters
• desc1 (torch.Tensor) – Batch of descriptors of a shape $$(B1, D)$$.

• desc2 (torch.Tensor) – Batch of descriptors of a shape $$(B2, D)$$.

• dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of $$(B1, B2)$$.

Returns

• Descriptor distance of matching descriptors, shape of $$(B3, 1)$$.

• Long tensor indexes of matching descriptors in desc1 and desc2, shape of $$(B3, 2)$$, where 0 <= B3 <= min(B1, B2)

Return type

Tuple[torch.Tensor, torch.Tensor]
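A match is mutual when the nearest neighbor of a desc1 vector in desc2 also has that desc1 vector as its own nearest neighbor. One possible way to express this check, shown as a hedged sketch with a hypothetical helper name and toy data:

```python
import torch


def match_mnn_sketch(desc1: torch.Tensor, desc2: torch.Tensor):
    """Keep only pairs that are each other's nearest neighbor (hypothetical helper)."""
    dm = torch.cdist(desc1, desc2)        # (B1, B2)
    nn12 = dm.argmin(dim=1)               # best desc2 index for each desc1 row
    nn21 = dm.argmin(dim=0)               # best desc1 index for each desc2 row
    idx1 = torch.arange(desc1.shape[0], device=desc1.device)
    mutual = nn21[nn12] == idx1           # does the match point back to its origin?
    idxs = torch.stack([idx1[mutual], nn12[mutual]], dim=1)
    dists = dm[idxs[:, 0], idxs[:, 1]].view(-1, 1)
    return dists, idxs


desc1 = torch.tensor([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0]])
desc2 = torch.tensor([[0.05, 0.0], [4.1, 4.0]])
dists, idxs = match_mnn_sketch(desc1, desc2)
```

In this toy case the second desc1 vector is dropped, because its nearest neighbor in desc2 is closer to the first desc1 vector, which is why B3 can be smaller than B1.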

match_snn(desc1: torch.Tensor, desc2: torch.Tensor, th: float = 0.8, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]

Finds nearest neighbors in desc2 for each vector in desc1, keeping only those whose first-to-second nearest neighbor distance ratio is <= th.

If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.

Parameters
• desc1 (torch.Tensor) – Batch of descriptors of a shape $$(B1, D)$$.

• desc2 (torch.Tensor) – Batch of descriptors of a shape $$(B2, D)$$.

• th (float) – distance ratio threshold.

• dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of $$(B1, B2)$$.

Returns

• Descriptor distance of matching descriptors, shape of $$(B3, 1)$$.

• Long tensor indexes of matching descriptors in desc1 and desc2. Shape: $$(B3, 2)$$, where 0 <= B3 <= B1.

Return type

Tuple[torch.Tensor, torch.Tensor]
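The ratio test compares the distance to the best match with the distance to the second-best match and discards ambiguous matches. The following is a rough sketch under the assumption that desc2 holds at least two descriptors; the helper name and the data are hypothetical.

```python
import torch


def match_snn_sketch(desc1: torch.Tensor, desc2: torch.Tensor, th: float = 0.8):
    """Nearest neighbors that pass the first-to-second distance ratio test."""
    dm = torch.cdist(desc1, desc2)                             # (B1, B2)
    # Two smallest distances per desc1 row, in ascending order (needs B2 >= 2).
    vals, idxs2 = torch.topk(dm, k=2, dim=1, largest=False)
    ratio = vals[:, 0] / vals[:, 1]
    keep = ratio <= th
    idx1 = torch.arange(desc1.shape[0], device=desc1.device)
    idxs = torch.stack([idx1[keep], idxs2[keep, 0]], dim=1)
    return vals[keep, 0].view(-1, 1), idxs


desc1 = torch.tensor([[0.0, 0.0], [1.0, 0.0]])
desc2 = torch.tensor([[0.1, 0.0], [2.0, 0.0], [3.0, 0.0]])
dists, idxs = match_snn_sketch(desc1, desc2, th=0.8)
```

The second desc1 vector has two candidates at nearly the same distance (0.9 and 1.0), so its ratio exceeds the threshold and the match is rejected.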

match_smnn(desc1: torch.Tensor, desc2: torch.Tensor, th: float = 0.8, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]

Finds mutual nearest neighbors in desc2 for each vector in desc1, keeping only those whose first-to-second nearest neighbor distance ratio is <= th.

If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.

Parameters
• desc1 (torch.Tensor) – Batch of descriptors of a shape $$(B1, D)$$.

• desc2 (torch.Tensor) – Batch of descriptors of a shape $$(B2, D)$$.

• th (float) – distance ratio threshold.

• dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of $$(B1, B2)$$.

Returns

• Descriptor distance of matching descriptors, shape of $$(B3, 1)$$.

• Long tensor indexes of matching descriptors in desc1 and desc2, shape of $$(B3, 2)$$ where 0 <= B3 <= B1.

Return type

Tuple[torch.Tensor, torch.Tensor]

## Local Affine Frames (LAF)

extract_patches_from_pyramid(img: torch.Tensor, laf: torch.Tensor, PS: int = 32, normalize_lafs_before_extraction: bool = True) → torch.Tensor

Extracts patches defined by LAFs from the image tensor. Patches are extracted from the appropriate pyramid level.

Parameters
• img – (torch.Tensor) images, LAFs are detected in

• laf – (torch.Tensor).

• PS – (int) patch size, default = 32

• normalize_lafs_before_extraction (bool) – if True, lafs are normalized to image size, default = True

Returns

patches, shape $$(B, N, CH, PS, PS)$$

Return type

torch.Tensor

extract_patches_simple(img: torch.Tensor, laf: torch.Tensor, PS: int = 32, normalize_lafs_before_extraction: bool = True) → torch.Tensor

Extracts patches defined by LAFs from the image tensor. No smoothing is applied, which causes heavy aliasing; extract_patches_from_pyramid is usually the better choice.

Parameters
• img – (torch.Tensor) images, LAFs are detected in

• laf – (torch.Tensor).

• PS – (int) patch size, default = 32

• normalize_lafs_before_extraction (bool) – if True, lafs are normalized to image size, default = True

Returns

patches, shape $$(B, N, CH, PS, PS)$$

Return type

torch.Tensor

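normalize_laf(LAF: torch.Tensor, images: torch.Tensor) → torch.Tensor

Normalizes LAFs from pixel scale to [0, 1] scale. With B, CH, H, W = images.size() and MIN_SIZE = min(H, W), the LAF

$\begin{split}\begin{bmatrix} a_{11} & a_{12} & x \\ a_{21} & a_{22} & y \\ \end{bmatrix}\end{split}$

becomes:

$\begin{split}\begin{bmatrix} a_{11}/\text{MIN\_SIZE} & a_{12}/\text{MIN\_SIZE} & x/W \\ a_{21}/\text{MIN\_SIZE} & a_{22}/\text{MIN\_SIZE} & y/H \\ \end{bmatrix}\end{split}$

Parameters
• LAF – (torch.Tensor).

• images – (torch.Tensor) images, LAFs are detected in

Returns

the normalized LAF.

Return type

torch.Tensor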

Shape:
• Input: $$(B, N, 2, 3)$$

• Output: $$(B, N, 2, 3)$$

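denormalize_laf(LAF: torch.Tensor, images: torch.Tensor) → torch.Tensor

De-normalizes LAFs from [0, 1] scale to pixel (image) scale. With B, CH, H, W = images.size() and MIN_SIZE = min(H, W), the LAF

$\begin{split}\begin{bmatrix} a_{11} & a_{12} & x \\ a_{21} & a_{22} & y \\ \end{bmatrix}\end{split}$

becomes:

$\begin{split}\begin{bmatrix} a_{11} \cdot \text{MIN\_SIZE} & a_{12} \cdot \text{MIN\_SIZE} & x \cdot W \\ a_{21} \cdot \text{MIN\_SIZE} & a_{22} \cdot \text{MIN\_SIZE} & y \cdot H \\ \end{bmatrix}\end{split}$

Parameters
• LAF – (torch.Tensor).

• images – (torch.Tensor) images, LAFs are detected in

Returns

the de-normalized LAF.

Return type

torch.Tensor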

Shape:
• Input: $$(B, N, 2, 3)$$

• Output: $$(B, N, 2, 3)$$
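The two conversions are inverses of each other: the 2x2 region part is divided (or multiplied) by min(H, W), while the center is scaled by W and H. A minimal sketch of that arithmetic, with hypothetical helper names and assuming images of shape (B, CH, H, W):

```python
import torch


def normalize_laf_sketch(laf: torch.Tensor, images: torch.Tensor) -> torch.Tensor:
    """Pixel scale -> [0, 1] scale for LAFs of shape (B, N, 2, 3)."""
    _, _, h, w = images.shape
    out = laf.clone()
    out[..., :2, :2] = laf[..., :2, :2] / float(min(h, w))  # region part
    out[..., 0, 2] = laf[..., 0, 2] / w                      # x coordinate
    out[..., 1, 2] = laf[..., 1, 2] / h                      # y coordinate
    return out


def denormalize_laf_sketch(laf: torch.Tensor, images: torch.Tensor) -> torch.Tensor:
    """[0, 1] scale -> pixel scale, the inverse of the function above."""
    _, _, h, w = images.shape
    out = laf.clone()
    out[..., :2, :2] = laf[..., :2, :2] * float(min(h, w))
    out[..., 0, 2] = laf[..., 0, 2] * w
    out[..., 1, 2] = laf[..., 1, 2] * h
    return out


images = torch.zeros(1, 3, 100, 200)                 # H=100, W=200
laf = torch.tensor([[[[20.0, 0.0, 50.0],
                      [0.0, 20.0, 25.0]]]])          # 1x1x2x3, in pixels
laf_norm = normalize_laf_sketch(laf, images)
laf_back = denormalize_laf_sketch(laf_norm, images)  # round trip
```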

laf_to_boundary_points(LAF: torch.Tensor, n_pts: int = 50) → torch.Tensor

Converts LAFs to boundary points of the regions plus the center. Used for local feature visualization; see the visualize_laf function.

Parameters
• LAF – (torch.Tensor).

• n_pts – number of points to output

Returns

pts, tensor of boundary points

Return type

torch.Tensor

Shape:
• Input: $$(B, N, 2, 3)$$

• Output: $$(B, N, n_pts, 2)$$

ellipse_to_laf(ells: torch.Tensor) → torch.Tensor

Converts ellipse regions to LAF format. Ellipse (a, b, c) and upright covariance matrix [a11 a12; 0 a22] are connected by the inverse matrix square root: A = invsqrt([a b; b c]). See also https://github.com/vlfeat/vlfeat/blob/master/toolbox/sift/vl_frame2oell.m

Parameters

ells – (torch.Tensor): tensor of ellipses in Oxford format [x y a b c].

Returns

LAF, tensor of ellipses in LAF format.

Return type

torch.Tensor

Shape:
• Input: $$(B, N, 5)$$

• Output: $$(B, N, 2, 3)$$

Example

>>> input = torch.ones(1, 10, 5)  # BxNx5
>>> output = ellipse_to_laf(input)  #  BxNx2x3

make_upright(laf: torch.Tensor, eps: float = 1e-09) → torch.Tensor

Rectifies the affine matrix so that it becomes upright.

Parameters
• laf – (torch.Tensor): tensor of LAFs.

• eps (float) – for safe division, (default 1e-9)

Returns

tensor of same shape.

Return type

torch.Tensor

Shape:
• Input: $$(B, N, 2, 3)$$

• Output: $$(B, N, 2, 3)$$

Example

>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = make_upright(input)  #  BxNx2x3

scale_laf(laf: torch.Tensor, scale_coef: Union[float, torch.Tensor]) → torch.Tensor

Multiplies the region part of the LAF ([:, :, :2, :2]) by scale_coef, so the center, shape and orientation of the local feature stay the same, but the region area changes.

Parameters
• laf – (torch.Tensor): tensor [BxNx2x3] or [BxNx2x2].

• scale_coef – (torch.Tensor): broadcastable tensor or float.

Returns

tensor BxNx2x3 .

Return type

torch.Tensor

Shape:
• Input: $$(B, N, 2, 3)$$

• Input: $$(B, N,)$$ or $$()$$

• Output: $$(B, N, 2, 3)$$

Example

>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> scale = 0.5
>>> output = scale_laf(input, scale)  # BxNx2x3
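To make the "region part only" behaviour concrete, the sketch below multiplies the 2x2 block and leaves the last column (the center) untouched. It is a plain-PyTorch illustration with a hypothetical helper name, not the library function.

```python
import torch


def scale_laf_sketch(laf: torch.Tensor, scale_coef) -> torch.Tensor:
    """Scale the 2x2 region part of (B, N, 2, 3) LAFs; the center column is unchanged."""
    out = laf.clone()
    out[..., :2, :2] = laf[..., :2, :2] * scale_coef
    return out


laf = torch.ones(1, 5, 2, 3)           # BxNx2x3
scaled = scale_laf_sketch(laf, 0.5)    # region entries become 0.5, centers stay 1.0
```

Since both axes of the region are multiplied by the same coefficient, the region area changes by the square of scale_coef.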

get_laf_scale(LAF: torch.Tensor) → torch.Tensor

Returns the scale of the LAFs.

Parameters

LAF – (torch.Tensor): tensor [BxNx2x3] or [BxNx2x2].

Returns

tensor BxNx1x1 .

Return type

torch.Tensor

Shape:
• Input: $$(B, N, 2, 3)$$

• Output: $$(B, N, 1, 1)$$

Example

>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = get_laf_scale(input)  # BxNx1x1

get_laf_center(LAF: torch.Tensor) → torch.Tensor

Returns the center (keypoint) of the LAFs.

Parameters

LAF – (torch.Tensor): tensor [BxNx2x3].

Returns

tensor BxNx2 .

Return type

torch.Tensor

Shape:
• Input: $$(B, N, 2, 3)$$

• Output: $$(B, N, 2)$$

Example

>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = get_laf_center(input)  # BxNx2

get_laf_orientation(LAF: torch.Tensor) → torch.Tensor

Returns the orientation of the LAFs, in degrees.

Parameters

LAF – (torch.Tensor): tensor [BxNx2x3].

Returns

tensor BxNx1 .

Return type

torch.Tensor

Shape:
• Input: $$(B, N, 2, 3)$$

• Output: $$(B, N, 1)$$

Example

>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = get_laf_orientation(input)  # BxNx1

laf_from_center_scale_ori(xy: torch.Tensor, scale: torch.Tensor, ori: torch.Tensor) → torch.Tensor

Creates LAFs from keypoint centers, scales and orientations. Useful to create kornia LAFs from OpenCV keypoints.

Parameters
• xy – (torch.Tensor): tensor [BxNx2].

• scale – (torch.Tensor): tensor [BxNx1x1].

• ori – (torch.Tensor): tensor [BxNx1].

Returns

tensor BxNx2x3 .

Return type

torch.Tensor

laf_is_inside_image(laf: torch.Tensor, images: torch.Tensor, border: int = 0) → torch.Tensor

Checks if the LAF is touching or partly outside the image boundary. Returns the mask of LAFs, which are fully inside the image, i.e. valid.

Parameters
• laf (torch.Tensor) – $$(B, N, 2, 3)$$

• images (torch.Tensor) – images, lafs are detected in $$(B, CH, H, W)$$

• border (int) – additional border

Returns

mask of valid LAFs, shape $$(B, N)$$

Return type

torch.Tensor

laf_to_three_points(laf: torch.Tensor)

Converts a local affine frame (LAF) to an alternative representation: coordinates of the LAF center, LAF-x unit vector, LAF-y unit vector.

Parameters

laf (torch.Tensor) – $$(B, N, 2, 3)$$

Returns

threepts, shape $$(B, N, 2, 3)$$

Return type

torch.Tensor

laf_from_three_points(threepts: torch.Tensor)

Converts three points to a local affine frame. The order is (0, 0), (0, 1), (1, 0).

Parameters

threepts (torch.Tensor) – $$(B, N, 2, 3)$$

Returns

laf, shape $$(B, N, 2, 3)$$

Return type

torch.Tensor

raise_error_if_laf_is_not_valid(laf: torch.Tensor) → None

Auxiliary function that verifies that the input is a torch.Tensor of shape [BxNx2x3].

Parameters

laf

## Module

class NonMaximaSuppression2d(kernel_size: Tuple[int, int])

Applies non maxima suppression to filter.

class NonMaximaSuppression3d(kernel_size: Tuple[int, int, int])

Applies non maxima suppression to filter.

class BlobHessian(grads_mode='sobel')

nn.Module that calculates Hessian blobs. See hessian_response() for details.

class CornerGFTT(grads_mode='sobel')

nn.Module that calculates Shi-Tomasi corners. See gftt_response() for details.

class CornerHarris(k: Union[float, torch.Tensor], grads_mode='sobel')

nn.Module that calculates Harris corners. See harris_response() for details.

class BlobDoG

nn.Module that calculates Difference-of-Gaussians blobs. See dog_response() for details.

class ScaleSpaceDetector(num_features: int = 500, mr_size: float = 6.0, scale_pyr_module: torch.nn.modules.module.Module = ScalePyramid(n_levels=3, init_sigma=1.6, min_size=15, extra_levels=3, border=6, sigma_step=1.2599210498948732, double_image=False), resp_module: torch.nn.modules.module.Module = BlobHessian(grads_mode=sobel), nms_module: torch.nn.modules.module.Module = ConvSoftArgmax3d(kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), temperature=tensor(1.), normalized_coordinates=False, eps=1e-08, strict_maxima_bonus=0.0, output_value=True), ori_module: torch.nn.modules.module.Module = PassLAF(), aff_module: torch.nn.modules.module.Module = PassLAF(), minima_are_also_good: bool = False, scale_space_response=False)

Module for differentiable local feature detection, as close as possible to classical local feature detectors like Harris, Hessian-Affine or SIFT (DoG). It has 5 modules inside: scale pyramid generator, response (“cornerness”) function, soft nms function, affine shape estimator and patch orientation estimator. Each of those modules can be replaced with a learned custom one, as long as it respects the output shape.

Parameters
• num_features – (int) Number of features to detect. default = 500. In order to keep everything batchable, the output always contains num_features features, even for completely homogeneous images.

• mr_size – (float), default 6.0. Multiplier for local feature scale compared to the detection scale. A value of 6.0 matches the OpenCV 12.0 convention for SIFT.

• scale_pyr_module – (nn.Module), which generates scale pyramid. See ScalePyramid for details. Default is ScalePyramid(3, 1.6, 15)

• resp_module – (nn.Module), which calculates ‘cornerness’ of the pixel. Default is BlobHessian().

• nms_module – (nn.Module), which outputs per-patch coordinates of the response maxima. See ConvSoftArgmax3d for details.

• ori_module – (nn.Module) for local feature orientation estimation. Default is PassLAF, which does nothing. See LAFOrienter for details.

• aff_module – (nn.Module) for local feature affine shape estimation. Default is PassLAF, which does nothing. See LAFAffineShapeEstimator for details.

• minima_are_also_good – (bool) if True, then both response function minima and maxima are detected. Useful for symmetric response functions like DoG or Hessian. Default is False

forward(img: torch.Tensor, mask: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]

Three-stage local feature detection. First, the location and scale of interest points are determined by the detect function. Then the affine shape and orientation are estimated.

Parameters
• img (torch.Tensor) – image to extract features with shape [BxCxHxW]

• mask (torch.Tensor, optional) – a mask with weights where to apply the response function. The shape must be the same as the input image.

Returns

• lafs (torch.Tensor): shape [BxNx2x3]. Detected local affine frames.

• responses (torch.Tensor): shape [BxNx1]. Response function values for corresponding lafs.

Return type

Tuple[torch.Tensor, torch.Tensor]

class PassLAF

Dummy module to use instead of a local feature orientation or affine shape estimator.

forward(laf: torch.Tensor, img: torch.Tensor) → torch.Tensor

Parameters
• laf (torch.Tensor) – 4d tensor

• img (torch.Tensor) – the input image tensor

Returns

unchanged laf from the input.

Return type

torch.Tensor

class PatchAffineShapeEstimator(patch_size: int = 19, eps: float = 1e-10)

Module that estimates the second moment matrix of the patch gradients in order to determine the affine shape of the local feature as in [Baumberg00].

Parameters
• patch_size – int, default = 19

• eps – float, for safe division, default is 1e-10

forward(patch: torch.Tensor) → torch.Tensor

Parameters

patch – (torch.Tensor) shape [Bx1xHxW]

Returns

ellipse_shape, shape [Bx1x3]

Return type

torch.Tensor

class LAFAffineShapeEstimator(patch_size: int = 32, affine_shape_detector: Optional[torch.nn.modules.module.Module] = None)

Module that extracts patches using input images and local affine frames (LAFs), then runs PatchAffineShapeEstimator on the patches to estimate the LAF shape. The original LAF shape is then replaced with the estimated one. The original LAF orientation is not preserved, so it is recommended to first run LAFAffineShapeEstimator and then LAFOrienter.

Parameters
• patch_size – int, default = 32

• affine_shape_detector – nn.Module. Patch affine shape estimator, e.g. PatchAffineShapeEstimator. Default: None

forward(laf: torch.Tensor, img: torch.Tensor) → torch.Tensor

Parameters
• laf – (torch.Tensor) shape [BxNx2x3]

• img – (torch.Tensor) shape [Bx1xHxW]

Returns

laf_out, shape [BxNx2x3]

Return type

torch.Tensor

class LAFOrienter(patch_size: int = 32, num_angular_bins: int = 36, angle_detector: Optional[torch.nn.modules.module.Module] = None)

Module that extracts patches using input images and local affine frames (LAFs), then runs PatchDominantGradientOrientation or OriNet on the patches and rotates the LAFs by the estimated angles.

Parameters
• patch_size – int, default = 32

• num_angular_bins – int, default is 36

• angle_detector – nn.Module. Patch orientation estimator, e.g. PatchDominantGradientOrientation or OriNet. Default: None

forward(laf: torch.Tensor, img: torch.Tensor) → torch.Tensor

Parameters
• laf – (torch.Tensor), shape [BxNx2x3]

• img – (torch.Tensor), shape [Bx1xHxW]

Returns

laf_out, shape [BxNx2x3]

Return type

torch.Tensor

class PatchDominantGradientOrientation(patch_size: int = 32, num_angular_bins: int = 36, eps: float = 1e-08)

Module that estimates the dominant gradient orientation of the given patches, in radians. Zero angle points towards the right.

Parameters
• patch_size – int, default = 32

• num_angular_bins – int, default is 36

• eps – float, for safe division, and arctan, default is 1e-8

forward(patch: torch.Tensor) → torch.Tensor

Parameters

patch – (torch.Tensor) shape [Bx1xHxW]

Returns

angle, shape [B]

Return type

torch.Tensor

class OriNet(pretrained: bool = False, eps: float = 1e-08)

Network that estimates the canonical orientation of the given 32x32 patches, in radians. Zero angle points towards the right. This is based on the original code from paper “Repeatability Is Not Enough: Learning Discriminative Affine Regions via Discriminability”. See [MRM18] for more details.

Parameters
• pretrained – (bool) Download and set pretrained weights to the model. Default: false.

• eps – (float) to avoid division by zero in atan2. Default: 1e-8.

Returns

Estimated orientation angle of the patches, in radians.

Return type

torch.Tensor

Shape:
• Input: (B, 1, 32, 32)

• Output: (B)

Examples

>>> input = torch.rand(16, 1, 32, 32)
>>> orinet = OriNet()
>>> angle = orinet(input) # 16

forward(patch: torch.Tensor) → torch.Tensor

Parameters

patch – (torch.Tensor) shape [Bx1xHxW]

Returns

angle, shape [B]

Return type

torch.Tensor

class LAFAffNetShapeEstimator(pretrained: bool = False)

Module that extracts patches using input images and local affine frames (LAFs), then runs AffNet on the patches to estimate the LAF shape. The original LAF shape is then replaced with the estimated one. The original LAF orientation is not preserved, so it is recommended to run the affine shape estimator first and LAFOrienter afterwards. This is based on the original code from paper “Repeatability Is Not Enough: Learning Discriminative Affine Regions via Discriminability”. See [MRM18] for more details.

Parameters

pretrained – (bool) Download and set pretrained weights to the model. Default: false.

forward(laf: torch.Tensor, img: torch.Tensor) → torch.Tensor

Parameters
• laf – (torch.Tensor) shape [BxNx2x3]

• img – (torch.Tensor) shape [Bx1xHxW]

Returns

laf_out, shape [BxNx2x3]

Return type

torch.Tensor