kornia.feature

Non Maxima Suppression

non_maxima_suppression2d(input: torch.Tensor, kernel_size: Tuple[int, int], mask_only: bool = False) → torch.Tensor

Applies non-maxima suppression to the input.

See NonMaximaSuppression2d for details.

non_maxima_suppression3d(input: torch.Tensor, kernel_size: Tuple[int, int, int], mask_only: bool = False) → torch.Tensor

Applies non-maxima suppression to the input.

See NonMaximaSuppression3d for details.

nms2d(input: torch.Tensor, kernel_size: Tuple[int, int], mask_only: bool = False) → torch.Tensor[source]

Applies non-maxima suppression to the input.

See NonMaximaSuppression2d for details.

nms3d(input: torch.Tensor, kernel_size: Tuple[int, int, int], mask_only: bool = False) → torch.Tensor[source]

Applies non-maxima suppression to the input.

See NonMaximaSuppression3d for details.
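
Example (an illustrative sketch, assuming only the signatures above):

>>> import torch
>>> import kornia
>>> resp = torch.rand(1, 1, 16, 16)  # BxCxHxW response map
>>> suppressed = kornia.feature.nms2d(resp, (3, 3))  # non-maximal values zeroed out
>>> mask = kornia.feature.nms2d(resp, (3, 3), mask_only=True)  # boolean mask of the local maxima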

Detectors

gftt_response(input: torch.Tensor, grads_mode: str = 'sobel', sigmas: Optional[torch.Tensor] = None) → torch.Tensor[source]

Computes the Shi-Tomasi cornerness function. Function does not do any normalization or nms. The response map is computed according to the following formulation:

\[R = min(eig(M))\]

where:

\[\begin{split}M = \sum_{(x,y) \in W} \begin{bmatrix} I^{2}_x & I_x I_y \\ I_x I_y & I^{2}_y \\ \end{bmatrix}\end{split}\]
Parameters
  • input (torch.Tensor) – 4d tensor

  • grads_mode (string) – can be ‘sobel’ for standalone use or ‘diff’ for use on Gaussian pyramid

  • sigmas (optional, torch.Tensor) – coefficients to be multiplied by the multichannel response. Should be of shape \((B)\). It is necessary for performing non-maxima suppression across different scale pyramid levels. See vlfeat.

Returns

the response map per channel.

Return type

torch.Tensor

Shape:
  • Input: \((B, C, H, W)\)

  • Output: \((B, C, H, W)\)

Examples

>>> input = torch.tensor([[[
    [0., 0., 0., 0., 0., 0., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 0., 0., 0., 0., 0., 0.],
]]])  # 1x1x7x7
>>> # compute the response map
>>> gftt_response(input)
tensor([[[[0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155],
  [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
  [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
  [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
  [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
  [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
  [0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155]]]])
harris_response(input: torch.Tensor, k: Union[torch.Tensor, float] = 0.04, grads_mode: str = 'sobel', sigmas: Optional[torch.Tensor] = None) → torch.Tensor[source]

Computes the Harris cornerness function. Function does not do any normalization or nms. The response map is computed according to the following formulation:

\[R = max(0, det(M) - k \cdot trace(M)^2)\]

where:

\[\begin{split}M = \sum_{(x,y) \in W} \begin{bmatrix} I^{2}_x & I_x I_y \\ I_x I_y & I^{2}_y \\ \end{bmatrix}\end{split}\]

and \(k\) is an empirically determined constant \(k \in [0.04, 0.06]\)

Parameters
  • input (torch.Tensor) – 4d tensor

  • k (torch.Tensor) – the Harris detector free parameter.

  • grads_mode (string) – can be ‘sobel’ for standalone use or ‘diff’ for use on Gaussian pyramid

  • sigmas (optional, torch.Tensor) – coefficients to be multiplied by the multichannel response. Should be of shape \((B)\). It is necessary for performing non-maxima suppression across different scale pyramid levels. See vlfeat.

Returns

the response map per channel.

Return type

torch.Tensor

Shape:
  • Input: \((B, C, H, W)\)

  • Output: \((B, C, H, W)\)

Examples

>>> input = torch.tensor([[[
    [0., 0., 0., 0., 0., 0., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 0., 0., 0., 0., 0., 0.],
]]])  # 1x1x7x7
>>> # compute the response map
>>> harris_response(input, 0.04)
tensor([[[[0.0012, 0.0039, 0.0020, 0.0000, 0.0020, 0.0039, 0.0012],
  [0.0039, 0.0065, 0.0040, 0.0000, 0.0040, 0.0065, 0.0039],
  [0.0020, 0.0040, 0.0029, 0.0000, 0.0029, 0.0040, 0.0020],
  [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
  [0.0020, 0.0040, 0.0029, 0.0000, 0.0029, 0.0040, 0.0020],
  [0.0039, 0.0065, 0.0040, 0.0000, 0.0040, 0.0065, 0.0039],
  [0.0012, 0.0039, 0.0020, 0.0000, 0.0020, 0.0039, 0.0012]]]])
hessian_response(input: torch.Tensor, grads_mode: str = 'sobel', sigmas: Optional[torch.Tensor] = None) → torch.Tensor[source]

Computes the absolute value of the determinant of the Hessian matrix. Function does not do any normalization or nms. The response map is computed according to the following formulation:

\[R = det(H)\]

where:

\[\begin{split}H = \sum_{(x,y) \in W} \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \\ \end{bmatrix}\end{split}\]
Parameters
  • input (torch.Tensor) – 4d tensor

  • grads_mode (string) – can be ‘sobel’ for standalone use or ‘diff’ for use on Gaussian pyramid

  • sigmas (optional, torch.Tensor) – coefficients to be multiplied by the multichannel response. Should be of shape \((B)\). It is necessary for performing non-maxima suppression across different scale pyramid levels. See vlfeat.

Returns

the response map per channel.

Return type

torch.Tensor

Shape:
  • Input: \((B, C, H, W)\)

  • Output: \((B, C, H, W)\)

Examples

>>> input = torch.tensor([[[
    [0., 0., 0., 0., 0., 0., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 1., 1., 1., 1., 1., 0.],
    [0., 0., 0., 0., 0., 0., 0.],
]]])  # 1x1x7x7
>>> # compute the response map
>>> hessian_response(input)
tensor([[[[0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155],
  [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
  [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
  [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
  [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
  [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
  [0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155]]]])
dog_response(input: torch.Tensor) → torch.Tensor[source]

Computes the Difference-of-Gaussian response given the 5d Gaussian pyramid input.

Parameters

input (torch.Tensor) – 5d tensor

Returns

the response map per channel.

Return type

torch.Tensor

Shape:
  • Input: \((B, C, D, H, W)\)

  • Output: \((B, C, D-1, H, W)\)
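
Example (an illustrative sketch of the expected shapes, assuming a precomputed stack of Gaussian levels along the depth dimension):

>>> import torch
>>> import kornia
>>> gaussians = torch.rand(1, 1, 4, 32, 32)  # (B, C, D, H, W): 4 Gaussian levels
>>> dog = kornia.feature.dog_response(gaussians)  # (1, 1, 3, 32, 32): D-1 difference maps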

Descriptors

class SIFTDescriptor(patch_size: int = 41, num_ang_bins: int = 8, num_spatial_bins: int = 4, rootsift: bool = True, clipval: float = 0.2)[source]

Module, which computes SIFT descriptors of given patches

Parameters
  • patch_size – (int) Input patch size in pixels (41 is default)

  • num_ang_bins – (int) Number of angular bins. (8 is default)

  • num_spatial_bins – (int) Number of spatial bins (4 is default)

  • clipval – (float) clipping value applied to descriptor elements before final renormalization (0.2 is default)

  • rootsift – (bool) if True, RootSIFT (Arandjelović et al., 2012) is computed (True is default)

Returns

SIFT descriptor of the patches

Return type

torch.Tensor

Shape:
  • Input: (B, 1, patch_size, patch_size)

  • Output: (B, num_ang_bins * num_spatial_bins ** 2)

Examples
>>> input = torch.rand(23, 1, 32, 32)
>>> SIFT = kornia.feature.SIFTDescriptor(32, 8, 4)
>>> descs = SIFT(input) # 23x128
class HardNet(pretrained: bool = False)[source]

Module, which computes HardNet descriptors of given grayscale patches of 32x32.

This is based on the original code from paper “Working hard to know your neighbor’s margins: Local descriptor learning loss”. See [MMRM17] for more details.

Parameters

pretrained – (bool) Download and set pretrained weights to the model. Default: False.

Returns

HardNet descriptor of the patches.

Return type

torch.Tensor

Shape:
  • Input: (B, 1, 32, 32)

  • Output: (B, 128)

Examples

>>> input = torch.rand(16, 1, 32, 32)
>>> hardnet = kornia.feature.HardNet()
>>> descs = hardnet(input) # 16x128
class SOSNet(pretrained: bool = False)[source]

128-dimensional SOSNet model definition for 32x32 patches.

This is based on the original code from paper “SOSNet: Second Order Similarity Regularization for Local Descriptor Learning”.

Parameters

pretrained (bool) – Download and set pretrained weights to the model. Default: False.

Shape:
  • Input: (B, 1, 32, 32)

  • Output: (B, 128)

Examples

>>> input = torch.rand(8, 1, 32, 32)
>>> sosnet = kornia.feature.SOSNet()
>>> descs = sosnet(input) # 8x128

Matching

match_nn(desc1: torch.Tensor, desc2: torch.Tensor, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor][source]

Function, which finds nearest neighbors in desc2 for each vector in desc1.

If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.

Parameters
  • desc1 (torch.Tensor) – Batch of descriptors of a shape \((B1, D)\).

  • desc2 (torch.Tensor) – Batch of descriptors of a shape \((B2, D)\).

  • dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of \((B1, B2)\).

Returns

  • Descriptor distance of matching descriptors, shape of \((B1, 1)\).

  • Long tensor indexes of matching descriptors in desc1 and desc2, shape of \((B1, 2)\).

Return type

Tuple[torch.Tensor, torch.Tensor]
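
Example (an illustrative sketch with random descriptors, assuming only the signature above):

>>> import torch
>>> import kornia
>>> desc1 = torch.rand(100, 128)  # (B1, D)
>>> desc2 = torch.rand(120, 128)  # (B2, D)
>>> dists, idxs = kornia.feature.match_nn(desc1, desc2)  # dists: (100, 1), idxs: (100, 2)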

match_mnn(desc1: torch.Tensor, desc2: torch.Tensor, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor][source]

Function, which finds mutual nearest neighbors in desc2 for each vector in desc1.

If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.

Parameters
  • desc1 (torch.Tensor) – Batch of descriptors of a shape \((B1, D)\).

  • desc2 (torch.Tensor) – Batch of descriptors of a shape \((B2, D)\).

  • dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of \((B1, B2)\).

Returns

  • Descriptor distance of matching descriptors, shape of \((B3, 1)\).

  • Long tensor indexes of matching descriptors in desc1 and desc2, shape of \((B3, 2)\), where 0 <= B3 <= min(B1, B2)

Return type

Tuple[torch.Tensor, torch.Tensor]

match_snn(desc1: torch.Tensor, desc2: torch.Tensor, th: float = 0.8, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor][source]

Function, which finds nearest neighbors in desc2 for each vector in desc1, which satisfy the first to second nearest neighbor distance ratio <= th.

If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.

Parameters
  • desc1 (torch.Tensor) – Batch of descriptors of a shape \((B1, D)\).

  • desc2 (torch.Tensor) – Batch of descriptors of a shape \((B2, D)\).

  • th (float) – distance ratio threshold.

  • dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of \((B1, B2)\).

Returns

  • Descriptor distance of matching descriptors, shape of \((B3, 1)\).

  • Long tensor indexes of matching descriptors in desc1 and desc2. Shape: \((B3, 2)\), where 0 <= B3 <= B1.

Return type

Tuple[torch.Tensor, torch.Tensor]

match_smnn(desc1: torch.Tensor, desc2: torch.Tensor, th: float = 0.8, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor][source]

Function, which finds mutual nearest neighbors in desc2 for each vector in desc1, which satisfy the first to second nearest neighbor distance ratio <= th.

If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.

Parameters
  • desc1 (torch.Tensor) – Batch of descriptors of a shape \((B1, D)\).

  • desc2 (torch.Tensor) – Batch of descriptors of a shape \((B2, D)\).

  • th (float) – distance ratio threshold.

  • dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of \((B1, B2)\).

Returns

  • Descriptor distance of matching descriptors, shape of \((B3, 1)\).

  • Long tensor indexes of matching descriptors in desc1 and desc2, shape of \((B3, 2)\) where 0 <= B3 <= B1.

Return type

Tuple[torch.Tensor, torch.Tensor]
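
Example (an illustrative sketch with random descriptors, assuming only the signature above):

>>> import torch
>>> import kornia
>>> desc1 = torch.rand(100, 128)  # (B1, D)
>>> desc2 = torch.rand(120, 128)  # (B2, D)
>>> dists, idxs = kornia.feature.match_smnn(desc1, desc2, th=0.8)
>>> # idxs[:, 0] indexes desc1, idxs[:, 1] indexes desc2; only mutual matches passing the ratio test are kept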

Local Affine Frames (LAF)

extract_patches_from_pyramid(img: torch.Tensor, laf: torch.Tensor, PS: int = 32, normalize_lafs_before_extraction: bool = True) → torch.Tensor[source]

Extract patches defined by LAFs from image tensor. Patches are extracted from the appropriate pyramid level.

Parameters
  • img – (torch.Tensor) images, in which LAFs are detected

  • laf – (torch.Tensor) local affine frames \((B, N, 2, 3)\)

  • PS – (int) patch size, default = 32

  • normalize_lafs_before_extraction (bool) – if True, lafs are normalized to image size, default = True

Returns

(torch.Tensor) \((B, N, CH, PS,PS)\)

Return type

patches
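
Example (an illustrative sketch, assuming only the signature above):

>>> import torch
>>> import kornia
>>> img = torch.rand(1, 1, 64, 64)  # (B, CH, H, W)
>>> laf = torch.tensor([[[[8., 0., 32.], [0., 8., 32.]]]])  # (B, N, 2, 3): one LAF centered at (32, 32)
>>> patches = kornia.feature.extract_patches_from_pyramid(img, laf, PS=32)  # (1, 1, 1, 32, 32)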

extract_patches_simple(img: torch.Tensor, laf: torch.Tensor, PS: int = 32, normalize_lafs_before_extraction: bool = True) → torch.Tensor[source]

Extract patches defined by LAFs from image tensor. No smoothing is applied, so the result can be heavily aliased (better use extract_patches_from_pyramid).

Parameters
  • img – (torch.Tensor) images, in which LAFs are detected

  • laf – (torch.Tensor).

  • PS – (int) patch size, default = 32

  • normalize_lafs_before_extraction (bool) – if True, lafs are normalized to image size, default = True

Returns

(torch.Tensor) \((B, N, CH, PS,PS)\)

Return type

patches

normalize_laf(LAF: torch.Tensor, images: torch.Tensor) → torch.Tensor[source]
Normalizes LAFs to [0, 1] scale from pixel scale. See below:
>>> B, CH, H, W = images.size()
>>> MIN_SIZE = min(H, W)
[a11 a12 x]
[a21 a22 y]
becomes:
[a11/MIN_SIZE a12/MIN_SIZE x/W]
[a21/MIN_SIZE a22/MIN_SIZE y/H]
Parameters
  • LAF – (torch.Tensor).

  • images – (torch.Tensor) images, in which LAFs are detected

Returns

(torch.Tensor).

Return type

LAF

Shape:
  • Input: \((B, N, 2, 3)\)

  • Output: \((B, N, 2, 3)\)

denormalize_laf(LAF: torch.Tensor, images: torch.Tensor) → torch.Tensor[source]
De-normalizes LAFs from [0, 1] scale to pixel scale. See below:
>>> B, CH, H, W = images.size()
>>> MIN_SIZE = min(H, W)
[a11 a12 x]
[a21 a22 y]
becomes:
[a11*MIN_SIZE a12*MIN_SIZE x*W]
[a21*MIN_SIZE a22*MIN_SIZE y*H]
Parameters
  • LAF – (torch.Tensor).

  • images – (torch.Tensor) images, in which LAFs are detected

Returns

(torch.Tensor).

Return type

LAF

Shape:
  • Input: \((B, N, 2, 3)\)

  • Output: \((B, N, 2, 3)\)
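
Example (an illustrative round-trip sketch, assuming only the signatures above):

>>> import torch
>>> import kornia
>>> images = torch.rand(1, 1, 48, 64)  # (B, CH, H, W)
>>> laf = torch.tensor([[[[8., 0., 20.], [0., 8., 20.]]]])  # pixel-scale LAF, (B, N, 2, 3)
>>> laf_norm = kornia.feature.normalize_laf(laf, images)  # entries rescaled to [0, 1]
>>> laf_back = kornia.feature.denormalize_laf(laf_norm, images)  # recovers the pixel-scale LAF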

laf_to_boundary_points(LAF: torch.Tensor, n_pts: int = 50) → torch.Tensor[source]

Converts LAFs to boundary points of the regions + center. Used for local feature visualization; see the visualize_laf function.

Parameters
  • LAF – (torch.Tensor).

  • n_pts – number of points to output

Returns

(torch.Tensor) tensor of boundary points

Return type

pts

Shape:
  • Input: \((B, N, 2, 3)\)

  • Output: \((B, N, n_pts, 2)\)

ellipse_to_laf(ells: torch.Tensor) → torch.Tensor[source]

Converts ellipse regions to LAF format. Ellipse (a, b, c) and upright covariance matrix [a11 a12; 0 a22] are connected by the inverse matrix square root: A = invsqrt([a b; b c]). See also https://github.com/vlfeat/vlfeat/blob/master/toolbox/sift/vl_frame2oell.m

Parameters

ells – (torch.Tensor): tensor of ellipses in Oxford format [x y a b c].

Returns

(torch.Tensor) tensor of ellipses in LAF format.

Return type

LAF

Shape:
  • Input: \((B, N, 5)\)

  • Output: \((B, N, 2, 3)\)

Example

>>> input = torch.ones(1, 10, 5)  # BxNx5
>>> output = kornia.ellipse_to_laf(input)  #  BxNx2x3
make_upright(laf: torch.Tensor, eps: float = 1e-09) → torch.Tensor[source]

Rectifies the affine matrix, so that it becomes upright

Parameters
  • laf – (torch.Tensor): tensor of LAFs.

  • eps (float) – for safe division, (default 1e-9)

Returns

tensor of same shape.

Return type

torch.Tensor

Shape:
  • Input: \((B, N, 2, 3)\)

  • Output: \((B, N, 2, 3)\)

Example

>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = kornia.make_upright(input)  #  BxNx2x3
scale_laf(laf: torch.Tensor, scale_coef: Union[float, torch.Tensor]) → torch.Tensor[source]

Multiplies the region part of the LAF ([:, :, :2, :2]) by scale_coef, so the center, shape and orientation of the local feature stay the same, but the region area changes.

Parameters
  • laf – (torch.Tensor): tensor [BxNx2x3] or [BxNx2x2].

  • scale_coef – (torch.Tensor): broadcastable tensor or float.

Returns

tensor BxNx2x3 .

Return type

torch.Tensor

Shape:
  • Input: \((B, N, 2, 3)\)

  • Input: \((B, N)\) or \(()\)

  • Output: \((B, N, 2, 3)\)

Example

>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> scale = 0.5
>>> output = kornia.scale_laf(input, scale)  # BxNx2x3
get_laf_scale(LAF: torch.Tensor) → torch.Tensor[source]

Returns a scale of the LAFs

Parameters

LAF – (torch.Tensor): tensor [BxNx2x3] or [BxNx2x2].

Returns

tensor BxNx1x1 .

Return type

torch.Tensor

Shape:
  • Input: \((B, N, 2, 3)\)

  • Output: \((B, N, 1, 1)\)

Example

>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = kornia.get_laf_scale(input)  # BxNx1x1
get_laf_center(LAF: torch.Tensor) → torch.Tensor[source]

Returns a center (keypoint) of the LAFs

Parameters

LAF – (torch.Tensor): tensor [BxNx2x3].

Returns

tensor BxNx2 .

Return type

torch.Tensor

Shape:
  • Input: \((B, N, 2, 3)\)

  • Output: \((B, N, 2)\)

Example

>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = kornia.get_laf_center(input)  # BxNx2
get_laf_orientation(LAF: torch.Tensor) → torch.Tensor[source]

Returns orientation of the LAFs, in degrees.

Parameters

LAF – (torch.Tensor): tensor [BxNx2x3].

Returns

tensor BxNx1 .

Return type

torch.Tensor

Shape:
  • Input: \((B, N, 2, 3)\)

  • Output: \((B, N, 1)\)

Example

>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = kornia.get_laf_orientation(input)  # BxNx1
laf_from_center_scale_ori(xy: torch.Tensor, scale: torch.Tensor, ori: torch.Tensor) → torch.Tensor[source]

Creates LAFs from keypoint centers, scales and orientations. Useful to create kornia LAFs from OpenCV keypoints.

Parameters
  • xy – (torch.Tensor): tensor [BxNx2].

  • scale – (torch.Tensor): tensor [BxNx1x1].

  • ori – (torch.Tensor): tensor [BxNx1].

Returns

tensor BxNx2x3 .

Return type

torch.Tensor
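
Example (an illustrative sketch, assuming only the shapes listed above):

>>> import torch
>>> import kornia
>>> xy = torch.rand(1, 10, 2) * 100.  # (B, N, 2) keypoint centers
>>> scale = torch.rand(1, 10, 1, 1) * 10.  # (B, N, 1, 1) keypoint scales
>>> ori = torch.zeros(1, 10, 1)  # (B, N, 1) orientations
>>> laf = kornia.feature.laf_from_center_scale_ori(xy, scale, ori)  # (1, 10, 2, 3)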

laf_is_inside_image(laf: torch.Tensor, images: torch.Tensor, border: int = 0) → torch.Tensor[source]

Checks if the LAF is touching or partly outside the image boundary. Returns the mask of LAFs, which are fully inside the image, i.e. valid.

Parameters
  • laf (torch.Tensor) – \((B, N, 2, 3)\)

  • images (torch.Tensor) – images, in which lafs are detected, \((B, CH, H, W)\)

  • border (int) – additional border

Returns

\((B, N)\)

Return type

mask (torch.Tensor)
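
Example (an illustrative sketch, assuming only the signature above):

>>> import torch
>>> import kornia
>>> images = torch.rand(1, 1, 64, 64)  # (B, CH, H, W)
>>> laf = torch.tensor([[[[4., 0., 32.], [0., 4., 32.]], [[40., 0., 5.], [0., 40., 5.]]]])  # (1, 2, 2, 3)
>>> mask = kornia.feature.laf_is_inside_image(laf, images)  # first LAF fits inside, second crosses the border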

laf_to_three_points(laf: torch.Tensor)[source]

Converts local affine frame (LAF) to an alternative representation: coordinates of the LAF center, the LAF-x unit vector, and the LAF-y unit vector.

Parameters

laf (torch.Tensor) – \((B, N, 2, 3)\)

Returns

\((B, N, 2, 3)\)

Return type

threepts (torch.Tensor)

laf_from_three_points(threepts: torch.Tensor)[source]

Converts three points to local affine frame. Order is (0,0), (0, 1), (1, 0).

Parameters

threepts (torch.Tensor) – \((B, N, 2, 3)\)

Returns

\((B, N, 2, 3)\)

Return type

laf (torch.Tensor)
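
Example (an illustrative round-trip sketch between the two representations):

>>> import torch
>>> import kornia
>>> laf = torch.tensor([[[[8., 0., 16.], [0., 8., 16.]]]])  # (B, N, 2, 3)
>>> threepts = kornia.feature.laf_to_three_points(laf)  # (B, N, 2, 3)
>>> laf_back = kornia.feature.laf_from_three_points(threepts)  # recovers the original LAF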

raise_error_if_laf_is_not_valid(laf: torch.Tensor) → None[source]

Auxiliary function, which verifies that the input is a torch.Tensor of [BxNx2x3] shape

Parameters

laf (torch.Tensor) – tensor to check

Module

class NonMaximaSuppression2d(kernel_size: Tuple[int, int])[source]

Applies non-maxima suppression to the input.

class NonMaximaSuppression3d(kernel_size: Tuple[int, int, int])[source]

Applies non-maxima suppression to the input.

class BlobHessian(grads_mode='sobel')[source]

nn.Module that calculates Hessian blobs. See hessian_response() for details.

class CornerGFTT(grads_mode='sobel')[source]

nn.Module that calculates Shi-Tomasi corners. See gftt_response() for details.

class CornerHarris(k: Union[float, torch.Tensor], grads_mode='sobel')[source]

nn.Module that calculates Harris corners. See harris_response() for details.

class BlobDoG[source]

nn.Module that calculates Difference-of-Gaussians blobs. See dog_response() for details.

class ScaleSpaceDetector(num_features: int = 500, mr_size: float = 6.0, scale_pyr_module: torch.nn.modules.module.Module = ScalePyramid(n_levels=3, init_sigma=1.6, min_size=15, extra_levels=3, border=6, sigma_step=1.2599210498948732, double_image=False), resp_module: torch.nn.modules.module.Module = BlobHessian(grads_mode=sobel), nms_module: torch.nn.modules.module.Module = ConvSoftArgmax3d(kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), temperature=tensor(1.), normalized_coordinates=False, eps=1e-08, strict_maxima_bonus=0.0, output_value=True), ori_module: torch.nn.modules.module.Module = PassLAF(), aff_module: torch.nn.modules.module.Module = PassLAF(), minima_are_also_good: bool = False, scale_space_response=False)[source]
Module for differentiable local feature detection, as close as possible to classical local feature detectors like Harris, Hessian-Affine or SIFT (DoG).

It has 5 modules inside: scale pyramid generator, response (“cornerness”) function, soft nms function, affine shape estimator and patch orientation estimator. Each of those modules can be replaced with a custom learned one, as long as it respects the output shape.

Parameters
  • num_features – (int) Number of features to detect, default = 500. In order to keep everything batchable, the output always contains num_features detections, even for completely homogeneous images.

  • mr_size – (float), default 6.0. Multiplier for local feature scale compared to the detection scale. 6.0 matches the OpenCV 12.0 convention for SIFT.

  • scale_pyr_module – (nn.Module), which generates scale pyramid. See ScalePyramid for details. Default is ScalePyramid(3, 1.6, 10)

  • resp_module – (nn.Module), which calculates ‘cornerness’ of the pixel. Default is BlobHessian().

  • nms_module – (nn.Module), which outputs per-patch coordinates of the response maxima. See ConvSoftArgmax3d for details.

  • ori_module – (nn.Module) for local feature orientation estimation. Default is PassLAF, which does nothing. See LAFOrienter for details.

  • aff_module – (nn.Module) for local feature affine shape estimation. Default is PassLAF, which does nothing. See LAFAffineShapeEstimator for details.

  • minima_are_also_good – (bool) if True, then both response function minima and maxima are detected. Useful for symmetric response functions like DoG or Hessian. Default is False.

forward(img: torch.Tensor, mask: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor][source]

Three-stage local feature detection. First, the location and scale of interest points are determined by the detect function. Then affine shape and orientation are estimated.

Parameters
  • img (torch.Tensor) – image to extract features with shape [BxCxHxW]

  • mask (torch.Tensor, optional) – a mask with weights, where to apply the response function. The shape must be the same as the input image.

Returns

  • lafs (torch.Tensor) – detected local affine frames, shape [BxNx2x3].

  • responses (torch.Tensor) – response function values for the corresponding lafs, shape [BxNx1].

Return type

Tuple[torch.Tensor, torch.Tensor]
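
Example (an illustrative sketch with the default configuration, assuming only the signatures above):

>>> import torch
>>> import kornia
>>> img = torch.rand(1, 1, 64, 64)  # (B, C, H, W)
>>> detector = kornia.feature.ScaleSpaceDetector(num_features=100)
>>> lafs, responses = detector(img)  # (1, 100, 2, 3), (1, 100, 1)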

class PassLAF[source]

Dummy module to use instead of local feature orientation or affine shape estimator

forward(laf: torch.Tensor, img: torch.Tensor) → torch.Tensor[source]
Parameters
  • laf (torch.Tensor) – 4d tensor of LAFs, shape [BxNx2x3]

  • img (torch.Tensor) – the input image tensor

Returns

unchanged laf from the input.

Return type

torch.Tensor

class PatchAffineShapeEstimator(patch_size: int = 19, eps: float = 1e-10)[source]

Module, which estimates the second moment matrix of the patch gradients in order to determine the affine shape of the local feature as in [Baumberg00].

Parameters
  • patch_size – int, default = 19

  • eps – float, for safe division, default is 1e-10

forward(patch: torch.Tensor) → torch.Tensor[source]
Parameters

patch – (torch.Tensor) shape [Bx1xHxW]

Returns

3d tensor, shape [Bx1x5]

Return type

ellipse_shape
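
Example (an illustrative sketch, assuming only the shapes listed above):

>>> import torch
>>> import kornia
>>> patches = torch.rand(10, 1, 19, 19)  # (B, 1, H, W)
>>> estimator = kornia.feature.PatchAffineShapeEstimator(patch_size=19)
>>> ellipses = estimator(patches)  # (10, 1, 5) ellipse parameters per patch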

class LAFAffineShapeEstimator(patch_size: int = 32)[source]

Module, which extracts patches using input images and local affine frames (LAFs), then runs PatchAffineShapeEstimator on patches to estimate LAFs shape. Then the original LAF shape is replaced with the estimated one. The original LAF orientation is not preserved, so it is recommended to first run LAFAffineShapeEstimator and then LAFOrienter.

Parameters

patch_size – int, default = 32

forward(laf: torch.Tensor, img: torch.Tensor) → torch.Tensor[source]
Parameters
  • laf – (torch.Tensor) shape [BxNx2x3]

  • img – (torch.Tensor) shape [Bx1xHxW]

Returns

(torch.Tensor) shape [BxNx2x3]

Return type

laf_out

class LAFOrienter(patch_size: int = 32, num_angular_bins: int = 36)[source]

Module, which extracts patches using input images and local affine frames (LAFs), then runs PatchDominantGradientOrientation on patches and then rotates the LAFs by the estimated angles

Parameters
  • patch_size – int, default = 32

  • num_angular_bins – int, default is 36

forward(laf: torch.Tensor, img: torch.Tensor) → torch.Tensor[source]
Parameters
  • laf – (torch.Tensor), shape [BxNx2x3]

  • img – (torch.Tensor), shape [Bx1xHxW]

Returns

(torch.Tensor), shape [BxNx2x3]

Return type

laf_out
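
Example (an illustrative sketch combining LAFAffineShapeEstimator and LAFOrienter in the recommended order):

>>> import torch
>>> import kornia
>>> img = torch.rand(1, 1, 64, 64)  # (B, 1, H, W)
>>> laf = torch.tensor([[[[8., 0., 32.], [0., 8., 32.]]]])  # (B, N, 2, 3)
>>> shape_estimator = kornia.feature.LAFAffineShapeEstimator(patch_size=32)
>>> orienter = kornia.feature.LAFOrienter(patch_size=32)
>>> laf_shaped = shape_estimator(laf, img)  # affine shape re-estimated from the image
>>> laf_oriented = orienter(laf_shaped, img)  # orientation re-estimated afterwards, (1, 1, 2, 3)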

class PatchDominantGradientOrientation(patch_size: int = 32, num_angular_bins: int = 36, eps: float = 1e-08)[source]

Module, which estimates the dominant gradient orientation of the given patches, in radians. Zero angle points towards right.

Parameters
  • patch_size – int, default = 32

  • num_angular_bins – int, default is 36

  • eps – float, for safe division, and arctan, default is 1e-8

forward(patch: torch.Tensor) → torch.Tensor[source]
Parameters

patch – (torch.Tensor) shape [Bx1xHxW]

Returns

(torch.Tensor) shape [Bx1]

Return type

angle
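
Example (an illustrative sketch, assuming only the shapes listed above):

>>> import torch
>>> import kornia
>>> patches = torch.rand(16, 1, 32, 32)  # (B, 1, H, W)
>>> ori = kornia.feature.PatchDominantGradientOrientation(patch_size=32)
>>> angles = ori(patches)  # dominant gradient orientation per patch, in radians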

MMRM17

Anastasiya Mishchuk, Dmytro Mishkin, Filip Radenovic, and Jiri Matas. Working hard to know your neighbor’s margins: local descriptor learning loss. In Proceedings of NIPS. 2017.

Zha19

Richard Zhang. Making convolutional networks shift-invariant again. In ICML. 2019.

Baumberg00

A. Baumberg. Reliable feature matching across widely separated views. In CVPR. 2000.