kornia.feature
Non Maxima Suppression
-
non_maxima_suppression2d
(input: torch.Tensor, kernel_size: Tuple[int, int], mask_only: bool = False) → torch.Tensor
Applies non-maxima suppression to the input. See NonMaximaSuppression2d for details.
-
non_maxima_suppression3d
(input: torch.Tensor, kernel_size: Tuple[int, int, int], mask_only: bool = False) → torch.Tensor
Applies non-maxima suppression to the input. See NonMaximaSuppression3d for details.
-
nms2d
(input: torch.Tensor, kernel_size: Tuple[int, int], mask_only: bool = False) → torch.Tensor
Applies non-maxima suppression to the input. See NonMaximaSuppression2d for details.
-
nms3d
(input: torch.Tensor, kernel_size: Tuple[int, int, int], mask_only: bool = False) → torch.Tensor
Applies non-maxima suppression to the input. See NonMaximaSuppression3d for details.
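A minimal usage sketch (not part of the original docstrings), assuming the functions are imported from kornia.feature and applied to an arbitrary response map:
>>> resp = torch.rand(1, 1, 7, 7)               # e.g. a corner response map (illustrative values)
>>> peaks = nms2d(resp, (3, 3))                 # non-maximal values suppressed, same shape as resp
>>> mask = nms2d(resp, (3, 3), mask_only=True)  # mask marking the local maxima instead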
Detectors
-
gftt_response
(input: torch.Tensor, grads_mode: str = 'sobel', sigmas: Optional[torch.Tensor] = None) → torch.Tensor
Computes the Shi-Tomasi cornerness function. The function does not do any normalization or NMS. The response map is computed according to the following formulation:
\[R = \min(\mathrm{eig}(M))\]
where:
\[\begin{split}M = \sum_{(x,y) \in W} \begin{bmatrix} I^{2}_x & I_x I_y \\ I_x I_y & I^{2}_y \\ \end{bmatrix}\end{split}\]
- Parameters
input (torch.Tensor) – 4d tensor
grads_mode (string) – can be ‘sobel’ for standalone use or ‘diff’ for use on Gaussian pyramid
sigmas (optional, torch.Tensor) – coefficients to be multiplied by the multichannel response. Should be of shape \((B)\). It is necessary for performing non-maxima suppression across different scale pyramid levels. See vlfeat.
- Returns
the response map per channel.
- Return type
torch.Tensor
- Shape:
Input: \((B, C, H, W)\)
Output: \((B, C, H, W)\)
Examples
>>> input = torch.tensor([[[
...     [0., 0., 0., 0., 0., 0., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 0., 0., 0., 0., 0., 0.],
... ]]])  # 1x1x7x7
>>> # compute the response map
>>> gftt_response(input)
tensor([[[[0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155],
          [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
          [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
          [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
          [0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155]]]])
-
harris_response
(input: torch.Tensor, k: Union[torch.Tensor, float] = 0.04, grads_mode: str = 'sobel', sigmas: Optional[torch.Tensor] = None) → torch.Tensor
Computes the Harris cornerness function. The function does not do any normalization or NMS. The response map is computed according to the following formulation:
\[R = \max(0, \det(M) - k \cdot \mathrm{trace}(M)^2)\]
where:
\[\begin{split}M = \sum_{(x,y) \in W} \begin{bmatrix} I^{2}_x & I_x I_y \\ I_x I_y & I^{2}_y \\ \end{bmatrix}\end{split}\]
and \(k\) is an empirically determined constant \(k \in [0.04, 0.06]\)
- Parameters
input (torch.Tensor) – 4d tensor
k (torch.Tensor) – the Harris detector free parameter.
grads_mode (string) – can be ‘sobel’ for standalone use or ‘diff’ for use on Gaussian pyramid
sigmas (optional, torch.Tensor) – coefficients to be multiplied by the multichannel response. Should be of shape \((B)\). It is necessary for performing non-maxima suppression across different scale pyramid levels. See vlfeat.
- Returns
the response map per channel.
- Return type
torch.Tensor
- Shape:
Input: \((B, C, H, W)\)
Output: \((B, C, H, W)\)
Examples
>>> input = torch.tensor([[[
...     [0., 0., 0., 0., 0., 0., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 0., 0., 0., 0., 0., 0.],
... ]]])  # 1x1x7x7
>>> # compute the response map
>>> harris_response(input, 0.04)
tensor([[[[0.0012, 0.0039, 0.0020, 0.0000, 0.0020, 0.0039, 0.0012],
          [0.0039, 0.0065, 0.0040, 0.0000, 0.0040, 0.0065, 0.0039],
          [0.0020, 0.0040, 0.0029, 0.0000, 0.0029, 0.0040, 0.0020],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [0.0020, 0.0040, 0.0029, 0.0000, 0.0029, 0.0040, 0.0020],
          [0.0039, 0.0065, 0.0040, 0.0000, 0.0040, 0.0065, 0.0039],
          [0.0012, 0.0039, 0.0020, 0.0000, 0.0020, 0.0039, 0.0012]]]])
-
hessian_response
(input: torch.Tensor, grads_mode: str = 'sobel', sigmas: Optional[torch.Tensor] = None) → torch.Tensor
Computes the absolute value of the determinant of the Hessian matrix. The function does not do any normalization or NMS. The response map is computed according to the following formulation:
\[R = \det(H)\]
where:
\[\begin{split}H = \sum_{(x,y) \in W} \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \\ \end{bmatrix}\end{split}\]
- Parameters
input (torch.Tensor) – 4d tensor
grads_mode (string) – can be ‘sobel’ for standalone use or ‘diff’ for use on Gaussian pyramid
sigmas (optional, torch.Tensor) – coefficients to be multiplied by the multichannel response. Should be of shape \((B)\). It is necessary for performing non-maxima suppression across different scale pyramid levels. See vlfeat.
- Returns
the response map per channel.
- Return type
torch.Tensor
- Shape:
Input: \((B, C, H, W)\)
Output: \((B, C, H, W)\)
Examples
>>> input = torch.tensor([[[
...     [0., 0., 0., 0., 0., 0., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 1., 1., 1., 1., 1., 0.],
...     [0., 0., 0., 0., 0., 0., 0.],
... ]]])  # 1x1x7x7
>>> # compute the response map
>>> hessian_response(input)
tensor([[[[0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155],
          [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
          [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
          [0.0194, 0.0339, 0.0497, 0.0000, 0.0497, 0.0339, 0.0194],
          [0.0334, 0.0575, 0.0339, 0.0000, 0.0339, 0.0575, 0.0334],
          [0.0155, 0.0334, 0.0194, 0.0000, 0.0194, 0.0334, 0.0155]]]])
Descriptors
-
class
SIFTDescriptor
(patch_size: int = 41, num_ang_bins: int = 8, num_spatial_bins: int = 4, rootsift: bool = True, clipval: float = 0.2)
Module that computes SIFT descriptors of given patches.
- Parameters
patch_size – (int) Input patch size in pixels (41 is default)
num_ang_bins – (int) Number of angular bins. (8 is default)
num_spatial_bins – (int) Number of spatial bins (4 is default)
clipval – (float) clipping value of the descriptor elements (0.2 is default)
rootsift – (bool) if True, RootSIFT (Arandjelović et al., 2012) is computed
- Returns
SIFT descriptor of the patches
- Return type
torch.Tensor
- Shape:
Input: (B, 1, patch_size, patch_size)
Output: (B, num_ang_bins * num_spatial_bins ** 2)
Examples
>>> input = torch.rand(23, 1, 32, 32)
>>> SIFT = SIFTDescriptor(32, 8, 4)
>>> descs = SIFT(input)  # 23x128
-
class
MKDDescriptor
(patch_size: int = 32, kernel_type: str = 'concat', whitening: str = 'pcawt', training_set: str = 'liberty', output_dims: int = 128)
Module that computes Multiple Kernel local descriptors.
This is based on the paper “Understanding and Improving Kernel Local Descriptors”. See [MTB+19] for more details.
- Parameters
patch_size – (int) Input patch size in pixels (32 is default).
kernel_type – (str) Parametrization of kernel ‘concat’, ‘cart’, ‘polar’ (‘concat’ is default).
whitening – (str) Whitening transform to apply None, ‘lw’, ‘pca’, ‘pcawt’, ‘pcaws’ (‘pcawt’ is default).
training_set – (str) Set that model was trained on ‘liberty’, ‘notredame’, ‘yosemite’ (‘liberty’ is default).
output_dims – (int) Dimensionality reduction (128 is default).
- Returns
MKD descriptor of the patches.
- Return type
torch.Tensor
- Shape:
Input: \((B, 1, patch_size, patch_size)\)
Output: \((B, output_dims)\)
Examples
>>> patches = torch.rand(23, 1, 32, 32)
>>> mkd = MKDDescriptor(patch_size=32,
...                     kernel_type='concat',
...                     whitening='pcawt',
...                     training_set='liberty',
...                     output_dims=128)
>>> desc = mkd(patches)  # 23x128
-
class
HardNet
(pretrained: bool = False)
Module that computes HardNet descriptors of given grayscale 32x32 patches.
This is based on the original code from the paper “Working hard to know your neighbor’s margins: Local descriptor learning loss”. See [MMRM17] for more details.
- Parameters
pretrained – (bool) Download and set pretrained weights to the model. Default: false.
- Returns
HardNet descriptor of the patches.
- Return type
torch.Tensor
- Shape:
Input: (B, 1, 32, 32)
Output: (B, 128)
Examples
>>> input = torch.rand(16, 1, 32, 32)
>>> hardnet = HardNet()
>>> descs = hardnet(input)  # 16x128
-
class
TFeat
(pretrained: bool = False)
Module that computes TFeat descriptors of given grayscale 32x32 patches.
This is based on the original code from the paper “Learning local feature descriptors with triplets and shallow convolutional neural networks”. See [BRPM16] for more details.
- Parameters
pretrained – (bool) Download and set pretrained weights to the model. Default: false.
- Returns
TFeat descriptor of the patches.
- Return type
torch.Tensor
- Shape:
Input: (B, 1, 32, 32)
Output: (B, 128)
Examples
>>> input = torch.rand(16, 1, 32, 32)
>>> tfeat = TFeat()
>>> descs = tfeat(input)  # 16x128
-
class
SOSNet
(pretrained: bool = False)
128-dimensional SOSNet model definition for 32x32 patches.
This is based on the original code from the paper “SOSNet: Second Order Similarity Regularization for Local Descriptor Learning”.
- Parameters
pretrained (bool) – Download and set pretrained weights to the model. Default: false.
- Shape:
Input: (B, 1, 32, 32)
Output: (B, 128)
Examples
>>> input = torch.rand(8, 1, 32, 32)
>>> sosnet = SOSNet()
>>> descs = sosnet(input)  # 8x128
Matching
-
match_nn
(desc1: torch.Tensor, desc2: torch.Tensor, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]
Function that finds the nearest neighbors in desc2 for each vector in desc1.
If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.
- Parameters
desc1 (torch.Tensor) – Batch of descriptors of a shape \((B1, D)\).
desc2 (torch.Tensor) – Batch of descriptors of a shape \((B2, D)\).
dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of \((B1, B2)\).
- Returns
Descriptor distance of matching descriptors, shape of \((B1, 1)\).
Long tensor indexes of matching descriptors in desc1 and desc2, shape of \((B1, 2)\).
- Return type
Tuple[torch.Tensor, torch.Tensor]
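An illustrative sketch (not from the original docs) with random descriptors, showing the output shapes:
>>> desc1 = torch.rand(100, 128)  # illustrative random descriptors
>>> desc2 = torch.rand(50, 128)
>>> dists, idxs = match_nn(desc1, desc2)  # dists: (100, 1), idxs: (100, 2)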
-
match_mnn
(desc1: torch.Tensor, desc2: torch.Tensor, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]
Function that finds the mutual nearest neighbors in desc2 for each vector in desc1.
If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.
- Parameters
desc1 (torch.Tensor) – Batch of descriptors of a shape \((B1, D)\).
desc2 (torch.Tensor) – Batch of descriptors of a shape \((B2, D)\).
dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of \((B1, B2)\).
- Returns
Descriptor distance of matching descriptors, shape of \((B3, 1)\).
Long tensor indexes of matching descriptors in desc1 and desc2, shape of \((B3, 2)\), where 0 <= B3 <= min(B1, B2).
- Return type
Tuple[torch.Tensor, torch.Tensor]
-
match_snn
(desc1: torch.Tensor, desc2: torch.Tensor, th: float = 0.8, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]
Function that finds the nearest neighbors in desc2 for each vector in desc1, which satisfy the first to second nearest neighbor distance ratio <= th.
If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.
- Parameters
desc1 (torch.Tensor) – Batch of descriptors of a shape \((B1, D)\).
desc2 (torch.Tensor) – Batch of descriptors of a shape \((B2, D)\).
th (float) – distance ratio threshold.
dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of \((B1, B2)\).
- Returns
Descriptor distance of matching descriptors, shape of \((B3, 1)\).
Long tensor indexes of matching descriptors in desc1 and desc2. Shape: \((B3, 2)\), where 0 <= B3 <= B1.
- Return type
Tuple[torch.Tensor, torch.Tensor]
-
match_smnn
(desc1: torch.Tensor, desc2: torch.Tensor, th: float = 0.8, dm: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]
Function that finds the mutual nearest neighbors in desc2 for each vector in desc1, which satisfy the first to second nearest neighbor distance ratio <= th.
If the distance matrix dm is not provided, torch.cdist(desc1, desc2) is used.
- Parameters
desc1 (torch.Tensor) – Batch of descriptors of a shape \((B1, D)\).
desc2 (torch.Tensor) – Batch of descriptors of a shape \((B2, D)\).
th (float) – distance ratio threshold.
dm (torch.Tensor, optional) – Tensor containing the distances from each descriptor in desc1 to each descriptor in desc2, shape of \((B1, B2)\).
- Returns
Descriptor distance of matching descriptors, shape of \((B3, 1)\).
Long tensor indexes of matching descriptors in desc1 and desc2, shape of \((B3, 2)\), where 0 <= B3 <= B1.
- Return type
Tuple[torch.Tensor, torch.Tensor]
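An illustrative sketch (not from the original docs) with random descriptors; since matches must be mutual and pass the ratio test, the number of returned rows can be anywhere between 0 and min(B1, B2):
>>> desc1 = torch.rand(100, 128)  # illustrative random descriptors
>>> desc2 = torch.rand(50, 128)
>>> dists, idxs = match_smnn(desc1, desc2, th=0.8)
>>> dists.shape[1], idxs.shape[1]  # one distance and one (desc1, desc2) index pair per match
(1, 2)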
Local Affine Frames (LAF)
-
extract_patches_from_pyramid
(img: torch.Tensor, laf: torch.Tensor, PS: int = 32, normalize_lafs_before_extraction: bool = True) → torch.Tensor
Extracts patches defined by LAFs from an image tensor. Patches are extracted from the appropriate pyramid level.
- Parameters
img – (torch.Tensor) images, in which the LAFs are detected
laf – (torch.Tensor).
PS – (int) patch size, default = 32
normalize_lafs_before_extraction (bool) – if True, lafs are normalized to image size, default = True
- Returns
(torch.Tensor) \((B, N, CH, PS, PS)\)
- Return type
patches
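A minimal sketch (not from the original docs) that builds a single LAF with laf_from_center_scale_ori (documented below) and extracts a patch around it:
>>> img = torch.rand(1, 1, 64, 64)  # illustrative random image
>>> laf = laf_from_center_scale_ori(torch.tensor([[[32., 32.]]]),   # keypoint at the image center
...                                 torch.full((1, 1, 1, 1), 8.0),  # scale of 8 pixels
...                                 torch.zeros(1, 1, 1))           # zero orientation
>>> patches = extract_patches_from_pyramid(img, laf, PS=32)         # (1, 1, 1, 32, 32)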
-
extract_patches_simple
(img: torch.Tensor, laf: torch.Tensor, PS: int = 32, normalize_lafs_before_extraction: bool = True) → torch.Tensor
Extracts patches defined by LAFs from an image tensor. No smoothing is applied, so the result suffers from heavy aliasing (better use extract_patches_from_pyramid).
- Parameters
img – (torch.Tensor) images, in which the LAFs are detected
laf – (torch.Tensor).
PS – (int) patch size, default = 32
normalize_lafs_before_extraction (bool) – if True, lafs are normalized to image size, default = True
- Returns
(torch.Tensor) \((B, N, CH, PS, PS)\)
- Return type
patches
-
normalize_laf
(LAF: torch.Tensor, images: torch.Tensor) → torch.Tensor
Normalizes LAFs from pixel scale to the [0, 1] scale. See below:
B, N, H, W = images.size()
MIN_SIZE = min(H, W)
[a11 a12 x]
[a21 a22 y]
becomes:
[a11/MIN_SIZE a12/MIN_SIZE x/W]
[a21/MIN_SIZE a22/MIN_SIZE y/H]
- Parameters
LAF – (torch.Tensor).
images – (torch.Tensor) images, in which the LAFs are detected
- Returns
(torch.Tensor).
- Return type
LAF
- Shape:
Input: \((B, N, 2, 3)\)
Output: \((B, N, 2, 3)\)
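A small sketch (assumed usage, values for illustration only) of converting between pixel-scale and normalized LAFs with normalize_laf and denormalize_laf (documented next):
>>> img = torch.rand(1, 3, 240, 320)                            # B, CH, H, W
>>> laf = torch.tensor([[[[10., 0., 160.], [0., 10., 120.]]]])  # one 10-pixel LAF at the image center
>>> laf_norm = normalize_laf(laf, img)         # scale part divided by min(H, W), center by (W, H)
>>> laf_back = denormalize_laf(laf_norm, img)  # recovers the original pixel-scale LAF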
-
denormalize_laf
(LAF: torch.Tensor, images: torch.Tensor) → torch.Tensor
De-normalizes LAFs from the [0, 1] scale back to the image pixel scale. See below:
B, N, H, W = images.size()
MIN_SIZE = min(H, W)
[a11 a12 x]
[a21 a22 y]
becomes:
[a11*MIN_SIZE a12*MIN_SIZE x*W]
[a21*MIN_SIZE a22*MIN_SIZE y*H]
- Parameters
LAF – (torch.Tensor).
images – (torch.Tensor) images, in which the LAFs are detected
- Returns
(torch.Tensor).
- Return type
LAF
- Shape:
Input: \((B, N, 2, 3)\)
Output: \((B, N, 2, 3)\)
-
laf_to_boundary_points
(LAF: torch.Tensor, n_pts: int = 50) → torch.Tensor
Converts LAFs to boundary points of the regions plus the center. Used for local feature visualization; see the visualize_laf function.
- Parameters
LAF – (torch.Tensor).
n_pts – number of points to output
- Returns
(torch.Tensor) tensor of boundary points
- Return type
pts
- Shape:
Input: \((B, N, 2, 3)\)
Output: \((B, N, n_pts, 2)\)
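A short usage sketch (illustrative values, not from the original docs):
>>> laf = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> pts = laf_to_boundary_points(laf, n_pts=50)  # (1, 5, 50, 2) points suitable for plotting the regions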
-
ellipse_to_laf
(ells: torch.Tensor) → torch.Tensor
Converts ellipse regions to LAF format. The ellipse (a, b, c) and the upright covariance matrix [a11 a12; 0 a22] are connected by the inverse matrix square root: A = invsqrt([a b; b c]). See also https://github.com/vlfeat/vlfeat/blob/master/toolbox/sift/vl_frame2oell.m
- Parameters
ells – (torch.Tensor): tensor of ellipses in Oxford format [x y a b c].
- Returns
(torch.Tensor) tensor of ellipses in LAF format.
- Return type
LAF
- Shape:
Input: \((B, N, 5)\)
Output: \((B, N, 2, 3)\)
Example
>>> input = torch.ones(1, 10, 5)  # BxNx5
>>> output = ellipse_to_laf(input)  # BxNx2x3
-
make_upright
(laf: torch.Tensor, eps: float = 1e-09) → torch.Tensor
Rectifies the affine matrix so that it becomes upright.
- Parameters
laf – (torch.Tensor): tensor of LAFs.
eps (float) – for safe division, (default 1e-9)
- Returns
tensor of same shape.
- Return type
torch.Tensor
- Shape:
Input: \((B, N, 2, 3)\)
Output: \((B, N, 2, 3)\)
Example
>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = make_upright(input)  # BxNx2x3
-
scale_laf
(laf: torch.Tensor, scale_coef: Union[float, torch.Tensor]) → torch.Tensor
Multiplies the region part of the LAF ([:, :, :2, :2]) by scale_coef, so the center, shape and orientation of the local feature stay the same, but the region area changes.
- Parameters
laf – (torch.Tensor): tensor [BxNx2x3] or [BxNx2x2].
scale_coef – (torch.Tensor): broadcastable tensor or float.
- Returns
tensor BxNx2x3 .
- Return type
torch.Tensor
- Shape:
Input: \((B, N, 2, 3)\)
Input: \((B, N)\) or \(()\)
Output: \((B, N, 2, 3)\)
Example
>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> scale = 0.5
>>> output = scale_laf(input, scale)  # BxNx2x3
-
get_laf_scale
(LAF: torch.Tensor) → torch.Tensor
Returns the scale of the LAFs.
- Parameters
LAF – (torch.Tensor): tensor [BxNx2x3] or [BxNx2x2].
- Returns
tensor BxNx1x1 .
- Return type
torch.Tensor
- Shape:
Input: \((B, N, 2, 3)\)
Output: \((B, N, 1, 1)\)
Example
>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = get_laf_scale(input)  # BxNx1x1
-
get_laf_center
(LAF: torch.Tensor) → torch.Tensor
Returns the centers (keypoints) of the LAFs.
- Parameters
LAF – (torch.Tensor): tensor [BxNx2x3].
- Returns
tensor BxNx2 .
- Return type
torch.Tensor
- Shape:
Input: \((B, N, 2, 3)\)
Output: \((B, N, 2)\)
Example
>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = get_laf_center(input)  # BxNx2
-
get_laf_orientation
(LAF: torch.Tensor) → torch.Tensor
Returns the orientation of the LAFs, in degrees.
- Parameters
LAF – (torch.Tensor): tensor [BxNx2x3].
- Returns
tensor BxNx1 .
- Return type
torch.Tensor
- Shape:
Input: \((B, N, 2, 3)\)
Output: \((B, N, 1)\)
Example
>>> input = torch.ones(1, 5, 2, 3)  # BxNx2x3
>>> output = get_laf_orientation(input)  # BxNx1
-
laf_from_center_scale_ori
(xy: torch.Tensor, scale: torch.Tensor, ori: torch.Tensor) → torch.Tensor
Creates LAFs from keypoint centers, scales and orientations. Useful to create kornia LAFs from OpenCV keypoints.
- Parameters
xy – (torch.Tensor): tensor [BxNx2].
scale – (torch.Tensor): tensor [BxNx1x1].
ori – (torch.Tensor): tensor [BxNx1].
- Returns
tensor BxNx2x3 .
- Return type
torch.Tensor
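A short usage sketch (illustrative values, not from the original docs) following the shapes listed above:
>>> xy = torch.zeros(1, 3, 2)        # three keypoint centers
>>> scale = torch.ones(1, 3, 1, 1)   # scales
>>> ori = torch.zeros(1, 3, 1)       # orientation angles
>>> laf = laf_from_center_scale_ori(xy, scale, ori)  # (1, 3, 2, 3)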
-
laf_is_inside_image
(laf: torch.Tensor, images: torch.Tensor, border: int = 0) → torch.Tensor
Checks if the LAF is touching or partly outside the image boundary. Returns the mask of LAFs which are fully inside the image, i.e. valid.
- Parameters
laf (torch.Tensor) – \((B, N, 2, 3)\)
images (torch.Tensor) – images, in which the LAFs are detected, \((B, CH, H, W)\)
border (int) – additional border
- Returns
\((B, N)\)
- Return type
mask (torch.Tensor)
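A sketch (illustrative values, not from the original docs) that builds two LAFs with laf_from_center_scale_ori and checks which fit inside the image:
>>> img = torch.rand(1, 1, 64, 64)
>>> xy = torch.tensor([[[2., 2.], [32., 32.]]])  # one keypoint near the border, one in the middle
>>> laf = laf_from_center_scale_ori(xy, torch.full((1, 2, 1, 1), 10.0), torch.zeros(1, 2, 1))
>>> mask = laf_is_inside_image(laf, img)         # (1, 2); True where the whole region fits inside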
-
laf_to_three_points
(laf: torch.Tensor)
Converts a local affine frame (LAF) to an alternative representation: coordinates of the LAF center, the LAF-x unit vector and the LAF-y unit vector.
- Parameters
laf (torch.Tensor) – \((B, N, 2, 3)\)
- Returns
\((B, N, 2, 3)\)
- Return type
threepts (torch.Tensor)
-
laf_from_three_points
(threepts: torch.Tensor)
Converts three points to a local affine frame. The order is (0, 0), (0, 1), (1, 0).
- Parameters
threepts (torch.Tensor) – \((B, N, 2, 3)\)
- Returns
\((B, N, 2, 3)\)
- Return type
laf (torch.Tensor)
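A round-trip sketch (illustrative values; the exact recovery is assumed, not stated in the original docs):
>>> laf = torch.rand(1, 5, 2, 3)
>>> threepts = laf_to_three_points(laf)         # (1, 5, 2, 3) coordinates of the three points
>>> laf_back = laf_from_three_points(threepts)  # (1, 5, 2, 3), should recover the original LAF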
Module
-
class
NonMaximaSuppression2d
(kernel_size: Tuple[int, int])
Applies non-maxima suppression to the input.
-
class
NonMaximaSuppression3d
(kernel_size: Tuple[int, int, int])
Applies non-maxima suppression to the input.
-
class
BlobHessian
(grads_mode='sobel')
nn.Module that calculates Hessian blobs. See hessian_response() for details.
-
class
CornerGFTT
(grads_mode='sobel')
nn.Module that calculates Shi-Tomasi corners. See gftt_response() for details.
-
class
CornerHarris
(k: Union[float, torch.Tensor], grads_mode='sobel')
nn.Module that calculates Harris corners. See harris_response() for details.
-
class
BlobDoG
nn.Module that calculates Difference-of-Gaussians blobs. See dog_response() for details.
-
class
ScaleSpaceDetector
(num_features: int = 500, mr_size: float = 6.0, scale_pyr_module: torch.nn.modules.module.Module = ScalePyramid(n_levels=3, init_sigma=1.6, min_size=15, extra_levels=3, border=6, sigma_step=1.2599210498948732, double_image=False), resp_module: torch.nn.modules.module.Module = BlobHessian(grads_mode=sobel), nms_module: torch.nn.modules.module.Module = ConvSoftArgmax3d(kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), temperature=tensor(1.), normalized_coordinates=False, eps=1e-08, strict_maxima_bonus=0.0, output_value=True), ori_module: torch.nn.modules.module.Module = PassLAF(), aff_module: torch.nn.modules.module.Module = PassLAF(), minima_are_also_good: bool = False, scale_space_response=False)
Module for differentiable local feature detection, as close as possible to classical local feature detectors like Harris, Hessian-Affine or SIFT (DoG). It has 5 modules inside: a scale pyramid generator, a response (“cornerness”) function, a soft NMS function, an affine shape estimator and a patch orientation estimator. Each of those modules can be replaced with a custom learned one, as long as it respects the output shape.
- Parameters
num_features – (int) Number of features to detect. default = 500. In order to keep everything batchable, the output always contains num_features entries, even for completely homogeneous images.
mr_size – (float), default 6.0. Multiplier for local feature scale compared to the detection scale. 6.0 matches the OpenCV 12.0 convention for SIFT.
scale_pyr_module – (nn.Module), which generates the scale pyramid. See ScalePyramid for details. Default is ScalePyramid(3, 1.6, 10).
resp_module – (nn.Module), which calculates the ‘cornerness’ of a pixel. Default is BlobHessian().
nms_module – (nn.Module), which outputs per-patch coordinates of the response maxima. See ConvSoftArgmax3d for details.
ori_module – (nn.Module) for local feature orientation estimation. Default is PassLAF, which does nothing. See LAFOrienter for details.
aff_module – (nn.Module) for local feature affine shape estimation. Default is PassLAF, which does nothing. See LAFAffineShapeEstimator for details.
minima_are_also_good – (bool) if True, then both response function minima and maxima are detected. Useful for symmetric response functions like DoG or Hessian. Default is False.
-
forward
(img: torch.Tensor, mask: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]
Three-stage local feature detection. First, the location and scale of interest points are determined by the detect function. Then the affine shape and orientation are estimated.
- Parameters
img (torch.Tensor) – image to extract features with shape [BxCxHxW]
mask (torch.Tensor, optional) – a mask with weights, where to apply the response function. The shape must be the same as the input image.
- Returns
lafs (torch.Tensor): shape [BxNx2x3]. Detected local affine frames.
responses (torch.Tensor): shape [BxNx1]. Response function values for the corresponding lafs.
- Return type
Tuple[torch.Tensor, torch.Tensor]
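A minimal end-to-end sketch (illustrative values, default modules assumed, not from the original docs):
>>> img = torch.rand(1, 1, 128, 128)  # illustrative random image
>>> detector = ScaleSpaceDetector(num_features=200)
>>> lafs, responses = detector(img)   # lafs: (1, 200, 2, 3), responses: (1, 200, 1)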
-
class
PassLAF
Dummy module to use instead of a local feature orientation or affine shape estimator.
-
forward
(laf: torch.Tensor, img: torch.Tensor) → torch.Tensor
- Parameters
laf (torch.Tensor) – 4d LAF tensor
img (torch.Tensor) – the input image tensor
- Returns
unchanged laf from the input.
- Return type
torch.Tensor
-
-
class
PatchAffineShapeEstimator
(patch_size: int = 19, eps: float = 1e-10)
Module that estimates the second moment matrix of the patch gradients in order to determine the affine shape of the local feature, as in [Baumberg00].
- Parameters
patch_size – int, default = 19
eps – float, for safe division, default is 1e-10
-
class
LAFAffineShapeEstimator
(patch_size: int = 32, affine_shape_detector: Optional[torch.nn.modules.module.Module] = None)
Module that extracts patches using the input images and local affine frames (LAFs), then runs PatchAffineShapeEstimator on the patches to estimate the shape of the LAFs. The original LAF shape is then replaced with the estimated one. The original LAF orientation is not preserved, so it is recommended to first run LAFAffineShapeEstimator and then LAFOrienter.
- Parameters
patch_size – int, default = 32
affine_shape_detector – nn.Module. Patch affine shape estimator, e.g. PatchAffineShapeEstimator. Default: None
-
class
LAFOrienter
(patch_size: int = 32, num_angular_bins: int = 36, angle_detector: Optional[torch.nn.modules.module.Module] = None)
Module that extracts patches using the input images and local affine frames (LAFs), then runs PatchDominantGradientOrientation or OriNet on the patches and rotates the LAFs by the estimated angles.
- Parameters
patch_size – int, default = 32
num_angular_bins – int, default is 36
angle_detector – nn.Module. Patch orientation estimator, e.g. PatchDominantGradientOrientation or OriNet. Default: None
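A short usage sketch (illustrative values, not from the original docs):
>>> img = torch.rand(1, 1, 64, 64)
>>> laf = torch.tensor([[[[8., 0., 32.], [0., 8., 32.]]]])  # one upright LAF at the image center
>>> orienter = LAFOrienter(patch_size=32)
>>> laf_out = orienter(laf, img)  # (1, 1, 2, 3), rotated to the estimated dominant orientation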
-
class
PatchDominantGradientOrientation
(patch_size: int = 32, num_angular_bins: int = 36, eps: float = 1e-08)
Module that estimates the dominant gradient orientation of the given patches, in radians. Zero angle points towards the right.
- Parameters
patch_size – int, default = 32
num_angular_bins – int, default is 36
eps – float, for safe division, and arctan, default is 1e-8
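A short usage sketch (illustrative random patches, not from the original docs):
>>> patches = torch.rand(10, 1, 32, 32)  # illustrative random patches
>>> ori = PatchDominantGradientOrientation(patch_size=32)
>>> angles = ori(patches)  # (10,) angles in radians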
-
class
OriNet
(pretrained: bool = False, eps: float = 1e-08)
Network that estimates the canonical orientation of the given 32x32 patches, in radians. Zero angle points towards the right. This is based on the original code from the paper “Repeatability Is Not Enough: Learning Discriminative Affine Regions via Discriminability”. See [MRM18] for more details.
- Parameters
pretrained – (bool) Download and set pretrained weights to the model. Default: false.
eps – (float) to avoid division by zero in atan2. Default: 1e-8.
- Returns
Angle in radians.
- Return type
torch.Tensor
- Shape:
Input: (B, 1, 32, 32)
Output: (B)
Examples
>>> input = torch.rand(16, 1, 32, 32)
>>> orinet = OriNet()
>>> angle = orinet(input)  # 16
-
class
LAFAffNetShapeEstimator
(pretrained: bool = False)
Module that extracts patches using the input images and local affine frames (LAFs), then runs AffNet on the patches to estimate the shape of the LAFs. This is based on the original code from the paper “Repeatability Is Not Enough: Learning Discriminative Affine Regions via Discriminability”. See [MRM18] for more details. The original LAF shape is then replaced with the estimated one. The original LAF orientation is not preserved, so it is recommended to first run LAFAffineShapeEstimator and then LAFOrienter.
- Parameters
pretrained – (bool) Download and set pretrained weights to the model. Default: false.
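A short usage sketch (illustrative values; pretrained weights not downloaded; not from the original docs):
>>> img = torch.rand(1, 1, 64, 64)
>>> laf = torch.tensor([[[[8., 0., 32.], [0., 8., 32.]]]])  # one LAF at the image center
>>> affnet = LAFAffNetShapeEstimator(pretrained=False)
>>> laf_out = affnet(laf, img)  # (1, 1, 2, 3), with the affine shape re-estimated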