kornia.geometry.subpix¶
Module with useful functionalities to extract coordinates sub-pixel accuracy.
Convolutional¶
- kornia.geometry.subpix.conv_soft_argmax2d(input, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), temperature=1.0, normalized_coordinates=True, eps=1e-8, output_value=False)[source]¶
Compute the convolutional spatial Soft-Argmax 2D over the windows of a given heatmap.
\[ij(X) = \frac{\sum{(i,j)} * exp(x / T) \in X} {\sum{exp(x / T) \in X}}\]\[val(X) = \frac{\sum{x * exp(x / T) \in X}} {\sum{exp(x / T) \in X}}\]where \(T\) is temperature.
- Parameters:
input (
Tensor) – the given heatmap with shape \((N, C, H_{in}, W_{in})\).kernel_size (
tuple[int,int], optional) – the size of the window. Default:(3, 3)stride (
tuple[int,int], optional) – the stride of the window. Default:(1, 1)padding (
tuple[int,int], optional) – input zero padding. Default:(1, 1)temperature (
Tensor|float, optional) – factor to apply to input. Default:1.0normalized_coordinates (
bool, optional) – whether to return the coordinates normalized in the range of \([-1, 1]\). Otherwise, it will return the coordinates in the range of the input shape. Default:Trueeps (
float, optional) – small value to avoid zero division. Default:1e-8output_value (
bool, optional) – if True, val is output, if False, only ij. Default:False
- Return type:
- Returns:
Function has two outputs - argmax coordinates and the softmaxpooled heatmap values themselves. On each window, the function computed returns with shapes \((N, C, 2, H_{out}, W_{out})\), \((N, C, H_{out}, W_{out})\),
where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]
Examples
>>> input = torch.randn(20, 16, 50, 32) >>> nms_coords, nms_val = conv_soft_argmax2d(input, (3,3), (2,2), (1,1), output_value=True)
- kornia.geometry.subpix.conv_soft_argmax3d(input, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), temperature=1.0, normalized_coordinates=False, eps=1e-8, output_value=True, strict_maxima_bonus=0.0)[source]¶
Compute the convolutional spatial Soft-Argmax 3D over the windows of a given heatmap.
\[ijk(X) = \frac{\sum{(i,j,k)} * exp(x / T) \in X} {\sum{exp(x / T) \in X}}\]\[val(X) = \frac{\sum{x * exp(x / T) \in X}} {\sum{exp(x / T) \in X}}\]where
Tis temperature.- Parameters:
input (
Tensor) – the given heatmap with shape \((N, C, D_{in}, H_{in}, W_{in})\).kernel_size (
tuple[int,int,int], optional) – size of the window. Default:(3, 3, 3)stride (
tuple[int,int,int], optional) – stride of the window. Default:(1, 1, 1)padding (
tuple[int,int,int], optional) – input zero padding. Default:(1, 1, 1)temperature (
Tensor|float, optional) – factor to apply to input. Default:1.0normalized_coordinates (
bool, optional) – whether to return the coordinates normalized in the range of :math:[-1, 1]`. Otherwise, it will return the coordinates in the range of the input shape. Default:Falseeps (
float, optional) – small value to avoid zero division. Default:1e-8output_value (
bool, optional) – if True, val is output, if False, only ij. Default:Truestrict_maxima_bonus (
float, optional) – pixels, which are strict maxima will score (1 + strict_maxima_bonus) * value. This is needed for mimic behavior of strict NMS in classic local features Default:0.0
- Return type:
- Returns:
Function has two outputs - argmax coordinates and the softmaxpooled heatmap values themselves. On each window, the function computed returns with shapes \((N, C, 3, D_{out}, H_{out}, W_{out})\), \((N, C, D_{out}, H_{out}, W_{out})\),
where
\[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor\]
Examples
>>> input = torch.randn(20, 16, 3, 50, 32) >>> nms_coords, nms_val = conv_soft_argmax3d(input, (3, 3, 3), (1, 2, 2), (0, 1, 1))
- kornia.geometry.subpix.conv_quad_interp3d(input, n_iters=5, strict_maxima_bonus=10.0, max_subpixel_shift=0.6, precomputed_nms_mask=None, dilation_radius=1, allow_scale_steps=True)[source]¶
Subpixel localization of 3D scale-space extrema via quadratic interpolation.
For each NMS maximum the function fits a 3-D quadratic to the local \(3 \times 3 \times 3\) neighbourhood and solves for the sub-voxel shift that maximises the fit. When the shift along any axis exceeds
max_subpixel_shiftthe integer centre is moved one step in that direction and the solve is repeated — up ton_iterstimes.Unlike a naive iterative approach, all Hessian solves are precomputed once at the start for every voxel that any keypoint could possibly visit (the dilated NMS neighbourhood, an L\(\infty\) ball of radius
dilation_radiusaround each maximum). The subsequent iteration loop contains no data-dependent Python control flow and no GPU→CPU synchronisation, making the function fully compatible withtorch.compile/ CUDA graphs.The
dilation_radiuscontrols the precompute footprint and should be set to the maximum number of integer-centre moves expected per keypoint. With the defaultmax_subpixel_shift=0.6almost all keypoints converge within 1 move, so the defaultdilation_radius=1(i.e. \(3^3 = 27\) positions per maximum) is sufficient. Usedilation_radius=2(\(5^3 = 125\)) for extra safety. Setting it equal ton_itersrecovers the original behaviour but is much slower on large images.- Parameters:
input (
Tensor) – response pyramid with shape \((B, C, D, H, W)\).n_iters (
int, optional) – maximum number of localization iterations per keypoint. Default:5strict_maxima_bonus (
float, optional) – value added toy_maxat NMS-maximum positions so that strict maxima are preferred during top-K selection. Default:10.0max_subpixel_shift (
float, optional) – threshold above which the integer centre is moved one step and another iteration is run. Default:0.6precomputed_nms_mask (
Optional[Tensor], optional) – optional bool tensor of shape \((B, C, D, H, W)\) — pass the result ofnms3d()to skip the internal NMS call. Default:Nonedilation_radius (
int, optional) – L\(\infty\) radius (in voxels) of the neighbourhood around each NMS maximum where the Hessian solve is precomputed. Keypoints that attempt to move farther than this are marked invalid. Default:1allow_scale_steps (
bool, optional) – ifTrue(default), the iterative shift is also applied along the scale (depth) axis; set toFalseto keep the keypoint on its original scale level. Default:True
- Returns:
coords_max— shape \((B, C, 3, D, H, W)\), refined[scale, x(width), y(height)]coordinates for each NMS maximum; non-maximum positions keep their grid coordinates.y_max— shape \((B, C, D, H, W)\), quadratically corrected response with optional strict-maxima bonus.
- Return type:
Tuple
(coords_max, y_max)
Example
>>> input = torch.randn(2, 3, 5, 64, 64) >>> coords, vals = conv_quad_interp3d(input, n_iters=5) >>> coords.shape torch.Size([2, 3, 3, 5, 64, 64]) >>> vals.shape torch.Size([2, 3, 5, 64, 64])
- kornia.geometry.subpix.iterative_quad_interp3d(input, n_iters=5, strict_maxima_bonus=10.0, max_subpixel_shift=0.6, allow_scale_steps=True, precomputed_nms_mask=None, max_candidates=None)[source]¶
Iterative subpixel localization of 3D extrema via quadratic interpolation.
Unlike
conv_quad_interp3d(), which pre-computes the Hessian solve for all voxels reachable from NMS maxima and then follows shifts by table lookup, this function explicitly re-extracts the \(3 \times 3 \times 3\) patch at each NMS maximum and iterates up ton_iterstimes. When the estimated subpixel shift along any spatial or scale axis exceedsmax_subpixel_shiftthe integer center is moved one step in that direction and the solve is repeated — matching the localization loop from the HessAff / SIFT family of detectors.- Parameters:
input (
Tensor) – response pyramid with shape \((B, C, D, H, W)\).n_iters (
int, optional) – maximum number of localization iterations per keypoint. Default:5strict_maxima_bonus (
float, optional) – value added toy_maxat NMS-maximum positions so that strict maxima are preferred when selecting the top-K keypoints. Default:10.0max_subpixel_shift (
float, optional) – if the estimated shift along any axis is larger than this threshold the integer center is displaced and another iteration is run. Default:0.6allow_scale_steps (
bool, optional) – ifTrue(default), the iterative shift is also applied along the scale (depth) axis; set toFalseto keep the keypoint on its original scale level. Default:Trueprecomputed_nms_mask (
Optional[Tensor], optional) – optional bool tensor of shape \((B, C, D, H, W)\) — pass the result ofnms3d()to skip the internal NMS call. Default:Nonemax_candidates (
Optional[int], optional) – if given, only the top-max_candidatesNMS maxima (ranked by pre-refinement response) are processed. The rest keep their grid-coordinate values. This is a CPU speed-up knob: for large images the number of 3-D NMS maxima can be 10x-100x larger than the desired number of keypoints, making the per-candidate gather+solve loop the dominant CPU cost. Settingmax_candidates = num_features * 5(say) dramatically reduces that work at the cost of occasionally missing a feature whose response rank would have improved after refinement. Default:None
- Return type:
- Returns:
A tuple
(coords_max, y_max)wherecoords_maxhas shape \((B, C, 3, D, H, W)\) and stores the refined coordinates[scale, x, y]for every position in the input. Non-NMS positions keep their original grid coordinates.y_maxhas shape \((B, C, D, H, W)\) and stores the quadratically corrected response values (with the optional strict-maxima bonus added).
Example
>>> input = torch.randn(2, 3, 3, 8, 8) >>> coords, vals = iterative_quad_interp3d(input, n_iters=5) >>> coords.shape torch.Size([2, 3, 3, 3, 8, 8]) >>> vals.shape torch.Size([2, 3, 3, 8, 8])
Tip
AdaptiveQuadInterp3d (the default subpix module in
ScaleSpaceDetector) automatically picks the faster
backend based on the input device:
CUDA →
conv_quad_interp3d()— batched gather+solve, 1.5–2× faster on GPU.CPU →
iterative_quad_interp3d()— processes only NMS maxima directly, no dilation overhead.
Both backends produce numerically identical results (max difference < 2 × 10-6).
import torch
from kornia.geometry.subpix import AdaptiveQuadInterp3d
from kornia.feature import ScaleSpaceDetector
from kornia.feature.responses import BlobDoG
from kornia.geometry.transform import ScalePyramid
detector = ScaleSpaceDetector(
num_features=2000,
resp_module=BlobDoG(),
# default — auto-selects conv on CUDA, patch on CPU:
subpix_module=AdaptiveQuadInterp3d(strict_maxima_bonus=0.0),
scale_pyr_module=ScalePyramid(3, 1.6, 32, double_image=True),
scale_space_response=True,
minima_are_also_good=True,
)
Spatial¶
- kornia.geometry.subpix.spatial_softmax2d(input, temperature=None)[source]¶
Apply the Softmax function over features in each image channel.
Note that this function behaves differently to
torch.nn.Softmax2d, which instead applies Softmax over features at each spatial location.- Parameters:
- Return type:
- Returns:
a 2D probability distribution per image channel with shape \((B, N, H, W)\).
Examples
>>> heatmaps = torch.tensor([[[ ... [0., 0., 0.], ... [0., 0., 0.], ... [0., 1., 2.]]]]) >>> spatial_softmax2d(heatmaps) tensor([[[[0.0585, 0.0585, 0.0585], [0.0585, 0.0585, 0.0585], [0.0585, 0.1589, 0.4319]]]])
- kornia.geometry.subpix.spatial_expectation2d(input, normalized_coordinates=True)[source]¶
Compute the expectation of coordinate values using spatial probabilities.
The input heatmap is assumed to represent a valid spatial probability distribution, which can be achieved using
spatial_softmax2d().- Parameters:
input (
Tensor) – the input torch.Tensor representing dense spatial probabilities with shape \((B, N, H, W)\).normalized_coordinates (
bool, optional) – whether to return the coordinates normalized in the range of \([-1, 1]\). Otherwise, it will return the coordinates in the range of the input shape. Default:True
- Return type:
- Returns:
expected value of the 2D coordinates with shape \((B, N, 2)\). Output order of the coordinates is (x, y).
Examples
>>> heatmaps = torch.tensor([[[ ... [0., 0., 0.], ... [0., 0., 0.], ... [0., 1., 0.]]]]) >>> spatial_expectation2d(heatmaps, False) tensor([[[1., 2.]]])
- kornia.geometry.subpix.spatial_soft_argmax2d(input, temperature=None, normalized_coordinates=True)[source]¶
Compute the Spatial Soft-Argmax 2D of a given input heatmap.
- Parameters:
input (
Tensor) – the given heatmap with shape \((B, N, H, W)\).temperature (
Optional[Tensor], optional) – factor to apply to input. Default:Nonenormalized_coordinates (
bool, optional) – whether to return the coordinates normalized in the range of \([-1, 1]\). Otherwise, it will return the coordinates in the range of the input shape. Default:True
- Return type:
- Returns:
the index of the maximum 2d coordinates of the give map \((B, N, 2)\). The output order is x-coord and y-coord.
Examples
>>> input = torch.tensor([[[ ... [0., 0., 0.], ... [0., 10., 0.], ... [0., 0., 0.]]]]) >>> spatial_soft_argmax2d(input, normalized_coordinates=False) tensor([[[1.0000, 1.0000]]])
- kornia.geometry.subpix.render_gaussian2d(mean, std, size, normalized_coordinates=True)[source]¶
Render the PDF of a 2D Gaussian distribution.
- Parameters:
mean (
Tensor) – the mean location of the Gaussian to render, \((\mu_x, \mu_y)\). Shape: \((*, 2)\).std (
Tensor) – the standard deviation of the Gaussian to render, \((\sigma_x, \sigma_y)\). Shape \((*, 2)\). Should be able to be broadcast with mean.size (
tuple[int,int]) – the (height, width) of the output image.normalized_coordinates (
bool, optional) – whethermeanandstdare assumed to use coordinates normalized in the range of \([-1, 1]\). Otherwise, coordinates are assumed to be in the range of the output shape. Default:True
- Return type:
- Returns:
torch.Tensor including rendered points with shape \((*, H, W)\).
Non Maxima Suppression¶
- kornia.geometry.subpix.nms2d(input, kernel_size, mask_only=False)[source]¶
Apply non maxima suppression to filter.
See
NonMaximaSuppression2dfor details.- Return type:
- kornia.geometry.subpix.nms3d(input, kernel_size, mask_only=False)[source]¶
Apply non maxima suppression to filter.
See :class: ~kornia.feature.NonMaximaSuppression3d for details.
- Return type:
- kornia.geometry.subpix.nms3d_minmax(input)[source]¶
Compute both local-maxima and local-minima NMS masks for a 3-D scale-space tensor in one pass.
Equivalent to calling
nms3d(input, (3,3,3), mask_only=True)andnms3d(-input, (3,3,3), mask_only=True)separately, but only traverses the 26-neighbour comparisons once, halving the NMS cost.Uses integer slice literals (not Python loops or slice objects) so the 52 comparison-and-reduction ops are visible to the compiler at trace time, allowing full fusion into a minimal number of kernels.
- Parameters:
input (
Tensor) – 5-D tensor of shape \((B, C, D, H, W)\).- Return type:
- Returns:
A pair
(max_mask, min_mask)of bool tensors with the same shape as input.max_mask[..., d, h, w]isTruewhen the voxel is strictly greater than all 26 neighbours;min_maskis the same for strict local minima.
Example
>>> x = torch.randn(1, 1, 5, 10, 10) >>> max_mask, min_mask = nms3d_minmax(x) >>> max_mask.shape torch.Size([1, 1, 5, 10, 10])
Module¶
- class kornia.geometry.subpix.SpatialSoftArgmax2d(temperature=None, normalized_coordinates=True)[source]¶
Compute the Spatial Soft-Argmax 2D of a given heatmap.
See
spatial_soft_argmax2d()for details.
- class kornia.geometry.subpix.ConvSoftArgmax2d(kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), temperature=1.0, normalized_coordinates=True, eps=1e-8, output_value=False)[source]¶
nn.Module that calculates soft argmax 2d per window.
See :func: ~kornia.geometry.subpix.conv_soft_argmax2d for details.
- class kornia.geometry.subpix.ConvSoftArgmax3d(kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), temperature=1.0, normalized_coordinates=False, eps=1e-8, output_value=True, strict_maxima_bonus=0.0)[source]¶
nn.Module that calculates soft argmax 3d per window.
See :func: ~kornia.geometry.subpix.conv_soft_argmax3d for details.
- class kornia.geometry.subpix.AdaptiveQuadInterp3d(mode='auto', n_iters=5, strict_maxima_bonus=10.0, max_subpixel_shift=0.6, dilation_radius=1, allow_scale_steps=True, max_candidates=None)[source]¶
Subpixel localization of 3D scale-space extrema with automatic backend selection.
Wraps
conv_quad_interp3d()anditerative_quad_interp3d(), choosing the faster backend based on the input device and the requestedmode.Benchmarks show:
GPU —
conv_quad_interp3d()is 1.5-2x faster due to better parallelism on the batched gather+solve.CPU —
iterative_quad_interp3d()is faster for large images because it processes only the NMS maxima directly without any dilation/dedup overhead.
- Parameters:
mode (
str, optional) –backend selection strategy. Default:
"auto""patch"— always useiterative_quad_interp3d()."conv"— always useconv_quad_interp3d()."auto"— use"conv"when the input is on a CUDA device,"patch"otherwise.
n_iters (
int, optional) – maximum localization iterations per keypoint. Default:5strict_maxima_bonus (
float, optional) – score bonus added at NMS-maximum positions. Default:10.0max_subpixel_shift (
float, optional) – integer-centre move threshold. Default:0.6dilation_radius (
int, optional) – L\(\infty\) precompute radius for"conv"mode (ignored in"patch"mode). Default:1max_candidates (
Optional[int], optional) – if set, only the top-max_candidatesNMS maxima by pre-refinement response are processed in"patch"mode. Has no effect in"conv"mode. Useful on CPU when the number of 3-D NMS maxima greatly exceeds the desired number of keypoints (seeiterative_quad_interp3d()). Default:None
Example
>>> inp = torch.randn(1, 1, 3, 64, 64) >>> subpix = AdaptiveQuadInterp3d(mode="auto") >>> coords, vals = subpix(inp) >>> coords.shape torch.Size([1, 1, 3, 3, 64, 64]) >>> vals.shape torch.Size([1, 1, 3, 64, 64])
- class kornia.geometry.subpix.ConvQuadInterp3d(n_iters=5, strict_maxima_bonus=10.0, max_subpixel_shift=0.6, dilation_radius=1, allow_scale_steps=True)[source]¶
Subpixel localization of 3D scale-space extrema via quadratic interpolation.
Wraps
conv_quad_interp3d(). The Hessian system is solved once for each voxel in the dilated NMS neighbourhood (no dense precomputation over the whole volume), then the shift chain is followed by table lookup with no GPU→CPU synchronisation — making the module compatible withtorch.compileand CUDA graphs.
- class kornia.geometry.subpix.IterativeQuadInterp3d(n_iters=5, strict_maxima_bonus=10.0, max_subpixel_shift=0.6, allow_scale_steps=True, max_candidates=None)[source]¶
Iterative subpixel localization of 3D extrema via quadratic interpolation.
See
iterative_quad_interp3d()for details.