kornia.metrics

Module containing metrics for training networks.

Classification

kornia.metrics.accuracy(pred, target, topk=(1,))

Compute the accuracy over the k top predictions for the specified values of k.

Parameters:

pred (Tensor) – the input torch.Tensor with the logits to evaluate.
target (Tensor) – the torch.Tensor containing the ground truth.
topk (Tuple[int, ...], optional) – the expected topk ranking. Default: (1,)

Return type:

List[Tensor]

Example

>>> logits = torch.tensor([[0, 1, 0]])
>>> target = torch.tensor([[1]])
>>> accuracy(logits, target)
[tensor(100.)]
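
To make the top-k rule concrete, here is a minimal pure-Python sketch; it is not kornia's implementation. The helper name `topk_accuracy` and the use of plain lists instead of tensors are assumptions made for illustration. A sample counts as correct when its target index is among the k highest-scoring classes, and the result is reported as a percentage, as in the example above.

```python
from typing import List, Sequence, Tuple


def topk_accuracy(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    topk: Tuple[int, ...] = (1,),
) -> List[float]:
    """Return, for each k, the percentage of samples whose target is in the top k scores."""
    results = []
    for k in topk:
        hits = 0
        for row, label in zip(scores, targets):
            # Class indices ordered from highest to lowest score, truncated to k.
            ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
            hits += label in ranked
        results.append(100.0 * hits / len(targets))
    return results


print(topk_accuracy([[0, 1, 0]], [1]))  # [100.0]
```

Passing several values, for example `topk=(1, 5)`, returns one percentage per value of k.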

Segmentation

kornia.metrics.confusion_matrix(pred, target, num_classes, normalized=False)

Compute confusion matrix to evaluate the accuracy of a classification.

Parameters:

pred (Tensor) – tensor with estimated targets returned by a classifier. The shape can be \((B, *)\) and must contain integer values between 0 and K-1.
target (Tensor) – tensor with ground truth (correct) target values. The shape can be \((B, *)\) and must contain integer values between 0 and K-1, i.e. class indices rather than one-hot vectors.
num_classes (int) – total possible number of classes in target.
normalized (bool, optional) – whether to return the confusion matrix normalized. Default: False

Return type:

Tensor

Returns:

a tensor containing the confusion matrix with shape \((B, K, K)\) where K is the number of classes.

Example

>>> logits = torch.tensor([[0, 1, 0]])
>>> target = torch.tensor([[0, 1, 0]])
>>> confusion_matrix(logits, target, num_classes=3)
tensor([[[2., 0., 0.],
         [0., 1., 0.],
         [0., 0., 0.]]])
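
The counting behind the matrix can be sketched in pure Python for a single sample. This is an illustration, not the library code: the helper name `confusion_matrix_single` is hypothetical, and the convention that rows index the ground-truth class and columns the predicted class is an assumption (the diagonal example above does not distinguish the two orientations).

```python
from typing import List, Sequence


def confusion_matrix_single(
    pred: Sequence[int], target: Sequence[int], num_classes: int
) -> List[List[float]]:
    """Count (target, pred) pairs for one sample of class indices."""
    matrix = [[0.0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        # Assumed convention: row = ground-truth class, column = predicted class.
        matrix[t][p] += 1.0
    return matrix


print(confusion_matrix_single([0, 1, 0], [0, 1, 0], num_classes=3))
# [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```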

kornia.metrics.mean_iou(pred, target, num_classes, eps=1e-6)

Calculate mean Intersection-Over-Union (mIOU).

The function internally computes the confusion matrix.

Parameters:

pred (Tensor) – tensor with estimated targets returned by a classifier. The shape can be \((B, *)\) and must contain integer values between 0 and K-1.
target (Tensor) – tensor with ground truth (correct) target values. The shape can be \((B, *)\) and must contain integer values between 0 and K-1, i.e. class indices rather than one-hot vectors.
num_classes (int) – total possible number of classes in target.
eps (float, optional) – epsilon for numerical stability. Default: 1e-6

Return type:

Tensor

Returns:

a tensor representing the mean intersection-over-union with shape \((B, K)\) where K is the number of classes.

Example

>>> logits = torch.tensor([[0, 1, 0]])
>>> target = torch.tensor([[0, 1, 0]])
>>> mean_iou(logits, target, num_classes=3)
tensor([[1., 1., 1.]])
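
A per-class IoU can be sketched in pure Python as follows. The helper name `per_class_iou` is hypothetical, and adding `eps` to both numerator and denominator is an assumption chosen because it reproduces the documented output, where class 2 never occurs yet scores 1.

```python
from typing import List, Sequence


def per_class_iou(
    pred: Sequence[int], target: Sequence[int], num_classes: int, eps: float = 1e-6
) -> List[float]:
    """Intersection-over-union of each class for one sample of class indices."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        # Assumed eps placement: an absent class (0 / 0) evaluates to 1.
        ious.append((intersection + eps) / (union + eps))
    return ious


print(per_class_iou([0, 1, 0], [0, 1, 0], num_classes=3))  # [1.0, 1.0, 1.0]
```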

Detection

kornia.metrics.mean_average_precision(pred_boxes, pred_labels, pred_scores, gt_boxes, gt_labels, n_classes, threshold=0.5)

Calculate the Mean Average Precision (mAP) of detected objects.

Code altered from https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection/blob/master/utils.py#L271. Background class (0 index) is excluded.

Parameters:

pred_boxes (List[Tensor]) – a torch.Tensor list of predicted bounding boxes.
pred_labels (List[Tensor]) – a torch.Tensor list of predicted labels.
pred_scores (List[Tensor]) – a torch.Tensor list of predicted labels’ scores.
gt_boxes (List[Tensor]) – a torch.Tensor list of ground truth bounding boxes.
gt_labels (List[Tensor]) – a torch.Tensor list of ground truth labels.
n_classes (int) – the number of classes.
threshold (float, optional) – count as a positive if the overlap is greater than the threshold. Default: 0.5

Return type:

Tuple[Tensor, Dict[int, float]]

Returns:

mean average precision (mAP), and a dictionary mapping each class index to its average precision.

Examples

>>> boxes, labels, scores = torch.tensor([[100, 50, 150, 100.]]), torch.tensor([1]), torch.tensor([.7])
>>> gt_boxes, gt_labels = torch.tensor([[100, 50, 150, 100.]]), torch.tensor([1])
>>> mean_average_precision([boxes], [labels], [scores], [gt_boxes], [gt_labels], 2)
(tensor(1.), {1: 1.0})

kornia.metrics.mean_iou_bbox(boxes_1, boxes_2, box_format='xyxy')

Compute the IoU of the cartesian product of two sets of boxes.

Parameters:

boxes_1 (Tensor) – a tensor of bounding boxes in \((B1, 4)\).
boxes_2 (Tensor) – a tensor of bounding boxes in \((B2, 4)\).
box_format (str, optional) – the bounding box format. Default: 'xyxy'. Supported formats are:
- 'xyxy': (x1, y1, x2, y2) where (x1, y1) is top-left and (x2, y2) is bottom-right
- 'xywh': (x, y, w, h) where (x, y) is top-left, w is width, h is height
- 'cxcywh': (cx, cy, w, h) where (cx, cy) is center, w is width, h is height

Return type:

Tensor

Returns:

a tensor in dimensions \((B1, B2)\), representing the intersection over union of each of the boxes in set 1 with respect to each of the boxes in set 2.

Example

>>> # XYXY format
>>> boxes_1 = torch.tensor([[40, 40, 60, 60], [30, 40, 50, 60]])
>>> boxes_2 = torch.tensor([[40, 50, 60, 70], [30, 40, 40, 50]])
>>> mean_iou_bbox(boxes_1, boxes_2)
tensor([[0.3333, 0.0000],
        [0.1429, 0.2500]])
>>> # XYWH format
>>> boxes_1_xywh = torch.tensor([[40, 40, 20, 20], [30, 40, 20, 20]])
>>> boxes_2_xywh = torch.tensor([[40, 50, 20, 20], [30, 40, 10, 10]])
>>> mean_iou_bbox(boxes_1_xywh, boxes_2_xywh, box_format='xywh')
tensor([[0.3333, 0.0000],
        [0.1429, 0.2500]])
>>> # CXCYWH format
>>> boxes_1_cxcywh = torch.tensor([[50, 50, 20, 20], [40, 50, 20, 20]])
>>> boxes_2_cxcywh = torch.tensor([[50, 60, 20, 20], [35, 45, 10, 10]])
>>> mean_iou_bbox(boxes_1_cxcywh, boxes_2_cxcywh, box_format='cxcywh')
tensor([[0.3333, 0.0000],
        [0.1429, 0.2500]])
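
The three formats describe the same rectangles, which is why all three calls above return the same matrix. A minimal pure-Python sketch of the conversion and the pairwise IoU follows; the helper names `to_xyxy` and `pairwise_iou` are hypothetical and this is not the library's implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def to_xyxy(box: Sequence[float], box_format: str = "xyxy") -> Box:
    """Convert one box to (x1, y1, x2, y2)."""
    a, b, c, d = box
    if box_format == "xyxy":
        return (a, b, c, d)
    if box_format == "xywh":  # top-left corner plus width and height
        return (a, b, a + c, b + d)
    if box_format == "cxcywh":  # center plus width and height
        return (a - c / 2, b - d / 2, a + c / 2, b + d / 2)
    raise ValueError(f"unsupported box format: {box_format}")


def pairwise_iou(
    boxes_1: Sequence[Sequence[float]],
    boxes_2: Sequence[Sequence[float]],
    box_format: str = "xyxy",
) -> List[List[float]]:
    """IoU of every box in boxes_1 against every box in boxes_2."""
    result = []
    for b1 in boxes_1:
        x1, y1, x2, y2 = to_xyxy(b1, box_format)
        row = []
        for b2 in boxes_2:
            u1, v1, u2, v2 = to_xyxy(b2, box_format)
            # Overlap extent along each axis, clamped at zero for disjoint boxes.
            iw = max(0.0, min(x2, u2) - max(x1, u1))
            ih = max(0.0, min(y2, v2) - max(y1, v1))
            inter = iw * ih
            union = (x2 - x1) * (y2 - y1) + (u2 - u1) * (v2 - v1) - inter
            row.append(inter / union if union > 0 else 0.0)
        result.append(row)
    return result


iou = pairwise_iou([[40, 40, 60, 60], [30, 40, 50, 60]], [[40, 50, 60, 70], [30, 40, 40, 50]])
print([[round(v, 4) for v in row] for row in iou])  # [[0.3333, 0.0], [0.1429, 0.25]]
```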

Image Quality

kornia.metrics.psnr(image, target, max_val)

Compute the PSNR between two images.

PSNR is the Peak Signal-to-Noise Ratio, which is derived from the mean squared error. Given an m x n image, the PSNR is:

\[\text{PSNR} = 10 \log_{10} \bigg(\frac{\text{MAX}_I^2}{\text{MSE}(I,T)}\bigg)\]

where

\[\text{MSE}(I,T) = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1} [I(i,j) - T(i,j)]^2\]

and \(\text{MAX}_I\) is the maximum possible input value (e.g., for floating point images \(\text{MAX}_I=1\)).

Parameters:

image (Tensor) – the input image with arbitrary shape \((*)\).
target (Tensor) – the target (reference) image with arbitrary shape \((*)\).
max_val (float) – the maximum possible value of the input tensor, \(\text{MAX}_I\) in the formula above.

Return type:

Tensor

Returns:

the computed PSNR as a scalar.

Examples

>>> ones = torch.ones(1)
>>> psnr(ones, 1.2 * ones, 2.) # 10 * log(4/((1.2-1)**2)) / log(10)
tensor(20.0000)

Reference: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio#Definition
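
The formula can be checked by hand with a short pure-Python sketch. The helper name `psnr_reference` is hypothetical, and it works on flat lists of pixel values rather than tensors. It assumes the two inputs differ, since identical inputs give an MSE of zero and the ratio is undefined.

```python
import math
from typing import Sequence


def psnr_reference(image: Sequence[float], target: Sequence[float], max_val: float) -> float:
    """PSNR in decibels from the mean squared error of two equally sized inputs."""
    mse = sum((i - t) ** 2 for i, t in zip(image, target)) / len(image)
    return 10.0 * math.log10(max_val**2 / mse)


# Same numbers as the example above: MSE = 0.04, MAX_I^2 = 4, ratio = 100.
print(round(psnr_reference([1.0], [1.2], 2.0), 4))  # 20.0
```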

kornia.metrics.ssim(img1, img2, window_size, max_val=1.0, eps=1e-12, padding='same')

Compute the Structural Similarity (SSIM) index map between two images.

Measures the SSIM index between each element in the input x and target y.

The index can be described as:

\[\text{SSIM}(x, y) = \frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)} {(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}\]

where:

\(c_1=(k_1 L)^2\) and \(c_2=(k_2 L)^2\) are two constants that stabilize the division when the denominator is weak.
\(L\) is the dynamic range of the pixel-values (typically this is \(2^{\#\text{bits per pixel}}-1\)).

Parameters:

img1 (Tensor) – the first input image with shape \((B, C, H, W)\).
img2 (Tensor) – the second input image with shape \((B, C, H, W)\).
window_size (int) – the size of the gaussian kernel to smooth the images.
max_val (float, optional) – the dynamic range of the images. Default: 1.0
eps (float, optional) – small value for numerical stability when dividing. Default: 1e-12
padding (str, optional) – 'same' | 'valid'. Whether to only use the “valid” convolution area to compute SSIM to match the MATLAB implementation of original SSIM paper. Default: "same"

Return type:

Tensor

Returns:

The SSIM index map with shape \((B, C, H, W)\).

Examples

>>> input1 = torch.rand(1, 4, 5, 5)
>>> input2 = torch.rand(1, 4, 5, 5)
>>> ssim_map = ssim(input1, input2, 5)  # 1x4x5x5
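
To see how the terms of the formula interact, the sketch below evaluates it once over two whole signals in pure Python. This is a deliberate simplification and not what the function above does: the function computes local statistics with a Gaussian window and returns a map, whereas this hypothetical `global_ssim` uses global means and variances and returns a single number. The defaults \(k_1 = 0.01\) and \(k_2 = 0.03\) are the values conventionally used with SSIM and are an assumption here.

```python
from typing import Sequence


def global_ssim(
    x: Sequence[float],
    y: Sequence[float],
    max_val: float = 1.0,
    k1: float = 0.01,  # assumed conventional constant
    k2: float = 0.03,  # assumed conventional constant
) -> float:
    """Evaluate the SSIM formula once, using statistics of the whole signals."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * max_val) ** 2
    c2 = (k2 * max_val) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


signal = [0.1, 0.5, 0.9]
print(global_ssim(signal, signal))        # identical signals score 1 (up to rounding)
print(global_ssim(signal, signal[::-1]))  # anti-correlated signals score below 1
```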

kornia.metrics.ssim3d(img1, img2, window_size, max_val=1.0, eps=1e-12, padding='same')

Compute the Structural Similarity (SSIM) index map between two 3D images.

Measures the SSIM index between each element in the input x and target y.

The index can be described as:

\[\text{SSIM}(x, y) = \frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)} {(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}\]

where:

\(c_1=(k_1 L)^2\) and \(c_2=(k_2 L)^2\) are two constants that stabilize the division when the denominator is weak.
\(L\) is the dynamic range of the pixel-values (typically this is \(2^{\#\text{bits per pixel}}-1\)).

Parameters:

img1 (Tensor) – the first input image with shape \((B, C, D, H, W)\).
img2 (Tensor) – the second input image with shape \((B, C, D, H, W)\).
window_size (int) – the size of the gaussian kernel to smooth the images.
max_val (float, optional) – the dynamic range of the images. Default: 1.0
eps (float, optional) – small value for numerical stability when dividing. Default: 1e-12
padding (str, optional) – 'same' | 'valid'. Whether to only use the “valid” convolution area to compute SSIM to match the MATLAB implementation of original SSIM paper. Default: "same"

Return type:

Tensor

Returns:

The SSIM index map with shape \((B, C, D, H, W)\).

Examples

>>> input1 = torch.rand(1, 4, 5, 5, 5)
>>> input2 = torch.rand(1, 4, 5, 5, 5)
>>> ssim_map = ssim3d(input1, input2, 5)  # 1x4x5x5x5

class kornia.metrics.SSIM(window_size, max_val=1.0, eps=1e-12, padding='same')

Create a module that computes the Structural Similarity (SSIM) index between two images.

Measures the SSIM index between each element in the input x and target y.

The index can be described as:

\[\text{SSIM}(x, y) = \frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)} {(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}\]

where:

\(c_1=(k_1 L)^2\) and \(c_2=(k_2 L)^2\) are two constants that stabilize the division when the denominator is weak.
\(L\) is the dynamic range of the pixel-values (typically this is \(2^{\#\text{bits per pixel}}-1\)).

Parameters:

window_size (int) – the size of the gaussian kernel to smooth the images.
max_val (float, optional) – the dynamic range of the images. Default: 1.0
eps (float, optional) – small value for numerical stability when dividing. Default: 1e-12
padding (str, optional) – 'same' | 'valid'. Whether to only use the “valid” convolution area to compute SSIM to match the MATLAB implementation of original SSIM paper. Default: "same"

Shape:

Input: \((B, C, H, W)\).
Target: \((B, C, H, W)\).
Output: \((B, C, H, W)\).

Examples

>>> input1 = torch.rand(1, 4, 5, 5)
>>> input2 = torch.rand(1, 4, 5, 5)
>>> ssim = SSIM(5)
>>> ssim_map = ssim(input1, input2)  # 1x4x5x5

class kornia.metrics.SSIM3D(window_size, max_val=1.0, eps=1e-12, padding='same')

Create a module that computes the Structural Similarity (SSIM) index between two 3D images.

Measures the SSIM index between each element in the input x and target y.

The index can be described as:

\[\text{SSIM}(x, y) = \frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)} {(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}\]

where:

\(c_1=(k_1 L)^2\) and \(c_2=(k_2 L)^2\) are two constants that stabilize the division when the denominator is weak.
\(L\) is the dynamic range of the pixel-values (typically this is \(2^{\#\text{bits per pixel}}-1\)).

Parameters:

window_size (int) – the size of the gaussian kernel to smooth the images.
max_val (float, optional) – the dynamic range of the images. Default: 1.0
eps (float, optional) – small value for numerical stability when dividing. Default: 1e-12
padding (str, optional) – 'same' | 'valid'. Whether to only use the “valid” convolution area to compute SSIM to match the MATLAB implementation of original SSIM paper. Default: "same"

Shape:

Input: \((B, C, D, H, W)\).
Target: \((B, C, D, H, W)\).
Output: \((B, C, D, H, W)\).

Examples

>>> input1 = torch.rand(1, 4, 5, 5, 5)
>>> input2 = torch.rand(1, 4, 5, 5, 5)
>>> ssim = SSIM3D(5)
>>> ssim_map = ssim(input1, input2)  # 1x4x5x5x5

Optical Flow

kornia.metrics.aepe(input, target, reduction='mean')

Compute the average endpoint error (AEPE) between two flow maps.

EPE is the endpoint error between two 2D vectors (e.g., optical flow), and AEPE is its average over all vectors. Given an h x w x 2 optical flow map, the AEPE is:

\[\text{AEPE}=\frac{1}{hw}\sum_{i=1, j=1}^{h, w}\sqrt{(I_{i,j,1}-T_{i,j,1})^{2}+(I_{i,j,2}-T_{i,j,2})^{2}}\]

Parameters:

input (Tensor) – the input flow map with shape \((*, 2)\).
target (Tensor) – the target flow map with shape \((*, 2)\).
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: "mean"

Return type:

Tensor

Returns:

the computed AEPE as a scalar.

Examples

>>> ones = torch.ones(4, 4, 2)
>>> aepe(ones, 1.2 * ones)
tensor(0.2828)

Reference: https://link.springer.com/content/pdf/10.1007/s11263-010-0390-2.pdf
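
The AEPE formula with 'mean' reduction can be verified with a few lines of pure Python. The helper name `aepe_reference` is hypothetical; flow maps are represented as flat lists of (u, v) vectors rather than tensors.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float]


def aepe_reference(flow: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Mean Euclidean distance between corresponding 2D flow vectors."""
    errors = [math.hypot(fu - tu, fv - tv) for (fu, fv), (tu, tv) in zip(flow, target)]
    return sum(errors) / len(errors)


# Same setup as the example above: a 4 x 4 map of ones against 1.2 * ones.
flow = [(1.0, 1.0)] * 16
target = [(1.2, 1.2)] * 16
print(round(aepe_reference(flow, target), 4))  # 0.2828
```

Each vector is off by 0.2 in both components, so every endpoint error is \(0.2\sqrt{2} \approx 0.2828\), and so is their mean.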

class kornia.metrics.AEPE(reduction='mean')

Computes the average endpoint error (AEPE) between 2 flow maps.

EPE is the endpoint error between two 2D vectors (e.g., optical flow). Given an h x w x 2 optical flow map, the AEPE is:

\[\text{AEPE}=\frac{1}{hw}\sum_{i=1, j=1}^{h, w}\sqrt{(I_{i,j,1}-T_{i,j,1})^{2}+(I_{i,j,2}-T_{i,j,2})^{2}}\]

Parameters:

reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: "mean"

Shape:

input: \((*, 2)\).
target: \((*, 2)\).
output: \((1)\).

Examples

>>> input1 = torch.rand(1, 4, 5, 2)
>>> input2 = torch.rand(1, 4, 5, 2)
>>> epe = AEPE(reduction="mean")
>>> epe = epe(input1, input2)

Stereo

kornia.metrics.mean_absolute_disparity_error(input, target, valid_mask=None, reduction='mean')

Compute the mean absolute error (MAE) between two disparity maps.

Given predicted and ground truth disparity maps \(D\) and \(D^{gt}\) with valid pixels \(\mathcal{V}\), the metric is:

\[\text{MAE}(D, D^{gt}) = \frac{1}{|\mathcal{V}|}\sum_{p \in \mathcal{V}} |D_{p} - D^{gt}_{p}|\]

Parameters:

input (Tensor) – the predicted disparity map with arbitrary shape \((*)\).
target (Tensor) – the ground truth disparity map with the same shape as input.
valid_mask (Optional[Tensor], optional) – optional mask broadcastable to the shape of input, where nonzero (True) values mark the pixels to evaluate. Non-boolean masks are converted to boolean. If None, all pixels are evaluated. Default: None
reduction (str, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'mean': the error is averaged over the valid pixels, 'sum': the error is summed over the valid pixels, 'none': no reduction will be applied and the per-pixel error map is returned, with masked-out positions set to zero. Default: "mean"

Return type:

Tensor

Returns:

the computed metric as a scalar, or the per-pixel error map if reduction='none'.

Note

If valid_mask selects no pixels, 'mean' reduction returns nan.

Examples

>>> input = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
>>> target = torch.tensor([[0.0, 1.0], [2.0, 4.0]])
>>> mean_absolute_disparity_error(input, target)
tensor(0.2500)
>>> valid_mask = torch.tensor([[True, True], [True, False]])
>>> mean_absolute_disparity_error(input, target, valid_mask)
tensor(0.)

Reference: D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV 2002. https://vision.middlebury.edu/stereo/taxonomy-IJCV.pdf
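
The effect of the mask, including the empty-mask case from the note above, can be sketched in pure Python. The helper name `masked_mae` is hypothetical; it covers only the 'mean' reduction and works on flattened disparity values.

```python
import math
from typing import Optional, Sequence


def masked_mae(
    pred: Sequence[float],
    target: Sequence[float],
    valid_mask: Optional[Sequence[bool]] = None,
) -> float:
    """Mean absolute error over the pixels selected by valid_mask."""
    if valid_mask is None:
        valid_mask = [True] * len(pred)
    errors = [abs(p - t) for p, t, v in zip(pred, target, valid_mask) if v]
    # An empty selection has no mean; nan mirrors the documented behaviour.
    return sum(errors) / len(errors) if errors else math.nan


pred = [0.0, 1.0, 2.0, 3.0]
target = [0.0, 1.0, 2.0, 4.0]
print(masked_mae(pred, target))                             # 0.25
print(masked_mae(pred, target, [True, True, True, False]))  # 0.0
```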

kornia.metrics.root_mean_squared_disparity_error(input, target, valid_mask=None, reduction='mean')

Compute the root mean squared error (RMSE) between two disparity maps.

Given predicted and ground truth disparity maps \(D\) and \(D^{gt}\) with valid pixels \(\mathcal{V}\), the metric is:

\[\text{RMSE}(D, D^{gt}) = \sqrt{\frac{1}{|\mathcal{V}|}\sum_{p \in \mathcal{V}} (D_{p} - D^{gt}_{p})^{2}}\]

Parameters:

input (Tensor) – the predicted disparity map with arbitrary shape \((*)\).
target (Tensor) – the ground truth disparity map with the same shape as input.
valid_mask (Optional[Tensor], optional) – optional mask broadcastable to the shape of input, where nonzero (True) values mark the pixels to evaluate. Non-boolean masks are converted to boolean. If None, all pixels are evaluated. Default: None
reduction (str, optional) – specifies the reduction to apply to the squared error before the square root: 'none' | 'mean' | 'sum'. 'mean': the squared error is averaged over the valid pixels, 'sum': the squared error is summed over the valid pixels, 'none': no reduction will be applied and the per-pixel absolute error map is returned, with masked-out positions set to zero. Default: "mean"

Return type:

Tensor

Returns:

the computed metric as a scalar, or the per-pixel error map if reduction='none'.

Note

If valid_mask selects no pixels, 'mean' reduction returns nan.

Examples

>>> input = torch.zeros(2, 2)
>>> target = torch.tensor([[0.0, 0.0], [0.0, 1.0]])
>>> root_mean_squared_disparity_error(input, target)
tensor(0.5000)

Reference: D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV 2002. https://vision.middlebury.edu/stereo/taxonomy-IJCV.pdf
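
Because the reduction is applied to the squared error before the square root, 'mean' and 'sum' differ by more than a constant factor on the error itself. The pure-Python sketch below illustrates this reading of the parameter description. The helper name `rmse_reference` is hypothetical, and it omits the mask and the 'none' reduction for brevity.

```python
import math
from typing import Sequence


def rmse_reference(pred: Sequence[float], target: Sequence[float], reduction: str = "mean") -> float:
    """Reduce the squared errors first, then take the square root."""
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    if reduction == "mean":
        return math.sqrt(sum(squared) / len(squared))
    if reduction == "sum":
        return math.sqrt(sum(squared))
    raise ValueError(f"unsupported reduction in this sketch: {reduction}")


pred = [0.0, 0.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 1.0]
print(rmse_reference(pred, target))                   # 0.5
print(rmse_reference(pred, target, reduction="sum"))  # 1.0
```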

kornia.metrics.mean_bad_pixel_error(input, target, threshold=3.0, valid_mask=None, reduction='mean')

Compute the bad pixel ratio between two disparity maps.

A pixel is considered bad when its absolute disparity error is strictly greater than threshold. Given predicted and ground truth disparity maps \(D\) and \(D^{gt}\) with valid pixels \(\mathcal{V}\), the metric is:

\[\text{Bad}_{\tau}(D, D^{gt}) = \frac{1}{|\mathcal{V}|}\sum_{p \in \mathcal{V}} [|D_{p} - D^{gt}_{p}| > \tau]\]

This corresponds to the bad-pixel percentage reported by the Middlebury and KITTI stereo benchmarks, expressed as a fraction in \([0, 1]\) instead of a percentage.

Parameters:

input (Tensor) – the predicted disparity map with arbitrary shape \((*)\).
target (Tensor) – the ground truth disparity map with the same shape as input.
threshold (float, optional) – the disparity error above which a pixel is considered bad. Default: 3.0
valid_mask (Optional[Tensor], optional) – optional mask broadcastable to the shape of input, where nonzero (True) values mark the pixels to evaluate. Non-boolean masks are converted to boolean. If None, all pixels are evaluated. Default: None
reduction (str, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'mean': the fraction of bad pixels among the valid pixels, 'sum': the number of bad pixels among the valid pixels, 'none': no reduction will be applied and the per-pixel bad-pixel map is returned, with masked-out positions set to zero. Default: "mean"

Return type:

Tensor

Returns:

the computed metric as a scalar, or the per-pixel bad-pixel map if reduction='none'.

Note

If valid_mask selects no pixels, 'mean' reduction returns nan.

Examples

>>> input = torch.zeros(2, 2)
>>> target = torch.tensor([[0.0, 1.0], [2.0, 4.0]])
>>> mean_bad_pixel_error(input, target, threshold=1.5)
tensor(0.5000)

Reference: D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV 2002. https://vision.middlebury.edu/stereo/taxonomy-IJCV.pdf
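
The strict inequality matters at the boundary: a pixel whose error equals the threshold is not counted as bad. A minimal pure-Python sketch, with the hypothetical helper name `bad_pixel_ratio` and without mask handling:

```python
from typing import Sequence


def bad_pixel_ratio(pred: Sequence[float], target: Sequence[float], threshold: float = 3.0) -> float:
    """Fraction of pixels whose absolute error is strictly greater than threshold."""
    bad = [abs(p - t) > threshold for p, t in zip(pred, target)]
    return sum(bad) / len(bad)


# Errors are 0, 1, 2 and 4; only 2 and 4 exceed 1.5.
print(bad_pixel_ratio([0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 2.0, 4.0], threshold=1.5))  # 0.5
# An error exactly equal to the threshold is not bad.
print(bad_pixel_ratio([0.0], [3.0], threshold=3.0))  # 0.0
```

Multiplying the result by 100 gives the percentage form used by the benchmarks mentioned above.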

Monitoring

class kornia.metrics.AverageMeter

Computes and stores the average and current value.

Example

>>> stats = AverageMeter()
>>> acc1 = torch.tensor(0.99) # coming from K.metrics.accuracy
>>> stats.update(acc1, n=1)  # where n is batch size usually
>>> round(stats.avg, 2)
0.99
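
The bookkeeping of such a meter can be sketched as a weighted running average. The class below is a hypothetical stand-in named `RunningAverage`, written in pure Python; its attribute names are assumptions and it is not the library's class. It shows why `n` is usually the batch size: each update is weighted by the number of samples it represents.

```python
class RunningAverage:
    """Track the latest value and the sample-weighted average of all values."""

    def __init__(self) -> None:
        self.val = 0.0   # most recent value
        self.sum = 0.0   # weighted sum of all values seen so far
        self.count = 0   # total weight, e.g. number of samples
        self.avg = 0.0   # sum / count

    def update(self, val: float, n: int = 1) -> None:
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


stats = RunningAverage()
stats.update(0.5, n=2)  # batch of 2 samples with mean metric 0.5
stats.update(1.0, n=2)  # batch of 2 samples with mean metric 1.0
print(stats.val, stats.avg)  # 1.0 0.75
```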