kornia.geometry.transform

The functions in this section perform various geometrical transformations of 2D images.

Warp operators

warp_perspective(src: torch.Tensor, M: torch.Tensor, dsize: Tuple[int, int], mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None) → torch.Tensor[source]

Applies a perspective transformation to an image.

The function warp_perspective transforms the source image using the specified matrix:

\[\text{dst} (x, y) = \text{src} \left( \frac{M^{-1}_{11} x + M^{-1}_{12} y + M^{-1}_{13}}{M^{-1}_{31} x + M^{-1}_{32} y + M^{-1}_{33}} , \frac{M^{-1}_{21} x + M^{-1}_{22} y + M^{-1}_{23}}{M^{-1}_{31} x + M^{-1}_{32} y + M^{-1}_{33}} \right )\]
Parameters
  • src (torch.Tensor) – input image with shape \((B, C, H, W)\).

  • M (torch.Tensor) – transformation matrix with shape \((B, 3, 3)\).

  • dsize (tuple) – size of the output image (height, width).

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – interpolation flag. Default: None.

Returns

the warped input image \((B, C, H, W)\).

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 4, 5, 6)
>>> H = torch.eye(3)[None]
>>> out = warp_perspective(img, H, (4, 2), align_corners=True)
>>> print(out.shape)
torch.Size([1, 4, 4, 2])

Note

This function is often used in conjunction with get_perspective_transform().


warp_perspective3d(src: torch.Tensor, M: torch.Tensor, dsize: Tuple[int, int, int], flags: str = 'bilinear', border_mode: str = 'zeros', align_corners: bool = False) → torch.Tensor[source]

Applies a perspective transformation to a 3D volume.

The function warp_perspective3d transforms the source volume using the specified matrix:

\[\text{dst} (x, y, z) = \text{src} \left( \frac{M_{11} x + M_{12} y + M_{13} z + M_{14}}{M_{41} x + M_{42} y + M_{43} z + M_{44}} , \frac{M_{21} x + M_{22} y + M_{23} z + M_{24}}{M_{41} x + M_{42} y + M_{43} z + M_{44}} , \frac{M_{31} x + M_{32} y + M_{33} z + M_{34}}{M_{41} x + M_{42} y + M_{43} z + M_{44}} \right )\]
Parameters
  • src (torch.Tensor) – input image with shape \((B, C, D, H, W)\).

  • M (torch.Tensor) – transformation matrix with shape \((B, 4, 4)\).

  • dsize (tuple) – size of the output image (depth, height, width).

  • flags (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • border_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool) – interpolation flag. Default: False.

Returns

the warped input image \((B, C, D, H, W)\).

Return type

torch.Tensor

Note

This function is often used in conjunction with get_perspective_transform3d().
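
Example

A minimal identity-warp sketch (an assumed example, not from the upstream docs): with M the identity and dsize matching the input volume, the output shape follows dsize.

>>> img = torch.rand(1, 2, 3, 4, 5)
>>> M = torch.eye(4)[None]  # identity homography, shape (1, 4, 4)
>>> out = warp_perspective3d(img, M, (3, 4, 5))
>>> print(out.shape)
torch.Size([1, 2, 3, 4, 5])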

warp_affine(src: torch.Tensor, M: torch.Tensor, dsize: Tuple[int, int], mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None) → torch.Tensor[source]

Applies an affine transformation to a tensor.

The function warp_affine transforms the source tensor using the specified matrix:

\[\text{dst}(x, y) = \text{src} \left( M_{11} x + M_{12} y + M_{13} , M_{21} x + M_{22} y + M_{23} \right )\]
Parameters
  • src (torch.Tensor) – input tensor of shape \((B, C, H, W)\).

  • M (torch.Tensor) – affine transformation of shape \((B, 2, 3)\).

  • dsize (Tuple[int, int]) – size of the output image (height, width).

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – mode for grid_generation. Default: None.

Returns

the warped tensor with shape \((B, C, H, W)\).

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 4, 5, 6)
>>> A = torch.eye(2, 3)[None]
>>> out = warp_affine(img, A, (4, 2), align_corners=True)
>>> print(out.shape)
torch.Size([1, 4, 4, 2])

Note

This function is often used in conjunction with get_rotation_matrix2d(), get_shear_matrix2d(), get_affine_matrix2d(), invert_affine_transform().


warp_affine3d(src: torch.Tensor, M: torch.Tensor, dsize: Tuple[int, int, int], flags: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None) → torch.Tensor[source]

Applies a projective transformation to a 3D tensor.

Warning

This API signature is experimental and might change in the future.

Parameters
  • src (torch.Tensor) – input tensor of shape \((B, C, D, H, W)\).

  • M (torch.Tensor) – projective transformation matrix of shape \((B, 3, 4)\).

  • dsize (Tuple[int, int, int]) – size of the output image (depth, height, width).

  • flags (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – mode for grid_generation. Default: None.

Returns

the warped 3d tensor with shape \((B, C, D, H, W)\).

Return type

torch.Tensor

Note

This function is often used in conjunction with get_perspective_transform3d().
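
Example

A shape-check sketch (an assumed example, not from the upstream docs), using the 3x4 identity as the projective matrix:

>>> img = torch.rand(1, 2, 3, 4, 5)
>>> M = torch.eye(3, 4)[None]  # identity projective transform, shape (1, 3, 4)
>>> out = warp_affine3d(img, M, (3, 4, 5))
>>> print(out.shape)
torch.Size([1, 2, 3, 4, 5])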

warp_points_tps(points_src: torch.Tensor, kernel_centers: torch.Tensor, kernel_weights: torch.Tensor, affine_weights: torch.Tensor) → torch.Tensor[source]

Warp a tensor of coordinate points using the thin plate spline defined by kernel points, kernel weights, and affine weights.

The source points should be a \((B, N, 2)\) tensor of \((x, y)\) coordinates. The kernel centers are a \((B, K, 2)\) tensor of \((x, y)\) coordinates. The kernel weights are a \((B, K, 2)\) tensor, and the affine weights are a \((B, 3, 2)\) tensor. For the weight tensors, tensor[…, 0] contains the weights for the x-transform and tensor[…, 1] the weights for the y-transform.

Parameters
  • points_src (torch.Tensor) – tensor of source points \((B, N, 2)\).

  • kernel_centers (torch.Tensor) – tensor of kernel center points \((B, K, 2)\).

  • kernel_weights (torch.Tensor) – tensor of kernel weights \((B, K, 2)\).

  • affine_weights (torch.Tensor) – tensor of affine weights \((B, 3, 2)\).

Returns

The \((B, N, 2)\) tensor of warped source points, from applying the TPS transform.

Return type

torch.Tensor

Example

>>> points_src = torch.rand(1, 5, 2)
>>> points_dst = torch.rand(1, 5, 2)
>>> kernel_weights, affine_weights = get_tps_transform(points_src, points_dst)
>>> warped = warp_points_tps(points_src, points_src, kernel_weights, affine_weights)
>>> warped_correct = torch.allclose(warped, points_dst)

Note

This function is often used in conjunction with get_tps_transform().

warp_image_tps(image: torch.Tensor, kernel_centers: torch.Tensor, kernel_weights: torch.Tensor, affine_weights: torch.Tensor, align_corners: bool = False) → torch.Tensor[source]

Warp an image tensor according to the thin plate spline transform defined by kernel centers, kernel weights, and affine weights.

The transform is applied to each pixel coordinate in the output image to obtain a point in the input image for interpolation of the output pixel. So the TPS parameters should correspond to a warp from output space to input space.

The input image is a \((B, C, H, W)\) tensor. The kernel centers, kernel weights, and affine weights are the same as in warp_points_tps().

Parameters
  • image (torch.Tensor) – input image tensor \((B, C, H, W)\).

  • kernel_centers (torch.Tensor) – kernel center points \((B, K, 2)\).

  • kernel_weights (torch.Tensor) – tensor of kernel weights \((B, K, 2)\).

  • affine_weights (torch.Tensor) – tensor of affine weights \((B, 3, 2)\).

  • align_corners (bool) – interpolation flag used by grid_sample. Default: False.

Returns

warped image tensor \((B, C, H, W)\).

Return type

torch.Tensor

Example

>>> points_src = torch.rand(1, 5, 2)
>>> points_dst = torch.rand(1, 5, 2)
>>> image = torch.rand(1, 3, 32, 32)
>>> # note that we are getting the reverse transform: dst -> src
>>> kernel_weights, affine_weights = get_tps_transform(points_dst, points_src)
>>> warped_image = warp_image_tps(image, points_dst, kernel_weights, affine_weights)

Note

This function is often used in conjunction with get_tps_transform().

remap(tensor: torch.Tensor, map_x: torch.Tensor, map_y: torch.Tensor, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None, normalized_coordinates: bool = False) → torch.Tensor[source]

Applies a generic geometrical transformation to a tensor.

The function remap transforms the source tensor using the specified map:

\[\text{dst}(x, y) = \text{src}(map_x(x, y), map_y(x, y))\]
Parameters
  • tensor (torch.Tensor) – the tensor to remap with shape (B, D, H, W), where D is the number of channels.

  • map_x (torch.Tensor) – the flow in the x-direction in pixel coordinates. The tensor must be in the shape of (B, H, W).

  • map_y (torch.Tensor) – the flow in the y-direction in pixel coordinates. The tensor must be in the shape of (B, H, W).

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – mode for grid_generation. Default: None.

  • normalized_coordinates (bool) – whether the input coordinates are normalized in the range [-1, 1]. Default: False.

Returns

the warped tensor with same shape as the input grid maps.

Return type

torch.Tensor

Example

>>> from kornia.utils import create_meshgrid
>>> grid = create_meshgrid(2, 2, False)  # 1x2x2x2
>>> grid += 1  # apply offset in both directions
>>> input = torch.ones(1, 1, 2, 2)
>>> remap(input, grid[..., 0], grid[..., 1], align_corners=True)   # 1x1x2x2
tensor([[[[1., 0.],
          [0., 0.]]]])

Note

This function is often used in conjunction with create_meshgrid().

Matrix transformations

get_perspective_transform(src, dst)[source]

Calculates a perspective transform from four pairs of corresponding points.

The function calculates the matrix of a perspective transform so that:

\[\begin{split}\begin{bmatrix} t_{i}x_{i}^{'} \\ t_{i}y_{i}^{'} \\ t_{i} \\ \end{bmatrix} = \textbf{map_matrix} \cdot \begin{bmatrix} x_{i} \\ y_{i} \\ 1 \\ \end{bmatrix}\end{split}\]

where

\[dst(i) = (x_{i}^{'},y_{i}^{'}), src(i) = (x_{i}, y_{i}), i = 0,1,2,3\]
Parameters
  • src (torch.Tensor) – coordinates of quadrangle vertices in the source image with shape \((B, 4, 2)\).

  • dst (torch.Tensor) – coordinates of the corresponding quadrangle vertices in the destination image with shape \((B, 4, 2)\).

Returns

the perspective transformation with shape \((B, 3, 3)\).

Return type

torch.Tensor

Note

This function is often used in conjunction with warp_perspective().
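
Example

A minimal sketch (an assumed example, not from the upstream docs) that maps the unit square onto a square of twice the size:

>>> src = torch.tensor([[[0., 0.], [1., 0.], [1., 1.], [0., 1.]]])  # 1x4x2
>>> dst = 2. * src  # scale the quadrangle by a factor of two
>>> M = get_perspective_transform(src, dst)
>>> print(M.shape)
torch.Size([1, 3, 3])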

get_perspective_transform3d(src: torch.Tensor, dst: torch.Tensor) → torch.Tensor[source]

Calculates a 3D perspective transform from pairs of corresponding points; five of the eight input vertices (indices 0, 1, 2, 5, 7) are used, as shown below.

The function calculates the matrix of a perspective transform so that:

\[\begin{split}\begin{bmatrix} t_{i}x_{i}^{'} \\ t_{i}y_{i}^{'} \\ t_{i}z_{i}^{'} \\ t_{i} \\ \end{bmatrix} = \textbf{map_matrix} \cdot \begin{bmatrix} x_{i} \\ y_{i} \\ z_{i} \\ 1 \\ \end{bmatrix}\end{split}\]

where

\[dst(i) = (x_{i}^{'},y_{i}^{'},z_{i}^{'}), src(i) = (x_{i}, y_{i}, z_{i}), i = 0,1,2,5,7\]

Concretely, with \(dst(i) = (u_i, v_i, w_i)\):

\[u_i = \frac{c_{00} x_i + c_{01} y_i + c_{02} z_i + c_{03}}{c_{30} x_i + c_{31} y_i + c_{32} z_i + c_{33}}\]
\[v_i = \frac{c_{10} x_i + c_{11} y_i + c_{12} z_i + c_{13}}{c_{30} x_i + c_{31} y_i + c_{32} z_i + c_{33}}\]
\[w_i = \frac{c_{20} x_i + c_{21} y_i + c_{22} z_i + c_{23}}{c_{30} x_i + c_{31} y_i + c_{32} z_i + c_{33}}\]
\[\begin{split}\begin{pmatrix} x_0 & y_0 & z_0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -x_0*u_0 & -y_0*u_0 & -z_0 * u_0 \\ x_1 & y_1 & z_1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -x_1*u_1 & -y_1*u_1 & -z_1 * u_1 \\ x_2 & y_2 & z_2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -x_2*u_2 & -y_2*u_2 & -z_2 * u_2 \\ x_5 & y_5 & z_5 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -x_5*u_5 & -y_5*u_5 & -z_5 * u_5 \\ x_7 & y_7 & z_7 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -x_7*u_7 & -y_7*u_7 & -z_7 * u_7 \\ 0 & 0 & 0 & 0 & x_0 & y_0 & z_0 & 1 & 0 & 0 & 0 & 0 & -x_0*v_0 & -y_0*v_0 & -z_0 * v_0 \\ 0 & 0 & 0 & 0 & x_1 & y_1 & z_1 & 1 & 0 & 0 & 0 & 0 & -x_1*v_1 & -y_1*v_1 & -z_1 * v_1 \\ 0 & 0 & 0 & 0 & x_2 & y_2 & z_2 & 1 & 0 & 0 & 0 & 0 & -x_2*v_2 & -y_2*v_2 & -z_2 * v_2 \\ 0 & 0 & 0 & 0 & x_5 & y_5 & z_5 & 1 & 0 & 0 & 0 & 0 & -x_5*v_5 & -y_5*v_5 & -z_5 * v_5 \\ 0 & 0 & 0 & 0 & x_7 & y_7 & z_7 & 1 & 0 & 0 & 0 & 0 & -x_7*v_7 & -y_7*v_7 & -z_7 * v_7 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x_0 & y_0 & z_0 & 1 & -x_0*w_0 & -y_0*w_0 & -z_0 * w_0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x_1 & y_1 & z_1 & 1 & -x_1*w_1 & -y_1*w_1 & -z_1 * w_1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x_2 & y_2 & z_2 & 1 & -x_2*w_2 & -y_2*w_2 & -z_2 * w_2 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x_5 & y_5 & z_5 & 1 & -x_5*w_5 & -y_5*w_5 & -z_5 * w_5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x_7 & y_7 & z_7 & 1 & -x_7*w_7 & -y_7*w_7 & -z_7 * w_7 \\ \end{pmatrix}\end{split}\]
Parameters
  • src (torch.Tensor) – coordinates of the eight source vertices with shape \((B, 8, 3)\).

  • dst (torch.Tensor) – coordinates of the corresponding eight destination vertices with shape \((B, 8, 3)\).

Returns

the perspective transformation with shape \((B, 4, 4)\).

Return type

torch.Tensor

Note

This function is often used in conjunction with warp_perspective3d().
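
Example

A shape-check sketch (an assumed example, not from the upstream docs) that shifts the unit cube by one unit along each axis:

>>> src = torch.tensor([[[0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.],
...                      [0., 0., 1.], [1., 0., 1.], [1., 1., 1.], [0., 1., 1.]]])  # 1x8x3
>>> dst = src + 1.  # pure translation
>>> M = get_perspective_transform3d(src, dst)
>>> print(M.shape)
torch.Size([1, 4, 4])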

get_projective_transform(center: torch.Tensor, angles: torch.Tensor, scales: torch.Tensor) → torch.Tensor[source]

Calculates the projection matrix for a 3D rotation.

Warning

This API signature is experimental and might change in the future.

The function computes the projection matrix given the center and angles per axis.

Parameters
  • center (torch.Tensor) – center of the rotation (x,y,z) in the source with shape \((B, 3)\).

  • angles (torch.Tensor) – angle axis vector containing the rotation angles in degrees in the form of (rx, ry, rz) with shape \((B, 3)\). Internally it calls Rodrigues to compute the rotation matrix from axis-angle.

  • scales (torch.Tensor) – scale factor for x-y-z-directions with shape \((B, 3)\).

Returns

the projection matrix of 3D rotation with shape \((B, 3, 4)\).

Return type

torch.Tensor

Note

This function is often used in conjunction with warp_affine3d().
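
Example

A minimal sketch (an assumed example, not from the upstream docs): a 90-degree rotation about the z-axis around the origin with unit scale.

>>> center = torch.zeros(1, 3)
>>> angles = torch.tensor([[0., 0., 90.]])  # (rx, ry, rz) in degrees
>>> scales = torch.ones(1, 3)
>>> P = get_projective_transform(center, angles, scales)
>>> print(P.shape)
torch.Size([1, 3, 4])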

get_rotation_matrix2d(center: torch.Tensor, angle: torch.Tensor, scale: torch.Tensor) → torch.Tensor[source]

Calculates an affine matrix of 2D rotation.

The function calculates the following matrix:

\[\begin{split}\begin{bmatrix} \alpha & \beta & (1 - \alpha) \cdot \text{x} - \beta \cdot \text{y} \\ -\beta & \alpha & \beta \cdot \text{x} + (1 - \alpha) \cdot \text{y} \end{bmatrix}\end{split}\]

where

\[\begin{split}\alpha = \text{scale} \cdot cos(\text{angle}) \\ \beta = \text{scale} \cdot sin(\text{angle})\end{split}\]

The transformation maps the rotation center to itself. If this is not the target, adjust the shift.

Parameters
  • center (torch.Tensor) – center of the rotation in the source image with shape \((B, 2)\).

  • angle (torch.Tensor) – rotation angle in degrees. Positive values mean counter-clockwise rotation (the coordinate origin is assumed to be the top-left corner) with shape \((B)\).

  • scale (torch.Tensor) – scale factor for x, y scaling with shape \((B, 2)\).

Returns

the affine matrix of 2D rotation with shape \((B, 2, 3)\).

Return type

torch.Tensor

Example

>>> center = torch.zeros(1, 2)
>>> scale = torch.ones((1, 2))
>>> angle = 45. * torch.ones(1)
>>> get_rotation_matrix2d(center, angle, scale)
tensor([[[ 0.7071,  0.7071,  0.0000],
         [-0.7071,  0.7071,  0.0000]]])

Note

This function is often used in conjunction with warp_affine().

get_shear_matrix2d(center: torch.Tensor, sx: Optional[torch.Tensor] = None, sy: Optional[torch.Tensor] = None)[source]

Composes a shear matrix Bx3x3 from the components.

Note: Ordered shearing; the x-axis shear is applied first, then the y-axis.

\[\begin{split}\begin{bmatrix} 1 & b \\ a & ab + 1 \\ \end{bmatrix}\end{split}\]
Parameters
  • center (torch.Tensor) – shearing center coordinates of (x, y).

  • sx (torch.Tensor, optional) – shearing degree along x axis.

  • sy (torch.Tensor, optional) – shearing degree along y axis.

Returns

params to be passed to the affine transformation with shape \((B, 3, 3)\).

Return type

torch.Tensor

Examples

>>> rng = torch.manual_seed(0)
>>> sx = torch.randn(1)
>>> sx
tensor([1.5410])
>>> center = torch.tensor([[0., 0.]])  # Bx2
>>> get_shear_matrix2d(center, sx=sx)
tensor([[[  1.0000, -33.5468,   0.0000],
         [ -0.0000,   1.0000,   0.0000],
         [  0.0000,   0.0000,   1.0000]]])

Note

This function is often used in conjunction with warp_affine(), warp_perspective().

get_shear_matrix3d(center: torch.Tensor, sxy: Optional[torch.Tensor] = None, sxz: Optional[torch.Tensor] = None, syx: Optional[torch.Tensor] = None, syz: Optional[torch.Tensor] = None, szx: Optional[torch.Tensor] = None, szy: Optional[torch.Tensor] = None)[source]

Composes a shear matrix Bx4x4 from the components. Note: Ordered shearing; the x-axis shear is applied first, then the y-axis, then the z-axis.

\[\begin{split}\begin{bmatrix} 1 & o & r & oy + rz \\ m & p & s & mx + py + sz - y \\ n & q & t & nx + qy + tz - z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}\end{split}\]

where

\[\begin{split}m = S_{xy}, \quad n = S_{xz}, \quad o = S_{yx}, \\ p = S_{xy}S_{yx} + 1, \quad q = S_{xz}S_{yx} + S_{yz}, \quad r = S_{zx} + S_{yx}S_{zy}, \\ s = S_{xy}S_{zx} + (S_{xy}S_{yx} + 1)S_{zy}, \quad t = S_{xz}S_{zx} + (S_{xz}S_{yx} + S_{yz})S_{zy} + 1\end{split}\]
Parameters
  • center (torch.Tensor) – shearing center coordinates of (x, y, z).

  • sxy (torch.Tensor, optional) – shearing degree along x axis, towards y plane.

  • sxz (torch.Tensor, optional) – shearing degree along x axis, towards z plane.

  • syx (torch.Tensor, optional) – shearing degree along y axis, towards x plane.

  • syz (torch.Tensor, optional) – shearing degree along y axis, towards z plane.

  • szx (torch.Tensor, optional) – shearing degree along z axis, towards x plane.

  • szy (torch.Tensor, optional) – shearing degree along z axis, towards y plane.

Returns

params to be passed to the affine transformation.

Return type

torch.Tensor

Examples

>>> rng = torch.manual_seed(0)
>>> sxy, sxz, syx, syz = torch.randn(4, 1)
>>> sxy, sxz, syx, syz
(tensor([1.5410]), tensor([-0.2934]), tensor([-2.1788]), tensor([0.5684]))
>>> center = torch.tensor([[0., 0., 0.]])  # Bx3
>>> get_shear_matrix3d(center, sxy=sxy, sxz=sxz, syx=syx, syz=syz)
tensor([[[  1.0000,  -1.4369,   0.0000,   0.0000],
         [-33.5468,  49.2039,   0.0000,   0.0000],
         [  0.3022,  -1.0729,   1.0000,   0.0000],
         [  0.0000,   0.0000,   0.0000,   1.0000]]])

Note

This function is often used in conjunction with warp_perspective3d().

get_affine_matrix2d(translations: torch.Tensor, center: torch.Tensor, scale: torch.Tensor, angle: torch.Tensor, sx: Optional[torch.Tensor] = None, sy: Optional[torch.Tensor] = None) → torch.Tensor[source]

Composes affine matrix from the components.

Parameters
  • translations (torch.Tensor) – tensor containing the translation vector with shape \((B, 2)\).

  • center (torch.Tensor) – tensor containing the center vector with shape \((B, 2)\).

  • scale (torch.Tensor) – tensor containing the scale factor with shape \((B, 2)\).

  • angle (torch.Tensor) – tensor of angles in degrees \((B)\).

  • sx (torch.Tensor, optional) – tensor containing the shear factor in the x-direction with shape \((B)\).

  • sy (torch.Tensor, optional) – tensor containing the shear factor in the y-direction with shape \((B)\).

Returns

the affine transformation matrix \((B, 3, 3)\).

Return type

torch.Tensor

Note

This function is often used in conjunction with warp_affine(), warp_perspective().
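
Example

A minimal sketch (an assumed example, not from the upstream docs) composing the identity transform from zero translation, unit scale, and zero angle:

>>> translations = torch.zeros(1, 2)
>>> center = torch.zeros(1, 2)
>>> scale = torch.ones(1, 2)
>>> angle = torch.zeros(1)
>>> M = get_affine_matrix2d(translations, center, scale, angle)
>>> print(M.shape)
torch.Size([1, 3, 3])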

get_affine_matrix3d(translations: torch.Tensor, center: torch.Tensor, scale: torch.Tensor, angles: torch.Tensor, sxy: Optional[torch.Tensor] = None, sxz: Optional[torch.Tensor] = None, syx: Optional[torch.Tensor] = None, syz: Optional[torch.Tensor] = None, szx: Optional[torch.Tensor] = None, szy: Optional[torch.Tensor] = None) → torch.Tensor[source]

Composes 3d affine matrix from the components.

Parameters
  • translations (torch.Tensor) – tensor containing the translation vector (dx,dy,dz) with shape \((B, 3)\).

  • center (torch.Tensor) – tensor containing the center vector (x,y,z) with shape \((B, 3)\).

  • scale (torch.Tensor) – tensor containing the scale factor with shape \((B)\).

  • angles (torch.Tensor) – angle axis vector containing the rotation angles in degrees in the form of (rx, ry, rz) with shape \((B, 3)\). Internally it calls Rodrigues to compute the rotation matrix from axis-angle.

  • sxy (torch.Tensor, optional) – tensor containing the shear factor in the xy-direction with shape \((B)\).

  • sxz (torch.Tensor, optional) – tensor containing the shear factor in the xz-direction with shape \((B)\).

  • syx (torch.Tensor, optional) – tensor containing the shear factor in the yx-direction with shape \((B)\).

  • syz (torch.Tensor, optional) – tensor containing the shear factor in the yz-direction with shape \((B)\).

  • szx (torch.Tensor, optional) – tensor containing the shear factor in the zx-direction with shape \((B)\).

  • szy (torch.Tensor, optional) – tensor containing the shear factor in the zy-direction with shape \((B)\).

Returns

the 3d affine transformation matrix \((B, 4, 4)\).

Return type

torch.Tensor

Note

This function is often used in conjunction with warp_perspective().
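
Example

A shape-only sketch (an assumed example, not from the upstream docs), taking the parameter shapes listed above at face value:

>>> translations = torch.zeros(1, 3)
>>> center = torch.zeros(1, 3)
>>> scale = torch.ones(1)  # shape (B), per the parameter list above
>>> angles = torch.zeros(1, 3)
>>> M = get_affine_matrix3d(translations, center, scale, angles)
>>> print(M.shape)
torch.Size([1, 4, 4])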

invert_affine_transform(matrix: torch.Tensor) → torch.Tensor[source]

Inverts an affine transformation.

The function computes an inverse affine transformation represented by a 2×3 matrix:

\[\begin{split}\begin{bmatrix} a_{11} & a_{12} & b_{1} \\ a_{21} & a_{22} & b_{2} \\ \end{bmatrix}\end{split}\]

The result is also a 2×3 matrix of the same type as the input.

Parameters

matrix (torch.Tensor) – original affine transform. The tensor must be in the shape of \((B, 2, 3)\).

Returns

the reverse affine transform with shape \((B, 2, 3)\).

Return type

torch.Tensor

Note

This function is often used in conjunction with warp_affine().
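
Example

A minimal sketch (an assumed example, not from the upstream docs): the identity affine transform is its own inverse.

>>> M = torch.eye(2, 3)[None]  # 1x2x3 identity affine transform
>>> M_inv = invert_affine_transform(M)
>>> torch.allclose(M, M_inv)
True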

projection_from_Rt(rmat: torch.Tensor, tvec: torch.Tensor) → torch.Tensor[source]

Compute the projection matrix from rotation and translation.

Warning

This API signature is experimental and might change in the future.

Concatenates the batch of rotations and translations such that \(P = [R | t]\).

Parameters
  • rmat (torch.Tensor) – the rotation matrix with shape \((*, 3, 3)\).

  • tvec (torch.Tensor) – the translation vector with shape \((*, 3, 1)\).

Returns

the projection matrix with shape \((*, 3, 4)\).

Return type

torch.Tensor
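
Example

A minimal sketch (an assumed example, not from the upstream docs), concatenating an identity rotation with a zero translation:

>>> rmat = torch.eye(3)[None]  # (1, 3, 3)
>>> tvec = torch.zeros(1, 3, 1)  # (1, 3, 1)
>>> P = projection_from_Rt(rmat, tvec)
>>> print(P.shape)
torch.Size([1, 3, 4])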

get_tps_transform(points_src: torch.Tensor, points_dst: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Compute the TPS transform parameters that warp source points to target points.

The input to this function is a tensor of \((x, y)\) source points \((B, N, 2)\) and a corresponding tensor of target \((x, y)\) points \((B, N, 2)\).

Parameters
  • points_src (torch.Tensor) – batch of source points \((B, N, 2)\) as \((x, y)\) coordinate vectors.

  • points_dst (torch.Tensor) – batch of target points \((B, N, 2)\) as \((x, y)\) coordinate vectors.

Returns

A tuple of a \((B, N, 2)\) tensor of kernel weights and a \((B, 3, 2)\) tensor of affine weights. The last dimension contains the x-transform and y-transform weights as separate columns.

Return type

Tuple[torch.Tensor, torch.Tensor]

Example

>>> points_src = torch.rand(1, 5, 2)
>>> points_dst = torch.rand(1, 5, 2)
>>> kernel_weights, affine_weights = get_tps_transform(points_src, points_dst)

Note

This function is often used in conjunction with warp_points_tps(), warp_image_tps().

normalize_homography(dst_pix_trans_src_pix: torch.Tensor, dsize_src: Tuple[int, int], dsize_dst: Tuple[int, int]) → torch.Tensor[source]

Normalize a given homography in pixels to [-1, 1].

Parameters
  • dst_pix_trans_src_pix (torch.Tensor) – homography/ies from source to destination to be normalized. \((B, 3, 3)\)

  • dsize_src (tuple) – size of the source image (height, width).

  • dsize_dst (tuple) – size of the destination image (height, width).

Returns

the normalized homography of shape \((B, 3, 3)\).

Return type

torch.Tensor

denormalize_homography(dst_pix_trans_src_pix: torch.Tensor, dsize_src: Tuple[int, int], dsize_dst: Tuple[int, int]) → torch.Tensor[source]

De-normalize a given homography in pixels from [-1, 1] to actual height and width.

Parameters
  • dst_pix_trans_src_pix (torch.Tensor) – homography/ies from source to destination to be denormalized. \((B, 3, 3)\)

  • dsize_src (tuple) – size of the source image (height, width).

  • dsize_dst (tuple) – size of the destination image (height, width).

Returns

the denormalized homography of shape \((B, 3, 3)\).

Return type

torch.Tensor
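
Example

A round-trip sketch (an assumed example, not from the upstream docs): denormalizing with the same image sizes should recover the original homography.

>>> H = torch.eye(3)[None]
>>> H_norm = normalize_homography(H, (32, 32), (16, 16))
>>> H_pix = denormalize_homography(H_norm, (32, 32), (16, 16))
>>> torch.allclose(H, H_pix)
True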

Crop operators

crop_by_boxes(tensor: torch.Tensor, src_box: torch.Tensor, dst_box: torch.Tensor, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None) → torch.Tensor[source]

Perform crop transform on 2D images (4D tensor) given two bounding boxes.

Given an input tensor, this function selects the regions of interest using the provided bounding boxes (src_box), then fits the selected regions into the target bounding boxes (dst_box) by a perspective transformation. Since PyTorch does not yet support ragged tensors, the bounding boxes in a batch must be rectangles with the same width and height.

Parameters
  • tensor (torch.Tensor) – the 2D image tensor with shape (B, C, H, W).

  • src_box (torch.Tensor) – a tensor with shape \((B, 4, 2)\) containing the coordinates of the bounding boxes to be extracted, where each box is defined in the clockwise order: top-left, top-right, bottom-right and bottom-left. The coordinates must be in x, y order.

  • dst_box (torch.Tensor) – a tensor with shape \((B, 4, 2)\) containing the coordinates of the bounding boxes to be placed, where each box is defined in the clockwise order: top-left, top-right, bottom-right and bottom-left. The coordinates must be in x, y order.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – mode for grid_generation. Default: None.

Returns

the output tensor with patches.

Return type

torch.Tensor

Examples

>>> input = torch.arange(16, dtype=torch.float32).reshape((1, 1, 4, 4))
>>> src_box = torch.tensor([[
...     [1., 1.],
...     [2., 1.],
...     [2., 2.],
...     [1., 2.],
... ]])  # 1x4x2
>>> dst_box = torch.tensor([[
...     [0., 0.],
...     [1., 0.],
...     [1., 1.],
...     [0., 1.],
... ]])  # 1x4x2
>>> crop_by_boxes(input, src_box, dst_box, align_corners=True)
tensor([[[[ 5.0000,  6.0000],
          [ 9.0000, 10.0000]]]])

Note

If the src_box is smaller than dst_box, the following error will be thrown: RuntimeError: solve_cpu: For batch 0: U(2,2) is zero, singular U.

center_crop(tensor: torch.Tensor, size: Tuple[int, int], mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None) → torch.Tensor[source]

Crop the 2D images (4D tensor) from the center.

Parameters
  • tensor (torch.Tensor) – the 2D image tensor with shape (B, C, H, W).

  • size (Tuple[int, int]) – a tuple with the expected height and width of the output patch.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – mode for grid_generation. Default: None.

Returns

the output tensor with patches.

Return type

torch.Tensor

Examples

>>> input = torch.tensor([[[
...     [1., 2., 3., 4.],
...     [5., 6., 7., 8.],
...     [9., 10., 11., 12.],
...     [13., 14., 15., 16.],
...  ]]])
>>> center_crop(input, (2, 4), mode='nearest', align_corners=True)
tensor([[[[ 5.,  6.,  7.,  8.],
          [ 9., 10., 11., 12.]]]])
crop_and_resize(tensor: torch.Tensor, boxes: torch.Tensor, size: Tuple[int, int], mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None) → torch.Tensor[source]

Extract crops from 2D images (4D tensor) and resize given a bounding box.

Parameters
  • tensor (torch.Tensor) – the 2D image tensor with shape (B, C, H, W).

  • boxes (torch.Tensor) – a tensor with shape \((B, 4, 2)\) containing the coordinates of the bounding boxes to be extracted, where each box is defined in the following (clockwise) order: top-left, top-right, bottom-right and bottom-left. The coordinates must be in the x, y order. Each box must describe a rectangular region.

  • size (Tuple[int, int]) – a tuple with the height and width that will be used to resize the extracted patches.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – mode for grid_generation. Default: None.

Returns

tensor containing the patches with shape BxCxN1xN2, where (N1, N2) is the given size.

Return type

torch.Tensor

Example

>>> input = torch.tensor([[[
...     [1., 2., 3., 4.],
...     [5., 6., 7., 8.],
...     [9., 10., 11., 12.],
...     [13., 14., 15., 16.],
... ]]])
>>> boxes = torch.tensor([[
...     [1., 1.],
...     [2., 1.],
...     [2., 2.],
...     [1., 2.],
... ]])  # 1x4x2
>>> crop_and_resize(input, boxes, (2, 2), mode='nearest', align_corners=True)
tensor([[[[ 6.,  7.],
          [10., 11.]]]])

Bounding Box

bbox_to_mask(boxes: torch.Tensor, width: int, height: int) → torch.Tensor[source]

Convert 2D bounding boxes to masks. The covered area is filled with ones and the remaining area with zeros.

Parameters
  • boxes (torch.Tensor) – a tensor containing the coordinates of the bounding boxes to be extracted. The tensor must have the shape of Bx4x2, where each box is defined in the following (clockwise) order: top-left, top-right, bottom-right and bottom-left. The coordinates must be in the x, y order.

  • width (int) – width of the masked image.

  • height (int) – height of the masked image.

Returns

the output mask tensor.

Return type

torch.Tensor

Note

It is currently non-differentiable.

Examples

>>> boxes = torch.tensor([[
...        [1., 1.],
...        [3., 1.],
...        [3., 2.],
...        [1., 2.],
...   ]])  # 1x4x2
>>> bbox_to_mask(boxes, 5, 5)
tensor([[[0., 0., 0., 0., 0.],
         [0., 1., 1., 1., 0.],
         [0., 1., 1., 1., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]])
infer_box_shape(boxes: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]

Auto-infer the output sizes for the given 2D bounding boxes.

Parameters

boxes (torch.Tensor) – a tensor containing the coordinates of the bounding boxes to be extracted. The tensor must have the shape of Bx4x2, where each box is defined in the following (clockwise) order: top-left, top-right, bottom-right, bottom-left. The coordinates must be in the x, y order.

Returns

  • Bounding box heights, shape of \((B,)\).

  • Bounding box widths, shape of \((B,)\).

Return type

Tuple[torch.Tensor, torch.Tensor]

Example

>>> boxes = torch.tensor([[
...     [1., 1.],
...     [2., 1.],
...     [2., 2.],
...     [1., 2.],
... ], [
...     [1., 1.],
...     [3., 1.],
...     [3., 2.],
...     [1., 2.],
... ]])  # 2x4x2
>>> infer_box_shape(boxes)
(tensor([2., 2.]), tensor([2., 3.]))
validate_bboxes(boxes: torch.Tensor) → bool[source]

Validate whether 2D bounding boxes are usable.

This function checks that the boxes are rectangular.

Parameters

boxes (torch.Tensor) – a tensor containing the coordinates of the bounding boxes to be extracted. The tensor must have the shape of Bx4x2, where each box is defined in the following (clockwise) order: top-left, top-right, bottom-right, bottom-left. The coordinates must be in the x, y order.
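
Example

A usage sketch (an assumed example, not from the upstream docs) with an axis-aligned rectangular box:

>>> boxes = torch.tensor([[
...     [1., 1.],
...     [3., 1.],
...     [3., 2.],
...     [1., 2.],
... ]])  # 1x4x2
>>> valid = validate_bboxes(boxes)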

Image 2d transforms

affine(tensor: torch.Tensor, matrix: torch.Tensor, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None) → torch.Tensor[source]

Apply an affine transformation to the image.

Parameters
  • tensor (torch.Tensor) – The image tensor to be warped in shapes of \((H, W)\), \((D, H, W)\) and \((B, C, H, W)\).

  • matrix (torch.Tensor) – The 2x3 affine transformation matrix.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – interpolation flag. Default: None.

Returns

The warped image with the same shape as the input.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 2, 3, 5)
>>> aff = torch.eye(2, 3)[None]
>>> out = affine(img, aff)
>>> print(out.shape)
torch.Size([1, 2, 3, 5])
rotate(tensor: torch.Tensor, angle: torch.Tensor, center: Union[None, torch.Tensor] = None, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None) → torch.Tensor[source]

Rotate the tensor anti-clockwise about the centre.

Parameters
  • tensor (torch.Tensor) – The image tensor to be warped in shapes of \((B, C, H, W)\).

  • angle (torch.Tensor) – The angle through which to rotate. The tensor must have a shape of (B), where B is batch size.

  • center (torch.Tensor) – The center through which to rotate. The tensor must have a shape of (B, 2), where B is batch size and last dimension contains cx and cy.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – interpolation flag. Default: None.

Returns

The rotated tensor with the same shape as the input.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> angle = torch.tensor([90.])
>>> out = rotate(img, angle)
>>> print(out.shape)
torch.Size([1, 3, 4, 4])
translate(tensor: torch.Tensor, translation: torch.Tensor, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None) → torch.Tensor[source]

Translate the tensor in pixel units.

Parameters
  • tensor (torch.Tensor) – The image tensor to be warped in shapes of \((B, C, H, W)\).

  • translation (torch.Tensor) – tensor containing the amount of pixels to translate in the x and y direction. The tensor must have a shape of (B, 2), where B is batch size, last dimension contains dx dy.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – interpolation flag. Default: None.

Returns

The translated tensor with the same shape as the input.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> translation = torch.tensor([[1., 0.]])
>>> out = translate(img, translation)
>>> print(out.shape)
torch.Size([1, 3, 4, 4])
scale(tensor: torch.Tensor, scale_factor: torch.Tensor, center: Union[None, torch.Tensor] = None, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None) → torch.Tensor[source]

Scale the tensor by a factor.

Parameters
  • tensor (torch.Tensor) – The image tensor to be warped in shapes of \((B, C, H, W)\).

  • scale_factor (torch.Tensor) – The scale factor to apply. The tensor must have a shape of (B) or (B, 2), where B is batch size. If (B), isotropic scaling is performed. If (B, 2), scaling is applied separately to the x- and y-directions.

  • center (torch.Tensor) – The center through which to scale. The tensor must have a shape of (B, 2), where B is batch size and last dimension contains cx and cy.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – interpolation flag. Default: None.

Returns

The scaled tensor with the same shape as the input.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> scale_factor = torch.tensor([[2., 2.]])
>>> out = scale(img, scale_factor)
>>> print(out.shape)
torch.Size([1, 3, 4, 4])
shear(tensor: torch.Tensor, shear: torch.Tensor, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: bool = False) → torch.Tensor[source]

Shear the tensor.

Parameters
  • tensor (torch.Tensor) – The image tensor to be skewed with shape of \((B, C, H, W)\).

  • shear (torch.Tensor) – tensor containing the angle to shear in the x and y direction. The tensor must have a shape of (B, 2), where B is batch size, last dimension contains shx shy.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool) – interpolation flag. Default: False.

Returns

The skewed tensor with the same shape as the input.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> shear_factor = torch.tensor([[0.5, 0.0]])
>>> out = shear(img, shear_factor)
>>> print(out.shape)
torch.Size([1, 3, 4, 4])
hflip(input: torch.Tensor) → torch.Tensor[source]

Horizontally flip a tensor image or a batch of tensor images. Input must be a tensor of shape (C, H, W) or a batch of tensors \((*, C, H, W)\).

Parameters

input (torch.Tensor) – input tensor

Returns

The horizontally flipped image tensor

Return type

torch.Tensor
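
Example

For illustration, this mirrors the Hflip module example further below:

>>> input = torch.tensor([[[
...    [0., 0., 0.],
...    [0., 0., 0.],
...    [0., 1., 1.]
... ]]])
>>> hflip(input)
tensor([[[[0., 0., 0.],
          [0., 0., 0.],
          [1., 1., 0.]]]])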

vflip(input: torch.Tensor) → torch.Tensor[source]

Vertically flip a tensor image or a batch of tensor images. Input must be a tensor of shape (C, H, W) or a batch of tensors \((*, C, H, W)\).

Parameters

input (torch.Tensor) – input tensor

Returns

The vertically flipped image tensor

Return type

torch.Tensor

rot180(input: torch.Tensor) → torch.Tensor[source]

Rotate a tensor image or a batch of tensor images 180 degrees. Input must be a tensor of shape (C, H, W) or a batch of tensors \((*, C, H, W)\).

Parameters

input (torch.Tensor) – input tensor

Returns

The rotated image tensor

Return type

torch.Tensor

resize(input: torch.Tensor, size: Union[int, Tuple[int, int]], interpolation: str = 'bilinear', align_corners: Optional[bool] = None, side: str = 'short', antialias: bool = False) → torch.Tensor[source]

Resize the input torch.Tensor to the given size.

Parameters
  • input (torch.Tensor) – The image tensor to be resized with shape \((..., H, W)\), where ... means any number of leading dimensions.

  • size (int, tuple(int, int)) – Desired output size. If size is a sequence like (h, w), the output size will be matched to it. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size).

  • interpolation (str) – algorithm used for upsampling: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’. Default: ‘bilinear’.

  • align_corners (bool) – interpolation flag. Default: None. See https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.interpolate for detail

  • side (str) – Corresponding side if size is an integer. Can be one of "short", "long", "vert", or "horz". Defaults to "short".

  • antialias (bool) – if True, then image will be filtered with Gaussian before downscaling. No effect for upscaling. Default: False

Returns

The resized tensor with the shape of the given size.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> out = resize(img, (6, 8))
>>> print(out.shape)
torch.Size([1, 3, 6, 8])
rescale(input: torch.Tensor, factor: Union[float, Tuple[float, float]], interpolation: str = 'bilinear', align_corners: Optional[bool] = None, antialias: bool = False) → torch.Tensor[source]

Rescale the input torch.Tensor with the given factor.

Parameters
  • input (torch.Tensor) – The image tensor to be scaled with shape of \((B, C, H, W)\).

  • factor (float, tuple(float, float)) – Desired scaling factor in each direction. If scalar, the value is used for both the x- and y-direction.

  • interpolation (str) – algorithm used for upsampling: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’. Default: ‘bilinear’.

  • align_corners (bool) – interpolation flag. Default: None. See https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.interpolate for detail.

  • antialias (bool) – if True, then image will be filtered with Gaussian before downscaling. No effect for upscaling. Default: False

Returns

The rescaled tensor with shape according to the given factor.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> out = rescale(img, (2, 3))
>>> print(out.shape)
torch.Size([1, 3, 8, 12])
elastic_transform2d(image: torch.Tensor, noise: torch.Tensor, kernel_size: Tuple[int, int] = (63, 63), sigma: Tuple[float, float] = (32.0, 32.0), alpha: Tuple[float, float] = (1.0, 1.0), align_corners: bool = False, mode: str = 'bilinear') → torch.Tensor[source]

Applies elastic transform of images as described in [SSP03].

Parameters
  • image (torch.Tensor) – Input image to be transformed with shape \((B, C, H, W)\).

  • noise (torch.Tensor) – Noise image used to spatially transform the input image, with the same resolution as the input image and shape \((B, 2, H, W)\). The coordinate order is expected to be x-y.

  • kernel_size (Tuple[int, int]) – the size of the Gaussian kernel. Default: (63, 63).

  • sigma (Tuple[float, float]) – The standard deviation of the Gaussian in the y and x directions, respectively. Larger sigma results in smaller pixel displacements. Default: (32.0, 32.0).

  • alpha (Tuple[float, float]) – The scaling factor that controls the intensity of the deformation in the y and x directions, respectively. Default: (1.0, 1.0).

  • align_corners (bool) – Interpolation flag used by grid_sample. Default: False.

  • mode (str) – Interpolation mode used by grid_sample. Either ‘bilinear’ or ‘nearest’. Default: ‘bilinear’.

Returns

the elastically transformed input image with shape \((B,C,H,W)\).

Return type

torch.Tensor

Example

>>> image = torch.rand(1, 3, 5, 5)
>>> noise = torch.rand(1, 2, 5, 5, requires_grad=True)
>>> image_hat = elastic_transform2d(image, noise, (3, 3))
>>> image_hat.mean().backward()
>>> image = torch.rand(1, 3, 5, 5)
>>> noise = torch.rand(1, 2, 5, 5)
>>> sigma = torch.tensor([4., 4.], requires_grad=True)
>>> image_hat = elastic_transform2d(image, noise, (3, 3), sigma)
>>> image_hat.mean().backward()
>>> image = torch.rand(1, 3, 5, 5)
>>> noise = torch.rand(1, 2, 5, 5)
>>> alpha = torch.tensor([16., 32.], requires_grad=True)
>>> image_hat = elastic_transform2d(image, noise, (3, 3), alpha=alpha)
>>> image_hat.mean().backward()
pyrdown(input: torch.Tensor, border_type: str = 'reflect', align_corners: bool = False) → torch.Tensor[source]

Blurs a tensor and downsamples it.

Parameters
  • input (torch.Tensor) – the tensor to be downsampled.

  • border_type (str) – the padding mode to be applied before convolving. The expected modes are: 'constant', 'reflect', 'replicate' or 'circular'. Default: 'reflect'.

  • align_corners (bool) – interpolation flag. Default: False. See https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.interpolate for detail.

Returns

the downsampled tensor.

Return type

torch.Tensor

Examples

>>> input = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
>>> pyrdown(input, align_corners=True)
tensor([[[[ 3.7500,  5.2500],
          [ 9.7500, 11.2500]]]])
pyrup(input: torch.Tensor, border_type: str = 'reflect', align_corners: bool = False) → torch.Tensor[source]

Upsamples a tensor and then blurs it.

Parameters
  • input (torch.Tensor) – the tensor to be upsampled.

  • border_type (str) – the padding mode to be applied before convolving. The expected modes are: 'constant', 'reflect', 'replicate' or 'circular'. Default: 'reflect'.

  • align_corners (bool) – interpolation flag. Default: False. See https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.interpolate for detail.

Returns

the upsampled tensor.

Return type

torch.Tensor

Examples

>>> input = torch.arange(4, dtype=torch.float32).reshape(1, 1, 2, 2)
>>> pyrup(input, align_corners=True)
tensor([[[[0.7500, 0.8750, 1.1250, 1.2500],
          [1.0000, 1.1250, 1.3750, 1.5000],
          [1.5000, 1.6250, 1.8750, 2.0000],
          [1.7500, 1.8750, 2.1250, 2.2500]]]])
build_pyramid(input: torch.Tensor, max_level: int, border_type: str = 'reflect', align_corners: bool = False) → List[torch.Tensor][source]

Constructs the Gaussian pyramid for an image.

The function constructs a vector of images and builds the Gaussian pyramid by recursively applying pyrDown to the previously built pyramid layers.

Parameters
  • input (torch.Tensor) – the tensor to be used to construct the pyramid.

  • max_level (int) – 0-based index of the last (the smallest) pyramid layer. It must be non-negative.

  • border_type (str) – the padding mode to be applied before convolving. The expected modes are: 'constant', 'reflect', 'replicate' or 'circular'. Default: 'reflect'.

  • align_corners (bool) – interpolation flag. Default: False. See https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.interpolate for detail.

Shape:
  • Input: \((B, C, H, W)\)

  • Output: \([(B, C, H, W), (B, C, H/2, W/2), ...]\)
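
Example

A shape sketch (an assumed example, not from the upstream docs), assuming the returned list holds max_level images ordered from finest to coarsest:

>>> input = torch.rand(1, 2, 8, 8)
>>> pyramid = build_pyramid(input, max_level=3)
>>> [p.shape for p in pyramid]
[torch.Size([1, 2, 8, 8]), torch.Size([1, 2, 4, 4]), torch.Size([1, 2, 2, 2])]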

Module

class Rotate(angle: torch.Tensor, center: Union[None, torch.Tensor] = None, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None)[source]

Rotate the tensor anti-clockwise about the centre.

Parameters
  • angle (torch.Tensor) – The angle through which to rotate. The tensor must have a shape of (B), where B is batch size.

  • center (torch.Tensor) – The center through which to rotate. The tensor must have a shape of (B, 2), where B is batch size and last dimension contains cx and cy.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – interpolation flag. Default: None.

Returns

The rotated tensor with the same shape as the input.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> angle = torch.tensor([90.])
>>> out = Rotate(angle)(img)
>>> print(out.shape)
torch.Size([1, 3, 4, 4])
class Translate(translation: torch.Tensor, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None)[source]

Translate the tensor in pixel units.

Parameters
  • translation (torch.Tensor) – tensor containing the amount of pixels to translate in the x and y direction. The tensor must have a shape of (B, 2), where B is batch size, last dimension contains dx dy.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – interpolation flag. Default: None.

Returns

The translated tensor with the same shape as the input.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> translation = torch.tensor([[1., 0.]])
>>> out = Translate(translation)(img)
>>> print(out.shape)
torch.Size([1, 3, 4, 4])
class Scale(scale_factor: torch.Tensor, center: Union[None, torch.Tensor] = None, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None)[source]

Scale the tensor by a factor.

Parameters
  • scale_factor (torch.Tensor) – The scale factor to apply. The tensor must have a shape of (B) or (B, 2), where B is batch size. If (B), isotropic scaling is performed. If (B, 2), scaling is applied separately to the x- and y-directions.

  • center (torch.Tensor) – The center through which to scale. The tensor must have a shape of (B, 2), where B is batch size and last dimension contains cx and cy.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – interpolation flag. Default: None.

Returns

The scaled tensor with the same shape as the input.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> scale_factor = torch.tensor([[2., 2.]])
>>> out = Scale(scale_factor)(img)
>>> print(out.shape)
torch.Size([1, 3, 4, 4])
class Shear(shear: torch.Tensor, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: bool = False)[source]

Shear the tensor.

Parameters
  • shear (torch.Tensor) – tensor containing the angle to shear in the x and y direction. The tensor must have a shape of (B, 2), where B is batch size, last dimension contains shx shy.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – interpolation flag. Default: False.

Returns

The skewed tensor with the same shape as the input.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> shear_factor = torch.tensor([[0.5, 0.0]])
>>> out = Shear(shear_factor)(img)
>>> print(out.shape)
torch.Size([1, 3, 4, 4])
class PyrDown(border_type: str = 'reflect', align_corners: bool = False)[source]

Blurs a tensor and downsamples it.

Parameters
  • border_type (str) – the padding mode to be applied before convolving. The expected modes are: 'constant', 'reflect', 'replicate' or 'circular'. Default: 'reflect'.

  • align_corners (bool) – interpolation flag. Default: False. See https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.interpolate for detail.

Returns

the downsampled tensor.

Return type

torch.Tensor

Shape:
  • Input: \((B, C, H, W)\)

  • Output: \((B, C, H / 2, W / 2)\)

Examples

>>> input = torch.rand(1, 2, 4, 4)
>>> output = PyrDown()(input)  # 1x2x2x2
class PyrUp(border_type: str = 'reflect', align_corners: bool = False)[source]

Upsamples a tensor and then blurs it.

Parameters
  • border_type (str) – the padding mode to be applied before convolving. The expected modes are: 'constant', 'reflect', 'replicate' or 'circular'. Default: 'reflect'.

  • align_corners (bool) – interpolation flag. Default: False. See https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.interpolate for detail.

Returns

the upsampled tensor.

Return type

torch.Tensor

Shape:
  • Input: \((B, C, H, W)\)

  • Output: \((B, C, H * 2, W * 2)\)

Examples

>>> input = torch.rand(1, 2, 4, 4)
>>> output = PyrUp()(input)  # 1x2x8x8
class ScalePyramid(n_levels: int = 3, init_sigma: float = 1.6, min_size: int = 15, double_image: bool = False)[source]

Creates a scale pyramid of an image, usually used for local feature detection. Images are successively smoothed with Gaussian blur and downscaled.

Parameters
  • n_levels (int) – number of levels per octave. Default: 3.

  • init_sigma (float) – initial blur level. Default: 1.6.

  • min_size (int) – the minimum size of the octave in pixels. Default: 15.

  • double_image (bool) – add a 2x upscaled image as the 1st level of the pyramid. OpenCV SIFT does this. Default: False.

Returns

  • 1st output: images

  • 2nd output: sigmas (coefficients for scale conversion)

  • 3rd output: pixelDists (coefficients for coordinate conversion)

Return type

Tuple(List(Tensors), List(Tensors), List(Tensors))

Shape:
  • Input: \((B, C, H, W)\)

  • Output 1st: \([(B, C, NL, H, W), (B, C, NL, H/2, W/2), ...]\)

  • Output 2nd: \([(B, NL), (B, NL), (B, NL), ...]\)

  • Output 3rd: \([(B, NL), (B, NL), (B, NL), ...]\)

Examples
>>> input = torch.rand(2, 4, 100, 100)
>>> sp, sigmas, pds = ScalePyramid(3, 15)(input)
class Hflip[source]

Horizontally flip a tensor image or a batch of tensor images. Input must be a tensor of shape (C, H, W) or a batch of tensors \((*, C, H, W)\).

Parameters

input (torch.Tensor) – input tensor

Returns

The horizontally flipped image tensor

Return type

torch.Tensor

Examples

>>> hflip = Hflip()
>>> input = torch.tensor([[[
...    [0., 0., 0.],
...    [0., 0., 0.],
...    [0., 1., 1.]
... ]]])
>>> hflip(input)
tensor([[[[0., 0., 0.],
          [0., 0., 0.],
          [1., 1., 0.]]]])
class Vflip[source]

Vertically flip a tensor image or a batch of tensor images. Input must be a tensor of shape (C, H, W) or a batch of tensors \((*, C, H, W)\).

Parameters

input (torch.Tensor) – input tensor

Returns

The vertically flipped image tensor

Return type

torch.Tensor

Examples

>>> vflip = Vflip()
>>> input = torch.tensor([[[
...    [0., 0., 0.],
...    [0., 0., 0.],
...    [0., 1., 1.]
... ]]])
>>> vflip(input)
tensor([[[[0., 1., 1.],
          [0., 0., 0.],
          [0., 0., 0.]]]])
class Rot180[source]

Rotate a tensor image or a batch of tensor images 180 degrees. Input must be a tensor of shape (C, H, W) or a batch of tensors \((*, C, H, W)\).

Parameters

input (torch.Tensor) – input tensor

Examples

>>> rot180 = Rot180()
>>> input = torch.tensor([[[
...    [0., 0., 0.],
...    [0., 0., 0.],
...    [0., 1., 1.]
... ]]])
>>> rot180(input)
tensor([[[[1., 1., 0.],
          [0., 0., 0.],
          [0., 0., 0.]]]])
class Resize(size: Union[int, Tuple[int, int]], interpolation: str = 'bilinear', align_corners: Optional[bool] = None, side: str = 'short', antialias: bool = False)[source]

Resize the input torch.Tensor to the given size.

Parameters
  • size (int, tuple(int, int)) – Desired output size. If size is a sequence like (h, w), the output size will be matched to it. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size).

  • interpolation (str) – algorithm used for upsampling: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’. Default: ‘bilinear’.

  • align_corners (bool) – interpolation flag. Default: None. See https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.interpolate for detail

  • side (str) – Corresponding side if size is an integer. Can be one of "short", "long", "vert", or "horz". Defaults to "short".

  • antialias (bool) – if True, then image will be filtered with Gaussian before downscaling. No effect for upscaling. Default: False

Returns

The resized tensor with the shape of the given size.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> out = Resize((6, 8))(img)
>>> print(out.shape)
torch.Size([1, 3, 6, 8])
class Rescale(factor: Union[float, Tuple[float, float]], interpolation: str = 'bilinear', align_corners: Optional[bool] = None, antialias: bool = False)[source]

Rescale the input torch.Tensor with the given factor.

Parameters
  • factor (float, tuple(float, float)) – Desired scaling factor in each direction. If scalar, the value is used for both the x- and y-direction.

  • interpolation (str) – Algorithm used for upsampling. Can be one of "nearest", "linear", "bilinear", "bicubic", "trilinear", or "area". Default: "bilinear".

  • align_corners (bool) – Interpolation flag. Default: None. See interpolate() for details.

  • antialias (bool) – if True, then image will be filtered with Gaussian before downscaling. No effect for upscaling. Default: False

Returns

The rescaled tensor with the shape according to the given factor.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 3, 4, 4)
>>> out = Rescale((2, 3))(img)
>>> print(out.shape)
torch.Size([1, 3, 8, 12])
class Affine(angle: Optional[torch.Tensor] = None, translation: Optional[torch.Tensor] = None, scale_factor: Optional[torch.Tensor] = None, shear: Optional[torch.Tensor] = None, center: Optional[torch.Tensor] = None, mode: str = 'bilinear', padding_mode: str = 'zeros', align_corners: Optional[bool] = None)[source]

Apply multiple elementary affine transforms simultaneously.

Parameters
  • angle (torch.Tensor, optional) – Angle in degrees for counter-clockwise rotation around the center. The tensor must have a shape of (B), where B is the batch size.

  • translation (torch.Tensor, optional) – Amount of pixels for translation in x- and y-direction. The tensor must have a shape of (B, 2), where B is the batch size and the last dimension contains dx and dy.

  • scale_factor (torch.Tensor, optional) – Factor for scaling. The tensor must have a shape of (B), where B is the batch size.

  • shear (torch.Tensor, optional) – Angles in degrees for shearing in x- and y-direction around the center. The tensor must have a shape of (B, 2), where B is the batch size and the last dimension contains sx and sy.

  • center (torch.Tensor, optional) – Transformation center in pixels. The tensor must have a shape of (B, 2), where B is the batch size and the last dimension contains cx and cy. Defaults to the center of image to be transformed.

  • mode (str) – interpolation mode to calculate output values ‘bilinear’ | ‘nearest’. Default: ‘bilinear’.

  • padding_mode (str) – padding mode for outside grid values ‘zeros’ | ‘border’ | ‘reflection’. Default: ‘zeros’.

  • align_corners (bool, optional) – interpolation flag. Default: None.

Raises

RuntimeError – If none of angle, translation, scale_factor, or shear is set.

Returns

The transformed tensor with same shape as input.

Return type

torch.Tensor

Example

>>> img = torch.rand(1, 2, 3, 5)
>>> angle = 90. * torch.rand(1)
>>> out = Affine(angle)(img)
>>> print(out.shape)
torch.Size([1, 2, 3, 5])