kornia.geometry.boxes

Module with useful functionalities for 2D and 3D bounding boxes manipulation.

class kornia.geometry.boxes.Boxes(boxes, raise_if_not_floating_point=True, mode='vertices_plus')

2D boxes containing N or BxN boxes.

Parameters:
  • boxes (Tensor | list[Tensor]) – 2D boxes, shape of \((N, 4, 2)\), \((B, N, 4, 2)\) or a list of \((N, 4, 2)\). See below for more details.

  • raise_if_not_floating_point (bool, optional) – flag to control floating point casting behaviour when boxes is not a floating point tensor. True to raise an error when boxes isn’t a floating point tensor, False to cast to float. Default: True

  • mode (str, optional) – the box format of the input boxes. Default: "vertices_plus"

Note

The 2D box format is a floating-point tensor of shape Nx4x2 or BxNx4x2, where each box is a quadrilateral defined by the coordinates of its 4 vertices (A, B, C, D). Coordinates must be in x, y order. The width and height of a box are defined as width = xmax - xmin + 1 and height = ymax - ymin + 1. Examples of quadrilaterals are rectangles, rhombi and trapezoids.
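The following plain-Python sketch makes the size convention concrete. It is not kornia code, and the helper name quad_size_plus is hypothetical. It measures the extent of four (x, y) vertices using the +1 definitions above. For a non-rectangular quadrilateral it reports the extent of the vertices.

```python
# Plain-Python sketch of the "+1" size convention (not kornia's implementation).
Vertex = tuple[float, float]


def quad_size_plus(vertices: list[Vertex]) -> tuple[float, float]:
    """Return (width, height) as xmax - xmin + 1 and ymax - ymin + 1."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


# Vertices A, B, C, D of a rectangle spanning x = 5..7 and y = 1..3.
rect = [(5.0, 1.0), (7.0, 1.0), (7.0, 3.0), (5.0, 3.0)]
width, height = quad_size_plus(rect)
print(width, height)  # 3.0 3.0
```

Under this convention, a box whose four vertices coincide has width and height 1, not 0.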

clamp(topleft=None, botright=None, inplace=False)

Clamp every box vertex inside per-image coordinate limits.

Coordinates below topleft are raised to the lower bound and coordinates above botright are lowered to the upper bound. The implementation expects tensor bounds with one (x, y) pair per batch element.

Parameters:
  • topleft (Union[Tensor, tuple[int, int], None], optional) – Tensor of shape \((B, 2)\) containing the minimum x and y coordinate allowed for each batch item. Default: None

  • botright (Union[Tensor, tuple[int, int], None], optional) – Tensor of shape \((B, 2)\) containing the maximum x and y coordinate allowed for each batch item. Default: None

  • inplace (bool, optional) – If True, clamp this object in place. Otherwise, return a new Boxes object with clamped data. Default: False

Return type:

Boxes

Returns:

Boxes whose vertex coordinates are restricted to the provided bounds.
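Per-image clamping is an elementwise min/max on each vertex. The sketch below uses nested Python lists instead of tensors, with one (x, y) bound pair per batch item as described above. The name clamp_vertices is hypothetical, and the real method works on batched tensors.

```python
# Plain-Python sketch of per-batch vertex clamping (illustrative, not kornia code).
Point = tuple[float, float]


def clamp_vertices(batch: list[list[list[Point]]],
                   topleft: list[Point],
                   botright: list[Point]) -> list[list[list[Point]]]:
    """Clamp every vertex of batch item b into the rectangle [topleft[b], botright[b]]."""
    clamped = []
    for boxes, (x_lo, y_lo), (x_hi, y_hi) in zip(batch, topleft, botright):
        clamped.append([
            [(min(max(x, x_lo), x_hi), min(max(y, y_lo), y_hi)) for x, y in box]
            for box in boxes
        ])
    return clamped


# One image (B = 1) holding one box (N = 1) that sticks out of a 5x5 image.
batch = [[[(-2.0, 1.0), (6.0, 1.0), (6.0, 9.0), (-2.0, 9.0)]]]
print(clamp_vertices(batch, topleft=[(0.0, 0.0)], botright=[(4.0, 4.0)]))
# [[[(0.0, 1.0), (4.0, 1.0), (4.0, 4.0), (0.0, 4.0)]]]
```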

clone()

Create an independent copy of the box container.

Return type:

Boxes

Returns:

New Boxes object with cloned tensor storage. Metadata such as the current mode, original list lengths, and batched flag is preserved.

compute_area()

Compute the area of each box from its four vertices. The result has shape \((B, N)\).

Return type:

Tensor
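The docstring gives only the output shape. One standard way to compute the area of a quadrilateral from four vertices listed in order around its boundary is the shoelace formula. The plain-Python sketch below shows that technique. It is an assumption, not a copy of kornia's implementation. It measures the polygon spanned by the stored coordinates and does not add the +1 used for widths and heights.

```python
# Shoelace formula for a simple polygon whose vertices are listed in boundary order.
Point = tuple[float, float]


def polygon_area(vertices: list[Point]) -> float:
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


rectangle = [(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)]
trapezoid = [(0.0, 0.0), (4.0, 0.0), (3.0, 2.0), (1.0, 2.0)]
print(polygon_area(rectangle), polygon_area(trapezoid))  # 6.0 6.0
```

The absolute value makes the result independent of whether the vertices run clockwise or counter-clockwise.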

property data: Tensor

Return the raw quadrilateral coordinate tensor.

Returns:

Tensor storing four vertices per box in (x, y) order. The common shapes are \((N, 4, 2)\) for unbatched boxes and \((B, N, 4, 2)\) for batched boxes, where \(B\) is batch size and \(N\) is the number of boxes.

property device: device

Returns boxes device.

property dtype: dtype

Returns boxes dtype.

filter_boxes_by_area(min_area=None, max_area=None, inplace=False)

Zero out boxes whose polygon area is outside the requested range.

The box area is computed from its four vertices. Boxes smaller than min_area or larger than max_area are not dropped from the tensor; their coordinates are replaced with zeros so the original batch and box dimensions stay unchanged.

Parameters:
  • min_area (Optional[float], optional) – Optional lower inclusive area threshold. Boxes with area below this value are zeroed. Default: None

  • max_area (Optional[float], optional) – Optional upper inclusive area threshold. Boxes with area above this value are zeroed. Default: None

  • inplace (bool, optional) – If True, update this object in place. Otherwise, return a filtered clone. Default: False

Return type:

Boxes

Returns:

Boxes with the same shape as the input container and out-of-range boxes replaced by zero coordinates.
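The zero-instead-of-drop behaviour can be sketched in plain Python. The helper below is hypothetical, not kornia code. It takes precomputed areas and returns a list of the same length, replacing each out-of-range box with four (0, 0) vertices. The inclusive thresholds follow the parameter descriptions above.

```python
from typing import Optional

Point = tuple[float, float]
ZERO_BOX: list[Point] = [(0.0, 0.0)] * 4


def zero_boxes_by_area(boxes: list[list[Point]], areas: list[float],
                       min_area: Optional[float] = None,
                       max_area: Optional[float] = None) -> list[list[Point]]:
    """Keep the number of boxes fixed; zero the ones whose area is out of range."""
    result = []
    for box, area in zip(boxes, areas):
        too_small = min_area is not None and area < min_area
        too_large = max_area is not None and area > max_area
        result.append(list(ZERO_BOX) if too_small or too_large else box)
    return result


boxes = [
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    [(0.0, 0.0), (3.0, 0.0), (3.0, 2.0), (0.0, 2.0)],
    [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)],
]
filtered = zero_boxes_by_area(boxes, [1.0, 6.0, 50.0], min_area=2.0, max_area=10.0)
print(len(filtered))  # 3: the list keeps its length, only the first and last boxes are zeroed
```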

classmethod from_tensor(boxes, mode='xyxy', validate_boxes=True)

Create Boxes from boxes stored in another format.

Parameters:
  • boxes (Tensor | list[Tensor]) – 2D boxes, shape of \((N, 4)\), \((B, N, 4)\), \((N, 4, 2)\) or \((B, N, 4, 2)\).

  • mode (str, optional) –

    The format in which the boxes are provided. Default: "xyxy"

    • 'xyxy': boxes are assumed to be in the format xmin, ymin, xmax, ymax where width = xmax - xmin and height = ymax - ymin. Shape \((N, 4)\) or \((B, N, 4)\).

    • 'xyxy_plus': similar to 'xyxy' mode but where box width and height are defined as width = xmax - xmin + 1 and height = ymax - ymin + 1. Shape \((N, 4)\) or \((B, N, 4)\).

    • 'xywh': boxes are assumed to be in the format xmin, ymin, width, height where width = xmax - xmin and height = ymax - ymin. Shape \((N, 4)\) or \((B, N, 4)\).

    • 'vertices': boxes are defined by their vertex points in the following clockwise order: top-left, top-right, bottom-right, bottom-left. Vertex coordinates are in (x, y) order. Box width and height are defined as width = xmax - xmin and height = ymax - ymin. Shape \((N, 4, 2)\) or \((B, N, 4, 2)\).

    • 'vertices_plus': similar to 'vertices' mode but where box width and height are defined as width = xmax - xmin + 1 and height = ymax - ymin + 1. Shape \((N, 4, 2)\) or \((B, N, 4, 2)\).

  • validate_boxes (bool, optional) – check if boxes are valid rectangles or not. Valid rectangles are those with width and height >= 1 (>= 2 when mode ends with the '_plus' suffix). Default: True

Return type:

Boxes

Returns:

A Boxes object containing the input boxes, converted from the format specified by mode.
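The doctest example for this method shows how the 4-value modes map to stored vertices. The xyxy box [5, 1, 8, 4] becomes corners spanning x = 5..7 and y = 1..3, so the stored corners appear to follow the inclusive 'vertices_plus' convention. The plain-Python sketch below reproduces that mapping. The helper to_vertices_plus is hypothetical, and validation, batching and dtype casting are omitted.

```python
# Plain-Python sketch: map a 4-value box to inclusive corners (TL, TR, BR, BL),
# assuming the stored corners follow the 'vertices_plus' convention.
Point = tuple[float, float]


def to_vertices_plus(box: list[float], mode: str = "xyxy") -> list[Point]:
    if mode == "xyxy":            # xmax, ymax are exclusive: width = xmax - xmin
        xmin, ymin, xmax, ymax = box
        x_last, y_last = xmax - 1, ymax - 1
    elif mode == "xyxy_plus":     # xmax, ymax are inclusive: width = xmax - xmin + 1
        xmin, ymin, x_last, y_last = box
    elif mode == "xywh":          # width and height are given directly
        xmin, ymin, width, height = box
        x_last, y_last = xmin + width - 1, ymin + height - 1
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    return [(xmin, ymin), (x_last, ymin), (x_last, y_last), (xmin, y_last)]


print(to_vertices_plus([5, 1, 8, 4]))  # [(5, 1), (7, 1), (7, 3), (5, 3)]
```

Under these assumptions, to_vertices_plus([5, 1, 3, 3], "xywh") and to_vertices_plus([5, 1, 7, 3], "xyxy_plus") return the same corners, so all three inputs describe the same box.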

Examples

>>> boxes_xyxy = torch.as_tensor([[0, 3, 1, 4], [5, 1, 8, 4]])
>>> boxes = Boxes.from_tensor(boxes_xyxy, mode='xyxy')
>>> boxes.data  # (2, 4, 2)
tensor([[[0., 3.],
         [0., 3.],
         [0., 3.],
         [0., 3.]],

        [[5., 1.],
         [7., 1.],
         [7., 3.],
         [5., 3.]]])

get_boxes_shape()

Compute boxes heights and widths.

Return type:

tuple[Tensor, Tensor]

Returns:

  • Boxes heights, shape of \((N,)\) or \((B,N)\).

  • Boxes widths, shape of \((N,)\) or \((B,N)\).

Example

>>> boxes_xyxy = torch.tensor([[[1,1,2,2],[1,1,3,2]]])
>>> boxes = Boxes.from_tensor(boxes_xyxy)
>>> boxes.get_boxes_shape()
(tensor([[1., 1.]]), tensor([[1., 2.]]))

index_put(indices, values, inplace=False)

Write box coordinates at selected tensor indices.

This mirrors torch.Tensor.index_put_() for the internal quadrilateral tensor. It is useful when a subset of boxes in a batch must be replaced while keeping the Boxes wrapper and metadata.

Parameters:
  • indices (tuple[Tensor, ...] | list[Tensor]) – Index tuple or list accepted by Tensor.index_put_. The indices address entries in the stored tensor, commonly shaped \((B, N, 4, 2)\) or \((N, 4, 2)\).

  • values (Tensor | Boxes) – Replacement coordinates. If a Boxes object is passed, its data tensor is used.

  • inplace (bool, optional) – If True, update this object and return self. If False, clone the current data first and return a new Boxes instance. Default: False

Return type:

Boxes

Returns:

Boxes containing the updated coordinates.

merge(boxes, inplace=False)

Merge boxes by concatenating them along the box dimension.

For example, if the current instance holds \((B, N, 4, 2)\) and the incoming boxes hold \((B, M, 4, 2)\), the merged result has shape \((B, N + M, 4, 2)\).

Parameters:
  • boxes (Boxes) – 2D boxes to append to the current ones.

  • inplace (bool, optional) – If True, merge in place and return self. Default: False

Return type:

Boxes
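In terms of shapes, merging is a per-image concatenation along the box axis. The sketch below shows that bookkeeping with nested lists. It is plain Python with a hypothetical helper, and it assumes both containers have the same batch size B, as the shape example above implies. Strings stand in for boxes.

```python
from typing import Any


def merge_per_image(current: list[list[Any]], incoming: list[list[Any]]) -> list[list[Any]]:
    """(B, N, ...) + (B, M, ...) -> (B, N + M, ...), concatenating image by image."""
    if len(current) != len(incoming):
        raise ValueError("both containers must have the same batch size B")
    return [boxes_a + boxes_b for boxes_a, boxes_b in zip(current, incoming)]


current = [["a0"], ["a1"]]                    # B = 2, N = 1
incoming = [["b0", "c0"], ["b1", "c1"]]       # B = 2, M = 2
print(merge_per_image(current, incoming))     # [['a0', 'b0', 'c0'], ['a1', 'b1', 'c1']]
```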

property mode: str

Return the box format remembered by this container.

Returns:

Mode string used as the default by to_tensor(), such as "xyxy", "xywh", "vertices", or their "_plus" variants.

pad(padding_size)

Pad a bounding box.

Parameters:

padding_size (Tensor) – per-image padding amounts, shape of \((B, 4)\).

Return type:

Boxes

property shape: tuple[int, ...] | Size

Return the tensor shape used to store the boxes.

Returns:

Shape of data. For unbatched boxes this is usually \((N, 4, 2)\), where \(N\) is the number of boxes, 4 is the number of corner vertices, and 2 stores (x, y). For batched boxes the shape is usually \((B, N, 4, 2)\), where \(B\) is the batch size.

to(device=None, dtype=None)

Move the boxes to device and/or cast them to dtype, like the torch.nn.Module.to() method.

Return type:

Boxes

to_mask(height, width)

Convert 2D boxes to masks. Covered area is 1 and the remaining is 0.

Parameters:
  • height (int) – height of the masked image/images.

  • width (int) – width of the masked image/images.

Return type:

Tensor

Returns:

The output mask tensor, shape of \((N, height, width)\) or \((B, N, height, width)\), with dtype Boxes.dtype (any floating point dtype).

Note

It is currently non-differentiable.
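For an axis-aligned box, the doctest output can be reproduced by marking every pixel whose column lies between the smallest and largest vertex x, and whose row lies between the smallest and largest vertex y, both inclusive. The plain-Python sketch below handles only that axis-aligned case, and the helper name is hypothetical. It also shows that rows index y and columns index x, which gives a height x width mask.

```python
Point = tuple[float, float]


def axis_aligned_box_to_mask(vertices: list[Point], height: int, width: int) -> list[list[float]]:
    """Rasterize an axis-aligned box with inclusive corners; mask[row][col] == mask[y][x]."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    return [
        [1.0 if x_lo <= col <= x_hi and y_lo <= row <= y_hi else 0.0 for col in range(width)]
        for row in range(height)
    ]


mask = axis_aligned_box_to_mask([(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)], 5, 5)
for row in mask:
    print(row)
# rows 1-3 are [0.0, 1.0, 1.0, 1.0, 1.0]; rows 0 and 4 are all 0.0
```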

Examples

>>> boxes = Boxes(torch.tensor([[  # Equivalent to Boxes.from_tensor(torch.tensor([[1, 1, 4, 3]]), mode='xyxy_plus')
...        [1., 1.],
...        [4., 1.],
...        [4., 3.],
...        [1., 3.],
...   ]]))  # 1x4x2
>>> boxes.to_mask(5, 5)
tensor([[[0., 0., 0., 0., 0.],
         [0., 1., 1., 1., 1.],
         [0., 1., 1., 1., 1.],
         [0., 1., 1., 1., 1.],
         [0., 0., 0., 0., 0.]]])

to_tensor(mode=None, as_padded_sequence=False)

Cast Boxes to a tensor.

mode controls which 2D box format is used to represent the boxes in the tensor.

Parameters:
  • mode (Optional[str], optional) –

    the output box format. If None, the mode stored in this container is used. Default: None. It can be one of:

    • 'xyxy': boxes are defined as xmin, ymin, xmax, ymax where width = xmax - xmin and height = ymax - ymin.

    • 'xyxy_plus': similar to 'xyxy' mode but where box width and height are defined as width = xmax - xmin + 1 and height = ymax - ymin + 1.

    • 'xywh': boxes are defined as xmin, ymin, width, height where width = xmax - xmin and height = ymax - ymin.

    • 'vertices': boxes are defined by their vertex points in the following clockwise order: top-left, top-right, bottom-right, bottom-left. Vertex coordinates are in (x, y) order. Box width and height are defined as width = xmax - xmin and height = ymax - ymin.

    • 'vertices_plus': similar to 'vertices' mode but where box width and height are defined as width = xmax - xmin + 1 and height = ymax - ymin + 1.

  • as_padded_sequence (bool, optional) – whether to keep the zero padding for a list of boxes. Only applies when the boxes were created with from_tensor() from a list of tensors. Default: False

Returns:

Boxes in the mode format. The shape depends on the mode value:

  • 'vertices' or 'vertices_plus': \((N, 4, 2)\) or \((B, N, 4, 2)\).

  • Any other value: \((N, 4)\) or \((B, N, 4)\).

Return type:

Tensor | list[Tensor]
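Going the other way, each 4-value mode can be read off the extents of the stored corners. The plain-Python sketch below assumes that the stored corners use the inclusive 'vertices_plus' convention, as the from_tensor doctest suggests. The helper from_vertices_plus is hypothetical, and batching and padded lists are ignored. For the corners of the xyxy box [5, 1, 8, 4], it returns the original values in 'xyxy' mode.

```python
# Plain-Python sketch, assuming stored corners follow the 'vertices_plus' convention.
Point = tuple[float, float]


def from_vertices_plus(vertices: list[Point], mode: str) -> list[float]:
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    xmin, ymin, x_last, y_last = min(xs), min(ys), max(xs), max(ys)
    if mode == "xyxy_plus":   # inclusive maxima
        return [xmin, ymin, x_last, y_last]
    if mode == "xyxy":        # exclusive maxima
        return [xmin, ymin, x_last + 1, y_last + 1]
    if mode == "xywh":        # width = x_last - xmin + 1
        return [xmin, ymin, x_last - xmin + 1, y_last - ymin + 1]
    raise ValueError(f"unsupported mode: {mode!r}")


corners = [(5.0, 1.0), (7.0, 1.0), (7.0, 3.0), (5.0, 3.0)]
for mode in ("xyxy", "xyxy_plus", "xywh"):
    print(mode, from_vertices_plus(corners, mode))
# xyxy [5.0, 1.0, 8.0, 4.0]
# xyxy_plus [5.0, 1.0, 7.0, 3.0]
# xywh [5.0, 1.0, 3.0, 3.0]
```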

Examples

>>> boxes_xyxy = torch.as_tensor([[0, 3, 1, 4], [5, 1, 8, 4]])
>>> boxes = Boxes.from_tensor(boxes_xyxy)
>>> assert (boxes_xyxy == boxes.to_tensor(mode='xyxy')).all()

transform_boxes(M, inplace=False)

Apply a transformation matrix to the 2D boxes.

Parameters:
  • M (Tensor) – The transformation matrix to be applied, shape of \((3, 3)\) or \((B, 3, 3)\).

  • inplace (bool, optional) – If True, transform in place and return self. Default: False

Return type:

Boxes

Returns:

The transformed boxes.
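Applying M to a box amounts to transforming each of its vertices. The plain-Python sketch below assumes that M acts on homogeneous column vectors [x, y, 1]^T and that the result is divided by its third coordinate, which is the usual convention for 3x3 planar transforms. The helper transform_vertices is hypothetical and handles a single unbatched box. After a rotation or perspective warp, the four vertices generally no longer form an axis-aligned rectangle.

```python
Point = tuple[float, float]
Matrix3 = list[list[float]]


def transform_vertices(vertices: list[Point], M: Matrix3) -> list[Point]:
    """Map each (x, y) through M in homogeneous coordinates, then dehomogenize."""
    out = []
    for x, y in vertices:
        u = M[0][0] * x + M[0][1] * y + M[0][2]
        v = M[1][0] * x + M[1][1] * y + M[1][2]
        w = M[2][0] * x + M[2][1] * y + M[2][2]
        out.append((u / w, v / w))
    return out


shift = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]  # translate by (2, 3)
box = [(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)]
print(transform_vertices(box, shift))  # [(3.0, 4.0), (6.0, 4.0), (6.0, 6.0), (3.0, 6.0)]
```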

transform_boxes_(M)

In-place version of Boxes.transform_boxes().

Return type:

Boxes

translate(size, method='warp', inplace=False)

Translate boxes by the provided size.

Parameters:
  • size (Tensor) – translation offsets in the x and y directions, shape of \((B, 2)\).

  • method (str, optional) – "warp" or "fast". Default: "warp"

  • inplace (bool, optional) – If True, translate in place and return self. Default: False

Return type:

Boxes

Returns:

The transformed boxes.

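trim(correspondence_preserve=False, inplace=False)

Trim out zero-padded boxes.

Given box arrangements of shape \((4, 4, Box)\), where each column is one batch item and each row is one box slot:

    Box  Box  Box  Box
     0    0   Box  Box
     0   Box   0    0
     0    0    0    0

If correspondence_preserve is True, every remaining box keeps its slot and only the layers that are zero for all batch items are removed, resulting in shape \((4, 3, Box)\):

    Box  Box  Box  Box
     0    0   Box  Box
     0   Box   0    0

Otherwise, the boxes of each batch item are moved to the front and the trailing zero layers are removed, giving \((4, 2, Box)\):

    Box  Box  Box  Box
     0   Box  Box  Box

Parameters:
  • correspondence_preserve (bool, optional) – If True, keep the slot index of every box. Default: False

  • inplace (bool, optional) – If True, trim in place and return self. Default: False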

Return type:

Boxes
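The two trimming strategies above can be sketched in plain Python. Here boxes[b][n] holds a placeholder for batch item b and slot n, and None stands for a zero-padded slot. The helper trim_slots is hypothetical, not kornia's implementation. It reproduces the documented shapes for the 4 x 4 example.

```python
from typing import Optional


def trim_slots(boxes: list[list[Optional[str]]], correspondence_preserve: bool = False):
    """boxes[b][n] is a box placeholder, or None for a zero-padded slot."""
    if correspondence_preserve:
        # Drop only the slots that are empty for every batch item; others keep their index.
        keep = [n for n in range(len(boxes[0])) if any(item[n] is not None for item in boxes)]
        return [[item[n] for n in keep] for item in boxes]
    # Move each item's boxes to the front, then cut to the longest item.
    packed = [[box for box in item if box is not None] for item in boxes]
    longest = max(len(item) for item in packed)
    return [item + [None] * (longest - len(item)) for item in packed]


# The 4 x 4 example, one inner list per batch item.
example = [
    ["a0", None, None, None],
    ["a1", None, "c1", None],
    ["a2", "b2", None, None],
    ["a3", "b3", None, None],
]
print(trim_slots(example, correspondence_preserve=True)[1])  # ['a1', None, 'c1']
print(trim_slots(example)[1])                                # ['a1', 'c1']
```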

type(dtype)

Cast the stored box coordinates to a new dtype.

Parameters:

dtype (dtype) – Target floating-point dtype for the coordinate tensor.

Return type:

Boxes

Returns:

self after converting data in place.

unpad(padding_size)

Unpad a bounding box, i.e. undo the effect of pad().

Parameters:

padding_size (Tensor) – per-image padding amounts, shape of \((B, 4)\).

Return type:

Boxes
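The pad and unpad docstrings give only the shape of padding_size. Padding an image on the left or top moves every pixel right or down, so box coordinates must move by the same amounts, while right and bottom padding leave them unchanged. The plain-Python sketch below assumes each row of padding_size is ordered (left, right, top, bottom), which is the order torch.nn.functional.pad uses for the last two dimensions. That ordering and the helper name shift_for_padding are assumptions, not something stated above.

```python
Point = tuple[float, float]
Batch = list[list[list[Point]]]  # B images -> N boxes -> 4 (x, y) vertices


def shift_for_padding(batch: Batch, padding_size: list[tuple[float, float, float, float]],
                      sign: int = 1) -> Batch:
    """sign=+1 mirrors padding the images, sign=-1 undoes it (ASSUMED (left, right, top, bottom) order)."""
    shifted = []
    for boxes, (left, _right, top, _bottom) in zip(batch, padding_size):
        shifted.append([[(x + sign * left, y + sign * top) for x, y in box] for box in boxes])
    return shifted


batch = [[[(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)]]]
padding = [(2.0, 0.0, 5.0, 0.0)]  # 2 px on the left, 5 px on top
padded = shift_for_padding(batch, padding)
print(padded)  # [[[(3.0, 6.0), (6.0, 6.0), (6.0, 8.0), (3.0, 8.0)]]]
assert shift_for_padding(padded, padding, sign=-1) == batch  # unpad reverses pad
```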

class kornia.geometry.boxes.Boxes3D(boxes, raise_if_not_floating_point=True, mode='xyzxyz_plus')

3D boxes containing N or BxN boxes.

Parameters:
  • boxes (Tensor) – 3D boxes, shape of \((N,8,3)\) or \((B,N,8,3)\). See below for more details.

  • raise_if_not_floating_point (bool, optional) – flag to control floating point casting behaviour when boxes is not a floating point tensor. True to raise an error when boxes isn’t a floating point tensor, False to cast to float. Default: True

  • mode (str, optional) – the box format of the input boxes. Default: "xyzxyz_plus"

Note

The 3D box format is a floating-point tensor of shape Nx8x3 or BxNx8x3, where each box is a hexahedron defined by the coordinates of its 8 vertices. Coordinates must be in x, y, z order. The width, height and depth of a box are defined as width = xmax - xmin + 1, height = ymax - ymin + 1 and depth = zmax - zmin + 1. Examples of hexahedra are cubes and rhombohedra.

property data: Tensor

Return the raw 3D corner-coordinate tensor.

Returns:

Tensor containing eight 3D corner coordinates per box, usually shaped \((N, 8, 3)\) or \((B, N, 8, 3)\).

property device: device

Returns boxes device.

property dtype: dtype

Returns boxes dtype.

classmethod from_tensor(boxes, mode='xyzxyz', validate_boxes=True)

Create Boxes3D from 3D boxes stored in another format.

Parameters:
  • boxes (Tensor) – 3D boxes, shape of \((N,6)\) or \((B,N,6)\).

  • mode (str, optional) –

    The format in which the 3D boxes are provided. Default: "xyzxyz"

    • 'xyzxyz': boxes are assumed to be in the format xmin, ymin, zmin, xmax, ymax, zmax where width = xmax - xmin, height = ymax - ymin and depth = zmax - zmin.

    • 'xyzxyz_plus': similar to 'xyzxyz' mode but where box width, height and depth are defined as width = xmax - xmin + 1, height = ymax - ymin + 1 and depth = zmax - zmin + 1.

    • 'xyzwhd': boxes are assumed to be in the format xmin, ymin, zmin, width, height, depth where width = xmax - xmin, height = ymax - ymin and depth = zmax - zmin.

  • validate_boxes (bool, optional) – check if boxes are valid or not. Valid boxes are those with width, height and depth >= 1 (>= 2 when mode ends with the '_plus' suffix). Default: True

Return type:

Boxes3D

Returns:

A Boxes3D object containing the input boxes, converted from the format specified by mode.
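The doctest example for this method shows that the stored corners use inclusive ('_plus') extents. The xyzxyz box [5, 1, 3, 8, 4, 9] becomes corners spanning x = 5..7, y = 1..3 and z = 3..8, with the four z = zmin corners listed first. The plain-Python sketch below reproduces that mapping for mode 'xyzxyz' only. The helper name is hypothetical.

```python
# Plain-Python sketch: xyzxyz box -> 8 inclusive corners (z = zmin face first).
Point3 = tuple[float, float, float]


def xyzxyz_to_corners(box: list[float]) -> list[Point3]:
    xmin, ymin, zmin, xmax, ymax, zmax = box
    x_last, y_last, z_last = xmax - 1, ymax - 1, zmax - 1  # exclusive -> inclusive
    face = [(xmin, ymin), (x_last, ymin), (x_last, y_last), (xmin, y_last)]  # TL, TR, BR, BL
    return [(x, y, zmin) for x, y in face] + [(x, y, z_last) for x, y in face]


print(xyzxyz_to_corners([5, 1, 3, 8, 4, 9]))
# [(5, 1, 3), (7, 1, 3), (7, 3, 3), (5, 3, 3), (5, 1, 8), (7, 1, 8), (7, 3, 8), (5, 3, 8)]
```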

Examples

>>> boxes_xyzxyz = torch.as_tensor([[0, 3, 6, 1, 4, 8], [5, 1, 3, 8, 4, 9]])
>>> boxes = Boxes3D.from_tensor(boxes_xyzxyz, mode='xyzxyz')
>>> boxes.data  # (2, 8, 3)
tensor([[[0., 3., 6.],
         [0., 3., 6.],
         [0., 3., 6.],
         [0., 3., 6.],
         [0., 3., 7.],
         [0., 3., 7.],
         [0., 3., 7.],
         [0., 3., 7.]],

        [[5., 1., 3.],
         [7., 1., 3.],
         [7., 3., 3.],
         [5., 3., 3.],
         [5., 1., 8.],
         [7., 1., 8.],
         [7., 3., 8.],
         [5., 3., 8.]]])

get_boxes_shape()

Compute box depths, heights and widths.

Return type:

tuple[Tensor, Tensor, Tensor]

Returns:

  • Boxes depths, shape of \((N,)\) or \((B,N)\).

  • Boxes heights, shape of \((N,)\) or \((B,N)\).

  • Boxes widths, shape of \((N,)\) or \((B,N)\).

Example

>>> boxes_xyzxyz = torch.tensor([[ 0,  1,  2, 10, 21, 32], [3, 4, 5, 43, 54, 65]])
>>> boxes3d = Boxes3D.from_tensor(boxes_xyzxyz)
>>> boxes3d.get_boxes_shape()
(tensor([30., 60.]), tensor([20., 50.]), tensor([10., 40.]))

property mode: str

Return the 3D box format remembered by this container.

Returns:

Mode string describing how this container should be interpreted during tensor conversion.

property shape: tuple[int, ...] | Size

Return the tensor shape used to store 3D boxes.

Returns:

Shape of data. For unbatched boxes this is usually \((N, 8, 3)\), where \(N\) is the number of boxes, 8 is the number of cuboid corners, and 3 stores (x, y, z). For batched boxes the shape is usually \((B, N, 8, 3)\).

to(device=None, dtype=None)

Move the boxes to device and/or cast them to dtype, like the torch.nn.Module.to() method.

Return type:

Boxes3D

to_mask(depth, height, width)

Convert 3D boxes to masks. Covered area is 1 and the remaining is 0.

Parameters:
  • depth (int) – depth of the masked image/images.

  • height (int) – height of the masked image/images.

  • width (int) – width of the masked image/images.

Return type:

Tensor

Returns:

The output mask tensor, shape of \((N, depth, height, width)\) or \((B, N, depth, height, width)\), with dtype Boxes3D.dtype (any floating point dtype).

Note

It is currently non-differentiable.
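For an axis-aligned cuboid, the doctest output can be reproduced by testing each voxel index against the inclusive extents of the corners. In the plain-Python sketch below, the nesting order is mask[z][y][x], i.e. depth, then height, then width. The helper is hypothetical and handles axis-aligned boxes only.

```python
Point3 = tuple[float, float, float]


def axis_aligned_cuboid_to_mask(corners: list[Point3], depth: int, height: int,
                                width: int) -> list[list[list[float]]]:
    xs, ys, zs = zip(*corners)

    def inside(x: int, y: int, z: int) -> bool:
        return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys) and min(zs) <= z <= max(zs)

    return [
        [[1.0 if inside(x, y, z) else 0.0 for x in range(width)] for y in range(height)]
        for z in range(depth)
    ]


corners = [(1, 1, 1), (3, 1, 1), (3, 3, 1), (1, 3, 1),
           (1, 1, 2), (3, 1, 2), (3, 3, 2), (1, 3, 2)]
mask = axis_aligned_cuboid_to_mask(corners, depth=4, height=5, width=5)
print(sum(v for plane in mask for row in plane for v in row))  # 18.0 (3 x 3 x 2 voxels)
```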

Examples

>>> boxes = Boxes3D(torch.tensor([[  # Equivalent to Boxes3D.from_tensor(torch.tensor([[1, 1, 1, 3, 3, 2]]), mode='xyzxyz_plus')
...     [1., 1., 1.],
...     [3., 1., 1.],
...     [3., 3., 1.],
...     [1., 3., 1.],
...     [1., 1., 2.],
...     [3., 1., 2.],
...     [3., 3., 2.],
...     [1., 3., 2.],
... ]]))  # 1x8x3
>>> boxes.to_mask(4, 5, 5)
tensor([[[[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]],

         [[0., 0., 0., 0., 0.],
          [0., 1., 1., 1., 0.],
          [0., 1., 1., 1., 0.],
          [0., 1., 1., 1., 0.],
          [0., 0., 0., 0., 0.]],

         [[0., 0., 0., 0., 0.],
          [0., 1., 1., 1., 0.],
          [0., 1., 1., 1., 0.],
          [0., 1., 1., 1., 0.],
          [0., 0., 0., 0., 0.]],

         [[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]]])

to_tensor(mode='xyzxyz')

Cast Boxes3D to a tensor.

mode controls which 3D box format is used to represent the boxes in the tensor.

Parameters:

mode (str, optional) –

The output box format. Default: "xyzxyz"

  • 'xyzxyz': boxes are defined as xmin, ymin, zmin, xmax, ymax, zmax where width = xmax - xmin, height = ymax - ymin and depth = zmax - zmin.

  • 'xyzxyz_plus': similar to 'xyzxyz' mode but where box width, height and depth are defined as width = xmax - xmin + 1, height = ymax - ymin + 1 and depth = zmax - zmin + 1.

  • 'xyzwhd': boxes are defined as xmin, ymin, zmin, width, height, depth where width = xmax - xmin, height = ymax - ymin and depth = zmax - zmin.

  • 'vertices': boxes are defined by their vertex points in the following clockwise order: front-top-left, front-top-right, front-bottom-right, front-bottom-left, back-top-left, back-top-right, back-bottom-right, back-bottom-left. Vertex coordinates are in (x, y, z) order. Box width, height and depth are defined as width = xmax - xmin, height = ymax - ymin and depth = zmax - zmin.

  • 'vertices_plus': similar to 'vertices' mode but where box width, height and depth are defined as width = xmax - xmin + 1, height = ymax - ymin + 1 and depth = zmax - zmin + 1.

Returns:

3D boxes in the mode format. The shape depends on the mode value:

  • 'vertices' or 'vertices_plus': \((N, 8, 3)\) or \((B, N, 8, 3)\).

  • Any other value: \((N, 6)\) or \((B, N, 6)\).

Return type:

Tensor
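The 6-value modes can be read off the extents of the eight stored corners. The plain-Python sketch below assumes that the stored corners use inclusive extents, as the from_tensor doctest suggests: xyzxyz [5, 1, 3, 8, 4, 9] is stored with x spanning 5..7. The helper corners_to_6 is hypothetical and handles a single unbatched box.

```python
Point3 = tuple[float, float, float]


def corners_to_6(corners: list[Point3], mode: str = "xyzxyz") -> list[float]:
    xs, ys, zs = zip(*corners)
    lo = [min(xs), min(ys), min(zs)]
    last = [max(xs), max(ys), max(zs)]  # inclusive maxima
    if mode == "xyzxyz_plus":
        return lo + last
    if mode == "xyzxyz":
        return lo + [v + 1 for v in last]
    if mode == "xyzwhd":
        return lo + [b - a + 1 for a, b in zip(lo, last)]
    raise ValueError(f"unsupported mode: {mode!r}")


corners = [(5, 1, 3), (7, 1, 3), (7, 3, 3), (5, 3, 3),
           (5, 1, 8), (7, 1, 8), (7, 3, 8), (5, 3, 8)]
print(corners_to_6(corners))            # [5, 1, 3, 8, 4, 9]
print(corners_to_6(corners, "xyzwhd"))  # [5, 1, 3, 3, 3, 6]
```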

Note

It is currently non-differentiable due to a bug. See GitHub issue #1304.

Examples

>>> boxes_xyzxyz = torch.as_tensor([[0, 3, 6, 1, 4, 8], [5, 1, 3, 8, 4, 9]])
>>> boxes = Boxes3D.from_tensor(boxes_xyzxyz, mode='xyzxyz')
>>> assert (boxes.to_tensor(mode='xyzxyz') == boxes_xyzxyz).all()

transform_boxes(M, inplace=False)

Apply a transformation matrix to the 3D boxes.

Parameters:
  • M (Tensor) – The transformation matrix to be applied, shape of \((4, 4)\) or \((B, 4, 4)\).

  • inplace (bool, optional) – If True, transform in place and return self. Default: False

Return type:

Boxes3D

Returns:

The transformed boxes.

transform_boxes_(M)

In-place version of Boxes3D.transform_boxes().

Return type:

Boxes3D