Stereo Camera

This module provides the StereoCamera class, which contains functionality for working with a horizontal stereo camera setup.

The horizontal stereo camera setup is assumed to be calibrated and rectified such that the setup can be described by two camera matrices:

The left rectified camera matrix:

\[\begin{split}P_0 = \begin{bmatrix} fx & 0 & cx & 0 \\ 0 & fy & cy & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}\end{split}\]

The right rectified camera matrix:

\[\begin{split}P_1 = \begin{bmatrix} fx & 0 & cx & tx * fx \\ 0 & fy & cy & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}\end{split}\]
where:
  • \(fx\) is the focal length in the x-direction in pixels.

  • \(fy\) is the focal length in the y-direction in pixels.

  • \(cx\) is the x-coordinate of the principal point in pixels.

  • \(cy\) is the y-coordinate of the principal point in pixels.

  • \(tx\) is the horizontal baseline in metric units.

These camera matrices are obtained by calibrating your stereo camera setup, which can be done in OpenCV.

The StereoCamera allows you to convert disparity maps to real-world 3D geometry, represented as a point cloud.

This is done by forming the \(Q\) matrix.

Using the pinhole camera model to project \([X Y Z 1]\) in world coordinates to \(uv\) pixels in the left and right camera frame respectively:

\[\begin{split}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = P_0 * \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \\ \begin{bmatrix} u-d \\ v \\ 1 \end{bmatrix} = P_1 * \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}\end{split}\]

where \(d\) is the disparity between corresponding pixels in the left and right images. Both equalities hold up to the homogeneous scale factor \(Z\).
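
The two projections can be checked numerically. The following is a minimal sketch in plain PyTorch, assuming made-up calibration values and a hypothetical helper named `project`; it is not part of the library. It divides by the homogeneous scale before reading off the pixel coordinates. Note that `tx` is chosen negative here, because with \(P_1\) as written that is what makes the disparity of a point in front of the cameras positive.

```python
import torch

# Hypothetical calibration values (pixels, and metres for tx); not from a real setup.
fx, fy, cx, cy, tx = 800.0, 800.0, 320.0, 240.0, -0.1

# Left and right rectified camera matrices, each of shape (3, 4).
P0 = torch.tensor([
    [fx, 0.0, cx, 0.0],
    [0.0, fy, cy, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
P1 = P0.clone()
P1[0, 3] = tx * fx  # the only entry that differs from P0 in this example


def project(P, point_h):
    """Apply a 3x4 camera matrix and divide by the homogeneous scale."""
    uvw = P @ point_h
    return uvw[:2] / uvw[2]


point = torch.tensor([0.5, -0.2, 4.0, 1.0])  # [X, Y, Z, 1]

uv_left = project(P0, point)
uv_right = project(P1, point)

# Rectified cameras share the same row, so only u changes between the views.
disparity = uv_left[0] - uv_right[0]
```

With these values the point projects to \(u = 420\) in the left image and \(u = 400\) in the right image, so \(d = 20\), which equals \(-fx * tx / Z\).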

Combining these two expressions lets us write them as a single matrix multiplication:

\[\begin{split}\begin{bmatrix} u \\ v \\ u-d \\ 1 \end{bmatrix} = \begin{bmatrix} fx & 0 & cx_{left} & 0 \\ 0 & fy & cy & 0 \\ fx & 0 & cx_{right} & fx * tx \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}\end{split}\]

Now subtract the first row from the third, invert the expression, and rescale (homogeneous coordinates are defined up to scale) to get:

\[\begin{split}\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = \begin{bmatrix} fy * tx & 0 & 0 & -fy * cx_{left} * tx \\ 0 & fx * tx & 0 & -fx * cy * tx \\ 0 & 0 & 0 & fx * fy * tx \\ 0 & 0 & -fy & fy * (cx_{left} -cx_{right}) \end{bmatrix} \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix}\end{split}\]

Where \(Q\) is

\[\begin{split}Q = \begin{bmatrix} fy * tx & 0 & 0 & -fy * cx_{left} * tx \\ 0 & fx * tx & 0 & -fx * cy * tx \\ 0 & 0 & 0 & fx * fy * tx \\ 0 & 0 & -fy & fy * (cx_{left} -cx_{right}) \end{bmatrix}\end{split}\]

Note that the x-coordinate of the principal point may differ between the left and right cameras (\(cx_{left}\) and \(cx_{right}\)); the last row of \(Q\) takes this into account.
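
As an illustration, the general \(Q\) can be assembled directly from scalar calibration values and checked against a known point. This is a sketch under assumed values; `build_q` is a hypothetical helper, not a library function.

```python
import torch


def build_q(fx, fy, cx_left, cx_right, cy, tx):
    """Assemble the general 4x4 Q matrix from scalar calibration values."""
    return torch.tensor([
        [fy * tx, 0.0, 0.0, -fy * cx_left * tx],
        [0.0, fx * tx, 0.0, -fx * cy * tx],
        [0.0, 0.0, 0.0, fx * fy * tx],
        [0.0, 0.0, -fy, fy * (cx_left - cx_right)],
    ], dtype=torch.float64)


# Hypothetical calibration with different focal lengths and principal points.
Q = build_q(fx=800.0, fy=780.0, cx_left=320.0, cx_right=318.0, cy=240.0, tx=-0.1)

# [u, v, d, 1] for the point (0.5, -0.2, 4.0) under the same calibration:
# u_left = 420, u_right = 398, so d = 22, and v = 201.
pixel = torch.tensor([420.0, 201.0, 22.0, 1.0], dtype=torch.float64)

xyzw = Q @ pixel             # homogeneous 3D point [X, Y, Z, W]
point = xyzw[:3] / xyzw[3]   # divide by W to get metric coordinates
```

The recovered `point` matches the original \((0.5, -0.2, 4.0)\), which confirms that this \(Q\) undoes the combined projection for these values.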

Assuming \(fx = fy\), you can divide \(Q\) by \(fx * tx\) (homogeneous coordinates are defined up to scale) and further reduce it to:

\[\begin{split}Q = \begin{bmatrix} 1 & 0 & 0 & -cx_{left} \\ 0 & 1 & 0 & -cy \\ 0 & 0 & 0 & fx \\ 0 & 0 & -1/tx & (cx_{left} -cx_{right}) / tx \end{bmatrix}\end{split}\]

However, the StereoCamera uses the general \(Q\) matrix.
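
The equivalence of the two forms can be verified with a few lines of PyTorch. The sketch below assumes made-up calibration values with \(fx = fy\) and a hypothetical helper `to_point`; it checks that the reduced matrix is the general one divided by \(fx * tx\), so both give the same 3D point after dividing by \(W\).

```python
import torch

fx = fy = 800.0  # the reduction only holds for equal focal lengths
cx_left, cx_right, cy, tx = 320.0, 318.0, 240.0, -0.1  # hypothetical values

Q_general = torch.tensor([
    [fy * tx, 0.0, 0.0, -fy * cx_left * tx],
    [0.0, fx * tx, 0.0, -fx * cy * tx],
    [0.0, 0.0, 0.0, fx * fy * tx],
    [0.0, 0.0, -fy, fy * (cx_left - cx_right)],
], dtype=torch.float64)

Q_reduced = torch.tensor([
    [1.0, 0.0, 0.0, -cx_left],
    [0.0, 1.0, 0.0, -cy],
    [0.0, 0.0, 0.0, fx],
    [0.0, 0.0, -1.0 / tx, (cx_left - cx_right) / tx],
], dtype=torch.float64)


def to_point(Q, uvd1):
    """Apply Q to [u, v, d, 1] and divide by the homogeneous coordinate W."""
    xyzw = Q @ uvd1
    return xyzw[:3] / xyzw[3]


pixel = torch.tensor([420.0, 200.0, 22.0, 1.0], dtype=torch.float64)
point_general = to_point(Q_general, pixel)
point_reduced = to_point(Q_reduced, pixel)
```

The overall scale of \(Q\) cancels in the division by \(W\), which is why either form can be used when the focal lengths are equal.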

Using the \(Q\) matrix we can obtain the 3D points in homogeneous coordinates; dividing \(X\), \(Y\) and \(Z\) by \(W\) gives the metric point:

\[\begin{split}\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = Q * \begin{bmatrix} u \\ v \\ disparity(u, v) \\ 1 \end{bmatrix}\end{split}\]
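
Expanding the product \(Q * [u, v, d, 1]^T\) row by row and dividing by \(W\) gives closed-form expressions for the metric coordinates. This is a worked expansion of the general \(Q\) above, not a separate formula taken from the library:

\[\begin{split}W = fy * (cx_{left} - cx_{right} - d) \\ X / W = \frac{tx * (u - cx_{left})}{cx_{left} - cx_{right} - d} \\ Y / W = \frac{fx * tx * (v - cy)}{fy * (cx_{left} - cx_{right} - d)} \\ Z / W = \frac{fx * tx}{cx_{left} - cx_{right} - d}\end{split}\]

When \(cx_{left} = cx_{right}\), the depth reduces to \(Z / W = -fx * tx / d\), so depth is inversely proportional to disparity.
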
class kornia.geometry.camera.stereo.StereoCamera(rectified_left_camera, rectified_right_camera)
__init__(rectified_left_camera, rectified_right_camera)

Class representing a horizontal stereo camera setup.

Parameters
  • rectified_left_camera (Tensor) – The rectified left camera projection matrix of shape \((B, 3, 4)\)

  • rectified_right_camera (Tensor) – The rectified right camera projection matrix of shape \((B, 3, 4)\)

property Q: torch.Tensor

The Q matrix of the horizontal stereo setup.

This matrix is used for reprojecting a disparity tensor to the corresponding point cloud. Note that this is in a general form that allows different focal lengths in the x and y direction.

Return type

Tensor

Returns

The Q matrix of shape \((B, 4, 4)\).

property batch_size: int

Return the batch size of the storage.

Return type

int

Returns

scalar with the batch size

property cx_left: torch.Tensor

Return the x-coordinate of the principal point for the left camera.

Return type

Tensor

Returns

tensor of shape \((B)\)

property cx_right: torch.Tensor

Return the x-coordinate of the principal point for the right camera.

Return type

Tensor

Returns

tensor of shape \((B)\)

property cy: torch.Tensor

Return the y-coordinate of the principal point.

Note that the y-coordinate of the principal points is assumed to be equal for the left and right camera.

Return type

Tensor

Returns

tensor of shape \((B)\)

property fx: torch.Tensor

Return the focal length in the x-direction.

Note that the focal lengths of the rectified left and right camera are assumed to be equal.

Return type

Tensor

Returns

tensor of shape \((B)\)

property fy: torch.Tensor

Return the focal length in the y-direction.

Note that the focal lengths of the rectified left and right camera are assumed to be equal.

Return type

Tensor

Returns

tensor of shape \((B)\)

reproject_disparity_to_3D(disparity_tensor)

Reproject the disparity tensor to a 3D point cloud.

Parameters

disparity_tensor (Tensor) – Disparity tensor of shape \((B, 1, H, W)\).

Return type

Tensor

Returns

The 3D point cloud of shape \((B, H, W, 3)\)

property tx: torch.Tensor

The horizontal baseline between the two cameras.

Return type

Tensor

Returns

Tensor of shape \((B)\)
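
For reference, each quantity exposed by the properties above sits at a fixed position in the two \((B, 3, 4)\) matrices. The sketch below reads them out by indexing, assuming the layout of \(P_0\) and \(P_1\) given at the top of this page and made-up values for two stereo pairs. It is an illustration only; the library's implementation, and in particular its sign convention for `tx`, may differ.

```python
import torch

# Two hypothetical rectified stereo pairs stacked along the batch dimension (B = 2).
left = torch.tensor([
    [[800.0, 0.0, 320.0, 0.0], [0.0, 800.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[500.0, 0.0, 256.0, 0.0], [0.0, 510.0, 128.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
])
right = left.clone()
right[:, 0, 2] = torch.tensor([318.0, 256.0])  # cx of the right camera
right[:, 0, 3] = torch.tensor([-80.0, -25.0])  # the tx * fx entries of P_1

batch_size = left.shape[0]

# Each of these has shape (B,).
fx = left[:, 0, 0]
fy = left[:, 1, 1]
cx_left = left[:, 0, 2]
cx_right = right[:, 0, 2]
cy = left[:, 1, 2]

# Assumption: tx is recovered from P_1 exactly as P_1 is written above.
tx = right[:, 0, 3] / fx
```

Here `tx` evaluates to \(-0.1\) and \(-0.05\) for the two pairs.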

kornia.geometry.camera.stereo.reproject_disparity_to_3D(disparity_tensor, Q_matrix)

Reproject the disparity tensor to a 3D point cloud.

Parameters
  • disparity_tensor (Tensor) – Disparity tensor of shape \((B, 1, H, W)\).

  • Q_matrix (Tensor) – Tensor of Q matrices of shape \((B, 4, 4)\).

Return type

Tensor

Returns

The 3D point cloud of shape \((B, H, W, 3)\)
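
The reprojection can be sketched in plain PyTorch as follows. `reproject_sketch` is a hypothetical stand-in written for illustration, not the library function; it assumes the shapes documented above and that pixel coordinates start at \((0, 0)\) in the top-left corner.

```python
import torch


def reproject_sketch(disparity, Q):
    """Map a (B, 1, H, W) disparity tensor to a (B, H, W, 3) point cloud.

    Q holds one 4x4 matrix per batch element, shape (B, 4, 4).
    """
    B, _, H, W = disparity.shape
    # Pixel grid: v indexes rows, u indexes columns.
    v, u = torch.meshgrid(
        torch.arange(H, dtype=disparity.dtype, device=disparity.device),
        torch.arange(W, dtype=disparity.dtype, device=disparity.device),
        indexing="ij",
    )
    u = u.expand(B, H, W)
    v = v.expand(B, H, W)
    d = disparity[:, 0]
    uvd1 = torch.stack([u, v, d, torch.ones_like(d)], dim=-1)  # (B, H, W, 4)
    # Apply Q to every pixel: out[b, h, w, i] = sum_j Q[b, i, j] * uvd1[b, h, w, j]
    xyzw = torch.einsum("bij,bhwj->bhwi", Q, uvd1)
    # Divide by W; pixels where W is zero produce inf or nan.
    return xyzw[..., :3] / xyzw[..., 3:4]


# Hypothetical calibration and a constant disparity map of 20 pixels.
fx, fy, cx, cy, tx = 800.0, 800.0, 320.0, 240.0, -0.1
Q = torch.tensor([[
    [fy * tx, 0.0, 0.0, -fy * cx * tx],
    [0.0, fx * tx, 0.0, -fx * cy * tx],
    [0.0, 0.0, 0.0, fx * fy * tx],
    [0.0, 0.0, -fy, 0.0],  # cx_left == cx_right in this example
]])
disparity = torch.full((1, 1, 4, 6), 20.0)
points = reproject_sketch(disparity, Q)  # shape (1, 4, 6, 3)
```

With a constant disparity of 20 pixels, every point in this example lies at a depth of 4, consistent with \(-fx * tx / d\).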