Stereo Camera

This module provides the StereoCamera class, which contains functionality for working with a horizontal stereo camera setup.

The horizontal stereo camera setup is assumed to be calibrated and rectified such that the setup can be described by two camera matrices:

The left rectified camera matrix:

\[\begin{split}P_0 = \begin{bmatrix} fx & 0 & cx & 0 \\ 0 & fy & cy & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}\end{split}\]

The right rectified camera matrix:

\[\begin{split}P_1 = \begin{bmatrix} fx & 0 & cx & tx * fx \\ 0 & fy & cy & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}\end{split}\]

where:

\(fx\) is the focal length in the x-direction in pixels.
\(fy\) is the focal length in the y-direction in pixels.
\(cx\) is the x-coordinate of the principal point in pixels.
\(cy\) is the y-coordinate of the principal point in pixels.
\(tx\) is the horizontal baseline in metric units.

These camera matrices are obtained by calibrating your stereo camera setup, which can be done with OpenCV.
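As a minimal sketch of the layout described above, the two rectified matrices can be written out in plain Python. The helper name `rectified_projection_matrices` and the numeric values are illustrative assumptions and not part of the library; nested lists stand in for tensors.

```python
def rectified_projection_matrices(fx, fy, cx, cy, tx):
    """Build (P0, P1) as 3x4 nested lists in the layout described above.

    Hypothetical helper for illustration; not part of the library.
    """
    p0 = [
        [fx, 0.0, cx, 0.0],
        [0.0, fy, cy, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
    # The right matrix differs only in the last entry of the first row.
    p1 = [
        [fx, 0.0, cx, tx * fx],
        [0.0, fy, cy, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
    return p0, p1


# Illustrative values: focal lengths and principal point in pixels,
# baseline in metres (assumed, not taken from a real calibration).
P0, P1 = rectified_projection_matrices(
    fx=500.0, fy=500.0, cx=320.0, cy=240.0, tx=-0.1
)
```

In practice these matrices would come from a calibration and rectification step rather than being typed in by hand.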

The StereoCamera allows you to convert disparity maps to the real-world 3D geometry, represented as a point cloud.

This is done by forming the \(Q\) matrix.

Using the pinhole camera model, a point \([X Y Z 1]\) in world coordinates projects to pixel coordinates \((u, v)\) in the left camera frame and \((u - d, v)\) in the right camera frame:

\[\begin{split}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = P_0 * \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \\ \begin{bmatrix} u-d \\ v \\ 1 \end{bmatrix} = P_1 * \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}\end{split}\]

where \(d\) is the disparity between corresponding pixels in the left and right images.
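The two projections can be checked numerically. The sketch below projects one 3D point with both matrices and takes the difference of the horizontal pixel coordinates as the disparity. The function `project` and all numeric values are assumptions made for illustration. With the equations as written on this page, a point in front of the cameras gets a positive disparity when \(tx\) is negative, which is why a negative baseline is used here.

```python
def project(P, point):
    """Project a 3D point with a 3x4 matrix and return pixel coordinates (u, v)."""
    homogeneous = (point[0], point[1], point[2], 1.0)
    x, y, w = (sum(p * h for p, h in zip(row, homogeneous)) for row in P)
    # Divide by the third homogeneous coordinate to get pixel coordinates.
    return x / w, y / w


# Illustrative calibration values (assumptions, not from a real camera).
fx, fy, cx, cy, tx = 500.0, 500.0, 320.0, 240.0, -0.1
P0 = [[fx, 0.0, cx, 0.0], [0.0, fy, cy, 0.0], [0.0, 0.0, 1.0, 0.0]]
P1 = [[fx, 0.0, cx, tx * fx], [0.0, fy, cy, 0.0], [0.0, 0.0, 1.0, 0.0]]

point = (0.5, -0.2, 2.0)  # X, Y, Z in the same metric units as tx
u_left, v_left = project(P0, point)
u_right, v_right = project(P1, point)

# Rectification keeps the rows aligned, so only the u coordinate changes.
disparity = u_left - u_right
```

For equal principal points the disparity equals \(-fx * tx / Z\), so it shrinks as the point moves away from the cameras.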

Combining these two expressions lets us write them as one matrix multiplication:

\[\begin{split}\begin{bmatrix} u \\ v \\ u-d \\ 1 \end{bmatrix} = \begin{bmatrix} fx & 0 & cx_{left} & 0 \\ 0 & fy & cy & 0 \\ fx & 0 & cx_{right} & fx * tx \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}\end{split}\]

Now subtract the first row from the third and invert the expression. This gives the homogeneous 3D point as a function of the pixel coordinates and the disparity:

\[\begin{split}\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = \begin{bmatrix} fy * tx & 0 & 0 & -fy * cx_{left} * tx \\ 0 & fx * tx & 0 & -fx * cy * tx \\ 0 & 0 & 0 & fx * fy * tx \\ 0 & 0 & -fy & fy * (cx_{left} -cx_{right}) \end{bmatrix} \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix}\end{split}\]

where \(Q\) is

\[\begin{split}Q = \begin{bmatrix} fy * tx & 0 & 0 & -fy * cx_{left} * tx \\ 0 & fx * tx & 0 & -fx * cy * tx \\ 0 & 0 & 0 & fx * fy * tx \\ 0 & 0 & -fy & fy * (cx_{left} -cx_{right}) \end{bmatrix}\end{split}\]

Notice that the x-coordinate of the principal point, \(cx\), may differ between the left and right cameras; \(Q\) takes this into account through the \(cx_{left} - cx_{right}\) term. The first row uses \(cx_{left}\) because \(u\) is measured in the left image.
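The general \(Q\) matrix can be assembled directly from the calibration parameters. The following is a sketch in plain Python, not the library's implementation; the function name `q_matrix` is hypothetical, and the \(cx\) entry in the first row is taken to be the left principal point, which is an assumption based on \(u\) being a left-image coordinate.

```python
def q_matrix(fx, fy, cx_left, cx_right, cy, tx):
    """Return the general 4x4 Q matrix as nested lists (illustrative helper)."""
    return [
        [fy * tx, 0.0, 0.0, -fy * cx_left * tx],
        [0.0, fx * tx, 0.0, -fx * cy * tx],
        [0.0, 0.0, 0.0, fx * fy * tx],
        # The last row carries the disparity and the principal point offset.
        [0.0, 0.0, -fy, fy * (cx_left - cx_right)],
    ]


# Illustrative values (assumed): both principal points coincide here,
# so the bottom-right entry is zero.
Q = q_matrix(fx=500.0, fy=500.0, cx_left=320.0, cx_right=320.0, cy=240.0, tx=-0.1)
```

Keeping \(fx\) and \(fy\) separate costs nothing here and avoids the \(fx = fy\) assumption.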

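Assuming \(fx = fy\), dividing \(Q\) by \(fy * tx\) (which leaves the reprojected 3D point unchanged, since \(Q\) acts on homogeneous coordinates) reduces it to:

\[\begin{split}Q = \begin{bmatrix} 1 & 0 & 0 & -cx_{left} \\ 0 & 1 & 0 & -cy \\ 0 & 0 & 0 & fx \\ 0 & 0 & -1/tx & (cx_{left} -cx_{right}) / tx \end{bmatrix}\end{split}\]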

But we’ll use the general \(Q\) matrix.
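As a short worked step, added for illustration in the notation of this page: applying the last two rows of the general \(Q\) matrix to \([u, v, d, 1]\) gives the homogeneous components \(Z = fx * fy * tx\) and \(W = fy * (cx_{left} - cx_{right} - d)\). Their ratio is the metric depth of the pixel:

\[\begin{split}\frac{Z}{W} = \frac{fx * tx}{cx_{left} - cx_{right} - d}\end{split}\]

When both principal points coincide, this reduces to \(-fx * tx / d\), so depth is inversely proportional to disparity. With the sign convention of \(P_1\) above, a point in front of the cameras has a positive disparity only if \(tx\) is negative.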

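Using the \(Q\) matrix, we can obtain the homogeneous 3D point for each pixel \((u, v)\) by:

\[\begin{split}\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = Q * \begin{bmatrix} u \\ v \\ disparity(u, v) \\ 1 \end{bmatrix}\end{split}\]

The 3D point in metric units is then \((X / W, Y / W, Z / W)\).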
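To make the reprojection concrete, the sketch below applies a \(Q\) matrix to a single pixel and divides by the fourth component. It is a plain-Python illustration under assumed calibration values, not the library's code; `reproject_pixel` is a hypothetical name.

```python
def reproject_pixel(Q, u, v, d):
    """Map a pixel (u, v) with disparity d to a metric 3D point using Q."""
    uvd1 = (u, v, d, 1.0)
    X, Y, Z, W = (sum(q * x for q, x in zip(row, uvd1)) for row in Q)
    # Q returns homogeneous coordinates, so divide by W.
    return X / W, Y / W, Z / W


# Illustrative calibration values (assumptions, not from a real camera).
fx, fy, cx_left, cx_right, cy, tx = 500.0, 500.0, 320.0, 320.0, 240.0, -0.1
Q = [
    [fy * tx, 0.0, 0.0, -fy * cx_left * tx],
    [0.0, fx * tx, 0.0, -fx * cy * tx],
    [0.0, 0.0, 0.0, fx * fy * tx],
    [0.0, 0.0, -fy, fy * (cx_left - cx_right)],
]

# A left-image pixel and its disparity, chosen for illustration.
x, y, z = reproject_pixel(Q, u=445.0, v=190.0, d=25.0)
```

A pixel whose disparity makes \(W\) zero has no finite 3D point, so a real pipeline would need to mask such pixels before dividing.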

class kornia.geometry.camera.stereo.StereoCamera(rectified_left_camera, rectified_right_camera)

Represent a horizontal stereo camera setup for depth estimation.

Parameters:

rectified_left_camera (Tensor) – The rectified left camera projection matrix of shape \((B, 3, 4)\).
rectified_right_camera (Tensor) – The rectified right camera projection matrix of shape \((B, 3, 4)\).

__init__(rectified_left_camera, rectified_right_camera)

Class representing a horizontal stereo camera setup.

Parameters:

rectified_left_camera (Tensor) – The rectified left camera projection matrix of shape \((B, 3, 4)\)
rectified_right_camera (Tensor) – The rectified right camera projection matrix of shape \((B, 3, 4)\)

property Q: Tensor

The Q matrix of the horizontal stereo setup.

This matrix is used for reprojecting a disparity torch.Tensor to the corresponding point cloud. Note that it is in a general form that allows different focal lengths in the x- and y-directions.

Returns: The Q matrix of shape \((B, 4, 4)\).

property batch_size: int

Return the batch size of the stored camera matrices.

Returns: Scalar with the batch size.

property cx_left: Tensor

Return the x-coordinate of the principal point for the left camera.

Returns: torch.Tensor of shape \((B)\)

property cx_right: Tensor

Return the x-coordinate of the principal point for the right camera.

Returns: torch.Tensor of shape \((B)\)

property cy: Tensor

Return the y-coordinate of the principal point.

Note that the y-coordinate of the principal point is assumed to be equal for the left and right cameras.

Returns: torch.Tensor of shape \((B)\)

property fx: Tensor

Return the focal length in the x-direction.

Note that the focal lengths of the rectified left and right cameras are assumed to be equal.

Returns: torch.Tensor of shape \((B)\)

property fy: Tensor

Return the focal length in the y-direction.

Note that the focal lengths of the rectified left and right cameras are assumed to be equal.

Returns: torch.Tensor of shape \((B)\)

reproject_disparity_to_3D(disparity_tensor)

Reproject the disparity torch.Tensor to a 3D point cloud.

Parameters: disparity_tensor (Tensor) – Disparity torch.Tensor of shape \((B, H, W, 1)\).
Return type: Tensor
Returns: The 3D point cloud of shape \((B, H, W, 3)\)

property tx: Tensor

The horizontal baseline between the two cameras.

Returns: torch.Tensor of shape \((B)\)

kornia.geometry.camera.stereo.reproject_disparity_to_3D(disparity_tensor, Q_matrix)

Reproject the disparity torch.Tensor to a 3D point cloud.

Parameters:

disparity_tensor (Tensor) – Disparity torch.Tensor of shape \((B, H, W, 1)\).
Q_matrix (Tensor) – torch.Tensor of Q matrices of shape \((B, 4, 4)\).

Return type:

Tensor

Returns:

The 3D point cloud of shape \((B, H, W, 3)\)
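To illustrate the input and output shapes, the following plain-Python sketch reprojects a whole disparity map for a single batch element. It is not the library's implementation: nested lists stand in for tensors, `reproject_disparity_map` is a hypothetical name, and the pixel coordinates are assumed to be \(u\) = column index and \(v\) = row index.

```python
def reproject_disparity_map(disparity, Q):
    """Reproject an H x W disparity map to an H x W x 3 point cloud.

    Illustrative stand-in for one batch element; assumes W is nonzero
    for every pixel.
    """
    cloud = []
    for v, row in enumerate(disparity):
        cloud_row = []
        for u, d in enumerate(row):
            uvd1 = (float(u), float(v), d, 1.0)
            X, Y, Z, W = (sum(q * x for q, x in zip(q_row, uvd1)) for q_row in Q)
            cloud_row.append((X / W, Y / W, Z / W))
        cloud.append(cloud_row)
    return cloud


# Illustrative calibration for a tiny 2 x 3 image (assumed values).
fx, fy, cx_left, cx_right, cy, tx = 500.0, 500.0, 1.0, 1.0, 0.5, -0.1
Q = [
    [fy * tx, 0.0, 0.0, -fy * cx_left * tx],
    [0.0, fx * tx, 0.0, -fx * cy * tx],
    [0.0, 0.0, 0.0, fx * fy * tx],
    [0.0, 0.0, -fy, fy * (cx_left - cx_right)],
]

# A constant disparity map, so every pixel lies at the same depth.
disparity = [[25.0, 25.0, 25.0], [25.0, 25.0, 25.0]]
points = reproject_disparity_map(disparity, Q)
```

The output keeps the image layout, so `points[v][u]` is the 3D point for pixel \((u, v)\).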