Half-Precision Support

This page documents which kornia modules support half-precision floating-point dtypes (torch.float16 and torch.bfloat16) and what limitations to expect.

Half-Precision Support by Module

| Module | float16 | bfloat16 | Notes |
|---|---|---|---|
| kornia.color | ⚠️ Partial | ⚠️ Partial | Most color space conversions work for both half-precision dtypes. FFT-based operations may fail on CUDA. |
| kornia.filters | ⚠️ Partial | ⚠️ Partial | Basic convolution-based filters (Gaussian, Sobel, Median, Box) work for both dtypes. FFT-based operations (fft_conv) may fail on CUDA. |
| kornia.enhance | ⚠️ Partial | ⚠️ Partial | Histogram equalization, CLAHE, gamma correction, and ZCA whitening work for both dtypes. ZCA linalg ops go through _torch_svd_cast / _torch_inverse_cast which promote to float32 before computing. |
| kornia.morphology | ✅ Yes | ✅ Yes | Uses only convolution and pooling; no dtype restrictions. |
| kornia.augmentation | ⚠️ Partial | ⚠️ Partial | Both dtypes are accepted by validate_tensor. Most ops work; precision-sensitive transforms (e.g. affine with large rotations) may produce inaccurate results at half precision. |
| kornia.geometry.transform | ⚠️ Partial | ⚠️ Partial | Affine, homography, resize, and warp operations use _torch_inverse_cast / _torch_solve_cast which promote to float32 and cast back; both dtypes work. |
| kornia.geometry.camera | ⚠️ Partial | ⚠️ Partial | Pinhole camera model and most projection ops work for both dtypes. StereoCamera accepts both float16 and bfloat16. |
| kornia.geometry.calibration | ❌ No | ❌ No | solve_pnp_dlt() explicitly checks that inputs are float32 or float64 and raises otherwise. |
| kornia.geometry.epipolar | ⚠️ Partial | ⚠️ Partial | SVD and solve operations use _torch_svd_cast / _torch_solve_cast / _torch_inverse_cast; both dtypes work via casting to float32. |
| kornia.geometry.homography | ⚠️ Partial | ⚠️ Partial | Uses _torch_svd_cast; both dtypes are promoted to float32 before SVD and the result is cast back. |
| kornia.geometry.liegroup | ⚠️ Partial | ⚠️ Partial | Most rotation/translation operations (SO2, SO3, SE2, SE3) work for both dtypes via cast helpers. A few code paths may still fail. |
| kornia.geometry.solvers | ⚠️ Partial | ⚠️ Partial | RANSAC-based solvers use _torch_solve_cast; both dtypes are promoted before the solve and the result is cast back. |
| kornia.geometry.subpix | ⚠️ Partial | ⚠️ Partial | Soft-argmax and weighted softmax work for both dtypes. Precision-sensitive ops may produce inaccurate results. |
| kornia.losses | ⚠️ Partial | ⚠️ Partial | Photometric losses (SSIM, PSNR, MS-SSIM) work for both dtypes. Losses based on linalg operations (Hausdorff, etc.) may not. |
| kornia.feature | ⚠️ Partial | ⚠️ Partial | Local feature detectors and descriptors (SIFT, HardNet, DISK, DeDoDe) work for inference. Feature matching uses a manual cdist fallback for both half-precision dtypes on CUDA. |
| kornia.metrics | ⚠️ Partial | ⚠️ Partial | Simple pixel-level metrics work for both dtypes. Metrics involving linalg operations may not. |
| kornia.models | ⚠️ Partial | ⚠️ Partial | Conv-based models work for both dtypes. Attention-based models (e.g. VLMs, ViTs) may have internal dtype mismatches. |

Legend

  • ✅ Yes: Works correctly; results are accurate at the given precision.

  • ⚠️ Partial: Some operations work; others fail at runtime or produce inaccurate results due to limited numerical range/precision.

  • ❌ No: Not supported; raises a RuntimeError or TypeError at runtime (explicit dtype check in the implementation).

Test Results

Measured on commit 6131e98 (2026-03-21), full test suite (no --runslow). Pass% = passed ÷ (passed + failed); skipped and xfailed tests are excluded.

| Run | Passed | Failed | Skipped | Pass% |
|---|---|---|---|---|
| CPU float32 (baseline) | 7647 | 3 | 3269 | 99.9% |
| CUDA float32 (baseline) | 7634 | 3 | 3280 | 99.9% |
| CPU float16 | 6866 | 747 | 3306 | 90.1% |
| CPU bfloat16 | 6838 | 812 | 3269 | 89.3% |
| CUDA float16 (KORNIA_TEST_IN_SUBPROCESS=1) | 6727 | 643 | 3556 | 91.3% |
| CUDA bfloat16 (KORNIA_TEST_IN_SUBPROCESS=1) | 6695 | 713 | 3518 | 90.4% |
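As a quick check of how the Pass% column is defined, the formula stated above the table can be written out directly. This is an illustrative sketch; the function name pass_rate is an assumption and is not part of the kornia test tooling.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return passed / (passed + failed) as a percentage.

    Skipped and xfailed tests are deliberately left out of the denominator.
    """
    total = passed + failed
    if total == 0:
        raise ValueError("no passed or failed tests to compute a rate from")
    return 100.0 * passed / total


# Counts taken from the CUDA float16 row of the table.
print(f"{pass_rate(6727, 643):.1f}%")  # prints 91.3%
```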

Note

CUDA half-precision results are measured with KORNIA_TEST_IN_SUBPROCESS=1, which bypasses the skip_half_precision_on_cuda fixture. Each test then still runs in the same process, with the cuda_device_assert_guard fixture synchronising CUDA before and after it. For full isolation, pass --isolate-half-precision: each test is then spawned in a fresh process via subprocess.run, with no shared CUDA state.

Test Suite Behaviour

Half-precision tests live in the same directories and files as their float32/float64 counterparts. They are run as separate, isolated pytest invocations rather than being mixed into a combined --dtype=all run. This prevents a CUDA device-side assert in a half-precision test from corrupting the CUDA context and causing unrelated float32 tests to fail.

# Standard precision — default CI
pixi run test tests/ --dtype=float32,float64

# Half-precision — run in isolation, per directory
pytest tests/color/     --dtype=float16,bfloat16
pytest tests/geometry/  --dtype=float16,bfloat16 --device=cuda

Two autouse fixtures in the root conftest.py enforce safe behaviour:

  • skip_half_precision_on_cuda: skips float16/bfloat16 tests on CUDA in combined runs so no half-precision kernel is ever launched (and therefore no device-side assert can fire).

  • cuda_device_assert_guard: synchronises CUDA before and after each CUDA test to catch async device-side assert errors in the test that caused them, not in the next one. If the context is already corrupted, the test is skipped rather than allowed to fail spuriously.
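The logic of the two fixtures can be sketched without pytest or CUDA. This is an assumption-laden illustration, not the code in conftest.py: should_skip_half_precision, device_assert_guard, and CorruptedContext are hypothetical names, and the synchronize argument stands in for torch.cuda.synchronize.

```python
import os
from contextlib import contextmanager

HALF_PRECISION = {"float16", "bfloat16"}


def should_skip_half_precision(device: str, dtype: str, env=None) -> bool:
    """Decide whether a combined run should skip this (device, dtype) pair."""
    env = os.environ if env is None else env
    # KORNIA_TEST_IN_SUBPROCESS=1 bypasses the skip, as described in the note above.
    if env.get("KORNIA_TEST_IN_SUBPROCESS") == "1":
        return False
    return device.startswith("cuda") and dtype in HALF_PRECISION


class CorruptedContext(Exception):
    """The device context was already unusable before the test started."""


@contextmanager
def device_assert_guard(synchronize):
    """Synchronise before and after the guarded body.

    `synchronize` is injected so the sketch runs without a GPU.
    """
    try:
        synchronize()  # flush errors left behind by earlier tests
    except RuntimeError as exc:
        # A real fixture would call pytest.skip(...) here instead of raising.
        raise CorruptedContext(str(exc)) from exc
    yield
    synchronize()  # surface async errors inside the test that caused them


calls = []
with device_assert_guard(lambda: calls.append("sync")):
    calls.append("test body")
print(calls)
```

Injecting the synchronisation callable keeps the ordering logic testable on machines without CUDA; in an autouse fixture the same two calls would wrap the yield.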

With --isolate-half-precision, each float16/bfloat16 CUDA test is intercepted by a custom pytest_runtest_protocol hook and executed in a completely fresh Python process via subprocess.run. There is no shared CUDA context between tests, so a device-side assert in one test cannot affect any other.
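The process-isolation idea can be sketched with the standard library alone. The helpers below are hypothetical (pytest_argv and run_in_fresh_process are not kornia names), and setting KORNIA_TEST_IN_SUBPROCESS in the child is an assumption about how the child learns it is already isolated. A real pytest_runtest_protocol hook would also have to translate the child's result into pytest reports, which is omitted here.

```python
import os
import subprocess
import sys


def pytest_argv(nodeid: str) -> list[str]:
    """Command line that runs exactly one test in a new interpreter."""
    return [sys.executable, "-m", "pytest", nodeid, "-q"]


def run_in_fresh_process(argv: list[str]) -> int:
    """Run `argv` in a child process and return its exit code."""
    # Assumption: the child is told it is already isolated via this variable.
    env = {**os.environ, "KORNIA_TEST_IN_SUBPROCESS": "1"}
    completed = subprocess.run(argv, env=env, capture_output=True, text=True)
    return completed.returncode


# A failing child only produces a non-zero exit code; the parent keeps running.
exit_code = run_in_fresh_process([sys.executable, "-c", "raise SystemExit(3)"])
print(exit_code)
```

Because each child starts its own interpreter, any CUDA context it creates is destroyed when it exits, so the parent only ever sees an exit code.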

See TESTING.md in the repository root for a full description of the contamination mechanism and fixture implementation.