Half-Precision Support
======================

This page documents which kornia modules support half-precision floating-point dtypes
(``torch.float16`` and ``torch.bfloat16``) and what limitations to expect.

.. list-table:: Half-Precision Support by Module
   :header-rows: 1
   :widths: 28 14 14 44

   * - Module
     - float16
     - bfloat16
     - Notes
   * - ``kornia.color``
     - ⚠️ Partial
     - ⚠️ Partial
     - Most color space conversions work for both half-precision dtypes.
       FFT-based operations may fail on CUDA.
   * - ``kornia.filters``
     - ⚠️ Partial
     - ⚠️ Partial
     - Basic convolution-based filters (Gaussian, Sobel, Median, Box) work
       for both dtypes. FFT-based operations (``fft_conv``) may fail on CUDA.
   * - ``kornia.enhance``
     - ⚠️ Partial
     - ⚠️ Partial
     - Histogram equalization, CLAHE, gamma correction, and ZCA whitening work
       for both dtypes. ZCA linalg ops go through ``_torch_svd_cast`` /
       ``_torch_inverse_cast`` which promote to float32 before computing.
   * - ``kornia.morphology``
     - ✅ Yes
     - ✅ Yes
     - Uses only convolution and pooling; no dtype restrictions.
   * - ``kornia.augmentation``
     - ⚠️ Partial
     - ⚠️ Partial
     - Both dtypes are accepted by ``validate_tensor``. Most ops work;
       precision-sensitive transforms (e.g. affine with large rotations) may
       produce inaccurate results at half precision.
   * - ``kornia.geometry.transform``
     - ⚠️ Partial
     - ⚠️ Partial
     - Affine, homography, resize, and warp operations use ``_torch_inverse_cast``
       / ``_torch_solve_cast`` which promote to float32 and cast back;
       both dtypes work.
   * - ``kornia.geometry.camera``
     - ⚠️ Partial
     - ⚠️ Partial
     - Pinhole camera model and most projection ops work for both dtypes.
       ``StereoCamera`` accepts both float16 and bfloat16.
   * - ``kornia.geometry.calibration``
     - ❌ No
     - ❌ No
     - ``solve_pnp_dlt()`` explicitly checks that inputs are ``float32`` or
       ``float64`` and raises otherwise.
   * - ``kornia.geometry.epipolar``
     - ⚠️ Partial
     - ⚠️ Partial
     - SVD and solve operations use ``_torch_svd_cast`` / ``_torch_solve_cast``
       / ``_torch_inverse_cast``; both dtypes work via casting to float32.
   * - ``kornia.geometry.homography``
     - ⚠️ Partial
     - ⚠️ Partial
     - Uses ``_torch_svd_cast``; both dtypes are promoted to float32 before SVD
       and the result is cast back.
   * - ``kornia.geometry.liegroup``
     - ⚠️ Partial
     - ⚠️ Partial
     - Most rotation/translation operations (SO2, SO3, SE2, SE3) work for both
       dtypes via cast helpers. A few code paths may still fail.
   * - ``kornia.geometry.solvers``
     - ⚠️ Partial
     - ⚠️ Partial
     - RANSAC-based solvers use ``_torch_solve_cast``; both dtypes are promoted
       before the solve and the result is cast back.
   * - ``kornia.geometry.subpix``
     - ⚠️ Partial
     - ⚠️ Partial
     - Soft-argmax and weighted softmax work for both dtypes.
       Precision-sensitive ops may produce inaccurate results.
   * - ``kornia.losses``
     - ⚠️ Partial
     - ⚠️ Partial
     - Photometric losses (SSIM, PSNR, MS-SSIM) work for both dtypes.
       Losses based on linalg operations (Hausdorff, etc.) may not.
   * - ``kornia.feature``
     - ⚠️ Partial
     - ⚠️ Partial
     - Local feature detectors and descriptors (SIFT, HardNet, DISK, DeDoDe)
       work for inference. Feature *matching* uses a manual ``cdist`` fallback
       for both half-precision dtypes on CUDA.
   * - ``kornia.metrics``
     - ⚠️ Partial
     - ⚠️ Partial
     - Simple pixel-level metrics work for both dtypes. Metrics involving linalg
       operations may not.
   * - ``kornia.models``
     - ⚠️ Partial
     - ⚠️ Partial
     - Conv-based models work for both dtypes. Attention-based models (e.g.
       VLMs, ViTs) may have internal dtype mismatches.

Legend
------

- ✅ **Yes** — Works correctly; results are accurate at the given precision.
- ⚠️ **Partial** — Some operations work; others fail at runtime or produce inaccurate results due to limited numerical range/precision.
- ❌ **No** — Not supported; raises a ``RuntimeError`` or ``TypeError`` at runtime (explicit dtype check in the implementation).

Test Results
------------

Measured on commit ``6131e98`` (2026-03-21), full test suite (no ``--runslow``).
Pass% = passed ÷ (passed + failed); skipped and xfailed tests are excluded.

.. list-table::
   :header-rows: 1
   :widths: 32 10 10 10 10

   * - Run
     - Passed
     - Failed
     - Skipped
     - Pass%
   * - CPU float32 *(baseline)*
     - 7647
     - 3
     - 3269
     - **99.9%**
   * - CUDA float32 *(baseline)*
     - 7634
     - 3
     - 3280
     - **99.9%**
   * - CPU float16
     - 6866
     - 747
     - 3306
     - **90.1%**
   * - CPU bfloat16
     - 6838
     - 812
     - 3269
     - **89.3%**
   * - CUDA float16 *(KORNIA_TEST_IN_SUBPROCESS=1)*
     - 6727
     - 643
     - 3556
     - **91.3%**
   * - CUDA bfloat16 *(KORNIA_TEST_IN_SUBPROCESS=1)*
     - 6695
     - 713
     - 3518
     - **90.4%**

.. note::

   CUDA half-precision tests are measured using ``KORNIA_TEST_IN_SUBPROCESS=1``
   which bypasses the ``skip_half_precision_on_cuda`` fixture.  Each test then
   runs in the same process but with the ``cuda_device_assert_guard`` fixture
   synchronising CUDA before and after each test.  For full isolation the current
   implementation uses ``subprocess.run`` for true process isolation; a fresh
   ``--isolate-half-precision`` flag spawns each test in a fresh ``subprocess.run``
   process with no shared CUDA state.

Test Suite Behaviour
--------------------

Half-precision tests live in the same directories and files as their
float32/float64 counterparts.  They are run as **separate, isolated pytest
invocations** rather than being mixed into a combined ``--dtype=all`` run.
This prevents a CUDA device-side assert in a half-precision test from
corrupting the CUDA context and causing unrelated float32 tests to fail.

.. code-block:: bash

   # Standard precision — default CI
   pixi run test tests/ --dtype=float32,float64

   # Half-precision — run in isolation, per directory
   pytest tests/color/     --dtype=float16,bfloat16
   pytest tests/geometry/  --dtype=float16,bfloat16 --device=cuda

Two autouse fixtures in the root ``conftest.py`` enforce safe behaviour:

- **``skip_half_precision_on_cuda``** — skips float16/bfloat16 tests on CUDA
  in combined runs so no half-precision kernel is ever launched (and therefore
  no device-side assert can fire).
- **``cuda_device_assert_guard``** — synchronises CUDA before and after each
  CUDA test to catch async device-side assert errors in the test that caused
  them, not in the next one.  If the context is already corrupted, the test
  is skipped rather than allowed to fail spuriously.

With ``--isolate-half-precision``, each float16/bfloat16 CUDA test is
intercepted by a custom ``pytest_runtest_protocol`` hook and executed in a
completely fresh Python process via ``subprocess.run``.  There is no shared
CUDA context between tests, so a device-side assert in one test cannot affect
any other.

See ``TESTING.md`` in the repository root for a full description of the
contamination mechanism and fixture implementation.