gcc/doc/gcc/extensions-to-the-c-language-family/half-precision-floating-point.rst

..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. index:: half-precision floating point, __fp16 data type, _Float16 data type

.. _half-precision:

Half-Precision Floating Point
*****************************

On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
point via the ``__fp16`` type defined in the ARM C Language Extensions.
On ARM systems, you must enable this type explicitly with the
:option:`-mfp16-format` command-line option in order to use it.
On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
floating point via the ``_Float16`` type.  For C++, x86 provides a built-in
type named ``_Float16`` that has the same data format as in C.

ARM targets support two incompatible representations for half-precision
floating-point values.  You must choose one of the representations and
use it consistently in your program.

Specifying :option:`-mfp16-format=ieee` selects the IEEE 754-2008 format.
This format can represent normalized values in the range of :math:`2^{-14}`
to 65504.  There are 11 bits of significand precision, approximately 3
decimal digits.

Specifying :option:`-mfp16-format=alternative` selects the ARM
alternative format.  This representation is similar to the IEEE
format, but does not support infinities or NaNs.  Instead, the range
of exponents is extended, so that this format can represent normalized
values in the range of :math:`2^{-14}` to 131008.
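
The ranges quoted for the two formats follow directly from the bit layout
(1 sign bit, 5 exponent bits, 10 fraction bits).  The following is a minimal
sketch of a software decoder, not code taken from GCC or its runtime
library.  The names ``pow2f`` and ``half_bits_to_float`` are hypothetical,
and the sketch assumes that the alternative format treats subnormals the
same way as the IEEE format.

```c
#include <math.h>    /* only for the INFINITY and NAN macros */
#include <stdint.h>

/* Exact power of two 2^e as a float, built by repeated scaling so that
   no <math.h> function has to be called.  */
static float pow2f(int e)
{
    float r = 1.0f;
    for (; e > 0; e--)
        r *= 2.0f;
    for (; e < 0; e++)
        r *= 0.5f;
    return r;
}

/* Decode a 16-bit pattern: 1 sign bit, 5 exponent bits, 10 fraction bits.
   If ALTERNATIVE is nonzero, exponent field 31 is treated as an ordinary
   exponent (ARM alternative format) instead of encoding Inf/NaN.  */
float half_bits_to_float(uint16_t bits, int alternative)
{
    int sign = (bits >> 15) & 1;
    int exp  = (bits >> 10) & 0x1F;
    int frac = bits & 0x3FF;
    float mag;

    if (exp == 0)
        mag = (float)frac * pow2f(-24);      /* zero or subnormal: frac * 2^-24 */
    else if (exp == 31 && !alternative)
        mag = frac ? NAN : INFINITY;         /* IEEE only: infinity or NaN */
    else
        mag = (1.0f + frac / 1024.0f) * pow2f(exp - 15);

    return sign ? -mag : mag;
}
```

With this sketch, the pattern ``0x0400`` decodes to the smallest normalized
value in either format, ``0x7BFF`` is the largest finite IEEE value, and
``0x7FFF`` decoded as the alternative format is the largest value of that
format.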

The GCC port for AArch64 only supports the IEEE 754-2008 format, and does
not require use of the :option:`-mfp16-format` command-line option.

The ``__fp16`` type may only be used as an argument to intrinsics defined
in ``<arm_fp16.h>``, or as a storage format.  For purposes of
arithmetic and other operations, ``__fp16`` values in C or C++
expressions are automatically promoted to ``float``.
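
A minimal sketch of ``__fp16`` used purely as a storage format is shown
below.  The typedef ``half_storage_t`` and the function ``sum_samples`` are
hypothetical names.  The guard relies on the ACLE feature macros
``__ARM_FP16_FORMAT_IEEE`` and ``__ARM_FP16_FORMAT_ALTERNATIVE``; on targets
that define neither, the sketch falls back to ``float`` so that it still
compiles.

```c
#include <stddef.h>

#if defined(__ARM_FP16_FORMAT_IEEE) || defined(__ARM_FP16_FORMAT_ALTERNATIVE)
typedef __fp16 half_storage_t;   /* 16-bit storage on ARM and AArch64 */
#else
typedef float half_storage_t;    /* fallback, only so the sketch builds elsewhere */
#endif

/* Sum N stored samples.  Each element is converted to float when it is
   read, so the accumulation itself is carried out in float.  */
float sum_samples(const half_storage_t *samples, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += samples[i];
    return total;
}
```

Keeping the accumulator in ``float`` matches the promotion rule described
above: only the array elements occupy 16 bits each.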

The ARM target provides hardware support for conversions between
``__fp16`` and ``float`` values as an extension to VFP and NEON
(Advanced SIMD) and, from ARMv8-A onward, for conversions between
``__fp16`` and ``double`` values.  GCC generates code using these
hardware instructions if you compile with options to select an FPU
that provides them, for example
:option:`-mfpu=neon-fp16` :option:`-mfloat-abi=softfp`,
in addition to the :option:`-mfp16-format` option to select
a half-precision format.

Language-level support for the ``__fp16`` data type is
independent of whether GCC generates code using hardware floating-point
instructions.  In cases where hardware support is not specified, GCC
implements conversions between ``__fp16`` and other types as library
calls.

It is recommended that portable code use the ``_Float16`` type defined
by ISO/IEC TS 18661-3:2015.  See :ref:`floating-types`.
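
As a sketch of what such code might look like, the fragment below uses
``_Float16`` where the compiler provides it.  It assumes that the predefined
macro ``__FLT16_MANT_DIG__`` indicates ``_Float16`` support; ``half_t`` and
``half_midpoint`` are hypothetical names, and the ``float`` fallback exists
only so that the fragment compiles on other targets.

```c
#ifdef __FLT16_MANT_DIG__
typedef _Float16 half_t;   /* TS 18661-3 interchange type */
#else
typedef float half_t;      /* fallback when _Float16 is unavailable */
#endif

/* Midpoint of two values, computed and returned in half_t.  */
half_t half_midpoint(half_t a, half_t b)
{
    return (a + b) / 2;
}
```

Unlike ``__fp16``, ``_Float16`` is an arithmetic type in its own right, so
it can appear directly as a parameter and return type.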

On x86 targets with SSE2 enabled, without :option:`-mavx512fp16`,
all operations are emulated in software using ``float`` instructions.
The default behavior for ``FLT_EVAL_METHOD`` is to keep the intermediate
result of the operation at 32-bit precision.  This may lead to
inconsistent results between software emulation and AVX512-FP16
instructions.  Using :option:`-fexcess-precision=16` forces rounding back
to 16-bit precision after each operation.
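
The effect of keeping intermediates at 32-bit precision can be illustrated
without ``_Float16`` at all.  The following sketch emulates the two
evaluation strategies in plain ``float``; it is not how GCC implements
either mode.  All function names are hypothetical, and the rounding helper
assumes its argument lies within the normal range of the IEEE
half-precision format.

```c
#include <stdint.h>
#include <string.h>

/* Round X to the nearest value with an 11-bit significand, ties to even.
   Assumption: |X| is zero or within the binary16 normal range, so
   overflow, subnormals, infinities and NaNs are not handled.  */
static float round_to_half_precision(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    u += 0x0FFFu + ((u >> 13) & 1u);   /* bias so truncation rounds to even */
    u &= ~UINT32_C(0x1FFF);            /* drop the 13 extra fraction bits */
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Intermediate result kept in float; only the final value is rounded.  */
float eval_keep_float(float a, float b, float c)
{
    return round_to_half_precision(a + b - c);
}

/* Result rounded to half precision after each operation.  */
float eval_round_each(float a, float b, float c)
{
    float t = round_to_half_precision(a + b);
    return round_to_half_precision(t - c);
}
```

For ``a = 2048``, ``b = 1`` and ``c = 2048``, the first function returns 1
because 2049 is exact in ``float``, while the second returns 0 because 2049
rounds to 2048 at 11 bits of precision.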

Using :option:`-mavx512fp16` generates AVX512-FP16 instructions instead of
software emulation.  The default behavior of ``FLT_EVAL_METHOD`` is then to
round after each operation.  The same is true with
:option:`-fexcess-precision=standard` and :option:`-mfpmath=sse`.
Without :option:`-mfpmath=sse`, :option:`-fexcess-precision=standard` alone
does the same thing as before; it is useful for code that does not use
``_Float16`` and runs on the x87 FPU.