
@node Floating-Point Limits
@chapter Floating-Point Limits
@pindex <float.h>
@cindex floating-point number representation
@cindex representation of floating-point numbers

Because floating-point numbers are represented internally as approximate
quantities, algorithms for manipulating floating-point data often need
to be parameterized in terms of the accuracy of the representation.
Some of the functions in the C library itself need this information; for
example, the algorithms for printing and reading floating-point numbers
(@pxref{I/O on Streams}) and for calculating trigonometric and
irrational functions (@pxref{Mathematics}) use information about the
underlying floating-point representation to avoid round-off error and
loss of accuracy.  User programs that implement numerical analysis
techniques also often need to be parameterized in this way in order to
minimize or compute error bounds.

The specific representation of floating-point numbers varies from
machine to machine.  The GNU C Library defines a set of parameters which
characterize each of the supported floating-point representations on a
particular system.

@menu
* Floating-Point Representation::   Definitions of terminology.
* Floating-Point Parameters::	    Descriptions of the library facilities.
* IEEE Floating Point::		    An example of a common representation.
@end menu

@node Floating-Point Representation
@section Floating-Point Representation

This section introduces the terminology used to characterize the
representation of floating-point numbers.

You are probably already familiar with most of these concepts in terms
of scientific or exponential notation for floating-point numbers.  For
example, the number @code{123456.0} could be expressed in exponential
notation as @code{1.23456e+05}, a shorthand notation indicating that the
mantissa @code{1.23456} is multiplied by the base @code{10} raised to
the power @code{5}.
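
As a small illustration of this notation, the sketch below formats a
value with the standard @code{%e} conversion of @code{snprintf}.  The
function name @code{format_exponential} is hypothetical and exists only
for this example; the precision of five fractional digits is chosen to
match the example number above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write X into BUF in exponential notation with five digits after the
   decimal point.  Returns the value of snprintf.  (Hypothetical helper
   for illustration only.)  */
int
format_exponential (char *buf, size_t size, double x)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "%.5e", x);
}
```

Calling @code{format_exponential} with the value @code{123456.0} stores
the string @code{1.23456e+05} in the buffer.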

More formally, the internal representation of a floating-point number
can be characterized in terms of the following parameters:

@itemize @bullet
@item
The @dfn{sign} is either @code{-1} or @code{1}.
@cindex sign (of floating-point number)

@item
The @dfn{base} or @dfn{radix} for exponentiation; an integer greater
than @code{1}.  This is a constant for the particular representation.
@cindex base (of floating-point number)
@cindex radix (of floating-point number)

@item
The @dfn{exponent} to which the base is raised.  The upper and lower
bounds of the exponent value are constants for the particular
representation.
@cindex exponent (of floating-point number)

Sometimes, in the actual bits representing the floating-point number,
the exponent is @dfn{biased} by adding a constant to it, to make it
always be represented as an unsigned quantity.  This is only important
if you have some reason to pick apart the bit fields making up the
floating-point number by hand, which is something for which the GNU
library provides no support.  So this is ignored in the discussion that
follows.
@cindex bias, in exponent (of floating-point number)

@item
The value of the @dfn{mantissa} or @dfn{significand}, which is an
unsigned quantity.
@cindex mantissa (of floating-point number)
@cindex significand (of floating-point number)

@item
The @dfn{precision} of the mantissa.  If the base of the representation
is @var{b}, then the precision is the number of base-@var{b} digits in
the mantissa.  This is a constant for the particular representation.

Many floating-point representations have an implicit @dfn{hidden bit} in
the mantissa.  Any such hidden bits are counted in the precision.
Again, the GNU library provides no facilities for dealing with such low-level
aspects of the representation.
@cindex precision (of floating-point number)
@cindex hidden bit, in mantissa (of floating-point number)
@end itemize

The mantissa of a floating-point number actually represents an implicit
fraction whose denominator is the base raised to the power of the
precision.  Since the largest representable mantissa is one less than
this denominator, the value of the fraction is always strictly less than
@code{1}.  The mathematical value of a floating-point number is then the
product of this fraction, the sign, and the base raised to the exponent.

If the floating-point number is @dfn{normalized}, the mantissa is also
greater than or equal to the base raised to the power of one less
than the precision (unless the number represents a floating-point zero,
in which case the mantissa is zero).  The fractional quantity is
therefore greater than or equal to @code{1/@var{b}}, where @var{b} is
the base.
@cindex normalized floating-point number
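
The decomposition described above can be observed with the standard
@code{frexp} function, which splits a value into a fraction and a power
of 2.  The following is only a sketch: the structure
@code{float_parts} and the function @code{split_double} are
hypothetical names, and because @code{frexp} always works in base 2 the
result matches the model in the text only when the radix is 2.

```c
#include <assert.h>
#include <math.h>

/* The three quantities whose product is the value of the number.  */
struct float_parts
{
  int sign;          /* -1 or 1 */
  double fraction;   /* 0, or a value in the range [0.5, 1) */
  int exponent;      /* power of 2 applied to the fraction */
};

/* Split the finite value X so that
   x == sign * fraction * 2^exponent.  */
struct float_parts
split_double (double x)
{
  struct float_parts parts;
  double magnitude;

  assert (isfinite (x));
  parts.sign = signbit (x) ? -1 : 1;
  magnitude = parts.sign < 0 ? -x : x;
  /* For a nonzero argument frexp returns a fraction that is at least
     1/2 and strictly less than 1, as for a normalized number.  */
  parts.fraction = frexp (magnitude, &parts.exponent);
  return parts;
}
```

For example, @code{split_double (-6.0)} yields a sign of @code{-1}, a
fraction of @code{0.75} and an exponent of @code{3}, since 6 equals
0.75 times 2 raised to the power 3.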

@node Floating-Point Parameters
@section Floating-Point Parameters

@strong{Incomplete:}  This section needs some more concrete examples
of what these parameters mean and how to use them in a program.

These macro definitions can be accessed by including the header file
@file{<float.h>} in your program.

Macro names starting with @samp{FLT_} refer to the @code{float} type,
while names beginning with @samp{DBL_} refer to the @code{double} type
and names beginning with @samp{LDBL_} refer to the @code{long double}
type.  (In implementations that do not support @code{long double} as
a distinct data type, the values for those constants are the same
as the corresponding constants for the @code{double} type.)@refill

Note that only @code{FLT_RADIX} is guaranteed to be a constant
expression, so the other macros listed here cannot be reliably used in
places that require constant expressions, such as @samp{#if}
preprocessing directives and array size specifications.
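
One way to respect this restriction is to test @code{FLT_RADIX} in the
preprocessor and to read the other parameters only at run time.  The
sketch below does that; the macro @code{RADIX_IS_BINARY} and the
function @code{float_mantissa_bits} are hypothetical names used for
illustration.

```c
#include <assert.h>
#include <float.h>

/* FLT_RADIX is a constant expression, so the preprocessor can
   examine it.  */
#if FLT_RADIX == 2
# define RADIX_IS_BINARY 1
#else
# define RADIX_IS_BINARY 0
#endif

/* Return the number of bits in the mantissa of a float, or -1 if the
   representation is not binary.  FLT_MANT_DIG is used in ordinary
   code rather than in #if, so it does not need to be a constant
   expression.  */
int
float_mantissa_bits (void)
{
  int digits = FLT_MANT_DIG;

  assert (digits > 0);
  return RADIX_IS_BINARY ? digits : -1;
}
```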

Although the @w{ISO C} standard specifies minimum and maximum values for
most of these parameters, the GNU C implementation uses whatever
floating-point representations are supported by the underlying hardware.
So whether GNU C actually satisfies the @w{ISO C} requirements depends on
what machine it is running on.

@comment float.h
@comment ISO
@defvr Macro FLT_ROUNDS
This value characterizes the rounding mode for floating-point addition.
The following values indicate standard rounding modes:

@table @code
@item -1
The mode is indeterminable.
@item 0
Rounding is towards zero.
@item 1
Rounding is to the nearest number.
@item 2
Rounding is towards positive infinity.
@item 3
Rounding is towards negative infinity.
@end table

@noindent
Any other value represents a machine-dependent nonstandard rounding
mode.
@end defvr
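
A program that wants to report the rounding mode can map the value of
@code{FLT_ROUNDS} to a description taken from the table above.  This is
a minimal sketch; the function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Return a description of the rounding mode value MODE, using the
   meanings listed for FLT_ROUNDS.  */
const char *
rounding_mode_name (int mode)
{
  switch (mode)
    {
    case -1:
      return "indeterminable";
    case 0:
      return "towards zero";
    case 1:
      return "to nearest";
    case 2:
      return "towards positive infinity";
    case 3:
      return "towards negative infinity";
    default:
      return "machine-dependent";
    }
}

/* Describe the rounding mode reported by FLT_ROUNDS.  */
const char *
current_rounding_mode_name (void)
{
  const char *name = rounding_mode_name (FLT_ROUNDS);

  assert (name != 0);
  return name;
}
```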

@comment float.h
@comment ISO
@defvr Macro FLT_RADIX
This is the value of the base, or radix, of exponent representation.
This is guaranteed to be a constant expression, unlike the other macros
described in this section.
@end defvr

@comment float.h
@comment ISO
@defvr Macro FLT_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{float} data type.
@end defvr

@comment float.h
@comment ISO
@defvr Macro DBL_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{double} data type.
@end defvr

@comment float.h
@comment ISO
@defvr Macro LDBL_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{long double} data type.
@end defvr

@comment float.h
@comment ISO
@defvr Macro FLT_DIG
This is the number of decimal digits of precision for the @code{float}
data type.  Technically, if @var{p} and @var{b} are the precision and
base (respectively) for the representation, then the decimal precision
@var{q} is the maximum number of decimal digits such that any
floating-point number with @var{q} base-10 digits can be rounded to a
floating-point number with @var{p} base-@var{b} digits and back again,
without change to the @var{q} decimal digits.

The value of this macro is guaranteed to be at least @code{6}.
@end defvr
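
The round trip described above can be tried directly.  The sketch below
converts a decimal string to @code{float} with @code{strtof} and prints
it back with a chosen number of significant digits.  The function name
@code{round_trips_through_float} is hypothetical, and the input is
assumed to already be in the form produced by the @code{%e} conversion.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return nonzero if TEXT, a number written in %e form with DIGITS
   significant digits, is reproduced exactly after being converted to
   float and printed again with the same number of digits.  */
int
round_trips_through_float (const char *text, int digits)
{
  char buf[64];
  float f;

  assert (digits >= 1 && digits <= 40);
  f = strtof (text, NULL);
  /* %e prints one digit before the decimal point, so the precision
     is one less than the number of significant digits.  */
  snprintf (buf, sizeof buf, "%.*e", digits - 1, f);
  return strcmp (buf, text) == 0;
}
```

With @code{FLT_DIG} digits the string @code{1.23456e+05} comes back
unchanged.  With nine digits, which is more than @code{FLT_DIG} on an
IEEE single-precision system, @code{1.00000001e+00} does not, because
the nearest @code{float} is exactly @code{1.0}.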

@comment float.h
@comment ISO
@defvr Macro DBL_DIG
This is similar to @code{FLT_DIG}, but is for the @code{double} data
type.  The value of this macro is guaranteed to be at least @code{10}.
@end defvr

@comment float.h
@comment ISO
@defvr Macro LDBL_DIG
This is similar to @code{FLT_DIG}, but is for the @code{long double}
data type.  The value of this macro is guaranteed to be at least
@code{10}.
@end defvr

@comment float.h
@comment ISO
@defvr Macro FLT_MIN_EXP
This is the minimum negative integer such that the mathematical value
@code{FLT_RADIX} raised to one less than this power can be represented
as a normalized floating-point number of type @code{float}.  In terms of the
actual implementation, this is just the smallest value that can be
represented in the exponent field of the number.
@end defvr

@comment float.h
@comment ISO
@defvr Macro DBL_MIN_EXP
This is similar to @code{FLT_MIN_EXP}, but is for the @code{double} data
type.
@end defvr

@comment float.h
@comment ISO
@defvr Macro LDBL_MIN_EXP
This is similar to @code{FLT_MIN_EXP}, but is for the @code{long double}
data type.
@end defvr
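
The relationship between the minimum exponent and the smallest
normalized number can be checked with @code{scalbn}, which multiplies
its argument by a power of @code{FLT_RADIX}.  This is a sketch under the
assumption that @code{double} has enough range to hold the result
exactly; the function name is hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Compute FLT_RADIX raised to the power FLT_MIN_EXP - 1, which is the
   smallest normalized positive value of type float.  */
double
smallest_normalized_float (void)
{
  assert (FLT_MIN_EXP < 0);
  return scalbn (1.0, FLT_MIN_EXP - 1);
}
```

The value returned is expected to compare equal to @code{FLT_MIN}.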

@comment float.h
@comment ISO
@defvr Macro FLT_MIN_10_EXP
This is the minimum negative integer such that the mathematical value
@code{10} raised to this power can be represented as a
normalized floating-point number of type @code{float}.  This is
guaranteed to be no greater than @code{-37}.
@end defvr

@comment float.h
@comment ISO
@defvr Macro DBL_MIN_10_EXP
This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{double}
data type.
@end defvr

@comment float.h
@comment ISO
@defvr Macro LDBL_MIN_10_EXP
This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{long
double} data type.
@end defvr


@comment float.h
@comment ISO
@defvr Macro FLT_MAX_EXP
This is the maximum integer such that the mathematical value
@code{FLT_RADIX} raised to one less than this power can be represented
as a floating-point number of type @code{float}.  In terms of the actual
implementation, this is just the largest value that can be represented
in the exponent field of the number.
@end defvr

@comment float.h
@comment ISO
@defvr Macro DBL_MAX_EXP
This is similar to @code{FLT_MAX_EXP}, but is for the @code{double} data
type.
@end defvr

@comment float.h
@comment ISO
@defvr Macro LDBL_MAX_EXP
This is similar to @code{FLT_MAX_EXP}, but is for the @code{long double}
data type.
@end defvr
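
The maximum exponent and the precision together determine the largest
finite number: it is the largest fraction, one minus the base raised to
the negated precision, multiplied by the base raised to the maximum
exponent.  The sketch below computes this for @code{float}, assuming
that @code{double} can represent the intermediate values exactly; the
function name is hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Compute (1 - b^-p) * b^emax, where b is FLT_RADIX, p is
   FLT_MANT_DIG and emax is FLT_MAX_EXP.  */
double
largest_finite_float (void)
{
  double largest_fraction;

  assert (FLT_MAX_EXP > 0);
  largest_fraction = 1.0 - scalbn (1.0, -FLT_MANT_DIG);
  return scalbn (largest_fraction, FLT_MAX_EXP);
}
```

The value returned is expected to compare equal to @code{FLT_MAX}.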

@comment float.h
@comment ISO
@defvr Macro FLT_MAX_10_EXP
This is the maximum integer such that the mathematical value
@code{10} raised to this power can be represented as a
finite floating-point number of type @code{float}.  This is
guaranteed to be at least @code{37}.
@end defvr

@comment float.h
@comment ISO
@defvr Macro DBL_MAX_10_EXP
This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{double}
data type.
@end defvr

@comment float.h
@comment ISO
@defvr Macro LDBL_MAX_10_EXP
This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{long
double} data type.
@end defvr


@comment float.h
@comment ISO
@defvr Macro FLT_MAX
The value of this macro is the maximum representable floating-point
number of type @code{float}, and is guaranteed to be at least
@code{1E+37}.
@end defvr

@comment float.h
@comment ISO
@defvr Macro DBL_MAX
The value of this macro is the maximum representable floating-point
number of type @code{double}, and is guaranteed to be at least
@code{1E+37}.
@end defvr

@comment float.h
@comment ISO
@defvr Macro LDBL_MAX
The value of this macro is the maximum representable floating-point
number of type @code{long double}, and is guaranteed to be at least
@code{1E+37}.
@end defvr


@comment float.h
@comment ISO
@defvr Macro FLT_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{float}, and is
guaranteed to be no more than @code{1E-37}.
@end defvr

@comment float.h
@comment ISO
@defvr Macro DBL_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{double}, and
is guaranteed to be no more than @code{1E-37}.
@end defvr

@comment float.h
@comment ISO
@defvr Macro LDBL_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{long double},
and is guaranteed to be no more than @code{1E-37}.
@end defvr


@comment float.h
@comment ISO
@defvr Macro FLT_EPSILON
This is the difference between @code{1.0} and the smallest
floating-point number of type @code{float} that is greater than
@code{1.0}.  It's guaranteed to be no greater than @code{1E-5}.
@end defvr

@comment float.h
@comment ISO
@defvr Macro DBL_EPSILON
This is similar to @code{FLT_EPSILON}, but is for the @code{double}
type.  The maximum value is @code{1E-9}.
@end defvr

@comment float.h
@comment ISO
@defvr Macro LDBL_EPSILON
This is similar to @code{FLT_EPSILON}, but is for the @code{long double}
type.  The maximum value is @code{1E-9}.
@end defvr
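
The epsilon value follows from the precision: it is the base raised to
the power of one minus the number of mantissa digits.  The sketch below
computes it that way and also tests whether adding a given amount to
@code{1.0} produces a different @code{float}.  Both function names are
hypothetical, and the second function's results for amounts smaller
than epsilon assume rounding to nearest.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Compute FLT_RADIX raised to the power 1 - FLT_MANT_DIG.  */
float
epsilon_from_precision (void)
{
  assert (FLT_MANT_DIG > 1);
  return (float) scalbn (1.0, 1 - FLT_MANT_DIG);
}

/* Return nonzero if adding DELTA to 1.0 gives a float other than
   1.0.  The volatile store forces the sum to float precision even on
   systems that evaluate expressions in a wider format.  */
int
changes_one (float delta)
{
  volatile float sum = 1.0f + delta;

  return sum != 1.0f;
}
```

Here @code{epsilon_from_precision} is expected to return a value equal
to @code{FLT_EPSILON}, and @code{changes_one (FLT_EPSILON)} is nonzero.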


@node IEEE Floating Point
@section IEEE Floating Point

Here is an example showing how these parameters work for a common
floating point representation, specified by the @cite{IEEE Standard for
Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985 or ANSI/IEEE
Std 854-1987)}.

The IEEE single-precision float representation uses a base of 2.  There
is a sign bit, a mantissa with 23 bits plus one hidden bit (so the total
precision is 24 base-2 digits), and an 8-bit exponent that can represent
values in the range -125 to 128, inclusive.

So, for an implementation that uses this representation for the
@code{float} data type, appropriate values for the corresponding
parameters are:

@example
FLT_RADIX                         2
FLT_MANT_DIG                     24
FLT_DIG                           6
FLT_MIN_EXP                    -125
FLT_MIN_10_EXP                  -37
FLT_MAX_EXP                     128
FLT_MAX_10_EXP                  +38
FLT_MIN             1.17549435E-38F
FLT_MAX             3.40282347E+38F
FLT_EPSILON         1.19209290E-07F
@end example
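
A program can compare the parameters of the system it runs on with the
values listed above.  The following sketch does so for the integer
parameters and can also print the floating-point ones; the function
names are hypothetical, and a true result only means that the
parameters match, not that every detail of the IEEE standard is
implemented.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Return nonzero if the parameters of type float match the IEEE
   single-precision values.  */
int
float_is_ieee_single (void)
{
  return (FLT_RADIX == 2
          && FLT_MANT_DIG == 24
          && FLT_MIN_EXP == -125
          && FLT_MAX_EXP == 128);
}

/* Print the range and epsilon of type float to OUT.  */
void
print_float_parameters (FILE *out)
{
  assert (out != NULL);
  fprintf (out, "FLT_MIN      %.8E\n", FLT_MIN);
  fprintf (out, "FLT_MAX      %.8E\n", FLT_MAX);
  fprintf (out, "FLT_EPSILON  %.8E\n", FLT_EPSILON);
}
```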