* doc/autoconf.texi (Integer Overflow): Greatly expand and

author Paul Eggert <eggert@cs.ucla.edu>

Tue, 2 Jan 2007 23:06:07 +0000 (23:06 +0000)

committer Paul Eggert <eggert@cs.ucla.edu>

Tue, 2 Jan 2007 23:06:07 +0000 (23:06 +0000)
author Paul Eggert <eggert@cs.ucla.edu>
Tue, 2 Jan 2007 23:06:07 +0000 (23:06 +0000)
committer Paul Eggert <eggert@cs.ucla.edu>
Tue, 2 Jan 2007 23:06:07 +0000 (23:06 +0000)
diff --git a/ChangeLog b/ChangeLog

index 501f72b906b3485ef66ca828e4046b5f56ab6273..953c4f8110f2ef4b171b3bc937ef57b710fe7104 100644 (file)
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2007-01-02  Paul Eggert  <eggert@cs.ucla.edu>
+
+       * doc/autoconf.texi (Integer Overflow): Greatly expand and
+       rewrite, taking notions from the recent discussion on the gcc and
+       autoconf mailing lists; please see
+       http://lists.gnu.org/archive/html/autoconf-patches/2006-12/msg00091.html
+       and follow the many links.
+       (Integer Overflow Basics, Signed Overflow Examples):
+       (Optimization and Wraparound, Signed Overflow Advice):
+       (Signed Integer Division): New sections.
+
  2006-12-28  Steven G. Johnson  <stevenj@alum.mit.edu>
  
         * lib/autoconf/general.m4 (AC_DEFINE_TRACE): Don't include
@@ -13034,8 +13045,8 @@
  
         -----
  
-       Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006 Free Software
-       Foundation, Inc.
+       Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free
+       Software Foundation, Inc.
  
         This file is part of GNU Autoconf.
  
diff --git a/doc/autoconf.texi b/doc/autoconf.texi

index f225e79428da0c8e2eff2f984ffa1ced2fd6c810..73255da09668b7bb275a4e2a721fcc28ea8cb345 100644 (file)
--- a/doc/autoconf.texi
+++ b/doc/autoconf.texi
@@ -183,7 +183,7 @@ a package for creating scripts to configure source code packages using
  templates and an M4 macro package.
  
  Copyright @copyright{} 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000,
-2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
+2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc.
  
  @quotation
  Permission is granted to copy, distribute and/or modify this document
@@ -14949,15 +14949,296 @@ the programs work well enough in practice.
  
  @node Integer Overflow
  @section Integer Overflow
-@cindex overflow, arithmetic
-
-In C, signed integer overflow leads to undefined behavior.  However,
-many programs and Autoconf tests assume that signed integer overflow after
-addition, subtraction, or multiplication silently
-wraps around modulo a power of two, using two's complement arithmetic,
-so long as you cast the resulting value
-to an integer type or store it into an integer variable.  Such programs
-are portable to the vast majority of modern platforms.  However, signed
+@cindex integer overflow
+@cindex overflow, signed integer
+@cindex signed integer overflow
+@cindex wraparound arithmetic
+
+Many portable C programs assume that signed integer overflow wraps
+around reliably using two's complement arithmetic.  Yet the C standard
+says that program behavior is undefined on overflow, and in a few cases
+C programs do not work on some modern implementations because their
+overflows do not wrap around as their authors intended.  Conversely, in
+at least one common case related to overflow, the C standard requires
+behavior that is commonly not implemented.
+
+@menu
+* Integer Overflow Basics::      Why integer overflow is a problem
+* Signed Overflow Examples::     Examples of code assuming wraparound
+* Optimization and Wraparound::  Optimizations that break uses of wraparound
+* Signed Overflow Advice::       Practical advice for signed overflow issues
+* Signed Integer Division::      @code{INT_MIN / -1} and @code{INT_MIN % -1}
+@end menu
+
+@node Integer Overflow Basics
+@subsection Basics of Integer Overflow
+@cindex integer overflow
+@cindex overflow, signed integer
+@cindex signed integer overflow
+@cindex wraparound arithmetic
+
+In languages like C, unsigned integer overflow reliably wraps around
+modulo the word size.  This is guaranteed by the C standard and is
+portable in practice, unless you specify aggressive optimization options
+suitable only for special applications.
+
+In contrast, the C standard says that signed integer overflow leads to
+undefined behavior where a program can do anything, including dumping
+core or overrunning a buffer.  The misbehavior can even precede the
+overflow.  Such an overflow can occur during addition, subtraction,
+multiplication, division, and left shift.
+
+Despite this requirement of the standard, many C programs and Autoconf
+tests assume that signed integer overflow silently wraps around modulo a
+power of two, using two's complement arithmetic, so long as you cast the
+resulting value to a signed integer type or store it into a signed
+integer variable.  If you use conservative optimization flags, such
+programs are generally portable to the vast majority of modern
+platforms, with a few exceptions discussed later.
+
+For historical reasons the C standard also allows implementations with
+ones' complement or signed magnitude arithmetic, but it is safe to
+assume two's complement nowadays.
+
+@node Signed Overflow Examples
+@subsection Examples of Code Assuming Wraparound Overflow
+@cindex integer overflow
+@cindex overflow, signed integer
+@cindex signed integer overflow
+@cindex wraparound arithmetic
+
+There has long been a tension between what the C standard requires for
+signed integer overflow, and what C programs commonly assume.  The
+standard allows aggressive optimizations based on assumptions that
+overflow never occurs, but many practical C programs rely on overflow
+wrapping around.  These programs do not conform to the standard, but
+they commonly work in practice because compiler writers are
+understandably reluctant to implement optimizations that would break
+many programs, unless perhaps a user specifies aggressive optimization.
+
+The C Standard says that if a program has signed integer overflow its
+behavior is undefined, and the undefined behavior can even precede the
+overflow.  To take an extreme example:
+
+@c Inspired by Robert Dewar's example in
+@c <http://gcc.gnu.org/ml/gcc/2007-01/msg00038.html> (2007-01-01).
+@example
+if (password == expected_password)
+  allow_superuser_privileges ();
+else
+  printf ("%d password mismatches\n", counter++);
+@end example
+
+@noindent
+If @code{counter} is an @code{int} and a compiler can deduce that
+@code{counter == INT_MAX} or that @code{counter} previously overflowed,
+the C standard allows the compiler to optimize away the password test
+and generate code that allows superuser privileges unconditionally.
+
+Despite this requirement by the standard, it has long been common for C
+code to assume wraparound arithmetic after signed overflow, and all
+known practical C implementations support some C idioms that assume
+wraparound signed arithmetic, even if the idioms does not conform
+strictly to the standard.  If your code looks like the following
+examples it will almost surely work with real-world compilers.
+
+Here is an example derived from the 7th Edition Unix implementation of
+@code{atoi} (1979-01-10):
+
+@example
+char *p;
+int f, n;
+@dots{}
+while (*p >= '0' && *p <= '9')
+  n = n * 10 + *p++ - '0';
+return (f ? -n : n);
+@end example
+
+@noindent
+Even if the input string is in range, on most modern machines this has
+signed overflow when computing the most negative integer (the @code{-n}
+overflows) or a value near an extreme integer (the first @code{+}
+overflows).
+
+Here is another example, taken from the 7th Edition implementation of
+@code{rand} (1979-01-10).  Here the programmer expects both
+multiplication and addition to wrap on overflow:
+
+@example
+static long int randx = 1;
+@dots{}
+randx = randx * 1103515245 + 12345;
+return (randx >> 16) & 077777;
+@end example
+
+In the following example, derived from the @acronym{GNU} C Library 2.5
+implementation of @code{mktime} (2006-09-09), the code assumes
+wraparound arithmetic in @code{+} to detect signed overflow:
+
+@example
+time_t t, t1, t2;
+int sec_requested, sec_adjustment;
+@dots{}
+t1 = t + sec_requested;
+t2 = t1 + sec_adjustment;
+if (((t1 < t) != (sec_requested < 0))
+    | ((t2 < t1) != (sec_adjustment < 0)))
+  return -1;
+@end example
+
+If your code looks like these examples, it is probably safe even though
+it does not strictly conform to the C standard.  This might lead one to
+believe that one can generally assume wraparound on overflow, but that
+is not always true, as can be seen in the next section.
+
+@node Optimization and Wraparound
+@subsection Optimizations That Break Wraparound Arithmetic
+@cindex loop induction
+
+Compilers sometimes generate code that is incompatible with wraparound
+integer arithmetic.  A simple example is an algebraic simplification: a
+compiler might translate @code{(i * 2000) / 1000} to @code{i * 2}
+because it assumes that @code{i * 2000} does not overflow.  The
+translation is not equivalent to the original when overflow occurs:
+e.g., in the typical case of 32-bit signed two's complement wraparound
+@code{int}, if @code{i} has type @code{int} and value @code{1073742},
+the original expression returns @minus{}2147483 but the optimized
+version returns the mathematically correct value 2147484.
+
+More subtly, loop induction optimizations often exploit the undefined
+behavior of signed overflow.  Consider the following contrived function
+@code{sumc}:
+
+@example
+int
+sumc (int lo, int hi)
+@{
+  int sum = 0;
+  int i;
+  for (i = lo; i <= hi; i++)
+    sum ^= i * 53;
+  return sum;
+@}
+@end example
+
+@noindent
+To avoid multiplying by 53 each time through the loop, an optimizing
+compiler might internally transform @code{sumc} to the equivalent of the
+following:
+
+@example
+int
+transformed_sumc (int lo, int hi)
+@{
+  int sum = 0;
+  int hic = hi * 53;
+  int ic;
+  for (ic = lo * 53; ic <= hic; ic += 53)
+    sum ^= ic;
+  return sum;
+@}
+@end example
+
+@noindent
+This transformation is allowed by the C standard, but it is invalid for
+wraparound arithmetic when @code{INT_MAX / 53 < hi}, because then the
+overflow in computing expressions like @code{hi * 53} can cause the
+expression @code{i <= hi} to yield a different value from the
+transformed expression @code{ic <= hic}.
+
+For this reason, compilers that use loop induction and similar
+techniques often do not support reliable wraparound arithmetic when a
+loop induction variable like @code{ic} is involved.  Since loop
+induction variables are generated by the compiler, and are not visible
+in the source code, it is not always trivial to say whether the problem
+affects your code.
+
+Hardly any code actually depends on wraparound arithmetic in cases like
+these, so in practice these loop induction optimizations are almost
+always useful.  However, edge cases in this area can cause problems.
+For example:
+
+@example
+int j;
+for (j = 1; 0 < j; j *= 2)
+  test (j);
+@end example
+
+@noindent
+Here, the loop attempts to iterate through all powers of 2 that
+@code{int} can represent, but some test versions of @acronym{GCC}
+optimize away the comparison to zero and thus generate an infinite loop,
+under the argument that behavior is undefined on overflow.  As of this
+writing this optimization is not done by any production version of
+@acronym{GCC} with @option{-O2}, but it might be performed by more
+aggressive @acronym{GCC} optimization options, or by other compilers.
+
+@node Signed Overflow Advice
+@subsection Practical Advice for Signed Overflow Issues
+@cindex integer overflow
+@cindex overflow, signed integer
+@cindex signed integer overflow
+@cindex wraparound arithmetic
+
+Ideally the safest approach is to avoid signed integer overflow
+entirely.  For example, instead of multiplying two signed integers, you
+can convert them to unsigned integers, multiply the unsigned values,
+then test whether the result is in signed range.
+
+Rewriting code in this way will be inconvenient, though, particularly if
+the signed values might be negative.  Also, it will probably hurt
+performance.  Using unsigned arithmetic to check for overflow is
+particularly painful to do portably and efficiently when dealing with an
+integer type like @code{uid_t} whose width and signedness vary from
+platform to platform.
+
+Furthermore, many C applications pervasively assume wraparound behavior
+and typically it is not easy to find and remove all these assumptions.
+Hence it is often useful to maintain nonstandard code that assumes
+wraparound on overflow, instead of rewriting the code.  The rest of this
+section attempts to give practical advice for this situation.
+
+If your code uses a signed loop index, make sure that the index cannot
+overflow, along with all signed expressions derived from the index.
+Here is a contrived example of problematic code with two instances of
+overflow.
+
+@example
+for (i = INT_MAX - 10; i <= INT_MAX; i++)
+  if (i + 1 < 0)
+    @{
+      report_overflow ();
+      break;
+    @}
+@end example
+
+@noindent
+Because of the two overflows, a compiler might optimize away or
+transform the two comparisons in a way that is incompatible with the
+wraparound assumption.
+
+If your code uses an expression like @code{(i * 2000) / 1000} and you
+actually want the multiplication to wrap around reliably, put the
+product into a temporary variable and divide that by 1000.  This
+inhibits the algebraic optimization on many platforms.
+
+If your code assumes wraparound behavior and you want to insulate it
+against any @acronym{GCC} optimizations that would fail to support that
+behavior, you should use @acronym{GCC}'s @option{-fwrapv} option, which
+causes signed overflow to wrap around reliably (except for division and
+remainder, as discussed in the next section).
+
+If you need to port to platforms where signed integer overflow does not
+reliably wrap around (e.g., due to hardware overflow checking, or to
+highly aggressive optimizations), you should consider using
+@acronym{GCC}'s @option{-ftrapv} option, which causes signed overflow to
+raise an exception.
+
+@node Signed Integer Division
+@subsection Signed Integer Division and Integer Overflow
+@cindex division, integer
+
+Overflow in signed
  integer division is not always harmless: for example, on CPUs of the
  i386 family, dividing @code{INT_MIN} by @code{-1} yields a SIGFPE signal
  which by default terminates the program.  Worse, taking the remainder
@@ -14965,14 +15246,6 @@ of these two values typically yields the same signal on these CPUs,
  even though the C standard requires @code{INT_MIN % -1} to yield zero
  because the expression does not overflow.
  
-@acronym{GCC} users might consider using the
-@option{-ftrapv} option if they are worried about porting their code to
-the rare platforms where signed integer overflow does not wrap around
-after addition, subtraction, or multiplication.
-
-Unsigned integer overflow reliably wraps around modulo the word size.
-This is guaranteed by the C standard and is portable in practice.
-
  @node Null Pointers
  @section Properties of Null Pointers
  @cindex null pointers
author	Paul Eggert <eggert@cs.ucla.edu>
	Tue, 2 Jan 2007 23:06:07 +0000 (23:06 +0000)
committer	Paul Eggert <eggert@cs.ucla.edu>
	Tue, 2 Jan 2007 23:06:07 +0000 (23:06 +0000)
ChangeLog		patch \| blob \| blame \| history
doc/autoconf.texi		patch \| blob \| blame \| history