..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. index:: problems with macros, pitfalls of macros

.. _macro-pitfalls:

Macro Pitfalls
**************

In this section we describe some special rules that apply to macros and
macro expansion, and point out certain cases in which the rules have
counter-intuitive consequences that you must watch out for.

.. toctree::
  :maxdepth: 2


.. _misnesting:

Misnesting
^^^^^^^^^^

When a macro is called with arguments, the arguments are substituted
into the macro body and the result is checked, together with the rest of
the input file, for more macro calls.  It is possible to piece together
a macro call coming partially from the macro body and partially from the
arguments.  For example,

.. code-block::

  #define twice(x) (2*(x))
  #define call_with_1(x) x(1)
  call_with_1 (twice)
       → twice(1)
       → (2*(1))

Macro definitions do not have to have balanced parentheses.  By writing
an unbalanced open parenthesis in a macro body, it is possible to create
a macro call that begins inside the macro body but ends outside of it.
For example,

.. code-block::

  #define strange(file) fprintf (file, "%s %d",
  ...
  strange(stderr) p, 35)
       → fprintf (stderr, "%s %d", p, 35)

The ability to piece together a macro call can be useful, but the use of
unbalanced open parentheses in a macro body is just confusing, and
should be avoided.
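As a minimal sketch of the first example, the fragment below wraps the
pieced-together call in a function so that its value can be checked.  The
helper name ``misnesting_demo`` is hypothetical and is not part of this
manual.

```c
#define twice(x) (2*(x))
#define call_with_1(x) x(1)

/* Hypothetical helper.  The argument `twice' is not followed by an open
   parenthesis, so it is substituted as is; the rescan then sees twice(1)
   and expands it to (2*(1)).  */
int misnesting_demo(void)
{
    return call_with_1 (twice);
}
```

Calling ``misnesting_demo ()`` should yield 2, the value of ``(2*(1))``.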

.. index:: parentheses in macro bodies

.. _operator-precedence-problems:

Operator Precedence Problems
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You may have noticed that in most of the macro definition examples shown
above, each occurrence of a macro argument name had parentheses around
it.  In addition, another pair of parentheses usually surrounds the
entire macro definition.  Here is why it is best to write macros that
way.

Suppose you define a macro as follows,

.. code-block:: c++

  #define ceil_div(x, y) (x + y - 1) / y

whose purpose is to divide, rounding up.  (One use for this operation is
to compute how many ``int`` objects are needed to hold a certain
number of ``char`` objects.)  Then suppose it is used as follows:

.. code-block::

  a = ceil_div (b & c, sizeof (int));
       → a = (b & c + sizeof (int) - 1) / sizeof (int);

This does not do what is intended.  The operator-precedence rules of
C make it equivalent to this:

.. code-block:: c++

  a = (b & (c + sizeof (int) - 1)) / sizeof (int);

What we want is this:

.. code-block:: c++

  a = ((b & c) + sizeof (int) - 1) / sizeof (int);

Defining the macro as

.. code-block:: c++

  #define ceil_div(x, y) ((x) + (y) - 1) / (y)

provides the desired result.

Unintended grouping can result in another way.  Consider ``sizeof
ceil_div(1, 2)``.  That has the appearance of a C expression that would
compute the size of the type of ``ceil_div (1, 2)``, but in fact it
means something very different.  Here is what it expands to:

.. code-block:: c++

  sizeof ((1) + (2) - 1) / (2)

This would take the size of an integer and divide it by two.  The
precedence rules have put the division outside the ``sizeof`` when it
was intended to be inside.

Parentheses around the entire macro definition prevent such problems.
Here, then, is the recommended way to define ``ceil_div``:

.. code-block:: c++

  #define ceil_div(x, y) (((x) + (y) - 1) / (y))
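The same precedence problem can arise through the second argument.  As a
hedged illustration, the sketch below keeps the unparenthesized
definition under the hypothetical name ``ceil_div_bad`` next to the
recommended ``ceil_div`` and passes ``2 + 2`` as the divisor.  The two
function names are likewise invented for this example.

```c
/* Unparenthesized definition from the text, renamed (hypothetically)
   so that both versions can live in one file.  */
#define ceil_div_bad(x, y) (x + y - 1) / y

/* Recommended definition.  */
#define ceil_div(x, y) (((x) + (y) - 1) / (y))

/* Expands to (7 + 2 + 2 - 1) / 2 + 2, which is 10 / 2 + 2.  */
int bad_result(void)
{
    return ceil_div_bad (7, 2 + 2);
}

/* Expands to (((7) + (2 + 2) - 1) / (2 + 2)), which is 10 / 4.  */
int good_result(void)
{
    return ceil_div (7, 2 + 2);
}
```

Here ``bad_result ()`` evaluates to 7, while ``good_result ()`` gives the
intended rounded-up quotient, 2.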

.. index:: semicolons (after macro calls)

.. _swallowing-the-semicolon:

Swallowing the Semicolon
^^^^^^^^^^^^^^^^^^^^^^^^

Often it is desirable to define a macro that expands into a compound
statement.  Consider, for example, the following macro, which advances a
pointer (the argument ``p`` says where to find it) across whitespace
characters:

.. code-block:: c++

  #define SKIP_SPACES(p, limit)  \
  { char *lim_ = (limit);        \
    while (p < lim_) {           \
      if (*p++ != ' ') {         \
        p--; break; }}}

Here backslash-newline is used to split the macro definition, which must
be a single logical line, so that it resembles the way such code would
be laid out if not part of a macro definition.  The local variable is
named ``lim_``, with a trailing underscore, so that a call such as
``SKIP_SPACES (p, lim)`` does not end up initializing the local variable
from itself.

A call to this macro might be ``SKIP_SPACES (p, lim)``.  Strictly
speaking, the call expands to a compound statement, which is a complete
statement with no need for a semicolon to end it.  However, since it
looks like a function call, it minimizes confusion if you can use it
like a function call, writing a semicolon afterward, as in
``SKIP_SPACES (p, lim);``

This can cause trouble before an ``else``, because the
semicolon is actually a null statement.  Suppose you write

.. code-block:: c++

  if (*p != 0)
    SKIP_SPACES (p, lim);
  else ...

The presence of two statements (the compound statement and a null
statement) between the ``if`` condition and the ``else``
makes invalid C code.

The definition of the macro ``SKIP_SPACES`` can be altered to solve
this problem, using a ``do ... while`` statement.  Here is how:

.. code-block:: c++

  #define SKIP_SPACES(p, limit)     \
  do { char *lim_ = (limit);        \
       while (p < lim_) {           \
         if (*p++ != ' ') {         \
           p--; break; }}}          \
  while (0)

Now ``SKIP_SPACES (p, lim);`` expands into

.. code-block:: c++

  do {...} while (0);

which is one statement.  The loop executes exactly once; most compilers
generate no extra code for it.
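The following sketch shows the ``do ... while (0)`` form used in front of
an ``else``.  The function ``skip_or_null`` is a hypothetical helper, and
the macro's local variable is spelled ``lim_`` here (an assumption of
this sketch) so that it cannot collide with the caller's ``lim``.

```c
#include <stddef.h>

#define SKIP_SPACES(p, limit)     \
do { char *lim_ = (limit);        \
     while (p < lim_) {           \
       if (*p++ != ' ') {         \
         p--; break; }}}          \
while (0)

/* Hypothetical helper: returns a pointer to the first non-space
   character in [p, lim), or NULL when p points at an empty string.  */
char *skip_or_null(char *p, char *lim)
{
    if (*p != 0)
        SKIP_SPACES (p, lim);   /* one statement, so the else still pairs with the if */
    else
        return NULL;
    return p;
}
```

With the plain compound-statement definition, the same function would
not compile, because the semicolon after the macro call would separate
the ``if`` from its ``else``.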

.. index:: side effects (in macro arguments), unsafe macros

.. _duplication-of-side-effects:

Duplication of Side Effects
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Many C programs define a macro ``min``, for 'minimum', like this:

.. code-block:: c++

  #define min(X, Y)  ((X) < (Y) ? (X) : (Y))

When you use this macro with an argument containing a side effect,
as shown here,

.. code-block:: c++

  next = min (x + y, foo (z));

it expands as follows:

.. code-block:: c++

  next = ((x + y) < (foo (z)) ? (x + y) : (foo (z)));

where ``x + y`` has been substituted for ``X`` and ``foo (z)``
for ``Y``.

The function ``foo`` is used only once in the statement as it appears
in the program, but the expression ``foo (z)`` has been substituted
twice into the macro expansion.  As a result, ``foo`` might be called
two times when the statement is executed.  If it has side effects or if
it takes a long time to compute, the results might not be what you
intended.  We say that ``min`` is an :dfn:`unsafe` macro.

The best solution to this problem is to define ``min`` in a way that
computes the value of ``foo (z)`` only once.  The C language offers
no standard way to do this, but it can be done with GNU extensions as
follows:

.. code-block:: c++

  #define min(X, Y)                \
  ({ typeof (X) x_ = (X);          \
     typeof (Y) y_ = (Y);          \
     (x_ < y_) ? x_ : y_; })

The :samp:`({ ... })` notation produces a compound statement that
acts as an expression.  Its value is the value of its last statement.
This permits us to define local variables and assign each argument to
one.  The local variables have underscores after their names to reduce
the risk of conflict with an identifier of wider scope (it is impossible
to avoid this entirely).  Now each argument is evaluated exactly once.

If you do not wish to use GNU C extensions, the only solution is to be
careful when *using* the macro ``min``.  For example, you can
calculate the value of ``foo (z)``, save it in a variable, and use
that variable in ``min``:

.. code-block:: c++

  #define min(X, Y)  ((X) < (Y) ? (X) : (Y))
  ...
  {
    int tem = foo (z);
    next = min (x + y, tem);
  }

(where we assume that ``foo`` returns type ``int``).
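To make the duplicated evaluation visible, the sketch below counts how
often ``foo`` runs under each definition.  The names ``min_unsafe``,
``min_safe``, ``calls`` and the two helper functions are hypothetical,
and ``foo`` is given a trivial body purely for illustration.  The second
macro relies on GNU C statement expressions, so this sketch assumes GCC
or a compatible compiler.

```c
/* Unsafe definition from the text, renamed (hypothetically) so that
   both versions fit in one file.  */
#define min_unsafe(X, Y)  ((X) < (Y) ? (X) : (Y))

/* GNU C definition; __typeof__ is the alternate spelling of typeof
   that GCC accepts regardless of the selected -std option.  */
#define min_safe(X, Y)                 \
({ __typeof__ (X) x_ = (X);            \
   __typeof__ (Y) y_ = (Y);            \
   (x_ < y_) ? x_ : y_; })

static int calls;                /* number of times foo has run */

static int foo(int z)
{
    calls++;
    return z;
}

/* Each helper returns how many times foo ran for one use of the macro.  */
int calls_with_unsafe(void)
{
    calls = 0;
    int next = min_unsafe (10, foo (3));  /* foo (3) is the smaller value, so it is evaluated twice */
    (void) next;
    return calls;
}

int calls_with_safe(void)
{
    calls = 0;
    int next = min_safe (10, foo (3));    /* foo (3) is evaluated once, into y_ */
    (void) next;
    return calls;
}
```

Under these assumptions ``calls_with_unsafe ()`` reports two calls and
``calls_with_safe ()`` reports one.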

.. index:: self-reference

.. _self-referential-macros:

Self-Referential Macros
^^^^^^^^^^^^^^^^^^^^^^^

A :dfn:`self-referential` macro is one whose name appears in its
definition.  Recall that all macro definitions are rescanned for more
macros to replace.  If the self-reference were considered a use of the
macro, it would produce an infinitely large expansion.  To prevent this,
the self-reference is not considered a macro call.  It is passed into
the preprocessor output unchanged.  Consider an example:

.. code-block:: c++

  #define foo (4 + foo)

where ``foo`` is also a variable in your program.

Following the ordinary rules, each reference to ``foo`` would expand
into ``(4 + foo)``; then this would be rescanned and would expand into
``(4 + (4 + foo))``; and so on until the computer runs out of memory.

The self-reference rule cuts this process short after one step, at
``(4 + foo)``.  Therefore, this macro definition has the possibly
useful effect of causing the program to add 4 to the value of ``foo``
wherever ``foo`` is referred to.
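A small sketch of this effect follows.  The initial value 10 and the
helper ``read_foo`` are assumptions made for the example.

```c
int foo = 10;            /* the variable, declared before the macro exists */

#define foo (4 + foo)    /* from here on, each foo is replaced exactly once */

/* Hypothetical helper: the body expands to return (4 + foo); and the
   inner foo is left alone, so it names the variable.  */
int read_foo(void)
{
    return foo;
}
```

With these definitions ``read_foo ()`` returns 14 rather than 10.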

In most cases, it is a bad idea to take advantage of this feature.  A
person reading the program who sees that ``foo`` is a variable will
not expect that it is a macro as well.  The reader will come across the
identifier ``foo`` in the program and think its value should be that
of the variable ``foo``, whereas in fact the value is four greater.

One common, useful use of self-reference is to create a macro which
expands to itself.  If you write

.. code-block:: c++

  #define EPERM EPERM

then the macro ``EPERM`` expands to ``EPERM``.  Effectively, it is
left alone by the preprocessor whenever it's used in running text.  You
can tell that it's a macro with :samp:`#ifdef`.  You might do this if you
want to define numeric constants with an ``enum``, but have
:samp:`#ifdef` be true for each constant.
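The ``enum`` idiom can be sketched as follows.  All identifiers here
(``MY_EPERM``, ``MY_ENOENT``, the ``HAVE_`` macros and ``eperm_value``)
are hypothetical names chosen to avoid clashing with a real ``<errno.h>``.

```c
/* Hypothetical error codes, modeled on the EPERM example above.  */
enum error_code { MY_EPERM = 1, MY_ENOENT = 2 };

#define MY_EPERM MY_EPERM        /* visible to #ifdef, expands to itself */

#ifdef MY_EPERM
# define HAVE_MY_EPERM 1
#else
# define HAVE_MY_EPERM 0
#endif

#ifdef MY_ENOENT                 /* a plain enumerator is invisible to #ifdef */
# define HAVE_MY_ENOENT 1
#else
# define HAVE_MY_ENOENT 0
#endif

/* The self-referential macro still yields the enumerator's value.  */
int eperm_value(void)
{
    return MY_EPERM;
}
```

Only the constant that also has a self-referential macro is detected by
``#ifdef``, yet both remain ordinary enumerators in the compiled code.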

If a macro ``x`` expands to use a macro ``y``, and the expansion of
``y`` refers to the macro ``x``, that is an :dfn:`indirect
self-reference` of ``x``.  ``x`` is not expanded in this case
either.  Thus, if we have

.. code-block:: c++

  #define x (4 + y)
  #define y (2 * x)

then ``x`` and ``y`` expand as follows:

.. code-block::

  x    → (4 + y)
       → (4 + (2 * x))

  y    → (2 * x)
       → (2 * (4 + y))

Each macro is expanded when it appears in the definition of the other
macro, but not when it indirectly appears in its own definition.

.. index:: expansion of arguments, macro argument expansion, prescan of macro arguments

.. _argument-prescan:

Argument Prescan
^^^^^^^^^^^^^^^^

Macro arguments are completely macro-expanded before they are
substituted into a macro body, unless they are stringized or pasted
with other tokens.  After substitution, the entire macro body, including
the substituted arguments, is scanned again for macros to be expanded.
The result is that the arguments are scanned *twice* to expand
macro calls in them.

Most of the time, this has no effect.  If the argument contained any
macro calls, they are expanded during the first scan.  The result
therefore contains no macro calls, so the second scan does not change
it.  If the argument were substituted as given, with no prescan, the
single remaining scan would find the same macro calls and produce the
same results.

You might expect the double scan to change the results when a
self-referential macro is used in an argument of another macro
(see :ref:`self-referential-macros`): the self-referential macro would be
expanded once in the first scan, and a second time in the second scan.
However, this is not what happens.  The self-references that do not
expand in the first scan are marked so that they will not expand in the
second scan either.

You might wonder, 'Why mention the prescan, if it makes no difference?
And why not skip it and make the preprocessor faster?'  The answer is
that the prescan does make a difference in three special cases:

* Nested calls to a macro.

  We say that :dfn:`nested` calls to a macro occur when a macro's argument
  contains a call to that very macro.  For example, if ``f`` is a macro
  that expects one argument, ``f (f (1))`` is a nested pair of calls to
  ``f``.  The desired expansion is made by expanding ``f (1)`` and
  substituting that into the definition of ``f``.  The prescan causes
  the expected result to happen.  Without the prescan, ``f (1)`` itself
  would be substituted as an argument, and the inner use of ``f`` would
  appear during the main scan as an indirect self-reference and would not
  be expanded.

* Macros that call other macros that stringize or concatenate.

  If an argument is stringized or concatenated, the prescan does not
  occur.  If you *want* to expand a macro, then stringize or
  concatenate its expansion, you can do that by causing one macro to call
  another macro that does the stringizing or concatenation.  For
  instance, if you have

  .. code-block:: c++

    #define AFTERX(x) X_ ## x
    #define XAFTERX(x) AFTERX(x)
    #define TABLESIZE 1024
    #define BUFSIZE TABLESIZE

  then ``AFTERX(BUFSIZE)`` expands to ``X_BUFSIZE``, and
  ``XAFTERX(BUFSIZE)`` expands to ``X_1024``.  (Not to
  ``X_TABLESIZE``.  Prescan always does a complete expansion.)

* Macros used in arguments, whose expansions contain unshielded commas.

  This can cause a macro expanded on the second scan to be called with the
  wrong number of arguments.  Here is an example:

  .. code-block:: c++

    #define foo  a,b
    #define bar(x) lose(x)
    #define lose(x) (1 + (x))

  We would like ``bar(foo)`` to turn into ``(1 + (foo))``, which
  would then turn into ``(1 + (a,b))``.  Instead, ``bar(foo)``
  expands into ``lose(a,b)``, and you get an error because ``lose``
  requires a single argument.  In this case, the problem is easily solved
  by the same parentheses that ought to be used to prevent misnesting of
  arithmetic operations:

  .. code-block::

    #define foo (a,b)
    or
    #define bar(x) lose((x))

  The extra pair of parentheses prevents the comma in the definition of
  ``foo`` from being interpreted as an argument separator.
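The second case in the list above can be observed by turning each
expansion into a string.  In this sketch ``STR``, ``XSTR`` and the two
functions are hypothetical helpers added for the demonstration; the other
four macros are those from the text.

```c
#define AFTERX(x) X_ ## x
#define XAFTERX(x) AFTERX(x)
#define TABLESIZE 1024
#define BUFSIZE TABLESIZE

/* Hypothetical helpers: STR stringizes its argument as written, while
   XSTR lets the prescan expand the argument before STR sees it.  */
#define STR(x) #x
#define XSTR(x) STR(x)

/* AFTERX pastes its argument, so BUFSIZE is not expanded first.  */
const char *direct_name(void)
{
    return XSTR (AFTERX (BUFSIZE));
}

/* XAFTERX passes its argument on, so BUFSIZE is fully expanded
   before AFTERX pastes it.  */
const char *indirect_name(void)
{
    return XSTR (XAFTERX (BUFSIZE));
}
```

Here ``direct_name ()`` returns the string ``"X_BUFSIZE"`` and
``indirect_name ()`` returns ``"X_1024"``, matching the expansions
described in the second list item.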

.. index:: newlines in macro arguments

.. _newlines-in-arguments:

Newlines in Arguments
^^^^^^^^^^^^^^^^^^^^^

The invocation of a function-like macro can extend over many logical
lines.  However, in the present implementation, the entire expansion
comes out on one line.  Thus line numbers emitted by the compiler or
debugger refer to the line the invocation started on, which might be
different from the line containing the argument causing the problem.

Here is an example illustrating this:

.. code-block:: c++

  #define ignore_second_arg(a,b,c) a; c

  ignore_second_arg (foo (),
                     ignored (),
                     syntax error);

The syntax error triggered by the tokens ``syntax error`` results in
an error message citing line three of the example (the line where the
``ignore_second_arg`` invocation starts) even though the problematic
code comes from line five.

We consider this a bug, and intend to fix it in the near future.