Copyright (C) 2000 Free Software Foundation, Inc.

This file is intended to contain a few notes about writing C code
within GCC so that it compiles without error on the full range of
compilers GCC needs to be able to compile on.

The problem is that many ISO-standard constructs are not accepted by
either old or buggy compilers, and we keep getting bitten by them.
Until now this knowledge has been sparsely spread around, so I
thought I'd collect it in one useful place. Please add and correct
any problems as you come across them.

I'm going to start from a base of the ISO C89 standard, since that is
probably what most people code to naturally. Obviously using
constructs introduced after that is not a good idea.

The first section of this file deals strictly with portability issues,
the second with common coding pitfalls.


Portability Issues
==================

Unary +
-------

K+R C compilers and preprocessors have no notion of unary '+'. Thus
the following code snippet contains 2 portability problems.

  int x = +2;  /* int x = 2; */
  #if +1       /* #if 1 */
  #endif


Pointers to void
----------------

K+R C compilers did not have a void pointer, and used char * as the
pointer to anything. The macro PTR is defined as either void * or
char * depending on whether you have a standards-compliant compiler or
a K+R one. Thus

  free ((void *) h->value.expansion);

should be written

  free ((PTR) h->value.expansion);

Further, an initial investigation indicates that pointers to functions
returning void are okay. Thus the example given by "Calling functions
through pointers to functions" below appears not to cause a problem.
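
As a rough illustration of the idea, a generic-pointer macro can be
selected by the preprocessor. This is only a sketch under assumptions:
GENERIC_PTR and release_buffer are hypothetical names, this is not the
definition of PTR that GCC ships, and testing __STDC__ is just one
plausible way to detect a standard compiler.

```c
#include <stdlib.h>

/* Hypothetical stand-in for PTR: pick the generic pointer type the
   compiler understands.  Not the real definition.  */
#ifdef __STDC__
#define GENERIC_PTR void *
#else
#define GENERIC_PTR char *
#endif

/* Hypothetical helper: free BUF through the portable cast and return
   a null pointer that the caller can store back.  */
static char *
release_buffer (char *buf)
{
  if (buf)
    free ((GENERIC_PTR) buf);
  return 0;
}
```

A caller would write something like "p = release_buffer (p);" so that
no stale pointer is left behind.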


String literals
---------------

Some SGI compilers choke on the parentheses in:-

  const char string[] = ("A string");

This is unfortunate since this is what the GNU gettext macro N_
produces. You need to find a different way to code it.

K+R C did not allow concatenation of string literals like

  "This is a " "single string literal".

Moreover, some compilers like MSVC++ have fairly low limits on the
maximum length of a string literal; 509 characters is the lowest we've
come across. You may need to break up a long printf statement into
many smaller ones.
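
One way to stay under such limits is to keep each literal short and
emit the pieces in a loop. The sketch below uses a made-up usage
message; usage_lines and print_usage are hypothetical names. It does
not rely on literal concatenation either.

```c
#include <stdio.h>

/* Hypothetical help text, split so that every literal stays far
   below the 509-character limit mentioned above.  */
static const char *const usage_lines[] = {
  "Usage: prog [options] file\n",
  "  -o FILE   write output to FILE\n",
  "  -v        print verbose diagnostics\n"
};

/* Write each piece separately; return the number of pieces written.  */
static int
print_usage (FILE *stream)
{
  int i;
  int n = (int) (sizeof usage_lines / sizeof usage_lines[0]);

  for (i = 0; i < n; i++)
    fputs (usage_lines[i], stream);
  return n;
}
```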


Empty macro arguments
---------------------

ISO C (6.8.3 in the 1990 standard) specifies the following:

  If (before argument substitution) any argument consists of no
  preprocessing tokens, the behavior is undefined.

This was relaxed by ISO C99, but some older compilers emit an error,
so code like

  #define foo(x, y) x y
  foo (bar, )

needs to be coded in some other way.
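
One possible workaround, offered as a sketch rather than as the
approach GCC itself settled on, is to pass a macro that expands to
nothing. The argument then consists of one preprocessing token before
substitution, so the rule quoted above does not apply. NOTHING and
DECLARE are hypothetical names, and I have not checked this against
every old preprocessor.

```c
/* Hypothetical placeholder that expands to no tokens.  */
#define NOTHING

/* Same two-argument shape as foo above: emit the qualifier, then the
   type.  */
#define DECLARE(type, qualifier) qualifier type

/* Expands to "static int plain_value = 3;".  */
static DECLARE (int, NOTHING) plain_value = 3;

/* Expands to "static const int fixed_value = 4;".  */
static DECLARE (int, const) fixed_value = 4;
```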


signed keyword
--------------

The signed keyword did not exist in K+R compilers; it was introduced
in ISO C89, so you cannot use it. In both K+R and standard C,
unqualified char and bitfields may be signed or unsigned. There is no
way to portably declare signed chars or signed bitfields.

All other arithmetic types are signed unless you use the 'unsigned'
qualifier. For instance, it is safe to write

  short paramc;

instead of

  signed short paramc;

If you have an algorithm that depends on signed char or signed
bitfields, you must find another way to write it before it can be
integrated into GCC.
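
As an example of "another way", bytes can be kept in unsigned char or
int and sign-extended by hand when a signed value is wanted. This is a
minimal sketch assuming 8-bit bytes and two's complement values;
sign_extend_byte is a hypothetical name.

```c
/* Interpret the low 8 bits of BYTE as a two's complement value,
   without using the 'signed' keyword.  Assumes 8-bit bytes.  */
static int
sign_extend_byte (int byte)
{
  int value = byte & 0xff;

  /* Flip the sign bit, then subtract its weight.  */
  return (value ^ 0x80) - 0x80;
}
```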


Function prototypes
-------------------

You need to provide a function prototype for every function before you
use it, and functions must be defined K+R style. The function
prototype should use the PARAMS macro, which takes a single argument.
Therefore the parameter list must be enclosed in an extra pair of
parentheses. For example,

  int myfunc PARAMS ((double, int *));

  int
  myfunc (var1, var2)
       double var1;
       int *var2;
  {
    ...
  }

You also need to use PARAMS when referring to function prototypes in
other circumstances, for example see "Calling functions through
pointers to functions" below.

Variable-argument functions are best described by example:-

  void cpp_ice PARAMS ((cpp_reader *, const char *msgid, ...));

  void
  cpp_ice VPARAMS ((cpp_reader *pfile, const char *msgid, ...))
  {
    VA_OPEN (ap, msgid);
    VA_FIXEDARG (ap, cpp_reader *, pfile);
    VA_FIXEDARG (ap, const char *, msgid);

    ...
    VA_CLOSE (ap);
  }

See ansidecl.h for the definitions of the above macros and more.

One consequence of using K+R style function definitions is that you
cannot have arguments whose types are char, short, or float, since
without prototypes (i.e., under K+R rules) these types are promoted to
int, int, and double respectively.


Calling functions through pointers to functions
-----------------------------------------------

K+R C compilers require parentheses around the dereferenced function
pointer expression in the call, whereas ISO C relaxes the syntax. For
example

  typedef void (* cl_directive_handler) PARAMS ((cpp_reader *, const char *));
  p->handler (pfile, p->arg);

needs to become

  (*p->handler) (pfile, p->arg);
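
A self-contained sketch of the portable call form follows. The struct,
the handler and the function names are all hypothetical, and plain int
arguments stand in for cpp_reader and friends.

```c
/* Hypothetical table entry holding a handler and its argument.  */
struct directive
{
  void (*handler) (int *, int);
  int arg;
};

/* Hypothetical handler: add AMOUNT to *TOTAL.  */
static void
add_to_total (int *total, int amount)
{
  *total += amount;
}

/* Call through the pointer using the explicit dereference and
   parentheses that K+R compilers require.  */
static void
run_directive (struct directive *p, int *total)
{
  (*p->handler) (total, p->arg);
}
```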


Macros
------

The rules under K+R C and ISO C for achieving stringification and
token pasting are quite different. Therefore some macros have been
defined which will get it right depending upon the compiler.

  CONCAT2(a,b) CONCAT3(a,b,c) and CONCAT4(a,b,c,d)

will paste the tokens passed as arguments. You must not leave any
space around the commas. Also,

  STRINGX(x)

will stringify an argument; to get the same result on K+R and ISO
compilers x should not have spaces around it.
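
For reference, here is what the ISO C half of such macros can look
like. This sketch covers ISO preprocessors only and omits the K+R
branch entirely; PASTE2 and TO_STRING are hypothetical names, not the
definitions behind CONCAT2 and STRINGX.

```c
#include <string.h>

/* ISO C only: ## pastes two tokens, # turns an argument into a
   string literal.  */
#define PASTE2(a,b) a##b
#define TO_STRING(x) #x

/* Declares a variable named linecount.  */
static int PASTE2(line,count) = 7;

/* Return nonzero if stringifying the name gives the expected text.  */
static int
names_match (void)
{
  return strcmp (TO_STRING(linecount), "linecount") == 0;
}
```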


Passing structures by value
---------------------------

Avoid passing structures by value, either to or from functions. It
seems some K+R compilers handle this differently or not at all.
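
The usual substitute is to pass pointers in both directions. A small
sketch with a hypothetical struct range and hypothetical helpers:

```c
struct range
{
  int low;
  int high;
};

/* Fill *OUT instead of returning a struct range by value.  */
static void
make_range (struct range *out, int a, int b)
{
  out->low = a < b ? a : b;
  out->high = a < b ? b : a;
}

/* Take a pointer instead of a struct argument.  */
static int
range_width (const struct range *r)
{
  return r->high - r->low;
}
```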


Enums
-----

In K+R C, you have to cast enum types to use them as integers, and
some compilers in particular give lots of warnings for using an enum
as an array index.
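
A short sketch of the explicit cast, using a hypothetical enum color
and lookup table:

```c
enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE };

static const char *const color_names[] = { "red", "green", "blue" };

/* Look up the name of C.  The cast makes the enum-to-integer
   conversion explicit before it is used as an index.  */
static const char *
color_name (enum color c)
{
  return color_names[(int) c];
}
```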


Bitfields
---------

See also "signed keyword" above. In K+R C only unsigned int bitfields
were defined (i.e. unsigned char, unsigned short, unsigned long).
Using plain int/short/long was not allowed.


free and realloc
----------------

Some implementations crash upon attempts to free or realloc the null
pointer. Thus if mem might be null, you need to write

  if (mem)
    free (mem);
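
If the check is needed in many places, it can be wrapped once. The
helpers below are a sketch with hypothetical names; they use void *
for brevity, where code meant for K+R compilers would use PTR as
described earlier.

```c
#include <stdlib.h>

/* Hypothetical wrapper: never hand a null pointer to free.  */
static void
checked_free (void *mem)
{
  if (mem)
    free (mem);
}

/* Hypothetical wrapper: never hand a null pointer to realloc.  With
   no existing block, allocate a fresh one instead.  */
static void *
checked_realloc (void *mem, size_t size)
{
  if (mem)
    return realloc (mem, size);
  return malloc (size);
}
```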


Reserved Keywords
-----------------

K+R C has "entry" as a reserved keyword, so you should not use it for
your variable names.


Type promotions
---------------

K+R used unsigned-preserving rules for arithmetic expressions, while
ISO uses value-preserving. This means an unsigned char compared to an
int is done as an unsigned comparison in K+R (since unsigned char
promotes to unsigned) while it is signed in ISO (since all of the
values in unsigned char fit in an int, it promotes to int).
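
When such a comparison matters, an explicit cast pins down which one
you get. A minimal sketch; byte_exceeds is a hypothetical name, and
the byte is passed by pointer to stay clear of char-sized arguments.

```c
/* Compare the unsigned char at P with LIMIT.  The cast to int forces
   a signed comparison under either promotion rule.  */
static int
byte_exceeds (const unsigned char *p, int limit)
{
  return (int) *p > limit;
}
```

Without the cast, comparing the value 200 with -1 would be true under
ISO rules but false under unsigned-preserving rules, because -1 would
be converted to a large unsigned value.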


Trigraphs
---------

You weren't going to use them anyway, but trigraphs were not defined
in K+R C, and some otherwise ISO C compliant compilers do not accept
them.


Suffixes on Integer Constants
-----------------------------

K+R C did not accept a 'u' suffix on integer constants. If you want
to declare a constant to be unsigned, you must use an explicit
cast.

You should never use a 'l' suffix on integer constants ('L' is fine),
since it can easily be confused with the number '1'.
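
For example, a bit mask can be built from a cast constant rather than
from 1u or 1ul. LOW_BIT and bit_mask are hypothetical names used only
for this sketch.

```c
/* A cast instead of a 'u' or 'ul' suffix.  */
#define LOW_BIT ((unsigned long) 1)

/* Return a mask with only bit number BIT set.  BIT must be smaller
   than the width of unsigned long.  */
static unsigned long
bit_mask (int bit)
{
  return LOW_BIT << bit;
}
```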


Common Coding Pitfalls
======================

errno
-----

errno might be declared as a macro.
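
The practical consequence, as I read it, is to include <errno.h> and
never declare errno by hand (for instance with "extern int errno;").
The note above does not spell this out, so treat it as an assumption.
The sketch below uses a hypothetical helper, parse_long, to show the
usual pattern.

```c
#include <errno.h>   /* Supplies errno, macro or not.  */
#include <stdlib.h>

/* Hypothetical helper: parse a decimal long from TEXT.  Return 1 and
   store the value in *OUT on success; return 0 on overflow or if no
   digits were found.  */
static int
parse_long (const char *text, long *out)
{
  char *end;
  long value;

  errno = 0;
  value = strtol (text, &end, 10);
  if (end == text || errno == ERANGE)
    return 0;
  *out = value;
  return 1;
}
```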


Implicit int
------------

In C, the 'int' keyword can often be omitted from type declarations.
For instance, you can write

  unsigned variable;

as shorthand for

  unsigned int variable;

There are several places where this can cause trouble. First, suppose
'variable' is a long; then you might think

  (unsigned) variable

would convert it to unsigned long. It does not. It converts to
unsigned int. This mostly causes problems on 64-bit platforms, where
long and int are not the same size.
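
The fix is simply to name the whole target type in the cast. A tiny
sketch, with a hypothetical function name:

```c
/* Name the full target type.  A bare (unsigned) means unsigned int
   and may drop the upper bits of a long.  */
static unsigned long
to_unsigned_long (long variable)
{
  return (unsigned long) variable;
}
```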

Second, if you write a function definition with no return type at
all:

  operate(a, b)
       int a, b;
  {
    ...
  }

that function is expected to return int, *not* void. GCC will warn
about this. K+R C has no problem with 'void' as a return type, so you
need not worry about that.

Implicit function declarations always have return type int. So if you
correct the above definition to

  void
  operate(a, b)
       int a, b;
  ...

but operate() is called above its definition, you will get an error
about a "type mismatch with previous implicit declaration". The cure
is to prototype all functions at the top of the file, or in an
appropriate header.


Char vs unsigned char vs int
----------------------------

In C, unqualified 'char' may be either signed or unsigned; it is the
implementation's choice. When you are processing 7-bit ASCII, it does
not matter. But when your program must handle arbitrary binary data,
or fully 8-bit character sets, you have a problem. The most obvious
issue is if you have a look-up table indexed by characters.

For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
WITH ACUTE ACCENT. In the proper locale, isalpha('\341') will be
true. But if you read '\341' from a file and store it in a plain
char, isalpha(c) may look up character 225, or it may look up
character -31. And the ctype table has no entry at offset -31, so
your program will crash. (If you're lucky.)

It is wise to use unsigned char everywhere you possibly can. This
avoids all these problems. Unfortunately, the routines in <string.h>
take plain char * arguments, so you have to remember to cast them back
and forth - or avoid the use of strxxx() functions, which is probably
a good idea anyway.
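
Where plain char cannot be avoided, converting to unsigned char at the
point of the ctype call avoids the negative index. A minimal sketch;
count_alpha is a hypothetical name.

```c
#include <ctype.h>

/* Count the alphabetic characters in S.  Each plain char is converted
   to unsigned char before it reaches isalpha, so the argument is
   never negative.  */
static int
count_alpha (const char *s)
{
  int n = 0;

  for (; *s; s++)
    if (isalpha ((unsigned char) *s))
      n++;
  return n;
}
```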

Another common mistake is to use either char or unsigned char to
receive the result of getc() or related stdio functions. They may
return EOF, which is outside the range of values representable by
char. If you use char, some legal character value may be confused
with EOF, such as '\377' (SMALL LETTER Y WITH UMLAUT, in Latin-1).
If you use unsigned char, the comparison with EOF can never succeed.
The correct choice is int.

A more subtle version of the same mistake might look like this:

  unsigned char pushback[NPUSHBACK];
  int pbidx;
  #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
  #define get(c) (pbidx ? pushback[--pbidx] : getchar())
  ...
  unget(EOF);

which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
WITH UMLAUT.
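
One way to repair the example is to give the buffer int elements, so
that EOF survives the round trip. This is a sketch with hypothetical
function names and an arbitrary buffer size, written as functions
rather than macros.

```c
#include <stdio.h>

#define NPUSHBACK 4   /* Arbitrary size chosen for the sketch.  */

/* int elements, so EOF can be stored and read back unchanged.  */
static int pushback[NPUSHBACK];
static int pbidx;

/* Push C back.  Return 1 on success, 0 if the buffer is full.  */
static int
unget_char (int c)
{
  if (pbidx >= NPUSHBACK)
    return 0;
  pushback[pbidx++] = c;
  return 1;
}

/* Return the most recently pushed-back value if there is one,
   otherwise read from STREAM.  The result is an int, as with getc.  */
static int
get_char (FILE *stream)
{
  return pbidx ? pushback[--pbidx] : getc (stream);
}
```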


Other common pitfalls
---------------------

o Expecting 'plain' char to be either sign-extending or
  unsigned-extending.

o Shifting an item by a negative amount or by greater than or equal to
  the number of bits in a type (expecting shifts by 32 to be sensible
  has caused quite a number of bugs at least in the early days).

o Expecting ints shifted right to be sign extended.

o Modifying the same value twice between two sequence points.

o Host vs. target floating point representation, including emitting NaNs
  and Infinities in a form that the assembler handles.

o qsort being an unstable sort function (unstable in the sense that
  multiple items that sort the same may be sorted in different orders
  by different qsort functions).

o Passing incorrect types to fprintf and friends.

o Adding a declaration for a function defined in another file to
  a .c file instead of to a .h file.
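
For the qsort item, one common remedy is to make the comparison
function break ties itself, for example by original position, so that
every qsort implementation produces the same order. The sketch below
is only an illustration; struct item, its fields and both function
names are hypothetical.

```c
#include <stdlib.h>

struct item
{
  int key;   /* Value to sort by.  */
  int tag;   /* Payload, carried along unchanged.  */
  int seq;   /* Original position, filled in by sort_items.  */
};

/* Order by key; for equal keys, fall back to original position so
   that no two distinct items ever compare equal.  */
static int
compare_items (const void *a, const void *b)
{
  const struct item *x = (const struct item *) a;
  const struct item *y = (const struct item *) b;

  if (x->key != y->key)
    return x->key < y->key ? -1 : 1;
  return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Record each item's position, then sort.  */
static void
sort_items (struct item *items, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    items[i].seq = (int) i;
  qsort (items, n, sizeof items[0], compare_items);
}
```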