Commit | Line | Data |
---|---|---|
e15e66e7 | 1 | Copyright (C) 2000 Free Software Foundation, Inc. |
2 | ||
3 | This file is intended to contain a few notes about writing C code | |
4 | within GCC so that it compiles without error on the full range of | |
5 | compilers GCC needs to be able to compile on. | |
6 | ||
7 | The problem is that many ISO-standard constructs are not accepted by | |
8 | either old or buggy compilers, and we keep getting bitten by them. | |
9 | Until now this knowledge has been sparsely spread around, so I | |
10 | thought I'd collect it in one useful place. Please add and correct | |
11 | any problems as you come across them. | |
12 | ||
13 | I'm going to start from a base of the ISO C89 standard, since that is | |
14 | probably what most people code to naturally. Obviously using | |
15 | constructs introduced after that is not a good idea. | |
16 | ||
17 | The first section of this file deals strictly with portability issues, | |
18 | the second with common coding pitfalls. | |
19 | ||
20 | ||
21 | Portability Issues | |
22 | ================== | |
23 | ||
24 | Unary + | |
25 | ------- | |
26 | ||
27 | K+R C compilers and preprocessors have no notion of unary '+'. Thus | |
28 | the following code snippet contains 2 portability problems. | |
29 | ||
30 | int x = +2; /* int x = 2; */ | |
31 | #if +1 /* #if 1 */ | |
32 | #endif | |
33 | ||
34 | ||
35 | Pointers to void | |
36 | ---------------- | |
37 | ||
38 | K+R C compilers did not have a void pointer, and used char * as the | |
39 | pointer to anything. The macro PTR is defined as either void * or | |
40 | char * depending on whether you have a standards compliant compiler or | |
41 | a K+R one. Thus | |
42 | ||
43 | free ((void *) h->value.expansion); | |
44 | ||
45 | should be written | |
46 | ||
47 | free ((PTR) h->value.expansion); | |
48 | ||
49 | ||
50 | String literals | |
51 | --------------- | |
52 | ||
53 | K+R C did not allow concatenation of string literals like | |
54 | ||
55 | "This is a " "single string literal". | |
56 | ||
57 | Moreover, some compilers like MSVC++ have fairly low limits on the | |
58 | maximum length of a string literal; 509 is the lowest we've come | |
59 | across. You may need to break up a long printf statement into many | |
60 | smaller ones. | |
61 | ||
62 | ||
63 | Empty macro arguments | |
64 | --------------------- | |
65 | ||
66 | ISO C (6.8.3 in the 1990 standard) specifies the following: | |
67 | ||
68 | If (before argument substitution) any argument consists of no | |
69 | preprocessing tokens, the behavior is undefined. | |
70 | ||
71 | This was relaxed by ISO C99, but some older compilers emit an error, | |
72 | so code like | |
73 | ||
74 | #define foo(x, y) x y | |
75 | foo (bar, ) | |
76 | ||
77 | needs to be coded in some other way. | |
78 | ||
79 | ||
80 | signed keyword | |
81 | -------------- | |
82 | ||
83 | The signed keyword did not exist in K+R compilers; it was introduced in | |
84 | ISO C89, so you cannot use it. In both K+R and standard C, | |
85 | unqualified char and bitfields may be signed or unsigned. There is no | |
86 | way to portably declare signed chars or signed bitfields. | |
87 | ||
88 | All other arithmetic types are signed unless you use the 'unsigned' | |
89 | qualifier. For instance, it is safe to write | |
90 | ||
91 | short paramc; | |
92 | ||
93 | instead of | |
94 | ||
95 | signed short paramc; | |
96 | ||
97 | If you have an algorithm that depends on signed char or signed | |
98 | bitfields, you must find another way to write it before it can be | |
99 | integrated into GCC. | |
100 | ||
101 | ||
102 | Function prototypes | |
103 | ------------------- | |
104 | ||
105 | You need to provide a function prototype for every function before you | |
106 | use it, and functions must be defined K+R style. The function | |
107 | prototype should use the PARAMS macro, which takes a single argument. | |
108 | Therefore the parameter list must be enclosed in parentheses. For | |
109 | example, | |
110 | ||
111 | int myfunc PARAMS ((double, int *)); | |
112 | ||
113 | int | |
114 | myfunc (var1, var2) | |
115 | double var1; | |
116 | int *var2; | |
117 | { | |
118 | ... | |
119 | } | |
120 | ||
121 | You also need to use PARAMS when referring to function prototypes in | |
122 | other circumstances, for example see "Calling functions through | |
123 | pointers to functions" below. | |
124 | ||
125 | Variable-argument functions are best described by example:- | |
126 | ||
127 | void cpp_ice PARAMS ((cpp_reader *, const char *msgid, ...)); | |
128 | ||
129 | void | |
130 | cpp_ice VPARAMS ((cpp_reader *pfile, const char *msgid, ...)) | |
131 | { | |
132 | #ifndef ANSI_PROTOTYPES | |
133 | cpp_reader *pfile; | |
134 | const char *msgid; | |
135 | #endif | |
136 | va_list ap; | |
137 | ||
138 | VA_START (ap, msgid); | |
139 | ||
140 | #ifndef ANSI_PROTOTYPES | |
141 | pfile = va_arg (ap, cpp_reader *); | |
142 | msgid = va_arg (ap, const char *); | |
143 | #endif | |
144 | ||
145 | ... | |
146 | va_end (ap); | |
147 | } | |
148 | ||
149 | For the curious, here are the definitions of the above macros. See | |
150 | ansidecl.h for these definitions and more. | |
151 | ||
152 | #define PARAMS(paramlist) paramlist /* ISO C. */ | |
153 | #define VPARAMS(args) args | |
154 | ||
155 | #define PARAMS(paramlist) () /* K+R C. */ | |
156 | #define VPARAMS(args) (va_alist) va_dcl | |
157 | ||
158 | ||
159 | Calling functions through pointers to functions | |
160 | ----------------------------------------------- | |
161 | ||
162 | K+R C compilers require parentheses around the dereferenced pointer | |
163 | variable. For example | |
164 | ||
165 | typedef void (* cl_directive_handler) PARAMS ((cpp_reader *, const char *)); | |
166 | p->handler (pfile, p->arg); | |
167 | ||
168 | needs to become | |
169 | ||
170 | (p->handler) (pfile, p->arg); | |
171 | ||
172 | ||
173 | Macros | |
174 | ------ | |
175 | ||
176 | The rules under K+R C and ISO C for achieving stringification and | |
177 | token pasting are quite different. Therefore some macros have been | |
178 | defined which will get it right depending upon the compiler. | |
179 | ||
180 | CONCAT2(a,b) CONCAT3(a,b,c) and CONCAT4(a,b,c,d) | |
181 | ||
182 | will paste the tokens passed as arguments. You must not leave any | |
183 | space around the commas. Also, | |
184 | ||
185 | STRINGX(x) | |
186 | ||
187 | will stringify an argument; to get the same result on K+R and ISO | |
188 | compilers x should not have spaces around it. | |
189 | ||
190 | ||
191 | Enums | |
192 | ----- | |
193 | ||
194 | In K+R C, you have to cast enum types to use them as integers, and | |
195 | some compilers in particular give lots of warnings for using an enum | |
196 | as an array index. | |
197 | ||
198 | Bitfields | |
199 | --------- | |
200 | ||
201 | See also "signed keyword" above. In K+R C only unsigned int bitfields | |
202 | were defined (i.e. unsigned char, unsigned short, unsigned long. | |
203 | Using plain int/short/long was not allowed). | |
204 | ||
205 | ||
206 | free and realloc | |
207 | ---------------- | |
208 | ||
209 | Some implementations crash upon attempts to free or realloc the null | |
210 | pointer. Thus if mem might be null, you need to write | |
211 | ||
212 | if (mem) | |
213 | free (mem); | |
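
The same guard applies to realloc. As a minimal sketch, assuming two
hypothetical helpers (release_buffer and grow_buffer are illustrative names,
not part of GCC) and written with ISO prototypes so that it compiles on a
current compiler:

```c
#include <stdlib.h>

/* Hypothetical helper: never hands a null pointer to free.  */
void
release_buffer (void *mem)
{
  if (mem)
    free (mem);
}

/* Hypothetical helper: never hands a null pointer to realloc;
   falls back to malloc for the first allocation.  */
void *
grow_buffer (void *mem, size_t size)
{
  return mem ? realloc (mem, size) : malloc (size);
}
```

A caller can then start from a null pointer and grow the buffer repeatedly
without special-casing the first allocation.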
214 | ||
215 | ||
216 | Reserved Keywords | |
217 | ----------------- | |
218 | ||
219 | K+R C has "entry" as a reserved keyword, so you should not use it for | |
220 | your variable names. | |
221 | ||
222 | ||
223 | Type promotions | |
224 | --------------- | |
225 | ||
226 | K+R used unsigned-preserving rules for arithmetic expressions, while | |
227 | ISO uses value-preserving. This means an unsigned char compared to an | |
228 | int is done as an unsigned comparison in K+R (since unsigned char | |
229 | promotes to unsigned) while it is signed in ISO (since all of the | |
230 | values in unsigned char fit in an int, it promotes to int). | |
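
To make the difference concrete, here is a minimal sketch (the function names
are illustrative, not from GCC) written with ISO prototypes. Under ISO rules
the first comparison is done in int; under the K+R rules described above it
would be done in unsigned int, where -1 becomes a very large value. Casting
the unsigned char operand to int states the intended signed comparison
explicitly, so the result no longer depends on which promotion rule applies.

```c
/* Relies on the promotion rules: ISO C promotes uc to int, so
   200 > -1 holds.  A K+R compiler would promote uc to unsigned int
   and convert i to unsigned as well, making -1 a huge value.  */
int
uc_greater_implicit (int byte, int i)
{
  unsigned char uc = (unsigned char) byte;
  return uc > i;
}

/* States the intended signed comparison explicitly.  */
int
uc_greater_explicit (int byte, int i)
{
  unsigned char uc = (unsigned char) byte;
  return (int) uc > i;
}
```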
231 | ||
232 | ** Avoid function arguments whose type is a short type (char, short, | |
233 | float of any flavor) and is therefore subject to promotion. ** | |
234 | ||
235 | Trigraphs | |
236 | --------- | |
237 | ||
238 | You weren't going to use them anyway, but trigraphs were not defined | |
239 | in K+R C, and some otherwise ISO C compliant compilers do not accept | |
240 | them. | |
241 | ||
242 | ||
243 | Suffixes on Integer Constants | |
244 | ----------------------------- | |
245 | ||
246 | ** Avoid using a 'u' suffix on integer constants; K+R C did not accept it. ** | |
247 | ||
248 | ||
249 | errno | |
250 | ----- | |
251 | ||
252 | errno might be declared as a macro. | |
253 | ||
254 | ||
255 | Common Coding Pitfalls | |
256 | ====================== | |
257 | Implicit int | |
258 | ------------ | |
259 | ||
260 | In C, the 'int' keyword can often be omitted from type declarations. | |
261 | For instance, you can write | |
262 | ||
263 | unsigned variable; | |
264 | ||
265 | as shorthand for | |
266 | ||
267 | unsigned int variable; | |
268 | ||
269 | There are several places where this can cause trouble. First, suppose | |
270 | 'variable' is a long; then you might think | |
271 | ||
272 | (unsigned) variable | |
273 | ||
274 | would convert it to unsigned long. It does not. It converts to | |
275 | unsigned int. This mostly causes problems on 64-bit platforms, where | |
276 | long and int are not the same size. | |
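
A minimal sketch of the difference, using illustrative function names and ISO
prototypes. On a platform where long is wider than int, the first function
discards the upper bits; the second keeps them.

```c
/* (unsigned) means (unsigned int): the value is reduced to the
   range of unsigned int before being widened to the return type.  */
unsigned long
cast_shorthand (long variable)
{
  return (unsigned) variable;
}

/* Spelling out the full type name keeps every bit of a long.  */
unsigned long
cast_explicit (long variable)
{
  return (unsigned long) variable;
}
```

For small non-negative values the two agree, which is why the mistake tends to
go unnoticed until large values show up on a 64-bit host.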
277 | ||
278 | Second, if you write a function definition with no return type at | |
279 | all: | |
280 | ||
281 | operate(a, b) | |
282 | int a, b; | |
283 | { | |
284 | ... | |
285 | } | |
286 | ||
287 | that function is expected to return int, *not* void. GCC will warn | |
288 | about this. K+R C has no problem with 'void' as a return type, so you | |
289 | need not worry about that. | |
290 | ||
291 | Implicit function declarations always have return type int. So if you | |
292 | correct the above definition to | |
293 | ||
294 | void | |
295 | operate(a, b) | |
296 | int a, b; | |
297 | ... | |
298 | ||
299 | but operate() is called above its definition, you will get an error | |
300 | about a "type mismatch with previous implicit declaration". The cure | |
301 | is to prototype all functions at the top of the file, or in an | |
302 | appropriate header. | |
303 | ||
304 | Char vs unsigned char vs int | |
305 | ---------------------------- | |
306 | ||
307 | In C, unqualified 'char' may be either signed or unsigned; it is the | |
308 | implementation's choice. When you are processing 7-bit ASCII, it does | |
309 | not matter. But when your program must handle arbitrary binary data, | |
310 | or fully 8-bit character sets, you have a problem. The most obvious | |
311 | issue is if you have a look-up table indexed by characters. | |
312 | ||
313 | For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A | |
314 | WITH ACUTE ACCENT. In the proper locale, isalpha('\341') will be | |
315 | true. But if you read '\341' from a file and store it in a plain | |
316 | char, isalpha(c) may look up character 225, or it may look up | |
317 | character -31. And the ctype table has no entry at offset -31, so | |
318 | your program will crash. (If you're lucky.) | |
319 | ||
320 | It is wise to use unsigned char everywhere you possibly can. This | |
321 | avoids all these problems. Unfortunately, the routines in <string.h> | |
322 | take plain char arguments, so you have to remember to cast them back | |
323 | and forth - or avoid the use of strxxx() functions, which is probably | |
324 | a good idea anyway. | |
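
As a minimal sketch (is_alpha_byte is an illustrative name, not a GCC
function, and the ISO prototype is used so that it compiles on a current
compiler), a wrapper that converts to unsigned char before consulting the
ctype table gives the same answer whether plain char is signed or unsigned:

```c
#include <ctype.h>

/* Convert to unsigned char first, so that a byte such as '\341' is
   looked up as 225 and never as a negative table offset.  */
int
is_alpha_byte (int c)
{
  return isalpha ((unsigned char) c) != 0;
}
```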
325 | ||
326 | Another common mistake is to use either char or unsigned char to | |
327 | receive the result of getc() or related stdio functions. They may | |
328 | return EOF, which is outside the range of values representable by | |
329 | char. If you use char, some legal character value may be confused | |
330 | with EOF, such as '\377' (SMALL LETTER Y WITH UMLAUT, in Latin-1). | |
331 | The correct choice is int. | |
332 | ||
333 | A more subtle version of the same mistake might look like this: | |
334 | ||
335 | unsigned char pushback[NPUSHBACK]; | |
336 | int pbidx; | |
337 | #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c)) | |
338 | #define get(c) (pbidx ? pushback[--pbidx] : getchar()) | |
339 | ... | |
340 | unget(EOF); | |
341 | ||
342 | which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y | |
343 | WITH UMLAUT. | |
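
One way to repair this, sketched here with ISO prototypes and functions
instead of macros (unget_char and get_char are illustrative names, and the
value of NPUSHBACK is arbitrary), is to store pushed-back values as int so
that EOF survives the round trip:

```c
#include <assert.h>
#include <stdio.h>

#define NPUSHBACK 8   /* Arbitrary size chosen for this sketch.  */

/* int, not unsigned char: EOF must be representable.  */
static int pushback[NPUSHBACK];
static int pbidx;

void
unget_char (int c)
{
  assert (pbidx < NPUSHBACK);
  pushback[pbidx++] = c;
}

int
get_char (void)
{
  return pbidx ? pushback[--pbidx] : getchar ();
}
```

With this change a pushed-back EOF and a pushed-back '\377' byte remain
distinct values when they are read back.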
344 | ||
345 | ||
346 | Other common pitfalls | |
347 | --------------------- | |
348 | ||
349 | o Expecting 'plain' char to be either sign extended or zero extended. | |
350 | ||
351 | o Shifting an item by a negative amount or by greater than or equal to | |
352 | the number of bits in a type (expecting shifts by 32 to be sensible | |
353 | has caused quite a number of bugs at least in the early days). | |
354 | ||
355 | o Expecting ints shifted right to be sign extended. | |
356 | ||
357 | o Modifying the same value twice between two sequence points. | |
358 | ||
359 | o Host vs. target floating point representation, including emitting NaNs | |
360 | and Infinities in a form that the assembler handles. | |
361 | ||
362 | o qsort being an unstable sort function (unstable in the sense that | |
363 | multiple items that sort the same may be sorted in different orders | |
364 | by different qsort functions). | |
365 | ||
366 | o Passing incorrect types to fprintf and friends. | |
367 | ||
368 | o Adding a declaration for a function defined in another file to | |
369 | a .c file instead of to a .h file. |
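
The shift pitfalls listed above can be avoided with small helpers. A minimal
sketch, assuming illustrative names, ISO prototypes, and a two's complement
host for the right shift:

```c
#include <limits.h>

/* Left shift that returns 0 instead of invoking undefined behavior
   when the count reaches the width of the type.  */
unsigned long
shift_left_checked (unsigned long value, unsigned int count)
{
  if (count >= sizeof (value) * CHAR_BIT)
    return 0;
  return value << count;
}

/* Right shift that always sign extends.  A negative value is
   complemented to a non-negative one, shifted, and complemented
   back, so no negative value is ever shifted.  The caller must keep
   count below the width of int.  */
int
shift_right_signed (int value, unsigned int count)
{
  if (value < 0)
    return ~(~value >> count);
  return value >> count;
}
```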