\input texinfo
@setfilename bfdint.info
@node Top
@top BFD Internals
@raisesections
@cindex bfd internals

This document describes some BFD internal information which may be
helpful when working on BFD. It is very incomplete.

This document is not updated regularly, and may be out of date. It was
last modified on $Date$.

The initial version of this document was written by Ian Lance Taylor
@email{ian@@cygnus.com}.

@menu
* BFD glossary:: BFD glossary
* BFD guidelines:: BFD programming guidelines
* BFD generated files:: BFD generated files
* BFD multiple compilations:: Files compiled multiple times in BFD
* BFD relocation handling:: BFD relocation handling
* Index:: Index
@end menu

@node BFD glossary
@section BFD glossary
@cindex glossary for bfd
@cindex bfd glossary

This is a short glossary of some BFD terms.

@table @asis
@item a.out
The a.out object file format. The original Unix object file format.
Still used on SunOS, though not Solaris. Supports only three sections.

@item archive
A collection of object files produced and manipulated by the @samp{ar}
program.

@item BFD
The BFD library itself. Also, each object file, archive, or executable
opened by the BFD library has the type @samp{bfd *}, and is sometimes
referred to as a bfd.

@item COFF
The Common Object File Format. Used on Unix SVR3. Used by some
embedded targets, although ELF is normally better.

@item DLL
A shared library on Windows.

@item dynamic linker
When a program linked against a shared library is run, the dynamic
linker will locate the appropriate shared library and arrange to include
it in the running image.

@item dynamic object
Another name for an ELF shared library.

@item ECOFF
The Extended Common Object File Format. Used on Alpha Digital Unix
(formerly OSF/1), as well as Ultrix and Irix 4. A variant of COFF.

@item ELF
The Executable and Linking Format. The object file format used on most
modern Unix systems, including GNU/Linux, Solaris, Irix, and SVR4. Also
used on many embedded systems.

@item executable
A program, with instructions and symbols, and perhaps dynamic linking
information. Normally produced by a linker.

@item NLM
NetWare Loadable Module. Used to describe the format of an object which
can be loaded into NetWare, which is some kind of PC based network
server program.

@item object file
A binary file including machine instructions, symbols, and relocation
information. Normally produced by an assembler.

@item object file format
The format of an object file. Typically object files and executables
for a particular system are in the same format, although executables
will not contain any relocation information.

@item PE
The Portable Executable format. This is the object file format used for
Windows (specifically, Win32) object files. It is based closely on
COFF, but has a few significant differences.

@item PEI
The Portable Executable Image format. This is the object file format
used for Windows (specifically, Win32) executables. It is very similar
to PE, but includes some additional header information.

@item relocations
Information used by the linker to adjust section contents. Also called
relocs.

@item section
Object files and executables are composed of sections. Sections have
optional data and optional relocation information.

@item shared library
A library of functions which may be used by many executables without
actually being linked into each executable. There are several different
implementations of shared libraries, each having slightly different
features.

@item symbol
Each object file and executable may have a list of symbols, often
referred to as the symbol table. A symbol is basically a name and an
address. There may also be some additional information like the type of
symbol, although the type of a symbol is normally something simple like
function or object, and should not be confused with the more complex C
notion of type. Typically every global function and variable in a C
program will have an associated symbol.

@item Win32
The current Windows API, implemented by Windows 95 and later and Windows
NT 3.51 and later, but not by Windows 3.1.

@item XCOFF
The eXtended Common Object File Format. Used on AIX. A variant of
COFF, with a completely different symbol table implementation.
@end table

@node BFD guidelines
@section BFD programming guidelines
@cindex bfd programming guidelines
@cindex programming guidelines for bfd
@cindex guidelines, bfd programming

There is a lot of poorly written and confusing code in BFD. New BFD
code should be written to a higher standard. Merely because some BFD
code is written in a particular manner does not mean that you should
emulate it.

Here are some general BFD programming guidelines:

@itemize @bullet
@item
Follow the GNU coding standards.

@item
Avoid global variables. We ideally want BFD to be fully reentrant, so
that it can be used in multiple threads. All uses of global or static
variables interfere with that. Initialized constant variables are OK,
and they should be explicitly marked with @code{const}. Instead of
global variables, use data attached to a BFD or to a linker hash table.

@item
All externally visible functions should have names which start with
@samp{bfd_}. All such functions should be declared in some header file,
typically @file{bfd.h}. See, for example, the various declarations near
the end of @file{bfd-in.h}, which mostly declare functions required by
specific linker emulations.

@item
All functions which need to be visible from one file to another within
BFD, but should not be visible outside of BFD, should start with
@samp{_bfd_}. Although external names beginning with @samp{_} are
prohibited by the ANSI standard, in practice this usage will always
work, and it is required by the GNU coding standards.

@item
Always remember that people can compile using @samp{--enable-targets} to
build several, or all, targets at once. It must be possible to link
together the files for all targets.

@item
BFD code should compile with few or no warnings using @samp{gcc -Wall}.
Some warnings are OK, like the absence of certain function declarations
which may or may not be declared in system header files. Warnings about
ambiguous expressions and the like should always be fixed.
@end itemize
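
As a minimal sketch of the guidelines on global variables and on
@samp{_bfd_} names, the following C fragment keeps per-file state on the
bfd instead of in a static variable. The types @samp{struct sketch_bfd}
and @samp{struct sketch_tdata} and all function names are hypothetical;
real BFD backends attach private data to @samp{struct bfd} in their own
way, which this does not try to reproduce.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bfd: only a per-file pointer to
   backend-private data is modelled.  */
struct sketch_bfd
{
  const char *filename;
  void *tdata;
};

/* Backend state that might otherwise be tempting to keep in a
   static variable.  */
struct sketch_tdata
{
  unsigned int stub_count;
};

/* Used by other files within BFD but not by BFD users, hence the
   _bfd_ prefix.  Returns 0 if memory runs out.  */
int
_bfd_sketch_init_tdata (struct sketch_bfd *abfd)
{
  struct sketch_tdata *tdata = calloc (1, sizeof *tdata);

  if (tdata == NULL)
    return 0;
  abfd->tdata = tdata;
  return 1;
}

/* Count stubs per file.  A static counter would be shared by every
   bfd and every thread; a counter reached through the bfd is not.  */
unsigned int
_bfd_sketch_new_stub (struct sketch_bfd *abfd)
{
  struct sketch_tdata *tdata = abfd->tdata;

  assert (tdata != NULL);
  return ++tdata->stub_count;
}

void
_bfd_sketch_free_tdata (struct sketch_bfd *abfd)
{
  free (abfd->tdata);
  abfd->tdata = NULL;
}
```

Because each bfd owns its counter, two bfds open at the same time (or
handled by two threads, one bfd each) do not disturb each other, which a
static counter could not guarantee.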

@node BFD generated files
@section BFD generated files
@cindex generated files in bfd
@cindex bfd generated files

BFD contains several automatically generated files. This section
describes them. Some files are created at configure time, when you
configure BFD. Some files are created at make time, when you build
BFD. Some files are automatically rebuilt at make time, but only if
you configure with the @samp{--enable-maintainer-mode} option. Some
files live in the object directory---the directory from which you run
configure---and some live in the source directory. All files that live
in the source directory are checked into the CVS repository.

@table @file
@item bfd.h
@cindex @file{bfd.h}
@cindex @file{bfd-in3.h}
Lives in the object directory. Created at make time from
@file{bfd-in2.h} via @file{bfd-in3.h}. @file{bfd-in3.h} is created at
configure time from @file{bfd-in2.h}. There are automatic dependencies
to rebuild @file{bfd-in3.h} and hence @file{bfd.h} if @file{bfd-in2.h}
changes, so you can normally ignore @file{bfd-in3.h}, and just think
about @file{bfd-in2.h} and @file{bfd.h}.

@file{bfd.h} is built by replacing a few strings in @file{bfd-in2.h}.
To see them, search for @samp{@@} in @file{bfd-in2.h}. They mainly
control whether BFD is built for a 32 bit target or a 64 bit target.

@item bfd-in2.h
@cindex @file{bfd-in2.h}
Lives in the source directory. Created from @file{bfd-in.h} and several
other BFD source files. If you configure with the
@samp{--enable-maintainer-mode} option, @file{bfd-in2.h} is rebuilt
automatically when a source file changes.

@item elf32-target.h
@itemx elf64-target.h
@cindex @file{elf32-target.h}
@cindex @file{elf64-target.h}
Live in the object directory. Created from @file{elfxx-target.h}.
These files are versions of @file{elfxx-target.h} customized for either
a 32 bit ELF target or a 64 bit ELF target.

@item libbfd.h
@cindex @file{libbfd.h}
Lives in the source directory. Created from @file{libbfd-in.h} and
several other BFD source files. If you configure with the
@samp{--enable-maintainer-mode} option, @file{libbfd.h} is rebuilt
automatically when a source file changes.

@item libcoff.h
@cindex @file{libcoff.h}
Lives in the source directory. Created from @file{libcoff-in.h} and
@file{coffcode.h}. If you configure with the
@samp{--enable-maintainer-mode} option, @file{libcoff.h} is rebuilt
automatically when a source file changes.

@item targmatch.h
@cindex @file{targmatch.h}
Lives in the object directory. Created at make time from
@file{config.bfd}. This file is used to map configuration triplets into
BFD target vector variable names at run time.
@end table
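
One plausible way to use a table like @file{targmatch.h} is sketched here
in C, assuming each entry pairs a shell-style triplet pattern with a
target vector name and that patterns are matched with the POSIX
@samp{fnmatch} function. The table entries, the vector names, and
@samp{sketch_find_target} are invented for illustration; the real file is
generated from @file{config.bfd} and its contents differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a triplet pattern and a vector name.  */
struct sketch_targmatch
{
  const char *triplet;
  const char *vec_name;
};

/* Made-up entries; the NULL entry ends the table.  */
static const struct sketch_targmatch sketch_targmatch_table[] =
{
  { "i[3-7]86-*-linux*", "sketch_i386_elf32_vec" },
  { "x86_64-*-linux*", "sketch_x86_64_elf64_vec" },
  { "sparc-*-sunos4*", "sketch_sparc_aout_vec" },
  { NULL, NULL }
};

/* Return the vector name for NAME, which may be a vector name itself
   or a configuration triplet.  Returns NULL if nothing matches.  */
const char *
sketch_find_target (const char *name)
{
  const struct sketch_targmatch *m;

  assert (name != NULL);

  /* Exact vector names first.  */
  for (m = sketch_targmatch_table; m->triplet != NULL; m++)
    if (strcmp (m->vec_name, name) == 0)
      return m->vec_name;

  /* Then triplet patterns; the first match wins.  */
  for (m = sketch_targmatch_table; m->triplet != NULL; m++)
    if (fnmatch (m->triplet, name, 0) == 0)
      return m->vec_name;

  return NULL;
}
```

Because the first matching pattern wins, more specific patterns would have
to appear before more general ones in such a table.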

@node BFD multiple compilations
@section Files compiled multiple times in BFD
Several files in BFD are compiled multiple times. By this I mean that
there are header files which contain function definitions. These header
files are included by other files, and thus the functions are compiled
once per file which includes them.

Preprocessor macros are used to control the compilation, so that each
time the files are compiled the resulting functions are slightly
different. Naturally, if they weren't different, there would be no
reason to compile them multiple times.
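
To make the technique concrete, here is a minimal C sketch under two
assumptions: the body that would normally live in a separate header is
written as a macro so the example fits in one file, and
@samp{SKETCH_NAME} plays a role similar to the @samp{NAME} macro used by
@file{elfcode.h}. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical NAME-style macro: pastes the size into each function
   name so that the two copies get distinct symbols.  */
#define SKETCH_NAME(size, fn) sketch_elf##size##_##fn

/* Stand-in for the contents of a header compiled once per size.  In
   BFD this text would be a separate .h file, included after defining
   ARCH_SIZE; a macro is used here only to keep the example in one
   file.  */
#define SKETCH_DEFINE_FUNCS(size)                               \
  unsigned int SKETCH_NAME (size, addr_bytes) (void)            \
  {                                                             \
    return (size) / 8;                                          \
  }                                                             \
  uint64_t SKETCH_NAME (size, max_addr) (void)                  \
  {                                                             \
    return (size) == 64 ? UINT64_MAX : (uint64_t) UINT32_MAX;   \
  }

/* Like defining ARCH_SIZE as 32 and including the header.  */
SKETCH_DEFINE_FUNCS (32)
/* Like defining ARCH_SIZE as 64 and including it a second time.  */
SKETCH_DEFINE_FUNCS (64)
```

Here @samp{sketch_elf32_addr_bytes} and @samp{sketch_elf64_addr_bytes}
come from the same source text and differ only in the macro argument,
which is why a debugger has trouble telling such copies apart.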

This is not a particularly good programming technique, and future BFD
work should avoid it.

@itemize @bullet
@item
Since this technique is rarely used, even experienced C programmers find
it confusing.

@item
It is difficult to debug programs which use BFD, since there is no way
to describe which version of a particular function you are looking at.

@item
Programs which use BFD wind up incorporating two or more slightly
different versions of the same function, which wastes space in the
executable.

@item
This technique is never required, nor is it especially efficient. It is
always possible to use statically initialized structures holding
function pointers and magic constants instead.
@end itemize
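
The alternative in the last point, statically initialized structures
holding function pointers and constants, might look like the following C
sketch, which reads the entry point from an ELF file header. The 52 and
64 byte header sizes and the entry point offset of 24 match
@samp{Elf32_Ehdr} and @samp{Elf64_Ehdr}; the structure, the function
names, and the little-endian-only loads are simplifying assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size-dependent facts for one ELF class, gathered in one place
   instead of being selected by ARCH_SIZE at compile time.  */
struct sketch_elf_class
{
  unsigned int arch_size;       /* 32 or 64 */
  size_t ehdr_size;             /* 52 or 64 bytes */
  uint64_t (*get_word) (const unsigned char *buf);
};

/* Little-endian loads; a real backend would also handle big-endian.  */
static uint64_t
sketch_get_le32 (const unsigned char *buf)
{
  return (uint64_t) buf[0] | (uint64_t) buf[1] << 8
         | (uint64_t) buf[2] << 16 | (uint64_t) buf[3] << 24;
}

static uint64_t
sketch_get_le64 (const unsigned char *buf)
{
  return sketch_get_le32 (buf) | sketch_get_le32 (buf + 4) << 32;
}

const struct sketch_elf_class sketch_elf32_class = { 32, 52, sketch_get_le32 };
const struct sketch_elf_class sketch_elf64_class = { 64, 64, sketch_get_le64 };

/* One copy of the code serves both classes, so a debugger shows a
   single function.  Returns the entry point, or 0 if HDR is shorter
   than a file header of this class.  */
uint64_t
sketch_entry_point (const struct sketch_elf_class *cls,
                    const unsigned char *hdr, size_t len)
{
  /* e_entry follows e_ident[16], e_type, e_machine and e_version.  */
  const size_t e_entry_offset = 24;

  if (len < cls->ehdr_size)
    return 0;
  return cls->get_word (hdr + e_entry_offset);
}
```

Supporting another size would mean writing one more structure rather than
compiling another copy of the code.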

The following is a list of the files which are compiled multiple times.

@table @file
@item aout-target.h
@cindex @file{aout-target.h}
Describes a few functions and the target vector for a.out targets. This
is used by individual a.out targets with different definitions of
@samp{N_TXTADDR} and similar a.out macros.

@item aoutf1.h
@cindex @file{aoutf1.h}
Implements standard SunOS a.out files. In principle it supports 64 bit
a.out targets based on the preprocessor macro @samp{ARCH_SIZE}, but
since all known a.out targets are 32 bits, this code may or may not
work. This file is only included by a few other files, and it is
difficult to justify its existence.

@item aoutx.h
@cindex @file{aoutx.h}
Implements basic a.out support routines. This file can be compiled for
either 32 or 64 bit support. Since all known a.out targets are 32 bits,
the 64 bit support may or may not work. I believe the original
intention was that this file would only be included by @samp{aout32.c}
and @samp{aout64.c}, and that other a.out targets would simply refer to
the functions it defined. Unfortunately, some other a.out targets
started including it directly, leading to a somewhat confused state of
affairs.

@item coffcode.h
@cindex @file{coffcode.h}
Implements basic COFF support routines. This file is included by every
COFF target. It implements code which handles COFF magic numbers as
well as various hook functions called by the generic COFF functions in
@file{coffgen.c}. This file is controlled by a number of different
macros, and more are added regularly.

@item coffswap.h
@cindex @file{coffswap.h}
Implements COFF swapping routines. This file is included by
@file{coffcode.h}, and thus by every COFF target. It implements the
routines which swap COFF structures between internal and external
format. The main control for this file is the external structure
definitions in the files in the @file{include/coff} directory. A COFF
target file will include one of those files before including
@file{coffcode.h} and thus @file{coffswap.h}. There are a few other
macros which affect @file{coffswap.h} as well, mostly describing whether
certain fields are present in the external structures.

@item ecoffswap.h
@cindex @file{ecoffswap.h}
Implements ECOFF swapping routines. This is like @file{coffswap.h}, but
for ECOFF. It is included by the ECOFF target files (of which there are
only two). The control is the preprocessor macro @samp{ECOFF_32} or
@samp{ECOFF_64}.

@item elfcode.h
@cindex @file{elfcode.h}
Implements ELF functions that use external structure definitions. This
file is included by two other files: @file{elf32.c} and @file{elf64.c}.
It is controlled by the @samp{ARCH_SIZE} macro which is defined to be
@samp{32} or @samp{64} before including it. The @samp{NAME} macro is
used internally to give the functions different names for the two target
sizes.

@item elfcore.h
@cindex @file{elfcore.h}
Like @file{elfcode.h}, but for functions that are specific to ELF core
files. This is included only by @file{elfcode.h}.

@item elflink.h
@cindex @file{elflink.h}
Like @file{elfcode.h}, but for functions used by the ELF linker. This
is included only by @file{elfcode.h}.

@item elfxx-target.h
@cindex @file{elfxx-target.h}
This file is the source for the generated files @file{elf32-target.h}
and @file{elf64-target.h}, one of which is included by every ELF target.
It defines the ELF target vector.

@item freebsd.h
@cindex @file{freebsd.h}
Presumably intended to be included by all FreeBSD targets, but in fact
there is only one such target, @samp{i386-freebsd}. This defines a
function used to set the right magic number for FreeBSD, as well as
various macros, and includes @file{aout-target.h}.

@item netbsd.h
@cindex @file{netbsd.h}
Like @file{freebsd.h}, except that there are several files which include
it.

@item nlm-target.h
@cindex @file{nlm-target.h}
Defines the target vector for a standard NLM target.

@item nlmcode.h
@cindex @file{nlmcode.h}
Like @file{elfcode.h}, but for NLM targets. This is only included by
@file{nlm32.c} and @file{nlm64.c}, both of which define the macro
@samp{ARCH_SIZE} to an appropriate value. There are no 64 bit NLM
targets anyhow, so this is sort of useless.

@item nlmswap.h
@cindex @file{nlmswap.h}
Like @file{coffswap.h}, but for NLM targets. This is included by each
NLM target, but I think it winds up compiling to the exact same code for
every target, and as such is fairly useless.

@item peicode.h
@cindex @file{peicode.h}
Provides swapping routines and other hooks for PE targets.
@file{coffcode.h} will include this rather than @file{coffswap.h} for a
PE target. This defines PE specific versions of the COFF swapping
routines, and also defines some macros which control @file{coffcode.h}
itself.
@end table

@node BFD relocation handling
@section BFD relocation handling
@cindex bfd relocation handling
@cindex relocations in bfd

The handling of relocations is one of the more confusing aspects of BFD.
Relocation handling has been implemented in various different ways, all
somewhat incompatible, none perfect.

@menu
* BFD relocation concepts:: BFD relocation concepts
* BFD relocation functions:: BFD relocation functions
* BFD relocation future:: BFD relocation future
@end menu

@node BFD relocation concepts
@subsection BFD relocation concepts

A relocation is an action which the linker must take when linking. It
describes a change to the contents of a section. The change is normally
based on the final value of one or more symbols. Relocations are
created by the assembler when it creates an object file.

Most relocations are simple. A typical simple relocation is to set 32
bits at a given offset in a section to the value of a symbol. This type
of relocation would be generated for code like @code{int *p = &i;} where
@samp{p} and @samp{i} are global variables. A relocation for the symbol
@samp{i} would be generated such that the linker would initialize the
area of memory which holds the value of @samp{p} to the value of the
symbol @samp{i}.

Slightly more complex relocations may include an addend, which is a
constant to add to the symbol value before using it. In some cases a
relocation will require adding the symbol value to the existing contents
of the section in the object file. In others the relocation will simply
replace the contents of the section with the symbol value. Some
relocations are PC relative, so that the value to be stored in the
section is the difference between the value of a symbol and the final
address of the location being relocated.
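
The cases just described can be written down as arithmetic. In this C
sketch, @var{S} is the final symbol value, @var{A} the addend, and
@var{P} the final address of the field being relocated. The relocation
kinds, the 32 bit little-endian field layout, and every function name
are assumptions made for illustration, not any real target's
relocations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical relocation kinds for the cases in the text.  */
enum sketch_reloc_kind
{
  SKETCH_ABS32,          /* field = S + A */
  SKETCH_ABS32_INPLACE,  /* field = field + S (addend already in field) */
  SKETCH_PCREL32         /* field = S + A - P */
};

/* 32 bit little-endian accessors; the byte order is an assumption.  */
uint32_t
sketch_get32 (const unsigned char *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

void
sketch_put32 (unsigned char *p, uint32_t v)
{
  p[0] = v & 0xff;
  p[1] = (v >> 8) & 0xff;
  p[2] = (v >> 16) & 0xff;
  p[3] = (v >> 24) & 0xff;
}

/* Apply one relocation to CONTENTS, a section whose final address is
   SECTION_VMA.  The caller guarantees OFFSET + 4 is within CONTENTS.
   Unsigned arithmetic wraps modulo 2^32, as a 32 bit field does.  */
void
sketch_apply_reloc (enum sketch_reloc_kind kind, unsigned char *contents,
                    uint32_t offset, uint32_t section_vma,
                    uint32_t s, uint32_t a)
{
  unsigned char *field = contents + offset;
  uint32_t p = section_vma + offset;   /* final address of the field */

  switch (kind)
    {
    case SKETCH_ABS32:
      sketch_put32 (field, s + a);
      break;
    case SKETCH_ABS32_INPLACE:
      sketch_put32 (field, sketch_get32 (field) + s);
      break;
    case SKETCH_PCREL32:
      sketch_put32 (field, s + a - p);
      break;
    }
}
```

For the @code{int *p = &i;} example, with @samp{p} at offset 8 in a
section at address 0x1000 and @samp{i} at 0x2000, an @samp{SKETCH_ABS32}
relocation with a zero addend stores 0x2000 in the four bytes at offset
8.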

In general, relocations can be arbitrarily complex. For example,
relocations used in dynamic linking systems often require the linker to
allocate space in a different section and use the offset within that
section as the value to store. In the IEEE object file format,
relocations may involve arbitrary expressions.
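
As a sketch of the dynamic linking case, assuming 4 byte table entries
and a small fixed number of symbols, the following C fragment allocates
one slot per symbol in a global offset table (GOT) and returns the
slot's offset, which is the value such a relocation would store. The
structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SYMS 16
#define SKETCH_GOT_ENTRY_SIZE 4   /* assumed 32 bit GOT entries */

/* Hypothetical GOT bookkeeping: one slot per symbol, allocated the
   first time a GOT-style relocation refers to that symbol.  */
struct sketch_got
{
  int32_t offset[SKETCH_MAX_SYMS];   /* -1 until a slot is allocated */
  uint32_t size;                     /* bytes of the GOT used so far */
};

void
sketch_got_init (struct sketch_got *got)
{
  size_t i;

  for (i = 0; i < SKETCH_MAX_SYMS; i++)
    got->offset[i] = -1;
  got->size = 0;
}

/* Return the offset within the GOT section of SYMNDX's slot,
   allocating the slot if needed.  This offset, not the symbol value,
   is what the relocation stores in the referring section.  */
uint32_t
sketch_got_offset (struct sketch_got *got, size_t symndx)
{
  assert (symndx < SKETCH_MAX_SYMS);
  if (got->offset[symndx] < 0)
    {
      got->offset[symndx] = (int32_t) got->size;
      got->size += SKETCH_GOT_ENTRY_SIZE;
    }
  return (uint32_t) got->offset[symndx];
}
```

Repeated references to the same symbol reuse its slot, so the section
only grows when a new symbol is seen; filling each slot with the symbol's
final value would be a separate step not shown here.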

When doing a relocateable link, the linker may or may not have to do
anything with a relocation, depending upon the definition of the
relocation. Simple relocations generally do not require any special
action.

@node BFD relocation functions
@subsection BFD relocation functions

In BFD, each section has an array of @samp{arelent} structures. Each
structure has a pointer to a symbol, an address within the section, an
addend, and a pointer to a @samp{reloc_howto_struct} structure. The
howto structure has a bunch of fields describing the reloc, including a
type field. The type field is specific to the object file format
backend; none of the generic code in BFD examines it.

Originally, the function @samp{bfd_perform_relocation} was supposed to
handle all relocations. In theory, many relocations would be simple
enough to be described by the fields in the howto structure. For those
that weren't, the howto structure included a @samp{special_function}
field to use as an escape.
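
The idea can be sketched in C: a generic routine reads the howto fields
and lets an optional escape function adjust the value or take over
completely. Everything here is hypothetical and much smaller than the
real @samp{reloc_howto_struct}; the convention that the escape returns a
continue status to fall through to the generic code is an assumption of
the sketch, as are the 32 bit little-endian fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes.  */
enum sketch_status { SKETCH_OK, SKETCH_CONTINUE, SKETCH_BAD };

struct sketch_howto;

/* Escape hatch: may adjust *VALUE and return SKETCH_CONTINUE to let
   the generic code finish, or handle FIELD itself and return any
   other status.  */
typedef enum sketch_status (*sketch_special_fn) (const struct sketch_howto *howto,
                                                 unsigned char *field,
                                                 uint32_t *value);

struct sketch_howto
{
  unsigned int rightshift;   /* shift the value right by this much */
  unsigned int bitpos;       /* then left to the field's position */
  int pc_relative;           /* subtract the field's address first */
  uint32_t dst_mask;         /* bits of the word that are replaced */
  sketch_special_fn special_function;   /* NULL if not needed */
};

/* Apply HOWTO to the 32 bit little-endian word at CONTENTS + OFFSET,
   whose final address is P, for symbol value S and addend A.  */
enum sketch_status
sketch_perform_relocation (const struct sketch_howto *howto,
                           unsigned char *contents, uint32_t offset,
                           uint32_t p, uint32_t s, uint32_t a)
{
  unsigned char *field = contents + offset;
  uint32_t value = s + a;
  uint32_t x;

  assert (howto->rightshift < 32 && howto->bitpos < 32);

  if (howto->special_function != NULL)
    {
      enum sketch_status st = howto->special_function (howto, field, &value);
      if (st != SKETCH_CONTINUE)
        return st;
    }

  if (howto->pc_relative)
    value -= p;
  value = (value >> howto->rightshift) << howto->bitpos;

  /* Replace only the bits covered by dst_mask.  */
  x = (uint32_t) field[0] | (uint32_t) field[1] << 8
      | (uint32_t) field[2] << 16 | (uint32_t) field[3] << 24;
  x = (x & ~howto->dst_mask) | (value & howto->dst_mask);
  field[0] = x & 0xff;
  field[1] = (x >> 8) & 0xff;
  field[2] = (x >> 16) & 0xff;
  field[3] = (x >> 24) & 0xff;
  return SKETCH_OK;
}

/* Example escape for a hypothetical relocation whose value needs its
   low bit set; it adjusts the value and lets the generic code finish.  */
enum sketch_status
sketch_set_low_bit (const struct sketch_howto *howto, unsigned char *field,
                    uint32_t *value)
{
  (void) howto;
  (void) field;
  *value |= 1;
  return SKETCH_CONTINUE;
}
```

With @samp{rightshift} 2 and a 24 bit @samp{dst_mask}, for example, a PC
relative branch to 8 bytes before the field stores 0xfffffe in the low
three bytes and leaves the top byte untouched.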

While this seems plausible, a look at @samp{bfd_perform_relocation}
shows that it failed. The function has odd special cases. Some of the
fields in the howto structure, such as @samp{pcrel_offset}, were not
adequately documented.

The linker uses @samp{bfd_perform_relocation} to do all relocations when
the input and output files have different formats (e.g., when generating
S-records). The generic linker code, which is used by all targets which
do not define their own special purpose linker, uses
@samp{bfd_get_relocated_section_contents}, which for most targets turns
into a call to @samp{bfd_generic_get_relocated_section_contents}, which
calls @samp{bfd_perform_relocation}. So @samp{bfd_perform_relocation}
is still widely used, which makes it difficult to change, since it is
difficult to test all possible cases.

The assembler used @samp{bfd_perform_relocation} for a while. This
turned out to be the wrong thing to do, since
@samp{bfd_perform_relocation} was written to handle relocations on an
existing object file, while the assembler needed to create relocations
in a new object file. The assembler was changed to use the new function
@samp{bfd_install_relocation} instead, and @samp{bfd_install_relocation}
was created as a copy of @samp{bfd_perform_relocation}.

Unfortunately, the work did not progress any farther, so
@samp{bfd_install_relocation} remains a simple copy of
@samp{bfd_perform_relocation}, with all the odd special cases and
confusing code. This, too, is difficult to change, because any change
can affect any assembler target, and so is difficult to test.

The new linker, when using the same object file format for all input
files and the output file, does not convert relocations into
@samp{arelent} structures, so it cannot use
@samp{bfd_perform_relocation} at all. Instead, backends which use the
new linker are expected to write a @samp{relocate_section} function
which will handle relocations in a target specific fashion.
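
A minimal sketch of such a function in C, assuming a made-up target with
three relocation types, 32 bit little-endian fields, and relocations
used directly as read from the input file, without building
@samp{arelent} structures. The types and @samp{sketch_relocate_section}
are hypothetical and much simpler than the interface real backends
implement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation types of an imaginary target.  */
enum { SKETCH_R_NONE = 0, SKETCH_R_32 = 1, SKETCH_R_PC32 = 2 };

/* A raw relocation in the target's own terms.  */
struct sketch_raw_reloc
{
  uint32_t r_offset;   /* offset within the input section */
  uint32_t r_sym;      /* index into SYM_VALUES */
  uint32_t r_type;
  int32_t r_addend;
};

/* Final-link relocation of one section of SIZE bytes at SECTION_VMA.
   Each case is written for the target and stores its result directly.
   Returns 0 on an unknown type or an out-of-range offset.  */
int
sketch_relocate_section (unsigned char *contents, size_t size,
                         uint32_t section_vma,
                         const struct sketch_raw_reloc *relocs,
                         size_t nrelocs, const uint32_t *sym_values)
{
  size_t i;

  for (i = 0; i < nrelocs; i++)
    {
      const struct sketch_raw_reloc *r = &relocs[i];
      uint32_t s = sym_values[r->r_sym];
      uint32_t p = section_vma + r->r_offset;
      uint32_t v = 0;

      switch (r->r_type)
        {
        case SKETCH_R_NONE:
          continue;
        case SKETCH_R_32:
          v = s + (uint32_t) r->r_addend;
          break;
        case SKETCH_R_PC32:
          v = s + (uint32_t) r->r_addend - p;
          break;
        default:
          return 0;
        }

      if (size < 4 || r->r_offset > size - 4)
        return 0;
      contents[r->r_offset] = v & 0xff;
      contents[r->r_offset + 1] = (v >> 8) & 0xff;
      contents[r->r_offset + 2] = (v >> 16) & 0xff;
      contents[r->r_offset + 3] = (v >> 24) & 0xff;
    }
  return 1;
}
```

Anything unusual about a relocation type is handled in its own
@code{case}, which is why this style of linker has no use for
@samp{special_function}.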

There are two helper functions for target specific relocation:
@samp{_bfd_final_link_relocate} and @samp{_bfd_relocate_contents}.
These functions use a howto structure, but they @emph{do not} use the
@samp{special_function} field. Since the functions are normally called
from target specific code, the @samp{special_function} field adds
little; any relocations which require special handling can be handled
without calling those functions.
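
The core job of a helper in the spirit of @samp{_bfd_relocate_contents},
inserting an already computed value into a bitfield and checking for
overflow, can be sketched in C. The status values, the reduced howto
structure, and the choice to write the field even when it overflows are
assumptions of this sketch, not a description of the real function;
note that there is no @samp{special_function} field at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status codes.  */
enum sketch_status { SKETCH_OK, SKETCH_OVERFLOW };

/* Only the howto fields this helper needs.  */
struct sketch_howto
{
  unsigned int rightshift;
  unsigned int bitsize;      /* width of the field, 1 to 32 */
  unsigned int bitpos;
  int complain_signed;       /* nonzero: check as a signed value */
  uint32_t dst_mask;
};

/* Insert RELOCATION into the 32 bit little-endian word at LOCATION.
   The field is written even on overflow; the caller decides how to
   report the error.  */
enum sketch_status
sketch_relocate_contents (const struct sketch_howto *howto,
                          uint32_t relocation, unsigned char *location)
{
  enum sketch_status status = SKETCH_OK;
  uint32_t x, field;

  assert (howto->bitsize >= 1 && howto->bitpos < 32
          && howto->bitsize + howto->rightshift <= 32);

  if (howto->complain_signed)
    {
      /* Reinterpret RELOCATION as a signed 32 bit value.  */
      int64_t sv = (int64_t) relocation
                   - ((relocation & 0x80000000u) ? ((int64_t) 1 << 32) : 0);
      int64_t limit = (int64_t) 1 << (howto->bitsize - 1 + howto->rightshift);

      if (sv < -limit || sv >= limit)
        status = SKETCH_OVERFLOW;
    }
  else
    {
      uint64_t limit = (uint64_t) 1 << (howto->bitsize + howto->rightshift);

      if (relocation >= limit)
        status = SKETCH_OVERFLOW;
    }

  x = (uint32_t) location[0] | (uint32_t) location[1] << 8
      | (uint32_t) location[2] << 16 | (uint32_t) location[3] << 24;
  field = (relocation >> howto->rightshift) << howto->bitpos;
  x = (x & ~howto->dst_mask) | (field & howto->dst_mask);
  location[0] = x & 0xff;
  location[1] = (x >> 8) & 0xff;
  location[2] = (x >> 16) & 0xff;
  location[3] = (x >> 24) & 0xff;
  return status;
}
```

A signed 16 bit field accepts values from -32768 to 32767, so a
relocation value of 0x8000 is reported as an overflow.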

So, if you want to add a new target, or add a new relocation to an
existing target, you need to do the following:
@itemize @bullet
@item
Make sure you clearly understand what the contents of the section should
look like after assembly, after a relocateable link, and after a final
link. Make sure you clearly understand the operations the linker must
perform during a relocateable link and during a final link.

@item
Write a howto structure for the relocation. The howto structure is
flexible enough to represent any relocation which should be handled by
setting a contiguous bitfield in the destination to the value of a
symbol, possibly with an addend, possibly adding the symbol value to the
value already present in the destination.

@item
Change the assembler to generate your relocation. The assembler will
call @samp{bfd_install_relocation}, so your howto structure has to be
able to handle that. You may need to set the @samp{special_function}
field to handle assembly correctly. Be careful to ensure that any code
you write to handle the assembler will also work correctly when doing a
relocateable link. For example, see @samp{bfd_elf_generic_reloc}.

@item
Test the assembler. Consider the cases of relocation against an
undefined symbol, a common symbol, a symbol defined in the object file
in the same section, and a symbol defined in the object file in a
different section. These cases may not all be applicable for your
reloc.

@item
If your target uses the new linker, which is recommended, add any
required handling to the target specific relocation function. In simple
cases this will just involve a call to @samp{_bfd_final_link_relocate}
or @samp{_bfd_relocate_contents}, depending upon the definition of the
relocation and whether the link is relocateable or not.

@item
Test the linker. Test the case of a final link. If the relocation can
overflow, use a linker script to force an overflow and make sure the
error is reported correctly. Test a relocateable link, whether the
symbol is defined or undefined in the relocateable output. For both the
final and relocateable link, test the case when the symbol is a common
symbol, when the symbol looked like a common symbol but became a defined
symbol, when the symbol is defined in a different object file, and when
the symbol is defined in the same object file.

@item
In order for linking to another object file format, such as S-records,
to work correctly, @samp{bfd_perform_relocation} has to do the right
thing for the relocation. You may need to set the
@samp{special_function} field to handle this correctly. Test this by
doing a link in which the output object file format is S-records.

@item
Using the linker to generate relocateable output in a different object
file format is impossible in the general case, so you generally don't
have to worry about that. Linking input files of different object file
formats together is quite unusual, but if you're really dedicated you
may want to consider testing this case, both when the output object file
format is the same as your format, and when it is different.
@end itemize
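
For the step of writing a howto structure, many targets keep a table of
them indexed by relocation type. This C sketch shows such a table for an
imaginary target, with @samp{SKETCH_R_PCREL24} standing in for the
relocation being added, and a lookup that rejects unknown types. The
table layout and all names are hypothetical; the real structure has more
fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified howto entry for an imaginary target.  */
struct sketch_howto
{
  unsigned int type;
  const char *name;
  unsigned int rightshift;
  unsigned int bitsize;
  unsigned int bitpos;
  int pc_relative;
  uint32_t dst_mask;
};

enum
{
  SKETCH_R_NONE,
  SKETCH_R_32,
  SKETCH_R_PCREL24,    /* the new relocation being added */
  SKETCH_R_max
};

/* Entries are indexed by type, so order matters.  */
static const struct sketch_howto sketch_howto_table[SKETCH_R_max] =
{
  { SKETCH_R_NONE,    "R_NONE",    0,  0, 0, 0, 0 },
  { SKETCH_R_32,      "R_32",      0, 32, 0, 0, 0xffffffffu },
  { SKETCH_R_PCREL24, "R_PCREL24", 2, 24, 0, 1, 0x00ffffffu },
};

/* Map a type number from an input file to its howto.  Returns NULL
   for a type this target does not know, so the caller can report a
   bad input file.  */
const struct sketch_howto *
sketch_rtype_to_howto (unsigned int r_type)
{
  if (r_type >= SKETCH_R_max)
    return NULL;
  assert (sketch_howto_table[r_type].type == r_type);
  return &sketch_howto_table[r_type];
}
```

The @samp{assert} in the lookup catches a table whose entries have
drifted out of order, a mistake that is easy to make when a new
relocation is inserted in the middle.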

@node BFD relocation future
@subsection BFD relocation future

Clearly the current BFD relocation support is in bad shape. A
wholesale rewrite would be very difficult, because it would require
thorough testing of every BFD target. So some sort of incremental
change is required.

My vague thoughts on this would involve defining a new, clearly defined,
howto structure. Some mechanism would be used to determine which type
of howto structure was being used by a particular format.

The new howto structure would clearly define the relocation behaviour in
the case of an assembly, a relocateable link, and a final link. At
least one special function would be defined as an escape, and it might
make sense to define more.
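
One possible shape for such a structure, sketched in C with entirely
invented names, records an explicit action for each of the three
situations and allows an escape function where no fixed action fits.
The actions given to the example entry are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_phase
{
  SKETCH_ASSEMBLY,
  SKETCH_RELOCATABLE_LINK,
  SKETCH_FINAL_LINK,
  SKETCH_NPHASES
};

enum sketch_action
{
  SKETCH_KEEP,          /* leave the contents alone, keep the reloc */
  SKETCH_STORE_ADDEND,  /* write the addend into the contents */
  SKETCH_APPLY,         /* compute and store the final value */
  SKETCH_SPECIAL        /* call the escape function */
};

struct sketch_new_howto
{
  const char *name;
  enum sketch_action action[SKETCH_NPHASES];
  /* Escape used where the action is SKETCH_SPECIAL; CONTEXT would
     carry whatever the phase needs.  Returns nonzero on success.  */
  int (*special) (enum sketch_phase phase, void *context);
};

/* Illustrative entry; the actions are not taken from any real target.  */
const struct sketch_new_howto sketch_abs32_howto =
{
  "SKETCH_ABS32",
  { SKETCH_STORE_ADDEND, SKETCH_KEEP, SKETCH_APPLY },
  NULL
};

/* Generic code asks what to do instead of relying on special cases.  */
enum sketch_action
sketch_action_for (const struct sketch_new_howto *howto,
                   enum sketch_phase phase)
{
  assert (phase < SKETCH_NPHASES);
  assert (howto->action[phase] != SKETCH_SPECIAL || howto->special != NULL);
  return howto->action[phase];
}
```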

One or more generic functions similar to @samp{bfd_perform_relocation}
would be written to handle the new howto structure.

This should make it possible to write a generic version of the relocate
section functions used by the new linker. The target specific code
would provide some mechanism (a function pointer or an initial
conversion) to convert target specific relocations into howto
structures.
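
A sketch of that arrangement in C, assuming the function pointer variant:
each target supplies a conversion hook, and one generic routine applies
the resulting howto structures. The raw relocation layout (triples of
offset, type, and symbol value), the two example types, and all names
are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal hypothetical howto: a 32 bit little-endian field.  */
struct sketch_howto
{
  unsigned int rightshift;
  int pc_relative;
  uint32_t dst_mask;
};

/* Generic form of one relocation after target conversion.  */
struct sketch_generic_reloc
{
  const struct sketch_howto *howto;
  uint32_t offset;
  uint32_t symval;
  uint32_t addend;
};

/* Target hook: convert raw relocation N; return 0 if it cannot.  */
typedef int (*sketch_convert_fn) (const void *raw_relocs, size_t n,
                                  struct sketch_generic_reloc *out);

/* One generic routine for every target supplying the hook.  Bounds
   checks on offsets are omitted to keep the sketch short.  */
int
sketch_generic_relocate_section (unsigned char *contents, uint32_t vma,
                                 const void *raw_relocs, size_t count,
                                 sketch_convert_fn convert)
{
  size_t i;

  for (i = 0; i < count; i++)
    {
      struct sketch_generic_reloc r;
      unsigned char *loc;
      uint32_t v, x;

      if (!convert (raw_relocs, i, &r))
        return 0;
      loc = contents + r.offset;
      v = r.symval + r.addend;
      if (r.howto->pc_relative)
        v -= vma + r.offset;
      v >>= r.howto->rightshift;
      x = (uint32_t) loc[0] | (uint32_t) loc[1] << 8
          | (uint32_t) loc[2] << 16 | (uint32_t) loc[3] << 24;
      x = (x & ~r.howto->dst_mask) | (v & r.howto->dst_mask);
      loc[0] = x & 0xff;
      loc[1] = (x >> 8) & 0xff;
      loc[2] = (x >> 16) & 0xff;
      loc[3] = (x >> 24) & 0xff;
    }
  return 1;
}

/* Example target side: type 1 is absolute 32, type 2 PC relative 32.  */
static const struct sketch_howto sketch_abs32 = { 0, 0, 0xffffffffu };
static const struct sketch_howto sketch_pc32 = { 0, 1, 0xffffffffu };

int
sketch_demo_convert (const void *raw_relocs, size_t n,
                     struct sketch_generic_reloc *out)
{
  const uint32_t *raw = (const uint32_t *) raw_relocs + 3 * n;

  switch (raw[1])
    {
    case 1: out->howto = &sketch_abs32; break;
    case 2: out->howto = &sketch_pc32; break;
    default: return 0;
    }
  out->offset = raw[0];
  out->symval = raw[2];
  out->addend = 0;
  return 1;
}
```

A new target would then only need to write its conversion hook and its
howto table; the loop that applies relocations would be shared.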

Ideally it would be possible to use this generic relocate section
function for the generic linker as well. That is, it would replace the
@samp{bfd_generic_get_relocated_section_contents} function which is
currently normally used.

For the special case of ELF dynamic linking, more consideration needs to
be given to writing ELF specific but ELF target generic code to handle
special relocation types such as GOT and PLT.

@node Index
@unnumberedsec Index
@printindex cp

@contents
@bye