@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright (C) 2012-2021 Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file. They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables. Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format. When the linker asks for the symbol table of an object
file, it calls through a memory pointer to the routine from the
relevant BFD back end which reads and converts the table into a canonical
form. The linker then operates upon the canonical form. When the link is
finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.
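
The dispatch described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not BFD's
implementation: the two toy text formats, the @code{BackEnd} class, the
@code{BACK_ENDS} table, and @code{convert_symtab} are all hypothetical
names invented for the example.

```python
# Hypothetical sketch: two toy "back ends" that convert between their own
# symbol-table layout and one shared canonical form. None of these names
# come from the real BFD library.

class CanonicalSymbol:
    """Format-independent symbol record used by the application."""
    def __init__(self, name, value):
        self.name = name
        self.value = value


def read_colon_symbols(raw):
    """Toy format A: one 'name:hexvalue' entry per line."""
    symbols = []
    for line in raw.splitlines():
        name, value = line.split(":")
        symbols.append(CanonicalSymbol(name, int(value, 16)))
    return symbols


def write_colon_symbols(symbols):
    return "\n".join("%s:%x" % (s.name, s.value) for s in symbols)


def read_pair_symbols(raw):
    """Toy format B: 'decimalvalue name' entries separated by ';'."""
    symbols = []
    for entry in raw.split(";"):
        value, name = entry.split(" ")
        symbols.append(CanonicalSymbol(name, int(value)))
    return symbols


def write_pair_symbols(symbols):
    return ";".join("%d %s" % (s.value, s.name) for s in symbols)


class BackEnd:
    """Table of routines for one format, selected when the file is opened."""
    def __init__(self, read_symtab, write_symtab):
        self.read_symtab = read_symtab
        self.write_symtab = write_symtab


BACK_ENDS = dict(
    colon=BackEnd(read_colon_symbols, write_colon_symbols),
    pair=BackEnd(read_pair_symbols, write_pair_symbols),
)


def convert_symtab(raw, input_format, output_format):
    """Read through one back end, work on canonical form, write through another."""
    symbols = BACK_ENDS[input_format].read_symtab(raw)
    symbols.sort(key=lambda s: s.value)   # stand-in for the linker's own work
    return BACK_ENDS[output_format].write_symtab(symbols)
```

The caller never inspects the input layout itself. It only selects a
routine table by format and then works on @code{CanonicalSymbol}
records, which is the role the canonical form plays for the linker.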

@menu
* BFD information loss::        Information Loss
* Canonical format::            The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one form has nowhere to go in
another format. One example of this is alignment information in
@code{b.out}. There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file. (The linker will
still use the alignment information internally, so the link is performed
correctly.)
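
As a rough illustration of this kind of loss, the following Python
sketch places sections using their alignment and then writes them
through two toy output layouts, only one of which has an alignment
field. The @code{Section} class and all function names are hypothetical
and are not modelled on the real @code{a.out} or @code{b.out} layouts.

```python
# Hypothetical sketch: a canonical section record carries an alignment, but
# only one of the two toy output layouts has a field to hold it.

class Section:
    def __init__(self, name, size, alignment_power):
        self.name = name
        self.size = size
        self.alignment_power = alignment_power   # aligned to 2**alignment_power


def place_sections(sections):
    """Assign addresses, honouring each section's alignment (the link step)."""
    address = 0
    placed = []
    for sec in sections:
        align = 1 << sec.alignment_power
        address = (address + align - 1) & ~(align - 1)   # round up to alignment
        placed.append((sec, address))
        address += sec.size
    return placed


def write_with_alignment(placed):
    """Toy output layout that has an alignment field per section."""
    return [(sec.name, addr, sec.size, sec.alignment_power)
            for sec, addr in placed]


def write_without_alignment(placed):
    """Toy output layout with no alignment field: the value is dropped."""
    return [(sec.name, addr, sec.size) for sec, addr in placed]
```

In both outputs the addresses reflect the alignment, because it was used
during placement. Only the first layout still records the alignment
itself.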

Another example is COFF section names. COFF files may contain an
unlimited number of sections, each one with a textual section name. If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply. You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally. This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another. Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends. When a file is read in one format,
the canonical form is generated for BFD and the application. At the
same time, the back end saves away any information which may otherwise
be lost. If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier. Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big endian COFF to little endian COFF, or @code{a.out} to
@code{b.out}. When a mixture of formats is linked, the information is
only lost from the files whose format differs from the destination.
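
The idea of a back end saving away private data can be sketched as
follows. This is a simplified model, assuming a toy ``rich'' format
with one extra field per symbol and a toy ``plain'' format without it.
The @code{origin_format} and @code{private} attributes are
hypothetical stand-ins for the opaque structures mentioned above.

```python
# Hypothetical sketch: a toy back end stashes a format-specific field that the
# canonical record has no slot for, and reuses it only when the same format
# is written back out.

class CanonicalSymbol:
    def __init__(self, name, value, origin_format, private):
        self.name = name
        self.value = value
        self.origin_format = origin_format   # which back end read this symbol
        self.private = private               # opaque to everyone but that back end


def read_rich(records):
    """Toy 'rich' format: (name, value, debug_note) triples."""
    return [CanonicalSymbol(n, v, "rich", note) for n, v, note in records]


def write_rich(symbols):
    out = []
    for s in symbols:
        # Only trust private data that this back end created itself.
        note = s.private if s.origin_format == "rich" else ""
        out.append((s.name, s.value, note))
    return out


def read_plain(records):
    """Toy 'plain' format: (name, value) pairs, nothing extra to save."""
    return [CanonicalSymbol(n, v, "plain", None) for n, v in records]


def write_plain(symbols):
    # No slot for the note: it is lost whatever format the symbol came from.
    return [(s.name, s.value) for s in symbols]
```

Writing @code{read_rich} output back through @code{write_rich}
reproduces the input exactly, while a mixed list of symbols loses the
extra field only for the symbols that came from the other format.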

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format. A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit. Information like Unix magic numbers is
not stored here; only the magic numbers' meaning is kept, so a @code{ZMAGIC}
file would have both the demand pageable bit and the write protected
text bit set. The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
Each section in the input file contains the name of the section, the
section's original address in the object file, size and alignment
information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits. When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined. Doing this ensures that each symbol points to its containing
section. Each symbol also has a varying amount of hidden private data
for the BFD back end. Since the symbol points to the original file, the
private data format for that symbol is accessible. @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables. Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names. This information would
be useless to most COFF debuggers; the linker has command-line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor. Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer. Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats. For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list. The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line number is being described. The rest of the list is
made up of pairs: offsets into the section and line numbers. Any format
which can simply derive this information can pass it successfully
between formats.
@end table
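
The relocation scheme described in the table above can be sketched in
Python. This is a simplified model under stated assumptions, not BFD's
relocation machinery: @code{RelocType}, @code{Reloc},
@code{relocate_section}, and the two apply routines are hypothetical
names, and the record omits the section pointer that a canonical
relocation record also carries.

```python
# Hypothetical sketch: each relocation record points at a type descriptor,
# and the descriptor (not the output format) knows how to patch the data.

class RelocType:
    """Relocation type descriptor: a name plus the routine that applies it."""
    def __init__(self, name, apply):
        self.name = name
        self.apply = apply


def apply_byte(data, offset, value):
    """Store the low 8 bits of the symbol value in a single byte."""
    data[offset] = value & 0xFF


def apply_word16_big_endian(data, offset, value):
    """Store the low 16 bits of the symbol value, most significant byte first."""
    data[offset] = (value >> 8) & 0xFF
    data[offset + 1] = value & 0xFF


BYTE_RELOC = RelocType("byte", apply_byte)
WORD16_RELOC = RelocType("word16", apply_word16_big_endian)


class Symbol:
    def __init__(self, name, value):
        self.name = name
        self.value = value


class Reloc:
    """Canonical-style record: symbol, offset into the section data, type."""
    def __init__(self, symbol, offset, reloc_type):
        self.symbol = symbol
        self.offset = offset
        self.reloc_type = reloc_type


def relocate_section(data, relocs):
    """Patch a copy of the section contents by calling through each descriptor."""
    patched = bytearray(data)
    for r in relocs:
        r.reloc_type.apply(patched, r.offset, r.symbol.value)
    return bytes(patched)
```

Because @code{relocate_section} only calls through the descriptor, a
byte relocation that originated in one input format can still be
applied while producing output for a format that has no such relocation
type of its own.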