From ea22e23a0c9fbc97f7b501a9e3670c8ca764306b Mon Sep 17 00:00:00 2001 From: Matthew Malcomson Date: Fri, 4 Nov 2022 10:01:50 +0000 Subject: [PATCH] Avoid adjusting an eh_frame symbol into a section symbol plus offset GNU bfd linker removes duplicate CIE and FDE entries in the exception handling information. When it does this entries in the .eh_frame section end up in different positions to where they were originally. In order to account for that, when the linker removes an FDE/CIE entry from one object's .eh_frame section in order to prefer an equivalent entry in another object's .eh_frame section, the linker adjusts symbols which were pointing to the first entry to point to the second. If the assembler has changed symbols pointing into the .eh_frame section such that they are now described by a section symbol plus offset, the linker can not perform this transformation. This means that symbols can end up pointing at different information than they originally pointed at. NOTE1: This changes the behaviour of this on *all* targets. As it stands that seems like the correct approach since the linker behaviour that we are accounting for is a general behaviour. On top of that, this translation should not make a change in functionality if the linker behaviour were not enabled for some target (since without the linker behaviour this transformation should not affect anything -- which is why it's believed to be safe in the first place). However it is still important to note that we have not actually tested these changes on other architectures. NOTE2: Since the GNU linker makes its decision to look for items to merge or not based on the *output* section name, there is a mapping between output sections and input sections that can be modified by the user, and we may not even be using the GNU linker in the first place, our patch can not be 100% accurate and robust when choosing which sections to avoid this adjustment. It is still desirable to avoid problematic adjustment in the common case of using the GNU linker with the default mapping between input sections and output sections. Though it may not be desirable to hard-code a feature of the default linker script at the time of writing this patch into GAS. Here we use the same check that the assembler uses in gas/dw2gencfi.c to identify a .eh_frame section. This has the benefits of being a check the assembler is using already (so the assembler is internally consistent) and matching the split that the default bfd linker scripts make between all input sections that said scripts name. E.g. the default linker script for aarch64-none-elf matches the following patterns for an output section named .eh_frame_hdr { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } and matches the below for an output section named .eh_frame { KEEP (*(.eh_frame)) *(.eh_frame.*) }. The linker then applies the problematic transformation to the .eh_frame output section and not to the .eh_frame_hdr section. The check we use here makes a corresponding decision to all sections which would be caught by the above patterns. I.e. it avoids adjusting symbols in sections which would end up in the .eh_frame output section and does not avoid adjusting symbols in sections which would end up in the .eh_frame_hdr output section. NOTE3: This behaviour by itself is not causing any problems for us. The trigger for making this change (especially in morello binutils) is that when crtbeginT.o "registers" an object's exception handling information with a static glibc, it uses `adrp` and `add` to access the __EH_FRAME_BEGIN__ symbol. Currently the relocation on `adrp` is adjusted into a "section symbol plus offset" transformation (which ends up as exactly the start of the section symbol in the crtbeginT.o object), but the `add` instruction is not adjusted in this way. It is this *difference* that is problematic. It means that we can end up with a broken pointer using the .eh_frame page and the __EH_FRAME_BEGIN__ offset into a page. With base AArch64, both these instructions would be adjusted to point to the .eh_frame section of crtbeginT.o. This is still a buggy behaviour in the assembler due to the reasons given above, but it at least meant that the static glibc got a sensible pointer (though one starting after any exception frame information on the crti.o and crt1.o object files). With this change, both instructions stay pointing at __EH_FRAME_BEGIN__ in the object file. That means that the linker will leave both instructions pointing at the same place after de-duplication of exception information. That place is not guaranteed to be the start of the total exception frame information, but in practice it is always closer to the start of the debug frame than without having made this patch. We could have stopped static glibc from crashing by making sure that we accessed the .eh_frame section symbol for both instructions rather than using __EH_FRAME_BEGIN__ for both. This would behave in the same way as stock AArch64. This would mean that static glibc would not be affected by the particulars of how the GNU bfd linker merges CIE and FDE entries together. On the other hand it would mean that static glibc would never have the ability to unwind through start code in crt1.o and crti.o. I don't have a particularly strong opinion on which of these is the best approach, I chose this one since it gives the static glibc access to the full debug information for the moment. --- gas/testsuite/gas/aarch64/eh-frame-symbols.d | 24 +++++++++++++++ gas/testsuite/gas/aarch64/eh-frame-symbols.s | 31 ++++++++++++++++++++ gas/write.c | 22 ++++++++++++++ 3 files changed, 77 insertions(+) create mode 100644 gas/testsuite/gas/aarch64/eh-frame-symbols.d create mode 100644 gas/testsuite/gas/aarch64/eh-frame-symbols.s diff --git a/gas/testsuite/gas/aarch64/eh-frame-symbols.d b/gas/testsuite/gas/aarch64/eh-frame-symbols.d new file mode 100644 index 00000000000..8276648ea1c --- /dev/null +++ b/gas/testsuite/gas/aarch64/eh-frame-symbols.d @@ -0,0 +1,24 @@ +# Checking that our relocations against the symbol __EH_FRAME_BEGIN__ are not +# transformed into relocations against the section symbol .eh_frame. +#as: -march=morello+c64 +#objdump: -dr + +.*: file format .* + + +Disassembly of section \.text: + +.* : + *[0-9a-f]+: ........ stp c29, c30, \[csp, #-64\]! + *[0-9a-f]+: ........ adrp c0, 0 + 4: R_MORELLO_ADR_PREL_PG_HI20 __EH_FRAME_BEGIN__ + *[0-9a-f]+: ........ adrp c0, 0 + 8: R_MORELLO_ADR_PREL_PG_HI20 \.eh_frame + *[0-9a-f]+: ........ add c0, c0, #0x0 + c: R_AARCH64_ADD_ABS_LO12_NC __EH_FRAME_BEGIN__ + *[0-9a-f]+: ........ add c0, c0, #0x0 + 10: R_AARCH64_ADD_ABS_LO12_NC \.eh_frame + *[0-9a-f]+: ........ ldp c29, c30, \[csp\], #64 + *[0-9a-f]+: ........ ret c30 + +#... diff --git a/gas/testsuite/gas/aarch64/eh-frame-symbols.s b/gas/testsuite/gas/aarch64/eh-frame-symbols.s new file mode 100644 index 00000000000..a10b59e034e --- /dev/null +++ b/gas/testsuite/gas/aarch64/eh-frame-symbols.s @@ -0,0 +1,31 @@ +.section .eh_frame,"a" +.type __EH_FRAME_BEGIN__, %object +__EH_FRAME_BEGIN__: + +.text +.type get_eh_frame_begin, %function +get_eh_frame_begin: + .cfi_startproc purecap + stp c29, c30, [csp, -64]! + .cfi_def_cfa_offset 64 + .cfi_offset 227, -64 + .cfi_offset 228, -48 + adrp c0, __EH_FRAME_BEGIN__ + adrp c0, .eh_frame + add c0, c0, :lo12:__EH_FRAME_BEGIN__ + add c0, c0, :lo12:.eh_frame + ldp c29, c30, [csp], 64 + .cfi_restore 228 + .cfi_restore 227 + .cfi_def_cfa_offset 0 + ret + .cfi_endproc + +.global _start +.type _start, %function +_start: + mov x0, #0 + ret + +# .zero 0xff0 - 0x38 - 0x78 +.zero 0xff0 - 0x38 diff --git a/gas/write.c b/gas/write.c index 054f27987d5..5301bbd4650 100644 --- a/gas/write.c +++ b/gas/write.c @@ -878,6 +878,28 @@ adjust_reloc_syms (bfd *abfd ATTRIBUTE_UNUSED, continue; } + /* Avoid adjusting a relocation against a symbol pointing into a + ".eh_frame" section in an ELF binary. The GNU bfd linker attempts + to de-duplicate CIE/FDE's in *output* .eh_frame sections in ELF + binaries and adjust any relocations pointing at the now removed + duplicate to point to the remaining entry. + Hence a symbol which had been adjusted to a section symbol plus + offset would end up pointing at something completely different. + + We can not robustly match the exact linker-input sections that will + end up in the .eh_frame output section, since users may provide + their own linker scripts. However it does seem useful to avoid + transforming symbols in the sections that are expected to end up in + this linker output section. Rather than add a clause here to match + the current default linker script we use the same check based on + section name as is used in `gas/dw2gencfi.c` to check for .eh_frame + sections. */ + if (IS_ELF + && strncmp (segment_name (symsec), + ".eh_frame", sizeof ".eh_frame" - 1) == 0 + && segment_name (symsec)[9] != '_') + continue; + /* Never adjust a reloc against local symbol in a merge section with non-zero addend. */ if ((symsec->flags & SEC_MERGE) != 0 -- 2.39.2