git.ipfire.org Git - thirdparty/binutils-gdb.git/log

[Morello/gdbserver] Fix incorrect vector resize operation

This patch fixes an incorrect vector resize operation when reading the auxv. A
mistake makes the current code copy data over the end of the vector buffer,
leading to memory corruption.

Fix this by obtaining a pointer to the end of the vector buffer before the
resize takes place.
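
As an illustration only (this is not the gdbserver code, and the type and
function names are hypothetical), the hazard can be sketched in C with a
growable buffer.  The sketch records the old end as an offset rather than a
pointer, since realloc may move the buffer:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; not gdbserver's actual type.  */
struct byte_vec
{
  unsigned char *data;
  size_t size;
};

/* Append LEN bytes from SRC to V.  Returns 0 on success, -1 on failure.
   The end of the valid data is recorded *before* the resize and the copy
   targets that old end.  Copying to the end of the already-resized
   buffer would write past the allocation.  */
int
byte_vec_append (struct byte_vec *v, const void *src, size_t len)
{
  size_t old_size = v->size;	/* End of valid data before resizing.  */
  unsigned char *p;

  if (len == 0)
    return 0;

  p = realloc (v->data, old_size + len);
  if (p == NULL)
    return -1;

  v->data = p;
  v->size = old_size + len;
  /* Write at the old end, not at v->data + v->size.  */
  memcpy (v->data + old_size, src, len);
  return 0;
}
```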

gas: Implement categorization of Morello-specific instructions

While the concept of a core instruction relates to the idea of
instructions that are available irrespective of the presence of
architectural extensions, this concept breaks down with the
introduction of the Morello architecture.

Rather, what is observed in Morello is that when PSTATE.C64 == 1,
the A64C_INSN variant becomes the ONLY valid aarch64_opcode variant,
with the CORE_INSN variant becoming illegal.  Therefore, some way of
ruling out the use of such CORE_INSNs is needed.

Similarly, some A64C_INSN instructions are only valid for
PSTATE.C64 == 1 and are not valid when compiling for Morello A64
mode.

At the assembly level, the CORE_INSN and A64C_INSN variants share the
same mnemonic, differing only by whether they are passed a general-
purpose register argument or its capability counterpart, e.g.

  * CORE_INSN: adr x0, #0
  * A64C_INSN: adr c0, #0

This makes it tempting to combine both insn variants in binutils
into a single insn entry in aarch64_opcode_table[], resolving the
appropriate operand code (e.g. AARCH64_OPND_Can versus
AARCH64_OPND_Rn) at compile time by analyzing the -march and -mabi
flags.

This approach falls short when dealing with instructions such as `bl'
where the core and morello instructions share the same mnemonic but
have distinct encodings.

A more flexible approach is therefore presented here.  Special
restrictions to instructions are encoded in the FLAGS field, which
can then be used in checks carried out in `md_assemble'.

This fixes two issues:
  1. Wrong fix suggestions in `output_operand_error_record':
     - attempting to assemble `adr w0, #0' at present, for example,
     results in a suggestion that `w0' be changed to `x0' as opposed
     to `c0'.
  2. Purecap only instructions being accepted when assembling without
  the C64 extension:
     - `adr c0, #0' is currently accepted when assembling for
     Hybrid mode.

This patch defines the F_NONC64 and F_C64ONLY flags for labelling these
instructions in aarch64_opcode.flags, such that unavailable instructions
can be identified by cross-referencing this field with whether
C64 is set in the `cpu_variant' aarch64_feature_set variable.  When
the conditions set by the flags are not met by `cpu_variant', the
instruction can be attributed an AARCH64_OPDE_SYNTAX_ERROR, allowing
for correct error handling in md_assemble.
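
A minimal sketch of the kind of check described above.  The flag bit values,
the cut-down opcode structure and the function name are assumptions made for
illustration; the real definitions live in include/opcode/aarch64.h and
gas/config/tc-aarch64.c and may differ:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only, not the ones used by binutils.  */
#define F_NONC64  (UINT64_C (1) << 0)	/* Not valid when C64 is enabled.  */
#define F_C64ONLY (UINT64_C (1) << 1)	/* Valid only when C64 is enabled.  */

/* Hypothetical, cut-down stand-in for aarch64_opcode.  */
struct opcode_sketch
{
  const char *name;
  uint64_t flags;
};

/* Return true if OPCODE may be used, given whether the C64 feature is
   set in the CPU variant.  A false result corresponds to the case where
   the assembler would report a syntax error for this template.  */
bool
opcode_valid_for_c64_state (const struct opcode_sketch *opcode,
			    bool c64_enabled)
{
  if ((opcode->flags & F_NONC64) && c64_enabled)
    return false;
  if ((opcode->flags & F_C64ONLY) && !c64_enabled)
    return false;
  return true;
}
```

With this shape, a template carrying neither flag is accepted in both modes.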

ChangeLog:
  * include/opcode/aarch64.h (F_NONC64): New flag.
  * include/opcode/aarch64.h (F_C64ONLY): Likewise.

opcodes/ChangeLog:
  * aarch64-tbl.h (aarch64_opcode_table): Add F_NONC64 and F_C64ONLY
  to relevant aarch64_opcodes

gas/ChangeLog:
  * config/tc-aarch64.c (validate_opcode_for_feature): New.
  (md_assemble): Use `validate_opcode_for_feature' in template
  selection.
  * gas/testsuite/gas/aarch64/morello-exclude.l: New testcase.
  * gas/testsuite/gas/aarch64/morello-exclude.s: Likewise.
  * gas/testsuite/gas/aarch64/morello-exclude.l: Likewise.
  * gas/testsuite/gas/aarch64/morello_insn.s: Fix hybrid codegen.

Handle newer Morello Linux Kernels that set more restrictive bounds

Newer Morello Linux Kernels set more restrictive bounds in general [1], and
that is also true for CSP.

Adjust Morello GDB to cope with this, otherwise it will run into a tag fault
when attempting to start a program or call a function by hand.

Add support for readline 8.2

In readline 8.2 the type of rl_completer_word_break_characters changed to
include const.

morello-binutils: Testsuite fixup for linux build

The symbol that objdump reports for the start of the data section is not
important and is different between linux and bare metal builds.

Just avoiding specifying this symbol in our testcase fixes a testsuite
failure in the linux build.

Fix tests after rebase of 36b6002396d onto 8504495ada4

When we started to account for function stubs when determining our
PCC bounds and section bounds we stopped including padding to make PCC
bounds precise in the highest section that should be included in the PCC
range. Instead we put such padding *after* that section.

Our patch handling IFUNC's was written in parallel, and its testcases
were using the previous setup (where PCC padding was included in the
last section). Hence when calculating a fragment that we should see in
our testcase we took the highest address in the last RELRO section.

After the application of both patches that approach no longer works.
With the new mechanism for padding there is no longer in general a
guarantee that any value in the section table could correspond to the
PCC bounds (since the alignment of the first writeable section could be
high enough such that the PCC bounds are somewhere between the end of
the last RELRO section and the start of the first WRITE section).
However for all these testcases the start of the first WRITE section
seems good enough to not be flaky.

Disable some symbol -> section_symbol + offset translations

We're disabling transformations of relocations against symbols like this
in the assembler when the relocation is against something in the GOT and
when the relocation is against something which generates a capability.

For entries in the GOT we disable this transformation since the GNU bfd
linker relies on indexing into its internal representation of the GOT
using symbols and does not distinguish between entries using the same
symbol but different offsets.  Hence transforming multiple symbols into
the same section symbol with different offsets would mean that at least
one will get an incorrect value.
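
The collision can be illustrated with a toy table keyed by symbol name only.
This is a sketch under that assumption, with hypothetical names, and is not
the linker's actual GOT representation:

```c
#include <stddef.h>
#include <string.h>

#define MAX_GOT_SLOTS 8

/* Toy GOT that, like the behaviour described above, is indexed by
   symbol alone.  */
struct toy_got
{
  const char *symbol[MAX_GOT_SLOTS];
  size_t count;
};

/* Return the slot for SYMBOL, allocating one if needed, or (size_t) -1
   if the table is full.  OFFSET is deliberately ignored, so two
   references that differ only in offset share one slot.  */
size_t
toy_got_slot (struct toy_got *got, const char *symbol, long offset)
{
  size_t i;

  (void) offset;
  for (i = 0; i < got->count; i++)
    if (strcmp (got->symbol[i], symbol) == 0)
      return i;

  if (got->count == MAX_GOT_SLOTS)
    return (size_t) -1;

  got->symbol[got->count] = symbol;
  return got->count++;
}
```

Two distinct symbols get distinct slots, but once both have been rewritten to
the same section symbol with offsets 0 and 8 they land in the same slot, so at
most one of them can see the right value.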

Relocations which require the static linker to emit dynamic relocations
in order to generate capabilities (CAPINIT and capability relocations
into the GOT) require symbol information so that the dynamic linker can
put correct permissions and bounds on those relocations.

NOTE: We get to use an existing testcase for this change, but it showed
up something strange about objdump.  One `adrp` instruction has changed
in the output so that it shows as pointing to a different location.
This happens to be an `objdump` quirk.  Objdump looks at the relocation
associated with an address and attempts to include that relocation when
determining what address to print out.  This mechanism has two problems.
One is that objdump does not account for the offset in that relocation
(only the symbol).  Another is that on an object file (i.e. not a final
executable) the virtual memory address of all sections is zero.  These
combined mean that the vma is miscalculated, and the translation from
vma to symbol is not injective.  In other words: the extra change in
morello-ldst-reloc.d on top of switching the relocation symbols is in
order to account for an objdump bug and not a problem with this gas
change.

Avoid adjusting an eh_frame symbol into a section symbol plus offset

GNU bfd linker removes duplicate CIE and FDE entries in the exception
handling information.  When it does this, entries in the .eh_frame
section end up in different positions to where they were originally.

In order to account for that, when the linker removes an FDE/CIE entry
from one object's .eh_frame section in order to prefer an equivalent
entry in another object's .eh_frame section, the linker adjusts symbols
which were pointing to the first entry to point to the second.

If the assembler has changed symbols pointing into the .eh_frame section
such that they are now described by a section symbol plus offset, the
linker can not perform this transformation.  This means that symbols can
end up pointing at different information than they originally pointed
at.

NOTE1: This changes the behaviour of the assembler on *all* targets.  As it
stands that seems like the correct approach since the linker behaviour
that we are accounting for is a general behaviour.  On top of that, this
translation should not make a change in functionality if the linker
behaviour were not enabled for some target (since without the linker
behaviour this transformation should not affect anything -- which is why
it's believed to be safe in the first place).  However it is still
important to note that we have not actually tested these changes on
other architectures.

NOTE2: Our patch can not be 100% accurate and robust when choosing which
sections to avoid this adjustment in.  The GNU linker decides whether to
look for items to merge based on the *output* section name, the mapping
between input sections and output sections can be modified by the user,
and we may not even be using the GNU linker in the first place.

It is still desirable to avoid problematic adjustment in the common case
of using the GNU linker with the default mapping between input sections
and output sections.  Though it may not be desirable to hard-code a
feature of the default linker script at the time of writing this patch
into GAS.
Here we use the same check that the assembler uses in gas/dw2gencfi.c to
identify a .eh_frame section.  This has the benefits of being a check
the assembler is using already (so the assembler is internally
consistent) and matching the split that the default bfd linker scripts
make between all input sections that said scripts name.

E.g. the default linker script for aarch64-none-elf matches the
following patterns for an output section named .eh_frame_hdr
{ *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) }
and matches the below for an output section named .eh_frame
{ KEEP (*(.eh_frame)) *(.eh_frame.*) }.
The linker then applies the problematic transformation to the .eh_frame
output section and not to the .eh_frame_hdr section.
The check we use here makes a corresponding decision to all sections
which would be caught by the above patterns.  I.e. it avoids adjusting
symbols in sections which would end up in the .eh_frame output section
and does not avoid adjusting symbols in sections which would end up in
the .eh_frame_hdr output section.

NOTE3: This behaviour by itself is not causing any problems for us.  The
trigger for making this change (especially in morello binutils) is that
when crtbeginT.o "registers" an object's exception handling information
with a static glibc, it uses `adrp` and `add` to access the
__EH_FRAME_BEGIN__ symbol.  Currently the relocation on `adrp` is
adjusted into a "section symbol plus offset" transformation (which ends
up as exactly the start of the section symbol in the crtbeginT.o
object), but the `add` instruction is not adjusted in this way.
It is this *difference* that is problematic.

It means that we can end up with a broken pointer using the
.eh_frame page and the __EH_FRAME_BEGIN__ offset into a page.

With base AArch64, both these instructions would be adjusted to point to
the .eh_frame section of crtbeginT.o.  This is still a buggy behaviour
in the assembler due to the reasons given above, but it at least meant
that the static glibc got a sensible pointer (though one starting after
any exception frame information on the crti.o and crt1.o object files).

With this change, both instructions stay pointing at __EH_FRAME_BEGIN__
in the object file.  That means that the linker will leave both
instructions pointing at the same place after de-duplication of
exception information.  That place is not guaranteed to be the start of
the total exception frame information, but in practice it is always
closer to the start of the debug frame than without having made this
patch.

We could have stopped static glibc from crashing by making sure that we
accessed the .eh_frame section symbol for both instructions rather than
using __EH_FRAME_BEGIN__ for both.  This would behave in the same way as
stock AArch64.
This would mean that static glibc would not be affected by the
particulars of how the GNU bfd linker merges CIE and FDE entries
together.  On the other hand it would mean that static glibc would never
have the ability to unwind through start code in crt1.o and crti.o.

I don't have a particularly strong opinion on which of these is the best
approach; I chose this one since it gives the static glibc access to the
full debug information for the moment.

libiberty: Account for CHERI alignment requirement in objalloc

The calculation of OBJALLOC_ALIGN in include/objalloc.h ensures that
allocations are sufficiently aligned for doubles, but on CHERI
architectures it is possible that void * has a greater alignment
requirement than double.

Instead of deriving the alignment requirement from double alone, this
patch uses a union to compute the maximum alignment between double and
void *.
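
A sketch of the union technique, using hypothetical names (the actual
definitions in include/objalloc.h may be spelled differently).  The offset of
the union member in the probe structure is the larger of the two alignment
requirements:

```c
#include <stddef.h>

/* The union is as strictly aligned as its most strictly aligned
   member.  */
union align_candidates
{
  double d;
  void *p;
};

/* A leading char forces the compiler to insert padding up to the
   alignment of the union, which offsetof then reports.  */
struct align_probe
{
  char c;
  union align_candidates u;
};

#define SKETCH_ALIGN offsetof (struct align_probe, u)

/* Round N up to a multiple of SKETCH_ALIGN.  This assumes the alignment
   is a power of two, which holds on the targets we know of.  */
size_t
sketch_align_up (size_t n)
{
  return (n + SKETCH_ALIGN - 1) & ~(SKETCH_ALIGN - 1);
}
```

On a target where void * is a 16-byte capability and double is 8 bytes, this
would be expected to yield 16 rather than 8.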

This fixes alignment faults seen when compiling the binutils for
pure-capability Morello. With this patch applied, the majority of
binutils tests pass when the binutils themselves are compiled for
purecap.

This patch is a backport of commit
a8af417a8a1559a3ebceb0c761cf26ebce5eab7f, initially upstreamed to
Morello GCC.

Fix signed/unsigned address extraction in capability maintenance commands

For some cases, I noticed the parse_and_eval_address function was sign-extending
addresses, which is not desirable. This leads to incorrect addresses being passed
to the target.

Fix this by using parse_and_eval_long instead.

Only check for valid Morello bounds on non-exec syms

Capabilities pointing to symbols in SEC_CODE sections are given the
bounds of the entire PCC.  We ensure that the PCC bounds are padded and
aligned as needed in the linker.

Capabilities pointing to other symbols (e.g. in data sections) are given
the bounds of the symbol that they point to.  It is the responsibility
of the assembly generator (i.e. usually the compiler) to ensure these
bounds are correctly aligned and padded as necessary.

We emit a warning for imprecise bounds in the second case.  Until this
patch that warning was also applied to the first case.  This was a mistake
and is rectified in this commit.
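
The resulting rule can be sketched as a small predicate.  The flag value and
function name are hypothetical, and whether the bounds are precise is taken
as an input rather than computed here:

```c
#include <stdbool.h>

/* Illustrative stand-in for the BFD SEC_CODE section flag.  */
#define SKETCH_SEC_CODE 0x1u

/* Return true if an imprecise-bounds warning should be emitted for a
   capability pointing at a symbol in a section with SEC_FLAGS.  Symbols
   in code sections get the bounds of the whole PCC, so the symbol's own
   bounds are not checked for them.  */
bool
should_warn_imprecise_bounds (unsigned int sec_flags, bool bounds_precise)
{
  if (sec_flags & SKETCH_SEC_CODE)
    return false;
  return !bounds_precise;
}
```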

Various fixes for capability IFUNCs

1) Enable having a CAPINIT relocation against an IFUNC.
   We update the `final_link_relocate` switch case around IFUNC's to
   also handle CAPINIT relocations.  The handling of CAPINIT relocations
   is slightly different than for AARCH64_NN (i.e. ABS64) relocations
   since we generally need to emit a dynamic relocation.

   Handling this relocation also needs to manage the PDE case when a
   hard-coded address has been put into code to satisfy something like
   an `adrp`.  In these cases the canonical address of the IFUNC becomes
   its PLT stub rather than the result of the resolver.  We then need to
   use a RELATIVE relocation rather than an IRELATIVE one.
   N.b. unlike the ABS64 relocation, since a CAPINIT will always emit a
   dynamic relocation we do not require pointer equality adjustments on
   a symbol from having seen a CAPINIT.  That means we do not need to
   request that the PLT stub of an IFUNC is treated as the canonical
   address just from having seen a CAPINIT relocation.

   A CAPINIT relocation against an IFUNC needs to be recorded internally
   so that _bfd_elf_allocate_ifunc_dyn_relocs does not garbage collect
   the PLT stub and associated IRELATIVE relocation.

   See changes in the CAPINIT case of the IFUNC switch of
   elfNN_aarch64_final_link_relocate, and in the CAPINIT case of
   elfNN_aarch64_check_relocs.

2) Ensure that GOT relocations against an IFUNC have their fragment
   populated with the LSB set.

   For GOT relocations against a capability IFUNC we need to introduce a
   relocation for the runtime to provide us with a valid capability.

   See changes in the GOT cases of the IFUNC switch of
   elfNN_aarch64_final_link_relocate, changes in the
   elfNN_aarch64_allocate_ifunc_dynrelocs function, and changes around
   handling an IFUNC GOT entry in elfNN_aarch64_finish_dynamic_symbol.

3) Ensure that mapping symbols are emitted for the .iplt.  Without this
   many of the testcases here are disassembled incorrectly.

   See changes in elfNN_aarch64_output_arch_local_syms.

4) IRELATIVE relocations are against symbols which are not in the
   dynamic symbol table, hence they need their fragment populated to
   inform the dynamic linker the bounds and permissions to call the
   associated resolver with.

   See part of the CAPINIT IFUNC handling in
   elfNN_aarch64_final_link_relocate, and the IRELATIVE handling in
   elfNN_aarch64_create_small_pltn_entry.

5) Disallow an ABS64 relocation against a purecap IFUNC.  Such a
   relocation is expecting a 64-bit value but the function will return a
   capability.  Some handling could be implemented by some communication
   method to the dynamic linker that this particular value should be
   64-bit (maybe by emitting an AARCH64_IRELATIVE relocation rather than
   a MORELLO_IRELATIVE one), but as yet GCC doesn't generate such a
   relocation and we believe it's unlikely to be needed.

   See new error check in AARCH64_NN clause of
   elfNN_aarch64_final_link_relocate.

6) Ensure that for statically linked PDE's, we segregate IRELATIVE and
   RELATIVE relocations.  IRELATIVE relocs should be in the .rela.iplt
   section, while RELATIVE relocs should be in the .rela.dyn section.

   Correspondingly all RELATIVE relocations should be between the
   __rela_dyn_{start,end} symbols, and all IRELATIVE relocations should
   be between the __rela_iplt_{start,end} symbols.

   This segregation is made based on dynamic relocation type rather than
   static relocation that generates it.  The segregation allows the
   static libc to more easily handle relocations.

Update testcases accordingly.
We introduce some new testcases.  morello-ifunc.s contains uses of an
IFUNC which has been referenced directly in code.  When compiling a PDE
this triggers the pointer equality requirement and hence the canonical
address for this symbol becomes the PLT stub rather than the result of
the resolver.

morello-ifunc1.s does not use the IFUNC directly in code so that the
address used everywhere is the result of the resolver.

Both of these have testcases assembled and linked for static,
dynamically linked PDE, and PIE.  The testcase without a hard-coded
access also has a testcase for -shared.

morello-ifunc2.s is written to check that a CAPINIT relocation does
indeed stop the garbage collection of an IFUNC's PLT and IRELATIVE
relocation.

morello-ifunc3.s tests that we error on an ABS64 relocation against a
C64 IFUNC.

morello-ifunc-dynlink.s tests that a CAPINIT relocation against an IFUNC
symbol defined in a shared library behaves the same way as one against a
FUNC symbol defined in a shared library.

Implementation note:
When segregating IRELATIVE and RELATIVE relocs the change for
relocations against IFUNC symbols populated in the GOT is
straightforward.

For CAPINIT relocations the change is not as straightforward.  The
problem is that on sight of CAPINIT relocations in check_relocs we
immediately allocate space in the srelcaps section.  In trying to
satisfy the above we need to know whether we're going to be emitting an
IRELATIVE relocation or RELATIVE one in order to know which section it
should go in.  The determining factor between these two kinds of
relocations is whether there is a text relocation to this IFUNC symbol,
since that determines whether we need to make this CAPINIT relocation
a RELATIVE relocation pointing to the PLT stub (in order to satisfy
pointer equality) or an IRELATIVE relocation pointing to the resolver.

Whether such a relocation occurs is recorded against each symbol in the
pointer_equality_needed member.  This can only be known after all
relocations have been seen in check_relocs.  Hence, when coming across a
CAPINIT relocation in check_relocs we do not in general know whether
this CAPINIT relocation should end up as an IRELATIVE or RELATIVE
relocation.

This patch postpones the decision by recording the number of CAPINIT
relocations against a given symbol in a hash table while going through
check_relocs and allocating the relevant space in the required section
in size_dynamic_sections.

N.b. this is similar in purpose to the dyn_relocs linked list on a
symbol.  We do not use that existing member which is on every symbol
since the structure does not allow any indication of what kind of
relocation triggered the need.  Moreover the structure is used for
different purposes throughout the linker and disentangling the new
meaning from the existing ones seems overly confusing.

Overall, the decisions about which sections relocations against an IFUNC
should go in are:
  CAPINIT relocations:
    If this is a static PDE link, and the symbol does not need pointer
    equality handling, then this should emit an IRELATIVE relocation and
    that should go in the .rela.iplt section.

    If this is a PIC link, then this should go in the .rela.ifunc
    section (along with all other dynamic relocations against the IFUNC,
    as commented in _bfd_elf_allocate_ifunc_dyn_relocs).

    Otherwise this relocation should go in the srelcaps section (which
    goes in .rela.dyn).

  GOT relocations:
    If this is a static PDE link, and the symbol does not need pointer
    equality, then this should emit an IRELATIVE relocation into the
    .rela.iplt section.

    Otherwise, if this is a static PDE link (so the symbol does need
    pointer equality), then this should emit a RELATIVE relocation and
    that should go in the srelcaps section (which is in .rela.dyn).

    Otherwise this should go in .rela.got section.
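
The decisions listed above can be written down as two small functions.  This
is a sketch only: the enum, structure and function names are hypothetical and
do not correspond to identifiers in bfd/elfnn-aarch64.c.

```c
#include <stdbool.h>

/* Destination sections named in the list above.  */
enum reloc_dest
{
  DEST_RELA_IPLT,	/* .rela.iplt */
  DEST_RELA_IFUNC,	/* .rela.ifunc */
  DEST_SRELCAPS,	/* srelcaps, which goes in .rela.dyn */
  DEST_RELA_GOT		/* .rela.got */
};

/* The link properties the decision depends on.  */
struct ifunc_link_state
{
  bool static_pde;
  bool pic;
  bool pointer_equality_needed;
};

/* Where a CAPINIT relocation against an IFUNC should go.  */
enum reloc_dest
capinit_reloc_dest (const struct ifunc_link_state *s)
{
  if (s->static_pde && !s->pointer_equality_needed)
    return DEST_RELA_IPLT;	/* IRELATIVE relocation.  */
  if (s->pic)
    return DEST_RELA_IFUNC;
  return DEST_SRELCAPS;
}

/* Where a GOT relocation against an IFUNC should go.  */
enum reloc_dest
got_reloc_dest (const struct ifunc_link_state *s)
{
  if (s->static_pde && !s->pointer_equality_needed)
    return DEST_RELA_IPLT;	/* IRELATIVE relocation.  */
  if (s->static_pde)
    return DEST_SRELCAPS;	/* RELATIVE relocation.  */
  return DEST_RELA_GOT;
}
```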

ld, aarch64: Account for stubs in bounds sizing

This patch deals with the interaction between the code that attempts to
make bounds precise (for both the PCC bounds and for some individual
sections) and the code that adds stubs (e.g. long-branch veneers and
interworking stubs) in the AArch64 backend.

We aim to set precise bounds for the PCC span and some individual
sections in elfNN_c64_resize_sections. However, it transpires that
elfNN_aarch64_size_stubs can change the layout in ways that extend
sections that should be covered under the PCC span outside of the bounds
set in elfNN_c64_resize_sections. The introduction of stubs can also
change (even reduce) the amount of padding required to make the bounds
on any given section precise.

To address this problem, we move the core logic from
elfNN_c64_resize_sections into a new function, c64_resize_sections, that
is safe to be called repeatedly. Similarly, we move the core logic from
elfNN_aarch64_size_stubs into a new function aarch64_size_stubs which
again can be called repeatedly.

We then adjust elfNN_aarch64_size_stubs to call aarch64_size_stubs and
c64_resize_sections in a loop, stopping when c64_resize_sections no
longer makes any changes to the layout.

An important observation made above is that the introduction of stubs
can change the amount of padding needed to make bounds precise. Likewise,
introducing padding can in theory necessitate the introduction of stubs
(e.g. if the change in layout necessitates a long-branch veneer). This
is why we run the resizing/stubs code in a loop until no further changes
are necessary.
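
The shape of that loop can be sketched with toy passes.  Everything here is
hypothetical (the layout structure, the thresholds, the 64-byte granule and
the iteration cap) and only models the control flow, not the real sizing
code:

```c
#include <stdbool.h>

/* Toy layout: one section, its padding, and whether a stub was added.  */
struct layout_sketch
{
  unsigned long text_size;
  unsigned long padding;
  bool has_stub;
};

typedef bool (*layout_pass) (struct layout_sketch *);

/* Toy stub pass: once the padded size exceeds 100 bytes, add a single
   16-byte stub.  Returns true if the layout changed.  */
bool
toy_size_stubs (struct layout_sketch *l)
{
  if (!l->has_stub && l->text_size + l->padding > 100)
    {
      l->has_stub = true;
      l->text_size += 16;
      return true;
    }
  return false;
}

/* Toy resize pass: pad the section up to a multiple of 64 bytes.
   Returns true if the amount of padding changed.  */
bool
toy_resize_sections (struct layout_sketch *l)
{
  unsigned long wanted = (64 - l->text_size % 64) % 64;

  if (wanted == l->padding)
    return false;
  l->padding = wanted;
  return true;
}

/* Run both passes until the resize pass makes no further change, or
   MAX_ITER iterations have run.  Returns the number of iterations.  */
unsigned int
run_until_stable (struct layout_sketch *l, layout_pass size_stubs,
		  layout_pass resize_sections, unsigned int max_iter)
{
  unsigned int iterations = 0;
  bool changed;

  do
    {
      size_stubs (l);
      changed = resize_sections (l);
      iterations++;
    }
  while (changed && iterations < max_iter);

  return iterations;
}
```

Starting from a 90-byte section, the first iteration adds 38 bytes of padding,
which triggers the stub in the second iteration, and the padding then drops to
22 bytes.  The third iteration changes nothing and the loop stops.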

Since the amount of padding needed to achieve precise bounds for a
section can change (indeed reduce) with the introduction of stubs, we
need a mechanism to update the amount of padding applied to a section in
a subsequent iteration of c64_resize_sections. We achieve this by
introducing a new interface in ld/emultempl/aarch64elf.em. We have the
functions:

static void
c64_set_section_padding (asection *osec, bfd_vma padding, void **cookie);

static bfd_vma
c64_get_section_padding (void *cookie);

Here, the "cookie" value is, to consumers of this interface (i.e.
bfd/elfnn-aarch64.c), an opaque handle used to refer to the padding that
was introduced for a given section. The consuming code then passes back
the cookie to later query the amount of padding already installed or to
update the amount of padding.

Internally, within aarch64elf.em, the "cookie" is just a pointer to the
node in the ldexp tree containing the integer amount of padding
inserted.

In the AArch64 ELF backend, we then maintain a (lazily-allocated)
mapping between output sections and cookies in order to be able to
update the padding we installed in subsequent iterations of
c64_resize_sections.
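
A simplified sketch of the cookie idea follows.  The names and the node type
are hypothetical, the output section argument is omitted, and a plain heap
node stands in for the ldexp tree node:

```c
#include <stdlib.h>

typedef unsigned long vma_sketch;

/* Stand-in for the expression-tree node holding the padding amount.
   Callers only ever see it as a void * cookie.  */
struct padding_node
{
  vma_sketch value;
};

/* Install PADDING, or update it if *COOKIE already refers to padding
   installed earlier.  *COOKIE must be NULL on the first call.  Returns
   0 on success, -1 on allocation failure.  */
int
sketch_set_section_padding (vma_sketch padding, void **cookie)
{
  struct padding_node *node = *cookie;

  if (node == NULL)
    {
      node = malloc (sizeof *node);
      if (node == NULL)
	return -1;
      *cookie = node;
    }
  node->value = padding;
  return 0;
}

/* Query the amount of padding currently installed for COOKIE.  */
vma_sketch
sketch_get_section_padding (void *cookie)
{
  return ((struct padding_node *) cookie)->value;
}
```

A later iteration can then pass the same cookie back to reduce or increase the
padding without the consumer knowing how it is stored.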

While working on this patch, an edge case became apparent: the case
where pcc_high_sec requires precise bounds (i.e. where we call
ensure_precisely_bounded_section on pcc_high_sec). As it stands, in this
case, the code to ensure precise PCC bounds may in fact make the bounds
on pcc_high_sec itself no longer representable (even if we previously
ensured this by calling ensure_precisely_bounded_section). In general,
it is not always possible to choose an amount of padding to add to the
end of pcc_high_sec to make both pcc_high_sec and the PCC span itself
have precise bounds (without introducing an unreasonably large alignment
requirement on pcc_high_sec).

To handle the edge case above, we decouple these two problems by adding
a separate amount of padding *after* pcc_high_sec to make the PCC bounds
precise. If pcc_high_sec is required to have precise bounds, then that
can be done in the usual way by adding padding to pcc_high_sec in
ensure_precisely_bounded_section. The new mechanism for adding padding
after an output section is implemented in
aarch64elf.em:c64_pad_after_section.

To avoid having to add yet another mechanism to update the padding
*after* pcc_high_sec, we avoid adding this padding until all other
resizing / bounds-setting work is done. This is not possible for
individual sections since padding introduced there may have a knock-on
effect requiring further work, but we believe this isn't the case for
the padding added after pcc_high_sec to make the PCC bounds precise.

This patch also reveals a pre-existing issue whereby we end up calling
ensure_precisely_bounded_section on the *ABS* section. Without a further
change to prevent this, this can lead to a null pointer dereference in
ensure_precisely_bounded_section, since the "owner" field on the *ABS*
pointer is NULL, and we use this field to obtain a pointer to the output
BFD in the new c64_get_section_padding_info function.

Of course, it doesn't make sense for ensure_precisely_bounded_section to
be called on the *ABS* section in the first place. This can happen when
there are relocations against ldscript-defined symbols which are defined
at the top level of the ldscript (i.e. not in a particular output
section). Those symbols initially have their output section set to the
*ABS* section. Later, we resolve such symbols to their correct output
section in ldexp_finalize_syms, but the code in c64_resize_sections is
running in ldemul_after_allocation, which comes before the call to
ldexp_finalize_syms in the lang_process flow.

For now, we just skip such symbols when looking for sections that need
precise bounds in c64_resize_sections, but this issue will later need
fixing properly. We choose to avoid fixing the pre-existing issue in
this patch to avoid over-complicating an already complex change.

Add CPSR C64 bit (26)

Teach Morello GDB about bit 26 (C64), which shows if we have C64
execution state enabled or disabled.
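
For reference, testing that bit in a raw CPSR value looks like the following
sketch.  The macro and function names are hypothetical and are not GDB
identifiers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position of C64 in CPSR, as stated above.  */
#define SKETCH_CPSR_C64_BIT 26

/* Return true if the C64 execution state bit is set in CPSR.  */
bool
cpsr_c64_enabled (uint64_t cpsr)
{
  return ((cpsr >> SKETCH_CPSR_C64_BIT) & 1) != 0;
}
```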

Make various linker tests more robust

Various linker tests have been failing on aarch64-none-linux-gnu for a
while now.  We've ignored their failure since we knew they were from us
writing the tests with hard-coded values based on the aarch64-none-elf
toolchain.

A while ago we introduced a record/check feature in the testsuite that
has allowed re-writing some tests in a more general manner.  This commit
adjusts the remaining tests using this new feature.

While updating these remaining tests we also changed the record/check
feature to replace *all* occurrences of a `check` pattern in the line we
want to use.  This seems like what would be the expected behaviour,
although we did not actually need that change in this commit.

Only one testcase was slightly tricky to generalise.

This was the `morello-pcc-bounds-include-readonly` testcase.  We wanted
to ensure that the `othersection` was included in PCC bounds, but that
othersection was usually not the last RO section.  Since it was usually
not the last RO section we could not calculate the bounds that should be
given to our capability.

When using hard-coded addresses this was fine since we could just ensure
that the hard-coded address was large enough to span .othersection.
The record/check functionality does not allow checking that a value is
acceptable (i.e. the "check" part does not pass the values we see to
some procedure).  Rather it generates a value that the line should
match.

Hence in order to make this particular test general we emitted the
section headers, found the last read-only section, then asserted that
the size of the capability with executable permissions spanned up to the
end of that section.

The method by which we find the last read-only section could change
between different targets, but it at least works on both
aarch64-none-elf and aarch64-none-linux.

Add linker tests for TLS changes

This includes:
  - New tests for the new functionality for Morello TLS.
  - Adding target check for `-shared` to Morello tests that require
    a target that supports shared libraries.
  - Tests to ensure the extra error-checking emits errors when needed.
  - New tests for the new relocations defined for Morello TLS.
  - Some fixups for existing tests that don't seem like they deserve a
    separate commit.

Notes about some changes that seem to need explaining:

tlsle-symbol-offset testcase:
  Do not use `-shared` as a linker argument.
  This testcase is checking for Local-Exec relocations; using these
  relocations in a shared library is simply not valid.  AFAICS there's
  no need for the `-shared` flag in this testcase, since it's just there
  to check that the tls LE relocations accept an addend.

  This was working before since we did not error on such things, but
  with the extra hardening that I've added we are now erroring on them.

morello-sec-start_stop-round testcase:
  We were originally searching for a specific __data_start symbol
  marking the start of the data section.  This was not part of what we
  actually wanted to test, and the symbol which was printed was
  different on aarch64-none-linux-gnu.  Hence we change our regex to
  search for any symbol to allow the test to pass on both targets.

morello-tlsie-overflow testcase:
  This testcase adds some padding in the .text section so that the GOT
  is very far away from the relocation which attempts to access it.
  This means that the relocation can not be satisfied and we can check
  for the resulting error message.

Extra error checking around TLS relocations

We add the following extra error checking:
  1) That TLS relocations (including SIZE relocations, but excluding
     Local-Exec relocations) are not requested against a symbol plus
     addend.
  2) That SIZE relocations are requested against a defined symbol in the
     current binary (i.e. one that the static linker knows the size of).
  3) A TLS Local-Exec relocation must be against a symbol in the current
     binary.

All the above also have error messages that describe the problem so that
the user can fix it.

Treating a relocation against a "symbol plus addend" as an error is due
to a combination of factors.
  - The linker implementation does not have any way to represent a GOT
    entry of "symbol plus addend".  Hence we currently just have silent
    bugs if asked to handle a relocation which requires a GOT entry and
    is against a "symbol plus addend".
  - It would be wasteful anyway to have multiple entries in the GOT for
    e.g. sym+off1, sym+off2.
  - Morello size relocations don't support "symbol plus addend" since
    the meaning would have to be defined (is this the *remaining* size
    of the symbol?) and there is no known use for this.

We allow local-exec relocation on "symbol plus addend" since then the
addend just implements an offset into the object we're accessing (rather
than a new GOT entry for the location of "symbol plus addend").
There is also an existing testcase in the BFD linker to allow such
relocations.  The compiler can always avoid emitting these if it wants.
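
The three checks above, together with the Local-Exec exception, can be
sketched roughly as follows.  This is only an illustration: the enum,
the struct and the function name are hypothetical and are not the types
used by the BFD backend, and the message strings are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a relocation; not BFD's own types.  */
enum reloc_kind { RELOC_TLS_GD, RELOC_TLS_IE, RELOC_TLS_LE, RELOC_SIZE };

struct reloc_info
{
  enum reloc_kind kind;
  long addend;            /* Non-zero means "symbol plus addend".  */
  bool defined_locally;   /* Symbol is defined in the current binary.  */
};

/* Return NULL if the relocation is acceptable, else a message.  */
static const char *
check_tls_reloc (const struct reloc_info *r)
{
  assert (r != NULL);

  /* 1) Only Local-Exec relocations may carry an addend.  */
  if (r->addend != 0 && r->kind != RELOC_TLS_LE)
    return "relocation against symbol plus addend is not supported";

  /* 2) SIZE relocations need a symbol whose size is known.  */
  if (r->kind == RELOC_SIZE && !r->defined_locally)
    return "size relocation against symbol not defined in this binary";

  /* 3) Local-Exec relocations need a symbol in the current binary.  */
  if (r->kind == RELOC_TLS_LE && !r->defined_locally)
    return "local-exec relocation against symbol not defined in this binary";

  return NULL;
}
```

A caller would print the returned message, if any, and mark the link as
failed.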

Notes on implementation:
  - We choose to check errors in final_link_relocate rather than
    check_relocs since this is where most existing error checking is
    done.
  - We check for errors around addends in relocate_section rather than
    final_link_relocate or check_relocs since final_link_relocate does
    not get told the *original* relocation (before TLS relaxation) and
    check_relocs does not know about addends coming from the result of
    previous relocations on the same code.
    N.b. in order to emit multiple errors when there are multiple
    relocations with an addend we change things in relocate_section to
    store a "return value" in a local variable and set it to false if
    any problem was seen but not return early.
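
The "store the return value in a local and do not return early" pattern
from the last note looks roughly like this.  It is a minimal sketch with
a hypothetical, much simplified relocation record; the real
relocate_section has many more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified relocation record.  */
struct simple_reloc
{
  bool needs_got;   /* Relocation requires a GOT entry.  */
  long addend;
};

/* Visit every relocation, report each problem, and only report overall
   failure once the whole list has been visited.  */
static bool
relocate_all (const struct simple_reloc *relocs, size_t count,
              size_t *errors_seen)
{
  bool ret = true;   /* The "return value" kept in a local.  */

  assert (relocs != NULL || count == 0);
  *errors_seen = 0;
  for (size_t i = 0; i < count; i++)
    {
      if (relocs[i].needs_got && relocs[i].addend != 0)
        {
          fprintf (stderr, "reloc %zu: addend not allowed\n", i);
          ret = false;        /* Remember the failure...  */
          ++*errors_seen;
          continue;           /* ...but keep checking the rest.  */
        }
      /* A real implementation would apply the relocation here.  */
    }
  return ret;
}
```

With two bad relocations in the input, both are reported before the
function returns false.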

Remove layout_sections_again argument to size_stubs

This was originally the first place that a function in
bfd/elfnn-aarch64.c was given a reference to
gldaarch64_layout_sections_again, and hence was the natural place to
store the function onto the elf hash table.

Ever since the introduction of elfNN_c64_resize_sections we have been
performing this operation in that function before this size_stubs
function.

Hence it seems sensible to remove the argument and now superfluous
operation from elfNN_aarch64_size_stubs.

Implement Morello TLS relaxations

The majority of the code change here is around TLS data stubs.  The new
TLS ABI requires that when relaxing a General Dynamic or Initial Exec
access to a variable to a Local Exec access, the linker emits data stubs
in a read-only section.

We do this with the below approach:
  - check_relocs notices that we need TLS data stubs by recognising that
    some relocation will need to be relaxed to a local-exec relocation.
  - check_relocs then records a hash table entry mapping the symbol that
    we are relocating against to a position in some data stub section.
    It also ensures that this data stub section has been created, and
    increments our data stub section size.
  - This section is placed in the resulting binary using the standard
    subsection and wildcard matching implemented by the generic linker.
  - In elfNN_aarch64_size_dynamic_sections we allocate the actual buffer
    for our data stub section.
  - When it comes to actually relaxing the TLS sequence,
    relocate_section directly populates the data stub using the address
    and size of the TLS object that has already been calculated.  It
    then uses final_link_relocate to handle adjusting the text so that
    it points to this data stub.
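
The three phases above (reserve in check_relocs, allocate in
size_dynamic_sections, populate in relocate_section) can be sketched as
below.  Everything here is an assumption made for illustration: the
fixed-size array stands in for the hash table, the names are
hypothetical, the stub is assumed to hold an 8-byte offset followed by
an 8-byte size, and values are stored in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define STUB_SIZE 16   /* Assumed: 8-byte offset plus 8-byte size.  */
#define MAX_STUBS 64

struct stub_entry { const char *sym; size_t offset; };

struct stub_section
{
  struct stub_entry map[MAX_STUBS];   /* Stand-in for the hash table.  */
  size_t count;
  size_t size;          /* Grows during the check_relocs phase.  */
  uint8_t *contents;    /* Allocated in the sizing phase.  */
};

/* Return the index of SYM in the map, or -1 if it has no stub.  */
static int
stub_lookup (const struct stub_section *s, const char *sym)
{
  for (size_t i = 0; i < s->count; i++)
    if (strcmp (s->map[i].sym, sym) == 0)
      return (int) i;
  return -1;
}

/* Phase 1: reserve a slot for SYM unless it already has one.  */
static size_t
stub_reserve (struct stub_section *s, const char *sym)
{
  int idx = stub_lookup (s, sym);
  if (idx >= 0)
    return s->map[idx].offset;
  assert (s->count < MAX_STUBS);
  s->map[s->count].sym = sym;
  s->map[s->count].offset = s->size;
  s->count++;
  s->size += STUB_SIZE;
  return s->size - STUB_SIZE;
}

/* Phase 2: allocate the buffer once the final size is known.  */
static void
stub_allocate (struct stub_section *s)
{
  s->contents = calloc (1, s->size ? s->size : 1);
  assert (s->contents != NULL);
}

/* Phase 3: fill in the stub for SYM while relaxing its access.  */
static void
stub_populate (struct stub_section *s, const char *sym,
               uint64_t tls_offset, uint64_t tls_size)
{
  int idx = stub_lookup (s, sym);
  assert (idx >= 0 && s->contents != NULL);
  memcpy (s->contents + s->map[idx].offset, &tls_offset, 8);
  memcpy (s->contents + s->map[idx].offset + 8, &tls_size, 8);
}
```

Reserving the same symbol twice returns the same offset, which is what
lets several relaxed accesses to one variable share a single stub.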

Notes on implementation:

Mechanism by which we create and populate a TLS data section:
  There are currently three different ways by which the AArch64 backend
  creates and populates a special section.  These are
    - The method by which the .got (and related sections) are populated.
    - The method by which the interworking stubs are populated.
    - The method by which erratum 843419 stub sections are populated.

  We have gone with an approach that mostly follows that used to
  populate the .got.  Here we give an outline of the approaches and
  provide the reasoning by which the approach used by the .got was
  chosen.

  Handling the .got section:
  - Create a section on an existing BFD.
  - Mark that section as SEC_LINKER_CREATED.
  - Record the existing BFD as the `dynobj`.
  - bfd_elf_final_link still calls elf_link_input_bfd on the object.
  - elf_link_input_bfd avoids emitting the section (because of
    SEC_LINKER_CREATED).
  - bfd_elf_final_link then emits the special sections on `dynobj` after
    all the non-special sections on all objects have been relocated.
  - Allows updating the .got input section in relocate_section &
    final_link_relocate knowing that its contents will be output once all
    relocations on standard input sections have been processed.

   Handling interworking stub sections:
   - Create a special stub file.
   - Create sections on that stub file for each input section we need a
     stub for.
   - Manually populate the sections in build_stubs (which is called
     through `ldemul_finish` *before* `ldwrite` and hence before any
     other files are relocated).

   Handling erratum 843419 stub sections:
   - Create a special stub file.
   - Create sections on that stub file for each input section we need a
     stub for.
   - Ensure that the stub file is marked with class ELFCLASSNONE.
   - Ensure that the list of input sections for the relevant output
     section statement has the veneered input section *directly before*
     the stub section which has the veneer.
   - When relocating and outputting sections, having ELFCLASSNONE means
     that we output sections on the stub_file only when we see the
     corresponding input statement.  Without that class marker
     bfd_elf_final_link calls elf_link_input_bfd which writes out the
     data for all input sections on the relevant BFD.
   - Since we have ensured the input statement for our stub section is
     directly after the input statement for the section we are emitting
     veneers for, we know that the veneered section will be relocated
     and output before we output our stub section.
   - Hence we can copy relocated data from the veneered section into our
     stub section and know that our stub section will be output after
     this modification has been made.

  In deciding what to do with the read-only TLS data stubs we noticed
  the following problems with each approach:
  - The ABI requires that the read-only TLS data stubs are emitted into
    a read-only section.  This will necessarily be a different output
    section to .text where the requirement for these stubs is found.
    The temporal order in which output sections are written to the
    output file is tied to the order in which the in-memory linker
    statements are kept, and that is tied to the linker script provided
    by the user.  Hence we can not rely on ordering and ELFCLASSNONE to
    ensure that our data stub section is emitted after the relevant TLS
    sequences have been relaxed.  (We need to know our data stub section
    is written to the output after we have populated it as otherwise the
    data would not propagate to the resulting binary).
  - I think it is easier and simpler to find the data needed for the TLS
    data stubs in relocate_section just as we relax the relevant TLS
    sequences.  Hence I don't want to use the approach used for
    interworking stubs of populating the entire section beforehand.
  - Adding a section to `dynobj` would mean that we're adding a section
    to a user input BFD, which is not quite as clear as having a
    separate BFD for our special stub section.  It also means we treat
    this particular section as a "dynamic" section.  This is a little
    confusing nomenclature-wise.

  Based on the above trade-offs we chose the .got approach (accepting
  the negative that this will be stored on a user BFD).  N.b. using the
  .got approach and requiring the section get allocated in
  `size_dynamic_sections` is not problematic for static executables
  despite the nomenclature.  This function always gets called.

  One difference between how we handle the data stubs and how the .got
  is handled is that we do not count the number of data stubs required
  in size_dynamic_sections, but rather total it as we see relocations
  needing these stubs in check_relocs.
  We do this largely to avoid requiring another data member on all
  symbols to indicate information about whether this symbol needs a data
  stub and where that data stub is.  The number of TLS symbols is
  expected to be much smaller than the number of symbols with an entry
  in the GOT and hence a separate hash table just containing entries for
  those symbols which need such information is likely to often be
  smaller.

  N.b. it is interesting to mention that for all relocations which need
  a data stub we would make an input section on `dynobj` in
  `check_relocs` if that relaxation were not performed.  This is since
  if we did not realise they could be relaxed these relocations would
  have needed a .got entry.

  N.b. we must use make_section_anyway to create our TLS data stubs
  section in order to avoid any problems with our linker defined section
  having the same name as a section already defined by the user.

We do not use local stub symbols:
  The TLS ABI describes data stubs using specially named symbols.  These
  are not part of the ABI.  We could have associated the position of a
  data stub with a particular symbol by generating a symbol internally
  using some name mangling scheme that matches that in the TLS ABI
  examples and points to the data stub for a particular symbol.  We take
  the current approach on the belief that it is "neater" to avoid
  relying on such a name-mangling scheme and the associated sprintf
  calls.

final_link_relocate handling the adjusted relocation for data stubs:
  final_link_relocate does not actually use the `h` or `sym` arguments
  to decide anything for the two relocations we need to handle once we
  have relaxed an IE or GD TLS access sequence to a LE one.

  The relocations we need are BFD_RELOC_MORELLO_ADR_HI20_PCREL and
  BFD_RELOC_AARCH64_ADD_LO12.  For both (and in fact for most
  relocations) we only use `h` and `sym` for catching and reporting
  errors.

  This means we don't actually have to update the `h` and/or `sym`
  variables before calling elfNN_aarch64_final_link_relocate.

Allocate TLS data stubs on dynobj
  This uses the same approach that the linker uses to ensure that the
  .got sections are emitted after all relocations have been processed.

  elf_link_input_bfd avoids sections with SEC_LINKER_CREATED, and
  bfd_elf_final_link emits all SEC_LINKER_CREATED sections on the dynobj
  *after* standard sections have been relocated.

  This means that we can populate the contents of the TLS data stub
  section while performing relocations on all our other sections (i.e.
  in the same place as we perform the relaxations on the TLS sequences
  that we recognise need these data stubs).

Assert that copy_indirect_symbol is not a problem
  copy_indirect_symbol takes information from one symbol and puts it
  onto another.  The point is to ensure that any symbol which simply
  refers to another has all its cached information on that symbol to
  which it refers rather than itself.  If we could ever call this
  function on a symbol which we have found needs an associated data stub
  created, then we would have to handle adjusting the hash table
  associating a symbol with a data stub.  We do not believe this is
  needed, and add an assert instead.

  The proof that this is not a problem is a little tricky.  However it
  *shouldn't* be a problem given what it's handling.  This is handling
  moving cached information from an indirected symbol to the symbol it
  represents.  That is needed when the information was originally put on
  the indirected symbol, and that happens when the indirection was
  originally the other way around.  The two ways that this reversal of
  indirection can happen are through resolving dynamic weak symbols and
  versioned symbols.  Neither of these is something we can see with
  SYMBOL_REFERENCES_LOCAL TLS symbols (see below).

  We only need to worry about copy_indirect_symbol transferring
  information *from* a symbol which we have generated a TLS relaxation
  against to LE.

  In order to satisfy the criterion that we have generated a TLS
  relaxation hash entry against a symbol, we must already have run
  check_relocs.  This means that of the ways in which
  copy_indirect_symbol can be called we have eliminated all but
  _bfd_elf_fix_symbol_flags and bfd_elf_record_link_assignment.

  bfd_elf_record_link_assignment handles symbol assignments in a linker
  script.  Such assignments can not be made on TLS symbols (we end up
  generating a non-TLS symbol).

  _bfd_elf_fix_symbol_flags only calls copy_indirect_symbol on symbols
  which have is_weakalias set.  These are symbols "from a dynamic
  object", and we only ever call the hook when the real definition is in
  a non-shared object.  Hence we would not have performed this
  relaxation on the symbol (because it is not SYMBOL_REFERENCES_LOCAL).

  Hence I don't believe this is something that we can trigger and we add
  an assertion here rather than add code to handle the case.

Add new relocations to linker (excluding relaxations)

Some notes on the implementation decisions:

Use _bfd_aarch64_elf_resolve_relocation on :size: relocations
  This is unnecessary, since all that function does in the case of
  :size: relocations is to return the value it was given as an argument.
  For the analogous MOVW_G0 relocations this function adds the addend
  and emits a warning in the case of a weak undefined TLS symbol.

TPREL128/TLSDESC relocs now add size of symbol in fragment to satisfy
the ABI requirement.
  This only happens when we know the size of the relevant symbol.  We
  also emit the location of the symbol in a TPREL128 fragment when that
  is known too.

See PR for documentation https://github.com/ARM-software/abi-aa/pull/80

Implementation note:
Handling the size of a symbol according to whether the static linker
knows what it is was very slightly tricky.  Using the macro
`SYMBOL_REFERENCES_LOCAL` to check whether we knew the size of a symbol
is a problem.  That macro treats PROTECTED visibility symbols as *not*
local.  This is in order to handle the case where a reference to a
protected function symbol could end up having the value of an
executable's PLT (in order to handle function equality and hard-coded
addresses in an executable).

Since TLS symbols can not be function symbols (n.b. this refers to the
TLS object and not the resolver), this requirement does not apply.  That
means we should check this property with something like
`SYMBOL_CALLS_LOCAL` (which is the existing macro to treat protected
symbols differently).

Given the confusing nomenclature here, we add a new AArch64 backend
macro called `TLS_SYMBOL_REFERENCES_LOCAL` so that we have a nice name
for it.

N.b. in this patch we adjust all uses of `SYMBOL_REFERENCES_LOCAL` which
are known to be acting on TLS symbols.  This includes some places where
it does not matter whether a symbol is protected or not because the
condition also requires that we're in an executable (like in deciding
whether a relocation can be relaxed).  This was done simply for
conformity and neatness.

Add new relocations to GAS

Also add the ability to disassemble these relocations correctly.

Include checking that many different sizes work with different
instructions, and error checking that the `size` relocation is not
allowed in A64 mode.  Ensure that the size relocation is not allowed on
instructions other than mov[kz].

See the Arm ABI aaelf64-morello document for the definition of these
new relocations.

Regenerate bfd/bfd-in2.h and bfd/libbfd.h from bfd/reloc.c.

Adjust TLS relaxation condition

In aarch64_tls_transition_without_check and elfNN_aarch64_tls_relax we
choose whether to perform a relaxation to an IE access model or an LE
access model based on whether the symbol itself is marked as local (i.e.
`h == NULL`).

This is problematic in two ways.  The first is that sometimes a global
dynamic access can be relaxed to an initial exec access when creating a
shared library, and if that happens on a local symbol then we currently
relax it to a local exec access instead.  This usually does not happen
since we only relax an access if aarch64_can_relax_tls returns true and
aarch64_can_relax_tls does not have the same problem.  However, it can
happen when we have seen both an IE and GD access on the same symbol.
This case is exercised in the newly added testcase tls-relax-gd-ie-2.

The second problem is that deciding based on whether the symbol is local
misses the case when the symbol is global but is still non-interposable
and known to be located in the executable.  This happens on all global
symbols in executables.
This case is exercised in the newly added testcase tls-relax-ie-le-4.

Here we adjust the condition we base our relaxation on so that we relax
to local-exec if we are creating an executable and the relevant symbol
we're accessing is stored inside that executable.

Alongside that general fix, we adjust the existing exclusion parameters
for Morello relaxations.  Patches are in-flight to replace the existing
Morello TLS relocation handling with the more recent TLS ABI.  This
patch simply adjusts the existing handling to use a more robust method
to determine the case when a GD -> LE relaxation can be performed.

-- Updating tests for new relaxation criteria

Many of the tests added to check our relaxation to IE were implemented
by taking advantage of the fact that we did not relax a global symbol
defined in an executable.

Since a global symbol defined in an executable is still not
interposable, we know that a TLS version of such a symbol will be in the
main TLS block.  This means that we can perform a stronger relaxation on
such symbols and relax their accesses to a local-exec access.

Hence we have to update all tests that relied on the older suboptimal
decision making.

The two cases when we still would want to relax a general dynamic access
to an initial exec one are:
1) When in a shared library and accessing a symbol which we have already
   seen accessed with an initial exec access sequence.
2) When in an executable and accessing a symbol defined in a shared
   library.

Both of these require shared library support, which means that these
tests are now only available on targets with that.

I have chosen to switch the existing testcases from a plain executable
to one dynamically linked to a shared object as that doesn't require
changing the testcases quite so much (just requires accessing a
different variable rather than requiring adding another code sequence).

The tls-relax-all testcase was an outlier to the above approach, since
it included a general dynamic access to both a local and global symbol
and inspected for the difference accordingly.

This is the same logical change as
https://sourceware.org/pipermail/binutils/2022-July/121660.html

Standardise check for static PDE

We have hit multiple problems checking for a static non-PIE binary using
incorrect conditions.  In looking into a TLS relaxation that should not
have happened we found another.  To help avoid this problem in the
future (and to make reading the code a lot easier for someone who isn't
familiar with the BFD linker flags) we now perform the check for a
static PDE with a macro called `static_pde`.

N.b. this macro can only be used after we've created any needed dynamic
sections.  That happens when loading symbols, which is very early on and
hence before any of the places we want to use this macro.  However it's
still good to note it's not always a valid check.

Improve Morello feature detection

The value of HWCAP2_MORELLO changed for Linux Kernel 5.18, which breaks
Morello GDB's heuristic for detecting the Morello feature.

When possible, switch to detecting the Morello feature through the availability
of the NT_ARM_MORELLO register set (which means PTRACE_PEEKCAP and
PTRACE_POKECAP are also available). For corefiles, switch to using the presence
of the Morello register set section.

For extended-remote mode, check for the two possible values of HWCAP2_MORELLO.

[Morello GDB] Fix a couple hardware watchpoint issues around capabilities

This patch fixes a couple of issues around hardware watchpoint triggers related
to capabilities.

1 - When a capability changes and the hardware watchpoint triggers, Morello
GDB should display the tag state for new/old values correctly.

2 - Take the capability tag into consideration when checking for content
changes to the watched area. The 128 bits may be the same, but the tag state
may differ.
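
The second point amounts to comparing the tag as well as the bytes.  A
minimal sketch, using a hypothetical struct rather than GDB's own value
representation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct cap_value
{
  uint8_t bytes[16];   /* The 128 bits of capability contents.  */
  bool tag;            /* The out-of-band validity tag.  */
};

/* Return true if the watched capability changed in contents or tag.  */
static bool
cap_value_changed (const struct cap_value *old_val,
                   const struct cap_value *new_val)
{
  assert (old_val != NULL && new_val != NULL);
  return memcmp (old_val->bytes, new_val->bytes,
                 sizeof old_val->bytes) != 0
         || old_val->tag != new_val->tag;
}
```

Two values with identical bytes but different tags are reported as
changed, which a plain 128-bit comparison would miss.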

[Morello GDB] Fix AUXV reading/parsing for corefiles and remote targets

The last fix to enable GDB to read pure-cap AUXV information (with 128-bit
entries) only handled native GDB.

The following patch enables the same logic for corefiles and remote targets.

[Morello GDB] Fix bug in conditional definition of morello structs

The previous commit fixing this issue (bf5ddcecc07c2d89e824851f5f940ebe7e2af0fd)
failed to spot an issue with include ordering.

Fix all such issues with this patch.

Neaten up a clause in final_link_relocate

Originally this clause included lines checking `!bfd_link_pic &&
bfd_link_executable`.  I left these in to ensure that the new Morello
part to the clause did not interfere with the original stock AArch64
part of the clause.

Now that we have split the condition into multiple if statements for
clarity, we can remove the confusing parts of the clause.  This clause
is to catch any symbols that go in the GOT but would not be otherwise
given a relocation by finish_dynamic_symbol.  We can express that check
better with a modified condition.

What we want to do in this clause is to account for all GOT entries
which would not get a dynamic relocation otherwise, but need a RELATIVE
dynamic relocation for Morello.  This is any symbol for which
c64_should_not_relocate is false and WILL_CALL_FINISH_DYNAMIC_SYMBOL is
false.  Changing the clause to only mention these two predicates (plus
ensuring that we do not mess around with such relocations when creating
a relocatable object file rather than a final binary) explains the
purpose of this condition much better.

N.b. see the commit message of 8f5baae3d15 for the reasoning for the
original decision to not change the conditional.

Use global GOT type to determine GOT action

In final_link_relocate we currently use whether the relocation we're
looking at is a Morello relocation to decide whether we should treat the
GOT entry as a Morello GOT entry or not.

This is problematic since we can have an AArch64 relocation against a
capability GOT entry (even if it isn't a very useful thing to have).

The current patch decides whether we need to emit a MORELLO RELATIVE
relocation in the GOT based on whether the GOT as a whole contains
capabilities rather than based on whether the first relocation against
this GOT is a Morello relocation.

Until now we did not see any problem from this. Here we add a testcase
that triggers the problem.

Make emit-relocs-morello-6 work on different targets

This testcase has not been passing on linux targets for a while due to
the hard-coding of addresses. Here we use the record/check feature to
make the testcase more robust and explain what's getting checked a
little better.

Account for weak undefined symbols in Morello

Originally we believed we had accounted for these symbols within the
existing if conditional.  It turns out that with `-pie
--no-dynamic-linker` on the command line (which causes the `link_info`
member `dynamic_undefined_weak` to be set to 0) such symbols can bypass
elfNN_aarch64_allocate_dynrelocs putting these symbols into the dynamic
symbol table.  Hence we can have such symbols without a dynamic index
and our existing conditionals need to be adjusted.

On further inspection we notice that GOT entries for *hidden* undefined
weak symbols were still getting RELATIVE relocations.  This is quite
unnecessary since it's known that the entry should be the NULL
capability, but on top of that it relies on the runtime to have a
special case to not add the load displacement to RELATIVE relocations
with completely zero fragments.

We make two logical adjustments.
The first is that in our handling of CAPINIT relocations we add a
clause to avoid emitting a relocation for any undefined weak symbol
which we know for certain should end up with the NULL capability at
runtime.  In this clause we ensure that the fragment is completely zero.

The second is around handling GOT entries.  For these we ensure that
elfNN_aarch64_allocate_dynrelocs does not allocate a dynamic relocation
for the GOT entry of such symbols and that
elfNN_aarch64_final_link_relocate leaves the GOT entry empty and without
any relocation.

N.b. in implementing this change the conditionals became quite
confusing.  We split them up quite unnecessarily into different else/if
statements for clarity at the expense of verbosity.

We also add tests to check the behaviour of undefined weak symbols for
dynamically linked PDEs/PIEs/static executables/shared objects.

N.b.2 We also add an extra assert in final_link_relocate.  This function
deals with GOT entries for symbols both in the internal hash table and
not in the hash table.  Binutils decides whether symbols should be in
the hash table or not based on their binding.  WEAK binding symbols are
put in the hash table.  That said, final_link_relocate has a
`weak_undef_p` local flag to describe whether a given symbol is weak
undefined or not.  This flag is defined for both symbols in the hash
table and symbols not in the hash table.

I believe that the only time we have weak_undef_p set in
final_link_relocate when the relevant symbol is not in the hash table is
when we have "removed" a relocation from our work list by modifying it
to be a R_AARCH64_NONE relocation against the STN_UNDEF symbol (e.g.
during TLS relaxation).

Such cases would not fall into the GOT relocation clause.  Hence I don't
think we can ever see weak_undef_p symbols which are not in the hash
table in this clause.  It's worth an assertion to catch the possibility
that this is wrong.

Emit CAPINIT relocations for dynamically linked PDE's

Until now CAPINIT relocations were only emitted for position independent
code.  For a data relocation against a symbol in some other shared
object this was problematic since we don't know the address that said
symbol will be at.  We ended up emitting a broken RELATIVE relocation.

This also happened to be problematic for function pointers, since a
CAPINIT relocation did not ensure that a PLT entry was created in this
binary.  When a PLT entry was not created we again had a broken RELATIVE
relocation.

We could have fixed the problem with function pointers by ensuring that
a CAPINIT relocation caused a PLT entry to be emitted and the RELATIVE
relocation hence to point to that PLT entry.  Here we choose to always
emit a CAPINIT relocation and let the dynamic linker resolve that to a
local PLT entry if one exists, but if one does not exist let the dynamic
linker resolve it to the actual function in some other shared library.

Alongside this change we ensure that we leave 0 as the value in the
fragment for a CAPINIT relocation.  The dynamic linker already has to
decide which symbol to use, and it would have the value of the local
symbol available if it chooses to use it.  Hence there is no reason for
the static linker to leave the value of one option in the fragment of
this CAPINIT relocation.

This patch also introduces quite a few new testcases.
These check that we only add a special PLT entry as the canonical
address for pointer equality when a function pointer is accessed via
code relocations, and that this does not happen for accesses to data
pointers or accesses via CAPINIT data relocations.

Outside of the new testcases, we also adjust
emit-relocs-morello-3{,-a64c}.d.  These testcases checked for a CAPINIT
relocation in a shared object.  Now that we no longer populate that
fragment we need to adjust the testcase accordingly.

Use htab->c64_rel more, do not use GOT_CAP

Each symbol that has a reference in the GOT has an associated got_type.
For capabilities we currently have a new got_type of GOT_CAP for
capability entries in the GOT.

We do not allow capability entries in the GOT for an A64 (or hybrid)
binary, and only allow capability entries in the GOT for a purecap
binary.  Hence there is no need to maintain a per-symbol indication of
whether the associated GOT entry for this symbol is a capability or not.

There is already an existing flag on the hash table to indicate whether
the GOT contains capabilities or addresses.  We can replace every use of
the existing GOT_CAP with a check of this flag.

Doing such a transformation means we can not express an invalid state
(there is no longer any way to express a GOT which contains some
addresses and some capabilities).  It also solves a bug where we
introduce a PLT to be the canonical address of a function after having
seen a R_AARCH64_LDST128_ABS_LO12_NC relocation.  The existing manner of
deciding whether an entry in the GOT should be a capability or address
based on the relocation we generated it from could not work in a binary
when we only have this relocation.  It should be determined based on the
flags of the input object files we saw (i.e. are these purecap object
files or not).

N.b. this also fixes an observed problem that could have been fixed in
the existing regime.  In this case the JUMP_SLOT of a PLT entry added to
be the canonical address of a function which was addressed directly in
code (with both a Morello and AArch64 relocation) had got_type of
GOT_UNKNOWN (because it was simply not marked) and hence
elfNN_aarch64_create_small_pltn_entry was generating an AArch64
relocation because the GOT entry was not GOT_CAP.

This patch also adjusts the "should this GOT contain capabilities" flag
to report yes/no based on the EF_AARCH64_CHERI_PURECAP flag of the
inputs rather than based on whether we've seen any morello relocations
pointing into the GOT.
NOTE:  We do not remove the existing times where we set this flag based
on MORELLO relocations.  This is left for a future patch when we look
into the handling of hybrid code and the GOT.

N.b. this required two changes in the testsuite.
morello-capinit.d required updating since the size of the GOT section
was previously incorrectly calculated.  There is no GOT relocation in
this testcase, which meant that the existing method of finding the size
of the dummy first GOT entry was incorrect (gave the size of an AArch64
entry).  Since the size of the GOT is now different the PCC bounds are
now different, and we hence need to update the values checking for the
PCC bounds in this testcase.

We take this opportunity to make the testcase more robust by using the
new record/check testsuite feature.  This means the testcase now passes
on other targets (i.e. both bare-metal and none-linux).

emit-relocs-morello.d had a minor change for the same reason.  Since the
alignment requirement of the GOT changed, the start position changed
too.  When the start position changed objdump decided not to output an
extra line of 0000000.

Conditionally define user_morello_state and user_cap structs

If we don't have the Morello kernel headers, we need to define our own
structures to read capability registers and capabilities.

Make c-exp.y work with Bison 3.8+

When using Bison 3.8, we get this error:

../../gdb/c-exp.y:3455:1: error: 'void c_print_token(FILE*, int, YYSTYPE)' defined but not used [-Werror=unused-function]

That's because bison 3.8 removed YYPRINT support:
https://savannah.gnu.org/forum/forum.php?forum_id=10047

Accordingly, this patch only defines that function for Bison < 3.8.
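
The shape of such a guard is sketched below.  It relies on Bison
defining YYBISON in the parsers it generates; since this standalone
sketch is not Bison output, it supplies a value itself purely for
demonstration.  The helper name and the HAVE_TOKEN_PRINTER macro are
hypothetical and exist only to make the effect of the guard visible.

```c
#include <assert.h>
#include <stdio.h>

/* Bison-generated parsers define YYBISON.  This file is not generated
   by Bison, so pick a value for the demonstration.  */
#ifndef YYBISON
# define YYBISON 30800
#endif

#if YYBISON < 30800
/* Only older Bison versions use YYPRINT, so only they need this.  */
static void
print_token (FILE *file, int type)
{
  assert (file != NULL);
  fprintf (file, "token %d", type);
}
# define YYPRINT(FILE, TYPE, VALUE) print_token (FILE, TYPE)
# define HAVE_TOKEN_PRINTER 1
#else
/* Newer Bison: do not define the function, so nothing is unused.  */
# define HAVE_TOKEN_PRINTER 0
#endif
```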

Change-Id: I3cbf2f317630bb72810b00f2d9b2c4b99fa812ad

Teach gdbserver about 32-byte auxv entries for the PCuABI

In order for GDBServer to work correctly with the PCuABI, it needs to
understand that there are auxv entries of 16 bytes and 32 bytes.

Fix fetching of auxv for PCuABI

The entries of the auxv for the PCuABI are 128-bit in size. Teach GDB about
this so it can at least fetch some sane values. It currently can't fetch the
capabilities, but should be enough for now.
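
As an illustration only, reading the integer part of such entries could
look like the sketch below.  It assumes each entry is two equal-sized
fields (type, then value) and that the integer sits in the low 8 bytes
of each field in little-endian order; the demo_* names are hypothetical
and are not the GDB or gdbserver functions.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded auxv entry.  Only the integer parts are recovered; any
   capability metadata in a 32-byte entry is ignored.  */
struct demo_auxv_entry
{
  uint64_t type;
  uint64_t value;
};

/* Read a little-endian 64-bit integer from BUF.  */
static uint64_t
demo_read_le64 (const unsigned char *buf)
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--)
    v = (v << 8) | buf[i];
  return v;
}

/* Decode entry INDEX from an auxv image whose entries are ENTRY_SIZE
   bytes (16 or 32).  Returns 0 on success and -1 if the entry size is
   unsupported or the entry does not fit in LEN bytes.  */
int
demo_parse_auxv_entry (const unsigned char *data, size_t len,
                       size_t entry_size, size_t index,
                       struct demo_auxv_entry *out)
{
  if (entry_size != 16 && entry_size != 32)
    return -1;
  if (index >= len / entry_size)
    return -1;

  const unsigned char *entry = data + index * entry_size;
  out->type = demo_read_le64 (entry);
  out->value = demo_read_le64 (entry + entry_size / 2);
  return 0;
}
```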

Morello do not create RELATIVE relocs for dynamic GOT entries

For dynamic symbol GOT entries, the linker emits relocations for that
entry in finish_dynamic_symbol.

Since Morello capabilities always need dynamic relocations to initialise
GOT entries at runtime, we need to emit relocations for any capability
GOT entries.  Two examples which are not needed for non-Morello linking
are static linking and for global symbols defined and referenced in a
PDE.

In order to ensure we emit those relocations we catch them in the
existing clause of final_link_relocate that looks for GOT entries that
require relocations which are not handled by finish_dynamic_symbol.
Before this patch, the clause under which those relocations were emitted
would include dynamic GOT entries in dynamically linked position
dependent executables.
These symbols hence had RELATIVE relocations emitted to initialise them
in the executable's GOT by final_link_relocate, and GLOB_DAT relocations
emitted to initialise them by finish_dynamic_symbol.

The RELATIVE relocation is incorrect to use, since the static linker
does not know the value of this symbol at runtime (i.e. it does not know
the location in memory that the shared library will be loaded).

This patch ensures that the clause in final_link_relocate does not
catch such dynamic GOT entries by ensuring that we only catch symbols
when we would not otherwise call finish_dynamic_symbol.

N.b. we also add an assertion in the condition-guarded block, partly to
catch similar problems earlier, but mainly to make it clear that
`relative_reloc` should not be set when finish_dynamic_symbol will be
called.

N.b.2 The bfd_link_pic check is a little awkward to understand.
Due to the definition of WILL_CALL_FINISH_DYNAMIC_SYMBOL, the only time
that `!bfd_link_pic (info) && !WILL_CALL_FINISH_DYNAMIC_SYMBOL` is false
and
`!WILL_CALL_FINISH_DYNAMIC_SYMBOL (is_dynamic, bfd_link_pic (info), h)`
is true is when the below holds:
  is_dynamic && !h->forced_local && h->dynindx == -1

This clause is looking for local GOT relocations that are not in the
dynamic symbol table, in a binary that will have dynamic sections.
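
The difference between the two conditions can be checked with a small
model.  This is a sketch only: demo_will_call_fds is a local stand-in
written under the assumption that it mirrors the shape of
WILL_CALL_FINISH_DYNAMIC_SYMBOL, and the struct holds just the two
fields that predicate reads.

```c
#include <stdbool.h>

/* Minimal stand-in for the two hash-entry fields the predicate reads.  */
struct demo_hash_entry
{
  bool forced_local;
  long dynindx;
};

/* Local model of the predicate (assumed shape, not copied from BFD).  */
static bool
demo_will_call_fds (bool dyn, bool pic, const struct demo_hash_entry *h)
{
  return dyn
	 && (pic || !h->forced_local)
	 && (h->dynindx != -1 || h->forced_local);
}

/* The condition with the extra !pic term.  */
bool
demo_with_pic_check (bool dyn, bool pic, const struct demo_hash_entry *h)
{
  return !pic && !demo_will_call_fds (dyn, pic, h);
}

/* The condition without the !pic term.  */
bool
demo_without_pic_check (bool dyn, bool pic, const struct demo_hash_entry *h)
{
  return !demo_will_call_fds (dyn, pic, h);
}
```

For a PIC link with dynamic sections and a symbol that has
`forced_local == false` and `dynindx == -1`, the first function returns
false and the second returns true; for the same symbol in a non-PIC link
the two agree.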

This situation is the case that this clause was originally added to
handle (before the Morello specific code was added).  It is the case
when we need a RELATIVE relocation because we have a PIC object, but
finish_dynamic_symbol would not be called on the symbol.

Since all capability GOT entries need relocations to initialise them it
would seem unnecessary to include the bfd_link_pic check in our Morello
clause.  However the existing clause handling these relocations for
AArch64 specifically avoids adding a relocation for
bfd_link_hash_undefweak symbols.  By keeping the `!bfd_link_pic (info)`
clause in the Morello part of this condition we ensure such undefweak
symbols are still avoided.

I do not believe it is possible to trigger the above case that requires
this `bfd_link_pic` clause (where we have a GOT relocation against a
symbol satisfying):
  h->dynindx == -1 && !h->forced_local
  && h->root.type == bfd_link_hash_undefweak
  && bfd_link_pic (info) && bfd_link_executable (info)
I believe this is because when creating an undefweak symbol that has a
GOT reference we hit the clause in elfNN_aarch64_allocate_dynrelocs
which ensures that such symbols are put in `.dynsym` (and hence have a
`h->dynindx != -1`).  A useful commit to reference for understanding
this is ff07562f1e.

Hence there is no testcase for this part.  We do add some code that
exercises the relevant case (but does not exercise this particular
clause) into the morello-dynamic-link-rela-dyn testcase.

Predicate fixes around srelcaps and capability GOT relocations

This patch clears up some confusing checks around where to place
capability relocations initialising GOT entries.

Our handling of capability entries for the GOT had a common mistake in
the predicates that we used.  Statically linked executables need to have
all capability relocations contiguous in order to be able to mark their
start and end with __rela_dyn_{start,end} symbols.  These symbols are
used by the runtime to find dynamic capability relocations that must be
performed.  They are not needed when dynamically linking as then it is
the responsibility of the dynamic loader to perform these relocations.

We generally used `bfd_link_executable (info) && !bfd_link_pic (info)`
to check for statically linked executables.  This predicate includes
dynamically linked PDE's.  In most cases seen we do not want to include
dynamically linked PDE's.
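
A sketch of the distinction, using a hypothetical struct in place of the
real link info.  The `dynamic_sections` field is an assumption standing
for "this link creates dynamic sections"; it is not a BFD field name.

```c
#include <stdbool.h>

/* Hypothetical summary of the link being performed.  */
struct demo_link_info
{
  bool executable;        /* Producing an executable.  */
  bool pic;               /* Producing position independent output.  */
  bool dynamic_sections;  /* Dynamic sections will be created.  */
};

/* The predicate we generally used: true for any PDE, whether it is
   statically or dynamically linked.  */
bool
demo_is_pde (const struct demo_link_info *info)
{
  return info->executable && !info->pic;
}

/* The narrower predicate: a PDE with no dynamic sections, i.e. a
   statically linked executable.  */
bool
demo_is_static_pde (const struct demo_link_info *info)
{
  return demo_is_pde (info) && !info->dynamic_sections;
}
```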

This problem manifested in a few different ways.  When the srelcaps
section was non-empty we would generate the __rela_dyn_{start,end}
symbols -- which meant that these would be unnecessarily emitted for
dynamically linked PDE's.  In one case we erroneously increased the size
of this section on seeing non-capability relocations, and since no
relocations were actually added we would see a set of uninitialised
relocations.

Here we inspected all places in the code handling the srelcaps section
and identified 5 problems.  We add tests for those problems which can
be seen (some of the problems are only problems once others have been
fixed) and fix them all.

Below we describe what was happening for each of the problems in turn:

---
Avoid non-capability relocations during srelcaps sizing

elfNN_aarch64_allocate_dynrelocs increases the size for relocation
sections based on the number of dynamic symbol relocations.

When increasing the size of the section in which we store capability
relocations (recorded as srelcaps in the link hash table) our
conditional erroneously included non-capability relocations.  We were
hence allocating space in a section like .rela.dyn for relocations
populating the GOT with addresses of non-capability symbols in a
statically linked executable (for non-Morello compilation).

This change widens the original if clause so it should catch CAP
relocations that should go in srelgot, and tightens the fallback else if
clause in allocate_dynrelocs to only act on capability entries in the
GOT, since those are the only ones not already caught which still need
relocations to populate.

Implementation notes:
While not necessary, we also stop the fallback conditional checking
!bfd_link_pic and instead put an assertion that we only ever enter the
conditions block in the case of !bfd_link_pic && !dynamic.
This is done to emphasise that this condition is there to account for
all the capability GOT entries for the hash table which need relocations
and are not caught by the existing code.  The fact that this should only
happen when building static executables seems like an emergent property
rather than the thing we would want to check against.

This is tested with no-morello-syms-static.

---
size_dynamic_sections use srelcaps for statically linked executables
and srelgot for dynamically linked binaries.

When creating a statically linked executable the srelcaps section will
always be initialised and that is where we should put all capability
relocations.  When creating a dynamically linked executable the srelcaps
may or may not be initialised (depending on if we saw CAPINIT
relocations) and either way we should put GOT relocations into the
srelgot section.

Though there is no functional change to look for, this code path is
exercised with the morello-static-got test and
morello-dynamic-link-rela-dyn for statically linked and dynamically
linked PDE's respectively.

---
Capability GOT relocations go in .rela.got for dynamically linked PDEs

final_link_relocate generates GOT relocations for entries in the GOT
that are not handled by the generic ELF relocation code.  For Morello
we require relocations for any entry in the GOT that needs to be a
capability.

For *static* linking we keep track of a section specifically for
capability relocations.  This is done in order to be able to emit
__rela_dyn_{start,end} symbols at the start and end of an array of these
relocations (see commit 40bbb79e5a3 for when this was introduced and
commit 8d4edc5f8 for when we ensured that MORELLO_RELATIVE relocations
into the GOT were included in this section).

The clause in final_link_relocate that decides whether we should put
MORELLO_RELATIVE relocations for initialising capability GOT entries
into this special section currently includes dynamically linked PDE's.
This is unnecessary, since for dynamically linked binaries we do not
want to emit such __rela_dyn_{start,end} symbols.

While this behaviour is in general harmless (especially since both
input sections srelcaps and srelgot have the same output section in the
default linker scripts), this commit changes it for clarity of the code.
We now only put these relocations initialising GOT entries into the
srelcaps section if we require it for some reason.  The only time we do
require this is when statically linking binaries and we need the
__rela_dyn_* symbols.  Otherwise we put these entries into the `srelgot`
section which exists for holding GOT entries together.

Since this diff is not about a functional change we do not include a
testcase.  However we do ensure that the testcase
morello-dynamic-link-rela-dyn is written so as to exercise the codepath
which has changed.

---
Only ensure that srelcaps is initialised when required

In commit 8d4edc5f8 we started to ensure that capability relocations for
initialising GOT entries were stored next to dynamic RELATIVE
relocations arising from CAPINIT static relocations.

This was done in order to ensure that all relocations creating a
capability were stored next to each other, allowing us to mark the range
of capability relocations with __rela_dyn_{start,end} symbols.

We only need to do this for statically linked executables; for
dynamically linked executables the __rela_dyn_{start,end} symbols are
unnecessary.

When doing this, and there were no CAPINIT relocations that initialised
the srelcaps section, we set that srelcaps section to the same section
as srelgot.  Despite what the comment above this clause claimed we
mistakenly did this action when dynamically linking a PDE (i.e. we did
not *just* do this for static non-PIE binaries).

With recent changes that ensure we do not put anything in this srelcaps
section when not statically linking this makes no difference, but
changing the clause to correctly check for static linking is a nice
cleanup to have.

Since there is no observable change expected this diff has no
testcase, but the code path is exercised with morello-dynamic-got.

---
Only emit __rela_dyn_* symbols for statically linked exes

The intention of the code to emit these symbols in size_dynamic_sections
was only to emit symbols for statically linked executables.  We recently
noticed that the condition that has been used for this also included
dynamically linked PDE's.

Here we adjust the condition so that we only emit these symbols for
statically linked executables.

This allows initialisation code in glibc to be written much more simply,
since it does not need to determine whether the relocations have been
handled by the dynamic loader or not -- if the __rela_dyn_* symbols
exist then this is definitely a statically linked executable and the
relocations have not been handled by the dynamic loader.

This is tested with morello-dynamic-link-rela-dyn.

Account for LSB on DT_INIT/DT_FINI entries

When DT_INIT and/or DT_FINI point to C64 functions they should have
their LSB set. I.e. these entries should contain the address of the
relevant functions and not a slight variation on them.

This is already done by Morello clang, and we want GNU ld updated to
match.

Here we account for these LSB's for Morello in the same way as the Arm
backend accounts for the Thumb LSB. This is done in the
finish_dynamic_sections hook by checking the two dynamic section
entries, looking up the relevant functions, and adding that LSB onto the
entry value.

In our testcase we simply check that the INIT and FINI section entries
have the same address as the _init and _fini symbols.
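
A minimal sketch of that fixup over a simplified dynamic-entry array.
The `target_is_c64` flag is a hypothetical stand-in for looking up the
function symbol and checking whether it is C64 code; the tag values are
the standard ELF ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DT_INIT 12
#define DEMO_DT_FINI 13

/* Simplified dynamic entry.  TARGET_IS_C64 stands in for the result of
   the symbol lookup done by the real hook.  */
struct demo_dyn
{
  int64_t tag;
  uint64_t val;
  bool target_is_c64;
};

/* Set the LSB on DT_INIT/DT_FINI values that point at C64 functions and
   leave every other entry untouched.  */
void
demo_fixup_init_fini (struct demo_dyn *dyn, size_t count)
{
  for (size_t i = 0; i < count; i++)
    {
      bool init_or_fini = (dyn[i].tag == DEMO_DT_INIT
			   || dyn[i].tag == DEMO_DT_FINI);
      if (init_or_fini && dyn[i].target_is_c64)
	dyn[i].val |= 1;
    }
}
```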

Handle locally-resolving entries in the GOT

In standard AArch64 linking by the BFD linker, dynamic symbols in PIC
code have their dynamic relocations created by
elfNN_aarch64_finish_dynamic_symbol.  Any required information in the
relevant fragment is added by elfNN_aarch64_final_link_relocate.

Non-dynamic symbols that are supposed to go in the GOT have their
RELATIVE relocations created in elfNN_aarch64_final_link_relocate next
to the place where the fragment is populated.

The code in elfNN_aarch64_finish_dynamic_symbol was not updated when we
ensured that RELATIVE relocations against function symbols were
generated with the PCC base stored in their fragment and an addend
defined to make up the difference so that the relocation pointed at the
relevant function.

On top of this, elfNN_aarch64_final_link_relocate was never written to
include the size and permission information in the GOT fragment for
RELATIVE relocations that will be generated by
elfNN_aarch64_finish_dynamic_symbol.

This patch resolves both issues by adding code to
elfNN_aarch64_final_link_relocate to handle setting up the fragment of a
RELATIVE relocation that elfNN_aarch64_finish_dynamic_symbol will
create, and adding code in elfNN_aarch64_finish_dynamic_symbol to use
the correct addend for the RELATIVE relocation that it generates.
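
The relationship between the fragment base and the addend can be
sketched as below.  The struct and function names are hypothetical, and
using the PCC size as the fragment size is an assumption made for this
illustration; permissions are left out.

```c
#include <stdint.h>

/* Hypothetical view of the pieces of a RELATIVE capability relocation
   against a function symbol.  */
struct demo_cap_reloc
{
  uint64_t frag_base;  /* Base written into the fragment.  */
  uint64_t frag_size;  /* Size written into the fragment.  */
  uint64_t addend;     /* Addend stored on the relocation.  */
};

/* Build the relocation pieces: the fragment holds the PCC base and the
   addend makes up the difference to the function itself.  */
struct demo_cap_reloc
demo_function_reloc (uint64_t pcc_base, uint64_t pcc_size,
		     uint64_t func_addr)
{
  struct demo_cap_reloc r;
  r.frag_base = pcc_base;
  r.frag_size = pcc_size;
  r.addend = func_addr - pcc_base;
  return r;
}

/* Address the relocation points at: base plus addend.  */
uint64_t
demo_reloc_target (const struct demo_cap_reloc *r)
{
  return r->frag_base + r->addend;
}
```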

Implementation choices:

The check in elfNN_aarch64_final_link_relocate for "cases where we would
generate a RELATIVE relocation through
elfNN_aarch64_finish_dynamic_symbol" is believed to handle undefined
weak symbols by checking SYMBOL_REFERENCES_LOCAL on the belief that the
latter would not return true on undefined weak symbols.  This is not
as clearly correct as the rest of the condition, so seems reasonable to
bring to the attention of anyone interested.

We add an assertion that this is the case so we get alerted if it is
not.  We could choose to include !UNDEFWEAK_NO_DYNAMIC_RELOC in the
condition instead, but believe that would lead to confusion in the code
(i.e. why check something that will always be false).

Similarly, when we check against SYMBOL_REFERENCES_LOCAL to decide
whether to populate the fragment for this relocation this does not
directly correspond to `h->dynindx == -1` (which would indicate that
this symbol is not in the dynamic symbol table).
This means that our clause catches symbols which would appear in the
dynamic symbol table as long as SYMBOL_REFERENCES_LOCAL returns true.
The only case in which we know this can happen is for PROTECTED
visibility data when GNU_PROPERTY_NO_COPY_ON_PROTECTED is set.
When this happens a RELATIVE relocation is generated (since this is
an object we know will resolve to the current binary) and the static
linker provides the permissions and size of the associated object in the
relevant fragment.
This behaviour matches all other RELATIVE relocations and allows the
dynamic loader to assume that all RELATIVE relocations should have their
associated permissions and size provided.
We mention this behaviour since the symbol for this object will appear
in the dynamic symbol table and hence the dynamic loader *could*
determine the size and permissions itself.

In our condition to decide whether to update this relocation we include
a check that we `WILL_CALL_FINISH_DYNAMIC_SYMBOL`.  This is not
necessary, since the combination of conditions implies it, however it
makes things much clearer as to what we're checking for.

Testsuite notes:

When testing our change here we check:
  1) The addend and base of the RELATIVE relocation gives the required
     address of the hidden function.
  2) The bounds of the RELATIVE relocation are non-zero.
  3) The permissions of the RELATIVE relocation are executable.
Lacking in this particular test is a check that the PCC bounds are
calculated correctly, and that the base we define is the base of the
PCC.  We rely on existing tests to check our calculation of the PCC
bounds.

Allow WZR in alt-base loads and stores

The alt-base loads and stores allow WZR and XZR to be specified
as the register being loaded or stored. We were accepting the
XZR forms but not the WZR ones.

The easiest fix is to drop the separate Wt operand type. Most
other instructions handle the W/X distinction using the qualifiers
instead, and all instructions that used Wt already specified W
qualifiers.

Accept alternative-base LDRS[BHW] as an alias of LDURS[BHW]

Many load instructions have two forms: LDR<x> that takes either:

- a register index or
- an unsigned scaled immediate offset

and LDUR<x> that takes:

- a signed unscaled immediate offset in the range [-256, 255]

The assembler usually maps out-of-range LDR<x> offsets to LDUR<x>
where possible.  GAS does this using matching OP_* codes; see
try_to_encode_as_unscaled_ldst in gas/config/tc-aarch64.c.
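
The choice between the two immediate forms can be sketched as follows.
This is not the GAS code: the names are hypothetical, and the 12-bit
scaled field is the base A64 LDR range, used here as an assumption.

```c
#include <assert.h>
#include <stdint.h>

enum demo_ldst_form
{
  DEMO_FORM_SCALED,    /* LDR<x> with unsigned scaled immediate.  */
  DEMO_FORM_UNSCALED,  /* LDUR<x> with signed unscaled immediate.  */
  DEMO_FORM_NONE       /* Neither immediate form can encode it.  */
};

/* Pick an immediate form for OFFSET given the access SIZE in bytes.
   The scaled form is preferred; the unscaled form is the fallback for
   offsets in [-256, 255].  */
enum demo_ldst_form
demo_pick_form (int64_t offset, int64_t size)
{
  assert (size > 0);

  if (offset >= 0 && offset % size == 0 && offset / size <= 4095)
    return DEMO_FORM_SCALED;
  if (offset >= -256 && offset <= 255)
    return DEMO_FORM_UNSCALED;
  return DEMO_FORM_NONE;
}
```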

Some alternative-base Morello instructions also come in these
LDR/LDUR pairs, so we can use the same approach for them.

However, the alternative-base forms of LDRS[BHW] only support a
register index.  They do not have a register+unsigned scaled form.
There is therefore no OP_* pair linking alternative-base LDRS[BHW]
and LDURS[BHW] instructions.

This patch therefore treats immediate LDRS[BHW] as a straight alias
of LDURS[BHW].  Following existing practice, LDURS[BHW] is still the
preferred disassembly, so the patch uses F_P1 to force LDURS[BHW] to
be chosen ahead of LDRS[BHW].

Following the general preference for using immediate forms where
possible:

   ldrsb x0, [c0]

is treated as:

   ldursb x0, [c0, #0]

rather than:

   ldrsb x0, [c0, xzr]

Co-Authored-By: Stam Markianos-Wright <stam.markianos-wright@arm.com>

Add a size to __ehdr_start

This symbol is defined in a binary when there is a segment which
contains both the file header and the program header.  The symbol points
at the file header.  The point of this symbol is to allow the program to
robustly examine its own output.

Glibc uses this symbol.  This symbol is currently not marked as a
linker or linker script defined symbol, and hence does not get its
bounds adjusted.  The symbol is given zero size, and consequently any
capability initialised as a relocation to this symbol is given zero
bounds.

In order to allow access to read the headers this symbol points at, this
patch adds a size to the symbol.

We do not believe that the size of this symbol is used for anything
other than CHERI bounds, so we believe that this is a safe change to
make.  Setting the size of the symbol means that c64_fixup_frag uses
that size as the bounds to apply to a capability relocation pointing at
that symbol.  This allows access to the file and program headers loaded
into memory.

An alternative approach would be to *not* set the size of the symbol,
but only change the bounds of the relocation generated.  This would be
done by checking for the `__ehdr_start' name in c64_fixup_frag and
setting the size according to the `sizeof_ehdr' and
`elf_program_header_size' values stored on the output BFD object.
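
Either way the size in question is just the span of the two headers.  A
sketch of the arithmetic, assuming the program headers immediately
follow the file header (the function and parameter names are
hypothetical):

```c
#include <stdint.h>

/* Bytes covered by the file header followed by PHNUM program headers
   of PHENTSIZE bytes each.  */
uint64_t
demo_ehdr_start_size (uint64_t sizeof_ehdr, uint64_t phentsize,
		      uint64_t phnum)
{
  return sizeof_ehdr + phentsize * phnum;
}
```

For an ELF64 file (64-byte file header, 56-byte program header entries)
with 9 program headers this gives 64 + 9 * 56 = 568 bytes.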

We chose the approach to set the size on the symbol for code-aesthetic
reasons under the belief that having this size on the symbol in the
final binary is a slight benefit in readability for a user and causes no
downside.

I do not believe that Morello lld sets the bounds of a capability to this
symbol correctly.  That issue has been raised separately.

Fix c64-ifunc-2 test

This was newly failing because it was checking for a value *without* the
LSB set. In a recent commit we have fixed the bug which lost the LSB,
and that caused this test to fail.

Here we use the new testsuite implementation to test for "one plus the
location" rather than "one of the values A, B, C, ...", which is a
better representation of what we're trying to check.

Treat `start_stop` symbols as having section size

There is special handling to ensure that symbols which look like they
are supposed to point at the start of a section are given a size to span
that entire section.

GNU ld has special `start_stop` symbols which are automatically provided
by the linker for sections where the output section and input section
share a name and that name is representable as a C identifier.
(see commit cbd0eecf2)

These special symbols represent the start and end address of the output
section.  These special symbols are used in much the same way in source
code as section-start symbols provided by the linker script.  Glibc uses
these for the __libc_atexit section containing pointers for functions to
run at exit.

This change accounts for these `start_stop` symbols by giving them the
size of the "remaining" range of the output section in the same way as
linker script defined symbols.  This means that the `start` symbols get
section-spanning bounds and the `stop` symbols get bounds of zero.
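
A sketch of the "remaining range" calculation, with hypothetical names:

```c
#include <stdint.h>

/* Size given to a symbol at SYM_VALUE inside an output section that
   starts at SEC_VMA and is SEC_SIZE bytes long: everything from the
   symbol to the end of the section.  Symbols outside the section get
   zero.  */
uint64_t
demo_remaining_size (uint64_t sec_vma, uint64_t sec_size,
		     uint64_t sym_value)
{
  uint64_t sec_end = sec_vma + sec_size;

  if (sym_value < sec_vma || sym_value > sec_end)
    return 0;
  return sec_end - sym_value;
}
```

A `start` symbol at the section VMA gets the whole section size, and a
`stop` symbol at the section end gets zero.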

N.b. We will have to also account for these symbols in the
`resize_sections` function, but that's not done yet.

Record and check initial implementation

Really don't like that we use hard-coded addresses.  There are examples
in the existing testsuite that use options of hard-coded addresses, but
I want something more general that we can actually test the things we
need to test with.

Here we add an initial implementation to do such a thing.

This initial implementation has quite a lot of problems, but it adds
a lot in the fact that we can write testcases which should work across
different setups.
Hopefully we can work out the problems with use (or maybe identify that
the problems don't actually matter very much in practice) and eventually
upstream something better.

To document the problems:
  - The implementation means that recording something actually puts that
    into the regexp_diff namespace which could shadow existing variables.
  - We don't have a way to say "the previous value", but always have to
    write some TCL procedure to return that previous value.

Account for LSB in more relocations

The LSB on STT_FUNC symbols was missed in a few different places.

1) Absolute relocations coming from .xword, .word, and .hword
   directives and the lowest bit MOVW relocations did not account for
   the LSB at all.
2) Relocations for the ADR instruction only added the LSB on local
   symbols.

Here we account for these by adding the LSB in each clause in
elfNN_aarch64_final_link_relocate.
The change under the BFD_RELOC_AARCH64_NN clause handles absolute 64 bit
relocations, the change for BFD_RELOC_AARCH64_ADR_LO21_PCREL handles the
relocation on ADR instructions, and the extra relocations checked
against in the clause including BFD_RELOC_AARCH64_ADD_LO12 are the
remaining items.

N.b. we noticed the MOVW relocation problem because glibc's start.S was
using these direct MOV relocations to access the value of `main`.  Since
`main` is a function we need to include the LSB in the resulting
relocation value.  These relocations did not include the LSB from
STT_FUNC symbols.

Others were found from inspection of each relocation in turn.

Assign correct size on Morello TLS relocations

The previous code was not actually using the size of a symbol when the
symbol was in the hash table.  This meant that our TLS relaxations
created an instruction sequence with bounds of zero so that the GCC TLS
instruction sequence eventually ended up giving a length-zero
capability.

Also handle extra size of pointers in TCB for c64.  For purecap we have
16 byte pointers.  Hence the TCB is 32 bytes.  This was not yet handled
in our relaxations.

Here we determine whether to use a 32 or 16 byte TCB based on the flags
of the current BFD (i.e. whether this is a purecap binary that we're
creating).
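
A sketch of the size selection and of how it shifts the offset for
modules loaded at startup.  The names are hypothetical, and placing the
TLS block at the TCB size rounded up to the TLS segment alignment is an
assumption about the layout made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The TCB is two pointers: 16 bytes normally, 32 bytes for purecap
   where pointers are 16 bytes.  */
uint64_t
demo_tcb_size (bool purecap)
{
  return 2 * (purecap ? 16u : 8u);
}

/* Offset from the thread pointer of a symbol SYM_OFFSET bytes into the
   startup TLS block, with the block placed after the TCB at the next
   TLS_ALIGN boundary.  TLS_ALIGN must be a power of two.  */
uint64_t
demo_startup_tp_offset (bool purecap, uint64_t tls_align,
			uint64_t sym_offset)
{
  assert (tls_align != 0 && (tls_align & (tls_align - 1)) == 0);

  uint64_t tcb = demo_tcb_size (purecap);
  uint64_t block_start = (tcb + tls_align - 1) & ~(tls_align - 1);
  return block_start + sym_offset;
}
```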

Testcases are updated to account for the fact that the length
of the capability to the symbol itself is now sometimes non-zero and for
the different offset required into the TLS block for modules loaded at
startup time.

Account for LSB on c64 e_entry in the same way as Thumb

The handling is done by putting the value that we want in a buffer and
using that as the entry_symbol.name which lang_end picks up.

Another option would be to find the entry symbol *after* lang_end has
finished (e.g. in elfNN_aarch64_init_file_header) and add the LSB to it
if that symbol is a C64 symbol.

This approach was mainly chosen in order to match more closely what
Thumb has done.

N.b. we set the LSB based on the LSB of the entry point symbol.
If the entry point symbol is in c64 code but is not an STT_FUNC (e.g.
it is an STT_NOTYPE) then the LSB will not be set.
This matches Morello clang behaviour.
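
The rule in the note above can be sketched as below.  The names are
hypothetical; the symbol type values are the standard ELF ones.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_sym_type
{
  DEMO_STT_NOTYPE = 0,
  DEMO_STT_FUNC = 2
};

/* Entry address to record for a symbol at ADDR.  The LSB is only set
   when the symbol is a function in C64 code; an untyped symbol in C64
   code keeps its plain address.  */
uint64_t
demo_entry_value (uint64_t addr, enum demo_sym_type type, bool in_c64)
{
  if (in_c64 && type == DEMO_STT_FUNC)
    return addr | 1;
  return addr;
}
```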

Adjust which sections we resize for precise bounds

Before this change we would ensure the ability for precise bounds on any
section which had a linker defined symbol pointing in it *unless* that
linker defined symbol looked like a section starting symbol.

In that case we would adjust the *next* section if the current section
had no padding between this one and the next.

I believe this was a mistake.  The testcase we add here is a simple case
of having a `__data_relro_start` symbol in the .data.rel.ro section, and
that would not ensure that the .data.rel.ro section were precisely
padded.  The change we make here is to perform padding for precise
bounds on all sections with linker defined symbols in them *and* the
next section if there is no padding between this and the next section.

This is a huge overfit and we do it for the reason described in the
existing comment (that we have no information on the offset of this
symbol within the output section).

In the future we may want to remove this padding for linker script
defined symbols which do not look like section start symbols.  We would
do this in conjunction with changing the bounds we put on such linker
script defined symbols.  This would be for another patch.

resize_sections Add testsuite changes

Unfortunately writing a test for many of these changes was difficult
since we needed to check that thing A pointed to place B or that
thing A had a length that spanned places B and C.
Hence quite a few of the added testcases check for literal values.

These are problematic for a few different reasons, both that they may
not work with different configurations and because unrelated updates
need to update the tests.

We need to figure out a better way to test this.  We believe the
testsuite doesn't have facilities for checking this. For now we're
leaving making the tests robust for later.

Here we add all the tests which the recent patch series changes and
include updates to the existing tests.

Updated existing tests:

emit-relocs-morello-2.d
  Needed updating since the new alignment meant the relocation fragment
  pointed to a different place.
emit-relocs-morello-7.d
  Needed updating because we added some padding, and re-laid the
  sections.  This just ended up with a layout with the .text section at
  a different location and with a different range.
emit-relocs-morello-8.d
  Just needed updating because the extra padding moved the .data.rel.ro
  section a bit.  The position of the relocation was not part of the
  test so we replaced that with an "any number" regex.
emit-morello-reloc-markers-{2,3}.d
  Needed updating because the PCC bounds required extra alignment and
  that was put on the first section in the PCC bounds.  That happened to
  be the section we were looking at.
morello-capinit.d
  Needed updating because this change actually fixed a problem with our
  calculation of the pcc bounds.  Now we calculate the pcc_low and
  pcc_high *after* having made all our adjustments, the bounds we add on
  the capability start at the first section we want to contain.
morello-stubs.d
  Updated just because the location of our functions was changed due to
  the text section change.
morello-stubs-static.d
  Updated because the location of the .text_low section changed along
  with all the code in it.  Hence our stubs that needed to point to
  these functions also changed.
c64-ifunc-{2,2-local,3a}.d
  Updated because the .text section moved and hence the address of the
  foo function we wanted to point to was adjusted.
morello-sizeless-{got,global,local}-syms.d
  Again, .data .bss and .got sections ended up moved so addresses of the
  capabilities we wanted to point at all changed.

Rework the resize_sections function

Now instead of iterating through relocations recording changes to make
and sorting those changes according to the section VMA before going on
to make the changes in section VMA order, we instead modify the sections
as we iterate through relocations.

This change can be done now that the alignment is always done, even if
the VMA of the start and end was good for precise bounds and now that
padding is added via an expression rather than setting the point to a
specific location. Having these two things means that changing the
layout of earlier sections does not affect the precise bounds property
of later sections.

It also makes things easier that we keep the padding of a section inside
that section, so can tell whether a section has the correct size just
from the size of the section rather than recording in an out-of-band manner
which sections have had their padding assigned.

This patch makes no functional change, but simply changes the code to be
more readable.

Pad and align sections in more cases

Before this change individual sections would not be padded or aligned if
there was no C64 code or if there were no capability GOT relocations in
the binary.  This meant that if we had a data-only PURECAP shared
library with a CAPINIT relocation in it pointing at something, then the
section that relocation pointed into would not be padded accordingly.

This patch changes this so that we look for sections which may need
individual padding if we see the PURECAP elf header flag, and if there
is a `srelcaps` section (i.e. the RELATIVE capability relocations).

We keep the behaviour that we do not adjust the size of sections unless
there is a static relocation pointing at a zero-sized symbol at that
section.  That is, we do not make any adjustment to try and handle
section padding in the case where other binaries would dynamically link
against such symbols.  We do this since the "symbol pointing to section
start implies spanning entire section" decision is a hack to enable some
linker script uses, and we don't want to extend it without a known
motivating example.

Finally, this patch ignores padding PCC bounds on PURECAP binaries if
there is no C64 code in this binary, but ensures that the PCC bounds are
made precise even if there are no static relocations in the file.
We could still get the current PCC and offset it using `adr` if there
are no static relocations, but without there being any C64 code there
will be no PCC to bound and hence we don't need the bounds on a
hypothetical PCC to be precise.

Always ensure that the PCC bounds are precise for Morello

The mechanism by which we were ensuring the PCC bounds were made precise
for Morello happened to only work if there were some sections which
needed to be made precisely representable for Morello individually.
This was because we included the PCC bounds calculation in the `queue`
iteration that only iterated over adjustments to individual sections.

Here we move the PCC bounds calculation to after the `queue` iteration
so that we always perform this operation.

We suspect the original implementation was chosen to ensure that padding
was added in sequential section ordering.  This ordering seems to have
been in order to ensure that padding at one position would not adjust
sections that had already been adjusted (because padding one section
changes the location of the sections after it).

We have already found in previous patches that this approach was not
sufficient to ensure an adjustment being permanent.  The alignment
change to the first section that the PCC should span can change the
location of all sections after it, or the linker can simply have extra
space before .text that it removes on a call to layout_sections_again.

In the patches to fix those problems, we have adjusted the code here to
represent the padding in a way that stays stable across changes.  That
has meant that the iteration in VMA order is no longer necessary, and
that means that our movement of the PCC bounds calculation to outside of
the `queue` iteration loop can be performed.

One interesting part of this adjustment is that given a set of sections,
the length of memory that they span can change if the first section's
alignment is adjusted.
For example, if we have the below:
  sectionA   VMA 0xf   size 0x1   alignment 0x1
  sectionB   VMA 0x10  size 0x10  alignment 0x10
Then aligning sectionA to 0x10 gives the below:
  sectionA   VMA 0x10  size 0x1   alignment 0x10
  sectionB   VMA 0x20  size 0x10  alignment 0x10
The total range of the first case is [0xf -> 0x20] for a size of 0x11
and of the second case it is [0x10 -> 0x30] for a size of 0x20.

This means that we should handle the alignment adjustment for the PCC
bounds first, and must handle it in a loop to ensure that we handle the
case that this change in length requires an extra alignment.
Only then do we know the size that we want to add into the last section
in the range so that the entire bounds are correct.
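
The sectionA/sectionB example above can be reproduced with a small
sketch.  The `struct sec` type and the `span_of` helper are hypothetical
and are not part of BFD; they only model laying sections out one after
another, each honouring its own alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section description for this sketch, not a BFD type.  */
struct sec
{
  uint64_t size;
  uint64_t align;   /* Must be a power of two.  */
};

/* Round V up to a multiple of ALIGN (a power of two).  */
static uint64_t
align_up (uint64_t v, uint64_t align)
{
  return (v + align - 1) & ~(align - 1);
}

/* Lay the sections out in order starting at DOT and return the length
   of memory from the start of the first to the end of the last.  */
static uint64_t
span_of (const struct sec *secs, size_t n, uint64_t dot)
{
  uint64_t first = 0;

  for (size_t i = 0; i < n; i++)
    {
      dot = align_up (dot, secs[i].align);
      if (i == 0)
        first = dot;
      dot += secs[i].size;
    }
  return n ? dot - first : 0;
}
```

Starting at 0xf, sizes {0x1, 0x10} with alignments {0x1, 0x10} span
0x11 bytes, while the same sizes with alignments {0x10, 0x10} span 0x20
bytes, matching the figures above.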

elfNN_c64_resize_section always sets alignment

Before this patch we would only change the alignment of a section if it
did not have a start and end address that was aligned properly.

This meant that there was nothing stopping the alignment of this section
degrading in the future.  At first glance this looks like it would not
be a problem since this function only adjusts sections in order of
increasing VMA (hence it would seem that the alignment of the current
section can not be reduced).

However, in some cases layout_sections_again can be seen to reduce the
alignment of sections if there was some initial space before the .text
section that it shrinks for some reason.  This led to a degradation of
the alignment of all sections after that point (until another highly
aligned section).

The testcase added for this change (in the final "testsuite" commit of
this patch series) is a good example of this.  On first entry to the
elfNN_c64_resize_sections function .text happened to have a start
address of 0xb0 (which meant that .data.rel.ro was also aligned to such
a boundary and the function did not believe there was a need to align
.data.rel.ro to a 16 byte boundary).  However, after the first call to
layout_sections_again this changed to 0x78, reducing the alignment of
.data.rel.ro in the process.

Add padding with an expression rather than a hard-address

When adding padding to ensure section bounds do not overlap we were
implementing the padding using `lang_add_newdot`.  This interacts with
what is essentially an in-memory linker script that the linker will
use at the very end to emit its sections according to those rules that
have been built up.

`lang_add_newdot` is essentially the same as defining the position to be
the given address in a linker script.  This means that all sections
after this point in the linker script will be at an address starting
from this known address.

I.e. the method by which we add padding is essentially changing the
description of how we will lay a binary out from:

<sections before the padded one>
<section to be padded>
<sections after the padded one>

to the description:
<sections before the padded one>
<section to be padded>
<current position must be 0x[number calculated now]>
<sections after the padded one>

This works fine in most cases.  The address we calculate is a known-good
value and sections after this "point" are moved to after the known-good
value.

However, the fact that we choose a specific value when we call
`c64_pad_section` means that adjusting sections which occur *before* the
current point will not change anything that occurs after it.

I.e. a description of

<sections before the padded one>
<section to be padded>
<current position must be 0x[number calculated now]>
<sections after the padded one>

being changed to a description of

<sections before section X>
<New padding>
<sections before padded one>
<section to be padded>
<current position must be 0x[number calculated now]>
<sections after the padded one>

leaves the `<sections after the padded one>` with the same address.
This can lead to the padded section and the section after it
overlapping.

This rarely happens, because our padding always happens after a section
and we iterate over sections in memory order.  However, when we align
the very start of the PCC range in order to produce precise bounds
across this range that can change the start position of the first
section that should be spanned by the PCC range.

Since it can change the start position we can hit the problem described
above.  This happens when attempting to build glibc.  It causes an error
message like the one below.
  section .got LMA [000000000053c0c0,000000000053cfff] overlaps section .data.rel.ro LMA [0000000000525fe0,000000000053d08f]

This patch solves this problem by adding an entry into this in-memory
linker script that describes padding without specifying a given address.
I.e. the outline of the script we produce becomes

<sections before the padded one>
<section to be padded>
<current position goes from P to P+0x[padding calculated now]>
<sections after the padded one>

This is safe w.r.t. adjustments occurring before the padding we have
inserted, and it avoids the error we noticed when trying to build
glibc.
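
The difference between the two kinds of padding can be modelled with a
toy statement list.  This is only a sketch: the `struct stmt` type, the
statement kinds and `layout_overlaps` are hypothetical stand-ins for the
in-memory linker script, not the real ld data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statement kinds for this sketch.  */
enum stmt_kind
{
  STMT_SECTION,   /* value is the section size.  */
  STMT_SET_DOT,   /* value is an absolute address for the position.  */
  STMT_PAD        /* value is a relative amount to advance by.  */
};

struct stmt
{
  enum stmt_kind kind;
  uint64_t value;
};

/* Walk the statements starting at DOT.  Return true if any section
   starts before the end of the section laid out before it.  */
static bool
layout_overlaps (const struct stmt *s, size_t n, uint64_t dot)
{
  uint64_t prev_end = 0;
  bool overlap = false;

  for (size_t i = 0; i < n; i++)
    switch (s[i].kind)
      {
      case STMT_SECTION:
        if (dot < prev_end)
          overlap = true;
        dot += s[i].value;
        prev_end = dot;
        break;
      case STMT_SET_DOT:
        dot = s[i].value;
        break;
      case STMT_PAD:
        dot += s[i].value;
        break;
      }
  return overlap;
}
```

With an absolute STMT_SET_DOT after the padded section, inserting new
padding earlier in the list pushes the padded section past the fixed
address and the following section overlaps it.  With a relative
STMT_PAD the following section moves along with everything before it.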

We also fix up some other bugs in this area around double-padding
sections.

First, the calculation of the padding required was based on the
output section VMA and size.  The calculation was done by taking the
current start and end VMA then finding the resulting start and end VMA
that we want using c64_valid_cap_range.  Then we calculated the padding
we wanted by finding the difference between the current and requested
end VMAs.  This ignored the fact that the output section was also
getting aligned, which would change the start VMA -- hence the resulting
end VMA would not end up where we wanted.
Here we do the calculation of how much padding to add based on the size
we want rather than based on the ending VMA we want.
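
A minimal sketch of the two calculations follows.  The helper names and
the numbers in the note below are hypothetical; the wanted range is
simply passed in rather than obtained from c64_valid_cap_range.

```c
#include <stdint.h>

/* Old approach (sketch): pad up to the end VMA we wanted.  This is
   only right if the start VMA stays where it was.  */
static uint64_t
padding_from_end (uint64_t cur_start, uint64_t cur_size, uint64_t want_end)
{
  return want_end - (cur_start + cur_size);
}

/* New approach (sketch): pad up to the size we wanted, which holds
   wherever the aligned start ends up.  */
static uint64_t
padding_from_size (uint64_t cur_size, uint64_t want_size)
{
  return want_size - cur_size;
}
```

For a section at 0x1008 of size 0x3ff0 where the wanted range is
[0x1000, 0x5000), the first helper gives 0x8 of padding and hence a
section size of 0x3ff8, while the second gives 0x10 and hence the wanted
size of 0x4000.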

Second, the reported size of the output section was not changing after
adding our padding.  This meant that the second time around this loop
(if for example a relocation into a given section was used in more than
one place and hence this section was enqueued twice) we would again find
that the section size was not padded and try again.  We fix this by
introducing the padding statement to the output section statement
children rather than to the main statement list.  This means that the
padding will be accounted for in the output section size and hence the
loop will avoid padding this section again.

Just to note: LLD does not report the sizes of sections including their
padding.  This is so that programs which read binary information (such
as readelf and objdump) do not need to read the padded zeros in the
file.  We choose to include this padding in the section size information
on the premise that it is usually quite small and that the output from
these programs is then more readable.  The bug that we fixed by
including this padding in the size of the output section could be fixed
in another way.

PCC bounds now span READONLY and RELRO sections

Before this they would span sections which are SEC_CODE or some specific
known sections like the GOT and PLT.

This is not enough, since the compiler can want to access .rodata via
relative offsets to PCC.  Hence we need to include READONLY sections.

Similarly, we want to include .data.rel.ro sections in the PCC bounds so
that they can be accessed via PCC -- this allows the capability
indirection table to be accessed.

We had not noticed this until now because the default linker
script happens to order sections such that the PCC being required to
span .got and .text happens to end up including these problematic
sections.

RELRO sections are a bit interesting since the fact they are RELRO is
not recorded anywhere on the section itself.  Rather it is stored in the
fact that the section is covered by the RELRO segment.

This means that we need to check if the section's VMA is within the
relevant range rather than just look at the section.  This turns out to
be pretty easy since we have a structure containing the RELRO range.
However, we do need to ensure that we don't mix up the uses of the
section VMA and the RELRO start and end around calls of
layout_sections_again since this call can change both.
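
The range check itself can be sketched as below.  The `struct
relro_range` type and `section_is_relro` are hypothetical names for this
illustration, and the range is assumed to be half-open.  Both the
section VMA and the range would need to be read again after any call to
layout_sections_again.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the RELRO segment: [start, end).  */
struct relro_range
{
  uint64_t start;
  uint64_t end;
};

/* Return true if a section starting at VMA is covered by the RELRO
   range R.  An empty range covers nothing.  */
static bool
section_is_relro (uint64_t vma, const struct relro_range *r)
{
  return vma >= r->start && vma < r->end;
}
```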

Return the alignment required from c64_valid_cap_range

We were specifying section alignment requirements based on the alignment
that the section base happened to have. This sometimes resulted in very
strange alignment requests that were much greater than actually
required.
That is not usually a problem, but it does give unnecessary padding upon
re-adjustments due to changing the PCC bounds after individual sections
have been padded.

This patch adds an interface such that we return the alignment actually
required for exact capability bounds from c64_valid_cap_range. We then
use that alignment as our alignment requirement on the sections which
have a section-sized symbol associated with them.

Provide default permissions if section has no permission flags

The permissions that a capability to an object should end up with are
based on the section it should point into.  With symbols that point into
SHN_ABS sections we have nothing to base the permissions on (since these
sections don't have associated permission flags).

For the moment we are making a default of choosing Read-Write
permissions and warning the user about it.  The permissions match what
Morello LLD currently does (from observation).
When Morello linkers use the symbol type to determine whether a
capability should have executable permissions or not, this should end up
being able to handle all uses (since STT_FUNC would get RX perms while
everything else gets RW perms).

In the only case we know of in the GNU team the symbol ends up with
zero-size anyway, so the choice of Read-Write doesn't seem too lax.
(Having zero-size is fine for the use-case we know of in glibc, since
that use case simply checks if the address of the symbol is non-zero.
Hence we have no need as yet to dereference the symbol).

The use case we know about is the `_nl_current_<LANG>_used` symbols
defined with `_NL_CURRENT_DEFINE` in the locale/lc-<lang>.c files in
statically linked glibc.  If any case that requires non-zero size or
different permissions becomes important then something more will be
required across the toolchain.

Error linking binaries with differing e_flags.

This commit is partly changing two existing (believed buggy) behaviours
in elfNN_aarch64_merge_private_bfd_data and partly accounting for a
capability-specific requirement.

The existing behaviours in elfNN_aarch64_merge_private_bfd_data were:
1) It returned `TRUE` by default.  This effectively ignored the ELF
   flags on the binaries, despite there being code looking at them.
2) We do not mark the output BFD as initialised until we see flags with
   non-default architecture and flags.  This can't tell the difference
   between linking default objects to non-default objects if the default
   objects are given first on the command line.

The capability-specific requirement is:
- This function originally returned early if the object file getting
  merged into the existing output object file is not dynamic and has no
  code sections.  The code reasoned that differing ELF flags did not
  matter in this case since there was no code that would be expecting
  it.
  For capabilities the binary compatibility is still important.
  Data sections now contain capabilities as pointers, got sections now
  have a different got element size.
  Hence we avoid this short-circuit if any of the flags we're checking
  are the CHERI_PURECAP flag.
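
The short-circuit described above can be sketched as follows.  This is
an illustration only: `flags_compatible` is a hypothetical helper, the
comparison of flags is simplified to plain equality, and the value used
for the purecap flag is an assumption that should be checked against
aaelf64-morello.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of EF_AARCH64_CHERI_PURECAP for this sketch.  */
#define SK_EF_CHERI_PURECAP 0x00010000u

/* Return true if an input object with IN_FLAGS may be merged into an
   output with OUT_FLAGS.  */
static bool
flags_compatible (uint32_t out_flags, uint32_t in_flags,
                  bool in_is_dynamic, bool in_has_code)
{
  bool purecap = ((out_flags | in_flags) & SK_EF_CHERI_PURECAP) != 0;

  /* Objects without code used to be accepted whatever their flags.
     Keep that only when no purecap flag is involved, since data and
     GOT layout differ for capabilities.  */
  if (!in_is_dynamic && !in_has_code && !purecap)
    return true;

  return out_flags == in_flags;
}
```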

Only warn on badly sized symbols

The reasoning behind only warning for symbols which have a size which
cannot be precisely bounded is that there is nothing *requiring* precise
bounds, GCC knowingly avoids changing the size of some symbols for
precise bounds (TLS and symbols with user-specified alignment and
user-specified section), and LLD only warns on imprecise bounds rather
than erroring.
N.b. the reasoning for GCC avoiding padding in these cases is explained
in the commit message of b302420cb55 in the GCC branch
vendors/ARM/heads/morello.

All in all it's not something that we want in our toolchain as a
requirement, and it's not something that other toolchains have as a
requirement, so there doesn't seem to be much of a reason to include it.

In order to make this warning a little nicer for anyone reading it, we
add the name of the symbol to the warning. Update the testsuite to
account for this.

Co-Author: Alex Coplan <alex.coplan@arm.com>

Fixing cap_meta

It had two problems:

1) The linker was storing permission flags in the bottom byte and the
   size in the top 56 bits.  Newlib was looking for the permission flags
   in the top byte and the length in the bottom 56 bits of a uint64_t
   stored as bytes 8:16 of the fragment.
   N.b. The ABI requires a given storage order between the size and
   permission flags (as opposed to requiring a given uint64_t value be
   stored in the relevant position).  This means that our current
   implementation would not work for a hypothetical big-endian Morello.
2) The linker prioritised SEC_READONLY flags over SEC_CODE ones on the
   section.  This meant that function symbols into the .text section
   (which has both flags on it) would be given read-only permissions
   rather than executable permissions.
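
Both fixes can be sketched as below.  The `SK_` flag values and the
permission encodings are placeholders chosen for this illustration, not
the BFD or ABI values; only the bit positions (permissions in the top
byte, length in the bottom 56 bits) come from the description above.

```c
#include <stdint.h>

/* Placeholder section flags for this sketch.  */
#define SK_SEC_CODE      0x1u
#define SK_SEC_READONLY  0x2u

/* Placeholder permission encodings for this sketch.  */
enum sk_perm
{
  SK_PERM_RO = 1,
  SK_PERM_RW = 2,
  SK_PERM_RX = 4
};

/* Check for code before read-only, so that .text (which has both
   flags) gets executable permissions.  */
static enum sk_perm
perm_for_section (unsigned flags)
{
  if (flags & SK_SEC_CODE)
    return SK_PERM_RX;
  if (flags & SK_SEC_READONLY)
    return SK_PERM_RO;
  return SK_PERM_RW;
}

/* Permissions go in the top byte, the length in the bottom 56 bits.  */
static uint64_t
pack_cap_meta (uint64_t size, uint8_t perms)
{
  return ((uint64_t) perms << 56) | (size & ((UINT64_C (1) << 56) - 1));
}
```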

This patch must also update all tests to account for this change.

ld: Adjust bounds, base, and size for various symbols

This patch has two main goals:

- Relax an existing diagnostic to permit the linker to accept
   capability relocations against symbols without size information.
- Adjust the capability base and bounds for symbols which point into
   sections which may be accessed via the PCC.

The Morello ABI accesses global data using ADR and ADRP, and has no
special indirection to jump to other functions.  Given this, the PCC
must maintain its bounds and base so that during execution loading
global data and jumping to other functions can be done without worrying
about the current PCC permissions and bounds.

To implement this, all capabilities that could be loaded into the PCC
(via BLR or similar) must have a bounds and base according to the PCC.
This must span all global data and text sections (i.e. .got, .text,
.got.plt and the like).
There is already code finding the range that the PCC should span; this
patch records the information in a variable that we can query later.

There are two places where we create a relocation requesting a
capability to be initialised at runtime: when handling relocations
which request a capability from the GOT, and when handling a CAPINIT
relocation.  This patch adjusts both.

We can't tell from inspection which symbols would be loaded into the
PCC, but we know that those symbols must point into a section which is
executable.  For now, we do this operation for all symbols which point
into an executable section.

Most RELATIVE relocations don't use the addend.  Rather the VA and size
we want are put in the relative fragment and the addend is zero.
This is because the *base* of the capability usually matches the VA we
want that capability initialised to.
In these possibly-code symbols we want the base of the capability bounds
to be the base of the PCC, and the VA to be something very different.
Hence we make use of the addend in the RELA relocations to encode this
offset.
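
As a rough sketch of the split between fragment and addend: the
`struct cap_request` type and `make_cap_request` are hypothetical and do
not mirror c64_fixup_frag; `pcc_low` and `pcc_high` stand for the
recorded PCC span.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what we ask the runtime for.  */
struct cap_request
{
  uint64_t base;     /* Base written into the fragment.  */
  uint64_t size;     /* Size written into the fragment.  */
  uint64_t addend;   /* Offset of the initial value from BASE.  */
};

static struct cap_request
make_cap_request (uint64_t sym_value, uint64_t sym_size,
                  bool in_exec_section,
                  uint64_t pcc_low, uint64_t pcc_high)
{
  struct cap_request r;

  if (in_exec_section)
    {
      /* Possibly-code symbol: bounds are those of the PCC and the
         addend selects the address inside them.  */
      r.base = pcc_low;
      r.size = pcc_high - pcc_low;
      r.addend = sym_value - pcc_low;
    }
  else
    {
      /* Usual case: the base is the address we want, addend is zero.  */
      r.base = sym_value;
      r.size = sym_size;
      r.addend = 0;
    }
  return r;
}
```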

Note on implementation:

c64_fixup_frag takes the base and size of a capability we want to
request from the runtime and checks that these are exactly representable
in a capability.  This patch changes many of the capabilities we request
from the runtime to have the same bounds (those of the PCC).  We leave
the check to look at the bounds requested by the symbol rather than to
check the PCC bounds multiple times.  That means that if a symbol that
points into an executable section has incorrect bounds then this will
trigger a linker error even though it will cause no security problem
when this executes.  This is a trade-off between getting extra checks
that the compiler is handling object bounds sizes and erroring on
non-problematic code.

We have a compatibility hack that if a symbol is defined in the linker
script to be directly after a given section but is *named* something
like __.*_start or __start_.* then we treat it as if it is defined at
the very start of the next section.  The new behaviour introduced in
this patch needs to take account of the above compatibility hack.
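
The name check in that hack can be sketched as below.  The function
`is_start_symbol_name` is a hypothetical helper that reads the two
patterns above literally; the real check may differ in detail.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if NAME looks like __.*_start or __start_.*  */
static bool
is_start_symbol_name (const char *name)
{
  static const char suffix[] = "_start";
  size_t len = strlen (name);
  size_t slen = sizeof suffix - 1;

  /* __start_.*  */
  if (strncmp (name, "__start_", 8) == 0)
    return true;

  /* __.*_start: two leading underscores and a trailing "_start" that
     does not reuse them.  */
  return len >= 2 + slen
         && strncmp (name, "__", 2) == 0
         && strcmp (name + len - slen, suffix) == 0;
}
```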

This patch also updates the testsuite according to these changes.
In some places the original test no longer checks what it wanted, since
the bases of all symbols pointing into executable sections are now the
same.  There we add extra symbols and things to check so we ensure that
this behaviour of PCC bounds is seen and that the original behaviour is
still seen on non-executable sections.

This commit also includes a few tidy-ups:

We adjust the base and limit that are checked in c64_fixup_frag.
Originally this would calculate the base as value + addend.  As
discussed above the way we treat capabilities in Morello is such that
the value determines the base and the addend determines the initial
value pointing from that base.  Hence the check that these capabilities
had correct bounds was not correct.

We add an extra assertion in final_link_relocate for robustness
purposes.  There is an existing bug in the assembler where GOT
relocations against local symbols can be turned into relocations against
the relevant section symbol plus an addend.  This is problematic for
multiple reasons, one being that the linker implementation does not have
any way to associate different GOT entries with the same symbol but
multiple offsets.  In fact the linker ignores any offset.  Here we
simply add an assertion that this never happens.  It turns a silent
pre-existing error into a noisy one.

2022-02-03  Alex Coplan  <alex.coplan@arm.com>
    Matthew Malcomson  <matthew.malcomson@arm.com>

bfd/ChangeLog:

* elfnn-aarch64.c (pcc_low): New.
(pcc_high): New.
(elfNN_c64_resize_sections): Update new global variables
pcc_{low,high} instead of local variables to track PCC span.
(enum c64_section_perm_type): New.
(c64_symbol_section_adjustment): New.
(c64_fixup_frag): Rework to calculate size appropriately for
symbols that need adjustment.
(c64_symbol_adjust): New. Use it ...
(elfNN_aarch64_final_link_relocate): ... here.

ld/ChangeLog:

* testsuite/ld-aarch64/aarch64-elf.exp: Add new tests.
* testsuite/ld-aarch64/emit-relocs-morello-6.d: New test.
* testsuite/ld-aarch64/emit-relocs-morello-6.s: Assembly.
* testsuite/ld-aarch64/emit-relocs-morello-6b.d: New test.
* testsuite/ld-aarch64/emit-relocs-morello-7.d: New test.
* testsuite/ld-aarch64/emit-relocs-morello-7.ld: Linker script thereof.
* testsuite/ld-aarch64/emit-relocs-morello-7.s: Assembly.
* testsuite/ld-aarch64/morello-capinit.d: New test.
* testsuite/ld-aarch64/morello-capinit.ld: Linker script.
* testsuite/ld-aarch64/morello-capinit.s: Assembly.
* testsuite/ld-aarch64/morello-sizeless-global-syms.d: New test.
* testsuite/ld-aarch64/morello-sizeless-global-syms.s: Assembly.
* testsuite/ld-aarch64/morello-sizeless-got-syms.d: New test.
* testsuite/ld-aarch64/morello-sizeless-got-syms.s: Assembly.
* testsuite/ld-aarch64/morello-sizeless-local-syms.d: New test.
* testsuite/ld-aarch64/morello-sizeless-local-syms.s: Assembly.

Co-authored-by: Matthew Malcomson <matthew.malcomson@arm.com>

Bugfixes in MORELLO GOT relocations

Trying to link code against newlib with the current BFD Morello linker
we get quite a lot of cases of the error below.
"relocation truncated to fit: R_MORELLO_LD128_GOT_LO12_NC against symbol
`<whatever>' defined in .text.<whatever> section in <filename>"

This happens because the relocation gets transformed into a relocation
pointing into the GOT in elfNN_aarch64_final_link_relocate, but the
h->target_internal flag that indicates whether this is a C64 function
symbol or not is then added to the *end* value rather than the value
that is stored in the GOT.

This then correctly falls foul of a check in _bfd_aarch64_elf_put_addend
that ensures the value we get from this relocation is 8-byte aligned
since it must be pointing to the start of a valid entry in the GOT.

Here we ensure that this LSB is set on the value newly added into the
GOT rather than on the offset pointing into the GOT.  This both means
that loading function symbols from the GOT will have the LSB correctly
set (hence we stay in C64 mode when branching to this function as we
should) and it means that the error about a misaligned GOT address is
fixed.
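
The distinction can be sketched as follows.  `struct got_fixup`,
`resolve_got` and `got_offset_ok` are hypothetical names for this
illustration; the alignment check only mirrors the 8-byte requirement
mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of values produced when resolving a GOT reloc.  */
struct got_fixup
{
  uint64_t entry_value;   /* Value stored in the GOT entry.  */
  uint64_t got_offset;    /* Offset used to fix up the instruction.  */
};

/* Set the C64 LSB on the value stored in the GOT and leave the offset
   pointing into the GOT untouched.  */
static struct got_fixup
resolve_got (uint64_t sym_value, bool is_c64_function, uint64_t got_offset)
{
  struct got_fixup f;

  f.entry_value = sym_value | (is_c64_function ? 1 : 0);
  f.got_offset = got_offset;
  return f;
}

/* The offset must point at the start of an entry.  */
static bool
got_offset_ok (uint64_t got_offset)
{
  return (got_offset & 7) == 0;
}
```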

In this patch we also ensure that we add a dynamic relocation to
initialise the correct GOT entry when we are resolving a MORELLO
relocation that requires an entry in the GOT.
This was already handled in the case of a global symbol, but had not
been handled in the case of a local symbol.  This is why we set
`relative_reloc` to TRUE if resolving a MORELLO GOT relocation
against a static executable.

In writing the testcase for this patch we found an existing bug to do
with static relocations of this kind (that is, those handled in this
case statement).  The assembler often chooses to create
the relocation against the section symbol rather than the original
symbol, and make up for that by giving the relocation an addend.  The
linker does not have any mechanism to create "symbol plus addend"
entries in the GOT -- it indexes into the GOT based on the symbol only.
Hence all relocations which are a section symbol plus addend end up
pointing at one value in the GOT just containing the value of the
symbol.
We do not fix this existing bug, but just note it given that this is in the
same area.

Switch __cap_dynrelocs* to __rela_dyn* symbols

The name has been changed in LLVM, so we adjust it in binutils to match.

We also move where these symbols are created.  Previously they were
created in elfNN_aarch64_always_size_sections, but we move this to
elfNN_aarch64_size_dynamic_sections.

We do the moving since these symbols are supposed to span all dynamic
capability relocations stored in the .rela.dyn section for static
executables.  In the case of a static binary we place relocations for
the GOT into this section as well as internal relocations.

These relocations for the GOT are handled in
elfNN_aarch64_size_dynamic_sections, which is called *after*
elfNN_aarch64_always_size_sections.  The size of this section is only
fully known after those GOT relocations are managed, so the position
these symbols should be placed in is only known at that point.  Hence we
only initialise the __rela_dyn* symbols at that point.

2021-10-06  Matthew Malcomson  <matthew.malcomson@arm.com>
ChangeLog:

* bfd/elfnn-aarch64.c (elfNN_aarch64_always_size_sections): Move
initialisation of __rela_dyn* symbols ...
(elfNN_aarch64_size_dynamic_sections): ... to here.
* ld/testsuite/ld-aarch64/aarch64-elf.exp: Run new tests.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-1.d: New test.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-1.s: New test.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-2.d: New test.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-2.s: New test.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-3.d: New test.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-3.s: New test.

ld: Ignore TLS relocs against weak undef symbols

The behaviour of weak undef thread-local variables is not well defined.
TLS relocations against weak undef symbols are not handled properly by
the linker, and in some cases cause the linker to crash (notably when
linking glibc for purecap Morello). This patch simply ignores these and
emits a warning to that effect. This is a compromise to enable progress
for Morello.

bfd/ChangeLog:

2022-01-17 Alex Coplan <alex.coplan@arm.com>

* elfnn-aarch64.c (elfNN_aarch64_relocate_section): Skip over TLS
relocations against weak undef symbols.
(elfNN_aarch64_check_relocs): Likewise, but also warn.

ld/ChangeLog:

2022-01-17 Alex Coplan <alex.coplan@arm.com>

* testsuite/ld-aarch64/aarch64-elf.exp: Add morello-weak-tls test.
* testsuite/ld-aarch64/morello-weak-tls.d: New test.
* testsuite/ld-aarch64/morello-weak-tls.s: New test.
* testsuite/ld-aarch64/weak-tls.d: Update test wrt new behaviour.

Display tags for internal variable values

Handle internal variable values when displaying capability types.

Simplify tag management in value structs

Instead of actively setting the tagged field of struct value *, initialize
the tagged field right when allocating a new value. This simplifies the
management of this field.

Preserve tag when passing pointers/capabilities as parameters

Fix a bug where Morello GDB wasn't setting the capability tag when calling a
function by hand, and said function received a pointer/capability as
parameter.

This was observed when attempting to call strlen by hand and passing the
argv[] entries.

Code cleanup and refactoring

A small refactoring to reduce duplication of code.

Support assignment of capabilities to C registers

Enable assignment of capabilities to C registers while preserving
the capability tag. This enables operations like the following:

set $c0=$c1
set $c0=p, where "p" is a capability (pointer) in AARCH64-CAP

Due to the X/C register overlap, we also force a re-read of register
data after every register write. So any 'G' packets are immediately
followed by a 'g' packet.

Improve error messages for capability reads/writes to memory

Improve the messages a little so the errors are clearer. One case in
particular is when we attempt to write a tagged capability to a shared
mapping area that doesn't support tags.

GDB and GDBserver share the same error messages now.

Improve GDBserver ptrace error message

Issue: https://git.morello-project.org/morello/binutils-gdb/-/issues/9

Improve error message when setting capability registers fails. Mention
the name of the set (capability) explicitly so it is clear what register
set we are talking about.

Adjust PCC bounds when calling a function by hand in AAPCS64-CAP

For dynamically-linked binaries using the AAPCS64-CAP ABI, the PCC bounds for
distinct DSOs can be different, and that needs to be taken into account.

Since there isn't a good interface for GDB to fetch the precise bounds (the
.plt.got could be used for this, but doesn't contain data for symbols not
being used by the program), we use maximum bounds and the existing PCC
permissions.

This allows GDB to call functions by hand in AAPCS64-CAP mode.

Also, revert a previous change to do partial writes (lower 64 bits) of the
PCC, as this isn't needed anymore.

Introduce capability pseudo registers

Introduce corresponding C pseudo registers that contain the following
fields:

type = struct __gdb_builtin_type_capability {
    uint64_t l;
    uint64_t u;
    bool t;
}

For each register of the C set, users can reference the pseudo registers by
adding a 'p' prefix.  For example, the pseudo register of c0 and pcc are pc0
and ppcc.

Users can set the entire 129 bits of the capability this way, if the
cheri.ptrace_forge_cap flag is enabled.

Fix segfault when creating builtin types

Sanity check the existence of a type field before dereferencing it.

Workaround GDBserver register cache management

Given X and C registers have an overlap, if we write to an X register, we need
to fetch the C registers again so we get updated values (side effects). The
same happens when we attempt to write to the lower half of C registers, we need
to fetch the X registers again so we get updated values.

This patch forces the register cache to get updated values after a register
store request.

Handle C regset write warnings better in GDBServer

GDBServer handles register reads/writes a bit differently compared to GDB.

Since C register writes are only allowed with cheri.ptrace_forge_cap=1,
GDBServer will keep issuing warnings if such setting is 0.

This patch improves it by issuing a single more helpful warning to the
user. If cheri.ptrace_forge_cap is flipped, the warning will be issued
again appropriately to remind the user about it.

Bug-ID: https://git.morello-project.org/morello/binutils-gdb/-/issues/5

Handle bitfield instructions in the prologue

Morello GCC seems to generate bitfield instructions in the prologue, which
throws Morello GDB's prologue analysis off. Handle such instructions so we
can properly skip past the prologue.

New --enable-threading configure option to control use of threads in GDB/GDBserver

Add the --enable-threading configure option so multithreading can be disabled
at configure time. This is useful for statically-linked builds of
GDB/GDBserver, since the thread library doesn't play well with that setup.

If you try to run a statically-linked GDB built with threading, it will crash
when setting up the number of worker threads.

This new option is also convenient when debugging GDB in a system with lots of
threads, where the thread discovery code in GDB will emit too many messages,
like so:

[New Thread 0xfffff74d3a50 (LWP 2625599)]

If you have X threads, that message will be repeated X times.

The default for --enable-threading is "yes".

aarch64: Fix scbnds validation

Prior to this patch, we were failing to validate scbnds instructions
properly in multiple ways. The code in tc-aarch64.c:parse_operands
failed to check if the expression parsing code actually returned a
constant (O_constant) immediate. For sufficiently large immediates this
would result in O_big instead and this was not handled.

Moreover, the code to coerce the immediate form into the immediate +
shift form of the instruction was buggy in multiple ways: using the
wrong mask to check if the lower bits were set and checking the wrong
variable.

Finally, the code in operand_general_constraint_met_p was only checking
if the immediate is in range for the shifted case: it should be checking
this in both cases.
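
A sketch of the coercion and range check is given below.  It assumes a
6-bit immediate with an optional left shift by 4, which should be
checked against the Morello architecture documentation; the function
`encode_scbnds_imm` is hypothetical and is not the code in
parse_operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Try to encode IMM for scbnds.  On success set *IMM6 and *SHIFTED and
   return true.  Assumes a 6-bit field and an optional LSL #4.  */
static bool
encode_scbnds_imm (uint64_t imm, unsigned *imm6, bool *shifted)
{
  /* Plain form: the value fits in the field as it is.  */
  if (imm <= 63)
    {
      *imm6 = (unsigned) imm;
      *shifted = false;
      return true;
    }

  /* Shifted form: the low four bits must be clear and the remaining
     value must still fit in the field.  */
  if ((imm & 0xf) == 0 && (imm >> 4) <= 63)
    {
      *imm6 = (unsigned) (imm >> 4);
      *shifted = true;
      return true;
    }

  return false;
}
```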

As well as fixing these issues, this patch improves the error messages
in a couple of cases and adds tests for various valid and invalid cases.

gas/ChangeLog:

2021-11-11 Alex Coplan <alex.coplan@arm.com>

* config/tc-aarch64.c (parse_shift): Improve error message for
O_big expressions.
(parse_operands): In AARCH64_OPND_A64C_IMM6_EXT case, handle
parse_shifter_operand_imm returning non-O_constant
expressions; fix logic for coercion to the shifted form.
* testsuite/gas/aarch64/scbnds-immed.d: New test.
* testsuite/gas/aarch64/scbnds-immed.s: Assembly thereof.
* testsuite/gas/aarch64/scbnds-invalid.d: New test.
* testsuite/gas/aarch64/scbnds-invalid.l: Error output thereof.
* testsuite/gas/aarch64/scbnds-invalid.s: Assembly thereof.

opcodes/ChangeLog:

2021-11-11 Alex Coplan <alex.coplan@arm.com>

* aarch64-opc.c (operand_general_constraint_met_p): Always check
if the immediate is in range for AARCH64_OPND_A64C_IMM6_EXT.

Harden checks for capability maintenance commands

Add some sanity checks to validate the input data.

Add some more documentation for those commands.

aarch64: Mark purecap object files with EF_AARCH64_CHERI_PURECAP

This simple patch sets the ELF header flag EF_AARCH64_CHERI_PURECAP for
purecap Morello object files, as documented in aaelf64-morello:
https://github.com/ARM-software/abi-aa/blob/main/aaelf64-morello/aaelf64-morello.rst

gas/ChangeLog:

2021-09-24 Alex Coplan <alex.coplan@arm.com>

* config/tc-aarch64.c (md_begin): Set the ELF header flag
EF_AARCH64_CHERI_PURECAP if we have the C64 extension.

morello: Fix encoding of ldtr/sttr

This patch fixes the encoding of the immediate in the A64C ldtr/sttr
instructions. Prior to this patch, GAS would accept immediates for these
instructions that were not multiples of 16, and would not scale the
immediate by 16.
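
As a rough sketch of the scaling rule only (the field's range check is
deliberately left out, and the helper name is hypothetical):

```python
CAP_SIZE = 16  # bytes in a Morello capability (128 bits)


def scale_cap_offset(byte_offset):
    """Scale a byte offset by the capability size for encoding.

    Hypothetical helper: rejects offsets that are not multiples of 16,
    and does not model the range of the encoded field.
    """
    if byte_offset % CAP_SIZE != 0:
        raise ValueError("offset must be a multiple of 16")
    return byte_offset // CAP_SIZE
```

With this rule an offset of 32 is encoded as 2, and an offset of 8 is
rejected instead of being silently accepted.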

gas/ChangeLog:

2021-09-24 Alex Coplan <alex.coplan@arm.com>

* testsuite/gas/aarch64/morello_ldst-c64.d: Update following
test + encoding change.
* testsuite/gas/aarch64/morello_ldst-invalid.d: New test.
* testsuite/gas/aarch64/morello_ldst-invalid.l: New test.
* testsuite/gas/aarch64/morello_ldst-invalid.s: New test.
* testsuite/gas/aarch64/morello_ldst.d: Update following
test + encoding change.
* testsuite/gas/aarch64/morello_ldst.s: Update to use valid
immediates for ldtr/sttr instructions.

opcodes/ChangeLog:

2021-09-24 Alex Coplan <alex.coplan@arm.com>

* aarch64-tbl.h (aarch64_opcode_table): Update A64C_INSNs
ldtr/sttr to take A64C_ADDR_SIMM9 instead of ADDR_SIMM9
operands.

morello-binutils: Adjust c64_valid_cap_range calculation

This function had a buggy implementation of rounding a value up to a
given power of 2. Aligning to a multiple of 16 would align to a
multiple of 32 and so on.

This was observable when linking object files that had very large
objects in them. The compiler would ensure that these objects are large
enough that they are exactly representable, but the linker would
complain that they are not because the linker required more alignment
than the compiler.

Here we fix the bug, add a few testcases, and adjust an existing
testcase in the area.
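
For reference, a conventional way to round up to a power of 2 is sketched
below, next to a hypothetical off-by-one-bit variant that shows the symptom
described above. The buggy variant is an assumption made for illustration,
not the code that was removed.

```python
def align_up(value, align):
    """Round value up to the next multiple of align (a power of 2)."""
    assert align > 0 and align & (align - 1) == 0, "align must be a power of 2"
    return (value + align - 1) & ~(align - 1)


def align_up_buggy(value, align):
    """Hypothetical mistake: the mask covers one bit too many, so a
    request for 16-byte alignment yields 32-byte alignment."""
    mask = (align << 1) - 1
    return (value + mask) & ~mask
```

Here `align_up(33, 16)` gives 48, while the buggy variant gives 64, a
multiple of 32.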

aarch64: Correct feature bits for Morello

As it stands, the architecture feature bits for Morello include FP16FML
(i.e. ARMv8.2-FHM) but not FP16: this is an invalid combination.
Looking at the Morello Arm ARM [1], it seems that Morello wants the
feature FP16 (ARMv8.2-FP16) but not FP16FML.

[1] : https://developer.arm.com/documentation/ddi0606/latest

include/ChangeLog:

2021-09-10 Alex Coplan <alex.coplan@arm.com>

* opcode/aarch64.h (AARCH64_ARCH_MORELLO): Change F16_FML
feature bit to F16.

Support linkmap offsets for the AAPCS64-CAP ABI

With the AAPCS64-CAP ABI, any pointers in data structures become capabilities.
Capabilities for Morello are 128-bit in size, so the offsets of individual
fields within the structure might change.

One such structure that needs to be adjusted is the linkmap. The linkmap is
used by GDB to list shared libraries that get loaded/unloaded.

This patch enables AAPCS64-CAP linkmap offsets for both GDB and GDBserver,
allowing GDB/GDBserver to list the shared libraries correctly.
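
To illustrate why the offsets move, here is a small layout calculation. It
assumes the field order of glibc's public `struct link_map` (`l_addr`,
`l_name`, `l_ld`, `l_next`, `l_prev`), that `l_addr` stays a 64-bit integer,
and that capabilities are 16-byte aligned. These are assumptions made for
the example rather than values taken from this patch.

```python
def layout(fields):
    """Compute C-style field offsets from (name, size, align) tuples."""
    offsets, offset = {}, 0
    for name, size, align in fields:
        # Round the running offset up to the field's alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
    return offsets


def link_map_fields(ptr_size):
    """Assumed leading fields of struct link_map; the pointer fields
    take the size and alignment given by ptr_size."""
    return [
        ("l_addr", 8, 8),  # assumed to remain a 64-bit integer
        ("l_name", ptr_size, ptr_size),
        ("l_ld", ptr_size, ptr_size),
        ("l_next", ptr_size, ptr_size),
        ("l_prev", ptr_size, ptr_size),
    ]
```

With 8-byte pointers the offsets come out as 0, 8, 16, 24, 32. With 16-byte
capabilities they become 0, 16, 32, 48, 64.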

gas: Add whitespace in morello-capinit test output regexp

This whitespace is present in the output from objdump but is not in our test
patterns. Adding it makes the testcase pass.

gas: ADR_LO21_PCREL accounts for LSB in symbol

After d30dd5c GAS now accounts for the LSB getting set on STT_FUNC by
maintaining the value of the relevant functions to include that LSB.

Previously GAS attempted to account for the LSB only when outputting the
file (i.e. in the obj_adjust_symtab hook and when a relocation is
getting made for the relevant symbol).
The obj_adjust_symtab hook is still needed, since this is about adding a
flag to an elf_sym rather than adjusting the value of the symbol.

We changed from this so that expressions given by the user would
naturally account for the LSB set on C64 STT_FUNC symbols.  This means
that we no longer need to adjust local pc-relative relocations in
`parse_operands` since the relative relocation will naturally include
whether the LSB is set on the relevant symbol.

Here we remove the previous code to do this adjustment.  With both
methods of accounting we ended up adding 2 to the relocation rather than
just setting the LSB.
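
The double accounting is simple arithmetic. The sketch below models the old
adjustment as a plain `+ 1`, which is an assumption made for illustration:

```python
def with_lsb(addr):
    """Value of a C64 STT_FUNC symbol: the address with the LSB set."""
    return addr | 1


func_addr = 0x1000
symbol_value = with_lsb(func_addr)  # LSB already part of the symbol value
double_counted = symbol_value + 1   # the old extra adjustment applied on top
```

`symbol_value` is 0x1001 as intended, but `double_counted` is 0x1002: the
address plus 2, with the LSB clear again.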

Note that the combination of this change and d30dd5c has meant that an
`AARCH64_ADR_PREL_LO21` relocation to a locally defined function now points
directly to that function rather than to that function plus 1.
These relocations are left in the object file when the locally defined
function is declared global.  This matches the behaviour of LLVM.

gas: Remove requirement of getting a target symbol

Before this change we had a check that any capinit directive had a plain
symbol (possibly plus an addend) as an argument.
Unfortunately, the check itself is actually that GAS can identify that
the expression we have is in that category *before* applying all
adjustments after alignment etc.  Internally this not only required that
the expression was of a simple enough form, but also that if we had an
expression of the form `f+((.Ltmp+1)-f)` (which is a form that compilers
use for label addresses) this required that the `f` and `.Ltmp` labels
were in the same `frag`.

In order to be in the same `frag` there could be no alignment between
them, whether from alignment directives between the two labels, or
because we had a data directive in between them and the assembler
ensured we were aligned when re-entering code state.

This artificial requirement triggered an assembler error when running
the GCC testsuite, hence we have removed it.  This matches LLVM
behaviour.  More obvious errors like subtracting symbols from different
sections are still caught in the general expression handling code.

gas: Allow MORELLO branch relocations to addresses with LSB set

Now that we internally handle a set LSB as part of a C64 STT_FUNC value
throughout the assembler rather than as something that is just
introduced by the linker, relocations to code labels now may or may not
include that LSB.

GAS checks that the target of an AARCH64 BRANCH19, TSTBR14, CALL26, or
JUMP26 relocation is aligned, since all uses should point to an
instruction and all instructions should be aligned.
Now that we are including the LSB in the value of STT_FUNC C64 symbols,
the relevant MORELLO_* relocations no longer always satisfy this alignment
requirement.  When these relocations target a location generated from an
STT_FUNC C64 symbol, their value includes that LSB.

This behaviour is not relevant to the user since these relocations lose
the bottom 2 bits of the value they target.  It does however match the
specification of the relocations in the ABI document, which includes the
`C` bit.
This fix avoids requiring that this LSB is unset when in `md_apply_fix`.
For extra robustness we also assert that when setting this LSB on the
symbol in the first place it was not set to begin with.
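
A minimal sketch of why the set LSB is harmless for these relocations,
using a 26-bit branch field as the example. The helper name and the
`c64_target` flag are hypothetical and do not mirror `md_apply_fix`.

```python
def encode_branch26(offset, c64_target=False):
    """Encode a byte offset into a 26-bit branch immediate.

    Hypothetical helper: when c64_target is set, the LSB (the C bit
    carried by a C64 STT_FUNC value) is tolerated and ignored.
    """
    if c64_target:
        offset &= ~1  # drop the C bit before checking alignment
    if offset & 3:
        raise ValueError("branch target is not 4-byte aligned")
    if not -(1 << 27) <= offset < (1 << 27):
        raise ValueError("branch target out of range")
    # The encoded field discards the bottom 2 bits of the offset.
    return (offset >> 2) & 0x3FFFFFF
```

An offset of 0x101 to a C64 function encodes to the same field as 0x100,
while the same offset without the flag is still rejected as misaligned.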

A downside is that if the LSB is set on non-function symbols the user
will not be warned about that.  Any method to handle that would always
need to determine which expressions should include this LSB and which
shouldn't, which would be difficult to make perfect.  On top of that,
the relevant code would either have to duplicate the code in
`fixup_segment` that resolves an expression into a single value, or
record another bit in the `TC_FIX_TYPE` structure just for this warning.

This seems like more complexity than the extra warning is worth.

We add two tests since `objdump` shows the resulting disassembly but
`readelf` shows the LSB getting set on the relevant functions.

Apply changes to allow compiling with -ansi

This is just to help anyone trying to build Morello binutils with a very
old system compiler.

At the time we branched, binutils wanted to be able to be built using C89.
These changes are what is needed to compile using the `-ansi` flag (i.e.
using that C89 flag).

gas: aarch64: Accept `purecap` and `hybrid` ABI parameters

When GCC is given an ABI parameter with `-mabi=<whatever>` it passes
that argument down to GAS.  GAS does not need to know the Morello ABI
that is being used, since all decisions are based on the processor state
(whether +c64 is enabled or not).

GAS doesn't currently accept `purecap` or `hybrid` as arguments to the
`-mabi` option.  Even though it does not need this information, I think
it should accept the arguments.  This would mean GCC does not need to
implement special handling to avoid passing the `-mabi` argument to GAS
in these specific cases.

gas/ChangeLog:

2021-07-30  Matthew Malcomson  <matthew.malcomson@arm.com>

* config/tc-aarch64.c (aarch64_abi_type): Introduce PURECAP and
HYBRID enum entries.
(aarch64_abis): Add "purecap" and "hybrid" parameters.
* testsuite/gas/aarch64/morello-abis-ignored.s: New.
* testsuite/gas/aarch64/morello-abis-ignored.d: New.

gas: aarch64: Make chericap and capinit auto-align

Morello LLVM assumes that these directives auto-align: it emits
assembly that does not explicitly align these directives.

Fix and testcases added.
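
The amount of padding needed before such a directive can be sketched as
below (the helper name is hypothetical; 16 is the capability size in bytes):

```python
def pad_to_capability(offset, align=16):
    """Bytes of padding needed so that offset becomes a multiple of align."""
    return (-offset) % align
```

For example, an offset of 20 needs 12 bytes of padding to reach 32, and an
offset that is already a multiple of 16 needs none.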

gas/ChangeLog:

2021-07-29 Matthew Malcomson <matthew.malcomson@arm.com>

* config/tc-aarch64.c (s_aarch64_capinit, s_aarch64_chericap):
Automatically align to 16 bytes.
* testsuite/gas/aarch64/morello-capinit-align.s: New.
* testsuite/gas/aarch64/morello-capinit-align.d: New.

Fixing missed ChangeLog entries.

Overlooked them in the last three commits.

gas: aarch64: Require 16 bytes for Morello capinit relocation

The `capinit` directive does not allocate space for the relevant
relocation, rather it creates a CAPINIT relocation on the 16 bytes
immediately following it.

Our implementation works by ensuring we can grow the existing `frag` (an
internal structure that describes known contiguous bytes) by 8 bytes
and then recording that we have an 8 byte sized CAPINIT relocation.
It should be 16 bytes, since the relocation is on a 16 byte quantity.

One symptom of this problem is that the section in which a given
CAPINIT relocation is recorded may not have enough space for the entire
capability that the relocation requests.

The testcase we add demonstrated this problem before the current change.
Now it errors out. Unfortunately the error is an internal one with an
error message that references internal data structures, but I believe
that is better than creating a faulty binary without complaint.

gas: aarch64: Introduce the chericap directive

This directive is the equivalent of the capinit directive except that it
allocates space as well as creating a CAPINIT relocation.
Adding it to binutils is useful since LLVM intends to transition
away from using capinit to chericap and this helps enable that
transition.

Here we just emit the required number of zeros into the output file with
md_number_to_chars. Since we don't have to worry about endianness for a
big zero this is not complicated.

gas: aarch64: Fixing expression calculation using C64 symbols

Expressions involving function symbols should take into account the fact
that the LSB of C64 functions is set (while the LSB of labels is not
set).
The main motivating example of this is `capinit` expressions of the form
  f+((.Ltmp0+1)-f)
where `f` is a function and `.Ltmp0` is a label.
These should result in a CAPINIT relocation of the form
`f + <constant multiple of 4>` since the `+1` on the `.Ltmp0` should
cancel out the LSB that is set on `f`.
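
The cancellation can be checked with concrete numbers. The addresses below
are made up for illustration, and the helper name is hypothetical:

```python
def capinit_addend(func_addr, label_addr):
    """Evaluate ((.Ltmp0 + 1) - f) where f is a C64 function symbol
    (LSB set in its value) and .Ltmp0 is a plain label."""
    f = func_addr | 1       # C64 function symbol value includes the LSB
    ltmp0 = label_addr      # labels do not have the LSB set
    return (ltmp0 + 1) - f


addend = capinit_addend(0x400, 0x40C)
```

Here `addend` is 12: the `+1` cancels the LSB of `f`, leaving the distance
between two instruction addresses, which is a multiple of 4.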

This is slightly different to the handling of a set LSB for THUMB
functions, since in THUMB the LSB is not kept when computing relations.
For THUMB, expressions using the function are emitted as relocations to
let the linker apply the adjustment.

To implement this, we have two options (we choose the second):
  - Handle the LSB in the target hook `md_optimize_expr` (similar to how
    the arm backend handles expressions involving THUMB functions).
  - Set the LSB on the value assigned to the symbol in `tc_frob_label`.

The approach using the `md_optimize_expr` hook would involve adding one
to the expression that describes an operand, when that operand is a
C64 function symbol with no addend.  Then returning `FALSE` from that
hook in order to let the generic code handle the expression from then
on.

We would only want to do this when there is no addend to avoid applying
this adjustment multiple times in an expression (e.g. when a
subexpression reduces to a function symbol plus addend).
This adjustment would also want to avoid doing this to any expression
that would end up as a relocation involving that function symbol, since
then the artificial adjustment would be propagated to the relocation
(resulting in an expression like `f - 63` where the addend has been
adjusted to account for the LSB in `f`, but the linker will account for
the LSB in `f` itself).
Such avoidance is simple enough for expressions like `O_add` since we
can always avoid them, but it is more awkward to tell for `O_subtract`
expressions where some expressions can be reduced to a constant while
others will end up as a relocation.

Another difficulty with this approach is that the value of an expression
can be different depending on the relative location of the `type`
directive to the expression in the assembly source.  If the directive is
before an expression then the expression will account for the LSB but if
the directive is after the expression it will not.
While behaviour depending on the location of the `type` directive is a
tricky problem and has problems in the Morello LLVM compiler as well,
these behaviours do not match the behaviour of Morello LLVM.

The second approach is to adjust a symbol's value in `tc_frob_label` if
it is a C64 function.  This will automatically mean that all symbol
expressions use this LSB correctly.
This approach does still have difficulties with relative locations of
the `type` directive, but here the behaviour matches Morello LLVM.  The
important factor in this case is whether the `type` directive is before
or after the function symbol's label.  If the function label is before
the `type` directive then *all* expressions using the function label
will not account for the LSB, otherwise all expressions will utilise it.

There are two known differences with the Morello LLVM behaviour when
taking this approach.

The first is around calculating an expression of the form `operand - f`.
If `operand` is known, then both GAS and LLVM will account for the LSB
of `f`, but if `operand` is not known at the time this expression is
found then GAS will account for the LSB in the final relocation put into
the binary while Morello LLVM will not.  This is a Morello LLVM bug.

The second is that Morello LLVM does not allow expressions of the form
`f > altlabel` while GAS does.  In this case we have chosen to account
for the LSB, so that even if `f` and `altlabel` are defined in the same
place, if `f` is a C64 function symbol and `altlabel` is not then `f >
altlabel` will evaluate to true.
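
A toy model of the chosen behaviour is sketched below. It is not the
`tc_frob_label` implementation: the function simply takes a flag saying
whether the symbol was already known to be a C64 function when its label
was seen.

```python
def frob_label(addr, known_c64_function):
    """Value given to a label: the LSB is set only if the symbol is
    already known to be a C64 function when the label is defined."""
    return addr | 1 if known_c64_function else addr


addr = 0x2000
f = frob_label(addr, known_c64_function=True)          # `type` before label
altlabel = frob_label(addr, known_c64_function=False)  # plain label
f_late = frob_label(addr, known_c64_function=False)    # `type` after label
```

In this model `f > altlabel` holds even though both are defined at the same
address, and `f_late` carries no LSB because the `type` directive came after
the label.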

gas: Use correct data type in parse_operands

We're choosing the bfd_reloc_code_real_type value to set on
inst.reloc.type.  Hence we should use the bfd_reloc_code_real_type type
for our temporaries.

This was not failing since the temporaries could hold the relevant
types, but was causing warnings that broke the build if running with
-Werror.  I saw the warning on gcc versions 10 and 11; I did not see the
warning on gcc version 7.5.

Testsuite as a cross-build for aarch64-none-elf and aarch64-linux native
shows no change.

---

2021-07-20  Matthew Malcomson  <matthew.malcomson@arm.com>

gas/ChangeLog:

* config/tc-aarch64.h (parse_operands): Use correct enum type for
temporaries.