Self parameter parsing errors may come from different situations, which
should not be handled in the same way. It is now possible to
differentiate a missing self parameter from a self pointer or a parsing
error.
gcc/rust/ChangeLog:
* parse/rust-parse-impl.h (Parser::parse_function): Early return on
unrecoverable errors.
(Parser::parse_trait_item): Likewise.
(Parser::parse_self_param): Update return type.
* parse/rust-parse.h (enum ParseSelfError): Add enumeration to describe
different self parameter parsing errors.
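The distinction described above can be sketched as follows. This is only an illustration: the enumerator names, the `self_result` struct and the string-based toy parser are assumptions made for the example, not the actual gccrs code.

```cpp
#include <cassert>
#include <string>

// Hypothetical error kinds, one per situation named in the message.
enum class parse_self_error { SELF_PTR, PARSING, NOT_SELF };

// Hypothetical result type: either success or one specific error.
struct self_result
{
  bool ok;
  parse_self_error error; // only meaningful when ok is false
};

// Toy stand-in for a self parameter parser working on plain text.
inline self_result
parse_self_param_sketch (const std::string &text)
{
  self_result r = { false, parse_self_error::NOT_SELF };
  if (text == "self" || text == "&self" || text == "&mut self")
    {
      r.ok = true;
      return r;
    }
  if (text.find ("self") == std::string::npos)
    return r; // no self parameter at all: not an error for the caller
  // "self" is present, so text is non-empty and text[0] is safe.
  r.error = (text[0] == '*') ? parse_self_error::SELF_PTR
			     : parse_self_error::PARSING;
  return r;
}

// Caller side: only a missing self parameter lets parsing continue.
inline bool
should_continue_parsing (const self_result &r)
{
  return r.ok || r.error == parse_self_error::NOT_SELF;
}
```

With this shape a caller can return early on the unrecoverable kinds while still accepting functions that simply have no self parameter.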
The UNSPEC_XTHEAD* macros ended up in the unspecv enum,
which broke gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c.
The INSNs expect these unspecs to be not volatile.
Further, there is no reason to have them defined volatile.
So let's simply move the macros into the unspec enum.
With this patch we have again 0 fails in riscv.exp.
gcc/ChangeLog:
* config/riscv/riscv.md: Move UNSPEC_XTHEADFMV* to unspec enum.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Iain Sandoe [Wed, 24 Jan 2024 08:05:01 +0000 (08:05 +0000)]
testsuite, GDC: Update link flags [PR112861].
The regressions here are because we do not generate a runpath for
the uninstalled libstdc++. This patch updates the link flags handling
to simplify it.
We need to add options to locate both libgphobos and libstdc++.
Usually '-L' options are added to point to the relevant directories for
the uninstalled libraries.
In cases where libraries are available as both shared and convenience
some additional checks are made.
For some targets -static-xxxx options are handled by specs substitution
and need a '-B' option rather than '-L'. For Darwin, when embedded
runpaths are in use (the default for all versions after macOS 10.11),
'-B' is also needed to provide the runpath.
When '-B' is used, this results in a '-L' for each path that exists (so
that appending a '-L' as well is a needless duplicate). There are also
cases where tools warn for duplicates, leading to spurious fails.
Therefore the objective is to add a single -B/-L option for each needed
path.
PR target/112861
gcc/testsuite/ChangeLog:
* lib/gdc.exp: Decide on whether to present -B or -L to reference
the paths to uninstalled libphobos and libstdc++ and use that to
generate the link flags.
Iain Sandoe [Sun, 28 Jan 2024 13:31:56 +0000 (13:31 +0000)]
libgcc: Make heap trampoline support dynamic [PR113403].
In order to handle system security constraints during GCC build
and test, and the fact that most platform versions cannot link to
libgcc_eh (since the unwinder there is incompatible with the system one):
1. We make the support functions weak definitions.
2. We include them as a CRT for platform conditions that do not
allow libgcc_eh.
3. We ensure that the weak symbols are exported from DSOs (which
includes exes on Darwin) so that the dynamic linker will
pick one instance (which avoids duplication of trampoline
caches).
* config.host: Build libheap_t.a for i686/x86_64 Darwin.
* config/aarch64/heap-trampoline.c (HEAP_T_ATTR): New.
(allocate_tramp_ctrl): Allow a target to build this as a weak def.
(__gcc_nested_func_ptr_created): Likewise.
* config/i386/heap-trampoline.c (HEAP_T_ATTR): New.
(allocate_tramp_ctrl): Allow a target to build this as a weak def.
(__gcc_nested_func_ptr_created): Likewise.
* config/t-darwin: Build libheap_t.a (a CRT with heap trampoline
support).
Iain Sandoe [Fri, 19 Jan 2024 15:57:04 +0000 (15:57 +0000)]
libgcc: Make heap trampoline support dynamic [PR113403].
This removes the heap trampoline support functions from libgcc.a and
adds them to libgcc_eh.a. They are also present in libgcc_s.
PR libgcc/113403
libgcc/ChangeLog:
* config/aarch64/t-heap-trampoline: Move the heap trampoline
support functions from libgcc.a to libgcc_eh.a.
* config/i386/t-heap-trampoline: Likewise.
early-ra would allocate ptr to an FPR for the first asm, thus
leaving an FPR address in the second asm. The address was then
reloaded by LRA to make it valid.
But early-ra shouldn't be allocating at all in that kind of
situation. Doing so caused the ICE in the PR (with LDP fusion).
Fixed by making sure that we record address references as
GPR references.
gcc/
PR target/113623
* config/aarch64/aarch64-early-ra.cc (early_ra::preprocess_insns):
Mark all registers that occur in addresses as needing a GPR.
gcc/testsuite/
PR target/113623
* gcc.c-torture/compile/pr113623.c: New test.
aarch64: Handle debug references to removed registers [PR113636]
In this PR, we entered early-ra with quite a bit of dead code.
The code was duly removed (to avoid wasting registers), but there
was a dangling reference in debug instructions, which caused an
ICE later.
Fixed by resetting a debug instruction if it references a register
that is no longer needed by non-debug instructions.
gcc/
PR target/113636
* config/aarch64/aarch64-early-ra.cc (early_ra::replace_regs): Take
the containing insn as an extra parameter. Reset debug instructions
if they reference a register that is no longer used by real insns.
(early_ra::apply_allocation): Update calls accordingly.
gcc/testsuite/
PR target/113636
* go.dg/pr113636.go: New test.
Jakub Jelinek [Tue, 30 Jan 2024 08:58:05 +0000 (09:58 +0100)]
tree-ssa-strlen: Fix up handle_store [PR113603]
Since r10-2101-gb631bdb3c16e85f35d3 handle_store uses
count_nonzero_bytes{,_addr} which (more recently limited to statements
with the same vuse) can walk earlier statements feeding the rhs
of the store and call get_stridx on it.
Unlike most of the other functions where get_stridx is called first on
rhs and only later on lhs, handle_store calls get_stridx on the lhs before
the count_nonzero_bytes* call and does some si->nonzero_bytes comparison
on it.
Now, strinfo structures are refcounted and it is important not to screw
it up.
What happens on the following testcase is that we call get_strinfo on the
destination idx's base (g), which returns a strinfo at that moment
with refcount of 2, one copy referenced in bb 2 final strinfos, one in bb 3
(the vector of strinfos was unshared from the dominator there because some
other strinfo was added) and finally we process a store in bb 6.
Now, count_nonzero_bytes is called and that sees &g[1] in a PHI and
calls get_stridx on it, which in turn calls get_stridx_plus_constant
because &g + 1 address doesn't have stridx yet. This creates a new
strinfo for it:
si = new_strinfo (ptr, idx, build_int_cst (size_type_node, nonzero_chars),
basesi->full_string_p);
set_strinfo (idx, si);
and the latter call, because it is the first one in bb 6 that needs it,
unshares the stridx_to_strinfo vector (so refcount of the g strinfo becomes
3).
Now, get_stridx_plus_constant needs to chain the new strinfo of &g[1] in
between the related strinfos, so after the g record. Because the strinfo
is now shared between the current bb and 2 other bbs, it needs to
unshare_strinfo it (creating a new strinfo which can be modified as a copy
of the old one, decrementing refcount of the old shared one and setting
refcount of the new one to 1):
if (strinfo *nextsi = get_strinfo (chainsi->next))
{
nextsi = unshare_strinfo (nextsi);
si->next = nextsi->idx;
nextsi->prev = idx;
}
chainsi = unshare_strinfo (chainsi);
if (chainsi->first == 0)
chainsi->first = chainsi->idx;
chainsi->next = idx;
Now, the bug is that the caller of this a couple of frames above,
handle_store, holds on to a pointer to this g strinfo (but doesn't know
about the unsharing, so the pointer is to the old strinfo with refcount
of 2), and later needs to update it, so it calls
si = unshare_strinfo (si);
and modifies some fields in it.
This creates a new strinfo (with refcount of 1 which is stored into
the vector of the current bb) based on the old strinfo for g and
decrements refcount of the old one to 1. So, now we are in inconsistent
state, because the old strinfo for g is referenced in bb 2 and bb 3
vectors, but has just refcount of 1, and then have one strinfo (the one
created by unshare_strinfo (chainsi) in get_stridx_plus_constant) which
has refcount of 1 but isn't referenced from anywhere anymore.
Later on when we free one of the bb 2 or bb 3 vectors (forgot which)
that decrements refcount from 1 to 0 and poisons the strinfo/returns it to
the pool, but then maybe_invalidate when looking at the other bb's pointer
to it ICEs.
The following patch fixes it by calling get_strinfo again, it is guaranteed
to return non-NULL, but could be an unshared copy instead of the originally
fetched shared one.
I believe we only need to do this refetching for the case where get_strinfo
is called on the lhs before get_stridx is called on other operands, because
we should be always modifying (apart from the chaining changes) the strinfo
for the destination of the statements, not other strinfos just consumed in
there.
2024-01-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113603
* tree-ssa-strlen.cc (strlen_pass::handle_store): After
count_nonzero_bytes call refetch si using get_strinfo in case it
has been unshared in the meantime.
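The refcount problem and the refetch fix can be modelled in a few lines. This is a toy sketch, not the tree-ssa-strlen.cc code: `strinfo_pool`, `bb_table` and `store_update` are hypothetical simplifications, and the nested call that unshares the record is replaced by a direct call.

```cpp
#include <cassert>
#include <deque>
#include <map>

struct strinfo
{
  int idx;
  int refcount;
  int nonzero_chars;
};

// Owns all records; a deque keeps element addresses stable on push_back.
struct strinfo_pool
{
  std::deque<strinfo> storage;

  strinfo *create (int idx, int refcount, int nonzero_chars)
  {
    storage.push_back (strinfo{ idx, refcount, nonzero_chars });
    return &storage.back ();
  }
};

// The current block's view: idx -> record, possibly shared with other blocks.
struct bb_table
{
  strinfo_pool *pool = nullptr;
  std::map<int, strinfo *> slots;

  strinfo *get_strinfo (int idx)
  {
    auto it = slots.find (idx);
    return it == slots.end () ? nullptr : it->second;
  }

  // Copy-on-write: give this block a private copy of a shared record.
  strinfo *unshare_strinfo (strinfo *si)
  {
    if (si->refcount == 1)
      return si;
    strinfo *copy = pool->create (si->idx, 1, si->nonzero_chars);
    --si->refcount;
    slots[si->idx] = copy;
    return copy;
  }
};

// Returns the refcount left on the originally fetched record.
inline int
store_update (bb_table &bb, int idx, int new_len, bool refetch)
{
  strinfo *si = bb.get_strinfo (idx);
  assert (si != nullptr);
  strinfo *original = si;
  bb.unshare_strinfo (si); // stands in for the callee that unshares it
  if (refetch)
    si = bb.get_strinfo (idx); // the fix: pick up the private copy
  si = bb.unshare_strinfo (si);
  si->nonzero_chars = new_len;
  return original->refcount;
}
```

Starting from a record shared by three blocks, the refetching variant leaves the old record with a refcount of 2, matching the two other blocks that still reference it. The stale-pointer variant drops it to 1, which is the inconsistent state described above.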
The expansion of this builtin emits an error if the argument is not
INTEGER_CST, otherwise uses tree_to_uhwi on the argument (which is declared
int) and then uses EH_RETURN_DATA_REGNO macro which on most targets returns
INVALID_REGNUM for all values but some small number (2 or 4); if it returns
INVALID_REGNUM, we silently expand to -1.
Now, I think the error for non-INTEGER_CST makes sense to catch when people
unintentionally don't call it with a constant (but, users shouldn't really
use this builtin anyway, it is for the unwinder only). Initially I thought
about emitting an error for the negative values as well on which
tree_to_uhwi otherwise ICEs, but given that the function will silently
expand to -1 for INT_MAX - 1 or INT_MAX - 3 other values, I think treating
the negatives the same silently is fine too.
2024-01-30 Jakub Jelinek <jakub@redhat.com>
PR middle-end/101195
* except.cc (expand_builtin_eh_return_data_regno): If which doesn't
fit into unsigned HOST_WIDE_INT, return constm1_rtx.
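A rough model of the resulting behaviour, assuming a hypothetical target whose register lookup is valid only for the values 0 to 3. The names `eh_return_data_regno_sketch` and `INVALID_REGNUM_SKETCH` are stand-ins invented for this example, not the real macros.

```cpp
#include <cassert>
#include <climits>

const unsigned INVALID_REGNUM_SKETCH = ~0u;

// Hypothetical target hook: only a handful of small values are valid.
inline unsigned
eh_return_data_regno_sketch (unsigned long long n)
{
  return n < 4 ? static_cast<unsigned> (n) : INVALID_REGNUM_SKETCH;
}

// Returns the register number, or -1 for every unsupported argument.
inline long long
expand_regno_sketch (long long which)
{
  // Negative values do not fit an unsigned type; treat them like any
  // other out-of-range value instead of asserting.
  if (which < 0)
    return -1;
  unsigned regno
    = eh_return_data_regno_sketch (static_cast<unsigned long long> (which));
  return regno == INVALID_REGNUM_SKETCH ? -1 : static_cast<long long> (regno);
}
```

Negative and overly large arguments thus take the same silent path, which is the design choice argued for above.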
Jakub Jelinek [Tue, 30 Jan 2024 08:31:22 +0000 (09:31 +0100)]
testsuite: Fix up pr113622-{2,3}.c for i686-linux [PR113622]
The 2 new tests FAIL for me on i686-linux:
.../gcc/testsuite/gcc.target/i386/pr113622-2.c:5:14: error: data type of 'a' isn't suitable for a register
.../gcc/testsuite/gcc.target/i386/pr113622-2.c:5:29: error: data type of 'b' isn't suitable for a register
.../gcc/testsuite/gcc.target/i386/pr113622-2.c:5:44: error: data type of 'c' isn't suitable for a register
The problem is that the tests use vectors of double, something added
only in SSE2, while the testcases ask for just -msse which only provides
vectors of floats.
So, either it should be using floats instead of doubles, or we need
to add -msse2 to dg-options.
I've done the latter.
2024-01-30 Jakub Jelinek <jakub@redhat.com>
PR middle-end/113622
* gcc.target/i386/pr113622-2.c: Use -msse2 instead of -msse in
dg-options.
* gcc.target/i386/pr113622-3.c: Likewise.
Jin Ma [Mon, 29 Jan 2024 09:57:00 +0000 (17:57 +0800)]
RISC-V: THEAD: Fix improper immediate value for MODIFY_DISP instruction on 32-bit systems.
When using '%ld' to print a 'long long int' variable, 'fprintf' will
produce messy output on a 32-bit system, resulting in an incorrect
instruction being generated, such as 'th.lwib a1,(a0),-16,4294967295'. And the
following error occurred during compilation:
Nathaniel Shead [Sat, 27 Jan 2024 11:46:44 +0000 (22:46 +1100)]
c++: Handle error header names in modules [PR107594]
When there are no include paths while preprocessing a header-name token,
an empty STRING_CST is returned. This patch ensures this is handled when
attempting to create a module for this name.
The changes to strub-unsupported* were incorrect; those tests verify
the error messages issued when strub support is properly disabled with
TARGET_HAVE_STRUB_SUPPORT_FOR.
aarch64: fix handling of reversed mem ops in ldp/stp policy model
The current ldp/stp policy framework implementation would miss cases
where the memory operands were reversed. To address this, the call to
the framework function is moved after the lower mem check with the
suitable parameters.
This change removes the mode of aarch64_operands_ok_for_ldpstp, which
becomes unused.
gcc/ChangeLog:
* config/aarch64/aarch64-ldpstp.md: Remove unused mode.
* config/aarch64/aarch64-protos.h (aarch64_operands_ok_for_ldpstp):
Likewise.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Call on framework moved later.
Alexandre Oliva [Mon, 29 Jan 2024 18:08:35 +0000 (15:08 -0300)]
testsuite: require libc sym for -shared
Targets whose binutils support -shared, but that don't have a shared
libc, and that can't add PDC (non-PIC) to shared libraries, may
succeed at the effective target test for -shared, because it brings
nothing from libc, but tests that rely on -shared and that use bits
from libc, such as g++.dg/lto/pr108772, fail despite requiring the
shared effective target.
Extend the effective target test to bring in malloc() from libc, which
is likely to be present in libc and to bring a substantial amount of
code if no shared libc is available.
for gcc/testsuite/ChangeLog
* lib/target-supports.exp (check_effective_target_shared):
Check for a static-only libc.
Alexandre Oliva [Mon, 29 Jan 2024 18:08:32 +0000 (15:08 -0300)]
testsuite: no dfp run without dfprt
newlib-src/libc/include/sys/fenv.h doesn't define the FE_* macros that
libgcc expects to enable decimal float support. Only after newlib is
configured and built does an overriding header that defines those
macros become available in objdir/<target>/newlib/targ-include/, but
by then, libgcc has already been built without dfp and libbid.
This has exposed a number of tests that attempt to link dfp programs
without requiring a dfprt effective target.
dfp.exp already skips if dfp support is missing altogether, and sets
the default to compile rather than run if dfp support is present in
the compiler but missing in the runtime libraries.
However, some of the dfp tests override the default without requiring
dfprt. Drop the overriders where reasonable, and add the explicit
requirement elsewhere.
Jose E. Marchesi [Mon, 29 Jan 2024 16:47:00 +0000 (17:47 +0100)]
bpf: emit empty epilogues in naked functions
This patch fixes the BPF backend to not generate `exit' (return)
instructions in epilogues of functions that are declared as naked via
the corresponding compiler attribute. Having extra exit instructions
upsets the kernel BPF verifier.
Tested in bpf-unknown-none target in x86_64-linux-gnu host.
gcc/ChangeLog
* config/bpf/bpf.cc (bpf_expand_epilogue): Do not emit a return
instruction in naked function epilogues.
gcc/testsuite/ChangeLog
* gcc.target/bpf/naked-1.c: Update test to not expect an exit
instruction in naked function.
* gcc.target/bpf/naked-2.c: New test.
Jason Merrill [Fri, 26 Jan 2024 22:33:51 +0000 (17:33 -0500)]
c++: local class in generic lambda [PR113544]
My earlier commit r14-278-gd60cbbfaa9a3ad was a start toward better
handling of local classes in generic lambdas, but isn't actually useful by
itself and breaks this testcase, so let's revert it for now.
The rev16 pattern was not recognised anymore as a change in the bswap
tree pass was introducing a new GIMPLE form, not recognized by the
assembly final transformation pass.
Also, fix the output patterns for arm_rev16si_alt[12] to correctly
handle the instructions being made conditional.
PR target/108933
* gcc.target/arm/rev16.c: Moved to...
* gcc.target/arm/rev16_1.c: ...here.
* gcc.target/arm/rev16_2.c: New test to check that rev16 is emitted.
Richard Biener [Mon, 29 Jan 2024 09:24:39 +0000 (10:24 +0100)]
middle-end/113622 - handle store with variable index to register
The following implements storing to a non-MEM_P with a variable
offset. We usually avoid this by forcing expansion to memory but
this doesn't work for hard register variables. The solution is
to spill and operate on the stack.
PR middle-end/113622
* expr.cc (expand_assignment): Spill hard registers if
we index them with a variable offset.
* gcc.target/i386/pr113622-2.c: New testcase.
* gcc.target/i386/pr113622-3.c: Likewise.
Richard Biener [Mon, 29 Jan 2024 08:47:31 +0000 (09:47 +0100)]
middle-end/113622 - allow .VEC_SET and .VEC_EXTRACT for global hard regs
The following expands .VEC_SET and .VEC_EXTRACT instruction selection
to global hard registers, not only automatic variables (possibly)
promoted to registers. This can avoid some ICEs later and create
better code.
PR middle-end/113622
* gimple-isel.cc (gimple_expand_vec_set_extract_expr):
Also allow DECL_HARD_REGISTER variables.
Alex Coplan [Mon, 29 Jan 2024 13:28:04 +0000 (13:28 +0000)]
aarch64: Ensure iterator validity when updating debug uses [PR113616]
The fix for PR113089 introduced range-based for loops over the
debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a
debug insn, the use would get removed from the use list, and thus we
would end up using an invalidated iterator in the next iteration of the
loop. In practice this means we end up terminating the loop
prematurely, and hence ICE as in PR113089 since there are debug uses
that we failed to fix up.
This patch fixes that by introducing a general mechanism to avoid this
sort of problem. We introduce a safe_iterator to iterator-utils.h which
wraps an iterator, and also holds the end iterator value. It then
pre-computes the next iterator value at all iterations, so it doesn't
matter if the original iterator got invalidated during the loop body, we
can still move safely to the next iteration.
We introduce an iterate_safely helper which effectively adapts a
container such as iterator_range into a container of safe_iterators over
the original iterator type.
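The idea of pre-computing the next position can be sketched with standard containers. This is an illustration only: `safe_iterator_sketch` and `drop_even` are hypothetical names, and the real helper in iterator-utils.h may differ in interface and detail.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Wraps an iterator together with the end value and always keeps the
// next position computed before the loop body runs.
template <typename Iter>
class safe_iterator_sketch
{
public:
  safe_iterator_sketch (Iter begin, Iter end)
    : m_current (begin), m_next (begin), m_end (end)
  {
    if (m_next != m_end)
      ++m_next;
  }

  bool at_end () const { return m_current == m_end; }
  Iter get () const { return m_current; }

  // Never touches the (possibly invalidated) current iterator.
  void advance ()
  {
    m_current = m_next;
    if (m_next != m_end)
      ++m_next;
  }

private:
  Iter m_current, m_next, m_end;
};

// Erases elements during the walk; std::list::erase invalidates only
// the iterator of the erased element.
inline std::vector<int>
drop_even (std::list<int> values)
{
  typedef std::list<int>::iterator iter;
  for (safe_iterator_sketch<iter> it (values.begin (), values.end ());
       !it.at_end (); it.advance ())
    if (*it.get () % 2 == 0)
      values.erase (it.get ());
  return std::vector<int> (values.begin (), values.end ());
}
```

The scheme only protects against invalidation of the current element; a loop body that removes the following element would still break it.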
We then use iterate_safely around all loops over debug_insn_uses () in
the aarch64 ldp/stp pass to fix PR113616. While doing this, I
remembered that cleanup_tombstones () had the same problem. I
previously worked around this locally by manually maintaining the next
nondebug insn, so this patch also refactors that loop to use the new
iterate_safely helper.
While doing that I noticed that a couple of cases in cleanup_tombstones
could be converted from using dyn_cast<set_info *> to as_a<set_info *>,
which should be safe because there are no clobbers of mem in RTL-SSA, so
all defs of memory should be set_infos.
gcc/ChangeLog:
PR target/113616
* config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add):
Use iterate_safely when iterating over debug uses.
(fixup_debug_uses): Likewise.
(ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate
over nondebug insns instead of manually maintaining the next insn.
* iterator-utils.h (class safe_iterator): New.
(iterate_safely): New.
gcc/testsuite/ChangeLog:
PR target/113616
* gcc.c-torture/compile/pr113616.c: New test.
PR112950: Use #pragma GCC for including arm_sve.h.
gcc/testsuite/ChangeLog:
PR target/112950
* gcc.target/aarch64/sve/acle/general/dupq_5.c: Remove include directive
and instead use #pragma GCC for including arm_sve.h.
This was another PR caused by the way that
vect_determine_precisions_from_range handles shifts. We tried to
narrow 32768 >> x to a 16-bit shift based on range information for
the inputs and outputs, with vect_recog_over_widening_pattern
(after PR110828) adjusting the shift amount. But this doesn't
work for the case where x is in [16, 31], since then 32-bit
32768 >> x is a well-defined zero, whereas no well-defined
16-bit 32768 >> y will produce 0.
We could perhaps generate x < 16 ? 32768 >> x : 0 instead,
but since vect_determine_precisions_from_range was never really
supposed to rely on fix-ups, it seems better to fix that instead.
The patch also makes the code more selective about which codes
can be narrowed based on input and output ranges. This showed
that vect_truncatable_operation_p was missing cases for
BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR
(equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).
pr113281-1.c is the original testcase. pr113281-[23].c failed
before the patch due to overly optimistic narrowing. pr113281-[45].c
previously passed and are meant to protect against accidental
optimisation regressions.
gcc/
PR target/113281
* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove
workaround for right shifts.
(vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR.
(vect_determine_precisions_from_range): Be more selective about
which codes can be narrowed based on their input and output ranges.
For shifts, require at least one more bit of precision than the
maximum shift amount.
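The shift problem can be checked with ordinary integer arithmetic. In the sketch below, `shift_can_narrow` is a hypothetical predicate encoding the "one more bit than the maximum shift amount" rule, and `shift_narrow_guarded` is the fix-up form mentioned above as a possible alternative; neither is taken from the vectorizer sources.

```cpp
#include <cassert>
#include <cstdint>

// The original 32-bit shift: well defined for amounts 0..31.
inline uint32_t
shift_wide (uint32_t value, unsigned amount)
{
  assert (amount < 32);
  return value >> amount;
}

// Hypothetical narrowing check: the narrow precision must exceed the
// largest shift amount that can occur.
inline bool
shift_can_narrow (unsigned max_shift, unsigned narrow_precision)
{
  return narrow_precision > max_shift;
}

// Guarded 16-bit form, i.e. x < 16 ? value >> x : 0.
inline uint16_t
shift_narrow_guarded (uint16_t value, unsigned amount)
{
  return amount < 16 ? static_cast<uint16_t> (value >> amount) : 0;
}
```

For a shift amount in [16, 31] the wide shift yields a well-defined zero, while the predicate rejects narrowing to 16 bits because 16 is not greater than 31.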
Tobias Burnus [Mon, 29 Jan 2024 12:06:27 +0000 (13:06 +0100)]
nvptx.opt: Add sm_89 and sm_90a to -march-map=
The -march-map= option maps the compute capability to the closest
lower compute capability that has been implemented; for sm_89 and
sm_90a, that were previously missing, that's currently -march=sm_80
alias -misa=sm_80.
gcc/ChangeLog:
* config/nvptx/nvptx.opt (march-map=): Add sm_89 and sm_90a.
Tobias Burnus [Mon, 29 Jan 2024 10:10:33 +0000 (11:10 +0100)]
gcn/mkoffload.cc: Fix SRAM_ECC and XNACK handling [PR111966]
Some more '-g' fixes as the .mkoffload.dbg.o debug file has elf flags
which did not match those generated for the compilation, leading to linker
errors. For .mkoffload.dbg.o, the elf flags are generated by mkoffload
itself - while for the other .o files, that's done by the compiler via
setting default and mainly via the ASM_SPEC.
This is a follow up to r14-8332-g13127dac106724 which fixed an issue
caused by the default arch. In this patch, it is mainly for gfx1100
and gfx1030 which always failed. It also affects gfx906 and possibly
gfx900 but only when using the -mxnack/-msram-ecc flags explicitly.
What happens on the compiler side is mainly determined by gcn-hsa.h's
and otherwise by some default setting. In particular for xnack and
sram_ecc, there is:
For gfx1100 and gfx1030, neither xnack nor sram_ecc is set (only
'+wavefrontsize64').
For fiji, gfx900, gfx906 and gfx908 there is always -mattr=-xnack and
for all but gfx908 also -msram-ecc=no - independent of what has been
passed to the compiler. However, on the elf flags, the result differs:
For fiji, due to the HSACOv3, it is always set to 0 via
copy_early_debug_info; for gfx900, gfx906 and gfx908, xnack is OFF.
For sram-ecc, it is 'unset' for gfx900, 'any' for gfx906 and for
gfx908 it is 'any' unless overridden.
For gfx90a, the -msram-ecc= and -mxnack= are passed on, or if not present,
...=any is passed on. Note that this "any" is different from the argument
not being present at elf flag level:
For XNACK: unset/unsupported is 0, any = 0x100, off = 0x200, on = 0x300.
For SRAMECC: unset/unsupported is 0, any = 0x400, off = 0x800, on = 0xc00.
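The values above amount to two 2-bit fields, which can be sketched as follows. The enum, the shift constants and the helper names are assumptions made for this example; only the numeric values come from the text.

```cpp
#include <cassert>
#include <cstdint>

// The four settings share the same encoding in both fields.
enum class feature_setting : uint32_t { unset = 0, any = 1, off = 2, on = 3 };

const unsigned XNACK_SHIFT = 8;    // any = 0x100, off = 0x200, on = 0x300
const unsigned SRAMECC_SHIFT = 10; // any = 0x400, off = 0x800, on = 0xc00
const uint32_t FIELD_MASK = 0x3;

// Replace one 2-bit field, leaving all other flag bits untouched.
inline uint32_t
set_field (uint32_t flags, unsigned shift, feature_setting s)
{
  return (flags & ~(FIELD_MASK << shift))
	 | (static_cast<uint32_t> (s) << shift);
}

// Extract one 2-bit field.
inline feature_setting
get_field (uint32_t flags, unsigned shift)
{
  return static_cast<feature_setting> ((flags >> shift) & FIELD_MASK);
}
```

Seen this way, "any" (1) and "unset" (0) are distinct field values, which is why passing ...=any is not the same as passing nothing.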
The obstack_ptr_grow changes are more to avoid confusion than having an
actual effect as they would otherwise be filtered out via the ASM_SPEC.
gcc/ChangeLog:
PR other/111966
* config/gcn/mkoffload.cc (SET_XNACK_UNSET, TEST_SRAM_ECC_UNSET): New.
(SET_SRAM_ECC_UNSUPPORTED): Renamed to ...
(SET_SRAM_ECC_UNSET): ... this.
(copy_early_debug_info): Remove gfx900 special case, now handled as
part of the generic handling.
(main): Update SRAM_ECC and XNACK for the -march as done in gcn-hsa.h.
On the following testcase we emit an invalid range of [2, 1] due to
UB in the source. Older VRP code silently swapped the boundaries and
made [1, 2] range out of it, but newer code just ICEs on it.
The reason for pdata->minlen 2 is that we see a memcpy in this case
setting both elements of the array to non-zero value, so strlen (a)
can't be smaller than 2. The reason for pdata->maxlen 1 is that in
char a[2] array without UB there can be at most 1 non-zero character
because there needs to be '\0' termination in the buffer too.
IMHO we shouldn't create invalid ranges like that and even creating
for that case a range [1, 2] looks wrong to me, so the following patch
just doesn't set maxlen in that case to the array size - 1, matching
what will really happen at runtime when triggering such UB (strlen will
be at least 2, perhaps more or will crash).
This is what the second hunk of the patch does.
The first hunk fixes a fortunately harmless thinko.
If the strlen pass knows the string length (i.e. get_string_length
function returns non-NULL), we take a different path, we get to this
only if all we know is that there are a certain number of non-zero
characters but we don't know what it is followed with, whether further
non-zero characters or zero termination or either of that.
If we know exactly how many non-zero characters it is, such as
char a[42];
...
memcpy (a, "01234567890123456789", 20);
then we take an earlier if for the INTEGER_CST case and set correctly
just pdata->minlen to 20 in that case, but if we have something like
int len;
...
if (len < 15 || len > 32) return;
memcpy (a, "0123456789012345678901234567890123456789", len);
then we have [15, 32] range for the nonzero_chars and we set pdata->minlen
correctly to 15, but incorrectly set also pdata->maxlen to 32. That is
not what the above implies, it just means that in some cases we know that
there are at least 32 non-zero characters, followed by something we don't
know. There is no guarantee that there is '\0' right after it, so it
means nothing.
The reason this is harmless, just confusing, is that the code a few lines
later fortunately overwrites this incorrect pdata->maxlen value with
something different (either array length - 1 or all ones etc.).
2024-01-29 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110603
* tree-ssa-strlen.cc (get_range_strlen_dynamic): Remove incorrect
setting of pdata->maxlen to vr.upper_bound (which is unconditionally
overwritten anyway). Avoid creating invalid range with minlen
larger than maxlen. Formatting fix.
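The second hunk's rule can be illustrated with a small helper. This is a sketch under assumptions: `strlen_range`, `UNBOUNDED_LEN` and `range_for_array` are hypothetical, and using an all-ones sentinel for the untouched maximum is a choice made for the example.

```cpp
#include <cassert>
#include <cstddef>

struct strlen_range
{
  size_t minlen;
  size_t maxlen;
};

// Assumed sentinel meaning "no upper bound known".
const size_t UNBOUNDED_LEN = static_cast<size_t> (-1);

// KNOWN_NONZERO leading characters are known to be non-zero in an
// array of ARRAY_SIZE chars.
inline strlen_range
range_for_array (size_t known_nonzero, size_t array_size)
{
  assert (array_size > 0);
  strlen_range r = { known_nonzero, UNBOUNDED_LEN };
  // Only cap the maximum when that keeps the range valid; with UB in
  // the source (minlen > size - 1) the cap would give e.g. [2, 1].
  if (known_nonzero <= array_size - 1)
    r.maxlen = array_size - 1;
  return r;
}
```

For `char a[2]` with both elements known to be non-zero, this yields a minimum of 2 and no upper bound rather than the invalid [2, 1].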
Richard Biener [Fri, 26 Jan 2024 14:11:47 +0000 (15:11 +0100)]
debug/103047 - argument order of inlined functions
The inliner puts variables for parameters of the inlined functions
in the inline scope in reverse order. The following reverses them
again so that we get consistent ordering between the
DW_TAG_subprogram DW_TAG_formal_parameter and the
DW_TAG_inlined_subroutine DW_TAG_formal_parameter set.
I failed to create a testcase with regexps since the inline
instances have just abstract origins and so I can't match them up.
PR debug/103047
* tree-inline.cc (initialize_inlined_parameters): Reverse
the decl chain of inlined parameters.
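Reversing a chain of declarations in place is a standard singly linked list reversal. The `decl_sketch` type and `reverse_chain` helper below are hypothetical stand-ins used to show the operation, not the tree data structures themselves.

```cpp
#include <cassert>

// Minimal stand-in for a declaration linked through a chain pointer.
struct decl_sketch
{
  const char *name;
  decl_sketch *chain;
};

// Reverse the chain in place and return the new head.
inline decl_sketch *
reverse_chain (decl_sketch *head)
{
  decl_sketch *prev = nullptr;
  while (head)
    {
      decl_sketch *next = head->chain;
      head->chain = prev;
      prev = head;
      head = next;
    }
  return prev;
}
```

If parameters a, b, c ended up chained as c, b, a, one such pass restores the source order.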
Libatomic: Add checks in ifunc selectors for LSE/LSE2 requirements.
At present, evaluation of both `has_lse2(hwcap)' and
`has_lse128(hwcap)' may require issuing an `mrs' instruction to query
a system register. This instruction, when issued from user-space
results in a trap by the kernel which then returns the value read in
by the system register. Given the undesirable nature of the
computational expense associated with the context switch, it is
important to implement mechanisms to, wherever possible, forgo the
operation.
In light of this, given how other architectural requirements serving
as prerequisites have long been assigned HWCAP bits by the kernel, we
can inexpensively query for their availability before attempting to
read any system registers. Where one of these early tests fails, we
can assert that the main feature of interest (be it LSE2 or LSE128)
cannot be present, allowing us to return from the function early and
skip the unnecessary expensive kernel-mediated access to system
registers.
libatomic/ChangeLog:
* config/linux/aarch64/host-config.h (has_lse2): Add test for LSE.
(has_lse128): Add test for LSE2.
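The early-exit pattern can be sketched as below. The bit value `HWCAP_LSE_SKETCH`, the counter and `slow_probe` are hypothetical; the probe merely stands in for the trapping system register read and always reports the feature as present.

```cpp
#include <cassert>

// Hypothetical HWCAP bit for the prerequisite feature (value invented).
const unsigned long HWCAP_LSE_SKETCH = 1ul << 0;

// Counts how often the expensive path is taken.
int slow_probe_calls = 0;

// Stand-in for the kernel-mediated system register read.
inline bool
slow_probe ()
{
  ++slow_probe_calls;
  return true;
}

// Check the cheap prerequisite bit first; only then pay for the probe.
inline bool
has_lse2_sketch (unsigned long hwcap, bool (*probe) ())
{
  if (!(hwcap & HWCAP_LSE_SKETCH))
    return false; // prerequisite missing, so the feature cannot be present
  return probe ();
}
```

When the prerequisite bit is clear, the function returns without ever calling the probe, which is the saving described above.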
libatomic: Enable LSE128 128-bit atomics for Armv9.4-a
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:
* LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
value held in a pair of registers, with original data loaded into
the same 2 registers.
* LDSETP - Atomic OR (bitset) of a location with 128-bit value held
in a pair of registers, with original data loaded into the same 2
registers.
* SWPP - Atomic swap of one 128-bit value with 128-bit value held
in a pair of registers.
It is worth noting that in keeping with existing 128-bit atomic
operations in `atomic_16.S', we have chosen to merge certain
less-restrictive orderings into more restrictive ones. This is done
to minimize the number of branches in the atomic functions, minimizing
both the likelihood of branch mispredictions and, in keeping code
small, limit the need for extra fetch cycles.
Past benchmarking has revealed that acquire is typically slightly
faster than release (5-10%), such that for the most frequently used
atomics (CAS and SWP) it makes sense to add support for acquire, as
well as release.
Likewise, it was identified that combining acquire and release typically
results in little to no penalty, such that it is of negligible benefit
to distinguish between release and acquire-release, making the
combining release/acq_rel/seq_cst a worthwhile design choice.
This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler available.
In order to do this, the following changes are made:
1. Add a configure-time check to check for LSE128 support in the
assembler.
2. Edit host-config.h so that when N == 16, nifunc = 2.
3. Where available due to LSE128, implement the second ifunc, making
use of the novel instructions.
4. For atomic functions unable to make use of these new
instructions, define a new alias which causes the _i1 function
variant to point ahead to the corresponding _i2 implementation.
libatomic: Add support for __ifunc_arg_t arg in ifunc resolver
With support for new atomic features in Armv9.4-a being indicated by
HWCAP2 bits, Libatomic's ifunc resolver must now query its second
argument, of type __ifunc_arg_t*.
We therefore make this argument known to libatomic, allowing us to
query hwcap2 bits in the following manner:
* config/linux/aarch64/host-config.h (__ifunc_arg_t):
Conditionally-defined if `sys/ifunc.h' not found.
(_IFUNC_ARG_HWCAP): Likewise.
(IFUNC_COND_1): Pass __ifunc_arg_t argument to ifunc.
(ifunc1): Modify function signature to accept __ifunc_arg_t
argument.
* configure.tgt: Add second `const __ifunc_arg_t *features'
argument to IFUNC_RESOLVER_ARGS.
libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i<n>' suffixes to functions
cumbersome to work with. It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.
This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:
#define LSE(NAME) NAME##_i1
Where we wish to generate ifunc names with the pre-processor's token
concatenation feature, we add a level of indirection to previous macro
calls. If before we would have had `MACRO(<name>_i<n>)', we now have
`MACRO_FEAT(name, feature)'. Where we wish to refer to base
functionality (i.e., functions where ifunc suffixes are absent), the
original `MACRO(<name>)' may be used to bypass suffixing.
Consequently, for base functionality, where the ifunc suffix is
absent, the macro interface remains the same. For example, the entry
and endpoints of `libat_store_16' remain defined by:
ENTRY (libat_store_16)
and
END (libat_store_16)
For the LSE2 implementation of the same 16-byte atomic store, we now
have:
ENTRY_FEAT (libat_store_16, LSE2)
and
END_FEAT (libat_store_16, LSE2)
For the aliasing of function names, we define the following new
implementation of the ALIAS macro:
ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)
Defining the `CORE(NAME)' macro to be the identity operator, it
returns the base function name unaltered and allows us to alias
target-specific ifuncs to the corresponding base implementation.
For example, we'd alias the LSE2 `libat_exchange_16' to its base
implementation with:
ALIAS (libat_exchange_16, LSE2, CORE)
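The suffix mapping can be tried out with the preprocessor alone. In this sketch the `CALL_FEAT` macros and the `libat_demo_16` functions are hypothetical, and mapping LSE2 to `_i1` is an assumption for the example; the real macros emit assembler labels rather than calls.

```cpp
#include <cassert>

// Feature macros: map a base name to its suffixed variant.
#define CORE(NAME) NAME
#define LSE2(NAME) NAME##_i1

// Two-level form: the feature macro is applied to the name first, and
// the resulting token is handed on to the second macro.
#define CALL_FEAT(NAME, FEAT) CALL_FEAT1 (FEAT (NAME))
#define CALL_FEAT1(NAME) NAME ()

// Hypothetical base and feature-specific implementations.
inline int libat_demo_16 () { return 0; }
inline int libat_demo_16_i1 () { return 1; }

// Expands to libat_demo_16 ().
inline int call_core () { return CALL_FEAT (libat_demo_16, CORE); }

// Expands to libat_demo_16_i1 ().
inline int call_lse2 () { return CALL_FEAT (libat_demo_16, LSE2); }
```

Because CORE is the identity, the same two-argument macro covers both the base function and its feature-specific variants, so a suffix only has to be changed in one place.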
libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (CORE): New macro.
(LSE2): Likewise.
(ENTRY_FEAT): Likewise.
(ENTRY_FEAT1): Likewise.
(END_FEAT): Likewise.
(END_FEAT1): Likewise.
(ALIAS): Modify macro to take in `arch' arguments.
(ALIAS1): New.
Harald Anlauf [Thu, 25 Jan 2024 21:19:10 +0000 (22:19 +0100)]
Fortran: NULL actual to optional dummy with VALUE attribute [PR113377]
gcc/fortran/ChangeLog:
PR fortran/113377
* trans-expr.cc (conv_dummy_value): Treat NULL actual argument to
optional dummy with the VALUE attribute as not present.
(gfc_conv_procedure_call): Likewise.
gcc/testsuite/ChangeLog:
PR fortran/113377
* gfortran.dg/optional_absent_11.f90: New test.
Iain Sandoe [Thu, 25 Jan 2024 20:11:09 +0000 (20:11 +0000)]
Objective-C, Darwin: Do not overalign CFStrings and Objective-C metadata.
We have reports of regressions in both Objective-C and Objective-C++ on
Darwin23 (macOS 14). In some cases, these are linker warnings about the
alignment of CFString constants; in other cases the built executables
crash during runtime initialization. The underlying issue is the same in
both cases; since the objects (CFStrings, Objective-C meta-data) are TU-
local, we are choosing to increase their alignment for efficiency - to
values greater than ABI alignment.
However, although these objects are TU-local, they are also visible to the
linker (since they are placed in specific named sections). In many cases
the metadata can be regarded as tables of data, and thus it is expected
that these sections can be concatenated from multiple TUs and the data
treated as tabular. In order for this to work the data cannot be allowed
to exceed ABI alignment - which leads to the crashes.
For GCC-15+ it would be nice to find a more elegant solution to this issue
(perhaps by adjusting the concept of binds-locally to exclude specific
named sections) - but I do not want to do that in stage 4.
The solution here is to force the alignment to be preserved as created by
setting DECL_USER_ALIGN on the relevant objects.
gcc/ChangeLog:
* config/darwin.cc (darwin_build_constant_cfstring): Prevent over-
alignment of CFString constants by setting DECL_USER_ALIGN.
Two of the encode testcases include '-lobjc' as their dg-options.
Since the library is already appended as part of the generic testsuite
handling, this means that two instances appear on the link line leading
to spurious warnings from Darwin's new linker.
Iain Sandoe [Tue, 16 Jan 2024 10:21:14 +0000 (10:21 +0000)]
Fix __builtin_nested_func_ptr_{created,deleted} symbol versions [PR113402]
The symbols for the functions supporting heap-based trampolines were
exported at an incorrect symbol version, the following patch fixes that.
As requested in the PR, this also renames __builtin_nested_func_ptr* to
__gcc_nested_func_ptr*. In carrying out the rename, we move the builtins
to use DEF_EXT_LIB_BUILTIN.
PR libgcc/113402
gcc/ChangeLog:
* builtins.cc (expand_builtin): Handle BUILT_IN_GCC_NESTED_PTR_CREATED
and BUILT_IN_GCC_NESTED_PTR_DELETED.
* builtins.def (BUILT_IN_GCC_NESTED_PTR_CREATED,
BUILT_IN_GCC_NESTED_PTR_DELETED): Make these builtins LIB-EXT and
rename the library fallbacks to __gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted.
* doc/invoke.texi: Rename these to __gcc_nested_func_ptr_created
and __gcc_nested_func_ptr_deleted.
* tree-nested.cc (finalize_nesting_tree_1): Use builtin_explicit for
BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED.
* tree.cc (build_common_builtin_nodes): Build the
BUILT_IN_GCC_NESTED_PTR_CREATED and BUILT_IN_GCC_NESTED_PTR_DELETED local
builtins only for non-explicit.
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c: Rename
__builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted.
* config/i386/heap-trampoline.c: Likewise.
* libgcc2.h: Likewise.
* libgcc-std.ver.in (GCC_7.0.0): Likewise and then move
__gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted from this symbol version to ...
(GCC_14.0.0): ... this one.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
Iain Sandoe [Sat, 13 Jan 2024 21:14:07 +0000 (21:14 +0000)]
testsuite, jit: Stabilize error output.
Currently when a test fails, we print out a lot of information,
this includes items that are not stable between invocations (e.g.
the PID for the executable). That makes automated comparisons
between test runs flag any persistent fails as new ones each time
which is not usually what is wanted.
This patch amends the error output to drop the variable portion
of the message and retain items that should only change if the
failure mode changes.
gcc/testsuite/ChangeLog:
* jit.dg/jit.exp: Filter error output to remove per-run
variable content.
Jose E. Marchesi [Sat, 27 Jan 2024 19:08:12 +0000 (20:08 +0100)]
bpf: add constant pointer to helper-skb-ancestor-cgroup-id.c test
The purpose of this test is to make sure that constant propagation is
achieved with the proper optimization level, so a BPF call instruction
to a kernel helper is generated. This patch updates the test so it
also covers kernel helpers defined with constant static pointers.
Harald Anlauf [Sat, 27 Jan 2024 16:41:43 +0000 (17:41 +0100)]
Fortran: fix bounds-checking errors for CLASS array dummies [PR104908]
Commit r11-1235 addressed issues with bounds of unlimited polymorphic array
dummies. However, using the descriptor from sym->backend_decl does break
the case of CLASS array dummies. The obvious solution is to restrict the
fix to the unlimited polymorphic case, thus keeping the original descriptor
in the ordinary case.
gcc/fortran/ChangeLog:
PR fortran/104908
* trans-array.cc (gfc_conv_array_ref): Restrict use of transformed
descriptor (sym->backend_decl) to the unlimited polymorphic case.
gcc/testsuite/ChangeLog:
PR fortran/104908
* gfortran.dg/pr104908.f90: New test.
H.J. Lu [Tue, 23 Jan 2024 14:59:51 +0000 (06:59 -0800)]
x86: Don't save callee-saved registers in noreturn functions
There is no need to save callee-saved registers in noreturn functions
if they neither throw nor support exceptions. We can treat them the same
as functions with no_callee_saved_registers attribute.
Adjust stack-check-17.c for noreturn function which no longer saves any
registers.
With this change, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from
do_exit:
endbr64
call <do_exit+0x9>
sub $0x28,%rsp
mov %rdi,%r12
mov %gs:0x28,%rax
mov %rax,0x20(%rsp)
xor %eax,%eax
mov %gs:0x0,%rbx
call *0x0(%rip) # <do_exit+0x2f>
test $0x2,%ah
je <do_exit+0x8c9>
I compared GCC master branch bootstrap and test times on a slow machine
with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
with the backported patch. The performance data isn't precise since the
measurements were done on different days with different GCC sources under
different 6.6 kernel versions.
GCC master branch build time in seconds:
before after improvement
30043.75user 30013.16user 0%
1274.85system 1243.72system 2.4%
GCC master branch test time in seconds (new tests added):
before after improvement
216035.90user 216547.51user 0
27365.51system 26658.54system 2.6%
gcc/
PR target/38534
* config/i386/i386-options.cc (ix86_set_func_type): Don't
save and restore callee saved registers for a noreturn function
with nothrow or compiled with -fno-exceptions.
H.J. Lu [Tue, 23 Jan 2024 14:59:50 +0000 (06:59 -0800)]
x86: Add no_callee_saved_registers function attribute
When an interrupt handler is implemented by an assembly stub which does:
1. Save all registers.
2. Call a C function.
3. Restore all registers.
4. Return from interrupt.
it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.
Add no_callee_saved_registers function attribute, which is complementary
to no_caller_saved_registers function attribute, to mark a function which
doesn't have any callee-saved registers. Such a function won't save and
restore any registers. Classify function call-saved register handling
type with:
1. Default call-saved registers.
2. No caller-saved registers with no_caller_saved_registers attribute.
3. No callee-saved registers with no_callee_saved_registers attribute.
Disallow sibcall if callee is a no_callee_saved_registers function
and caller isn't a no_callee_saved_registers function. Otherwise,
callee-saved registers won't be preserved.
After a no_callee_saved_registers function is called, all registers may
be clobbered. If the calling function isn't a no_callee_saved_registers
function, we need to preserve all registers which aren't used by function
calls.
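A minimal sketch of how a C function called from such an assembly stub might be declared. The names NO_CALLEE_SAVED, handle_irq, irq_seen and irq_count are hypothetical; the attribute is only applied when the compiler reports support for it through __has_attribute, and elsewhere the function is compiled as an ordinary one.

```c
/* Apply the attribute only where the compiler advertises it.  */
#if defined (__has_attribute)
# if __has_attribute (no_callee_saved_registers)
#  define NO_CALLEE_SAVED __attribute__ ((no_callee_saved_registers))
# endif
#endif
#ifndef NO_CALLEE_SAVED
# define NO_CALLEE_SAVED
#endif

/* Hypothetical per-vector counters updated by the handler.  */
static volatile unsigned long irq_count[4];

/* C function meant to be called from a stub that has already saved all
   registers, so it does not need to save or restore any itself.  */
NO_CALLEE_SAVED void
handle_irq (unsigned vector)
{
  irq_count[vector & 3]++;
}

/* Ordinary function used to observe the effect of the handler.  */
unsigned long
irq_seen (unsigned vector)
{
  return irq_count[vector & 3];
}
```

When handle_irq is called from ordinary C code instead of a stub, the caller has to assume that all registers may be clobbered, as described above.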
gcc/
PR target/103503
PR target/113312
* config/i386/i386-expand.cc (ix86_expand_call): Replace
no_caller_saved_registers check with call_saved_registers check.
Clobber all registers that are not used by the callee with
no_callee_saved_registers attribute.
* config/i386/i386-options.cc (ix86_set_func_type): Set
call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
noreturn function. Disallow no_callee_saved_registers with
interrupt or no_caller_saved_registers attributes together.
(ix86_set_current_function): Replace no_caller_saved_registers
check with call_saved_registers check.
(ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
(ix86_handle_call_saved_registers_attribute): This.
(ix86_gnu_attributes): Add
ix86_handle_call_saved_registers_attribute.
* config/i386/i386.cc (ix86_conditional_register_usage): Replace
no_caller_saved_registers check with call_saved_registers check.
(ix86_function_ok_for_sibcall): Don't allow callee with
no_callee_saved_registers attribute when the calling function
has callee-saved registers.
(ix86_comp_type_attributes): Also check
no_callee_saved_registers.
(ix86_epilogue_uses): Replace no_caller_saved_registers check
with call_saved_registers check.
(ix86_hard_regno_scratch_ok): Likewise.
(ix86_save_reg): Replace no_caller_saved_registers check with
call_saved_registers check. Don't save any registers for
TYPE_NO_CALLEE_SAVED_REGISTERS. Save all registers with
TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
no_callee_saved_registers attribute is called.
(find_drap_reg): Replace no_caller_saved_registers check with
call_saved_registers check.
* config/i386/i386.h (call_saved_registers_type): New enum.
(machine_function): Replace no_caller_saved_registers with
call_saved_registers.
* doc/extend.texi: Document no_callee_saved_registers attribute.
Jakub Jelinek [Sat, 27 Jan 2024 12:06:55 +0000 (13:06 +0100)]
lower-bitint: Avoid sign-extending cast to unsigned types feeding div/mod/float [PR113614]
The following testcase is miscompiled, because some narrower value
is sign-extended to wider unsigned _BitInt used as division operand.
handle_operand_addr for that case returns the narrower value and
precision -prec_of_narrower_value. That works fine for multiplication
(at least, normal multiplication, but we don't merge casts with
.MUL_OVERFLOW or the ubsan multiplication right now), because the
result is the same whether we treat the arguments as signed or unsigned.
But it is completely wrong for division/modulo or conversions to
floating-point: if we pass a negative prec for an input operand of a libgcc
handler, the handler treats it as a negative number, not as an unsigned one
sign-extended from something smaller (and it doesn't know to what precision
it has been extended).
So, the following patch fixes it by making sure we don't merge such
sign-extensions to unsigned _BitInt type with division, modulo or
conversions to floating point.
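The difference between multiplication and division can be reproduced with ordinary fixed-width integers. This is only an analogy using int8_t and uint32_t in place of _BitInt types, and all function names are made up for the illustration.

```c
#include <stdint.h>

/* Sign-extend a narrow signed value into a wider unsigned type.  */
uint32_t
widen_signed (int8_t narrow)
{
  return (uint32_t) (int32_t) narrow;
}

/* Multiplication: the low 32 bits are the same whether the operand is
   treated as the widened unsigned value or as the narrow signed one.  */
uint32_t
mul_as_unsigned (int8_t narrow, uint32_t k)
{
  return widen_signed (narrow) * k;
}

uint32_t
mul_as_signed (int8_t narrow, uint32_t k)
{
  return (uint32_t) ((int64_t) narrow * (int64_t) k);
}

/* Division: the two interpretations give different results.  */
uint32_t
div_as_unsigned (int8_t narrow, uint32_t k)
{
  return widen_signed (narrow) / k;
}

int32_t
div_as_signed (int8_t narrow, int32_t k)
{
  return narrow / k;
}
```

For narrow = -2, both multiplications by 5 agree, while the unsigned division by 3 yields 1431655764 and the signed one yields 0. The second result is the kind of wrong answer a handler produces when it is handed the narrow value with a negative precision.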
2024-01-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113614
* gimple-lower-bitint.cc (gimple_lower_bitint): Don't merge
widening casts from signed to unsigned types with TRUNC_DIV_EXPR,
TRUNC_MOD_EXPR or FLOAT_EXPR uses.
Jakub Jelinek [Sat, 27 Jan 2024 12:06:17 +0000 (13:06 +0100)]
lower-bitint: Fix up VIEW_CONVERT_EXPR handling in lower_mergeable_stmt [PR113568]
We generally allow merging mergeable stmts with some final cast (but not
further casts or mergeable operations after the cast). As some casts
are handled conditionally, if (idx < cst) handle_operand (idx); else if
(idx == cst) handle_operand (cst); else ..., we must make sure that e.g. the
mergeable PLUS_EXPR/MINUS_EXPR/NEGATE_EXPR never appear in handle_operand
called from such casts, because it ICEs on invalid SSA_NAME form (that part
could be fixable by adding further PHIs) but also because we'd need to
correctly propagate the overflow flags from the if to else if.
So, instead lower_mergeable_stmt handles an outermost widening cast (or
widening cast feeding outermost store) specially.
The problem was similar to PR113408, that VIEW_CONVERT_EXPR tree is
present in the gimple_assign_rhs1 while it is not for NOP_EXPR/CONVERT_EXPR,
so the checks whether the outermost cast should be handled didn't handle
the VCE case and so handle_plus_minus was called from the conditional
handle_cast.
2024-01-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113568
* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt):
For VIEW_CONVERT_EXPR use first operand of rhs1 instead of rhs1
in the widening extension checks.
Jakub Jelinek [Sat, 27 Jan 2024 12:05:30 +0000 (13:05 +0100)]
lower-bitint: Add debugging dump of SSA_NAME -> decl mappings
While the SSA coalescing performed by lower bitint prints some information
if -fdump-tree-bitintlower-details, it is really hard to read and doesn't
contain the most important information which one looks for when debugging
bitint lowering issues, namely what VAR_DECLs (or PARM_DECLs/RESULT_DECLs)
each SSA_NAME in large_huge.m_names bitmap maps to.
So, the following patch adds dumping of that, so that we know that say
_3 -> bitint.3
_8 -> bitint.7
_16 -> bitint.7
etc.
2024-01-27 Jakub Jelinek <jakub@redhat.com>
* gimple-lower-bitint.cc (gimple_lower_bitint): For
TDF_DETAILS dump mapping of SSA_NAMEs to decls.
Lewis Hyatt [Tue, 5 Dec 2023 16:33:39 +0000 (11:33 -0500)]
c-family: Fix ICE with large column number after restoring a PCH [PR105608]
Users are allowed to define macros prior to restoring a precompiled header
file, as long as those macros are not defined (or are defined identically)
in the PCH. However, the PCH restoration process destroys all the macro
definitions, so libcpp has to record them before restoring the PCH and then
redefine them afterward.
This process does not currently assign great locations to the macros after
redefining them. Some work is needed to also remember the original locations
and get the line_maps instance in the right state (since, like all other
data structures, the line_maps instance is also reset after restoring a PCH).
The new testcase line-map-3.C contains XFAILed examples where the locations
are wrong.
This patch addresses a more pressing issue, which is that we ICE in some
cases since GCC 11, hitting an assert in line-maps.cc. It happens if the
first line encountered after the PCH restore requires an LC_RENAME map, such
as will happen if the line is sufficiently long. This is much easier to
fix, since we just need to call linemap_line_start before asking libcpp to
redefine the stored macros, instead of afterward, to avoid the unexpected
need for an LC_RENAME before an LC_ENTER has been seen.
gcc/c-family/ChangeLog:
PR preprocessor/105608
* c-pch.cc (c_common_read_pch): Start a new line map before asking
libcpp to restore macros defined prior to reading the PCH, instead
of afterward.
gcc/testsuite/ChangeLog:
PR preprocessor/105608
* g++.dg/pch/line-map-1.C: New test.
* g++.dg/pch/line-map-1.Hs: New test.
* g++.dg/pch/line-map-2.C: New test.
* g++.dg/pch/line-map-2.Hs: New test.
* g++.dg/pch/line-map-3.C: New test.
* g++.dg/pch/line-map-3.Hs: New test.
c/c++: Tweak warning for 'always_inline function might not be inlinable'
When you're not regularly exposed to this warning, it is
easy to be misled by its wording, believing that there's
something else in the function that stops it from being
inlined, something other than the lack of also being
*declared* inline. Also, clang does not warn.
It's just a warning: without the inline directive, there has
to be a secondary reason for the function to be inlined,
other than the always_inline attribute, a reason that may be
in effect despite the warning.
Whenever the text is quoted in inline-related bugzilla
entries, there seems to often have been an initial step of
confusion that has to be cleared, for example in PR55830.
A file in the powerpc-specific parts of the test-suite,
gcc.target/powerpc/vec-extract-v16qiu-v2.h, has a comment
and seems to be another example, and I testify as the
first-hand third "experience". The wording has been the
same since the warning was added.
Let's just tweak the wording, adding the cause, so that the
reason for the warning is clearer. This hopefully stops the
user from immediately asking "'Might'? Because why?" and
then going off looking at the function body - or grepping
the gcc source or documentation, or entering a bug-report
subsequently closed as resolved/invalid.
Since the message is only appended with additional
information, no test-case actually required adjustment.
I still changed them, so the message is covered.
gcc:
* cgraphunit.cc (process_function_and_variable_attributes): Tweak
the warning for an attribute-always_inline without inline declaration.
Nathaniel Shead [Fri, 26 Jan 2024 05:55:52 +0000 (16:55 +1100)]
c++: Stream additional fields for DECL_STRUCT_FUNCTION [PR113580]
Currently the DECL_STRUCT_FUNCTION for a declaration is always
reconstructed from scratch. This causes issues though, as some fields
used by other parts of the compiler (in this case, specifically
'function_{start,end}_locus') are then not correctly initialised. This
patch makes sure that these fields are also read and written.
PR c++/113580
gcc/cp/ChangeLog:
* module.cc (struct post_process_data): Create.
(trees_in::post_decls): Use.
(trees_in::post_process): Return entire vector at once.
Change overload to take post_process_data instead of tree.
(trees_out::write_function_def): Write needed flags from
DECL_STRUCT_FUNCTION.
(trees_in::read_function_def): Read them and pass to
post_process.
(module_state::read_cluster): Write flags into cfun.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr113580_a.C: New test.
* g++.dg/modules/pr113580_b.C: New test.
Add RTL tests, for RV64 and RV32 where appropriate, corresponding to the
existing cset-sext.c tests. They have been produced from RTL code as at
the entry of the "ce1" pass for the respective cset-sext.c tests built
at -O3.
gcc/testsuite/
* gcc.target/riscv/cset-sext-rtl.c: New file.
* gcc.target/riscv/cset-sext-rtl32.c: New file.
* gcc.target/riscv/cset-sext-sfb-rtl.c: New file.
* gcc.target/riscv/cset-sext-sfb-rtl32.c: New file.
* gcc.target/riscv/cset-sext-thead-rtl.c: New file.
* gcc.target/riscv/cset-sext-ventana-rtl.c: New file.
* gcc.target/riscv/cset-sext-zicond-rtl.c: New file.
* gcc.target/riscv/cset-sext-zicond-rtl32.c: New file.
Add a pair of RTL tests, for RV64 and RV32 respectively, corresponding
to the existing pr105314.c test. They have been produced from RTL code
as at the entry of the "ce1" pass for pr105314.c compiled at -O3.
gcc/testsuite/
* gcc.target/riscv/pr105314-rtl.c: New file.
* gcc.target/riscv/pr105314-rtl32.c: New file.
RISC-V/testsuite: Also verify if-conversion runs for pr105314.c
Verify that if-conversion succeeded through noce_try_store_flag_mask, as
per PR rtl-optimization/105314, tightening the test case and making it
explicit.
gcc/testsuite/
* gcc.target/riscv/pr105314.c: Scan the RTL "ce1" pass too.
The optimization levels pr105314.c is iterated over are needlessly
overridden with "-O2", limiting the coverage of the test case to that
level, perhaps with additional options the original optimization level
has been supplied with. We could prevent the extra iterations other
than "-O2" from being run, but the transformation made by if-conversion
is also expected to happen at other optimization levels, so include them
all, and also make sure no reverse-condition branch appears in output,
moving the `dg-final' command to the bottom, as with most test cases.
gcc/testsuite/
* gcc.target/riscv/pr105314.c: Replace `dg-options' command with
`dg-skip-if'. Also reject "bne" with `dg-final'.
Robin Dapp [Wed, 24 Jan 2024 16:28:31 +0000 (17:28 +0100)]
genopinit: Split init_all_optabs [PR113575].
init_all_optabs initializes > 10000 patterns for riscv targets. This
leads to pathological situations in dataflow analysis (which can occur
with many adjacent stores).
To alleviate this, this patch makes genopinit split the init_all_optabs
function into several init_optabs_xx functions that each initialize 1000
patterns.
With this change insn-opinit.cc's compilation time is reduced from 4+
minutes to 1:30 and memory consumption decreases from 1.2G to 630M.
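The splitting can be pictured with a small generator that writes one function per chunk plus a driver that calls them. This is a rough sketch, not the genopinit code: emit_split_init and the shape of its output are assumptions, and only the init_all_optabs and init_optabs_xx names come from the description above.

```c
#include <stdio.h>

/* Write N_PATTERNS worth of initialization as functions of at most CHUNK
   patterns each, followed by an init_all_optabs that calls them in order.
   Returns the number of chunk functions written, or -1 for a bad CHUNK.  */
int
emit_split_init (FILE *out, int n_patterns, int chunk)
{
  if (chunk <= 0)
    return -1;

  int n_funcs = 0;
  for (int start = 0; start < n_patterns; start += chunk)
    {
      int end = start + chunk < n_patterns ? start + chunk : n_patterns;
      fprintf (out, "static void\ninit_optabs_%02d (void)\n{\n", n_funcs);
      /* A real generator would emit one store per pattern here.  */
      fprintf (out, "  /* patterns %d..%d */\n", start, end - 1);
      fprintf (out, "}\n\n");
      n_funcs++;
    }

  fprintf (out, "void\ninit_all_optabs (void)\n{\n");
  for (int i = 0; i < n_funcs; i++)
    fprintf (out, "  init_optabs_%02d ();\n", i);
  fprintf (out, "}\n");
  return n_funcs;
}
```

Each emitted function stays small, so the compiler never has to analyze one function containing all of the adjacent stores.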
gcc/ChangeLog:
PR other/113575
* genopinit.cc (main): Split init_all_optabs into functions
of 1000 patterns each.
Gaius Mulley [Fri, 26 Jan 2024 19:04:48 +0000 (19:04 +0000)]
modula2: detect string and pointer formal and actual parameter incompatibility
This patch improves the location accuracy of parameters and fixes bugs
in parameter checking in M2Check. It also corrects the location
of constant declarations.
gcc/m2/ChangeLog:
* gm2-compiler/M2Check.mod (dumpIndice): New procedure.
(dumpIndex): New procedure.
(dumptInfo): New procedure.
(buildError4): Add comment and pass formal and actual to
MetaError4. Improve text describing error.
(buildError2): Generate different error descriptions for
the three error kinds.
(checkConstMeta): Add block comment. Add more meta checks
and call doCheckPair to complete string const checking.
Add tinfo parameter.
(checkConstEquivalence): Add tinfo parameter.
* gm2-compiler/M2GCCDeclare.mod (PrintVerboseFromList):
Print the length of a const string.
* gm2-compiler/M2GenGCC.mod (CodeParam): Remove parameters
op1, op2 and op3.
(doParam): Add paramtok parameter. Use paramtok rather
than CurrentQuadToken.
(CodeParam): Rewrite.
* gm2-compiler/M2Quads.mod (CheckProcedureParameters):
Add comments explaining that const strings are not checked
in M2Quads.mod.
(FailParameter): Use MetaErrorT2 with tokpos rather than
MetaError2.
(doBuildBinaryOp): Assign OldPos and OperatorPos before the
IF block.
* gm2-compiler/SymbolTable.mod (PutConstString): Add call to
InitWhereDeclaredTok.
gcc/testsuite/ChangeLog:
* gm2/pim/fail/badpointer4.mod: New test.
* gm2/pim/fail/strconst.def: New test.
Richard Biener [Fri, 26 Jan 2024 11:57:10 +0000 (12:57 +0100)]
Avoid registering unsupported OMP offload devices
The following avoids registering unsupported GCN offload devices
when iterating over available ones. With a Zen4 desktop CPU
you will have an IGPU (unsupported) which will otherwise be made
available. This causes testcases like
libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
devices to FAIL.
libgomp/
* plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
agents with unsupported ISA.
Richard Biener [Fri, 26 Jan 2024 11:35:57 +0000 (12:35 +0100)]
Fix architecture support in OMP_OFFLOAD_init_device for gcn
The following makes the existing architecture support check work
instead of being optimized away (enum vs. -1). This avoids
later asserts when we assume such devices are never actually
used.
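The idea of returning a named sentinel enumerator instead of a bare -1 can be sketched as below. The enum isa_kind, its enumerators and values, and the functions isa_code_sketch and device_usable are placeholders invented for this example; they are not the definitions used in plugin-gcn.c.

```c
#include <string.h>

/* Placeholder enumerators; the values are not real ELF machine codes.  */
typedef enum
{
  ISA_UNSUPPORTED = -1,
  ISA_ALPHA = 1,
  ISA_BETA = 2
} isa_kind;

/* Map an ISA name to its code, or to the sentinel for unknown names.  */
isa_kind
isa_code_sketch (const char *name)
{
  if (strcmp (name, "alpha") == 0)
    return ISA_ALPHA;
  if (strcmp (name, "beta") == 0)
    return ISA_BETA;
  return ISA_UNSUPPORTED;
}

/* The support check compares two values of the same enum type, so it
   does not depend on how a plain -1 converts to that type.  */
int
device_usable (const char *name)
{
  return isa_code_sketch (name) != ISA_UNSUPPORTED;
}
```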
libgomp/
* plugin/plugin-gcn.c
(EF_AMDGPU_MACH::EF_AMDGPU_MACH_UNSUPPORTED): Add.
(isa_code): Return that instead of -1.
(GOMP_OFFLOAD_init_device): Adjust.
Tobias Burnus [Fri, 26 Jan 2024 14:11:09 +0000 (15:11 +0100)]
amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs
gcc/ChangeLog:
* config.gcc (amdgcn-*-*): Add gfx1030 and gfx1100 to
TM_MULTILIB_CONFIG.
* doc/install.texi (Configuration amdgcn-*-*): Mention gfx1030/gfx1100.
* doc/invoke.texi (AMD GCN Options): Add gfx1030 and gfx1100 to
-march/-mtune.
libgomp/ChangeLog:
* testsuite/libgomp.c/declare-variant-4.h: Add variant functions
for gfx1030 and gfx1100.
* testsuite/libgomp.c/declare-variant-4-gfx1030.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx1100.c: New test.
Andrew Stubbs [Wed, 24 Jan 2024 11:07:28 +0000 (11:07 +0000)]
amdgcn: additional gfx1030/gfx1100 support
This is enough to get gfx1030 and gfx1100 working; there are still some test
failures to investigate, and probably some tuning to do.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
* config/gcn/gcn-valu.md (all_convert): New iterator.
(<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New
define_expand, and rename the old one to ...
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ...
(extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this.
(*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New.
* config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
(gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
* config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3.
(<u>mulqihi3_scalar): Likewise.
Nathaniel Shead [Tue, 2 Jan 2024 22:27:06 +0000 (09:27 +1100)]
c++: Emit definitions of ODR-used static members imported from modules [PR112899]
Static data members marked 'inline' should be emitted in TUs where they
are ODR-used. We need to make sure that inlines imported from modules
are correctly added to the 'pending_statics' map so that they get
emitted if needed, otherwise the attached testcase fails to link.
PR c++/112899
gcc/cp/ChangeLog:
* cp-tree.h (note_variable_template_instantiation): Rename to...
(note_vague_linkage_variable): ...this.
* decl2.cc (note_variable_template_instantiation): Rename to...
(note_vague_linkage_variable): ...this.
* pt.cc (instantiate_decl): Rename usage of above function.
* module.cc (trees_in::read_var_def): Remember pending statics
that we stream in.
gcc/testsuite/ChangeLog:
* g++.dg/modules/init-4_a.C: New test.
* g++.dg/modules/init-4_b.C: New test.
* g++.dg/modules/init-6_a.H: New test.
* g++.dg/modules/init-6_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Richard Biener [Fri, 26 Jan 2024 08:29:22 +0000 (09:29 +0100)]
tree-optimization/113602 - datarefs of non-addressables
We can end up creating ADDR_EXPRs of non-addressable entities during
for example vectorization. The following plugs this in data-ref
analysis when that would create such invalid ADDR_EXPR as part of
analyzing the ref structure.
PR tree-optimization/113602
* tree-data-ref.cc (dr_analyze_innermost): Fail when
the base object isn't addressable.
Tobias Burnus [Fri, 26 Jan 2024 09:14:09 +0000 (10:14 +0100)]
gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC
Since LLVM commit 082f87c9d418 (Pull Req. #79038; will become LLVM 18)
"[AMDGPU] Change default AMDHSA Code Object version to 5"
the default - when no --amdhsa-code-object-version= is used - was bumped.
Using --amdhsa-code-object-version=5 is supported (with unknown limitations)
since LLVM 14. GCC requires at least LLVM 13.0.1 for proper support, so
explicitly using COV5 is not possible.
Unfortunately, the COV number matters for debugging ("-g") as mkoffload.cc
extracts debugging data from the host's object file and writes it into
an AMD GPU object file it creates. And all object files linked together
must have the same ABI version.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): New; creates the
"--amdhsa-code-object-version=" argument.
(ASM_SPEC): Use it; replace previous version of it.
Jiahao Xu [Tue, 16 Jan 2024 02:32:31 +0000 (10:32 +0800)]
LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that, for a short-circuit branch,
the short-circuit operation is used instead of the non-short-circuit operation.
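As a source-level illustration of the two forms involved (the function names are made up, and the actual choice is made inside the compiler on its intermediate representation), a short-circuit test and its non-short-circuit equivalent look like this:

```c
/* Short-circuit form: B is only examined when A is nonzero, which is
   typically implemented with a conditional branch.  */
int
both_short_circuit (int a, int b)
{
  return a && b;
}

/* Non-short-circuit form: both comparisons are always evaluated and the
   results combined with a bitwise AND, trading a branch for extra
   arithmetic.  */
int
both_non_short_circuit (int a, int b)
{
  return (a != 0) & (b != 0);
}
```

Both functions return the same value for side-effect-free operands; which form is cheaper depends on the relative cost of branches on the target, and that is the choice this macro expresses.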
SPEC2017 performance evaluation shows 1% performance improvement for fprate
GEOMEAN and no obvious regression for others. Especially, 526.blender_r +10.6%
on 3A6000.
This modification will introduce the following FAIL items:
Li Wei [Wed, 24 Jan 2024 09:44:17 +0000 (17:44 +0800)]
LoongArch: Optimize implementation of single-precision floating-point approximate division.
We found that in the spec17 521.wrf program, some loop invariant code generated
from single-precision floating-point approximate division calculation failed to
be hoisted out of a loop. This is because the pseudo-register that stores the
intermediate temporary calculation results is rewritten in the implementation
of single-precision floating-point approximate division, so the loop2_invariant
pass fails to hoist the invariants. To this end, the intermediate temporary
calculation results are stored in new pseudo-registers without destroying the
read-write dependency, so that they could be recognized as loop invariants in
the loop2_invariant pass.
After optimization, the number of instructions of 521.wrf is reduced by 0.18%
compared with before optimization (1716612948501 -> 1713471771364).
The 2 loops in octfapg_universe can and will be vectorized now
after r14-333-g6d4b59a9356ac4 on targets that support multiplication
in the long type. But the testcase does not check vect_long_mult for
that, so this patch corrects that error and now the testcase passes correctly
on aarch64-linux-gnu (with and without SVE).
Built and tested on aarch64-linux-gnu (with and without SVE).
gcc/testsuite/ChangeLog:
PR testsuite/109705
* gcc.dg/vect/pr25413a.c: Expect 1 vectorized loops for !vect_long_mult
and 2 for vect_long_mult.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Thu, 25 Jan 2024 21:45:59 +0000 (13:45 -0800)]
aarch64: Fix/avoid undefinedness in aarch64_classify_index [PR100212]
The problem here is we don't check the return value of exact_log2
and always use that result as shifter. This fixes the issue by avoiding
the shift if the value was `-1` (which means the value was not exactly a power of 2);
in this case we could either check if the value was equal to -1 or not equal to it, because
we then assign -1 to shift if the constant value was not equal. I chose `!=` as
it seemed to make it more obvious what the code is doing.
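The pattern of checking the logarithm before shifting can be sketched as follows. exact_log2_sketch is a stand-in written for this example, not GCC's exact_log2, and pow2_or_zero is a hypothetical caller.

```c
#include <stdint.h>

/* Return log2 of X if X is an exact power of two, otherwise -1.  */
int
exact_log2_sketch (uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;

  int n = 0;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* Only shift when the logarithm is valid; shifting by -1 would be
   undefined behavior.  Returns 0 when FACTOR is not a power of two.  */
uint64_t
pow2_or_zero (uint64_t factor)
{
  int shift = exact_log2_sketch (factor);
  if (shift != -1)
    return (uint64_t) 1 << shift;
  return 0;
}
```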
Committed as obvious after a build/test for aarch64-linux-gnu.
gcc/ChangeLog:
PR target/100212
* config/aarch64/aarch64.cc (aarch64_classify_index): Avoid
undefined shift after the call to exact_log2.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jakub Jelinek [Thu, 25 Jan 2024 23:08:36 +0000 (00:08 +0100)]
c++: Fix up build_m_component_ref [PR113599]
The following testcase reduced from GDB is miscompiled starting with
r14-5503 PR112427 change.
The problem is in the build_m_component_ref hunk, which changed
- datum = fold_build_pointer_plus (fold_convert (ptype, datum), component);
+ datum = cp_convert (ptype, datum, complain);
+ if (!processing_template_decl)
+ datum = build2 (POINTER_PLUS_EXPR, ptype,
+ datum, convert_to_ptrofftype (component));
+ datum = cp_fully_fold (datum);
Component is e, (sizetype) e is 16, offset of c inside of C.
ptype is A *, pointer to type of C::c and datum is &d.
Now, previously the above created ((A *) &d) p+ (sizetype) e which is correct,
but in the new code cp_convert sees that C has A as base class and
instead of returning (A *) &d, it returns &d.D.2800 where D.2800 is
the FIELD_DECL for the A base at offset 8 into C.
So, instead of computing ((A *) &d) p+ (sizetype) e it computes
&d.D.2800 p+ (sizetype) e, which is ((A *) &d) p+ 24.
The following patch fixes it by using convert instead of cp_convert which
eventually calls build_nop (ptype, datum).
2024-01-26 Jakub Jelinek <jakub@redhat.com>
PR c++/113599
* typeck2.cc (build_m_component_ref): Use convert instead of
cp_convert for pointer conversion.
Jason Merrill [Thu, 25 Jan 2024 17:02:07 +0000 (12:02 -0500)]
c++: array of PMF [PR113598]
Here AGGREGATE_TYPE_P includes pointers to member functions, which is not
what we want. Instead we should use class||array, as elsewhere in the
function.
Jason Merrill [Thu, 25 Jan 2024 19:45:35 +0000 (14:45 -0500)]
c++: co_await and initializer_list [PR109227]
Here we end up with an initializer_list of 'aa', a type with a non-trivial
destructor, and need to destroy it. The code called
build_special_member_call for cleanups, but that doesn't work for arrays, so
use cxx_maybe_build_cleanup instead. Let's go ahead and do that
everywhere that has been calling the destructor directly.
PR c++/109227
gcc/cp/ChangeLog:
* coroutines.cc (build_co_await): Use cxx_maybe_build_cleanup.
(build_actor_fn, process_conditional, maybe_promote_temps)
(morph_fn_to_coro): Likewise.
(expand_one_await_expression): Use build_cleanup.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/co-await-initlist2.C: New test.
Andrew Pinski [Thu, 25 Jan 2024 16:30:36 +0000 (08:30 -0800)]
aarch64: Fix undefinedness while testing the J constraint [PR100204]
The J constraint can invoke undefined behavior because it negates ival,
which overflows if ival is HWI_MIN. The fix is as simple as casting
to `unsigned HOST_WIDE_INT` before negating it. This patch does that.
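The idea can be sketched outside of GCC as follows. This is an illustration
only: std::int64_t stands in for HOST_WIDE_INT and the function name is
hypothetical, not the code in constraints.md.

```cpp
#include <cstdint>

// Negate a signed 64-bit value without signed overflow: convert to the
// unsigned type first, where arithmetic wraps modulo 2^64.
// std::int64_t is an assumed stand-in for HOST_WIDE_INT.
std::uint64_t negate_as_unsigned (std::int64_t ival)
{
  return std::uint64_t{0} - static_cast<std::uint64_t> (ival);
}
```

For the minimum value the result is 2^63, which is well defined, whereas
negating the signed value directly would overflow.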
Committed as obvious after build/test for aarch64-linux-gnu.
gcc/ChangeLog:
PR target/100204
* config/aarch64/constraints.md (J): Cast to `unsigned HOST_WIDE_INT`
before taking the negative of it.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
[PR113526][LRA]: Fixing asm-flag-1.c failure on ARM
My recent patch for PR113356 results in a failure of the asm-flag-1.c test
on arm. After the patch, LRA treats asm operand pseudos as general regs.
There are too many such operands, and LRA cannot assign hard regs to all
operand pseudos. Actually, we should not assign hard regs to the
operand pseudo at all. The following patch fixes this.
gcc/ChangeLog:
PR target/113526
* lra-constraints.cc (curr_insn_transform): Change class even for
spilled pseudo successfully matched with NO_REGS.
Refactor the compare zero and jump pattern and use it to fix the issue.
gcc/ChangeLog:
PR target/112987
* config/aarch64/aarch64.cc (aarch64_gen_compare_zero_and_branch): New.
(aarch64_expand_epilogue): Use the new function.
(aarch64_split_compare_and_swap): Likewise.
(aarch64_split_atomic_op): Likewise.
Gaius Mulley [Thu, 25 Jan 2024 16:29:02 +0000 (16:29 +0000)]
modula2: add project regression test and badpointer tests
This patch adds four modula-2 testcases to the regression testsuite.
The project example stresses INC/DEC and range checking, and the bad
pointer tests stress attempting to pass a string actual parameter to a
procedure with a pointer formal parameter.
gcc/testsuite/ChangeLog:
* gm2/pim/fail/badpointer.mod: New test.
* gm2/pim/fail/badpointer2.mod: New test.
* gm2/pim/fail/badpointer3.mod: New test.
* gm2/projects/pim/run/pass/pegfive/pegfive.mod: New test.
* gm2/projects/pim/run/pass/pegfive/projects-pim-run-pass-pegfive.exp: New test.
Robin Dapp [Mon, 15 Jan 2024 15:23:30 +0000 (16:23 +0100)]
fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
Found in PR112971, this patch adds folding support for bitwise operations
of const duplicate zero/one vectors with stepped vectors.
On riscv, folding would continue perpetually without simplifying
because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would not be folded to
{0, 0, 0, ...}.
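The kind of element-independent simplification involved can be sketched as
below. This is a simplified model and not the fold-const.cc code: the names
are hypothetical, vectors are plain std::vector values, and reading "one" as
an all-ones element is an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

enum class BitOp { And, Ior, Xor };

// Hypothetical model: `stepped` is a stepped constant vector such as
// {7, 6, 5, ...} and `dup` is the element of a duplicate constant vector.
// Returns the folded vector when the result is known without looking at
// individual elements, and std::nullopt otherwise.
std::optional<std::vector<std::uint32_t>>
fold_with_duplicate (BitOp op, const std::vector<std::uint32_t> &stepped,
                     std::uint32_t dup)
{
  const std::uint32_t ones = ~std::uint32_t{0};
  if (dup == 0)
    {
      if (op == BitOp::And)   // x & 0 == 0
        return std::vector<std::uint32_t> (stepped.size (), 0);
      return stepped;         // x | 0 == x and x ^ 0 == x
    }
  if (dup == ones)
    {
      if (op == BitOp::And)   // x & ~0 == x
        return stepped;
      if (op == BitOp::Ior)   // x | ~0 == ~0
        return std::vector<std::uint32_t> (stepped.size (), ones);
    }
  return std::nullopt;        // no element-independent answer
}
```

With this model, {7, 6, 5} & {0, 0, 0} folds to {0, 0, 0} directly, which is
the case the message above says was previously left unsimplified.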
gcc/ChangeLog:
PR middle-end/112971
* fold-const.cc (simplify_const_binop): New function for binop
simplification of two constant vectors when element-wise
handling is not necessary.
(const_binop): Call new function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112971.c: New test.
Robin Dapp [Tue, 23 Jan 2024 11:44:20 +0000 (12:44 +0100)]
testsuite/vect: Add target checks to refined patterns.
On Solaris/SPARC several vector tests appeared to be regressing. They
were never vectorized, but the checks before r14-3612-ge40edf64995769
would match regardless of whether a loop was actually vectorized.
The refined checks only match a successful vectorization attempt
but are run unconditionally. This patch adds target checks to them.