Mark Harmstone [Mon, 28 Oct 2024 22:32:29 +0000 (22:32 +0000)]
Add block parameter to begin_block debug hook
Add a parameter to the begin_block debug hook that is a pointer to the
tree_node of the block in question. CodeView needs this as it records
line numbers of inlined functions in a different manner, so we need to
be able to tell if the block is actually the start of an inlined
function.
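For reference, a sketch of the resulting hook shape, inferred from the new dummy function's name (debug_nothing_int_int_tree); the stand-in `tree` typedef and the struct name are placeholders so the snippet is self-contained:
typedef void *tree;   /* stand-in for GCC's tree so the sketch compiles alone */
struct gcc_debug_hooks_sketch
{
  /* begin_block now also receives the block's tree node, so a debug format
     (here CodeView) can tell whether the block starts an inlined function.  */
  void (*begin_block) (unsigned int line, unsigned int blocknum, tree block);
};
/* No-op hook mirroring the new debug_nothing_int_int_tree.  */
static void
debug_nothing_int_int_tree (unsigned int, unsigned int, tree)
{
}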
gcc/
* debug.cc (do_nothing_debug_hooks): Change begin_block
function pointer.
(debug_nothing_int_int_tree): New function.
* debug.h (struct gcc_debug_hooks): Add tree parameter to begin_block.
(debug_nothing_int_int_tree): Add declaration.
* dwarf2out.cc (dwarf2out_begin_block): Add tree parameter.
(dwarf2_lineno_debug_hooks): Use new dummy function for begin_block.
* final.cc (final_scan_insn_1): Pass insn block through to
debug_hooks->begin_block.
* vmsdbgout.cc (vmsdbgout_begin_block): Add tree parameter.
Georg-Johann Lay [Sat, 30 Nov 2024 13:58:05 +0000 (14:58 +0100)]
AVR: ad target/84211 - Split MOVW into MOVs in try_split_any.
When splitting multi-byte REG-REG moves in try_split_any(),
it's not clear whether propagating constants will turn
out to be profitable. When MOVW is available, split into
REG-REG moves instead of a possible REG-CONST.
gcc/
PR target/84211
* config/avr/avr-passes.cc (try_split_any) [SET, MOVW]: Prefer
reg=reg move over reg=const when splitting a reg=reg insn.
Jakub Jelinek [Sat, 30 Nov 2024 10:30:08 +0000 (11:30 +0100)]
strlen: Handle vector CONSTRUCTORs [PR117057]
The following patch handles VECTOR_TYPE_P CONSTRUCTORs in
count_nonzero_bytes, including handling them if they have some elements
non-constant.
If there are still some constant elements before it (in the range queried),
we derive info at least from those bytes and consider the rest as unknown.
The first 3 hunks just punt in IMHO problematic cases: the spaghetti code
treats byte_size 0 as "unknown size, determine it yourself", so if offset
is equal to the exp size, there are 0 bytes to consider (so nothing useful
to determine), but using byte_size 0 would mean "use any size".
Similarly, native_encode_expr uses int type for offset (and size), so
passing it an offset larger than INT_MAX could be a silent miscompilation.
I've guarded the test to just a couple of targets known to handle it,
because e.g. on ia32 without -msse forwprop1 seems to lower the CONSTRUCTOR
into 4 BIT_FIELD_REF stores and I haven't figured out what exactly
that depends on (e.g. powerpc* is fine on any CPUs, even with -mno-altivec
-mno-vsx, even -m32).
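As an illustration only (the names and shapes here are mine, not from the patch; the real coverage is in the new gcc.dg/strlenopt-96.c test), the kind of store the pass can now learn from looks roughly like:
typedef unsigned char V __attribute__ ((vector_size (8)));
char buf[9] __attribute__ ((aligned (8)));
void
store (unsigned char x)
{
  /* Constant elements followed by a non-constant one: the pass can still
     derive that the first three bytes of buf are non-zero.  */
  V v = { 'a', 'b', 'c', x, 0, 0, 0, 0 };
  *(V *) buf = v;
}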
2024-11-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117057
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Punt also
when byte_size is equal to offset or nchars. Punt if offset is bigger
than INT_MAX. Handle vector CONSTRUCTOR with some elements constant,
possibly followed by non-constant.
* gcc.dg/strlenopt-32.c: Remove xfail and vect_slp_v2qi_store_unalign
specific scan-tree-dump-times directive.
* gcc.dg/strlenopt-96.c: New test.
Jakub Jelinek [Sat, 30 Nov 2024 10:19:12 +0000 (11:19 +0100)]
openmp: Add crtoffloadtableS.o and use it [PR117851]
Unlike crtoffload{begin,end}.o which just define some symbols at the start/end
of the various .gnu.offload* sections, crtoffloadtable.o contains
const void *const __OFFLOAD_TABLE__[]
__attribute__ ((__visibility__ ("hidden"))) =
{
&__offload_func_table, &__offload_funcs_end,
&__offload_var_table, &__offload_vars_end,
&__offload_ind_func_table, &__offload_ind_funcs_end,
};
The problem is that linking this into PIEs or shared libraries doesn't
work when it is compiled without -fpic/-fpie - __OFFLOAD_TABLE__ for non-PIC
code is put into .rodata section, but it really needs relocations, so for
PIC it should go into .data.rel.ro/.data.rel.ro.local.
As I think we don't want .data.rel.ro section in non-PIE binaries, this patch
follows the path of e.g. crtbegin.o vs. crtbeginS.o and adds crtoffloadtableS.o
next to crtoffloadtable.o, where crtoffloadtableS.o is compiled with -fpic.
2024-11-30 Jakub Jelinek <jakub@redhat.com>
PR libgomp/117851
gcc/
* lto-wrapper.cc (find_crtoffloadtable): Add PIE_OR_SHARED argument,
search for crtoffloadtableS.o rather than crtoffloadtable.o if
true.
(run_gcc): Add pie_or_shared variable. If OPT_pie or OPT_shared or
OPT_static_pie is seen, set pie_or_shared to true, if OPT_no_pie is
seen, set pie_or_shared to false. Pass it to find_crtoffloadtable.
libgcc/
* configure.ac (extra_parts): Add crtoffloadtableS.o.
* Makefile.in (crtoffloadtableS$(objext)): New goal.
* configure: Regenerated.
Jinyang He [Thu, 28 Nov 2024 01:26:25 +0000 (09:26 +0800)]
LoongArch: Mask shift offset when emit {xv, v}{srl, sll, sra} with sameimm vector
For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` causes overflow
when emitting the {w,h,b} variants. Since the number of bits shifted is the
register value taken modulo the element width, it is actually unnecessary to
constrain the range. Simply mask the shift amount with the unit bit width,
without any constraint on the shift range.
gcc/ChangeLog:
* config/loongarch/constraints.md (Uuv6, Uuvx): Remove Uuv6,
add Uuvx as replicated vector const with unsigned range [0,umax].
* config/loongarch/lasx.md (xvsrl, xvsra, xvsll): Mask shift
offset by its unit bits.
* config/loongarch/lsx.md (vsrl, vsra, vsll): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_const_vector_same_int_p): Set default for low and high.
* config/loongarch/predicates.md: Replace reg_or_vector_same_uimm6_operand
with reg_or_vector_same_uimm_operand.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c: New test.
In r15-5327, the default language version for C compilation changed from
-std=gnu17 to -std=gnu23.
ISO C99 and C11 allow ceil, floor, round and trunc, and their float and
long double variants, to raise the “inexact” exception,
but ISO/IEC TS 18661-1:2014, the C bindings to IEEE 754-2008, as
integrated into ISO C23, does not allow these functions to do so.
So add '-ffp-int-builtin-inexact' to this test case.
Andrew Pinski [Fri, 29 Nov 2024 23:29:41 +0000 (15:29 -0800)]
gimplefe: Error recovery for invalid declarations [PR117749]
c_parser_declarator can return null if there was an error,
but c_parser_gimple_declaration was not ready for that.
This fixes that oversight so we don't get an ICE after the error.
Bootstrapped and tested on x86_64-linux-gnu.
PR c/117749
gcc/c/ChangeLog:
* gimple-parser.cc (c_parser_gimple_declaration): Check
declarator to be non-null.
gcc/testsuite/ChangeLog:
* gcc.dg/gimplefe-55.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jakub Jelinek [Sat, 30 Nov 2024 00:51:24 +0000 (01:51 +0100)]
ext-dce: Fix SIGN_EXTEND handling and cleanups [PR117360]
This is mostly a blind attempt to fix the PR + various cleanups.
The PR is about a shift of a HOST_WIDE_INT by 127 invoking UB.
Most of carry_backpropagate works on GET_MODE_INNER of the operand,
mode is assigned
enum machine_mode mode = GET_MODE_INNER (GET_MODE (x));
at the beginning and everything is done using that mode, so for
vector modes (or complex even?) we work with the element modes
rather than vector/complex modes.
But the SIGN_EXTEND handling does that inconsistently: it looks
at the mode of the operand and uses GET_MODE_INNER in GET_MODE_MASK,
but doesn't use it in the shift.
The following patch, apart from the cleanups, fixes it by doing
essentially:
mode = GET_MODE (XEXP (x, 0));
if (mask & ~GET_MODE_MASK (GET_MODE_INNER (mode)))
- mask |= 1ULL << (GET_MODE_BITSIZE (mode).to_constant () - 1);
+ mask |= 1ULL << (GET_MODE_BITSIZE (GET_MODE_INNER (mode)).to_constant () - 1);
i.e. also shifting by GET_MODE_BITSIZE of the GET_MODE_INNER of the
operand's mode. We don't need to check if it is at most 64 bits,
at the start of the function we've already verified the result mode
is at most 64 bits and SIGN_EXTEND by definition extends from a narrower
mode.
The rest of the patch are cleanups. For HOST_WIDE_INT we have the
HOST_WIDE_INT_{UC,1U} macros, a HWI isn't necessarily unsigned long long,
so using ULL suffixes for it is weird.
More importantly, the function does
scalar_int_mode smode;
if (!is_a <scalar_int_mode> (mode, &smode)
|| GET_MODE_BITSIZE (smode) > HOST_BITS_PER_WIDE_INT)
return mmask;
early, so we don't need to use GET_MODE_BITSIZE (mode) which is
a poly_int but can use GET_MODE_BITSIZE (smode) with the same value
but in unsigned short, so we don't need to use known_lt or .to_constant ()
everywhere.
Plus some formatting issues.
What I've left around is
if (!GET_MODE_BITSIZE (GET_MODE (x)).is_constant ()
|| !GET_MODE_BITSIZE (GET_MODE (XEXP (x, 0))).is_constant ())
return -1;
at the start of SIGN_EXTEND or ZERO_EXTEND; I'm afraid I don't know enough
about aarch64/riscv VL vectors to know why this is done (though even that
return -1; is weird, the rest of the code does return mmask; when it wants to
punt).
2024-11-30 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/117360
* ext-dce.cc (ext_dce_process_sets): Use HOST_WIDE_INT_UC
macro instead of ULL suffixed constants.
(carry_backpropagate): Likewise. Use HOST_WIDE_INT_1U instead of
1ULL. Use GET_MODE_BITSIZE (smode) instead of
GET_MODE_BITSIZE (mode) and with that avoid having to use
known_lt instead of < or use .to_constant (). Formatting fixes.
(case SIGN_EXTEND): Set mode to GET_MODE_INNER (GET_MODE (XEXP (x, 0)))
rather than GET_MODE (XEXP (x, 0)) and don't use GET_MODE_INNER (mode).
(ext_dce_process_uses): Use HOST_WIDE_INT_UC macro instead of ULL
suffixed constants.
Jakub Jelinek [Sat, 30 Nov 2024 00:49:21 +0000 (01:49 +0100)]
c++: Implement C++26 P3176R1 - The Oxford variadic comma
While we are already in stage3, I wonder if implementing this small paper
wouldn't be useful even for GCC 15, so that we have in the GCC world one
extra year of deprecation of variadic ellipsis without preceding comma.
The paper just deprecates something; I'd hope that most of the C++ code in
the wild, when it uses variadic functions at all, uses the comma before the
ellipsis.
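For illustration, the deprecated and the preferred spellings (the warning name is taken from the ChangeLog below):
void ok  (int, ...);   // fine: comma before the ellipsis
void old (int...);     // deprecated by C++26 P3176R1; with this patch GCC
                       // warns via -Wdeprecated-variadic-comma-omission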
2024-11-30 Jakub Jelinek <jakub@redhat.com>
gcc/c-family/
* c.opt (Wdeprecated-variadic-comma-omission): New option.
* c.opt.urls: Regenerate.
* c-opts.cc (c_common_post_options): Default to
-Wdeprecated-variadic-comma-omission for C++26 or -Wpedantic.
gcc/cp/
* parser.cc: Implement C++26 P3176R1 - The Oxford variadic comma.
(cp_parser_parameter_declaration_clause): Emit
-Wdeprecated-variadic-comma-omission warnings.
gcc/
* doc/invoke.texi (-Wdeprecated-variadic-comma-omission): Document.
gcc/testsuite/
* g++.dg/cpp26/variadic-comma1.C: New test.
* g++.dg/cpp26/variadic-comma2.C: New test.
* g++.dg/cpp26/variadic-comma3.C: New test.
* g++.dg/cpp26/variadic-comma4.C: New test.
* g++.dg/cpp26/variadic-comma5.C: New test.
* g++.dg/cpp1z/fold10.C: Expect a warning for C++26.
* g++.dg/ext/attrib33.C: Likewise.
* g++.dg/cpp1y/lambda-generic-variadic19.C: Likewise.
* g++.dg/cpp2a/lambda-generic10.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-const3.C: Likewise.
* g++.dg/cpp0x/variadic164.C: Likewise.
* g++.dg/cpp0x/variadic17.C: Likewise.
* g++.dg/cpp0x/udlit-args-neg.C: Likewise.
* g++.dg/cpp0x/variadic28.C: Likewise.
* g++.dg/cpp0x/gen-attrs-33.C: Likewise.
* g++.dg/cpp23/explicit-obj-diagnostics3.C: Likewise.
* g++.old-deja/g++.law/operators15.C: Likewise.
* g++.old-deja/g++.mike/p811.C: Likewise.
* g++.old-deja/g++.mike/p12306.C (printf): Add , before ... .
* g++.dg/analyzer/fd-bind-pr107783.C (bind): Likewise.
* g++.dg/cpp0x/vt-65790.C (printf): Likewise.
libstdc++-v3/
* include/std/functional (_Bind_check_arity): Add , before ... .
* include/bits/refwrap.h (_Mem_fn_traits, _Weak_result_type_impl):
Likewise.
* include/tr1/type_traits (is_function): Likewise.
Ian Lance Taylor [Thu, 28 Nov 2024 21:14:34 +0000 (13:14 -0800)]
compiler: increase buffer size to avoid warning
GCC has a new -Wformat-truncation warning that triggers on this code:
../../gcc/go/gofrontend/go-encode-id.cc: In function 'std::string go_encode_id(const std::string&)':
../../gcc/go/gofrontend/go-encode-id.cc:176:48: error: '%02x' directive output may be truncated writing between 2 and 8 bytes into a region of size 6 [-Werror=format-truncation=]
176 | snprintf(buf, sizeof buf, "_x%02x", c);
| ^~~~
../../gcc/go/gofrontend/go-encode-id.cc:176:45: note: directive argument in the range [128, 4294967295]
176 | snprintf(buf, sizeof buf, "_x%02x", c);
| ^~~~~~~~
../../gcc/go/gofrontend/go-encode-id.cc:176:27: note: 'snprintf' output between 5 and 11 bytes into a destination of size 8
176 | snprintf(buf, sizeof buf, "_x%02x", c);
| ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The code is safe, because the value of c is known to be >= 0 && <= 0xff.
But it's difficult for the compiler to know that.
Bump the buffer size to avoid the warning.
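A minimal sketch of the pattern (the exact size chosen in go-encode-id.cc may differ; 16 is simply larger than the worst-case 11 bytes the warning mentions, and the function name here is illustrative):
#include <cstdio>
#include <string>
std::string
encode_char (unsigned int c)
{
  // "_x" + up to 8 hex digits + NUL is 11 bytes; 16 leaves headroom and
  // silences -Wformat-truncation.
  char buf[16];
  std::snprintf (buf, sizeof buf, "_x%02x", c);
  return std::string (buf);
}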
Andrew Pinski [Thu, 21 Nov 2024 18:59:59 +0000 (10:59 -0800)]
aarch64: add attributes to the prefetch_builtins
This adds the attributes associated with prefetch to the builtins.
Just call aarch64_get_attributes with FLAG_PREFETCH_MEMORY to get the attributes.
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_prefetch_builtin):
Update call to aarch64_general_add_builtin in AARCH64_INIT_PREFETCH_BUILTIN.
Add new variable prefetch_attrs.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Thu, 21 Nov 2024 18:51:38 +0000 (10:51 -0800)]
aarch64: Fix up flags for vget_low_*, vget_high_* and vreinterpret intrinsics
These 3 intrinsics will not raise an FP exception or read FPCR. These
intrinsics will be folded into a VIEW_CONVERT_EXPR or a BIT_FIELD_REF, which
are already set to be const expressions too.
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (VREINTERPRET_BUILTIN): Use
FLAG_NONE instead of FLAG_AUTO_FP.
(VGET_LOW_BUILTIN): Likewise.
(VGET_HIGH_BUILTIN): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Tue, 19 Nov 2024 08:19:57 +0000 (00:19 -0800)]
aarch64: Mark __builtin_aarch64_im_lane_boundsi as leaf and nothrow [PR117665]
__builtin_aarch64_im_lane_boundsi is known not to throw or call back into another
function since it will either be folded into a NOP or will produce a compiler error.
This fixes the ICE by fixing the missed optimization. It does not fix the underlying
issue with fold_marked_statements, which I filed as PR 117668.
Built and tested for aarch64-linux-gnu.
PR target/117665
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_simd_builtin_functions):
Pass nothrow and leaf as attributes to aarch64_general_add_builtin for
__builtin_aarch64_im_lane_boundsi.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/lane-bound-1.C: New test.
* gcc.target/aarch64/lane-bound-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
[PR117770][LRA]: Check hard regs corresponding insn operands for hard reg clobbers
When LRA processes early-clobbered hard regs explicitly present in the
insn description, it checks that the hard reg is also used as an input.
If the hard reg is not also an input, it is marked as dying. For this
check LRA processed only input hard regs that are also explicitly present
in the insn description. In the given PR, the hard reg is used as an input
through an insn operand and is not explicitly present in the insn
description, and therefore LRA marked the hard reg as dying. This results
in wrong allocation and wrong code. The patch solves the problem by also
processing hard regs used as insn operands.
gcc/ChangeLog:
PR rtl-optimization/117770
* lra-lives.cc: Include ira-int.h.
(process_bb_lives): Check hard regs corresponding insn operands
for dying hard wired reg clobbers.
Yury Khrustalev [Fri, 29 Nov 2024 11:09:23 +0000 (11:09 +0000)]
aarch64: Fix build failure due to missing header
Including the "arm_acle.h" header in aarch64-unwind.h requires
stdint.h to be present, and it may not be available during the
first stage of cross-compilation of GCC.
When cross-building GCC for the aarch64-none-linux-gnu target
(on any supported host) using the 3-stage bootstrap build
process to build a native compiler from source, libgcc fails
to compile due to a missing header that has not been installed yet.
This could be worked around, but it's better to fix the issue.
Andre Vieira [Fri, 29 Nov 2024 10:18:57 +0000 (10:18 +0000)]
arm, mve: Detect uses of vctp_vpr_generated inside subregs
Address a problem where we were failing to detect uses of
vctp_vpr_generated in the analysis for 'arm_attempt_dlstp_transform' because
the use was inside a SUBREG and rtx_equal_p does not catch that. Using
reg_overlap_mentioned_p is much more robust.
gcc/ChangeLog:
PR target/117814
* config/arm/arm.cc (arm_attempt_dlstp_transform): Use
reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
vctp_vpr_generated inside subregs.
gcc/testsuite/ChangeLog:
PR target/117814
* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
(test10a): ... this.
(test10b): Variation of test10a with a small change to trigger wrong
codegen.
Andre Vieira [Fri, 29 Nov 2024 10:14:14 +0000 (10:14 +0000)]
arm, mve: Pass -std=c99 to dlstp-loop-form.c to avoid new warning
This fixes a testism introduced by the warning produced with the -std=c23
default. The testcase is a reduced piece of code meant to trigger an ICE, so
there's little value in trying to change the code itself.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/dlstp-loop-form.c: Add -std=c99 to avoid warning
message.
Andre Vieira [Fri, 29 Nov 2024 09:59:25 +0000 (09:59 +0000)]
arm, mve: Fix scan-assembler for test7 in dlstp-compile-asm-2.c
After recent changes, the vctp intrinsic codegen changed slightly and we now
unfortunately seem to be generating unneeded moves and extends of the mask.
These are however not incorrect, and we don't have a fix for the unneeded
codegen right now, so change the testcase to accept them so we can still
catch other changes if they occur.
gcc/testsuite/ChangeLog:
PR target/117814
* gcc.target/arm/mve/dlstp-compile-asm-2.c (test7): Add an optional
vmsr to the check-function-bodies.
If the target has ZBC or ZBKC, the clmul instruction is used for the CRC
calculation. Otherwise, if the target has ZBKB, a table-based CRC is
generated, but the bswap and brev8 instructions are used for reflecting
the inputs and the output. Add new tests to check CRC generation for ZBC,
ZBKC and ZBKB targets.
gcc/
* expr.cc (gf2n_poly_long_div_quotient): New function.
* expr.h (gf2n_poly_long_div_quotient): New function declaration.
* hwint.cc (reflect_hwi): New function.
* hwint.h (reflect_hwi): New function declaration.
* config/riscv/bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4): New
expander for reversed CRC.
(crc<SUBX1:mode><SUBX:mode>4): New expander for bit-forward CRC.
* config/riscv/iterators.md (SUBX1, ANYI1): New iterators.
* config/riscv/riscv-protos.h (generate_reflecting_code_using_brev):
New function declaration.
(expand_crc_using_clmul): Likewise.
(expand_reversed_crc_using_clmul): Likewise.
* config/riscv/riscv.cc (generate_reflecting_code_using_brev): New
function.
(expand_crc_using_clmul): Likewise.
(expand_reversed_crc_using_clmul): Likewise.
* config/riscv/riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV): New unspecs.
* doc/sourcebuild.texi: Document new target selectors.
* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c: New test.
Tamar Christina [Fri, 29 Nov 2024 13:01:11 +0000 (13:01 +0000)]
AArch64: Suppress default options when the -march or -mcpu used is not affected by them.
This patch makes it so that when you use any of the Cortex-A53 errata
workarounds but have specified an -march or -mcpu that we know is not
affected by them, we suppress the errata workaround.
This is a driver only patch as the linker invocation needs to be changed as
well. The linker and cc SPECs are different because for the linker we didn't
seem to add an inversion flag for the option. That said, it's also not possible
to configure the linker with it on by default. So not passing the flag is
sufficient to turn it off.
For the compilers however we have an inversion flag using -mno-, which is needed
to disable the workarounds when the compiler has been configured with it by
default.
In case it's unclear how the patch does what it does (it took me a while to
figure out the syntax):
* Early matching will replace any -march=native or -mcpu=native with their
expanded forms and erases the native arguments from the buffer.
* Due to the above if we ensure we handle the new code after this erasure then
we only have to handle the expanded form.
* The expanded form needs to handle -march=<arch>+extensions and
-mcpu=<cpu>+extensions and so we can't use normal string matching but
instead use strstr with a custom driver function that's common between
native and non-native builds.
* For the compilers we output -mno-<workaround> and for the linker we just
erase the --fix-<workaround> option.
* The extra internal matching, e.g. the duplicate match of mcpu inside:
mcpu=*:%{%:is_local_not_armv8_base(%{mcpu=*:%*}) is so we can extract the glob
using %* because the outer match would otherwise reset at the %{. The reason
for the outer glob at all is to skip the block early if no matches are found.
The workaround has the effect of suppressing certain inlining and multiply-add
formation which leads to about ~1% SPECCPU 2017 Intrate regression on modern
cores. This patch is needed because most distros configure GCC with the
workaround enabled by default.
* gcc.target/aarch64/cpunative/info_30: New test.
* gcc.target/aarch64/cpunative/info_31: New test.
* gcc.target/aarch64/cpunative/info_32: New test.
* gcc.target/aarch64/cpunative/info_33: New test.
* gcc.target/aarch64/cpunative/native_cpu_30.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_31.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_32.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_33.c: New test.
* gcc.target/aarch64/erratas_opt_0.c: New test.
* gcc.target/aarch64/erratas_opt_1.c: New test.
* gcc.target/aarch64/erratas_opt_10.c: New test.
* gcc.target/aarch64/erratas_opt_11.c: New test.
* gcc.target/aarch64/erratas_opt_12.c: New test.
* gcc.target/aarch64/erratas_opt_13.c: New test.
* gcc.target/aarch64/erratas_opt_14.c: New test.
* gcc.target/aarch64/erratas_opt_15.c: New test.
* gcc.target/aarch64/erratas_opt_2.c: New test.
* gcc.target/aarch64/erratas_opt_3.c: New test.
* gcc.target/aarch64/erratas_opt_4.c: New test.
* gcc.target/aarch64/erratas_opt_5.c: New test.
* gcc.target/aarch64/erratas_opt_6.c: New test.
* gcc.target/aarch64/erratas_opt_7.c: New test.
* gcc.target/aarch64/erratas_opt_8.c: New test.
* gcc.target/aarch64/erratas_opt_9.c: New test.
aarch64: Add ISA requirements to some SVE/SME md comments
The SVE and SME md files are divided into sections, with each
section often starting with a comment that lists the associated
mnemonics. These lists usually include the base architecture
requirement in parentheses, if the base requirement is greater
than the baseline for the file. This patch tries to be more
consistent about when we do that for the recently added SVE2p1
and SME2p1 extensions.
gcc/
* config/aarch64/aarch64-sme.md: In the section comments, add the
architecture requirements alongside some mnemonics.
* config/aarch64/aarch64-sve2.md: Likewise.
This patch adds support for the following intrinsics:
- svdot[_f32_mf8]_fpm
- svdot_lane[_f32_mf8]_fpm
- svdot[_f16_mf8]_fpm
- svdot_lane[_f16_mf8]_fpm
The first two are available under a combination of the FP8DOT4 and SVE2
features, or alternatively under the SSVE_FP8DOT4 feature in streaming mode.
The final two are available under a combination of the FP8DOT2 and SVE2
features, or alternatively under the SSVE_FP8DOT2 feature in streaming mode.
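A hypothetical usage sketch, assuming prototypes of the shape implied by the intrinsic names above (the function name and the required -march extensions are assumptions, e.g. +sve2+fp8dot4):
#include <arm_sve.h>
// Assumed: 4-way fp8 dot product into an f32 accumulator, with the fp8
// format/scaling controlled by the fpm argument.
svfloat32_t
dot4 (svfloat32_t acc, svmfloat8_t a, svmfloat8_t b, fpm_t fpm)
{
  return svdot_f32_mf8_fpm (acc, a, b, fpm);
}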
gcc/
* config/aarch64/aarch64-option-extensions.def
(fp8dot4, ssve-fp8dot4): Add new extensions.
(fp8dot2, ssve-fp8dot2): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl): Support fp8.
(svdotprod_lane_impl): Likewise.
(svdot_lane): Provide an unspec for fp8 types.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(ternary_mfloat8_def): Add new class.
(ternary_mfloat8): Add new shape.
(ternary_mfloat8_lane_group_selection_def): Add new class.
(ternary_mfloat8_lane_group_selection): Add new shape.
* config/aarch64/aarch64-sve-builtins-shapes.h
(ternary_mfloat8, ternary_mfloat8_lane_group_selection): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svdot, svdot_lane): Add new DEF_SVE_FUNCTION_GS_FPM, twice to deal
with the combination of features providing support for 32 and 16 bit
floating point.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_dot<mode>): Add new.
(@aarch64_sve_dot_lane<mode>): Likewise.
* config/aarch64/aarch64.h:
(TARGET_FP8DOT4, TARGET_SSVE_FP8DOT4): Add new defines.
(TARGET_FP8DOT2, TARGET_SSVE_FP8DOT2): Likewise.
* config/aarch64/iterators.md
(UNSPEC_DOT_FP8, UNSPEC_DOT_LANE_FP8): Add new unspecs.
* doc/invoke.texi: Document fp8dot4, fp8dot2, ssve-fp8dot4, ssve-fp8dot2
extensions.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c: Add new.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Likewise.
* lib/target-supports.exp: Add dg-require-effective-target support for
aarch64_asm_fp8dot2_ok, aarch64_asm_fp8dot4_ok,
aarch64_asm_ssve-fp8dot2_ok and aarch64_asm_ssve-fp8dot4_ok.
This patch adds the following intrinsics:
- svcvt1_bf16[_mf8]_fpm
- svcvt1_f16[_mf8]_fpm
- svcvt2_bf16[_mf8]_fpm
- svcvt2_f16[_mf8]_fpm
- svcvtlt1_bf16[_mf8]_fpm
- svcvtlt1_f16[_mf8]_fpm
- svcvtlt2_bf16[_mf8]_fpm
- svcvtlt2_f16[_mf8]_fpm
- svcvtn_mf8[_f16_x2]_fpm (unpredicated)
- svcvtnb_mf8[_f32_x2]_fpm
- svcvtnt_mf8[_f32_x2]_fpm
The underlying instructions are only available when SVE2 is enabled and the PE
is not in streaming SVE mode. They are also available when SME2 is enabled and
the PE is in streaming SVE mode.
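A hypothetical usage sketch for one of the conversions, assuming prototypes of the shape implied by the names above (the function name is mine):
#include <arm_sve.h>
// Assumed: convert fp8 elements to f16 under FPMR control (svcvt1 variant).
svfloat16_t
cvt1 (svmfloat8_t x, fpm_t fpm)
{
  return svcvt1_f16_mf8_fpm (x, fpm);
}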
gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_signature): Add an fpm_t (uint64_t) argument to functions that
set the fpm register.
(unary_convertxn_narrowt_def): New class.
(unary_convertxn_narrowt): New shape.
(unary_convertxn_narrow_def): New class.
(unary_convertxn_narrow): New shape.
* config/aarch64/aarch64-sve-builtins-shapes.h
(unary_convertxn_narrowt): Declare.
(unary_convertxn_narrow): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svcvt_fp8_impl): New class.
(svcvtn_impl): Handle fp8 cases.
(svcvt1, svcvt2, svcvtlt1, svcvtlt2): Add new FUNCTION.
(svcvtnb): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svcvt1, svcvt2, svcvtlt1, svcvtlt2): Add new DEF_SVE_FUNCTION_GS_FPM.
(svcvtn): Likewise.
(svcvtnb, svcvtnt): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svcvt1, svcvt2, svcvtlt1, svcvtlt2, svcvtnb, svcvtnt): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_cvt_mf8, TYPES_cvtn_mf8, TYPES_cvtnx_mf8): Add new types arrays.
(function_builder::get_name): Append _fpm to functions that set fpmr.
(function_resolver::check_gp_argument): Deal with the fpm_t argument.
(function_expander::expand): Set the fpm register before
calling the insn if the function warrants it.
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_fp8_cvt): Add new.
(@aarch64_sve2_fp8_cvtn): Likewise.
(@aarch64_sve2_fp8_cvtnb): Likewise.
(@aarch64_sve_cvtnt): Likewise.
* config/aarch64/aarch64.h (TARGET_SSVE_FP8): Add new.
* config/aarch64/iterators.md
(VNx8SF_ONLY, SVE_FULL_HFx2): New mode iterators.
(UNSPEC_F1CVT, UNSPEC_F1CVTLT, UNSPEC_F2CVT, UNSPEC_F2CVTLT): Add new.
(UNSPEC_FCVTNB, UNSPEC_FCVTNT): Likewise.
(UNSPEC_FP8FCVTN): Likewise.
(FP8CVT_UNS, fp8_cvt_uns_op): Likewise.
aarch64: specify fpm mode in function instances and groups
Some intrinsics require setting the fpm register before calling the
specific asm opcode required.
In order to simplify review, this patch:
- adds the fpm_mode_index attribute to function_group_info and
function_instance objects
- updates existing initialisations and call sites.
- updates equality and hash operations
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svdiv_impl): Specify FPM_unused when folding.
(svmul_impl): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(build_one): Use the group fpm_mode when creating function instances.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svaba_impl, svqrshl_impl, svqshl_impl,svrshl_impl, svsra_impl):
Specify FPM_unused when folding.
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Set
fpm_mode on all elements.
(neon_sve_function_groups, sme_function_groups): Likewise.
(function_instance::hash): Include fpm_mode in hash.
(function_builder::add_overloaded_functions): Use the group fpm mode.
(function_resolver::lookup_form): Use the function instance fpm_mode
when looking up a function.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_FUNCTION_GS_FPM): Add define.
(DEF_SVE_FUNCTION_GS): Redefine against DEF_SVE_FUNCTION_GS_FPM.
* config/aarch64/aarch64-sve-builtins.h (fpm_mode_index): New.
(function_group_info): Add fpm_mode.
(function_instance): Likewise.
(function_instance::operator==): Handle fpm_mode.
aarch64: Add basic svmfloat8_t support to arm_sve.h
This patch adds support for the fp8 related vectors to arm_sve.h. It also
adds support for functions that just treat mfloat8_t as a bag of 8 bits
(reinterpret casts, vector manipulation, loads, stores, element selection,
vector tuples, table lookups, sve<->simd bridge); these functions are
available for fp8 whenever they're available for other 8-bit types.
Arithmetic operations, bit manipulation, conversions are notably absent.
The generated asm is mostly consistent with the _u8 equivalents and this can be
used to validate tests, except where immediates are used. These cannot be
expressed for mf8 and thus we resort to the use of function arguments found in
registers w(0-9).
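A hedged sketch of the kind of "bag of 8 bits" usage this enables, assuming the mf8 load/store forms mirror the existing u8 ones (the intrinsic spellings here are assumptions):
#include <arm_sve.h>
// Assumed: svld1_mf8/svst1_mf8 exist wherever the u8 equivalents do.
svmfloat8_t
copy_mf8 (svbool_t pg, const mfloat8_t *src, mfloat8_t *dst)
{
  svmfloat8_t v = svld1_mf8 (pg, src);
  svst1_mf8 (pg, dst, v);
  return v;
}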
gcc/
* config/aarch64/aarch64-sve-builtins.cc (TYPES_b_data): Add mf8.
(TYPES_reinterpret1, TYPES_reinterpret): Likewise.
* config/aarch64/aarch64-sve-builtins.def (svmfloat8_t): New type.
(mf8): New type suffix.
* config/aarch64/aarch64-sve-builtins.h (TYPE_mfloat): New
type_class_index.
Richard Biener [Fri, 29 Nov 2024 11:14:24 +0000 (12:14 +0100)]
tree-optimization/115438 - SLP reduction vect vs. bwaves
503.bwaves_r shows a case where the non-SLP optimization of performing
the reduction adjustment with the initial value as part of the epilogue,
rather than including it as part of the initial vector value, matters: it
allows breaking a critical dependence path. The following restores this
ability for single-lane SLP.
On Zen2 this turns a 2.5% regression from GCC 14 into a 2.5%
improvement.
PR tree-optimization/115438
* tree-vect-loop.cc (vect_transform_cycle_phi): For SLP also
try to do the reduction adjustment by the initial value
in the epilogue.
c: Fix constructor bounds checking for VLA and construct VLA vector constants
This patch adds support for checking bounds of SVE ACLE vector initialization
constructors. It also adds support for constructing vector constants from
init constructors.
gcc/c/ChangeLog:
* c-typeck.cc (process_init_element): Add check to restrict
constructor length to the minimum vector length allowed.
gcc/ChangeLog:
* tree.cc (build_vector_from_ctor): Add support to construct VLA vector
constants from init constructors.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general-c/sizeless-1.c: Update test to
check the initialization error.
* gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Likewise.
gimple: Handle variable-sized vectors in BIT_FIELD_REF
Handle variable-sized vectors for BIT_FIELD_REF canonicalization.
gcc/ChangeLog:
* gimple-fold.cc (maybe_canonicalize_mem_ref_addr): Handle variable
sized vector types in BIT_FIELD_REF canonicalization.
* tree-cfg.cc (verify_types_in_gimple_reference): Change the object-size
checking for BIT_FIELD_REF to only report an error for offsets that are
known_gt to be outside the object size. Out-of-range offsets can happen in
the case of indices that reference VLA SVE vector elements that may be
outside the minimum vector size range, and therefore maybe_gt is not
appropriate here.
aarch64: Make C/C++ operations possible on SVE ACLE types.
This patch changes the TYPE_INDIVISIBLE flag to 0 to enable SVE ACLE types to
be treated as GNU vectors, with the same semantics as the operations that are
defined on GNU vectors.
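A sketch of what this is meant to enable (not taken from the patch's testsuite): GNU-vector-style operators applied directly to ACLE SVE types.
#include <arm_sve.h>
// Previously rejected while the types were marked indivisible.
svint32_t
add (svint32_t a, svint32_t b)
{
  return a + b;
}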
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins.cc (register_builtin_types): Flip
the TYPE_INDIVISIBLE flag for SVE ACLE vector types.
Jakub Jelinek [Fri, 29 Nov 2024 09:19:08 +0000 (10:19 +0100)]
gimple-fold: Fix up type_has_padding_at_level_p [PR117065]
The following testcase used to ICE on the trunk since the 'clear small
object if it has padding' optimization was added, until my r15-5746 change;
now it doesn't, just because type_has_padding_at_level_p isn't called
on the testcase.
Though, as the testcase shows, structures/unions which contain erroneous
types of one or more of its members can have TREE_TYPE of the FIELD_DECL
error_mark_node, on which we can crash.
E.g. the __builtin_clear_padding lowering just ignores those:
if (TREE_TYPE (field) == error_mark_node)
continue;
and
if (ftype == error_mark_node)
continue;
It doesn't matter much what exactly we do for those cases, as we are going
to fail the compilation anyway, but we shouldn't crash.
So, the following patch ignores those in type_has_padding_at_level_p.
For RECORD_TYPE, we already return if !DECL_SIZE (f) which I think should
cover already the erroneous fields (and we don't use TYPE_SIZE on those).
2024-11-29 Jakub Jelinek <jakub@redhat.com>
PR middle-end/117065
* gimple-fold.cc (type_has_padding_at_level_p) <case UNION_TYPE>:
Also continue if f has error_mark_node type.
Jakub Jelinek [Fri, 29 Nov 2024 09:17:07 +0000 (10:17 +0100)]
__builtin_prefetch fixes [PR117608]
The r15-4833-ge9ab41b79933 patch had among tons of config/i386
specific changes also important change to the generic code, allowing
also 2 as valid value of the second argument of __builtin_prefetch:
- /* Argument 1 must be either zero or one. */
- if (INTVAL (op1) != 0 && INTVAL (op1) != 1)
+ /* Argument 1 must be 0, 1 or 2. */
+ if (INTVAL (op1) < 0 || INTVAL (op1) > 2)
But the patch failed to document that change in __builtin_prefetch
documentation, and more importantly didn't adjust any of the other
backends to deal with it (my understanding is the expected behavior
is that 2 will be silently handled as 0 unless backends have some
more specific way). Some of the backends would ICE on it, in some
cases gcc_assert failures/gcc_unreachable, in other cases crash later
(e.g. accessing arrays with that value as index and due to accessing
garbage after the array crashing at final.cc time), others treated 2
silently as 0, others treated 2 silently as 1.
And even in the i386 backend there were bugs which caused ICEs.
The patch added some if (write == 0) and write == 2 handling into
a (badly indented, maybe that is the reason) if (write == 1) body,
rather than into the else side, so it would always be false.
The new *prefetch_rst2 define_insn only accepts parameters 2 1
(i.e. read-shared with moderate degree of locality), so in order
not to ICE the patch uses it only for __builtin_prefetch (ptr, 2, 1);
or __builtin_ia32_prefetch (ptr, 2, 1, 0); and not for other values
of the parameter. If that isn't what we want and we want it to be used
also for all or some of __builtin_prefetch (ptr, 2, {0,2,3}); and
corresponding __builtin_ia32_prefetch, maybe the define_insn could match
other values.
And there was another problem that -mno-mmx -mno-sse -mmovrs compilation
would ICE on most of the prefetches, so I had to add the FAIL; cases.
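To illustrate the generic-code change (the function name is mine; the argument meanings follow the text above):
void
warm (const char *p)
{
  __builtin_prefetch (p, 0, 3);  /* read, high locality (previously valid) */
  __builtin_prefetch (p, 1, 3);  /* write, high locality (previously valid) */
  __builtin_prefetch (p, 2, 1);  /* read-shared, moderate locality: the newly
                                    accepted second-argument value */
}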
2024-11-29 Jakub Jelinek <jakub@redhat.com>
PR target/117608
* doc/extend.texi (__builtin_prefetch): Document that second
argument may be also 2 and its meaning.
* config/i386/i386.md (prefetch): Remove unreachable code.
Clear write (set operands[1] to const0_rtx) if !TARGET_MOVRS or
if locality is not 1. Formatting fixes.
* config/i386/i386-expand.cc (ix86_expand_builtin): Use IN_RANGE.
Call gen_prefetch even for TARGET_MOVRS.
* config/alpha/alpha.md (prefetch): Treat read_or_write 2 like 0.
* config/mips/mips.md (prefetch): Likewise.
* config/arc/arc.md (prefetch_1, prefetch_2, prefetch_3): Likewise.
* config/riscv/riscv.md (prefetch): Likewise.
* config/loongarch/loongarch.md (prefetch): Likewise.
* config/sparc/sparc.md (prefetch): Likewise. Use IN_RANGE.
* config/ia64/ia64.md (prefetch): Likewise.
* config/pa/pa.md (prefetch): Likewise.
* config/aarch64/aarch64.md (prefetch): Likewise.
* config/rs6000/rs6000.md (prefetch): Likewise.
* gcc.dg/builtin-prefetch-1.c (good): Add tests with second argument
2.
* gcc.target/i386/pr117608-1.c: New test.
* gcc.target/i386/pr117608-2.c: New test.
Andrew Pinski [Fri, 29 Nov 2024 09:00:11 +0000 (01:00 -0800)]
fortran: Add default to switch in gfc_trans_transfer [PR117843]
This fixes a bootstrap failure due to a warning about enum values not being
handled. In this case, it is just checking two values and the rest are not
meant to be handled, so adding a default case fixes the issue.
When ifcombining contiguous blocks, we can follow forwarder blocks and
reverse conditions to enable combinations, but when there are
intervening blocks, we have to constrain ourselves to paths to the
exit that share the PHI args with all intervening blocks.
Avoiding forwarders altogether when intervening blocks are present
would match the preexisting test, but we can do better: record whether a
forwarded path corresponds to the outer block's exit path, and insist on
not combining through any path other than the one that was verified as
corresponding. The latter is what this patch implements.
While at that, I've fixed some typos and introduced early testing before
computing the exit path, to avoid computing it when that would be
wasteful, or when avoiding it can enable other sound combinations.
for gcc/ChangeLog
PR tree-optimization/117723
* tree-ssa-ifcombine.cc (tree_ssa_ifcombine_bb): Record
forwarder blocks in path to exit, and stick to them. Avoid
computing the exit if obviously not needed, and if that
enables additional optimizations.
(tree_ssa_ifcombine_bb_1): Fix typos.
Harald Anlauf [Wed, 27 Nov 2024 20:11:16 +0000 (21:11 +0100)]
Fortran: fix crash with bounds check writing array section [PR117791]
PR fortran/117791
gcc/fortran/ChangeLog:
* trans-io.cc (gfc_trans_transfer): When an array index depends on
a function evaluation or an expression, do not use optimized array
I/O of an array section and fall back to normal scalarization.
gcc/testsuite/ChangeLog:
* gfortran.dg/bounds_check_array_io.f90: New test.
Uros Bizjak [Thu, 28 Nov 2024 16:44:03 +0000 (17:44 +0100)]
i386: Macroize compound shift patterns some more
Merge ashl and <any_shiftrt:code> compound define_insn_and_split
patterns to form <any_shift:code> macroized pattern.
No functional changes.
gcc/ChangeLog:
* config/i386/i386.md (*<any_shift:insn><mode>3_mask): Macroize
pattern from *ashl<mode>3_mask and *<any_shiftrt:insn><mode>3_mask
using any_shift code iterator.
(*<any_shift:insn><mode>3_mask_1): Macroize pattern
from *ashl<mode>3_mask_1 and *<any_shiftrt:insn><mode>3_mask_1
using any_shift code iterator.
(*<any_shift:insn><mode>3_add): Macroize pattern
from *ashl<mode>3_add and *<any_shiftrt:insn><mode>3_add
using any_shift code iterator.
(*<any_shift:insn><mode>3_add_1): Macroize pattern
from *ashl<mode>3_add_1 and *<any_shiftrt:insn><mode>3_add_1
using any_shift code iterator.
(*<insn><mode>3_sub): Macroize pattern
from *ashl<mode>3_sub and *<any_shiftrt:insn><mode>3_sub
using any_shift code iterator.
(*<any_shift:insn><mode>3_sub_1): Macroize pattern
from *ashl<mode>3_sub_1 and *<any_shiftrt:insn><mode>3_sub_1
using any_shift code iterator.
Jonathan Wakely [Mon, 25 Nov 2024 23:01:28 +0000 (23:01 +0000)]
libstdc++: Reduce duplication in Doxygen comments for std::list
We have a number of comments which are duplicated for C++98 and C++11
overloads, where the signatures are slightly different. Instead of
duplicating the comments that are 90% identical, just use a single
comment that can apply to both. In some cases this means saying "an
iterator" instead of "A const iterator" but that's fine, a
std::list::const_iterator is still an iterator (and a non-const iterator
is a valid argument to those functions because they'll implicitly
convert to const_iterator).
In two cases the @return description just needs to say that it returns
void for C++98 and an iterator otherwise.
libstdc++-v3/ChangeLog:
* include/bits/stl_list.h: Reduce duplication in doxygen
comments.
Marek Polacek [Wed, 9 Oct 2024 15:54:37 +0000 (11:54 -0400)]
c++: Implement P2662R3, Pack Indexing [PR113798]
This patch implements C++26 Pack Indexing, as described in
<https://wg21.link/P2662R3>.
The issue discussing how to mangle pack indexes has not been resolved
yet <https://github.com/itanium-cxx-abi/cxx-abi/issues/175> and I've
made no attempt to address it so far.
Unlike v1, which used augmented TYPE/EXPR_PACK_EXPANSION codes, this
version introduces two new codes: PACK_INDEX_EXPR and PACK_INDEX_TYPE.
Both carry two operands: the pack expansion and the index. They are
handled in tsubst_pack_index: substitute the index and the pack and
then extract the element from the vector (if possible).
To handle pack indexing in a decltype or with decltype(auto), there is
also the new PACK_INDEX_PARENTHESIZED_P flag.
With this feature, it's valid to write something like
using U = tmpl<Ts...[Is]...>;
where we first expand the template argument into
Ts...[Is#0], Ts...[Is#1], ...
and then substitute each individual pack index.
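A minimal illustration of both forms (not from the patch's testsuite; compile with -std=c++26):
template <typename... Ts>
using first_t = Ts...[0];                          // pack-index-specifier

template <auto... Vs>
constexpr auto last_v = Vs...[sizeof...(Vs) - 1];  // pack-index-expression

static_assert (__is_same (first_t<int, double>, int));
static_assert (last_v<1, 2, 3> == 3);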
PR c++/113798
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1) <case PACK_INDEX_EXPR>:
New case.
* cp-objcp-common.cc (cp_common_init_ts): Mark PACK_INDEX_TYPE and
PACK_INDEX_EXPR.
* cp-tree.def (PACK_INDEX_TYPE): New.
(PACK_INDEX_EXPR): New.
* cp-tree.h (WILDCARD_TYPE_P): Also check PACK_INDEX_TYPE.
(PACK_INDEX_CHECK): Define.
(PACK_INDEX_P): Define.
(PACK_INDEX_PACK): Define.
(PACK_INDEX_INDEX): Define.
(PACK_INDEX_PARENTHESIZED_P): Define.
(make_pack_index): Declare.
(pack_index_element): Declare.
* cxx-pretty-print.cc (cxx_pretty_printer::expression) <case
PACK_INDEX_EXPR>: New case.
(cxx_pretty_printer::type_id) <case PACK_INDEX_TYPE>: New case.
* error.cc (dump_type) <case PACK_INDEX_TYPE>: New case.
(dump_type_prefix): Handle PACK_INDEX_TYPE.
(dump_type_suffix): Likewise.
(dump_expr) <case PACK_INDEX_EXPR>: New case.
* mangle.cc (write_type) <case PACK_INDEX_TYPE>: New case.
* module.cc (trees_out::type_node) <case PACK_INDEX_TYPE>: New case.
(trees_in::tree_node) <case PACK_INDEX_TYPE>: New case.
* parser.cc (cp_parser_next_tokens_are_pack_index_p): New.
(cp_parser_pack_index): New.
(cp_parser_primary_expression): Handle a C++26 pack-index-expression.
(cp_parser_unqualified_id): Handle a C++26 pack-index-specifier.
(cp_parser_nested_name_specifier_opt): See if a pack-index-specifier
follows.
(cp_parser_qualifying_entity): Handle a C++26 pack-index-specifier.
(cp_parser_decltype_expr): Set id_expression_or_member_access_p for
pack indexing.
(cp_parser_mem_initializer_id): Handle a C++26 pack-index-specifier.
(cp_parser_simple_type_specifier): Likewise.
(cp_parser_base_specifier): Likewise.
* pt.cc (iterative_hash_template_arg) <case PACK_INDEX_TYPE,
PACK_INDEX_EXPR>: New case.
(find_parameter_packs_r) <case PACK_INDEX_TYPE, PACK_INDEX_EXPR>: New
case.
(make_pack_index): New.
(tsubst_pack_index): New.
(tsubst): Avoid tsubst on PACK_INDEX_TYPE.
<case TYPENAME_TYPE>: Add a call to error.
<case PACK_INDEX_TYPE>: New case.
(tsubst_expr) <case PACK_INDEX_EXPR>: New case.
(dependent_type_p_r): Return true for PACK_INDEX_TYPE.
(type_dependent_expression_p): Recurse on PACK_INDEX_PACK for
PACK_INDEX_EXPR.
* ptree.cc (cxx_print_type) <case PACK_INDEX_TYPE>: New case.
* semantics.cc (finish_parenthesized_expr): Set
PACK_INDEX_PARENTHESIZED_P for PACK_INDEX_EXPR.
(finish_type_pack_element): Adjust error messages.
(pack_index_element): New.
* tree.cc (cp_tree_equal) <case PACK_INDEX_EXPR>: New case.
(cp_walk_subtrees) <case PACK_INDEX_TYPE, PACK_INDEX_EXPR>: New case.
* typeck.cc (structural_comptypes) <case PACK_INDEX_TYPE>: New case.
* g++.dg/cpp26/pack-indexing1.C: New test.
* g++.dg/cpp26/pack-indexing2.C: New test.
* g++.dg/cpp26/pack-indexing3.C: New test.
* g++.dg/cpp26/pack-indexing4.C: New test.
* g++.dg/cpp26/pack-indexing5.C: New test.
* g++.dg/cpp26/pack-indexing6.C: New test.
* g++.dg/cpp26/pack-indexing7.C: New test.
* g++.dg/cpp26/pack-indexing8.C: New test.
* g++.dg/cpp26/pack-indexing9.C: New test.
* g++.dg/cpp26/pack-indexing10.C: New test.
* g++.dg/cpp26/pack-indexing11.C: New test.
* g++.dg/modules/pack-index-1_a.C: New test.
* g++.dg/modules/pack-index-1_b.C: New test.
Mariam Arutunian [Mon, 11 Nov 2024 19:51:18 +0000 (12:51 -0700)]
[PATCH v6 02/12] Add built-ins and tests for bit-forward and bit-reversed CRCs.
This patch introduces new built-in functions to GCC for computing
bit-forward and bit-reversed CRCs.
These builtins aim to provide efficient CRC calculation capabilities.
When the target architecture supports CRC operations (as indicated by the
presence of a CRC optab),
the builtins will utilize the expander to generate CRC code.
In the absence of hardware support, the builtins default to generating code
for a table-based CRC calculation.
The built-ins are defined as follows:
__builtin_rev_crc16_data8,
__builtin_rev_crc32_data8, __builtin_rev_crc32_data16,
__builtin_rev_crc32_data32
__builtin_rev_crc64_data8, __builtin_rev_crc64_data16,
__builtin_rev_crc64_data32, __builtin_rev_crc64_data64,
__builtin_crc8_data8,
__builtin_crc16_data16, __builtin_crc16_data8,
__builtin_crc32_data8, __builtin_crc32_data16, __builtin_crc32_data32,
__builtin_crc64_data8, __builtin_crc64_data16, __builtin_crc64_data32,
__builtin_crc64_data64
Each built-in takes three parameters:
crc: The initial CRC value.
data: The data to be processed.
polynomial: The CRC polynomial without the leading 1.
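A hedged usage sketch, assuming prototypes of the shape implied by the names and the parameter description above (the wrapper name and the choice of the standard bit-forward CRC-32 polynomial are mine):
#include <stdint.h>
uint32_t
crc32_step (uint32_t crc, uint8_t byte)
{
  /* Arguments: current CRC, data, polynomial without the leading 1.  */
  return __builtin_crc32_data8 (crc, byte, 0x04C11DB7);
}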
To validate the correctness of these built-ins, this patch also includes
additions to the GCC testsuite.
This enhancement allows GCC to offer developers high-performance CRC
computation options
that automatically adapt to the capabilities of the target hardware.
* gcc.dg/crc-builtin-rev-target32.c: New test.
* gcc.dg/crc-builtin-rev-target64.c: New test.
* gcc.dg/crc-builtin-target32.c: New test.
* gcc.dg/crc-builtin-target64.c: New test.
Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>
Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
Mariam Arutunian [Mon, 11 Nov 2024 19:48:34 +0000 (12:48 -0700)]
[PATCH v6 01/12] Implement internal functions for efficient CRC computation.
Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster
CRC generation.
One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.
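An illustrative bit-reflection of a byte using only shifts, ANDs and ORs (this is a generic textbook formulation, not GCC's actual expansion):
#include <stdint.h>
static uint8_t
reflect8 (uint8_t v)
{
  v = (uint8_t) (((v & 0xF0) >> 4) | ((v & 0x0F) << 4));  /* swap nibbles */
  v = (uint8_t) (((v & 0xCC) >> 2) | ((v & 0x33) << 2));  /* swap bit pairs */
  v = (uint8_t) (((v & 0xAA) >> 1) | ((v & 0x55) << 1));  /* swap adjacent bits */
  return v;
}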
gcc/
* doc/md.texi (crc@var{m}@var{n}4, crc_rev@var{m}@var{n}4): Document.
* expr.cc (calculate_crc): New function.
(assemble_crc_table): Likewise.
(generate_crc_table): Likewise.
(calculate_table_based_CRC): Likewise.
(expand_crc_table_based): Likewise.
(gen_common_operation_to_reflect): Likewise.
(reflect_64_bit_value): Likewise.
(reflect_32_bit_value): Likewise.
(reflect_16_bit_value): Likewise.
(reflect_8_bit_value): Likewise.
(generate_reflecting_code_standard): Likewise.
(expand_reversed_crc_table_based): Likewise.
* expr.h (generate_reflecting_code_standard): New function declaration.
(expand_crc_table_based): Likewise.
(expand_reversed_crc_table_based): Likewise.
* internal-fn.cc (crc_direct): Define.
(direct_crc_optab_supported_p): Likewise.
(expand_crc_optab_fn): New function.
* internal-fn.def (CRC, CRC_REV): New internal functions.
* optabs.def (crc_optab, crc_rev_optab): New optabs.
Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>
Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
[-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 (test for excess errors)
[-PASS:-]{+UNRESOLVED:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 [-execution test-]{+compilation failed to produce executable+}
[Etc.]
..., due to:
[...]/libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c:16:13: error: two or more data types in declaration specifiers
[...]/libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c:16:1: warning: useless type name in empty declaration
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c
[!__cplusplus]: Don't 'typedef int bool;'.
Jakub Jelinek [Thu, 28 Nov 2024 13:54:42 +0000 (14:54 +0100)]
testsuite: Fix up pr116675.c test [PR116675]
The test uses dg-do run and scan-assembler* at the same time,
that obviously doesn't work when pr116675.s isn't created at all,
so one gets
PASS: gcc.target/i386/pr116675.c execution test
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times por 4
The usual way to handle that is adding -save-temps option.
The test FAILs after that change though, for simple reason, the pand
regex doesn't match just pand instructions, but also the pandn ones.
I've added \t there to make sure it matches only pand.
Though, wonder if it wouldn't be safer to split the test into two,
one with just the 4 functions (why noinline, noclone rather than
noipa, btw?), that one would be dg-do compile and have the scan-assembler*
directives, and then another one which includes the first one and is
dg-do run and contains the runtime checking of those.
In any case, I've committed this as obvious.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR target/116675
* gcc.target/i386/pr116675.c: Add -save-temps to dg-options.
Scan for pand\t rather than pand.
Jakub Jelinek [Thu, 28 Nov 2024 13:31:44 +0000 (14:31 +0100)]
docs: Fix up __sync_* documentation [PR117642]
The PR14311 commit which added support for __sync_* builtins documented that
there is a warning if a particular operation cannot be implemented.
But that commit nor anything later on implemented such warning, it was
always silent generation of the mentioned calls (which can in most cases
result in linker errors of course because those functions aren't implemented
anywhere, in libatomic or elsewhere in code shipped in gcc).
So, the following patch just adjust the documentation to match the
implementation.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR target/117642
* doc/extend.texi: Remove documentation of warning for unimplemented
__sync_* operations, such warning has never been implemented.
On top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668554.html
patch which introduces the nonnull_if_nonzero attribute (because
C2Y is allowing NULL arguments on various calls like memcpy, memset,
strncpy etc. as long as the count is 0), the following patch adds just
limited handling of the attribute in the ranger; in particular, it infers
nonnull for the pointer argument referenced by the first argument of the
attribute if the second argument is a non-zero INTEGER_CST
(integer_nonzerop).
Ideally (as the FIXME says) I'd like to query arg2 range and check if
it doesn't contain zero, but am not sure such queries are possible from
gimple_infer_range (and if it is possible whether one can just query
the currently recorded range for it or if one can call something that
will try to compute the range by walking the def stmts etc.).
Could you handle as a follow-up the range querying if it is possible?
As for a useful testcase, with the patch I'm going to post next, see
e.g. gcc.dg/tree-ssa/pr78154.c if the calls use d as destination (not dn)
and a count that has a range which doesn't include 0 and isn't constant.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR c/117023
* gimple-range-infer.cc (gimple_infer_range::gimple_infer_range):
Handle also nonnull_if_nonzero attributes.
Jakub Jelinek [Thu, 28 Nov 2024 10:48:33 +0000 (11:48 +0100)]
Add support for nonnull_if_nonzero attribute [PR117023]
As mentioned in an earlier thread, C2Y voted in a change which made
various library APIs callable with NULL arguments in certain cases,
e.g.
memcpy (NULL, NULL, 0);
is now valid, although
memcpy (NULL, NULL, 1);
remains invalid. This affects various APIs, including several of
GCC builtins; plus on the C library side those APIs are often declared
with nonnull attribute(s) as well.
Florian suggested using the access attribute for this, but our docs
explicitly say that the access attribute doesn't imply nonnull, and it
doesn't cover e.g. the qsort case where the comparison function pointer may
also be NULL if nmemb is 0, but must be non-NULL otherwise.
As this case affects 21 APIs in C standard and I think is going to affect
various wrappers around those in various packages as well, I think it
is a common thing that should have its own attribute, because we should
still warn when people use
qsort (NULL, 1, 1, NULL);
etc., and similarly want to have -fsanitize=null instrumentation for those.
So, the following patch introduces the nonnull_if_nonzero attribute (or would
you prefer cond_nonnull or some other name?), which always has 2 arguments,
argument index of a pointer argument (like one argument nonnull) and
argument index of an associated integral argument. If that argument is
non-zero, it is UB to pass NULL to the pointer argument, if that argument
is zero, it is valid. And changes various spots which already handled the
nonnull attribute to handle this one as well, with sometimes using the
ranger (or for -fsanitize=nonnull explicitly checking the associated
argument value, so instead of if (!ptr) __ubsan_... (...); it will
now do if (!ptr && sz) __ubsan_... (...);).
I've so far omitted changing gimple_infer_range (am not 100% sure how I can
use the ranger inside of the ranger) and changing the analyzer to handle it.
And I haven't changed builtins.def etc. to make use of that attribute
instead of nonnull where appropriate.
I'd then follow with the builtins.def changes (and eventually glibc
etc. would need to be adjusted too).
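A sketch of how the attribute is meant to be used, following the argument-pair description above (the memcpy-like wrapper and its name are hypothetical):
#include <stddef.h>
extern void *my_memcpy (void *dst, const void *src, size_t n)
  __attribute__ ((nonnull_if_nonzero (1, 3), nonnull_if_nonzero (2, 3)));

void
ok (void)
{
  my_memcpy ((void *) 0, (const void *) 0, 0);   /* valid: count is 0 */
}

void
bad (void)
{
  my_memcpy ((void *) 0, (const void *) 0, 1);   /* diagnosed: count is non-zero */
}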
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR c/117023
gcc/
* gimple.h (infer_nonnull_range_by_attribute): Add a tree *
argument defaulted to NULL.
* gimple.cc (infer_nonnull_range_by_attribute): Add op2 argument.
Handle also nonnull_if_nonzero attributes.
* tree.cc (get_nonnull_args): Fix comment typo.
* builtins.cc (validate_arglist): Handle nonnull_if_nonzero attribute.
* tree-ssa-ccp.cc (pass_post_ipa_warn::execute): Handle
nonnull_if_nonzero attributes.
* ubsan.cc (instrument_nonnull_arg): Adjust
infer_nonnull_range_by_attribute caller. If it returned true and
filled in a non-NULL arg2, check that arg2 is non-zero as another
condition next to checking that arg is zero.
* doc/extend.texi (nonnull_if_nonzero): Document new attribute.
gcc/c-family/
* c-attribs.cc (handle_nonnull_if_nonzero_attribute): New
function.
(c_common_gnu_attributes): Add nonnull_if_nonzero attribute.
(handle_nonnull_attribute): Fix comment typo.
* c-common.cc (struct nonnull_arg_ctx): Add other member.
(check_function_nonnull): Also check nonnull_if_nonzero attributes.
(check_nonnull_arg): Use different warning wording if pctx->other
is non-zero.
(check_function_arguments): Initialize ctx.other.
gcc/testsuite/
* gcc.dg/nonnull-8.c: New test.
* gcc.dg/nonnull-9.c: New test.
* gcc.dg/nonnull-10.c: New test.
* c-c++-common/ubsan/nonnull-6.c: New test.
* c-c++-common/ubsan/nonnull-7.c: New test.
Jakub Jelinek [Thu, 28 Nov 2024 10:45:00 +0000 (11:45 +0100)]
rs6000: Add PowerPC inline asm redzone clobber support
The following patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667949.html
patch adds the rs6000 part of the support (the only other target I'm aware
of which clearly has a red zone as well).
2024-11-28 Jakub Jelinek <jakub@redhat.com>
* config/rs6000/rs6000.h (struct machine_function): Add
asm_redzone_clobber_seen member.
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Force
info->push_p if cfun->machine->asm_redzone_clobber_seen.
* config/rs6000/rs6000.cc (TARGET_REDZONE_CLOBBER): Redefine.
(rs6000_redzone_clobber): New function.
Jakub Jelinek [Thu, 28 Nov 2024 10:42:11 +0000 (11:42 +0100)]
inline-asm, i386: Add "redzone" clobber support
The following patch adds a "redzone" clobber (recognized everywhere,
even on targets which don't do anything with it),
with which one can mark the rare case where inline asm pushes
something on the stack or uses a call instruction without taking the
red zone into account (i.e. without addq $-128, %rsp; and addq $128, %rsp
around it).
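A minimal x86-64 sketch of the intended use, assuming the syntax described
above (the asm body is only an illustration):
void
poke_below_rsp (void)
{
  /* The push temporarily writes below the incoming stack pointer, so the
     "redzone" clobber tells GCC not to keep any data in the red zone
     around this asm.  */
  __asm__ volatile ("pushq $0\n\tpopq %%rax" ::: "rax", "redzone");
}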
2024-11-28 Jakub Jelinek <jakub@redhat.com>
gcc/
* target.def (redzone_clobber): New target hook.
* varasm.cc (decode_reg_name_and_count): Return -5 for
"redzone".
* cfgexpand.cc (expand_asm_stmt): Handle redzone clobber.
* config/i386/i386.h (struct machine_function): Add
asm_redzone_clobber_seen member.
* config/i386/i386.cc (ix86_compute_frame_layout): Don't
use red zone if cfun->machine->asm_redzone_clobber_seen.
(ix86_redzone_clobber): New function.
(TARGET_REDZONE_CLOBBER): Redefine.
* doc/extend.texi (Clobbers and Scratch Registers): Document
the "redzone" clobber.
* doc/tm.texi.in: Add @hook TARGET_REDZONE_CLOBBER.
* doc/tm.texi: Regenerate.
gcc/testsuite/
* gcc.dg/asm-redzone-1.c: New test.
* gcc.target/i386/asm-redzone-1.c: New test.
Jakub Jelinek [Thu, 28 Nov 2024 10:30:32 +0000 (11:30 +0100)]
c++: Small initial fixes for zeroing of padding bits [PR117256]
https://eel.is/c++draft/dcl.init#general-6
says that even padding bits are supposed to be zeroed during
zero-initialization.
The following patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665565.html
patch attempts to implement that, though only for the easy
cases so far, in particular marks the CONSTRUCTOR created during
zero-initialization (or zero-initialization done during the
value-initialization) as having padding bits cleared and for
constexpr evaluation attempts to preserve that bit on a new CONSTRUCTOR
created for CONSTRUCTOR_ZERO_PADDING_BITS lhs.
I think we need far more than that, but I am not sure where exactly
to implement it.
In particular, I think __builtin_bit_cast should take it into account
during constant evaluation: if the padding bits in something are guaranteed
to be zero, then I'd think using std::bit_cast on it and testing those
bits in the result should be well defined.
But if we do that, the flag needs to be maintained precisely, not just
conservatively, so e.g. any place where some object is copied into another
one (except bit_cast?) as an element-wise copy, the bit should
be cleared (or preserved from the earlier state? I'd hope
element-wise copying invalidates even the padding bits, but then what
about stores into just some members, do those invalidate the padding bits
in the rest of the object?). But if it is an elided copy, it shouldn't be.
And I am not really sure what happens e.g. with non-automatic constexpr
variables. If one is constructed by something that doesn't guarantee
the zeroing of the padding bits (so a similarly constructed constexpr
automatic variable would have undefined padding bits), are those padding
bits well defined just because it isn't an automatic variable?
Anyway, I hope the following patch is at least a small step in the right
direction.
Jakub Jelinek [Thu, 28 Nov 2024 10:18:07 +0000 (11:18 +0100)]
expr, c: Don't clear whole unions [PR116416]
As discussed earlier, we currently clear padding bits even when we
don't have to and that causes pessimization of emitted code,
e.g. for
union U { int a; long b[64]; };
void bar (union U *);
void
foo (void)
{
union U u = { 0 };
bar (&u);
}
we need to clear just u.a, not the whole union; but on the other hand, in
cases where the standard requires padding bits to be zeroed, like for C23
{} initializers of aggregates with padding bits or for C++11
zero-initialization, we don't do that.
This patch
a) moves some of the stuff into complete_ctor_at_level_p (but not
all the *p_complete = 0; case, for that it would need to change
so that it passes around the ctor rather than just its type) and
changes the handling of unions
b) introduces a new option, so that users can either get the new
behavior (only what is guaranteed by the standards, the default),
the previous behavior (union padding zero-initialization, no such
guarantees for structures), or also a guarantee for structures
(see the sketch after this list)
c) introduces a new CONSTRUCTOR flag which says that the padding bits
(if any) should be zero initialized (and sets it for now in the C
FE for C23 {} initializers).
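Here is a hedged sketch of what b) and c) mean in practice; the option
value names below (standard, unions, all) are inferred from the description
and the ChangeLog enum names, so the final spellings may differ:
/* gcc -std=c23 -fzero-init-padding-bits=standard t.c
     default: only what the standards guarantee, e.g. C23 {} initializers
   gcc -std=c23 -fzero-init-padding-bits=unions t.c
     previous behavior: zero union padding, no guarantee for struct padding
   gcc -std=c23 -fzero-init-padding-bits=all t.c
     additionally guarantee zeroed padding in structures  */
struct S { char a; long long b; };            /* padding after 'a' */

void
padding_examples (void)
{
  struct S s = {};                            /* C23 {}: padding zeroed even at =standard */
  union U { int a; long b[64]; } u = { 0 };   /* at =standard only u.a is cleared */
  (void) s; (void) u;
}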
I am not sure the CONSTRUCTOR_ZERO_PADDING_BITS flag is really needed
for C23; if there is just an empty initializer, I think we already mark
it as incomplete if there are any missing initializers. Maybe with
some designated initializer games, say
void foo () {
struct S { char a; long long b; };
struct T { struct S c; } t = { .c = {}, .c.a = 1, .c.b = 2 };
...
}
Is this supposed to initialize padding bits in C23 and then the .c.a = 1
and .c.b = 2 stores preserve those padding bits, so is that supposed
to be different from struct T t2 = { .c = { 1, 2 } };
? What about just struct T t3 = { .c.a = 1, .c.b = 2 }; ?
And I haven't touched the C++ FE for the flag, because I'm afraid I'm lost
on where exactly zero-initialization is done (vs. other types of
initialization) and where e.g. zero-initialization of a temporary is then
(member-wise) copied.
Say
struct S { char a; long long b; };
struct T { constexpr T (int a, int b) : c () { c.a = a; c.b = b; } S c; };
void bar (T *);
void
foo ()
{
T t (1, 2);
bar (&t);
}
Is the c () value-initialization of t.c followed by c.a and c.b updates
which preserve the zero initialized padding bits? Or is there some
copy construction involved which does member-wise copying and makes the
padding bits undefined?
Looking at (older) clang++ with -O2, it initializes also the padding bits
when c () is used and doesn't with c {}.
For GCC, note that there is that optimization from Alex to zero padding bits
for optimization purposes for small aggregates, so either one needs to look
at -O0 -fdump-tree-gimple dumps, or use larger structures which aren't
optimized that way.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR c++/116416
gcc/
* flag-types.h (enum zero_init_padding_bits_kind): New type.
* tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define.
* common.opt (fzero-init-padding-bits=): New option.
* expr.cc (categorize_ctor_elements_1): Handle
CONSTRUCTOR_ZERO_PADDING_BITS or
flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL. Fix
up *p_complete = -1; setting for unions.
(complete_ctor_at_level_p): Handle unions differently for
flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD.
* gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE
handling, return also true for UNION_TYPE with no FIELD_DECLs
and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE.
* doc/invoke.texi (-fzero-init-padding-bits=@var{value}): Document.
gcc/c/
* c-parser.cc (c_parser_braced_init): Set CONSTRUCTOR_ZERO_PADDING_BITS
for flag_isoc23 empty initializers.
* c-typeck.cc (constructor_zero_padding_bits): New variable.
(struct constructor_stack): Add zero_padding_bits member.
(really_start_incremental_init): Save and clear
constructor_zero_padding_bits.
(push_init_level): Save constructor_zero_padding_bits. Or into it
CONSTRUCTOR_ZERO_PADDING_BITS from previous value if implicit.
(pop_init_level): Set CONSTRUCTOR_ZERO_PADDING_BITS if
constructor_zero_padding_bits and restore
constructor_zero_padding_bits.
gcc/testsuite/
* gcc.dg/plugin/infoleak-1.c (test_union_2b, test_union_4b): Expect
diagnostics.
* gcc.dg/c23-empty-init-5.c: New test.
* gcc.dg/gnu11-empty-init-1.c: New test.
* gcc.dg/gnu11-empty-init-2.c: New test.
* gcc.dg/gnu11-empty-init-3.c: New test.
* gcc.dg/gnu11-empty-init-4.c: New test.
Two things go wrong here: the wrong mask is used and the address pointers
for the stores are wrong.
This happens because the codegen loop in vectorizable_store is a nested
loop where the outer loop iterates over ncopies and the inner loop
iterates over vec_num.
For SLP ncopies == 1 and vec_num == SLP_NUM_STMS, but the loop mask is
determined only by the outer-loop index and the pointer address is only
updated in the outer loop.
As such, for SLP we always use the same predicate and the same memory
location.
This patch flattens the two loops, instead iterating over
ncopies * vec_num, and simplifies the indexing.
This does not fully fix the gcc_r miscompile in SPECCPU 2017, as the error
moves somewhere else. I will look at that next, but the patch fixes some
other libraries that also started failing.
gcc/ChangeLog:
PR tree-optimization/117557
* tree-vect-stmts.cc (vectorizable_store): Flatten the ncopies and
vec_num loops.
gcc/testsuite/ChangeLog:
PR tree-optimization/117557
* gcc.target/aarch64/pr117557.c: New test.
Jakub Jelinek [Thu, 28 Nov 2024 09:51:16 +0000 (10:51 +0100)]
gimple-fold: Avoid ICEs with bogus declarations like const attribute on snprintf [PR117358]
When one puts incorrect const or pure attributes on declarations of various
C APIs which have corresponding builtins (vs. what they actually do), we can
get tons of ICEs in gimple-fold.cc.
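For illustration, a hedged sketch of the kind of bogus declaration meant
above (not the exact reproducer from the PR): snprintf writes through its
first argument, so declaring it const is wrong, and folding such a call
could previously ICE:
typedef __SIZE_TYPE__ size_t;
extern int snprintf (char *, size_t, const char *, ...)
  __attribute__ ((const));   /* bogus: snprintf is not a const function */

char buf[32];

int
trigger (void)
{
  return snprintf (buf, sizeof buf, "%d", 42);
}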
The following patch fixes it by giving up on gimple_fold_builtin_* folding
if the call statements don't have a gimple_vdef (or, for pure functions
like bcmp/strchr/strstr, a gimple_vuse) when in SSA form (during
gimplification they will surely have both of those NULL even when declared
correctly, yet it is highly desirable to fold them).
Or shall I replace
!gimple_vdef (stmt) && gimple_in_ssa_p (cfun)
tests with
(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE | ECF_NOVOPS)) != 0
and
!gimple_vuse (stmt) && gimple_in_ssa_p (cfun)
with
(gimple_call_flags (stmt) & (ECF_CONST | ECF_NOVOPS)) != 0
?
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117358
* gimple-fold.cc (gimple_fold_builtin_memory_op): Punt if stmt has no
vdef in ssa form.
(gimple_fold_builtin_bcmp): Punt if stmt has no vuse in ssa form.
(gimple_fold_builtin_bcopy): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_bzero): Likewise.
(gimple_fold_builtin_memset): Likewise. Use return false instead of
return NULL_TREE.
(gimple_fold_builtin_strcpy): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_strncpy): Likewise.
(gimple_fold_builtin_strchr): Punt if stmt has no vuse in ssa form.
(gimple_fold_builtin_strstr): Likewise.
(gimple_fold_builtin_strcat): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_strcat_chk): Likewise.
(gimple_fold_builtin_strncat): Likewise.
(gimple_fold_builtin_strncat_chk): Likewise.
(gimple_fold_builtin_string_compare): Likewise.
(gimple_fold_builtin_fputs): Likewise.
(gimple_fold_builtin_memory_chk): Likewise.
(gimple_fold_builtin_stxcpy_chk): Likewise.
(gimple_fold_builtin_stxncpy_chk): Likewise.
(gimple_fold_builtin_stpcpy): Likewise.
(gimple_fold_builtin_snprintf_chk): Likewise.
(gimple_fold_builtin_sprintf_chk): Likewise.
(gimple_fold_builtin_sprintf): Likewise.
(gimple_fold_builtin_snprintf): Likewise.
(gimple_fold_builtin_fprintf): Likewise.
(gimple_fold_builtin_printf): Likewise.
(gimple_fold_builtin_realloc): Likewise.
Jakub Jelinek [Thu, 28 Nov 2024 09:23:47 +0000 (10:23 +0100)]
builtins: Handle BITINT_TYPE in __builtin_iseqsig folding [PR117802]
In check_builtin_function_arguments in the _BitInt patchset I've changed
INTEGER_TYPE tests to INTEGER_TYPE or BITINT_TYPE, but haven't done the
same in fold_builtin_iseqsig, which now ICEs because of that.
The following patch fixes that.
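A hedged sketch of the shape of call that triggered the ICE (not the exact
reproducer from the PR), assuming a target with _BitInt support:
int
cmp (_BitInt(64) x)
{
  /* The FE accepts a _BitInt operand here, but fold_builtin_iseqsig
     only knew about INTEGER_TYPE and ICEd on BITINT_TYPE.  */
  return __builtin_iseqsig (x, 1.0);
}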
BTW, that TYPE_PRECISION (type0) >= TYPE_PRECISION (type1) test
for REAL_TYPE vs. REAL_TYPE looks pretty random and dangerous. I think
it would be useful to handle this builtin also in the C and C++ FEs:
if both arguments have REAL_TYPE, use the FE-specific routine to decide
which types to use, and error if a comparison between the types would be
erroneous (e.g. complain about _Decimal* vs. float/double/long
double/_Float*, pick the preferred type, complain about
__ibm128 vs. _Float128 in C++, etc.).
The FEs could then just promote one argument to the other in that case,
keeping fold_builtin_iseqsig as is for, say, Fortran and other FEs.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR c/117802
* builtins.cc (fold_builtin_iseqsig): Handle BITINT_TYPE like
INTEGER_TYPE.
* gcc.dg/builtin-iseqsig-1.c: New test.
* gcc.dg/bitint-118.c: New test.
Joseph Myers [Thu, 28 Nov 2024 02:41:35 +0000 (02:41 +0000)]
c: Fix gimplification ICE for shifts with invalid redeclarations
As reported in bug 117757, there is a C gimplification ICE for shifts
involving a variable that was incompatibly redeclared (and thus had
its type changed to error_mark_node). Fix this with an appropriate
error_operand_p check.
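A hedged sketch of the construct in question (not the exact reproducer
from the PR): the incompatible redeclaration turns the variable's type
into error_mark_node, and a shift using it could then ICE during
gimplification instead of only being diagnosed:
int n;
long n;          /* error: conflicting types for 'n'; type becomes error_mark_node */

int
shift (int x)
{
  return x << n; /* gimplification of this shift needs the error_operand_p check */
}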
Note that this is not the same issue as any of the other bugs reported
for ICEs later in the gimplifier dealing with such erroneous
redeclarations (it is, however, the same as the *second* ICE reported
in bug 115644 - the test in comment#1 for that bug, not the one in the
original bug report).
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/117757
gcc/c-family/
* c-gimplify.cc (c_gimplify_expr): Check for error_operand_p
before calling TYPE_MAIN_VARIANT for shifts.
David Malcolm [Thu, 28 Nov 2024 00:21:15 +0000 (19:21 -0500)]
c-family: offer suggestions for missing command-line options [PR82892]
Some builtin macros are only defined when certain command-line options
are provided. Update the error messages for them so that we suggest
the pertinent option ('-fopenacc' for '_OPENACC', and '-fopenmp' for
'_OPENMP').
David Malcolm [Thu, 28 Nov 2024 00:21:15 +0000 (19:21 -0500)]
C/C++: add fix-it hints for missing '&' and '*' (v5) [PR87850]
This patch adds a note with a fix-it hint to various
pointer-vs-non-pointer diagnostics, suggesting the addition of
a leading '&' or '*'.
For example, note the ampersand fix-it hint in the following:
demo.c: In function 'int main()':
demo.c:5:22: error: invalid conversion from 'pthread_key_t' {aka 'unsigned int'}
to 'pthread_key_t*' {aka 'unsigned int*'} [-fpermissive]
5 | pthread_key_create(key, NULL);
| ^~~
| |
| pthread_key_t {aka unsigned int}
demo.c:5:22: note: possible fix: take the address with '&'
5 | pthread_key_create(key, NULL);
| ^~~
| &
In file included from demo.c:1:
/usr/include/pthread.h:1122:47: note: initializing argument 1 of
'int pthread_key_create(pthread_key_t*, void (*)(void*))'
1122 | extern int pthread_key_create (pthread_key_t *__key,
| ~~~~~~~~~~~~~~~^~~~~
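For reference, a hedged reconstruction of the demo.c behind the quoted
diagnostic (assumed, not taken from the patch; the output above comes from
compiling it as C++):
#include <pthread.h>

int
main (void)
{
  pthread_key_t key;
  pthread_key_create (key, NULL);   /* should be &key; the fix-it suggests '&' */
  return 0;
}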
gcc/c-family/ChangeLog:
PR c++/87850
* c-common.cc: Include "gcc-rich-location.h".
(maybe_emit_indirection_note): New function.
* c-common.h (maybe_emit_indirection_note): New decl.
(compatible_types_for_indirection_note_p): New decl.
gcc/c/ChangeLog:
PR c++/87850
* c-typeck.cc (compatible_types_for_indirection_note_p): New
function.
(convert_for_assignment): Call maybe_emit_indirection_note for
pointer vs non-pointer diagnostics.
gcc/cp/ChangeLog:
PR c++/87850
* call.cc (convert_like_real): Call maybe_emit_indirection_note
for "invalid conversion" diagnostic.
* typeck.cc (compatible_types_for_indirection_note_p): New
function.
gcc/testsuite/ChangeLog:
PR c++/87850
* c-c++-common/indirection-fixits.c: New test.
* g++.dg/template/error60.C: Add fix-it hint to expected output.
* g++.dg/template/error60a.C: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jan Hubicka [Wed, 27 Nov 2024 22:52:37 +0000 (23:52 +0100)]
optimize basic_string
Add __builtin_unreachable conditionals to declare value ranges of
basic_string::length(). Fix max_size() to return the actual max size
using logic similar to std::vector. Avoid use of size() in empty()
to save some compile-time overhead.
As discussed, the max_size() change is technically ABI breaking, but
hopefully this does not really matter in practice.
The change to length() breaks the empty-loop testcase, where we now
optimize the loop only after inlining, so the template is updated to
check cddce3 instead of cddce2. This is PR117764.
With these changes we now optimize out unused strings, as tested in
string-1.C.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h (basic_string::size(),
basic_string::length(), basic_string::capacity()): Add
__builtin_unreachable to declare value ranges.
(basic_string::empty()): Implement directly
(basic_string::max_size()): Account correctly the terminating 0
and limits implied by ptrdiff_t.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/empty-loop.C: xfail optimization at cddce2 and check
it happens at cddce3.
* g++.dg/tree-ssa/string-1.C: New test.
Joseph Myers [Wed, 27 Nov 2024 22:27:08 +0000 (22:27 +0000)]
c: Fix ICE using function name in parameter type in old-style function definition [PR91193]
As reported in bug 91193, if an old-style function definition
redeclares a typedef name as a function, then uses that function name
at the start of the first old-style parameter definition, then the
parser interprets that token as a typedef name (because lookahead
occurred before processing of the function declarator completed), but
when it is looked up in processing that parameter definition, what is
found is the redefinition, resulting in an ICE.
The function name's scope starts at the end of its declarator, so this
is similar to other cases where we call
c_parser_maybe_reclassify_token because lookahead might have
classified a token as being a typedef or not based on information from
the wrong scope; do so in this case as well, thus resulting in the
expected parse errors from using something that's no longer a typedef
name as if it were a typedef name, and eliminating the ICE.
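A hedged sketch of the construct described above (not the exact testcase
from the bug): the old-style definition redeclares the typedef name T as
the function being defined, and the first parameter definition then starts
with that name, which must no longer be treated as a typedef; after the
fix this yields ordinary parse errors instead of an ICE:
typedef int T;

int
T (x)        /* redeclares the typedef name T as a function */
  T x;       /* 'T' here is no longer a typedef name */
{
  return x;
}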
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
Jonathan Wakely [Wed, 27 Nov 2024 11:52:40 +0000 (11:52 +0000)]
libstdc++: Add cold attribute to assertion failure functions [PR117650]
This helps the compiler to split the cold path into a separate clone, so
that the hot path is a smaller function that uses less icache, and the
cold path is only fetched into the icache if actually executed.
Andrew Pinski [Tue, 26 Nov 2024 00:04:21 +0000 (16:04 -0800)]
match: Improve handling of double convert [PR117776]
For a double conversion, we will simplify it into a conversion
with an and if the outer and innermost precisions match and
the intermediate precision is smaller and unsigned. We should be able
to extend this to the case where the outer precision is larger too.
This is a good canonicalization too.
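A hedged example of the extended case (illustrative only): the
intermediate unsigned type is narrower, and the outer precision is now
allowed to be wider than the innermost one, so the pair of casts can still
become a mask plus a single conversion:
unsigned long long
double_convert (unsigned int x)
{
  /* Roughly equivalent to (unsigned long long) (x & 0xff).  */
  return (unsigned long long) (unsigned char) x;
}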
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117776
gcc/ChangeLog:
* match.pd (nested int casts): Allow for the case
where the final prec is greater than the original
prec.
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr117776.cc: New test.
* gcc.dg/tree-ssa/cast-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Patrick Palka [Wed, 27 Nov 2024 16:59:38 +0000 (11:59 -0500)]
libstdc++/ranges: make _RangeAdaptorClosure befriend operator|
This declares the range adaptor pipe operators a friend of the
_RangeAdaptorClosure base class so that the std module doesn't need to
export them for ADL to find them.
Note that we deliberately don't define these pipe operators as hidden
friends, see r14-3293-g4a6f3676e7dd9e.
Jakub Jelinek [Wed, 27 Nov 2024 16:29:28 +0000 (17:29 +0100)]
c: Fix sizeof error recovery [PR117745]
Compilation of the following testcase hangs forever after emitting the
first error. The problem is that in one place we just return
error_mark_node directly rather than going through c_expr_sizeof_expr or
c_expr_sizeof_type.
The parsing of the expression could have called record_maybe_used_decl,
though, but nothing calls pop_maybe_used, which needs to be called after
parsing of every sizeof/typeof, successful or not.
At the end of the toplevel declaration we free the parser_obstack, and in
another function record_maybe_used_decl is called again and, due to the
missing pop_maybe_used, we end up with a cycle in the chain.
The following patch fixes it by just setting the error and jumping to
sizeof_expr:
c_inhibit_evaluation_warnings--;
in_sizeof--;
mark_exp_read (expr.value);
if (TREE_CODE (expr.value) == COMPONENT_REF
&& DECL_C_BIT_FIELD (TREE_OPERAND (expr.value, 1)))
error_at (expr_loc, "%<sizeof%> applied to a bit-field");
result = c_expr_sizeof_expr (expr_loc, expr);
where c_expr_sizeof_expr will do:
struct c_expr ret;
if (expr.value == error_mark_node)
{
ret.value = error_mark_node;
ret.original_code = ERROR_MARK;
ret.original_type = NULL;
ret.m_decimal = 0;
pop_maybe_used (false);
}
...
return ret;
which is exactly what the old code did manually except for the missing
pop_maybe_used call. mark_exp_read does nothing on error_mark_node and
error_mark_node doesn't have COMPONENT_REF tree_code.
2024-11-27 Jakub Jelinek <jakub@redhat.com>
PR c/117745
* c-parser.cc (c_parser_sizeof_expression): If type_name is NULL,
just expr.set_error () and goto sizeof_expr instead of doing error
recovery manually.
Pan Li [Fri, 22 Nov 2024 03:48:26 +0000 (11:48 +0800)]
I386: Add more testcases for unsigned SAT_ADD vector pattern
Some forms like the one below failed to be recognized as a SAT_ADD
pattern for target i386. It was related to some match pattern
extraction but got fixed after the refactor of the SAT_ADD
pattern. Thus, add testcases to ensure we won't have similar
issues in the future.
#define DEF_SAT_ADD(T) \
T sat_add_##T (T x, T y) \
{ \
T res; \
res = x + y; \
res |= -(T)(res < x); \
return res; \
}
#define VEC_DEF_SAT_ADD(T) \
void vec_sat_add(T * restrict a, T * restrict b) \
{ \
for (int i = 0; i < 8; i++) \
b[i] = sat_add_##T (a[i], b[i]); \
}
DEF_SAT_ADD (uint32_t)
VEC_DEF_SAT_ADD (uint32_t)
The below test suites are passed for this patch.
make -k check-gcc RUNTESTFLAGS="--target_board=unix\{,-m32\} i386.exp=pr112600-5a-*.c"
PR target/112600
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112600-5-u16.c: New test.
* gcc.target/i386/pr112600-5-u32.c: New test.
* gcc.target/i386/pr112600-5-u64.c: New test.
* gcc.target/i386/pr112600-5-u8.c: New test.
* gcc.target/i386/pr112600-5.h: New test.
Pan Li [Mon, 25 Nov 2024 01:21:24 +0000 (09:21 +0800)]
Match: Refactor the unsigned SAT_ADD match ADD_OVERFLOW pattern [NFC]
This patch refactors the unsigned SAT_ADD patterns that
leverage the IFN ADD_OVERFLOW, namely:
* Extract type check outside.
* Re-arrange the related match pattern forms together.
* Remove unnecessary helper pattern matches.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Refactor sorts of unsigned SAT_ADD match pattern for
IFN ADD_OVERFLOW.
Joseph Myers [Wed, 27 Nov 2024 14:10:37 +0000 (14:10 +0000)]
c: Do not remove _Atomic from array element type for typeof_unqual [PR117781]
As reported in bug 117781, my fix for bug 112841 broke the case of
typeof_unqual applied to an array of _Atomic elements, which should
not have _Atomic removed since only the element type is atomic, not
the array type. Fix with logic to ensure that atomic element types
are preserved as such, while other qualifiers (i.e. those that are
semantically rather than only syntactically such in C) are removed.
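A hedged sketch of the distinction (requires -std=c23; not the testcase
from the PR):
const _Atomic int a[3];
typeof_unqual (a) b;   /* b is an array of _Atomic int: const is dropped,
                          the elements' _Atomic is kept */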
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/117781
gcc/c/
* c-parser.cc (c_parser_typeof_specifier): Do not remove _Atomic
from array element type for typeof_unqual.
Jakub Jelinek [Wed, 27 Nov 2024 13:33:16 +0000 (14:33 +0100)]
builtins: Emit __sync_lock_release_{8,16} call as last resort instead of doing nothing [PR117642]
As the following testcases show, in the case of a multi-word
__sync_lock_test_and_set where we don't actually support atomics for that
size (__int128 for x86_64 lp64 with -mno-cx16, long long for ia32 with
-march=i{3,4}86), as the last fallback, if we don't know anything else, we
just emit calls to __sync_lock_test_and_set_{8,16}. Those aren't defined
in libatomic, but perhaps users could define them themselves.
__sync_lock_release, on the other hand, if it gives up and has no way to
emit the atomic store, just does nothing at all, so there is no clear sign
to users that something went wrong and that the code will not do what they
expected.
This regressed when __atomic_* support was introduced; previously we
would just emit those calls even in this case.
The patch just emits the call as the last fallback.
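A hedged sketch of the scenario (assuming x86_64 with -mno-cx16, so
16-byte atomics are not available inline):
__int128 lock;

void
unlock (void)
{
  /* Previously this expanded to nothing at all; with the patch it falls
     back to calling __sync_lock_release_16 as the last resort.  */
  __sync_lock_release (&lock);
}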
2024-11-27 Jakub Jelinek <jakub@redhat.com>
PR target/117642
* builtins.cc (expand_builtin_sync_lock_release): Change return type
from void to rtx, return result of expand_atomic_store.
(expand_builtin) <case BUILT_IN_SYNC_LOCK_RELEASE_16>: If
expand_builtin_sync_lock_release returns NULL, do a break rather
than return const0_rtx.
* gcc.target/i386/pr117642-1.c: New test.
* gcc.target/i386/pr117642-2.c: New test.