git.ipfire.org Git - thirdparty/gcc.git/log

arm, mve: Adding missing Runtime Library Exception to header files

Add missing Runtime Library Exception to mve header files to bring them into
line with other similar headers. Not adding it in the first place was an
oversight.

gcc/ChangeLog:

* config/arm/arm_mve.h: Add Runtime Library Exception.
* config/arm/arm_mve_types.h: Likewise.

tree-optimization/116352 - SLP scheduling and stmt order

The PR uncovers unchecked constraints on the ability to code-generate
with SLP but also latent issues with regard to stmt order checking
since loop (early-break) and BB (for quite some time) vectorization
are no longer constrained to single BBs. In particular get_later_stmt
simply compares UIDs of stmts, but that's only reliable when they
are in the same BB.

For the PR in question the problematic case is demoting an SLP node
to external which fails to check we can actually code generate this
in the way we do (using get_later_stmt). The following thus adds
checking that we demote to external only when all defs are from
the same BB.

We no longer vectorize gcc.dg/vect/bb-slp-49.c but the testcase was
for a wrong-code issue and the vectorization done is a no-op.

PR tree-optimization/116352
PR tree-optimization/117876
* tree-vect-slp.cc (vect_slp_can_convert_to_external): New.
(vect_slp_convert_to_external): Call it.
(vect_build_slp_tree_2): Likewise.

* gcc.dg/vect/pr116352.c: New testcase.
* gcc.dg/vect/bb-slp-49.c: Remove vectorization check.

testsuite: Adjust rs6000-ldouble-2.c for switch to -std=gnu23 by default [PR117663]

-std=gnu23/-std=c23 changes LDBL_EPSILON for IBM long double, see r13-3029 and
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602738.html
for details.

That change even had a note:
"and when we move to a C2x
default, gcc.target/powerpc/rs6000-ldouble-2.c will need an
appropriate option added to keep using an older language version"

The following patch just implements it to fix rs6000-ldouble-2.c regression.

2024-12-02 Jakub Jelinek <jakub@redhat.com>

PR testsuite/117663
* gcc.target/powerpc/rs6000-ldouble-2.c: Add -std=gnu17 to dg-options.

RISC-V: Add intrinsics testcases for SiFive Xsfvfnrclipxfqf extensions.

This commit adds testcases for Xsfvfnrclipxfqf.

Co-Authored by: Jiawei Chen <jiawei@iscas.ac.cn>
Co-Authored by: Shihua Liao <shihua@iscas.ac.cn>
Co-Authored by: Yixuan Chen <chenyixuan@iscas.ac.cn>

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c: New test.

RISC-V: Add intrinsics support for SiFive Xsfvfnrclipxfqf extensions.

This commit adds intrinsics support for Xsfvfnrclipxfqf. We also move the
definition of the enum type frm_op_type into riscv-vector-builtins-bases.h,
because it is also used in sifive-vector-builtins-bases.cc.

Co-Authored by: Jiawei Chen <jiawei@iscas.ac.cn>
Co-Authored by: Shihua Liao <shihua@iscas.ac.cn>
Co-Authored by: Yixuan Chen <chenyixuan@iscas.ac.cn>

gcc/ChangeLog:

* config/riscv/generic-vector-ooo.md: New reservation.
* config/riscv/genrvv-type-indexer.cc (main): New type.
* config/riscv/riscv-vector-builtins-bases.cc (enum frm_op_type): Delete it.
* config/riscv/riscv-vector-builtins-bases.h (enum frm_op_type): Redefine in h file.
* config/riscv/riscv-vector-builtins-shapes.cc (struct sf_vfnrclip_def): New function.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TYPE_INDEX): New builtins def.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE_INDEX): New base def.
(signed_eew8_index): Ditto.
* config/riscv/riscv-vector-builtins.h (enum required_ext): New extension.
(required_ext_to_isa_name): Ditto.
(required_extensions_specified): Ditto.
(struct function_group_info): Ditto.
* config/riscv/riscv.md: New attr.
* config/riscv/sifive-vector-builtins-bases.cc (class sf_vfnrclip_x_f_qf): New function.
(class sf_vfnrclip_xu_f_qf): Ditto.
(BASE): New base_name.
* config/riscv/sifive-vector-builtins-bases.h: New function_base.
* config/riscv/sifive-vector-builtins-functions.def
(REQUIRED_EXTENSIONS): New intrinsics def.
(sf_vfnrclip_x_f_qf): Ditto.
(sf_vfnrclip_xu_f_qf): Ditto.
* config/riscv/sifive-vector.md (@pred_sf_vfnrclip<v_su><mode>_x_f_qf): New RTL mode.
* config/riscv/vector-iterators.md: New iterator.

riscv: Avoid narrowing warning

* config/riscv/riscv.cc (fli_value_hf, fli_value_sf)
(fli_value_df): Use integer constants. Constify.
(riscv_float_const_rtx_index_for_fli): Add const.

x86: Correct comments for pass_apx_nf_convert

Change pass_rpad to pass_apx_nf_convert in pass_apx_nf_convert comments.

* config/i386/i386-features.cc (pass_apx_nf_convert): Change
pass_rpad to pass_apx_nf_convert in comments.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

RISC-V: Fix incorrect optimization options passing to widen

Like the strided load/store tests, the vector widen testcases are
designed to pick up different sorts of optimization options, but
these options are actually ignored according to the execution log in gcc.log.
This patch corrects that in much the same way as the fix for
strided load/store.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
options passing to testcases.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Fix RVV strided load/store testcases failure

This patch fixes the strided load/store testcase failures that appear
once the various optimization options are actually passed to the testcases.

* Add no strict align for vector option.
* Adjust dg-final by any-opts and/or no-opts if the rtl dump changes
on different optimization options (like O2, O3, zvl).

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c: Fix
the failed test by target any-opts and/or no-opts.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f64.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u8.c: Ditto

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

[contrib] validate_failures.py: fix python 3.12 escape sequence warnings

The warnings:
contrib/testsuite-management/validate_failures.py:65: SyntaxWarning: invalid escape sequence '\s'
_VALID_TEST_RESULTS_REX = re.compile('(%s):\s*(\S+)\s*(.*)'
contrib/testsuite-management/validate_failures.py:77: SyntaxWarning: invalid escape sequence '\.'
_EXP_LINE_REX = re.compile('^Running (?:.*:)?(.*) \.\.\.\n')

contrib/ChangeLog:
* testsuite-management/validate_failures.py: Change re.compile()
function arguments to Python raw strings.
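
The effect of the change can be sketched as follows. This is a minimal
illustration, not the script itself: the two patterns are the ones quoted in
the warnings above, while the list of result names and the sample input lines
are hypothetical.

```python
import re

# Hypothetical subset of result names; the real list lives in validate_failures.py.
valid_results = ['FAIL', 'XPASS', 'UNRESOLVED']

# With the r'' prefix the backslashes reach the regex engine unchanged, so
# Python 3.12 no longer warns about invalid escape sequences such as '\s'.
result_rex = re.compile(r'(%s):\s*(\S+)\s*(.*)' % '|'.join(valid_results))
exp_line_rex = re.compile(r'^Running (?:.*:)?(.*) \.\.\.\n')

# Sample lines made up for this sketch.
m = result_rex.match('FAIL: gcc.dg/foo.c (test for excess errors)')
e = exp_line_rex.match('Running /src/gcc/testsuite/gcc.dg/dg.exp ...\n')

print(m.groups())
print(e.group(1))
```

The compiled patterns are the same as before; only the string literal syntax
changes.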

[PATCH] gcc: configure: Fix the optimization flags cleanup

Currently the sed command in the flag cleanup removes all -O[0-9] flags,
ignoring the context. This leads to issues when optimization flags are passed
to the linker:

CFLAGS="-Os -Wl,-O1 -Wl,--hash-style=gnu"
is converted into
CFLAGS="-Os -Wl,-Wl,--hash-style=gnu"

Which leads to configure failure with ld: unrecognized option '-Wl,-Wl'.

gcc/
* configure.ac: Only remove -O[0-9] if not preceded with comma
* configure: Regenerated
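
The difference between the two behaviours can be sketched with regular
expressions. The patterns below are assumptions written for illustration and
are not copied from configure.ac; they only reproduce the before/after strings
quoted above.

```python
import re

def strip_opt_naive(cflags):
    # Removes every -O<digits> word, regardless of what precedes it.
    return re.sub(r'-O[0-9]*[ \t]', '', cflags + ' ').strip()

def strip_opt_fixed(cflags):
    # Removes -O<digits> only at the start of the string or after a
    # character other than a comma, so "-Wl,-O1" is left alone.
    return re.sub(r'(^|[^,])-O[0-9]*[ \t]', r'\1', cflags + ' ').strip()

flags = '-Os -Wl,-O1 -Wl,--hash-style=gnu'
print(strip_opt_naive(flags))
print(strip_opt_fixed(flags))
print(strip_opt_fixed('-g -O2 -Wall'))
```

The naive form yields the broken "-Wl,-Wl," sequence, while the fixed form
leaves the linker option intact and still drops a standalone -O2.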

Thanks for the feedback on the first version of the patch. Accordingly:

I have corrected the code formatting as requested. I added new tests to
the existing file phi-opt-11.c, instead of creating a new one.

I performed testing before and after applying the patch on the x86
architecture, and I confirm that there are no new regressions.

The logic and general code of the patch itself have not been changed.

> So the A EQ/NE B expression, we can reverse A and B in the expression
> and still get the same result. But don't we have to be more careful for
> the TRUE/FALSE arms of the ternary? For BIT_AND we need ? a : b for
> BIT_IOR we need ? b : a.
>
> I don't see that gets verified in the existing code or after your
> change. I suspect I'm just missing something here. Can you clarify how
> we verify that BIT_AND gets ? a : b for the true/false arms and that
> BIT_IOR gets ? b : a for the true/false arms?

I did not communicate this clearly last time, but the existing optimization
simplifies the expression "(cond & (a == b)) ? a : b" to the simpler "b".
Similarly, the expression "(cond & (a == b)) ? b : a" simplifies to "a".

Thus, the existing and my optimization perform the following
simplifications:

(cond & (a == b)) ? a : b -> b
(cond & (a == b)) ? b : a -> a
(cond | (a != b)) ? a : b -> a
(cond | (a != b)) ? b : a -> b

For this reason, for BIT_AND_EXPR when we have A EQ B, it is sufficient to
confirm that one operand matches the true/false arm and the other matches
the false/true arm. In both cases, we simplify the expression to the third
operand of the ternary operation (i.e., OP0 ? OP1 : OP2 simplifies to OP2).
This is achieved in the value_replacement function after successfully
setting the value of *code within the rhs_is_fed_for_value_replacement
function to EQ_EXPR.

For BIT_IOR_EXPR, the same check is performed for A NE B, except now
*code remains NE_EXPR, and then value_replacement returns the second
operand (i.e., OP0 ? OP1 : OP2 simplifies to OP1).
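
The four simplifications listed above can be checked by brute force over a
small range of values. This is only a sanity check of the identities, written
in Python for illustration; the helper names are hypothetical and it says
nothing about how tree-ssa-phiopt.cc implements the transformation.

```python
from itertools import product

def ternary(cond, if_true, if_false):
    # Models the C expression "cond ? if_true : if_false".
    return if_true if cond else if_false

def identities_hold(values=range(-2, 3)):
    for cond, a, b in product((False, True), values, values):
        # BIT_AND_EXPR with a == b: result is always the false arm (OP2).
        if ternary(cond & (a == b), a, b) != b:
            return False
        if ternary(cond & (a == b), b, a) != a:
            return False
        # BIT_IOR_EXPR with a != b: result is always the true arm (OP1).
        if ternary(cond | (a != b), a, b) != a:
            return False
        if ternary(cond | (a != b), b, a) != b:
            return False
    return True

print(identities_hold())
```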

2024-10-30 Jovan Vukic <Jovan.Vukic@rt-rk.com>

gcc/ChangeLog:

* tree-ssa-phiopt.cc (rhs_is_fed_for_value_replacement): Add a new
optimization opportunity for BIT_IOR_EXPR and a != b.
(operand_equal_for_value_replacement): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-11.c: Add more tests.

[PATCH v7 12/12] Add tests for CRC detection and generation

gcc/testsuite

* gcc.dg/crc-from-fedora-packages-1.c: New test.
* gcc.dg/crc-from-fedora-packages-2.c: Likewise.
* gcc.dg/crc-from-fedora-packages-3.c: Likewise.
* gcc.dg/crc-from-fedora-packages-4.c: Likewise.
* gcc.dg/crc-from-fedora-packages-5.c: Likewise.
* gcc.dg/crc-from-fedora-packages-6.c: Likewise.
* gcc.dg/crc-from-fedora-packages-7.c: Likewise.
* gcc.dg/crc-from-fedora-packages-8.c: Likewise.
* gcc.dg/crc-from-fedora-packages-9.c: Likewise.
* gcc.dg/crc-from-fedora-packages-10.c: Likewise.
* gcc.dg/crc-from-fedora-packages-11.c: Likewise.
* gcc.dg/crc-from-fedora-packages-12.c: Likewise.
* gcc.dg/crc-from-fedora-packages-13.c: Likewise.
* gcc.dg/crc-from-fedora-packages-14.c: Likewise.
* gcc.dg/crc-from-fedora-packages-15.c: Likewise.
* gcc.dg/crc-from-fedora-packages-16.c: Likewise.
* gcc.dg/crc-from-fedora-packages-17.c: Likewise.
* gcc.dg/crc-from-fedora-packages-18.c: Likewise.
* gcc.dg/crc-from-fedora-packages-19.c: Likewise.
* gcc.dg/crc-from-fedora-packages-20.c: Likewise.
* gcc.dg/crc-from-fedora-packages-21.c: Likewise.
* gcc.dg/crc-from-fedora-packages-22.c: Likewise.
* gcc.dg/crc-from-fedora-packages-23.c: Likewise.
* gcc.dg/crc-from-fedora-packages-24.c: Likewise.
* gcc.dg/crc-from-fedora-packages-25.c: Likewise.
* gcc.dg/crc-from-fedora-packages-26.c: Likewise.
* gcc.dg/crc-from-fedora-packages-27.c: Likewise.
* gcc.dg/crc-from-fedora-packages-28.c: Likewise.
* gcc.dg/crc-from-fedora-packages-29.c: Likewise.
* gcc.dg/crc-from-fedora-packages-30.c: Likewise.
* gcc.dg/crc-from-fedora-packages-31.c: Likewise.
* gcc.dg/crc-from-fedora-packages-32.c: Likewise.
* gcc.dg/crc-linux-1.c: Likewise.
* gcc.dg/crc-linux-2.c: Likewise.
* gcc.dg/crc-linux-3.c: Likewise.
* gcc.dg/crc-linux-4.c: Likewise.
* gcc.dg/crc-linux-5.c: Likewise.
* gcc.dg/crc-not-crc-1.c: Likewise.
* gcc.dg/crc-not-crc-2.c: Likewise.
* gcc.dg/crc-not-crc-3.c: Likewise.
* gcc.dg/crc-not-crc-4.c: Likewise.
* gcc.dg/crc-not-crc-5.c: Likewise.
* gcc.dg/crc-not-crc-6.c: Likewise.
* gcc.dg/crc-not-crc-7.c: Likewise.
* gcc.dg/crc-not-crc-8.c: Likewise.
* gcc.dg/crc-not-crc-9.c: Likewise.
* gcc.dg/crc-not-crc-10.c: Likewise.
* gcc.dg/crc-not-crc-11.c: Likewise.
* gcc.dg/crc-not-crc-12.c: Likewise.
* gcc.dg/crc-not-crc-13.c: Likewise.
* gcc.dg/crc-not-crc-14.c: Likewise.
* gcc.dg/crc-not-crc-15.c: Likewise.
* gcc.dg/crc-not-crc-16.c: Likewise.
* gcc.dg/crc-not-crc-17.c: Likewise.
* gcc.dg/crc-not-crc-18.c: Likewise.
* gcc.dg/crc-not-crc-19.c: Likewise.
* gcc.dg/crc-not-crc-20.c: Likewise.
* gcc.dg/crc-not-crc-21.c: Likewise.
* gcc.dg/crc-not-crc-22.c: Likewise.
* gcc.dg/crc-not-crc-23.c: Likewise.
* gcc.dg/crc-not-crc-24.c: Likewise.
* gcc.dg/crc-not-crc-25.c: Likewise.
* gcc.dg/crc-not-crc-26.c: Likewise.
* gcc.dg/crc-side-instr-1.c: Likewise.
* gcc.dg/crc-side-instr-2.c: Likewise.
* gcc.dg/crc-side-instr-3.c: Likewise.
* gcc.dg/crc-side-instr-4.c: Likewise.
* gcc.dg/crc-side-instr-5.c: Likewise.
* gcc.dg/crc-side-instr-6.c: Likewise.
* gcc.dg/crc-side-instr-7.c: Likewise.
* gcc.dg/crc-side-instr-8.c: Likewise.
* gcc.dg/crc-side-instr-9.c: Likewise.
* gcc.dg/crc-side-instr-10.c: Likewise.
* gcc.dg/crc-side-instr-11.c: Likewise.
* gcc.dg/crc-side-instr-12.c: Likewise.
* gcc.dg/crc-side-instr-13.c: Likewise.
* gcc.dg/crc-side-instr-14.c: Likewise.
* gcc.dg/crc-side-instr-15.c: Likewise.
* gcc.dg/crc-side-instr-16.c: Likewise.
* gcc.dg/crc-side-instr-17.c: Likewise.
* gcc.dg/torture/crc-1.c: Likewise.
* gcc.dg/torture/crc-2.c: Likewise.
* gcc.dg/torture/crc-3.c: Likewise.
* gcc.dg/torture/crc-4.c: Likewise.
* gcc.dg/torture/crc-5.c: Likewise.
* gcc.dg/torture/crc-6.c: Likewise.
* gcc.dg/torture/crc-7.c: Likewise.
* gcc.dg/torture/crc-8.c: Likewise.
* gcc.dg/torture/crc-9.c: Likewise.
* gcc.dg/torture/crc-10.c: Likewise.
* gcc.dg/torture/crc-11.c: Likewise.
* gcc.dg/torture/crc-12.c: Likewise.
* gcc.dg/torture/crc-13.c: Likewise.
* gcc.dg/torture/crc-14.c: Likewise.
* gcc.dg/torture/crc-15.c: Likewise.
* gcc.dg/torture/crc-16.c: Likewise.
* gcc.dg/torture/crc-17.c: Likewise.
* gcc.dg/torture/crc-18.c: Likewise.
* gcc.dg/torture/crc-19.c: Likewise.
* gcc.dg/torture/crc-20.c: Likewise.
* gcc.dg/torture/crc-21.c: Likewise.
* gcc.dg/torture/crc-22.c: Likewise.
* gcc.dg/torture/crc-23.c: Likewise.
* gcc.dg/torture/crc-24.c: Likewise.
* gcc.dg/torture/crc-25.c: Likewise.
* gcc.dg/torture/crc-26.c: Likewise.
* gcc.dg/torture/crc-27.c: Likewise.
* gcc.dg/torture/crc-28.c: Likewise.
* gcc.dg/torture/crc-29.c: Likewise.
* gcc.dg/torture/crc-CCIT-data16-xorOutside_InsideFor.c: Likewise.
* gcc.dg/torture/crc-coremark16-data16.c: Likewise.
* gcc.dg/torture/crc-coremark32-data16.c: Likewise.
* gcc.dg/torture/crc-coremark32-data32.c: Likewise.
* gcc.dg/torture/crc-coremark32-data8.c: Likewise.
* gcc.dg/torture/crc-coremark64-data64.c: Likewise.
* gcc.dg/torture/crc-coremark8-data8.c: Likewise.
* gcc.dg/torture/crc-CCIT-data16.c: Likewise.
* gcc.dg/torture/crc-CCIT-data8.c: Likewise.
* gcc.dg/torture/crc-crc32-data16.c: Likewise.
* gcc.dg/torture/crc-crc32-data24.c: Likewise.
* gcc.dg/torture/crc-crc32-data8.c: Likewise.
* gcc.dg/torture/crc-crc32.c: Likewise.
* gcc.dg/torture/crc-crc64-data32.c: Likewise.
* gcc.dg/torture/crc-crc64-data64.c: Likewise.
* gcc.dg/torture/crc-crc8-data8-loop-xorInFor.c: Likewise.
* gcc.dg/torture/crc-crc8-data8-xorOustideFor.c: Likewise.
* gcc.dg/torture/crc-crc8.c: Likewise.

Co-Authored: Jeff Law <jlaw@ventanamicro.com>

[PATCH v7 11/12] Replace the original CRC loops with a faster CRC calculation

After the loop exit an internal function call (CRC, CRC_REV) is added, and its
result is assigned to the output CRC variable (the variable where the
calculated CRC is stored after the loop execution). The removal of the loop is
left to CFG cleanup and DCE.
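
How the CRC and CRC_REV internal functions are expanded is not described in
this log. As an illustration of what a "faster CRC calculation" can look like,
here is a sketch of one common technique, a table-driven bit-reflected CRC-32,
checked against zlib. The choice of CRC-32 and of a lookup table is an
assumption made for the example.

```python
import zlib

POLY_REFLECTED = 0xEDB88320  # bit-reflected form of the CRC-32 polynomial

def make_table():
    # One entry per byte value: the result of eight bitwise division steps.
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table

TABLE = make_table()

def crc32_table(data):
    # Consumes one byte per step instead of one bit per step.
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF

print(hex(crc32_table(b'123456789')))
```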

gcc/

* gimple-crc-optimization.cc (optimize_crc_loop): New function.
(execute): Add optimize_crc_loop function call.

[PATCH v7 10/12] Verify detected CRC loop with symbolic execution and LFSR matching

Symbolically execute potential CRC loops and check whether the loop actually
calculates CRC (uses LFSR matching). Calculated CRC and created LFSR are
compared on each iteration of the potential CRC loop.

gcc/

* Makefile.in (OBJS): Add crc-verification.o.
* crc-verification.cc: New file.
* crc-verification.h: New file.
* gimple-crc-optimization.cc (loop_calculates_crc): New function.
(is_output_crc): Likewise.
(swap_crc_and_data_if_needed): Likewise.
(validate_crc_and_data): Likewise.
(optimize_crc_loop): Likewise.
(get_output_phi): Likewise.
(execute): Add check whether potential CRC loop calculates CRC.
* sym-exec/sym-exec-state.cc (create_reversed_lfsr): New function.
(create_forward_lfsr): Likewise.
(last_set_bit): Likewise.
(create_lfsr): Likewise.
* sym-exec/sym-exec-state.h (is_bit_vector): Reorder, make the function public and static.
(create_reversed_lfsr): New static function declaration.
(create_forward_lfsr): New static function declaration.

[PATCH v6 09/12] Add symbolic execution support.

Gives an opportunity to execute the code on bit level, assigning
symbolic values to the variables which don't have initial values.
Supports only CRC specific operations.

Example:

uint8_t crc;
uint8_t pol = 1;
crc = crc ^ pol;

during symbolic execution crc's value will be:
crc(7), crc(6), ... crc(1), crc(0) ^ 1
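
The idea can be sketched in a few lines of Python. This is a toy model and not
the sym-exec classes added by the patch: every helper name is hypothetical,
bits are numbered from 0 (least significant) to 7, and unknown bits are
represented by plain strings.

```python
def symbolic_bits(name, width):
    # An uninitialised variable: bit i is the symbol "name(i)", MSB first.
    return [f"{name}({i})" for i in reversed(range(width))]

def constant_bits(value, width):
    # A known value: each bit is the integer 0 or 1, MSB first.
    return [(value >> i) & 1 for i in reversed(range(width))]

def xor_bit(a, b):
    if isinstance(a, int) and isinstance(b, int):
        return a ^ b          # both bits known
    if a == 0:
        return b              # 0 ^ x is x
    if b == 0:
        return a              # x ^ 0 is x
    return f"{a} ^ {b}"       # keep the operation symbolic

def xor_bits(x, y):
    return [xor_bit(a, b) for a, b in zip(x, y)]

crc = symbolic_bits("crc", 8)   # uint8_t crc;
pol = constant_bits(1, 8)       # uint8_t pol = 1;
result = xor_bits(crc, pol)     # crc = crc ^ pol;
print(", ".join(str(bit) for bit in result))
```

Only the lowest bit picks up the "^ 1"; the other seven bits stay as the
original symbols.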

gcc/
* Makefile.in (OBJS): Add sym-exec/sym-exec-expression.o,
sym-exec/sym-exec-state.o, sym-exec/sym-exec-condition.o.
* configure (sym-exec): New subdir.
* sym-exec/sym-exec-condition.cc: New file.
* sym-exec/sym-exec-condition.h: New file.
* sym-exec/sym-exec-expr-is-a-helper.h: New file.
* sym-exec/sym-exec-expression.cc: New file.
* sym-exec/sym-exec-expression.h: New file.
* sym-exec/sym-exec-state.cc: New file.
* sym-exec/sym-exec-state.h: New file.

Co-authored-by: Mariam Arutunian <mariamarutunian@gmail.com>

[PATCH v7 08/12] Add a new pass for naive CRC loops detection

This patch adds a new compiler pass aimed at identifying naive CRC
implementations, characterized by the presence of a loop calculating
a CRC (polynomial long division). Upon detection of a potential CRC,
the pass prints an informational message.
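
For reference, a "naive" CRC of the kind described here looks roughly like the
following. It is written in Python only to show the shape of the loop; the
polynomial 0x07 and the zero initial value are assumptions for the example,
and the pass itself looks at C-level loops in GIMPLE.

```python
def crc8_naive(data, poly=0x07):
    # Bit-by-bit polynomial long division, most significant bit first.
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                # Top bit set: shift it out and subtract (xor) the polynomial.
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

print(hex(crc8_naive(b'123456789')))
```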

Performs CRC optimization if optimization level is >= 2 and if
fno_gimple_crc_optimization given.

This pass is added for the detection and optimization of naive CRC
implementations, improving the efficiency of CRC-related computations.

This patch includes only the initial fast checks for filtering out non-CRCs;
the verification of detected possible CRCs and the optimization parts will be
provided in subsequent patches.

gcc/

* Makefile.in (OBJS): Add gimple-crc-optimization.o.
* common.opt (foptimize-crc): New option.
* common.opt.urls: Regenerate to add foptimize-crc.
* doc/invoke.texi (-foptimize-crc): Add documentation.
* gimple-crc-optimization.cc: New file.
* opts.cc (default_options_table): Add OPT_foptimize_crc.
(enable_fdo_optimizations): Enable optimize_crc.
* passes.def (pass_crc_optimization): Add new pass.
* timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
* tree-pass.h (make_pass_crc_optimization): New extern function
declaration.

Write binary annotations for CodeView S_INLINESITE symbols

Add "binary annotations" at the end of CodeView S_INLINESITE symbols,
which are a series of compressed integers that represent how line
numbers map to addresses.

This requires assembler support; you will need commit b3aa594d ("gas:
add .cv_ucomp and .cv_scomp pseudo-directives") in binutils.
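
A sketch of the "compressed integer" encoding, as I understand it from other
CodeView producers: values up to 7, 14 or 29 bits take one, two or four bytes,
and signed values are first folded into an unsigned one with the sign in the
low bit. Treat the details as an assumption rather than a specification; the
function names are hypothetical.

```python
def cv_compress_unsigned(value):
    # 1 byte for 7-bit values, 2 bytes for 14-bit, 4 bytes for 29-bit.
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 0x80:
        return bytes([value])
    if value < 0x4000:
        return bytes([(value >> 8) | 0x80, value & 0xFF])
    if value < 0x20000000:
        return bytes([(value >> 24) | 0xC0, (value >> 16) & 0xFF,
                      (value >> 8) & 0xFF, value & 0xFF])
    raise ValueError("value does not fit in 29 bits")

def cv_compress_signed(value):
    # Fold the sign into bit 0, then compress as unsigned.
    folded = (value << 1) if value >= 0 else ((-value) << 1) | 1
    return cv_compress_unsigned(folded)

print(cv_compress_unsigned(0x7F).hex())
print(cv_compress_unsigned(0x80).hex())
print(cv_compress_signed(-3).hex())
```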

gcc/
* configure.ac (HAVE_GAS_CV_UCOMP): New check.
* configure: Regenerate.
* config.in: Regenerate.
* dwarf2codeview.cc (enum binary_annotation_opcode): Define.
(struct codeview_function): Add htab_next and inline_loc.
(struct cv_func_hasher): Define.
(cv_func_htab): New global variable.
(new_codeview_function): Add new codeview_function to hash table.
(codeview_begin_block): Record location of inline block.
(codeview_end_block): Add dummy source line at end of inline block.
(find_line_function): New function.
(write_binary_annotations): New function.
(write_s_inlinesite): Call write_binary_annotations.
(codeview_debug_finish): Delete cv_func_htab.

testsuite: Silence gcc.dg/pr117806.c for default_packed

On default_packed targets like PRU, spurious warnings are emitted:
...workspace/gcc/gcc/testsuite/gcc.dg/pr117806.c:5:3: warning: 'packed' attribute ignored for field of type 'double' [-Wattributes]

Fix by annotating the excess warnings for default_packed targets.

gcc/testsuite/ChangeLog:

* gcc.dg/pr117806.c: Test can spill excess
errors for default_packed targets.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

VN: Don't recurse for the same value of `a != 0` [PR117859]

Like r15-5063-g6e84a41622f56c, but this is for the `a != 0` case.
After adding vn_valueize to handle the `a ==/!= 0` case
of insert_predicates_for_cond, it would go into an infinite loop
as the Value number for a could be the same as what it
is for the whole expression. This avoids that recursion so there is
no infinite loop here.

Note lim was introducing `bool_var2 = bool_var1 != 0` originally but
with the gimple testcase in -2, there is no dependency on what earlier
passes will do.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117859

gcc/ChangeLog:

* tree-ssa-sccvn.cc (insert_predicates_for_cond): If the
valueization for the new lhs for `lhs != 0`
is the same as the old ones, don't recurse.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117859-1.c: New test.
* gcc.dg/torture/pr117859-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

gimple-lim: Reuse boolean var when moving PHI

While looking into PR 117859, I noticed that LIM
sometimes would produce `bool_var2 = bool_var1 != 0` instead
of just using bool_var2. This patch allows LIM to reuse bool_var1
in the place where bool_var2 was going to be used.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-loop-im.cc (move_computations_worker): While moving
phi, reuse the lhs of the conditional if it is a boolean type.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

testsuite: Fix aarch64/sve/acle/general-c/gnu_vectors_[12].c for taking address of vector element

After the recent changes that make SVE vectors usable as GNU vector extensions, you can now access
each element as if the vector were an array. There is no reason why taking the address of such an
element should be invalid either, especially since we limit access to the first N elements (where N is the
minimum number of elements the architecture supports for these types).
So this removes the error message on these 2 lines and fixes the testcase.

Pushed as obvious after a quick test for these tests for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c: Remove
error message on taking address of an element of a vector.
* gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c: Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

testsuite: Fix aarch64/sve/acle/general-c++/gnu_vectors_[12].C for taking address of vector element

After the recent changes that make SVE vectors usable as GNU vector extensions, you can now access
each element as if the vector were an array. There is no reason why taking the address of such an
element should be invalid either, especially since we limit access to the first N elements (where N is the
minimum number of elements the architecture supports for these types).
So this removes the error message on these 2 lines and fixes the testcase.

Pushed as obvious after a quick test for these tests for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C: Remove
error message on taking address of an element of a vector.
* g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C: Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

testsuite: Fix sve-sizeless-[12].C for C++98

In C++98 `{ a }` for aggregates can only mean constructing by
each element rather than a copy. This adds the expected error
message for SVE vectors for C++98.

Pushed as obvious after a test for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.dg/ext/sve-sizeless-1.C: Add error message for line 164
for C++98 only.
* g++.dg/ext/sve-sizeless-2.C: Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

testsuite: Fix sve-sizeless-[12].C for aggregate change

Since r15-5777-g761cf60218890a, the SVE types are considered
aggregates, since they now act similarly to GNU vectors.

Pushed as obvious after a quick test for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.dg/ext/sve-sizeless-1.C: SVE vectors are now aggregates.
* g++.dg/ext/sve-sizeless-2.C: Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

testsuite: Fix another issue with sve-sizeless-[12].C

There is a different error message expected on line 165 (for both files).
It was expecting:
error: cannot convert 'svint16_t' to 'svint8_t' in initialization
But now we get:
error: cannot convert 'svint16_t' to 'signed char' in initialization

This is because we now support constructing scalable vectors, which we did not before.

So just update error message.

Pushed as obvious after a quick test for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.dg/ext/sve-sizeless-1.C: Update error message for line 165.
* g++.dg/ext/sve-sizeless-2.C: Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

testsuite: Fix part of sve-sizeless-2.C

r15-5783-gb5df3eefd70064 failed to update part of sve-sizeless-2.C to
include the declaration of the bar function.
This corrects the oversight there.

Pushed as obvious after testing the testcase for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.dg/ext/sve-sizeless-2.C: Add declaration of bar.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

[PATCH v3] zero_extend(not) -> xor optimization [PR112398]

This patch adds optimization of the following patterns:

  (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
  (xor:M (zero_extend:M (subreg:N (X:M))), mask)
  ... where the mask is GET_MODE_MASK (N).

For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M))) could be simplified to just (X:M)
and the whole optimization will be:

  (zero_extend:M (subreg:N (not:M (X:M)))) ->
  (xor:M (X:M), mask)

The patch targets code patterns like:
  not   a0,a0
  andi  a0,a0,0xff
to be optimized to:
  xori  a0,a0,255
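
The arithmetic behind the transformation can be checked with plain integers.
The sketch below assumes an 8-bit inner mode N inside a 64-bit mode M; the
function names are hypothetical and only model the RTL expressions above.

```python
MASK_N = 0xFF              # GET_MODE_MASK of the assumed 8-bit mode N
MASK_M = (1 << 64) - 1     # the assumed 64-bit mode M

def zext_subreg_not(x):
    # (zero_extend:M (subreg:N (not:M x)))
    return (~x & MASK_M) & MASK_N

def xor_general(x):
    # (xor:M (zero_extend:M (subreg:N x)) mask)
    return (x & MASK_N) ^ MASK_N

def xor_simplified(x):
    # (xor:M x mask), valid only when x has no bits set outside mode N
    return x ^ MASK_N

samples = [0, 1, 0x7F, 0xFF, 0x100, 0x1234, MASK_M]
print(all(zext_subreg_not(x) == xor_general(x) for x in samples))
print(all(zext_subreg_not(x) == xor_simplified(x) for x in range(256)))
print(zext_subreg_not(0x100) == xor_simplified(0x100))
```

The last line shows why the non-zero-bits condition matters: for 0x100 the
simplified xor keeps bit 8 set, so the two forms differ.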

PR rtl-optimization/112398
PR rtl-optimization/117476

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr112398.c: New test.
* gcc.dg/torture/pr117476-1.c: New test. From Zhendong Su.
* gcc.dg/torture/pr117476-2.c: New test. From Zdenek Sojka.

Daily bump.

libstdc++: Improve new testcase for std::optional assignment [PR117858]

The copy & paste bug affected two assignment operators, so ensure the
new test covers both.

libstdc++-v3/ChangeLog:

PR libstdc++/117858
* testsuite/20_util/optional/assignment/117858.cc: Also test
assignment from rvalue optional.

libstdc++: Fix constraints on std::optional converting assignments [PR117858]

It looks like I copied these constraints from operator=(U&&) and didn't
correct them to account for the parameter being optional<U> not U.

libstdc++-v3/ChangeLog:

PR libstdc++/117858
* include/std/optional (operator=(const optional<U>&)): Fix copy
and paste error in constraints.
(operator=(optional<U>&&)): Likewise.
* testsuite/20_util/optional/assignment/117858.cc: New test.

libstdc++: Move std::monostate to <utility> for C++26 (P0472R2)

Another C++26 paper just approved in Wrocław. The std::monostate class
is defined in <variant> since C++17, but for C++26 it should also be
available in <utility>.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add bits/monostate.h.
* include/Makefile.in: Regenerate.
* include/std/utility: Include <bits/monostate.h>.
* include/std/variant (monostate, hash<monostate>): Move
definitions to ...
* include/bits/monostate.h: New file.
* testsuite/20_util/headers/utility/synopsis.cc: Add monostate
and hash<monostate> declarations.
* testsuite/20_util/monostate/requirements.cc: New test.

libstdc++: Improve test for <utility> synopsis

libstdc++-v3/ChangeLog:

* testsuite/20_util/headers/utility/synopsis.cc: Add
declarations from C++11 and later.

Support for 64-bit location_t: Internal parts

Several of the selftests in diagnostic-show-locus.cc and input.cc are
sensitive to linemap internals. Adjust them here so they will support 64-bit
location_t if configured.

Likewise, handle 64-bit location_t in the support for
-fdump-internal-locations. As was done with the analyzer, convert to
(unsigned long long) explicitly so that 32- and 64-bit can be handled with
the same printf formats.

gcc/ChangeLog:

* diagnostic-show-locus.cc
(test_one_liner_fixit_validation_adhoc_locations): Adapt so it can
effectively test 7-bit ranges instead of 5-bit ranges.
(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
* input.cc (get_end_location): Adjust types to support 64-bit
location_t.
(write_digit_row): Likewise.
(dump_location_range): Likewise.
(dump_location_info): Likewise.
(class line_table_case): Likewise.
(test_accessing_ordinary_linemaps): Replace some hard-coded
constants with the values defined in line-map.h.
(for_each_line_table_case): Likewise.

Support for 64-bit location_t: toplev parts

With the upcoming move from 32-bit to 64-bit location_t, the recommended
number of range bits will change from 5 to 7. line-map.h now exports the
recommended setting, so use that instead of hard-coding 5.

gcc/ChangeLog:

* toplev.cc (general_init): Replace hard-coded constant with
line_map_suggested_range_bits.

Support for 64-bit location_t: Backend parts

A few targets have been using "unsigned int" function arguments that need to
receive a "location_t". Change to "location_t" to prepare for the
possibility that location_t can be configured to be a different type.

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_resolve_overloaded_builtin):
Change "unsigned int" argument to "location_t".
* config/avr/avr-c.cc (avr_resolve_overloaded_builtin): Likewise.
* config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): Likewise.
* target.def: Likewise.
* doc/tm.texi: Regenerate.

gimplify: Handle void expression as asm input [PR100501, PR100792]

As reported in bug 100501 (plus duplicates), the gimplifier ICEs for C
tests involving a statement expression not returning a value as an asm
input; this includes the variant bug 100792 where the statement
expression ends with another asm statement.

The expected diagnostic for this case (as seen for C++ input) is one
coming from the gimplifier and so it seems reasonable to fix the
gimplifier to handle the GENERIC generated for this case by the C
front end, rather than trying to make the C front end detect it
earlier. Thus adjust the gimplifier to handle a void
expression like other non-lvalues for such a memory input.

Bootstrapped with no regressions for x86_64-pc-linux-gnu. OK to commit?

PR c/100501
PR c/100792

gcc/
* gimplify.cc (gimplify_asm_expr): Handle void expressions for
memory inputs like other non-lvalues.

gcc/testsuite/
* gcc.dg/pr100501-1.c, gcc.dg/pr100792-1.c: New tests.
* gcc.dg/pr48552-1.c, gcc.dg/pr48552-2.c,
gcc.dg/torture/pr98601.c: Update expected errors.

Co-authored-by: Richard Biener <rguenther@suse.de>

Write S_INLINESITE CodeView symbols

Translate DW_TAG_inlined_subroutine DIEs into S_INLINESITE CodeView
symbols, marking inlined functions.

gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_INLINESITE and
S_INLINESITE_END.
(get_func_id): Add declaration.
(write_s_inlinesite): New function.
(write_inlinesite_records): New function.
(write_function): Call write_inlinesite_records.

Write S_INLINEELINES CodeView subsection

When outputting the .debug$S CodeView section, also write an
S_INLINEELINES subsection, which records the filename and line number of
the start of each inlined function.

gcc/
* dwarf2codeview.cc (DEBUG_S_INLINEELINES): Define.
(CV_INLINEE_SOURCE_LINE_SIGNATURE): Define.
(struct codeview_inlinee_lines): Define.
(struct inlinee_lines_hasher): Define.
(func_htab, inlinee_lines_htab): New global variables.
(get_file_id): New function.
(codeview_source_line): Move file_id logic to get_file_id.
(write_inlinee_lines_entry): New function.
(write_inlinee_lines): New function.
(codeview_debug_finish): Call write_inlinee_lines, and free func_htab
and inlinee_lines_htab.
(get_func_id): New function.
(add_function): Move func_id logic to get_func_id.
(codeview_abstract_function): New function.
* dwarf2codeview.h (codeview_abstract_function): Add declaration.
* dwarf2out.cc (dwarf2out_abstract_function): Call
codeview_abstract_function if outputting CodeView debug info.

Don't output CodeView line numbers for inlined functions

If we encounter an inlined function, treat it as another
codeview_function, and skip over these when outputting line numbers.
This information will instead be output as part of the S_INLINESITE
symbols.

gcc/
* dwarf2codeview.cc (struct codeview_function): Add parent and
inline_block fields.
(cur_func): New global variable.
(new_codeview_function): New function.
(codeview_source_line): Call new_codeview_function, and use cur_func
instead of last_func.
(codeview_begin_block): New function.
(codeview_end_block): New function.
(write_line_numbers): No longer free data as we go along.
(codeview_switch_text_section): Call new_codeview_function, and use
cur_func instead of last_func.
(codeview_end_epilogue): Use cur_func instead of last_func.
(codeview_debug_finish): Free funcs list and its contents.
* dwarf2codeview.h (codeview_begin_block): Add declaration.
(codeview_end_block): Add declaration.
* dwarf2out.cc (dwarf2out_begin_block): Call codeview_begin_block if
outputting CodeView debug info.
(dwarf2out_end_block): Call codeview_end_block if outputting CodeView
debug info.

Add block parameter to begin_block debug hook

Add a parameter to the begin_block debug hook that is a pointer to the
tree_node of the block in question. CodeView needs this as it records
line numbers of inlined functions in a different manner, so we need to
be able to tell if the block is actually the start of an inlined
function.

gcc/
* debug.cc (do_nothing_debug_hooks): Change begin_block
function pointer.
(debug_nothing_int_int_tree): New function.
* debug.h (struct gcc_debug_hooks): Add tree parameter to begin_block.
(debug_nothing_int_int_tree): Add declaration.
* dwarf2out.cc (dwarf2out_begin_block): Add tree parameter.
(dwarf2_lineno_debug_hooks): Use new dummy function for begin_block.
* final.cc (final_scan_insn_1): Pass insn block through to
debug_hooks->begin_block.
* vmsdbgout.cc (vmsdbgout_begin_block): Add tree parameter.

AVR: ad target/84211 - Split MOVW into MOVs in try_split_any.

When splitting multi-byte REG-REG moves in try_split_any(),
it's not clear whether propagating constants will turn
out as profitable. When MOVW is available, split into
REG-REG moves instead of a possible REG-CONST.
gcc/
PR target/84211
* config/avr/avr-passes.cc (try_split_any) [SET, MOVW]: Prefer
reg=reg move over reg=const when splitting a reg=reg insn.

strlen: Handle vector CONSTRUCTORs [PR117057]

The following patch handles VECTOR_TYPE_P CONSTRUCTORs in
count_nonzero_bytes, including handling them if they have some elements
non-constant.
If there are still some constant elements before it (in the range queried),
we derive info at least from those bytes and consider the rest as unknown.

The first 3 hunks just punt in IMHO problematic cases, the spaghetti code
considers byte_size 0 as unknown size, determine yourself, so if offset
is equal to exp size, there are 0 bytes to consider (so nothing useful
to determine), but using byte_size 0 would mean use any size.
Similarly, native_encode_expr uses int type for offset (and size), so
passing it an offset larger than INT_MAX could result in silent miscompilation.

I've guarded the test to just a couple of targets known to handle it,
because e.g. on ia32 without -msse forwprop1 seems to lower the CONSTRUCTOR
into 4 BIT_FIELD_REF stores and I haven't figured out what exactly
that depends on (e.g. powerpc* is fine on any CPUs, even with -mno-altivec
-mno-vsx, even -m32).

2024-11-30  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/117057
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Punt also
when byte_size is equal to offset or nchars.  Punt if offset is bigger
than INT_MAX.  Handle vector CONSTRUCTOR with some elements constant,
possibly followed by non-constant.

* gcc.dg/strlenopt-32.c: Remove xfail and vect_slp_v2qi_store_unalign
specific scan-tree-dump-times directive.
* gcc.dg/strlenopt-96.c: New test.

openmp: Add crtoffloadtableS.o and use it [PR117851]

Unlike crtoffload{begin,end}.o which just define some symbols at the start/end
of the various .gnu.offload* sections, crtoffloadtable.o contains
const void *const __OFFLOAD_TABLE__[]
  __attribute__ ((__visibility__ ("hidden"))) =
{
  &__offload_func_table, &__offload_funcs_end,
  &__offload_var_table, &__offload_vars_end,
  &__offload_ind_func_table, &__offload_ind_funcs_end,
};
The problem is that linking this into PIEs or shared libraries doesn't
work when it is compiled without -fpic/-fpie - __OFFLOAD_TABLE__ for non-PIC
code is put into .rodata section, but it really needs relocations, so for
PIC it should go into .data.rel.ro/.data.rel.ro.local.
As I think we don't want .data.rel.ro section in non-PIE binaries, this patch
follows the path of e.g. crtbegin.o vs. crtbeginS.o and adds crtoffloadtableS.o
next to crtoffloadtable.o, where crtoffloadtableS.o is compiled with -fpic.
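
The relocation issue is not specific to offloading; any constant table of addresses shows it. A minimal sketch, in which the `demo_*` names are made-up stand-ins for the `__offload_*` marker symbols:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the section start/end marker symbols.  */
static const char demo_func_table[1] = { 0 };
static const char demo_funcs_end[1] = { 0 };

/* The table itself is const, but every entry is an address that is only
   known once the object has been relocated.  In position-independent
   code such an object cannot stay in plain read-only data; it belongs
   in a section like .data.rel.ro, which is writable until relocation
   has finished.  */
static const void *const demo_offload_table[] = {
  demo_func_table, demo_funcs_end,
};

/* Number of entries in the table.  */
static size_t
demo_table_entries (void)
{
  return sizeof demo_offload_table / sizeof demo_offload_table[0];
}

/* Return entry I of the table.  */
static const void *
demo_table_entry (size_t i)
{
  assert (i < demo_table_entries ());
  return demo_offload_table[i];
}
```

Compiling such a file with and without -fpic and inspecting the section the table lands in should show the difference described above, though the exact section name depends on the target.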

2024-11-30  Jakub Jelinek  <jakub@redhat.com>

PR libgomp/117851
gcc/
* lto-wrapper.cc (find_crtoffloadtable): Add PIE_OR_SHARED argument,
search for crtoffloadtableS.o rather than crtoffloadtable.o if
true.
(run_gcc): Add pie_or_shared variable.  If OPT_pie or OPT_shared or
OPT_static_pie is seen, set pie_or_shared to true, if OPT_no_pie is
seen, set pie_or_shared to false.  Pass it to find_crtoffloadtable.
libgcc/
* configure.ac (extra_parts): Add crtoffloadtableS.o.
* Makefile.in (crtoffloadtableS$(objext)): New goal.
* configure: Regenerated.

LoongArch: Mask shift offset when emit {xv, v}{srl, sll, sra} with sameimm vector

For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` causes overflow
when emitting {w,h,b}. Since the number of bits shifted is the remainder of
the register value, it is actually unnecessary to constrain the range.
Simply mask the shift number with the unit-bit-width, without any
constraint on the shift range.
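
A minimal sketch of the masking idea in plain C. The function names are hypothetical and this is not the code emitted by the machine description; it only shows that reducing the count modulo the element width makes any count acceptable.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce a shift count modulo the element width in bits.  ELEM_BITS must
   be a power of two (8, 16, 32 or 64 for the element sizes discussed
   here).  */
static unsigned
mask_shift_count (unsigned count, unsigned elem_bits)
{
  assert (elem_bits != 0 && (elem_bits & (elem_bits - 1)) == 0);
  return count & (elem_bits - 1);
}

/* Logical left shift of one byte element, with the count masked first.  */
static uint8_t
shift_left_byte (uint8_t value, unsigned count)
{
  return (uint8_t) (value << mask_shift_count (count, 8));
}
```

For example, a count of 9 on a byte element behaves like a count of 1.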

gcc/ChangeLog:

* config/loongarch/constraints.md (Uuv6, Uuvx): Remove Uuv6,
add Uuvx as replicated vector const with unsigned range [0,umax].
* config/loongarch/lasx.md (xvsrl, xvsra, xvsll): Mask shift
offset by its unit bits.
* config/loongarch/lsx.md (vsrl, vsra, vsll): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_const_vector_same_int_p): Set default for low and high.
* config/loongarch/predicates.md: Replace
reg_or_vector_same_uimm6_operand with reg_or_vector_same_uimm_operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c: New test.

LoongArch: testsuite: Fix l{a}sx-andn-iorn.c.

Add '-fdump-tree-optimized' to these testcases.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/lasx-andn-iorn.c:
Add '-fdump-tree-optimized'.
* gcc.target/loongarch/lsx-andn-iorn.c:
Likewise.

LoongArch: testsuite: Fix loongarch/vect-frint-scalar.c.

In r15-5327, change the default language version for C compilation from
-std=gnu17 to -std=gnu23.

ISO C99 and C11 allow ceil, floor, round and trunc, and their float and
long double variants, to raise the “inexact” exception,
but ISO/IEC TS 18661-1:2014, the C bindings to IEEE 754-2008, as
integrated into ISO C23, does not allow these functions to do so.

So add '-ffp-int-builtin-inexact' to this test case.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-frint-scalar.c: Add
'-ffp-int-builtin-inexact'.

c: Set attributes for fields when forming a composite type [PR117806]

We need to call decl_attributes when creating the fields for a composite
type.

PR c/117806

gcc/c/ChangeLog:
* c-typeck.cc (composite_type_internal): Call decl_attributes.

gcc/testsuite/ChangeLog:
* gcc.dg/pr117806.c: New test.

gimplefe: Error recovery for invalid declarations [PR117749]

c_parser_declarator can return null if there was an error,
but c_parser_gimple_declaration was not ready for that.
This fixes that oversight so we don't get an ICE after the error.

Bootstrapped and tested on x86_64-linux-gnu.

PR c/117749

gcc/c/ChangeLog:

* gimple-parser.cc (c_parser_gimple_declaration): Check
declarator to be non-null.

gcc/testsuite/ChangeLog:

* gcc.dg/gimplefe-55.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

ext-dce: Fix SIGN_EXTEND handling and cleanups [PR117360]

This is mostly a blind attempt to fix the PR + various cleanups.
The PR is about a shift of a HOST_WIDE_INT by 127 invoking UB.

Most of carry_backpropagate works on GET_MODE_INNER of the operand,
mode is assigned
  enum machine_mode mode = GET_MODE_INNER (GET_MODE (x));
at the beginning and everything is done using that mode, so for
vector modes (or complex even?) we work with the element modes
rather than vector/complex modes.
But the SIGN_EXTEND handling does that inconsistently, it looks
at mode of the operand and uses GET_MODE_INNER in GET_MODE_MASK,
but doesn't use it in the shift.
The following patch, apart from the cleanups, fixes it by doing
essentially:
       mode = GET_MODE (XEXP (x, 0));
       if (mask & ~GET_MODE_MASK (GET_MODE_INNER (mode)))
-       mask |= 1ULL << (GET_MODE_BITSIZE (mode).to_constant () - 1);
+       mask |= 1ULL << (GET_MODE_BITSIZE (GET_MODE_INNER (mode)).to_constant () - 1);
i.e. also shifting by GET_MODE_BITSIZE of the GET_MODE_INNER of the
operand's mode.  We don't need to check if it is at most 64 bits,
at the start of the function we've already verified the result mode
is at most 64 bits and SIGN_EXTEND by definition extends from a narrower
mode.
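
To make the arithmetic concrete, here is a minimal sketch outside of GCC. The helper name is hypothetical and plain `uint64_t` stands in for HOST_WIDE_INT. For a 128-bit vector mode with 8-bit elements, the whole-mode size gives a shift count of 127, which is undefined behavior on a 64-bit value, while the element size gives a shift count of 7.

```c
#include <assert.h>
#include <stdint.h>

/* Return a mask with only the sign bit of an ELEM_BITS-wide element set.
   ELEM_BITS must be in [1, 64]: shifting a 64-bit value by 64 or more
   bits is undefined behavior in C.  */
static uint64_t
sign_bit_mask (unsigned elem_bits)
{
  assert (elem_bits >= 1 && elem_bits <= 64);
  return UINT64_C (1) << (elem_bits - 1);
}
```

Calling `sign_bit_mask (8)` yields 0x80, the sign bit of a byte element; passing 128 would trip the assertion instead of silently invoking undefined behavior.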

The rest of the patch are cleanups.  For HOST_WIDE_INT we have the
HOST_WIDE_INT_{UC,1U} macros, a HWI isn't necessarily unsigned long long,
so using ULL suffixes for it is weird.

More importantly, the function does
  scalar_int_mode smode;
  if (!is_a <scalar_int_mode> (mode, &smode)
      || GET_MODE_BITSIZE (smode) > HOST_BITS_PER_WIDE_INT)
    return mmask;
early, so we don't need to use GET_MODE_BITSIZE (mode) which is
a poly_int but can use GET_MODE_BITSIZE (smode) with the same value
but in unsigned short, so we don't need to use known_lt or .to_constant ()
everywhere.

Plus some formatting issues.

What I've left around is
      if (!GET_MODE_BITSIZE (GET_MODE (x)).is_constant ()
          || !GET_MODE_BITSIZE (GET_MODE (XEXP (x, 0))).is_constant ())
        return -1;
at the start of SIGN_EXTEND or ZERO_EXTEND, I'm afraid I don't know enough
about aarch64/riscv VL vectors to know why this is done (though even that
return -1; is weird, rest of the code does return mmask; if it wants to
punt).

2024-11-30  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/117360
* ext-dce.cc (ext_dce_process_sets): Use HOST_WIDE_INT_UC
macro instead of ULL suffixed constants.
(carry_backpropagate): Likewise.  Use HOST_WIDE_INT_1U instead of
1ULL.  Use GET_MODE_BITSIZE (smode) instead of
GET_MODE_BITSIZE (mode) and with that avoid having to use
known_lt instead of < or use .to_constant ().  Formatting fixes.
(case SIGN_EXTEND): Set mode to GET_MODE_INNER (GET_MODE (XEXP (x, 0)))
rather than GET_MODE (XEXP (x, 0)) and don't use GET_MODE_INNER (mode).
(ext_dce_process_uses): Use HOST_WIDE_INT_UC macro instead of ULL
suffixed constants.

c++: Implement C++26 P3176R1 - The Oxford variadic comma

While we are already in stage3, I wonder if implementing this small paper
wouldn't be useful even for GCC 15, so that we have in the GCC world one
extra year of deprecation of variadic ellipsis without preceding comma.

The paper just deprecates something, I'd hope most of the C++ code in the
wild when it uses variadic functions at all uses the comma before the
ellipsis.
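
For reference, a minimal sketch of the comma-preceded spelling, written in C, where it is the only accepted form for a function with named parameters. The function name is made up for illustration. The C++ spelling that the new warning targets would be a declaration like `int sum_ints (int count...)`.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum COUNT int arguments.  The ellipsis is preceded by a comma, which is
   the spelling that P3176R1 leaves undeprecated.  */
static int
sum_ints (int count, ...)
{
  assert (count >= 0);

  va_list ap;
  va_start (ap, count);
  int total = 0;
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}
```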

2024-11-30 Jakub Jelinek <jakub@redhat.com>

gcc/c-family/
* c.opt (Wdeprecated-variadic-comma-omission): New option.
* c.opt.urls: Regenerate.
* c-opts.cc (c_common_post_options): Default to
-Wdeprecated-variadic-comma-omission for C++26 or -Wpedantic.
gcc/cp/
* parser.cc: Implement C++26 P3176R1 - The Oxford variadic comma.
(cp_parser_parameter_declaration_clause): Emit
-Wdeprecated-variadic-comma-omission warnings.
gcc/
* doc/invoke.texi (-Wdeprecated-variadic-comma-omission): Document.
gcc/testsuite/
* g++.dg/cpp26/variadic-comma1.C: New test.
* g++.dg/cpp26/variadic-comma2.C: New test.
* g++.dg/cpp26/variadic-comma3.C: New test.
* g++.dg/cpp26/variadic-comma4.C: New test.
* g++.dg/cpp26/variadic-comma5.C: New test.
* g++.dg/cpp1z/fold10.C: Expect a warning for C++26.
* g++.dg/ext/attrib33.C: Likewise.
* g++.dg/cpp1y/lambda-generic-variadic19.C: Likewise.
* g++.dg/cpp2a/lambda-generic10.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-const3.C: Likewise.
* g++.dg/cpp0x/variadic164.C: Likewise.
* g++.dg/cpp0x/variadic17.C: Likewise.
* g++.dg/cpp0x/udlit-args-neg.C: Likewise.
* g++.dg/cpp0x/variadic28.C: Likewise.
* g++.dg/cpp0x/gen-attrs-33.C: Likewise.
* g++.dg/cpp23/explicit-obj-diagnostics3.C: Likewise.
* g++.old-deja/g++.law/operators15.C: Likewise.
* g++.old-deja/g++.mike/p811.C: Likewise.
* g++.old-deja/g++.mike/p12306.C (printf): Add , before ... .
* g++.dg/analyzer/fd-bind-pr107783.C (bind): Likewise.
* g++.dg/cpp0x/vt-65790.C (printf): Likewise.
libstdc++-v3/
* include/std/functional (_Bind_check_arity): Add , before ... .
* include/bits/refwrap.h (_Mem_fn_traits, _Weak_result_type_impl):
Likewise.
* include/tr1/type_traits (is_function): Likewise.

Daily bump.

Rename "libdiagnostics" to "libgdiagnostics"

"libdiagnostics" clashes with an existing soname in Debian, as
per:
https://gcc.gnu.org/pipermail/gcc/2024-November/245175.html

Rename it to "libgdiagnostics" for uniqueness.

I am being deliberately vague about what the "g" stands for:
it could be "gnu", "gcc", or "gpl-licensed" as the reader desires.

ChangeLog:
* configure.ac: Rename "libdiagnostics" to "libgdiagnostics".
* configure: Regenerate.

gcc/ChangeLog:
* Makefile.in: Rename "libdiagnostics" to "libgdiagnostics".
* configure.ac: Likewise.
* configure: Regenerate.
* doc/install.texi: Rename "libdiagnostics" to
"libgdiagnostics".
* doc/libdiagnostics/*: Rename to doc/libgdiagnostics, renaming
"libdiagnostics" to "libgdiagnostics" throughout.
* libdiagnostics++.h: Rename to...
* libgdiagnostics++.h: ...this, renaming "libdiagnostics" to
"libgdiagnostics" throughout.
* libdiagnostics.cc: Rename to...
* libgdiagnostics.cc: ...this, renaming "libdiagnostics" to
"libgdiagnostics" throughout.
* libdiagnostics.h: Rename to...
* libgdiagnostics.h: ...this, renaming "libdiagnostics" to
"libgdiagnostics" throughout.
* libdiagnostics.map: Rename to...
* libgdiagnostics.map: ...this, renaming "libdiagnostics" to
"libgdiagnostics" throughout.
* libsarifreplay.cc: Update for renaming of "libdiagnostics"
to "libgdiagnostics".
* libsarifreplay.h: Likewise.
* sarif-replay.cc: Likewise.

gcc/testsuite/ChangeLog:
* libdiagnostics.dg/*: Rename to libgdiagnostics.dg, renaming
"libdiagnostics" to "libgdiagnostics" throughout.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

AVR: Skip the gcc.c-torture/execute/memcpy-a*.c tests.

Skipping these tests on avr since they come up with "memory full",
plus they consume a multiple of the time the rest of the testsuite
takes.

gcc/testsuite/
* gcc.c-torture/execute/memcpy-a1.c
* gcc.c-torture/execute/memcpy-a2.c
* gcc.c-torture/execute/memcpy-a4.c
* gcc.c-torture/execute/memcpy-a8.c

libbacktrace: use WIN32_LEAN_AND_MEAN, not WIN32_MEAN_AND_LEAN

Patch from awmorgan.

* fileline.c: Use WIN32_LEAN_AND_MEAN, not WIN32_MEAN_AND_LEAN.
* pecoff.c: Likewise.

compiler: increase buffer size to avoid warning

GCC has a new -Wformat-truncation warning that triggers on this code:

../../gcc/go/gofrontend/go-encode-id.cc: In function 'std::string go_encode_id(const std::string&)':
../../gcc/go/gofrontend/go-encode-id.cc:176:48: error: '%02x' directive output may be truncated writing between 2 and 8 bytes into a region of size 6 [-Werror=format-truncation=]
  176 |                   snprintf(buf, sizeof buf, "_x%02x", c);
      |                                                ^~~~
../../gcc/go/gofrontend/go-encode-id.cc:176:45: note: directive argument in the range [128, 4294967295]
  176 |                   snprintf(buf, sizeof buf, "_x%02x", c);
      |                                             ^~~~~~~~
../../gcc/go/gofrontend/go-encode-id.cc:176:27: note: 'snprintf' output between 5 and 11 bytes into a destination of size 8
  176 |                   snprintf(buf, sizeof buf, "_x%02x", c);
      |                   ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The code is safe, because the value of c is known to be >= 0 && <= 0xff.
But it's difficult for the compiler to know that.
Bump the buffer size to avoid the warning.
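
A minimal sketch of the sizing argument, not the actual gofrontend code. The compiler assumes the worst case for a 32-bit unsigned argument: "_x" plus up to 8 hex digits plus the terminating NUL, i.e. the 11 bytes mentioned in the note above. The helper name and the buffer size of 16 are assumptions for this sketch, not the values chosen by the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Room for "_x", up to 8 hex digits and the NUL (2 + 8 + 1 = 11),
   rounded up.  The value 16 is an assumption made for this sketch.  */
enum { DEMO_BUF_SIZE = 16 };

/* Encode byte C as "_xNN" into BUF, which must have room for
   DEMO_BUF_SIZE chars.  Return the number of characters written,
   excluding the NUL.  */
static int
encode_byte (char *buf, unsigned int c)
{
  assert (c <= 0xff);
  return snprintf (buf, DEMO_BUF_SIZE, "_x%02x", c);
}
```

With a destination this large, even the worst case the compiler considers fits, so the truncation diagnostic has nothing to report.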

Fixes PR go/117833

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/632455

AVR: Fix some coding rule nits and typos.

gcc/
* config/avr/avr-c.cc: Fix some coding rule nits and typos.
* config/avr/avr-passes.cc: Same
* config/avr/avr.h: Same.
* config/avr/avr.cc: Same.
(avr_function_arg_regno_p, avr_hard_regno_rename_ok)
(avr_epilogue_uses, extra_constraint_Q): Return bool instead of int.
* config/avr/avr-protos.h (avr_function_arg_regno_p)
(avr_hard_regno_rename_ok, avr_epilogue_uses)
(extra_constraint_Q): Return bool instead of int.

aarch64: Add attributes to the data intrinsics.

None of the data intrinsics read or write memory, nor are they fp related.
So adding the attributes will improve the code generation slightly.

Built and tested for aarch64-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (aarch64_init_data_intrinsics): Call
aarch64_get_attributes and update calls to aarch64_general_add_builtin.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

aarch64: add attributes to the prefetch_builtins

This adds the attributes associated with prefetch to the builtins.
Just call aarch64_get_attributes with FLAG_PREFETCH_MEMORY to get the attributes.

Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (aarch64_init_prefetch_builtin):
Update call to aarch64_general_add_builtin in AARCH64_INIT_PREFETCH_BUILTIN.
Add new variable prefetch_attrs.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

aarch64: Fix up flags for vget_low_*, vget_high_* and vreinterpret intrinsics

These 3 intrinsics will not raise an fp exception or read FPCR. These intrinsics
will be folded into VIEW_CONVERT_EXPR or a BIT_FIELD_REF, which is already set to
be const expressions too.

Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (VREINTERPRET_BUILTIN): Use
FLAG_NONE instead of FLAG_AUTO_FP.
(VGET_LOW_BUILTIN): Likewise.
(VGET_HIGH_BUILTIN): Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

aarch64: Mark __builtin_aarch64_im_lane_boundsi as leaf and nothrow [PR117665]

__builtin_aarch64_im_lane_boundsi is known not to throw or call back into another
function since it will either be folded into a NOP or will produce a compiler error.

This fixes the ICE by fixing the missed optimization. It does not fix the underlying
issue with fold_marked_statements; which I filed as PR 117668.

Built and tested for aarch64-linux-gnu.

PR target/117665

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (aarch64_init_simd_builtin_functions):
Pass nothrow and leaf as attributes to aarch64_general_add_builtin for
__builtin_aarch64_im_lane_boundsi.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/lane-bound-1.C: New test.
* gcc.target/aarch64/lane-bound-3.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

[PR117770][LRA]: Check hard regs corresponding insn operands for hard reg clobbers

When LRA processes early clobbered hard regs explicitly present in the
insn description, it checks that the hard reg is also used as input.
If the hard reg is not also an input, it is marked as dying.  For this
check, LRA processed only input hard regs that are also explicitly present
in the insn description.  For the given PR, the hard reg is used as input
as an operand and is not present explicitly in the insn description, and
therefore LRA marked the hard reg as dying.  This results in wrong
allocation and wrong code.  The patch solves the problem by also processing
hard regs used as insn operands.

gcc/ChangeLog:

PR rtl-optimization/117770
* lra-lives.cc: Include ira-int.h.
(process_bb_lives): Check hard regs corresponding insn operands
for dying hard wired reg clobbers.

AVR: target/117681 - Set UNWIND_WORD_MODE to Pmode.

This patch fixes a build warning for libgcc/unwind-sjlj.c
which used word_mode for _Unwind_Word but should use Pmode.

PR target/117681
gcc/
* config/avr/avr.cc (TARGET_UNWIND_WORD_MODE): Define to...
(avr_unwind_word_mode): ...this new static function.

AVR: target/117726 - Better optimize shifts.

This patch splits 2-byte and 3-byte shifts after reload into
a 3-operand byte shift and a residual 2-operand shift.

The "2op" shift insn alternatives are not needed and removed because
all shift insn already have a "r,0,n" alternative that does the job.

PR target/117726
gcc/
* config/avr/avr-passes.cc (avr_shift_is_3op, avr_emit_shift):
Also handle 2-byte and 3-byte shifts.
(avr_split_shift4, avr_split_shift3, avr_split_shift2): New
local helper functions.
(avr_split_shift): Use them.
* config/avr/avr-passes.def (avr_pass_split_after_peephole2):
Adjust comments.
* config/avr/avr.cc (avr_out_ashlpsi3, avr_out_ashrpsi3)
(avr_out_lshrpsi3): Support offset 15.
(ashrhi3_out): Support offset 7 as 3-op.
(ashrsi3_out): Support offset 15.
(avr_rtx_costs_1): Adjust shift costs.
* config/avr/avr.md (2op): Remove attribute value and all such insn
alternatives.
(ashlhi3, *ashlhi3, *ashlhi3_const): Add 3-op alternatives like C2l.
(ashrhi3, *ashrhi3, *ashrhi3_const): Add 3-op alternatives like C2a.
(lshrhi3, *lshrhi3, *lshrhi3_const): Add 3-op alternatives like C2r.
(*ashlpsi3_split, *ashlpsi3): Add 3-op alternatives C15 and C3l.
(*ashrpsi3_split, *ashrpsi3): Add 3-op alternatives C15 and C3r.
(*lshrpsi3_split, *lshrpsi3): Add 3-op alternatives C15 and C3r.
(ashlsi3, *ashlsi3, *ashlsi3_const): Remove "2op" alternative.
(ashrsi3, *ashrsi3, *ashrsi3_const): Same.
(lshrsi3, *lshrsi3, *lshrsi3_const): Same.
(constr_split_suffix): Code attr morphed from constr_split_shift4.
* config/avr/constraints.md (C2a, C2r, C2l)
(C3a, C3r, C3l): New constraints.
* doc/invoke.texi (AVR Options) <-msplit-bit-shift>: Adjust doc.

aarch64: Fix build failure due to missing header

Including the "arm_acle.h" header in aarch64-unwind.h requires
stdint.h to be present and it may not be available during the
first stage of cross-compilation of GCC.

When cross-building GCC for the aarch64-none-linux-gnu target
(on any supporting host) using the 3-stage bootstrap build
process, where we build a native compiler from source, libgcc fails
to compile due to a missing header that has not been installed yet.

This could be worked around but it's better to fix the issue.

libgcc/ChangeLog:

* config/aarch64/aarch64-unwind.h (_CHKFEAT_GCS): Add.

arm, mve: Detect uses of vctp_vpr_generated inside subregs

Address a problem where we were failing to detect uses of
vctp_vpr_generated in the analysis for 'arm_attempt_dlstp_transform' because
the use was inside a SUBREG and rtx_equal_p does not catch that. Using
reg_overlap_mentioned_p is much more robust.

gcc/ChangeLog:

PR target/117814
* config/arm/arm.cc (arm_attempt_dlstp_transform): Use
reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
vctp_vpr_generated inside subregs.

gcc/testsuite/ChangeLog:

PR target/117814
* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
(test10a): ... this.
(test10b): Variation of test10a with a small change to trigger wrong
codegen.

arm, mve: Pass -std=c99 to dlstp-loop-form.c to avoid new warning

This fixes a testism introduced by the warning produced with the -std=c23
default. The testcase is a reduced piece of code meant to trigger an ICE, so
there's little value in trying to change the code itself.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-loop-form.c: Add -std=c99 to avoid warning
message.

arm, mve: Fix scan-assembler for test7 in dlstp-compile-asm-2.c

After the changes to the vctp intrinsic, codegen changed slightly: we now
unfortunately seem to be generating unneeded moves and extends of the mask.
These are however not incorrect and we don't have a fix for the unneeded
codegen right now, so change the testcase to accept them so we can catch
other changes if they occur.

gcc/testsuite/ChangeLog:

PR target/117814
* gcc.target/arm/mve/dlstp-compile-asm-2.c (test7): Add an optional
vmsr to the check-function-bodies.

[PATCH v7 03/12] RISC-V: Add CRC expander to generate faster CRC.

If the target is ZBC or ZBKC, it uses clmul instruction for the CRC
calculation.  Otherwise, if the target is ZBKB, generates table-based
CRC, but for reversing inputs and the output uses bswap and brev8
instructions.  Add new tests to check CRC generation for ZBC, ZBKC and
ZBKB targets.
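
Two building blocks mentioned above can be sketched in portable C: bit reflection, used for reversed CRCs, and carry-less multiplication, which clmul-style instructions compute in hardware (typically split into low and high halves). The function names are hypothetical and this is not the code added to expr.cc, hwint.cc or the RISC-V backend.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low WIDTH bits of VALUE: bit 0 swaps with bit WIDTH - 1,
   and so on.  Bits above WIDTH are dropped.  */
static uint64_t
reflect_bits (uint64_t value, unsigned width)
{
  assert (width >= 1 && width <= 64);

  uint64_t result = 0;
  for (unsigned i = 0; i < width; i++)
    {
      result = (result << 1) | (value & 1);
      value >>= 1;
    }
  return result;
}

/* Carry-less multiply of two 32-bit values, i.e. multiplication of
   polynomials over GF(2): partial products are combined with XOR
   instead of addition.  */
static uint64_t
clmul32 (uint32_t a, uint32_t b)
{
  uint64_t product = 0;
  for (unsigned i = 0; i < 32; i++)
    if ((b >> i) & 1)
      product ^= (uint64_t) a << i;
  return product;
}
```

As a small check, `clmul32 (3, 3)` is 5, because (x + 1) squared is x^2 + 1 over GF(2), whereas ordinary multiplication would give 9.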

gcc/

* expr.cc (gf2n_poly_long_div_quotient): New function.
* expr.h (gf2n_poly_long_div_quotient):  New function declaration.
* hwint.cc (reflect_hwi): New function.
* hwint.h (reflect_hwi): New function declaration.
* config/riscv/bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4): New
expander for reversed CRC.
(crc<SUBX1:mode><SUBX:mode>4): New expander for bit-forward CRC.
* config/riscv/iterators.md (SUBX1, ANYI1): New iterators.
* config/riscv/riscv-protos.h (generate_reflecting_code_using_brev):
New function declaration.
(expand_crc_using_clmul): Likewise.
(expand_reversed_crc_using_clmul): Likewise.
* config/riscv/riscv.cc (generate_reflecting_code_using_brev): New
function.
(expand_crc_using_clmul): Likewise.
(expand_reversed_crc_using_clmul): Likewise.
* config/riscv/riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.

* doc/sourcebuild.texi: Document new target selectors.

gcc/testsuite
* lib/target-supports.exp (check_effective_target_riscv_zbc): New
target supports predicate.
(check_effective_target_riscv_zbkb): Likewise.
(check_effective_target_riscv_zbkc): Likewise.
(check_effective_target_zbc_ok): Likewise.
(check_effective_target_zbkb_ok): Likewise.
(check_effective_target_zbkc_ok): Likewise.
(riscv_get_arch): Add zbkb and zbkc support.

* gcc.target/riscv/crc-builtin-zbc32.c: New file.
* gcc.target/riscv/crc-builtin-zbc64.c: Likewise.

    Co-author: Jeff Law  <jlaw@ventanamicro.com>

RISC-V: Add intrinsics testcases for SiFive Xsfvqmaccqoq/dod extensions.

This commit adds testcases for Xsfvqmaccqoq/dod.

Co-Authored by: Jiawei Chen <jiawei@iscas.ac.cn>
Co-Authored by: Shihua Liao <shihua@iscas.ac.cn>
Co-Authored by: Yixuan Chen <chenyixuan@iscas.ac.cn>

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c: New test.

RISC-V: Add intrinsics support for SiFive Xsfvqmaccqoq/dod extensions.

This commit adds intrinsics support for Xsfvqmaccqoq/dod.

Co-Authored by: Jiawei Chen <jiawei@iscas.ac.cn>
Co-Authored by: Shihua Liao <shihua@iscas.ac.cn>
Co-Authored by: Yixuan Chen <chenyixuan@iscas.ac.cn>

gcc/ChangeLog:

* config.gcc: Add new SiFive *.o files.
* config/riscv/generic-vector-ooo.md: New reservation.
* config/riscv/genrvv-type-indexer.cc (main): New type.
* config/riscv/riscv-vector-builtins-shapes.cc (struct sf_vqmacc_def): New function.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_QMACC_OPS): New macros type.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_QMACC_OPS): New builtins def.
(DEF_RVV_TYPE_INDEX): Ditto.
(DEF_RVV_FUNCTION): Ditto.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE_INDEX): New types def.
(4x8x4): New op type.
(2x8x2): Ditto.
(quad_emul_vector): New base type.
(quad_emul_signed_vector): Ditto.
(quad_emul_unsigned_vector): Ditto.
(quad_fixed_vector): Ditto.
(quad_fixed_signed_vector): Ditto.
(quad_fixed_unsigned_vector): Ditto.
(quad_lmul1_vector): Ditto.
(quad_lmul1_signed_vector): Ditto.
(quad_lmul1_unsigned_vector): Ditto.
* config/riscv/riscv-vector-builtins.h (enum required_ext): New extensions.
(required_ext_to_isa_name): Ditto.
(required_extensions_specified): Ditto.
(struct function_group_info): Ditto.
* config/riscv/riscv.md: New attr.
* config/riscv/t-riscv: Add include for SiFive files.
* config/riscv/vector-iterators.md: New iterator.
* config/riscv/vector.md: New include for SiFive file.
* config/riscv/sifive-vector-builtins-bases.cc: New file.
* config/riscv/sifive-vector-builtins-bases.h: New file.
* config/riscv/sifive-vector-builtins-functions.def: New file.
* config/riscv/sifive-vector.md: New file.

c: Correct type compatibility for bit-fields [PR117828]

Add missing test for consistency of bit-fields when comparing tagged
types for compatibility.

PR c/117828

gcc/c/ChangeLog:
* c-typeck.cc (tagged_types_tu_compatible_p): Add check.

gcc/testsuite/ChangeLog:
* gcc.dg/c23-tag-bitfields-1.c: New test.
* gcc.dg/pr117828.c: New test.

AArch64: Suppress default options when march or mcpu used is not affected by it.

This patch makes it so that when you use any of the Cortex-A53 errata
workarounds but have specified an -march or -mcpu that we know is not affected
by it, we suppress the errata workaround.

This is a driver only patch as the linker invocation needs to be changed as
well.  The linker and cc SPECs are different because for the linker we didn't
seem to add an inversion flag for the option.  That said, it's also not possible
to configure the linker with it on by default.  So not passing the flag is
sufficient to turn it off.

For the compilers however we have an inversion flag using -mno-, which is needed
to disable the workarounds when the compiler has been configured with it by
default.

In case it's unclear how the patch does what it does (it took me a while to
figure out the syntax):

  * Early matching will replace any -march=native or -mcpu=native with their
    expanded forms and erases the native arguments from the buffer.
  * Due to the above if we ensure we handle the new code after this erasure then
    we only have to handle the expanded form.
  * The expanded form needs to handle -march=<arch>+extensions and
    -mcpu=<cpu>+extensions and so we can't use normal string matching but
    instead use strstr with a custom driver function that's common between
    native and non-native builds.
  * For the compilers we output -mno-<workaround> and for the linker we just
    erase the --fix-<workaround> option.
  * The extra internal matching, e.g. the duplicate match of mcpu inside:
    mcpu=*:%{%:is_local_not_armv8_base(%{mcpu=*:%*}) is so we can extract the glob
    using %* because the outer match would otherwise reset at the %{.  The reason
    for the outer glob at all is to skip the block early if no matches are found.
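
The extension-aware matching in the third bullet can be sketched in plain C.
This is only an illustration, not the driver function added by the patch: the
name names_armv8_base is hypothetical, treating "armv8-a" as the only affected
architecture string is an assumption for the example, and it splits at the
first '+' with strcspn where the patch uses strstr.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if VALUE, the text after "-march=",
   names the base Armv8-A architecture, with or without "+extension"
   suffixes.  */
bool
names_armv8_base (const char *value)
{
  assert (value != NULL);
  const char *base = "armv8-a";
  size_t len = strcspn (value, "+");   /* Length of the part before '+'.  */
  return len == strlen (base) && strncmp (value, base, len) == 0;
}
```

With this sketch, "armv8-a+crc" is treated like "armv8-a", while "armv8.1-a"
is not, which is the distinction the expected output relies on.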

The workaround has the effect of suppressing certain inlining and multiply-add
formation, which leads to a regression of about 1% in SPECCPU 2017 Intrate on
modern cores.  This patch is needed because most distros configure GCC with the
workaround enabled by default.

Expected output:

> gcc -mcpu=neoverse-v1 -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
0

> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
5

> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
5

> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
0

> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
0

> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
1

> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
1

gcc/ChangeLog:

* config/aarch64/aarch64-errata.h (TARGET_SUPPRESS_OPT_SPEC,
TARGET_TURN_OFF_OPT_SPEC, CA53_ERR_835769_COMPILE_SPEC,
CA53_ERR_843419_COMPILE_SPEC): New.
(CA53_ERR_835769_SPEC, CA53_ERR_843419_SPEC): Use them.
* config/aarch64/aarch64-elf-raw.h (CC1_SPEC, CC1PLUS_SPEC): Add
AARCH64_ERRATA_COMPILE_SPEC.
* config/aarch64/aarch64-freebsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* config/aarch64/aarch64-gnu.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* config/aarch64/aarch64-linux.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* config/aarch64/aarch64-netbsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* common/config/aarch64/aarch64-common.cc
(is_host_cpu_not_armv8_base): New.
* config/aarch64/driver-aarch64.cc: Remove extra newline.
* config/aarch64/aarch64.h (is_host_cpu_not_armv8_base): New.
(MCPU_TO_MARCH_SPEC_FUNCTIONS): Add is_local_not_armv8_base.
(EXTRA_SPEC_FUNCTIONS): Add is_local_cpu_armv8_base.
* doc/invoke.texi: Document it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_30: New test.
* gcc.target/aarch64/cpunative/info_31: New test.
* gcc.target/aarch64/cpunative/info_32: New test.
* gcc.target/aarch64/cpunative/info_33: New test.
* gcc.target/aarch64/cpunative/native_cpu_30.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_31.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_32.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_33.c: New test.
* gcc.target/aarch64/erratas_opt_0.c: New test.
* gcc.target/aarch64/erratas_opt_1.c: New test.
* gcc.target/aarch64/erratas_opt_10.c: New test.
* gcc.target/aarch64/erratas_opt_11.c: New test.
* gcc.target/aarch64/erratas_opt_12.c: New test.
* gcc.target/aarch64/erratas_opt_13.c: New test.
* gcc.target/aarch64/erratas_opt_14.c: New test.
* gcc.target/aarch64/erratas_opt_15.c: New test.
* gcc.target/aarch64/erratas_opt_2.c: New test.
* gcc.target/aarch64/erratas_opt_3.c: New test.
* gcc.target/aarch64/erratas_opt_4.c: New test.
* gcc.target/aarch64/erratas_opt_5.c: New test.
* gcc.target/aarch64/erratas_opt_6.c: New test.
* gcc.target/aarch64/erratas_opt_7.c: New test.
* gcc.target/aarch64/erratas_opt_8.c: New test.
* gcc.target/aarch64/erratas_opt_9.c: New test.

aarch64: Add ISA requirements to some SVE/SME md comments

The SVE and SME md files are divided into sections, with each
section often starting with a comment that lists the associated
mnemonics. These lists usually include the base architecture
requirement in parentheses, if the base requirement is greater
than the baseline for the file. This patch tries to be more
consistent about when we do that for the recently added SVE2p1
and SME2p1 extensions.

gcc/
* config/aarch64/aarch64-sme.md: In the section comments, add the
architecture requirements alongside some mnemonics.
* config/aarch64/aarch64-sve2.md: Likewise.

aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsics

This patch adds support for the following intrinsics:
- svdot[_f32_mf8]_fpm
- svdot_lane[_f32_mf8]_fpm
- svdot[_f16_mf8]_fpm
- svdot_lane[_f16_mf8]_fpm

The first two are available when both the FP8DOT4 and SVE2 features are enabled,
or in streaming mode when the SSVE_FP8DOT4 feature is enabled.
The final two are available when both the FP8DOT2 and SVE2 features are enabled,
or in streaming mode when the SSVE_FP8DOT2 feature is enabled.

gcc/
* config/aarch64/aarch64-option-extensions.def
(fp8dot4, ssve-fp8dot4): Add new extensions.
(fp8dot2, ssve-fp8dot2): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl): Support fp8.
(svdotprod_lane_impl): Likewise.
(svdot_lane): Provide an unspec for fp8 types.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(ternary_mfloat8_def): Add new class.
(ternary_mfloat8): Add new shape.
(ternary_mfloat8_lane_group_selection_def): Add new class.
(ternary_mfloat8_lane_group_selection): Add new shape.
* config/aarch64/aarch64-sve-builtins-shapes.h
(ternary_mfloat8, ternary_mfloat8_lane_group_selection): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svdot, svdot_lane): Add new DEF_SVE_FUNCTION_GS_FPM, twice to deal
with the combination of features providing support for 32 and 16 bit
floating point.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_dot<mode>): Add new.
(@aarch64_sve_dot_lane<mode>): Likewise.
* config/aarch64/aarch64.h:
(TARGET_FP8DOT4, TARGET_SSVE_FP8DOT4): Add new defines.
(TARGET_FP8DOT2, TARGET_SSVE_FP8DOT2): Likewise.
* config/aarch64/iterators.md
(UNSPEC_DOT_FP8, UNSPEC_DOT_LANE_FP8): Add new unspecs.
* doc/invoke.texi: Document fp8dot4, fp8dot2, ssve-fp8dot4, ssve-fp8dot2
extensions.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c: Add new.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Likewise.
* lib/target-supports.exp: Add dg-require-effective-target support for
aarch64_asm_fp8dot2_ok, aarch64_asm_fp8dot4_ok,
aarch64_asm_ssve-fp8dot2_ok and aarch64_asm_ssve-fp8dot4_ok.

aarch64: add SVE2 FP8 multiply accumulate intrinsics

This patch adds support for the following intrinsics:
- svmlalb[_f16_mf8]_fpm
- svmlalb[_n_f16_mf8]_fpm
- svmlalt[_f16_mf8]_fpm
- svmlalt[_n_f16_mf8]_fpm
- svmlalb_lane[_f16_mf8]_fpm
- svmlalt_lane[_f16_mf8]_fpm
- svmlallbb[_f32_mf8]_fpm
- svmlallbb[_n_f32_mf8]_fpm
- svmlallbt[_f32_mf8]_fpm
- svmlallbt[_n_f32_mf8]_fpm
- svmlalltb[_f32_mf8]_fpm
- svmlalltb[_n_f32_mf8]_fpm
- svmlalltt[_f32_mf8]_fpm
- svmlalltt[_n_f32_mf8]_fpm
- svmlallbb_lane[_f32_mf8]_fpm
- svmlallbt_lane[_f32_mf8]_fpm
- svmlalltb_lane[_f32_mf8]_fpm
- svmlalltt_lane[_f32_mf8]_fpm

These are available when both the FP8FMA and SVE2 features are enabled,
or in streaming mode when the SSVE_FP8FMA feature is enabled.

gcc/
* config/aarch64/aarch64-option-extensions.def
(fp8fma, ssve-fp8fma): Add new options.
* config/aarch64/aarch64-sve-builtins-functions.h
(unspec_based_function_base): Add unspec_for_mfp8.
(unspec_for): Return unspec_for_mfp8 on fpm-using cases.
(sme_1mode_function): Fix call to parent ctor.
(sme_2mode_function_t): Likewise.
(unspec_based_mla_function, unspec_based_mla_lane_function): Handle
fpm-using cases.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Treat M as TYPE_SUFFIX_mf8
(ternary_mfloat8_lane_def): Add new class.
(ternary_mfloat8_opt_n_def): Likewise.
(ternary_mfloat8_lane): Add new shape.
(ternary_mfloat8_opt_n): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.h
(ternary_mfloat8_lane, ternary_mfloat8_opt_n): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svmlalb_lane, svmlalb, svmlalt_lane, svmlalt): Update definitions
with mfloat8_t unspec in ctor.
(svmlallbb_lane, svmlallbb, svmlallbt_lane, svmlallbt, svmlalltb_lane,
svmlalltb, svmlalltt_lane, svmlalltt, svmlal_impl): Add new FUNCTIONs.
(svqrshr, svqrshrn, svqrshru, svqrshrun): Update definitions with
nop mfloat8 unspec in ctor.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svmlalb, svmlalt, svmlalb_lane, svmlalt_lane, svmlallbb, svmlallbt,
svmlalltb, svmlalltt, svmlalltt_lane, svmlallbb_lane, svmlallbt_lane,
svmlalltb_lane): Add new DEF_SVE_FUNCTION_GS_FPMs.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svmlallbb_lane, svmlallbb, svmlallbt_lane, svmlallbt, svmlalltb_lane,
svmlalltb, svmlalltt_lane, svmlalltt): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_h_float_mf8, TYPES_s_float_mf8): Add new types.
(h_float_mf8, s_float_mf8): Add new SVE_TYPES_ARRAY.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_add_<sve2_fp8_fma_op_vnx8hf><mode>): Add new.
(@aarch64_sve_add_<sve2_fp8_fma_op_vnx4sf><mode>): Add new.
(@aarch64_sve_add_lane_<sve2_fp8_fma_op_vnx8hf><mode>): Likewise.
(@aarch64_sve_add_lane_<sve2_fp8_fma_op_vnx4sf><mode>): Likewise.
* config/aarch64/aarch64.h
(TARGET_FP8FMA, TARGET_SSVE_FP8FMA): Likewise.
* config/aarch64/iterators.md
(VNx8HF_ONLY): Add new.
(UNSPEC_FMLALB_FP8, UNSPEC_FMLALLBB_FP8, UNSPEC_FMLALLBT_FP8,
UNSPEC_FMLALLTB_FP8, UNSPEC_FMLALLTT_FP8, UNSPEC_FMLALT_FP8): Likewise.
(SVE2_FP8_TERNARY_VNX8HF, SVE2_FP8_TERNARY_VNX4SF): Likewise.
(SVE2_FP8_TERNARY_LANE_VNX8HF, SVE2_FP8_TERNARY_LANE_VNX4SF): Likewise.
(sve2_fp8_fma_op_vnx8hf, sve2_fp8_fma_op_vnx4sf): Likewise.
* doc/invoke.texi: Document fp8fma and ssve-fp8fma extensions.

gcc/testsuite/

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
(TEST_DUAL_Z_REV, TEST_DUAL_LANE_REG, TEST_DUAL_ZD): Add fpm0 argument.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_opt_n_1.c: Add
new shape test.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Add new test.
* gcc.target/aarch64/sve2/acle/asm/mlalb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalt_mf8.c: Likewise.
* lib/target-supports.exp: Add check_effective_target for fp8fma and
ssve-fp8fma.

aarch64: add svcvt* FP8 intrinsics

This patch adds the following intrinsics:
- svcvt1_bf16[_mf8]_fpm
- svcvt1_f16[_mf8]_fpm
- svcvt2_bf16[_mf8]_fpm
- svcvt2_f16[_mf8]_fpm
- svcvtlt1_bf16[_mf8]_fpm
- svcvtlt1_f16[_mf8]_fpm
- svcvtlt2_bf16[_mf8]_fpm
- svcvtlt2_f16[_mf8]_fpm
- svcvtn_mf8[_f16_x2]_fpm (unpredicated)
- svcvtnb_mf8[_f32_x2]_fpm
- svcvtnt_mf8[_f32_x2]_fpm

The underlying instructions are only available when SVE2 is enabled and the PE
is not in streaming SVE mode. They are also available when SME2 is enabled and
the PE is in streaming SVE mode.

gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_signature): Add an fpm_t (uint64_t) argument to functions that
set the fpm register.
(unary_convertxn_narrowt_def): New class.
(unary_convertxn_narrowt): New shape.
(unary_convertxn_narrow_def): New class.
(unary_convertxn_narrow): New shape.
* config/aarch64/aarch64-sve-builtins-shapes.h
(unary_convertxn_narrowt): Declare.
(unary_convertxn_narrow): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svcvt_fp8_impl): New class.
(svcvtn_impl): Handle fp8 cases.
(svcvt1, svcvt2, svcvtlt1, svcvtlt2): Add new FUNCTION.
(svcvtnb): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svcvt1, svcvt2, svcvtlt1, svcvtlt2): Add new DEF_SVE_FUNCTION_GS_FPM.
(svcvtn): Likewise.
(svcvtnb, svcvtnt): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svcvt1, svcvt2, svcvtlt1, svcvtlt2, svcvtnb, svcvtnt): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_cvt_mf8, TYPES_cvtn_mf8, TYPES_cvtnx_mf8): Add new types arrays.
(function_builder::get_name): Append _fpm to functions that set fpmr.
(function_resolver::check_gp_argument): Deal with the fpm_t argument.
(function_expander::expand): Set the fpm register before
calling the insn if the function warrants it.
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_fp8_cvt): Add new.
(@aarch64_sve2_fp8_cvtn): Likewise.
(@aarch64_sve2_fp8_cvtnb): Likewise.
(@aarch64_sve_cvtnt): Likewise.
* config/aarch64/aarch64.h (TARGET_SSVE_FP8): Add new.
* config/aarch64/iterators.md
(VNx8SF_ONLY, SVE_FULL_HFx2): New mode iterators.
(UNSPEC_F1CVT, UNSPEC_F1CVTLT, UNSPEC_F2CVT, UNSPEC_F2CVTLT): Add new.
(UNSPEC_FCVTNB, UNSPEC_FCVTNT): Likewise.
(UNSPEC_FP8FCVTN): Likewise.
(FP8CVT_UNS, fp8_cvt_uns_op): Likewise.

gcc/testsuite/

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
(TEST_DUAL_Z): Add fpm0 argument.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrow_1.c:
Add new tests.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrowt_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvtlt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvtn_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvtnb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvtnt_mf8.c: Likewise.
* lib/target-supports.exp: Add aarch64_asm_fp8_ok check.

aarch64: specify fpm mode in function instances and groups

Some intrinsics require setting the fpm register before calling the
specific asm opcode required.
In order to simplify review, this patch:
- adds the fpm_mode_index attribute to function_group_info and
function_instance objects
- updates existing initialisations and call sites.
- updates equality and hash operations

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svdiv_impl): Specify FPM_unused when folding.
(svmul_impl): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(build_one): Use the group fpm_mode when creating function instances.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svaba_impl, svqrshl_impl, svqshl_impl, svrshl_impl, svsra_impl):
Specify FPM_unused when folding.
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Set
fpm_mode on all elements.
(neon_sve_function_groups, sme_function_groups): Likewise.
(function_instance::hash): Include fpm_mode in hash.
(function_builder::add_overloaded_functions): Use the group fpm mode.
(function_resolver::lookup_form): Use the function instance fpm_mode
when looking up a function.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_FUNCTION_GS_FPM): Add define.
(DEF_SVE_FUNCTION_GS): Redefine against DEF_SVE_FUNCTION_GS_FPM.
* config/aarch64/aarch64-sve-builtins.h (fpm_mode_index): New.
(function_group_info): Add fpm_mode.
(function_instance): Likewise.
(function_instance::operator==): Handle fpm_mode.

aarch64: Add basic svmfloat8_t support to arm_sve.h

This patch adds support for the fp8 related vectors to arm_sve.h. It also
adds support for functions that just treat mfloat8_t as a bag of 8 bits
(reinterpret casts, vector manipulation, loads, stores, element selection,
vector tuples, table lookups, sve<->simd bridge); these functions are
available for fp8 whenever they're available for other 8-bit types.
Arithmetic operations, bit manipulation, conversions are notably absent.

The generated asm is mostly consistent with the _u8 equivalents and this can be
used to validate tests, except where immediates are used. These cannot be
expressed for mf8 and thus we resort to the use of function arguments found in
registers w(0-9).

gcc/
* config/aarch64/aarch64-sve-builtins.cc (TYPES_b_data): Add mf8.
(TYPES_reinterpret1, TYPES_reinterpret): Likewise.
* config/aarch64/aarch64-sve-builtins.def (svmfloat8_t): New type.
(mf8): New type suffix.
* config/aarch64/aarch64-sve-builtins.h (TYPE_mfloat): New
type_class_index.

gcc/testsuite/
* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Test mangling
of svmfloat8_t.
* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise for
__SVMfloat8_t.
* gcc.target/aarch64/sve/acle/asm/clasta_mf8.c: New test.
* gcc.target/aarch64/sve/acle/asm/clastb_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/create2_1.c (create2_mf8): Likewise.
* gcc.target/aarch64/sve/acle/asm/create3_1.c (create_mf8): Likewise.
* gcc.target/aarch64/sve/acle/asm/create4_1.c (create_mf8): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dupq_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ext_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/get_neonq_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/get2_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/get3_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/get4_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/insr_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lasta_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lastb_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1ro_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld2_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld3_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld4_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldff1_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnf1_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ldnt1_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/len_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c
(reinterpret_bf16_mf8_tied1, reinterpret_bf16_mf8_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c
(reinterpret_f16_mf8_tied1, reinterpret_f16_mf8_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c
(reinterpret_f32_mf8_tied1, reinterpret_f32_mf8_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c
(reinterpret_f64_mf8_tied1, reinterpret_f64_mf8_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c
(reinterpret_s16_mf8_tied1, reinterpret_s16_mf8_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c
(reinterpret_s32_mf8_tied1, reinterpret_s32_mf8_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c
(reinterpret_s64_mf8_tied1, reinterpret_s64_mf8_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c
(reinterpret_s8_mf8_tied1, reinterpret_s8_mf8_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c
(reinterpret_u16_mf8_tied1, reinterpret_u16_mf8_untied)
(reinterpret_u16_mf8_x3_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c
(reinterpret_u32_mf8_tied1, reinterpret_u32_mf8_untied)
(reinterpret_u32_mf8_x3_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c
(reinterpret_u64_mf8_tied1, reinterpret_u64_mf8_untied)
(reinterpret_u64_mf8_x3_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c
(reinterpret_u8_mf8_tied1, reinterpret_u8_mf8_untied)
(reinterpret_u8_mf8_x3_untied): Likewise.
* gcc.target/aarch64/sve/acle/asm/rev_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sel_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/set_neonq_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/set2_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/set3_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/set4_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/splice_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st1_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st2_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st3_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/st4_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/stnt1_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/tbl_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/undef2_1.c (mfloat8_t): Likewise.
* gcc.target/aarch64/sve/acle/asm/undef3_1.c (mfloat8_t): Likewise.
* gcc.target/aarch64/sve/acle/asm/undef4_1.c (mfloat8_t): Likewise.
* gcc.target/aarch64/sve/acle/asm/undef_1.c (mfloat8_t): Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2_mf8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_mf8.c: Likewise.
* gcc.target/aarch64/sve/pcs/annotate_1.c (ret_mf8, ret_mf8x2)
(ret_mf8x3, ret_mf8x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_2.c (fn_mf8, fn_mf8x2)
(fn_mf8x3, fn_mf8x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_3.c (fn_mf8, fn_mf8x2)
(fn_mf8x3, fn_mf8x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_4.c (fn_mf8, fn_mf8x2)
(fn_mf8x3, fn_mf8x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_5.c (fn_mf8, fn_mf8x2)
(fn_mf8x3, fn_mf8x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_6.c (fn_mf8, fn_mf8x2)
(fn_mf8x3, fn_mf8x4): Likewise.
* gcc.target/aarch64/sve/pcs/annotate_7.c (fn_mf8, fn_mf8x2)
(fn_mf8x3, fn_mf8x4): Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_mf8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_le_mf8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_be_mf8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_6_le_mf8.c: Likewise.
* gcc.target/aarch64/sve/pcs/gnu_vectors_1.c (mfloat8x32_t): New
typedef.
(mfloat8_callee, mfloat8_caller): New tests.
* gcc.target/aarch64/sve/pcs/gnu_vectors_2.c (mfloat8x32_t): New
typedef.
(mfloat8_callee, mfloat8_caller): New tests.
* gcc.target/aarch64/sve/pcs/return_4_128.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_4_256.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_4_512.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_4_1024.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_4_2048.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_4.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_5_128.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_5_256.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_5_512.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_5_1024.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_5_2048.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_5.c
(CALLER_NON_NUMERIC): Renamed CALLER_BF16 macro.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_6.c (mfloat8_t): New typedef.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_6_128.c (mfloat8_t): New typedef.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_6_256.c (mfloat8_t): New typedef.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_6_512.c (mfloat8_t): New typedef.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_6_1024.c (mfloat8_t): New typedef.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_6_2048.c (mfloat8_t): New typedef.
(callee_mf8, caller_mf8): New tests.
* gcc.target/aarch64/sve/pcs/return_7.c (callee_mf8): New tests.
(caller_mf8): Likewise.
* gcc.target/aarch64/sve/pcs/return_8.c (callee_mf8): Likewise
(caller_mf8): Likewise.
* gcc.target/aarch64/sve/pcs/return_9.c (callee_mf8): Likewise
(caller_mf8): Likewise.
* gcc.target/aarch64/sve/pcs/varargs_2_mf8.c: New tests
* gcc.target/aarch64/sve2/acle/asm/tbl2_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/tbx_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/whilerw_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/whilewr_mf8.c: Likewise.

tree-optimization/115438 - SLP reduction vect vs. bwaves

503.bwaves_r shows a case where it matters that the non-SLP path performs
the reduction adjustment with the initial value as part of the epilogue
rather than including it as part of the initial vector value, since this
allows breaking a critical dependence path.  The following restores this
ability for single-lane SLP.

On Zen2 this turns a 2.5% regression from GCC 14 into a 2.5%
improvement.
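
The two placements of the initial value can be sketched with a scalar model
of a 4-lane sum reduction.  This is an illustration only: the function names
and the lane count are assumptions, and integers are used so both variants
give exactly the same result (bwaves itself is floating-point code).

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Variant 1: the initial value is placed in lane 0 of the accumulator,
   so the vector loop depends on INIT from its first iteration.  */
long
sum_init_in_vector (const int *x, size_t n, long init)
{
  assert (n % LANES == 0);
  long acc[LANES] = { init, 0, 0, 0 };
  for (size_t i = 0; i < n; i += LANES)
    for (int l = 0; l < LANES; l++)
      acc[l] += x[i + l];
  return acc[0] + acc[1] + acc[2] + acc[3];
}

/* Variant 2: the accumulator starts at zero and INIT is added once in
   the epilogue, after the lanes have been reduced.  */
long
sum_init_in_epilogue (const int *x, size_t n, long init)
{
  assert (n % LANES == 0);
  long acc[LANES] = { 0, 0, 0, 0 };
  for (size_t i = 0; i < n; i += LANES)
    for (int l = 0; l < LANES; l++)
      acc[l] += x[i + l];
  return (acc[0] + acc[1] + acc[2] + acc[3]) + init;
}
```

In the second variant the loop no longer waits for the computation that
produces INIT, which is the dependence the message refers to.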

PR tree-optimization/115438
* tree-vect-loop.cc (vect_transform_cycle_phi): For SLP also
try to do the reduction adjustment by the initial value
in the epilogue.

cp: Fix another assumption in the FE about constant vector indices.

This patch adds a change to handle VLA's poly indices.

gcc/cp/ChangeLog:

* decl.cc (reshape_init_array_1): Handle poly indices.

gcc/testsuite/ChangeLog:

* g++.dg/ext/sve-sizeless-1.C: Update test to test initialize error.
* g++.dg/ext/sve-sizeless-2.C: Likewise.

aarch64: Update SVE ACLE tests

This patch updates existing SVE ACLE tests to expect new behaviour wrt SVE ACLE
types, GNU vectors and C/C++ operations.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c: Update test.
* gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/sizeless-1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Likewise.
* gcc.target/aarch64/sve/acle/general/attributes_7.c: Likewise.
* g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C: Likewise.
* g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C: Likewise.

aarch64: Add testcase for C/C++ ops on SVE ACLE types.

This patch adds a test case to cover C/C++ operators on SVE ACLE types. This
does not cover all types, but covers most representative types.

gcc/testsuite:

* gcc.target/aarch64/sve/acle/general/cops.c: New test.

c: Fix constructor bounds checking for VLA and construct VLA vector constants

This patch adds support for checking bounds of SVE ACLE vector initialization
constructors. It also adds support to construct vector constant from init
constructors.
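
As a rough sketch of the bound being checked: the minimum SVE vector length
is 128 bits, so the number of initializers that fits every vector length
depends only on the element width.  The helper name below is hypothetical
and the code is not taken from c-typeck.cc.

```c
#include <assert.h>

/* Minimum SVE vector length in bits.  */
#define SVE_MIN_BITS 128

/* Hypothetical helper: the largest number of initializers that fits in
   an SVE vector of ELEMENT_BITS-wide elements at the minimum vector
   length.  */
unsigned
max_initializers (unsigned element_bits)
{
  assert (element_bits != 0 && SVE_MIN_BITS % element_bits == 0);
  return SVE_MIN_BITS / element_bits;
}
```

Under this model a vector of 32-bit elements accepts at most four
initializers and a vector of 8-bit elements at most sixteen.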

gcc/c/ChangeLog:

* c-typeck.cc (process_init_element): Add check to restrict
constructor length to the minimum vector length allowed.

gcc/ChangeLog:

* tree.cc (build_vector_from_ctor): Add support to construct VLA vector
constants from init constructors.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/general-c/sizeless-1.c: Update test to
test initialize error.
* gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Likewise.

gimple: Handle variable-sized vectors in BIT_FIELD_REF

Handle variable-sized vectors for BIT_FIELD_REF canonicalization.

gcc/ChangeLog:

* gimple-fold.cc (maybe_canonicalize_mem_ref_addr): Handle variable
sized vector types in BIT_FIELD_REF canonicalization.
* tree-cfg.cc (verify_types_in_gimple_reference): Change object-size
checking for BIT_FIELD_REF to report an error only for offsets that are
known_gt the object size.  Out-of-range offsets can happen in the case of
indices that reference VLA SVE vector elements that may be outside the
minimum vector size range, and therefore maybe_gt is not appropriate
here.
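
The difference between the two predicates can be sketched with a toy
two-coefficient polynomial.  This is a simplified stand-in for GCC's
poly_int, with hypothetical names; the value is c0 + c1 * x for an unknown
x >= 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy polynomial c0 + c1 * x, where x >= 0 is unknown at compile time.  */
struct poly { long c0, c1; };

/* Evaluate P for a concrete X.  */
long
poly_at (struct poly p, long x)
{
  assert (x >= 0);
  return p.c0 + p.c1 * x;
}

/* True if A > B for at least one x >= 0.  */
bool
toy_maybe_gt (struct poly a, struct poly b)
{
  return a.c0 > b.c0 || a.c1 > b.c1;
}

/* True if A > B for every x >= 0.  */
bool
toy_known_gt (struct poly a, struct poly b)
{
  return a.c0 > b.c0 && a.c1 >= b.c1;
}
```

For an access ending at bit 192 of a vector of 128 + 128x bits, toy_maybe_gt
holds (x = 0) but toy_known_gt does not (x = 1 gives 256 bits), so a maybe_gt
check would reject an access that is valid for larger vectors.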

c: Range-check indexing of SVE ACLE vectors

This patch adds a check for non-GNU vectors to warn that the index is outside
the range of a fixed vector size. For VLA vectors, we don't diagnose.

gcc/c-family/ChangeLog:

* c-common.cc (convert_vector_to_array_for_subscript): Add
range-check for target vector types.

aarch64: Make C/C++ operations possible on SVE ACLE types.

This patch changes the TYPE_INDIVISIBLE flag to 0 to enable SVE ACLE types to be
treated as GNU vectors and have the same semantics with operations that are
defined on GNU vectors.
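
The GNU vector semantics referred to here can be seen with a fixed-size
vector, which compiles on any target with GCC or Clang.  This is only an
illustration of those semantics; the SVE equivalent would use a type such
as svint32_t and needs an SVE-enabled compiler, which is assumed rather
than shown.

```c
#include <assert.h>

/* A fixed-size GNU vector of four ints (GCC/Clang extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise a * b + c using ordinary C operators.  */
v4si
muladd (v4si a, v4si b, v4si c)
{
  return a * b + c;
}

/* Read one lane with the subscript operator.  */
int
lane (v4si v, int i)
{
  assert (i >= 0 && i < 4);
  return v[i];
}
```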

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc (register_builtin_types): Flip
TYPE_INDIVISIBLE flag for SVE ACLE vector types.

aarch64: Fix ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS

This patch enables ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS to indicate
that C/C++ language operations are available natively on SVE ACLE types.

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_SVE_VECTOR_OPERATORS.

aarch64: add ACLE macro _CHKFEAT_GCS

gcc/ChangeLog:
* config/aarch64/arm_acle.h (_CHKFEAT_GCS): New.

libgcc/ChangeLog:

* config/aarch64/aarch64-unwind.h (_Unwind_Frames_Extra): Update.
(_Unwind_Frames_Increment): Update.

Reviewed-by: Richard Sandiford <richard.sandiford@arm.com>

testsuite: Add check vect_unpack for pr117776.cc [PR117844]

I had missed that you need to check vect_unpack if you are
vectorizing a conversion from char to int.
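
A loop of the following shape is the kind of conversion meant here.  It is
a hypothetical illustration, not the content of pr117776.cc: each narrow
element has to be widened to an int, so vectorizing it needs some form of
unpacking from narrow to wide lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Widen each signed char in IN to an int in OUT.  */
void
widen_chars (int *restrict out, const signed char *restrict in, size_t n)
{
  assert (out != NULL && in != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = in[i];
}
```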

Pushed as obvious after a quick test.

PR testsuite/117844
gcc/testsuite/ChangeLog:

* g++.dg/vect/pr117776.cc: Check vect_unpack.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add myself to write after approval.

gimple-fold: Fix up type_has_padding_at_level_p [PR117065]

The following testcase used to ICE on the trunk after the "clear small
object if it has padding" optimization was added and before my r15-5746
change; now it no longer does, but only because type_has_padding_at_level_p
isn't called on the testcase.

Though, as the testcase shows, structures/unions in which one or more
members have erroneous types can have error_mark_node as the TREE_TYPE of
the FIELD_DECL, on which we can crash.

E.g. the __builtin_clear_padding lowering just ignores those:
            if (TREE_TYPE (field) == error_mark_node)
              continue;
and
                if (ftype == error_mark_node)
                  continue;
It doesn't matter much what exactly we do for those cases, as we are going
to fail the compilation anyway, but we shouldn't crash.
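
The idea of skipping erroneous members can be sketched with a toy model of
a union.  All names are hypothetical and the real code works on trees; in
particular, treating a non-empty union with no valid member as all padding
is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a union member.  HAS_ERROR_TYPE stands in for a field
   whose TREE_TYPE is error_mark_node.  */
struct toy_field
{
  bool has_error_type;
  size_t size;     /* Meaningless when has_error_type is set.  */
};

/* Return true if a union of UNION_SIZE bytes has padding at this level:
   some valid member is smaller than the union, or a non-empty union has
   no valid member at all.  Erroneous members are skipped instead of
   having their size inspected.  */
bool
toy_union_has_padding (const struct toy_field *f, size_t n, size_t union_size)
{
  assert (f != NULL || n == 0);
  bool any_fields = false;
  for (size_t i = 0; i < n; i++)
    {
      if (f[i].has_error_type)
        continue;
      if (f[i].size != union_size)
        return true;
      any_fields = true;
    }
  return !any_fields && union_size != 0;
}
```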

So, the following patch ignores those in type_has_padding_at_level_p.
For RECORD_TYPE, we already return if !DECL_SIZE (f) which I think should
cover already the erroneous fields (and we don't use TYPE_SIZE on those).

2024-11-29  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/117065
* gimple-fold.cc (type_has_padding_at_level_p) <case UNION_TYPE>:
Also continue if f has error_mark_node type.

* gcc.dg/pr117065.c: New test.

__builtin_prefetch fixes [PR117608]

The r15-4833-ge9ab41b79933 patch contained, among tons of config/i386
specific changes, an important change to the generic code as well, allowing
2 as a valid value of the second argument of __builtin_prefetch:
-  /* Argument 1 must be either zero or one.  */
-  if (INTVAL (op1) != 0 && INTVAL (op1) != 1)
+  /* Argument 1 must be 0, 1 or 2.  */
+  if (INTVAL (op1) < 0 || INTVAL (op1) > 2)

But the patch failed to document that change in __builtin_prefetch
documentation, and more importantly didn't adjust any of the other
backends to deal with it (my understanding is the expected behavior
is that 2 will be silently handled as 0 unless backends have some
more specific way).  Some of the backends would ICE on it, in some
cases gcc_assert failures/gcc_unreachable, in other cases crash later
(e.g. accessing arrays with that value as index and due to accessing
garbage after the array crashing at final.cc time), others treated 2
silently as 0, others treated 2 silently as 1.
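
A minimal sketch of the two validity rules and of the fallback described
above, written as plain C rather than RTL.  The function names are
hypothetical; only the two conditions are taken from the quoted diff, and
the mapping of 2 to 0 reflects my understanding of the expected behavior
for backends without more specific support.

```c
#include <stdbool.h>

/* Old rule from the quoted diff: argument 1 must be either zero or one.  */
static bool
toy_rw_valid_old (long rw)
{
  return !(rw != 0 && rw != 1);
}

/* New rule from the quoted diff: argument 1 must be 0, 1 or 2.  */
static bool
toy_rw_valid_new (long rw)
{
  return !(rw < 0 || rw > 2);
}

/* Hypothetical helper for a backend with no special support for 2:
   handle it silently as a read prefetch (0).  */
static long
toy_rw_for_generic_backend (long rw)
{
  return rw == 2 ? 0 : rw;
}
```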

And even in the i386 backend there were bugs which caused ICEs.
The patch added the if (write == 0) and write 2 handling into the
if (write == 1) body (badly indented, maybe that is the reason)
rather than into the else side, so those conditions would always be false.

The new *prefetch_rst2 define_insn only accepts parameters 2 1
(i.e. read-shared with moderate degree of locality), so in order
not to ICE the patch uses it only for __builtin_prefetch (ptr, 2, 1);
or __builtin_ia32_prefetch (ptr, 2, 1, 0); and not for other values
of the parameter.  If that isn't what we want and we want it to be used
also for all or some of __builtin_prefetch (ptr, 2, {0,2,3}); and
corresponding __builtin_ia32_prefetch, maybe the define_insn could match
other values.
And there was another problem that -mno-mmx -mno-sse -mmovrs compilation
would ICE on most of the prefetches, so I had to add the FAIL; cases.
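
The selection rule described above can be sketched in plain C as follows.
This is a toy model, not the i386.md expander: the enum and
toy_select_prefetch are hypothetical names, have_movrs stands for
TARGET_MOVRS, and the FAIL; cases for -mno-mmx -mno-sse are not modeled.

```c
#include <stdbool.h>

/* Hypothetical labels for the kind of prefetch that gets emitted.  */
enum toy_prefetch_kind
{
  TOY_PREFETCH_READ,
  TOY_PREFETCH_WRITE,
  TOY_PREFETCH_RST2
};

/* Pick the prefetch kind for a given read/write argument and locality.
   The read-shared form is used only for rw == 2 with locality 1 when
   MOVRS is available; any other rw == 2 request degrades to a read.  */
static enum toy_prefetch_kind
toy_select_prefetch (int rw, int locality, bool have_movrs)
{
  if (rw == 2)
    {
      if (have_movrs && locality == 1)
        return TOY_PREFETCH_RST2;
      rw = 0;
    }
  return rw == 1 ? TOY_PREFETCH_WRITE : TOY_PREFETCH_READ;
}
```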

2024-11-29  Jakub Jelinek  <jakub@redhat.com>

PR target/117608
* doc/extend.texi (__builtin_prefetch): Document that second
argument may be also 2 and its meaning.
* config/i386/i386.md (prefetch): Remove unreachable code.
Clear write and set operands[1] to const0_rtx if !TARGET_MOVRS or
if locality is not 1.  Formatting fixes.
* config/i386/i386-expand.cc (ix86_expand_builtin): Use IN_RANGE.
Call gen_prefetch even for TARGET_MOVRS.
* config/alpha/alpha.md (prefetch): Treat read_or_write 2 like 0.
* config/mips/mips.md (prefetch): Likewise.
* config/arc/arc.md (prefetch_1, prefetch_2, prefetch_3): Likewise.
* config/riscv/riscv.md (prefetch): Likewise.
* config/loongarch/loongarch.md (prefetch): Likewise.
* config/sparc/sparc.md (prefetch): Likewise.  Use IN_RANGE.
* config/ia64/ia64.md (prefetch): Likewise.
* config/pa/pa.md (prefetch): Likewise.
* config/aarch64/aarch64.md (prefetch): Likewise.
* config/rs6000/rs6000.md (prefetch): Likewise.

* gcc.dg/builtin-prefetch-1.c (good): Add tests with second argument
2.
* gcc.target/i386/pr117608-1.c: New test.
* gcc.target/i386/pr117608-2.c: New test.

fortran: Add default to switch in gfc_trans_transfer [PR117843]

This fixes a bootstrap failure due to a warning on enum values not being
handled. In this case, the switch only checks two values and the rest
are not handled, so adding a default case fixes the issue.
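
For illustration only, the pattern looks roughly like this in C.  The enum
and function are hypothetical and unrelated to the actual code in
gfc_trans_transfer; the sketch just shows a switch that cares about two
enumerators and uses default for the rest, which is what keeps -Wswitch
quiet.

```c
/* Hypothetical enumeration with more values than the switch cares about.  */
enum toy_kind
{
  TOY_SCALAR,
  TOY_ARRAY,
  TOY_SECTION,
  TOY_OTHER
};

/* Return 1 for the two kinds that need special handling, 0 otherwise.  */
static int
toy_needs_special_handling (enum toy_kind kind)
{
  switch (kind)
    {
    case TOY_ARRAY:
    case TOY_SECTION:
      return 1;
    default:
      /* Without a default label, -Wswitch reports every enumerator that
         has no case, and -Werror turns that into a build failure.  */
      break;
    }
  return 0;
}
```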

Pushed as obvious.

PR fortran/117843
gcc/fortran/ChangeLog:

* trans-io.cc (gfc_trans_transfer): Add default case.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

ifcombine: avoid unsound forwarder-enabled combinations [PR117723]

When ifcombining contiguous blocks, we can follow forwarder blocks and
reverse conditions to enable combinations, but when there are
intervening blocks, we have to constrain ourselves to paths to the
exit that share the PHI args with all intervening blocks.
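
As background, here is a source-level sketch of the basic idea behind
combining contiguous tests, including one with a reversed condition.  It is
only an analogy: the pass works on GIMPLE basic blocks and PHI arguments,
not on C source, and the function names are hypothetical.  Forwarder and
intervening blocks are not modeled.

```c
/* Two contiguous tests, both leading to the same "else" result.  */
static int
toy_nested (int a, int b)
{
  if (a > 0)
    if (b > 0)
      return 1;
  return 0;
}

/* The same logic with the outer condition reversed.  */
static int
toy_inverted (int a, int b)
{
  if (a <= 0)
    goto out;
  if (b > 0)
    return 1;
out:
  return 0;
}

/* What a combined single test computes for both shapes above.  */
static int
toy_combined (int a, int b)
{
  if ((a > 0) & (b > 0))
    return 1;
  return 0;
}
```

The combination is only sound because both failing paths reach the same
exit with the same value, which is the property the PHI argument check is
there to guarantee.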

Not considering forwarders at all when intervening blocks are present
would match the preexisting test, but we can do better: record whether
a forwarded path corresponds to the outer block's exit path, and
refuse to combine through any path other than the one that was
verified as corresponding.  The latter is what this patch implements.

While at it, I've fixed some typos and introduced early testing before
computing the exit path, to skip that computation when it would be
wasteful or when skipping it can enable other sound combinations.

for  gcc/ChangeLog

PR tree-optimization/117723
* tree-ssa-ifcombine.cc (tree_ssa_ifcombine_bb): Record
forwarder blocks in path to exit, and stick to them.  Avoid
computing the exit if obviously not needed, and if that
enables additional optimizations.
(tree_ssa_ifcombine_bb_1): Fix typos.

for  gcc/testsuite/ChangeLog

PR tree-optimization/117723
* gcc.dg/torture/ifcmb-1.c: New.

Daily bump.

Fortran: Check for impure subroutine.

PR fortran/117765

gcc/fortran/ChangeLog:

* resolve.cc (pure_subroutine): Check for an impure subroutine
call in a BLOCK construct nested within a DO CONCURRENT block.

gcc/testsuite/ChangeLog:

* gfortran.dg/impure_fcn_do_concurrent.f90: Update test to catch
calls to an impure subroutine.

Fortran: fix crash with bounds check writing array section [PR117791]

PR fortran/117791

gcc/fortran/ChangeLog:

* trans-io.cc (gfc_trans_transfer): When an array index depends on
a function evaluation or an expression, do not use optimized array
I/O of an array section and fall back to normal scalarization.

gcc/testsuite/ChangeLog:

* gfortran.dg/bounds_check_array_io.f90: New test.

libstdc++: Use std::_Destroy in std::stacktrace

This benefits from the optimizations in std::_Destroy which avoid doing
any work when using std::allocator.

libstdc++-v3/ChangeLog:

* include/std/stacktrace (basic_stacktrace::_M_impl::_M_resize):
Use std::_Destroy to destroy removed elements.