git.ipfire.org Git - thirdparty/gcc.git/log

Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning

... in order to demonstrate unexpected behavior (XFAILed here).

PR target/104957
gcc/testsuite/
* gcc.target/nvptx/alias-1.c: Enhance assembler scanning.
* gcc.target/nvptx/alias-2.c: Likewise.
* gcc.target/nvptx/alias-3.c: Likewise.
* gcc.target/nvptx/alias-4.c: Likewise.
* gcc.target/nvptx/alias-to-alias-1.c: Likewise.

Fix 'gcc.target/nvptx/alias-2.c' comment

PR target/104957
gcc/testsuite/
* gcc.target/nvptx/alias-2.c: Fix comment.

Move from 'gcc.target/nvptx/nvptx.exp' into 'target-supports.exp' additions for nvptx target

gcc/testsuite/
* gcc.target/nvptx/nvptx.exp
(check_effective_target_default_ptx_isa_version_at_least)
(check_effective_target_default_ptx_isa_version_at_least_6_0)
(check_effective_target_runtime_ptx_isa_version_at_least)
(check_effective_target_runtime_ptx_alias)
(add_options_for_ptx_alias): Move...
* lib/target-supports.exp
(check_nvptx_default_ptx_isa_version_at_least)
(check_effective_target_nvptx_default_ptx_isa_version_at_least_6_0)
(check_nvptx_runtime_ptx_isa_version_at_least)
(check_effective_target_nvptx_runtime_alias_ptx)
(add_options_for_nvptx_alias_ptx): ... here.
* gcc.target/nvptx/alias-1.c: Adjust.
* gcc.target/nvptx/alias-2.c: Likewise.
* gcc.target/nvptx/alias-3.c: Likewise.
* gcc.target/nvptx/alias-4.c: Likewise.
* gcc.target/nvptx/alias-to-alias-1.c: Likewise.
* gcc.target/nvptx/alias-weak-1.c: Likewise.
* gcc.target/nvptx/uniform-simt-5.c: Likewise.
gcc/
* doc/sourcebuild.texi (Effective-Target Keywords): Document
'nvptx_default_ptx_isa_version_at_least_6_0',
'nvptx_runtime_alias_ptx'.
(Add Options): Document 'nvptx_alias_ptx'.

c++: Add missing auto_diagnostic_groups

This patch goes through all .cc files in gcc/cp and adds in any
auto_diagnostic_groups that seem to be missing by looking for any
'inform' calls that aren't grouped with their respective error/warning.
Now with SARIF output support this seems to be a bit more important.

The patch isn't complete; I've tried to also track helper functions used
for diagnostics to group them, but some may have been missed.
Additionally there are a few functions that are definitely missing
groupings but I wasn't able to see an obvious way to add them without
potentially grouping together unrelated messages.

This list includes:

- lazy_load_{binding,pendings} "during load of {binding,pendings} for"
- cp_finish_decomp "in initialization of structured binding variable"
- require_deduced_type "using __builtin_source_location"
- convert_nontype_argument "in template argument for type %qT"
- coerce_template_params "so any instantiation with a non-empty parameter pack"
- tsubst_default_argument "when instantiating default argument"
- invalid_nontype_parm_type_p "invalid template non-type parameter"

gcc/cp/ChangeLog:

* class.cc (add_method): Add missing auto_diagnostic_group.
(handle_using_decl): Likewise.
(maybe_warn_about_overly_private_class): Likewise.
(check_field_decl): Likewise.
(check_field_decls): Likewise.
(resolve_address_of_overloaded_function): Likewise.
(note_name_declared_in_class): Likewise.
* constraint.cc (associate_classtype_constraints): Likewise.
(diagnose_trait_expr): Clean up whitespace.
* coroutines.cc (find_coro_traits_template_decl): Add missing
auto_diagnostic_group.
(coro_promise_type_found_p): Likewise.
(coro_diagnose_throwing_fn): Likewise.
* cvt.cc (build_expr_type_conversion): Likewise.
* decl.cc (validate_constexpr_redeclaration): Likewise.
(duplicate_function_template_decls): Likewise.
(duplicate_decls): Likewise.
(lookup_label_1): Likewise.
(check_previous_goto_1): Likewise.
(check_goto_1): Likewise.
(make_typename_type): Likewise.
(make_unbound_class_template): Likewise.
(check_tag_decl): Likewise.
(start_decl): Likewise.
(maybe_commonize_var): Likewise.
(check_for_uninitialized_const_var): Likewise.
(reshape_init_class): Likewise.
(check_initializer): Likewise.
(cp_finish_decl): Likewise.
(find_decomp_class_base): Likewise.
(cp_finish_decomp): Likewise.
(expand_static_init): Likewise.
(grokfndecl): Likewise.
(grokdeclarator): Likewise.
(check_elaborated_type_specifier): Likewise.
(lookup_and_check_tag): Likewise.
(xref_tag): Likewise.
(cxx_simulate_enum_decl): Likewise.
(finish_function): Likewise.
* decl2.cc (check_classfn): Likewise.
(record_mangling): Likewise.
(mark_used): Likewise.
* error.cc (qualified_name_lookup_error): Likewise.
* except.cc (build_throw): Likewise.
* init.cc (get_nsdmi): Likewise.
(diagnose_uninitialized_cst_or_ref_member_1): Likewise.
(warn_placement_new_too_small): Likewise.
(build_new_1): Likewise.
(build_vec_delete_1): Likewise.
(build_delete): Likewise.
* lambda.cc (add_capture): Likewise.
(add_default_capture): Likewise.
* lex.cc (unqualified_fn_lookup_error): Likewise.
* method.cc (synthesize_method): Likewise.
(defaulted_late_check): Likewise.
* module.cc (trees_in::is_matching_decl): Likewise.
(trees_in::read_enum_def): Likewise.
(module_state::check_not_purview): Likewise.
(module_state::deferred_macro): Likewise.
(module_state::read_config): Likewise.
(module_state::check_read): Likewise.
(declare_module): Likewise.
(init_modules): Likewise.
* name-lookup.cc (diagnose_name_conflict): Likewise.
(lookup_using_decl): Likewise.
(set_decl_namespace): Likewise.
(finish_using_directive): Likewise.
(push_namespace): Likewise.
(add_imported_namespace): Likewise.
* parser.cc (cp_parser_check_for_definition_in_return_type): Likewise.
(cp_parser_userdef_numeric_literal): Likewise.
(cp_parser_nested_name_specifier_opt): Likewise.
(cp_parser_new_expression): Likewise.
(cp_parser_binary_expression): Likewise.
(cp_parser_lambda_introducer): Likewise.
(cp_parser_module_declaration): Likewise.
(cp_parser_import_declaration): Likewise, removing gotos to
support this.
(cp_parser_declaration): Add missing auto_diagnostic_group.
(cp_parser_decl_specifier_seq): Likewise.
(cp_parser_template_id): Likewise.
(cp_parser_template_name): Likewise.
(cp_parser_explicit_specialization): Likewise.
(cp_parser_placeholder_type_specifier): Likewise.
(cp_parser_elaborated_type_specifier): Likewise.
(cp_parser_enum_specifier): Likewise.
(cp_parser_asm_definition): Likewise.
(cp_parser_init_declarator): Likewise.
(cp_parser_direct_declarator): Likewise.
(cp_parser_class_head): Likewise.
(cp_parser_member_declaration): Likewise.
(cp_parser_lookup_name): Likewise.
(cp_parser_explicit_template_declaration): Likewise.
(cp_parser_check_class_key): Likewise.
* pt.cc (maybe_process_partial_specialization): Likewise.
(determine_specialization): Likewise.
(check_for_bare_parameter_packs): Likewise.
(check_template_shadow): Likewise.
(process_partial_specialization): Likewise.
(push_template_decl): Likewise.
(redeclare_class_template): Likewise.
(convert_nontype_argument_function): Likewise.
(check_valid_ptrmem_cst_expr): Likewise.
(convert_nontype_argument): Likewise.
(convert_template_argument): Likewise.
(coerce_template_parms): Likewise.
(tsubst_qualified_id): Likewise.
(tsubst_expr): Likewise.
(most_specialized_partial_spec): Likewise.
(do_class_deduction): Likewise.
(do_auto_deduction): Likewise.
* search.cc (lookup_member): Likewise.
* semantics.cc (finish_non_static_data_member): Likewise.
(process_outer_var_ref): Likewise.
(finish_id_expression_1): Likewise.
(finish_offsetof): Likewise.
(omp_reduction_lookup): Likewise.
(finish_omp_clauses): Likewise.
* tree.cc (check_abi_tag_redeclaration): Likewise.
(check_abi_tag_args): Likewise.
* typeck.cc (invalid_nonstatic_memfn_p): Likewise.
(complain_about_unrecognized_member): Likewise.
(finish_class_member_access_expr): Likewise.
(error_args_num): Likewise.
(warn_for_null_address): Likewise.
(cp_build_binary_op): Likewise.
(build_x_unary_op): Likewise.
(cp_build_unary_op): Likewise.
(build_static_cast): Likewise.
(cp_build_modify_expr): Likewise.
(get_delta_difference): Likewise.
(convert_for_assignment): Widen scope of auto_diagnostic_group.
(check_return_expr): Add missing auto_diagnostic_group.
* typeck2.cc (cxx_incomplete_type_diagnostic): Likewise.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Marek Polacek <polacek@redhat.com>

[AARCH64] adjust gcc.target/aarch64/sve/mask_gather_load_7.c

The following adjusts the scan-assembler to also allow predicate
registers p8-15 to be used for the destination of the compares.
I see that code generation with a pending vectorizer patch (the
only assembler change is different predicate register allocation).

* gcc.target/aarch64/sve/mask_gather_load_7.c: Allow
p8-15 to be used for the destination of the compares.

libsanitizer: On aarch64 use hint #34 in prologue of libsanitizer functions

When gcc is built with -mbranch-protection=standard, running sanitized
programs doesn't work properly on bti enabled kernels.

This has been fixed upstream with
https://github.com/llvm/llvm-project/pull/84061

The following patch cherry picks that from upstream.

For trunk we should eventually do a full merge from upstream, but I'm hoping
they will first fix up the _BitInt libubsan support mess.

2024-09-05 Jakub Jelinek <jakub@redhat.com>

* sanitizer_common/sanitizer_asm.h: Cherry-pick llvm-project revision
1c792d24e0a228ad49cc004a1c26bbd7cd87f030.
* interception/interception.h: Likewise.

middle-end: have vect_recog_cond_store_pattern use pattern statement for cond if available

When vectorizing a conditional operation we rely on the bool_recog pattern to
hit and convert the bool of the operand to a valid mask.

However we are currently not using the converted operand as this is in a pattern
statement. This change updates it to look at the actual statement to be
vectorized so we pick up the pattern.

Note that there are no tests here since vectorization will fail until we
correctly lower all boolean conditionals early.

Tests for these are in the next patch, namely vect-conditional_store_5.c and
vect-conditional_store_6.c. And the existing vect-conditional_store_[1-4].c
checks that the other cases are still handled correctly.

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_cond_store_pattern): Use pattern
statement.

testsuite: remove -fwrapv from signbit-5.c

The meaning of the testcase was changed by passing it -fwrapv. The reason for
the test failures on some platform was because the test was testing some
implementation defined behavior wrt INT_MIN in generic code.

Instead of using -fwrapv this just removes the border case from the test so
all the values now have a defined semantic. It still relies on the handling of
shifting a negative value right, but that wasn't changed with -fwrapv anyway.

The -fwrapv case is being handled already by other testcases.

gcc/testsuite/ChangeLog:

* gcc.dg/signbit-5.c: Remove -fwrapv and change INT_MIN to INT_MIN+1.

docs: double mention of armv9-a.

The list of available architecture for Arm is incorrectly listing armv9-a twice.
This removes the duplicate armv9-a enumeration from the part of the list having
M-profile targets.

gcc/ChangeLog:

* doc/invoke.texi: Remove duplicate armv9-a mention.

vrp: Fix up diagnostics wording

I've noticed non-standard wording of this diagnostics when looking at
a miscompilation with --param=vrp-block-limit=0.

Diagnostics generally shouldn't start with uppercase letter (unless
the upper case would appear also in the middle of a sentence) and shouldn't
be separate sentences with dot as separator, ; is IMHO more frequently used.

2024-09-05 Jakub Jelinek <jakub@redhat.com>

* tree-vrp.cc (pass_vrp::execute): Start diagnostics with
lowercase u rather than capital U, use semicolon instead of dot.

RISC-V: Lookup reversely in riscv_select_multilib_by_abi

When use --print-multi-os-dir or -print-multi-directory, gcc outputs
different values with full -march option and the base one only.

$ ./gcc/xgcc --print-multi-os-dir -mabi=lp64d -march=rv64gc
lib64/lp64d

$ ./gcc/xgcc --print-multi-os-dir -mabi=lp64d -march=rv64gc_zba
.

The reason is that in multilib.h, the fallback value of multilib
is listed as the 1st one in `multilib_raw[]`.

gcc
* common/config/riscv/riscv-common.cc(riscv_select_multilib_by_abi):
look up reversely as the fallback path is listed as the 1st one.

testsuite: Fix xorsign.c, vect-double-2.c fails with -march=x86-64-v2

These testcases raise fails with -march=x86-64-v2, so add -mno-sse4 to avoid
these unexpected fails.

gcc/testsuite/ChangeLog:

PR testsuite/116608
* gcc.target/i386/vect-double-2.c: Add extra option -mno-sse4
* gcc.target/i386/xorsign.c: Ditto.

ada: Add bypass for internal fields on strict-alignment platforms

This is required to support misalignment of tagged types in legacy code.

gcc/ada/

* gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Add bypass
for internal fields on strict-alignment platforms.

ada: Streamline handling of low-level peculiarities of record field layout

This factors out the interface to the low-level field layout machinery.

gcc/ada/

* gcc-interface/gigi.h (default_field_alignment): New function.
* gcc-interface/misc.cc: Include tm_p header file.
(default_field_alignment): New function.
* gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Replace
previous alignment klduge with call to default_field_alignment.
* gcc-interface/utils.cc (finish_record_type): Likewise for the
alignment based on which DECL_BIT_FIELD should be cleared.

ada: Remove unused parameters in validity checking routine

Code cleanup; semantics is unaffected.

gcc/ada/

* exp_util.ads, exp_util.adb (Duplicate_Subexpr_No_Checks):
Remove parameters, which are no longer used.

ada: Integrate new diagnostics in the frontend

Integrate diagnostic messages using the new implementation to the codebase.

New diagnostic implementation uses GNAT.Lists as a building
block. Tampering checks that were initially implemented
for those lists are not critical for this implementation and
they lead to overly complex code.

Add a generic parameter Tampering_Checks to control whether
the tempering checks should be applied for the lists.
Make tampering checks conditional for GNAT.Lists

gcc/ada/

* par-endh.adb: add call to new diagnostic for end loop errors.
* sem_ch13.adb: add call to new diagnostic for default iterator
error and record representation being too late.
* sem_ch4.adb: Add new diagnostic for wrong operands.
* sem_ch9.adb: Add new diagnostic for a Lock_Free warning.
* libgnat/g-lists.adb (Ensure_Unlocked): Make checks for tampering
conditional.
* libgnat/g-lists.ads: Add parameter Tampering_Checks to control
whether tampering checks should be executed.
* backend_utils.adb: Add new gcc switches
'-fdiagnostics-format=sarif-file' and
'-fdiagnostics-format=sarif-stderr'.
* debug.adb: document -gnatd_D switch.
* diagnostics-brief_emitter.adb: New package for displaying
diagnostic messages in a compact manner.
* diagnostics-brief_emitter.ads: Same as above.
* diagnostics-constructors.adb: New pacakge for providing simpler
constructor methods for new diagnostic objects.
* diagnostics-constructors.ads: Same as above.
* diagnostics-converter.adb: New package for converting old
Error_Msg_Object-s to Diagnostic_Types.
* diagnostics-converter.ads: Same as above.
* diagnostics-json_utils.adb: Package for utility methods related
to emitting JSON.
* diagnostics-json_utils.ads: Same as above.
* diagnostics-pretty_emitter.adb: New package for displaying
diagnostic messages in a more elaborate manner.
* diagnostics-pretty_emitter.ads: Same as above.
* diagnostics-repository.adb: New package for collecting all
created error messages.
* diagnostics-repository.ads: Same as above.
* diagnostics-sarif_emitter.adb: New pacakge for converting all of
the diagnostics into a report in the SARIF format.
* diagnostics-sarif_emitter.ads: Same as above.
* diagnostics-switch_repository.adb: New package containing the
definitions for all of the warninging switches.
* diagnostics-switch_repository.ads: Same as above.
* diagnostics-utils.adb: Contains various utility methods for the
diagnostic pacakges.
* diagnostics-utils.ads: Same as above.
* diagnostics.adb: Contains the definitions and common functions
for all the new diagnostics objects.
* diagnostics.ads: Same as above.
* errout.adb: Relocate the old implementations for brief and
pretty printing the diagnostic messages and the entrypoint to the
new implementation if a debug switch is used.
* errout.ads: Improve documentation. Make Set_Msg_Text publicly
available.
* opt.ads: Add the flag SARIF_File which controls whether the
diagnostic messages should be printed to a file in the SARIF
format. Add the flag SARIF_Output to control whether the
diagnostic messages should be printed to std-err in the SARIF
format.
* gcc-interface/Make-lang.in: Add new pacakages to the object
list.
* gcc-interface/Makefile.in: Add new pacakages to the object list.

ada: Binder respects Ada version for checksum of runtime files

The parsing to compute the checksums of runtime files (within the
binder) was done using the default Ada version (Ada 2012 currently),
while the creation of the checksum, when the runtime files are
compiled, is performed in a more recent Ada version (Ada 2022
currently). This change forces the checksum computation for runtime
files to be done with the same Ada version as when they were created.

gcc/ada/

* ali-util.adb (Get_File_Checksum): Force the parsing for
the checksum computation of runtime files to be done in
the corresponding recent Ada version.

ada: Tweak assertions in Inline.Cannot_Inline

The purpose of this patch is to silence a GNATSAS report.

gcc/ada/

* inline.adb (Cannot_Inline): Remove assertion.
* inline.ads (Cannot_Inline): Add precondition.

Handle unused-only-live stmts in SLP discovery

The following adds SLP discovery for roots that are only live but
otherwise unused. These are usually inductions. This allows a
few more testcases to be handled fully with SLP, for example
gcc.dg/vect/no-scevccp-pr86725-1.c

* tree-vect-slp.cc (vect_analyze_slp): Analyze SLP for live
but otherwise unused defs.

Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'

..., such that also for repeated 'NEXT_PASS', 'PUSH_INSERT_PASSES_WITHIN' for a
given 'PASS', the 'PUSH_INSERT_PASSES_WITHIN' applies to the preceeding
'NEXT_PASS', and not unconditionally applies to the first 'NEXT_PASS'.

gcc/
* gen-pass-instances.awk: Handle 'PUSH_INSERT_PASSES_WITHIN'.
* pass_manager.h (PUSH_INSERT_PASSES_WITHIN): Adjust.
* passes.cc (PUSH_INSERT_PASSES_WITHIN): Likewise.

[PATCH] RISC-V: Make the setCC/REE tests robust to instruction selection

These tests were checking that the output of the setCC instruction was bit
flipped, but it looks like they're really designed to test that
redundant sign extension elimination fires on conditionals from function
inputs. Jeff just posed a patch to clean this code up with trips up on
the arbitrary xori/snez instruction selection decision changing, so
let's just robustify the tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sge.c: Adjust regex to match the input.
* gcc.target/riscv/sgeu.c: Likewise.
* gcc.target/riscv/sle.c: Likewise.
* gcc.target/riscv/sleu.c: Likewise.

i386: Support partial vectorized FMA for V2BF/V4BF

This patch introduces support for vectorized FMA operations for bf16 types in
V2BF and V4BF modes on the i386 architecture. New mode iterators and
define_expand entries for fma, fnma, fms, and fnms operations are added in
mmx.md, enhancing the i386 backend to handle these complex arithmetic operations.

gcc/ChangeLog:

* config/i386/mmx.md (TARGET_MMX_WITH_SSE): New mode iterator VBF_32_64
(fma<mode>4): define_expand for V2BF/V4BF fma<mode>4.
(fnma<mode>4): define_expand for V2BF/V4BF fnma<mode>4.
(fms<mode>4): define_expand for V2BF/V4BF fms<mode>4.
(fnms<mode>4): define_expand for V2BF/V4BF fnms<mode>4.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: New test.

Match: Fix ordered and nonequal

Need to add :c for bit_and, because bit_and is commutative. And is (ltgt @0 @1)
is simpler than (bit_not (uneq @0 @1)).

gcc/ChangeLog:

* match.pd: Fix match for (bit_and (ordered @0 @1) (ne @0 @1)).

gcc/testsuite/ChangeLog:

* gcc.dg/opt-ordered-and-nonequal-1.c: New test.
* gcc.target/i386/optimize_one.c: Change name to opt-comi-1.c.
* gcc.target/i386/opt-comi-1.c: New test.

i386: Support partial signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2BF/V4BF

This patch adds support for bf16 operations in V2BF and V4BF modes on i386,
handling signbit, xorsign, copysign, abs, neg, and various logical operations.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_build_const_vector): Add V2BF/V4BF.
(ix86_build_signbit_mask): Add V2BF/V4BF.
* config/i386/mmx.md: Modified supported logic op to use VHBF_32_64.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-absnegbf.c: New test.

i386: Integrate BFmode for Enhanced Vectorization in ix86_preferred_simd_mode

This change adds BFmode support to the ix86_preferred_simd_mode function
enhancing SIMD vectorization for BF16 operations. The update ensures
optimized usage of SIMD capabilities improving performance and aligning
vector sizes with processor capabilities.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_preferred_simd_mode): Add BFmode Support.

Daily bump.

[PATCH 1/3] RISC-V: Improve codegen for negative repeating large constants

Improve handling of constants where its upper and lower 32-bit
halves are the same and have negative values.

e.g. for:

unsigned long f (void) { return 0xf0f0f0f0f0f0f0f0UL; }

Without the patch:

li      a0,-252645376
addi    a0,a0,240
li      a5,-252645376
addi    a5,a5,241
slli    a5,a5,32
add     a0,a5,a0

With the patch:

li      a5,252645376
addi    a5,a5,-241
slli    a0,a5,32
add     a0,a0,a5
xori    a0,a0,-1

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_split_integer_cost): Adjust the
cost of negative repeating constants.
(riscv_split_integer): Handle negative repeating constants.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/synthesis-11.c: New test.

Check DECL_NAMELESS in modified_type_die

While working on a patch to the Ada compiler, I found a spot in
dwarf2out.cc that calls add_name_attribute without respecting
DECL_NAMELESS.

gcc

* dwarf2out.cc (modified_type_die): Check DECL_NAMELESS.

[RISC-V] Fix scan test output after recent path-splitting changes

The recent path splitting changes from Andrew result in identifying more
saturation idioms instead of just identifying an overflow check. As a result
many of the tests in the RISC-V port started failing a scan check on the
.expand output.

As expected, identifying a saturation idiom is more helpful than identifying an
overflow check and the resultant code is better based on my spot checks.

So the right thing to do is to expect more saturation intrinsics in the .expand
output.

I've verified this fixes the regressions for riscv32-elf and riscv64-elf.
Pushing to the trunk.

gcc/testsuite
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Adjust
expected output.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c:
Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-2.c:
Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-5.c:
Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-6.c:
Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-9.c:
Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c:
Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-13.c:
Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-14.c:
Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-15.c:
Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-9.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-33.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-34.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-35.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-36.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-37.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-38.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-39.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-40.c: Likewise.

c++: cleanup coerce_template_template_parm

This function could use some sprucing up.

gcc/cp/ChangeLog:

* pt.cc (coerce_template_template_parm): Return bool instead of int.

c++: noexcept and pointer to member function type [PR113108]

We ICE in nothrow_spec_p because it got a DEFERRED_NOEXCEPT.
This DEFERRED_NOEXCEPT was created in implicitly_declare_fn
when declaring

Foo& operator=(Foo&&) = default;

in the test. The problem is that in resolve_overloaded_unification
we call maybe_instantiate_noexcept before try_one_overload only in
the TEMPLATE_ID_EXPR case.

PR c++/113108

gcc/cp/ChangeLog:

* pt.cc (resolve_overloaded_unification): Call
maybe_instantiate_noexcept.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/noexcept-type28.C: New test.

c++: add a testcase for [PR 108620]

Fixed by r15-2540-g32e678b2ed7521. Add a testcase, as the original ones
do not cover this particular failure mode.

gcc/testsuite/ChangeLog:

PR c++/108620
* g++.dg/coroutines/pr108620.C: New test.

coros: mark .CO_YIELD as LEAF [PR106973]

We rely on .CO_YIELD calls being followed by an assignment (optionally)
and then a switch/if in the same basic block. This implies that a
.CO_YIELD can never end a block. However, since a call to .CO_YIELD is
still a call, if the function containing it calls setjmp, GCC thinks
that the .CO_YIELD can introduce abnormal control flow, and generates an
edge for the call.

We know this is not the case; .CO_YIELD calls get removed quite early on
and have no effect, and result in no other calls, so .CO_YIELD can be
considered a leaf function, preventing generating an edge when calling
it.

PR c++/106973 - coroutine generator and setjmp

PR c++/106973

gcc/ChangeLog:

* internal-fn.def (CO_YIELD): Mark as ECF_LEAF.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr106973.C: New test.

object-size: Use simple_dce_from_worklist in object-size pass

While trying to see if there was a way to improve object-size pass
to use the ranger (for pointer plus), I noticed that it leaves around
the statement containing __builtin_object_size if it was reduced to a constant.
This fixes that by using simple_dce_from_worklist.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-object-size.cc (object_sizes_execute): Mark lhs for maybe dceing
if doing a propagate. Call simple_dce_from_worklist.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Use dg-additional-options for gfortran.dg/vect/vect-8.f90 and RISC-V

r14-9122-g67a29f99cc8138 disabled scheduling on a lot of testcases
for RISC-V for PR113249 but using dg-options. This makes
gfortran.dg/vect/vect-8.f90 UNRESOLVED as it relies on default
flags to enable vectorization.

The following uses dg-additional-options instead.

Tested on riscv64-linux with qemu-user, pushed.

I didn't check all the other adjusted tests for similar issues.

* gfortran.dg/vect/vect-8.f90: Use dg-additional-options.

nvptx: Use 'enum ptx_version', 'enum ptx_isa' instead of 'int'

This allows getting rid of the respective type casts. No change in behavior
intended.

gcc/
* config/nvptx/gen-opt.sh: Use 'enum ptx_isa' instead of 'int'.
* config/nvptx/nvptx-gen.opt: Regenerate.
* config/nvptx/nvptx.opt: Use 'enum ptx_version' instead of 'int'.
* config/nvptx/nvptx-opts.h (enum ptx_isa): Add 'PTX_ISA_unset'.
(enum ptx_version): Add 'PTX_VERSION_unset'.
* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Adjust.
* config/nvptx/nvptx.cc (default_ptx_version_option)
(handle_ptx_version_option, nvptx_option_override)
(nvptx_file_start): Likewise.

Fix branch prediction dump message

Instead of, for instance, "Loop got predicted 1 to iterate 10 times"
the message should be "Loop 1 got predicted to iterate 10 times".

gcc/ChangeLog:

* predict.cc (pass_profile::execute): Fix dump message.

Co-authored-by: Thomas Schwinge <tschwinge@baylibre.com>

Fix gimple_debug_cfg declaration

Silence a warning. The argument type did not match the definition.

gcc/ChangeLog:

* tree-cfg.h (gimple_debug_cfg): Change argument type from int
to dump_flags_t.

Document 'pass_postreload' vs. 'pass_late_compilation'

See Subversion r217124 (Git commit 433e4164339f18d0b8798968444a56b681b5232c)
"Reorganize post-ra pipeline for targets without register allocation".

gcc/
* passes.cc: Document 'pass_postreload' vs. 'pass_late_compilation'.
* passes.def: Likewise.

nvptx: Specify '-mno-alias' for 'gcc.dg/pr60797.c' [PR60797, PR104957]

2014 Subversion r209299 (Git commit 8330537b5b58bd0532a0a49f9cbd59bf526a7847)
"Fix PR60797" added this test case, which we now amend so that it's able to
test its thing also in '--target=nvptx-none' configurations with symbol alias
support enabled (..., and test nvptx '-mno-alias').

PR middle-end/60797
PR target/104957
gcc/testsuite/
* gcc.dg/pr60797.c: For nvptx, specify '-mno-alias'.

Add 'gcc.target/nvptx/alias-to-alias-1.c'

... similar to alias to alias usage in 'libgomp.c-c++-common/pr96390.c'.

PR target/104957
gcc/testsuite/
* gcc.target/nvptx/alias-to-alias-1.c: New.

Add 'gcc.target/nvptx/alias-weak-1.c'

... testing for the GCC/nvptx "weak alias definitions not supported" error
diagnostic (limitation of PTX).

gcc/testsuite/
* gcc.target/nvptx/alias-weak-1.c: New.

rust: avoid clobbering LIBS

Save LIBS around calls to AC_SEARCH_LIBS to avoid clobbering $LIBS.

ChangeLog:

* configure: Regenerate.
* configure.ac: Save LIBS around calls to AC_SEARCH_LIBS.

Signed-off-by: Marc Poulhiès <dkm@kataplop.net>
Reviewed-by: Thomas Schwinge <tschwinge@baylibre.com>
Tested-by: Thomas Schwinge <tschwinge@baylibre.com>

Also lower SLP grouped loads with just one consumer

This makes sure to produce interleaving schemes or load-lanes
for single-element interleaving and other permutes that otherwise
would use more than three vectors.

It exposes the latent issue that single-element interleaving with
large gaps can be inefficient - the mitigation in get_group_load_store_type
doesn't trigger when we clear the load permutation.

It also exposes the fact that not all permutes can be lowered in
the best way in a vector length agnostic way so I've added an
exception to keep power-of-two size contiguous aligned chunks
unlowered (unless we want load-lanes). The optimal handling
of load/store vectorization is going to continue to be a learning
process.

* tree-vect-slp.cc (vect_lower_load_permutations): Also
process single-use grouped loads.
Avoid lowering contiguous aligned power-of-two sized
chunks, those are better handled by the vector size
specific SLP code generation.
* tree-vect-stmts.cc (get_group_load_store_type): Drop
the unrelated requirement of a load permutation for the
single-element interleaving limit.

* gcc.dg/vect/slp-46.c: Remove XFAIL.

Zen5 tuning part 5: update instruction latencies in x86-tune-costs

there is nothing exciting in this patch.  I measured latencies and also compared
them with newly released optimization guide.  There are no dramatic changes
compared to zen4.  One interesting new bit is that addss is faster and can be
2 cycles when fed by another addss.

I also increased the large insn bound since decoders seems no longer require
instructions to be 8 bytes or less.

gcc/ChangeLog:

* config/i386/x86-tune-costs.h (znver5_cost): Update instruction
costs.

expand: Add dump for costing of positive divides

While trying to understand PR 115910 I found it was useful to print out
the two costs of doing a signed and unsigned division just like was added in
r15-3272-g3c89c41991d8e8 for popcount==1.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* expr.cc (expand_expr_divmod): Add dump of the two costs for
positive division.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

CRIS: Add new peephole2 "lra_szext_decomposed_indir_plus"

Exposed when running the test-suite with -flate-combine-instructions.

* config/cris/cris.md (lra_szext_decomposed_indir_plus): New
peephole2 pattern.

RISC-V: Allow IMM operand for unsigned scalar .SAT_ADD

This patch would like to allow the IMM operand of the unsigned
scalar .SAT_ADD. Like the operand 0, the operand 1 of .SAT_ADD
will be zero extended to Xmode before underlying code generation.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_usadd): Zero extend
the second operand of usadd as the first operand does.
* config/riscv/riscv.md (usadd<m>3): Allow imm operand for
scalar usadd pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_add-11.c: Make asm check robust.
* gcc.target/riscv/sat_u_add-15.c: Ditto.
* gcc.target/riscv/sat_u_add-19.c: Ditto.
* gcc.target/riscv/sat_u_add-23.c: Ditto.
* gcc.target/riscv/sat_u_add-3.c: Ditto.
* gcc.target/riscv/sat_u_add-7.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

aarch64: Fix testcase vec-init-22-speed.c [PR116589]

For this testcase, the trunk produces:
```
f_s16:
        fmov    s31, w0
        fmov    s0, w1
```

While the testcase was expecting what was produced in GCC 14:
```
f_s16:
        sxth    w0, w0
        sxth    w1, w1
        fmov    d31, x0
        fmov    d0, x1
```

After r15-1575-gea8061f46a30 the code was:
```
        dup     v31.4h, w0
        dup     v0.4h, w1
```
But when ext-dce was added with r15-1901-g98914f9eba5f19, we get the better code generation now and only fmov's.

Pushed as obvious after running the testcase.

PR target/116589

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vec-init-22-speed.c: Update scan for better code gen.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

split-path: Improve ifcvt heurstic for split path [PR112402]

This simplifies the heurstic for split path to see if the join
bb is a ifcvt candidate.
For the predecessors bbs need either to be empty or only have one
statement in them which could be a decent ifcvt candidate.
The previous heurstics would miss that:
```
if (a) goto B else goto C;
B: goto C;
C:
c = PHI<d,e>
```

Would be a decent ifcvt candidate. And would also miss:
```
if (a) goto B else goto C;
B: d = f + 1; goto C;
C:
c = PHI<d,e>
```

Also since currently the max number of cmovs being able to produced is 3, we
should only assume `<= 3` phis can be ifcvt candidates.

The testcase changes for split-path-6.c is that lookharder function
is a true ifcvt case where we would get cmov as expected; it looks like it
was not a candidate when the heurstic was added but became one later on.
pr88797.C is now rejected via it being an ifcvt candidate rather than being about
DCE/const prop.

The rest of the testsuite changes are just slight change in the dump,
removing the "*diamnond" part as it was removed from the print.

Bootstrapped and tested on x86_64.

PR tree-optimization/112402

gcc/ChangeLog:

* gimple-ssa-split-paths.cc (poor_ifcvt_pred): New function.
(is_feasible_trace): Remove old heurstics for ifcvt cases.
For num_stmts <=1 for both pred check poor_ifcvt_pred on both
pred.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/split-path-11.c: Update scan.
* gcc.dg/tree-ssa/split-path-2.c: Update scan.
* gcc.dg/tree-ssa/split-path-5.c: Update scan.
* gcc.dg/tree-ssa/split-path-6.c: Update scan.
* g++.dg/tree-ssa/pr88797.C: Update scan.
* gcc.dg/tree-ssa/split-path-13.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

split-paths: Move check for # of statements in join earlier

This moves the check for # of statements to copy in join to
be the first check. This check is the cheapest check so it
should be first. Plus add a print to the dump file since there
was none beforehand.

gcc/ChangeLog:

* gimple-ssa-split-paths.cc (is_feasible_trace): Move
check for # of statments in join earlier and add a
debug print.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Explicitly document that the "counted_by" attribute is only supported in C.

The "counted_by" attribute currently is only supported in C, mention this
explicitly in documentation and also issue warnings when see "counted_by"
attribute in C++ with -Wattributes.

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_counted_by_attribute): Is ignored and issues
warning with -Wattributes in C++ for now.

gcc/ChangeLog:

* doc/extend.texi: Explicitly mentions counted_by is available
only in C for now.

gcc/testsuite/ChangeLog:

* g++.dg/ext/flex-array-counted-by.C: New test.
* g++.dg/ext/flex-array-counted-by-2.C: New test.

c++: support C++11 attributes in C++98

I don't see any reason why we can't allow the [[]] attribute syntax in C++98
mode with a pedwarn just like many other C++11 features. In fact, we
already do support it in some places in the grammar, but not in places that
check cp_nth_tokens_can_be_std_attribute_p.

Let's also follow the C front-end's lead in only warning about them when
-pedantic.

It still isn't necessary for this function to guard against Objective-C
message passing syntax; we handle that with tentative parsing in
cp_parser_statement, and we don't call this function in that context anyway.

gcc/cp/ChangeLog:

* parser.cc (cp_nth_tokens_can_be_std_attribute_p): Don't check
cxx_dialect.
* error.cc (maybe_warn_cpp0x): Only complain about C++11 attributes
if pedantic.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/gen-attrs-1.C: Also run in C++98 mode.
* g++.dg/cpp0x/gen-attrs-11.C: Likewise.
* g++.dg/cpp0x/gen-attrs-13.C: Likewise.
* g++.dg/cpp0x/gen-attrs-15.C: Likewise.
* g++.dg/cpp0x/gen-attrs-75.C: Don't expect C++98 warning after
__extension__.

PR116080: Fix test suite checks for musttail

This is a new attempt to fix PR116080. The previous try was reverted
because it just broke a bunch of tests, hiding the problem.

- musttail behaves differently than tailcall at -O0. Some of the test
run at -O0, so add separate effective target tests for musttail.
- New effective target tests need to use unique file names
to make dejagnu caching work
- Change the tests to use new targets
- Add a external_musttail test to check for target's ability
to do tail calls between translation units. This covers some powerpc
ABIs.

gcc/testsuite/ChangeLog:

PR testsuite/116080
* c-c++-common/musttail1.c: Use musttail target.
* c-c++-common/musttail12.c: Use struct_musttail target.
* c-c++-common/musttail2.c: Use musttail target.
* c-c++-common/musttail3.c: Likewise.
* c-c++-common/musttail4.c: Likewise.
* c-c++-common/musttail7.c: Likewise.
* c-c++-common/musttail8.c: Likewise.
* g++.dg/musttail10.C: Likewise. Replace powerpc checks with
external_musttail.
* g++.dg/musttail11.C: Use musttail target.
* g++.dg/musttail6.C: Use musttail target. Replace powerpc
checks with external_musttail.
* g++.dg/musttail9.C: Use musttail target.
* lib/target-supports.exp: Add musttail, struct_musttail,
external_musttail targets. Remove optimization for musttail.
Use unique file names for musttail.

pretty-print: split up pretty_printer::format into subroutines

The body of pretty_printer::format is almost 500 lines long,
mostly comprising two distinct phases.

This patch splits it up so that there are explicit subroutines
for the two different phases, reducing the scope of various
locals, and making it easier to e.g. put a breakpoint on phase 2.

No functional change intended.

gcc/ChangeLog:
* pretty-print-markup.h (pp_markup::context::context): Drop
params "buf" and "chunk_idx", initializing m_buf from pp.
(pp_markup::context::m_chunk_idx): Drop field.
* pretty-print.cc (pretty_printer::format): Convert param
from a text_info * to a text_info &. Split out phase 1
and phase 2 into subroutines...
(format_phase_1): New, from pretty_printer::format.
(format_phase_2): Likewise.
* pretty-print.h (pretty_printer::format): Convert param
from a text_info * to a text_info &.
(pp_format): Update for above change. Assert that text_info is
non-null.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

pretty-print: add selftest of pp_format's stack

gcc/ChangeLog:
* pretty-print-format-impl.h (pp_formatted_chunks::get_prev): New
accessor.
* pretty-print.cc (selftest::push_pp_format): New.
(ASSERT_TEXT_TOKEN): New macro.
(selftest::test_pp_format_stack): New test.
(selftest::pretty_print_cc_tests): New.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

pretty-print: naming cleanups

This patch is a followup to r15-3311-ge31b6176996567 making some
cleanups to pretty-printing to reflect those changes:
- renaming "chunk_info" to "pp_formatted_chunks"
- renaming "cur_chunk_array" to "m_cur_fomatted_chunks"
- rewording/clarifying comments
and taking the opportunity to add a "m_" prefix to all fields of
output_buffer.

No functional change intended.

gcc/analyzer/ChangeLog:
* analyzer-logging.cc (logger::logger): Prefix all output_buffer
fields with "m_".

gcc/c-family/ChangeLog:
* c-ada-spec.cc (dump_ada_node): Prefix all output_buffer fields
with "m_".
* c-pretty-print.cc (pp_c_integer_constant): Likewise.
(pp_c_integer_constant): Likewise.
(pp_c_floating_constant): Likewise.
(pp_c_fixed_constant): Likewise.

gcc/c/ChangeLog:
* c-objc-common.cc (print_type): Prefix all output_buffer fields
with "m_".

gcc/cp/ChangeLog:
* error.cc (type_to_string): Prefix all output_buffer fields with
"m_".
(append_formatted_chunk): Likewise.  Rename "chunk_info" to
"pp_formatted_chunks" and field cur_chunk_array with
m_cur_formatted_chunks.

gcc/fortran/ChangeLog:
* error.cc (gfc_move_error_buffer_from_to): Prefix all
output_buffer fields with "m_".
(gfc_diagnostics_init): Likewise.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_set_caret_max_width): Prefix all
output_buffer fields with "m_".
* dumpfile.cc (emit_any_pending_textual_chunks): Likewise.
(emit_any_pending_textual_chunks): Likewise.
* gimple-pretty-print.cc (gimple_dump_bb_buff): Likewise.
* json.cc (value::dump): Likewise.
* pretty-print-format-impl.h (class chunk_info): Rename to...
(class pp_formatted_chunks): ...this.  Add friend
class output_buffer.  Update comment near end of decl to show
the pp_formatted_chunks instance on the chunk_obstack.
(pp_formatted_chunks::pop_from_output_buffer): Delete decl.
(pp_formatted_chunks::on_begin_quote): Delete decl that should
have been removed in r15-3311-ge31b6176996567.
(pp_formatted_chunks::on_end_quote): Likewise.
(pp_formatted_chunks::m_prev): Update for renaming.
* pretty-print.cc (output_buffer::output_buffer): Prefix all
fields with "m_".  Rename "cur_chunk_array" to
"m_cur_formatted_chunks".
(output_buffer::~output_buffer): Prefix all fields with "m_".
(output_buffer::push_formatted_chunks): New.
(output_buffer::pop_formatted_chunks): New.
(pp_write_text_to_stream): Prefix all output_buffer fields with
"m_".
(pp_write_text_as_dot_label_to_stream): Likewise.
(pp_write_text_as_html_like_dot_to_stream): Likewise.
(chunk_info::append_formatted_chunk): Rename to...
(pp_formatted_chunks::append_formatted_chunk): ...this.
(chunk_info::pop_from_output_buffer): Delete.
(pretty_printer::format): Update leading comment to mention
pushing pp_formatted_chunks, and to reflect changes in
r15-3311-ge31b6176996567.  Prefix all output_buffer fields with
"m_".
(pp_output_formatted_text): Update leading comment to mention
popping a pp_formatted_chunks, and to reflect the changes in
r15-3311-ge31b6176996567.  Prefix all output_buffer fields with
"m_" and rename "cur_chunk_array" to "m_cur_formatted_chunks".
Replace call to chunk_info::pop_from_output_buffer with a call to
output_buffer::pop_formatted_chunks.
(pp_flush): Prefix all output_buffer fields with "m_".
(pp_really_flush): Likewise.
(pp_clear_output_area): Likewise.
(pp_append_text): Likewise.
(pretty_printer::remaining_character_count_for_line): Likewise.
(pp_newline): Likewise.
(pp_character): Likewise.
(pp_markup::context::push_back_any_text): Likewise.
* pretty-print.h (class chunk_info): Rename to...
(class pp_formatted_chunks): ...this.
(class output_buffer): Delete unimplemented rule-of-5 members.
(output_buffer::push_formatted_chunks): New decl.
(output_buffer::pop_formatted_chunks): New decl.
(output_buffer::formatted_obstack): Rename to...
(output_buffer::m_formatted_obstack): ...this.
(output_buffer::chunk_obstack): Rename to...
(output_buffer::m_chunk_obstack): ...this.
(output_buffer::obstack): Rename to...
(output_buffer::m_obstack): ...this.
(output_buffer::cur_chunk_array): Rename to...
(output_buffer::m_cur_formatted_chunks): ...this.
(output_buffer::stream): Rename to...
(output_buffer::m_stream): ...this.
(output_buffer::line_length): Rename to...
(output_buffer::m_line_length): ...this.
(output_buffer::digit_buffer): Rename to...
(output_buffer::m_digit_buffer): ...this.
(output_buffer::flush_p): Rename to...
(output_buffer::m_flush_p): ...this.
(output_buffer_formatted_text): Prefix all output_buffer fields
with "m_".
(output_buffer_append_r): Likewise.
(output_buffer_last_position_in_text): Likewise.
(pretty_printer::set_output_stream): Likewise.
(pp_scalar): Likewise.
(pp_wide_int): Likewise.
* tree-pretty-print.cc (dump_generic_node): Likewise.
(dump_generic_node): Likewise.
(pp_double_int): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: add fixed test [PR109095]

Fixed by r13-6693.

PR c++/109095

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class66.C: New test.

Zen5 tuning part 4: update reassocation width

Zen5 has 6 instead of 4 ALUs and the integer multiplication can now execute in
3 of them. FP units can do 2 additions and 2 multiplications with latency 2
and 3. This patch updates reassociation width accordingly. This has potential
of increasing register pressure but unlike while benchmarking znver1 tuning
I did not noticed this actually causing problem on spec, so this patch bumps
up reassociation width to 6 for everything except for integer vectors, where
there are 4 units with typical latency of 1.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_reassociation_width): Update for Znver5.
* config/i386/x86-tune-costs.h (znver5_costs): Update reassociation
widths.

Drop file that should not have been committed.

* J: Drop file that should not have been committed

Zen5 tuning part 3: fix typo in previous patch

gcc/ChangeLog:

* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Fix
typo.

libstdc++: Fix error handling in fs::hard_link_count for Windows

The recent change to use auto_win_file_handle for
std::filesystem::hard_link_count caused a regression. The
std::error_code argument should be cleared if no error occurs, but this
no longer happens. Add a call to ec.clear() in fs::hard_link_count to
fix this.

Also change the auto_win_file_handle class to take a reference to the
std::error_code and set it if an error occurs, to slightly simplify the
control flow in the fs::equiv_files function.

libstdc++-v3/ChangeLog:

* src/c++17/fs_ops.cc (auto_win_file_handle): Add error_code&
member and set it if CreateFileW or GetFileInformationByHandle
fails.
(fs::equiv_files) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Simplify
control flow.
(fs::hard_link_count) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Clear ec
on success.
* testsuite/27_io/filesystem/operations/hard_link_count.cc:
Check error handling.

libstdc++: Specialize std::disable_sized_sentinel_for for std::move_iterator [PR116549]

LWG 3736 added a partial specialization of this variable template for
two std::move_iterator types. This is needed for the case where the
types satisfy std::sentinel_for and are subtractable, but do not model
the semantics requirements of std::sized_sentinel_for.

libstdc++-v3/ChangeLog:

PR libstdc++/116549
* include/bits/stl_iterator.h (disable_sized_sentinel_for):
Define specialization for two move_iterator types, as per LWG
3736.
* testsuite/24_iterators/move_iterator/lwg3736.cc: New test.

Dump whether a SLP node represents load/store-lanes

This makes it easier to discover whether SLP load or store nodes
participate in load/store-lanes accesses.

* tree-vect-slp.cc (vect_print_slp_tree): Annotate load
and store-lanes nodes.

Fix missed peeling for gaps with SLP load-lanes

The following disables peeling for gap avoidance with using smaller
vector accesses when using load-lanes.

* tree-vect-stmts.cc (get_group_load_store_type): Only disable
peeling for gaps by using smaller vectors when not using
load-lanes.

Zen5 tuning part 3: scheduler tweaks

this patch adds support for new fussion in znver5 documented in the
optimization manual:

   The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions
   with certain ALU instructions. The following conditions need to be met for
   fusion to happen:
     - The MOV should be reg-reg mov with Opcode 0x89 or 0x8B
     - The MOV is followed by an ALU instruction where the MOV and ALU destination register match.
     - The ALU instruction may source only registers or immediate data. There cannot be any memory source.
     - The ALU instruction sources either the source or dest of MOV instruction.
     - If ALU instruction has 2 reg sources, they should be different.
     - The following ALU instructions can fuse with an older qualified MOV instruction:
       ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR
       (I assume OP is OR)

I also increased issue rate from 4 to 6.  Theoretically znver5 can do more, but
with our model we can't realy use it.
Increasing issue rate to 8 leads to infinite loop in scheduler.

Finally, I also enabled fuse_alu_and_branch since it is supported by
znver5 (I think by earlier zens too).

New fussion pattern moves quite few instructions around in common code:
@@ -2210,13 +2210,13 @@
        .cfi_offset 3, -32
        leaq    63(%rsi), %rbx
        movq    %rbx, %rbp
+       shrq    $6, %rbp
+       salq    $3, %rbp
        subq    $16, %rsp
        .cfi_def_cfa_offset 48
        movq    %rdi, %r12
-       shrq    $6, %rbp
-       movq    %rsi, 8(%rsp)
-       salq    $3, %rbp
        movq    %rbp, %rdi
+       movq    %rsi, 8(%rsp)
        call    _Znwm
        movq    8(%rsp), %rsi
        movl    $0, 8(%r12)
@@ -2224,8 +2224,8 @@
        movq    %rax, (%r12)
        movq    %rbp, 32(%r12)
        testq   %rsi, %rsi
-       movq    %rsi, %rdx
        cmovns  %rsi, %rbx
+       movq    %rsi, %rdx
        sarq    $63, %rdx
        shrq    $58, %rdx
        sarq    $6, %rbx
which should help decoder bandwidth and perhaps also cache, though I was not
able to measure off-noise effect on SPEC.

gcc/ChangeLog:

* config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Updat for znver5.
(ix86_adjust_cost): Add TODO about znver5 memory latency.
(ix86_fuse_mov_alu_p): New.
(ix86_macro_fusion_pair_p): Use it.
* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5.
(X86_TUNE_FUSE_MOV_AND_ALU): New tune;

libstdc++: Simplify std::any to fix -Wdeprecated-declarations warning

We don't need to use std::aligned_storage in std::any. We just need a
POD type of the right size. The void* union member already ensures the
alignment will be correct. Avoiding std::aligned_storage means we don't
need to suppress a -Wdeprecated-declarations warning.

libstdc++-v3/ChangeLog:

* include/experimental/any (experimental::any::_Storage): Use
array of unsigned char instead of deprecated
std::aligned_storage.
* include/std/any (any::_Storage): Likewise.
* testsuite/20_util/any/layout.cc: New test.

libstdc++: Add missing feature-test macro in various headers

version.syn#2 requires various headers to define
__cpp_lib_allocator_traits_is_always_equal. Currently, only <memory> was
defining this macro. Implement fixes for the other headers as well.

Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>
libstdc++-v3/ChangeLog:

* include/std/deque: Define macro
__glibcxx_want_allocator_traits_is_always_equal.
* include/std/forward_list: Likewise.
* include/std/list: Likewise.
* include/std/map: Likewise.
* include/std/scoped_allocator: Likewise.
* include/std/set: Likewise.
* include/std/string: Likewise.
* include/std/unordered_map: Likewise.
* include/std/unordered_set: Likewise.
* include/std/vector: Likewise.
* testsuite/20_util/headers/memory/version.cc: New test.
* testsuite/20_util/scoped_allocator/version.cc: Likewise.
* testsuite/21_strings/headers/string/version.cc: Likewise.
* testsuite/23_containers/deque/version.cc: Likewise.
* testsuite/23_containers/forward_list/version.cc: Likewise.
* testsuite/23_containers/list/version.cc: Likewise.
* testsuite/23_containers/map/version.cc: Likewise.
* testsuite/23_containers/set/version.cc: Likewise.
* testsuite/23_containers/unordered_map/version.cc: Likewise.
* testsuite/23_containers/unordered_set/version.cc: Likewise.
* testsuite/23_containers/vector/version.cc: Likewise.

Zen5 tuning part 2: disable gather and scatter

We disable gathers for zen4. It seems that gather has improved a bit compared
to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when
the indices are known ahead of time. Vector loads followed by shuffles result
in a higher load bandwidth." however the situation seems to be more
complicated.

gather is 5-10% loss on parest benchmark as well as 30% loss on sparse dot
products in TSVC. Curiously enough breaking these out into microbenchmark
reversed the situation and it turns out that the performance depends on
how indices are distributed. gather is loss if indices are sequential,
neutral if they are random and win for some strides (4, 8).

This seems to be similar to earlier zens, so I think (especially for
backporting znver5 support) that it makes sense to be conistent and disable
gather unless we work out a good heuristics on when to use it. Since we
typically do not know the indices in advance, I don't see how that can be done.

I opened PR116582 with some examples of wins and loses

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for
ZNVER5.
(X86_TUNE_USE_SCATTER_2PARTS): Disable for ZNVER5.
(X86_TUNE_USE_GATHER_4PARTS): Disable for ZNVER5.
(X86_TUNE_USE_SCATTER_4PARTS): Disable for ZNVER5.
(X86_TUNE_USE_GATHER_8PARTS): Disable for ZNVER5.
(X86_TUNE_USE_SCATTER_8PARTS): Disable for ZNVER5.

ipa: Don't disable function parameter analysis for fat LTO

Update analyze_parms not to disable function parameter analysis for
-ffat-lto-objects. Tested on x86-64, there are no differences in zstd
with "-O2 -flto=auto" -g "vs -O2 -flto=auto -g -ffat-lto-objects".

PR ipa/116410
* ipa-modref.cc (analyze_parms): Always analyze function parameter
for LTO.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

[PR target/115921] Improve reassociation for rv64

As Jovan pointed out in pr115921, we're not reassociating expressions like this
on rv64:

(x & 0x3e) << 12

It generates something like this:

        li      a5,258048
        slli    a0,a0,12
        and     a0,a0,a5

We have a pattern that's designed to clean this up.  Essentially reassociating
the operations so that we don't need to load the constant resulting in
something like this:

        andi    a0,a0,63
        slli    a0,a0,12

That pattern wasn't working for certain constants due to its condition. The
condition is trying to avoid cases where this kind of reassociation would
hinder shadd generation on rv64.  That condition was just written poorly.

This patch tightens up that condition in a few ways.  First, there's no need to
worry about shadd cases if ZBA is not enabled.  Second we can't use shadd if
the shift value isn't 1, 2 or 3.  Finally rather than open-coding one of the
tests, we can use an existing operand predicate.

The net is we'll start performing this transformation in more cases on rv64
while still avoiding reassociation if it would spoil shadd generation.

PR target/115921
gcc/
* config/riscv/riscv.md (reassociate bitwise ops): Tighten test for
cases we do not want reassociate.

gcc/testsuite/
* gcc.target/riscv/pr115921.c: New test.

Zen5 tuning part 1: avoid FMA chains

testing matrix multiplication benchmarks shows that FMA on a critical chain
is a perofrmance loss over separate multiply and add. While the latency of 4
is lower than multiply + add (3+2) the problem is that all values needs to
be ready before computation starts.

While on znver4 AVX512 code fared well with FMA, it was because of the split
registers. Znver5 benefits from avoding FMA on all widths.  This may be different
with the mobile version though.

On naive matrix multiplication benchmark the difference is 8% with -O3
only since with -Ofast loop interchange solves the problem differently.
It is 30% win, for example, on S323 from TSVC:

real_t s323(struct args_t * func_args)
{

//    recurrences
//    coupled recurrence

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations/2; nl++) {
        for (int i = 1; i < LEN_1D; i++) {
            a[i] = b[i-1] + c[i] * d[i];
            b[i] = a[i] + c[i] * e[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for
znver5.
(X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
(X86_TUNE_AVOID_512FMA_CHAINS): Likewise.

LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming
sufficient partiations, e.g., via -flto-partition=max), output_offload_tables
wrote the output tables once per fork.

PR lto/116535

gcc/ChangeLog:

* lto-cgraph.cc (output_offload_tables): Remove offload_ frees.
* lto-streamer-out.cc (lto_output): Make call to it depend on
lto_get_out_decl_state ()->output_offload_tables_p.
* lto-streamer.h (struct lto_out_decl_state): Add
output_offload_tables_p field.
* tree-pass.h (ipa_write_optimization_summaries): Add bool argument.
* passes.cc (ipa_write_summaries_1): Add bool
output_offload_tables_p arg.
(ipa_write_summaries): Update call.
(ipa_write_optimization_summaries): Accept output_offload_tables_p.

gcc/lto/ChangeLog:

* lto.cc (stream_out): Update call to
ipa_write_optimization_summaries to pass true for first partition.

MAINTAINERS: Update my email address

* MAINTAINERS: Update my email address and add myself to DCO.

tree-optimization/116575 - avoid ICE with SLP mask_load_lane

The following avoids performing re-discovery with single lanes in
the attempt to for the use of mask_load_lane as rediscovery will
fail since a single lane of a mask load will appear permuted which
isn't supported.

PR tree-optimization/116575
* tree-vect-slp.cc (vect_analyze_slp): Properly compute
the mask argument for vect_load/store_lanes_supported.
When the load is masked for now avoid rediscovery.

* gcc.dg/vect/pr116575.c: New testcase.

i386: Fix vfpclassph non-optimizied intrin

The intrin for non-optimized got a typo in mask type, which will cause
the high bits of __mmask32 being unexpectedly zeroed.

The test does not fail under O0 with current 1b since the testcase is
wrong. We need to include avx512-mask-type.h after SIZE is defined, or
it will always be __mmask8. That problem also happened in AVX10.2 testcases.
I will write a seperate patch to fix that.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h
(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
(_mm512_fpclass_ph_mask): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.

Do not assert NUM_POLY_INT_COEFFS != 1 early

The following moves the assert on NUM_POLY_INT_COEFFS != 1 after
INTEGER_CST processing.

* fold-const.cc (poly_int_binop): Move assert on
NUM_POLY_INT_COEFFS after INTEGER_CST processing.

lower-bitint: Fix up __builtin_{add,sub}_overflow{,_p} bitint lowering [PR116501]

The following testcase is miscompiled.  The problem is in the last_ovf step.
The second operand has signed _BitInt(513) type but has the MSB clear,
so range_to_prec returns 512 for it (i.e. it fits into unsigned
_BitInt(512)).  Because of that the last step actually doesn't need to get
the most significant bit from the second operand, but the code was deciding
what to use purely from TYPE_UNSIGNED (type1) - if unsigned, use 0,
otherwise sign-extend the last processed bit; but that in this case was set.
We don't want to treat the positive operand as if it was negative regardless
of the bit below that precision, and precN >= 0 indicates that the operand
is in the [0, inf) range.

2024-09-03  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/116501
* gimple-lower-bitint.cc (bitint_large_huge::lower_addsub_overflow):
In the last_ovf case, use build_zero_cst operand not just when
TYPE_UNSIGNED (typeN), but also when precN >= 0.

* gcc.dg/torture/bitint-73.c: New test.

ada: Add kludge for quirk of ancient 32-bit ABIs to previous change

Some ancient 32-bit ABIs, most notably that of x86/Linux, misalign double
scalars in record types, so comparing DECL_ALIGN with TYPE_ALIGN directly
may give the wrong answer for them.

gcc/ada/

* gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Add kludge
to cope with ancient 32-bit ABIs.

ada: Plug loophole exposed by previous change

The change causes more temporaries to be created at call sites for unaligned
actual parameters, thus revealing that the machinery does not properly deal
with unconstrained nominal subtypes for them.

gcc/ada/

* gcc-interface/trans.cc (create_temporary): Deal with types whose
size is self-referential by allocating the maximum size.

ada: Fix internal error with Atomic Volatile_Full_Access object

The initial implementation of the GNAT aspect/pragma Volatile_Full_Access
made it incompatible with Atomic, because it was not decided whether the
read-modify-write sequences generated by Volatile_Full_Access would need
to be implemented atomically when Atomic was also specified, which would
have required a compare-and-swap primitive from the target architecture.

But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic
in the process, answering the above question by the negative, so the
incompatibility between Volatile_Full_Access and Atomic was lifted in
Ada 2012 as well, unfortunately without adjusting the implementation.

gcc/ada/

* gcc-interface/trans.cc (get_atomic_access): Deal specifically with
nodes that are both Atomic and Volatile_Full_Access in Ada 2012.

ada: Pass unaligned record components by copy in calls on all platforms

This has historically been done only on platforms requiring the strict
alignment of memory references, but this can arguably be considered as
being mandated by the language on all of them.

gcc/ada/

* gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Take into
account the alignment of the field on all platforms.

ada: Fix internal error on pragma pack with discriminated record component

When updating the size after making a packable type in gnat_to_gnu_field,
we fail to clear it again when it is not constant.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_field): Clear again gnu_size
after updating it if it is not constant.

ada: Simplify Note_Uplevel_Bound procedure

The procedure Note_Uplevel_Bound was implemented as a custom expression
tree walk. This change replaces this custom tree traversal by a more
idiomatic use of Traverse_Proc.

gcc/ada/

* exp_unst.adb (Check_Static_Type::Note_Uplevel_Bound): Refactor
to use the generic Traverse_Proc.
(Check_Static_Type): Adjust calls to Note_Uplevel_Bound as the
previous second parameter was unused, so removed.

ada: Transform Length attribute references for non-Strict overflow mode.

The non-strict overflow checking code does a better job of eliminating
overflow checks if given an expression consisting only of predefined
operators (including relationals), literals, identifiers, and conditional
expressions. If it is both feasible and useful, rewrite a
Length attribute reference as such an expression. "Feasible" means
"index type is same type as attribute reference type, so we can rewrite without
using type conversions". "Useful" means "Overflow_Mode is something other than
Strict, so there is value in making overflow check elimination easier".

gcc/ada/

* exp_attr.adb (Expand_N_Attribute_Reference): If it makes sense
to do so, then rewrite a Length attribute reference as an
equivalent conditional expression.

ada: Do not warn for partial access to Atomic Volatile_Full_Access objects

The initial implementation of the GNAT aspect/pragma Volatile_Full_Access
made it incompatible with Atomic, because it was not decided whether the
read-modify-write sequences generated by Volatile_Full_Access would need
to be implemented atomically when Atomic was also specified, which would
have required a compare-and-swap primitive from the target architecture.

But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic
in the process, answering the above question by the negative, so the
incompatibility between Volatile_Full_Access and Atomic was lifted in
Ada 2012 as well, but the implementation was not entirely adjusted.

In Ada 2012, it does not make sense to warn for the partial access to an
Atomic object if the object is also declared Volatile_Full_Access, since
the object will be accessed as a whole in this case (like in Ada 2022).

gcc/ada/

* sem_res.adb (Is_Atomic_Ref_With_Address): Rename into...
(Is_Atomic_Non_VFA_Ref_With_Address): ...this and adjust the
implementation to exclude Volatile_Full_Access objects.
(Resolve_Indexed_Component): Adjust to above renaming.
(Resolve_Selected_Component): Likewise.

ada: Reject illegal array aggregates as per AI22-0106.

Implement the new legality rules of AI22-0106 which (as discussed in the AI)
are needed to disallow constructs whose semantics would otherwise be poorly
defined.

gcc/ada/

* sem_aggr.adb (Resolve_Array_Aggregate): Implement the two new
legality rules of AI11-0106. Add code to avoid cascading error
messages.

ada: Fix Finalize_Storage_Only bug in b-i-p calls

Do not pass null for the Collection parameter when
Finalize_Storage_Only is in effect. If the collection
is null in that case, we will blow up later when we
deallocate the object.

gcc/ada/

* exp_ch6.adb (Add_Collection_Actual_To_Build_In_Place_Call):
Remove Finalize_Storage_Only from the code that checks whether to
pass null to the Collection parameter. Having done that, we don't
need to check for Is_Library_Level_Entity, because
No_Heap_Finalization requires that. And if we ever change
No_Heap_Finalization to allow nested access types, we will still
want to pass null. Note that the comment "Such a type lacks a
collection." is incorrect in the case of Finalize_Storage_Only;
such types have a collection.

SVE intrinsics: Fold constant operands for svmul.

This patch implements constant folding for svmul by calling
gimple_folder::fold_const_binary with tree_code MULT_EXPR.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svmul_n_* case.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
Try constant folding.

gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_mul_1.c: New test.

SVE intrinsics: Fold constant operands for svdiv.

This patch implements constant folding for svdiv:
The new function aarch64_const_binop was created, which - in contrast to
int_const_binop - does not treat operations as overflowing. This function is
passed as callback to vector_const_binop from the new gimple_folder
method fold_const_binary, if the predicate is ptrue or predication is _x.
From svdiv_impl::fold, fold_const_binary is called with TRUNC_DIV_EXPR as
tree_code.
In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
for division by 0, as defined in the semantics for svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Try constant folding.
* config/aarch64/aarch64-sve-builtins.h: Declare
gimple_folder::fold_const_binary.
* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
New function to fold binary SVE intrinsics without overflow.
(gimple_folder::fold_const_binary): New helper function for
constant folding of SVE intrinsics.

gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_div_1.c: New test.

SVE intrinsics: Refactor const_binop to allow constant folding of intrinsics.

This patch sets the stage for constant folding of binary operations for SVE
intrinsics:
In fold-const.cc, the code for folding vector constants was moved from
const_binop to a new function vector_const_binop. This function takes a
function pointer as argument specifying how to fold the vector elements.
The intention is to call vector_const_binop from the backend with an
aarch64-specific callback function.
The code in const_binop for folding operations where the first operand is a
vector constant and the second argument is an integer constant was also moved
into vector_const_binop to to allow folding of binary SVE intrinsics where
the second operand is an integer (_n).
To allow calling poly_int_binop from the backend, the latter was made public.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* fold-const.h: Declare vector_const_binop.
* fold-const.cc (const_binop): Remove cases for vector constants.
(vector_const_binop): New function that folds vector constants
element-wise.
(int_const_binop): Remove call to wide_int_binop.
(poly_int_binop): Add call to wide_int_binop.

Handle mixing REALPART/IMAGPART with other components in SLP groups

The following makes sure we handle a SLP load/store group from
a structure with complex and scalar members. This for example
happens in gcc.target/i386/pr106010-9a.c.

* tree-vect-slp.cc (vect_build_slp_tree_1): Handle mixing
all of handled components besides ARRAY_RANGE_REF, drop
handling of INDIRECT_REF.

Correctly handle store IFNs in vect_get_vector_types_for_stmt

Currently vect_get_vector_types_for_stmt only special-cases
IFN_MASK_STORE but there are now very many variants and simply
passing analysis without setting *VECTYPE will ICE duing SLP
discovery (noticed with IFN_SCATTER_STORE).  The following
properly uses internal_store_fn_p.  I also noticed we're
unnecessarily handing those again to determine the scalar type
but there should always be a data reference for them.

* tree-vect-stmts.cc (vect_get_vector_types_for_stmt):
Handle all internal_store_fn_p the same.  Remove special-casing
for the scalar_type of IFN_MASK_STORE.

i386: Support partial vectorized V2BF/V4BF smaxmin

This patch supports sminmax for partial vectorized V2BF/V4BF.

gcc/ChangeLog:

* config/i386/mmx.md (<code><mode>3): New define_expand for V2BF/V4BFsmaxmin

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-partial-bf-vector-smaxmin-1.c: New test.

i386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt

This patch introduces new mode iterators and expands for the i386 architecture to support partial vectorization of bf16 operations using AVX10.2 instructions.

gcc/ChangeLog:

* config/i386/mmx.md (VBF_32_64): New mode iterator for partial vectorized V2BF/V4BF.
(<insn><mode>3): New define_expand for plusminusmultdiv.
(sqrt<mode>2): New define_expand for sqrt.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c: New test.
* gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c: New test.

RISC-V: Support form 1 of integer scalar .SAT_ADD

This patch would like to support the scalar signed ssadd pattern
for the RISC-V backend.  Aka

Form 1:
  #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
  T __attribute__((noinline))                  \
  sat_s_add_##T##_fmt_1 (T x, T y)             \
  {                                            \
    T sum = (UT)x + (UT)y;                     \
    return (x ^ y) < 0                         \
      ? sum                                    \
      : (sum ^ x) >= 0                         \
        ? sum                                  \
        : x < 0 ? MIN : MAX;                   \
  }

DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)

Before this patch:
  10   │ sat_s_add_int64_t_fmt_1:
  11   │     mv   a5,a0
  12   │     add  a0,a0,a1
  13   │     xor  a1,a5,a1
  14   │     not  a1,a1
  15   │     xor  a4,a5,a0
  16   │     and  a1,a1,a4
  17   │     blt  a1,zero,.L5
  18   │     ret
  19   │ .L5:
  20   │     srai a5,a5,63
  21   │     li   a0,-1
  22   │     srli a0,a0,1
  23   │     xor  a0,a5,a0
  24   │     ret

After this patch:
  10   │ sat_s_add_int64_t_fmt_1:
  11   │     add  a2,a0,a1
  12   │     xor  a1,a0,a1
  13   │     xor  a5,a0,a2
  14   │     srli a5,a5,63
  15   │     srli a1,a1,63
  16   │     xori a1,a1,1
  17   │     and  a5,a5,a1
  18   │     srai a4,a0,63
  19   │     li   a3,-1
  20   │     srli a3,a3,1
  21   │     xor  a3,a3,a4
  22   │     neg  a4,a5
  23   │     and  a3,a3,a4
  24   │     addi a5,a5,-1
  25   │     and  a0,a2,a5
  26   │     or   a0,a0,a3
  27   │     ret

The below test suites are passed for this patch:
1. The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_ssadd): Add new func
decl for expanding ssadd.
* config/riscv/riscv.cc (riscv_gen_sign_max_cst): Add new func
impl to gen the max int rtx.
(riscv_expand_ssadd): Add new func impl to expand the ssadd.
* config/riscv/riscv.md (ssadd<mode>3): Add new pattern for
signed integer .SAT_ADD.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_arith_data.h: Add test data.
* gcc.target/riscv/sat_s_add-1.c: New test.
* gcc.target/riscv/sat_s_add-2.c: New test.
* gcc.target/riscv/sat_s_add-3.c: New test.
* gcc.target/riscv/sat_s_add-4.c: New test.
* gcc.target/riscv/sat_s_add-run-1.c: New test.
* gcc.target/riscv/sat_s_add-run-2.c: New test.
* gcc.target/riscv/sat_s_add-run-3.c: New test.
* gcc.target/riscv/sat_s_add-run-4.c: New test.
* gcc.target/riscv/scalar_sat_binary_run_xxx.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

MIPS: Support vector reduc for MSA

We have SHF.fmt and HADD_S/U.fmt with MSA, which can be used for
vector reduc.

For min/max for U8/S8, we can
SHF.B W1, W0, 0xb1  # swap byte inner every half
MIN.B W1, W1, W0
SHF.H W2, W1, 0xb1  # swap half inner every word
MIN.B W2, W2, W1
SHF.W W3, W2, 0xb1  # swap word inner every doubleword
MIN.B W4, W3, W2
SHF.W W4, W4, 0x4e  # swap the two doubleword
MIN.B W4, W4, W3

For plus of S8/U8, we can use HADD
HADD.H W0, W0, W0
HADD.W W0, W0, W0
HADD.D W0, W0, W0
SHF.W W1, W0, 0x4e  # swap the two doubleword
ADDV.D W1, W1, W0
COPY_S.B  T0, W1      # COPY_U.B for U8

We can do similar for S16/U16/S32/U32/S64/U64/FLOAT/DOUBLE.

gcc

* config/mips/mips-msa.md: (MSA_NO_HADD): we have HADD for
S8/U8/S16/U16/S32/U32 only.
(reduc_smin_scal_<mode>): New define pattern.
(reduc_smax_scal_<mode>): Ditto.
(reduc_umin_scal_<mode>): Ditto.
(reduc_umax_scal_<mode>): Ditto.
(reduc_plus_scal_<mode>): Ditto.
(reduc_plus_scal_v4si): Ditto.
(reduc_plus_scal_v8hi): Ditto.
(reduc_plus_scal_v16qi): Ditto.
(reduc_<optab>_scal_<mode>): Ditto.
* config/mips/mips-protos.h: New function mips_expand_msa_reduc.
* config/mips/mips.cc: New function mips_expand_msa_reduc.
* config/mips/mips.md: Define any_bitwise iterator.

gcc/testsuite:

* gcc.target/mips/msa-reduc.c: New tests.

testsuite: Fix optimize_one.c FAIL on i686-linux

The test FAILs on i686-linux because -mfpmath=sse is used without
-msse2 being enabled.

2024-09-02 Jakub Jelinek <jakub@redhat.com>

* gcc.target/i386/optimize_one.c: Add -msse2 to dg-options.

[libstdc++-v3] [testsuite] improve future/*/poll.cc calibration

30_threads/future/members/poll.cc has calibration code that, on
systems with very low clock resolution, may spuriously fail to run.
Even when it does run, low resolution and reasonable
timeouts limit severely the viability of increasing the loop counts so
as to reduce measurement noise, so we end up with very noisy results.

On various vxworks targets, high iteration count (low-noise)
measurements confirmed that some of the operations that we expected to
be up to 100x slower than the fastest ones can run a little slower
than that and, with significant noise, may seem to be even slower,
comparatively.

Bump the factors up to 200x, so that we have plenty of margin over
measured results.

for  libstdc++-v3/ChangeLog

* testsuite/30_threads/future/members/poll.cc: Factor out
calibration, and run it unconditionally.  Lower its
strictness.  Bump wait_until_*'s slowness factor.