git.ipfire.org Git - thirdparty/gcc.git/log

RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

Fixes: 3496ca4e6566 ("RISC-V: Add runtime invariant support")
riscv_promote_function_mode doesn't promote a SI to DI for libcalls
case. It intends to do that however the code is broken (regression).

The fix is what generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping thru the
debugger shows old code didn't and fixed does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
returned for libcall case.

Tested-by: Patrick O'Neill <patrick@rivosinc.com> # pre-commit-CI #526
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

c: Add Walloc-size to warn about insufficient size in allocations [PR71219]

Add option Walloc-size that warns about allocations that have
insufficient storage for the target type of the pointer the
storage is assigned to. Added to Wextra.

PR c/71219
gcc:
* doc/invoke.texi: Document -Walloc-size option.

gcc/c-family:

* c.opt (Walloc-size): New option.

gcc/c:
* c-typeck.cc (convert_for_assignment): Add warning.

gcc/testsuite:

* gcc.dg/Walloc-size-1.c: New test.
* gcc.dg/Walloc-size-2.c: New test.

Make genautomata.cc output reflect insn-attr.h expectation

genautomata was writing the insn_has_dfa_reservation_p function
inside of the CPU_UNITS_QUERY conditional when it shouldn't have.
Move insn_has_dfa_reservation_p outside of conditional group.

gcc/ChangeLog:

* genautomata.cc (write_automata): move endif

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

omp: Reorder call for TARGET_SIMD_CLONE_ADJUST

This patch moves the call to TARGET_SIMD_CLONE_ADJUST until after the arguments
and return types have been transformed into vector types. It also constructs
the adjuments and retval modifications after this call, allowing targets to
alter the types of the arguments and return of the clone prior to the
modifications to the function definition.

gcc/ChangeLog:

* omp-simd-clone.cc (simd_clone_adjust_return_type): Hoist out code to
create return array and don't return new type.
(simd_clone_adjust_argument_types): Hoist out code that creates
ipa_param_body_adjustments and don't return them.
(simd_clone_adjust): Call TARGET_SIMD_CLONE_ADJUST after return and
argument types have been vectorized, create adjustments and return array
after the hook.
(expand_simd_clones): Call TARGET_SIMD_CLONE_ADJUST after return and
argument types have been vectorized.

i386: Fix stack protector peephole2 operand predicate [PR112332]

PR target/112332

gcc/ChangeLog:

* config/i386/i386.md (stack_protexct_set_2 peephole2):
Use general_gr_operand as operand 4 predicate.

i386: Improve stack protector patterns and peephole2s

Improve stack protector patterns and peephole2s to substitute stack
protector scratch register clear with unrelated subsequent register
initialization in several ways:

a. Explicitly generate scratch register as named pseudo. This allows
optimizers to eventually reuse the zero value in the register.

b. Allow scratch register in different mode (SWI48) than PTR mode:

 d000: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
 d007: 00 00
 d009: 48 89 44 24 08 mov %rax,0x8(%rsp)
 d00e: 8b 87 e0 01 00 00 mov 0x1e0(%rdi),%eax

 SImode moves on x86 zero-extend to the whole DImode register,
 so stack protector paranoia is not compromised.

c. Relax peephole2 constraint that stack protector scratch register
 must match new initialized register. This relaxation substantially
 improves peephole2 opportunities, and generates sequences like:

 a310: 65 4c 8b 34 25 28 00 mov %gs:0x28,%r14
 a317: 00 00
 a319: 4c 89 74 24 08 mov %r14,0x8(%rsp)
 a31e: 4c 8b b7 98 00 00 00 mov 0x98(%rdi),%r14

 We have to ensure the new scratch is dead in front of the sequence.

The patch also fixes omission of earlyclobbers for all alternatives of
new initialized register in *stack_protect_set_3, avoiding the need for
reg_overlap_mentioned_p constraint. Earlyclobbers are per alternative,
not per operand.

Also, instructions are already valid in peephole2 pass, so we don't
have to explicitly re-check their operands for validity.

gcc/ChangeLog:

* config/i386/i386.md (stack_protect_set): Explicitly
generate scratch register in word mode.
(@stack_protect_set_1_<mode>): Rename to ...
(@stack_protect_set_1_<PTR:mode>_<SWI48:mode>): ... this.
Use SWI48 mode iterator to match scratch register.
(stack_protexct_set_1 peephole2): Use PTR, W and SWI48 mode
iterators to match peephole sequence. Use general_operand
predicate for operand 4. Allow different operand 2 and operand 3
registers and use peep2_reg_dead_p to ensure new scratch
register is dead before peephole seqeunce. Use peep2_reg_dead_p
to ensure old scratch register is dead after peephole sequence.
(*stack_protect_set_2_<mode>): Rename to ...
(*stack_protect_set_2_<mode>_si): .. this.
(*stack_protect_set_3): Rename to ...
(*stack_protect_set_2_<mode>_di): ... this.
Use PTR mode iterator to match stack protector memory move.
Use earlyclobber for all alternatives of operand 1.
(stack_protexct_set_2 peephole2): Use PTR, W and SWI48 mode
iterators to match peephole sequence. Use general_operand
predicate for operand 4. Allow different operand 2 and operand 3
registers and use peep2_reg_dead_p to ensure new scratch
register is dead before peephole seqeunce. Use peep2_reg_dead_p
to ensure old scratch register is dead after peephole sequence.

PR modula2/102989: reimplement overflow detection in ztype though WIDE_INT_MAX_PRECISION

The ZTYPE in iso modula2 is used to denote intemediate ordinal type const
expressions and these are always converted into the
approriate language or user ordinal type prior to code generation.
The increase of bits supported by _BitInt causes the modula2 largeconst.mod
regression failure tests to pass. The largeconst.mod test has been
increased to fail, however the char at a time overflow check is now too slow
to detect failure. The overflow detection for the ZTYPE has been
rewritten to check against exceeding WIDE_INT_MAX_PRECISION (many orders of
magnitude faster).

gcc/m2/ChangeLog:

PR modula2/102989
* gm2-compiler/SymbolTable.mod (OverflowZType): Import from m2expr.
(ConstantStringExceedsZType): Remove import.
(GetConstLitType): Replace ConstantStringExceedsZType with OverflowZType.
* gm2-gcc/m2decl.cc (m2decl_ConstantStringExceedsZType): Remove.
(m2decl_BuildConstLiteralNumber): Re-write.
* gm2-gcc/m2decl.def (ConstantStringExceedsZType): Remove.
* gm2-gcc/m2decl.h (m2decl_ConstantStringExceedsZType): Remove.
* gm2-gcc/m2expr.cc (m2expr_StrToWideInt): Rewrite to check overflow.
(m2expr_OverflowZType): New function.
(ToWideInt): New function.
* gm2-gcc/m2expr.def (OverflowZType): New procedure function declaration.
* gm2-gcc/m2expr.h (m2expr_OverflowZType): New prototype.

gcc/testsuite/ChangeLog:

PR modula2/102989
* gm2/pim/fail/largeconst.mod: Updated foo to an outrageous value.
* gm2/pim/fail/largeconst2.mod: Duplicate test removed.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

RISC-V: Support vundefine intrinsics for tuple types

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/288

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-functions.def (vundefined): Add vundefine
intrinsics for tuple types.
* config/riscv/riscv-vector-builtins.cc: Ditto.
* config/riscv/vector.md (@vundefined<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/tuple_vundefined.c: New test.

NFC: Fix whitespace

Notice there is a whitspace issue in previous commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f66b2fc122b8a17591afbb881d580b32e8ddb708

Sorry for missing fixing this whitespace.

Committed as it is obvious.

gcc/ChangeLog:

* tree-vect-slp.cc (vect_build_slp_tree_1): Fix whitespace.

Daily bump.

analyzer: move class record_layout to its own .h/.cc

No functional change intended.

gcc/ChangeLog:
* Makefile.in (ANALYZER_OBJS): Add analyzer/record-layout.o.

gcc/analyzer/ChangeLog:
* record-layout.cc: New file, based on material in region-model.cc.
* record-layout.h: Likewise.
* region-model.cc: Include "analyzer/record-layout.h".
(class record_layout): Move to record-layout.cc and .h

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libcpp: eliminate MACRO_MAP_EXPANSION_POINT_LOCATION

This patch eliminates the function "MACRO_MAP_EXPANSION_POINT_LOCATION"
(which hasn't been a macro since r6-739-g0501dbd932a7e9) in favor of
a new line_map_macro::get_expansion_point_location accessor.

No functional change intended.

gcc/c-family/ChangeLog:
* c-warn.cc (warn_for_multistatement_macros): Update for removal
of MACRO_MAP_EXPANSION_POINT_LOCATION.

gcc/cp/ChangeLog:
* module.cc (ordinary_loc_of): Update for removal of
MACRO_MAP_EXPANSION_POINT_LOCATION.
(module_state::note_location): Update for renaming of field.
(module_state::write_macro_maps): Likewise.

gcc/ChangeLog:
* input.cc (dump_location_info): Update for removal of
MACRO_MAP_EXPANSION_POINT_LOCATION.
* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc):
Likewise.

libcpp/ChangeLog:
* include/line-map.h
(line_map_macro::get_expansion_point_location): New accessor.
(line_map_macro::expansion): Rename field to...
(line_map_macro::mexpansion): Rename field to...
(MACRO_MAP_EXPANSION_POINT_LOCATION): Delete this function.
* line-map.cc (linemap_enter_macro): Update for renaming of field.
(linemap_macro_map_loc_to_exp_point): Update for removal of
MACRO_MAP_EXPANSION_POINT_LOCATION.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

opts.cc: fix comment about DOCUMENTATION_ROOT_URL

gcc/ChangeLog:
* opts.cc (get_option_url): Update comment; the requirement to
pass DOCUMENTATION_ROOT_URL's value via -D was removed in
r10-8065-ge33a1eae25b8a8.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

pretty-print: gracefully handle null URLs

gcc/ChangeLog:
* pretty-print.cc (pretty_printer::pretty_printer): Initialize
m_skipping_null_url.
(pp_begin_url): Handle URL being null.
(pp_end_url): Likewise.
(selftest::test_null_urls): New.
(selftest::pretty_print_cc_tests): Call it.
* pretty-print.h (pretty_printer::m_skipping_null_url): New.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

VECT: Support SLP MASK_LEN_GATHER_LOAD with conditional mask

This patch leverage current MASK_GATHER_LOAD to support SLP MASK_LEN_GATHER_LOAD with condtional mask.

Unconditional MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1) SLP is not included in this patch
since it seems that we can't support it in the middle-end:

FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in vectorizable_load, at tree-vect-stmts.cc:9885)

May be we should support GATHER_LOAD explictily in RISC-V backend to walk around this issue.

I am gonna support GATHER_LOAD explictly work around in RISC-V backend.

This patch also adds conditional gather load test since there is no conditional gather load test.

Ok for trunk ?

gcc/ChangeLog:

* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
(vect_build_slp_tree_1): Ditto.
(vect_build_slp_tree_2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-gather-6.c: New test.

bpf: Improvements in CO-RE builtins implementation.

This patch moved the processing of attribute preserve_access_index to
its own independent pass in a gimple lowering pass.
This approach is more consistent with the implementation of the CO-RE
builtins when used explicitly in the code. The attributed type accesses
are now early converted to __builtin_core_reloc builtin instead of being
kept as an expression in code through out all of the middle-end.
This disables the compiler to optimize out or manipulate the expression
using the local defined type, instead of assuming nothing is known about
this expression, as it should be the case in all of the CO-RE
relocations.

In the process, also the __builtin_preserve_access_index has been
improved to generate code for more complex expressions that would
require more then one CO-RE relocation.
This turned out to be a requirement, since bpf-next selftests would rely on
loop unrolling in order to convert an undefined index array access into a
defined one. This seemed extreme to expect for the unroll to happen, and for
that reason GCC still generates correct code in such scenarios, even when index
access is never predictable or unrolling does not occur.

gcc/ChangeLog:
* config/bpf/bpf-passes.def (pass_lower_bpf_core): Added pass.
* config/bpf/bpf-protos.h: Added prototype for new pass.
* config/bpf/bpf.cc (bpf_delegitimize_address): New function.
* config/bpf/bpf.md (mov_reloc_core<MM:mode>): Prefixed
name with '*'.
* config/bpf/core-builtins.cc (cr_builtins) Added access_node to
struct.
(is_attr_preserve_access): Improved check.
(core_field_info): Make use of root_for_core_field_info
function.
(process_field_expr): Adapted to new functions.
(pack_type): Small improvement.
(bpf_handle_plugin_finish_type): Adapted to GTY(()).
(bpf_init_core_builtins): Changed to new function names.
(construct_builtin_core_reloc): Improved implementation.
(bpf_resolve_overloaded_core_builtin): Changed how
__builtin_preserve_access_index is converted.
(compute_field_expr): Corrected implementation. Added
access_node argument.
(bpf_core_get_index): Added valid argument.
(root_for_core_field_info, pack_field_expr)
(core_expr_with_field_expr_plus_base, make_core_safe_access_index)
(replace_core_access_index_comp_expr, maybe_get_base_for_field_expr)
(core_access_clean, core_is_access_index, core_mark_as_access_index)
(make_gimple_core_safe_access_index, execute_lower_bpf_core)
(make_pass_lower_bpf_core): Added functions.
(pass_data_lower_bpf_core): New pass struct.
(pass_lower_bpf_core): New gimple_opt_pass class.
(pack_field_expr_for_preserve_field)
(bpf_replace_core_move_operands): Removed function.
(bpf_enum_value_kind): Added GTY(()).
* config/bpf/core-builtins.h (bpf_field_info_kind, bpf_type_id_kind)
(bpf_type_info_kind, bpf_enum_value_kind): New enum.
* config/bpf/t-bpf: Added pass bpf-passes.def to PASSES_EXTRA.

gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-attr-5.c: New test.
* gcc.target/bpf/core-attr-6.c: New test.
* gcc.target/bpf/core-builtin-1.c: Corrected
* gcc.target/bpf/core-builtin-enumvalue-opt.c: Corrected regular
expression.
* gcc.target/bpf/core-builtin-enumvalue.c: Corrected regular
expression.
* gcc.target/bpf/core-builtin-exprlist-1.c: New test.
* gcc.target/bpf/core-builtin-exprlist-2.c: New test.
* gcc.target/bpf/core-builtin-exprlist-3.c: New test.
* gcc.target/bpf/core-builtin-exprlist-4.c: New test.
* gcc.target/bpf/core-builtin-fieldinfo-offset-1.c: Extra tests

gcc: config: microblaze: fix cpu version check

The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options. By simply changing the define to use strverscmp,
the new version 10.0 is treated correctly as a higher version
than previous versions.

gcc/ChangeLog:

* config/microblaze/microblaze.cc: Fix mcpu version check.

gcc/testsuite/ChangeLog:

* gcc.target/microblaze/isa/bshift.c: Bump to mcpu=v10.0.
* gcc.target/microblaze/isa/div.c: Ditto.
* gcc.target/microblaze/isa/fcmp1.c: Ditto.
* gcc.target/microblaze/isa/fcmp2.c: Ditto.
* gcc.target/microblaze/isa/fcmp3.c: Ditto.
* gcc.target/microblaze/isa/fcmp4.c: Ditto.
* gcc.target/microblaze/isa/fcvt.c: Ditto.
* gcc.target/microblaze/isa/float.c: Ditto.
* gcc.target/microblaze/isa/fsqrt.c: Ditto.
* gcc.target/microblaze/isa/mul-bshift-pcmp.c: Ditto.
* gcc.target/microblaze/isa/mul-bshift.c: Ditto.
* gcc.target/microblaze/isa/mul.c: Ditto.
* gcc.target/microblaze/isa/mulh-bshift-pcmp.c: Ditto.
* gcc.target/microblaze/isa/mulh.c: Ditto.
* gcc.target/microblaze/isa/nofcmp.c: Ditto.
* gcc.target/microblaze/isa/nofloat.c: Ditto.
* gcc.target/microblaze/isa/pcmp.c: Ditto.
* gcc.target/microblaze/isa/vanilla.c: Ditto.
* gcc.target/microblaze/microblaze.exp: Ditto.

Signed-off-by: Neal Frager <neal.frager@amd.com>
Signed-off-by: Michael J. Eager <eager@eagercon.com>

RISC-V: Require a extension for testcases with atomic insns

Add testsuite infrastructure for the A extension and use it to require the A
extension for dg-do run and add the add extension for non-A dg-do compile.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: Add A extension to
dg-options for dg-do compile.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: Ditto.
* gcc.target/riscv/inline-atomics-2.c: Ditto.
* gcc.target/riscv/inline-atomics-3.c: Require A extension for dg-do
run.
* gcc.target/riscv/inline-atomics-4.c: Ditto.
* gcc.target/riscv/inline-atomics-5.c: Ditto.
* gcc.target/riscv/inline-atomics-6.c: Ditto.
* gcc.target/riscv/inline-atomics-7.c: Ditto.
* gcc.target/riscv/inline-atomics-8.c: Ditto.
* lib/target-supports.exp: Add testing infrastructure to require the A
extension or add it to an existing -march.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Let non-atomic targets use optimized amo loads/stores

Non-atomic targets are currently prevented from using the optimized fencing for
seq_cst load/seq_cst store. This patch removes that constraint.

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md (atomic_load_rvwmo<mode>): Remove
TARGET_ATOMIC constraint
(atomic_store_rvwmo<mode>): Ditto.
* config/riscv/sync-ztso.md (atomic_load_ztso<mode>): Ditto.
(atomic_store_ztso<mode>): Ditto.
* config/riscv/sync.md (atomic_load<mode>): Ditto.
(atomic_store<mode>): Ditto.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

riscv: thead: Add support for the XTheadFMemIdx ISA extension

The XTheadFMemIdx ISA extension provides additional load and store
instructions for floating-point registers with new addressing modes.

The following memory accesses types are supported:
* load/store: [w,d] (single-precision FP, double-precision FP)

The following addressing modes are supported:
* register offset with additional immediate offset (4 instructions):
flr<type>, fsr<type>
* zero-extended register offset with additional immediate offset
(4 instructions): flur<type>, fsur<type>

These addressing modes are also part of the similar XTheadMemIdx
ISA extension support, whose code is reused and extended to support
floating-point registers.

One challenge that this patch needs to solve are GP registers in FP-mode
(e.g. "(reg:DF a2)"), which cannot be handled by the XTheadFMemIdx
instructions. Such registers are the result of independent
optimizations, which can happen after register allocation.
This patch uses a simple but efficient method to address this:
add a dependency for XTheadMemIdx to XTheadFMemIdx optimizations.
This allows to use the instructions from XTheadMemIdx in case
of such registers.

The added tests ensure that this feature won't regress without notice.
Testing: GCC regression test suite and SPEC CPU 2017 intrate (base&peak).

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_index_reg_class):
Return GR_REGS for XTheadFMemIdx.
(riscv_regno_ok_for_index_p): Add support for XTheadFMemIdx.
* config/riscv/riscv.h (HARDFP_REG_P): New macro.
* config/riscv/thead.cc (is_fmemidx_mode): New function.
(th_memidx_classify_address_index): Add support for XTheadFMemIdx.
(th_fmemidx_output_index): New function.
(th_output_move): Add support for XTheadFMemIdx.
* config/riscv/thead.md (TH_M_ANYF): New mode iterator.
(TH_M_NOEXTF): Likewise.
(*th_fmemidx_movsf_hardfloat): New INSN.
(*th_fmemidx_movdf_hardfloat_rv64): Likewise.
(*th_fmemidx_I_a): Likewise.
(*th_fmemidx_I_c): Likewise.
(*th_fmemidx_US_a): Likewise.
(*th_fmemidx_US_c): Likewise.
(*th_fmemidx_UZ_a): Likewise.
(*th_fmemidx_UZ_c): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmemidx-index-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-index-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-index-xtheadbb.c: New test.
* gcc.target/riscv/xtheadfmemidx-index.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

riscv: thead: Add support for the XTheadMemIdx ISA extension

The XTheadMemIdx ISA extension provides a additional load and store
instructions with new addressing modes.

The following memory accesses types are supported:
* load: b,bu,h,hu,w,wu,d
* store: b,h,w,d

The following addressing modes are supported:
* immediate offset with PRE_MODIFY or POST_MODIFY (22 instructions):
 l<ltype>.ia, l<ltype>.ib, s<stype>.ia, s<stype>.ib
* register offset with additional immediate offset (11 instructions):
 lr<ltype>, sr<stype>
* zero-extended register offset with additional immediate offset
 (11 instructions): lur<ltype>, sur<stype>

The RISC-V base ISA does not support index registers, so the changes
are kept separate from the RISC-V standard support as much as possible.

To combine the shift/multiply instructions into the memory access
instructions, this patch comes with a few insn_and_split optimizations
that allow the combiner to do this task.

Handling the different cases of extensions results in a couple of INSNs
that look redundant on first view, but they are just the equivalence
of what we already have for Zbb as well. The only difference is, that
we have much more load instructions.

We already have a constraint with the name 'th_f_fmv', therefore,
the new constraints follow this pattern and have the same length
as required ('th_m_mia', 'th_m_mib', 'th_m_mir', 'th_m_miu').

The added tests ensure that this feature won't regress without notice.
Testing: GCC regression test suite, GCC bootstrap build, and
SPEC CPU 2017 intrate (base&peak) on C920.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/constraints.md (th_m_mia): New constraint.
(th_m_mib): Likewise.
(th_m_mir): Likewise.
(th_m_miu): Likewise.
* config/riscv/riscv-protos.h (enum riscv_address_type):
Add new address types ADDRESS_REG_REG, ADDRESS_REG_UREG,
and ADDRESS_REG_WB and their documentation.
(struct riscv_address_info): Add new field 'shift' and
document the field usage for the new address types.
(riscv_valid_base_register_p): New prototype.
(th_memidx_legitimate_modify_p): Likewise.
(th_memidx_legitimate_index_p): Likewise.
(th_classify_address): Likewise.
(th_output_move): Likewise.
(th_print_operand_address): Likewise.
* config/riscv/riscv.cc (riscv_index_reg_class):
Return GR_REGS for XTheadMemIdx.
(riscv_regno_ok_for_index_p): Add support for XTheadMemIdx.
(riscv_classify_address): Call th_classify_address() on top.
(riscv_output_move): Call th_output_move() on top.
(riscv_print_operand_address): Call th_print_operand_address()
on top.
* config/riscv/riscv.h (HAVE_POST_MODIFY_DISP): New macro.
(HAVE_PRE_MODIFY_DISP): Likewise.
* config/riscv/riscv.md (zero_extendqi<SUPERQI:mode>2): Disable
for XTheadMemIdx.
(*zero_extendqi<SUPERQI:mode>2_internal): Convert to expand,
create INSN with same name and disable it for XTheadMemIdx.
(extendsidi2): Likewise.
(*extendsidi2_internal): Disable for XTheadMemIdx.
* config/riscv/thead.cc (valid_signed_immediate): New helper
function.
(th_memidx_classify_address_modify): New function.
(th_memidx_legitimate_modify_p): Likewise.
(th_memidx_output_modify): Likewise.
(is_memidx_mode): Likewise.
(th_memidx_classify_address_index): Likewise.
(th_memidx_legitimate_index_p): Likewise.
(th_memidx_output_index): Likewise.
(th_classify_address): Likewise.
(th_output_move): Likewise.
(th_print_operand_address): Likewise.
* config/riscv/thead.md (*th_memidx_operand): New splitter.
(*th_memidx_zero_extendqi<SUPERQI:mode>2): New INSN.
(*th_memidx_extendsidi2): Likewise.
(*th_memidx_zero_extendsidi2): Likewise.
(*th_memidx_zero_extendhi<GPR:mode>2): Likewise.
(*th_memidx_extend<SHORT:mode><SUPERQI:mode>2): Likewise.
(*th_memidx_bb_zero_extendsidi2): Likewise.
(*th_memidx_bb_zero_extendhi<GPR:mode>2): Likewise.
(*th_memidx_bb_extendhi<GPR:mode>2): Likewise.
(*th_memidx_bb_extendqi<SUPERQI:mode>2): Likewise.
(TH_M_ANYI): New mode iterator.
(TH_M_NOEXTI): Likewise.
(*th_memidx_I_a): New combiner optimization.
(*th_memidx_I_b): Likewise.
(*th_memidx_I_c): Likewise.
(*th_memidx_US_a): Likewise.
(*th_memidx_US_b): Likewise.
(*th_memidx_US_c): Likewise.
(*th_memidx_UZ_a): Likewise.
(*th_memidx_UZ_b): Likewise.
(*th_memidx_UZ_c): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadmemidx-helpers.h: New test.
* gcc.target/riscv/xtheadmemidx-index-update.c: New test.
* gcc.target/riscv/xtheadmemidx-index-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadmemidx-index-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-index.c: New test.
* gcc.target/riscv/xtheadmemidx-modify-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-modify.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex-update.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex.c: New test.

rs6000, Add missing overloaded bcd builtin tests, documentation

Currently we have the documentation for __builtin_vec_bcdsub_{eq,gt,lt} but
not for __builtin_bcdsub_{gl}e, this patch is to supplement the descriptions
for them. Although they are mainly for __builtin_bcdcmp{ge,le}, we already
have some testing coverage for __builtin_vec_bcdsub_{eq,gt,lt}, this patch
adds the corresponding explicit test cases as well.

gcc/ChangeLog:
* doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge): Add
documentation for the builti-ins.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/bcd-3.c (do_sub_ge, do_suble): Add functions
to test builtins __builtin_bcdsub_ge and __builtin_bcdsub_le.

gcc: config: microblaze: fix cpu version check

The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options. By simply changing the define to use strverscmp,
the new version 10.0 is treated correctly as a higher version
than previous versions.

Fix incorrect warning with -mcpu=10.0:
 warning: '-mxl-multiply-high' can be used only with
 '-mcpu=v6.00.a' or greater

Signed-off-by: Neal Frager <neal.frager@amd.com>
Signed-off-by: Michael J. Eager <eager@eagercon.com>

[RA]: Fixing LRA cycling for multi-reg variable containing a fixed reg

PR111971 test case uses a multi-reg variable containing a fixed reg. LRA
rejects such multi-reg because of this when matching the constraint for
an asm insn. The rejection results in LRA cycling. The patch fixes this issue.

gcc/ChangeLog:

PR rtl-optimization/111971
* lra-constraints.cc: (process_alt_operands): Don't check start
hard regs for regs originated from register variables.

gcc/testsuite/ChangeLog:

PR rtl-optimization/111971
* gcc.target/powerpc/pr111971.c: New test.

Add OpenACC 'acc_map_data' variant to 'libgomp.oacc-c-c++-common/deep-copy-8.c'

libgomp/
* testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c: Add OpenACC
'acc_map_data' variant.

RISC-V: Add vector fmin/fmax expanders.

This patch adds expanders for fmin and fmax. As per RISC-V V Spec 1.0
vfmin/vfmax are IEEE 754-2019 compliant which differs from IEEE 754-2008
that fmin/fmax require (particularly in the signaling-NaN handling).
Therefore the pattern conditions include a !HONOR_SNANS.

gcc/ChangeLog:

* config/riscv/autovec.md (<ieee_fmaxmin_op><mode>3): fmax/fmin
expanders.
(cond_<ieee_fmaxmin_op><mode>): Ditto.
(cond_len_<ieee_fmaxmin_op><mode>): Ditto.
(reduc_fmax_scal_<mode>): Ditto.
(reduc_fmin_scal_<mode>): Ditto.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add fmin/fmax.
* config/riscv/vector-iterators.md (fmin): New UNSPEC.
(UNSPEC_VFMIN): Ditto.
* config/riscv/vector.md (@pred_<ieee_fmaxmin_op><mode>): Add
UNSPEC insn patterns.
(@pred_<ieee_fmaxmin_op><mode>_scalar): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Remove
-ffast-math.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/fmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_zvfh_run-10.c: New test.

genemit: Split insn-emit.cc into several partitions.

On riscv insn-emit.cc has grown to over 1.2 mio lines of code and
compiling it takes considerable time.
Therefore, this patch adjust genemit to create several partitions
(insn-emit-1.cc to insn-emit-n.cc). The available patterns are
written to the given files in a sequential fashion.

Similar to match.pd a configure option --with-emitinsn-partitions=num
is introduced that makes the number of partition configurable.

gcc/ChangeLog:

PR bootstrap/84402
PR target/111600

* Makefile.in: Handle split insn-emit.cc.
* configure: Regenerate.
* configure.ac: Add --with-insnemit-partitions.
* genemit.cc (output_peephole2_scratches): Print to file instead
of stdout.
(print_code): Ditto.
(gen_rtx_scratch): Ditto.
(gen_exp): Ditto.
(gen_emit_seq): Ditto.
(emit_c_code): Ditto.
(gen_insn): Ditto.
(gen_expand): Ditto.
(gen_split): Ditto.
(output_add_clobbers): Ditto.
(output_added_clobbers_hard_reg_p): Ditto.
(print_overload_arguments): Ditto.
(print_overload_test): Ditto.
(handle_overloaded_code_for): Ditto.
(handle_overloaded_gen): Ditto.
(print_header): New function.
(handle_arg): New function.
(main): Split output into 10 files.
* gensupport.cc (count_patterns): New function.
* gensupport.h (count_patterns): Define.
* read-md.cc (md_reader::print_md_ptr_loc): Add file argument.
* read-md.h (class md_reader): Change definition.

hardcfr: support checking at abnormal edges [PR111943]

Control flow redundancy may choose abnormal edges for early checking,
but that breaks because we can't insert checks on such edges.

Introduce conditional checking on the dest block of abnormal edges,
and leave it for the optimizer to drop the conditional.

for  gcc/ChangeLog

PR tree-optimization/111943
* gimple-harden-control-flow.cc: Adjust copyright year.
(rt_bb_visited): Add vfalse and vtrue data members.
Zero-initialize them in the ctor.
(rt_bb_visited::insert_exit_check_on_edge): Upon encountering
abnormal edges, insert initializers for vfalse and vtrue on
entry, and insert the check sequence guarded by a conditional
in the dest block.

for  libgcc/ChangeLog

* hardcfr.c: Adjust copyright year.

for  gcc/testsuite/ChangeLog

PR tree-optimization/111943
* gcc.dg/harden-cfr-pr111943.c: New.

tree-optimization/112305 - SCEV cprop and conditional undefined overflow

The following adjusts final value replacement to also rewrite the
replacement to defined overflow behavior if there's conditionally
evaluated stmts (with possibly undefined overflow), not only when
we "folded casts". The patch hooks into expression_expensive for
this.

PR tree-optimization/112305
* tree-scalar-evolution.h (expression_expensive): Adjust.
* tree-scalar-evolution.cc (expression_expensive): Record
when we see a COND_EXPR.
(final_value_replacement_loop): When the replacement contains
a COND_EXPR, rewrite it to defined overflow.
* tree-ssa-loop-ivopts.cc (may_eliminate_iv): Adjust.

* gcc.dg/torture/pr112305.c: New testcase.

d: Clean-up unused variable assignments after interface change

The lowering done for invoking `new' on a single dimension array was
moved from the code generator to the front-end semantic pass in
r14-4996. This removes the detritus left behind in the code generator
from that deletion.

gcc/d/ChangeLog:

* expr.cc (ExprVisitor::visit (NewExp *)): Remove unused assignments.

LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

Now loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
building a cross compiler if the cross assembler is not installed yet.

gcc/ChangeLog:

PR target/112299
* config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
if not defined yet.

RISC-V: Add assert of the number of vmerge in autovec cond testcases

This patch adds more asserts about the vmerge insns which is intended
to ensure better performance for cond autovec.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_arith-1.c: Add vmerge assert.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-10.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-11.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-5.c: New test.

match.pd: Support combine cond_len_op + vec_cond similar to cond_op

This patch adds combine cond_len_op and vec_cond to cond_len_op like
cond_op.

Consider this code (RISC-V target):
 void
 foo (uint8_t *__restrict x, uint8_t *__restrict y, uint8_t *__restrict z,
 uint8_t *__restrict pred, uint8_t *__restrict merged, int n)
 {
 for (int i = 0; i < n; ++i)
 x[i] = pred[i] != 1 ? y[i] / z[i] : merged[i];
 }

Before this patch:
 ...
 vect_iftmp.18_71 = .COND_LEN_DIV (mask__31.11_61, vect__5.14_65, vect__7.17_69, { 0, ... }, _86, 0);
 vect_iftmp.23_78 = .VCOND_MASK (mask__31.11_61, vect_iftmp.18_71, vect_iftmp.22_77);
 ...

After this patch:
 ...
 _30 = .COND_LEN_DIV (mask__31.16_61, vect__5.19_65, vect__7.22_69, vect_iftmp.27_77, _85, 0);
 ...

gcc/ChangeLog:

* gimple-match.h (gimple_match_op::gimple_match_op):
Add interfaces for more arguments.
(gimple_match_op::set_op): Add interfaces for more arguments.
* match.pd: Add support of combining cond_len_op + vec_cond

Fix incorrect option mask and avx512cd target push

gcc/ChangeLog:

* config/i386/avx512cdintrin.h (target): Push evex512 for
avx512cd.
* config/i386/avx512vlintrin.h (target): Split avx512cdvl part
out from avx512vl.
* config/i386/i386-builtin.def (BDESC): Do not check evex512
for builtins not needed.

RISC-V: Add the missed combine of [u]int64 -> _Float16 and vcond

Hi,

This patch let the INT64 to FP16 convert split to two small converts
(INT64 -> FP32 and FP32 -> FP16) when expanding instead of dealy the
split to split1 pass. This change could make it possible to combine
the FP32 to FP16 and vcond patterns and so we don't need to add an
combine pattern for INT64 to FP16 and vcond patterns.

Consider this code:
 void
 foo (_Float16 *__restrict r, int64_t *__restrict a, _FLoat16 *__restrict b,
 int64_t *__restrict pred, int n)
 {
 for (int i = 0; i < n; i += 1)
 {
 r[i] = pred[i] ? (_Float16) a[i] : b[i];
 }
 }

Before this patch:
 ...
 vfncvt.f.f.w v2,v2
 vmerge.vvm v1,v1,v2,v0
 vse16.v v1,0(a0)
 ...

After this patch:
 ...
 vfncvt.f.f.w v1,v2,v0.t
 vse16.v v1,0(a0)
 ...

gcc/ChangeLog:

* config/riscv/autovec.md (<float_cvt><mode><vnnconvert>2):
Change to define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c:
Add vfncvt.f.f.w assert.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c:
Ditto.

Fix wrong code due to incorrect define_split

-(define_split
- [(set (match_operand:V2HI 0 "register_operand")
- (eq:V2HI
- (eq:V2HI
- (us_minus:V2HI
- (match_operand:V2HI 1 "register_operand")
- (match_operand:V2HI 2 "register_operand"))
- (match_operand:V2HI 3 "const0_operand"))
- (match_operand:V2HI 4 "const0_operand")))]
- "TARGET_SSE4_1"
- [(set (match_dup 0)
- (umin:V2HI (match_dup 1) (match_dup 2)))
- (set (match_dup 0)
- (eq:V2HI (match_dup 0) (match_dup 2)))])

the splitter is wrong when op1 == op2.(the original pattern returns 0, after split, it returns 1)
So remove the splitter.

Also extend another define_split to define_insn_and_split to handle
below pattern

494(set (reg:V4QI 112)
495 (unspec:V4QI [
496 (subreg:V4QI (reg:V2HF 111 [ bf ]) 0)
497 (subreg:V4QI (reg:V2HF 110 [ af ]) 0)
498 (subreg:V4QI (eq:V2HI (eq:V2HI (reg:V2HI 105)
499 (const_vector:V2HI [
500 (const_int 0 [0]) repeated x2
501 ]))
502 (const_vector:V2HI [
503 (const_int 0 [0]) repeated x2
504 ])) 0)
505 ] UNSPEC_BLENDV))

define_split doesn't work since pass_combine assume it produces at
most 2 insns after split, but here it produces 3 since we need to move
const0_rtx (V2HImode) to reg. The move insn can be eliminated later.

gcc/ChangeLog:

PR target/112276
* config/i386/mmx.md (*mmx_pblendvb_v8qi_1): Change
define_split to define_insn_and_split to handle
immediate_operand for comparison.
(*mmx_pblendvb_v8qi_2): Ditto.
(*mmx_pblendvb_<mode>_1): Ditto.
(*mmx_pblendvb_v4qi_2): Ditto.
(<code><mode>3): Remove define_split after it.
(<code>v8qi3): Ditto.
(<code><mode>3): Ditto.
(<ode>v2hi3): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/part-vect-vcondhf.C: Adjust testcase.
* gcc.target/i386/pr112276.c: New test.

MATCH: Add some more value_replacement simplifications to match

This moves a few more value_replacements simplifications to match.
/* a == 1 ? b : a * b -> a * b */
/* a == 1 ? b : b / a -> b / a */
/* a == -1 ? b : a & b -> a & b */

Also adds a testcase to show can we catch these where value_replacement would not
(but other passes would).

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (`a == 1 ? b : a OP b`): New pattern.
(`a == -1 ? b : a & b`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-value-4.c: New test.

MATCH: first of the value replacement moving from phiopt

This moves a few simple patterns that are done in value replacement
in phiopt over to match.pd. Just the simple ones which might show up
in other code.

This allows some optimizations to happen even without depending
on sinking from happening and in some cases where phiopt is not
invoked (cond-1.c is an example there).

Changes since v1:
* v2: Add an extra testcase to showcase improvements at -O1.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd: (`a == 0 ? b : b + a`,
`a == 0 ? b : b - a`): New patterns.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cond-1.c: New test.
* gcc.dg/tree-ssa/phi-opt-value-1.c: New test.
* gcc.dg/tree-ssa/phi-opt-value-1a.c: New test.
* gcc.dg/tree-ssa/phi-opt-value-2.c: New test.

Daily bump.

i386: Zhaoxin yongfeng enablement

Enable -march/-mtune=yongfeng. Costs and tunings are set according
to the characteristics of the processor. Add a new .md file to describe
yongfeng processor.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize yongfeng.
* common/config/i386/i386-common.cc: Add yongfeng.
* common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
Add ZHAOXIN_FAM7H_YONGFENG.
* config.gcc: Add yongfeng.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Let -march=native recognize yongfeng processors.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add yongfeng.
* config/i386/i386-options.cc (m_YONGFENG): New definition.
(m_ZHAOXIN): Ditto.
* config/i386/i386.h (enum processor_type): Add PROCESSOR_YONGFENG.
* config/i386/i386.md: Add yongfeng.
* config/i386/lujiazui.md: Fix typo.
* config/i386/x86-tune-costs.h (struct processor_costs):
Add yongfeng costs.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add yongfeng.
(ix86_adjust_cost): Ditto.
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Replace
m_LUJIAZUI with m_ZHAOXIN.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_MOVX): Ditto.
(X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto.
(X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto.
(X86_TUNE_USE_LEAVE): Ditto.
(X86_TUNE_PUSH_MEMORY): Ditto.
(X86_TUNE_LCP_STALL): Ditto.
(X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
(X86_TUNE_OPT_AGU): Ditto.
(X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
(X86_TUNE_USE_SAHF): Ditto.
(X86_TUNE_USE_BT): Ditto.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
(X86_TUNE_ONE_IF_CONV_INSN): Ditto.
(X86_TUNE_AVOID_MFENCE): Ditto.
(X86_TUNE_EXPAND_ABS): Ditto.
(X86_TUNE_USE_SIMODE_FIOP): Ditto.
(X86_TUNE_USE_FFREEP): Ditto.
(X86_TUNE_EXT_80387_CONSTANTS): Ditto.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
(X86_TUNE_SSE_TYPELESS_STORES): Ditto.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
(X86_TUNE_USE_GATHER_2PARTS): Add m_YONGFENG.
(X86_TUNE_USE_GATHER_4PARTS): Ditto.
(X86_TUNE_USE_GATHER_8PARTS): Ditto.
(X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
* doc/extend.texi: Add details about yongfeng.
* doc/invoke.texi: Ditto.
* config/i386/yongfeng.md: New file to describe yongfeng processor.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv32.C: Handle new -march.
* gcc.target/i386/funcspec-56.inc: Ditto.

libstdc++: [_GLIBCXX_INLINE_VERSION] Add comment on emul TLS symbols

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu-versioned-namespace.ver: Add comment on recently
added emul TLS symbols.

libstdc++: [_GLIBCXX_INLINE_VERSION] Un-weak handle_contract_violation

libstdc++-v3/ChangeLog:

* src/experimental/contract.cc
[_GLIBCXX_INLINE_VERSION](handle_contract_violation): Rework comment.
Remove weak attribute.

configure, fixincludes: Add change missed in r14-4825.

This corrects an oversight in the r14-4825 commit.

fixincludes/ChangeLog:

* configure: Regenerate.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)

PR 111157 shows that IPA-modref and IPA-CP (when plugged into value
numbering) can optimize out a store both before a call (because the
call will overwrite it) and in the call (because the store is of the
same value) and by eliminating both create miscompilation.

This patch fixes that by pruning any constants from the list of IPA-CP
aggregate value constants that it knows the contents of the memory can
be "killed." Unfortunately, doing so is tricky. First, IPA-modref
loads override kills and so only stores not loaded are truly not
necessary. Looking stuff up there means doing what most of what
modref_may_alias may do but doing exactly what it does is tricky
because it takes also aliasing into account and has bail-out counters.

To err on the side of caution in order to avoid this miscompilation we
have to prune a constant when in doubt. However, pruning can
interfere with the mechanism of how clone materialization
distinguishes between the cases when a parameter was entirely removed
and when it was both IPA-CPed and IPA-SRAed (in order to make up for
the removal in debug info, which can bump into an assert when
compiling g++.dg/torture/pr103669.C when we are not careful).

Therefore this patch:

 1) marks constants that IPA-modref has in its kill list with a new
 "killed" flag, and
 2) prunes the list from entries with this flag after materialization
 and IPA-CP transformation is done using the template introduced in
 the previous patch

It does not try to look up anything in the load lists, this will be
done as a follow-up in order to ease review.

gcc/ChangeLog:

2023-10-27 Martin Jambor <mjambor@suse.cz>

PR ipa/111157
* ipa-prop.h (struct ipa_argagg_value): Newf flag killed.
* ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
(update_signature): Mark any any IPA-CP aggregate constants at
positions known to be killed as killed. Move check that there is
clone_info after this pruning.
* ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
(ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
(push_agg_values_from_plats): Likewise.
(ipa_push_agg_values_from_jfunc): Likewise.
(estimate_local_effects): Likewise.
(push_agg_values_for_index_from_edge): Likewise.
* ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
flag.
(read_ipcp_transformation_info): Likewise.
(ipcp_get_aggregate_const): Update comment, assert that encountered
record does not have killed flag set.
(ipcp_transform_function): Prune all aggregate constants with killed
set.

gcc/testsuite/ChangeLog:

2023-09-18 Martin Jambor <mjambor@suse.cz>

PR ipa/111157
* gcc.dg/lto/pr111157_0.c: New test.
* gcc.dg/lto/pr111157_1.c: Second file of the same new test.

ipa-cp: Templatize filtering of m_agg_values

PR 111157 points to another place where IPA-CP collected aggregate
compile-time constants need to be filtered, in addition to the one
place that already does this in ipa-sra. In order to re-use code,
this patch turns the common bit into a template.

The functionality is still covered by testcase gcc.dg/ipa/pr108959.c.

gcc/ChangeLog:

2023-09-13 Martin Jambor <mjambor@suse.cz>

PR ipa/111157
* ipa-prop.h (ipcp_transformation): New member function template
remove_argaggs_if.
* ipa-sra.cc (zap_useless_ipcp_results): Use remove_argaggs_if to
filter aggreagate constants.

RISC-V: Make rv32i_zcmp testcase more robust

GCC recently changed its register allocator which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any s register in the range of 1-9 for cm.push and cm.popret insns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32i_zcmp.c: Accept any register in the
range of 1-9 for cm.push and cm.popret insns.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

ARC: Convert (signed<<31)>>31 to -(signed&1) without barrel shifter.

This patch optimizes PR middle-end/101955 for the ARC backend. On ARC
CPUs with a barrel shifter, using two shifts is optimal as:

 asl_s r0,r0,31
 asr_s r0,r0,31

but without a barrel shifter, GCC -O2 -mcpu=em currently generates:

 and r2,r0,1
 ror r2,r2
 add.f 0,r2,r2
 sbc r0,r0,r0

with this patch, we now generate the smaller, faster and non-flags
clobbering:

 bmsk_s r0,r0,0
 neg_s r0,r0

2023-10-30 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
PR middle-end/101955
* config/arc/arc.md (*extvsi_1_0): New define_insn_and_split
to convert sign extract of the least significant bit into an
AND $1 then a NEG when !TARGET_BARREL_SHIFTER.

gcc/testsuite/ChangeLog
PR middle-end/101955
* gcc.target/arc/pr101955.c: New test case.

ARC: Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.

This patch overhauls the ARC backend's insn_cost target hook, and makes
some related improvements to rtx_costs, BRANCH_COST, etc. The primary
goal is to allow the backend to indicate that shifts and rotates are
slow (discouraged) when the CPU doesn't have a barrel shifter. I should
also acknowledge Richard Sandiford for inspiring the use of set_cost
in this rewrite of arc_insn_cost; this implementation borrows heavily
for the target hooks for AArch64 and ARM.

The motivating example is derived from PR rtl-optimization/110717.

struct S { int a : 5; };
unsigned int foo (struct S *p) {
 return p->a;
}

With a barrel shifter, GCC -O2 generates the reasonable:

foo: ldb_s r0,[r0]
 asl_s r0,r0,27
 j_s.d [blink]
 asr_s r0,r0,27

What's interesting is that during combine, the middle-end actually
has two shifts by three bits, and a sign-extension from QI to SI.

Trying 8, 9 -> 11:
 8: r158:SI=r157:QI#0<<0x3
 REG_DEAD r157:QI
 9: r159:SI=sign_extend(r158:SI#0)
 REG_DEAD r158:SI
 11: r155:SI=r159:SI>>0x3
 REG_DEAD r159:SI

Whilst it's reasonable to simplify this to two shifts by 27 bits when
the CPU has a barrel shifter, it's actually a significant pessimization
when these shifts are implemented by loops. This combination can be
prevented if the backend provides accurate-ish estimates for insn_cost.

Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:

foo: ldb_s r0,[r0]
 mov lp_count,27
 lp 2f
 add r0,r0,r0
 nop
2: # end single insn loop
 mov lp_count,27
 lp 2f
 asr r0,r0
 nop
2: # end single insn loop
 j_s [blink]

which contains two loops and requires about ~113 cycles to execute.
With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:

foo: ldb_s r0,[r0]
 mov_s r2,0 ;3
 add3 r0,r2,r0
 sexb_s r0,r0
 asr_s r0,r0
 asr_s r0,r0
 j_s.d [blink]
 asr_s r0,r0

which requires only ~6 cycles, for the shorter shifts by 3 and sign
extension.

2023-10-30 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
Provide reasonable values for SHIFTS and ROTATES by constant
bit counts depending upon TARGET_BARREL_SHIFTER.
(arc_insn_cost): Use insn attributes if the instruction is
recognized. Avoid calling get_attr_length for type "multi",
i.e. define_insn_and_split patterns without explicit type.
Fall-back to set_rtx_cost for single_set and pattern_cost
otherwise.
* config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
(BRANCH_COST): Improve/correct definition.
(LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.

ARC: Improved SImode shifts and rotates with -mswap.

This patch improves the code generated by the ARC back-end for CPUs
without a barrel shifter but with -mswap. The -mswap option provides
a SWAP instruction that implements SImode rotations by 16, but also
logical shift instructions (left and right) by 16 bits. Clearly these
are also useful building blocks for implementing shifts by 17, 18, etc.
which would otherwise require a loop.

As a representative example:
int shl20 (int x) { return x << 20; }

GCC with -O2 -mcpu=em -mswap would previously generate:

shl20: mov lp_count,10
 lp 2f
 add r0,r0,r0
 add r0,r0,r0
2: # end single insn loop
 j_s [blink]

with this patch we now generate:

shl20: mov_s r2,0 ;3
 lsl16 r0,r0
 add3 r0,r2,r0
 j_s.d [blink]
 asl_s r0,r0

Although both are four instructions (excluding the j_s),
the original takes ~22 cycles, and replacement ~4 cycles.

2023-10-30 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/arc/arc.cc (arc_split_ashl): Use lsl16 on TARGET_SWAP.
(arc_split_ashr): Use swap and sign-extend on TARGET_SWAP.
(arc_split_lshr): Use lsr16 on TARGET_SWAP.
(arc_split_rotl): Use swap on TARGET_SWAP.
(arc_split_rotr): Likewise.
* config/arc/arc.md (ANY_ROTATE): New code iterator.
(<ANY_ROTATE>si2_cnt16): New define_insn for alternate form of
swap instruction on TARGET_SWAP.
(ashlsi2_cnt16): Rename from *ashlsi16_cnt16 and move earlier.
(lshrsi2_cnt16): New define_insn for LSR16 instruction.
(*ashlsi2_cnt16): See above.

gcc/testsuite/ChangeLog
* gcc.target/arc/lsl16-1.c: New test case.
* gcc.target/arc/lsr16-1.c: Likewise.
* gcc.target/arc/swap-1.c: Likewise.
* gcc.target/arc/swap-2.c: Likewise.

arm: move the switch tables for Arm to the RO data section.

Follow up patch to arm: Use deltas for Arm switch tables
This patch moves the switch tables for Arm from the .text section
into the .rodata section.

gcc/ChangeLog:

* config/arm/aout.h: Change to use the Lrtx label.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove arm targets
from (!target_pure_code) condition.
(ADDR_VEC_ALIGN): Add align for tables in rodata section.
* config/arm/arm.cc (arm_output_casesi): Alter the function to include
.Lrtx label and remove adr instructions.
* config/arm/arm.md
(arm_casesi_internal): Use force_reg to generate ldr instructions that
would otherwise be out of range, and change rtl to accommodate force reg.
Additionally remove unnecessary register temp.
(casesi): Remove pure code check for Arm.
* config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Remove arm
targets from JUMP_TABLES_IN_TEXT_SECTION definition.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: Alter the tests to
change adr instruction to ldr.

Testsuite, i386: Mark test as requiring ifunc

Test is currently failing on x86_64-apple-darwin.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr105554.c: Require ifunc.

Testsuite, Darwin: Fix trampoline warning

Heap-based trampolines are enabled on darwin20 and later,
meaning that no warning is emitted.

gcc/testsuite/ChangeLog:

* gcc.dg/Wtrampolines.c: Skip on darwin20 and later.

Testsuite, i386: Fix test by passing -march

The test currently fails on Darwin, where the default arch is core2.

gcc/testsuite/ChangeLog:

PR target/112287
* gcc.target/i386/pr111698.c: Pass -march=sandybridge.

Testsuite, Darwin: skip PIE test

gcc/testsuite/ChangeLog:

* gcc.dg/pie-2.c: Skip test on darwin.

rs6000: Change bitwise xor to an equality operator [PR106907]

PR106907 has a few warnings spotted from cppcheck. These warnings
are related to the need of precedence clarification. Instead of using xor,
it has been changed to equality check, which achieves the same result.
Additionally, comment indentation has been fixed.

2023-10-11 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

gcc/
PR target/106907
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Change bitwise
xor to an equality and fix comment indentation.

PR testsuite/111462 - add powerpc64le to list of ssa-sink-18.c XFAIL

PR testsuite/111462
gcc/testsuite/
* gcc.dg/tree-ssa/ssa-sink-18.c: XFAIL also powerpc64le.

RISC-V: Fix bugs of handling scalar of SEW64 vx instruction in RV32

sew64_scalar_helper is handling SEW64 vx instruction pattern on RV32 system.
According to RVV ISA, we can directly use vx instruction of SEW64 on RV32 system
since RV32 GR reg is 32bit.

Consider this following case:

vsetvl e64m1
vadd.vx v,v,x

will be transform by sew64_scalar_helper:

vsetvl e64m1
sw
sw
vlse v
vadd.vv

This bug is reported by Robin.
(insn 143 179 230 9 (set (reg:SI 15 a5 [234])
        (unspec:SI [
                (const_int 64 [0x40])
            ] UNSPEC_VLMAX)) 751 {vlmax_avlsi}
     (expr_list:REG_EQUIV (unspec:SI [
                (const_int 64 [0x40])
            ] UNSPEC_VLMAX)
        (nil)))
(insn 230 143 78 9 (parallel [
            (set (reg:SI 66 vl)
                (unspec:SI [
                        (reg:SI 15 a5 [234])
                        (const_int 64 [0x40])
                        (const_int 0 [0])
                    ] UNSPEC_VSETVL))
            (set (reg:SI 67 vtype)
                (unspec:SI [
                        (const_int 64 [0x40])
                        (const_int 0 [0])
                        (const_int 1 [0x1]) repeated x2
                    ] UNSPEC_VSETVL))
        ]) "bug.c":14:14 discrim 1 1469 {vsetvl_discard_resultsi}
     (nil))
(insn 78 230 84 9 (set (reg:RVVM1DI 102 v6 [203])
        (if_then_else:RVVM1DI (unspec:RVVMF64BI [
                    (const_vector:RVVMF64BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 0 [0])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (vec_duplicate:RVVM1DI (mem/u/c:DI (reg/f:SI 29 t4 [230]) [0  S8 A64]))
            (unspec:RVVM1DI [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "bug.c":14:14 discrim 1 1872 {*pred_broadcastrvvm1di}
     (expr_list:REG_DEAD (reg/f:SI 29 t4 [230])
        (nil)))

The root cause of this is because we missed VLMAX handling since the codes was invented
long time ago (Callers always intrinsics codes, no VLMAX situation).

Now, all following bugs are fixed after this patch:

FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

gcc/ChangeLog:

* config/riscv/riscv-protos.h (sew64_scalar_helper): Fix bug.
* config/riscv/riscv-v.cc (sew64_scalar_helper): Ditto.
* config/riscv/vector.md: Ditto.

Fortran: Fix a problem with SELECT TYPE selectors [PR104555].

2023-10-30 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/104555
* resolve.cc (resolve_select_type): If the selector expression
has no class component references and the expression is a
derived type, copy the typespec of the symbol to that of the
expression.

gcc/testsuite/
PR fortran/104555
* gfortran.dg/pr104555.f90: New test.

Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.

When 2 vectors are equal, kmask is allones and kortest will set CF,
else CF will be cleared.

So CF bit can be used to check for the result of the comparison.

Before:
 vmovdqu (%rsi), %ymm0
 vpxorq (%rdi), %ymm0, %ymm0
 vptest %ymm0, %ymm0
 jne .L2
 vmovdqu 32(%rsi), %ymm0
 vpxorq 32(%rdi), %ymm0, %ymm0
 vptest %ymm0, %ymm0
 je .L5
.L2:
 movl $1, %eax
 xorl $1, %eax
 vzeroupper
 ret

After:
 vmovdqu64 (%rsi), %zmm0
 xorl %eax, %eax
 vpcmpeqd (%rdi), %zmm0, %k0
 kortestw %k0, %k0
 setc %al
 vzeroupper
 ret

gcc/ChangeLog:

PR target/104610
* config/i386/i386-expand.cc (ix86_expand_branch): Handle
512-bit vector with vpcmpeq + kortest.
* config/i386/i386.md (cbranchxi4): New expander.
* config/i386/sse.md: (cbranch<mode>4): Extend to V16SImode
and V8DImode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr104610-2.c: New test.

Expand: Checking available optabs for scalar modes in by pieces operations

The former patch (f08ca5903c7) examines the scalar modes by target
hook scalar_mode_supported_p. It causes some i386 regression cases
as XImode and OImode are not enabled in i386 target function. This
patch examines the scalar mode by checking if the corresponding optabs
are available for the mode.

gcc/
PR target/111449
* expr.cc (qi_vector_mode_supported_p): Rename to...
(by_pieces_mode_supported_p): ...this, and extends it to do
the checking for both scalar and vector mode.
(widest_fixed_size_mode_for_size): Call
by_pieces_mode_supported_p to examine the mode.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.

Daily bump.

libstdc++: [_GLIBCXX_INLINE_VERSION] Add emul TLS symbols

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu-versioned-namespace.ver: Add missing emul TLS
symbols.

libstdc++: [_GLIBCXX_INLINE_VERSION] Provide handle_contract_violation symbol

libstdc++-v3/ChangeLog:

* src/experimental/contract.cc
[_GLIBCXX_INLINE_VERSION](handle_contract_violation): Provide symbol
without version namespace decoration for gcc.

d: Fix ICE: verify_gimple_failed (conversion of register to a different size in 'view_convert_expr')

Static arrays in D are passed around by value, rather than decaying to a
pointer. On x86_64 __builtin_va_list is an exception to this rule, but
semantically it's still treated as a static array.

This makes certain assignment operations fail due a mismatch in types.
As all examples in the test program are rejected by C/C++ front-ends,
these are now errors in D too to be consistent.

PR d/110712

gcc/d/ChangeLog:

* d-codegen.cc (d_build_call): Update call to convert_for_argument.
* d-convert.cc (is_valist_parameter_type): New function.
(check_valist_conversion): New function.
(convert_for_assignment): Update signature. Add check whether
assigning va_list is permissible.
(convert_for_argument): Likewise.
* d-tree.h (convert_for_assignment): Update signature.
(convert_for_argument): Likewise.
* expr.cc (ExprVisitor::visit (AssignExp *)): Update call to
convert_for_assignment.

gcc/testsuite/ChangeLog:

* gdc.dg/pr110712.d: New test.

d: Merge upstream dmd, druntime e48bc0987d, phobos 2458e8f82.

D front-end changes:

    - Import dmd v2.106.0-beta.1.

D runtime changes:

    - Import druntime v2.106.0-beta.1.

Phobos changes:

    - Import phobos v2.106.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd e48bc0987d.
* expr.cc (ExprVisitor::visit (NewExp *)): Update for new front-end
interface.
* runtime.def (NEWARRAYT): Remove.
(NEWARRAYIT): Remove.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime e48bc0987d.
* src/MERGE: Merge upstream phobos 2458e8f82.

testsuite, X86, Darwin: Skip a test for mcmodel=large.

The large model is not implemented so far for Darwin (and the
codegen will be different when it is).

gcc/testsuite/ChangeLog:

* gcc.target/i386/large-data.c: Skip for Darwin.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

testsuite, X86, Darwin: Skip tests with incompatible output.

Darwin platforms do not currently emit .cfi_xxx instructions so that these
tests do not work there.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-interrupt-1.c: Skip for Darwin.
* gcc.target/i386/apx-push2pop2-1.c: Likewise.
* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

tree-optimization/109334: Improve computation for access attribute

The fix for PR104970 restricted size computations to the case
where the access attribute was specified explicitly (no VLA).
It also restricted it to void pointers or elements with constant
sizes. The second restriction is enough to fix the original bug.
Revert the first change to again allow size computations for VLA
parameters and for VLA parameters together with an explicit access
attribute.

gcc/ChangeLog:

PR tree-optimization/109334
* tree-object-size.cc (parm_object_size): Allow size
computation for implicit access attributes.

gcc/testsuite/ChangeLog:

PR tree-optimization/109334
* gcc.dg/builtin-dynamic-object-size-0.c
(test_parmsz_simple3): Supported again.
(test_parmsz_external4): New test.
* gcc.dg/builtin-dynamic-object-size-20.c: New test.
* gcc.dg/pr104970.c: New test.

gcc: xtensa: fix salt/saltu version check

gcc/
* config/xtensa/xtensa.h (TARGET_SALT): Change HW version from
260000 (which corresponds to RF-2014.0) to 270000 (which
corresponds to RG-2015.0, the release where salt/saltu opcodes
were introduced).

RISC-V: Fix one range-loop-construct warning of avlprop

This patch would like to fix one warning of avlprop as below.

../../gcc/config/riscv/riscv-avlprop.cc: In member function 'virtual
unsigned int pass_avlprop::execute(function*)':
../../gcc/config/riscv/riscv-avlprop.cc:346:23: error: loop variable
'candidate' creates a copy from type 'const std::pair<avlprop_type,
rtl_ssa::insn_info*>' [-Werror=range-loop-construct]
 346 | for (const auto candidate : m_candidates)
 | ^~~~~~~~~
../../gcc/config/riscv/riscv-avlprop.cc:346:23: note: use reference type
to prevent copying
 346 | for (const auto candidate : m_candidates)
 | ^~~~~~~~~
 | &

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Use
reference type to prevent copying.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

d: Fix ICE: in verify_gimple_in_seq on powerpc-darwin9 [PR112270]

This ICE was seen during stage2 on powerpc-darwin9 only. There were
still some uses of GCC's boolean_type_node in the D front-end, which
caused a type mismatch to trigger as D bool size is fixed to 1 byte on
all targets.

So two new nodes have been introduced - d_bool_false_node and
d_bool_true_node - which have replaced all remaining uses of
boolean_false_node and boolean_true_node respectively.

PR d/112270

gcc/d/ChangeLog:

* d-builtins.cc (d_build_d_type_nodes): Initialize d_bool_false_node,
d_bool_true_node.
* d-codegen.cc (build_array_struct_comparison): Use d_bool_false_node
instead of boolean_false_node.
* d-convert.cc (d_truthvalue_conversion): Use d_bool_false_node and
d_bool_true_node instead of boolean_false_node and boolean_true_node.
* d-tree.h (enum d_tree_index): Add DTI_BOOL_FALSE and DTI_BOOL_TRUE.
(d_bool_false_node): New macro.
(d_bool_true_node): New macro.
* modules.cc (build_dso_cdtor_fn): Use d_bool_false_node and
d_bool_true_node instead of boolean_false_node and boolean_true_node.
(register_moduleinfo): Use d_bool_type instead of boolean_type_node.

gcc/testsuite/ChangeLog:

* gdc.dg/pr112270.d: New test.

d: Add warning for call expression without side effects

In the last merge of the dmd front-end with upstream (r14-4830), this
warning got removed from the semantic passes. Reimplement the warning
for the code generation pass instead, where it cannot have an effect on
conditional compilation.

gcc/d/ChangeLog:

* d-codegen.cc (call_side_effect_free_p): New function.
* d-tree.h (CALL_EXPR_WARN_IF_UNUSED): New macro.
(call_side_effect_free_p): New prototype.
* expr.cc (ExprVisitor::visit (CallExp *)): Set
CALL_EXPR_WARN_IF_UNUSED on matched call expressions.
(ExprVisitor::visit (NewExp *)): Don't dereference the result of an
allocation call here.
* toir.cc (add_stmt): Emit warning when call expression added to
statement list without being used.

gcc/testsuite/ChangeLog:

* gdc.dg/Wunused_value.d: New test.

Daily bump.

[RA]: Fixing i686 bootstrap failure because of pushing the equivalence patch

GCC with my recent patch improving cost calculation for pseudos with
equivalence may generate different code with and without debug info
and as the result i686 bootstrap fails on i686. The patch fixes this
bug.

gcc/ChangeLog:

PR rtl-optimization/112107
* ira-costs.cc: (calculate_equiv_gains): Use NONDEBUG_INSN_P
instead of INSN_P.

RISC-V: Make stack_save_restore_2 more robust

GCC recently changed to emit __riscv_restore_5 which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any number after __riscv_save_ and __riscv_restore_.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/stack_save_restore_2.c: Accept any number
after __riscv_save_ and __riscv_restore_.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

PR modula2/112110: fails to build on freebsd when compiling wrapclock.cc

This patch fixes a mangled #if #endif conditional section within
wrapclock.cc. The conditional section in wrapclock_timezone
should return 0 rather than return timezone.

libgm2/ChangeLog:

PR modula2/112110
* libm2iso/wrapclock.cc (timezone): Return 0 if unable
to get the timezone from the tm struct.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Fortran: diagnostics of MODULE PROCEDURE declaration conflicts [PR104649]

gcc/fortran/ChangeLog:

PR fortran/104649
* decl.cc (gfc_match_formal_arglist): Handle conflicting declarations
of a MODULE PROCEDURE when one of the declarations is an alternate
return.

gcc/testsuite/ChangeLog:

PR fortran/104649
* gfortran.dg/pr104649.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>

amdgcn: Fix bug in gfx1030 support patch

The previous patch to add gfx1030 support introduced an issue with passing
exit codes from kernels run under gcn-run (offload kernels were unaffected).

gcc/ChangeLog:

PR target/112088
* config/gcn/gcn.cc (gcn_expand_epilogue): Fix kernel epilogue register
conflict.

amdgcn: silence warnings

The operands really should be VOIDmode, so the warnings are false.

gcc/ChangeLog:

* config/gcn/gcn-valu.md
(vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop): Mention "operands" in
condition to silence the warnings.
(vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop): Likewise.
* config/gcn/gcn.md (*movti_insn): Likewise.

recog: Fix propagation into ASM_OPERANDS

An inline asm with multiple output operands is represented as a
parallel set in which the SET_SRCs are the same (shared) ASM_OPERANDS.
insn_propagation didn't account for this, and instead propagated
into each ASM_OPERANDS individually. This meant that it could
apply a substitution X->Y to Y itself, which (a) could create
circularity and (b) would be semantically wrong in any case,
since Y might use a different value of X.

This patch checks explicitly for parallels involving ASM_OPERANDS,
just like combine does.

gcc/
* recog.cc (insn_propagation::apply_to_pattern_1): Handle shared
ASM_OPERANDS.

c++: another build_new_1 folding fix [PR111929]

In build_new_1, we also need to avoid folding 'outer_nelts_check' when
in a template context to prevent an ICE on the below testcase. This
patch replaces the problematic fold_build2 call with build2 (we'll later
fold it if appropriate during cp_fully_fold).

In passing, this patch removes an unnecessary conversion of 'nelts'
since it should always already be a size_t (and 'convert' isn't the best
conversion entry point to use anyway since it lacks a complain parameter).

PR c++/111929

gcc/cp/ChangeLog:

* init.cc (build_new_1): Remove unnecessary call to convert
on 'nelts'. Use build2 instead of fold_build2 for
'outer_nelts_checks'.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent28a.C: New test.

c++: add testcase verifying non-dep new-expr checking

gcc/testsuite/ChangeLog:

* g++.dg/template/new14.C: New test.

c++: more ahead-of-time -Wparentheses warnings

Now that we don't have to worry about looking through NON_DEPENDENT_EXPR,
we can easily extend the -Wparentheses warning in convert_for_assignment
to consider (non-dependent) templated assignment operator expressions as
well, like r14-4111-g6e92a6a2a72d3b did in maybe_convert_cond.

gcc/cp/ChangeLog:

* cp-tree.h (maybe_warn_unparenthesized_assignment): Declare.
* semantics.cc (is_assignment_op_expr_p): Generalize to return
true for any assignment operator expression, not just one that
has been resolved to an operator overload.
(maybe_warn_unparenthesized_assignment): Factored out from ...
(maybe_convert_cond): ... here.
(finish_parenthesized_expr): Mention
maybe_warn_unparenthesized_assignment.
* typeck.cc (convert_for_assignment): Replace -Wparentheses
warning logic with maybe_warn_unparenthesized_assignment.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wparentheses-13.C: Strengthen by expecting that
we issue the -Wparentheses warnings ahead of time.
* g++.dg/warn/Wparentheses-23.C: Likewise.
* g++.dg/warn/Wparentheses-32.C: Remove xfails.

PR modula2/111530: Build failure on BSD due to getopt_long_only GNU extension dependency

This patch uses the libiberty getopt long functions (wrapped up inside
libgm2/libm2pim/cgetopt.cc) and only enables this implementation if
libgm2/configure.ac detects no getopt_long and friends on the target.

gcc/m2/ChangeLog:

PR modula2/111530
* gm2-libs-ch/cgetopt.c (cgetopt_cgetopt_long): Re-format.
(cgetopt_cgetopt_long_only): Re-format.
(cgetopt_SetOption): Re-format and assign flag to NULL
if name is also NULL.
* gm2-libs/GetOpt.def (AddLongOption): Add index parameter
and change flag to be a VAR parameter rather than a pointer.
(GetOptLong): Re-format.
(GetOpt): Correct comment.
* gm2-libs/GetOpt.mod: Re-write to rely on cgetopt rather
than implement long option creation in GetOpt.
* gm2-libs/cgetopt.def (SetOption): has_arg type is INTEGER.

libgm2/ChangeLog:

PR modula2/111530
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac (AC_CHECK_HEADERS): Include getopt.h.
(GM2_CHECK_LIB): getopt_long check.
(GM2_CHECK_LIB): getopt_long_only check.
* libm2cor/Makefile.in: Regenerate.
* libm2iso/Makefile.in: Regenerate.
* libm2log/Makefile.in: Regenerate.
* libm2min/Makefile.in: Regenerate.
* libm2pim/Makefile.in: Regenerate.
* libm2pim/cgetopt.cc: Re-write using conditional on configure
and long function code from libiberty/getopt.c.

gcc/testsuite/ChangeLog:

PR modula2/111530
* gm2/pimlib/run/pass/testgetopt.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

[PATCH] RISC-V: Fix wrong tune parameters on int_div

This patch fixes an issue with the cost on "int_div" in various RISC-V
tune parameters including those for Rocket, SiFive U7 series, and T-Head
C906. This incorrect cost value interferes with the optimization process.
For example, it prevents the optimization of division by a constant to a
more efficient method known as Barrett reduction. This lack of
optimization negatively affects the performance of these systems.

The integer div cost of the Rocket and SiFive U7 is taken from the
Rocket-Chip Divider source code[1] with BigCore configuration[2]. It shows
the divUnroll unchanged which is 1 by default. Thus, the maximum int_div
cycles should be the dataWidth + 1, which is 33 for 32-bit and 65 for
64-bit.

As for C906, the divider takes 2 cycle to start[3], and it produce 2-bit
result each cycle[4]. Thus, the maximum int_div cycles should be the
dataWidth / 2 + 2, which is 18 for 32-bit and 34 for 64-bit.

I also test the performance on VisionFive2 which has Qual-Core Sifive U74.
I write a simple C program to do 1e8 times div by constant 6 in int32. The
result shows it takes 1.998s using div, and 0.420s using barrett reduction
to replace div with mul, which is 4.75x faster.

[1] https://github.com/chipsalliance/rocket-chip/blob/v1.6/src/main/scala/rocket/Multiplier.scala#L40
[2] https://github.com/chipsalliance/rocket-chip/blob/v1.6/src/main/scala/subsystem/Configs.scala#L97
[3] https://github.com/T-head-Semi/openc906/blob/af5614d72de7e5a4b8609c427d2e20af1deb21c4/C906_RTL_FACTORY/gen_rtl/iu/rtl/aq_iu_div.v#L267
[4] https://github.com/T-head-Semi/openc906/blob/af5614d72de7e5a4b8609c427d2e20af1deb21c4/C906_RTL_FACTORY/gen_rtl/iu/rtl/aq_iu_div_shift2_kernel.v#L93

gcc/ChangeLog:

* config/riscv/riscv.cc (rocket_tune_info): Fix int_div cost.
(sifive_7_tune_info, thead_c906_tune_info): Likewise.

RISC-V: Add rawmemchr expander.

This patch adds a vectorized rawmemchr expander. It also moves the
vectorized expand_block_move to riscv-string.cc.

gcc/ChangeLog:

* config/riscv/autovec.md (rawmemchr<ANYI:mode>): New expander.
* config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx):
Define.
(expand_rawmemchr): Define.
* config/riscv/riscv-v.cc (force_vector_length_operand): Remove
static.
(expand_block_move): Move from here...
* config/riscv/riscv-string.cc (expand_block_move): ...to here.
(expand_rawmemchr): Add vectorized expander.
* internal-fn.cc (expand_RAWMEMCHR): Fix typo.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/peel-2.c: Add
-fno-tree-loop-distribute-patterns.
* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: Add riscv.
* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add builtin directory.
* gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c: New test.

RISC-V: Fix cond_sqrt tests.

As long as we do not have universal Zvfh support in binutils
linking against libm does not work out of the box. This patch
splits the cond_sqrt tests into non-zvfh and zvfh variants and
makes the run-zvfh ones depend on a zvfh target.

While at it, I also added Zvfh handling to the testsuite helpers.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: Remove
Float16.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: Ditto.
* lib/target-supports.exp: Add zvfh handling.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-2.c: New test.

[RA]: Add cost calculation for reg equivalence invariants

My recent patch improving cost calculation for pseudos with equivalence
resulted in failure of gcc.target/arm/eliminate.c on aarch64. This patch
fixes this failure.

gcc/ChangeLog:

* ira-costs.cc: (get_equiv_regno, calculate_equiv_gains):
Process reg equivalence invariants.

i386: Fiy typo in "partial_memory_read_stall" tune option.

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL):
i386: Fiy typo in "partial_memory_read_stall" tune option.

Move OpenMP tests to gomp subdir

gcc/testsuite/ChangeLog:

* gfortran.dg/c_ptr_tests_20.f90: Moved to...
* gfortran.dg/gomp/c_ptr_tests_20.f90: ...here.
* gfortran.dg/c_ptr_tests_21.f90: Moved to...
* gfortran.dg/gomp/c_ptr_tests_21.f90: ...here.

aarch64: Add basic target_print_operand support for CONST_STRING

Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.

Consequently, an rtx such as:

  (set (reg/i:DI 0 x0)
         (unspec:DI [(const_string ("s3_3_c13_c2_2"))])

can now be output correctly using the following output pattern when
composing `define_insn's:

  "mrs\t%x0, %1"

gcc/ChangeLog

* config/aarch64/aarch64.cc (aarch64_print_operand): Add
support for CONST_STRING.

PR target/110551: Fix reg allocation for widening multiplications on x86.

This patch contains clean-ups of the widening multiplication patterns in
i386.md, and provides variants of the existing highpart multiplication
peephole2 transformations (that tidy up register allocation after
reload), and thereby fixes PR target/110551, which is a superfluous
move instruction.

For the new test case, compiled on x86_64 with -O2.

Before:
mulx64: movabsq $-7046029254386353131, %rcx
 movq %rcx, %rax
 mulq %rdi
 xorq %rdx, %rax
 ret

After:
mulx64: movabsq $-7046029254386353131, %rax
 mulq %rdi
 xorq %rdx, %rax
 ret

The clean-ups are (i) that operand 1 is consistently made register_operand
and operand 2 becomes nonimmediate_operand, so that predicates match the
constraints, (ii) the representation of the BMI2 mulx instruction is
updated to use the new umul_highpart RTX, and (iii) because operands
0 and 1 have different modes in widening multiplications, "a" is a more
appropriate constraint than "0" (which avoids spills/reloads containing
SUBREGs). The new peephole2 transformations are based upon those at
around line 9951 of i386.md, that begins with the comment
;; Highpart multiplication peephole2s to tweak register allocation.
;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx -> mov imm,%rax; imulq %rdi

2023-10-27 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/110551
* config/i386/i386.md (mul<mode><dwi>3): Make operands 1 and
2 take "regiser_operand" and "nonimmediate_operand" respectively.
(mulqihi3): Likewise.
(*bmi2_umul<mode><dwi>3_1): Operand 2 needs to be register_operand
matching the %d constraint. Use umul_highpart RTX to represent
the highpart multiplication.
(*umul<mode><dwi>3_1): Operand 2 should use regiser_operand
predicate, and "a" rather than "0" as operands 0 and 2 have
different modes.
(define_split): For mul to mulx conversion, use the new
umul_highpart RTX representation.
(*mul<mode><dwi>3_1): Operand 1 should be register_operand
and the constraint %a as operands 0 and 1 have different modes.
(*mulqihi3_1): Operand 1 should be register_operand matching
the constraint %0.
(define_peephole2): Providing widening multiplication variants
of the peephole2s that tweak highpart multiplication register
allocation.

gcc/testsuite/ChangeLog
PR target/110551
* gcc.target/i386/pr110551.c: New test case.

preprocessor: c++: Support `#pragma GCC target' macros [PR87299]

`#pragma GCC target' is not currently handled in preprocess-only mode (e.g.,
when running gcc -E or gcc -save-temps). As noted in the PR, this means that
if the target pragma defines any macros, those macros are not effective in
preprocess-only mode. Similarly, such macros are not effective when
compiling with C++ (even when compiling without -save-temps), because C++
does not process the pragma until after all tokens have been obtained from
libcpp, at which point it is too late for macro expansion to take place.

Since r13-1544 and r14-2893, there is a general mechanism to handle pragmas
under these conditions as well, so resolve the PR by using the new "early
pragma" support.

toplev.cc required some changes because the target-specific handlers for
`#pragma GCC target' may call target_reinit(), and toplev.cc was not expecting
that function to be called in preprocess-only mode.

I added some additional testcases from the PR for x86. The other targets
that support `#pragma GCC target' (aarch64, arm, nios2, powerpc, s390)
already had tests verifying that the pragma sets macros as expected; here I
have added -save-temps versions of some of them, to test that they now work
in preprocess-only mode as well.

gcc/c-family/ChangeLog:

PR preprocessor/87299
* c-pragma.cc (init_pragma): Register `#pragma GCC target' and
related pragmas in preprocess-only mode, and enable early handling.
(c_reset_target_pragmas): New function refactoring code from...
(handle_pragma_reset_options): ...here.
* c-pragma.h (c_reset_target_pragmas): Declare.

gcc/cp/ChangeLog:

PR preprocessor/87299
* parser.cc (cp_lexer_new_main): Call c_reset_target_pragmas ()
after preprocessing is complete, before starting compilation.

gcc/ChangeLog:

PR preprocessor/87299
* toplev.cc (no_backend): New static global.
(finalize): Remove argument no_backend, which is now a
static global.
(process_options): Likewise.
(do_compile): Likewise.
(target_reinit): Don't do anything in preprocess-only mode.
(toplev::main): Adapt to no_backend change.
(toplev::finalize): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/87299
* c-c++-common/pragma-target-1.c: New test.
* c-c++-common/pragma-target-2.c: New test.
* g++.target/i386/pr87299-1.C: New test.
* g++.target/i386/pr87299-2.C: New test.
* gcc.target/i386/pr87299-1.c: New test.
* gcc.target/i386/pr87299-2.c: New test.
* gcc.target/s390/target-attribute/tattr-2b.c: New test.
* gcc.target/aarch64/pragma_cpp_predefs_1b.c: New test.
* gcc.target/arm/pragma_arch_attribute_1b.c: New test.
* gcc.target/nios2/custom-fp-2b.c: New test.
* gcc.target/powerpc/float128-3b.c: New test.

Fortran: Fix some problems with SELECT TYPE selectors [PR104625].

2023-10-27 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/104625
* expr.cc (gfc_check_vardef_context): Check that the target
does have a vector index before emitting the specific error.
* match.cc (copy_ts_from_selector_to_associate): Ensure that
class valued operator expressions set the selector rank and
use the rank to provide the associate variable with an
appropriate array spec.
* resolve.cc (resolve_operator): Reduce stacked parentheses to
a single pair.
(fixup_array_ref): Extract selector symbol from parentheses.

gcc/testsuite/
PR fortran/104625
* gfortran.dg/pr104625.f90: New test.
* gfortran.dg/associate_55.f90: Change error check.

MATCH: Simplify `(X &| B) CMP X` if possible [PR 101590]

I noticed we were missing these simplifications so let's add them.

This adds the following simplifications:
U & N <= U -> true
U & N > U -> false
When U is known to be as non-negative.

When N is also known to be non-negative, this is also true:
U | N false
U | N >= U -> true

When N is a negative integer, the result flips and we get:
U | N true
U | N >= U -> false

We could extend this later on to be the case where we know N
is nonconstant but is known to be negative.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/101590
PR tree-optimization/94884

gcc/ChangeLog:

* match.pd (`(X BIT_OP Y) CMP X`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitcmp-1.c: New test.
* gcc.dg/tree-ssa/bitcmp-2.c: New test.
* gcc.dg/tree-ssa/bitcmp-3.c: New test.
* gcc.dg/tree-ssa/bitcmp-4.c: New test.
* gcc.dg/tree-ssa/bitcmp-5.c: New test.
* gcc.dg/tree-ssa/bitcmp-6.c: New test.

Support vec_cmpmn/vcondmn for v2hf/v4hf.

gcc/ChangeLog:

PR target/103861
* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
V2HF/V2BF/V4HF/V4BFmode.
* config/i386/i386.cc (ix86_get_mask_mode): Return QImode when
data_mode is V4HF/V2HFmode.
* config/i386/mmx.md (vec_cmpv4hfqi): New expander.
(vcond_mask_<mode>v4hi): Ditto.
(vcond_mask_<mode>qi): Ditto.
(vec_cmpv2hfqi): Ditto.
(vcond_mask_<mode>v2hi): Ditto.
(mmx_plendvb_<mode>): Add 2 combine splitters after the
patterns.
(mmx_pblendvb_v8qi): Ditto.
(<code>v2hi3): Add a combine splitter after the pattern.
(<code><mode>3): Ditto.
(<code>v8qi3): Ditto.
(<code><mode>3): Ditto.
* config/i386/sse.md (vcond<mode><mode>): Merge this with ..
(vcond<sseintvecmodelower><mode>): .. this into ..
(vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): .. this,
and extend to V8BF/V16BF/V32BFmode.

gcc/testsuite/ChangeLog:

* g++.target/i386/part-vect-vcondhf.C: New test.
* gcc.target/i386/part-vect-vec_cmphf.c: New test.

Daily bump.

RISC-V: Move lmul calculation into macro

Notice we calculate LMUL according to --param=riscv-autovec-lmul
in multiple places: int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;

Create a new macro for it for easier matain.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_MAX_LMUL): New macro.
* config/riscv/riscv-v.cc (preferred_simd_mode): Adapt macro.
(autovectorize_vector_modes): Ditto.
(can_find_related_mode_p): Ditto.

RISC-V: Add AVL propagation PASS for RVV auto-vectorization

This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
which is a known issue for a long time and I finally find the time to address it.

Consider a simple vector addition operation:

https://godbolt.org/z/7hfGfEjW3

void
foo (int *__restrict a,
 int *__restrict b,
 int *__restrict n)
{
 for (int i = 0; i < n; i++)
 a[i] = a[i] + b[i];
}

Optimized IR:

Loop body:
 _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]); -> vsetvli a5,a2,e8,mf4,ta,ma
 ...
 vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0); -> vle32.v v2,0(a0)
 vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0); -> vle32.v v1,0(a1)
 vect__7.12_19 = vect__6.11_20 + vect__4.8_27; -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
 .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19); -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)

We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:

vect__7.12_19 = vect__6.11_20 + vect__4.8_27;

GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
Such flow are used by all other targets like ARM SVE (RVV also uses such flow):

ARM SVE:

.L3:
 ld1w z30.s, p7/z, [x0, x3, lsl 2] -> predicated load
 ld1w z31.s, p7/z, [x1, x3, lsl 2] -> predicated load
 add z31.s, z31.s, z30.s -> un-predicated add
 st1w z31.s, p7, [x0, x3, lsl 2] -> predicated store

Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.

Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:

1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
 We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.

To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
due to AVL/VL toggling.

The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)

Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
experiments and tries.

The reasons as follows:

1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
 turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
PASS become heavy and heavy again, then we will need to refactor it again in the future.
Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
fixes.

2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things, I don't think we should fuse them into same PASS.

3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.

4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
 This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate
VSETVL PASS again which is already so complicated.)

Here is an example to demonstrate more:

https://godbolt.org/z/bE86sv3q5

void foo2 (int *__restrict a,
 int *__restrict b,
 int *__restrict c,
 int *__restrict a2,
 int *__restrict b2,
 int *__restrict c2,
 int *__restrict a3,
 int *__restrict b3,
 int *__restrict c3,
 int *__restrict a4,
 int *__restrict b4,
 int *__restrict c4,
 int *__restrict a5,
 int *__restrict b5,
 int *__restrict c5,
 int n)
{
 for (int i = 0; i < n; i++){
 a[i] = b[i] + c[i];
 b5[i] = b[i] + c[i];
 a2[i] = b2[i] + c2[i];
 a3[i] = b3[i] + c3[i];
 a4[i] = b4[i] + c4[i];
 a5[i] = a[i] + a4[i];
 a[i] = a5[i] + b5[i]+ a[i];

 a[i] = a[i] + c[i];
 b5[i] = a[i] + c[i];
 a2[i] = a[i] + c2[i];
 a3[i] = a[i] + c3[i];
 a4[i] = a[i] + c4[i];
 a5[i] = a[i] + a4[i];
 a[i] = a[i] + b5[i]+ a[i];
 }
}

1. Loop Body:

Before this patch: After this patch:

 vsetvli a4,t1,e8,mf4,ta,ma vsetvli a4,t1,e32,m1,ta,ma
 vle32.v v2,0(a2) vle32.v v2,0(a2)
 vle32.v v4,0(a1) vle32.v v3,0(t2)
 vle32.v v1,0(t2) vle32.v v4,0(a1)
 vsetvli a7,zero,e32,m1,ta,ma vle32.v v1,0(t0)
 vadd.vv v4,v2,v4 vadd.vv v4,v2,v4
 vsetvli zero,a4,e32,m1,ta,ma vadd.vv v1,v3,v1
 vle32.v v3,0(s0) vadd.vv v1,v1,v4
 vsetvli a7,zero,e32,m1,ta,ma vadd.vv v1,v1,v4
 vadd.vv v1,v3,v1 vadd.vv v1,v1,v4
 vadd.vv v1,v1,v4 vadd.vv v1,v1,v2
 vadd.vv v1,v1,v4 vadd.vv v2,v1,v2
 vadd.vv v1,v1,v4 vse32.v v2,0(t5)
 vsetvli zero,a4,e32,m1,ta,ma vadd.vv v2,v2,v1
 vle32.v v4,0(a5) vadd.vv v2,v2,v1
 vsetvli a7,zero,e32,m1,ta,ma slli a7,a4,2
 vadd.vv v1,v1,v2 vadd.vv v3,v1,v3
 vadd.vv v2,v1,v2 vle32.v v5,0(a5)
 vadd.vv v4,v1,v4 vle32.v v6,0(t6)
 vsetvli zero,a4,e32,m1,ta,ma vse32.v v3,0(t3)
 vse32.v v2,0(t5) vse32.v v2,0(a0)
 vse32.v v4,0(a3) vadd.vv v3,v3,v1
 vsetvli a7,zero,e32,m1,ta,ma vadd.vv v2,v1,v5
 vadd.vv v3,v1,v3 vse32.v v3,0(t4)
 vadd.vv v2,v2,v1 vadd.vv v1,v1,v6
 vadd.vv v2,v2,v1 vse32.v v2,0(a3)
 vsetvli zero,a4,e32,m1,ta,ma vse32.v v1,0(a6)
 vse32.v v2,0(a0)
 vse32.v v3,0(t3)
 vle32.v v2,0(t0)
 vsetvli a7,zero,e32,m1,ta,ma
 vadd.vv v3,v3,v1
 vsetvli zero,a4,e32,m1,ta,ma
 vse32.v v3,0(t4)
 vsetvli a7,zero,e32,m1,ta,ma
 slli a7,a4,2
 vadd.vv v1,v1,v2
 sub t1,t1,a4
 vsetvli zero,a4,e32,m1,ta,ma
 vse32.v v1,0(a6)

It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.

2. Epilogue:
 Before this patch: After this patch:

 .L5: .L5:
 ld s0,8(sp) ret
 addi sp,sp,16
 jr ra

This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'

The final codegen after this patch:

foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret

PR target/111318
PR target/111888

gcc/ChangeLog:

* config.gcc: Add AVL propagation pass.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-avlprop.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111318.c: New test.
* gcc.target/riscv/rvv/autovec/pr111888.c: New test.
Tested-by: Patrick O'Neill <patrick@rivosinc.com>