MATCH: [PR111282] Simplify `a & (b ^ ~a)` to `a & b`

While `a & (b ^ ~a)` is optimized to `a & b` on the RTL level,
it is always good to optimize this at the gimple level, and doing so
allows us to match a few extra things, including where a is a comparison.

Note I had to update/change the testcase and-1.c to avoid matching
this case as we can match -2 and 1 as bitwise inversions.
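
The identity behind this simplification can be checked by brute force. The
following is a minimal sketch; the function names are hypothetical and are
not taken from match.pd or from the new test case. Bit by bit: where a is 1,
`b ^ ~a` reduces to b, and where a is 0 the AND clears the result, so both
forms agree.

```c
#include <stdint.h>

/* Left-hand side of the simplification: a & (b ^ ~a).  */
static uint8_t and_xor_not(uint8_t a, uint8_t b)
{
    return (uint8_t)(a & (b ^ (uint8_t)~a));
}

/* Right-hand side after simplification: a & b.  */
static uint8_t and_plain(uint8_t a, uint8_t b)
{
    return (uint8_t)(a & b);
}

/* Returns 1 if both forms agree for every pair of 8-bit inputs.  */
int identity_holds(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (and_xor_not((uint8_t)a, (uint8_t)b)
                != and_plain((uint8_t)a, (uint8_t)b))
                return 0;
    return 1;
}
```

Calling identity_holds() from a small driver should return 1; the check is
limited to 8-bit values only to keep the exhaustive loop cheap.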

PR tree-optimization/111282

gcc/ChangeLog:

* match.pd (`a & ~(a ^ b)`, `a & (a == b)`,
`a & ((~a) ^ b)`): New patterns.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/and-1.c: Update testcase to avoid
matching `~1 & (a ^ 1)` simplification.
* gcc.dg/tree-ssa/bitops-6.c: New test.

modula2: Narrow subranges to int or unsigned int if ZTYPE is the base type.

This patch narrows the subrange base type to INTEGER or CARDINAL
providing the range is satisfied. It only does this when the subrange
base type is the ZTYPE.

gcc/m2/ChangeLog:

* gm2-compiler/M2GCCDeclare.mod (DeclareSubrange): Check
the base type of the subrange against the ZTYPE and call
DeclareSubrangeNarrow if necessary.
(DeclareSubrangeNarrow): New procedure function.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

[PATCH v4 2/2] RISC-V: Add support for XCValu extension in CV32E40P

Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
  Mary Bennett <mary.bennett@embecosm.com>
  Nandni Jamnadas <nandni.jamnadas@embecosm.com>
  Pietra Ferreira <pietra.ferreira@embecosm.com>
  Charlie Keaney
  Jessica Mills
  Craig Blackmore <craig.blackmore@embecosm.com>
  Simon Cook <simon.cook@embecosm.com>
  Jeremy Bennett <jeremy.bennett@embecosm.com>
  Helene Chelin <helene.chelin@embecosm.com>

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add the XCValu
extension.
* config/riscv/constraints.md: Add builtins for the XCValu
extension.
* config/riscv/predicates.md (immediate_register_operand):
Likewise.
* config/riscv/corev.def: Likewise.
* config/riscv/corev.md: Likewise.
* config/riscv/riscv-builtins.cc (AVAIL): Likewise.
(RISCV_ATYPE_UHI): Likewise.
* config/riscv/riscv-ftypes.def: Likewise.
* config/riscv/riscv.opt: Likewise.
* config/riscv/riscv.cc (riscv_print_operand): Likewise.
* doc/extend.texi: Add XCValu documentation.
* doc/sourcebuild.texi: Likewise.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add proc for the XCValu extension.
* gcc.target/riscv/cv-alu-compile.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-addn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-addrn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-addun.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-addurn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-clip.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-clipu.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-subn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-subrn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-subun.c: New test.
* gcc.target/riscv/cv-alu-fail-compile-suburn.c: New test.
* gcc.target/riscv/cv-alu-fail-compile.c: New test.

[PATCH v4 1/2] RISC-V: Add support for XCVmac extension in CV32E40P

Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
  Mary Bennett <mary.bennett@embecosm.com>
  Nandni Jamnadas <nandni.jamnadas@embecosm.com>
  Pietra Ferreira <pietra.ferreira@embecosm.com>
  Charlie Keaney
  Jessica Mills
  Craig Blackmore <craig.blackmore@embecosm.com>
  Simon Cook <simon.cook@embecosm.com>
  Jeremy Bennett <jeremy.bennett@embecosm.com>
  Helene Chelin <helene.chelin@embecosm.com>

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add XCVmac.
* config/riscv/riscv-ftypes.def: Add XCVmac builtins.
* config/riscv/riscv-builtins.cc: Likewise.
* config/riscv/riscv.md: Likewise.
* config/riscv/riscv.opt: Likewise.
* doc/extend.texi: Add XCVmac builtin documentation.
* doc/sourcebuild.texi: Likewise.
* config/riscv/corev.def: New file.
* config/riscv/corev.md: New file.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add new effective target check.
* gcc.target/riscv/cv-mac-compile.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mac.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-machhsn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-machhsrn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-machhun.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-machhurn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-macsn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-macsrn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-macun.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-macurn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-msu.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulhhsn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulhhsrn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulhhun.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulhhurn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulsn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulsrn.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulun.c: New test.
* gcc.target/riscv/cv-mac-fail-compile-mulurn.c: New test.
* gcc.target/riscv/cv-mac-test-autogeneration.c: New test.

MAINTAINERS: Fix write after approval name order

ChangeLog:

* MAINTAINERS: Fix name order.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

PR modula2/111675 Incorrect packed record field value passed to a procedure

This patch allows a packed field to be extracted and passed to a
procedure. It ensures that the subrange type is the same for both the
procedure and record field. It also extends the <* bytealignment (0) *>
to cover packed subrange types.

gcc/m2/ChangeLog:

PR modula2/111675
* gm2-compiler/M2CaseList.mod (appendTree): Replace
InitStringCharStar with InitString.
* gm2-compiler/M2GCCDeclare.mod: Import AreConstantsEqual.
(DeclareSubrange): Add zero alignment test and call
BuildSmallestTypeRange if necessary.
(WalkSubrangeDependants): Walk the align expression.
(IsSubrangeDependants): Test the align expression.
* gm2-compiler/M2Quads.mod (BuildStringAdrParam): Correct end name.
* gm2-compiler/P2SymBuild.mod (BuildTypeAlignment): Allow subranges
to be zero aligned (packed).
* gm2-compiler/SymbolTable.mod (Subrange): Add Align field.
(MakeSubrange): Set Align to NulSym.
(PutAlignment): Assign Subrange.Align to align.
(GetAlignment): Return Subrange.Align.
* gm2-gcc/m2expr.cc (noBitsRequired): Rewrite.
(calcNbits): Rename ...
(m2expr_calcNbits): ... to this and test for negative values.
(m2expr_BuildTBitSize): Replace calcNBits with m2expr_calcNbits.
* gm2-gcc/m2expr.def (calcNbits): Export.
* gm2-gcc/m2expr.h (m2expr_calcNbits): New prototype.
* gm2-gcc/m2type.cc (noBitsRequired): Remove.
(m2type_BuildSmallestTypeRange): Call m2expr_calcNbits.
(m2type_BuildSubrangeType): Create range_type from
build_range_type (type, lowval, highval).

gcc/testsuite/ChangeLog:

PR modula2/111675
* gm2/extensions/run/pass/packedrecord3.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

RISC-V: Fix incorrect index(offset) of gather/scatter

I suddenly discovered I had made a mistake that, luckily, had not been exposed.

https://godbolt.org/z/c3jzrh7or

GCC is using 32 bit index offset:

        vsll.vi v1,v1,2
        vsetvli zero,a5,e32,m1,ta,ma
        vluxei32.v      v1,(a1),v1

This is wrong since v1 may overflow 32 bits after vsll.vi.

After this patch:

vsext.vf2 v8,v4
vsll.vi v8,v8,2
vluxei64.v v8,(a1),v8

Same as Clang.
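
The overflow can be illustrated in scalar C. This is only a sketch under
stated assumptions: the element size is taken to be 4 bytes (matching the
shift by 2 above), and the function names are hypothetical rather than
anything in riscv-v.cc.

```c
#include <stdint.h>

/* Byte offset computed entirely in 32 bits: the high bits of the
   scaled index are lost, as with vsll.vi on an e32 vector.  */
uint32_t offset_narrow(int32_t index)
{
    return (uint32_t)index << 2;
}

/* Byte offset computed after sign-extending the index to 64 bits,
   mirroring the vsext.vf2 + vsll.vi sequence.  */
int64_t offset_wide(int32_t index)
{
    return (int64_t)index * 4;
}
```

For an index of 0x40000000 the narrow form wraps to 0, while the wide form
yields 0x100000000, which is the offset the gather actually needs.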

Regression passed. Ok for trunk ?

gcc/ChangeLog:

* config/riscv/autovec.md: Fix index bug.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_mode_p): New function.
* config/riscv/riscv-v.cc (expand_gather_scatter): Fix index bug.
(gather_scatter_valid_offset_mode_p): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.

RISC-V: Support FP lrint/lrintf auto vectorization

This patch supports FP lrint/lrintf auto vectorization.

* long lrint (double) for rv64
* long lrintf (float) for rv32

Because the vectorizer only allows data types of the same size, the
standard name lrintmn2 only acts on DF => DI for rv64, and SF => SI
for rv32.

Given we have code like:

void
test_lrint (long *out, double *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrint (in[i]);
}

Before this patch:
.L3:
  ...
  fld      fa5,0(a1)
  fcvt.l.d a5,fa5,dyn
  sd       a5,-8(a0)
  ...
  bne      a1,a4,.L3

After this patch:
.L3:
  ...
  vsetvli     a3,zero,e64,m1,ta,ma
  vfcvt.x.f.v v1,v1
  vsetvli     zero,a2,e64,m1,ta,ma
  vse64.v     v1,0(a0)
  ...
  bne         a2,zero,.L3

The remaining cases, such as SF => DI, HF => DI, DF => SI and HF => SI,
will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.

gcc/ChangeLog:

* config/riscv/autovec.md (lrint<mode><vlconvert>2): New pattern
for lrint/lrintf.
* config/riscv/riscv-protos.h (expand_vec_lrint): New func decl
for expanding lrint.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f): New helper func impl
for vfcvt.x.f.v.
(expand_vec_lrint): New function impl for expanding lrint.
* config/riscv/vector-iterators.md: New mode attr and iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/test-math.h: New define for
CVT like test case.
* gcc.target/riscv/rvv/autovec/vls/def.h: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Remove XFAIL of ssa-dom-cse-2.c

Confirmed that RISC-V is able to CSE this case whether or not RVV is enabled.

Remove XFAIL, to fix:
XPASS: gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;"

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove riscv.

tree-ssa-strlen: optimization skips clobbering store [PR111519]

The following testcase is miscompiled, because count_nonzero_bytes incorrectly
uses get_strinfo information on a pointer from which an earlier instruction
loads SSA_NAME stored at the current instruction.  get_strinfo shows a state
right before the current store though, so if there are some stores in between
the current store and the load, the string length information might have
changed.

The patch passes around gimple_vuse from the store and punts instead of using
strinfo on loads from MEM_REF which have different gimple_vuse from that.

2023-10-11  Richard Biener  <rguenther@suse.de>
    Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/111519
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Add vuse
argument and pass it through to recursive calls and
count_nonzero_bytes_addr calls.  Don't shadow the stmt argument, but
change stmt for gimple_assign_single_p statements for which we don't
immediately punt.
(strlen_pass::count_nonzero_bytes_addr): Add vuse argument and pass
it through to recursive calls and count_nonzero_bytes calls.  Don't
use get_strinfo if gimple_vuse (stmt) is different from vuse.  Don't
shadow the stmt argument.

* gcc.dg/torture/pr111519.c: New testcase.

Optimize (ne:SI (subreg:QI (ashift:SI x 7) 0) 0) as (and:SI x 1).

This patch is the middle-end piece of an improvement to PRs 101955 and
106245, that adds a missing simplification to the RTL optimizers.
This transformation is to simplify (char)(x << 7) != 0 as x & 1.
Technically, the cast can be any truncation, where shift is by one
less than the narrower type's precision, setting the most significant
(only) bit from the least significant bit.
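
The equivalence can be sketched in C as follows. Note the assumptions: the
sketch uses unsigned types to avoid signed-overflow issues in the shift, and
the function names are hypothetical. After shifting by 7 and truncating to
8 bits, only bit 7 can be set, and it holds the original bit 0.

```c
#include <stdint.h>

/* Original form: truncate x << 7 to 8 bits and test for non-zero.  */
int trunc_shift_nonzero(uint32_t x)
{
    return (uint8_t)(x << 7) != 0;
}

/* Simplified form: only the least significant bit of x matters.  */
int low_bit(uint32_t x)
{
    return (int)(x & 1u);
}
```

Both functions return the same value for every input, which is what allows
the RTL optimizers to replace the comparison with a single AND.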

This transformation applies to any target, but it's easy to see
(and add a new test case) on x86, where the following function:

int foo(int a) { return (a << 31) >> 31; }

currently gets compiled with -O2 to:

foo:    movl    %edi, %eax
        sall    $7, %eax
        sarb    $7, %al
        movsbl  %al, %eax
        ret

but with this patch, we now generate the slightly simpler sequence:

foo:    movl    %edi, %eax
        sall    $31, %eax
        sarl    $31, %eax
        ret

2023-10-11  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR middle-end/101955
PR tree-optimization/106245
* simplify-rtx.cc (simplify_relational_operation_1): Simplify
the RTL (ne:SI (subreg:QI (ashift:SI x 7) 0) 0) to (and:SI x 1).

gcc/testsuite/ChangeLog
* gcc.target/i386/pr106245-1.c: New test case.

RISC-V: Enable full coverage vect tests

I have analyzed all existing FAILs.

Only the following FAILs still need to be addressed:
FAIL: gcc.dg/vect/slp-reduc-7.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/slp-reduc-7.c execution test
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_(LEN_)?SUB"
FAIL: gcc.dg/vect/vect-cond-arith-2.c scan-tree-dump optimized " = \\.COND_(LEN_)?SUB"

All other FAILs are dump FAILs that can be ignored (I confirmed that ARM SVE also has such FAILs, which were not fixed in either the tests or the implementation).

Now it's time to enable full coverage vect tests, including vec_unpack, vec_pack, vec_interleave, etc.

To see what we are still missing:

Before this patch:

                === gcc Summary ===

# of expected passes            182839
# of unexpected failures        79
# of unexpected successes       11
# of expected failures          1275
# of unresolved testcases       4
# of unsupported tests          4223

After this patch:

                === gcc Summary ===

# of expected passes            183411
# of unexpected failures        93
# of unexpected successes       7
# of expected failures          1285
# of unresolved testcases       4
# of unsupported tests          4157

There is one important new issue that I have noticed after this patch:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

It has a related PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111721

I am going to fix this first in the middle-end after committing this patch.

Ok for trunk ?

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add RVV.

Refine predicate of operands[2] in divv4hf3 with register_operand.

The expander emits the insn below.

rtx tmp = gen_rtx_VEC_CONCAT (V4SFmode, operands[2],
force_reg (V2SFmode, CONST1_RTX (V2SFmode)));

but *vec_concat<mode> only allows register_operand.

gcc/ChangeLog:

PR target/111745
* config/i386/mmx.md (divv4hf3): Refine predicate of
operands[2] with register_operand.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111745.c: New test.

RISC-V Regression: Make pattern match more accurate of vect-live-2.c

Like previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632400.html
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac11b@gmail.com/

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-live-2.c: Make pattern match more accurate.

RISC-V Regression: Fix FAIL of vect-multitypes-16.c for RVV

As Richard suggested: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632288.html

Add vect_ext_char_longlong to fix FAIL for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-multitypes-16.c: Adapt check for RVV.
* lib/target-supports.exp: Add vect_ext_char_longlong property.

Daily bump.

RISC-V: far-branch: Handle far jumps and branches for functions larger than 1MB

On RISC-V, branches further than +/-1MB require a longer instruction
sequence (3 instructions): we can reuse the jump-construction in the
assembler (which clobbers $ra) and a temporary to set up the jump
destination.
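
To illustrate when the longer sequence becomes necessary, here is a hedged
sketch of a reachability check. It assumes the standard JAL encoding, whose
signed, 2-byte-aligned 21-bit immediate covers byte offsets from -1048576 to
+1048574. The function name is hypothetical and this is not the logic used
in riscv.md.

```c
#include <stdint.h>

/* Returns 1 if a PC-relative byte offset fits a single JAL:
   it must be even and within the signed 21-bit range.  */
int fits_jal(int64_t offset)
{
    return (offset % 2 == 0)
           && offset >= -(INT64_C(1) << 20)
           && offset <= (INT64_C(1) << 20) - 2;
}
```

A target for which fits_jal returns 0 is the kind of destination that needs
the multi-instruction far jump described above.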

gcc/ChangeLog:

* config/riscv/riscv.cc (struct machine_function): Track if a
far-branch/jump is used within a function (and $ra needs to be
saved).
(riscv_print_operand): Implement 'N' (inverse integer branch).
(riscv_far_jump_used_p): Implement.
(riscv_save_return_addr_reg_p): New function.
(riscv_save_reg_p): Use riscv_save_return_addr_reg_p.
* config/riscv/riscv.h (FIXED_REGISTERS): Update $ra.
(CALL_USED_REGISTERS): Update $ra.
* config/riscv/riscv.md: Add new types "ret" and "jalr".
(length attribute): Handle long conditional and unconditional
branches.
(conditional branch pattern): Handle case where jump can not
reach the intended target.
(indirect_jump, tablejump): Use new "jalr" type.
(simple_return): Use new "ret" type.
(simple_return_internal, eh_return_internal): Likewise.
(gpr_restore_return, riscv_mret): Likewise.
(riscv_uret, riscv_sret): Likewise.
* config/riscv/generic.md (generic_branch): Also recognize jalr & ret
types.
* config/riscv/sifive-7.md (sifive_7_jump): Likewise.

Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

c++: mangle multiple levels of template parms [PR109422]

This becomes more important with concepts, but can also be seen with
generic lambdas.

PR c++/109422

gcc/cp/ChangeLog:

* mangle.cc (write_template_param): Also mangle level.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-generic-mangle1.C: New test.
* g++.dg/cpp2a/lambda-generic-mangle1a.C: New test.

MATCH: [PR111679] Add alternative simplification of `a | ((~a) ^ b)`

Currently we have a simplification for `a | ~(a ^ b)`, but it does
not match the case where we originally had `(~a) | (a ^ b)`, so we
need to add a new pattern that matches that and uses
bitwise_inverted_equal_p, which also catches comparisons.
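
The commit message does not spell out the simplified form. Bitwise reasoning
suggests that `a | ~(a ^ b)` and `a | ((~a) ^ b)` both reduce to `a | ~b`
(where a is 1 the result is 1, and where a is 0 the XOR term is ~b). The
sketch below checks that assumption exhaustively for 8-bit values; the
function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if a | ~(a ^ b), a | (~a ^ b) and a | ~b agree for
   all 8-bit inputs.  */
int ior_forms_agree(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint8_t f1 = (uint8_t)(a | ~(a ^ b));
            uint8_t f2 = (uint8_t)(a | (~a ^ b));
            uint8_t f3 = (uint8_t)(a | ~b);
            if (f1 != f2 || f2 != f3)
                return 0;
        }
    return 1;
}
```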

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111679

gcc/ChangeLog:

* match.pd (`a | ((~a) ^ b)`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-5.c: New test.

RISC-V Regression: Make match patterns more accurate

This patch fixes the following 2 FAILs in the RVV regression, since the check is not accurate.

It's inspired by Robin's previous patch:
https://patchwork.sourceware.org/project/gcc/patch/dde89b9e-49a0-d70b-0906-fb3022cac11b@gmail.com/

gcc/testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-outer-7.c: Adjust regex pattern.
* gcc.dg/vect/no-scevccp-vect-iv-3.c: Ditto.

RISC-V Regression: Fix FAIL of predcom-2.c

Like GCN, add -fno-tree-vectorize.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/predcom-2.c: Add riscv.

RISC-V Regression: Fix FAIL of pr65947-8.c for RVV

This test is testing the fold_extract_last pattern, so it's more reasonable
to use vect_fold_extract_last instead of specifying targets.

This is the vect_fold_extract_last property:
proc check_effective_target_vect_fold_extract_last { } {
    return [expr { [check_effective_target_aarch64_sve]
   || [istarget amdgcn*-*-*]
   || [check_effective_target_riscv_v] }]
}

This includes ARM SVE/GCN/RVV.

It matches exactly what we want, and is more reasonable and easier to maintain.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr65947-8.c: Use vect_fold_extract_last.

MAINTAINERS: Add myself to write after approval

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

ChangeLog:

* MAINTAINERS: Add myself.

RISC-V: Add VLS BOOL mode vcond_mask[PR111751]

Richard's patch resolves PR111751: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7c76c876e917a1f20a788f602cc78fff7d0a2a65

but it causes an ICE in the RISC-V regression:

FAIL: gcc.dg/torture/pr53144.c   -O2  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/pr53144.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr53144.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.dg/torture/pr53144.c   -O3 -g  (internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:328)
FAIL: gcc.dg/torture/pr53144.c   -O3 -g  (test for excess errors)

A vcond_mask for VLS BOOL modes is needed to fix this regression ICE.

More details: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111751

Tested and Committed.

PR target/111751

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS BOOL modes.

tree-optimization/111751 - support 1024 bit vector constant reinterpretation

The following ups the limit in fold_view_convert_expr to handle
1024bit vectors as used by GCN and RVV. It also robustifies
the handling in visit_reference_op_load to properly give up when
constants cannot be re-interpreted.
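
The general idea of reinterpreting a constant through a bounded byte buffer,
and giving up when that is not possible, can be sketched as below. This is
an illustration only, not the code in fold-const.cc: the function name, the
macro and the return convention are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define VIEW_BUF_BYTES 128  /* 1024 bits, the limit named in the commit */

/* Reinterpret SRC as DST by copying bytes through a fixed buffer.
   Returns 1 on success, 0 when the sizes differ or exceed the buffer,
   in which case the caller must give up on the reinterpretation.  */
int view_convert(void *dst, size_t dst_size,
                 const void *src, size_t src_size)
{
    unsigned char buf[VIEW_BUF_BYTES];

    if (src_size != dst_size || src_size > sizeof buf)
        return 0;
    memcpy(buf, src, src_size);   /* encode the source as raw bytes */
    memcpy(dst, buf, dst_size);   /* interpret them as the new type */
    return 1;
}
```

With a 128-byte buffer, a full 1024-bit vector constant fits; anything
larger is rejected and the caller is expected to handle the failure.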

PR tree-optimization/111751
* fold-const.cc (fold_view_convert_expr): Up the buffer size
to 128 bytes.
* tree-ssa-sccvn.cc (visit_reference_op_load): Special case
constants, giving up when re-interpretation to the target type
fails.

ada: Fix internal error on too large representation clause for small component

This is a small bug present on strict-alignment platforms for questionable
representation clauses.

gcc/ada/

* gcc-interface/decl.cc (inline_status_for_subprog): Minor tweak.
(gnat_to_gnu_field): Try harder to get a packable form of the type
for a bitfield.

ada: Tweak internal subprogram in Ada.Directories

The purpose of this patch is to work around false-positive warnings
emitted by GNAT SAS (also known as CodePeer). It does not change
the behavior of the modified subprogram.

gcc/ada/

* libgnat/a-direct.adb (Start_Search_Internal): Tweak subprogram
body.

ada: Remove superfluous setter procedure

It is only called once.

gcc/ada/

* sem_util.ads (Set_Scope_Is_Transient): Delete.
* sem_util.adb (Set_Scope_Is_Transient): Likewise.
* exp_ch7.adb (Create_Transient_Scope): Set Is_Transient directly.

ada: Fix bad finalization of limited aggregate in conditional expression

This happens when the conditional expression is immediately returned, for
example in an expression function.

gcc/ada/

* exp_aggr.adb (Is_Build_In_Place_Aggregate_Return): Return true
if the aggregate is a dependent expression of a conditional
expression being returned from a build-in-place function.

ada: Fix infinite loop with multiple limited with clauses

This occurs when one of the types has an incomplete declaration in addition
to its full declaration in its package. In this case AI05-129 says that the
incomplete type is not part of the limited view of the package, i.e. only
the full view is. Now, in the GNAT implementation, it's the opposite in the
regular view of the package, i.e. the incomplete type is the visible one.

That's why the implementation needs to also swap the types on the visibility
chain while it is swapping the views when the clauses are either installed
or removed. This works correctly for the installation, but does not for the
removal, so this change rewrites the code doing the latter.

gcc/ada/
PR ada/111434
* sem_ch10.adb (Replace): New procedure to replace an entity with
another on the homonym chain.
(Install_Limited_With_Clause): Rename Non_Lim_View to Typ for the
sake of consistency.  Call Replace to do the replacements and split
the code into the regular and the special cases.  Add debugging
output controlled by -gnatdi.
(Install_With_Clause): Print the Parent_With and Implicit_With flags
in the debugging output controlled by -gnatdi.
(Remove_Limited_With_Unit.Restore_Chain_For_Shadow (Shadow)): Rewrite
using a direct replacement of E4 by E2.   Call Replace to do the
replacements.  Add debugging output controlled by -gnatdi.

ada: Fix filesystem entry filtering

This patch fixes the behavior of Ada.Directories.Search when being
requested to filter out regular files or directories. One of the
configurations in which that behavior was incorrect was that when the
caller requested only the regular and special files but not the
directories, the directories would still be returned.

gcc/ada/

* libgnat/a-direct.adb: Fix filesystem entry filtering.

ada: Tweak documentation comments

The concept of extended nodes was retired at the same time Gen_IL
was introduced, but there was a reference to that concept left over
in a comment. This patch removes that reference.

Also, the description of the field Comes_From_Check_Or_Contract was
incorrectly placed in a section for fields present in all nodes in
sinfo.ads. This patch fixes this.

gcc/ada/

* atree.ads, nlists.ads, types.ads: Remove references to extended
nodes. Fix typo.
* sinfo.ads: Likewise and fix position of
Comes_From_Check_Or_Contract description.

ada: Crash processing pragmas Compile_Time_Error and Compile_Time_Warning

gcc/ada/

* sem_attr.adb (Analyze_Attribute): Protect the frontend against
replacing 'Size by its static value if 'Size is not known at
compile time and we are processing pragmas Compile_Time_Warning or
Compile_Time_Error.

RISC-V: Add testcase for SCCVN optimization[PR111751]

Add testcase for PR111751 which has been fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632474.html

PR target/111751

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr111751.c: New test.

Fix missed CSE with a BLKmode entity

The following fixes fallout of r10-7145-g1dc00a8ec9aeba which made
us cautious about CSEing a load to an object that has padding bits.
The added check also triggers for BLKmode entities like STRING_CSTs
but by definition a BLKmode entity does not have padding bits.

PR tree-optimization/111751
* tree-ssa-sccvn.cc (visit_reference_op_load): Exempt
BLKmode result from the padding bits check.

RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV

Here is the reference comparing dump IR between ARM SVE and RVV.

https://godbolt.org/z/zqess8Gss

We can see RVV has one more dump IR:
optimized: basic block part vectorized using 128 byte vectors
since RVV has 1024 bit vectors.

The codegen is reasonably good.

However, I saw that GCN also has 1024 bit vectors, so this patch may
cause this case to FAIL on the GCN port.

GCN folks, could you check this patch on the GCN port for me?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pr65935.c: Add vect1024 variant.
* lib/target-supports.exp: Ditto.

arc: Refurbish add.f combiner patterns

Refurbish add compare patterns: use 'r' constraint, fix indentation,
and fix pattern to match 'if (a+b) { ... }' constructions.

gcc/

* config/arc/arc.cc (arc_select_cc_mode): Match NEG code with
the first operand.
* config/arc/arc.md (addsi_compare): Make pattern canonical.
(addsi_compare_2): Fix indentation, constraint letters.
(addsi_compare_3): Likewise.

gcc/testsuite/

* gcc.target/arc/add_f-combine.c: New test.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

RISC-V: Add available vector size for RVV

For RVV, we have VLS modes enabled according to TARGET_MIN_VLEN
from M1 to M8.

For example, when TARGET_MIN_VLEN = 128 bits, we enable
128/256/512/1024 bits VLS modes.
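
The arithmetic behind that example is simply the minimum VLEN multiplied by
each LMUL from M1 to M8. A minimal sketch, with a hypothetical function name
that does not appear in target-supports.exp or the RISC-V backend:

```c
/* Fill SIZES with the vector sizes in bits for LMUL = 1, 2, 4, 8,
   given the minimum VLEN in bits.  Returns the number of entries.  */
int vls_sizes(unsigned min_vlen, unsigned sizes[4])
{
    static const unsigned lmul[4] = { 1, 2, 4, 8 };

    for (int i = 0; i < 4; i++)
        sizes[i] = min_vlen * lmul[i];
    return 4;
}
```

For min_vlen = 128 this fills the array with 128, 256, 512 and 1024.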

This patch fixes the following FAILs:
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "optimized: basic block" 2
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add 256/512/1024.

Daily bump.

Fixes for profile count/probability maintenance

Verifier checks have recently been strengthened to check that
all counts and probabilities are initialized. The checks fired
during autoprofiledbootstrap build and this patch fixes it.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
* auto-profile.cc (afdo_calculate_branch_prob): Fix count comparisons
* tree-vect-loop-manip.cc (vect_do_peeling): Guard against zero count
when scaling loop profile

analyzer: fix build with gcc < 6

gcc/analyzer/ChangeLog:
* access-diagram.cc (boundaries::add): Explicitly state
"boundaries::" scope for "kind" enum.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Ensure float equivalences include + and - zero.

A floating point equivalence may not properly reflect both signs of
zero, so be pessimistic and ensure both signs are included.
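
The reason an equivalence cannot pin down the sign of zero is visible in
plain C: +0.0 and -0.0 compare equal yet remain distinguishable. A minimal
sketch, assuming IEEE 754 doubles; the function names are hypothetical.

```c
#include <math.h>

/* Returns 1 because +0.0 and -0.0 compare equal.  */
int zeros_compare_equal(void)
{
    volatile double pz = 0.0, nz = -0.0;
    return pz == nz;
}

/* Returns 0 because the two zeros differ in their sign bit.  */
int zeros_have_same_sign(void)
{
    volatile double pz = 0.0, nz = -0.0;
    return (signbit(pz) != 0) == (signbit(nz) != 0);
}
```

So knowing only that x == y and that y is +0.0 does not tell us that x is
+0.0 as well, which is why the range has to include both signs.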

PR tree-optimization/111694
gcc/
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Adjust
equivalence range.
* value-relation.cc (adjust_equivalence_range): New.
* value-relation.h (adjust_equivalence_range): New prototype.

gcc/testsuite/
* gcc.dg/pr111694.c: New.

Remove unused get_identity_relation.

It turns out we didn't need this, as there are no unordered relations
managed by the oracle.

* gimple-range-gori.cc (gori_compute::compute_operand1_range): Do
not call get_identity_relation.
(gori_compute::compute_operand2_range): Ditto.
* value-relation.cc (get_identity_relation): Remove.
* value-relation.h (get_identity_relation): Remove prototype.

RISC-V Regression test: Fix slp-perm-4.c FAIL for RVV

RVV vectorizes it with stride5 load_lanes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes.

RISC-V Regression tests: Fix FAIL of pr97832* for RVV

These cases are vectorized by vec_load_lanes with strided = 8 instead of SLP
with -fno-vect-cost-model.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97832-2.c: Adapt dump check for target supports load_lanes with stride = 8.
* gcc.dg/vect/pr97832-3.c: Ditto.
* gcc.dg/vect/pr97832-4.c: Ditto.

RISC-V Regression test: Fix FAIL of slp-12a.c

This case is vectorized by stride8 load_lanes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes.

RISC-V Regression test: Fix FAIL of slp-reduc-4.c for RVV

RVV vectorizes this case with stride8 load_lanes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes.

RISC-V Regression test: Adapt SLP tests like ARM SVE

Like ARM SVE, RVV is vectorizing these 2 cases in the same way.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-23.c: Add RVV like ARM SVE.
* gcc.dg/vect/slp-perm-10.c: Ditto.

RISC-V: Add initial pipeline description for an out-of-order core.

This adds a pipeline description for a generic out-of-order core.
Latency and units are not based on any real processor but are more or less
educated guesses at what such a processor would look like.

In order to account for latency scaling by LMUL != 1, sched_adjust_cost
is implemented. It will scale an instruction's latency by its LMUL
so an LMUL == 8 instruction will take 8 times the number of cycles
the same instruction with LMUL == 1 would take.
As this potentially causes very high latencies which, in turn, might
lead to scheduling anomalies and a higher number of vsetvls emitted
this feature is only enabled when specifying -madjust-lmul-cost.
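
The scaling rule described above amounts to a single multiplication. The
sketch below is an illustration with hypothetical names, not the body of
riscv_sched_adjust_cost, and it only covers the integer LMUL values 1, 2, 4
and 8.

```c
/* Latency of an instruction at the given LMUL, assuming linear
   scaling of its LMUL == 1 latency.  */
unsigned scaled_latency(unsigned base_cycles, unsigned lmul)
{
    return base_cycles * lmul;
}
```

For example, an instruction that takes 3 cycles at LMUL == 1 would be
modelled as 24 cycles at LMUL == 8.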

Additionally, in order to easily recognize pre-RA vsetvls this patch
introduces an insn type vsetvl_pre which is used in sched_adjust_cost.

In the future we might also want a latency adjustment similar to lmul
for reductions, i.e. make the latency dependent on the type and its
number of units.

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add generic_ooo.
* config/riscv/riscv.cc (riscv_sched_adjust_cost): Implement
scheduler hook.
(TARGET_SCHED_ADJUST_COST): Define.
* config/riscv/riscv.md (no,yes"): Include generic-ooo.md
* config/riscv/riscv.opt: Add -madjust-lmul-cost.
* config/riscv/generic-ooo.md: New file.
* config/riscv/vector.md: Add vsetvl_pre.

RISC-V: Support movmisalign of RVV VLA modes

This patch fixes the following FAILs in regressions:
FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum"

Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit:
https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520

At first I thought that RVV does not allow misaligned accesses, so I removed that pattern.
However, after further investigation, re-reading the RVV ISA and experimenting on SPIKE,
I realized I was wrong.

RVV ISA reference: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints

"If an element accessed by a vector memory instruction is not naturally aligned to the size of the element,
either the element is transferred successfully or an address misaligned exception is raised on that element."

So the RVV ISA does permit misaligned vector loads/stores.

This was confirmed by experiment on SPIKE:

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64  a.out
bbl loader
z  0000000000000000 ra 0000000000010158 sp 0000003ffffffb40 gp 0000000000012c48
tp 0000000000000000 t0 00000000000110da t1 000000000000000f t2 0000000000000000
s0 0000000000013460 s1 0000000000000000 a0 0000000000012ef5 a1 0000000000012018
a2 0000000000012a71 a3 000000000000000d a4 0000000000000004 a5 0000000000012a71
a6 0000000000012a71 a7 0000000000012018 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 0000000000010258 va/inst 00000000020660a7 sr 8000000200006620
Store/AMO access fault!

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64  a.out
bbl loader

We can see that SPIKE passes the previously *FAILED* execution tests when --misaligned is specified.

So, to honor the RVV ISA spec, we should add the movmisalign pattern back, based on the investigation above,
since it improves multiple vectorization tests and fixes dump FAILs.

This patch adds TARGET_VECTOR_MISALIGN_SUPPORTED to decide whether we support the misalign pattern for VLA modes (enabled by default).

Consider the following case:

struct s {
    unsigned i : 31;
    char a : 4;
};

#define N 32
#define ELT0 {0x7FFFFFFFUL, 0}
#define ELT1 {0x7FFFFFFFUL, 1}
#define ELT2 {0x7FFFFFFFUL, 2}
#define ELT3 {0x7FFFFFFFUL, 3}
#define RES 48
struct s A[N]
  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};

int __attribute__ ((noipa))
f(struct s *ptr, unsigned n) {
    int res = 0;
    for (int i = 0; i < n; ++i)
      res += ptr[i].a;
    return res;
}

-O3 -S -fno-vect-cost-model (default strict-align):

f:
mv a4,a0
beq a1,zero,.L9
addiw a5,a1,-1
li a3,14
vsetivli zero,16,e64,m8,ta,ma
bleu a5,a3,.L3
andi a5,a0,127
bne a5,zero,.L3
srliw a3,a1,4
slli a3,a3,7
li a0,15
slli a0,a0,32
add a3,a3,a4
mv a5,a4
li a2,32
vmv.v.x v16,a0
vsetvli zero,zero,e32,m4,ta,ma
vmv.v.i v4,0
.L4:
vsetvli zero,zero,e64,m8,ta,ma
vle64.v v8,0(a5)
addi a5,a5,128
vand.vv v8,v8,v16
vsetvli zero,zero,e32,m4,ta,ma
vnsrl.wx v8,v8,a2
vadd.vv v4,v4,v8
bne a5,a3,.L4
li a3,0
andi a5,a1,15
vmv.s.x v1,a3
andi a3,a1,-16
vredsum.vs v1,v4,v1
vmv.x.s a0,v1
mv a2,a0
beq a5,zero,.L15
slli a5,a3,3
add a5,a4,a5
lw a0,4(a5)
andi a0,a0,15
addiw a4,a3,1
addw a0,a0,a2
bgeu a4,a1,.L15
lw a2,12(a5)
andi a2,a2,15
addiw a4,a3,2
addw a0,a2,a0
bgeu a4,a1,.L15
lw a2,20(a5)
andi a2,a2,15
addiw a4,a3,3
addw a0,a2,a0
bgeu a4,a1,.L15
lw a2,28(a5)
andi a2,a2,15
addiw a4,a3,4
addw a0,a2,a0
bgeu a4,a1,.L15
lw a2,36(a5)
andi a2,a2,15
addiw a4,a3,5
addw a0,a2,a0
bgeu a4,a1,.L15
lw a2,44(a5)
andi a2,a2,15
addiw a4,a3,6
addw a0,a2,a0
bgeu a4,a1,.L15
lw a2,52(a5)
andi a2,a2,15
addiw a4,a3,7
addw a0,a2,a0
bgeu a4,a1,.L15
lw a4,60(a5)
andi a4,a4,15
addw a4,a4,a0
addiw a2,a3,8
mv a0,a4
bgeu a2,a1,.L15
lw a0,68(a5)
andi a0,a0,15
addiw a2,a3,9
addw a0,a0,a4
bgeu a2,a1,.L15
lw a2,76(a5)
andi a2,a2,15
addiw a4,a3,10
addw a0,a2,a0
bgeu a4,a1,.L15
lw a2,84(a5)
andi a2,a2,15
addiw a4,a3,11
addw a0,a2,a0
bgeu a4,a1,.L15
lw a2,92(a5)
andi a2,a2,15
addiw a4,a3,12
addw a0,a2,a0
bgeu a4,a1,.L15
lw a2,100(a5)
andi a2,a2,15
addiw a4,a3,13
addw a0,a2,a0
bgeu a4,a1,.L15
lw a4,108(a5)
andi a4,a4,15
addiw a3,a3,14
addw a0,a4,a0
bgeu a3,a1,.L15
lw a5,116(a5)
andi a5,a5,15
addw a0,a5,a0
ret
.L9:
li a0,0
.L15:
ret
.L3:
mv a5,a4
slli a4,a1,32
srli a1,a4,29
add a1,a5,a1
li a0,0
.L7:
lw a4,4(a5)
andi a4,a4,15
addi a5,a5,8
addw a0,a4,a0
bne a5,a1,.L7
ret

-O3 -S -mno-strict-align -fno-vect-cost-model:

f:
beq a1,zero,.L4
slli a1,a1,32
li a5,15
vsetvli a4,zero,e64,m1,ta,ma
slli a5,a5,32
srli a1,a1,32
li a6,32
vmv.v.x v3,a5
vsetvli zero,zero,e32,mf2,ta,ma
vmv.v.i v2,0
.L3:
vsetvli a5,a1,e64,m1,ta,ma
vle64.v v1,0(a0)
vsetvli a3,zero,e64,m1,ta,ma
slli a2,a5,3
vand.vv v1,v1,v3
sub a1,a1,a5
vsetvli zero,zero,e32,mf2,ta,ma
add a0,a0,a2
vnsrl.wx v1,v1,a6
vsetvli zero,a5,e32,mf2,tu,ma
vadd.vv v2,v2,v1
bne a1,zero,.L3
li a5,0
vsetvli a3,zero,e32,mf2,ta,ma
vmv.s.x v1,a5
vredsum.vs v2,v2,v1
vmv.x.s a0,v2
ret
.L4:
li a0,0
ret

We can see that this improves the codegen for this case considerably.
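
For readers following the assembly, the vector loop body (load a 64-bit
element, AND with 15 << 32, narrowing shift right by 32, accumulate) can be
written as plain C. This is only a sketch under the assumption, taken from
the listings above, that each struct s occupies 8 bytes and that field a
sits in bits 32..35 of the little-endian 64-bit word; the name sum_field_a
is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the 4-bit field `a' over N raw 8-byte elements.
   Assumption: `a' occupies bits 32..35 of each 64-bit word.  */
int
sum_field_a (const uint64_t *raw, size_t n)
{
  const uint64_t mask = (uint64_t) 15 << 32;  /* li 15; slli 32 */
  int res = 0;

  assert (raw != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    /* vand.vv with the mask, then vnsrl.wx by 32, then vadd.vv.  */
    res += (int) ((raw[i] & mask) >> 32);
  return res;
}
```

The scalar epilogue in the listings does the same thing per element with
lw at offset 4 followed by andi with 15.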

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED): New macro.
* config/riscv/riscv.cc (riscv_support_vector_misalignment): Depend on movmisalign pattern.
* config/riscv/vector.md (movmisalign<mode>): New pattern.

THead: Fix missing CFI directives for th.sdd in prologue.

When generating CFI directives for the store-pair instruction,
if we add two parallel REG_FRAME_RELATED_EXPR expr_lists like
  (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (plus:DI (reg/f:DI 2 sp)
    (const_int 8 [0x8])) [1  S8 A64])
    (reg:DI 1 ra))
  (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (reg/f:DI 2 sp) [1  S8 A64])
    (reg:DI 8 s0))
only the first expr_list will be recognized by the dwarf2out_frame_debug
function. So, here we generate a SEQUENCE expression of REG_FRAME_RELATED_EXPR,
which includes two sub-expressions of RTX_FRAME_RELATED_P. Then the
dwarf2out_frame_debug_expr function will iterate through all the sub-expressions
and generate the corresponding CFI directives.

gcc/
* config/riscv/thead.cc (th_mempair_save_regs): Fix missing CFI
directives for store-pair instruction.

gcc/testsuite/
* gcc.target/riscv/xtheadmempair-4.c: New test.

tree-optimization/111715 - improve TBAA for access paths with pun

The following improves basic TBAA for access paths formed by
C++ abstraction where we are able to combine a path from an
address-taking operation with a path based on that access using
a pun to avoid memory access semantics on the address-taking part.

The trick is to identify the point the semantic memory access path
starts which allows us to use the alias set of the outermost access
instead of only that of the base of this path.

PR tree-optimization/111715
* alias.cc (reference_alias_ptr_type_1): When we have
a type-punning ref at the base search for the access
path part that's still semantically valid.

* gcc.dg/tree-ssa/ssa-fre-102.c: New testcase.

RISC-V: Refine bswap16 auto vectorization code gen

Update in v2

* Remove emit helper functions.
* Take expand_binop instead.

Original log:

This patch would like to refine the code gen for the bswap16.

We will have VEC_PERM_EXPR after rtl expand when invoking
__builtin_bswap. It will generate about 9 instructions in the
loop as below, whether it is bswap16, bswap32 or bswap64.

  .L2:
1 vle16.v v4,0(a0)
2 vmv.v.x v2,a7
3 vand.vv v2,v6,v2
4 slli    a2,a5,1
5 vrgatherei16.vv v1,v4,v2
6 sub     a4,a4,a5
7 vse16.v v1,0(a3)
8 add     a0,a0,a2
9 add     a3,a3,a2
  bne     a4,zero,.L2

But for bswap16 we can have an even simpler code gen, which
has only 7 instructions in the loop as below.

  .L5:
1 vle8.v  v2,0(a5)
2 addi    a5,a5,32
3 vsrl.vi v4,v2,8
4 vsll.vi v2,v2,8
5 vor.vv  v4,v4,v2
6 vse8.v  v4,0(a4)
7 addi    a4,a4,32
  bne     a5,a6,.L5

Unfortunately, this approach makes the number of insns in the loop grow
to 13 and 24 for bswap32 and bswap64 respectively. Thus, we refine the
code gen for bswap16 only, and leave both bswap32 and bswap64
as is.
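
The shift-and-or form used for bswap16 corresponds to the following scalar
C, given here only as an illustrative sketch; bswap16_array is a
hypothetical name and this is not the code added to riscv-v.cc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-swap N 16-bit elements from SRC into DST using two shifts
   and an or, mirroring the vsrl.vi / vsll.vi / vor.vv sequence.  */
void
bswap16_array (uint16_t *dst, const uint16_t *src, size_t n)
{
  assert (n == 0 || (dst != NULL && src != NULL));
  for (size_t i = 0; i < n; ++i)
    {
      uint16_t hi = (uint16_t) (src[i] >> 8);  /* vsrl.vi by 8 */
      uint16_t lo = (uint16_t) (src[i] << 8);  /* vsll.vi by 8 */
      dst[i] = (uint16_t) (hi | lo);           /* vor.vv */
    }
}
```

For wider elements the same idea needs more shifts and masks per element,
which is consistent with the instruction counts quoted above for bswap32
and bswap64.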

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl
for shuffle bswap.
(expand_vec_perm_const_1): Add handling for shuffle bswap pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V Regression test: Fix FAIL of pr45752.c for RVV

RVV uses load_lanes with stride = 5 to vectorize this case with -fno-vect-cost-model
instead of SLP.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr45752.c: Adapt dump check for target supports load_lanes with stride = 5.

testsuite: Fix vect_cond_arith_* dump checks for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: Also match COND_LEN.
* gcc.dg/vect/vect-cond-arith-4.c: Ditto.
* gcc.dg/vect/vect-cond-arith-5.c: Ditto.
* gcc.dg/vect/vect-cond-arith-6.c: Ditto.

RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV

Reference: https://godbolt.org/z/G9jzf5Grh

RVV is able to vectorize this case using SLP. However, with -fno-vect-cost-model,
RVV vectorizes it by vec_load_lanes with stride 6.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6.

i386: Implement doubleword right shifts by 1 bit using s[ha]r+rcr.

This patch tweaks the i386 back-end's ix86_split_ashr and ix86_split_lshr
functions to implement doubleword right shifts by 1 bit, using a shift
of the highpart that sets the carry flag followed by a rotate-carry-right
(RCR) instruction on the lowpart.

Conceptually this is similar to the recent left shift patch, but with two
complicating factors.  The first is that although the RCR sequence is
shorter, and is a ~3x performance improvement on AMD, my microbenchmarking
shows it ~10% slower on Intel.  Hence this patch also introduces a new
X86_TUNE_USE_RCR tuning parameter.  The second is that I believe this is
the first time a "rotate-right-through-carry" and a right shift that sets
the carry flag from the least significant bit has been modelled in GCC RTL
(on a MODE_CC target).  For this I've used the i386 back-end's UNSPEC_CC_NE
which seems appropriate.  Finally rcrsi2 and rcrdi2 are separate
define_insns so that we can use their generator functions.

For the pair of functions:
unsigned __int128 foo(unsigned __int128 x) { return x >> 1; }
__int128 bar(__int128 x) { return x >> 1; }

with -O2 -march=znver4 we previously generated:

foo: movq    %rdi, %rax
        movq    %rsi, %rdx
        shrdq   $1, %rsi, %rax
        shrq    %rdx
        ret
bar: movq    %rdi, %rax
        movq    %rsi, %rdx
        shrdq   $1, %rsi, %rax
        sarq    %rdx
        ret

with this patch we now generate:

foo: movq    %rsi, %rdx
        movq    %rdi, %rax
        shrq    %rdx
        rcrq    %rax
        ret
bar: movq    %rsi, %rdx
        movq    %rdi, %rax
        sarq    %rdx
        rcrq    %rax
        ret

2023-10-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_split_ashr): Split shifts by
one into ashr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR
or -Oz.
(ix86_split_lshr): Likewise, split shifts by one bit into
lshr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz.
* config/i386/i386.h (TARGET_USE_RCR): New backend macro.
* config/i386/i386.md (rcrsi2): New define_insn for rcrl.
(rcrdi2): New define_insn for rcrq.
(<anyshiftrt><mode>3_carry): New define_insn for right shifts that
set the carry flag from the least significant bit, modelled using
UNSPEC_CC_NE.
* config/i386/x86-tune.def (X86_TUNE_USE_RCR): New tuning parameter
controlling use of rcr 1 vs. shrd, which is significantly faster on
AMD processors.

gcc/testsuite/ChangeLog
* gcc.target/i386/rcr-1.c: New 64-bit test case.
* gcc.target/i386/rcr-2.c: New 32-bit test case.

Allow -mno-evex512 usage

gcc/ChangeLog:

* config/i386/i386.opt: Allow -mno-evex512.

gcc/testsuite/ChangeLog:

* gcc.target/i386/noevex512-1.c: New test.
* gcc.target/i386/noevex512-2.c: Ditto.
* gcc.target/i386/noevex512-3.c: Ditto.

Support -mevex512 for AVX512FP16 intrins

gcc/ChangeLog:

* config/i386/sse.md (V48H_AVX512VL): Add TARGET_EVEX512.
(VFH): Ditto.
(VF2H): Ditto.
(VFH_AVX512VL): Ditto.
(VHFBF): Ditto.
(VHF_AVX512VL): Ditto.
(VI2H_AVX512VL): Ditto.
(VI2F_256_512): Ditto.
(VF48_I1248): Remove unused iterator.
(VF48H_AVX512VL): Add TARGET_EVEX512.
(VF_AVX512): Remove unused iterator.
(REDUC_PLUS_MODE): Add TARGET_EVEX512.
(REDUC_SMINMAX_MODE): Ditto.
(FMAMODEM): Ditto.
(VFH_SF_AVX512VL): Ditto.
(VEC_PERM_AVX2): Ditto.

Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>

Support -mevex512 for AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ intrins

gcc/ChangeLog:

* config/i386/sse.md (VI1_AVX512VL): Add TARGET_EVEX512.
(VI8_FVL): Ditto.
(VI1_AVX512F): Ditto.
(VI1_AVX512VNNI): Ditto.
(VI1_AVX512VL_F): Ditto.
(VI12_VI48F_AVX512VL): Ditto.
(*avx512f_permvar_truncv32hiv32qi_1): Ditto.
(sdot_prod<mode>): Ditto.
(VEC_PERM_AVX2): Ditto.
(VPERMI2): Ditto.
(VPERMI2I): Ditto.
(vpmadd52<vpmadd52type>v8di): Ditto.
(usdot_prod<mode>): Ditto.
(vpdpbusd_v16si): Ditto.
(vpdpbusds_v16si): Ditto.
(vpdpwssd_v16si): Ditto.
(vpdpwssds_v16si): Ditto.
(VI48_AVX512VP2VL): Ditto.
(avx512vp2intersect_2intersectv16si): Ditto.
(VF_AVX512BF16VL): Ditto.
(VF1_AVX512_256): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr90096.c: Adjust error message.

Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>

Support -mevex512 for AVX512BW intrins

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
Make sure there is EVEX512 enabled.
(ix86_expand_vecop_qihi2): Refuse V32QI->V32HI when no EVEX512.
* config/i386/i386.cc (ix86_hard_regno_mode_ok): Disable 64 bit mask
when !TARGET_EVEX512.
* config/i386/i386.md (avx512bw_512): New.
(SWI1248_AVX512BWDQ_64): Add TARGET_EVEX512.
(*zero_extendsidi2): Change isa to avx512bw_512.
(kmov_isa): Ditto.
(*anddi_1): Ditto.
(*andn<mode>_1): Change isa to kmov_isa.
(*<code><mode>_1): Ditto.
(*notxor<mode>_1): Ditto.
(*one_cmpl<mode>2_1): Ditto.
(*one_cmplsi2_1_zext): Change isa to avx512bw_512.
(*ashl<mode>3_1): Change isa to kmov_isa.
(*lshr<mode>3_1): Ditto.
* config/i386/sse.md (VI12HFBF_AVX512VL): Add TARGET_EVEX512.
(VI1248_AVX512VLBW): Ditto.
(VHFBF_AVX512VL): Ditto.
(VI): Ditto.
(VIHFBF): Ditto.
(VI_AVX2): Ditto.
(VI1_AVX512): Ditto.
(VI12_256_512_AVX512VL): Ditto.
(VI2_AVX2_AVX512BW): Ditto.
(VI2_AVX512VNNIBW): Ditto.
(VI2_AVX512VL): Ditto.
(VI2HFBF_AVX512VL): Ditto.
(VI8_AVX2_AVX512BW): Ditto.
(VIMAX_AVX2_AVX512BW): Ditto.
(VIMAX_AVX512VL): Ditto.
(VI12_AVX2_AVX512BW): Ditto.
(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
(VI248_AVX512VL): Ditto.
(VI248_AVX512VLBW): Ditto.
(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
(VI248_AVX512BW): Ditto.
(VI248_AVX512BW_AVX512VL): Ditto.
(VI248_512): Ditto.
(VI124_256_AVX512F_AVX512BW): Ditto.
(VI_AVX512BW): Ditto.
(VIHFBF_AVX512BW): Ditto.
(SWI1248_AVX512BWDQ): Ditto.
(SWI1248_AVX512BW): Ditto.
(SWI1248_AVX512BWDQ2): Ditto.
(*knotsi_1_zext): Ditto.
(define_split for zero_extend + not): Ditto.
(kunpckdi): Ditto.
(REDUC_SMINMAX_MODE): Ditto.
(VEC_EXTRACT_MODE): Ditto.
(*avx512bw_permvar_truncv16siv16hi_1): Ditto.
(*avx512bw_permvar_truncv16siv16hi_1_hf): Ditto.
(truncv32hiv32qi2): Ditto.
(avx512bw_<code>v32hiv32qi2): Ditto.
(avx512bw_<code>v32hiv32qi2_mask): Ditto.
(avx512bw_<code>v32hiv32qi2_mask_store): Ditto.
(usadv64qi): Ditto.
(VEC_PERM_AVX2): Ditto.
(AVX512ZEXTMASK): Ditto.
(SWI24_MASK): New.
(vec_pack_trunc_<mode>): Change iterator to SWI24_MASK.
(avx512bw_packsswb<mask_name>): Add TARGET_EVEX512.
(avx512bw_packssdw<mask_name>): Ditto.
(avx512bw_interleave_highv64qi<mask_name>): Ditto.
(avx512bw_interleave_lowv64qi<mask_name>): Ditto.
(<mask_codefor>avx512bw_pshuflwv32hi<mask_name>): Ditto.
(<mask_codefor>avx512bw_pshufhwv32hi<mask_name>): Ditto.
(vec_unpacks_lo_di): Ditto.
(SWI48x_MASK): New.
(vec_unpacks_hi_<mode>): Change iterator to SWI48x_MASK.
(avx512bw_umulhrswv32hi3<mask_name>): Add TARGET_EVEX512.
(VI1248_AVX512VL_AVX512BW): Ditto.
(avx512bw_<code>v32qiv32hi2<mask_name>): Ditto.
(*avx512bw_zero_extendv32qiv32hi2_1): Ditto.
(*avx512bw_zero_extendv32qiv32hi2_2): Ditto.
(<insn>v32qiv32hi2): Ditto.
(pbroadcast_evex_isa): Change isa attribute to avx512bw_512.
(VPERMI2): Add TARGET_EVEX512.
(VPERMI2I): Ditto.

Support -mevex512 for AVX512DQ intrins

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_sse2_mulvxdi3):
Add TARGET_EVEX512 for 512 bit usage.
* config/i386/i386.cc (standard_sse_constant_opcode): Ditto.
* config/i386/sse.md (VF1_VF2_AVX512DQ): Ditto.
(VF1_128_256VL): Ditto.
(VF2_AVX512VL): Ditto.
(VI8_256_512): Ditto.
(<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>):
Ditto.
(AVX512_VEC): Ditto.
(AVX512_VEC_2): Ditto.
(VI4F_BRCST32x2): Ditto.
(VI8F_BRCST64x2): Ditto.

Support -mevex512 for AVX512F intrins

gcc/ChangeLog:

* config/i386/i386-builtins.cc
(ix86_vectorize_builtin_gather): Disable 512 bit gather
when !TARGET_EVEX512.
* config/i386/i386-expand.cc (ix86_valid_mask_cmp_mode):
Add TARGET_EVEX512.
(ix86_expand_int_sse_cmp): Ditto.
(ix86_expand_vector_init_one_nonzero): Disable subroutine
when !TARGET_EVEX512.
(ix86_emit_swsqrtsf): Add TARGET_EVEX512.
(ix86_vectorize_vec_perm_const): Disable subroutine when
!TARGET_EVEX512.
* config/i386/i386.cc
(standard_sse_constant_p): Add TARGET_EVEX512.
(standard_sse_constant_opcode): Ditto.
(ix86_get_ssemov): Ditto.
(ix86_legitimate_constant_p): Ditto.
(ix86_vectorize_builtin_scatter): Disable 512 bit scatter
when !TARGET_EVEX512.
* config/i386/i386.md (avx512f_512): New.
(movxi): Add TARGET_EVEX512.
(*movxi_internal_avx512f): Ditto.
(*movdi_internal): Change alternative 12 to ?Yv. Adjust mode
for alternative 13.
(*movsi_internal): Change alternative 8 to ?Yv. Adjust mode for
alternative 9.
(*movhi_internal): Change alternative 11 to *Yv.
(*movdf_internal): Change alternative 12 to Yv.
(*movsf_internal): Change alternative 5 to Yv. Adjust mode for
alternative 5 and 6.
(*mov<mode>_internal): Change alternative 4 to Yv.
(define_split for convert SF to DF): Add TARGET_EVEX512.
(extendbfsf2_1): Ditto.
* config/i386/predicates.md (bcst_mem_operand): Disable predicate
for 512 bit when !TARGET_EVEX512.
* config/i386/sse.md (VMOVE): Add TARGET_EVEX512.
(V48_AVX512VL): Ditto.
(V48_256_512_AVX512VL): Ditto.
(V48H_AVX512VL): Ditto.
(VI12_AVX512VL): Ditto.
(V): Ditto.
(V_512): Ditto.
(V_256_512): Ditto.
(VF): Ditto.
(VF1_VF2_AVX512DQ): Ditto.
(VFH): Ditto.
(VFB): Ditto.
(VF1): Ditto.
(VF1_AVX2): Ditto.
(VF2): Ditto.
(VF2H): Ditto.
(VF2_512_256): Ditto.
(VF2_512_256VL): Ditto.
(VF_512): Ditto.
(VFB_512): Ditto.
(VI48_AVX512VL): Ditto.
(VI1248_AVX512VLBW): Ditto.
(VF_AVX512VL): Ditto.
(VFH_AVX512VL): Ditto.
(VF1_AVX512VL): Ditto.
(VI): Ditto.
(VIHFBF): Ditto.
(VI_AVX2): Ditto.
(VI8): Ditto.
(VI8_AVX512VL): Ditto.
(VI2_AVX512F): Ditto.
(VI4_AVX512F): Ditto.
(VI4_AVX512VL): Ditto.
(VI48_AVX512F_AVX512VL): Ditto.
(VI8_AVX2_AVX512F): Ditto.
(VI8_AVX_AVX512F): Ditto.
(V8FI): Ditto.
(V16FI): Ditto.
(VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto.
(VI248_AVX512VLBW): Ditto.
(VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto.
(VI248_AVX512BW): Ditto.
(VI248_AVX512BW_AVX512VL): Ditto.
(VI48_AVX512F): Ditto.
(VI48_AVX_AVX512F): Ditto.
(VI12_AVX_AVX512F): Ditto.
(VI148_512): Ditto.
(VI124_256_AVX512F_AVX512BW): Ditto.
(VI48_512): Ditto.
(VI_AVX512BW): Ditto.
(VIHFBF_AVX512BW): Ditto.
(VI4F_256_512): Ditto.
(VI48F_256_512): Ditto.
(VI48F): Ditto.
(VI12_VI48F_AVX512VL): Ditto.
(V32_512): Ditto.
(AVX512MODE2P): Ditto.
(STORENT_MODE): Ditto.
(REDUC_PLUS_MODE): Ditto.
(REDUC_SMINMAX_MODE): Ditto.
(*andnot<mode>3): Change isa attribute to avx512f_512.
(*andnot<mode>3): Ditto.
(<code><mode>3): Ditto.
(<code>tf3): Ditto.
(FMAMODEM): Add TARGET_EVEX512.
(FMAMODE_AVX512): Ditto.
(VFH_SF_AVX512VL): Ditto.
(avx512f_fix_notruncv16sfv16si<mask_name><round_name>): Ditto.
(fix<fixunssuffix>_truncv16sfv16si2<mask_name><round_saeonly_name>):
Ditto.
(avx512f_cvtdq2pd512_2): Ditto.
(avx512f_cvtpd2dq512<mask_name><round_name>): Ditto.
(fix<fixunssuffix>_truncv8dfv8si2<mask_name><round_saeonly_name>):
Ditto.
(<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>): Ditto.
(vec_unpacks_lo_v16sf): Ditto.
(vec_unpacks_hi_v16sf): Ditto.
(vec_unpacks_float_hi_v16si): Ditto.
(vec_unpacks_float_lo_v16si): Ditto.
(vec_unpacku_float_hi_v16si): Ditto.
(vec_unpacku_float_lo_v16si): Ditto.
(vec_pack_sfix_trunc_v8df): Ditto.
(avx512f_vec_pack_sfix_v8df): Ditto.
(<mask_codefor>avx512f_unpckhps512<mask_name>): Ditto.
(<mask_codefor>avx512f_unpcklps512<mask_name>): Ditto.
(<mask_codefor>avx512f_movshdup512<mask_name>): Ditto.
(<mask_codefor>avx512f_movsldup512<mask_name>): Ditto.
(AVX512_VEC): Ditto.
(AVX512_VEC_2): Ditto.
(vec_extract_lo_v64qi): Ditto.
(vec_extract_hi_v64qi): Ditto.
(VEC_EXTRACT_MODE): Ditto.
(<mask_codefor>avx512f_unpckhpd512<mask_name>): Ditto.
(avx512f_movddup512<mask_name>): Ditto.
(avx512f_unpcklpd512<mask_name>): Ditto.
(*<avx512>_vternlog<mode>_all): Ditto.
(*<avx512>_vpternlog<mode>_1): Ditto.
(*<avx512>_vpternlog<mode>_2): Ditto.
(*<avx512>_vpternlog<mode>_3): Ditto.
(avx512f_shufps512_mask): Ditto.
(avx512f_shufps512_1<mask_name>): Ditto.
(avx512f_shufpd512_mask): Ditto.
(avx512f_shufpd512_1<mask_name>): Ditto.
(<mask_codefor>avx512f_interleave_highv8di<mask_name>): Ditto.
(<mask_codefor>avx512f_interleave_lowv8di<mask_name>): Ditto.
(vec_dupv2df<mask_name>): Ditto.
(trunc<pmov_src_lower><mode>2): Ditto.
(*avx512f_<code><pmov_src_lower><mode>2): Ditto.
(*avx512f_vpermvar_truncv8div8si_1): Ditto.
(avx512f_<code><pmov_src_lower><mode>2_mask): Ditto.
(avx512f_<code><pmov_src_lower><mode>2_mask_store): Ditto.
(truncv8div8qi2): Ditto.
(avx512f_<code>v8div16qi2): Ditto.
(*avx512f_<code>v8div16qi2_store_1): Ditto.
(*avx512f_<code>v8div16qi2_store_2): Ditto.
(avx512f_<code>v8div16qi2_mask): Ditto.
(*avx512f_<code>v8div16qi2_mask_1): Ditto.
(*avx512f_<code>v8div16qi2_mask_store_1): Ditto.
(avx512f_<code>v8div16qi2_mask_store_2): Ditto.
(vec_widen_umult_even_v16si<mask_name>): Ditto.
(*vec_widen_umult_even_v16si<mask_name>): Ditto.
(vec_widen_smult_even_v16si<mask_name>): Ditto.
(*vec_widen_smult_even_v16si<mask_name>): Ditto.
(VEC_PERM_AVX2): Ditto.
(one_cmpl<mode>2): Ditto.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Ditto.
(*one_cmpl<mode>2_pternlog_false_dep): Ditto.
(define_split to xor): Ditto.
(*andnot<mode>3): Ditto.
(define_split for ior): Ditto.
(*iornot<mode>3): Ditto.
(*xnor<mode>3): Ditto.
(*<nlogic><mode>3): Ditto.
(<mask_codefor>avx512f_interleave_highv16si<mask_name>): Ditto.
(<mask_codefor>avx512f_interleave_lowv16si<mask_name>): Ditto.
(avx512f_pshufdv3_mask): Ditto.
(avx512f_pshufd_1<mask_name>): Ditto.
(*vec_extractv4ti): Ditto.
(VEXTRACTI128_MODE): Ditto.
(define_split to vec_extract): Ditto.
(VI1248_AVX512VL_AVX512BW): Ditto.
(<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>): Ditto.
(<insn>v16qiv16si2): Ditto.
(avx512f_<code>v16hiv16si2<mask_name>): Ditto.
(<insn>v16hiv16si2): Ditto.
(avx512f_zero_extendv16hiv16si2_1): Ditto.
(avx512f_<code>v8qiv8di2<mask_name>): Ditto.
(*avx512f_<code>v8qiv8di2<mask_name>_1): Ditto.
(*avx512f_<code>v8qiv8di2<mask_name>_2): Ditto.
(<insn>v8qiv8di2): Ditto.
(avx512f_<code>v8hiv8di2<mask_name>): Ditto.
(<insn>v8hiv8di2): Ditto.
(avx512f_<code>v8siv8di2<mask_name>): Ditto.
(*avx512f_zero_extendv8siv8di2_1): Ditto.
(*avx512f_zero_extendv8siv8di2_2): Ditto.
(<insn>v8siv8di2): Ditto.
(avx512f_roundps512_sfix): Ditto.
(vashrv8di3): Ditto.
(vashrv16si3): Ditto.
(pbroadcast_evex_isa): Change isa attribute to avx512f_512.
(vec_dupv4sf): Add TARGET_EVEX512.
(*vec_dupv4si): Ditto.
(*vec_dupv2di): Ditto.
(vec_dup<mode>): Change isa attribute to avx512f_512.
(VPERMI2): Add TARGET_EVEX512.
(VPERMI2I): Ditto.
(VEC_INIT_MODE): Ditto.
(VEC_INIT_HALF_MODE): Ditto.
(<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>):
Ditto.
(avx512f_vcvtps2ph512_mask_sae): Ditto.
(<mask_codefor>avx512f_vcvtps2ph512<mask_name><round_saeonly_name>):
Ditto.
(*avx512f_vcvtps2ph512<merge_mask_name>): Ditto.
(INT_BROADCAST_MODE): Ditto.

Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
Disable zmm broadcast for !TARGET_EVEX512.
* config/i386/i386-options.cc (ix86_option_override_internal):
Do not use PVW_512 when no-evex512.
(ix86_simd_clone_adjust): Add evex512 target into string.
* config/i386/i386.cc (type_natural_mode): Report ABI warning
when using zmm register w/o evex512.
(ix86_return_in_memory): Do not allow zmm when !TARGET_EVEX512.
(ix86_hard_regno_mode_ok): Ditto.
(ix86_set_reg_reg_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_vector_mode_supported_p): Ditto.
(ix86_preferred_simd_mode): Ditto.
(ix86_get_mask_mode): Ditto.
(ix86_simd_clone_compute_vecsize_and_simdlen): Disable 512 bit
libmvec call when !TARGET_EVEX512.
(ix86_simd_clone_usable): Ditto.
* config/i386/i386.h (BIGGEST_ALIGNMENT): Disable 512 alignment
when !TARGET_EVEX512
(MOVE_MAX): Do not use PVW_512 when !TARGET_EVEX512.
(STORE_MAX_PIECES): Ditto.

[PATCH 5/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.

[PATCH 4/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.

[PATCH 3/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.

[PATCH 2/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.

[PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins

gcc/ChangeLog:

* config/i386/i386-builtin.def (BDESC): Add
OPTION_MASK_ISA2_EVEX512.
* config/i386/i386-builtins.cc
(ix86_init_mmx_sse_builtins): Ditto.

[PATCH 5/5] Push evex512 target for 512 bit intrins

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h: Add evex512 target for 512 bit
intrins.

Co-authored-by: Hu, Lin1 <lin1.hu@intel.com>

[PATCH 4/5] Push evex512 target for 512 bit intrins

gcc/ChangeLog:

* config.gcc: Add avx512bitalgvlintrin.h.
* config/i386/avx5124fmapsintrin.h: Add evex512 target for 512 bit
intrins.
* config/i386/avx5124vnniwintrin.h: Ditto.
* config/i386/avx512bf16intrin.h: Ditto.
* config/i386/avx512bitalgintrin.h: Add evex512 target for 512 bit
intrins. Split 128/256 bit intrins to avx512bitalgvlintrin.h.
* config/i386/avx512erintrin.h: Add evex512 target for 512 bit
intrins
* config/i386/avx512ifmaintrin.h: Ditto
* config/i386/avx512pfintrin.h: Ditto
* config/i386/avx512vbmi2intrin.h: Ditto.
* config/i386/avx512vbmiintrin.h: Ditto.
* config/i386/avx512vnniintrin.h: Ditto.
* config/i386/avx512vp2intersectintrin.h: Ditto.
* config/i386/avx512vpopcntdqintrin.h: Ditto.
* config/i386/gfniintrin.h: Ditto.
* config/i386/immintrin.h: Add avx512bitalgvlintrin.h.
* config/i386/vaesintrin.h: Add evex512 target for 512 bit intrins.
* config/i386/vpclmulqdqintrin.h: Ditto.
* config/i386/avx512bitalgvlintrin.h: New.

[PATCH 3/5] Push evex512 target for 512 bit intrins

gcc/ChangeLog:

* config/i386/avx512bwintrin.h: Add evex512 target for 512 bit
intrins.

[PATCH 2/5] Push evex512 target for 512 bit intrins

gcc/ChangeLog:

* config/i386/avx512dqintrin.h: Add evex512 target for 512 bit
intrins.

[PATCH 1/5] Push evex512 target for 512 bit intrins

gcc/ChangeLog:

* config/i386/avx512fintrin.h: Add evex512 target for 512 bit intrins.

Initial support for -mevex512

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_EVEX512_SET): New.
(OPTION_MASK_ISA2_EVEX512_UNSET): Ditto.
(ix86_handle_option): Handle EVEX512.
* config/i386/i386-c.cc
(ix86_target_macros_internal): Handle EVEX512. Add __EVEX256__
when AVX512VL is set.
* config/i386/i386-options.cc: (isa2_opts): Handle EVEX512.
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_option_override_internal): Set EVEX512 target if it is not
explicitly set when AVX512 is enabled. Disable
AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512.
* config/i386/i386.opt: Add mevex512. Temporarily RejectNegative.

TEST: Fix dump FAIL for RVV (RISC-V vector)

As shown here: https://godbolt.org/z/3K9oK7fx3

ARM SVE shows FOLD_EXTRACT_LAST 2 times whereas RVV shows it 4 times.

This is because RISC-V doesn't enable vec_pack_trunc, so the conversion and fold_extract_last fail during the first analysis.
Then we succeed on the second attempt.

So RVV shows "FOLD_EXTRACT_LAST" 4 times.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-4.c: Add vect_pack_trunc variant.

rs6000: support 32bit inline lrint

gcc/
PR target/88558
* config/rs6000/rs6000.md (lrint<mode>di2): Remove TARGET_FPRND
from insn condition.
(lrint<mode>si2): New insn pattern for 32bit lrint.

gcc/testsuite/
PR target/106769
* gcc.target/powerpc/pr88558.h: New.
* gcc.target/powerpc/pr88558-p7.c: New.
* gcc.target/powerpc/pr88558-p8.c: New.

rs6000: enable SImode in FP register on P7

gcc/
PR target/88558
* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
Enable SImode on FP registers for P7.
* config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode
move between FP registers. Set attribute isa of stfiwx to "*"
and attribute of stxsiwx to "p7".

s390: Make use of new copysign RTL

gcc/ChangeLog:

* config/s390/s390.md: Make use of new copysign RTL.

[i386] APX EGPR: fix missing patterns that prohibit egpr

For some patterns, the m/Bm constraints in alternatives 0 and 1 could result in
an egpr being allocated for a memory operand under -mapxf. jm/ja should be used instead.

gcc/ChangeLog:

* config/i386/sse.md (vec_concatv2di): Replace constraint "m"
with "jm" for alternative 0 and 1 of operand 2.
(sse4_1_<code><mode>3<mask_name>): Replace constraint "Bm" with
"ja" for alternative 0 and 1 of operand 2.

Daily bump.

libcpp: eliminate LINEMAPS_{ORDINARY,MACRO}_MAPS

libcpp/ChangeLog:
* include/line-map.h (LINEMAPS_ORDINARY_MAPS): Delete.
(LINEMAPS_MACRO_MAPS): Delete.
* line-map.cc (linemap_tracks_macro_expansion_locs_p): Update for
deletion of LINEMAPS_MACRO_MAPS.
(linemap_get_statistics): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libcpp: eliminate LINEMAPS_{,ORDINARY_,MACRO_}CACHE

It's simpler to use field access than to go through these inline
functions that look as if they are macros.

No functional change intended.

libcpp/ChangeLog:
* include/line-map.h (maps_info_ordinary::cache): Rename to...
(maps_info_ordinary::m_cache): ...this.
(maps_info_macro::cache): Rename to...
(maps_info_macro::m_cache): ...this.
(LINEMAPS_CACHE): Delete.
(LINEMAPS_ORDINARY_CACHE): Delete.
(LINEMAPS_MACRO_CACHE): Delete.
* init.cc (read_original_filename): Update for adding "m_" prefix.
* line-map.cc (linemap_add): Eliminate LINEMAPS_ORDINARY_CACHE in
favor of a simple field access.
(linemap_enter_macro): Likewise for LINEMAPS_MACRO_CACHE.
(linemap_ordinary_map_lookup): Likewise for
LINEMAPS_ORDINARY_CACHE, twice.
(linemap_lookup_macro_index): Likewise for LINEMAPS_MACRO_CACHE.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libcpp: eliminate LINEMAPS_LAST_ALLOCATED{,_ORDINARY,_MACRO}_MAP

Nothing uses these; delete them.

libcpp/ChangeLog:
* include/line-map.h (LINEMAPS_LAST_ALLOCATED_MAP): Delete.
(LINEMAPS_LAST_ALLOCATED_ORDINARY_MAP): Delete.
(LINEMAPS_LAST_ALLOCATED_MACRO_MAP): Delete.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: improvements to out-of-bounds diagrams [PR111155]

Update out-of-bounds diagrams to show existing string values,
and the initial write index within a string buffer.

For example, given the out-of-bounds write in strcat in:

void test (void)
{
  char buf[10];
  strcpy (buf, "hello");
  strcat (buf, " world!");
}

the diagram improves from:

                           ┌─────┬─────┬────┬────┬────┐┌─────┬─────┬─────┐
                           │ [0] │ [1] │[2] │[3] │[4] ││ [5] │ [6] │ [7] │
                           ├─────┼─────┼────┼────┼────┤├─────┼─────┼─────┤
                           │ ' ' │ 'w' │'o' │'r' │'l' ││ 'd' │ '!' │ NUL │
                           ├─────┴─────┴────┴────┴────┴┴─────┴─────┴─────┤
                           │      string literal (type: 'char[8]')       │
                           └─────────────────────────────────────────────┘
                              │     │    │    │    │      │     │     │
                              │     │    │    │    │      │     │     │
                              v     v    v    v    v      v     v     v
  ┌─────┬────────────────────────────────────────┬────┐┌─────────────────┐
  │ [0] │                  ...                   │[9] ││                 │
  ├─────┴────────────────────────────────────────┴────┤│after valid range│
  │             'buf' (type: 'char[10]')              ││                 │
  └───────────────────────────────────────────────────┘└─────────────────┘
  ├─────────────────────────┬─────────────────────────┤├────────┬────────┤
                            │                                   │
                  ╭─────────┴────────╮                ╭─────────┴─────────╮
                  │capacity: 10 bytes│                │overflow of 3 bytes│
                  ╰──────────────────╯                ╰───────────────────╯

to:

                             ┌────┬────┬────┬────┬────┐┌─────┬─────┬─────┐
                             │[0] │[1] │[2] │[3] │[4] ││ [5] │ [6] │ [7] │
                             ├────┼────┼────┼────┼────┤├─────┼─────┼─────┤
                             │' ' │'w' │'o' │'r' │'l' ││ 'd' │ '!' │ NUL │
                             ├────┴────┴────┴────┴────┴┴─────┴─────┴─────┤
                             │     string literal (type: 'char[8]')      │
                             └───────────────────────────────────────────┘
                               │    │    │    │    │      │     │     │
                               │    │    │    │    │      │     │     │
                               v    v    v    v    v      v     v     v
  ┌─────┬────────────────────┬────┬──────────────┬────┐┌─────────────────┐
  │ [0] │        ...         │[5] │     ...      │[9] ││                 │
  ├─────┼────┬────┬────┬────┬┼────┼──────────────┴────┘│                 │
  │ 'h' │'e' │'l' │'l' │'o' ││NUL │                    │after valid range│
  ├─────┴────┴────┴────┴────┴┴────┴───────────────────┐│                 │
  │             'buf' (type: 'char[10]')              ││                 │
  └───────────────────────────────────────────────────┘└─────────────────┘
  ├─────────────────────────┬─────────────────────────┤├────────┬────────┤
                            │                                   │
                  ╭─────────┴────────╮                ╭─────────┴─────────╮
                  │capacity: 10 bytes│                │overflow of 3 bytes│
                  ╰──────────────────╯                ╰───────────────────╯
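
The numbers in the diagram can be reproduced by hand. The following is a minimal sketch of that arithmetic, not the analyzer's implementation; the helper name strcat_overflow_bytes is hypothetical and exists only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): number of bytes that
   strcat (dst, src) would write past the end of a destination buffer of
   CAPACITY bytes that currently holds the string EXISTING.
   Returns 0 if the write fits.  */
static size_t
strcat_overflow_bytes (size_t capacity, const char *existing, const char *src)
{
  size_t start = strlen (existing);   /* index of the existing NUL, where the write starts */
  size_t written = strlen (src) + 1;  /* bytes of SRC plus its terminating NUL */
  size_t end = start + written;       /* one past the last byte written */

  assert (start < capacity);          /* the existing string must itself fit */
  return end > capacity ? end - capacity : 0;
}
```

For the example above, the write starts at index 5 (the NUL after "hello"), covers the 8 bytes of " world!" including its NUL, and so ends 3 bytes past the 10-byte capacity, matching "overflow of 3 bytes".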

gcc/analyzer/ChangeLog:
PR analyzer/111155
* access-diagram.cc (boundaries::boundaries): Add logger param.
(boundaries::add): Add logging.
(boundaries::get_hard_boundaries_in_range): New.
(boundaries::m_logger): New field.
(boundaries::get_table_x_for_offset): Make public.
(class svalue_spatial_item): New.
(class compound_svalue_spatial_item): New.
(add_ellipsis_to_gaps): New.
(valid_region_spatial_item::valid_region_spatial_item): Add theme
param.  Initialize m_boundaries, m_existing_sval, and
m_existing_sval_spatial_item.
(valid_region_spatial_item::add_boundaries): Set m_boundaries.
Add boundaries for any m_existing_sval_spatial_item.
(valid_region_spatial_item::add_array_elements_to_table): Rewrite
creation of min/max index in terms of
maybe_add_array_index_to_table.  Rewrite ellipsis code using
add_ellipsis_to_gaps. Add index values for any hard boundaries
within the valid region.
(valid_region_spatial_item::maybe_add_array_index_to_table): New,
based on code formerly in add_array_elements_to_table.
(valid_region_spatial_item::make_table): Make use of
m_existing_sval_spatial_item, if any.
(valid_region_spatial_item::m_boundaries): New field.
(valid_region_spatial_item::m_existing_sval): New field.
(valid_region_spatial_item::m_existing_sval_spatial_item): New
field.
(class svalue_spatial_item): Rename to...
(class written_svalue_spatial_item): ...this.
(class string_region_spatial_item): Rename to...
(class string_literal_spatial_item): ...this.  Add "kind".
(string_literal_spatial_item::add_boundaries): Use m_kind to
determine kind of boundary.  Update for renaming of m_actual_bits
to m_bits.
(string_literal_spatial_item::make_table): Likewise.  Support not
displaying a row for byte indexes, and not displaying a row for
the type.
(string_literal_spatial_item::add_column_for_byte): Make byte index
row optional.
(svalue_spatial_item::make): Convert to...
(make_written_svalue_spatial_item): ...this.
(make_existing_svalue_spatial_item): New.
(access_diagram_impl::access_diagram_impl): Pass theme to
m_valid_region_spatial_item ctor.  Update for renaming of
m_svalue_spatial_item.
(access_diagram_impl::find_boundaries): Pass logger to boundaries.
Update for renaming of...
(access_diagram_impl::m_svalue_spatial_item): Rename to...
(access_diagram_impl::m_written_svalue_spatial_item): ...this.

gcc/testsuite/ChangeLog:
PR analyzer/111155
* c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c: New test.
* c-c++-common/analyzer/out-of-bounds-diagram-strcat.c: New test.
* gcc.dg/analyzer/out-of-bounds-diagram-17.c: Update expected
result to show the existing content of "buf" and the index at
which the write starts.
* gcc.dg/analyzer/out-of-bounds-diagram-18.c: Likewise.
* gcc.dg/analyzer/out-of-bounds-diagram-19.c: Likewise.
* gcc.dg/analyzer/out-of-bounds-diagram-6.c: Update expected
output.

gcc/ChangeLog:
PR analyzer/111155
* text-art/table.cc (table::maybe_set_cell_span): New.
(table::add_other_table): New.
* text-art/table.h (class table::cell_placement): Add class table
as a friend.
(table::add_rows): New.
(table::add_row): Reimplement in terms of add_rows.
(table::maybe_set_cell_span): New decl.
(table::add_other_table): New decl.
* text-art/types.h (operator+): New operator for rect + coord.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libcpp: eliminate COMBINE_LOCATION_DATA

This patch eliminates the function "COMBINE_LOCATION_DATA" (which hasn't
been a macro since r6-739-g0501dbd932a7e9) and the function
"get_combined_adhoc_loc" in favor of a new
line_maps::get_or_create_combined_loc member function.

No functional change intended.

gcc/cp/ChangeLog:
* module.cc (module_state::read_location): Update for renaming of
get_combined_adhoc_loc.

gcc/ChangeLog:
* genmatch.cc (main): Update for "m_" prefix of some fields of
line_maps.
* input.cc (make_location): Update for removal of
COMBINE_LOCATION_DATA.
(dump_line_table_statistics): Update for "m_" prefix of some
fields of line_maps.
(location_with_discriminator): Update for removal of
COMBINE_LOCATION_DATA.
(line_table_test::line_table_test): Update for "m_" prefix of some
fields of line_maps.
* toplev.cc (general_init): Likewise.
* tree.cc (set_block): Update for removal of
COMBINE_LOCATION_DATA.
(set_source_range): Likewise.

libcpp/ChangeLog:
* include/line-map.h (line_maps::reallocator): Rename to...
(line_maps::m_reallocator): ...this.
(line_maps::round_alloc_size): Rename to...
(line_maps::m_round_alloc_size): ...this.
(line_maps::location_adhoc_data_map): Rename to...
(line_maps::m_location_adhoc_data_map): ...this.
(line_maps::num_optimized_ranges): Rename to...
(line_maps::m_num_optimized_ranges): ...this.
(line_maps::num_unoptimized_ranges): Rename to...
(line_maps::m_num_unoptimized_ranges): ...this.
(get_combined_adhoc_loc): Delete decl.
(COMBINE_LOCATION_DATA): Delete.
* lex.cc (get_location_for_byte_range_in_cur_line): Update for
removal of COMBINE_LOCATION_DATA.
(warn_about_normalization): Likewise.
(_cpp_lex_direct): Likewise.
* line-map.cc (line_maps::~line_maps): Update for "m_" prefix of
some fields of line_maps.
(rebuild_location_adhoc_htab): Likewise.
(can_be_stored_compactly_p): Convert to...
(line_maps::can_be_stored_compactly_p): ...this private member
function.
(get_combined_adhoc_loc): Convert to...
(line_maps::get_or_create_combined_loc): ...this public member
function.
(line_maps::make_location): Update for removal of
COMBINE_LOCATION_DATA.
(get_data_from_adhoc_loc): Update for "m_" prefix of some fields
of line_maps.
(get_discriminator_from_adhoc_loc): Likewise.
(get_location_from_adhoc_loc): Likewise.
(get_range_from_adhoc_loc): Convert to...
(line_maps::get_range_from_adhoc_loc): ...this private member
function.
(line_maps::get_range_from_loc): Update for conversion of
get_range_from_adhoc_loc to a member function.
(linemap_init): Update for "m_" prefix of some fields of
line_maps.
(line_map_new_raw): Likewise.
(linemap_enter_macro): Likewise.
(linemap_get_statistics): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libcpp: "const" and other cleanups

No functional change intended.

gcc/ChangeLog:
* input.cc (make_location): Move implementation to
line_maps::make_location.

libcpp/ChangeLog:
* include/line-map.h (line_maps::pure_location_p): New decl.
(line_maps::get_pure_location): New decl.
(line_maps::get_range_from_loc): New decl.
(line_maps::get_start): New.
(line_maps::get_finish): New.
(line_maps::make_location): New decl.
(get_range_from_loc): Make line_maps param const.
(get_discriminator_from_loc): Likewise.
(pure_location_p): Likewise.
(get_pure_location): Likewise.
(linemap_check_files_exited): Likewise.
(linemap_tracks_macro_expansion_locs_p): Likewise.
(linemap_location_in_system_header_p): Likewise.
(linemap_location_from_macro_definition_p): Likewise.
(linemap_macro_map_loc_unwind_toward_spelling): Likewise.
(linemap_included_from_linemap): Likewise.
(first_map_in_common): Likewise.
(linemap_compare_locations): Likewise.
(linemap_location_before_p): Likewise.
(linemap_resolve_location): Likewise.
(linemap_unwind_toward_expansion): Likewise.
(linemap_unwind_to_first_non_reserved_loc): Likewise.
(linemap_expand_location): Likewise.
(linemap_get_file_highest_location): Likewise.
(linemap_get_statistics): Likewise.
(linemap_dump_location): Likewise.
(linemap_dump): Likewise.
(line_table_dump): Likewise.
* internal.h (linemap_get_expansion_line): Likewise.
(linemap_get_expansion_filename): Likewise.
* line-map.cc (can_be_stored_compactly_p): Likewise.
(get_data_from_adhoc_loc): Drop redundant "class".
(get_discriminator_from_adhoc_loc): Likewise.
(get_location_from_adhoc_loc): Likewise.
(get_range_from_adhoc_loc): Likewise.
(get_range_from_loc): Make const and move implementation to...
(line_maps::get_range_from_loc): ...this new function.
(get_discriminator_from_loc): Make line_maps param const.
(pure_location_p): Make const and move implementation to...
(line_maps::pure_location_p): ...this new function.
(get_pure_location): Make const and move implementation to...
(line_maps::get_pure_location): ...this new function.
(linemap_included_from_linemap): Make line_maps param const.
(linemap_check_files_exited): Likewise.
(linemap_tracks_macro_expansion_locs_p): Likewise.
(linemap_macro_map_loc_unwind_toward_spelling): Likewise.
(linemap_get_expansion_line): Likewise.
(linemap_get_expansion_filename): Likewise.
(linemap_location_in_system_header_p): Likewise.
(first_map_in_common_1): Likewise.
(linemap_compare_locations): Likewise.
(linemap_macro_loc_to_spelling_point): Likewise.
(linemap_macro_loc_to_def_point): Likewise.
(linemap_macro_loc_to_exp_point): Likewise.
(linemap_resolve_location): Likewise.
(linemap_location_from_macro_definition_p): Likewise.
(linemap_unwind_toward_expansion): Likewise.
(linemap_unwind_to_first_non_reserved_loc): Likewise.
(linemap_expand_location): Likewise.
(linemap_dump): Likewise.
(linemap_dump_location): Likewise.
(linemap_get_file_highest_location): Likewise.
(linemap_get_statistics): Likewise.
(line_table_dump): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: fix ICE on sarif output when source file is unreadable [PR111700]

gcc/ChangeLog:
PR driver/111700
* input.cc (file_cache::add_file): Update leading comment to
clarify that it can fail.
(file_cache::lookup_or_add_file): Likewise.
(file_cache::get_source_file_content): Gracefully handle
lookup_or_add_file failing.

gcc/testsuite/ChangeLog:
PR driver/111700
* c-c++-common/diagnostic-format-sarif-file-pr111700.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Support signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2HF/V4HF.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_build_const_vector): Handle V2HF
and V4HFmode.
(ix86_build_signbit_mask): Ditto.
* config/i386/mmx.md (mmxintvecmode): Ditto.
(<code><mode>2): New define_expand.
(*mmx_<code><mode>): New define_insn_and_split.
(*mmx_nabs<mode>2): Ditto.
(*mmx_andnot<mode>3): New define_insn.
(<code><mode>3): Ditto.
(copysign<mode>3): New define_expand.
(xorsign<mode>3): Ditto.
(signbit<mode>2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-absneghf.c: New test.
* gcc.target/i386/part-vect-copysignhf.c: New test.
* gcc.target/i386/part-vect-xorsignhf.c: New test.

Support smin/smax for V2HF/V4HF

gcc/ChangeLog:

* config/i386/mmx.md (VHF_32_64): New mode iterator.
(<insn><mode>3): New define_expand, merged from ..
(<insn>v4hf3): .. this and
(<insn>v2hf3): .. this.
(movd_v2hf_to_sse_reg): New define_expand, split from ..
(movd_v2hf_to_sse): .. this.
(<code><mode>3): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-vminmaxph-1.c: New test.
* gcc.target/i386/avx512fp16-64-32-vecop-1.c: Scan-assembler
only for { target { ! ia32 } }.

Fortran/OpenMP: Fix handling of strictly structured blocks

For strictly structured blocks, a BLOCK was created but the code
was placed after the block, in the outer structured block. Additionally,
labelled blocks were mishandled. As the code is now properly inside the
BLOCK, this also solves additional issues.

gcc/fortran/ChangeLog:

* parse.cc (parse_omp_structured_block): Make the user code end
up inside of BLOCK construct for strictly structured blocks;
fix fallout for 'section' and 'teams'.
* openmp.cc (resolve_omp_target): Fix changed BLOCK handling
for teams in target checking.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/strictly-structured-block-1.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/block_17.f90: New test.
* gfortran.dg/gomp/strictly-structured-block-5.f90: New test.

rs6000: build constant via li/lis;rldic

This patch checks whether a constant can be built by "li;rldic".
Only "negative li" needs to be handled; the other forms need no check.
For example, "negative lis" is just a "negative li" with an additional shift.
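
As an illustration of what "li;rldic" can produce, here is a minimal C sketch that emulates rldic (rotate left, then keep only the bits between the rotate amount and the mask begin). It is not the code in rs6000.cc; the function names are hypothetical, and the sketch is restricted to the non-wrapping mask case:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate V left by N bits (0 <= N < 64).  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  assert (n < 64);
  return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Emulate "rldic RA,RS,SH,MB": rotate left by SH, then AND with a mask
   that clears the high MB bits and the low SH bits.  Only the
   non-wrapping case MB <= 63 - SH is handled in this sketch.  */
static uint64_t
rldic (uint64_t rs, unsigned sh, unsigned mb)
{
  assert (sh < 64 && mb <= 63 - sh);
  uint64_t mask = (~UINT64_C (0) >> mb) & (~UINT64_C (0) << sh);
  return rotl64 (rs, sh) & mask;
}
```

For instance, starting from the value left by "li r,-256" (0xFFFFFFFFFFFFFF00), rldic with SH = 8 and MB = 16 yields 0x0000FFFFFFFF0000, a run of ones in the middle of the register.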

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rldic): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rldic.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.

rs6000: build constant via li/lis;rldicl/rldicr

If a constant can be obtained by rotating a negative "li/lis" value and
then clearing bits on the left or right, use "li/lis ; rldicl/rldicr"
to build the constant.
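
The two instructions can be sketched in C as a rotate followed by clearing bits on one side. This is an illustrative emulation with hypothetical function names, not the checks added to rs6000.cc:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate V left by N bits (0 <= N < 64).  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  assert (n < 64);
  return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Emulate "rldicl RA,RS,SH,MB": rotate left by SH, then clear the
   high MB bits ("clear left").  */
static uint64_t
rldicl (uint64_t rs, unsigned sh, unsigned mb)
{
  assert (mb < 64);
  return rotl64 (rs, sh) & (~UINT64_C (0) >> mb);
}

/* Emulate "rldicr RA,RS,SH,ME": rotate left by SH, then clear the
   low 63 - ME bits ("clear right").  */
static uint64_t
rldicr (uint64_t rs, unsigned sh, unsigned me)
{
  assert (me < 64);
  return rotl64 (rs, sh) & (~UINT64_C (0) << (63 - me));
}
```

Starting from the value left by "li r,-4096" (0xFFFFFFFFFFFFF000), rldicl with SH = 32 and MB = 8 gives 0x00FFF000FFFFFFFF, and rldicr with SH = 16 and ME = 55 gives 0xFFFFFFFFF000FF00.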

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): New
function.
(can_be_built_by_li_lis_and_rldicr): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rldicr and
can_be_built_by_li_lis_and_rldicl.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.

rs6000: build constant via lis;rotldi

If a constant can be rotated to/from a negative "lis" value, use
"lis;rotldi" to build the constant.

The positive "lis" values do not need to be analyzed, because a
constant that can be rotated from a positive "lis" value can also be
rotated from a positive "li" value.
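
The reason positive "lis" values need no separate analysis can be checked directly: a non-negative "lis" value is the corresponding "li" value rotated left by 16, whereas a negative one is not. A small sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate V left by N bits (0 <= N < 64).  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  assert (n < 64);
  return n ? (v << n) | (v >> (64 - n)) : v;
}

/* 64-bit register value after "li rD,IMM": IMM sign-extended.  */
static uint64_t
li_value (int16_t imm)
{
  return (uint64_t) (int64_t) imm;
}

/* 64-bit register value after "lis rD,IMM": IMM shifted left by 16,
   then sign-extended.  */
static uint64_t
lis_value (int16_t imm)
{
  return (uint64_t) ((int64_t) imm * 65536);
}

/* Return 1 if every non-negative "lis" value equals the matching "li"
   value rotated left by 16.  */
static int
positive_lis_is_rotated_li (void)
{
  for (int imm = 0; imm <= INT16_MAX; imm++)
    if (lis_value ((int16_t) imm) != rotl64 (li_value ((int16_t) imm), 16))
      return 0;
  return 1;
}
```

For a negative immediate the identity fails: "lis r,-1" leaves 0xFFFFFFFFFFFF0000, while rotating the all-ones "li r,-1" value by any amount leaves it unchanged.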

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): New
function.
(can_be_built_by_li_and_rotldi): Rename to ...
(can_be_built_by_li_lis_and_rotldi): ... this function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: Add more tests.

rs6000: build constant via li;rotldi

If a constant can be rotated to/from a positive or negative value
that "li" can generate, then "li;rotldi" can be used to build
the constant.
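
The condition can be illustrated with a brute-force search over all 64 rotate amounts. This is only a sketch of the idea, not how can_be_built_by_li_and_rotldi is implemented; the names below are hypothetical, and the unsigned-to-signed conversion relies on GCC's modular behaviour:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate V left by N bits (0 <= N < 64).  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  assert (n < 64);
  return n ? (v << n) | (v >> (64 - n)) : v;
}

/* True if V is a value that "li" can load, i.e. a sign-extended
   16-bit immediate.  */
static bool
fits_li (uint64_t v)
{
  int64_t s = (int64_t) v;
  return s >= INT16_MIN && s <= INT16_MAX;
}

/* Brute-force check: if C equals rotl64 (IMM, SH) for some "li"
   immediate, store IMM and SH and return true.  */
static bool
li_rotldi_candidate (uint64_t c, int16_t *imm, unsigned *sh)
{
  for (unsigned n = 0; n < 64; n++)
    {
      uint64_t v = rotl64 (c, n);   /* undo a rotate left by (64 - n) % 64 */
      if (fits_li (v))
        {
          *imm = (int16_t) (int64_t) v;
          *sh = (64 - n) & 63;
          return true;
        }
    }
  return false;
}
```

With this sketch, 0x8000000000000001 is found as "li 3" rotated left by 63, and 0xFFFFFFFFFFFF0FFF as "li -16" rotated left by 12, while 0x0001000000010000 has no such form.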

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.

[i386] Fix apx test fails on 32bit target

Since -mapxf, like -muintr, emits an error for 32-bit targets, add a
!ia32 target guard to the APX-related tests.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-egprs-names.c: Compile for non-ia32.
* gcc.target/i386/apx-inline-gpr-norex2.c: Likewise.
* gcc.target/i386/apx-interrupt-1.c: Likewise.
* gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: Likewise.
* gcc.target/i386/apx-legacy-insn-check-norex2.c: Likewise.

RISC-V: add static-pie support

We only need to pass options to the linker when static-pie is passed.
There is a separate patch to enable static-pie in glibc, but it needs to
be enabled in GCC first.

gcc/ChangeLog:

* config/riscv/linux.h: Pass the static-pie specific options to
the linker.

Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>

TEST: Fix XPASS of TSVC testsuites for RVV

Fix the following XPASSes of TSVC for RVV:

XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s1115.c: Fix TSVC XPASS.
* gcc.dg/vect/tsvc/vect-tsvc-s114.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1232.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s124.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1279.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s253.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s257.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s271.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s2711.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s2712.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s272.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s273.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s274.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s276.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s278.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s279.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s3111.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s441.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s443.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-vif.c: Ditto.

RISC-V: Enable more tests of "vect" for RVV

This patch enables almost full coverage of the vectorization tests for RVV,
except for the following tests (not enabled yet):

1. Will enable soon:

check_effective_target_vect_call_lrint
check_effective_target_vect_call_btrunc
check_effective_target_vect_call_btruncf
check_effective_target_vect_call_ceil
check_effective_target_vect_call_ceilf
check_effective_target_vect_call_floor
check_effective_target_vect_call_floorf
check_effective_target_vect_call_lceil
check_effective_target_vect_call_lfloor
check_effective_target_vect_call_nearbyint
check_effective_target_vect_call_nearbyintf
check_effective_target_vect_call_round
check_effective_target_vect_call_roundf

2. Not sure whether we will need to enable these:

check_effective_target_vect_complex_*
check_effective_target_vect_simd_clones
check_effective_target_vect_bswap
check_effective_target_vect_widen_shift
check_effective_target_vect_widen_mult_*
check_effective_target_vect_widen_sum_*
check_effective_target_vect_unpack
check_effective_target_vect_interleave
check_effective_target_vect_extract_even_odd
check_effective_target_vect_pack_trunc
check_effective_target_vect_check_ptrs
check_effective_target_vect_sdiv_pow2_si
check_effective_target_vect_usad_*
check_effective_target_vect_udot_*
check_effective_target_vect_sdot_*
check_effective_target_vect_gather_load_ifn

After this patch, we will have the following additional FAILs:
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/vect-114.c scan-tree-dump-times vect "vectorized 0 loops" 1

FAIL: gcc.dg/vect/vect-live-2.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not relevant" 1
FAIL: gcc.dg/vect/vect-live-2.c scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not relevant" 1
FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects  scan-tree-dump vect "Reduce using vector shifts"
FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using vector shifts"
FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects  scan-tree-dump vect "Reduce using vector shifts"
FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using vector shifts"

FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects  scan-tree-dump optimized " = \\.COND_RDIV"

They are all dump FAILs (no more ICEs or execution FAILs).

Fixing those FAILs will be a separate patch.

But I think we should commit this patch first.

OK for trunk?

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Enable more vect tests for RVV.

Daily bump.