git.ipfire.org Git - thirdparty/gcc.git/log

Daily bump.

hppa: Fix (plus (plus (mult (a) (mem_shadd_constant)) (b)) (c)) optimization

The constant C must be an integral multiple of the shift value in
the above optimization. Non integral values can occur evaluating
IMAGPART_EXPR when the shadd constant is 8 and we have SFmode.

2024-08-06 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/113384
* config/pa/pa.cc (hppa_legitimize_address): Add check to
ensure constant is an integral multiple of shift the value.

sh: Don't call make_insn_raw in sh_recog_treg_set_expr [PR116189]

This was an interesting compare debug failure to debug. The first symptom
was in gcse which would produce different order of creating psedu-registers. This
was caused by a different order of a hashtable walk, due to the hash table having different
number of entries. Which in turn was due to the number of max insn being different between
the 2 runs. The place max insn uid comes from was in sh_recog_treg_set_expr which is called
via rtx_costs and fwprop would cause rtx_costs in some cases for debug insn related stuff.

Build and tested for sh4-linux-gnu.

PR target/116189

gcc/ChangeLog:

* config/sh/sh.cc (sh_recog_treg_set_expr): Don't call make_insn_raw,
make the insn with a fake uid.

gcc/testsuite/ChangeLog:

* c-c++-common/torture/pr116189-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 0355c943b9e954e8f59068971d934f1b91ecb729)

Daily bump.

libgomp: Remove bogus warnings from privatized-ref-2.f90.

2024-07-19 Paul Thomas <pault@gcc.gnu.org>

libgomp/ChangeLog

* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Cut
dg-note about 'a' and remove bogus warnings about its array
descriptor components being used uninitialized.

(cherry picked from commit 8d6994f33a98a168151a57a3d21395b19196cd9d)

Fortran: Suppress bogus used uninitialized warnings [PR108889].

2024-07-18 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/108889
* gfortran.h: Add bit field 'allocated_in_scope' to gfc_symbol.
* trans-array.cc (gfc_array_allocate): Set 'allocated_in_scope'
after allocation if not a component reference.
(gfc_alloc_allocatable_for_assignment): If 'allocated_in_scope'
not set, not a component ref and not allocated, set the array
bounds and offset to give zero length in all dimensions. Then
set allocated_in_scope.

gcc/testsuite/
PR fortran/108889
* gfortran.dg/pr108889.f90: New test.

(cherry picked from commit c3aa339ea50f050caf7ed2e497f5499ec2d7b9cc)

Daily bump.

i386: Use _mm_setzero_ps/d instead of _mm_avx512_setzero_ps/d for GCC13/12

In GCC13/12, there is no _mm_avx512_setzero_ps/d since it is introduced
in GCC14.

gcc/ChangeLog:

* config/i386/avx512dqintrin.h (_mm_reduce_round_sd): Use
_mm_setzero_pd instead of _mm_avx512_setzero_pd.
(_mm_reduce_round_ss): Use _mm_setzero_ps instead of
_mm_avx512_setzero_ps.

i386: Fix AVX512 intrin macro typo

There are several typo in AVX512 intrins macro define. Correct them to solve
errors when compiled with -O0.

gcc/ChangeLog:

* config/i386/avx512dqintrin.h
(_mm_mask_fpclass_ss_mask): Correct operand order.
(_mm_mask_fpclass_sd_mask): Ditto.
(_mm256_maskz_reduce_round_ss): Use __builtin_ia32_reducess_mask_round
instead of __builtin_ia32_reducesd_mask_round.
(_mm_reduce_round_sd): Use -1 as mask since it is non-mask.
(_mm_reduce_round_ss): Ditto.
* config/i386/avx512vlbwintrin.h
(_mm256_mask_alignr_epi8): Correct operand usage.
(_mm_mask_alignr_epi8): Ditto.
* config/i386/avx512vlintrin.h (_mm_mask_alignr_epi64): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bw-vpalignr-1b.c: New test.
* gcc.target/i386/avx512dq-vfpclasssd-1b.c: Ditto.
* gcc.target/i386/avx512dq-vfpclassss-1b.c: Ditto.
* gcc.target/i386/avx512dq-vreducesd-1b.c: Ditto.
* gcc.target/i386/avx512dq-vreducess-1b.c: Ditto.
* gcc.target/i386/avx512vl-valignq-1b.c: Ditto.

Daily bump.

rs6000: Catch unsupported ABI errors when using -mrop-protect [PR114759,PR115988]

2024-07-18 Peter Bergner <bergner@linux.ibm.com>

gcc/testsuite/
PR target/114759
PR target/115988
* gcc.target/powerpc/pr114759-3.c: Catch unsupported ABI errors.

(cherry picked from commit b2f47a5c1d5204131660ea0372a08e692df8844e)

rs6000: Error on CPUs and ABIs that don't support the ROP protection insns [PR114759]

We currently silently ignore the -mrop-protect option for old CPUs we don't
support with the ROP hash insns, but we throw an error for unsupported ABIs.
This patch treats unsupported CPUs and ABIs similarly by throwing an error
both both. This matches clang behavior and allows us to simplify our tests
in the code that generates our prologue and epilogue code.

2024-06-26 Peter Bergner <bergner@linux.ibm.com>

gcc/
PR target/114759
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Disallow
CPUs and ABIs that do no support the ROP protection insns.
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Remove now
unneeded tests.
(rs6000_emit_prologue): Likewise.
Remove unneeded gcc_assert.
(rs6000_emit_epilogue): Likewise.
* config/rs6000/rs6000.md: Likewise.

gcc/testsuite/
PR target/114759
* gcc.target/powerpc/pr114759-3.c: New test.

(cherry picked from commit 6f2bab9b5d1ce1914c748b7dcd8638dafaa98df7)

rs6000: ROP - Emit hashst and hashchk insns on Power8 and later [PR114759]

We currently only emit the ROP-protect hash* insns for Power10, where the
insns were added to the architecture.  We want to emit them for earlier
cpus (where they operate as NOPs), so that if those older binaries are
ever executed on a Power10, then they'll be protected from ROP attacks.
Binutils accepts hashst and hashchk back to Power8, so change GCC to emit
them for Power8 and later.  This matches clang's behavior.

2024-06-19  Peter Bergner  <bergner@linux.ibm.com>

gcc/
PR target/114759
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Use TARGET_POWER8.
(rs6000_emit_prologue): Likewise.
* config/rs6000/rs6000.md (hashchk): Likewise.
(hashst): Likewise.
Fix whitespace.

gcc/testsuite/
PR target/114759
* gcc.target/powerpc/pr114759-2.c: New test.
* lib/target-supports.exp (rop_ok): Use
check_effective_target_has_arch_pwr8.

(cherry picked from commit a05c3d23d1e1c8d2971b123804fc7a61a3561adb)

rs6000: Compute rop_hash_save_offset for non-Altivec compiles [PR115389]

We currently only compute the offset for the ROP hash save location in
the stack frame for Altivec compiles.  For non-Altivec compiles when we
emit ROP mitigation instructions, we use a default offset of zero which
corresponds to the backchain save location which will get clobbered on
any call.  The fix is to compute the ROP hash save location for all
compiles.

2024-06-14  Peter Bergner  <bergner@linux.ibm.com>

gcc/
PR target/115389
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Compute
rop_hash_save_offset for non-Altivec compiles.

gcc/testsuite
PR target/115389
* gcc.target/powerpc/pr115389.c: New test.

(cherry picked from commit c70eea0dba5f223d49c80cfb3e80e87b74330aac)

rs6000: Update ELFv2 stack frame comment showing the correct ROP save location

The ELFv2 stack frame layout comment in rs6000-logue.cc shows the ROP
hash save slot in the wrong location. Update the comment to show the
correct ROP hash save location in the frame.

2024-06-07 Peter Bergner <bergner@linux.ibm.com>

gcc/
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Update comment.

(cherry picked from commit e91cf26a954a5c1bf431e36f3a1e69f94e9fa4fe)

Daily bump.

Fixup unaligned load/store cost for znver4

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4.  The following makes
the unaligned costs equal to the aligned costs.

This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.

(cherry picked from commit 1e3aa9c9278db69d4bdb661a750a7268789188d6)

[powerpc] [testsuite] reorder dg directives [PR106069]

The dg-do directive appears after dg-require-effective-target in
g++.target/powerpc/pr106069.C.  That doesn't work the way that was
presumably intended.  Both of these directives set dg-do-what, but
dg-do does so fully and unconditionally, overriding any decisions
recorded there by earlier directives.  Reorder the directives more
canonically, so that both take effect.

for  gcc/testsuite/ChangeLog

PR target/106069
* g++.target/powerpc/pr106069.C: Reorder dg directives.

(cherry picked from commit ad65caa332bc7600caff6b9b5b29175b40d91e67)

Daily bump.

[PR115565] cse: Don't use a valid regno for non-register in comparison_qty

Use INT_MIN rather than -1 in `comparison_qty' where a comparison is not
with a register, because the value of -1 is actually a valid reference
to register 0 in the case where it has not been assigned a quantity.

Using -1 makes `REG_QTY (REGNO (folded_arg1)) == ent->comparison_qty'
comparison in `fold_rtx' to incorrectly trigger in rare circumstances
and return true for a memory reference, making CSE consider a comparison
operation to evaluate to a constant expression and consequently make the
resulting code incorrectly execute or fail to execute conditional
blocks.

This has caused a miscompilation of rwlock.c from LinuxThreads for the
`alpha-linux-gnu' target, where `rwlock->__rw_writer != thread_self ()'
expression (where `thread_self' returns the thread pointer via a PALcode
call) has been decided to be always true (with `ent->comparison_qty'
using -1 for a reference to to `rwlock->__rw_writer', while register 0
holding the thread pointer retrieved by `thread_self') and code for the
false case has been optimized away where it mustn't have, causing
program lockups.

The issue has been observed as a regression from commit 08a692679fb8
("Undefined cse.c behaviour causes 3.4 regression on HPUX"),
<https://gcc.gnu.org/ml/gcc-patches/2004-10/msg02027.html>, and up to
commit 932ad4d9b550 ("Make CSE path following use the CFG"),
<https://gcc.gnu.org/ml/gcc-patches/2006-12/msg00431.html>, where CSE
has been restructured sufficiently for the issue not to trigger with the
original reproducer anymore. However the original bug remains and can
trigger, because `comparison_qty' will still be assigned -1 for a memory
reference and the `reg_qty' member of a `cse_reg_info_table' entry will
still be assigned -1 for register 0 where the entry has not been
assigned a quantity, e.g. at initialization.

Use INT_MIN then as noted above, so that the value remains negative, for
consistency with the REGNO_QTY_VALID_P macro (even though not used on
`comparison_qty'), and then so that it should not ever match a valid
negated register number, fixing the regression with commit 08a692679fb8.

gcc/
PR rtl-optimization/115565
* cse.cc (record_jump_cond): Use INT_MIN rather than -1 for
`comparison_qty' if !REG_P.

(cherry picked from commit 69bc5fb97dc3fada81869e00fa65d39f7def6acf)

Daily bump.

Fortran: character array constructor with >= 4 constant elements [PR103115]

gcc/fortran/ChangeLog:

PR fortran/103115
* trans-array.cc (gfc_trans_array_constructor_value): If the first
element of an array constructor is deferred-length character and
therefore does not have an element size known at compile time, do
not try to collect subsequent constant elements into a constructor
for optimization.

gcc/testsuite/ChangeLog:

PR fortran/103115
* gfortran.dg/string_array_constructor_4.f90: New test.

(cherry picked from commit c93be1606ecf8e0f65b96b67aa023fb456ceb3a3)

Daily bump.

Avoid undefined behaviour in build_option_suggestions

The inner loop in build_option_suggestions uses OPTION to take the
address of OPTB and use it across iterations, which is undefined
behaviour since OPTB is defined within the loop. Pull it outside the
loop to make this defined.

gcc/ChangeLog:

* opt-suggestions.cc
(option_proposer::build_option_suggestions): Pull OPTB
definition out of the innermost loop.

(cherry picked from commit e0d997e913f811ecf4b3e10891e6a4aab5b38a31)

rs6000: Fix .machine cpu selection w/ altivec [PR97367]

There are various non-IBM CPUs with altivec, so we cannot use that
flag to determine which .machine cpu to use, so ignore it.
Emit an additional ".machine altivec" if Altivec is enabled so
that the assembler doesn't require an explicit -maltivec option
to assemble any Altivec instructions for those targets where
the ".machine cpu" is insufficient to enable Altivec.  For example,
-mcpu=G5 emits a ".machine power4".

2024-07-18  René Rebe  <rene@exactcode.de>
    Peter Bergner  <bergner@linux.ibm.com>

gcc/
PR target/97367
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Do not consider
OPTION_MASK_ALTIVEC.
(emit_asm_machine): For Altivec compiles, emit a ".machine altivec".

gcc/testsuite/
PR target/97367
* gcc.target/powerpc/pr97367.c: New test.

Signed-off-by: René Rebe <rene@exactcode.de>
(cherry picked from commit 6962835bca3e6bef0f6ceae84a7814138b08b8a5)

s390: Fix unresolved iterators bhfgq and xdee

Code attribute bhfgq is missing a mapping for TF. This results in
unresolved iterators in assembler templates for *bswaptf.

With the TF mapping added the base mnemonics vlbr and vstbr are not
"used" anymore but only the extended mnemonics (vlbr<bhfgq> was
interpreted as vlbr; likewise for vstbr). Therefore, remove the base
mnemonics from the scheduling description, otherwise, genattrtab would
error about unknown mnemonics.

Likewise, for movtf_vr only the extended mnemonics for vrepi are used,
now, which means the base mnemonic is "unused" and has to be removed
from the scheduling description.

Similarly, we end up with unresolved iterators in assembler templates
for mulfprx23 since code attribute xdee is missing a mapping for FPRX2.

Note, this is basically a cherry pick of commit r15-2060-ga4abda934aa426
with the addition that vrepi is removed from the scheduling description,
too.

gcc/ChangeLog:

* config/s390/3931.md (vlbr, vstbr, vrepi): Remove.
* config/s390/s390.md (xdee): Add FPRX2 mapping.
* config/s390/vector.md (bhfgq): Add TF mapping.

Daily bump.

Do not use caller-saved registers for COMDAT functions

A reference to a COMDAT function may be resolved to another definition
outside the current translation unit, so it's not eligible for `-fipa-ra`.

In `decl_binds_to_current_def_p()` there is already a check for weak
symbols. This commit checks for COMDAT functions that are not implemented
as weak symbols, for example, on *-*-mingw32.

gcc/ChangeLog:

PR rtl-optimization/115049
* varasm.cc (decl_binds_to_current_def_p): Add a check for COMDAT
declarations too, like weak ones.

(cherry picked from commit 5080840d8fbf25a321dd27543a1462d393d338bc)

Daily bump.

alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

Add missing "cannot_copy" attribute to instructions that have to
stay in 1-1 correspondence with another insn.

PR target/115526

gcc/ChangeLog:

* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy attribute.
(movdi_er_tlsgd): Ditto.
(movdi_er_tlsldm): Ditto.
(call_value_osf_<tls>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115526.c: New test.

(cherry picked from commit 0841fd4c42ab053be951b7418233f0478282d020)

Daily bump.

s390: Fix output template for movv1qi

Although for instructions MVI and MVIY it does not make a difference
whether the immediate is interpreted as signed or unsigned, GAS expects
unsigned immediates for instruction format SI_URD.

gcc/ChangeLog:

* config/s390/vector.md (mov<mode>): Fix output template for
movv1qi.

(cherry picked from commit e6680d3f392f7f7cc2a1515276213e21e9eeab1c)

s390: Align *cjump_64 and *icjump_64

During machine reorg we optimize backward jumps and transform insns as
e.g.

(jump_insn 118 117 119 (set (pc)
        (if_then_else (ne (reg:CCRAW 33 %cc)
                (const_int 8 [0x8]))
            (label_ref 134)
            (pc))) "dec_math_1.f90":204:8 discrim 1 2161 {*cjump_64}
     (expr_list:REG_DEAD (reg:CCRAW 33 %cc)
        (int_list:REG_BR_PROB 719407028 (nil)))
-> 134)

into

(jump_insn 118 117 432 (set (pc)
        (if_then_else (ne (reg:CCRAW 33 %cc)
                (const_int 8 [0x8]))
            (pc)
            (label_ref 433))) "dec_math_1.f90":204:8 discrim 1 -1
     (expr_list:REG_DEAD (reg:CCRAW 33 %cc)
        (int_list:REG_BR_PROB 719407028 (nil)))
-> 433)

The latter is not recognized anymore since *icjump_64 only matches
CC_REGNUM against zero.  Fixed by aligning *cjump_64 and *icjump_64.

gcc/ChangeLog:

* config/s390/s390.md (*icjump_64): Allow raw CC comparisons,
i.e., any constant integer between 0 and 15 for CC comparisons.

(cherry picked from commit 56de68aba6cb9cf3022d9e303eec6c6cdb49ad4d)

Daily bump.

Fix SSA_NAME leak due to def_stmt is removed before use_stmt.

-  _5 = __atomic_fetch_or_8 (&set_work_pending_p, 1, 0);
-  # DEBUG old => (long int) _5
+  _6 = .ATOMIC_BIT_TEST_AND_SET (&set_work_pending_p, 0, 1, 0, __atomic_fetch_or_8);
+  # DEBUG old => NULL
   # DEBUG BEGIN_STMT
-  # DEBUG D#2 => _5 & 1
+  # DEBUG D#2 => NULL
...
-  _10 = ~_5;
-  _8 = (_Bool) _10;
-  # DEBUG ret => _8
+  _8 = _6 == 0;
+  # DEBUG ret => (_Bool) _10

confirmed.  convert_atomic_bit_not does this, it checks for single_use
and removes the def, failing to release the name (which would fix this up
IIRC).

Note the function removes stmts in "wrong" order (before uses of LHS
are removed), so it requires larger surgery.  And it leaks SSA names.

gcc/ChangeLog:

PR target/115872
* tree-ssa-ccp.cc (convert_atomic_bit_not): Remove use_stmt after use_nop_stmt is removed.
(optimize_atomic_bit_test_and): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115872.c: New test.

(cherry picked from commit a8209237dc46dc4db7d9d8e3807e6c93734c64b5)

Daily bump.

mve: Fix vsetq_lane for 64-bit elements with lane 1 [PR 115611]

This patch fixes the backend pattern that was printing the wrong input
scalar register pair when inserting into lane 1.

Added a new test to force float-abi=hard so we can use scan-assembler to check
correct codegen.

gcc/ChangeLog:

PR target/115611
* config/arm/mve.md (mve_vec_setv2di_internal): Fix printing of input
scalar register pair when lane = 1.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vsetq_lane_su64.c: New test.

(cherry picked from commit 7c11fdd2cc11a7058e9643b6abf27831970ad2c9)

Daily bump.

middle-end: Fix stalled swapped condition code value [PR115836]

emit_store_flag_1 calculates scode (swapped condition code) at the
beginning of the function from the value of code variable. However,
code variable may change before scode usage site, resulting in
invalid stalled scode value.

Move calculation of scode value just before its only usage site to
avoid stalled scode value.

PR middle-end/115836

gcc/ChangeLog:

* expmed.cc (emit_store_flag_1): Move calculation of
scode just before its only usage site.

(cherry picked from commit 44933fdeb338e00c972e42224b9a83d3f8f6a757)

Daily bump.

libstdc++: Fix _Atomic(T) macro in <stdatomic.h> [PR115807]

The definition of the _Atomic(T) macro needs to refer to ::std::atomic,
not some other std::atomic relative to the current namespace.

libstdc++-v3/ChangeLog:

PR libstdc++/115807
* include/c_compatibility/stdatomic.h (_Atomic): Ensure it
refers to std::atomic in the global namespace.
* testsuite/29_atomics/headers/stdatomic.h/115807.cc: New test.

(cherry picked from commit 40d234dd6439e8c8cfbf3f375a61906aed35c80d)

Daily bump.

AArch64: Fix strict-align cpymem/setmem [PR103100]

The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
Clean up the condition when to use MOPS.

gcc/ChangeLog/
PR target/103100
* config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
(setmemdi): Likewise.
* config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support
strict-align. Cleanup condition for using MOPS.
(aarch64_expand_setmem): Likewise.

(cherry picked from commit 318f5232cfb3e0c9694889565e1f5424d0354463)

AVR: target/87376 - Use nop_general_operand for DImode inputs.

The avr-dimode.md expanders have code like emit_move_insn(acc_a, operands[1])
where acc_a is a hard register and operands[1] might be a non-generic
address-space memory reference. Such loads may clobber hard regs since
some of them are implemented as libgcc calls /and/ 64-moves are
expanded as eight byte-moves, so that acc_a or acc_b might be clobbered
by such a load.

This patch simply denies non-generic address-space references by using
nop_general_operand for all avr-dimode.md input predicates.
With the patch, all memory loads that require library calls are issued
before the expander codes from avr-dimode.md are run.

PR target/87376
gcc/
* config/avr/avr-dimode.md: Use "nop_general_operand" instead
of "general_operand" as predicate for all input operands.

gcc/testsuite/
* gcc.target/avr/torture/pr87376.c: New test.

(cherry picked from commit 23a0935262d6817097406578b1c70563f424804b)

Daily bump.

aarch64: PR target/115475 Implement missing __ARM_FEATURE_SVE_BF16 macro

The ACLE requires __ARM_FEATURE_SVE_BF16 to be enabled when SVE and BF16
and the associated intrinsics are available.
GCC does support the required intrinsics for TARGET_SVE_BF16 so define
this macro too.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/

PR target/115475
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_SVE_BF16 for TARGET_SVE_BF16.

gcc/testsuite/

PR target/115475
* gcc.target/aarch64/acle/bf16_sve_feature.c: New test.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
(cherry picked from commit 6492c7130d6ae9992298fc3d072e2589d1131376)

aarch64: PR target/115457 Implement missing __ARM_FEATURE_BF16 macro

The ACLE asks the user to test for __ARM_FEATURE_BF16 before using the
<arm_bf16.h> header but GCC doesn't set this up.
LLVM does, so this is an inconsistency between the compilers.

This patch enables that macro for TARGET_BF16_FP.
Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/

PR target/115457
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_BF16 for TARGET_BF16_FP.

gcc/testsuite/

PR target/115457
* gcc.target/aarch64/acle/bf16_feature.c: New test.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
(cherry picked from commit c10942134fa759843ac1ed1424b86fcb8e6368ba)

Daily bump.

hppa: Fix ICE caused by mismatched predicate and constraint in xmpyu patterns

2024-06-30 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/115691
* config/pa/pa.md: Remove incorrect xmpyu patterns.

AVR: target/98762 - Handle partial clobber in movqi output.

PR target/98762
gcc/
* config/avr/avr.cc (avr_out_movqi_r_mr_reg_disp_tiny): Properly
restore the base register when it is partially clobbered.
gcc/testsuite/
* gcc.target/avr/torture/pr98762.c: New test.

(cherry picked from commit e9fb6efa1cf542353fd44ddcbb5136344c463fd0)

rs6000: Fix wrong RTL patterns for vector merge high/low short on LE

Commit r12-4496 changes some define_expands and define_insns
for vector merge high/low short, which are altivec_vmrg[hl]h.
These defines are mainly for built-in function vec_merge{h,l}
and some internal gen function needs.  These functions should
consider endianness, taking vec_mergeh as example, as PVIPR
defines, vec_mergeh "Merges the first halves (in element order)
of two vectors", it does note it's in element order.  So it's
mapped into vmrghh on BE while vmrglh on LE respectively.
Although the mapped insns are different, as the discussion in
PR106069, the RTL pattern should be still the same, it is
conformed before commit r12-4496, but gets changed into
different patterns on BE and LE starting from commit r12-4496.
Similar to 32-bit element case in commit log of r15-1504, this
16-bit element pattern on LE doesn't actually match what the
underlying insn is intended to represent, once some optimization
like combine does some changes basing on it, it would cause
the unexpected consequence.  The newly constructed test case
pr106069-2.c is a typical example for this issue on element type
short.

So this patch is to fix the wrong RTL pattern, ensure the
associated RTL patterns become the same as before which can
have the same semantic as their mapped insns.  With the
proposed patch, the expanders like altivec_vmrghh expands
into altivec_vmrghh_direct_be or altivec_vmrglh_direct_le
depending on endianness, "direct" can easily show which
insn would be generated, _be and _le are mainly for the
different RTL patterns as endianness.

Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>
PR target/106069
PR target/115355

gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
(altivec_vmrghh_direct_be): ... this.  Add condition BYTES_BIG_ENDIAN.
(altivec_vmrghh_direct_le): New define_insn.
(altivec_vmrglh_direct): Rename to ...
(altivec_vmrglh_direct_be): ... this.  Add condition BYTES_BIG_ENDIAN.
(altivec_vmrglh_direct_le): New define_insn.
(altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be
for BE and gen_altivec_vmrglh_direct_le for LE.
(altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be
for BE and gen_altivec_vmrghh_direct_le for LE.
(vec_widen_umult_hi_v16qi): Adjust the call to
gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE
and by gen_altivec_vmrglh for LE.
(vec_widen_smult_hi_v16qi): Likewise.
(vec_widen_umult_lo_v16qi): Adjust the call to
gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE
and by gen_altivec_vmrghh for LE.
(vec_widen_smult_lo_v16qi): Likewise.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghh_direct by
CODE_FOR_altivec_vmrghh_direct_be for BE and
CODE_FOR_altivec_vmrghh_direct_le for LE.  And replace
CODE_FOR_altivec_vmrglh_direct by
CODE_FOR_altivec_vmrglh_direct_be for BE and
CODE_FOR_altivec_vmrglh_direct_le for LE.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106069-2.c: New test.

(cherry picked from commit 812c70bf4981958488331d4ea5af8709b5321da1)

rs6000: Fix wrong RTL patterns for vector merge high/low char on LE

Commit r12-4496 changes some define_expands and define_insns
for vector merge high/low char, which are altivec_vmrg[hl]b.
These defines are mainly for built-in function vec_merge{h,l}
and some internal gen function needs.  These functions should
consider endianness, taking vec_mergeh as example, as PVIPR
defines, vec_mergeh "Merges the first halves (in element order)
of two vectors", it does note it's in element order.  So it's
mapped into vmrghb on BE while vmrglb on LE respectively.
Although the mapped insns are different, as the discussion in
PR106069, the RTL pattern should be still the same, it is
conformed before commit r12-4496, but gets changed into
different patterns on BE and LE starting from commit r12-4496.
Similar to 32-bit element case in commit log of r15-1504, this
8-bit element pattern on LE doesn't actually match what the
underlying insn is intended to represent, once some optimization
like combine does some changes basing on it, it would cause
the unexpected consequence.  The newly constructed test case
pr106069-1.c is a typical example for this issue.

So this patch is to fix the wrong RTL pattern, ensure the
associated RTL patterns become the same as before which can
have the same semantic as their mapped insns.  With the
proposed patch, the expanders like altivec_vmrghb expands
into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le
depending on endianness, "direct" can easily show which
insn would be generated, _be and _le are mainly for the
different RTL patterns as endianness.

Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>
PR target/106069
PR target/115355

gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
(altivec_vmrghb_direct_be): ... this.  Add condition BYTES_BIG_ENDIAN.
(altivec_vmrghb_direct_le): New define_insn.
(altivec_vmrglb_direct): Rename to ...
(altivec_vmrglb_direct_be): ... this.  Add condition BYTES_BIG_ENDIAN.
(altivec_vmrglb_direct_le): New define_insn.
(altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be
for BE and gen_altivec_vmrglb_direct_le for LE.
(altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be
for BE and gen_altivec_vmrghb_direct_le for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghb_direct by
CODE_FOR_altivec_vmrghb_direct_be for BE and
CODE_FOR_altivec_vmrghb_direct_le for LE.  And replace
CODE_FOR_altivec_vmrglb_direct by
CODE_FOR_altivec_vmrglb_direct_be for BE and
CODE_FOR_altivec_vmrglb_direct_le for LE.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106069-1.c: New test.

(cherry picked from commit 62520e4e9f7e2fe8a16ee57a4bd35da2e921ae22)

Daily bump.

AVR: target/88236, target/115726 - Fix __memx code generation.

PR target/88236
PR target/115726
gcc/
* config/avr/avr.md (mov<mode>) [avr_mem_memx_p]: Expand in such a
way that the destination does not overlap with any hard register
clobbered / used by xload8qi_A resp. xload<mode>_A.
* config/avr/avr.cc (avr_out_xload): Avoid early-clobber
situation for Z by executing just one load when the output register
overlaps with Z.
gcc/testsuite/
* gcc.target/avr/torture/pr88236-pr115726.c: New test.

(cherry picked from commit 3d23abd3dd9c8c226ea302203b214b346f4fe8d7)

Daily bump.

rs6000: Fix wrong RTL patterns for vector merge high/low word on LE

Commit r12-4496 changes some define_expands and define_insns
for vector merge high/low word, which are altivec_vmrg[hl]w,
vsx_xxmrg[hl]w_<VSX_W:mode>.  These defines are mainly for
built-in function vec_merge{h,l}, __builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si and some internal gen function
needs.  These functions should consider endianness, taking
vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges
the first halves (in element order) of two vectors", it does
note it's in element order.  So it's mapped into vmrghw on
BE while vmrglw on LE respectively.  Although the mapped
insns are different, as the discussion in PR106069, the RTL
pattern should be still the same, it is conformed before
commit r12-4496, define_expand altivec_vmrghw got expanded
into:

  (vec_select:VSX_W
     (vec_concat:<VS_double>
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
        (parallel [(const_int 0) (const_int 4)
                   (const_int 1) (const_int 5)])))]

on both BE and LE then.  But commit r12-4496 changed it to
expand into:

  (vec_select:VSX_W
     (vec_concat:<VS_double>
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
        (parallel [(const_int 0) (const_int 4)
                   (const_int 1) (const_int 5)])))]

on BE, and

  (vec_select:VSX_W
     (vec_concat:<VS_double>
        (match_operand:VSX_W 1 "register_operand" "wa,v")
        (match_operand:VSX_W 2 "register_operand" "wa,v"))
        (parallel [(const_int 2) (const_int 6)
                   (const_int 3) (const_int 7)])))]

on LE, although the mapped insn are still vmrghw on BE and
vmrglw on LE, the associated RTL pattern is completely
wrong and inconsistent with the mapped insn.  If optimization
passes leave this pattern alone, even if its pattern doesn't
represent its mapped insn, it's still fine, that's why simple
testing on bif doesn't expose this issue.  But once some
optimization pass such as combine does some changes basing
on this wrong pattern, because the pattern doesn't match the
semantics that the expanded insn is intended to represent,
it would cause the unexpected result.

So this patch is to fix the wrong RTL pattern, ensure the
associated RTL patterns become the same as before which can
have the same semantic as their mapped insns.  With the
proposed patch, the expanders like altivec_vmrghw expands
into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le
depending on endianness, "direct" can easily show which
insn would be generated, _be and _le are mainly for the
different RTL patterns as endianness.

Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>
PR target/106069
PR target/115355

gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vmrghw_direct_<VSX_W:mode>): Rename
to ...
(altivec_vmrghw_direct_<VSX_W:mode>_be): ... this.  Add the condition
BYTES_BIG_ENDIAN.
(altivec_vmrghw_direct_<VSX_W:mode>_le): New define_insn.
(altivec_vmrglw_direct_<VSX_W:mode>): Rename to ...
(altivec_vmrglw_direct_<VSX_W:mode>_be): ... this.  Add the condition
BYTES_BIG_ENDIAN.
(altivec_vmrglw_direct_<VSX_W:mode>_le): New define_insn.
(altivec_vmrghw): Adjust by calling gen_altivec_vmrghw_direct_v4si_be
for BE and gen_altivec_vmrglw_direct_v4si_le for LE.
(altivec_vmrglw): Adjust by calling gen_altivec_vmrglw_direct_v4si_be
for BE and gen_altivec_vmrghw_direct_v4si_le for LE.
(vec_widen_umult_hi_v8hi): Adjust the call to
gen_altivec_vmrghw_direct_v4si by gen_altivec_vmrghw for BE
and by gen_altivec_vmrglw for LE.
(vec_widen_smult_hi_v8hi): Likewise.
(vec_widen_umult_lo_v8hi): Adjust the call to
gen_altivec_vmrglw_direct_v4si by gen_altivec_vmrglw for BE
and by gen_altivec_vmrghw for LE
(vec_widen_smult_lo_v8hi): Likewise.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghw_direct_v4si by
CODE_FOR_altivec_vmrghw_direct_v4si_be for BE and
CODE_FOR_altivec_vmrghw_direct_v4si_le for LE.  And replace
CODE_FOR_altivec_vmrglw_direct_v4si by
CODE_FOR_altivec_vmrglw_direct_v4si_be for BE and
CODE_FOR_altivec_vmrglw_direct_v4si_le for LE.
* config/rs6000/vsx.md (vsx_xxmrghw_<VSX_W:mode>): Adjust by calling
gen_altivec_vmrghw_direct_v4si_be for BE and
gen_altivec_vmrglw_direct_v4si_le for LE.
(vsx_xxmrglw_<VSX_W:mode>): Adjust by calling
gen_altivec_vmrglw_direct_v4si_be for BE and
gen_altivec_vmrghw_direct_v4si_le for LE.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr106069.C: New test.
* gcc.target/powerpc/pr115355.c: New test.

(cherry picked from commit 52c112800d9f44457c4832309a48c00945811313)

Daily bump.

[libstdc++] [testsuite] defer to check_vect_support* [PR115454]

The newly-added testcase overrides the default dg-do action set by
check_vect_support_and_set_flags (in libstdc++-dg/conformance.exp), so
it attempts to run the test even if runtime vector support is not
available.

Remove the explicit dg-do directive, so that the default is honored,
and the test is run if vector support is found, and only compiled
otherwise.

for libstdc++-v3/ChangeLog

PR libstdc++/115454
* testsuite/experimental/simd/pr115454_find_last_set.cc: Defer
to check_vect_support_and_set_flags's default dg-do action.

(cherry picked from commit 95faa1bea7bdc7f92fcccb3543bfcbc8184c5e5b)

Add support for -mcpu=grace

This adds support for the NVIDIA Grace CPU to aarch64.
We reuse the tuning decisions for the Neoverse V2 core, but include a
number of architecture features that are not enabled by default in
-mcpu=neoverse-v2.

This allows Grace users to more simply target the CPU with -mcpu=grace
rather than remembering what extensions to tag on top of
-mcpu=neoverse-v2.

Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/

* config/aarch64/aarch64-cores.def (grace): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document the above.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

Daily bump.

libstdc++: Remove confusing text from status tables for release branch

When I tried to make the release branch versions of these docs refer to
the release branch instead of "mainline GCC", for some reason I left the
text "not any particular release" there. That's just confusing, because
the docs are for a particular release, the latest on that branch. Remove
that confusing text in several places.

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2023.xml: Change reference from
mainline GCC to the release branch.
* doc/xml/manual/status_cxx1998.xml: Remove confusing "not in
any particular release" text.
* doc/xml/manual/status_cxx2011.xml: Likewise.
* doc/xml/manual/status_cxx2014.xml: Likewise.
* doc/xml/manual/status_cxx2017.xml: Likewise.
* doc/xml/manual/status_cxx2020.xml: Likewise.
* doc/xml/manual/status_cxxtr1.xml: Likewise.
* doc/xml/manual/status_cxxtr24733.xml: Likewise.
* doc/html/manual/status.html: Regenerate.

Daily bump.

rs6000: Don't clobber return value when eh_return called [PR114846]

As the associated test case in PR114846 shows, currently
with eh_return involved some register restoring for EH
RETURN DATA in epilogue can clobber the one which holding
the return value. Referring to the existing handlings in
some other targets, this patch makes eh_return expander
call one new define_insn_and_split eh_return_internal which
directly calls rs6000_emit_epilogue with epilogue_type
EPILOGUE_TYPE_EH_RETURN instead of the previous treating
normal return with crtl->calls_eh_return specially.

PR target/114846

gcc/ChangeLog:

* config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): As
EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
now, adjust the relevant handlings on it.
* config/rs6000/rs6000.md (eh_return expander): Append by calling
gen_eh_return_internal and emit_barrier.
(eh_return_internal): New define_insn_and_split, call function
rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr114846.c: New test.

(cherry picked from commit e5fc5d42d25c86ae48178db04ce64d340a834614)

Daily bump.

libstdc++: Fix test on x86_64 and non-simd targets

* Running a test compiled with AVX512 instructions requires
avx512f_runtime not just avx512f.

* The 'reduce2' test violated an invariant of fixed_size_simd_mask and
thus failed on all targets without 16-Byte vector builtins enabled (in
bits/simd.h).

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/115575
* testsuite/experimental/simd/pr115454_find_last_set.cc: Require
avx512f_runtime. Don't memcpy fixed_size masks.

(cherry picked from commit 77f321435b4ac37992c2ed6737ca0caa1dd50551)

libstdc++: Fix find_last_set(simd_mask) to ignore padding bits

With the change to the AVX512 find_last_set implementation, the change
to AVX512 operator!= is unnecessary. However, the latter was not
producing optimal code and unnecessarily set the padding bits. In
theory, the compiler could determine that with the new !=
implementation, the bit operation for clearing the padding bits is a
no-op and can be elided.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/115454
* include/experimental/bits/simd_x86.h (_S_not_equal_to): Use
neq comparison instead of bitwise negation after eq.
(_S_find_last_set): Clear unused high bits before computing
bit_width.
* testsuite/experimental/simd/pr115454_find_last_set.cc: New
test.

(cherry picked from commit 1340ddea0158de3f49aeb75b4013e5fc313ff6f4)

Daily bump.

libstdc++: Fix simd<char> conversion for -fno-signed-char for Clang

The special case for Clang in the trait producing a signed integer type
lead to the trait returning 'char' where it should have been 'signed
char'. This workaround was introduced because on Clang the return type
of vector compares was not convertible to '_SimdWrapper<
__int_for_sizeof_t<...' unless '__int_for_sizeof_t<char>' was an alias
for 'char'. In order to not rewrite the complete mask type code (there
is code scattered around the implementation assuming signed integers),
this needs to be 'signed char'; so the special case for Clang needs to
be removed.
The conversion issue is now solved in _SimdWrapper, which now
additionally allows conversion from vector types with compatible
integral type.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/115308
* include/experimental/bits/simd.h (__int_for_sizeof): Remove
special cases for __clang__.
(_SimdWrapper): Change constructor overload set to allow
conversion from vector types with integral conversions via bit
reinterpretation.

(cherry picked from commit 8e36cf4c5c9140915d0019999db132a900b48037)

libstdc++: Avoid MMX return types from __builtin_shufflevector

This resolves a regression on i686 that was introduced with
r15-429-gfb1649f8b4ad50.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/115247
* include/experimental/bits/simd.h (__as_vector): Don't use
vector_size(8) on __i386__.
(__vec_shuffle): Never return MMX vectors, widen to 16 bytes
instead.
(concat): Fix padding calculation to pick up widening logic from
__as_vector.

(cherry picked from commit 241a6cc88d866fb36bd35ddb3edb659453d6322e)

libstdc++: Use __builtin_shufflevector for simd split and concat

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:

PR libstdc++/114958
* include/experimental/bits/simd.h (__as_vector): Return scalar
simd as one-element vector. Return vector from single-vector
fixed_size simd.
(__vec_shuffle): New.
(__extract_part): Adjust return type signature.
(split): Use __extract_part for any split into non-fixed_size
simds.
(concat): If the return type stores a single vector, use
__vec_shuffle (which calls __builtin_shufflevector) to produce
the return value.
* include/experimental/bits/simd_builtin.h
(__shift_elements_right): Removed.
(__extract_part): Return single elements directly. Use
__vec_shuffle (which calls __builtin_shufflevector) to for all
non-trivial cases.
* include/experimental/bits/simd_fixed_size.h (__extract_part):
Return single elements directly.
* testsuite/experimental/simd/pr114958.cc: New test.

(cherry picked from commit fb1649f8b4ad5043dd0e65e4e3a643a0ced018a9)

diagnostics: Fix add_misspelling_candidates [PR115440]

The option_map array for most entries contains just non-NULL opt0
    { "-Wno-", NULL, "-W", false, true },
    { "-fno-", NULL, "-f", false, true },
    { "-gno-", NULL, "-g", false, true },
    { "-mno-", NULL, "-m", false, true },
    { "--debug=", NULL, "-g", false, false },
    { "--machine-", NULL, "-m", true, false },
    { "--machine-no-", NULL, "-m", false, true },
    { "--machine=", NULL, "-m", false, false },
    { "--machine=no-", NULL, "-m", false, true },
    { "--machine", "", "-m", false, false },
    { "--machine", "no-", "-m", false, true },
    { "--optimize=", NULL, "-O", false, false },
    { "--std=", NULL, "-std=", false, false },
    { "--std", "", "-std=", false, false },
    { "--warn-", NULL, "-W", true, false },
    { "--warn-no-", NULL, "-W", false, true },
    { "--", NULL, "-f", true, false },
    { "--no-", NULL, "-f", false, true }
and so add_misspelling_candidates works correctly for it, but 3 out of
these,
    { "--machine", "", "-m", false, false },
    { "--machine", "no-", "-m", false, true },
and
    { "--std", "", "-std=", false, false },
use non-NULL opt1.  That says that
--machine foo
should map to
-mfoo
and
--machine no-foo
should map to
-mno-foo
and
--std c++17
should map to
-std=c++17
add_misspelling_canidates was not handling this, so it hapilly
registered say
--stdc++17
or
--machineavx512
(twice) as spelling alternatives, when those options aren't recognized.
Instead we support
--std c++17
or
--machine avx512
--machine no-avx512

The following patch fixes that.  On this particular testcase, we no longer
suggest anything, even when among the suggestion is say that
--std c++17
or
-std=c++17
etc.

2024-06-17  Jakub Jelinek  <jakub@redhat.com>

PR driver/115440
* opts-common.cc (add_misspelling_candidates): If opt1 is non-NULL,
add a space and opt1 to the alternative suggestion text.

* g++.dg/cpp1z/pr115440.C: New test.

(cherry picked from commit 96db57948b50f45235ae4af3b46db66cae7ea859)

c-family: Fix -Warray-compare warning ICE [PR115290]

The warning code uses %D to print the ARRAY_REF first operands.
That works in the most common case where those operands are decls, but
as can be seen on the following testcase, they can be other expressions
with array type.
Just changing %D to %E isn't enough, because then the diagnostics can
suggest something like
note: use '&(x) != 0 ? (int (*)[32])&a : (int (*)[32])&b[0] == &(y) != 0 ? (int (*)[32])&a : (int (*)[32])&b[0]' to compare the addresses
which is a bad suggestion, the %E printing doesn't know that the
warning code will want to add & before it and [0] after it.
So, the following patch adds ()s around the operand as well, but does
that only for non-decls, for decls keeps it as &arr[0] like before.

2024-06-17 Jakub Jelinek <jakub@redhat.com>

PR c/115290
* c-warn.cc (do_warn_array_compare): Use %E rather than %D for
printing op0 and op1; if those operands aren't decls, also print
parens around them.

* c-c++-common/Warray-compare-3.c: New test.

(cherry picked from commit b63c7d92012f92e0517190cf263d29bbef8a06bf)

Bump BASE-VER.

2024-06-20 Jakub Jelinek <jakub@redhat.com>

* BASE-VER: Set to 14.4.1.

Update ChangeLog and version files for release

Daily bump.