git.ipfire.org Git - thirdparty/gcc.git/log

Daily bump.

Fortran: fix for absent array argument passed to optional dummy [PR101135]

gcc/fortran/ChangeLog:

PR fortran/101135
* trans-array.cc (gfc_get_dataptr_offset): Check for optional
arguments being present before dereferencing data pointer.

gcc/testsuite/ChangeLog:

PR fortran/101135
* gfortran.dg/missing_optional_dummy_6a.f90: Adjust diagnostic pattern.
* gfortran.dg/ubsan/missing_optional_dummy_8.f90: New test.

Fortran: no size check passing NULL() without MOLD argument [PR55978]

gcc/fortran/ChangeLog:

PR fortran/55978
* interface.cc (gfc_compare_actual_formal): Skip size check for
NULL() actual without MOLD argument.

gcc/testsuite/ChangeLog:

PR fortran/55978
* gfortran.dg/null_actual_5.f90: New test.

Fortran: fix FE memleak

gcc/fortran/ChangeLog:

* trans-types.cc (gfc_get_nodesc_array_type): Clear used gmp
variables.

Daily bump.

Move pr114396.c from gcc.target/i386 to gcc.c-torture/execute.

Also fixed a typo in the testcase.

gcc/testsuite/ChangeLog:

PR tree-optimization/114396
* gcc.target/i386/pr114396.c: Move to...
* gcc.c-torture/execute/pr114396.c: ...here.

(cherry picked from commit 9a6c7aa1b011b77fcd9b19f7b8d7ff0fc823cdb2)

Daily bump.

tree-optimization/114231 - use patterns for BB SLP discovery root stmts

The following makes sure to use recognized patterns when vectorizing
roots during BB SLP discovery. We need to apply those late since
during root discovery we've not yet done pattern recognition.
All parts of the vectorizer assume patterns get used, for the testcase
we mix this up when doing live lane computation.

PR tree-optimization/114231
* tree-vect-slp.cc (vect_analyze_slp): Lookup patterns when
processing a BB SLP root.

* gcc.dg/vect/pr114231.c: New testcase.

tree-optimization/112793 - SLP of constant/external code-generated twice

The following makes the attempt at code-generating a constant/external
SLP node twice well-formed as that can happen when partitioning BB
vectorization attempts where we keep constants/externals unpartitioned.

PR tree-optimization/112793
* tree-vect-slp.cc (vect_schedule_slp_node): Already
code-generated constant/external nodes are OK.

* g++.dg/vect/pr112793.cc: New testcase.

(cherry picked from commit d782ec8362eadc3169286eb1e39c631effd02323)

tree-optimization/113670 - gather/scatter to/from hard registers

The following makes sure we're not taking the address of hard
registers when vectorizing apparent gathers or scatters to/from
them.

PR tree-optimization/113670
* tree-vect-data-refs.cc (vect_check_gather_scatter):
Make sure we can take the address of the reference base.

* gcc.target/i386/pr113670.c: New testcase.

(cherry picked from commit 924137b9012cee5603482242de08fbf0b2030f6a)

middle-end/113622 - allow .VEC_SET and .VEC_EXTRACT for global hard regs

The following expands .VEC_SET and .VEC_EXTRACT instruction selection
to global hard registers, not only automatic variables (possibly)
promoted to registers. This can avoid some ICEs later and create
better code.

PR middle-end/113622
* gimple-isel.cc (gimple_expand_vec_set_extract_expr):
Also allow DECL_HARD_REGISTER variables.

* gcc.target/i386/pr113622-1.c: New testcase.

(cherry picked from commit 96bc048d78f804bac0fa7b2ca3b6dd3a04c68217)

tree-optimization/114203 - wrong CLZ niter computation

For precision less than int we apply the adjustment to make it defined
at zero after the adjustment to make it compute CLZ rather than CTZ.
That's wrong.

PR tree-optimization/114203
* tree-ssa-loop-niter.cc (build_cltz_expr): Apply CTZ->CLZ
adjustment before making the result defined at zero.

* gcc.dg/torture/pr114203.c: New testcase.

(cherry picked from commit cde50296a19b109909089b91d532d2c8455f5f10)

middle-end/114070 - VEC_COND_EXPR folding

The following amends the PR114070 fix to optimistically allow
the folding when we cannot expand the current vec_cond using
vcond_mask and we're still before vector lowering. This leaves
a small window between vectorization and lowering where we could
break vec_conds that can be expanded via vcond{,u,eq}, most
susceptible is the loop unrolling pass which applies VN and thus
possibly folding to the unrolled body of a vectorized loop.

This gets back the folding for targets that cannot do vectorization.
It doesn't get back the folding for x86 with AVX512 for example
since that can handle the original IL but not the folded since
it misses some vcond_mask expanders.
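
As a rough per-lane illustration of the identity behind this folding, the two
forms below compute the same result. This is not GCC code: the Vec4/Mask4
types and both function names are invented for the sketch, and addition stands
in for "op". The commit is about when the second form can still be expanded by
the target, which the sketch does not model.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane vector and mask types, for illustration only.
using Vec4 = std::array<int, 4>;
using Mask4 = std::array<bool, 4>;

// (c ? a : b) op d, with op being addition.
Vec4 select_then_add(const Mask4& c, const Vec4& a, const Vec4& b,
                     const Vec4& d) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (c[i] ? a[i] : b[i]) + d[i];
  return r;
}

// c ? (a op d) : (b op d): the operation is moved into both arms.
Vec4 add_then_select(const Mask4& c, const Vec4& a, const Vec4& b,
                     const Vec4& d) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = c[i] ? (a[i] + d[i]) : (b[i] + d[i]);
  return r;
}
```

Calling both functions with the same mask and operands yields equal arrays;
the folded form simply needs a select whose arms are the results of the
operation.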

PR middle-end/114070
* match.pd ((c ? a : b) op d --> c ? (a op d) : (b op d)):
Allow the folding if before lowering and the current IL
isn't supported with vcond_mask.

(cherry picked from commit f9c30ea737b806caac917d8f501305151a2cbd57)

middle-end/114070 - folding breaking VEC_COND expansion

The following properly guards the simplifications that move
operations into VEC_CONDs, in particular when that changes the
type constraints on this operation.

This needed a genmatch fix which was recording spurious implicit fors
when tcc_comparison is used in a C expression.

PR middle-end/114070
* genmatch.cc (parser::parse_c_expr): Do not record operand
lists but only mark operators used.
* match.pd ((c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e)):
Properly guard the case of tcc_comparison changing the VEC_COND
value operand type.

* gcc.dg/torture/pr114070.c: New testcase.

(cherry picked from commit af66ad89e8169f44db723813662917cf4cbb78fc)

tree-optimization/114027 - conditional reduction chain

When we classify a conditional reduction chain as CONST_COND_REDUCTION
we fail to verify all involved conditionals have the same constant.
That's a quite unlikely situation so the following simply disables
such classification when there's more than one reduction statement.

PR tree-optimization/114027
* tree-vect-loop.cc (vectorizable_reduction): Use optimized
condition reduction classification only for single-element
chains.

* gcc.dg/vect/pr114027.c: New testcase.

(cherry picked from commit 549f251f055e3a0b0084189a3012c4f15d635e75)

tree-optimization/113910 - huge compile time during PTA

For the testcase in PR113910 we spend a lot of time in PTA comparing
bitmaps for looking up equivalence class members. This points to
the very weak bitmap_hash function which effectively hashes set
and a subset of not set bits.

The major problem with it is that it simply truncates the
BITMAP_WORD sized intermediate hash to hashval_t which is
unsigned int, effectively not hashing half of the bits.

This reduces the compile-time for the testcase from tens of minutes
to 42 seconds and PTA time from 99% to 46%.
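
The truncation problem can be sketched as follows, assuming a 64-bit
BITMAP_WORD and a 32-bit hashval_t. The function names are hypothetical and
neither function is the actual bitmap_hash; the second one only shows one
simple way to let the upper half of the intermediate value influence the
result.

```cpp
#include <cstddef>
#include <cstdint>

// Weak variant: accumulate 64-bit words, then truncate to 32 bits.
// Bits 32..63 of every word never reach the result.
std::uint32_t hash_truncating(const std::uint64_t* words, std::size_t n) {
  std::uint64_t h = 0;
  for (std::size_t i = 0; i < n; ++i)
    h ^= words[i];
  return static_cast<std::uint32_t>(h);
}

// Variant that folds the upper half into the lower half before narrowing,
// so all 64 bits of the intermediate value contribute.
std::uint32_t hash_folding(const std::uint64_t* words, std::size_t n) {
  std::uint64_t h = 0;
  for (std::size_t i = 0; i < n; ++i)
    h ^= words[i];
  return static_cast<std::uint32_t>(h ^ (h >> 32));
}
```

Two single-word bitmaps that differ only in bits 32 and above collide under
hash_truncating but get different values from hash_folding.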

PR tree-optimization/113910
* bitmap.cc (bitmap_hash): Mix the full element "hash" to
the hashval_t hash.

(cherry picked from commit ad7a365aaccecd23ea287c7faaab9c7bd50b944a)

debug/112718 - reset all type units with -ffat-lto-objects

When mixing -flto, -ffat-lto-objects and -fdebug-type-section we
fail to reset all type units after early output resulting in an
ICE when attempting to add then duplicate sibling attributes.

PR debug/112718
* dwarf2out.cc (dwarf2out_finish): Reset all type units
for the fat part of an LTO compile.

* gcc.dg/debug/pr112718.c: New testcase.

(cherry picked from commit 7218f5050cb7163edae331f54ca163248ab48bfa)

tree-optimization/111736 - avoid address sanitizing of __seg_gs

The following more thoroughly avoids address sanitizing accesses
to non-generic address-spaces.

PR tree-optimization/111736
* asan.cc (instrument_derefs): Do not instrument accesses
to non-generic address-spaces.

* gcc.target/i386/pr111736.c: New testcase.

(cherry picked from commit 134ef2a8cac1a5cc718739bd7d3b3472947c80d6)

Fix runtime error for nonlinear iv vectorization(step_mult).

wi::from_mpz doesn't take a sign argument; we want the value to wrap
rather than saturate, so pass utype and true to it, which fixes the
bug.
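
The difference between wrapping and saturating can be sketched outside of GCC
as follows. The helper names are hypothetical, a 32-bit unsigned type stands
in for utype, and none of this uses the wide-int API; it only shows why the
wrapped value is the one that matches what a multiplicative induction variable
computes at run time.

```cpp
#include <cstdint>
#include <limits>

// Wrapping conversion: keep the low 32 bits (arithmetic modulo 2^32).
std::uint32_t narrow_wrap(std::uint64_t v) {
  return static_cast<std::uint32_t>(v);
}

// Saturating conversion: clamp to the largest representable value.
std::uint32_t narrow_saturate(std::uint64_t v) {
  constexpr std::uint64_t max = std::numeric_limits<std::uint32_t>::max();
  return static_cast<std::uint32_t>(v > max ? max : v);
}

// init * step^n computed in 32-bit unsigned arithmetic, the way the scalar
// loop would compute it, wrapping on every multiplication.
std::uint32_t mult_iv_after(std::uint32_t init, std::uint32_t step,
                            unsigned n) {
  std::uint32_t v = init;
  for (unsigned i = 0; i < n; ++i)
    v *= step;
  return v;
}
```

For example, 3^21 does not fit in 32 bits: narrow_wrap of the exact value
agrees with mult_iv_after (1, 3, 21), while narrow_saturate does not.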

gcc/ChangeLog:

PR tree-optimization/114396
* tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Pass utype
and true to wi::from_mpz.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114396.c: New test.

(cherry picked from commit ac2f8c2a367151fc0410f904339c475a953cffc8)

rs6000: Don't ICE when compiling the __builtin_vsx_splat_2di [PR113950]

When we expand the __builtin_vsx_splat_2di built-in, we were allowing an
immediate value for the second operand, which causes an unrecognizable insn
ICE. Even though the immediate value was forced into a register, it wasn't
correctly assigned to the second operand. So correct the assignment of op1 to
operands[1].

2024-03-07 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

gcc/
PR target/113950
* config/rs6000/vsx.md (vsx_splat_<mode>): Correct assignment to operand1
and simplify else if with else.

gcc/testsuite/
PR target/113950
* gcc.target/powerpc/pr113950.c: New testcase.

(cherry picked from commit fa0468877869f52b05742de6deef582e4dd296fc)

Daily bump.

libstdc++: [_GLIBCXX_DEBUG] Define __cpp_lib_null_iterators

_GLIBCXX_DEBUG now has fully N3644-compliant iterator checks, so we can define
the __cpp_lib_null_iterators macro as in normal mode.

libstdc++-v3/ChangeLog:

* include/std/iterator (__cpp_lib_null_iterators): Define regardless of
_GLIBCXX_DEBUG.
* include/std/version (__cpp_lib_null_iterators): Likewise.

libstdc++: Fix N3644 behavior on _Safe_iterator::_M_can_advance

We shall be able to advance a value-initialized iterator by a 0 offset.

libstdc++-v3/ChangeLog:

* include/debug/safe_iterator.tcc (_Safe_iterator<>::_M_can_advance):
Accept 0 offset advance on value-initialized iterator.
* testsuite/23_containers/vector/debug/n3644.cc: New test case.

(cherry picked from commit dda96a9d942d73a587e174dd5efe061208a195af)

libstdc++: Fix _Safe_local_iterator<>::_M_valid_range

Unordered container local_iterator range shall not contain any singular
iterator unless both iterators are value-initialized.

libstdc++-v3/ChangeLog:

* include/debug/safe_local_iterator.tcc
(_Safe_local_iterator::_M_valid_range): Add _M_value_initialized and
_M_singular checks.
* testsuite/23_containers/unordered_set/debug/114316.cc: New test case.

(cherry picked from commit 5f6e0853c30fec72d977afaa6f7a5633a8d910be)

Daily bump.

Fortran: fix IS_CONTIGUOUS for polymorphic dummy arguments [PR114001]

gcc/fortran/ChangeLog:

PR fortran/114001
* expr.cc (gfc_is_simply_contiguous): Adjust logic so that CLASS
symbols are also handled.

gcc/testsuite/ChangeLog:

PR fortran/114001
* gfortran.dg/is_contiguous_4.f90: New test.

(cherry picked from commit 11caf47b599568c6c6f5a12cf8e21f50778176d3)

Fortran: error recovery in frontend optimization [PR103715]

gcc/fortran/ChangeLog:

PR fortran/103715
* frontend-passes.cc (check_externals_expr): Prevent invalid read
in case of mismatch of external subroutine with function.

gcc/testsuite/ChangeLog:

PR fortran/103715
* gfortran.dg/pr103715.f90: New test.

(cherry picked from commit 3be2b8f475f22c531d6cef1b041c0573b3ea5133)

i386: Unify {general,timode}_scalar_chain::convert_op [PR111822]

Recent PR111822 fix implemented REG_EH_REGION note copying to a STV converted
preload instruction in general_scalar_chain::convert_op. However, the same
issue remains in timode_scalar_chain::convert_op. Instead of copying the
newly introduced code to timode_scalar_chain::convert_op, the patch unifies
both functions to a common function.

PR target/111822

gcc/ChangeLog:

* config/i386/i386-features.cc (smode_convert_cst): New function
to handle SImode, DImode and TImode immediates.
(scalar_chain::convert_op): Unify from
general_scalar_chain::convert_op and timode_scalar_chain::convert_op.
(general_scalar_chain::convert_op): Remove.
(timode_scalar_chain::convert_op): Remove.
* config/i386/i386-features.h (class scalar_chain):
Redeclare convert_op as protected class member.
(class general_scalar_chain): Remove convert_op.
(class timode_scalar_chain): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr111822.C (dg-do): Compile only for ia32 targets.
(dg-options): Add -march=x86-64.

Daily bump.

libstdc++: Correct notes about std::call_once in manual [PR66146]

The bug with exceptions thrown during a std::call_once call affects all
targets, so fix the docs that say it only affects non-Linux targets.

libstdc++-v3/ChangeLog:

PR libstdc++/66146
* doc/xml/manual/status_cxx2011.xml: Remove mention of Linux in
note about std::call_once.
* doc/xml/manual/status_cxx2014.xml: Likewise.
* doc/xml/manual/status_cxx2017.xml: Likewise.
* doc/html/manual/status.html: Regenerate.

(cherry picked from commit e6836bbbd7a01af0791c02087e568b4822418c0d)

libstdc++: Move test error_category to global scope

A recent GDB change causes this test to fail due to missing RTTI for the
custom_cat type. This is presumably because the custom_cat type was
defined as a local class, so has no linkage. Moving it to namespace scope
seems to fix the test regressions, and probably makes the test more
realistic, as a local class with no linkage isn't practical to use as an
error category that almost certainly needs to be referred to in other
scopes.
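
A minimal sketch of an error category defined at namespace scope might look
like this. The name() and message() strings and the custom_category()
accessor are assumptions made for the illustration and are not taken from the
testsuite file.

```cpp
#include <string>
#include <system_error>

// Defined at namespace scope, so the type has linkage and can be named
// from other functions and translation units.
struct custom_cat : std::error_category {
  const char* name() const noexcept override { return "custom"; }
  std::string message(int) const override { return "custom error"; }
};

// Hypothetical accessor returning the single category instance.
const std::error_category& custom_category() {
  static const custom_cat instance{};
  return instance;
}
```

Any scope can then build a std::error_code from it, for example
std::error_code (42, custom_category ()), which would not be possible from
outside the function if custom_cat were a local class.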

libstdc++-v3/ChangeLog:

* testsuite/libstdc++-prettyprinters/cxx11.cc: Move custom_cat
to namespace scope.

(cherry picked from commit a8c7c3a40953e34f57278d224a07dc3698c64a84)

riscv: xtheadmempair: Fix CFA reg notes

The current implementation triggers an assertion in
dwarf2out_frame_debug_cfa_offset() under certain circumstances.
The standard code uses REG_FRAME_RELATED_EXPR notes instead
of REG_CFA_OFFSET notes when saving registers on the stack.
So let's do this as well.

gcc/ChangeLog:

PR target/114160
* config/riscv/thead.cc (th_mempair_save_regs):
Emit REG_FRAME_RELATED_EXPR notes in prologue.

(cherry picked from commit 93973e4c5d3bcde1f84cad3b42a8c36e23900d19)

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

Daily bump.

libstdc++: Implement N3644 on _Safe_iterator<> [PR114316]

Consider range of value-initialized iterators as valid and empty.
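
What N3644 permits can be sketched as below. The function names are
hypothetical; the code only relies on the C++14 rule that value-initialized
forward iterators of the same type compare equal and behave like the end of
the same empty sequence.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Two value-initialized iterators of the same type compare equal.
bool value_initialized_iterators_equal() {
  std::vector<int>::iterator a{};
  std::vector<int>::iterator b{};
  return a == b;
}

// The range they delimit is valid and empty, so it can be measured or
// used to construct a container.
std::size_t value_initialized_range_size() {
  std::vector<int>::iterator first{};
  std::vector<int>::iterator last{};
  std::vector<int> copy(first, last);
  return copy.size() +
         static_cast<std::size_t>(std::distance(first, last));
}
```

With this change the same code is also expected to pass the _GLIBCXX_DEBUG
range checks instead of being reported as a range of singular iterators.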

libstdc++-v3/ChangeLog:

PR libstdc++/114316
* include/debug/safe_iterator.tcc (_Safe_iterator<>::_M_valid_range):
First check if both iterators are value-initialized before checking if
singular.
* testsuite/23_containers/set/debug/114316.cc: New test case.
* testsuite/23_containers/vector/debug/114316.cc: New test case.

(cherry picked from commit 07fad7a7fc245369989e9ca746728ea78b924715)

Daily bump.

libstdc++: Simplify chrono::__units_suffix using std::format

For std::chrono formatting we can simplify __units_suffix by using
std::format_to to generate the "[n/m]s" suffix with the correct
character type and write directly to the output iterator, so it doesn't
need to be widened using ctype. We can't remove the use of ctype::widen
for formatting a time zone abbreviation as a wide string, because that
can contain arbitrary characters that can't be widened by
__to_wstring_numeric.
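
The shape of the suffix can be sketched as follows. This is only an
illustration: it handles narrow strings and a small subset of the named
periods, and it builds the text with std::to_string rather than
std::format_to; units_suffix_sketch is a hypothetical name, not a libstdc++
function.

```cpp
#include <ratio>
#include <string>
#include <type_traits>

// Returns a unit suffix for a duration period: a fixed name for a few
// well-known periods, otherwise "[n]s" or "[n/m]s" built from the
// reduced ratio.
template <typename Period>
std::string units_suffix_sketch() {
  using P = typename Period::type;  // reduced form of the ratio
  if constexpr (std::is_same_v<P, std::milli>)
    return "ms";
  else if constexpr (std::is_same_v<P, std::ratio<1>>)
    return "s";
  else if constexpr (std::is_same_v<P, std::ratio<60>>)
    return "min";
  else if constexpr (P::den == 1)
    return "[" + std::to_string(P::num) + "]s";
  else
    return "[" + std::to_string(P::num) + "/" + std::to_string(P::den) +
           "]s";
}
```

A period of std::ratio<4, 6> is reduced first, so it produces the same
suffix as std::ratio<2, 3>.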

This also fixes a bug in the chrono formatter for %Z which created a
dangling wstring_view.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__units_suffix_misc): Remove.
(__units_suffix): Return a known suffix as string view, do not
write unknown suffixes to a buffer.
(__fmt_units_suffix): New function that formats the suffix using
std::format_to.
(operator<<, __chrono_formatter::_M_q): Use __fmt_units_suffix.
(__chrono_formatter::_M_Z): Correct lifetime of wstring.

(cherry picked from commit c992acdc6774ef3d566fab5f324d254bed1b9d4b)

libstdc++: Add missing std::tuple constructor [PR114147]

I caused a regression with commit r10-908 by adding a constraint to the
non-explicit allocator-extended default constructor, but seemingly
forgot to add an explicit overload with the corresponding constraint.

libstdc++-v3/ChangeLog:

PR libstdc++/114147
* include/std/tuple (tuple::tuple(allocator_arg_t, const Alloc&)):
Add missing overload of allocator-extended default constructor.
(tuple<T1,T2>::tuple(allocator_arg_t, const Alloc&)): Likewise.
* testsuite/20_util/tuple/cons/114147.cc: New test.

(cherry picked from commit 0a545ac7000501844670add0b3560ebdbcb123c6)

Daily bump.

gimple-iterator: Some gsi_safe_insert_*before fixes

When trying to use the gsi_safe_insert*before APIs in bitint lowering,
I've discovered 3 issues and the following patch addresses those:

1) both split_block and split_edge update CDI_DOMINATORS if they are
   available, but because edge_before_returns_twice_call first splits
   and then adds an extra EDGE_ABNORMAL edge and then removes another
   one, the immediate dominators of both the new bb and the bb with
   returns_twice call need to change
2) the new EDGE_ABNORMAL edge had uninitialized probability; this patch
   copies the probability from the edge that is going to be removed
   and similarly copies other flags (EDGE_EXECUTABLE, EDGE_DFS_BACK,
   EDGE_IRREDUCIBLE_LOOP etc.)
3) if edge_before_returns_twice_call splits a block, then the bb with
   returns_twice call changes, so the gimple_stmt_iterator for it is
   no longer accurate, it points to the right statement, but gsi_bb
   and gsi_seq are no longer correct; the patch updates it

2024-03-14  Jakub Jelinek  <jakub@redhat.com>

* gimple-iterator.cc (edge_before_returns_twice_call): Copy all
flags and probability from ad_edge to e edge.  If CDI_DOMINATORS
are computed, recompute immediate dominator of other_edge->src
and other_edge->dest.
(gsi_safe_insert_before, gsi_safe_insert_seq_before): Update *iter
for the returns_twice call case to gsi_for_stmt (stmt), so that it
stays accurate after bb splitting.

(cherry picked from commit 8f6e0814b4bfd0a399055e9214562aebfcd902ad)

asan: Fix ICE during instrumentation of returns_twice calls [PR112709]

The following patch on top of the previously posted ubsan/gimple-iterator
one handles asan the same. While the case of returning by hidden reference
is handled differently because of the first recently posted asan patch,
this deals with instrumentation of the aggregates returned in registers
case as well as instrumentation of loads from aggregate memory in the
function arguments of returns_twice calls.

2024-03-13 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/112709
* asan.cc (maybe_create_ssa_name, maybe_cast_to_ptrmode,
build_check_stmt, maybe_instrument_call, asan_expand_mark_ifn): Use
gsi_safe_insert_before instead of gsi_insert_before.

* gcc.dg/asan/pr112709-2.c: New test.

(cherry picked from commit 6586359e8e4c611dd96129b5d4f24023949ac3fc)

gimple-iterator, ubsan: Fix ICE during instrumentation of returns_twice calls [PR112709]

ubsan, asan (both PR112709) and _BitInt lowering (PR113466) want to
insert some instrumentation or adjustment statements before some statement.
This unfortunately creates invalid IL if inserting before a returns_twice
call, because we require that such calls are the first statement in a basic
block and that we have an edge from the .ABNORMAL_DISPATCHER block to
the block containing the returns_twice call (in addition to other edge(s)).

The following patch adds helper functions for such insertions and uses it
for now in ubsan (I'll post a follow up which uses it in asan and will
work later on the _BitInt lowering PR).

In particular, if the bb with returns_twice call at the start has just
2 edges, one EDGE_ABNORMAL from .ABNORMAL_DISPATCHER and another
(non-EDGE_ABNORMAL/EDGE_EH) from some other bb, it just inserts the
statement or sequence on that other edge.
If the bb has more predecessor edges or the one not from
.ABNORMAL_DISPATCHER is e.g. an EH edge (this latter case likely shouldn't
happen, one would need labels or something like that), the patch splits the
block with returns_twice call such that there is just one edge next to
.ABNORMAL_DISPATCHER edge and adjusts PHIs as needed to make it happen.
The functions also replace uses of PHIs from the returns_twice bb with
the corresponding PHI arguments, because otherwise it would be invalid IL.

E.g. in ubsan/pr112709-2.c (qux) we have before the ubsan pass
  <bb 10> :
  # .MEM_5(ab) = PHI <.MEM_4(9), .MEM_25(ab)(11)>
  # _7(ab) = PHI <_20(9), _8(ab)(11)>
  # .MEM_21(ab) = VDEF <.MEM_5(ab)>
  _22 = bar (*_7(ab));
where bar is returns_twice call and bb 11 has .ABNORMAL_DISPATCHER call,
this patch instruments it like:
  <bb 9> :
  # .MEM_4 = PHI <.MEM_17(ab)(4), .MEM_10(D)(5), .MEM_14(ab)(8)>
  # DEBUG BEGIN_STMT
  # VUSE <.MEM_4>
  _20 = p;
  # .MEM_27 = VDEF <.MEM_4>
  .UBSAN_NULL (_20, 0B, 0);
  # VUSE <.MEM_27>
  _2 = __builtin_dynamic_object_size (_20, 0);
  # .MEM_28 = VDEF <.MEM_27>
  .UBSAN_OBJECT_SIZE (_20, 1024, _2, 0);

  <bb 10> :
  # .MEM_5(ab) = PHI <.MEM_28(9), .MEM_25(ab)(11)>
  # _7(ab) = PHI <_20(9), _8(ab)(11)>
  # .MEM_21(ab) = VDEF <.MEM_5(ab)>
  _22 = bar (*_7(ab));
The edge from .ABNORMAL_DISPATCHER is there just to represent the
returning for 2nd and later times, the instrumentation can't be
done at that point as there is no code executed during that point.
The ubsan/pr112709-1.c testcase includes non-virtual PHIs to cover
the handling of those as well.

2024-03-13  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/112709
* gimple-iterator.h (gsi_safe_insert_before,
gsi_safe_insert_seq_before): Declare.
* gimple-iterator.cc: Include gimplify.h.
(edge_before_returns_twice_call, adjust_before_returns_twice_call,
gsi_safe_insert_before, gsi_safe_insert_seq_before): New functions.
* ubsan.cc (instrument_mem_ref, instrument_pointer_overflow,
instrument_nonnull_arg, instrument_nonnull_return): Use
gsi_safe_insert_before instead of gsi_insert_before.
(maybe_instrument_pointer_overflow): Use force_gimple_operand,
gimple_seq_add_seq_without_update and gsi_safe_insert_seq_before
instead of force_gimple_operand_gsi.
(instrument_object_size): Likewise.  Use gsi_safe_insert_before
instead of gsi_insert_before.

* gcc.dg/ubsan/pr112709-1.c: New test.
* gcc.dg/ubsan/pr112709-2.c: New test.

(cherry picked from commit 364c684c474841e3c9c04e025a5c1bca49705c86)

i386: Fix a pasto in ix86_expand_int_sse_cmp [PR114339]

In r13-3803-gfa271afb58 I've added an optimization for LE/LEU/GE/GEU
comparison against CONST_VECTOR.  As the comments say:
         /* x <= cst can be handled as x < cst + 1 unless there is
            wrap around in cst + 1.  */
...
                     /* For LE punt if some element is signed maximum.  */
...
                 /* For LEU punt if some element is unsigned maximum.  */
and
         /* x >= cst can be handled as x > cst - 1 unless there is
            wrap around in cst - 1.  */
...
                     /* For GE punt if some element is signed minimum.  */
...
                 /* For GEU punt if some element is zero.  */
Apparently I wrote the GE/GEU (second case) first and then
copied/adjusted it for LE/LEU, most of the adjustments look correct, but
I've left if (code == GE) comparison when testing if it should punt for
signed maximum.  That condition is never true, because this is in
switch (code) { ... case LE: case LEU: block and we really meant to
be what the comment says, for LE punt if some element is signed maximum,
as then cst + 1 wraps around.

The following patch fixes the pasto.
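
The punting rules quoted in the comments above can be sketched per element as
below. This is a scalar illustration with hypothetical function names, not
the code in ix86_expand_int_sse_cmp, which applies the same checks to every
element of a CONST_VECTOR.

```cpp
#include <climits>
#include <optional>

// x <= cst can be handled as x < cst + 1 unless cst + 1 wraps around.
// Returns the adjusted constant, or nothing when we have to punt.
std::optional<int> le_as_lt(int cst) {
  if (cst == INT_MAX)  // For LE punt if the element is signed maximum.
    return std::nullopt;
  return cst + 1;
}

std::optional<unsigned> leu_as_ltu(unsigned cst) {
  if (cst == UINT_MAX)  // For LEU punt if the element is unsigned maximum.
    return std::nullopt;
  return cst + 1;
}

// x >= cst can be handled as x > cst - 1 unless cst - 1 wraps around.
std::optional<int> ge_as_gt(int cst) {
  if (cst == INT_MIN)  // For GE punt if the element is signed minimum.
    return std::nullopt;
  return cst - 1;
}

std::optional<unsigned> geu_as_gtu(unsigned cst) {
  if (cst == 0)  // For GEU punt if the element is zero.
    return std::nullopt;
  return cst - 1;
}
```

The pasto corresponds to le_as_lt never taking its punt branch, so a signed
maximum element would have been turned into a wrapped cst + 1.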

2024-03-15  Jakub Jelinek  <jakub@redhat.com>

PR target/114339
* config/i386/i386-expand.cc (ix86_expand_int_sse_cmp) <case LE>: Fix
a pasto, compare code against LE rather than GE.

* gcc.target/i386/pr114339.c: New test.

(cherry picked from commit ab2da8fb67b1aa0557a16b62689a888730dba610)

icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

AFAIK we have no code in LTO streaming to stream out or in
SSA_NAME_{RANGE,PTR}_INFO, so LTO effectively throws it all away
and lets vrp1 and alias analysis after IPA recompute that.  There is
just one spot, for IPA VRP and IPA bit CCP we save/restore ranges
and set SSA_NAME_{PTR,RANGE}_INFO e.g. on parameters depending on what
we saved and propagated, but that is after streaming in bodies for the
post IPA optimizations.

Now, without LTO SSA_NAME_{RANGE,PTR}_INFO is already computed from
earlier in many cases (e.g. evrp and early alias analysis but other spots
too), but IPA ICF is ignoring the ranges and points-to details when
comparing the bodies.  I think ignoring that is just fine, that is
effectively what we do for LTO where we throw that information away
before the analysis, and not ignoring it could lead to fewer ICF merging
possibilities.

So, the following patch instead verifies that for LTO SSA_NAME_{PTR,RANGE}_INFO
just isn't there on SSA_NAMEs in functions into which other functions have
been ICFed, and for non-LTO throws that information away (which matches the
LTO behavior).

Another possibility would be to remember the SSA_NAME <-> SSA_NAME mapping
vector (just one of the 2) on successful sem_function::equals on the
sem_function which is not the chosen leader (e.g. how SSA_NAMEs in the
leader map to SSA_NAMEs in the other function) and use that vector
to union the ranges in sem_function::merge.  I can implement that for
comparison, but wanted to post this first if there is an agreement on
doing that or if Honza thinks we should take SSA_NAME_{RANGE,PTR}_INFO
into account.  I think we can compare SSA_NAME_RANGE_INFO, but have
no idea how to try to compare points to info.  And I think it will result
in less effective ICF for non-LTO vs. LTO unnecessarily.

2024-03-12  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/113907
* ipa-icf.cc (sem_item_optimizer::merge_classes): Reset
SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO on successfully ICF merged
functions.

* gcc.dg/pr113907-1.c: New test.

(cherry picked from commit 7580e39452b65ab5fb5a06f3f1ad7d59720269b5)

aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]

The following testcase ICEs with LSE atomics.
The problem is that the @atomic_compare_and_swap<mode> expander uses
aarch64_reg_or_zero predicate for the desired operand, which is fine,
given that for most of the modes and even for TImode in some cases
it can handle zero immediate just fine, but the TImode
@aarch64_compare_and_swap<mode>_lse just uses register_operand for
that operand instead, again intentionally so, because the casp,
caspa, caspl and caspal instructions need to use a pair of consecutive
registers for the operand and xzr is just one register and we can't
just store zero into the link register to emulate pair of zeros.

So, the following patch fixes that by forcing the newval operand into
a register for the TImode LSE case.

2024-03-14 Jakub Jelinek <jakub@redhat.com>

PR target/114310
* config/aarch64/aarch64.cc (aarch64_expand_compare_and_swap): For
TImode force newval into a register.

* gcc.dg/pr114310.c: New test.

(cherry picked from commit 9349aefa1df7ae36714b7b9f426ad46e314892d1)

contrib: Improve dg-extract-results.sh's Python detection [PR109668]

'python' on some systems (e.g. SLES 15) might be Python 2. Prefer python3,
then python, then python2 (as the script still tries to work there).

PR other/109668
* dg-extract-results.sh: Check for python3 before python. Check for
python2 last.

(cherry picked from commit 64273a7e6bd8ba60058174d147521dd65d705637)

bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]

The following testcase ICEs, because fix_crossing_unconditional_branches
thinks that asm goto is an unconditional jump and removes it, replacing it
with unconditional jump to one of the labels.
This doesn't happen on x86 because the function in question isn't invoked
there at all:
  /* If the architecture does not have unconditional branches that
     can span all of memory, convert crossing unconditional branches
     into indirect jumps.  Since adding an indirect jump also adds
     a new register usage, update the register usage information as
     well.  */
  if (!HAS_LONG_UNCOND_BRANCH)
    fix_crossing_unconditional_branches ();
I think for the asm goto case, for the fallthru edge if any we should
handle it like any other fallthru (and fix_crossing_unconditional_branches
doesn't really deal with those, it only looks at explicit branches at the
end of bbs and we are in cfglayout mode at that point) and for the labels
we just pass the labels as immediates to the assembly and it is up to the
user to figure out how to store them/branch to them or whatever they want to
do.
So, the following patch fixes this by not treating asm goto as a simple
unconditional jump.

I really think that on the !HAS_LONG_UNCOND_BRANCH targets we have a bug
somewhere else, where outofcfglayout or whatever should actually create
those indirect jumps on the crossing edges instead of adding normal
unconditional jumps, I see e.g. in
__attribute__((cold)) int bar (char *);
__attribute__((hot)) int baz (char *);
void qux (int x) { if (__builtin_expect (!x, 1)) goto l1; bar (""); goto l1; l1: baz (""); }
void corge (int x) { if (__builtin_expect (!x, 0)) goto l1; baz (""); l2: return; l1: bar (""); goto l2; }
with -O2 -freorder-blocks-and-partition on aarch64 before/after this patch
just b .L? jumps which I believe are +-32MB, so if .text is larger than
32MB, it could fail to link, but this patch doesn't address that.

2024-03-07  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/110079
* bb-reorder.cc (fix_crossing_unconditional_branches): Don't adjust
asm goto.

* gcc.dg/pr110079.c: New test.

(cherry picked from commit b209d905f5ce1fa9d76ce634fd54245ff340960b)

lower-subreg: Fix ROTATE handling [PR114211]

On the following testcase, we have
(insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
        (rotate:TI (reg/v:TI 106 [ h ])
            (const_int 64 [0x40]))) "pr114211.c":8:5 1042 {rotl64ti2_doubleword}
     (nil))
before subreg1 and the pass decides to use
(reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ])
register pair instead of (reg/v:TI 106 [ h ]).
resolve_operand_for_swap_move_operator implements it by pretending it is
an assignment from
(concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ]))
to
(concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ]))
The problem is that if the rotate argument is the same as destination or
if there is even an overlap between the first half of the destination with
second half of the source we emit incorrect code, because the store to
(reg:DI 128 [ h+8 ]) overwrites what we need for source of the second
move.  The following patch detects that case and uses a temporary pseudo
to hold the original (reg:DI 128 [ h+8 ]) value across the first store.
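
The overlap hazard can be sketched outside of RTL as follows. DoubleWord is a
hypothetical stand-in for a TImode value split into two DImode halves, and
the two functions only model the order of the word moves; they are not the
lower-subreg implementation.

```cpp
#include <cstdint>

// Hypothetical stand-in for a double-word value held in two word registers.
struct DoubleWord {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Rotate by 64 bits as two word moves, with no protection against overlap.
// If dst and src are the same object, the first store clobbers src.lo
// before the second move reads it.
void rotate64_naive(DoubleWord& dst, const DoubleWord& src) {
  dst.lo = src.hi;
  dst.hi = src.lo;
}

// Same rotate, but the word the first store may clobber is saved in a
// temporary first.
void rotate64_with_temp(DoubleWord& dst, const DoubleWord& src) {
  const std::uint64_t saved = src.lo;
  dst.lo = src.hi;
  dst.hi = saved;
}
```

Rotating a value {lo = 1, hi = 2} in place gives {2, 2} with the naive
version and the expected {2, 1} with the temporary.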

2024-03-05  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/114211
* lower-subreg.cc (resolve_simple_move): For double-word
rotates by BITS_PER_WORD if there is overlap between source
and destination use a temporary.

* gcc.dg/pr114211.c: New test.

(cherry picked from commit aed445b0fd0c7ed16124c61e7eb732992426f103)

i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]

The Intel extended format has the various weird number categories,
pseudo denormals, pseudo infinities, pseudo NaNs and unnormals.
Those are not representable in the GCC real_value and so neither
GIMPLE nor RTX VIEW_CONVERT_EXPR/SUBREG folding folds those into
constants.

As can be seen on the following testcase, because it isn't folded
(since GCC 12, before that we were folding it) we can end up with
a SUBREG of a CONST_VECTOR or similar constant, which isn't valid
general_operand, so we ICE during vregs pass trying to recognize
the move instruction.
Initially I thought it is a middle-end bug, the movxf instruction
has general_operand predicate, but the middle-end certainly never
tests that predicate, seems moves are special optabs.
And looking at other mov optabs, e.g. for vector modes the i386
patterns use nonimmediate_operand predicate on the input, yet
ix86_expand_vector_move deals with CONSTANT_P and SUBREG of CONSTANT_P
arguments which if the predicate was checked couldn't ever make it through.

The following patch handles this case similarly to the
ix86_expand_vector_move's SUBREG of CONSTANT_P case, does it just for XFmode
because I believe that is the only mode that needs it from the scalar ones,
others should just be folded.

2024-03-04 Jakub Jelinek <jakub@redhat.com>

PR target/114184
* config/i386/i386-expand.cc (ix86_expand_move): If XFmode op1
is SUBREG of CONSTANT_P, force the SUBREG_REG into memory or
register.

* gcc.target/i386/pr114184.c: New test.

(cherry picked from commit ea1c16f95b8fbaba4a7f3663ff9933ebedfb92a5)

Fortran: handle procedure pointer component in DT array [PR110826]

gcc/fortran/ChangeLog:

PR fortran/110826
* array.cc (gfc_array_dimen_size): When walking the ref chain of an
array and the ultimate component is a procedure pointer, do not try
to figure out its dimension even if it is an array-valued function.

gcc/testsuite/ChangeLog:

PR fortran/110826
* gfortran.dg/proc_ptr_comp_53.f90: New test.

(cherry picked from commit 81ee1298b47d3f3b3712ef3f3b2929ca26c4bcd2)

Fortran: allow RESTRICT qualifier also for optional arguments [PR100988]

gcc/fortran/ChangeLog:

PR fortran/100988
* gfortran.h (IS_PROC_POINTER): New macro.
* trans-types.cc (gfc_sym_type): Use macro in determination if the
restrict qualifier can be used for a dummy variable. Fix logic to
allow the restrict qualifier also for optional arguments, and to
not apply it to pointer or proc_pointer arguments.

gcc/testsuite/ChangeLog:

PR fortran/100988
* gfortran.dg/coarray_poly_6.f90: Adjust pattern.
* gfortran.dg/coarray_poly_7.f90: Likewise.
* gfortran.dg/coarray_poly_8.f90: Likewise.
* gfortran.dg/missing_optional_dummy_6a.f90: Likewise.
* gfortran.dg/pr100988.f90: New test.

Co-authored-by: Tobias Burnus <tobias@codesourcery.com>
(cherry picked from commit 9c3a880feecf81c310b4ade210fbd7004c9aece7)

Fortran: improve checks of NULL without MOLD as actual argument [PR104819]

gcc/fortran/ChangeLog:

PR fortran/104819
* check.cc (gfc_check_null): Handle nested NULL()s.
(is_c_interoperable): Check for MOLD argument of NULL() as part of
the interoperability check.
* interface.cc (gfc_compare_actual_formal): Extend checks for NULL()
actual arguments for presence of MOLD argument when required by
Interp J3/22-146.

gcc/testsuite/ChangeLog:

PR fortran/104819
* gfortran.dg/assumed_rank_9.f90: Adjust testcase use of NULL().
* gfortran.dg/pr101329.f90: Adjust testcase to conform to interp.
* gfortran.dg/null_actual_4.f90: New test.

(cherry picked from commit db0b6746be075e43c8142585968483e125bb52d0)

testsuite: fortran: fix invalid testcases (missing MOLD argument to NULL)

The Fortran standard requires that NULL() passed to an assumed-rank
dummy argument has a MOLD argument.

gcc/testsuite/ChangeLog:

PR fortran/104819
* gfortran.dg/assumed_rank_10.f90: Add MOLD argument to NULL().
* gfortran.dg/assumed_rank_8.f90: Likewise.

(cherry picked from commit 7646b5d88056cf269ff555afe95bc361dcf5e5c0)

libstdc++: Fix typo in C++20 status table

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2023.xml: Close parenthesis.
* doc/html/manual/status.html: Regenerate.

testsuite: Added missing } in the dg-bogus comment [PR114343]

gcc/testsuite/ChangeLog:

PR testsuite/114343
* gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
Added missing } in the dg-bogus comment.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

Daily bump.

i386[stv]: Handle REG_EH_REGION note

When we split
(insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
        (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])) "test.C":22:42 84 {*movdi_internal}
     (expr_list:REG_EH_REGION (const_int -11 [0xfffffffffffffff5])

into

(insn 104 36 37 10 (set (subreg:V2DI (reg:DI 124) 0)
        (vec_concat:V2DI (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])
            (const_int 0 [0]))) "test.C":22:42 -1
        (nil)))
(insn 37 104 105 10 (set (subreg:V2DI (reg:DI 104 [ _18 ]) 0)
        (subreg:V2DI (reg:DI 124) 0)) "test.C":22:42 2024 {movv2di_internal}
     (expr_list:REG_EH_REGION (const_int -11 [0xfffffffffffffff5])
        (nil)))

we must copy the REG_EH_REGION note to the first insn and split the block
after the newly added insn.  The REG_EH_REGION on the second insn will be
removed later since it no longer traps.

gcc/ChangeLog:

* config/i386/i386-features.cc
(general_scalar_chain::convert_op): Handle REG_EH_REGION note.
(convert_scalars_to_vector): Ditto.
* config/i386/i386-features.h (class scalar_chain): New
member control_flow_insns.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr111822.C: New test.

(cherry picked from commit 618e34d56cc38e9c3ae95a413228068e53ed76bb)

Daily bump.

ada: Fix error message for Aggregate aspect

The error message was wrongly using % instead of & in the format string,
causing the displayed message to refer to incorrect names in some cases.

gcc/ada/

* sem_ch13.adb (Check_Aspect_At_Freeze_Point): fix format string,
use existing local Ident.

ada: Fix (again) incorrect handling of Aggregate aspect

Previous fix stopped the processing of the Aggregate aspect early,
skipping the call to Record_Rep_Item, making later call to
Resolve_Container_Aggregate fail.

Also, the previous fix would not handle correctly the case where the
type is private and the check for non-array type can only be done at the
freeze point with the full type.

Adapt the resolving of the aspect when the input is not correct and the
parameters can't be resolved.

gcc/ada/

* sem_ch13.adb (Analyze_One_Aspect): Call Record_Rep_Item.
(Check_Aspect_At_Freeze_Point): Check the aspect is specified on
non-array type only...
(Analyze_One_Aspect): ... instead of doing it too early here.
* sem_aggr.adb (Resolve_Container_Aggregate): Do nothing in case
the parameters failed to resolve.

ada: Fix incorrect handling of Aggregate aspect

This change fixes two cases where the aspect was handled incorrectly:
the arguments are now correctly resolved, and the aspect is rejected on
non-array types.

gcc/ada/

* sem_ch13.adb (Analyze_One_Aspect): Mark Aggregate aspect as
needing delayed resolution and reject the aspect on non-array
type.

testsuite: xfail test for short_enums

On arm-none-eabi, the test case fails with
.../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:63:65: warning: converting a packed 'enum obj_type' pointer (alignment 1) to a 'struct connection' pointer (alignment 4) may result in an unaligned pointer value [-Waddress-of-packed-member]

The error was fixed in basepoints/gcc-14-6517-gb7e4a4c626e, but that
change was considered too big to backport, so the failing test is
marked xfail in GCC 13.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
Added dg-bogus with xfail on offending line for short_enums.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

Daily bump.

libstdc++: Update expiry times for leap seconds lists

The list in tzdb.cc isn't the only hardcoded list of leap seconds in the
library, there's the one defined inline in <chrono> (to avoid loading
the tzdb for the common case) and another in a testcase. This updates
them to note that there are no new leap seconds in 2024 either, until at
least 2024-12-28.

libstdc++-v3/ChangeLog:

* include/std/chrono (__get_leap_second_info): Update expiry
time for hardcoded list of leap seconds.
* testsuite/std/time/tzdb/leap_seconds.cc: Update comment.

(cherry picked from commit ddd347fca0685804bf68d6c768282573f3ea6442)

libstdc++: Fix std::basic_format_arg::handle for BasicFormatters

std::basic_format_arg::handle is supposed to format its value as const
if that is valid, to reduce the number of instantiations of the
formatter's format function. I made a silly typo so that it checks
formattable_with<TD, Context> not formattable_with<const TD, Context>,
which breaks support for BasicFormatters i.e. ones that can only format
non-const types.

There's a static_assert in the handle constructor which is supposed to
improve diagnostics for trying to format a const argument with a
formatter that doesn't support it. That condition can't fail, because
the std::basic_format_arg constructor is already constrained to check
that the argument type is formattable. The static_assert can be removed.

libstdc++-v3/ChangeLog:

* include/std/format (basic_format_arg::handle::__maybe_const_t):
Fix condition to check if const type is formattable.
(basic_format_arg::handle::handle(T&)): Remove redundant
static_assert.
* testsuite/std/format/formatter/basic.cc: New test.

(cherry picked from commit 02ca9d3f0c5d2b0255df28f021834dd67ad79bc2)

libstdc++: Implement P2905R2 "Runtime format strings" for C++20

This change makes std::make_format_args refuse to create dangling
references to temporaries. This makes the std::vformat API safer. This
was approved in Kona 2023 as a DR for C++20 so the change is implemented
unconditionally.
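
The effect of switching the parameter pack from forwarding references to
lvalue references can be sketched without <format>. In the sketch below,
make_args_sketch and accepts_call are hypothetical names used only for
illustration; they are not the libstdc++ implementation.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for std::make_format_args: it keeps references to
// its arguments, so the parameter pack is Args&... and binds only lvalues.
template <typename... Args>
std::tuple<const Args&...> make_args_sketch(Args&... args)
{
  return std::tuple<const Args&...>(args...);
}

// Hypothetical detection helpers: is make_args_sketch callable with Ts?
template <typename... Ts>
auto accepts_call(int)
  -> decltype(make_args_sketch(std::declval<Ts>()...), std::true_type{});

template <typename... Ts>
std::false_type accepts_call(...);

// An lvalue binds to Args&, a non-const temporary does not.
static_assert(decltype(accepts_call<int&>(0))::value,
              "an lvalue argument is accepted");
static_assert(!decltype(accepts_call<int>(0))::value,
              "a temporary argument is rejected");
```

With forwarding references the second static_assert would fail, because
the temporary would be accepted and the stored reference would dangle
once the full expression ends.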

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono): Always use
lvalue arguments to make_format_args.
* include/std/format (make_format_args): Change parameter pack
from forwarding references to lvalue references. Remove use of
remove_reference_t which is now unnecessary.
(format_to, formatted_size): Remove incorrect forwarding of
arguments.
* testsuite/20_util/duration/io.cc: Use lvalues as arguments to
make_format_args.
* testsuite/std/format/arguments/args.cc: Likewise.
* testsuite/std/format/arguments/lwg3810.cc: Likewise.
* testsuite/std/format/functions/format.cc: Likewise.
* testsuite/std/format/functions/vformat_to.cc: Likewise.
* testsuite/std/format/string.cc: Likewise.
* testsuite/std/time/day/io.cc: Likewise.
* testsuite/std/time/month/io.cc: Likewise.
* testsuite/std/time/weekday/io.cc: Likewise.
* testsuite/std/time/year/io.cc: Likewise.
* testsuite/std/time/year_month_day/io.cc: Likewise.
* testsuite/std/format/arguments/args_neg.cc: New test.

(cherry picked from commit 2a8ee2592e48735d88df786cbafa6b0da39fc4d6)

libstdc++: Remove UB from month and weekday additions and subtractions.

The following invoke signed integer overflow (UB) [1]:

  month   + months{MAX} // where MAX is the maximum value of months::rep
  month   + months{MIN} // where MIN is the minimum value of months::rep
  month   - months{MIN} // where MIN is the minimum value of months::rep
  weekday + days  {MAX} // where MAX is the maximum value of days::rep
  weekday - days  {MIN} // where MIN is the minimum value of days::rep

For the additions to MAX, the crux of the problem is that, in libstdc++,
months::rep and days::rep are int64_t. Other implementations use int32_t, cast
operands to int64_t and perform arithmetic operations without risk of
overflowing.

For month + months{MIN}, the implementation follows the Standard's "returns
clause" and evaluates:

   modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12);

Overflow occurs when MIN - 1 is evaluated. Casting to a larger type could help
but, unfortunately again, this is not possible for libstdc++.

For the subtraction of MIN, the problem is that -MIN is not representable.

It's fair to say that the intention is for these additions/subtractions to
be performed in modulus (12 or 7) arithmetic so that no overflow is expected.

To fix these UB, this patch implements:

  template <unsigned __d, typename _T>
  unsigned __add_modulo(unsigned __x, _T __y);

  template <unsigned __d, typename _T>
  unsigned __sub_modulo(unsigned __x, _T __y);

which return, respectively, the remainders of the Euclidean division of
__x + __y and of __x - __y by __d, without overflowing. These functions replace

  constexpr unsigned __modulo(long long __n, unsigned __d);

which calculates the remainder of __n, where __n is the result of the
addition or subtraction. Since those operations might invoke UB before
__modulo is even called, __modulo can't do anything to remedy the issue.

In addition to solving the UB issues, __add_modulo and __sub_modulo allow better
codegen (shorter and branchless) on x86-64 and ARM [2].

[1] https://godbolt.org/z/a9YfWdn57
[2] https://godbolt.org/z/Gh36cr7E4
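
One way to see why reducing the operand first avoids the overflow is the
sketch below. It is not the code added to <chrono> (which is branchless);
euclid_mod, add_modulo and sub_modulo are hypothetical names, and the
64-bit operand type is an assumption matching the description above.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: remainder of the Euclidean division of y by D,
// always in [0, D).  y % D cannot overflow for any y.
template <unsigned D>
constexpr unsigned euclid_mod(std::int64_t y)
{
  const std::int64_t r = y % static_cast<std::int64_t>(D); // in (-D, D)
  return static_cast<unsigned>(r < 0 ? r + static_cast<std::int64_t>(D) : r);
}

// (x + y) mod D: both terms are reduced below D before they are added,
// so the intermediate sum stays below 2 * D.
template <unsigned D>
constexpr unsigned add_modulo(unsigned x, std::int64_t y)
{
  return (x % D + euclid_mod<D>(y)) % D;
}

// (x - y) mod D: adding D - (y mod D) avoids computing -y, which is not
// representable when y is the minimum value.
template <unsigned D>
constexpr unsigned sub_modulo(unsigned x, std::int64_t y)
{
  return (x % D + D - euclid_mod<D>(y)) % D;
}

static_assert(add_modulo<12>(11, 1) == 0, "wraps around at 12");
static_assert(sub_modulo<7>(0, 1) == 6, "wraps around below 0");
```

The extreme operands listed above are then well defined, for example
add_modulo<12>(0, MAX) and sub_modulo<12>(0, MIN) for a 64-bit rep.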

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix + and - for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.

(cherry picked from commit 2cb3d42d3f3e7a5345ee7a6f3676a10c84864d72)

libstdc++: Improve operator-(weekday x, weekday y)

The current implementation calls __detail::__modulo which is relatively
expensive.

A better implementation is possible if we assume that x.ok() && y.ok() == true,
so that n = x.c_encoding() - y.c_encoding() is in [-6, 6]. In this case, it
suffices to return n >= 0 ? n : n + 7.

The above is allowed by [time.cal.wd.nonmembers]/5: the returned value is
unspecified when x.ok() == false || y.ok() == false.

The assembly emitted for x86-64 and ARM can be seen in:
https://godbolt.org/z/nMdc5vv9n.
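
A minimal sketch of the idea, operating on plain encodings rather than on
std::chrono::weekday; weekday_diff is a hypothetical name and the
precondition that both encodings are in [0, 6] stands in for
x.ok() && y.ok():

```cpp
// Hypothetical helper: number of days from weekday encoding y to weekday
// encoding x.  Assumes both encodings are valid, i.e. in [0, 6].
constexpr int weekday_diff(unsigned x, unsigned y)
{
  const int n = static_cast<int>(x) - static_cast<int>(y); // in [-6, 6]
  return n >= 0 ? n : n + 7;                               // in [0, 6]
}

static_assert(weekday_diff(0, 6) == 1, "Sunday - Saturday == 1 day");
static_assert(weekday_diff(6, 0) == 6, "Saturday - Sunday == 6 days");
static_assert(weekday_diff(3, 3) == 0, "same weekday");
```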

libstdc++-v3/ChangeLog:

* include/std/chrono (operator-(const weekday&, const weekday&)):
Optimize.

(cherry picked from commit f71352c71d78ac977ea0e71a6900699a8cf09219)

libstdc++: Simplify year::is_leap()

The current implementation returns
(_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
where __is_multiple_of_100 is calculated using an obfuscated algorithm which
saves one ror instruction when compared to _M_y % 100 == 0 [1].

In leap years calculation, it's correct to replace the divisibility check by
100 with the one by 25. It turns out that _M_y % 25 == 0 also saves the ror
instruction [2]. Therefore, the obfuscation is not required.

[1] https://godbolt.org/z/5PaEv6a6b
[2] https://godbolt.org/z/55G8rn77e
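
The reasoning can be checked with a standalone sketch; is_leap_sketch is
a hypothetical free function taking the year as a plain int, not the
member function itself:

```cpp
// Hypothetical free-function version of the simplified check.
constexpr bool is_leap_sketch(int y)
{
  // A multiple of 25 is a multiple of 100 only if it is also a multiple
  // of 4.  So for multiples of 25 the year is leap exactly when it is a
  // multiple of 16 (and hence of 400); for all other years only
  // divisibility by 4 matters.
  return (y & (y % 25 == 0 ? 15 : 3)) == 0;
}

static_assert(is_leap_sketch(2000), "multiple of 400");
static_assert(!is_leap_sketch(1900), "multiple of 100 but not of 400");
static_assert(is_leap_sketch(2024), "multiple of 4 but not of 100");
static_assert(!is_leap_sketch(2023), "not a multiple of 4");
```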

libstdc++-v3/ChangeLog:

* include/std/chrono (year::is_leap): Clear code.

(cherry picked from commit 86a0df1a6c7fe4a835620b868e76ea78d42d6620)

libstdc++: Remove unnecessary "& 1" from year_month_day_last::day()

When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary but we did not patch it at that
time. This patch removes the unnecessary operation.

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month_day_last::day): Remove &1.

(cherry picked from commit b011535456396a6846ff24fb5b1baea8fe0a33b1)

libstdc++: Fix UB in weekday::weekday(sys_days) and add test

The following has undefined behaviour (signed overflow) [1]:
weekday max{sys_days{days{numeric_limits<days::rep>::max()}}};

The issue is in this line when __n is very large and __n + 4 overflows:
return weekday(__n >= -4 ? (__n + 4) % 7 : (__n + 5) % 7 + 6);

In addition to fixing this bug, the new implementation makes the compiler emit
shorter and branchless code for x86-64 and ARM [2].

[1] https://godbolt.org/z/1s5bv7KfT
[2] https://godbolt.org/z/zKsabzrhs
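
One overflow-free way to compute the same value is to reduce __n modulo 7
before adding the offset. The sketch below only illustrates that idea; it
is not the branchless code added to <chrono>, and weekday_from_days is a
hypothetical name. It relies on day 0 (1970-01-01) being a Thursday,
encoded as 4.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: weekday encoding (0 = Sunday) for a day count n
// relative to 1970-01-01.
constexpr unsigned weekday_from_days(std::int64_t n)
{
  const int r = static_cast<int>(n % 7); // in [-6, 6], cannot overflow
  const int w = (r + 4) % 7;             // in [-2, 6]
  return static_cast<unsigned>(w < 0 ? w + 7 : w);
}

static_assert(weekday_from_days(0) == 4, "1970-01-01 was a Thursday");
static_assert(weekday_from_days(-1) == 3, "1969-12-31 was a Wednesday");
static_assert(weekday_from_days(3) == 0, "1970-01-04 was a Sunday");
```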

libstdc++-v3/ChangeLog:

* include/std/chrono (weekday::_S_from_days): Fix UB.
* testsuite/std/time/weekday/1.cc: Add test for overflow.

(cherry picked from commit f6ce081d0ffb5f25d71eb2f30fcfdff7f20dba22)

libstdc++: Add [[nodiscard]] to std::span members

All std::span member functions are pure functions that have no side
effects. They are only useful for their return value, so they should all
warn if that value is not used.

libstdc++-v3/ChangeLog:

* include/std/span (span, as_bytes, as_writable_bytes): Add
[[nodiscard]] attribute on all non-void functions.
* testsuite/23_containers/span/back_assert_neg.cc: Suppress
nodiscard warning.
* testsuite/23_containers/span/back_neg.cc: Likewise.
* testsuite/23_containers/span/first_2_assert_neg.cc: Likewise.
* testsuite/23_containers/span/first_assert_neg.cc: Likewise.
* testsuite/23_containers/span/first_neg.cc: Likewise.
* testsuite/23_containers/span/front_assert_neg.cc: Likewise.
* testsuite/23_containers/span/front_neg.cc: Likewise.
* testsuite/23_containers/span/index_op_assert_neg.cc: Likewise.
* testsuite/23_containers/span/index_op_neg.cc: Likewise.
* testsuite/23_containers/span/last_2_assert_neg.cc: Likewise.
* testsuite/23_containers/span/last_assert_neg.cc: Likewise.
* testsuite/23_containers/span/last_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_2_assert_neg.cc:
Likewise.
* testsuite/23_containers/span/subspan_3_assert_neg.cc:
Likewise.
* testsuite/23_containers/span/subspan_4_assert_neg.cc:
Likewise.
* testsuite/23_containers/span/subspan_5_assert_neg.cc:
Likewise.
* testsuite/23_containers/span/subspan_6_assert_neg.cc:
Likewise.
* testsuite/23_containers/span/subspan_assert_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_neg.cc: Likewise.
* testsuite/23_containers/span/nodiscard.cc: New test.

(cherry picked from commit a92a434024c59f57dc24328d946f97a5e71cee94)

libstdc++: Fix a -Wsign-compare warning in std::list

libstdc++-v3/ChangeLog:

* include/bits/list.tcc (list::sort(Cmp)): Fix -Wsign-compare
warning for loop condition.

(cherry picked from commit 9bd194434acb47fac80aad45ed04039e0535d1fe)

libstdc++: Optimize std::to_array for trivial types [PR110167]

As reported in PR libstdc++/110167, std::to_array compiles extremely
slowly for very large arrays. It needs to instantiate a very large
specialization of std::index_sequence<N...> and then create a very large
aggregate initializer from the pack expansion. For trivial types we can
simply default-initialize the std::array and then use memcpy to copy the
values. For non-trivial types we need to use the existing
implementation, despite the compilation cost.

As also noted in the PR, using a generic lambda instead of the
__to_array helper compiles faster since gcc-13. It also produces
slightly smaller code at -O1, due to additional inlining. The code at
-Os, -O2 and -O3 seems to be the same. This new implementation requires
__cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).
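
The trivial-type path described above can be sketched as follows. This is
an illustration under simplifying assumptions, not the code in
<array>: to_array_trivial is a hypothetical name, and the sketch handles
neither constant evaluation nor non-trivial types.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper covering only the trivial-type case.
template <typename T, std::size_t N>
std::array<typename std::remove_cv<T>::type, N> to_array_trivial(T (&a)[N])
{
  static_assert(std::is_trivially_copyable<T>::value
                && std::is_trivially_default_constructible<T>::value,
                "this sketch only covers trivial element types");
  // Default-initialize, then copy all N elements at once.  No
  // index_sequence of size N and no N-element initializer are needed.
  std::array<typename std::remove_cv<T>::type, N> out;
  std::memcpy(out.data(), a, sizeof(a));
  return out;
}
```

Because the function body no longer depends on a pack of N indices, the
compile-time cost should not grow with the array length in the way the
pack expansion does.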

libstdc++-v3/ChangeLog:

PR libstdc++/110167
* include/std/array (to_array): Initialize arrays of trivial
types using memcpy. For non-trivial types, use lambda
expressions instead of a separate helper function.
(__to_array): Remove.
* testsuite/23_containers/array/creation/110167.cc: New test.

(cherry picked from commit 960de5dd886572711ef86fa1e15e30d3810eccb9)

Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

The problem here is that merge_truthop_with_opposite_arm would
use the type of the result of the comparison rather than the operands
of the comparison to figure out if we are honoring NaNs.
This fixes that oversight and now we get the correct results in this
case.
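
Why the operand type matters can be seen from a small sketch (the
function names are hypothetical): a comparison yields an integral
result, but whether it may be replaced by its opposite depends on
whether the floating-point operands can be NaN.

```cpp
#include <limits>

// For ordinary numbers these two functions agree.  With a NaN operand
// every ordered comparison is false, so they differ.
inline bool not_less(double a, double b) { return !(a < b); }
inline bool greater_equal(double a, double b) { return a >= b; }

// Returns true when the two forms disagree for the given operands.
inline bool forms_disagree(double a, double b)
{
  return not_less(a, b) != greater_equal(a, b);
}
```

Assuming the default floating-point semantics (no -ffast-math or
-ffinite-math-only), forms_disagree is false for ordinary operands and
true when either operand is NaN.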

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/95351

gcc/ChangeLog:

* fold-const.cc (merge_truthop_with_opposite_arm): Use
the type of the operands of the comparison and not the type
of the comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/float_opposite_arm-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 31ce2e993d09dcad1ce139a2848a28de5931056d)

Reject -fno-multiflags [PR114314]

When -fmultiflags option support was added in r13-3693-g6b1a2474f9e422,
it accidentally allowed -fno-multiflags which then would pass on to cc1.
This fixes that oversight.

Committed as obvious after bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

PR driver/114314
* common.opt (fmultiflags): Add RejectNegative.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit c4e5789cede6974b6483c0f82069ff80b5a547e4)

Daily bump.

libgfortran: [PR114304] Revert portion of PR105473 change.

PR libfortran/105473
PR libfortran/114304

libgfortran/ChangeLog:

* io/list_read.c (eat_separator): Remove check for decimal
point mode and semicolon used as a separator. Removes
the regression.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr105473.f90: Add additional checks to address
the case of semicolon at the end of a line.

(cherry picked from commit 0c179654c3170749f3fb3232f2442fcbc99bffbb)

Daily bump.

d: Fix -fpreview=in ICEs with forward referenced parameter [PR112285]

The way that the target hook preferPassByRef is implemented, it relied
on the GCC "back-end" tree type to determine whether or not to use `ref'
ABI for D `in' parameters; e.g: prefer by value if it is expected that
the target will pass the type around in registers.

Building the GCC tree type depends on the AST type being complete - all
semantic processing is finished - but as this hook is called from the
front-end, this will not be the case for forward referenced or
self-referencing types.

The consensus in upstream is that `in' parameters should always be
implicitly `ref', but as the front-end does not yet support all types
being rvalue references, limit this to just static arrays and structs.

PR d/112285
PR d/112290

gcc/d/ChangeLog:

* d-target.cc (Target::preferPassByRef): Return true for all static
array and struct types.

gcc/testsuite/ChangeLog:

* gdc.dg/pr112285.d: New test.
* gdc.dg/pr112290.d: New test.
* gdc.test/compilable/previewin.d: Adjust testcase.

(cherry picked from commit a84b98c62d90bf9e8b01038f624a62725e6a44db)

Daily bump.

LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be
implemented through "ll.w+sc.w". The instruction sequence has to determine
whether two registers are equal.
Since LoongArch's comparison instructions do not distinguish between 32-bit
and 64-bit, both operand registers being compared must be sign-extended.
One of the operand registers is loaded from memory by the "ll.w" instruction,
which guarantees that it is sign-extended.
However, the value of the other operand register is not guaranteed to be
sign-extended.
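
The mismatch can be modelled in a few lines of C++. This is only an
illustration of the arithmetic, not of the machine description;
sign_extend32, zero_extend32 and regs_equal are hypothetical names, and
the 64-bit integers stand in for registers.

```cpp
#include <cstdint>

// Model of a 32-bit value held in a 64-bit register after sign
// extension, as produced by a sign-extending 32-bit load.
inline std::int64_t sign_extend32(std::uint32_t v)
{
  return static_cast<std::int64_t>(static_cast<std::int32_t>(v));
}

// Model of the same 32-bit value held without sign extension.
inline std::int64_t zero_extend32(std::uint32_t v)
{
  return static_cast<std::int64_t>(v);
}

// A full-width comparison, as performed when the compare instruction
// does not distinguish 32-bit from 64-bit operands.
inline bool regs_equal(std::int64_t a, std::int64_t b)
{
  return a == b;
}
```

For a value with bit 31 set, such as 0x80000000, the two register images
differ, so the full-width comparison reports a mismatch even though the
32-bit values are equal.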

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_cas_value_strong<mode>):
In loongarch64, a sign extension operation is added when
operands[2] is a register operand and the mode is SImode.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/atomic-cas-int.C: New test.

(cherry picked from commit 3a3fbec0a4d3f36de58df9ef0b3992a3ffb359c2)

Daily bump.

libgfortran: [PR105473] Fix checks for decimal='comma'.

PR libfortran/105473

libgfortran/ChangeLog:

* io/list_read.c (eat_separator): Reject comma as a
separator when it is being used as a decimal point.
(parse_real): Reject a '.' when it should be a comma.
(read_real): Likewise.
* io/read.c (read_f): Add more checks for ',' and '.'
conditions.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr105473.f90: New test.

(cherry picked from commit a71d87431d0c4e04a402ef6566be090c470b2b53)

Daily bump.

Fix bogus error on allocator for array type with Dynamic_Predicate

This is a regression present on all active branches: the compiler gives
a bogus error on an allocator for an unconstrained array type declared
with a Dynamic_Predicate because Apply_Predicate_Check is invoked directly
on a subtype reference, which it cannot handle.

This moves the check to the resulting access value (after dereference) like
in Expand_Allocator_Expression.

gcc/ada/
PR ada/113979
* exp_ch4.adb (Expand_N_Allocator): In the subtype indication case,
remove call to Apply_Predicate_Check.

gcc/testsuite/
* gnat.dg/predicate15.adb: New test.

Daily bump.

Fortran: do not evaluate polymorphic functions twice in assignment [PR114012]

PR fortran/114012

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Evaluate non-trivial
arguments just once before assigning to an unlimited polymorphic
dummy variable.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr114012.f90: New test.

(cherry picked from commit 2f71e801ad0bb1f620334aadbd7c99cc4efe6309)

Fortran: ALLOCATE statement, SOURCE/MOLD expressions with subrefs [PR114024]

PR fortran/114024

gcc/fortran/ChangeLog:

* trans-stmt.cc (gfc_trans_allocate): When a source expression has
substring references, part-refs, or %re/%im inquiries, wrap the
entity in parentheses to force evaluation of the expression.

gcc/testsuite/ChangeLog:

* gfortran.dg/allocate_with_source_27.f90: New test.
* gfortran.dg/allocate_with_source_28.f90: New test.

Co-Authored-By: Harald Anlauf <anlauf@gmx.de>
(cherry picked from commit 80d126ba99f4b9bc64d4861b3c4bae666497f2d4)

Daily bump.

SH: Fix 101737

gcc/ChangeLog:
PR target/101737
* config/sh/sh.cc (sh_is_nott_insn): Handle case where the input
is not an insn, but e.g. a code label.

d: Fix gdc -O2 -mavx generates misaligned vmovdqa instruction [PR114171]

PR d/114171

gcc/d/ChangeLog:

* d-codegen.cc (lower_struct_comparison): Keep alignment of original
type in reinterpret cast for comparison.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr114171.d: New test.

(cherry picked from commit 623f52775e677bb3d6e9e7ef97196741dd904b1e)

Daily bump.

d: Fix callee destructor call invalidates the live object [PR113758]

When generating the argument, check the isCalleeDestroyingArgs hook, and
force a TARGET_EXPR to be created if true, so that a reference to the
live object isn't passed directly to the function that runs dtors.

When instead dealing with caller running destructors, two temporaries
were being generated, one explicit temporary generated by the D
front-end, and another implicitly by the code generator. This has been
reduced to one by setting DECL_VALUE_EXPR on the explicit temporary to
bind it to the implicit slot created for the TARGET_EXPR, as that has
the shorter lifetime of the two.

PR d/113758

gcc/d/ChangeLog:

* d-codegen.cc (d_build_call): Force a TARGET_EXPR when callee
destroys its arguments.
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Set
SET_DECL_VALUE_EXPR on the temporary variable to make it a placeholder
for the TARGET_EXPR_SLOT.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr113758.d: New test.

(cherry picked from commit 3c57b1c12a8e34d50bdf6aaf44146760db6d1b33)

d: Fix internal compiler error: in make_import, at d/imports.cc:48 [PR113125]

The cause of the ICE was that TYPE_DECLs were only being generated for
structs with members, not opaque structs.

PR d/113125

gcc/d/ChangeLog:

* types.cc (TypeVisitor::visit (TypeStruct *)): Generate TYPE_DECL and
apply UDAs to opaque struct declarations.

gcc/testsuite/ChangeLog:

* gdc.dg/imports/pr113125.d: New test.
* gdc.dg/pr113125.d: New test.

(cherry picked from commit b0efb1c35724e3332ee5993976efb98200c1a154)

calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR114136]

On Tue, Feb 27, 2024 at 04:41:32PM +0000, Richard Earnshaw wrote:
> On Arm the PR107453 change is causing all anonymous arguments to be passed on the
> stack, which is incorrect per the ABI.  On a target that uses
> 'pretend_outgoing_vararg_named', why is it correct to set n_named_args to
> zero?  Is it enough to guard both the statements you've added with
> !targetm.calls.pretend_outgoing_args_named?

The TYPE_NO_NAMED_ARGS_STDARG_P functions (C23 fns like void foo (...) {})
have NULL type_arg_types, so the list_length (type_arg_types) isn't done for
it, but it should be handled as if it was non-NULL but list length was 0.

So, for the
  if (type_arg_types != 0)
    n_named_args
      = (list_length (type_arg_types)
         /* Count the struct value address, if it is passed as a parm.  */
         + structure_value_addr_parm);
  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
    n_named_args = 0;
  else
    /* If we know nothing, treat all args as named.  */
    n_named_args = num_actuals;
case, I think guarding it by any target hooks is wrong, although
I guess it should have been
    n_named_args = structure_value_addr_parm;
instead of
    n_named_args = 0;

For the second
  if (type_arg_types != 0
      && targetm.calls.strict_argument_naming (args_so_far))
    ;
  else if (type_arg_types != 0
           && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
    /* Don't include the last named arg.  */
    --n_named_args;
  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
    n_named_args = 0;
  else
    /* Treat all args as named.  */
    n_named_args = num_actuals;
I think we should treat those as if type_arg_types was non-NULL
with 0 elements in the list, except the --n_named_args would for
!structure_value_addr_parm lead to n_named_args = -1, I think we want
0 for that case.

2024-03-01  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/114136
* calls.cc (expand_call): For TYPE_NO_NAMED_ARGS_STDARG_P set
n_named_args initially before INIT_CUMULATIVE_ARGS to
structure_value_addr_parm rather than 0, after it don't modify
it if strict_argument_naming and clear only if
!pretend_outgoing_varargs_named.

(cherry picked from commit b5377928a2a5cd2a79eda59e2eba7d0511bf7566)

function: Fix another TYPE_NO_NAMED_ARGS_STDARG_P spot

When looking at PR114175 (although that bug seems to be now a riscv backend
bug), I've noticed that for the TYPE_NO_NAMED_ARGS_STDARG_P functions which
return value through hidden reference, like
  #include <stdarg.h>

  struct S { char a[64]; };
  int n;

  struct S
  foo (...)
  {
    struct S s = {};
    va_list ap;
    va_start (ap);
    for (int i = 0; i < n; ++i)
      if ((i & 1))
        s.a[0] += va_arg (ap, double);
      else
        s.a[0] += va_arg (ap, int);
    va_end (ap);
    return s;
  }
we were incorrectly calling assign_parms_setup_varargs twice, once
at the start of the function and once in
      if (cfun->stdarg && !DECL_CHAIN (parm))
        assign_parms_setup_varargs (&all, &data, false);
where parm is the last and only "named" parameter.

The first call, guarded with TYPE_NO_NAMED_ARGS_STDARG_P, was added in
r13-3549 and is needed for int bar (...) etc. functions using
va_start/va_arg/va_end, otherwise the
  FOR_EACH_VEC_ELT (fnargs, i, parm)
in which the other call is will not iterate at all.  But we shouldn't
be doing that if we have the hidden return pointer.

With the following patch on the above testcase with -O0 -std=c23 the
assembly difference is:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rbx
subq $192, %rsp
.cfi_offset 3, -24
- movq %rdi, -192(%rbp)
- movq %rsi, -184(%rbp)
- movq %rdx, -176(%rbp)
- movq %rcx, -168(%rbp)
- movq %r8, -160(%rbp)
- movq %r9, -152(%rbp)
- testb %al, %al
- je .L2
- movaps %xmm0, -144(%rbp)
- movaps %xmm1, -128(%rbp)
- movaps %xmm2, -112(%rbp)
- movaps %xmm3, -96(%rbp)
- movaps %xmm4, -80(%rbp)
- movaps %xmm5, -64(%rbp)
- movaps %xmm6, -48(%rbp)
- movaps %xmm7, -32(%rbp)
-.L2:
movq %rdi, -312(%rbp)
movq %rdi, -192(%rbp)
movq %rsi, -184(%rbp)
movq %rdx, -176(%rbp)
movq %rcx, -168(%rbp)
movq %r8, -160(%rbp)
movq %r9, -152(%rbp)
testb %al, %al
- je .L13
+ je .L12
movaps %xmm0, -144(%rbp)
movaps %xmm1, -128(%rbp)
movaps %xmm2, -112(%rbp)
movaps %xmm3, -96(%rbp)
movaps %xmm4, -80(%rbp)
movaps %xmm5, -64(%rbp)
movaps %xmm6, -48(%rbp)
movaps %xmm7, -32(%rbp)
-.L13:
+.L12:
plus some renumbering of labels later on which clearly shows
that because of this bug, we were saving all the registers twice
rather than once.  With -O2 -std=c23 some of it is DCEd, but we still get
subq $160, %rsp
.cfi_def_cfa_offset 168
- testb %al, %al
- je .L2
- movaps %xmm0, 24(%rsp)
- movaps %xmm1, 40(%rsp)
- movaps %xmm2, 56(%rsp)
- movaps %xmm3, 72(%rsp)
- movaps %xmm4, 88(%rsp)
- movaps %xmm5, 104(%rsp)
- movaps %xmm6, 120(%rsp)
- movaps %xmm7, 136(%rsp)
-.L2:
movq %rdi, -24(%rsp)
movq %rsi, -16(%rsp)
movq %rdx, -8(%rsp)
movq %rcx, (%rsp)
movq %r8, 8(%rsp)
movq %r9, 16(%rsp)
testb %al, %al
- je .L13
+ je .L12
movaps %xmm0, 24(%rsp)
movaps %xmm1, 40(%rsp)
movaps %xmm2, 56(%rsp)
movaps %xmm3, 72(%rsp)
movaps %xmm4, 88(%rsp)
movaps %xmm5, 104(%rsp)
movaps %xmm6, 120(%rsp)
movaps %xmm7, 136(%rsp)
-.L13:
+.L12:
difference, i.e. this time not all, but the floating point args
were conditionally all saved twice.

2024-03-01  Jakub Jelinek  <jakub@redhat.com>

* function.cc (assign_parms): Only call assign_parms_setup_varargs
early for TYPE_NO_NAMED_ARGS_STDARG_P functions if fnargs is empty.

(cherry picked from commit c6f5f773323ab689a665bc208c3b221db42fe624)

graphite: Fix non-INTEGER_TYPE integral comparison handling [PR114041]

The following testcases are miscompiled, because graphite ignores boolean,
enumerated or _BitInt comparisons and rewrites the code as if the comparisons
were always true or always false.

The INTEGER_TYPE checks were initially added in r6-2239 but at that point
it was both in add_conditions_to_domain and in parameter_index_in_region.
Later on the check was also added to stmt_simple_for_scop_p, and finally
r8-3931 changed the stmt_simple_for_scop_p check to INTEGRAL_TYPE_P
and turned the parameter_index_in_region -> assign_parameter_index_in_region
into INTEGRAL_TYPE_P assertion, but the add_conditions_to_domain check
for INTEGER_TYPE remained.

The following patch uses INTEGRAL_TYPE_P to complete the change.

2024-02-28 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/114041
* graphite-sese-to-poly.cc (add_conditions_to_domain): Check for
INTEGRAL_TYPE_P rather than INTEGER_TYPE.

* gcc.dg/graphite/run-id-pr114041-2.c: New test.

(cherry picked from commit d6479050ecef10fd5e67b4da989229e4cfac53ee)

testsuite: Add c23-stdarg-4.c test variant where all functions return large struct

I think we have no coverage for the case where structure_value_addr_parm and
TYPE_NO_NAMED_ARGS_STDARG_P are both true.  The
  if (type_arg_types != 0)
    n_named_args
      = (list_length (type_arg_types)
         /* Count the struct value address, if it is passed as a parm.  */
         + structure_value_addr_parm);
  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
    n_named_args = 0;
  else
    /* If we know nothing, treat all args as named.  */
    n_named_args = num_actuals;
code should probably have n_named_args = structure_value_addr_parm;
instead of n_named_args = 0;, this testcase is an attempt to see if
it is broken on any target.

2024-02-28  Jakub Jelinek  <jakub@redhat.com>

* gcc.dg/c23-stdarg-6.c: New test.

(cherry picked from commit dc30e24b76d570e13a71567a38f7594b104736bf)