
tree-optimization/118717 - store commoning vs. abnormals

When we sink common stores in cselim or the sink pass we have to
make sure to not introduce overlapping lifetimes for abnormals
used in the ref. The easiest is to avoid sinking stmts which
reference abnormals at all which is what the following does.

PR tree-optimization/118717
* tree-ssa-phiopt.cc (cond_if_else_store_replacement_1):
Do not common stores referencing abnormal SSA names.
* tree-ssa-sink.cc (sink_common_stores_to_bb): Likewise.

* gcc.dg/torture/pr118717.c: New testcase.

Add a unit test for random access in the file cache

v2: Remove extra {}

gcc/ChangeLog:

* input.cc (check_line): New.
(test_replacement): New function to test line caching.
(input_cc_tests): Call test_replacement

Size input line cache based on file size

While the input line cache size is now tunable, it's better if the compiler
auto tunes it. Otherwise large files needing random file access will
still have to search many lines to find the right lines.

Add support for allocating one line anchor per hundred input lines.
This means an overhead of ~235k per 1M input lines on 64bit, which
seems reasonable.
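
As a rough check of the overhead figure above, the following sketch
assumes each anchor holds three 64-bit fields (a line number plus start
and end offsets). The struct layout and the names line_anchor,
anchor_count and anchor_overhead_bytes are hypothetical and only serve
the arithmetic; the real structure in input.cc may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical anchor layout: three 64-bit fields, 24 bytes total.
   This is an assumption for illustration, not the layout in input.cc.  */
struct line_anchor
{
  uint64_t line_num;
  uint64_t start_pos;
  uint64_t end_pos;
};

/* One anchor per hundred input lines, as described in the commit.  */
enum { LINES_PER_ANCHOR = 100 };

/* Number of anchors needed for INPUT_LINES lines, rounded up.  */
size_t
anchor_count (size_t input_lines)
{
  return (input_lines + LINES_PER_ANCHOR - 1) / LINES_PER_ANCHOR;
}

/* Total bytes spent on anchors for INPUT_LINES lines.  */
size_t
anchor_overhead_bytes (size_t input_lines)
{
  return anchor_count (input_lines) * sizeof (struct line_anchor);
}
```

Under these assumptions 1M input lines need 10000 anchors, or 240000
bytes, which is in line with the ~235k quoted above.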

gcc/ChangeLog:

PR preprocessor/118168
* input.cc (file_cache_slot::get_next_line): Implement
dynamic sizing of m_line_record based on input length.
* params.opt (param_file_cache_lines): Set to 0 to size
dynamically.

Remove m_total_lines support from input cache

With the new cache maintenance algorithm we don't need the
maximum number of lines anymore. Remove all the code for that.

gcc/ChangeLog:

PR preprocessor/118168
* input.cc (total_lines_num): Remove.
(file_cache_slot::evict): Ditto.
(file_cache_slot::create): Ditto.
(file_cache_slot::set_content): Ditto.
(file_cache_slot::file_cache_slot): Ditto.
(file_cache_slot::dump): Ditto.

Rebalance file_cache input line cache dynamically

The input context file_cache maintains an array of anchors
to speed up accessing lines before the previous line.
The array has a fixed upper size and the algorithm relies
on the linemap reporting the maximum number of lines in the file
in advance to compute the position of each anchor in the cache.

This doesn't work for C which doesn't know the maximum number
of lines before the file has finished parsing. The code
has a fallback for this, but it is quite inefficient and
effectively defeats the cache, so many accesses have to
go through most of the input buffer to compute line
boundaries. For large files this can be very costly
as demonstrated in PR118168.

Use a different algorithm to maintain the cache without
needing the maximum number of lines in advance. When the cache
runs out of entries and the gap to the last line anchor gets
too large, prune every second entry in the cache. This maintains
even spacing of the line anchors without requiring the maximum
index.
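
The pruning idea can be sketched roughly as follows. This is a
simplified illustration, not the code in input.cc: the names
anchor_cache, anchor_cache_note_line and anchor_cache_find, the tiny
MAX_ANCHORS cap and the spacing field are all assumptions made for the
example. It also shows a binary search over the anchors, since the
anchor line numbers stay sorted.

```c
#include <stddef.h>

enum { MAX_ANCHORS = 8 };	/* Tiny cap, for illustration only.  */

struct anchor_cache
{
  size_t line[MAX_ANCHORS];	/* Anchor line numbers, ascending.  */
  size_t count;			/* Number of anchors in use.  */
  size_t spacing;		/* Minimum gap between anchors.  */
};

void
anchor_cache_init (struct anchor_cache *c)
{
  c->count = 0;
  c->spacing = 1;
}

/* Note that line LINE_NUM was read.  Must be called with increasing
   line numbers.  */
void
anchor_cache_note_line (struct anchor_cache *c, size_t line_num)
{
  /* The gap to the last anchor is still small enough: nothing to do.  */
  if (c->count > 0 && line_num - c->line[c->count - 1] < c->spacing)
    return;

  if (c->count == MAX_ANCHORS)
    {
      /* Out of entries: keep entries 0, 2, 4, ... and double the
	 spacing, so the anchors stay evenly spread.  */
      size_t kept = 0;
      for (size_t i = 0; i < c->count; i += 2)
	c->line[kept++] = c->line[i];
      c->count = kept;
      c->spacing *= 2;
      if (line_num - c->line[c->count - 1] < c->spacing)
	return;
    }

  c->line[c->count++] = line_num;
}

/* Return the index of the last anchor at or before LINE_NUM, or -1 if
   there is none.  Binary search over the sorted anchors.  */
ptrdiff_t
anchor_cache_find (const struct anchor_cache *c, size_t line_num)
{
  size_t lo = 0, hi = c->count;	/* Search in [lo, hi).  */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c->line[mid] <= line_num)
	lo = mid + 1;
      else
	hi = mid;
    }
  return (ptrdiff_t) lo - 1;
}
```

Feeding lines 1 to 20 into this sketch prunes twice and leaves anchors
at lines 1, 5, 9, 13 and 17. A lookup then only has to scan forward
from the nearest preceding anchor instead of from the start of the
buffer.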

For the original PR this moves the overhead of enabling
-Wmisleading-indentation to 32% with the default cache size.
With a 10k entry cache it becomes noise.

  cc1 -O0 -fsyntax-only mypy.c   -quiet  ran
    1.03 ± 0.05 times faster than cc1 -O0 -fsyntax-only  mypy.c   -quiet -Wmisleading-indentation --param=file-cache-lines=10000
    1.09 ± 0.08 times faster than cc1 -O0 -fsyntax-only  mypy.c   -quiet -Wmisleading-indentation --param=file-cache-lines=1000
    1.32 ± 0.07 times faster than cc1 -O0 -fsyntax-only  mypy.c   -quiet -Wmisleading-indentation

The code could be further optimized, e.g. use the vectorized
line search functions the preprocessor uses.

Also it seems the input cache always reads the whole file into
memory, so perhaps it should just be using file mmap if possible.

gcc/ChangeLog:

PR preprocessor/118168
* input.cc (file_cache_slot::get_next_line): Use new algorithm
to maintain
(file_cache_slot::read_line_num): Use binary search for lookup.

Add tunables for input buffer

The input machinery to read the source code independent of the lexer
has a range of hard coded maximum array sizes that can impact performance.
Make them tunable.

input.cc is part of libcommon so it cannot directly access params
without a level of indirection.

gcc/ChangeLog:

PR preprocessor/118168
* input.cc (file_cache::tune): New function.
* input.h (class file_cache): Make tunables non const.
* params.opt: Add new tunables.
* toplev.cc (toplev::main): Initialize input buffer context
tunables.

Daily bump.

PR modula2/117411 Request for documentation to include exception example

This patch adds a new section to the gm2 documentation and new
corresponding testcode to the regression testsuite.

gcc/ChangeLog:

PR modula2/117411
* doc/gm2.texi (Exception handling): New section.
(The ISO system module): Add description of COFF_T.
(Assembler language): Tidy up last sentence.

gcc/testsuite/ChangeLog:

PR modula2/117411
* gm2/iso/run/pass/except9.mod: New test.
* gm2/iso/run/pass/lazyunique.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

options: Adjust cl_optimization_compare to avoid checking ICE [PR115913]

At the end of a sequence like:
#pragma GCC push_options
...
#pragma GCC pop_options

the handler for pop_options calls cl_optimization_compare() (as generated by
optc-save-gen.awk) to make sure that all global state has been restored to
the value it had prior to the push_options call. The verification is
performed for almost all entries in the global_options struct. This leads to
unexpected checking asserts, as discussed in the PR, in case the state of
warnings-related options has been intentionally modified in between
push_options and pop_options via a call to #pragma GCC diagnostic. Address
that by skipping the verification for CL_WARNING-flagged options.
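
A sequence of the kind described might look like the sketch below. It
is not the new testcase; the warning option and the function name
ignore_arg are chosen only for illustration, and whether this exact
input tripped the checking assert before the fix is an assumption.

```c
/* Save the current option state.  */
#pragma GCC push_options

/* Intentionally change warning state inside the push/pop region.  */
#pragma GCC diagnostic ignored "-Wunused-parameter"

/* Hypothetical function whose parameter is deliberately unused.  */
int
ignore_arg (int unused)
{
  return 0;
}

/* Restore the saved options.  The comparison done here is the one
   that now skips CL_WARNING-flagged options.  */
#pragma GCC pop_options
```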

gcc/ChangeLog:

PR middle-end/115913
* optc-save-gen.awk (cl_optimization_compare): Skip options with
CL_WARNING flag.

gcc/testsuite/ChangeLog:

PR middle-end/115913
* c-c++-common/cpp/pr115913.c: New test.

Daily bump.

x86: Add a test for PR rtl-optimization/111673

Add a test for the target independent bug, PR rtl-optimization/111673.

PR rtl-optimization/111673
* gcc.target/i386/pr111673.c: New file.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

x86: Change "if (TARGET_X32 ...)" back to "else if (TARGET_X32 ...)"

Update

commit dd6247cb8fc11a15e23e949092f89d24ff329209
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Jan 31 12:29:04 2025 +0800

x86: Handle TARGET_INDIRECT_BRANCH_REGISTER for -fno-plt

to change "if (TARGET_X32 ...)" back to "else if (TARGET_X32 ...)".

PR target/118713
* config/i386/i386-expand.cc (ix86_expand_call): Change "if
(TARGET_X32 ...)" back to "else if (TARGET_X32 ...)".

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

PR modula2/118703 Abort compiling m2pim_NumberIO_BinToStr

This patch builds access to the gcc builtins clz, clzl, clzll,
ctz, ctzl and ctzll within m2builtins.cc. The patch provides
modula2 api access to clz, clzll, ctz and ctzll through the
Builtins definition module. This PR was raised because of
PR118689.

gcc/m2/ChangeLog:

PR modula2/118703
* gm2-gcc/m2builtins.cc (define_builtin_gcc): New function.
(m2builtins_init): Call define_builtin_gcc.
* gm2-libs/Builtins.def (clz): New procedure function.
(clzll): Ditto.
(ctz): Ditto.
(ctzll): Ditto.
* gm2-libs/Builtins.mod (clz): New procedure function.
(clzll): Ditto.
(ctz): Ditto.
(ctzll): Ditto.
* gm2-libs/cbuiltin.def (clz): New procedure function.
(clzll): Ditto.
(ctz): Ditto.
(ctzll): Ditto.

gcc/testsuite/ChangeLog:

PR modula2/118703
* gm2/builtins/run/pass/testbitfns.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

x86: Handle TARGET_INDIRECT_BRANCH_REGISTER for -fno-plt

If TARGET_INDIRECT_BRANCH_REGISTER is true, indirect calls and jumps should
use a register, not memory. Update the Bs, Bw and Bz constraints to disable
indirect calls via memory if TARGET_INDIRECT_BRANCH_REGISTER is true, change
the x32 call via GOT slot to a call via register, and also disable sibcalls
via memory.

gcc/

PR target/118713
* config/i386/constraints.md (Bs): Always disable if
TARGET_INDIRECT_BRANCH_REGISTER is true.
(Bw): Likewise.
* config/i386/i386-expand.cc (ix86_expand_call): Force indirect
call via register for x32 GOT slot call if
TARGET_INDIRECT_BRANCH_REGISTER is true.
* config/i386/i386-protos.h (ix86_nopic_noplt_attribute_p): New.
* config/i386/i386.cc (ix86_nopic_noplt_attribute_p): Make it
global.
* config/i386/i386.md (*call_got_x32): Disable indirect call via
memory for TARGET_INDIRECT_BRANCH_REGISTER.
(*call_value_got_x32): Likewise.
(*sibcall_value_pop_memory): Likewise.
* config/i386/predicates.md (constant_call_address_operand):
Return false if both TARGET_INDIRECT_BRANCH_REGISTER and
ix86_nopic_noplt_attribute_p are true.

gcc/testsuite/

PR target/118713
* gcc.target/i386/pr118713-1-x32.c: New test.
* gcc.target/i386/pr118713-1.c: Likewise.
* gcc.target/i386/pr118713-2-x32.c: Likewise.
* gcc.target/i386/pr118713-2.c: Likewise.
* gcc.target/i386/pr118713-3-x32.c: Likewise.
* gcc.target/i386/pr118713-3.c: Likewise.
* gcc.target/i386/pr118713-4-x32.c: Likewise.
* gcc.target/i386/pr118713-4.c: Likewise.
* gcc.target/i386/pr118713-5-x32.c: Likewise.
* gcc.target/i386/pr118713-5.c: Likewise.
* gcc.target/i386/pr118713-6-x32.c: Likewise.
* gcc.target/i386/pr118713-6.c: Likewise.
* gcc.target/i386/pr118713-7-x32.c: Likewise.
* gcc.target/i386/pr118713-7.c: Likewise.
* gcc.target/i386/pr118713-8-x32.c: Likewise.
* gcc.target/i386/pr118713-8.c: Likewise.
* gcc.target/i386/pr118713-9-x32.c: Likewise.
* gcc.target/i386/pr118713-9.c: Likewise.
* gcc.target/i386/pr118713-10-x32.c: Likewise.
* gcc.target/i386/pr118713-10.c: Likewise.
* gcc.target/i386/pr118713-11-x32.c: Likewise.
* gcc.target/i386/pr118713-11.c: Likewise.
* gcc.target/i386/pr118713-12-x32.c: Likewise.
* gcc.target/i386/pr118713-12.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

sarif-replay: support "cached" logical locations [§3.33.3]

Some SARIF files offload most of the properties within logical locations
in the results to an array of "cached" instances in
theRun.logicalLocations, so the information can be consolidated (and to
support the "parentIndex" property, which is PR 116176).

Support such files in sarif-replay.

gcc/ChangeLog:
* libsarifreplay.cc (sarif_replayer::handle_run_obj): Pass run to
handle_result_obj.
(sarif_replayer::handle_result_obj): Add run_obj param and pass it
to handle_location_object and handle_thread_flow_object.
(sarif_replayer::handle_thread_flow_object): Add run_obj param and
pass it to handle_thread_flow_location_object.
(sarif_replayer::handle_thread_flow_location_object): Add run_obj
param and pass it to handle_location_object.
(sarif_replayer::handle_location_object): Add run_obj param and
pass it to handle_logical_location_object.
(sarif_replayer::handle_logical_location_object): Add run_obj
param. If the run_obj is non-null and has "logicalLocations",
then use these "cached" logical locations if we see an "index"
property, as per §3.33.3

gcc/testsuite/ChangeLog:
* sarif-replay.dg/2.1.0-invalid/3.33.3-index-out-of-range.sarif:
New test.
* sarif-replay.dg/2.1.0-valid/spec-example-4.sarif: Update expected
output to reflect that we now find the function name for the
events in the path.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Ada: Fix segfault on uninitialized variable as operand of primitive operator

...of derived real type. It comes from an unexpected internal adjustment.

gcc/ada/
PR ada/118712
* sem_warn.adb (Check_References): Deal with small adjustments of
references.

gcc/testsuite/
* gnat.dg/warn33.adb: New test.
* gnat.dg/warn33_pkg.ads: New helper.

x86: Add a -mstack-protector-guard=global test

Verify that -mstack-protector-guard=global works on x86. Default stack
protector uses TLS. -mstack-protector-guard=global uses a global variable,
__stack_chk_guard, instead of TLS.

* gcc.target/i386/ssp-global.c: New file.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

[committed][PR tree-optimization/114277] Fix missed optimization for multiplication against boolean value

Andrew, Raphael and I have all poked at it in various ways over the last year
or so.  I think when Raphael and I first looked at it I sent us down a bit of
a rathole.

In particular it's odd that we're using a multiply to implement a select and it
seemed like recognizing the idiom and rewriting into a conditional move was the
right path.  That looked reasonably good for the test, but runs into problems
with min/max detection elsewhere.

I think that initial investigation somewhat polluted our thinking.  The
regression can be fixed with a fairly simple match.pd pattern.

Essentially we want to handle

x * (x || b) -> x
x * !(x || b) -> 0
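
A quick way to convince oneself that both identities hold for integers
is to check them by brute force over a small range. The function names
below are made up for this sketch and are not part of the patch or the
testcase.

```c
#include <stdbool.h>

/* Left-hand side of the first identity: should always equal X.  */
int
select_keep (int x, int b)
{
  return x * (x || b);
}

/* Left-hand side of the second identity: should always equal 0.  */
int
select_zero (int x, int b)
{
  return x * !(x || b);
}

/* Check both identities for all X and B in [LO, HI].  */
bool
identities_hold (int lo, int hi)
{
  for (int x = lo; x <= hi; x++)
    for (int b = lo; b <= hi; b++)
      if (select_keep (x, b) != x || select_zero (x, b) != 0)
	return false;
  return true;
}
```

If x is nonzero, (x || b) is 1 and the product is x; if x is zero, the
product is zero either way, which again equals x. The negated form is
zero in both cases for the same reason.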

There are simplifications that can be made for "&&" cases, but I haven't seen
them in practice.  Rather than drop in untested patterns, I'm leaving that as a
future todo.

My original was two match.pd patterns.  Andrew combined them into a single
pattern.  I've made this conditional on GIMPLE as an earlier version that
simplified to a conditional move showed that when applied on GENERIC we could
drop an operand with a side effect which is clearly not good.

I've bootstrapped and regression tested this on x86.  I've also tested on the
various embedded targets in my tester.

PR tree-optimization/114277
gcc/
* match.pd (a * (a || b) -> a): New pattern.
(a * !(a || b) -> 0): Likewise.

gcc/testsuite
* gcc.target/i386/pr114277.c: New test.
* gcc.target/riscv/pr114277.c: Likewise.

Co-author:  Andrew Pinski <quic_apinski@quicinc.com>

icf: Compare call argument types in certain cases and asm operands [PR117432]

compare_operand uses operand_equal_p under the hood, which e.g. for
INTEGER_CSTs will just match the values regardless of their types.
Now, in many cases comparing the type is redundant; if we have
  x_2 = y_3 + 1;
we've already compared the type for the lhs and also for rhs1, there won't
be any surprises on rhs2.
As noted in the PR, there are cases where the type of the operand is the
sole place of information and we don't want to ICF merge functions if the
types differ.
One case is stdarg functions, arguments passed to ..., it is different
if we pass 1, 1L, 1LL.
Another case are the K&R unprototyped functions (sure, gone in C23).
And yet another case are inline asm operands, "r" (1) is different from "r"
(1L) from "r" (1LL).
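
To illustrate why the argument type matters for stdarg functions, here
is a small sketch. The function names are hypothetical and it is not
taken from the testcases: the callee reads every variadic argument as
long long, so a caller has to pass 1LL rather than 1, even though the
values compare equal.

```c
#include <stdarg.h>
#include <stddef.h>

/* Sum COUNT variadic arguments.  Each one must be passed as long long;
   passing a plain int here would be undefined behavior.  */
long long
sum_long_longs (int count, ...)
{
  va_list ap;
  long long total = 0;
  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg (ap, long long);
  va_end (ap);
  return total;
}

/* The constants 1 and 1LL have the same value but different types.  */
size_t
size_of_int_one (void)
{
  return sizeof (1);
}

size_t
size_of_long_long_one (void)
{
  return sizeof (1LL);
}
```

Two otherwise identical callers that differ only in passing 1 versus
1LL to such a function are therefore not interchangeable.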

So, the following patch determines based on lack of fntype (e.g. for
internal functions), or on !prototype_p, or on stdarg_p (in that case
using number of named arguments) which arguments need to have type checked
and does that, plus compares types on inline asm operands (maybe it would be
enough to do that just for input operands but we have just a routine to
handle both and I didn't feel we need to differentiate).

Furthermore, I've noticed fntype{1,2} isn't actually compared if it is a
direct call (gimple_call_fndecl is non-NULL).  That is wrong too, we could
have
  void (*fn) (int, long long) = (void (*) (int, long long)) foo;
  fn (1, 1LL);
in one case and
  void (*fn) (long long, int) = (void (*) (long long, int)) foo;
  fn (1LL, 1);
in another, both folded into a direct call of foo with different
gimple_call_fntype.  Sure, one of them would be UB at runtime (or both), but
what if we ICF merge it into the one that is UB at runtime
and the program actually calls only the correct one?

2025-02-01  Jakub Jelinek  <jakub@redhat.com>

PR ipa/117432
* ipa-icf-gimple.cc (func_checker::compare_asm_inputs_outputs):
Also return_false if operands have incompatible types.
(func_checker::compare_gimple_call): Check fntype1 vs. fntype2
compatibility for all non-internal calls and assume fntype1 and
fntype2 are non-NULL for those.  For calls to non-prototyped
calls or for stdarg_p functions after the last named argument (if any)
check type compatibility of call arguments.

* gcc.c-torture/execute/pr117432.c: New test.
* gcc.target/i386/pr117432.c: New test.

c++: check_flexarray fixes [PR117516]

On the pr117516.C testcase check_flexarrays and its helper functions
have exponential complexity, plus it reports the same bug over and over
again in some cases instead of reporting perhaps other bugs.
The functions want to diagnose flexible array member (and strangely [0]
arrays too) followed by some other non-empty or array members in the same
structure, or followed by other non-empty or array members in a containing
structure (any of them), or flexible array members/[0] arrays in structures
with no other non-empty members, or those nested in other structures.
Strangely it doesn't complain if flexible array member is in a structure
used in an array.

As can be seen on e.g. the flexary41.C test, it keeps reporting the
same bug over and over:
flexary41.C:5:24: error: flexible array member ‘A::b’ not at end of ‘struct A’
flexary41.C:5:24: error: flexible array member ‘A::b’ not at end of ‘struct B’
flexary41.C:5:24: error: flexible array member ‘A::b’ not at end of ‘struct C’
flexary41.C:5:24: error: flexible array member ‘A::b’ not at end of ‘struct D’
flexary41.C:13:39: error: flexible array member ‘E::<unnamed struct>::n’ not at end of ‘struct E’
flexary41.C:18:23: error: flexible array member ‘H::t’ not at end of ‘struct K’
flexary41.C:25:36: note: next member ‘int K::ab’ declared here
flexary41.C:25:8: note: in the definition of ‘struct K’
The bug that A::b is followed by A::c is one bug reported 4 times, while it
doesn't report the other bugs, that B::e flexarray is followed by B::f
and that C::h flexarray is followed by C::i.
That is because it always walks all the structures/unions of all the members
and just finds the first flexarray in there.

Now, this has horrible complexity plus it doesn't seem really useful to
users.  So, for cases where a flexible array member is followed by a
non-empty other member in the same structure, the following patch just
reports it once when finalizing that structure, and otherwise just recurses
in structures solely into the last member, so that it can report cases like
struct X { int a; int b[]; };
struct Y { X c; int d; };
or
struct Z { X c; };
i.e. correct use of flexarray in X but following it by another member in Y
or just nesting it (the former is error, the latter pedwarn as before).
By only looking at the last member for structures we get rid of the complexity.

Note, the patch doesn't do anything about unions, I think we still could
spend a lot of time compiling.
struct S { char s; };
union U0 { S a, b; };
union U1 { union U0 a, b; };
union U2 { union U1 a, b; };
...
union U32 { union U31 a, b; };
struct T { union U32 a; int b; };
Not really sure what we could do about that, all the elements are "last"
(but admittedly I haven't studied in detail how the original code worked
in union, there is fmem->after[pun] where pun is whether it is somewhere
inside of a union).  Perhaps in a hash table marking unions which don't have
any flexarrays at the end, nested or not, so that we don't walk them again?
Plus if we find some with flexarray at the end, maybe there is no point
to look other union members?  In any case, I think that is less severe,
because people usually don't nest unions deeply.

2025-02-01  Jakub Jelinek  <jakub@redhat.com>

PR c++/117516
* class.cc (field_nonempty_p): Formatting fixes.  Use
integer_zerop instead of tree_int_cst_equal with size_zero_node.
(struct flexmems_t): Change type of first member from tree to bool.
(find_flexarrays): Add nested_p argument.  Change pun argument type
from tree to bool, adjust uses.  Formatting fixes.  If BASE_P or
NESTED_P and T is RECORD_TYPE, start looking only at the last
non-empty or array FIELD_DECL.  Adjust recursive call, set first
if it was a nested call and found an array.
(diagnose_invalid_flexarray, diagnose_flexarrays, check_flexarrays):
Formatting fixes.

* g++.dg/ext/flexary9.C: Expect different wording of one of the
warnings and at a different line.
* g++.dg/ext/flexary19.C: Likewise.
* g++.dg/ext/flexary42.C: New test.
* g++.dg/other/pr117516.C: New test.

libstdc++: Fix flat_foo::insert_range for non-common ranges [PR118156]

This fixes flat_map/multimap::insert_range by just generalizing the
insert implementation to handle a heterogeneous iterator/sentinel pair.
I'm not sure we can do better than this, e.g. we can't implement it in
terms of the adapted containers' insert_range because that'd require two
passes over the range.

For flat_set/multiset, we can implement insert_range directly in terms
of the adapted container's insert_range. A fallback implementation
is also provided if insert_range isn't available, as is the case for
std::deque currently.

PR libstdc++/118156

libstdc++-v3/ChangeLog:

* include/std/flat_map (_Flat_map_impl::_M_insert): Generalized
version of insert taking a heterogeneous iterator/sentinel pair.
(_Flat_map_impl::insert): Dispatch to _M_insert.
(_Flat_map_impl::insert_range): Likewise.
(flat_map): Export _Flat_map_impl::insert_range.
(flat_multimap): Likewise.
* include/std/flat_set (_Flat_set_impl::insert_range):
Reimplement directly, not in terms of insert.
(flat_set): Export _Flat_set_impl::insert_range.
(flat_multiset): Likewise.
* testsuite/23_containers/flat_map/1.cc (test06): New test.
* testsuite/23_containers/flat_multimap/1.cc (test06): New test.
* testsuite/23_containers/flat_multiset/1.cc (test06): New test.
* testsuite/23_containers/flat_set/1.cc (test06): New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Fix return value of vector::insert_range

In some cases we're wrongly returning an iterator to (one past) the last
element inserted instead of to the first element inserted.

libstdc++-v3/ChangeLog:

* include/bits/stl_bvector.h (vector<bool>::insert_range):
Consistently return an iterator pointing to the first element
inserted.
* include/bits/vector.tcc (vector::insert_range): Likewise.
* testsuite/23_containers/vector/bool/modifiers/insert/insert_range.cc:
Verify insert_range return values.
* testsuite/23_containers/vector/modifiers/insert/insert_range.cc:
Likewise.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

Fortran: host association issue with symbol in COMMON block [PR108454]

When resolving a flavorless symbol that is already registered with a COMMON
block, and which neither has the intrinsic, generic, or external attribute,
skip searching among interfaces to avoid false resolution to a derived type
of the same name.

PR fortran/108454

gcc/fortran/ChangeLog:

* resolve.cc (resolve_common_blocks): Initialize variable.
(resolve_symbol): If a symbol is already registered with a COMMON
block, do not search for an interface with the same name.

gcc/testsuite/ChangeLog:

* gfortran.dg/common_29.f90: New test.

OpenMP/Fortran: Add missing pop_state in parse_omp_dispatch [PR118714]

When the ST_NONE case is taken, the function returns immediately. Not calling
pop_state causes a dangling pointer.
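
The failure mode can be sketched in plain C as below. This is a
simplified model with made-up names and signatures, not the gfortran
parser code: the state record lives on the caller's stack, so every
return path has to unlink it before the frame goes away.

```c
#include <stddef.h>

/* Hypothetical parser state, linked into a global stack.  */
struct parse_state
{
  struct parse_state *previous;
  int id;
};

static struct parse_state *current_state = NULL;

static void
push_state (struct parse_state *s, int id)
{
  s->previous = current_state;
  s->id = id;
  current_state = s;
}

static void
pop_state (void)
{
  current_state = current_state->previous;
}

/* Parse a region.  STATEMENT_IS_NONE models the early-return case.  */
int
parse_region (int statement_is_none)
{
  struct parse_state local;	/* Lives only as long as this frame.  */
  push_state (&local, 1);

  if (statement_is_none)
    {
      /* Without this call, current_state would keep pointing at LOCAL
	 after the function returns.  */
      pop_state ();
      return 0;
    }

  /* ... parse the body here ...  */
  pop_state ();
  return 1;
}

/* True if no state is left on the stack.  */
int
state_stack_is_empty (void)
{
  return current_state == NULL;
}
```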

PR fortran/118714

gcc/fortran/ChangeLog:

* parse.cc (parse_omp_dispatch): Add missing pop_state.

c++: wrong-code with consteval constructor [PR117501]

We've had a wrong-code problem since r14-4140, due to which we
forget to initialize a variable.

In consteval39.C, we evaluate

    struct QQQ q;
  <<cleanup_point <<< Unknown tree: expr_stmt
    QQQ::QQQ (&q, TARGET_EXPR <D.2687, <<< Unknown tree: aggr_init_expr
      5
      __ct_comp
      D.2687
      (struct basic_string_view *) <<< Unknown tree: void_cst >>>
      (const char *) "" >>>>) >>>>>;

into

    struct QQQ q;
  <<cleanup_point <<< Unknown tree: expr_stmt
    {.data={._M_len=42, ._M_str=0}} >>>>>;

and then the useless expr_stmt is dropped on the floor, so q isn't
initialized.  As pre-r14-4140, we need to handle constructors specially.

With this patch, we generate:

    struct QQQ q;
  <<cleanup_point <<< Unknown tree: expr_stmt
    q = {.data={._M_len=42, ._M_str=0}} >>>>>;

initializing q properly.

PR c++/117501

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_build_init_expr_for_ctor): New.
(cp_fold_immediate_r): Call it.
(cp_fold): Break out code into cp_build_init_expr_for_ctor.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval39.C: New test.
* g++.dg/cpp2a/consteval40.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

[PR116234][LRA]: Check debug insn when looking at one insn pseudo occurrence

  LRA can change the reg class to NO_REGS when a pseudo is referenced in
only one insn.  The check of the references did not take into account that
the referring insn can be a debug insn.  This resulted in different code
generation with and without debug info generation.  The patch fixes this pitfall.
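
The idea of ignoring debug insns when counting references can be
sketched as follows. The data structure and the function name are
hypothetical and much simpler than what lra-constraints.cc works on;
the point is only that a debug insn must not influence the answer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one reference to a pseudo.  */
struct insn_ref
{
  int insn_uid;		/* Identifier of the referring insn.  */
  bool is_debug;	/* True if that insn is a debug insn.  */
};

/* Return true if the pseudo is referenced by more than one distinct
   non-debug insn.  Debug insns are skipped so that the result is the
   same with and without debug info.  */
bool
multiple_nondebug_refs_p (const struct insn_ref *refs, size_t n)
{
  bool seen = false;
  int first_uid = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (refs[i].is_debug)
	continue;
      if (!seen)
	{
	  seen = true;
	  first_uid = refs[i].insn_uid;
	}
      else if (refs[i].insn_uid != first_uid)
	return true;
    }
  return false;
}
```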

gcc/ChangeLog:

PR rtl-optimization/116234
* lra-constraints.cc (multiple_insn_refs_p): New function.
(curr_insn_transform): Use it.

gcc/testsuite/ChangeLog:

PR rtl-optimization/116234
* gfortran.target/aarch64/aarch64.exp: New.
* gfortran.target/aarch64/pr116234.f: New.

Fix wrong elaboration for allocator at library level of dynamic library

The problem was preexisting for class-wide allocators, but now occurs for
allocators of controlled types too, because of the recent overhaul of the
finalization machinery.

gcc/ada/
* gcc-interface/utils.cc (gnat_pushdecl): Clear TREE_PUBLIC on
functions really nested in another function.

testsuite: Add testcase for already fixed PR [PR117498]

This wrong-code issue has been fixed with r15-7249.
We still emit warnings which are questionable and perhaps we'd
get better generated code if niters determined the loop has only a single
iteration without UB and we'd punt on vectorizing it (or unrolling).

2025-01-31 Jakub Jelinek <jakub@redhat.com>

PR middle-end/117498
* gcc.c-torture/execute/pr117498.c: New test.

force-indirect-call-2.c: Allow indirect branch via GOT

r15-1619-g3b9b8d6cfdf593 changed the codegen from

f2:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq f1@GOTPCREL(%rip), %rbx
call *%rbx
leaq f3(%rip), %rax
call *%rax
movq %rbx, %rax
popq %rbx
.cfi_def_cfa_offset 8
jmp *%rax
.cfi_endproc

to

f2:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
call *f1@GOTPCREL(%rip)
leaq f3(%rip), %rax
call *%rax
addq $8, %rsp
.cfi_def_cfa_offset 8
jmp *f1@GOTPCREL(%rip)
.cfi_endproc

Since it is OK to indirect call via memory for -mforce-indirect-call,
allow indirect branch via GOT.

PR target/115673
* gcc.target/i386/force-indirect-call-2.c: Allow indirect branch
via GOT.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

debug/100530 - Revert QUAL_ADDR_SPACE handling from dwarf2out.cc

The bug clearly shows that r8-4385-ga297ccb52e0c89 was wrong in
enabling handling of address-space qualification as DWARF type
qualifiers, as the code isn't prepared for it to actually not be handled
and ends up changing a lesser qualified (without address-space)
type DIE in ways that trip asserts. The following reverts that
part which then causes the DIE for the same type with address-space
qualifiers removed to be re-used since there's currently no code
to encode address-spaces within dwarf2out.cc or in the DWARF spec.

r8-4385-ga297ccb52e0c89 did not come with a testcase nor a good
description of the bug fixed - I've verified const qualification
mixed with address-spaces creates the expected DWARF.

PR debug/100530
* dwarf2out.cc (modified_type_die): Do not claim we handle
address-space qualification with dwarf_qual_info[].

* gcc.target/i386/pr100530.c: New testcase.

niter: Make build_cltz_expr more robust [PR118689]

Since my r15-7223 the niter analysis can recognize one loop during bootstrap
as being ctz like.
The patch just turned
@@ -2173,7 +2173,7 @@ PROC m2pim_NumberIO_BinToStr (CARDINAL x
   _T535_44 = &buf[i.40_2]{lb: 1 sz: 4};
   _T536_45 = x_21 & 1;
   *_T535_44 = _T536_45;
-  _T537_47 = x_21 / 2;
+  _T537_47 = x_21 >> 1;
   x_48 = _T537_47;
   # DEBUG x => x_48
   if (x_48 != 0)
which is not a big deal for the number_of_iterations_cltz optimization, it
recognizes both right shift by 1 and unsigned division by 2 (and similarly
for clz left shift by 1 or multiplication by 2).
But starting with forwprop1 that change also resulted in
@@ -1875,9 +1875,9 @@ PROC m2pim_NumberIO_BinToStr (CARDINAL x
   i.40_2 = (INTEGER) _T530_34;
   _T536_45 = x_21 & 1;
   MEM <CARDINAL[1:64]> [(CARDINAL *)&buf][i.40_2]{lb: 1 sz: 4} = _T536_45;
-  _T537_47 = x_21 / 2;
+  _T537_47 = x_21 >> 1;
   # DEBUG x => _T537_47
-  if (x_21 > 1)
+  if (_T537_47 != 0)
     goto <bb 3>; [INV]
   else
     goto <bb 8>; [INV]
and apparently it is only the latter form that number_of_iterations_cltz
pattern matches, not the former (after all, that was the exact reason
for r15-7223).
The problem is that build_cltz_expr assumes if IFN_C[LT]Z can't be used it
can use the __builtin_c[lt]z{,l,ll} builtins, and while most of the FEs do
create them, modula 2 does not.

The following patch just lets us punt if the FE doesn't build those builtins.
I've filed a PR against modula2 so that they add the builtins too.
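
For reference, the loop shape in the dumps above (shift right by one,
repeat while the value is nonzero) has a simple closed form. The sketch
below uses hypothetical function names and the GCC/Clang builtin
__builtin_clz; it shows plain arithmetic only and makes no claim about
which builtin or internal function the niter analysis emits.

```c
#include <limits.h>

/* Count the iterations of a do-while loop that halves X until it
   becomes zero.  */
unsigned
halving_loop_iterations (unsigned x)
{
  unsigned iterations = 0;
  do
    {
      x >>= 1;
      iterations++;
    }
  while (x != 0);
  return iterations;
}

/* Closed form for nonzero X: bit width minus the number of leading
   zero bits.  __builtin_clz is undefined for zero.  */
unsigned
halving_loop_closed_form (unsigned x)
{
  unsigned width = (unsigned) (sizeof (unsigned) * CHAR_BIT);
  return width - (unsigned) __builtin_clz (x);
}
```

A front end that does not provide the builtin leaves nothing to call in
the closed form, which is the situation the patch now punts on.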

2025-01-31  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/118689
PR modula2/115032
* tree-ssa-loop-niter.cc (build_cltz_expr): Return NULL_TREE if fn is
NULL and use_ifn is false.

Do not rely on non-SLP analysis for SLP outer loop vectorization

We end up relying on non-SLP analysis of the inner loop LC PHI to
set the vectorization method for SLP since vectorizable_reduction
claims responsibility. The following fixes this.

* tree-vect-loop.cc (vect_analyze_loop_operations): Only
call vectorizable_lc_phi when not PURE_SLP.
(vectorizable_reduction): Do not claim having handled
the inner loop LC PHI for outer loop vectorization.

Daily bump.

libbacktrace: add casts to avoid undefined shifts

Patch from pgerell@github.

* elf.c (elf_fetch_bits): Add casts to avoid potentially shifting
a value farther than its type size.
(elf_fetch_bits_backward): Likewise.
(elf_uncompress_lzma_block): Likewise.
(elf_uncompress_lzma): Likewise.
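
The kind of problem such casts avoid can be sketched like this. It is
an illustration with a made-up helper, not the code in elf.c: an
unsigned char promotes to int, so shifting it by 32 or more bits is
undefined unless the value is widened first.

```c
#include <stdint.h>

/* Merge BYTE into a 64-bit bit buffer at bit offset BITS (0..56).  */
uint64_t
append_byte (uint64_t buffer, unsigned char byte, unsigned bits)
{
  /* Cast before shifting: without it BYTE is promoted to int and a
     shift by 32 or more would exceed the width of the type.  */
  return buffer | ((uint64_t) byte << bits);
}
```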

[testsuite] require profiling support [PR113689]

pr113689 testcases use -fprofile without testing for profiling
support. Fix them.

for gcc/testsuite/ChangeLog

PR target/113689
* gcc.target/i386/pr113689-1.c: Require profiling support.
* gcc.target/i386/pr113689-2.c: Likewise.
* gcc.target/i386/pr113689-3.c: Likewise.

[testsuite] require -Ofast for vect-ifcvt-18 even without avx

The test expects transformations that depend on -Ofast on x86*, but
that option is only passed when the avx_runtime is available.

Split -Ofast out of the avx conditional, so that it is passed on the
same targets that expect the transformation.

for gcc/testsuite/ChangeLog

* gcc.dg/vect/vect-ifcvt-18.c: Split -Ofast out of
avx_runtime.

AVR: Provide built-ins for strlen where the string lives in some AS.

This patch adds built-in functions __builtin_avr_strlen_flash,
__builtin_avr_strlen_flashx and __builtin_avr_strlen_memx.
Purpose is that higher-level functions can use __builtin_constant_p
on strlen without raising a diagnostic due to -Waddr-space-convert.

gcc/
* config/avr/builtins.def (STRLEN_FLASH, STRLEN_FLASHX)
(STRLEN_MEMX): New DEF_BUILTIN's.
* config/avr/avr.cc (avr_ftype_strlen): New static function.
(avr_builtin_supported_p): New built-ins are not for AVR_TINY.
(avr_init_builtins) <strlen_flash_node, strlen_flashx_node,
strlen_memx_node>: Provide new fntypes.
(avr_fold_builtin) [AVR_BUILTIN_STRLEN_FLASH]
[AVR_BUILTIN_STRLEN_FLASHX, AVR_BUILTIN_STRLEN_MEMX]: Fold if
possible.
* doc/extend.texi (AVR Built-in Functions): Document
__builtin_avr_strlen_flash, __builtin_avr_strlen_flashx,
__builtin_avr_strlen_memx.
libgcc/
* config/avr/t-avr (LIB1ASMFUNCS): Add _strlen_memx.
* config/avr/lib1funcs.S <L_strlen_memx, __strlen_memx>: Implement.

AVR: Only provide a built-in when it is available.

Some built-ins are not available for C++ since they are using
named address-spaces or fixed-point types.

gcc/
* config/avr/builtins.def (AVR_FIRST_C_ONLY_BUILTIN_ID): New macro.
* config/avr/avr-protos.h (avr_builtin_supported_p): New.
* config/avr/avr.cc (avr_builtin_supported_p): New function.
(avr_init_builtins): Only provide a built-in when it is supported.
* config/avr/avr-c.cc (avr_cpu_cpp_builtins): Only define the
__BUILTIN_AVR_<NAME> built-in defines when the associated built-in
function is supported.
* doc/extend.texi (AVR Built-in Functions): Add a note that the
following built-ins are supported only for GNU-C.

OpenMP: Update documentation of metadirective implementation status.

libgomp/ChangeLog
* libgomp.texi (OpenMP 5.0): Mark metadirective and declare variant
as implemented.
(OpenMP 5.1): Mark target_device as supported.
Add changed interaction between declare target and OpenMP context
and dynamic selector support.
(OpenMP 5.2): Mark otherwise clause as supported, note that
default is also still accepted.

OpenMP: Fortran support for metadirectives and dynamic selectors

gcc/fortran/ChangeLog
PR middle-end/112779
PR middle-end/113904
* decl.cc (gfc_match_end): Handle COMP_OMP_BEGIN_METADIRECTIVE and
COMP_OMP_METADIRECTIVE.
* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_METADIRECTIVE.
(show_code_node): Likewise.
* gfortran.h (enum gfc_statement): Add ST_OMP_METADIRECTIVE,
ST_OMP_BEGIN_METADIRECTIVE, and ST_OMP_END_METADIRECTIVE.
(struct gfc_omp_clauses): Rename target_first_st_is_teams to
target_first_st_is_teams_or_meta.
(struct gfc_omp_variant): New.
(gfc_get_omp_variant): New.
(struct gfc_st_label): Add omp_region field.
(enum gfc_exec_op): Add EXEC_OMP_METADIRECTIVE.
(struct gfc_code): Add omp_variants fields.
(gfc_free_omp_variants): Declare.
(match_omp_directive): Declare.
(is_omp_declarative_stmt): Declare.
* io.cc (format_asterisk): Adjust initializer.
* match.h (gfc_match_omp_begin_metadirective): Declare.
(gfc_match_omp_metadirective): Declare.
* openmp.cc (gfc_omp_directives): Uncomment metadirective.
(gfc_match_omp_eos): Adjust to match context selectors.
(gfc_free_omp_variants): New.
(gfc_match_omp_clauses): Remove context_selector parameter and adjust
to use gfc_match_omp_eos instead.
(match_omp): Adjust call to gfc_match_omp_clauses.
(gfc_match_omp_context_selector): Add metadirective_p parameter and
adjust error-checking.  Adjust matching of simd clauses.
(gfc_match_omp_context_selector_specification): Adjust parameters
so it can be used for metadirective as well as declare variant.
(match_omp_metadirective): New.
(gfc_match_omp_begin_metadirective): New.
(gfc_match_omp_metadirective): New.
(resolve_omp_metadirective): New.
(resolve_omp_target): Handle metadirectives.
(gfc_resolve_omp_directive): Handle EXEC_OMP_METADIRECTIVE.
* parse.cc (gfc_matching_omp_context_selector): New.
(gfc_in_omp_metadirective_body): New.
(gfc_omp_region_count): New.
(decode_omp_directive): Handle ST_OMP_BEGIN_METADIRECTIVE and
ST_OMP_METADIRECTIVE.
(match_omp_directive): New.
(case_omp_structured_block): Define.
(case_omp_do): Define.
(gfc_ascii_statement): Handle ST_OMP_BEGIN_METADIRECTIVE,
ST_OMP_END_METADIRECTIVE, and ST_OMP_METADIRECTIVE.
(accept_statement):  Handle ST_OMP_METADIRECTIVE and
ST_OMP_BEGIN_METADIRECTIVE.
(gfc_omp_end_stmt): New, split from...
(parse_omp_do): ...here, and...
(parse_omp_structured_block): ...here.  Handle metadirectives,
plus "allocate", "atomic", and "dispatch" which were missing.
(parse_omp_oacc_atomic): Handle "end metadirective".
(parse_openmp_allocate_block): Likewise.
(parse_omp_dispatch): Likewise.
(parse_omp_metadirective_body): New.
(parse_executable): Handle metadirective.  Use new case macros
defined above.
(gfc_parse_file): Initialize metadirective state.
(is_omp_declarative_stmt): New.
* parse.h (enum gfc_compile_state): Add COMP_OMP_METADIRECTIVE
and COMP_OMP_BEGIN_METADIRECTIVE.
(gfc_omp_end_stmt): Declare.
(gfc_matching_omp_context_selector): Declare.
(gfc_in_omp_metadirective_body): Declare.
(gfc_omp_metadirective_region_count): Declare.
* resolve.cc (gfc_resolve_code): Handle EXEC_OMP_METADIRECTIVE.
* st.cc (gfc_free_statement): Likewise.
* symbol.cc (compare_st_labels): Handle labels within a metadirective
body.
(gfc_get_st_label): Likewise.
* trans-decl.cc (gfc_get_label_decl): Encode the metadirective region
in the label_name.
* trans-openmp.cc (gfc_trans_omp_directive): Handle
EXEC_OMP_METADIRECTIVE.
(gfc_trans_omp_set_selector): New, split/adapted from code....
(gfc_trans_omp_declare_variant): ...here.
(gfc_trans_omp_metadirective): New.
* trans-stmt.h (gfc_trans_omp_metadirective): Declare.
* trans.cc (trans_code): Handle EXEC_OMP_METADIRECTIVE.

gcc/testsuite/ChangeLog
PR middle-end/112779
PR middle-end/113904
* gfortran.dg/gomp/metadirective-1.f90: New.
* gfortran.dg/gomp/metadirective-10.f90: New.
* gfortran.dg/gomp/metadirective-11.f90: New.
* gfortran.dg/gomp/metadirective-12.f90: New.
* gfortran.dg/gomp/metadirective-13.f90: New.
* gfortran.dg/gomp/metadirective-2.f90: New.
* gfortran.dg/gomp/metadirective-3.f90: New.
* gfortran.dg/gomp/metadirective-4.f90: New.
* gfortran.dg/gomp/metadirective-5.f90: New.
* gfortran.dg/gomp/metadirective-6.f90: New.
* gfortran.dg/gomp/metadirective-7.f90: New.
* gfortran.dg/gomp/metadirective-8.f90: New.
* gfortran.dg/gomp/metadirective-9.f90: New.
* gfortran.dg/gomp/metadirective-construct.f90: New.
* gfortran.dg/gomp/metadirective-no-score.f90: New.
* gfortran.dg/gomp/pure-1.f90 (func_metadirective): New.
(func_metadirective_2): New.
(func_metadirective_3): New.
* gfortran.dg/gomp/pure-2.f90 (func_metadirective): Delete.

libgomp/ChangeLog
PR middle-end/112779
PR middle-end/113904
* testsuite/libgomp.fortran/metadirective-1.f90: New.
* testsuite/libgomp.fortran/metadirective-2.f90: New.
* testsuite/libgomp.fortran/metadirective-3.f90: New.
* testsuite/libgomp.fortran/metadirective-4.f90: New.
* testsuite/libgomp.fortran/metadirective-5.f90: New.
* testsuite/libgomp.fortran/metadirective-6.f90: New.

Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Sandra Loosemore <sandra@codesourcery.com>
Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
Co-Authored-By: Paul-Antoine Arras <pa@codesourcery.com>

s390: Fix up *vec_cmpgt{,u}<mode><mode>_nocc_emu splitters [PR118696]

The following testcase is miscompiled on s390x-linux with e.g. -march=z13
(both -O0 and -O2) starting with r15-7053.
The problem is in the splitters which emulate TImode/V1TImode GT and GTU
comparisons.
For GT we want to do
(ior (gt (hi op1) (hi op2))
     (and (eq (hi op1) (hi op2)) (gtu (lo op1) (lo op2))))
and for GTU similarly except for gtu instead of gt in there.
Now, the splitter emulation is using V2DImode comparisons where on s390x
the hi part is in the first element of the vector, lo part in the second,
and for the gtu case it swaps the elements of the vector.
So, we get the right result in the first element of the result vector.
But vrepg was then broadcasting the second element of the result vector
rather than the first, and the value of the second element of the vector
is instead
(ior (gt (lo op1) (lo op2))
     (and (eq (lo op1) (lo op2)) (gtu (hi op1) (hi op2))))
so something not really usable for the emulated comparison.
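
As a rough scalar model of the comparison described above (a sketch only:
the helper names gt, gtu, gt128 and swapped_gt128 and the hi/lo argument
layout are illustrative assumptions, not code from vector.md), the intended
expression and the one that ended up in the second vector element can be
written as:

```cpp
#include <cstdint>

// Signed 64-bit "greater than" on raw bit patterns: flipping the sign bit
// maps signed order onto unsigned order.
constexpr bool gt(std::uint64_t a, std::uint64_t b)
{
  constexpr std::uint64_t sign = std::uint64_t{1} << 63;
  return (a ^ sign) > (b ^ sign);
}

// Unsigned 64-bit "greater than".
constexpr bool gtu(std::uint64_t a, std::uint64_t b)
{
  return a > b;
}

// Intended signed double-word GT:
// (ior (gt hi1 hi2) (and (eq hi1 hi2) (gtu lo1 lo2)))
constexpr bool gt128(std::uint64_t hi1, std::uint64_t lo1,
                     std::uint64_t hi2, std::uint64_t lo2)
{
  return gt(hi1, hi2) || (hi1 == hi2 && gtu(lo1, lo2));
}

// Value of the second result element, with the halves in swapped roles:
// (ior (gt lo1 lo2) (and (eq lo1 lo2) (gtu hi1 hi2)))
constexpr bool swapped_gt128(std::uint64_t hi1, std::uint64_t lo1,
                             std::uint64_t hi2, std::uint64_t lo2)
{
  return gt(lo1, lo2) || (lo1 == lo2 && gtu(hi1, hi2));
}
```

With op1 = {hi 1, lo 0} and op2 = {hi 0, lo 5} the two expressions disagree,
which is the kind of input on which broadcasting the wrong element matters.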

The following patch fixes that.  The testcase tries to test behavior of
double-word smin/smax/umin/umax with various cases of the halves of both
operands (one that is sometimes EQ, sometimes GT, sometimes LT, sometimes
GTU, sometimes LTU).

2025-01-30  Jakub Jelinek  <jakub@redhat.com>
    Stefan Schulze Frielinghaus  <stefansf@gcc.gnu.org>

PR target/118696
* config/s390/vector.md (*vec_cmpgt<mode><mode>_nocc_emu,
*vec_cmpgtu<mode><mode>_nocc_emu): Duplicate the first rather than
second V2DImode element.

* gcc.dg/pr118696.c: New test.
* gcc.target/s390/vector/pr118696.c: New test.
* gcc.target/s390/vector/vec-abs-emu.c: Expect vrepg with 0 as last
operand rather than 1.
* gcc.target/s390/vector/vec-max-emu.c: Likewise.
* gcc.target/s390/vector/vec-min-emu.c: Likewise.

c++: remove LAMBDA_EXPR_CAPTURES_THIS_P

This unused accessor is just a simple alias of LAMBDA_EXPR_THIS_CAPTURE
and contrary to its documentation doesn't use TREE_LANG_FLAG_0. Might
as well remove it.

gcc/cp/ChangeLog:

* cp-tree.h (LAMBDA_EXPR_CAPTURES_THIS_P): Remove.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: Update const_decl handling after r15-7259 [PR118673].

Objective-C++ uses CONST_DECLs to hold constant string objects;
these should also be treated as mergeable lvalues.

PR c++/118673

gcc/cp/ChangeLog:

* tree.cc (lvalue_kind): Mark CONST_DECLs as mergeable
when they are also TREE_STATIC.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

middle-end/118695 - missed misalign handling in MEM_REF expansion

When MEM_REF expansion of a non-MEM falls back to a stack temporary
we fail to handle the case where the offset adjusted reference to
the temporary is not aligned according to the requirement of the
mode. We have to go through bitfield extraction or movmisalign
in this case. Fortunately there's a helper for this.

This fixes an ICE observed on arm which has sanity checks in its
move patterns for this.

PR middle-end/118695
* expr.cc (expand_expr_real_1): When expanding a MEM_REF
to a non-MEM by committing it to a stack temporary make
sure to handle misaligned accesses correctly.

* gcc.dg/pr118695.c: New testcase.

libstdc++: Use safe integer comparisons in std::latch [PR98749]

The std::latch::max() function assumes that the returned value can be
represented by ptrdiff_t, which is true when __platform_wait_t is int
(e.g. on Linux) but not when it's unsigned long, which is the case for
most other 64-bit targets. We should use the smaller of PTRDIFF_MAX and
std::numeric_limits<__platform_wait_t>::max(). Use std::cmp_less to do a
safe comparison that works for all types. We can also use std::cmp_less
and std::cmp_equal in std::latch::count_down so that we don't need to
deal with comparisons between signed and unsigned.

Also add a missing precondition check to constructor and fix the
existing check in count_down which was duplicated by mistake.
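
The mixed-sign comparison described above can be sketched without the
library internals.  This is only an illustration: safe_less is a
hand-written model of what std::cmp_less does for integers, and max_count
is a hypothetical stand-in for latch::max() parameterised on the wait type;
neither is the code in include/std/latch.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hand-written model of C++20 std::cmp_less for two integer types.
template <typename T, typename U>
constexpr bool safe_less(T t, U u)
{
  if constexpr (std::is_signed_v<T> == std::is_signed_v<U>)
    return t < u;  // same signedness: the built-in compare is already safe
  else if constexpr (std::is_signed_v<T>)
    return t < 0 || static_cast<std::make_unsigned_t<T>>(t) < u;
  else
    return u >= 0 && t < static_cast<std::make_unsigned_t<U>>(u);
}

// Hypothetical stand-in for latch::max(): the smaller of PTRDIFF_MAX and
// the largest value of the (assumed) platform wait type.
template <typename Wait>
constexpr std::ptrdiff_t max_count()
{
  constexpr Wait wait_max = std::numeric_limits<Wait>::max();
  constexpr std::ptrdiff_t diff_max
    = std::numeric_limits<std::ptrdiff_t>::max();
  return safe_less(wait_max, diff_max)
         ? static_cast<std::ptrdiff_t>(wait_max)
         : diff_max;
}
```

For a signed 32-bit wait type this yields that type's maximum; for an
unsigned 64-bit wait type it yields PTRDIFF_MAX, so the result is always
representable as ptrdiff_t.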

libstdc++-v3/ChangeLog:

PR libstdc++/98749
* include/std/latch (latch::max()): Ensure the return value is
representable as the return type.
(latch::latch(ptrdiff_t)): Add assertion.
(latch::count_down): Fix copy & pasted duplicate assertion. Use
std::cmp_equal to compare __platform_wait_t and ptrdiff_t
values.
(latch::_M_a): Use defined constant for alignment.
* testsuite/30_threads/latch/1.cc: Check max(). Check constant
initialization works for values in the valid range. Check
alignment.

OpenMP: append_args clause fixes + Fortran support

This fixes a large number of smaller and larger issues with the append_args
clause to 'declare variant' and adds Fortran support for it; it also contains
a larger number of testcases.

In particular, for Fortran, it also handles passing allocatable, pointer,
optional arguments to an interop dummy argument with or without value
attribute. And it changes the internal representation such that dumping the
tree does not lead to an ICE.

gcc/c/ChangeLog:

* c-parser.cc (c_finish_omp_declare_variant): Modify how
append_args is saved internally.

gcc/cp/ChangeLog:

* parser.cc (cp_finish_omp_declare_variant): Modify how append_args
is saved internally.
* pt.cc (tsubst_attribute): Likewise.
(tsubst_omp_clauses): Remove C_ORT_OMP_DECLARE_SIMD from interop
handling as no longer called for it.
* decl.cc (omp_declare_variant_finalize_one): Update append_args
changes; fixes for ADL input.

gcc/fortran/ChangeLog:

* gfortran.h (gfc_omp_declare_variant): Add append_args_list.
* openmp.cc (gfc_parser_omp_clause_init_modifiers): New;
split off from ...
(gfc_match_omp_init): ... here; call it.
(gfc_match_omp_declare_variant): Update to handle append_args
clause; some syntax handling fixes.
* trans-openmp.cc (gfc_trans_omp_declare_variant): Handle
append_args clause; add some diagnostic.

gcc/ChangeLog:

* gimplify.cc (gimplify_call_expr): For OpenMP's append_args clause
processed by 'omp dispatch', update for internal-representation
changes; fix handling of hidden arguments, add some comments and
handle Fortran's value dummy and optional/pointer/allocatable actual
args.

libgomp/ChangeLog:

* libgomp.texi (Impl. Status): Update for accumulated changes
related to 'dispatch' and interop.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/append-args-1.c: Update dg-*.
* c-c++-common/gomp/append-args-3.c: Likewise.
* g++.dg/gomp/append-args-1.C: Likewise.
* gfortran.dg/gomp/adjust-args-1.f90: Likewise.
* gfortran.dg/gomp/adjust-args-3.f90: Likewise.
* gfortran.dg/gomp/declare-variant-2.f90: Likewise.
* c-c++-common/gomp/append-args-6.c: New test.
* c-c++-common/gomp/append-args-7.c: New test.
* c-c++-common/gomp/append-args-8.c: New test.
* c-c++-common/gomp/append-args-9.c: New test.
* g++.dg/gomp/append-args-4.C: New test.
* g++.dg/gomp/append-args-5.C: New test.
* g++.dg/gomp/append-args-6.C: New test.
* g++.dg/gomp/append-args-7.C: New test.
* gcc.dg/gomp/append-args-1.c: New test.
* gfortran.dg/gomp/append_args-1.f90: New test.
* gfortran.dg/gomp/append_args-2.f90: New test.
* gfortran.dg/gomp/append_args-3.f90: New test.
* gfortran.dg/gomp/append_args-4.f90: New test.

middle-end/118692 - ICE with out-of-bound ref expansion

The following guards the BIT_FIELD_REF expansion fallback for
MEM_REFs of entities expanded to register (or constant) further,
avoiding large out-of-bound offsets by, when the access does not
overlap the base object, expanding the offset as if it were zero.

PR middle-end/118692
* expr.cc (expand_expr_real_1): When expanding a MEM_REF
as BIT_FIELD_REF avoid large offsets for accesses not
overlapping the base object.

* gcc.dg/pr118692.c: New testcase.

tree-optimization/114052 - consider infinite sub-loops when lowering iter bound

When we walk stmts to find always executed stmts with UB in the last
iteration to be able to reduce the iteration count by one we fail
to consider infinite subloops in the last iteration that would make
such stmt not execute. The following adds this.

PR tree-optimization/114052
* tree-ssa-loop-niter.cc (maybe_lower_iteration_bound): Check
for infinite subloops we might not exit.

* gcc.dg/pr114052-1.c: New testcase.

pair-fusion: Check for invalid use arrays [PR118320]

As Andrew says in the bugzilla comments, this PR is about a case where
we tried to fuse two stores of x0, one in which x0 was defined and one
in which it was undefined.  merge_access_arrays failed on the conflict,
but the failure wasn't caught.

Normally the hazard detection code would fail if the instructions
had incompatible uses.  However, an undefined use doesn't impose
many restrictions on movements.  I think this is likely to be the
only case where hazard detection isn't enough.

As Andrew notes in bugzilla, it might be possible to allow uses
of defined and undefined values to be merged to the defined value.
But that sounds dangerous in the general case, as an rtl-ssa-level
decision.  We might run the risk of turning conditional UB into
unconditional UB.  And LLVM proves that the definition of "undef"
isn't simple.

gcc/
PR rtl-optimization/118320
* pair-fusion.cc (pair_fusion_bb_info::fuse_pair): Commonize
the merge of input_uses and return early if it fails.

gcc/testsuite/
PR rtl-optimization/118320
* g++.dg/torture/pr118320.C: New test.

[PR testsuite/116860] Testsuite adjustment for recently added tests

There are two new tests that depend on logical-op-non-short-circuit
settings. The BZ is reported against ppc64 and ppc64le, but also applies to a
goodly number of the other targets.

The "regression" fix is trivial, just add the appropriate param to force the
behavior we're expecting. I'm committing that fix momentarily. It's been
verified on ppc64, ppc64le and x86_64 as well as the various embedded targets
in my tester where many FAILS flip to PASS.

I'm leaving the bug open without the regression marker as Jakub has noted a
couple of improvements that we can and probably should make.

PR target/116860
gcc/testsuite
* gcc.dg/tree-ssa/fold-xor-and-or.c: Set logical-op-non-short-circuit.
* gcc.dg/tree-ssa/fold-xor-or.c: Similarly.

Daily bump.

PR modula2/116073 invalid rtl sharing compiling FileSystem.mod caused by ext-dce

The bug fixes to PR modula2/118010 and PR modula2/118183 uncovered a bug
in the procedure interface to lseek which uses SYSTEM.COFF_T rather than
SYSTEM.CSSIZE_T. This patch sets the default size for COFF_T to the same
as CSSIZE_T.

gcc/ChangeLog:
PR modula2/118010
PR modula2/118183
PR modula2/116073
* doc/gm2.texi (-fm2-file-offset-bits=): Change the default size
description to CSSIZE_T.
Add COFF_T to the list of data types exported by SYSTEM.def.

gcc/m2/ChangeLog:
PR modula2/118010
PR modula2/118183
PR modula2/116073
* gm2-compiler/M2Options.mod (OffTBits): Assign to 0.
* gm2-gcc/m2type.cc (build_m2_specific_size_type): Ensure that
layout_type is called before returning c.
(build_m2_offt_type_node): If GetFileOffsetBits returns 0 then
use the type size of ssize_t.

gcc/testsuite/ChangeLog:

PR modula2/118010
PR modula2/118183
PR modula2/116073
* gm2/pim/run/pass/printtypesize.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

d: give dependency files better filenames [PR118477]

Currently, the dependency files for root-file.o and common-file.o were
both d/.deps/file.Po, which would cause parallel builds to fail
sometimes with:

  make[3]: Leaving directory '/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/build/gcc'
  make[3]: Entering directory '/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/build/gcc'
  mv: cannot stat 'd/.deps/file.TPo': No such file or directory
  make[3]: *** [/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/gcc-14-20240511/gcc/d/Make-lang.in:421: d/root-file.o] Error 1 shuffle=131581365

Also, this means that dependencies of one of root-file or common-file
are missing when developing.  After this patch, those two files get
assigned dependency files d/.deps/root-file.Po and
d/.deps/common-file.Po respectively, so they match the actual object
files in the d/ subdirectory.

There are other files with similar conflicts (mangle-package.o,
visitor-package.o for instance).

2025-01-29  Arsen Arsenović  <arsen@aarsen.me>
    Jakub Jelinek  <jakub@redhat.com>

PR d/118477
* Make-lang.in (DCOMPILE, DPOSTCOMPILE): Use $(basename $(@F))
instead of $(*F).

Co-Authored-By: Jakub Jelinek <jakub@redhat.com>

pair-fusion: A couple of fixes for sp updates [PR118429]

The PR showed two issues with pair-fusion.  The first is that the pass
treated stack pointer deallocations as ordinary register updates, and so
might move them earlier than another stack access (through a different
base register) that doesn't alias the pair candidate.

The simplest fix for that seems to be to prevent the stack deallocation
from being moved.  This part might (or might not) be a latent source of
wrong code and so worth backporting in some form.  (The patch as-is
won't work for GCC 14.)

The second issue only started with r15-6551, which added a memory
write to stack allocations and deallocations.  We should use the
existing tombstone mechanism to preserve the associated memory
definition.  (Deleting definitions immediately would have quadratic
complexity in the worst case.)

gcc/
PR rtl-optimization/118429
* pair-fusion.cc (latest_hazard_before): Add an extra parameter
to say whether the instruction is a load or a store.  If the
instruction is not a load or store and has memory side effects,
prevent it from being moved earlier.
(pair_fusion::find_trailing_add): Update call accordingly.
(pair_fusion_bb_info::fuse_pair): If the trailing addition had
a memory side-effect, use a tombstone to preserve it.

gcc/testsuite/
PR rtl-optimization/118429
* gcc.c-torture/compile/pr118429.c: New test.

AVR: Allow to share libgcc's __negsi2.

libgcc has a module for __negsi2: REG_22:SI := - REG_22:SI.
This patch adds a pattern that allows sharing that function
when optimize_size is set.

gcc/
* config/avr/avr.md (*negsi2.libgcc): New insn.

c++: add fixed test [PR57533]

Fixed by r11-2412.

PR c++/57533

gcc/testsuite/ChangeLog:

* g++.dg/eh/throw5.C: New test.

testsuite/118127: Pass fortran tests on ppc64le for IEEE128 long doubles

Denormal behaviour is well defined for IEEE128 long doubles, so
XFAIL some gfortran tests only for targets with the IBM128 long double
ABI.

gcc/testsuite/ChangeLog:

PR testsuite/118127
* lib/target-supports.exp
(check_effective_target_long_double_is_ibm128): New
procedure.
* gfortran.dg/default_format_2.f90: xfail for
long_double_is_ibm128.
* gfortran.dg/default_format_denormal_2.f90: Likewise.
* gfortran.dg/large_real_kind_form_io_2.f90: Likewise.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>

[PATCH] RX: Restrict displacement ranges in "Q" constraint

When using the "Q" constraint in the inline assembler, the displacement value
could exceed the range specified by the instruction.
To avoid this issue, a displacement range check is added to the "Q" constraint.

gcc/
* config/rx/constraints.md (Q): Also check that the address
passes rx_is_restricted_memory_address.

libstdc++: Fix views::transform(move_only_fn{}) forwarding [PR118413]

The range adaptor perfect forwarding simplification mechanism is currently
only enabled for trivially copyable bound arguments, to prevent undesirable
copies of complex objects. But "trivially copyable" is the wrong property
to check for here, since a move-only type with a trivial move constructor
is considered trivially copyable, and after P2494R2 we can't assume copy
constructibility of the bound arguments. This patch makes the mechanism
more specifically check for trivial copy constructibility instead so
that it's properly disabled for move-only bound arguments.
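
The difference between the two properties can be checked directly.  In this
minimal sketch, MoveOnlyFn is a hypothetical move-only callable made up for
illustration; it is not taken from the testsuite.

```cpp
#include <type_traits>

// Hypothetical move-only callable: the copy constructor is deleted, the
// move constructor is trivial.
struct MoveOnlyFn
{
  MoveOnlyFn() = default;
  MoveOnlyFn(MoveOnlyFn&&) = default;
  MoveOnlyFn(const MoveOnlyFn&) = delete;
  int operator()(int x) const { return x + 1; }
};

// Still "trivially copyable": the only eligible copy/move operation
// (the move constructor) is trivial.
static_assert(std::is_trivially_copyable_v<MoveOnlyFn>);

// But not copy constructible at all, let alone trivially.
static_assert(!std::is_trivially_copy_constructible_v<MoveOnlyFn>);
static_assert(std::is_trivially_move_constructible_v<MoveOnlyFn>);
```

A constraint written in terms of is_trivially_copy_constructible_v therefore
rejects such a type, while one written in terms of is_trivially_copyable_v
accepts it.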

PR libstdc++/118413

libstdc++-v3/ChangeLog:

* include/std/ranges (views::__adaptor::_Partial): Adjust
constraints on the "simple" partial specializations to require
is_trivially_copy_constructible_v instead of
is_trivially_copyable_v.
* testsuite/std/ranges/adaptors/adjacent_transform/1.cc (test04):
Extend P2494R2 test.
* testsuite/std/ranges/adaptors/transform.cc (test09): Likewise.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

split-path: Small fix for poor_ifcvt_pred (tsvc s258) [PR118505]

After r15-3436-gb2b20b277988ab, poor_ifcvt_pred returns false for
the case where the statement could trap.  However, trapping
instructions cannot be made unconditional, so this is a poor ifcvt.

This fixes a small performance regression with TSVC s258 at
`-O3 -ftrapping-math` on aarch64 where ifcvt would not happen
and we would still have a branch.

On a specific aarch64, we go from 0.145s down to 0.118s.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/118505
* gimple-ssa-split-paths.cc (poor_ifcvt_pred): Return
true for trapping statements.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

split-path: CALL_EXPR can't show up in gimple_assign

While working on split path, I noticed that poor_ifcvt_candidate_code
would check for CALL_EXPR but that can't show up in gimple_assign
so this removes that check.

This could be a very very small compile time improvement.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* gimple-ssa-split-paths.cc (poor_ifcvt_candidate_code): Remove CALL_EXPR handling.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

tree-ssa-dce: Avoid creating invalid BBs with no outgoing edge (PR117892)

Zhendong Su and Michal Jireš found out that our gimple DSE pass can,
under fairly specific conditions, remove a noreturn call which then
leaves behind a "normal" BB with no successor edges which following
passes do not expect.  This patch simply tells the pass to leave such
calls alone even when they otherwise appear to be dead.

Interestingly, our CFG verifier does not report this.  I'll put on my
todo list to add a test for it in the next stage 1.

gcc/ChangeLog:

2025-01-28  Martin Jambor  <mjambor@suse.cz>

PR tree-optimization/117892
* tree-ssa-dse.cc (dse_optimize_call): Leave control-altering
noreturn calls alone.

gcc/testsuite/ChangeLog:

2025-01-27  Martin Jambor  <mjambor@suse.cz>

PR tree-optimization/117892
* gcc.dg/tree-ssa/pr117892.c: New test.
* gcc.dg/tree-ssa/pr118517.c: Likewise.

co-authored-by: Michal Jireš <mjires@suse.cz>

RISC-V: Fix incorrect code gen for scalar signed SAT_TRUNC [PR117688]

This patch would like to fix the wrong code generation for the scalar
signed SAT_TRUNC.  The input can be QI/HI/SI/DI while the alu like sub
can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU like add.  The gen_lowpart
will generate something like lbu which has all zero for highest bits.

For example, when truncating 0xff7f (-129 in HImode) to QImode, we actually
want to compare -129 with -128, but without sign extension (as with lbu) we
compare 0xff7f with 0xffffffffffffff80 (assuming Xmode is DImode).  Thus, we
have to sign extend 0xff7f (HImode) to 0xffffffffffffff7f (assuming Xmode is
DImode) before the compare in Xmode.
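
The effect can be modelled in plain C++.  This is a sketch under stated
assumptions: widen_zero, widen_signed and clamp_to_s8 are illustrative
helpers, with a 64-bit integer standing in for Xmode; they are not the
expander code in riscv.cc.

```cpp
#include <cstdint>

// Zero-extending widening of a 16-bit pattern (high bits all zero).
constexpr std::int64_t widen_zero(std::uint16_t bits)
{
  return bits;
}

// Sign-extending widening of the same 16-bit pattern.
constexpr std::int64_t widen_signed(std::uint16_t bits)
{
  return bits >= 0x8000 ? std::int64_t{bits} - 0x10000 : std::int64_t{bits};
}

// Saturating truncation of an already widened value to the int8_t range.
constexpr std::int8_t clamp_to_s8(std::int64_t wide)
{
  if (wide < INT8_MIN)
    return INT8_MIN;
  if (wide > INT8_MAX)
    return INT8_MAX;
  return static_cast<std::int8_t>(wide);
}
```

For the pattern 0xff7f, the sign-extended value is -129 and clamps to -128,
whereas the zero-extended value is 65407 and wrongly clamps to 127.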

The following test suites pass with this patch.
* The rv64gcv full regression test.

PR target/117688

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_sstrunc): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117688.h: Add test helper macros.
* gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Fix incorrect code gen for scalar signed SAT_SUB [PR117688]

This patch would like to fix the wrong code generation for the scalar
signed SAT_SUB.  The input can be QI/HI/SI/DI while the alu like sub
can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU like sub.  The gen_lowpart
will generate something like lbu which has all zero for highest bits.

For example, when computing 0xff (-1 in QImode) minus 0x1 (1 in QImode), we
actually want -1 - 1 = -2, but without sign extension (as with lbu) we get
0xff - 1 = 0xfe, which is incorrect.  Thus, we have to sign extend 0xff
(QImode) to 0xffffffffffffffff (assuming Xmode is DImode) before the sub in
Xmode.
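
A scalar model of the intended behaviour, as a sketch only: sext8 and
sat_sub_s8 are hypothetical helpers, and a 64-bit integer stands in for
Xmode.  The operands are sign extended first, the subtraction happens in the
wide type, and the result is clamped back to the narrow range.

```cpp
#include <cstdint>

// Sign extend an 8-bit pattern to 64 bits.
constexpr std::int64_t sext8(std::uint8_t bits)
{
  return bits >= 0x80 ? std::int64_t{bits} - 0x100 : std::int64_t{bits};
}

// Signed saturating subtraction on 8-bit patterns, computed in 64 bits.
constexpr std::int8_t sat_sub_s8(std::uint8_t a_bits, std::uint8_t b_bits)
{
  const std::int64_t diff = sext8(a_bits) - sext8(b_bits);
  if (diff < INT8_MIN)
    return INT8_MIN;
  if (diff > INT8_MAX)
    return INT8_MAX;
  return static_cast<std::int8_t>(diff);
}
```

Here sat_sub_s8(0xff, 0x01) gives -2, matching the -1 - 1 case above, and
results outside the int8_t range saturate at -128 or 127.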

The following test suites pass with this patch.
* The rv64gcv full regression test.

PR target/117688

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_sssub): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117688.h: Add test helper macro.
* gcc.target/riscv/pr117688-sub-run-1-s16.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s32.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s64.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Fix incorrect code gen for scalar signed SAT_ADD [PR117688]

This patch would like to fix the wrong code generation for the scalar
signed SAT_ADD.  The input can be QI/HI/SI/DI while the alu like sub
can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU like add.  The gen_lowpart
will generate something like lbu which has all zero for highest bits.

For example, when computing 0xff (-1 in QImode) plus 0x2 (2 in QImode), we
actually want -1 + 2 = 1, but without sign extension (as with lbu) we get
0xff + 2 = 0x101, which is incorrect.  Thus, we have to sign extend 0xff
(QImode) to 0xffffffffffffffff (assuming Xmode is DImode) before the plus in
Xmode.
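
The arithmetic in this example can be reproduced with two widening helpers.
This is only a sketch: zext8, sext8 and the two add functions are
illustrative names, with a 64-bit integer standing in for Xmode.

```cpp
#include <cstdint>

// Zero extend an 8-bit pattern to 64 bits (what an lbu-style load yields).
constexpr std::int64_t zext8(std::uint8_t bits)
{
  return bits;
}

// Sign extend an 8-bit pattern to 64 bits.
constexpr std::int64_t sext8(std::uint8_t bits)
{
  return bits >= 0x80 ? std::int64_t{bits} - 0x100 : std::int64_t{bits};
}

// Wide addition after zero extension: wrong for negative inputs.
constexpr std::int64_t add_zero_extended(std::uint8_t a, std::uint8_t b)
{
  return zext8(a) + zext8(b);
}

// Wide addition after sign extension: matches the signed 8-bit values.
constexpr std::int64_t add_sign_extended(std::uint8_t a, std::uint8_t b)
{
  return sext8(a) + sext8(b);
}
```

For the operands 0xff and 0x02, the zero-extended sum is 0x101 while the
sign-extended sum is 1.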

The following test suites pass with this patch.
* The rv64gcv full regression test.

PR target/117688

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_ssadd): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr117688-add-run-1-s16.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s32.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s64.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s8.c: New test.
* gcc.target/riscv/pr117688.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Refactor SAT_* operand rtx extend to reg help func [NFC]

This patch would like to refactor the helper function of the SAT_*
scalar. The helper function will convert the define_pattern ops
to the xmode reg for the underlying code-gen. This patch adds a
new parameter for ZERO_EXTEND or SIGN_EXTEND if the input is const_int
or the mode is non-Xmode.

The following test suites pass with this patch.
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Rename from ...
(riscv_extend_to_xmode_reg): Rename to and add rtx_code for
zero/sign extend if non-Xmode.
(riscv_expand_usadd): Leverage the renamed function with ZERO_EXTEND.
(riscv_expand_ussub): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

middle-end/118684 - fix fallout of wrong stack local alignment fix

When we expand BIT_FIELD_REF <x_2(D), 8, 8> we can end up creating
a stack local, running into the fix. But get_object_alignment
will return 8 for any SSA_NAME because that's not an "object" we
handle. Deal with handled components on registers by singling out
SSA_NAME bases, using their type alignment instead of
get_object_alignment (I considered "robustifying" get_object_alignment,
but decided not to at this point).

This fixes an ICE on gcc.dg/pr41123.c on arm as reported by the CI.

PR middle-end/118684
* expr.cc (expand_expr_real_1): When creating a stack local
during expansion of a handled component, when the base is
a SSA_NAME use its type alignment and avoid calling
get_object_alignment.

* gcc.dg/pr118684.c: Require automatic_stack_alignment.

c++: Return false from __is_bounded_array for zero-sized arrays [PR118655]

This is basically Marek's PR114479 r14-9759 __is_array fix applied to
__is_bounded_array as well. Similarly to that trait, when not using
the builtin it returned false for zero sized arrays but when using
the builtin it returns true.

2025-01-29 Jakub Jelinek <jakub@redhat.com>

PR c++/118655
* semantics.cc (trait_expr_value) <case CPTK_IS_BOUNDED_ARRAY>: Return
false for zero-sized arrays.

* g++.dg/ext/is_bounded_array.C: Extend.

Fortran: fix passing of component ref to assumed-rank dummy [PR118683]

While the fix for pr117774 addressed the passing of an inquiry reference
to an assumed-rank dummy, it missed the similar case of passing a component
reference. The newer testcase gfortran.dg/pr81978.f90 uncovered this
latent issue with a UBSAN instrumented compiler.

PR fortran/118683

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): The bounds update for
passing to assumed-rank dummies shall also handle component
references besides inquiry references.

Daily bump.

c++: constexpr VEC_INIT_EXPR [PR118285]

cxx_eval_vec_init_1 was doing the wrong thing for an array of
self-referential class type; just evaluating the TARGET_EXPR initializer
creates a new object that refers to the TARGET_EXPR_SLOT.  If we want it to
refer properly to the initialization target we need to provide it.

PR c++/118285

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_vec_init_1): Build INIT_EXPR for
initializing a class.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-opt7.C: New test.

c++: init-list opt and lvalue initializers [PR118673]

When fn returns {extension}, the ArrayRef in the initializer_list is
constructed to point to 'extension', the variable with static storage
duration. The optimization was copying extension's value into a temporary
array and constructing the ArrayRef to point to that temporary copy instead,
resulting in a dangling pointer. So suppress this optimization if the
element constructor takes a reference and the initializer is a non-mergeable
lvalue.
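
The shape of the problem can be sketched as follows.  This is an assumed
reduction for illustration, not the contents of initlist-opt6.C: ArrayRef
here is a hypothetical reference-like wrapper that stores the address of the
object it is constructed from, and the helper functions are made up.

```cpp
#include <initializer_list>

// Hypothetical reference-like wrapper: remembers the address of the
// element it was constructed from.
struct ArrayRef
{
  const int *data;
  ArrayRef(const int &elt) : data(&elt) {}
};

// Variable with static storage duration used as the initializer.
int extension = 42;

// True if the single element refers to 'extension' itself rather than to
// some copy of its value.
bool points_at_extension(std::initializer_list<ArrayRef> refs)
{
  return refs.size() == 1 && refs.begin()->data == &extension;
}

bool check()
{
  // The ArrayRef must be built from the lvalue 'extension', so its stored
  // pointer has to equal &extension.
  return points_at_extension({extension});
}
```

If the element were instead constructed from a temporary copy of
extension's value, the stored pointer would differ from &extension and
could dangle once the temporary is gone.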

PR c++/118673

gcc/cp/ChangeLog:

* call.cc (maybe_init_list_as_array): Check for lvalue
initializers.
* cp-tree.h (enum cp_lvalue_kind_flags): Add clk_mergeable.
* tree.cc (lvalue_kind): Return it.
(non_mergeable_glvalue_p): New.
(test_lvalue_kind): Adjust.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-opt6.C: New test.

arm: libgcc: make -specs=sync-*.specs compatible with LTO [PR118642]

The arm-none-eabi port provides some alternative implementations of
__sync_synchronize for different implementations of the architecture.
These can be selected using one of -specs=sync-{none,dmb,cp15dmb}.specs.

These specs fragments fail, however, when LTO is used because they
unconditionally add a --defsym=__sync_synchronize=<implementation> to
the linker arguments and that fails if libgcc is not added to the list
of libraries.

Fix this by only adding the defsym if libgcc will be passed to the
linker.

libgcc/

PR target/118642
* config/arm/sync-none.specs (link): Only add the defsym if
libgcc will be used.
* config/arm/sync-dmb.specs: Likewise.
* config/arm/sync-cp15dmb.specs: Likewise.

middle-end/118684 - wrongly aligned stack local during expansion

The following fixes a not properly aligned stack temporary created
during RTL expansion of a MEM_REF that we handle as a BIT_FIELD_REF
whose base was allocated to a register but which was originally
aligned to allow a larger load not trapping. While probably UB
in C the vectorizer creates aligned accesses that might overread
a (static) allocation because it is then known not to trap.

PR middle-end/118684
* expr.cc (expand_expr_real_1): When expanding a reference
based on a register and we end up needing a MEM make sure
that's aligned as the original reference required.

* gcc.dg/pr118684.c: New testcase.

input.cc: show line record indices in file_cache_slot::dump

gcc/ChangeLog:
* input.cc (file_cache_slot::dump): Show indices within
m_line_record when dumping entries.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

sarif output: escape braces in messages [PR118675]

gcc/ChangeLog:
PR other/118675
* diagnostic-format-sarif.cc: Define INCLUDE_STRING.
(escape_braces): New.
(set_string_property_escaping_braces): New.
(sarif_builder::make_message_object): Escape braces in the "text"
property.
(sarif_builder::make_message_object_for_diagram): Likewise, and
for the "markdown" property.
(sarif_builder::make_multiformat_message_string): Likewise for the
"text" property.
(selftest::test_message_with_braces): New.
(selftest::diagnostic_format_sarif_cc_tests): Call it.
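
A minimal sketch of what such escaping could look like, assuming that
literal braces are escaped by doubling them so that they are not read as
"{0}"-style placeholder delimiters. escape_braces_sketch is a hypothetical
name, not the function added by this patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: doubles every '{' and '}' in the input so that the
// braces are treated as literal text rather than placeholder delimiters.
std::string escape_braces_sketch(const std::string &text) {
  std::string out;
  out.reserve(text.size());
  for (char ch : text) {
    out += ch;
    if (ch == '{' || ch == '}')
      out += ch;  // emit the brace a second time
  }
  return out;
}

// Usage: text without braces is returned unchanged.
void example() {
  assert(escape_braces_sketch("a {b}") == "a {{b}}");
  assert(escape_braces_sketch("no braces") == "no braces");
}
```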

gcc/testsuite/ChangeLog:
PR other/118675
* gcc.dg/sarif-output/bad-binary-op.py: Update expected output for
escaping of braces in message text.
* gcc.dg/sarif-output/missing-semicolon.py: Likewise.
* gcc.dg/sarif-output/multiple-outputs.py: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

vect: Fix permutation counting in VLA-friendly path [PR117270]

vectorizable_slp_permutation_1 has two ways of generating the
permutations: one that looks for repeating patterns and one that
calculates the permutation index for every output element individually.
The former works for VLA and VLS whereas the latter only works for VLS.

There are two justifications for using the repeating code for VLS:
it gives more testing coverage, and it should reduce the analysis
overhead for common cases.  This PR kind-of demonstrates both:
the VLS coverage was showing a bug in the analysis shortcut.

The bug seems to go back to g:ab7e60cec1a6, which added the
repeating_p path.  It generated N copies of the permutation vector
in the repeating case, but didn't multiply the number of permutation
instructions for costing purposes by N.  So we seem to have been
undercounting ncopies>1 permutations all this time...

The problem became more visible with g:8157f3f2d211, which extended
the repeating code to handle more cases.

In the patch, I think noutputs is in practice always a multiple
of unpack_factor, but it seemed more future-proof to handle the
general case.

gcc/
PR tree-optimization/117270
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Make nperms
account for the number of times that each permutation will be used
during transformation.

c++: friend vs inherited guide confusion [PR117855]

We recently started using the lang_decl_fn::context field to track
inheritedness of a deduction guide (for C++23 inherited CTAD). This
new overloading of the field accidentally made DECL_FRIEND_CONTEXT
return non-NULL for inherited guides, which breaks the below testcase
during overload resolution with an inherited guide.

This patch fixes this by refining DECL_FRIEND_CONTEXT appropriately.

PR c++/117855

gcc/cp/ChangeLog:

* cp-tree.h (DECL_FRIEND_CONTEXT): Exclude deduction guides.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/class-deduction-inherited7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

tree-optimization/112859 - add comment

This adds a comment before the workaround, indicating flaky
dependence analysis.

PR tree-optimization/112859
* tree-loop-distribution.cc
(loop_distribution::pg_add_dependence_edges): Add comment.

arm: libbacktrace: Check if the compiler supports __sync atomics

Older versions of the Arm architecture lack support for __sync
operations directly in hardware and require calls into appropriate
operating-system hooks. But such hooks obviously don't exist in a
freestanding environment.

Consequently, it is incorrect to assume during configure that such
functions will exist and we need a configure-time check to determine
whether or not these routines will work.

libbacktrace:

* configure.ac: Always check if the compiler supports __sync
operations.
* configure: Regenerated.

[PR118663][LRA]: Change secondary memory mode only if there are regs holding the changed mode

My recent patch for PR118067 changes the secondary memory mode if
all regs of the pseudo reg class are prohibited in the secondary mode.
But the patch does not check the special case where the
corresponding target hook returns this mode although there are no hard
regs of the pseudo's class that can hold a value of the mode at all. This
results in the given PR, which this patch fixes.

gcc/ChangeLog:

PR target/118663
* lra-constraints.cc (invalid_mode_reg_p): Check empty
reg_class_contents.

gcc/testsuite/ChangeLog:

PR target/118663
* gcc.target/powerpc/pr118663.c: New.

tree-optimization/117424 - invalid LIM of trapping ref

The following addresses a bug in tree_could_trap_p that led to
hoisting of an access that may trap because it is out of bounds.
We only ensured the first accessed byte is within a decl there;
the patch makes sure the whole base of the reference is within it.
This is pessimistic if a handled component would then subset to
a sub-object within the decl but upcasting of a decl to larger
types should be uncommon, questionable, and wrong without
-fno-strict-aliasing.
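
The difference between the two checks can be sketched as follows. This is
only a model under simplifying assumptions (byte offsets and sizes as plain
integers); Access, first_byte_in_decl and whole_base_in_decl are
hypothetical names that do not appear in tree-eh.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an access into a declaration; all values in bytes.
struct Access {
  std::uint64_t decl_size;  // size of the underlying decl
  std::uint64_t offset;     // start of the accessed base within the decl
  std::uint64_t base_size;  // size of the accessed base
};

// Weaker check: only the first accessed byte must lie inside the decl.
bool first_byte_in_decl(const Access &a) {
  return a.offset < a.decl_size;
}

// Stricter check: the whole base must lie inside the decl.  Written so
// that the subtraction cannot wrap around.
bool whole_base_in_decl(const Access &a) {
  return a.offset <= a.decl_size && a.base_size <= a.decl_size - a.offset;
}

// Usage: an 8-byte base at offset 0 of a 4-byte decl starts in bounds but
// extends past the end of the decl.
void example() {
  const Access overread = {4, 0, 8};
  assert(first_byte_in_decl(overread));
  assert(!whole_base_in_decl(overread));
}
```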

The testcase is a bit fragile, but I could not devise a (portable)
way to ensure an out-of-bound access to a decl would fault.

PR tree-optimization/117424
* tree-eh.cc (tree_could_trap_p): Verify the base is
fully contained within a decl.

* gcc.dg/tree-ssa/ssa-lim-25.c: New testcase.

Clarify 'OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P' in 'gcc/tree-pretty-print.cc:dump_omp_clause'

In commit b7e20480630e3eeb9eed8b3941da3b3f0c22c969
"openmp: Relax handling of implicit map vs. existing device mappings",
'OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P' was added next to 'OMP_CLAUSE_MAP_IMPLICIT'
with comment: "NOTE: this is different than OMP_CLAUSE_MAP_IMPLICIT". However,
dumping it as '[implicit]' doesn't exactly help for telling the two apart; make
that '[runtime_implicit]'. Also, prepend a space character, similar to how
we're doing with other such attributes.

gcc/
* tree-pretty-print.cc (dump_omp_clause): Clarify
'OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P'.
gcc/testsuite/
* c-c++-common/gomp/defaultmap-4.c: Adjust.
* c-c++-common/gomp/defaultmap-5.c: Likewise.
* c-c++-common/gomp/target-implicit-map-1.c: Likewise.
* c-c++-common/gomp/target-implicit-map-2.c: Likewise.
* gfortran.dg/gomp/defaultmap-8.f90: Likewise.
* gfortran.dg/gomp/defaultmap-9.f90: Likewise.
* gfortran.dg/gomp/map-subarray.f90: Likewise.
* gfortran.dg/gomp/target-enter-exit-data.f90: Likewise.

Remove ChangeLog entry that shouldn't be there.

combine: Fix up make_extraction [PR118638]

The following testcase is miscompiled at -Os on x86_64-linux.
The problem is during make_compound_operation of
(ashiftrt:SI (ashift:SI (mult:SI (reg:SI 107 [ a_5 ])
            (const_int 3 [0x3]))
        (const_int 31 [0x1f]))
    (const_int 31 [0x1f]))
where it incorrectly returns
(mult:SI (sign_extract:SI (reg:SI 107 [ a_5 ])
        (const_int 2 [0x2])
        (const_int 0 [0]))
    (const_int 3 [0x3]))
which obviously isn't equivalent: the former returns either 0 or -1 depending
on the least significant bit of the multiplication,
the latter returns either 0 or -3 depending on the second least significant
bit of the multiplication argument.

The bug has been introduced in PR96998 r11-4563, which added handling of x
* (2^N) similar to x << N.  In the above case, pos is 0 and len is 1,
sign extracting a single least significant bit of the multiplication.
As 3 is not a power of 2, shift_amt is -1.
But IN_RANGE (-1, 1, 1 - 1) is still true, because the basic requirement of
IN_RANGE that LOWER is not greater than UPPER is violated.
The intention of using 1 as LOWER is to avoid matching multiplication by 1,
that really shouldn't appear in the IL.  But to avoid violating IN_RANGE
requirement, we need to verify that len is at least 2.
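
Why the empty range still matches can be seen with a small sketch of a
range test written with the unsigned-subtraction trick. It is modelled on
the shape of IN_RANGE, with fixed 64-bit types assumed; in_range and
shift_amt_ok are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Range test using unsigned subtraction.  Only meaningful when
// lower <= upper; otherwise upper - lower wraps to a huge value.
bool in_range(std::int64_t value, std::int64_t lower, std::int64_t upper) {
  const std::uint64_t v = static_cast<std::uint64_t>(value);
  const std::uint64_t lo = static_cast<std::uint64_t>(lower);
  const std::uint64_t hi = static_cast<std::uint64_t>(upper);
  return v - lo <= hi - lo;
}

// Guarded form: reject len <= 1 before testing, so the range passed to
// in_range is never empty.
bool shift_amt_ok(std::int64_t shift_amt, std::int64_t len) {
  return len > 1 && in_range(shift_amt, 1, len - 1);
}

// Usage: with len == 1 the range is [1, 0], and -1 is wrongly accepted
// unless len is checked first.
void example() {
  assert(in_range(-1, 1, 1 - 1));
  assert(!shift_amt_ok(-1, 1));
}
```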

I've added this len > 1 check to the inner if rather than outer because I
think for GCC 16 we should add a further optimization.
In the particular case of 1 least significant bit sign extraction from
multiplication by 3, we could actually say it is equivalent to
(sign_extract:SI (reg:SI 107 [ a_5 ])
        (const_int 1 [0x1])
        (const_int 0 [0]))
That is because 3 is an odd number and multiplication by 2 will yield the
least significant bit 0 (we are sign extracting just one) and so the
multiplication doesn't change anything on the outcome.
More generally, even for larger len, multiplication by C which is
(1 << X) + 1 where X is >= len should be optimizable just to extraction
of the multiplicand's least significant len bits.
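
The claimed equivalence can be checked on small inputs with a sketch like
the one below. It models 32-bit wrapping arithmetic only and is not the
suggested combine.cc optimization; sign_extract_low and
mult_preserves_low_bits are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extends the 'len' least significant bits of 'v' (1 <= len <= 31).
std::int32_t sign_extract_low(std::uint32_t v, int len) {
  assert(len >= 1 && len <= 31);
  const std::uint32_t mask = (std::uint32_t{1} << len) - 1;
  const std::uint32_t field = v & mask;
  const std::uint32_t sign = std::uint32_t{1} << (len - 1);
  // (field ^ sign) - sign sign-extends a len-bit field.
  return static_cast<std::int32_t>(field ^ sign)
         - static_cast<std::int32_t>(sign);
}

// Checks, for inputs 0..1023, whether multiplying by C = (1 << x) + 1
// leaves the sign-extracted low 'len' bits unchanged (0 <= x <= 31).
bool mult_preserves_low_bits(int len, int x) {
  const std::uint32_t c = (std::uint32_t{1} << x) + 1;
  for (std::uint32_t a = 0; a < 1024; ++a)
    if (sign_extract_low(a * c, len) != sign_extract_low(a, len))
      return false;
  return true;
}

// Usage: C == 3 with len == 1 is the case from the testcase; len == 2
// with x == 1 violates x >= len and does not hold.
void example() {
  assert(mult_preserves_low_bits(1, 1));
  assert(!mult_preserves_low_bits(2, 1));
}
```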

2025-01-28  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/118638
* combine.cc (make_extraction): Only optimize (mult x 2^n) if len is
larger than 1.

* gcc.c-torture/execute/pr118638.c: New test.

Add tests for implied copy of variables in reduction clause.

The OpenACC reduction clause on compute construct implies a copy clause
for each reduction variable [1]. This patch adds tests to check if the
implied copy is being generated. The check covers various types and
operators as described in the specification.

[1] OpenACC 2.7 Specification section 2.5.13

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/implied-copy-1.c: New test.
* c-c++-common/goacc/implied-copy-2.c: New test.
* g++.dg/goacc/implied-copy.C: New test.
* gcc.dg/goacc/implied-copy.c: New test.
* gfortran.dg/goacc/implied-copy-1.f90: New test.
* gfortran.dg/goacc/implied-copy-2.f90: New test.

vect: Remove extra newline from dump message

Noticed while working on PR117270, where it was a distracting
difference for before-after comparisons.

gcc/
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
extra newline from dump message.

c: For array element type drop qualifiers but keep other properties of the element type [PR116357]

In the following testcase we error on the first case because it is
trying to construct an array from overaligned type, but if there are
qualifiers, we accept it silently (unlike in C++ which diagnoses all 3).

The problem is that grokdeclarator if TYPE_QUALS (element_type) is
non-zero just uses TYPE_MAIN_VARIANT; that loses not just the qualifiers
but also attributes, alignment etc.

The following patch uses c_build_qualified_type with TYPE_UNQUALIFIED instead,
which will be in the common case the same as TYPE_MAIN_VARIANT if the
checks are satisfied for it, but if not, will look up different unqualified
type or even create it if there is none.

2025-01-28 Jakub Jelinek <jakub@redhat.com>

PR c/116357
* c-decl.cc (grokdeclarator): Use c_build_qualified_type with
TYPE_UNQUALIFIED instead of TYPE_MAIN_VARIANT.

* gcc.dg/pr116357.c: New test.

[PR target/114085] Fix H8 constraint issue which led to ICE

Nowhere near the top of my list, but a quick looksie Sunday led to an easy to
fix backend bug.  It's not a regression, but given it's the H8 backend I think
we've safely got a degree of freedom here.

The H8 has a constraint "U" which allowed both a subset of MEMs and REGs, so it
wasn't marked as a memory constraint.  LRA doesn't really handle this well -- a
pseudo which didn't get a hard reg was replaced by its MEM.  The stack slot
doesn't fit the limited addressing forms available and LRA didn't know it just
needed to reload the address into a reg.

Fixed by removing REG from the "U" constraint, turning "U" into a memory
constraint and adjusting a few patterns to allow "rU" instead of "U".

We don't really support C++ on the H8 and as a result libstdc++ won't build.
Interestingly enough that also keeps the C++ tests from working, even for a
compile-only test.  So no testcase.  Though I did check the reduced and
original test manually and ran it through my tester without any regressions.

PR target/114085
gcc/
* config/h8300/constraints.md (U): No longer accept REGs.
* config/h8300/logical.md (andqi3_2): Use "rU" rather than "U".
(andqi3_2_clobber_flags, andqi3_1, <code>qi3_1): Likewise.
* config/h8300/testcompare.md (tst_extzv_1_n): Likewise.

Daily bump.

c++: only strip conversions for deduction [PR118632]

In r15-2761 I changed unify to always strip IMPLICIT_CONV_EXPR from PARM.
In this testcase that leads to comparing nullptr to (int*)0, and failing
because they aren't the same. Let's only strip conversions if we're
actually going to try to deduce template arguments.

While we're at it, let's move this after the early exits.

And with this adjustment we can remove the workaround for mangle57.C.

PR c++/118632

gcc/cp/ChangeLog:

* pt.cc (unify): Only strip conversion if deducible_expression.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nontype7.C: New test.

RISC-V: Add another test for FRM elimination bug [PR118646]

The issue is the same as PR118103 and fixed by commit 55d288d4ff53
("RISC-V: Make FRM as global register [PR118103]").

Essentially FRM save/restore were getting eliminated because FRM reg
semantics were not being modelled correctly.

In this case it showed up as SPEC2017 527.cam4 runtime aborts in
glibc:round_away() due to non-canonical rounding mode showing up,
"leaking" earlier in the call chain because such rounding mode
save/restore was getting eliminated.

PR target/118646

gcc/testsuite/ChangeLog:
* gfortran.target/riscv/rvv/pr118646.f90 (New Test).

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

c++: Don't prune constant capture proxies only used in array dimensions [PR114292]

We currently ICE upon the following valid (under -Wno-vla) code

=== cut here ===
void f(int c) {
constexpr int r = 4;
[&](auto) { int t[r * c]; }(0);
}
=== cut here ===

When parsing the lambda body, and more specifically the multiplication,
we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED, which indicates to
prune_lambda_captures that it might be possible to optimize out some
captures.

The problem is that prune_lambda_captures then misses the use of the r
capture (because neither walk_tree_1 nor cp_walk_subtrees walks the
dimensions of array types - here "r * c"), hence believes the capture
can be pruned... and we trip on an assert when instantiating the lambda.

This patch changes cp_walk_subtrees so that (1) when walking a
DECL_EXPR, it also walks the DECL's type, and (2) when walking an
INTEGER_TYPE, it also walks its TYPE_{MIN,MAX}_VALUE. Note that #2 makes
a <case INTEGER_TYPE> redundant in for_each_template_parm_r, so the patch
removes it.

PR c++/114292

gcc/cp/ChangeLog:

* pt.cc (for_each_template_parm_r) <INTEGER_TYPE>: Remove case
now handled by cp_walk_subtrees.
* tree.cc (cp_walk_subtrees): Walk the type of DECL_EXPR
declarations, as well as the TYPE_{MIN,MAX}_VALUE of
INTEGER_TYPEs.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-ice4.C: New test.

RISC-V: testsuite: Fix reduc-8.c and reduc-9.c

In both tests we expect a VEC_SHL_INSERT expression but we now add the
initial value at the end. Just remove that scan check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/reduc-8.c: Remove
VEC_SHL_INSERT check.
* gcc.target/riscv/rvv/autovec/reduc/reduc-9.c: Ditto.

RISC-V: testsuite: Fix gather_load_64-12-zvbb.c

The test fails with _zvfh because we vectorize more. Just adjust the
test expectations.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c:
Distinguish between zvfh and !zvfh.

RISC-V: Disable two-source permutes for now [PR117173].

After testing on the BPI (4.2% improvement for x264 input 1, 4.4% for
input 2) and the discussion in PR117173 I figured it's best to disable
the two-source permutes by default for now.

The patch adds a parameter "riscv-two-source-permutes" which restores
the old behavior.

PR target/117173

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_generic_patterns): Only
support single-source permutes by default.
* config/riscv/riscv.opt: New param "riscv-two-source-permutes".

gcc/testsuite/ChangeLog:

* gcc.dg/fold-perm-2.c: Run with two-source permutes.
* gcc.dg/pr54346.c: Ditto.

Fortran: fix bogus diagnostics on renamed interface import [PR110993]

PR fortran/110993

gcc/fortran/ChangeLog:

* frontend-passes.cc (check_externals_procedure): Do not compare
interfaces of a non-bind(C) procedure against a bind(C) global one.
(check_against_globals): Use local name from rename-on-use in the
search for interfaces.

gcc/testsuite/ChangeLog:

* gfortran.dg/use_rename_14.f90: New test.

c++: Use mapped reads and writes when munmap and msync are available

Module support is broken when MAPPED_READING and MAPPED_WRITING
are defined to 0.  This causes internal compiler errors in the
permissive-error-1.C and permissive-error-2.C tests.

HP-UX 11.11 doesn't define _POSIX_MAPPED_FILES but it does have
munmap and msync.  Testing indicates support is sufficient for
C++ modules, so use checks for these functions instead of the
_POSIX_MAPPED_FILES check.

2025-01-27  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR c++/116524
* configure.ac: Check for munmap and msync.
* configure: Regenerate.
* config.in: Regenerate.

gcc/cp/ChangeLog:
* module.cc: Test HAVE_MUNMAP and HAVE_MSYNC instead of
_POSIX_MAPPED_FILES > 0.

Remove mistakenly committed files

Sorry.

2025-01-27 Jakub Jelinek <jakub@redhat.com>

* g++.dg/modules/dr2867-1_a.H.jj1: Remove.
* g++.dg/modules/dr2867-1_b.C.jj1: Remove.
* g++.dg/modules/dr2867-2_a.H.jj1: Remove.
* g++.dg/modules/dr2867-2_b.C.jj1: Remove.
* g++.dg/modules/dr2867-3_a.H.jj1: Remove.
* g++.dg/modules/dr2867-3_b.C.jj1: Remove.
* g++.dg/modules/dr2867-4_a.H.jj1: Remove.
* g++.dg/modules/dr2867-4_b.C.jj1: Remove.