git.ipfire.org Git - thirdparty/gcc.git/log

Update ChangeLog and version files for release

Revert "libcpp: Fix up ##__VA_OPT__ handling [PR89971]"

This reverts commit 86b98701bf927ea438354723932e623557c04e42.

Daily bump.

Fortran: fix error recovery on invalid array section

gcc/fortran/ChangeLog:

PR fortran/105230
* expr.c (find_array_section): Correct logic to avoid NULL
pointer dereference on invalid array section.

gcc/testsuite/ChangeLog:

PR fortran/105230
* gfortran.dg/pr105230.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
(cherry picked from commit 0acdbe29f66017fc5cca40dcbd72a0dd41491d07)

Fortran: improve error recovery on invalid array section

gcc/fortran/ChangeLog:

PR fortran/104849
* expr.c (find_array_section): Avoid NULL pointer dereference on
invalid array section.

gcc/testsuite/ChangeLog:

PR fortran/104849
* gfortran.dg/pr104849.f90: New test.

(cherry picked from commit 22015e77d3e45306077396b9de8a8a28bb67fb20)

Fortran: a RECURSIVE procedure cannot be an INTRINSIC

gcc/fortran/ChangeLog:

PR fortran/105138
* intrinsic.c (gfc_is_intrinsic): When a symbol refers to a
RECURSIVE procedure, it cannot be an INTRINSIC.

gcc/testsuite/ChangeLog:

PR fortran/105138
* gfortran.dg/recursive_reference_3.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
(cherry picked from commit d46685b04071a485b56de353d997a866bfc8caba)

libstdc++: Fix status docs for <bit> support

libstdc++-v3/ChangeLog:

* doc/html/manual/status.html: Regenerate.
* doc/xml/manual/status_cxx2020.xml: Fix supported version for
C++20 bit operations and power-of-two operations.

(cherry picked from commit 64648821f151b0f86e16185bef7f0be5635fd737)

[AArch64] add barriers to ool __sync builtins

2022-05-13 Sebastian Pop <spop@amazon.com>

gcc/
PR target/105162
* config/aarch64/aarch64-protos.h (atomic_ool_names): Increase dimension
of str array.
* config/aarch64/aarch64.c (aarch64_atomic_ool_func): Call
memmodel_from_int and handle MEMMODEL_SYNC_*.
(DEF0): Add __aarch64_*_sync functions.

gcc/testsuite/
PR target/105162
* gcc.target/aarch64/sync-comp-swap-ool.c: New.
* gcc.target/aarch64/sync-op-acquire-ool.c: New.
* gcc.target/aarch64/sync-op-full-ool.c: New.
* gcc.target/aarch64/target_attr_20.c: Update check.
* gcc.target/aarch64/target_attr_21.c: Same.

libgcc/
PR target/105162
* config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
* config/aarch64/t-lse: Add a 5th memory model for _sync functions.

Daily bump.

c++: static memfn from non-dependent base [PR101078]

After my patch for PR91706, or before that with the qualified call,
tsubst_baselink returned a BASELINK with BASELINK_BINFO indicating a base of
a still-dependent derived class. We need to look up the relevant base binfo
in the substituted class.

PR c++/101078

gcc/cp/ChangeLog:

* pt.c (tsubst_baselink): Update binfos in non-dependent case.

gcc/testsuite/ChangeLog:

* g++.dg/template/access39.C: New test.

c++: extern template and static data member [PR99066]

'extern template' should mean that the relevant symbols are never emitted.
But in this case we were assuming that DECL_EXTERNAL was already set on the
variable, so we just needed to clear DECL_NOT_REALLY_EXTERN. Since
DECL_EXTERNAL was not set, we emitted a definition of npos.

gcc/cp/ChangeLog:

PR c++/99066
* pt.c (mark_decl_instantiated): Set DECL_EXTERNAL.

gcc/testsuite/ChangeLog:

PR c++/99066
* g++.dg/cpp0x/extern_template-6.C: New test.

c++: missing dtor with -fno-elide-constructors [PR100838]

tf_no_cleanup only applies to the outermost TARGET_EXPR, and we already
clear it for nested calls in build_over_call, but in this case both
constructor calls came from convert_like, so we need to clear it in the
recursive call as well. This revealed that we were adding an extra
ck_rvalue in direct-initialization cases where it was wrong.

PR c++/100838
PR c++/105265

gcc/cp/ChangeLog:

* call.c (convert_like_internal): Clear tf_no_cleanup when
recursing.
(build_user_type_conversion_1): Only add ck_rvalue if
LOOKUP_ONLYCONVERTING.

gcc/testsuite/ChangeLog:

* g++.dg/init/no-elide2.C: New test.
* g++.dg/cpp0x/initlist-new6.C: New test.

c++: NRV in lambda in template [PR91217]

tsubst_lambda_expr was producing a function with two blocks that claimed to
be the outermost block in the function body, one from the call to
start_lambda_function in tsubst_lambda_expr, and one from tsubsting the
block added by start_lambda_function when we first parsed the lambda. This
messed with the named return value optimization, which only works for
variables in the outermost block.

gcc/cp/ChangeLog:

PR c++/91217
* pt.c (tsubst_lambda_expr): Skip the body block from
DECL_SAVED_TREE.

gcc/testsuite/ChangeLog:

PR c++/91217
* g++.dg/opt/nrv20.C: New test.

c++: array new initialized from a call [PR99643]

Here the get_foo() call results in a TARGET_EXPR, which we strip in
massage_init_elt, but then when build_vec_init tries to use it to initialize
the array element we crash because build_aggr_init expects a class rvalue to
have a TARGET_EXPR. So don't strip it.

The stripping was added in r206639 for PR59659, so I checked that removing
it didn't significantly increase compile time or memory usage for that
testcase; compile time was unaffected, memory usage increased by 0.00004%.

gcc/cp/ChangeLog:

PR c++/99643
* typeck2.c (massage_init_elt): Don't strip TARGET_EXPR.

gcc/testsuite/ChangeLog:

PR c++/99643
* g++.dg/cpp0x/initlist-new5.C: New test.

c++: alignment of local typedef in template [PR65211]

Because common_handle_aligned_attribute only applies the alignment to the
TREE_TYPE of a typedef, not the DECL_ORIGINAL_TYPE, we need to copy it
explicitly in tsubst.

PR c++/65211

gcc/cp/ChangeLog:

* pt.c (tsubst_decl) [TYPE_DECL]: Copy TYPE_ALIGN.

gcc/testsuite/ChangeLog:

* g++.target/i386/vec-tmpl1.C: New test.

c++: template conversion op [PR101698]

Asking for conversion to a dependent type also makes a BASELINK dependent.

PR c++/101698

gcc/cp/ChangeLog:

* pt.c (tsubst_baselink): Also check dependent optype.

gcc/testsuite/ChangeLog:

* g++.dg/template/conv19.C: New test.

c++: NRV and ref-extended temps [PR101442]

This issue goes back to r83221, where the cleanup for extended ref temps
changed from being unconditional to being tied to the declaration they
formed part of the initializer for.

The named return value optimization changes the cleanup for the NRV variable
to only run on the EH path; we don't want that change to affect temporary
cleanups. The perform_member_init change isn't necessary (there 'decl' is a
COMPONENT_REF), it's just for consistency.

PR c++/101442

gcc/cp/ChangeLog:

* decl.c (cp_finish_decl): Don't pass decl to push_cleanup.
* init.c (perform_member_init): Likewise.
* semantics.c (push_cleanup): Adjust comment.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-nrv1.C: New test.

c++: operator new lookup [PR98249]

The standard says, as we quote in the comment just above, that if we don't
find operator new in the allocated type, it should be looked up in the
global scope. This is specifically ::, not just any namespace, and we
already give an error for an operator new declared in any other namespace.

PR c++/98249

gcc/cp/ChangeLog:

* call.c (build_operator_new_call): Just look in ::.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/new3.C: New test.

c++: constexpr trivial -fno-elide-ctors [PR104646]

The constexpr constructor checking code got confused by the expansion of a
trivial copy constructor; we don't need to do that checking for defaulted
ctors, anyway.

PR c++/104646

gcc/cp/ChangeLog:

* constexpr.c (maybe_save_constexpr_fundef): Don't do extra
checks for defaulted ctors.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-fno-elide-ctors1.C: New test.

c++: assignment to temporary [PR59950]

Given build_this of a TARGET_EXPR, cp_build_fold_indirect_ref returns the
TARGET_EXPR. But that's the wrong value category for the result of the
defaulted class assignment operator, which returns an lvalue, so we need to
actually build the INDIRECT_REF.

PR c++/59950

gcc/cp/ChangeLog:

* call.c (build_over_call): Use cp_build_indirect_ref.

gcc/testsuite/ChangeLog:

* g++.dg/init/assign2.C: New test.

c++: C++17 constexpr static data member linkage [PR99901]

C++17 makes constexpr static data members implicitly inline variables.  In
C++14, a subsequent out-of-class declaration is the definition.  We want to
continue emitting a symbol for such a declaration in C++17 mode, for ABI
compatibility with C++14 code that wants to refer to it.

Normally I'd distinguish in- and out-of-class declarations by looking at
DECL_IN_AGGR_P, but we never set DECL_IN_AGGR_P on inline variables.  I
think that's wrong, but don't want to mess with it so close to release.
Conveniently, we already have a test for in-class declaration earlier in the
function.

gcc/cp/ChangeLog:

PR c++/99901
* decl.c (cp_finish_decl): mark_needed an implicitly inline
static data member with an out-of-class redeclaration.

gcc/testsuite/ChangeLog:

PR c++/99901
* g++.dg/cpp1z/inline-var9.C: New test.

c++: nested generic lambda in DMI [PR101717]

We were already checking COMPLETE_TYPE_P to recognize instantiation of a
generic lambda, but didn't consider that we might be nested in a non-generic
lambda.

PR c++/101717

gcc/cp/ChangeLog:

* lambda.c (lambda_expr_this_capture): Check all enclosing
lambdas for completeness.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-generic-this4.C: New test.

c++: PMF template parm and noexcept [PR90664]

The constexpr code only wants to preserve PTRMEM_CST in conversions if the
conversions are only qualification conversions; dropping noexcept counts as
a qualification adjustment in overload resolution, so let's include it here.

gcc/cp/ChangeLog:

PR c++/90664
* cvt.c (can_convert_qual): Check fnptr_conv_p.

gcc/testsuite/ChangeLog:

PR c++/90664
* g++.dg/cpp1z/noexcept-type24.C: New test.

c++: lambda in DMI in class template [PR95870]

Here enclosing_instantiation_of was failing to find a match because otctx is
struct S<T> and current_function_decl is S<int>::S(), so the latter has more
function contexts, and we end up trying to compare S() to NULL_TREE.

After spending a bit of time working on establishing the correspondence in
this case (class <=> constructor), it occurred to me that we could just use
DECL_SOURCE_LOCATION, which is unique for lambdas, since they cannot be
redeclared. Since we're so close to release, for now I'm only doing this
for the case that was failing before.

gcc/cp/ChangeLog:

PR c++/95870
* pt.c (enclosing_instantiation_of): Compare DECL_SOURCE_LOCATION if
there is no enclosing non-lambda function.

gcc/testsuite/ChangeLog:

PR c++/95870
* g++.dg/cpp0x/lambda/lambda-nsdmi10.C: New test.

c++: -Wunused, constant, and generic lambda [PR96311]

We never called mark_use for a return value in a function with dependent
return type. In that situation we don't know if the use is as an rvalue or
lvalue, but we can use mark_exp_read instead.

gcc/cp/ChangeLog:

PR c++/96311
* typeck.c (check_return_expr): Call mark_exp_read in dependent
case.

gcc/testsuite/ChangeLog:

PR c++/96311
* g++.dg/cpp1y/lambda-generic-Wunused.C: New test.

c++: access checking in aggregate initialization [PR96673]

We were deferring access checks while parsing B<int>{}, didn't adjust that
when we went to instantiate the default member initializer for B::c,
deferred access checking for C::C, and then checked it after parsing
B<int>{}, back in the main() context which has no access. We need to do the
access checks in the class context of the DMI.

I tried fixing this in push_to/pop_from_top_level, but that caused several
regressions.

gcc/cp/ChangeLog:

PR c++/96673
* init.c (get_nsdmi): Don't defer access checking.

gcc/testsuite/ChangeLog:

PR c++/96673
* g++.dg/cpp1y/nsdmi-aggr13.C: New test.

c++: constexpr, inheritance, and local class [PR91933]

Here we complained about referring to nm3 from the local class member
function because referring to the base class subobject involved taking the
variable's address. Let's shortcut this case to avoid that.

gcc/cp/ChangeLog:

PR c++/91933
* class.c (build_base_path): Shortcut simple non-pointer case.

gcc/testsuite/ChangeLog:

PR c++/91933
* g++.dg/cpp0x/constexpr-base7.C: New test.

c++: alias template equivalence and cv-quals [PR100032]

We also need to check that the cv-qualifiers are the same.

gcc/cp/ChangeLog:

PR c++/100032
* pt.c (get_underlying_template): Compare TYPE_QUALS.

gcc/testsuite/ChangeLog:

PR c++/100032
* g++.dg/cpp0x/alias-decl-equiv1.C: New test.

re PR c++/67184 (Missed optimization with C++11 final specifier)

PR c++/67184
PR c++/69445
* call.c (build_new_method_call_1): Remove set but not used variable
binfo.

c++: fix testcases

Only the first dg-error in constexpr-array23.C needs to be xfailed on the 9
branch.

lambda-pack-init6.C depends on the fix for PR94546; rather than backport
that as well, let's remove the test.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-array23.C: Remove xfail.
* g++.dg/cpp2a/lambda-pack-init6.C: Removed.

Daily bump.

testsuite: Fix up pr102860.f90 for gcc 9 [PR105570]

Apparently -mcpu=power10 is gcc 10+, but the PR102860 change otherwise
made sense also for 9.x. So just adjusting testcase...

2022-05-11 Jakub Jelinek <jakub@redhat.com>

PR middle-end/102860
PR testsuite/105570
* gfortran.dg/pr102860.f90: Use -mcpu=power9 instead of -mcpu=power10.

c++: ICE when building builtin operator->* set [PR103455]

Here when constructing the builtin operator->* candidate set according
to the available conversion functions for the operand types, we end up
considering a candidate with C1=T (through B's dependent conversion
function) and C2=F, during which we crash from DERIVED_FROM_P because
dependent_type_p sees a TEMPLATE_TYPE_PARM outside of a template
context.

Sidestepping the question of whether we should be considering such a
dependent conversion function here in the first place, it seems futile
to test DERIVED_FROM_P for anything other than an actual class type, so
this patch fixes this ICE by simply guarding the DERIVED_FROM_P test
with CLASS_TYPE_P instead of MAYBE_CLASS_TYPE_P.

PR c++/103455

gcc/cp/ChangeLog:

* call.c (add_builtin_candidate) <case MEMBER_REF>: Test
CLASS_TYPE_P instead of MAYBE_CLASS_TYPE_P.

gcc/testsuite/ChangeLog:

* g++.dg/overload/builtin6.C: New test.

(cherry picked from commit 04f19580e8dbdbc7366d0f5fd068aa0cecafdc9d)

c++: deleted fn and noexcept inst [PR101532, PR104225]

Here when attempting to use B's implicitly deleted default constructor,
mark_used rightfully returns false, but for the wrong reason: it
tries to instantiate the synthesized noexcept specifier which then only
silently fails because get_defaulted_eh_spec suppresses diagnostics
for deleted functions. This lack of diagnostics causes us to crash on
the first testcase below (thanks to the assert in finish_expr_stmt), and
silently accept the second testcase.

To fix this, this patch makes mark_used avoid attempting to instantiate
the noexcept specifier of a deleted function, so that we'll instead
directly reject (and diagnose) the function due to its deletedness.

PR c++/101532
PR c++/104225

gcc/cp/ChangeLog:

* decl2.c (mark_used): Don't consider maybe_instantiate_noexcept
on a deleted function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-template21.C: New test.
* g++.dg/cpp0x/nsdmi-template21a.C: New test.

(cherry picked from commit bc90dd0ecf02e11d47d1af7f627e2e2acaa40106)

c++: Fix deduction with reference NTTP [PR83476]

In the testcase ref11.C below, during deduction for the call f(a),
uses_deducible_template_parms returns false for the dependent
specialization A<V> because the generic template argument V here is
wrapped in an implicit INDIRECT_REF (formed from template_parm_to_arg).
Since uses_deducible_template_parms returns false, unify_one_argument
exits early without ever attempting to deduce 'n' for 'V'. This patch
fixes this by making deducible_expression look through such implicit
INDIRECT_REFs.

gcc/cp/ChangeLog:

PR c++/83476
PR c++/99885
* pt.c (deducible_expression): Look through implicit
INDIRECT_REFs as well.

gcc/testsuite/ChangeLog:

PR c++/83476
PR c++/99885
* g++.dg/cpp1z/class-deduction85.C: New test.
* g++.dg/template/ref11.C: New test.

(cherry picked from commit 2ccc05a5141506fde0e20dec702c717fd67bf6ee)

g++.dg/gomp/clause-3.C: Fix - missing in r12-438-g1580fc7 [PR100422]

gcc/testsuite/
PR testsuite/100422
* g++.dg/gomp/clause-3.C: Use 'reduction(&:..)' instead of '...(&&:..)'.

(cherry picked from commit af4e4d35f0b84d7c2f57a7b682a09116e9911142)

asan: Fix up asan_redzone_buffer::emit_redzone_byte [PR105396]

On the following testcase, we have in main's frame 3 variables,
some red zone padding, 4 byte d, followed by 12 bytes of red zone padding, then
8 byte b followed by 24 bytes of red zone padding, then 40 bytes c followed
by some red zone padding.
The intended content of shadow memory for that is (note, each byte describes
8 bytes of memory):
f1 f1 f1 f1 04 f2 00 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3
left red    d  mr b  middle r c              right red zone

f1 is left red zone magic
f2 is middle red zone magic
f3 is right red zone magic
00 when all 8 bytes are accessible
01-07 when only 1 to 7 bytes are accessible followed by inaccessible bytes

The -fdump-rtl-expand-details dump makes it clear that it misbehaves:
Flushing rzbuffer at offset -160 with: f1 f1 f1 f1
Flushing rzbuffer at offset -128 with: 04 f2 00 00
Flushing rzbuffer at offset -128 with: 00 00 00 f2
Flushing rzbuffer at offset -96 with: f2 f2 00 00
Flushing rzbuffer at offset -64 with: 00 00 00 f3
Flushing rzbuffer at offset -32 with: f3 f3 f3 f3
In the end we end up with
f1 f1 f1 f1 00 00 00 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3
shadow bytes because at offset -128 there are 2 overlapping stores
as asan_redzone_buffer::emit_redzone_byte has flushed the temporary 4 byte
buffer in the middle.

The function is called with an offset and value.  If the passed offset is
consecutive with the prev_offset + buffer size (off == offset), then
we handle it correctly, similarly if the new offset is far enough from the
old one (we then flush whatever was in the buffer and if needed add up to 3
bytes of 00 before actually pushing value.

But what isn't handled correctly is when the offset isn't consecutive to
what has been added last time, but it is in the same 4 byte word of shadow
memory (32 bytes of actual memory), like the above case where
we have consecutive 04 f2 and then skip one shadow memory byte (aka 8 bytes
of real memory) and then want to emit f2.  Emitting that as a store
of little-endian 0x0000f204 followed by a store of 0xf2000000 to the same
address doesn't work, we want to emit 0xf200f204.

The following patch does that by pushing 1 or 2 00 bytes.
Additionally, as a small cleanup, instead of using
      m_shadow_bytes.safe_push (value);
      flush_if_full ();
in all of if, else if and else bodies it sinks those 2 stmts to the end
of function as all do the same thing.

2022-04-27  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/105396
* asan.c (asan_redzone_buffer::emit_redzone_byte): Handle the case
where offset is bigger than off but smaller than m_prev_offset + 32
bits by pushing one or more 0 bytes.  Sink the
m_shadow_bytes.safe_push (value); flush_if_full (); statements from
all cases to the end of the function.

* gcc.dg/asan/pr105396.c: New test.

(cherry picked from commit 9715f10c0651c9549b479b69d67be50ac4bd98a6)

sparc: Preserve ORIGINAL_REGNO in epilogue_renumber [PR105257]

The following testcase ICEs, because the pic register is
(reg:DI 24 %i0 [109]) and is used in the delay slot of a return.
We invoke epilogue_renumber and that changes it to
(reg:DI 8 %o0) which no longer satisfies sparc_pic_register_p
predicate, so we don't recognize the insn anymore.

The following patch fixes that by preserving ORIGINAL_REGNO if
specified, so we get (reg:DI 8 %o0 [109]) instead.

2022-04-19 Jakub Jelinek <jakub@redhat.com>

PR target/105257
* config/sparc/sparc.c (epilogue_renumber): If ORIGINAL_REGNO,
use gen_raw_REG instead of gen_rtx_REG and copy over also
ORIGINAL_REGNO. Use return 0; instead of /* fallthrough */.

* gcc.dg/pr105257.c: New test.

(cherry picked from commit eeca2b8bd03f57c59c6cf48bf6b9bd6dc86924f6)

c++: Fix up CONSTRUCTOR_PLACEHOLDER_BOUNDARY handling [PR105256]

The CONSTRUCTOR_PLACEHOLDER_BOUNDARY bit is supposed to separate
PLACEHOLDER_EXPRs that should be replaced by one object or subobjects of it
(variable, TARGET_EXPR slot, ...) from other PLACEHOLDER_EXPRs that should
be replaced by different objects or subobjects.
The bit is set when finding PLACEHOLDER_EXPRs inside of a CONSTRUCTOR, not
looking into nested CONSTRUCTOR_PLACEHOLDER_BOUNDARY ctors, and we prevent
elision of TARGET_EXPRs (through TARGET_EXPR_NO_ELIDE) whose initializer
is a CONSTRUCTOR_PLACEHOLDER_BOUNDARY ctor.  The following testcase ICEs
though, we don't replace the placeholders in there at all, because
CONSTRUCTOR_PLACEHOLDER_BOUNDARY isn't set on the TARGET_EXPR_INITIAL
ctor, but on a ctor nested in such a ctor.  replace_placeholders should be
run on the whole TARGET_EXPR slot.

So, the following patch fixes it by moving the CONSTRUCTOR_PLACEHOLDER_BOUNDARY
bit from nested CONSTRUCTORs to the CONSTRUCTOR containing those (but only
if it is closely nested, if there is some other tree sandwiched in between,
it doesn't do it).

2022-04-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/105256
* typeck2.c (process_init_constructor_array,
process_init_constructor_record, process_init_constructor_union): Move
CONSTRUCTOR_PLACEHOLDER_BOUNDARY flag from CONSTRUCTOR elements to the
containing CONSTRUCTOR.

* g++.dg/cpp0x/pr105256.C: New test.

(cherry picked from commit eb03e424598d30fed68801af6d6ef6236d32e32e)

i386: Fix ICE caused by ix86_emit_i387_log1p [PR105214]

The following testcase ICEs, because ix86_emit_i387_log1p attempts to
emit something like
  if (cond)
    some_code1;
  else
    some_code2;
and emits a conditional jump using emit_jump_insn (standard way in
the file) and an unconditional jump using emit_jump.
The problem with that is that if there is pending stack adjustment,
it isn't emitted before the conditional jump, but is before the
unconditional jump and therefore stack is adjusted only conditionally
(at the end of some_code1 above), which makes dwarf2 pass unhappy about it
but is a serious wrong-code even if it doesn't ICE.

This can be fixed either by emitting pending stack adjust before the
conditional jump as the following patch does, or by not using
  emit_jump (label2);
and instead hand inlining what that function does except for the
pending stack adjustment, like:
  emit_jump_insn (targetm.gen_jump (label2));
  emit_barrier ();
In that case there will be no stack adjustment in the sequence and
it will be done later on somewhere else.

2022-04-12  Jakub Jelinek  <jakub@redhat.com>

PR target/105214
* config/i386/i386.c (ix86_emit_i387_log1p): Call
do_pending_stack_adjust.

* gcc.dg/asan/pr105214.c: New test.

(cherry picked from commit d481d13786cb84f6294833538133dbd6f39d2e55)

builtins: Fix up expand_builtin_int_roundingfn_2 [PR105211]

The expansion of __builtin_iround{,f,l} etc. builtins in some cases
emits calls to a different fallback builtin. To locate the right builtin
it uses mathfn_built_in_1 with the type of the first argument.
If its TYPE_MAIN_VARIANT is {float,double,long_double}_type_node, all is
fine, but on the following testcase, because GIMPLE considers scalar
float conversions between types with the same mode as useless,
TYPE_MAIN_VARIANT of the arg's type is float32_type_node and because there
isn't __builtin_lroundf32 returns NULL and we ICE.

This patch will first try the type of the first argument of the builtin's
prototype (so that say on sizeof(double)==sizeof(long double) target it honors
whether it was a *l or non-*l call; though even that can't be 100% trusted,
user could incorrectly prototype it) and as fallback the type argument.
If neither works, doesn't fallback.

2022-04-11 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/105211
* builtins.c (expand_builtin_int_roundingfn_2): If mathfn_built_in_1
fails for TREE_TYPE (arg), retry it with
TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl))) and if even that
fails, emit call normally.

* gcc.dg/pr105211.c: New test.

(cherry picked from commit 91a38e8a848c61b2e23ee277306dc8cd194d135b)

c-family: Initialize ridpointers for __int128 etc. [PR105186]

The following testcase ICEs with C++ and is incorrectly rejected with C.
The reason is that both FEs use ridpointers identifiers for CPP_KEYWORD
and value or u.value for CPP_NAME e.g. when parsing attributes or OpenMP
directives etc., like:
         /* Save away the identifier that indicates which attribute
            this is.  */
         identifier = (token->type == CPP_KEYWORD)
           /* For keywords, use the canonical spelling, not the
              parsed identifier.  */
           ? ridpointers[(int) token->keyword]
           : id_token->u.value;

         identifier = canonicalize_attr_name (identifier);
I've tried to change those to use ridpointers only if non-NULL and otherwise
use the value/u.value even for CPP_KEYWORDS, but that was a large 10 hunks
patch.

The following patch instead just initializes ridpointers for the __intNN
keywords.  It can't be done earlier before we record_builtin_type as there
are 2 different spellings and if we initialize those ridpointers early, the
second record_builtin_type fails miserably.

2022-04-11  Jakub Jelinek  <jakub@redhat.com>

PR c++/105186
* c-common.c (c_common_nodes_and_builtins): After registering __int%d
and __int%d__ builtin types, initialize corresponding ridpointers
entry.

* c-c++-common/pr105186.c: New test.

(cherry picked from commit 083e8e66d2e90992fa83a53bfc3553dfa91abda1)

fold-const: Fix up make_range_step [PR105189]

The following testcase is miscompiled, because fold_truth_andor
incorrectly folds
(unsigned) foo () >= 0U && 1
into
foo () >= 0
For the unsigned comparison (which is useless in this case,
as >= 0U is always true, but hasn't been folded yet), previous
make_range_step derives exp (unsigned) foo () and +[0U, -]
range for it.  Next we process the NOP_EXPR.  We have special code
for unsigned to signed casts, already earlier punt if low or high
aren't representable in arg0_type or if it is a narrowing conversion.
For the signed to unsigned casts, I think if high is specified we
are still fine, as we punt for non-representable values in arg0_type,
n_high is then still representable and so was smaller or equal to
signed maximum and either low is not present (equivalent to 0U), or
low must be smaller or equal to high and so for unsigned exp
+[low, high] the signed exp +[n_low, n_high] will be correct.
Similarly, if both low and high aren't specified (always true or
always false), it is ok too.
But if we have for unsigned exp +[low, -] or -[low, -], using
+[n_low, -] or -[n_high, -] is incorrect.  Because low is smaller
or equal to signed maximum and high is unspecified (i.e. unsigned
maximum), when signed that range is a union of +[n_low, -] and
+[-, -1] which is equivalent to -[0, n_low-1], unless low
is 0, in that case we can treat it as [-, -].

2022-04-08  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/105189
* fold-const.c (make_range_step): Fix up handling of
(unsigned) x +[low, -] ranges for signed x if low fits into
typeof (x).

* g++.dg/torture/pr105189.C: New test.

(cherry picked from commit 5e6597064b0c7eb93b8f720afc4aa970eefb0628)

combine: Don't record for UNDO_MODE pointers into regno_reg_rtx array [PR104985]

The testcase in the PR fails under valgrind on mips64 (but only Martin
can reproduce, I couldn't).
But the problem reported there is that SUBST_MODE remembers addresses
into the regno_reg_rtx array, then some splitter needs a new pseudo
and calls gen_reg_rtx, which reallocates the regno_reg_rtx array
and finally undo operation is done and dereferences the old regno_reg_rtx
entry.
The rtx values stored in regno_reg_rtx array seems to be created
by gen_reg_rtx only and since then aren't modified, all we do for it
is adjusting its fields (e.g. adjust_reg_mode that SUBST_MODE does).

So, I think it is useless to use where.r for UNDO_MODE and store
&regno_reg_rtx[regno] in struct undo, we can store just
regno_reg_rtx[regno] (i.e. pointer to the REG itself instead of
pointer to pointer to REG) or could also store just the regno.

The following patch does the latter, and because SUBST_MODE no longer
needs to be a macro, changes all SUBST_MODE uses to subst_mode.

2022-04-06  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/104985
* combine.c (struct undo): Add where.regno member.
(do_SUBST_MODE): Rename to ...
(subst_mode): ... this.  Change first argument from rtx * into int,
operate on regno_reg_rtx[regno] and save regno into where.regno.
(SUBST_MODE): Remove.
(try_combine): Use subst_mode instead of SUBST_MODE, change first
argument from regno_reg_rtx[whatever] to whatever.  For UNDO_MODE, use
regno_reg_rtx[undo->where.regno] instead of *undo->where.r.
(undo_to_marker): For UNDO_MODE, use regno_reg_rtx[undo->where.regno]
instead of *undo->where.r.
(simplify_set): Use subst_mode instead of SUBST_MODE, change first
argument from regno_reg_rtx[whatever] to whatever.

(cherry picked from commit 61bee6aed26eb30b798c75b9a595c9d51e080442)

i386: Fix up ix86_expand_vector_init_general [PR105123]

The following testcase is miscompiled on ia32.
The problem is that at -O0 we end up with:
  vector(4) short unsigned int _1;
  short unsigned int u.0_3;
...
  _1 = {u.0_3, u.0_3, u.0_3, u.0_3};
statement (dead) which is wrongly expanded.
elt is (subreg:HI (reg:SI 83 [ u.0_3 ]) 0), tmp_mode SImode,
so after convert_mode we start with word (reg:SI 83 [ u.0_3 ]).
The intent is to manually broadcast that value to 2 SImode parts,
but because we pass word as target to expand_simple_binop, it will
overwrite (reg:SI 83 [ u.0_3 ]) and we end up with 0:
   10: {r83:SI=r83:SI<<0x10;clobber flags:CC;}
   11: {r83:SI=r83:SI|r83:SI;clobber flags:CC;}
   12: {r83:SI=r83:SI<<0x10;clobber flags:CC;}
   13: {r83:SI=r83:SI|r83:SI;clobber flags:CC;}
   14: clobber r110:V4HI
   15: r110:V4HI#0=r83:SI
   16: r110:V4HI#4=r83:SI
as the two ors do nothing and two shifts each by 16 left shift it all
away.
The following patch fixes that by using NULL_RTX target, so we expand it as
   10: {r110:SI=r83:SI<<0x10;clobber flags:CC;}
   11: {r111:SI=r110:SI|r83:SI;clobber flags:CC;}
   12: {r112:SI=r83:SI<<0x10;clobber flags:CC;}
   13: {r113:SI=r112:SI|r83:SI;clobber flags:CC;}
   14: clobber r114:V4HI
   15: r114:V4HI#0=r111:SI
   16: r114:V4HI#4=r113:SI
instead.

Another possibility would be to pass NULL_RTX only when word == elt
and word otherwise, where word would necessarily be a pseudo from the first
shift after passing NULL_RTX there once or pass NULL_RTX for the shift and
word for ior.

2022-04-03  Jakub Jelinek  <jakub@redhat.com>

PR target/105123
* config/i386/i386.c (ix86_expand_vector_init_general): Avoid
using word as target for expand_simple_binop when doing ASHIFT and
IOR.

* gcc.target/i386/pr105123.c: New test.

(cherry picked from commit e1a74058b784c845e84a0cf1997b54b984df483d)

ubsan: Fix ICE due to -fsanitize=object-size [PR105093]

The following testcase ICEs, because for a volatile X & RESULT_DECL
ubsan wants to take address of that reference.  instrument_object_size
is called with x, so the base is equal to the access and the var
is automatic, so there is no risk of an out of bounds access for it.
Normally we wouldn't instrument those because we fold address of the
t - address of inner to 0, add constant size of the decl and it is
equal to what __builtin_object_size computes.  But the volatile
results in the subtraction not being folded.

The first hunk fixes it by punting if we access the whole automatic
decl, so that even volatile won't cause a problem.
The second hunk (not strictly needed for this testcase) is similar
to what has been added to asan.cc recently, if we actually take
address of a decl and keep it in the IL, we better mark it addressable.

2022-03-30  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/105093
* ubsan.c (instrument_object_size): If t is equal to inner and
is a decl other than global var, punt.  When emitting call to
UBSAN_OBJECT_SIZE ifn, make sure base is addressable.

* g++.dg/ubsan/pr105093.C: New test.

(cherry picked from commit e3e68fa59ead502c24950298b53c637bbe535a74)

c++: Fix up __builtin_convertvector parsing

Jonathan reported on IRC that we don't parse
__builtin_bit_cast (type, val).field
etc.
The problem is that for these 2 builtins we return from
cp_parser_postfix_expression instead of setting postfix_expression
to the cp_build_* value and falling through into the postfix regression
suffix handling loop.

2022-03-26 Jakub Jelinek <jakub@redhat.com>

* parser.c (cp_parser_postfix_expression)
<case RID_BILTIN_CONVERTVECTOR>: Don't
return cp_build_vec_convert result right away, instead
set postfix_expression to it and break.

* c-c++-common/builtin-convertvector-3.c: New test.

(cherry picked from commit 1806829e08f14e4cacacec43d7845cc2dad2ddc8)

c++: extern thread_local declarations in constexpr [PR104994]

C++14 to C++20 apparently should allow extern thread_local declarations in
constexpr functions, however useless they are there (because accessing
such vars is not valid in a constant expression, perhaps sizeof/decltype).
P2242 changed that for C++23 to passing through declaration but
https://cplusplus.github.io/CWG/issues/2552.html
has been filed for it yesterday.

2022-03-24 Jakub Jelinek <jakub@redhat.com>

PR c++/104994
* constexpr.c (potential_constant_expression_1): Don't diagnose extern
thread_local declarations.
* decl.c (start_decl): Likewise.

* g++.dg/cpp2a/constexpr-nonlit7.C: New test.

(cherry picked from commit 72124f487ccb5c8065dd5f7b8fba254600b7e611)

i386: Don't emit pushf;pop for __builtin_ia32_readeflags_u* with unused lhs [PR104971]

__builtin_ia32_readeflags_u* aren't marked const or pure I think
intentionally, so that they aren't CSEd from different regions of a function
etc. because we don't and can't easily track all dependencies between
it and surrounding code (if somebody looks at the condition flags, it is
dependent on the vast majority of instructions).
But the builtin itself doesn't have any side-effects, so if we ignore the
result of the builtin, there is no point to emit anything.

There is a LRA bug that miscompiles the testcase which this patch makes
latent, which is certainly worth fixing too, but IMHO this change
(and maybe ix86_gimple_fold_builtin too which would fold it even earlier
when it looses lhs) is worth it as well.

2022-03-19 Jakub Jelinek <jakub@redhat.com>

PR middle-end/104971
* config/i386/i386.c
(ix86_expand_builtin) <case IX86_BUILTIN_READ_FLAGS>: If ignore,
don't push/pop anything and just return const0_rtx.

* gcc.target/i386/pr104971.c: New test.

(cherry picked from commit b60bc913cca7439d29a7ec9e9a7f448d8841b43c)

c, c++, c-family: -Wshift-negative-value and -Wshift-overflow* tweaks for -fwrapv and C++20+ [PR104711]

As mentioned in the PR, different standards have different definition
on what is an UB left shift.  They all agree on out of bounds (including
negative) shift count.
The rules used by ubsan are:
C99-C2x ((unsigned) x >> (uprecm1 - y)) != 0 then UB
C++11-C++17 x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1 then UB
C++20 and later everything is well defined
Now, for C++20, I've in the P1236R1 implementation added an early
exit for -Wshift-overflow* warning so that it never warns, but apparently
-Wshift-negative-value remained as is.  As it is well defined in C++20,
the following patch doesn't enable -Wshift-negative-value from -Wextra
anymore for C++20 and later, if users want for compatibility with C++17
and earlier get the warning, they still can by using -Wshift-negative-value
explicitly.
Another thing is -fwrapv, that is an extension to the standards, so it is up
to us how exactly we define that case.  Our ubsan code treats
TYPE_OVERFLOW_WRAPS (type0) and cxx_dialect >= cxx20 the same as only
diagnosing out of bounds shift count and nothing else and IMHO it is most
sensical to treat -fwrapv signed left shifts the same as C++20 treats
them, https://eel.is/c++draft/expr.shift#2
"The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N,
where N is the width of the type of the result.
[Note 1: E1 is left-shifted E2 bit positions; vacated bits are zero-filled.
— end note]"
with no UB dependent on the E1 values.  The UB is only
"The behavior is undefined if the right operand is negative, or greater
than or equal to the width of the promoted left operand."
Under the hood (except for FEs and ubsan from FEs) GCC middle-end doesn't
consider UB in left shifts dependent on the first operand's value, only
the out of bounds shifts.

While this change isn't a regression, I'd think it is useful for GCC 12,
it doesn't add new warnings, but just removes warnings that aren't
appropriate.

2022-03-09  Jakub Jelinek  <jakub@redhat.com>

PR c/104711
gcc/
* doc/invoke.texi (-Wextra): Document that -Wshift-negative-value
is enabled by it only for C++11 to C++17 rather than for C++03 or
later.
(-Wshift-negative-value): Similarly (except here we stated
that it is enabled for C++11 or later).
gcc/c-family/
* c-opts.c (c_common_post_options): Don't enable
-Wshift-negative-value from -Wextra for C++20 or later.
* c-ubsan.c (ubsan_instrument_shift): Adjust comments.
* c-warn.c (maybe_warn_shift_overflow): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
gcc/c/
* c-fold.c (c_fully_fold_internal): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
* c-typeck.c (build_binary_op): Likewise.
gcc/cp/
* constexpr.c (cxx_eval_check_shift_p): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
* typeck.c (cp_build_binary_op): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
gcc/testsuite/
* c-c++-common/Wshift-negative-value-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-negative-value-2.c: Likewise.
* c-c++-common/Wshift-negative-value-3.c: Likewise.
* c-c++-common/Wshift-negative-value-4.c: Likewise.
* c-c++-common/Wshift-negative-value-7.c: New test.
* c-c++-common/Wshift-negative-value-8.c: New test.
* c-c++-common/Wshift-negative-value-9.c: New test.
* c-c++-common/Wshift-negative-value-10.c: New test.
* c-c++-common/Wshift-overflow-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-overflow-2.c: Likewise.
* c-c++-common/Wshift-overflow-5.c: Likewise.
* c-c++-common/Wshift-overflow-6.c: Likewise.
* c-c++-common/Wshift-overflow-7.c: Likewise.
* c-c++-common/Wshift-overflow-8.c: New test.
* c-c++-common/Wshift-overflow-9.c: New test.
* c-c++-common/Wshift-overflow-10.c: New test.
* c-c++-common/Wshift-overflow-11.c: New test.
* c-c++-common/Wshift-overflow-12.c: New test.

(cherry picked from commit d76511138dc816ef66fd16f71531f48c37dac3b4)

c++: Don't suggest cdtor or conversion op identifiers in spelling hints [PR104806]

On the following testcase, we emit "did you mean '__dt '?" in the error
message. "__dt " shows there because it is dtor_identifier, but we
shouldn't suggest those to the user, they are purely internal and can't
be really typed by the user because of the final space in it.

2022-03-08 Jakub Jelinek <jakub@redhat.com>

PR c++/104806
* search.c (lookup_field_fuzzy_info::fuzzy_lookup_field): Ignore
identifiers with space at the end.

* g++.dg/spellcheck-pr104806.C: New test.

(cherry picked from commit e480c3c06d20874fd7504bfdcca0b829f8000389)

s390: Fix up *cmp_and_trap_unsigned_int<mode> constraints [PR104775]

The following testcase fails to assemble due to clgte %r6,0(%r1,%r10)
insn not being accepted by assembler.
My rough understanding is that in the RSY-b insn format the spot
in other formats used for index registers is used instead for M3 what
kind of comparison it is, so this patch follows what other similar
instructions use for constraint (i.e. one without index register).

2022-03-07 Jakub Jelinek <jakub@redhat.com>

PR target/104775
* config/s390/s390.md (*cmp_and_trap_unsigned_int<mode>): Use
S constraint instead of T in the last alternative.

* gcc.target/s390/pr104775.c: New test.

(cherry picked from commit 2472dcaa8cb9e02e902f83d419c3ee7e0f3d9041)

match.pd: Further complex simplification fixes [PR104675]

Mark mentioned in the PR further 2 simplifications that also ICE
with complex types.
For these, eventually (but IMO GCC 13 materials) we could support it
for vector types if it would be uniform vector constants.
Currently integer_pow2p is true only for INTEGER_CSTs and COMPLEX_CSTs
and we can't use bit_and etc. for complex type.

2022-02-25 Jakub Jelinek <jakub@redhat.com>
Marc Glisse <marc.glisse@inria.fr>

PR tree-optimization/104675
* match.pd (t * 2U / 2 -> t & (~0 / 2), t / 2U * 2 -> t & ~1):
Restrict simplifications to INTEGRAL_TYPE_P.

* gcc.dg/pr104675-3.c : New test.

(cherry picked from commit f62115c9b770a66c5378f78a2d5866243d560573)

rs6000: Use rs6000_emit_move in movmisalign<mode> expander [PR104681]

The following testcase ICEs, because for some strange reason it decides to use
movmisaligntf during expansion where the destination is MEM and source is
CONST_DOUBLE. For normal mov<mode> expanders the rs6000 backend uses
rs6000_emit_move to ensure that if one operand is a MEM, the other is a REG
and a few other things, but for movmisalign<mode> nothing enforced this.
The middle-end documents that movmisalign<mode> shouldn't fail, so we can't
force that through predicates or condition on the expander.

2022-02-25 Jakub Jelinek <jakub@redhat.com>

PR target/104681
* config/rs6000/vector.md (movmisalign<mode>): Use rs6000_emit_move.

* g++.dg/opt/pr104681.C: New test.

(cherry picked from commit 3885a122f817a1b6dca4a84ba9e020d5ab2060af)

match.pd: Don't create BIT_NOT_EXPRs for COMPLEX_TYPE [PR104675]

We don't support BIT_{AND,IOR,XOR,NOT}_EXPR on complex types,
&/|/^ are just rejected for them, and ~ is parsed as CONJ_EXPR.
So, we should avoid simplifications which turn valid complex type
expressions into something that will ICE during expansion.

2022-02-25 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/104675
* match.pd (-A - 1 -> ~A, -1 - A -> ~A): Don't simplify for
COMPLEX_TYPE.

* gcc.dg/pr104675-1.c: New test.
* gcc.dg/pr104675-2.c: New test.

(cherry picked from commit 758671b88b78d7629376b118ec6ca6bcfbabbd36)

libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]

On
#define A(n) int foo1##n(void) { return 1##n; }
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

PR lto/104617
* simple-object-elf.c (simple_object_elf_match): Fix up URL
in comment.
(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
range (inclusive).

(cherry picked from commit 2f59f067610f22c3f2ec9b1516e24b85836676ed)

valtrack: Avoid creating raw SUBREGs with VOIDmode argument [PR104557]

After the recent r12-7240 simplify_immed_subreg changes, we bail on more
simplify_subreg calls than before, e.g. apparently for decimal modes
in the NaN representations  we almost never preserve anything except the
canonical {q,s}NaNs.
simplify_gen_subreg will punt in such cases because a SUBREG with VOIDmode
is not valid, but debug_lowpart_subreg wants to attempt even harder, even
if e.g. target indicates certain mode combinations aren't valid for the
backend, dwarf2out can still handle them.  But a SUBREG from a VOIDmode
operand is just too much, the inner mode is lost there.  We'd need some
new rtx that would be able to represent those cases.
For now, just punt in those cases.

2022-02-17  Jakub Jelinek  <jakub@redhat.com>

PR debug/104557
* valtrack.c (debug_lowpart_subreg): Don't call gen_rtx_raw_SUBREG
if expr has VOIDmode.

* gcc.dg/dfp/pr104557.c: New test.

(cherry picked from commit 1c2b44b52364cb5661095b346de794bc7ff02866)

c-family: Fix up shorten_compare for decimal vs. non-decimal float comparison [PR104510]

The comment in shorten_compare says:
/* If either arg is decimal float and the other is float, fail. */
but the callers of shorten_compare don't expect anything like failure
as a possibility from the function, callers require that the function
promotes the operands to the same type, whether the original selected
*restype_ptr one or some shortened.
So, if we choose not to shorten, we should still promote to the original
*restype_ptr.

2022-02-16 Jakub Jelinek <jakub@redhat.com>

PR c/104510
* c-common.c (shorten_compare): Convert original arguments to
the original *restype_ptr when mixing binary and decimal float.

* gcc.dg/dfp/pr104510.c: New test.

(cherry picked from commit 6e74122f0de6748b3fd0ed9183090cd7c61fb53e)

sanitizer: Use glibc _thread_db_sizeof_pthread symbol if present

I've cherry-picked following fix from llvm-project. Recent glibcs
have _thread_db_sizeof_pthread symbol variable which contains the
size of struct pthread, so that sanitizers don't need to guess that
and risk that it will change again.

2022-02-15 Jakub Jelinek <jakub@redhat.com>

* sanitizer_common/sanitizer_linux_libcdep.cc: Cherry-pick
llvm-project revision ef14b78d9a144ba81ba02083fe21eb286a88732b.

(cherry picked from commit c4c0aa60891daeb4ea5a7c265bd681038f6d8271)

openmp: Make finalize_task_copyfn order reproduceable [PR104517]

The following testcase fails -fcompare-debug, because finalize_task_copyfn
was invoked from splay tree destruction, whose order can in some cases
depend on -g/-g0. The fix is to queue the task stmts that need copyfn
in a vector and run finalize_task_copyfn on elements of that vector.

2022-02-15 Jakub Jelinek <jakub@redhat.com>

PR debug/104517
* omp-low.c (task_cpyfns): New variable.
(delete_omp_context): Don't call finalize_task_copyfn from here.
(create_task_copyfn): Push task_stmt into task_cpyfns.
(execute_lower_omp): Call finalize_task_copyfn here on entries from
task_cpyfns vector and release the vector.

(cherry picked from commit 6a0d6e7ca9b9e338e82572db79c26168684a7441)

c++: Don't reject GOTO_EXPRs to cdtor_label in potential_constant_expression_1 [PR104513]

return in ctors on targetm.cxx.cdtor_returns_this () target like arm
is emitted as GOTO_EXPR cdtor_label where at cdtor_label it emits
RETURN_EXPR with the this.
Similarly, in all dtors regardless of targetm.cxx.cdtor_returns_this ()
a return is emitted similarly.

potential_constant_expression_1 was rejecting these gotos and so we
incorrectly rejected these testcases, but actual cxx_eval* is apparently
handling these just fine.  I was a little bit worried that for the
destruction of bases we wouldn't evaluate something we should, but as the
testcase shows, that is evaluated through try ... finally and there is
nothing after the cdtor_label.  For arm there is RETURN_EXPR this; but we
don't really care about the return value from ctors and dtors during the
constexpr evaluation.

I must say I don't see much the point of cdtor_labels at all, I'd think
that with try ... finally around it for non-arm we could just RETURN_EXPR
instead of the GOTO_EXPR and the try/finally gimplification would DTRT,
and we could just add the right return value for the arm case.

2022-02-14  Jakub Jelinek  <jakub@redhat.com>

PR c++/104513
* constexpr.c (potential_constant_expression_1) <case GOTO_EXPR>:
Don't punt if returns (target).

* g++.dg/cpp1y/constexpr-104513.C: New test.

(cherry picked from commit 02a981a8e512934a990d1427d14e8e884409fade)

asan: Fix up address sanitizer instrumentation of __builtin_alloca* if it can throw [PR104449]

With -fstack-check* __builtin_alloca* can throw and the asan
instrumentation of this builtin wasn't prepared for that case.
The following patch fixes that by replacing the builtin with the
replacement builtin and emitting any further insns on the fallthru
edge.

I haven't touched the hwasan code which most likely suffers from the
same problem.

2022-02-12 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/104449
* asan.c: Include tree-eh.h.
(handle_builtin_alloca): Handle the case when __builtin_alloca or
__builtin_alloca_with_align can throw.

* gcc.dg/asan/pr104449.c: New test.
* g++.dg/asan/pr104449.C: New test.

(cherry picked from commit f0c7367b8802c47efaad87b1f2126fe6350d8b47)

i386: Fix up cvtsd2ss splitter [PR104502]

The following testcase ICEs, because AVX512F is enabled, AVX512VL is not,
and the cvtsd2ss insn has %xmm0-15 as output operand and %xmm16-31 as
input operand. For output operand %xmm16+ the splitter just gives up
in such case, but for such input it just emits vmovddup which requires
AVX512VL if either operand is EXT_REX_SSE_REG_P (when it is 128-bit).

The following patch fixes it by treating that case like the pre-SSE3
output != input case - move the input to output and do everything on
the output reg which is known to be < %xmm16.

2022-02-12 Jakub Jelinek <jakub@redhat.com>

PR target/104502
* config/i386/i386.md (cvtsd2ss splitter): If operands[1] is xmm16+
and AVX512VL isn't available, move operands[1] to operands[0] first.

* gcc.target/i386/pr104502.c: New test.

(cherry picked from commit 0538d42cdd68f6b65d72ed7768f1d00ba44f8631)

c++: Fix up constant expression __builtin_convertvector folding [PR104472]

The following testcase ICEs, because due to the -frounding-math
fold_const_call fails, which is it returns NULL, and returning NULL from
cxx_eval* is wrong, all the callers rely on them to either return folded
value or original with *non_constant_p = true.

The following patch does that, and additionally falls through into the
default case where there is diagnostics for the !ctx->quiet case too.

2022-02-11 Jakub Jelinek <jakub@redhat.com>

PR c++/104472
* constexpr.c (cxx_eval_internal_function) <case IFN_VEC_CONVERT>:
Only return fold_const_call result if it is non-NULL. Otherwise
fall through into the default: case to return t, set *non_constant_p
and emit diagnostics if needed.

* g++.dg/cpp0x/constexpr-104472.C: New test.

(cherry picked from commit 84993d94e13ad2ab3aee151bb5a5e767cf75d51e)

combine: Fix ICE with substitution of CONST_INT into PRE_DEC argument [PR104446]

The following testcase ICEs, because combine substitutes
(insn 10 9 11 2 (set (reg/v:SI 7 sp [ a ])
        (const_int 0 [0])) "pr104446.c":9:5 81 {*movsi_internal}
     (nil))
(insn 13 11 14 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [0  S4 A32])
        (reg:SI 85)) "pr104446.c":10:3 56 {*pushsi2}
     (expr_list:REG_DEAD (reg:SI 85)
        (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
            (nil))))
forming
(insn 13 11 14 2 (set (mem/f:SI (pre_dec:SI (const_int 0 [0])) [0  S4 A32])
        (reg:SI 85)) "pr104446.c":10:3 56 {*pushsi2}
     (expr_list:REG_DEAD (reg:SI 85)
        (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
            (nil))))
which is invalid RTL (pre_dec's argument must be a REG).
I know substitution creates various forms of invalid RTL and hopes that
invalid RTL just won't recog.
But unfortunately in this case we ICE before we get to recog, as
try_combine does:
  if (n_auto_inc)
    {
      int new_n_auto_inc = 0;
      for_each_inc_dec (newpat, count_auto_inc, &new_n_auto_inc);

      if (n_auto_inc != new_n_auto_inc)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "Number of auto_inc expressions changed\n");
          undo_all ();
          return 0;
        }
    }
and for_each_inc_dec under the hood will do e.g. for the PRE_DEC case:
    case PRE_DEC:
    case POST_DEC:
      {
        poly_int64 size = GET_MODE_SIZE (GET_MODE (mem));
        rtx r1 = XEXP (x, 0);
        rtx c = gen_int_mode (-size, GET_MODE (r1));
        return fn (mem, x, r1, r1, c, data);
      }
and that code rightfully expects that the PRE_DEC operand has non-VOIDmode
(as it needs to be a REG) - gen_int_mode for VOIDmode results in ICE.
I think it is better not to emit the clearly invalid RTL during substitution
like we do for other cases, than to adding workarounds for invalid IL
created by combine to rtlanal.cc and perhaps elsewhere.
As for the testcase, of course it is UB at runtime to modify sp that way,
but if such code is never reached, we must compile it, not to ICE on it.
And I don't see why on other targets which use the autoinc rtxes much more
it couldn't happen with other registers.

2022-02-11  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/104446
* combine.c (subst): Don't substitute CONST_INTs into RTX_AUTOINC
operands.

* gcc.target/i386/pr104446.c: New test.

(cherry picked from commit fb76c0ad35f96505ecd9213849ebc3df6163a0f7)

rs6000: Fix up vspltis_shifted [PR102140]

The following testcase ICEs, because
(const_vector:V4SI [
                (const_int 0 [0]) repeated x3
                (const_int -2147483648 [0xffffffff80000000])
            ])
is recognized as valid easy_vector_constant in between split1 pass and
end of RA.
The problem is that such constants need to be split, and the only
splitter for that is:
(define_split
  [(set (match_operand:VM 0 "altivec_register_operand")
        (match_operand:VM 1 "easy_vector_constant_vsldoi"))]
  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && can_create_pseudo_p ()"
There is only a single splitting pass before RA, so after that finishes,
if something gets matched in between that and end of RA (after that
can_create_pseudo_p () would be no longer true), it will never be
successfully split and we ICE at final.cc time or earlier.

The i386 backend (and a few others) already use
(cfun->curr_properties & PROP_rtl_split_insns)
as a test for split1 pass finished, so that some insns that should be split
during split1 and shouldn't be matched afterwards are properly guarded.

So, the following patch does that for vspltis_shifted too.

2022-02-08  Jakub Jelinek  <jakub@redhat.com>

PR target/102140
* config/rs6000/rs6000.c (vspltis_shifted): Return false also if
split1 pass has finished already.

* gcc.dg/pr102140.c: New test.

(cherry picked from commit 0c3e491a4e5ae74bfbed6d167d403d262b5a4adc)

libgomp: Fix segfault with posthumous orphan tasks [PR104385]

The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits.  Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed.  The following patch fixes that
by clearing the parent field when we schedule such task for running
by adding it into the queues and we know that the sibling task which
is about to finish has NULL parent.

2022-02-08  Jakub Jelinek  <jakub@redhat.com>

PR libgomp/104385
* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
clear task->parent.
* testsuite/libgomp.c/pr104385.c: New test.

(cherry picked from commit 0af7ef050aed9f678d70d79931ede38374fde863)

libcpp: Fix up padding handling in funlike_invocation_p [PR104147]

As mentioned in the PR, in some cases we preprocess incorrectly when we
encounter an identifier which is defined as function-like macro, followed
by at least 2 CPP_PADDING tokens and then some other identifier.
On the following testcase, the problem is in the 3rd funlike_invocation_p,
the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared token),
CPP_PADDING (one created with padding_token, val.source is non-NULL and
val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
funlike_invocation_p remembers there was a padding token, but remembers the
first one because of its condition, then the next token is the CPP_NAME,
which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
can't easily backup more tokens, it pushes into a new context the padding
token (the pfile->avoid_paste one).  The net effect is that when Y is not
defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
(the second avoid_paste is because that is how we handle end of a context).
Now, for stringify_arg that is unfortunately a significant difference,
which handles CPP_PADDING tokens with:
      if (token->type == CPP_PADDING)
        {
          if (source == NULL
              || (!(source->flags & PREV_WHITE)
                  && token->val.source == NULL))
            source = token->val.source;
          continue;
        }
and later on
      /* Leading white space?  */
      if (dest - 1 != BUFF_FRONT (pfile->u_buff))
        {
          if (source == NULL)
            source = token;
          if (source->flags & PREV_WHITE)
            *dest++ = ' ';
        }
      source = NULL;
(and c-ppoutput.cc has similar code).
So, when Y is not fun-like macro, ' ' is added because padding_token's
val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
macro, we don't add ' ' in between, because source is NULL and so
used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.

Now, the funlike_invocation_p condition
       if (padding == NULL
           || (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
        padding = token;
looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
the intent was to prefer do the same thing and pick the right padding.
But there are significant differences.  Both stringify_arg and c-ppoutput.cc
don't remember the CPP_PADDING token, but its val.source instead, while
in funlike_invocation_p we want to remember the padding token that has the
significant information for stringify_arg/c-ppoutput.cc.
So, IMHO we want to overwrite padding if:
1) padding == NULL (remember that there was any padding at all)
2) padding->val.source == NULL (this matches the source == NULL
   case in stringify_arg)
3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
   (this matches the !(source->flags & PREV_WHITE) && token->val.source == NULL
   case in stringify_arg)

2022-02-01  Jakub Jelinek  <jakub@redhat.com>

PR preprocessor/104147
* macro.c (funlike_invocation_p): For padding prefer a token
with val.source non-NULL especially if it has PREV_WHITE set
on val.source->flags.  Add gcc_assert that CPP_PADDING tokens
don't have PREV_WHITE set in flags.

* c-c++-common/cpp/pr104147.c: New test.

(cherry picked from commit 95ac5635409606386259d2ff21fb61738858ca4a)

libcpp: Avoid PREV_WHITE and other random content on CPP_PADDING tokens

The funlike_invocation_p macro never triggered, the other
asserts did on some tests, see below for a full list.
This seems to be caused by #pragma/_Pragma handling.
do_pragma does:
          pfile->directive_result.src_loc = pragma_token_virt_loc;
          pfile->directive_result.type = CPP_PRAGMA;
          pfile->directive_result.flags = pragma_token->flags;
          pfile->directive_result.val.pragma = p->u.ident;
when it sees a pragma, while start_directive does:
  pfile->directive_result.type = CPP_PADDING;
and so does _cpp_do__Pragma.
Now, for #pragma lex.cc will just ignore directive_result if
it has CPP_PADDING type:
              if (_cpp_handle_directive (pfile, result->flags & PREV_WHITE))
                {
                  if (pfile->directive_result.type == CPP_PADDING)
                    continue;
                  result = &pfile->directive_result;
                }
but destringize_and_run does not:
  if (pfile->directive_result.type == CPP_PRAGMA)
    {
...
    }
  else
    {
      count = 1;
      toks = XNEW (cpp_token);
      toks[0] = pfile->directive_result;
and from there it will copy type member of CPP_PADDING, but all the
other members from the last CPP_PRAGMA before it.
Small testcase for it with no option (at least no -fopenmp or -fopenmp-simd).
#pragma GCC push_options
#pragma GCC ignored "-Wformat"
#pragma GCC pop_options
void
foo ()
{
  _Pragma ("omp simd")
  for (int i = 0; i < 64; i++)
    ;
}

Here is a patch that replaces those
      toks = XNEW (cpp_token);
      toks[0] = pfile->directive_result;
lines with
      toks = &pfile->avoid_paste;

2022-02-01  Jakub Jelinek  <jakub@redhat.com>

* directives.c (destringize_and_run): Push &pfile->avoid_paste
instead of a copy of pfile->directive_result for the CPP_PADDING
case.

(cherry picked from commit efc46b550f035281e51c340f73fbc9a79655e852)

optabs: Don't create pseudos in prepare_cmp_insn when not allowed [PR102478]

cond traps can be created during ce3 after reload (and e.g. PR103028
recently fixed some ce3 cond trap related bug, so I think often that
works fine and we shouldn't disable cond traps after RA altogether),
but it calls prepare_cmp_insn.  This function can fail, so I don't
see why we couldn't make it work after RA (in most cases it already
just works).  The first hunk is just an optimization which doesn't
make sense after RA, so I've guarded it with can_create_pseudo_p.
The second hunk is just a theoretical case, I don't have a testcase for it.
prepare_cmp_insn has some other spots that can create pseudos, like when
both operands have VOIDmode, or when it is BLKmode comparison, or
not OPTAB_DIRECT, but I think none of that applies to ce3, we punt on
BLKmode earlier, use OPTAB_DIRECT and shouldn't be comparing two
VOIDmode CONST_INTs.

2022-01-21  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/102478
* optabs.c (prepare_cmp_insn): If !can_create_pseudo_p (), don't
force_reg constants and for -fnon-call-exceptions fail if copy_to_reg
would be needed.

* gcc.dg/pr102478.c: New test.

(cherry picked from commit c2d9159717b474f9c06dde4d32b48b87164deb50)

match.pd, optabs: Avoid vectorization of {FLOOR,CEIL,ROUND}_{DIV,MOD}_EXPR [PR102860]

power10 has modv4si3 expander and so vectorizes the following testcase
where Fortran modulo is FLOOR_MOD_EXPR.
optabs_for_tree_code indicates that the optab for all the *_MOD_EXPR
variants is umod_optab or smod_optab, but that isn't true, that optab
actually expands just TRUNC_MOD_EXPR.  For the other tree codes expmed.cc
has code how to adjust the TRUNC_MOD_EXPR into those by emitting some
extra comparisons and conditional updates.  Similarly for *_DIV_EXPR,
except in that case it actually needs both division and modulo.

While it would be possible to handle it in expmed.cc for vectors as well,
we'd need to be sure all the vector operations we need for that are
available, and furthermore we wouldn't account for that in the costing.

So, IMHO it is better to stop pretending those non-truncating (and
non-exact) div/mod operations have an optab.  For GCC 13, we should
IMHO pattern match these in tree-vect-patterns.cc and transform them
to truncating div/mod with follow-up adjustments and let the vectorizer
vectorize that.  As written in the PR, for signed operands:
r = x %[fl] y;
is
r = x % y; if (r && (x ^ y) < 0) r += y;
and
d = x /[fl] y;
is
r = x % y; d = x / y; if (r && (x ^ y) < 0) --d;
and
r = x %[cl] y;
is
r = x % y; if (r && (x ^ y) >= 0) r -= y;
and
d = /[cl] y;
is
r = x % y; d = x / y; if (r && (x ^ y) >= 0) ++d;
(too lazy to figure out rounding div/mod now).  I'll create a PR
for that.
The patch also extends a match.pd optimization that floor_mod on
unsigned operands is actually trunc_mod.

2022-01-19  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/102860
* match.pd (x %[fl] y -> x % y): New simplification for
unsigned integral types.
* optabs-tree.c (optab_for_tree_code): Return unknown_optab
for {CEIL,FLOOR,ROUND}_{DIV,MOD}_EXPR with VECTOR_TYPE.

* gfortran.dg/pr102860.f90: New test.

(cherry picked from commit ffc7f200adbdf47f14b3594d9b21855c19cf797a)

ifcvt: Check for asm goto at the end of then_bb/else_bb in ifcvt [PR103908]

On the following testcase, RTL ifcvt sees then_bb
(note 7 6 8 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 8 7 9 3 (set (mem/c:SI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7fdccf5b0cf0 b>) [1 b+0 S4 A32])
        (const_int 1 [0x1])) "pr103908.c":6:7 81 {*movsi_internal}
     (nil))
(jump_insn 9 8 13 3 (parallel [
            (asm_operands/v ("# insn 1") ("") 0 []
                 []
                 [
                    (label_ref:DI 21)
                ] pr103908.c:7)
            (clobber (reg:CC 17 flags))
        ]) "pr103908.c":7:5 -1
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil))
-> 21)
and similarly else_bb (just with a different asm_operands template).
It checks that those basic blocks have a single successor and
uses last_active_insn which intentionally skips over JUMP_INSNs, sees
both basic blocks contain the same set and merges them (or if the
sets are different, attempts some other noce optimization).
But we can't assume that the jump, even when it has only a single successor,
has no side-effects.

The following patch fixes it by punting if test_bb ends with a JUMP_INSN
that isn't onlyjump_p.

2022-01-06  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/103908
* ifcvt.c (bb_valid_for_noce_process_p): Punt on bbs ending with
asm goto.

* gcc.target/i386/pr103908.c: New test.

(cherry picked from commit 80ad67e2af0620d58d57d0406dc22693cf5b8ca9)

libcpp: Fix up ##__VA_OPT__ handling [PR89971]

In the following testcase we incorrectly error about pasting / token
with padding token (which is a result of __VA_OPT__); instead we should
like e.g. for ##arg where arg is empty macro argument clear PASTE_LEFT
flag of the previous token if __VA_OPT__ doesn't add any real tokens
(which can happen either because the macro doesn't have any tokens
passed to ... (i.e. __VA_ARGS__ expands to empty) or when __VA_OPT__
doesn't have any tokens in between ()s).

2021-12-30 Jakub Jelinek <jakub@redhat.com>

PR preprocessor/89971
libcpp/
* macro.c (replace_args): For ##__VA_OPT__, if __VA_OPT__ expands
to no tokens at all, drop PASTE_LEFT flag from the previous token.
gcc/testsuite/
* c-c++-common/cpp/va-opt-9.c: New test.

(cherry picked from commit 5545d1edcbdb1701443f94dde7ec97c5ce3e1a6c)

shrink-wrapping: Fix up prologue block discovery [PR103860]

The following testcase is miscompiled, because a prologue which
contains subq $8, %rsp instruction is emitted at the start of
a basic block which contains conditional jump that depends on
flags register set in an earlier basic block, the prologue instruction
then clobbers those flags.
Normally this case is checked by can_get_prologue predicate, but this
is done only at the start of the loop. If we update pro later in the
loop (because some bb shouldn't be duplicated) and then don't push
anything further into vec and the vec is already empty (this can happen
when the new pro is already in bb_with bitmask and either has no successors
(that is the case in the testcase where that bb ends with a trap) or
all the successors are already in bb_with, then the loop doesn't iterate
further and can_get_prologue will not be checked.

The following simple patch makes sure we call can_get_prologue even after
the last former iteration when vec is already empty and only break from
the loop afterwards (and only if the updating of pro done because of
!can_get_prologue didn't push anything into vec again).

2021-12-30 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/103860
* shrink-wrap.c (try_shrink_wrapping): Make sure can_get_prologue is
called on pro even if nothing further is pushed into vec.

* gcc.dg/pr103860.c: New test.

(cherry picked from commit 1820137ba624d7eb2004a10f9632498b6bc1696a)

loop-invariant: Fix -fcompare-debug failure [PR103837]

In the following testcase we have a -fcompare-debug failure, because
can_move_invariant_reg doesn't ignore DEBUG_INSNs in its decisions.
In the testcase we have due to uninitialized variable:
  loop_header
    debug_insn using pseudo84
    pseudo84 = invariant
    insn using pseudo84
  end loop
and with -g decide not to move the pseudo84 = invariant before the
loop header; in this case not resetting the debug insns might be fine.
But, we could have also:
  pseudo84 = whatever
  loop_header
    debug_insn using pseudo84
    pseudo84 = invariant
    insn using pseudo84
  end loop
and in that case not resetting the debug insns would result in wrong-debug.
And, we don't really have generally a good substitution on what pseudo84
contains, it could inherit various values from different paths.
So, the following patch ignores DEBUG_INSNs in the decisions, and if there
are any that previously prevented the optimization, resets them before
return true.

2021-12-28  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/103837
* loop-invariant.c (can_move_invariant_reg): Ignore DEBUG_INSNs in
the decisions whether to return false or continue and right before
returning true reset those debug insns that previously caused
returning false.

* gcc.dg/pr103837.c: New test.

(cherry picked from commit 3c5fd3616f73fbcd241cc3a5e09275c2b0c49bd4)

bswap: Fix UB in find_bswap_or_nop_finalize [PR103435]

On gcc.c-torture/execute/pr103376.c in the following code we trigger UB
in the compiler.  n->range is 8 because it is 64-bit load and rsize is 0
because it is a bswap sequence with load and known to be 0:
  /* Find real size of result (highest non-zero byte).  */
  if (n->base_addr)
    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
  else
    rsize = n->range;
The shifts then shift uint64_t by 64 bits.  For this case mask is 0
and we want both *cmpxchg and *cmpnop as 0, the operation can be done as
both nop and bswap and callers will prefer nop.

2021-11-27  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/103435
* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Avoid UB if
n->range - rsize == 8, just clear both *cmpnop and *cmpxchg in that
case.

(cherry picked from commit 567d5f3d62fba2a23a9e975f7e7c7b61bb67cf24)

fortran, debug: Fix up DW_AT_rank [PR103315]

For DW_AT_rank we were emitting
        .uleb128 0x4    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x1c
        .byte   0x6     # DW_OP_deref
on 64-bit and
        .uleb128 0x4    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x10
        .byte   0x6     # DW_OP_deref
on 32-bit.  I think this is wrong, as dtype.rank field in the descriptor
has unsigned char type, not pointer type nor pointer sized integral.
E.g. if we have a
    REAL :: a(..)
dummy argument, which is passed as a reference to the function descriptor,
we want to evaluate a->dtype.rank.  The above DWARF expressions perform
*(uintptr_t *)(a + 0x1c)
and
*(uintptr_t *)(a + 0x10)
respectively.  The following patch changes those to:
        .uleb128 0x5    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x1c
        .byte   0x94    # DW_OP_deref_size
        .byte   0x1
and
        .uleb128 0x5    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x10
        .byte   0x94    # DW_OP_deref_size
        .byte   0x1
which perform
*(unsigned char *)(a + 0x1c)
and
*(unsigned char *)(a + 0x10)
respectively.

2021-11-21  Jakub Jelinek  <jakub@redhat.com>

PR debug/103315
* trans-types.c (gfc_get_array_descr_info): Use DW_OP_deref_size 1
instead of DW_OP_deref for DW_AT_rank.

(cherry picked from commit da17c304e22ba256eba0b03710aa329115163b08)

c++: Fix up -fstrong-eval-order handling of call arguments [PR70796]

For -fstrong-eval-order (default for C++17 and later) we make sure to
gimplify arguments in the right order, but as the following testcase
shows that is not enough.
The problem is that some lvalues can satisfy the is_gimple_val / fb_rvalue
predicate used by gimplify_arg for is_gimple_reg_type typed expressions,
or is_gimple_lvalue / fb_either used for other types.
E.g. in foo we have:
  C::C (&p,  ++i,  ++i)
before gimplification where i is an automatic int variable and without this
patch gimplify that as:
  i = i + 1;
  i = i + 1;
  C::C (&p, i, i);
which means that the ctor is called with the original i value incremented
by 2 in both arguments, while because the call is CALL_EXPR_ORDERED_ARGS
the first argument should be different.  Similarly in qux we have:
  B::B (&p, TARGET_EXPR <D.2274, *(const struct A &) A::operator++ (&i)>,
        TARGET_EXPR <D.2275, *(const struct A &) A::operator++ (&i)>)
and gimplify it as:
      _1 = A::operator++ (&i);
      _2 = A::operator++ (&i);
      B::B (&p, MEM[(const struct A &)_1], MEM[(const struct A &)_2]);
but because A::operator++ returns the passed in argument, again we have
the same value in both cases due to gimplify_arg doing:
      /* Also strip a TARGET_EXPR that would force an extra copy.  */
      if (TREE_CODE (*arg_p) == TARGET_EXPR)
        {
          tree init = TARGET_EXPR_INITIAL (*arg_p);
          if (init
              && !VOID_TYPE_P (TREE_TYPE (init)))
            *arg_p = init;
        }
which is perfectly fine optimization for calls with unordered arguments,
but breaks the ordered ones.
Lastly, in corge, we have before gimplification:
  D::foo (NON_LVALUE_EXPR <p>, 3,  ++p)
and gimplify it as
  p = p + 4;
  D::foo (p, 3, p);
which is again wrong, because the this argument isn't before the
side-effects but after it.
The following patch adds cp_gimplify_arg wrapper, which if ordered
and is_gimple_reg_type forces non-SSA_NAME is_gimple_variable
result into a temporary, and if ordered, not is_gimple_reg_type
and argument is TARGET_EXPR bypasses the gimplify_arg optimization.
So, in foo with this patch we gimplify it as:
  i = i + 1;
  i.0_1 = i;
  i = i + 1;
  C::C (&p, i.0_1, i);
in qux as:
      _1 = A::operator++ (&i);
      D.2312 = MEM[(const struct A &)_1];
      _2 = A::operator++ (&i);
      B::B (&p, D.2312, MEM[(const struct A &)_2]);
where D.2312 is a temporary and in corge as:
  p.9_1 = p;
  p = p + 4;
  D::foo (p.9_1, 3, p);
The is_gimple_reg_type forcing into a temporary should be really cheap
(I think even at -O0 it should be optimized if there is no modification in
between), the aggregate copies might be more expensive but I think e.g. SRA
or FRE should be able to deal with those if there are no intervening
changes.  But still, the patch tries to avoid those when it is cheaply
provable that nothing bad happens (if no argument following it in the
strong evaluation order doesn't have TREE_SIDE_EFFECTS, then even VAR_DECLs
etc. shouldn't be modified after it).  There is also an optimization to
avoid doing that for this or for arguments with reference types as nothing
can modify the parameter values during evaluation of other argument's
side-effects.

I've tried if e.g.
  int i = 1;
  return i << ++i;
doesn't suffer from this problem as well, but it doesn't, the FE uses
  SAVE_EXPR <i>, SAVE_EXPR <i> << ++i;
in that case which gimplifies the way we want (temporary in the first
operand).

2021-11-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/70796
* cp-gimplify.c (cp_gimplify_arg): New function.
(cp_gimplify_expr): Use cp_gimplify_arg instead of gimplify_arg,
pass true as last argument to it if there are any following
arguments in strong evaluation order with side-effects.

* g++.dg/cpp1z/eval-order11.C: New test.

(cherry picked from commit a84177aff7ca86f501d6aa5ef407fac5e71f56fb)

lim: Reset flow sensitive info even for pointers [PR103192]

Since 2014 is lim clearing SSA_NAME_RANGE_INFO for integral SSA_NAMEs
if moving them from conditional contexts inside of a loop into unconditional
before the loop, but as the miscompilation of gimplify.c shows, we need to
treat pointers the same, even for them we need to reset whether the pointer
can/can't be null or the recorded pointer alignment.

This fixes
-FAIL: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c (internal compiler error)
-FAIL: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c (test for excess errors)
-UNRESOLVED: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c compilation failed to produce executable
-FAIL: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c (internal compiler error)
-FAIL: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c (test for excess errors)
-UNRESOLVED: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c compilation failed to produce executable
-FAIL: libgomp.c++/target-in-reduction-2.C (internal compiler error)
-FAIL: libgomp.c++/target-in-reduction-2.C (test for excess errors)
-UNRESOLVED: libgomp.c++/target-in-reduction-2.C compilation failed to produce executable
on both x86_64 and i686.

2021-11-17 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/103192
* tree-ssa-loop-im.c (move_computations_worker): Use
reset_flow_sensitive_info instead of manually clearing
SSA_NAME_RANGE_INFO and do it for all SSA_NAMEs, not just ones
with integral types.

(cherry picked from commit 077425c890927eefacb765ab5236060de9859e82)

i386: Fix up x86 atomic_bit_test* expanders for !TARGET_HIMODE_MATH [PR103205]

With !TARGET_HIMODE_MATH, the OPTAB_DIRECT expand_simple_binop fail and so
we ICE. We don't really care if they are done promoted in SImode instead.

2021-11-15 Jakub Jelinek <jakub@redhat.com>

PR target/103205
* config/i386/sync.md (atomic_bit_test_and_set<mode>,
atomic_bit_test_and_complement<mode>,
atomic_bit_test_and_reset<mode>): Use OPTAB_WIDEN instead of
OPTAB_DIRECT.

* gcc.target/i386/pr103205.c: New test.

(cherry picked from commit 625eef42e32e65b3da0e65e23a706d228896d01c)

dwarf2out: Fix up field_byte_offset [PR101378]

For PCC_BITFIELD_TYPE_MATTERS field_byte_offset has quite large code
to deal with it since many years ago (see it e.g. in GCC 3.2, although it
used to be on HOST_WIDE_INTs, then on double_ints, now on offset_ints).
But that code apparently isn't able to cope with members with empty class
types with [[no_unique_address]] attribute, because the empty classes have
non-zero type size but zero decl size and so one can end up from the
computation with negative offset or offset 1 byte smaller than it should be.
For !PCC_BITFIELD_TYPE_MATTERS, we just use
    tree_result = byte_position (decl);
which seems exactly right even for the empty classes or anything which is
not a bitfield (and for which we don't add DW_AT_bit_offset attribute).
So, instead of trying to handle those no_unique_address members in the
current already very complicated code, this limits it to bitfields.

stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only
bitfields, twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE.

As discussed, this patch uses DECL_BIT_FIELD_TYPE check, because
DECL_BIT_FIELD might be cleared for some bitfields with bitsizes
multiple of BITS_PER_UNIT and e.g.
struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s;
struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t;

int
main ()
{
  s.c = 0x55;
  s.d = 0xaaaa;
  t.c = 0x55;
  t.d = 0xaaaa;
  s.e++;
}
has different debug info with DECL_BIT_FIELD check.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

PR debug/101378
* dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS
handling only for DECL_BIT_FIELD_TYPE decls.

* g++.dg/debug/dwarf2/pr101378.C: New test.

(cherry picked from commit 10db7573014008ff867098206f51012d501ab57b)

openmp: For default(none) ignore variables created by ubsan_create_data [PR64888]

We weren't ignoring the ubsan variables created by c-ubsan.c before gimplification
(others are added later). One way to fix this would be to introduce further
UBSAN_ internal functions and lower it later (sanopt pass) like other ifns,
this patch instead recognizes those magic vars by name/name of type and DECL_ARTIFICIAL
and TYPE_ARTIFICIAL.

2021-10-21 Jakub Jelinek <jakub@redhat.com>

PR middle-end/64888
gcc/c-family/
* c-omp.c (c_omp_predefined_variable): Return true also for
ubsan_create_data created artificial variables.
gcc/testsuite/
* c-c++-common/ubsan/pr64888.c: New test.

(cherry picked from commit 40dd9d839e52f679d8eabc1c5ca0ca17a5ccfd14)

c++: Don't reject calls through PMF during constant evaluation [PR102786]

The following testcase incorrectly rejects the c initializer,
while in the s.*a case cxx_eval_* sees .__pfn reads etc.,
in the s.*&S::foo case get_member_function_from_ptrfunc creates
expressions which use INTEGER_CSTs with type of pointer to METHOD_TYPE.
And cxx_eval_constant_expression rejects any INTEGER_CSTs with pointer
type if they aren't 0.
Either we'd need to make sure we defer such folding till cp_fold but the
function and pfn_from_ptrmemfunc is used from lots of places, or
the following patch just tries to reject only non-zero INTEGER_CSTs
with pointer types if they don't point to METHOD_TYPE in the hope that
all such INTEGER_CSTs with POINTER_TYPE to METHOD_TYPE are result of
folding valid pointer-to-member function expressions.
I don't immediately see how one could create such INTEGER_CSTs otherwise,
cast of integers to PMF is rejected and would have the PMF RECORD_TYPE
anyway, etc.

2021-10-19 Jakub Jelinek <jakub@redhat.com>

PR c++/102786
* constexpr.c (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.

* g++.dg/cpp2a/constexpr-virtual19.C: New test.

(cherry picked from commit f45610a45236e97616726ca042898d6ac46a082e)

openmp: Fix up handling of OMP_PLACES=threads(1)

When writing the places-*.c tests, I've noticed that we mishandle threads
abstract name with specified num-places if num-places isn't a multiple of
number of hw threads in a core. It then happily ignores the maximum count
and overwrites for the remaining hw threads in a core further places that
haven't been allocated.

2021-10-15 Jakub Jelinek <jakub@redhat.com>

* config/linux/affinity.c (gomp_affinity_init_level_1): For level 1
after creating count places clean up and return immediately.
* testsuite/libgomp.c/places-6.c: New test.
* testsuite/libgomp.c/places-7.c: New test.
* testsuite/libgomp.c/places-8.c: New test.

(cherry picked from commit 4764049dd620affcd3e2658dc7f03a6616370a29)

c++: Fix apply_identity_attributes [PR102548]

The following testcase ICEs on x86_64-linux with -m32 due to a bug in
apply_identity_attributes.  The function is being smart and attempts not
to duplicate the chain unnecessarily, if either there are no attributes
that affect type identity or there is possibly empty set of attributes
that do not affect type identity in the chain followed by attributes
that do affect type identity, it reuses that attribute chain.

The function mishandles the cases where in the chain an attribute affects
type identity and is followed by one or more attributes that don't
affect type identity (and then perhaps some further ones that do).

There are two bugs.  One is that when we notice first attribute that
doesn't affect type identity after first attribute that does affect type
identity (with perhaps some further such attributes in the chain after it),
we want to put into the new chain just attributes starting from
(inclusive) first_ident and up to (exclusive) the current attribute a,
but the code puts into the chain all attributes starting with first_ident,
including the ones that do not affect type identity and if e.g. we have
doesn't0 affects1 doesn't2 affects3 affects4 sequence of attributes, the
resulting sequence would have
affects1 doesn't2 affects3 affects4 affects3 affects4
attributes, i.e. one attribute that shouldn't be there and two attributes
duplicated.  That is fixed by the a2 -> a2 != a change.

The second one is that we ICE once we see second attribute that doesn't
affect type identity after an attribute that affects it.  That is because
first_ident is set to error_mark_node after handling the first attribute
that doesn't affect type identity (i.e. after we've copied the
[first_ident, a) set of attributes to the new chain) to denote that from
that time on, each attribute that affects type identity should be copied
whenever it is seen (the if (as && as->affects_type_identity) code does
that correctly).  But that condition is false and first_ident is
error_mark_node, we enter else if (first_ident) and use TREE_PURPOSE
/TREE_VALUE/TREE_CHAIN on error_mark_node, which ICEs.  When
first_ident is error_mark_node and a doesn't affect type identity,
we want to do nothing.  So that is the && first_ident != error_mark_node
chunk.

2021-10-05  Jakub Jelinek  <jakub@redhat.com>

PR c++/102548
* tree.c (apply_identity_attributes): Fix handling of the
case where an attribute in the list doesn't affect type
identity but some attribute before it does.

* g++.target/i386/pr102548.C: New test.

(cherry picked from commit 737f95bab557584d876f02779ab79fe3cfaacacf)

ubsan: Use -fno{,-}sanitize=float-divide-by-zero for float division by zero recovery [PR102515]

We've been using
-f{,no-}sanitize-recover=integer-divide-by-zero to decide on the float
-fsanitize=float-divide-by-zero instrumentation _abort suffix.
This patch fixes it to use -f{,no-}sanitize-recover=float-divide-by-zero
for it instead.

2021-10-01 Jakub Jelinek <jakub@redhat.com>
Richard Biener <rguenther@suse.de>

PR sanitizer/102515
gcc/c-family/
* c-ubsan.c (ubsan_instrument_division): Check the right
flag_sanitize_recover bit, depending on which sanitization
is done.
gcc/testsuite/
* c-c++-common/ubsan/float-div-by-zero-2.c: New test.

(cherry picked from commit 9c1a633d96926357155d4702b66f8a0ec856a81f)

i386: Don't emit fldpi etc. if -frounding-math [PR102498]

i387 has instructions to store some transcedental numbers into the top of
stack.  The problem is that what exact bit in the last place one gets for
those depends on the current rounding mode, the CPU knows the number with
slightly higher precision.  The compiler assumes rounding to nearest when
comparing them against constants in the IL, but at runtime the rounding
can be different and so some of these depending on rounding mode and the
constant could be 1 ulp higher or smaller than expected.
We only support changing the rounding mode at runtime if the non-default
-frounding-mode option is used, so the following patch just disables
using those constants if that flag is on.

2021-09-28  Jakub Jelinek  <jakub@redhat.com>

PR target/102498
* config/i386/i386.c (standard_80387_constant_p): Don't recognize
special 80387 instruction XFmode constants if flag_rounding_math.

* gcc.target/i386/pr102498.c: New test.

(cherry picked from commit 3b7041e8345c2f1030e58620f28e22d64b2c196b)