git.ipfire.org Git - thirdparty/gcc.git/log

Daily bump.

Revert "c++: designator and anon struct [PR101767]"

101767 wasn't broken on the 10 branch, so it doesn't need the fix, just the
test.

PR c++/101767

This reverts commit 846bff4d4659d9b2026da574194599f38a00cc79.

Revert "c++: friend with redundant qualification [PR41723]"

PR c++/102300
PR c++/41723

The patch for PR41723 caused PR102300 on trunk; let's just back it out on
the 10 branch.

This reverts commit e41d610696b81e72d1d06db176b281424e32fc23.

gcc/testsuite/ChangeLog:

* g++.dg/template/nested7.C: New test.

c++: rodata and defaulted ctor [PR104142]

Trivial initialization shouldn't bump a variable out of .rodata; if the
result of build_aggr_init is an empty STATEMENT_LIST, throw it away.

PR c++/104142

gcc/cp/ChangeLog:

* decl.c (check_initializer): Check TREE_SIDE_EFFECTS.

gcc/testsuite/ChangeLog:

* g++.dg/opt/const7.C: New test.

c++: low -faligned-new [PR102071]

This test ICEd after the constexpr new patch (r10-3661) because alloc_call
had a NOP_EXPR around it; fixed by moving the NOP_EXPR to alloc_expr. And
the PR pointed out that the size_t cookie might need more alignment, so I
fix that as well.

PR c++/102071

gcc/cp/ChangeLog:

* init.c (build_new_1): Include cookie in alignment. Omit
constexpr wrapper from alloc_call.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/aligned-new9.C: New test.

c++: missing dtor with -fno-elide-constructors [PR100838]

tf_no_cleanup only applies to the outermost TARGET_EXPR, and we already
clear it for nested calls in build_over_call, but in this case both
constructor calls came from convert_like, so we need to clear it in the
recursive call as well. This revealed that we were adding an extra
ck_rvalue in direct-initialization cases where it was wrong.

PR c++/100838
PR c++/105265

gcc/cp/ChangeLog:

* call.c (convert_like_internal): Clear tf_no_cleanup when
recursing.
(build_user_type_conversion_1): Only add ck_rvalue if
LOOKUP_ONLYCONVERTING.

gcc/testsuite/ChangeLog:

* g++.dg/init/no-elide2.C: New test.
* g++.dg/cpp0x/initlist-new6.C: New test.

c++: lambda and the current instantiation [PR82980]

When a captured variable is type-dependent, we've expressed the type of the
capture field and proxy with a decltype variant.  But if the type is "the
current instantiation", we need to be able to see that so that we can do
lookup inside it just like we could with the captured variable itself.

I also tried looking through lambda capture in
cp_parser_postfix_dot_deref_expression, but this way seems cleaner.  I plan
to treat more types as deducible in stage 1.

I considered also using this in do_auto_deduction, but think that would be
wrong: [temp.dep.expr] says an id-expression is type-dependent if it is
"associated by name lookup with a variable declared with a type that
contains a placeholder type where the initializer is type-dependent".  That
doesn't clearly exclude deducing a dependent type from the initializer, but
it seems like a barrier, and other implementations agree.

PR c++/82980

gcc/cp/ChangeLog:

* lambda.c (type_deducible_expression_p): New.
(lambda_capture_field_type): Check it.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-current-inst1.C: New test.

c++: constexpr trivial -fno-elide-ctors [PR104646]

The constexpr constructor checking code got confused by the expansion of a
trivial copy constructor; we don't need to do that checking for defaulted
ctors, anyway.

PR c++/104646

gcc/cp/ChangeLog:

* constexpr.c (maybe_save_constexpr_fundef): Don't do extra
checks for defaulted ctors.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-fno-elide-ctors1.C: New test.

c++: pack init-capture of unresolved overload [PR102629]

Here we were failing to diagnose that the initializer for the capture pack
is an unresolved overload. It turns out that the reason we didn't recognize
the deduction failure in do_auto_deduction was that the individual 'auto' in
the expansion of the capture pack was still marked as a parameter pack, so
we were deducing it to an empty pack instead of failing.

PR c++/102629

gcc/cp/ChangeLog:

* pt.c (gen_elem_of_pack_expansion_instantiation): Clear
TEMPLATE_TYPE_PARAMETER_PACK on auto.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-pack-init7.C: New test.

c++: assignment to temporary [PR59950]

Given build_this of a TARGET_EXPR, cp_build_fold_indirect_ref returns the
TARGET_EXPR. But that's the wrong value category for the result of the
defaulted class assignment operator, which returns an lvalue, so we need to
actually build the INDIRECT_REF.

PR c++/59950

gcc/cp/ChangeLog:

* call.c (build_over_call): Use cp_build_indirect_ref.

gcc/testsuite/ChangeLog:

* g++.dg/init/assign2.C: New test.

c++: empty base constexpr -fno-elide-ctors [PR105245]

The patch for 100111 extended our handling of empty base elision to the case
where the derived class has no other fields, but we still need to make sure
that there's some initializer for the derived object.

PR c++/105245
PR c++/100111

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_store_expression): Build a CONSTRUCTOR
as needed in empty base handling.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-empty2.C: Add -fno-elide-constructors.

c++: nested generic lambda in DMI [PR101717]

We were already checking COMPLETE_TYPE_P to recognize instantiation of a
generic lambda, but didn't consider that we might be nested in a non-generic
lambda.

PR c++/101717

gcc/cp/ChangeLog:

* lambda.c (lambda_expr_this_capture): Check all enclosing
lambdas for completeness.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-generic-this4.C: New test.

c++: -Wshadow=compatible-local type vs var [PR100608]

The patch for PR92024 changed -Wshadow=compatible-local to warn if either
new or old decl was a type, but the rationale only talked about the case
where both are types. If only one is, they aren't compatible.

PR c++/100608

gcc/cp/ChangeLog:

* name-lookup.c (check_local_shadow): Use -Wshadow=local
if exactly one of 'old' and 'decl' is a type.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wshadow-compatible-local-3.C: New test.

c++: operator new lookup [PR98249]

The standard says, as we quote in the comment just above, that if we don't
find operator new in the allocated type, it should be looked up in the
global scope. This is specifically ::, not just any namespace, and we
already give an error for an operator new declared in any other namespace.

PR c++/98249

gcc/cp/ChangeLog:

* call.c (build_operator_new_call): Just look in ::.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/new3.C: New test.

c++: designator and anon struct [PR101767]

We found .x in the anonymous struct, but then didn't find .y there; we
should decide that means we're done with the struct rather than that the
code is wrong.

PR c++/101767

gcc/cp/ChangeLog:

* decl.c (reshape_init_class): Back out of anon struct
if a designator doesn't match.

gcc/testsuite/ChangeLog:

* g++.dg/ext/anon-struct10.C: New test.

Daily bump.

c++: cxx_eval_array_reference and empty elem type [PR101194]

Here the initializer for x is represented as an empty CONSTRUCTOR due to
its empty element type. So during constexpr evaluation of the ARRAY_REF
x[0], we end up trying to value initialize the omitted element at index 0,
which fails because the element type is not default constructible.

This patch makes cxx_eval_array_reference specifically handle the case
where the element type is an empty type.

PR c++/101194

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_array_reference): When the element type
is an empty type and the corresponding element is omitted, just
return an empty CONSTRUCTOR instead of attempting value
initialization.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-empty16.C: New test.

(cherry picked from commit a688c284dd3848b6c4ea553035f0f9769fb4fbc9)

Daily bump.

[committed] Fix more problems with new linker warnings

gcc/testsuite
* lib/prune.exp (prune_gcc_output): Prune new linker warning.

(cherry picked from commit d993c6dea7c664aa26ee04210c471cfcb4e7d0e0)

g++.dg/gomp/clause-3.C: Fix - missing in r12-438-g1580fc7 [PR100422]

gcc/testsuite/
PR testsuite/100422
* g++.dg/gomp/clause-3.C: Use 'reduction(&:..)' instead of '...(&&:..)'.

(cherry picked from commit af4e4d35f0b84d7c2f57a7b682a09116e9911142)

asan: Fix up asan_redzone_buffer::emit_redzone_byte [PR105396]

On the following testcase, we have in main's frame 3 variables,
some red zone padding, 4 byte d, followed by 12 bytes of red zone padding, then
8 byte b followed by 24 bytes of red zone padding, then 40 bytes c followed
by some red zone padding.
The intended content of shadow memory for that is (note, each byte describes
8 bytes of memory):
f1 f1 f1 f1 04 f2 00 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3
left red    d  mr b  middle r c              right red zone

f1 is left red zone magic
f2 is middle red zone magic
f3 is right red zone magic
00 when all 8 bytes are accessible
01-07 when only 1 to 7 bytes are accessible followed by inaccessible bytes

The -fdump-rtl-expand-details dump makes it clear that it misbehaves:
Flushing rzbuffer at offset -160 with: f1 f1 f1 f1
Flushing rzbuffer at offset -128 with: 04 f2 00 00
Flushing rzbuffer at offset -128 with: 00 00 00 f2
Flushing rzbuffer at offset -96 with: f2 f2 00 00
Flushing rzbuffer at offset -64 with: 00 00 00 f3
Flushing rzbuffer at offset -32 with: f3 f3 f3 f3
In the end we end up with
f1 f1 f1 f1 00 00 00 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3
shadow bytes because at offset -128 there are 2 overlapping stores
as asan_redzone_buffer::emit_redzone_byte has flushed the temporary 4 byte
buffer in the middle.

The function is called with an offset and value.  If the passed offset is
consecutive with the prev_offset + buffer size (off == offset), then
we handle it correctly, similarly if the new offset is far enough from the
old one (we then flush whatever was in the buffer and if needed add up to 3
bytes of 00 before actually pushing value.

But what isn't handled correctly is when the offset isn't consecutive to
what has been added last time, but it is in the same 4 byte word of shadow
memory (32 bytes of actual memory), like the above case where
we have consecutive 04 f2 and then skip one shadow memory byte (aka 8 bytes
of real memory) and then want to emit f2.  Emitting that as a store
of little-endian 0x0000f204 followed by a store of 0xf2000000 to the same
address doesn't work, we want to emit 0xf200f204.

The following patch does that by pushing 1 or 2 00 bytes.
Additionally, as a small cleanup, instead of using
      m_shadow_bytes.safe_push (value);
      flush_if_full ();
in all of if, else if and else bodies it sinks those 2 stmts to the end
of function as all do the same thing.

2022-04-27  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/105396
* asan.c (asan_redzone_buffer::emit_redzone_byte): Handle the case
where offset is bigger than off but smaller than m_prev_offset + 32
bits by pushing one or more 0 bytes.  Sink the
m_shadow_bytes.safe_push (value); flush_if_full (); statements from
all cases to the end of the function.

* gcc.dg/asan/pr105396.c: New test.

(cherry picked from commit 9715f10c0651c9549b479b69d67be50ac4bd98a6)

rtlanal: Fix up replace_rtx [PR105333]

The following testcase FAILs, because replace_rtx replaces a REG with
CONST_WIDE_INT inside of a SUBREG, which is an invalid transformation
because a SUBREG relies on SUBREG_REG having non-VOIDmode but
CONST_WIDE_INT has VOIDmode.

replace_rtx already has code to deal with it, but it was doing
it only for CONST_INTs. The following patch does it also for
VOIDmode CONST_DOUBLE or CONST_WIDE_INT.

2022-04-22 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/105333
* rtlanal.c (replace_rtx): Use simplify_subreg or
simplify_unary_operation if CONST_SCALAR_INT_P rather than just
CONST_INT_P.

* gcc.dg/pr105333.c: New test.

(cherry picked from commit 7092b7aea122a91824d048aeb23834cf1d19b1a1)

sparc: Preserve ORIGINAL_REGNO in epilogue_renumber [PR105257]

The following testcase ICEs, because the pic register is
(reg:DI 24 %i0 [109]) and is used in the delay slot of a return.
We invoke epilogue_renumber and that changes it to
(reg:DI 8 %o0) which no longer satisfies sparc_pic_register_p
predicate, so we don't recognize the insn anymore.

The following patch fixes that by preserving ORIGINAL_REGNO if
specified, so we get (reg:DI 8 %o0 [109]) instead.

2022-04-19 Jakub Jelinek <jakub@redhat.com>

PR target/105257
* config/sparc/sparc.c (epilogue_renumber): If ORIGINAL_REGNO,
use gen_raw_REG instead of gen_rtx_REG and copy over also
ORIGINAL_REGNO. Use return 0; instead of /* fallthrough */.

* gcc.dg/pr105257.c: New test.

(cherry picked from commit eeca2b8bd03f57c59c6cf48bf6b9bd6dc86924f6)

c++: Fix up CONSTRUCTOR_PLACEHOLDER_BOUNDARY handling [PR105256]

The CONSTRUCTOR_PLACEHOLDER_BOUNDARY bit is supposed to separate
PLACEHOLDER_EXPRs that should be replaced by one object or subobjects of it
(variable, TARGET_EXPR slot, ...) from other PLACEHOLDER_EXPRs that should
be replaced by different objects or subobjects.
The bit is set when finding PLACEHOLDER_EXPRs inside of a CONSTRUCTOR, not
looking into nested CONSTRUCTOR_PLACEHOLDER_BOUNDARY ctors, and we prevent
elision of TARGET_EXPRs (through TARGET_EXPR_NO_ELIDE) whose initializer
is a CONSTRUCTOR_PLACEHOLDER_BOUNDARY ctor.  The following testcase ICEs
though, we don't replace the placeholders in there at all, because
CONSTRUCTOR_PLACEHOLDER_BOUNDARY isn't set on the TARGET_EXPR_INITIAL
ctor, but on a ctor nested in such a ctor.  replace_placeholders should be
run on the whole TARGET_EXPR slot.

So, the following patch fixes it by moving the CONSTRUCTOR_PLACEHOLDER_BOUNDARY
bit from nested CONSTRUCTORs to the CONSTRUCTOR containing those (but only
if it is closely nested, if there is some other tree sandwiched in between,
it doesn't do it).

2022-04-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/105256
* typeck2.c (process_init_constructor_array,
process_init_constructor_record, process_init_constructor_union): Move
CONSTRUCTOR_PLACEHOLDER_BOUNDARY flag from CONSTRUCTOR elements to the
containing CONSTRUCTOR.

* g++.dg/cpp0x/pr105256.C: New test.

(cherry picked from commit eb03e424598d30fed68801af6d6ef6236d32e32e)

i386: Fix ICE caused by ix86_emit_i387_log1p [PR105214]

The following testcase ICEs, because ix86_emit_i387_log1p attempts to
emit something like
  if (cond)
    some_code1;
  else
    some_code2;
and emits a conditional jump using emit_jump_insn (standard way in
the file) and an unconditional jump using emit_jump.
The problem with that is that if there is pending stack adjustment,
it isn't emitted before the conditional jump, but is before the
unconditional jump and therefore stack is adjusted only conditionally
(at the end of some_code1 above), which makes dwarf2 pass unhappy about it
but is a serious wrong-code even if it doesn't ICE.

This can be fixed either by emitting pending stack adjust before the
conditional jump as the following patch does, or by not using
  emit_jump (label2);
and instead hand inlining what that function does except for the
pending stack adjustment, like:
  emit_jump_insn (targetm.gen_jump (label2));
  emit_barrier ();
In that case there will be no stack adjustment in the sequence and
it will be done later on somewhere else.

2022-04-12  Jakub Jelinek  <jakub@redhat.com>

PR target/105214
* config/i386/i386-expand.c (ix86_emit_i387_log1p): Call
do_pending_stack_adjust.

* gcc.dg/asan/pr105214.c: New test.

(cherry picked from commit d481d13786cb84f6294833538133dbd6f39d2e55)

builtins: Fix up expand_builtin_int_roundingfn_2 [PR105211]

The expansion of __builtin_iround{,f,l} etc. builtins in some cases
emits calls to a different fallback builtin. To locate the right builtin
it uses mathfn_built_in_1 with the type of the first argument.
If its TYPE_MAIN_VARIANT is {float,double,long_double}_type_node, all is
fine, but on the following testcase, because GIMPLE considers scalar
float conversions between types with the same mode as useless,
TYPE_MAIN_VARIANT of the arg's type is float32_type_node and because there
isn't __builtin_lroundf32 returns NULL and we ICE.

This patch will first try the type of the first argument of the builtin's
prototype (so that say on sizeof(double)==sizeof(long double) target it honors
whether it was a *l or non-*l call; though even that can't be 100% trusted,
user could incorrectly prototype it) and as fallback the type argument.
If neither works, doesn't fallback.

2022-04-11 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/105211
* builtins.c (expand_builtin_int_roundingfn_2): If mathfn_built_in_1
fails for TREE_TYPE (arg), retry it with
TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl))) and if even that
fails, emit call normally.

* gcc.dg/pr105211.c: New test.

(cherry picked from commit 91a38e8a848c61b2e23ee277306dc8cd194d135b)

c-family: Initialize ridpointers for __int128 etc. [PR105186]

The following testcase ICEs with C++ and is incorrectly rejected with C.
The reason is that both FEs use ridpointers identifiers for CPP_KEYWORD
and value or u.value for CPP_NAME e.g. when parsing attributes or OpenMP
directives etc., like:
         /* Save away the identifier that indicates which attribute
            this is.  */
         identifier = (token->type == CPP_KEYWORD)
           /* For keywords, use the canonical spelling, not the
              parsed identifier.  */
           ? ridpointers[(int) token->keyword]
           : id_token->u.value;

         identifier = canonicalize_attr_name (identifier);
I've tried to change those to use ridpointers only if non-NULL and otherwise
use the value/u.value even for CPP_KEYWORDS, but that was a large 10 hunks
patch.

The following patch instead just initializes ridpointers for the __intNN
keywords.  It can't be done earlier before we record_builtin_type as there
are 2 different spellings and if we initialize those ridpointers early, the
second record_builtin_type fails miserably.

2022-04-11  Jakub Jelinek  <jakub@redhat.com>

PR c++/105186
* c-common.c (c_common_nodes_and_builtins): After registering __int%d
and __int%d__ builtin types, initialize corresponding ridpointers
entry.

* c-c++-common/pr105186.c: New test.

(cherry picked from commit 083e8e66d2e90992fa83a53bfc3553dfa91abda1)

fold-const: Fix up make_range_step [PR105189]

The following testcase is miscompiled, because fold_truth_andor
incorrectly folds
(unsigned) foo () >= 0U && 1
into
foo () >= 0
For the unsigned comparison (which is useless in this case,
as >= 0U is always true, but hasn't been folded yet), previous
make_range_step derives exp (unsigned) foo () and +[0U, -]
range for it.  Next we process the NOP_EXPR.  We have special code
for unsigned to signed casts, already earlier punt if low or high
aren't representable in arg0_type or if it is a narrowing conversion.
For the signed to unsigned casts, I think if high is specified we
are still fine, as we punt for non-representable values in arg0_type,
n_high is then still representable and so was smaller or equal to
signed maximum and either low is not present (equivalent to 0U), or
low must be smaller or equal to high and so for unsigned exp
+[low, high] the signed exp +[n_low, n_high] will be correct.
Similarly, if both low and high aren't specified (always true or
always false), it is ok too.
But if we have for unsigned exp +[low, -] or -[low, -], using
+[n_low, -] or -[n_high, -] is incorrect.  Because low is smaller
or equal to signed maximum and high is unspecified (i.e. unsigned
maximum), when signed that range is a union of +[n_low, -] and
+[-, -1] which is equivalent to -[0, n_low-1], unless low
is 0, in that case we can treat it as [-, -].

2022-04-08  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/105189
* fold-const.c (make_range_step): Fix up handling of
(unsigned) x +[low, -] ranges for signed x if low fits into
typeof (x).

* g++.dg/torture/pr105189.C: New test.

(cherry picked from commit 5e6597064b0c7eb93b8f720afc4aa970eefb0628)

combine: Don't record for UNDO_MODE pointers into regno_reg_rtx array [PR104985]

The testcase in the PR fails under valgrind on mips64 (but only Martin
can reproduce, I couldn't).
But the problem reported there is that SUBST_MODE remembers addresses
into the regno_reg_rtx array, then some splitter needs a new pseudo
and calls gen_reg_rtx, which reallocates the regno_reg_rtx array
and finally undo operation is done and dereferences the old regno_reg_rtx
entry.
The rtx values stored in regno_reg_rtx array seems to be created
by gen_reg_rtx only and since then aren't modified, all we do for it
is adjusting its fields (e.g. adjust_reg_mode that SUBST_MODE does).

So, I think it is useless to use where.r for UNDO_MODE and store
&regno_reg_rtx[regno] in struct undo, we can store just
regno_reg_rtx[regno] (i.e. pointer to the REG itself instead of
pointer to pointer to REG) or could also store just the regno.

The following patch does the latter, and because SUBST_MODE no longer
needs to be a macro, changes all SUBST_MODE uses to subst_mode.

2022-04-06  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/104985
* combine.c (struct undo): Add where.regno member.
(do_SUBST_MODE): Rename to ...
(subst_mode): ... this.  Change first argument from rtx * into int,
operate on regno_reg_rtx[regno] and save regno into where.regno.
(SUBST_MODE): Remove.
(try_combine): Use subst_mode instead of SUBST_MODE, change first
argument from regno_reg_rtx[whatever] to whatever.  For UNDO_MODE, use
regno_reg_rtx[undo->where.regno] instead of *undo->where.r.
(undo_to_marker): For UNDO_MODE, use regno_reg_rtx[undo->where.regno]
instead of *undo->where.r.
(simplify_set): Use subst_mode instead of SUBST_MODE, change first
argument from regno_reg_rtx[whatever] to whatever.

(cherry picked from commit 61bee6aed26eb30b798c75b9a595c9d51e080442)

i386: Fix up ix86_expand_vector_init_general [PR105123]

The following testcase is miscompiled on ia32.
The problem is that at -O0 we end up with:
  vector(4) short unsigned int _1;
  short unsigned int u.0_3;
...
  _1 = {u.0_3, u.0_3, u.0_3, u.0_3};
statement (dead) which is wrongly expanded.
elt is (subreg:HI (reg:SI 83 [ u.0_3 ]) 0), tmp_mode SImode,
so after convert_mode we start with word (reg:SI 83 [ u.0_3 ]).
The intent is to manually broadcast that value to 2 SImode parts,
but because we pass word as target to expand_simple_binop, it will
overwrite (reg:SI 83 [ u.0_3 ]) and we end up with 0:
   10: {r83:SI=r83:SI<<0x10;clobber flags:CC;}
   11: {r83:SI=r83:SI|r83:SI;clobber flags:CC;}
   12: {r83:SI=r83:SI<<0x10;clobber flags:CC;}
   13: {r83:SI=r83:SI|r83:SI;clobber flags:CC;}
   14: clobber r110:V4HI
   15: r110:V4HI#0=r83:SI
   16: r110:V4HI#4=r83:SI
as the two ors do nothing and two shifts each by 16 left shift it all
away.
The following patch fixes that by using NULL_RTX target, so we expand it as
   10: {r110:SI=r83:SI<<0x10;clobber flags:CC;}
   11: {r111:SI=r110:SI|r83:SI;clobber flags:CC;}
   12: {r112:SI=r83:SI<<0x10;clobber flags:CC;}
   13: {r113:SI=r112:SI|r83:SI;clobber flags:CC;}
   14: clobber r114:V4HI
   15: r114:V4HI#0=r111:SI
   16: r114:V4HI#4=r113:SI
instead.

Another possibility would be to pass NULL_RTX only when word == elt
and word otherwise, where word would necessarily be a pseudo from the first
shift after passing NULL_RTX there once or pass NULL_RTX for the shift and
word for ior.

2022-04-03  Jakub Jelinek  <jakub@redhat.com>

PR target/105123
* config/i386/i386-expand.c (ix86_expand_vector_init_general): Avoid
using word as target for expand_simple_binop when doing ASHIFT and
IOR.

* gcc.target/i386/pr105123.c: New test.

(cherry picked from commit e1a74058b784c845e84a0cf1997b54b984df483d)

ubsan: Fix ICE due to -fsanitize=object-size [PR105093]

The following testcase ICEs, because for a volatile X & RESULT_DECL
ubsan wants to take address of that reference.  instrument_object_size
is called with x, so the base is equal to the access and the var
is automatic, so there is no risk of an out of bounds access for it.
Normally we wouldn't instrument those because we fold address of the
t - address of inner to 0, add constant size of the decl and it is
equal to what __builtin_object_size computes.  But the volatile
results in the subtraction not being folded.

The first hunk fixes it by punting if we access the whole automatic
decl, so that even volatile won't cause a problem.
The second hunk (not strictly needed for this testcase) is similar
to what has been added to asan.cc recently, if we actually take
address of a decl and keep it in the IL, we better mark it addressable.

2022-03-30  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/105093
* ubsan.c (instrument_object_size): If t is equal to inner and
is a decl other than global var, punt.  When emitting call to
UBSAN_OBJECT_SIZE ifn, make sure base is addressable.

* g++.dg/ubsan/pr105093.C: New test.

(cherry picked from commit e3e68fa59ead502c24950298b53c637bbe535a74)

store-merging: Avoid ICEs on roughly ~0ULL/8 sized stores [PR105094]

On the following testcase on 64-bit targets, store-merging sees
a MEM_REF store from {} ctor with "negative" bitsize where bitoff + bitsize
wraps around to very small end offset.  This later confuses the code
so that it allocates just a few bytes of memory but fills in huge amounts of
it.  Later on there is a param_store_merging_max_size size check but due to
the wrap-around we pass that.

The following patch punts on such large bitsizes.

2022-03-30  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/105094
* gimple-ssa-store-merging.c (mem_valid_for_store_merging): Punt if
bitsize <= 0 rather than just == 0.

* gcc.dg/pr105094.c: New test.

(cherry picked from commit 387e818cda0ffde86f624228c3da1ab28f453685)

c++: Fox template-introduction tentative parsing in class bodies clear colon_corrects_to_scope_p [PR105061]

The concepts support (in particular template introductions from concepts TS)
broke the following testcase, valid unnamed bitfields with dependent
types (or even just typedefs) were diagnosed as typos (: instead of correct
::) in template introduction during their tentative parsing.
The following patch fixes that by not doing this : to :: correction when
member_p is true.

2022-03-30 Jakub Jelinek <jakub@redhat.com>

PR c++/105061
* parser.c (cp_parser_template_introduction): If member_p, temporarily
clear parser->colon_corrects_to_scope_p around tentative parsing of
nested name specifier.

* g++.dg/concepts/pr105061.C: New test.

(cherry picked from commit 4f2795218a6ba6a7b7b9b18ca7a6e390661e1608)

c++: Fix up __builtin_convertvector parsing

Jonathan reported on IRC that we don't parse
__builtin_bit_cast (type, val).field
etc.
The problem is that for these 2 builtins we return from
cp_parser_postfix_expression instead of setting postfix_expression
to the cp_build_* value and falling through into the postfix regression
suffix handling loop.

2022-03-26 Jakub Jelinek <jakub@redhat.com>

* parser.c (cp_parser_postfix_expression)
<case RID_BILTIN_CONVERTVECTOR>: Don't
return cp_build_vec_convert result right away, instead
set postfix_expression to it and break.

* c-c++-common/builtin-convertvector-3.c: New test.

(cherry picked from commit 1806829e08f14e4cacacec43d7845cc2dad2ddc8)

c++: extern thread_local declarations in constexpr [PR104994]

C++14 to C++20 apparently should allow extern thread_local declarations in
constexpr functions, however useless they are there (because accessing
such vars is not valid in a constant expression, perhaps sizeof/decltype).
P2242 changed that for C++23 to passing through declaration but
https://cplusplus.github.io/CWG/issues/2552.html
has been filed for it yesterday.

2022-03-24 Jakub Jelinek <jakub@redhat.com>

PR c++/104994
* constexpr.c (potential_constant_expression_1): Don't diagnose extern
thread_local declarations.
* decl.c (start_decl): Likewise.

* g++.dg/cpp2a/constexpr-nonlit7.C: New test.

(cherry picked from commit 72124f487ccb5c8065dd5f7b8fba254600b7e611)

i386: Don't emit pushf;pop for __builtin_ia32_readeflags_u* with unused lhs [PR104971]

__builtin_ia32_readeflags_u* aren't marked const or pure I think
intentionally, so that they aren't CSEd from different regions of a function
etc. because we don't and can't easily track all dependencies between
it and surrounding code (if somebody looks at the condition flags, it is
dependent on the vast majority of instructions).
But the builtin itself doesn't have any side-effects, so if we ignore the
result of the builtin, there is no point to emit anything.

There is a LRA bug that miscompiles the testcase which this patch makes
latent, which is certainly worth fixing too, but IMHO this change
(and maybe ix86_gimple_fold_builtin too which would fold it even earlier
when it looses lhs) is worth it as well.

2022-03-19 Jakub Jelinek <jakub@redhat.com>

PR middle-end/104971
* config/i386/i386-expand.c
(ix86_expand_builtin) <case IX86_BUILTIN_READ_FLAGS>: If ignore,
don't push/pop anything and just return const0_rtx.

* gcc.target/i386/pr104971.c: New test.

(cherry picked from commit b60bc913cca7439d29a7ec9e9a7f448d8841b43c)

c++: Fix up constexpr evaluation of new with zero sized types [PR104568]

The new expression constant expression evaluation right now tries to
deduce how many elts the array it uses for the heap or heap [] vars
should have (or how many elts should its trailing array have if it has
cookie at the start).  As new is lowered at that point to
(some_type *) ::operator new (size)
or so, it computes it by subtracting cookie size if any from size, then
divides the result by sizeof (some_type).
This works fine for most types, except when sizeof (some_type) is 0,
then we divide by zero; size is then equal to cookie_size (or if there
is no cookie, to 0).
The following patch special cases those cases so that we don't divide
by zero and also recover the original outer_nelts from the expression
by forcing the size not to be folded in that case but be explicit
0 * outer_nelts or cookie_size + 0 * outer_nelts.

Note, we have further issues, we accept-invalid various cases, for both
zero sized elt_type and even non-zero sized elts, we aren't able to
diagnose out of bounds POINTER_PLUS_EXPR like:
constexpr bool
foo ()
{
  auto p = new int[2];
  auto q1 = &p[0];
  auto q2 = &p[1];
  auto q3 = &p[2];
  auto q4 = &p[3];
  delete[] p;
  return true;
}
constexpr bool a = foo ();
That doesn't look like a regression so I think we should resolve that for
GCC 13, but there are 2 problems.  Figure out why
cxx_fold_pointer_plus_expression doesn't deal with the &heap []
etc. cases, and for the zero sized arrays, I think we really need to preserve
whether user wrote an array ref or pointer addition, because in the
&p[3] case if sizeof(p[0]) == 0 we know that if it has 2 elements it is
out of bounds, while if we see p p+ 0 the information if it was
p + 2 or p + 3 in the source is lost.
clang++ seems to handle it fine even in the zero sized cases or with
new expressions.

2022-03-18  Jakub Jelinek  <jakub@redhat.com>

PR c++/104568
* init.c (build_new_constexpr_heap_type): Remove FULL_SIZE
argument and its handling, instead add ITYPE2 argument.  Only
support COOKIE_SIZE != NULL.
(build_new_1): If size is 0, change it to 0 * outer_nelts if
outer_nelts is non-NULL.  Pass type rather than elt_type to
maybe_wrap_new_for_constexpr.
* constexpr.c (build_new_constexpr_heap_type): New function.
(cxx_eval_constant_expression) <case CONVERT_EXPR>:
If elt_size is zero sized type, try to recover outer_nelts from
the size argument to operator new/new[] and pass that as
arg_size to build_new_constexpr_heap_type.  Pass ctx,
non_constant_p and overflow_p to that call too.

* g++.dg/cpp2a/constexpr-new22.C: New test.

(cherry picked from commit 0a0c2c3f06227d46b5e9542dfdd4e0fd2d67d894)

aarch64: Fix up RTL sharing bug in aarch64_load_symref_appropriately [PR104910]

We unshare all RTL created during expansion, but when
aarch64_load_symref_appropriately is called after expansion like in the
following testcases, we use imm in both HIGH and LO_SUM operands.
If imm is some RTL that shouldn't be shared like a non-sharable CONST,
we get at least with --enable-checking=rtl a checking ICE, otherwise might
just get silently wrong code.

The following patch fixes that by copying it if it can't be shared.

2022-03-16 Jakub Jelinek <jakub@redhat.com>

PR target/104910
* config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Copy
imm rtx.

* gcc.dg/pr104910.c: New test.

(cherry picked from commit 952155629ca1a4dfe7c7b26e53d118a9b853ed4a)

ifcvt: Punt if not onlyjump_p for find_if_case_{1,2} [PR104814]

find_if_case_{1,2} implicitly assumes conditional jumps and rewrites them,
so if they have extra side-effects or are say asm goto, things don't work
well, either the side-effects are lost or we could ICE.
In particular, the testcase below on s390x has there a doloop instruction
that decrements a register in addition to testing it for non-zero and
conditionally jumping based on that.

The following patch fixes that by punting for !onlyjump_p case, i.e.
if there are side-effects in the jump instruction or it isn't a plain PC
setter.

Also, it assumes BB_END (test_bb) will be always non-NULL, because basic
blocks with 2 non-abnormal successor edges should always have some instruction
at the end that determines which edge to take.

2022-03-15 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/104814
* ifcvt.c (find_if_case_1, find_if_case_2): Punt if test_bb doesn't
end with onlyjump_p. Assume BB_END (test_bb) is always non-NULL.

* gcc.c-torture/execute/pr104814.c: New test.

(cherry picked from commit a2645cd8fb33b36d737b310e26f4c47401305c7b)

c, c++, c-family: -Wshift-negative-value and -Wshift-overflow* tweaks for -fwrapv and C++20+ [PR104711]

As mentioned in the PR, different standards have different definition
on what is an UB left shift.  They all agree on out of bounds (including
negative) shift count.
The rules used by ubsan are:
C99-C2x ((unsigned) x >> (uprecm1 - y)) != 0 then UB
C++11-C++17 x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1 then UB
C++20 and later everything is well defined
Now, for C++20, I've in the P1236R1 implementation added an early
exit for -Wshift-overflow* warning so that it never warns, but apparently
-Wshift-negative-value remained as is.  As it is well defined in C++20,
the following patch doesn't enable -Wshift-negative-value from -Wextra
anymore for C++20 and later, if users want for compatibility with C++17
and earlier get the warning, they still can by using -Wshift-negative-value
explicitly.
Another thing is -fwrapv, that is an extension to the standards, so it is up
to us how exactly we define that case.  Our ubsan code treats
TYPE_OVERFLOW_WRAPS (type0) and cxx_dialect >= cxx20 the same as only
diagnosing out of bounds shift count and nothing else and IMHO it is most
sensical to treat -fwrapv signed left shifts the same as C++20 treats
them, https://eel.is/c++draft/expr.shift#2
"The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N,
where N is the width of the type of the result.
[Note 1: E1 is left-shifted E2 bit positions; vacated bits are zero-filled.
— end note]"
with no UB dependent on the E1 values.  The UB is only
"The behavior is undefined if the right operand is negative, or greater
than or equal to the width of the promoted left operand."
Under the hood (except for FEs and ubsan from FEs) GCC middle-end doesn't
consider UB in left shifts dependent on the first operand's value, only
the out of bounds shifts.

While this change isn't a regression, I'd think it is useful for GCC 12,
it doesn't add new warnings, but just removes warnings that aren't
appropriate.

2022-03-09  Jakub Jelinek  <jakub@redhat.com>

PR c/104711
gcc/
* doc/invoke.texi (-Wextra): Document that -Wshift-negative-value
is enabled by it only for C++11 to C++17 rather than for C++03 or
later.
(-Wshift-negative-value): Similarly (except here we stated
that it is enabled for C++11 or later).
gcc/c-family/
* c-opts.c (c_common_post_options): Don't enable
-Wshift-negative-value from -Wextra for C++20 or later.
* c-ubsan.c (ubsan_instrument_shift): Adjust comments.
* c-warn.c (maybe_warn_shift_overflow): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
gcc/c/
* c-fold.c (c_fully_fold_internal): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
* c-typeck.c (build_binary_op): Likewise.
gcc/cp/
* constexpr.c (cxx_eval_check_shift_p): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
* typeck.c (cp_build_binary_op): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
gcc/testsuite/
* c-c++-common/Wshift-negative-value-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-negative-value-2.c: Likewise.
* c-c++-common/Wshift-negative-value-3.c: Likewise.
* c-c++-common/Wshift-negative-value-4.c: Likewise.
* c-c++-common/Wshift-negative-value-7.c: New test.
* c-c++-common/Wshift-negative-value-8.c: New test.
* c-c++-common/Wshift-negative-value-9.c: New test.
* c-c++-common/Wshift-negative-value-10.c: New test.
* c-c++-common/Wshift-overflow-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-overflow-2.c: Likewise.
* c-c++-common/Wshift-overflow-5.c: Likewise.
* c-c++-common/Wshift-overflow-6.c: Likewise.
* c-c++-common/Wshift-overflow-7.c: Likewise.
* c-c++-common/Wshift-overflow-8.c: New test.
* c-c++-common/Wshift-overflow-9.c: New test.
* c-c++-common/Wshift-overflow-10.c: New test.
* c-c++-common/Wshift-overflow-11.c: New test.
* c-c++-common/Wshift-overflow-12.c: New test.

(cherry picked from commit d76511138dc816ef66fd16f71531f48c37dac3b4)

c++: Don't suggest cdtor or conversion op identifiers in spelling hints [PR104806]

On the following testcase, we emit "did you mean '__dt '?" in the error
message. "__dt " shows there because it is dtor_identifier, but we
shouldn't suggest those to the user, they are purely internal and can't
be really typed by the user because of the final space in it.

2022-03-08 Jakub Jelinek <jakub@redhat.com>

PR c++/104806
* search.c (lookup_field_fuzzy_info::fuzzy_lookup_field): Ignore
identifiers with space at the end.

* g++.dg/spellcheck-pr104806.C: New test.

(cherry picked from commit e480c3c06d20874fd7504bfdcca0b829f8000389)

s390: Fix up *cmp_and_trap_unsigned_int<mode> constraints [PR104775]

The following testcase fails to assemble due to clgte %r6,0(%r1,%r10)
insn not being accepted by assembler.
My rough understanding is that in the RSY-b insn format the spot
in other formats used for index registers is used instead for M3 what
kind of comparison it is, so this patch follows what other similar
instructions use for constraint (i.e. one without index register).

2022-03-07 Jakub Jelinek <jakub@redhat.com>

PR target/104775
* config/s390/s390.md (*cmp_and_trap_unsigned_int<mode>): Use
S constraint instead of T in the last alternative.

* gcc.target/s390/pr104775.c: New test.

(cherry picked from commit 2472dcaa8cb9e02e902f83d419c3ee7e0f3d9041)

match.pd: Further complex simplification fixes [PR104675]

Mark mentioned in the PR further 2 simplifications that also ICE
with complex types.
For these, eventually (but IMO GCC 13 materials) we could support it
for vector types if it would be uniform vector constants.
Currently integer_pow2p is true only for INTEGER_CSTs and COMPLEX_CSTs
and we can't use bit_and etc. for complex type.

2022-02-25 Jakub Jelinek <jakub@redhat.com>
Marc Glisse <marc.glisse@inria.fr>

PR tree-optimization/104675
* match.pd (t * 2U / 2 -> t & (~0 / 2), t / 2U * 2 -> t & ~1):
Restrict simplifications to INTEGRAL_TYPE_P.

* gcc.dg/pr104675-3.c : New test.

(cherry picked from commit f62115c9b770a66c5378f78a2d5866243d560573)

rs6000: Use rs6000_emit_move in movmisalign<mode> expander [PR104681]

The following testcase ICEs, because for some strange reason it decides to use
movmisaligntf during expansion where the destination is MEM and source is
CONST_DOUBLE. For normal mov<mode> expanders the rs6000 backend uses
rs6000_emit_move to ensure that if one operand is a MEM, the other is a REG
and a few other things, but for movmisalign<mode> nothing enforced this.
The middle-end documents that movmisalign<mode> shouldn't fail, so we can't
force that through predicates or condition on the expander.

2022-02-25 Jakub Jelinek <jakub@redhat.com>

PR target/104681
* config/rs6000/vector.md (movmisalign<mode>): Use rs6000_emit_move.

* g++.dg/opt/pr104681.C: New test.

(cherry picked from commit 3885a122f817a1b6dca4a84ba9e020d5ab2060af)

match.pd: Don't create BIT_NOT_EXPRs for COMPLEX_TYPE [PR104675]

We don't support BIT_{AND,IOR,XOR,NOT}_EXPR on complex types,
&/|/^ are just rejected for them, and ~ is parsed as CONJ_EXPR.
So, we should avoid simplifications which turn valid complex type
expressions into something that will ICE during expansion.

2022-02-25 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/104675
* match.pd (-A - 1 -> ~A, -1 - A -> ~A): Don't simplify for
COMPLEX_TYPE.

* gcc.dg/pr104675-1.c: New test.
* gcc.dg/pr104675-2.c: New test.

(cherry picked from commit 758671b88b78d7629376b118ec6ca6bcfbabbd36)

libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]

On
#define A(n) int foo1##n(void) { return 1##n; }
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

PR lto/104617
* simple-object-elf.c (simple_object_elf_match): Fix up URL
in comment.
(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
range (inclusive).

(cherry picked from commit 2f59f067610f22c3f2ec9b1516e24b85836676ed)

asan: Mark instrumented vars addressable [PR102656]

We ICE on the following testcase, because the asan1 pass decides to
instrument
  <retval>.x = 0;
and does that by
  _13 = &<retval>.x;
  .ASAN_CHECK (7, _13, 4, 4);
  <retval>.x = 0;
and later sanopt pass turns that into:
  _39 = (unsigned long) &<retval>.x;
  _40 = _39 >> 3;
  _41 = _40 + 2147450880;
  _42 = (signed char *) _41;
  _43 = *_42;
  _44 = _43 != 0;
  _45 = _39 & 7;
  _46 = (signed char) _45;
  _47 = _46 + 3;
  _48 = _47 >= _43;
  _49 = _44 & _48;
  if (_49 != 0)
    goto <bb 10>; [0.05%]
  else
    goto <bb 9>; [99.95%]

  <bb 10> [local count: 536864]:
  __builtin___asan_report_store4 (_39);

  <bb 9> [local count: 1073741824]:
  <retval>.x = 0;
The problem is during expansion, <retval> isn't marked TREE_ADDRESSABLE,
even when we take its address in (unsigned long) &<retval>.x.

Now, instrument_derefs has code to avoid the instrumentation altogether
if we can prove the access is within bounds of an automatic variable in the
current function and the var isn't TREE_ADDRESSABLE (or we don't instrument
use after scope), but we do it solely for VAR_DECLs.

I think we should treat RESULT_DECLs exactly like that too, which is what
the following patch does.  I must say I'm unsure about PARM_DECLs, those can
have different cases, either they are fully or partially passed in
registers, then if we take parameter's address, they are in a local copy
inside of a function and so work like those automatic vars.  But if they
are fully passed in memory, we typically just take address of the slot
and in that case they live in the caller's frame.  It is true we don't
(can't) put any asan padding in between the arguments, so all asan could
detect in that case is if caller passes fewer on stack arguments or smaller
arguments than callee accepts.  Anyway, as I'm unsure, I haven't added
PARM_DECLs to that case.

And another thing is, when we actually build_fold_addr_expr, we need to
mark_addressable the inner if it isn't addressable already.

2022-02-19  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/102656
* asan.c (instrument_derefs): If inner is a RESULT_DECL and access is
known to be within bounds, treat it like automatic variables.
If instrumenting access and inner is {VAR,PARM,RESULT}_DECL from
current function and !TREE_STATIC which is not TREE_ADDRESSABLE, mark
it addressable.

(cherry picked from commit 9e3bbb4a8024121eb0fa675cb1f074218c1345a6)

valtrack: Avoid creating raw SUBREGs with VOIDmode argument [PR104557]

After the recent r12-7240 simplify_immed_subreg changes, we bail on more
simplify_subreg calls than before, e.g. apparently for decimal modes
in the NaN representations  we almost never preserve anything except the
canonical {q,s}NaNs.
simplify_gen_subreg will punt in such cases because a SUBREG with VOIDmode
is not valid, but debug_lowpart_subreg wants to attempt even harder, even
if e.g. target indicates certain mode combinations aren't valid for the
backend, dwarf2out can still handle them.  But a SUBREG from a VOIDmode
operand is just too much, the inner mode is lost there.  We'd need some
new rtx that would be able to represent those cases.
For now, just punt in those cases.

2022-02-17  Jakub Jelinek  <jakub@redhat.com>

PR debug/104557
* valtrack.c (debug_lowpart_subreg): Don't call gen_rtx_raw_SUBREG
if expr has VOIDmode.

* gcc.dg/dfp/pr104557.c: New test.

(cherry picked from commit 1c2b44b52364cb5661095b346de794bc7ff02866)

combine: Fix up -fcompare-debug issue in the combiner [PR104544]

On the following testcase on aarch64-linux, we behave differently
with -g and -g0.

The problem is that on:
(insn 10011 10010 10012 2 (set (reg:CC 66 cc)
        (compare:CC (reg:DI 105)
            (const_int 0 [0]))) "pr104544.c":18:3 407 {cmpdi}
     (expr_list:REG_DEAD (reg:DI 105)
        (nil)))
(insn 10012 10011 10013 2 (set (reg:SI 109)
        (eq:SI (reg:CC 66 cc)
            (const_int 0 [0]))) "pr104544.c":18:3 444 {aarch64_cstoresi}
     (expr_list:REG_DEAD (reg:CC 66 cc)
        (nil)))
(insn 10013 10012 10016 2 (set (reg:DI 110)
        (zero_extend:DI (reg:SI 109))) "pr104544.c":18:3 111 {*zero_extendsidi2_aarch64}
     (expr_list:REG_DEAD (reg:SI 109)
        (nil)))
(insn 10016 10013 10017 2 (parallel [
            (set (reg:CC 66 cc)
                (compare:CC (const_int 0 [0])
                    (reg:DI 110)))
            (set (reg:DI 111)
                (neg:DI (reg:DI 110)))
        ]) "pr104544.c":18:3 281 {negdi_carryout}
     (expr_list:REG_DEAD (reg:DI 110)
        (nil)))
...
(debug_insn 6 5 7 2 (var_location:SI y (debug_expr:SI D#5)) "pr104544.c":18:3 -1
     (nil))
(debug_insn 7 6 10033 2 (debug_marker) "pr104544.c":11:3 -1
     (nil))
(insn 10033 7 10034 2 (set (reg:DI 117 [ _14 ])
        (ior:DI (reg:DI 111)
            (reg:DI 112))) "pr104544.c":11:6 496 {iordi3}
     (expr_list:REG_DEAD (reg:DI 112)
        (expr_list:REG_DEAD (reg:DI 111)
            (nil))))
we successfully split 3 insns into two:

Trying 10011, 10013 -> 10016:
10011: cc:CC=cmp(r105:DI,0)
      REG_DEAD r105:DI
10013: r110:DI=cc:CC==0
      REG_DEAD cc:CC
10016: {cc:CC=cmp(0,r110:DI);r111:DI=-r110:DI;}
      REG_DEAD r110:DI
Failed to match this instruction:
(parallel [
        (set (reg:CC 66 cc)
            (compare:CC (reg:DI 105)
                (const_int 0 [0])))
        (set (reg:DI 111)
            (neg:DI (eq:DI (reg:DI 105)
                    (const_int 0 [0]))))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:CC 66 cc)
            (compare:CC (reg:DI 105)
                (const_int 0 [0])))
        (set (reg:DI 111)
            (neg:DI (eq:DI (reg:DI 105)
                    (const_int 0 [0]))))
    ])
Successfully matched this instruction:
(set (reg:DI 111)
    (neg:DI (eq:DI (reg:DI 105)
            (const_int 0 [0]))))
Successfully matched this instruction:
(set (reg:CC 66 cc)
    (compare:CC (reg:DI 105)
        (const_int 0 [0])))
Successfully matched this instruction:
(set (reg:DI 112)
    (neg:DI (eq:DI (reg:CC 66 cc)
            (const_int 0 [0]))))
allowing combination of insns 10011, 10013 and 10016
original costs 4 + 4 + 4 = 16
replacement costs 4 + 4 = 12
deferring deletion of insn with uid = 10011.

but the code that searches forward for insns to update their log
links (before the change there is a link from insn 10033 to insn 10016
for pseudo 111) only finds insn 10033 and updates the log link if
-g isn't enabled, otherwise it stops earlier because there are debug insns
in between.  So, with -g LOG_LINKS of 10033 isn't updated, points eventually
to NOTE_INSN_DELETED and so we do not attempt to combine 10033 with other
insns, while with -g0 we do.

The following patch fixes that by instead ignoring debug insns during the
searching.  We can still check BLOCK_FOR_INSN (insn) on those, because
if we notice DEBUG_INSN in a following basic block, necessarily there won't
be any further normal insns in the current block after it.

2022-02-16  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/104544
* combine.c (try_combine): When looking for insn whose links
should be updated from i3 to i2, don't stop on debug insns, instead
skip over them.

* gcc.dg/pr104544.c: New test.

(cherry picked from commit f997eef5654f782bedb985c9285862c4d76b3209)

c-family: Fix up shorten_compare for decimal vs. non-decimal float comparison [PR104510]

The comment in shorten_compare says:
/* If either arg is decimal float and the other is float, fail. */
but the callers of shorten_compare don't expect anything like failure
as a possibility from the function, callers require that the function
promotes the operands to the same type, whether the original selected
*restype_ptr one or some shortened.
So, if we choose not to shorten, we should still promote to the original
*restype_ptr.

2022-02-16 Jakub Jelinek <jakub@redhat.com>

PR c/104510
* c-common.c (shorten_compare): Convert original arguments to
the original *restype_ptr when mixing binary and decimal float.

* gcc.dg/dfp/pr104510.c: New test.

(cherry picked from commit 6e74122f0de6748b3fd0ed9183090cd7c61fb53e)

sanitizer: Use glibc _thread_db_sizeof_pthread symbol if present

I've cherry-picked following fix from llvm-project. Recent glibcs
have _thread_db_sizeof_pthread symbol variable which contains the
size of struct pthread, so that sanitizers don't need to guess that
and risk that it will change again.

2022-02-15 Jakub Jelinek <jakub@redhat.com>

* sanitizer_common/sanitizer_linux_libcdep.cpp: Cherry-pick
llvm-project revision ef14b78d9a144ba81ba02083fe21eb286a88732b.

(cherry picked from commit c4c0aa60891daeb4ea5a7c265bd681038f6d8271)

openmp: Make finalize_task_copyfn order reproduceable [PR104517]

The following testcase fails -fcompare-debug, because finalize_task_copyfn
was invoked from splay tree destruction, whose order can in some cases
depend on -g/-g0. The fix is to queue the task stmts that need copyfn
in a vector and run finalize_task_copyfn on elements of that vector.

2022-02-15 Jakub Jelinek <jakub@redhat.com>

PR debug/104517
* omp-low.c (task_cpyfns): New variable.
(delete_omp_context): Don't call finalize_task_copyfn from here.
(create_task_copyfn): Push task_stmt into task_cpyfns.
(execute_lower_omp): Call finalize_task_copyfn here on entries from
task_cpyfns vector and release the vector.

(cherry picked from commit 6a0d6e7ca9b9e338e82572db79c26168684a7441)

c++: Don't reject GOTO_EXPRs to cdtor_label in potential_constant_expression_1 [PR104513]

return in ctors on targetm.cxx.cdtor_returns_this () target like arm
is emitted as GOTO_EXPR cdtor_label where at cdtor_label it emits
RETURN_EXPR with the this.
Similarly, in all dtors regardless of targetm.cxx.cdtor_returns_this ()
a return is emitted similarly.

potential_constant_expression_1 was rejecting these gotos and so we
incorrectly rejected these testcases, but actual cxx_eval* is apparently
handling these just fine.  I was a little bit worried that for the
destruction of bases we wouldn't evaluate something we should, but as the
testcase shows, that is evaluated through try ... finally and there is
nothing after the cdtor_label.  For arm there is RETURN_EXPR this; but we
don't really care about the return value from ctors and dtors during the
constexpr evaluation.

I must say I don't see much the point of cdtor_labels at all, I'd think
that with try ... finally around it for non-arm we could just RETURN_EXPR
instead of the GOTO_EXPR and the try/finally gimplification would DTRT,
and we could just add the right return value for the arm case.

2022-02-14  Jakub Jelinek  <jakub@redhat.com>

PR c++/104513
* constexpr.c (potential_constant_expression_1) <case GOTO_EXPR>:
Don't punt if returns (target).

* g++.dg/cpp1y/constexpr-104513.C: New test.
* g++.dg/cpp2a/constexpr-dtor12.C: New test.

(cherry picked from commit 02a981a8e512934a990d1427d14e8e884409fade)

asan: Fix up address sanitizer instrumentation of __builtin_alloca* if it can throw [PR104449]

With -fstack-check* __builtin_alloca* can throw and the asan
instrumentation of this builtin wasn't prepared for that case.
The following patch fixes that by replacing the builtin with the
replacement builtin and emitting any further insns on the fallthru
edge.

I haven't touched the hwasan code which most likely suffers from the
same problem.

2022-02-12 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/104449
* asan.c: Include tree-eh.h.
(handle_builtin_alloca): Handle the case when __builtin_alloca or
__builtin_alloca_with_align can throw.

* gcc.dg/asan/pr104449.c: New test.
* g++.dg/asan/pr104449.C: New test.

(cherry picked from commit f0c7367b8802c47efaad87b1f2126fe6350d8b47)

i386: Fix up cvtsd2ss splitter [PR104502]

The following testcase ICEs, because AVX512F is enabled, AVX512VL is not,
and the cvtsd2ss insn has %xmm0-15 as output operand and %xmm16-31 as
input operand. For output operand %xmm16+ the splitter just gives up
in such case, but for such input it just emits vmovddup which requires
AVX512VL if either operand is EXT_REX_SSE_REG_P (when it is 128-bit).

The following patch fixes it by treating that case like the pre-SSE3
output != input case - move the input to output and do everything on
the output reg which is known to be < %xmm16.

2022-02-12 Jakub Jelinek <jakub@redhat.com>

PR target/104502
* config/i386/i386.md (cvtsd2ss splitter): If operands[1] is xmm16+
and AVX512VL isn't available, move operands[1] to operands[0] first.

* gcc.target/i386/pr104502.c: New test.

(cherry picked from commit 0538d42cdd68f6b65d72ed7768f1d00ba44f8631)

c++: Fix up constant expression __builtin_convertvector folding [PR104472]

The following testcase ICEs, because due to the -frounding-math
fold_const_call fails, which is it returns NULL, and returning NULL from
cxx_eval* is wrong, all the callers rely on them to either return folded
value or original with *non_constant_p = true.

The following patch does that, and additionally falls through into the
default case where there is diagnostics for the !ctx->quiet case too.

2022-02-11 Jakub Jelinek <jakub@redhat.com>

PR c++/104472
* constexpr.c (cxx_eval_internal_function) <case IFN_VEC_CONVERT>:
Only return fold_const_call result if it is non-NULL. Otherwise
fall through into the default: case to return t, set *non_constant_p
and emit diagnostics if needed.

* g++.dg/cpp0x/constexpr-104472.C: New test.

(cherry picked from commit 84993d94e13ad2ab3aee151bb5a5e767cf75d51e)

combine: Fix ICE with substitution of CONST_INT into PRE_DEC argument [PR104446]

The following testcase ICEs, because combine substitutes
(insn 10 9 11 2 (set (reg/v:SI 7 sp [ a ])
        (const_int 0 [0])) "pr104446.c":9:5 81 {*movsi_internal}
     (nil))
(insn 13 11 14 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [0  S4 A32])
        (reg:SI 85)) "pr104446.c":10:3 56 {*pushsi2}
     (expr_list:REG_DEAD (reg:SI 85)
        (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
            (nil))))
forming
(insn 13 11 14 2 (set (mem/f:SI (pre_dec:SI (const_int 0 [0])) [0  S4 A32])
        (reg:SI 85)) "pr104446.c":10:3 56 {*pushsi2}
     (expr_list:REG_DEAD (reg:SI 85)
        (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
            (nil))))
which is invalid RTL (pre_dec's argument must be a REG).
I know substitution creates various forms of invalid RTL and hopes that
invalid RTL just won't recog.
But unfortunately in this case we ICE before we get to recog, as
try_combine does:
  if (n_auto_inc)
    {
      int new_n_auto_inc = 0;
      for_each_inc_dec (newpat, count_auto_inc, &new_n_auto_inc);

      if (n_auto_inc != new_n_auto_inc)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "Number of auto_inc expressions changed\n");
          undo_all ();
          return 0;
        }
    }
and for_each_inc_dec under the hood will do e.g. for the PRE_DEC case:
    case PRE_DEC:
    case POST_DEC:
      {
        poly_int64 size = GET_MODE_SIZE (GET_MODE (mem));
        rtx r1 = XEXP (x, 0);
        rtx c = gen_int_mode (-size, GET_MODE (r1));
        return fn (mem, x, r1, r1, c, data);
      }
and that code rightfully expects that the PRE_DEC operand has non-VOIDmode
(as it needs to be a REG) - gen_int_mode for VOIDmode results in ICE.
I think it is better not to emit the clearly invalid RTL during substitution
like we do for other cases, than to adding workarounds for invalid IL
created by combine to rtlanal.cc and perhaps elsewhere.
As for the testcase, of course it is UB at runtime to modify sp that way,
but if such code is never reached, we must compile it, not to ICE on it.
And I don't see why on other targets which use the autoinc rtxes much more
it couldn't happen with other registers.

2022-02-11  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/104446
* combine.c (subst): Don't substitute CONST_INTs into RTX_AUTOINC
operands.

* gcc.target/i386/pr104446.c: New test.

(cherry picked from commit fb76c0ad35f96505ecd9213849ebc3df6163a0f7)

rs6000: Fix up vspltis_shifted [PR102140]

The following testcase ICEs, because
(const_vector:V4SI [
                (const_int 0 [0]) repeated x3
                (const_int -2147483648 [0xffffffff80000000])
            ])
is recognized as valid easy_vector_constant in between split1 pass and
end of RA.
The problem is that such constants need to be split, and the only
splitter for that is:
(define_split
  [(set (match_operand:VM 0 "altivec_register_operand")
        (match_operand:VM 1 "easy_vector_constant_vsldoi"))]
  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && can_create_pseudo_p ()"
There is only a single splitting pass before RA, so after that finishes,
if something gets matched in between that and end of RA (after that
can_create_pseudo_p () would be no longer true), it will never be
successfully split and we ICE at final.cc time or earlier.

The i386 backend (and a few others) already use
(cfun->curr_properties & PROP_rtl_split_insns)
as a test for split1 pass finished, so that some insns that should be split
during split1 and shouldn't be matched afterwards are properly guarded.

So, the following patch does that for vspltis_shifted too.

2022-02-08  Jakub Jelinek  <jakub@redhat.com>

PR target/102140
* config/rs6000/rs6000.c (vspltis_shifted): Return false also if
split1 pass has finished already.

* gcc.dg/pr102140.c: New test.

(cherry picked from commit 0c3e491a4e5ae74bfbed6d167d403d262b5a4adc)

libgomp: Fix segfault with posthumous orphan tasks [PR104385]

The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits.  Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed.  The following patch fixes that
by clearing the parent field when we schedule such task for running
by adding it into the queues and we know that the sibling task which
is about to finish has NULL parent.

2022-02-08  Jakub Jelinek  <jakub@redhat.com>

PR libgomp/104385
* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
clear task->parent.
* testsuite/libgomp.c/pr104385.c: New test.

(cherry picked from commit 0af7ef050aed9f678d70d79931ede38374fde863)

libcpp: Fix up padding handling in funlike_invocation_p [PR104147]

As mentioned in the PR, in some cases we preprocess incorrectly when we
encounter an identifier which is defined as function-like macro, followed
by at least 2 CPP_PADDING tokens and then some other identifier.
On the following testcase, the problem is in the 3rd funlike_invocation_p,
the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared token),
CPP_PADDING (one created with padding_token, val.source is non-NULL and
val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
funlike_invocation_p remembers there was a padding token, but remembers the
first one because of its condition, then the next token is the CPP_NAME,
which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
can't easily backup more tokens, it pushes into a new context the padding
token (the pfile->avoid_paste one).  The net effect is that when Y is not
defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
(the second avoid_paste is because that is how we handle end of a context).
Now, for stringify_arg that is unfortunately a significant difference,
which handles CPP_PADDING tokens with:
      if (token->type == CPP_PADDING)
        {
          if (source == NULL
              || (!(source->flags & PREV_WHITE)
                  && token->val.source == NULL))
            source = token->val.source;
          continue;
        }
and later on
      /* Leading white space?  */
      if (dest - 1 != BUFF_FRONT (pfile->u_buff))
        {
          if (source == NULL)
            source = token;
          if (source->flags & PREV_WHITE)
            *dest++ = ' ';
        }
      source = NULL;
(and c-ppoutput.cc has similar code).
So, when Y is not fun-like macro, ' ' is added because padding_token's
val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
macro, we don't add ' ' in between, because source is NULL and so
used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.

Now, the funlike_invocation_p condition
       if (padding == NULL
           || (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
        padding = token;
looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
the intent was to prefer do the same thing and pick the right padding.
But there are significant differences.  Both stringify_arg and c-ppoutput.cc
don't remember the CPP_PADDING token, but its val.source instead, while
in funlike_invocation_p we want to remember the padding token that has the
significant information for stringify_arg/c-ppoutput.cc.
So, IMHO we want to overwrite padding if:
1) padding == NULL (remember that there was any padding at all)
2) padding->val.source == NULL (this matches the source == NULL
   case in stringify_arg)
3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
   (this matches the !(source->flags & PREV_WHITE) && token->val.source == NULL
   case in stringify_arg)

2022-02-01  Jakub Jelinek  <jakub@redhat.com>

PR preprocessor/104147
* macro.c (funlike_invocation_p): For padding prefer a token
with val.source non-NULL especially if it has PREV_WHITE set
on val.source->flags.  Add gcc_assert that CPP_PADDING tokens
don't have PREV_WHITE set in flags.

* c-c++-common/cpp/pr104147.c: New test.

(cherry picked from commit 95ac5635409606386259d2ff21fb61738858ca4a)

libcpp: Avoid PREV_WHITE and other random content on CPP_PADDING tokens

The funlike_invocation_p macro never triggered, the other
asserts did on some tests, see below for a full list.
This seems to be caused by #pragma/_Pragma handling.
do_pragma does:
          pfile->directive_result.src_loc = pragma_token_virt_loc;
          pfile->directive_result.type = CPP_PRAGMA;
          pfile->directive_result.flags = pragma_token->flags;
          pfile->directive_result.val.pragma = p->u.ident;
when it sees a pragma, while start_directive does:
  pfile->directive_result.type = CPP_PADDING;
and so does _cpp_do__Pragma.
Now, for #pragma lex.cc will just ignore directive_result if
it has CPP_PADDING type:
              if (_cpp_handle_directive (pfile, result->flags & PREV_WHITE))
                {
                  if (pfile->directive_result.type == CPP_PADDING)
                    continue;
                  result = &pfile->directive_result;
                }
but destringize_and_run does not:
  if (pfile->directive_result.type == CPP_PRAGMA)
    {
...
    }
  else
    {
      count = 1;
      toks = XNEW (cpp_token);
      toks[0] = pfile->directive_result;
and from there it will copy type member of CPP_PADDING, but all the
other members from the last CPP_PRAGMA before it.
Small testcase for it with no option (at least no -fopenmp or -fopenmp-simd).
#pragma GCC push_options
#pragma GCC ignored "-Wformat"
#pragma GCC pop_options
void
foo ()
{
  _Pragma ("omp simd")
  for (int i = 0; i < 64; i++)
    ;
}

Here is a patch that replaces those
      toks = XNEW (cpp_token);
      toks[0] = pfile->directive_result;
lines with
      toks = &pfile->avoid_paste;

2022-02-01  Jakub Jelinek  <jakub@redhat.com>

* directives.c (destringize_and_run): Push &pfile->avoid_paste
instead of a copy of pfile->directive_result for the CPP_PADDING
case.

(cherry picked from commit efc46b550f035281e51c340f73fbc9a79655e852)

store-merging: Fix up a -fcompare-debug bug in get_status_for_store_merging [PR104263]

As mentioned in the PRthe following testcase fails, because the last
stmt of a bb with -g is a debug stmt and get_status_for_store_merging
uses gimple_seq_last_stmt (bb_seq (bb)) when testing if it is valid
for store merging. The debug stmt isn't valid, while a stmt at that
position with -g0 is valid and so the divergence.

As we walk the whole bb already, this patch just remembers the last
non-debug stmt, so that we don't need to skip backwards debug stmts at the
end of the bb to find last real stmt.

2022-01-28 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/104263
* gimple-ssa-store-merging.c (get_status_for_store_merging): For
cfun->can_throw_non_call_exceptions && cfun->eh test whether
last non-debug stmt in the bb is store_valid_for_store_merging_p
rather than last stmt.

* gcc.dg/pr104263.c: New test.

(cherry picked from commit a591c71b41e18e4ff86852a974592af4962aef57)

optabs: Don't create pseudos in prepare_cmp_insn when not allowed [PR102478]

cond traps can be created during ce3 after reload (and e.g. PR103028
recently fixed some ce3 cond trap related bug, so I think often that
works fine and we shouldn't disable cond traps after RA altogether),
but it calls prepare_cmp_insn.  This function can fail, so I don't
see why we couldn't make it work after RA (in most cases it already
just works).  The first hunk is just an optimization which doesn't
make sense after RA, so I've guarded it with can_create_pseudo_p.
The second hunk is just a theoretical case, I don't have a testcase for it.
prepare_cmp_insn has some other spots that can create pseudos, like when
both operands have VOIDmode, or when it is BLKmode comparison, or
not OPTAB_DIRECT, but I think none of that applies to ce3, we punt on
BLKmode earlier, use OPTAB_DIRECT and shouldn't be comparing two
VOIDmode CONST_INTs.

2022-01-21  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/102478
* optabs.c (prepare_cmp_insn): If !can_create_pseudo_p (), don't
force_reg constants and for -fnon-call-exceptions fail if copy_to_reg
would be needed.

* gcc.dg/pr102478.c: New test.

(cherry picked from commit c2d9159717b474f9c06dde4d32b48b87164deb50)

match.pd, optabs: Avoid vectorization of {FLOOR,CEIL,ROUND}_{DIV,MOD}_EXPR [PR102860]

power10 has modv4si3 expander and so vectorizes the following testcase
where Fortran modulo is FLOOR_MOD_EXPR.
optabs_for_tree_code indicates that the optab for all the *_MOD_EXPR
variants is umod_optab or smod_optab, but that isn't true, that optab
actually expands just TRUNC_MOD_EXPR.  For the other tree codes expmed.cc
has code how to adjust the TRUNC_MOD_EXPR into those by emitting some
extra comparisons and conditional updates.  Similarly for *_DIV_EXPR,
except in that case it actually needs both division and modulo.

While it would be possible to handle it in expmed.cc for vectors as well,
we'd need to be sure all the vector operations we need for that are
available, and furthermore we wouldn't account for that in the costing.

So, IMHO it is better to stop pretending those non-truncating (and
non-exact) div/mod operations have an optab.  For GCC 13, we should
IMHO pattern match these in tree-vect-patterns.cc and transform them
to truncating div/mod with follow-up adjustments and let the vectorizer
vectorize that.  As written in the PR, for signed operands:
r = x %[fl] y;
is
r = x % y; if (r && (x ^ y) < 0) r += y;
and
d = x /[fl] y;
is
r = x % y; d = x / y; if (r && (x ^ y) < 0) --d;
and
r = x %[cl] y;
is
r = x % y; if (r && (x ^ y) >= 0) r -= y;
and
d = /[cl] y;
is
r = x % y; d = x / y; if (r && (x ^ y) >= 0) ++d;
(too lazy to figure out rounding div/mod now).  I'll create a PR
for that.
The patch also extends a match.pd optimization that floor_mod on
unsigned operands is actually trunc_mod.

2022-01-19  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/102860
* match.pd (x %[fl] y -> x % y): New simplification for
unsigned integral types.
* optabs-tree.c (optab_for_tree_code): Return unknown_optab
for {CEIL,FLOOR,ROUND}_{DIV,MOD}_EXPR with VECTOR_TYPE.

* gfortran.dg/pr102860.f90: New test.

(cherry picked from commit ffc7f200adbdf47f14b3594d9b21855c19cf797a)

c++: Fix handling of temporaries with consteval ctors and non-trivial dtors [PR104055]

The following testcase is miscompiled.  We see the constructor is immediate,
in build_over_call we trigger:
          if (obj_arg && is_dummy_object (obj_arg))
            {
              call = build_cplus_new (DECL_CONTEXT (fndecl), call, complain);
              obj_arg = NULL_TREE;
            }
which makes call a TARGET_EXPR with the dtor in TARGET_EXPR_CLEANUP,
but then call cxx_constant_value on it.  In cxx_eval_outermost_constant_expr
it triggers the:
      else if (TREE_CODE (t) != CONSTRUCTOR)
        {
          r = get_target_expr_sfinae (r, tf_warning_or_error | tf_no_cleanup);
          TREE_CONSTANT (r) = true;
        }
which wraps the CONSTRUCTOR r into a new TARGET_EXPR, but one without
dtors (I think we need e.g. the TREE_CONSTANT for the callers),
and finally build_over_call uses that.

The following patch fixes that by using get_target_expr instead
of get_target_expr_sfinae + TREE_CONSTANT (r) = true if t is
a TARGET_EXPR with non-NULL TARGET_EXPR_CLEANUP.

2022-01-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/104055
* constexpr.c (cxx_eval_outermost_constant_expr): If t is a
TARGET_EXPR with TARGET_EXPR_CLEANUP, use get_target_expr rather
than get_target_expr_sfinae with tf_no_cleanup, and don't set
TREE_CONSTANT.

* g++.dg/cpp2a/consteval27.C: New test.

(cherry picked from commit 1a5145f1e3adf8b2ba4ad416a5ddef59a1e34d48)

c++: Silence -Wuseless-cast warnings during move [PR103480]

This is maybe just a shot in the dark, but IMHO we shouldn't be diagnosing
-Wuseless-cast on casts the compiler adds on its own when calling its move
function.  We don't seem to warn when user calls std::move either.
We call move on elinit (*NON_LVALUE_EXPR <(struct C[2] &&) &D.2497->b>)[0]
so it is already an xvalue_p and try to static_cast it to struct C &&.
But we don't warn e.g. on std::move (std::move (whatever)).

Fixed by not doing the static cast and just returning expr from move
if expr is already an xvalue.

2022-01-11  Jakub Jelinek  <jakub@redhat.com>
    Jason Merrill  <jason@redhat.com>

PR c++/103480
* tree.c (move): If expr is xvalue_p, just return expr without
build_static_cast.

* g++.dg/warn/Wuseless-cast2.C: New test.

(cherry picked from commit 6bba184ccbf47368eaea27ee2c1e7b850526640b)

c-family: Fix up -W*conversion on bitwise &/|/^ [PR101537]

The following testcases emit a bogus -Wconversion warning.  This is because
conversion_warning function doesn't handle BIT_*_EXPR (only unsafe_conversion_p
that is called during the default: case, and that one doesn't handle
SAVE_EXPRs added because the unsigned char & or | operands promoted to int
have side-effects and =| or =& is used.

The patch handles BIT_IOR_EXPR/BIT_XOR_EXPR like the last 2 operands of
COND_EXPR by recursing on the two operands, if either of them doesn't fit
into the narrower type, complain.  BIT_AND_EXPR too, but first it needs to
handle some special cases that unsafe_conversion_p does, namely when one
of the two operands is a constant.

This fixes completely the pr101537.c test and for C also pr103881.c
and doesn't regress anything in the testsuite, for C++ pr103881.c still
emits the bogus warnings.
This is because while the C FE emits in that case a SAVE_EXPR that
conversion_warning can handle already, C++ FE emits
TARGET_EXPR <D.whatever, ...>, something | D.whatever
etc. and conversion_warning handles COMPOUND_EXPR by "recursing" on the
rhs.  To handle that case, we'd need for TARGET_EXPR on the lhs remember
in some hash map the mapping from D.whatever to the TARGET_EXPR and when
we see D.whatever, use corresponding TARGET_EXPR initializer instead.

2022-01-11  Jakub Jelinek  <jakub@redhat.com>

PR c/101537
PR c/103881
gcc/c-family/
* c-warn.c (conversion_warning): Handle BIT_AND_EXPR, BIT_IOR_EXPR
and BIT_XOR_EXPR.
gcc/testsuite/
* c-c++-common/pr101537.c: New test.
* c-c++-common/pr103881.c: New test.

(cherry picked from commit 20e4a5e573e76f4379b353cc736215a5f10cdb84)

c++: Ensure some more that immediate functions aren't gimplified [PR103912]

Immediate functions should never be emitted into assembly, the FE doesn't
genericize them and does various things to ensure they aren't gimplified.
But the following testcase ICEs anyway due to that, because the consteval
function returns a lambda, and operator() of the lambda has
decl_function_context of the consteval function.  cgraphunit.c then
does:
              /* Preserve a functions function context node.  It will
                 later be needed to output debug info.  */
              if (tree fn = decl_function_context (decl))
                {
                  cgraph_node *origin_node = cgraph_node::get_create (fn);
                  enqueue_node (origin_node);
                }
which enqueues the immediate function and then tries to gimplify it,
which results in ICE because it hasn't been genericized.

When I try similar testcase with constexpr instead of consteval and
static constinit auto instead of auto in main, what happens is that
the functions are gimplified, later ipa.c discovers they aren't reachable
and sets body_removed to true for them (and clears other flags) and we end
up with a debug info which has the foo and bar functions without
DW_AT_low_pc and other code specific attributes, just stuff from its BLOCK
structure and in there the lambda with DW_AT_low_pc etc.

The following patch attempts to emulate that behavior early, so that cgraph
doesn't try to gimplify those and pretends they were already gimplified
and found unused and optimized away.

2022-01-10  Jakub Jelinek  <jakub@redhat.com>

PR c++/103912
* semantics.c (expand_or_defer_fn): For immediate functions, set
node->body_removed to true and clear analyzed, definition and
force_output.
* decl2.c (c_parse_final_cleanups): Ignore immediate functions for
expand_or_defer_fn.

* g++.dg/cpp2a/consteval26.C: New test.

(cherry picked from commit 54fa7daefe35cacf4a933947d1802318da193c01)

ifcvt: Check for asm goto at the end of then_bb/else_bb in ifcvt [PR103908]

On the following testcase, RTL ifcvt sees then_bb
(note 7 6 8 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 8 7 9 3 (set (mem/c:SI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7fdccf5b0cf0 b>) [1 b+0 S4 A32])
        (const_int 1 [0x1])) "pr103908.c":6:7 81 {*movsi_internal}
     (nil))
(jump_insn 9 8 13 3 (parallel [
            (asm_operands/v ("# insn 1") ("") 0 []
                 []
                 [
                    (label_ref:DI 21)
                ] pr103908.c:7)
            (clobber (reg:CC 17 flags))
        ]) "pr103908.c":7:5 -1
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil))
-> 21)
and similarly else_bb (just with a different asm_operands template).
It checks that those basic blocks have a single successor and
uses last_active_insn which intentionally skips over JUMP_INSNs, sees
both basic blocks contain the same set and merges them (or if the
sets are different, attempts some other noce optimization).
But we can't assume that the jump, even when it has only a single successor,
has no side-effects.

The following patch fixes it by punting if test_bb ends with a JUMP_INSN
that isn't onlyjump_p.

2022-01-06  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/103908
* ifcvt.c (bb_valid_for_noce_process_p): Punt on bbs ending with
asm goto.

* gcc.target/i386/pr103908.c: New test.

(cherry picked from commit 80ad67e2af0620d58d57d0406dc22693cf5b8ca9)

libcpp: Fix up ##__VA_OPT__ handling [PR89971]

In the following testcase we incorrectly error about pasting / token
with padding token (which is a result of __VA_OPT__); instead we should
like e.g. for ##arg where arg is empty macro argument clear PASTE_LEFT
flag of the previous token if __VA_OPT__ doesn't add any real tokens
(which can happen either because the macro doesn't have any tokens
passed to ... (i.e. __VA_ARGS__ expands to empty) or when __VA_OPT__
doesn't have any tokens in between ()s).

2021-12-30 Jakub Jelinek <jakub@redhat.com>

PR preprocessor/89971
libcpp/
* macro.c (replace_args): For ##__VA_OPT__, if __VA_OPT__ expands
to no tokens at all, drop PASTE_LEFT flag from the previous token.
gcc/testsuite/
* c-c++-common/cpp/va-opt-9.c: New test.

(cherry picked from commit 5545d1edcbdb1701443f94dde7ec97c5ce3e1a6c)

shrink-wrapping: Fix up prologue block discovery [PR103860]

The following testcase is miscompiled, because a prologue which
contains subq $8, %rsp instruction is emitted at the start of
a basic block which contains conditional jump that depends on
flags register set in an earlier basic block, the prologue instruction
then clobbers those flags.
Normally this case is checked by can_get_prologue predicate, but this
is done only at the start of the loop. If we update pro later in the
loop (because some bb shouldn't be duplicated) and then don't push
anything further into vec and the vec is already empty (this can happen
when the new pro is already in bb_with bitmask and either has no successors
(that is the case in the testcase where that bb ends with a trap) or
all the successors are already in bb_with, then the loop doesn't iterate
further and can_get_prologue will not be checked.

The following simple patch makes sure we call can_get_prologue even after
the last former iteration when vec is already empty and only break from
the loop afterwards (and only if the updating of pro done because of
!can_get_prologue didn't push anything into vec again).

2021-12-30 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/103860
* shrink-wrap.c (try_shrink_wrapping): Make sure can_get_prologue is
called on pro even if nothing further is pushed into vec.

* gcc.dg/pr103860.c: New test.

(cherry picked from commit 1820137ba624d7eb2004a10f9632498b6bc1696a)

loop-invariant: Fix -fcompare-debug failure [PR103837]

In the following testcase we have a -fcompare-debug failure, because
can_move_invariant_reg doesn't ignore DEBUG_INSNs in its decisions.
In the testcase we have due to uninitialized variable:
  loop_header
    debug_insn using pseudo84
    pseudo84 = invariant
    insn using pseudo84
  end loop
and with -g decide not to move the pseudo84 = invariant before the
loop header; in this case not resetting the debug insns might be fine.
But, we could have also:
  pseudo84 = whatever
  loop_header
    debug_insn using pseudo84
    pseudo84 = invariant
    insn using pseudo84
  end loop
and in that case not resetting the debug insns would result in wrong-debug.
And, we don't really have generally a good substitution on what pseudo84
contains, it could inherit various values from different paths.
So, the following patch ignores DEBUG_INSNs in the decisions, and if there
are any that previously prevented the optimization, resets them before
return true.

2021-12-28  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/103837
* loop-invariant.c (can_move_invariant_reg): Ignore DEBUG_INSNs in
the decisions whether to return false or continue and right before
returning true reset those debug insns that previously caused
returning false.

* gcc.dg/pr103837.c: New test.

(cherry picked from commit 3c5fd3616f73fbcd241cc3a5e09275c2b0c49bd4)

c: Fix ICE on deferred pragma in unknown attribute arguments [PR103587]

We ICE on the following testcase, because c_parser_balanced_token_sequence
when encountering a deferred pragma will just use c_parser_consume_token
which the FE doesn't allow for CPP_PRAGMA tokens (and if that wasn't
the case, it could ICE on CPP_PRAGMA_EOL similarly).
We don't know in what exact context the pragma appears when we don't
know what those arguments semantically mean, so I think we should just
skip over them, like e.g. the C++ FE does. And, I think (/[/{ vs. )/]/}
from outside of the pragma shouldn't be paired with those inside of
the pragma and it doesn't seem to be necessary to check that inside of
the pragma line itself all the paren kinds are balanced.

2021-12-14 Jakub Jelinek <jakub@redhat.com>

PR c/103587
* c-parser.c (c_parser_balanced_token_sequence): For CPP_PRAGMA,
consume the pragma and silently skip to the pragma eol.

* gcc.dg/pr103587.c: New test.

(cherry picked from commit e163dbbc4433e598cad7e6011b255d1d6ad93a3b)

bswap: Fix UB in find_bswap_or_nop_finalize [PR103435]

On gcc.c-torture/execute/pr103376.c in the following code we trigger UB
in the compiler.  n->range is 8 because it is 64-bit load and rsize is 0
because it is a bswap sequence with load and known to be 0:
  /* Find real size of result (highest non-zero byte).  */
  if (n->base_addr)
    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
  else
    rsize = n->range;
The shifts then shift uint64_t by 64 bits.  For this case mask is 0
and we want both *cmpxchg and *cmpnop as 0, the operation can be done as
both nop and bswap and callers will prefer nop.

2021-11-27  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/103435
* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Avoid UB if
n->range - rsize == 8, just clear both *cmpnop and *cmpxchg in that
case.

(cherry picked from commit 567d5f3d62fba2a23a9e975f7e7c7b61bb67cf24)

openmp: Fix up handling of kind(host) and kind(nohost) in ACCEL_COMPILERs [PR103384]

As the testcase shows, we weren't handling kind(host) and kind(nohost) properly
in the ACCEL_COMPILERs, the code written in there is valid for the host
compiler only, where if we are maybe offloaded, we defer resolution after IPA,
otherwise return 0 for kind(nohost) and accept it for kind(host).  Note,
omp_maybe_offloaded is false after IPA.  If ACCEL_COMPILER is defined, it is
the other way around, but also we know we are after IPA.

2021-11-24  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/103384
gcc/
* omp-general.c (omp_context_selector_matches): For ACCEL_COMPILER,
return 0 for kind(host) and continue for kind(nohost).
libgomp/
* testsuite/libgomp.c/declare-variant-2.c: New test.

(cherry picked from commit 5bca26742cf3357bf4e20ec97eee4c7f7de17ce0)

openmp: Fix up handling of reduction clauses on the loop construct [PR102431]

We were using unshare_expr and walk_tree_without_duplicate replacement
of the placeholder vars.  The OMP_CLAUSE_REDUCTION_{INIT,MERGE} can contain
other trees that need to be duplicated though, e.g. BLOCKs referenced in
BIND_EXPR(s), or local VAR_DECLs.  This patch uses the inliner code to copy
all of that.  There is a slight complication that those local VAR_DECLs or
placeholders don't have DECL_CONTEXT set, they will get that only when
they are gimplified later on, so this patch sets DECL_CONTEXT for those
temporarily and resets it afterwards.

2021-11-23  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/102431
* gimplify.c (replace_reduction_placeholders): Remove.
(note_no_context_vars): New function.
(gimplify_omp_loop): For OMP_PARALLEL's BIND_EXPR create a new
BLOCK.  Use copy_tree_body_r with walk_tree instead of unshare_expr
and replace_reduction_placeholders for duplication of
OMP_CLAUSE_REDUCTION_{INIT,MERGE} expressions.  Ensure all mentioned
automatic vars have DECL_CONTEXT set to non-NULL before doing so
and reset it afterwards for those vars and their corresponding
vars.

* c-c++-common/gomp/pr102431.c: New test.
* g++.dg/gomp/pr102431.C: New test.

(cherry picked from commit 5e9b973bd60185f221222022f56db7df3d92250e)

fortran, debug: Fix up DW_AT_rank [PR103315]

For DW_AT_rank we were emitting
        .uleb128 0x4    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x1c
        .byte   0x6     # DW_OP_deref
on 64-bit and
        .uleb128 0x4    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x10
        .byte   0x6     # DW_OP_deref
on 32-bit.  I think this is wrong, as dtype.rank field in the descriptor
has unsigned char type, not pointer type nor pointer sized integral.
E.g. if we have a
    REAL :: a(..)
dummy argument, which is passed as a reference to the function descriptor,
we want to evaluate a->dtype.rank.  The above DWARF expressions perform
*(uintptr_t *)(a + 0x1c)
and
*(uintptr_t *)(a + 0x10)
respectively.  The following patch changes those to:
        .uleb128 0x5    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x1c
        .byte   0x94    # DW_OP_deref_size
        .byte   0x1
and
        .uleb128 0x5    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x10
        .byte   0x94    # DW_OP_deref_size
        .byte   0x1
which perform
*(unsigned char *)(a + 0x1c)
and
*(unsigned char *)(a + 0x10)
respectively.

2021-11-21  Jakub Jelinek  <jakub@redhat.com>

PR debug/103315
* trans-types.c (gfc_get_array_descr_info): Use DW_OP_deref_size 1
instead of DW_OP_deref for DW_AT_rank.

(cherry picked from commit da17c304e22ba256eba0b03710aa329115163b08)

c++: Fix up -fstrong-eval-order handling of call arguments [PR70796]

For -fstrong-eval-order (default for C++17 and later) we make sure to
gimplify arguments in the right order, but as the following testcase
shows that is not enough.
The problem is that some lvalues can satisfy the is_gimple_val / fb_rvalue
predicate used by gimplify_arg for is_gimple_reg_type typed expressions,
or is_gimple_lvalue / fb_either used for other types.
E.g. in foo we have:
  C::C (&p,  ++i,  ++i)
before gimplification where i is an automatic int variable and without this
patch gimplify that as:
  i = i + 1;
  i = i + 1;
  C::C (&p, i, i);
which means that the ctor is called with the original i value incremented
by 2 in both arguments, while because the call is CALL_EXPR_ORDERED_ARGS
the first argument should be different.  Similarly in qux we have:
  B::B (&p, TARGET_EXPR <D.2274, *(const struct A &) A::operator++ (&i)>,
        TARGET_EXPR <D.2275, *(const struct A &) A::operator++ (&i)>)
and gimplify it as:
      _1 = A::operator++ (&i);
      _2 = A::operator++ (&i);
      B::B (&p, MEM[(const struct A &)_1], MEM[(const struct A &)_2]);
but because A::operator++ returns the passed in argument, again we have
the same value in both cases due to gimplify_arg doing:
      /* Also strip a TARGET_EXPR that would force an extra copy.  */
      if (TREE_CODE (*arg_p) == TARGET_EXPR)
        {
          tree init = TARGET_EXPR_INITIAL (*arg_p);
          if (init
              && !VOID_TYPE_P (TREE_TYPE (init)))
            *arg_p = init;
        }
which is perfectly fine optimization for calls with unordered arguments,
but breaks the ordered ones.
Lastly, in corge, we have before gimplification:
  D::foo (NON_LVALUE_EXPR <p>, 3,  ++p)
and gimplify it as
  p = p + 4;
  D::foo (p, 3, p);
which is again wrong, because the this argument isn't before the
side-effects but after it.
The following patch adds cp_gimplify_arg wrapper, which if ordered
and is_gimple_reg_type forces non-SSA_NAME is_gimple_variable
result into a temporary, and if ordered, not is_gimple_reg_type
and argument is TARGET_EXPR bypasses the gimplify_arg optimization.
So, in foo with this patch we gimplify it as:
  i = i + 1;
  i.0_1 = i;
  i = i + 1;
  C::C (&p, i.0_1, i);
in qux as:
      _1 = A::operator++ (&i);
      D.2312 = MEM[(const struct A &)_1];
      _2 = A::operator++ (&i);
      B::B (&p, D.2312, MEM[(const struct A &)_2]);
where D.2312 is a temporary and in corge as:
  p.9_1 = p;
  p = p + 4;
  D::foo (p.9_1, 3, p);
The is_gimple_reg_type forcing into a temporary should be really cheap
(I think even at -O0 it should be optimized if there is no modification in
between), the aggregate copies might be more expensive but I think e.g. SRA
or FRE should be able to deal with those if there are no intervening
changes.  But still, the patch tries to avoid those when it is cheaply
provable that nothing bad happens (if no argument following it in the
strong evaluation order doesn't have TREE_SIDE_EFFECTS, then even VAR_DECLs
etc. shouldn't be modified after it).  There is also an optimization to
avoid doing that for this or for arguments with reference types as nothing
can modify the parameter values during evaluation of other argument's
side-effects.

I've tried if e.g.
  int i = 1;
  return i << ++i;
doesn't suffer from this problem as well, but it doesn't, the FE uses
  SAVE_EXPR <i>, SAVE_EXPR <i> << ++i;
in that case which gimplifies the way we want (temporary in the first
operand).

2021-11-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/70796
* cp-gimplify.c (cp_gimplify_arg): New function.
(cp_gimplify_expr): Use cp_gimplify_arg instead of gimplify_arg,
pass true as last argument to it if there are any following
arguments in strong evaluation order with side-effects.

* g++.dg/cpp1z/eval-order11.C: New test.

(cherry picked from commit a84177aff7ca86f501d6aa5ef407fac5e71f56fb)

lim: Reset flow sensitive info even for pointers [PR103192]

Since 2014 is lim clearing SSA_NAME_RANGE_INFO for integral SSA_NAMEs
if moving them from conditional contexts inside of a loop into unconditional
before the loop, but as the miscompilation of gimplify.c shows, we need to
treat pointers the same, even for them we need to reset whether the pointer
can/can't be null or the recorded pointer alignment.

This fixes
-FAIL: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c (internal compiler error)
-FAIL: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c (test for excess errors)
-UNRESOLVED: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c compilation failed to produce executable
-FAIL: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c (internal compiler error)
-FAIL: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c (test for excess errors)
-UNRESOLVED: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c compilation failed to produce executable
-FAIL: libgomp.c++/target-in-reduction-2.C (internal compiler error)
-FAIL: libgomp.c++/target-in-reduction-2.C (test for excess errors)
-UNRESOLVED: libgomp.c++/target-in-reduction-2.C compilation failed to produce executable
on both x86_64 and i686.

2021-11-17 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/103192
* tree-ssa-loop-im.c (move_computations_worker): Use
reset_flow_sensitive_info instead of manually clearing
SSA_NAME_RANGE_INFO and do it for all SSA_NAMEs, not just ones
with integral types.

(cherry picked from commit 077425c890927eefacb765ab5236060de9859e82)

i386: Fix up x86 atomic_bit_test* expanders for !TARGET_HIMODE_MATH [PR103205]

With !TARGET_HIMODE_MATH, the OPTAB_DIRECT expand_simple_binop fail and so
we ICE. We don't really care if they are done promoted in SImode instead.

2021-11-15 Jakub Jelinek <jakub@redhat.com>

PR target/103205
* config/i386/sync.md (atomic_bit_test_and_set<mode>,
atomic_bit_test_and_complement<mode>,
atomic_bit_test_and_reset<mode>): Use OPTAB_WIDEN instead of
OPTAB_DIRECT.

* gcc.target/i386/pr103205.c: New test.

(cherry picked from commit 625eef42e32e65b3da0e65e23a706d228896d01c)

dwarf2out: Fix up field_byte_offset [PR101378]

For PCC_BITFIELD_TYPE_MATTERS field_byte_offset has quite large code
to deal with it since many years ago (see it e.g. in GCC 3.2, although it
used to be on HOST_WIDE_INTs, then on double_ints, now on offset_ints).
But that code apparently isn't able to cope with members with empty class
types with [[no_unique_address]] attribute, because the empty classes have
non-zero type size but zero decl size and so one can end up from the
computation with negative offset or offset 1 byte smaller than it should be.
For !PCC_BITFIELD_TYPE_MATTERS, we just use
    tree_result = byte_position (decl);
which seems exactly right even for the empty classes or anything which is
not a bitfield (and for which we don't add DW_AT_bit_offset attribute).
So, instead of trying to handle those no_unique_address members in the
current already very complicated code, this limits it to bitfields.

stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only
bitfields, twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE.

As discussed, this patch uses DECL_BIT_FIELD_TYPE check, because
DECL_BIT_FIELD might be cleared for some bitfields with bitsizes
multiple of BITS_PER_UNIT and e.g.
struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s;
struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t;

int
main ()
{
  s.c = 0x55;
  s.d = 0xaaaa;
  t.c = 0x55;
  t.d = 0xaaaa;
  s.e++;
}
has different debug info with DECL_BIT_FIELD check.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

PR debug/101378
* dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS
handling only for DECL_BIT_FIELD_TYPE decls.

* g++.dg/debug/dwarf2/pr101378.C: New test.

(cherry picked from commit 10db7573014008ff867098206f51012d501ab57b)

openmp: For default(none) ignore variables created by ubsan_create_data [PR64888]

We weren't ignoring the ubsan variables created by c-ubsan.c before gimplification
(others are added later). One way to fix this would be to introduce further
UBSAN_ internal functions and lower it later (sanopt pass) like other ifns,
this patch instead recognizes those magic vars by name/name of type and DECL_ARTIFICIAL
and TYPE_ARTIFICIAL.

2021-10-21 Jakub Jelinek <jakub@redhat.com>

PR middle-end/64888
gcc/c-family/
* c-omp.c (c_omp_predefined_variable): Return true also for
ubsan_create_data created artificial variables.
gcc/testsuite/
* c-c++-common/ubsan/pr64888.c: New test.

(cherry picked from commit 40dd9d839e52f679d8eabc1c5ca0ca17a5ccfd14)

c++: Don't reject calls through PMF during constant evaluation [PR102786]

The following testcase incorrectly rejects the c initializer,
while in the s.*a case cxx_eval_* sees .__pfn reads etc.,
in the s.*&S::foo case get_member_function_from_ptrfunc creates
expressions which use INTEGER_CSTs with type of pointer to METHOD_TYPE.
And cxx_eval_constant_expression rejects any INTEGER_CSTs with pointer
type if they aren't 0.
Either we'd need to make sure we defer such folding till cp_fold but the
function and pfn_from_ptrmemfunc is used from lots of places, or
the following patch just tries to reject only non-zero INTEGER_CSTs
with pointer types if they don't point to METHOD_TYPE in the hope that
all such INTEGER_CSTs with POINTER_TYPE to METHOD_TYPE are result of
folding valid pointer-to-member function expressions.
I don't immediately see how one could create such INTEGER_CSTs otherwise,
cast of integers to PMF is rejected and would have the PMF RECORD_TYPE
anyway, etc.

2021-10-19 Jakub Jelinek <jakub@redhat.com>

PR c++/102786
* constexpr.c (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.

* g++.dg/cpp2a/constexpr-virtual19.C: New test.

(cherry picked from commit f45610a45236e97616726ca042898d6ac46a082e)

openmp: Fix up handling of OMP_PLACES=threads(1)

When writing the places-*.c tests, I've noticed that we mishandle threads
abstract name with specified num-places if num-places isn't a multiple of
number of hw threads in a core. It then happily ignores the maximum count
and overwrites for the remaining hw threads in a core further places that
haven't been allocated.

2021-10-15 Jakub Jelinek <jakub@redhat.com>

* config/linux/affinity.c (gomp_affinity_init_level_1): For level 1
after creating count places clean up and return immediately.
* testsuite/libgomp.c/places-6.c: New test.
* testsuite/libgomp.c/places-7.c: New test.
* testsuite/libgomp.c/places-8.c: New test.

(cherry picked from commit 4764049dd620affcd3e2658dc7f03a6616370a29)

var-tracking: Fix a wrong-debug issue caused by my r10-7665 var-tracking change [PR102441]

Since my r10-7665-g33c45e51b4914008064d9b77f2c1fc0eea1ad060 change, we get
wrong-debug on e.g. the following testcase at -O2 -g on x86_64-linux for the
x parameter:
void bar (int *r);
int
foo (int x)
{
  int r = 0;
  bar (&r);
  return r;
}
At the start of function, we have
        subq    $24, %rsp
        leaq    12(%rsp), %rdi
instructions.  The x parameter is passed in %rdi, but isn't used in the
function and so the leaq instruction overwrites %rdi without remembering
%rdi anywhere.  Before the r10-7665 change (which was trying to fix a large
(3% for 32-bit, 1% for 64-bit x86-64) debug info/loc growth introduced with
r10-7515), the leaq insn above resulted in a MO_VAL_SET micro-operation that
said that the value of sp + 12, a cselib_sp_derived_value_p, is stored into
the %rdi register.  The r10-7665 change added a change to add_stores that
added no micro-operation for the leaq store, with the rationale that the sp
based values can be and will be always computable some other more compact
and primarily more stable way (cfa based expression like DW_OP_fbreg, that
is the same in the whole function).  That is true.  But by throwing the
micro-operation on the floor, we miss another important part of the
MO_VAL_SET, in particular that the destination of the store, %rdi in this
case, now has a different value from what it had before, so the vt_*
dataflow code thinks that even after the leaq instruction %rdi still holds
the x argument value (and changes it to DW_OP_entry_value (%rdi) only in the
middle of the call to bar).  Previously and with the patches below,
the location for x changes already at the end of leaq instruction to
DW_OP_entry_value (%rdi).

My first attempt to fix this was instead of dropping the MO_VAL_SET add
a MO_CLOBBER operation:
--- gcc/var-tracking.c.jj       2021-05-04 21:02:24.196799586 +0200
+++ gcc/var-tracking.c  2021-09-24 19:23:16.420154828 +0200
@@ -6133,7 +6133,9 @@ add_stores (rtx loc, const_rtx expr, voi
     {
       if (preserve)
        preserve_value (v);
-      return;
+      mo.type = MO_CLOBBER;
+      mo.u.loc = loc;
+      goto log_and_return;
     }

   nloc = replace_expr_with_values (oloc);
so don't track that the value lives in the loc destination, but track
that the previous value doesn't live there anymore.  That failed bootstrap
miserably, the vt_* code isn't prepared to see MO_CLOBBER of a MEM that
isn't tracked (e.g. has MEM_EXPR on it that the var-tracking code wants
to track, i.e. track_p in add_stores).  On the other side, thinking about
it more, in the most common case where a cselib_sp_derived_value_p value
is stored into the sp register (and which is the reason why PR94495
testcase got larger), dropping the micro-operation on the floor is the
right thing, because we have that cselib_sp_derived_value_p tracking, any
reads from the sp hard register will be treated as
cselib_sp_derived_value_p.
Then I've tried 3 different patches described below and in the end
what is committed is patch2.
Additionally, I've gathered statistics from cc1plus by always reverting the
var-tracking.c change after finished bootstrap/regtest and rebuilding the
stage3 var-tracking.o and cc1plus, such that it would be comparable.
dwlocstat and .debug_{info,loclists} section sizes detailed below.
patch3 uses MO_VAL_SET (i.e. essentially reversion of the r10-7665
change) when destination is not a REG_P and !track_p, otherwise if
destination is sp drops the micro-operation on the floor (i.e. no change),
otherwise adds a MO_CLOBBER.
patch1 is similar, except it checks for destination not equal to sp and
!track_p, i.e. for !track_p REG_P destinations other than sp it will use
MO_VAL_SET rather than MO_CLOBBER.
Finally, patch2, the shortest patch, uses MO_VAL_SET whenever destination
is not sp and otherwise drops the micro-operation on the floor.
All the 3 patches don't affect the PR94495 testcase, all the changes
there were caused by stores of sp based values into %rsp.

While the patch2 (and patch1 which results in exactly the same sizes)
causes the largest debug loclists/info growth from the 3, it is still quite
minor (0.651% on 64-bit and 0.114% on 32-bit) compared
to the 1% and 3% PR94495 was trying to solve, and I actually think it is the
best thing to do.  Because, if we have say
  int q[10];
  int *p = &q[0];
or similar and we load the &q[0] sp based value into some hard register,
by noting in the debug info that p lives in some hard reg for some part
of the function and a user is trying to change the p var in the debugger,
if we say it lives in some register or memory, there is some chance that
the changing of the value could work successfully (of course, nothing
is guaranteed, we don't have tracking of where each var lives at which
moment for changing purposes (i.e. what register, memory or else you need
to change in order to change behavior of the code)), while if we just say
that p's location is DW_OP_fbreg 16 DW_OP_stack_value, that is a read-only
value one can just print but not change.  Now, for stores of variable
values into the sp register, I don't think we have such an issue, you don't
want debugger to change your stack pointer when user asks to change value
of some variable whose value lives in the stack pointer, that would pretty
much always result in misbehavior of the program.
So, my preference from these 3 is patch2 and that is being committed.

64-bit cc1plus
==============
vanilla
cov%    samples cumul
0..10   1064665/37%     1064665/37%
11..20  35972/1%        1100637/38%
21..30  47969/1%        1148606/40%
31..40  45787/1%        1194393/42%
41..50  57529/2%        1251922/44%
51..60  53974/1%        1305896/46%
61..70  112055/3%       1417951/50%
71..80  79420/2%        1497371/52%
81..90  126225/4%       1623596/57%
91..100 1206682/42%     2830278/100%
  [34] .debug_info       PROGBITS        0000000000000000 2f1c74c a44949f 00      0   0  1
  [38] .debug_loclists   PROGBITS        0000000000000000 ff5d046 506e947 00      0   0  1
patch1 (same as patch2)
cov%    samples cumul
0..10   1064685/37%     1064685/37%
11..20  36011/1%        1100696/38%
21..30  47975/1%        1148671/40%
31..40  45799/1%        1194470/42%
41..50  57566/2%        1252036/44%
51..60  54011/1%        1306047/46%
61..70  112068/3%       1418115/50%
71..80  79421/2%        1497536/52%
81..90  126171/4%       1623707/57%
91..100 1206571/42%     2830278/100%
  [34] .debug_info       PROGBITS        0000000000000000 2f1c74c a448f27 00      0   0  1
  [38] .debug_loclists   PROGBITS        0000000000000000 ff608bc 52070dd 00      0   0  1
patch3
cov%    samples cumul
0..10   1064698/37%     1064698/37%
11..20  36018/1%        1100716/38%
21..30  47977/1%        1148693/40%
31..40  45804/1%        1194497/42%
41..50  57562/2%        1252059/44%
51..60  54018/1%        1306077/46%
61..70  112071/3%       1418148/50%
71..80  79424/2%        1497572/52%
81..90  126172/4%       1623744/57%
91..100 1206534/42%     2830278/100%
  [34] .debug_info       PROGBITS        0000000000000000 2f1c74c a449548 00      0   0  1
  [38] .debug_loclists   PROGBITS        0000000000000000 ff5df39 507acd8 00      0   0  1
So, size of .debug_info+.debug_loclists grows for vanilla -> patch1 (or patch2) by
0.651% and for vanilla -> patch3 by 0.020%.

32-bit cc1plus
==============
vanilla
cov%    samples cumul
0..10   1061892/37%     1061892/37%
11..20  34002/1%        1095894/39%
21..30  43513/1%        1139407/40%
31..40  41667/1%        1181074/42%
41..50  59144/2%        1240218/44%
51..60  47009/1%        1287227/45%
61..70  105069/3%       1392296/49%
71..80  72990/2%        1465286/52%
81..90  125988/4%       1591274/56%
91..100 1208726/43%     2800000/100%
  [33] .debug_info       PROGBITS        00000000 351ab10 8b1c83d 00      0   0  1
  [37] .debug_loclists   PROGBITS        00000000 ebc816e 3fe44fd 00      0   0  1
patch1 (same as patch2)
cov%    samples cumul
0..10   1061999/37%     1061999/37%
11..20  34065/1%        1096064/39%
21..30  43557/1%        1139621/40%
31..40  41690/1%        1181311/42%
41..50  59191/2%        1240502/44%
51..60  47143/1%        1287645/45%
61..70  105045/3%       1392690/49%
71..80  73021/2%        1465711/52%
81..90  125885/4%       1591596/56%
91..100 1208404/43%     2800000/100%
  [33] .debug_info       PROGBITS        00000000 351ab10 8b1c597 00      0   0  1
  [37] .debug_loclists   PROGBITS        00000000 ebca915 401ffad 00      0   0  1
patch3
cov%    samples cumul
0..10   1062006/37%     1062006/37%
11..20  34073/1%        1096079/39%
21..30  43559/1%        1139638/40%
31..40  41693/1%        1181331/42%
41..50  59189/2%        1240520/44%
51..60  47142/1%        1287662/45%
61..70  105054/3%       1392716/49%
71..80  73027/2%        1465743/52%
81..90  125874/4%       1591617/56%
91..100 1208383/43%     2800000/100%
  [33] .debug_info       PROGBITS        00000000 351ab10 8b1c690 00      0   0  1
  [37] .debug_loclists   PROGBITS        00000000 ebca40a 4020a6e 00      0   0  1
So, size of .debug_info+.debug_loclists grows for vanilla -> patch1 (or patch2) by
0.114% and for vanilla -> patch3 by 0.116%.

2021-10-10  Jakub Jelinek  <jakub@redhat.com>

PR debug/102441
* var-tracking.c (add_stores): For cselib_sp_derived_value_p values
use MO_VAL_SET if loc is not sp.

(cherry picked from commit 9583b26f3701ea0456405d84f9a898451a2f7452)

c++: Fix apply_identity_attributes [PR102548]

The following testcase ICEs on x86_64-linux with -m32 due to a bug in
apply_identity_attributes.  The function is being smart and attempts not
to duplicate the chain unnecessarily, if either there are no attributes
that affect type identity or there is possibly empty set of attributes
that do not affect type identity in the chain followed by attributes
that do affect type identity, it reuses that attribute chain.

The function mishandles the cases where in the chain an attribute affects
type identity and is followed by one or more attributes that don't
affect type identity (and then perhaps some further ones that do).

There are two bugs.  One is that when we notice first attribute that
doesn't affect type identity after first attribute that does affect type
identity (with perhaps some further such attributes in the chain after it),
we want to put into the new chain just attributes starting from
(inclusive) first_ident and up to (exclusive) the current attribute a,
but the code puts into the chain all attributes starting with first_ident,
including the ones that do not affect type identity and if e.g. we have
doesn't0 affects1 doesn't2 affects3 affects4 sequence of attributes, the
resulting sequence would have
affects1 doesn't2 affects3 affects4 affects3 affects4
attributes, i.e. one attribute that shouldn't be there and two attributes
duplicated.  That is fixed by the a2 -> a2 != a change.

The second one is that we ICE once we see second attribute that doesn't
affect type identity after an attribute that affects it.  That is because
first_ident is set to error_mark_node after handling the first attribute
that doesn't affect type identity (i.e. after we've copied the
[first_ident, a) set of attributes to the new chain) to denote that from
that time on, each attribute that affects type identity should be copied
whenever it is seen (the if (as && as->affects_type_identity) code does
that correctly).  But that condition is false and first_ident is
error_mark_node, we enter else if (first_ident) and use TREE_PURPOSE
/TREE_VALUE/TREE_CHAIN on error_mark_node, which ICEs.  When
first_ident is error_mark_node and a doesn't affect type identity,
we want to do nothing.  So that is the && first_ident != error_mark_node
chunk.

2021-10-05  Jakub Jelinek  <jakub@redhat.com>

PR c++/102548
* tree.c (apply_identity_attributes): Fix handling of the
case where an attribute in the list doesn't affect type
identity but some attribute before it does.

* g++.target/i386/pr102548.C: New test.

(cherry picked from commit 737f95bab557584d876f02779ab79fe3cfaacacf)

ubsan: Use -fno{,-}sanitize=float-divide-by-zero for float division by zero recovery [PR102515]

We've been using
-f{,no-}sanitize-recover=integer-divide-by-zero to decide on the float
-fsanitize=float-divide-by-zero instrumentation _abort suffix.
This patch fixes it to use -f{,no-}sanitize-recover=float-divide-by-zero
for it instead.

2021-10-01 Jakub Jelinek <jakub@redhat.com>
Richard Biener <rguenther@suse.de>

PR sanitizer/102515
gcc/c-family/
* c-ubsan.c (ubsan_instrument_division): Check the right
flag_sanitize_recover bit, depending on which sanitization
is done.
gcc/testsuite/
* c-c++-common/ubsan/float-div-by-zero-2.c: New test.

(cherry picked from commit 9c1a633d96926357155d4702b66f8a0ec856a81f)

i386: Don't emit fldpi etc. if -frounding-math [PR102498]

i387 has instructions to store some transcedental numbers into the top of
stack.  The problem is that what exact bit in the last place one gets for
those depends on the current rounding mode, the CPU knows the number with
slightly higher precision.  The compiler assumes rounding to nearest when
comparing them against constants in the IL, but at runtime the rounding
can be different and so some of these depending on rounding mode and the
constant could be 1 ulp higher or smaller than expected.
We only support changing the rounding mode at runtime if the non-default
-frounding-mode option is used, so the following patch just disables
using those constants if that flag is on.

2021-09-28  Jakub Jelinek  <jakub@redhat.com>

PR target/102498
* config/i386/i386.c (standard_80387_constant_p): Don't recognize
special 80387 instruction XFmode constants if flag_rounding_math.

* gcc.target/i386/pr102498.c: New test.

(cherry picked from commit 3b7041e8345c2f1030e58620f28e22d64b2c196b)

c++: Fix handling of decls with flexible array members initialized with side-effects [PR88578]

> > Note, if the flexible array member is initialized only with non-constant
> > initializers, we have a worse bug that this patch doesn't solve, the
> > splitting of initializers into constant and dynamic initialization removes
> > the initializer and we don't have just wrong DECL_*SIZE, but nothing is
> > emitted when emitting those vars into assembly either and so the dynamic
> > initialization clobbers other vars that may overlap the variable.
> > I think we need keep an empty CONSTRUCTOR elt in DECL_INITIAL for the
> > flexible array member in that case.
>
> Makes sense.

So, the following patch fixes that.

The typeck2.c change makes sure we keep those CONSTRUCTORs around (although
they should be empty because all their elts had side-effects/was
non-constant if it was removed earlier), and the varasm.c change is to avoid
ICEs on those as well as ICEs on other flex array members that had some
initializers without side-effects, but not on the last array element.

The code was already asserting that the (index of the last elt in the
CONSTRUCTOR + 1) times elt size is equal to TYPE_SIZE_UNIT of the local->val
type, which is true for C flex arrays or for C++ if they don't have any
side-effects or the last elt doesn't have side-effects, this patch changes
that to assertion that the TYPE_SIZE_UNIT is greater than equal to the
offset of the end of last element in the CONSTRUCTOR and uses TYPE_SIZE_UNIT
(int_size_in_bytes) in the code later on.

2021-09-15 Jakub Jelinek <jakub@redhat.com>

PR c++/88578
PR c++/102295
gcc/
* varasm.c (output_constructor_regular_field): Instead of assertion
that array_size_for_constructor result is equal to size of
TREE_TYPE (local->val) in bytes, assert that the type size is greater
or equal to array_size_for_constructor result and use type size as
fieldsize.
gcc/cp/
* typeck2.c (split_nonconstant_init_1): Don't throw away empty
initializers of flexible array members if they have non-zero type
size.
gcc/testsuite/
* g++.dg/ext/flexary39.C: New test.
* g++.dg/ext/flexary40.C: New test.

(cherry picked from commit e5d1af8a07ae9fcc40ea5c781c3ad46d20ea12a6)

c++: Update DECL_*SIZE for objects with flexible array members with initializers [PR102295]

The C FE updates DECL_*SIZE for vars which have initializers for flexible
array members for many years, but C++ FE kept DECL_*SIZE the same as the
type size (i.e. as if there were zero elements in the flexible array
member). This results e.g. in ELF symbol sizes being too small.

Note, if the flexible array member is initialized only with non-constant
initializers, we have a worse bug that this patch doesn't solve, the
splitting of initializers into constant and dynamic initialization removes
the initializer and we don't have just wrong DECL_*SIZE, but nothing is
emitted when emitting those vars into assembly either and so the dynamic
initialization clobbers other vars that may overlap the variable.
I think we need keep an empty CONSTRUCTOR elt in DECL_INITIAL for the
flexible array member in that case.

2021-09-14 Jakub Jelinek <jakub@redhat.com>

PR c++/102295
* decl.c (layout_var_decl): For aggregates ending with a flexible
array member, add the size of the initializer for that member to
DECL_SIZE and DECL_SIZE_UNIT.

* g++.target/i386/pr102295.C: New test.

(cherry picked from commit 818c505188ff5cd8eb048eb0e614c4ef732225bd)

c++: Fix __is_*constructible/assignable for templates [PR102305]

is_xible_helper returns error_mark_node (i.e. false from the traits)
for abstract classes by testing ABSTRACT_CLASS_TYPE_P (to) early.
Unfortunately, as the testcase shows, that doesn't work on class templates
that haven't been instantiated yet, ABSTRACT_CLASS_TYPE_P for them is false
until it is instantiated, which is done when the routine later constructs
a dummy object with that type.

The following patch fixes this by calling complete_type first, so that
ABSTRACT_CLASS_TYPE_P test will work properly, while keeping the handling
of arrays with unknown bounds, or incomplete types where it is done
currently.

2021-09-14 Jakub Jelinek <jakub@redhat.com>

PR c++/102305
* method.c (is_xible_helper): Call complete_type on to.

* g++.dg/cpp0x/pr102305.C: New test.

(cherry picked from commit f008fd3a480e3718436156697ebe7eeb47841457)

i386: Fix up @xorsign<mode>3_1 [PR102224]

As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
operands are in the same register, because the splitter overwrites op1
before with op1 & mask before using op0.

For dest = xorsign op0, op0 we can actually simplify it from
dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).

The expander change is an optimization improvement, if we at expansion
time know it is xorsign op0, op0, we can emit abs right away and get better
code through that.

The @xorsign<mode>3_1 is a fix for the case where xorsign wouldn't be known
to have same operands during expansion, but during RTL optimizations they
would appear. We need to use earlyclobber, we require dest and op1 to be
the same but op0 must be different because we overwrite
op1 first.

2021-09-08 Jakub Jelinek <jakub@redhat.com>

PR target/102224
* config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
operands[2], emit abs<mode>2 instead.
(@xorsign<mode>3_1): Add early-clobber for output operand.

* gcc.dg/pr102224.c: New test.
* gcc.target/i386/avx-pr102224.c: New test.

(cherry picked from commit a7b626d98a9a821ffb33466818d6aa86cac1d6fd)

dwarf2out: Emit DW_AT_location for global register vars during early dwarf [PR101905]

The following patch emits DW_AT_location for global register variables
already during early dwarf, since usually late_global_decl hook isn't even
called for those, as nothing needs to be emitted for them.

2021-08-23 Jakub Jelinek <jakub@redhat.com>

PR debug/101905
* dwarf2out.c (gen_variable_die): Add DW_AT_location for global
register variables already during early_dwarf if possible.

* gcc.dg/guality/pr101905.c: New test.

(cherry picked from commit b284053bb75661fc1bf13c275f3ba5364bb17608)

ubsan: Fix ICEs with DECL_REGISTER tests [PR101624]

The following testcase ICEs, because the base is a CONST_DECL for
the Fortran parameter, and ubsan/sanopt uses DECL_REGISTER macro on it.
/* In VAR_DECL and PARM_DECL nodes, nonzero means declared `register'. */
#define DECL_REGISTER(NODE) (DECL_WRTL_CHECK (NODE)->decl_common.decl_flag_0)
while CONST_DECL doesn't satisfy DECL_WRTL_CHECK.

The following patch checks explicitly for VAR_DECL/PARM_DECL/RESULT_DECL
only before using DECL_REGISTER, assumes other decls aren't DECL_REGISTER.
Not really sure about RESULT_DECL but it at least satisfies DECL_WRTL_CHECK...

2021-07-28 Jakub Jelinek <jakub@redhat.com>

PR middle-end/101624
* ubsan.c (maybe_instrument_pointer_overflow,
instrument_object_size): Only test DECL_REGISTER on VAR_DECLs,
PARM_DECLs or RESULT_DECLs.
* sanopt.c (maybe_optimize_ubsan_ptr_ifn): Likewise.

* gfortran.dg/ubsan/ubsan.exp: New file.
* gfortran.dg/ubsan/pr101624.f90: New test.

(cherry picked from commit 49e28c02a95a4bee981e69a80950309869580151)

expmed: Fix store_integral_bit_field [PR101562]

Our documentation says that paradoxical subregs shouldn't appear
in strict_low_part:
'(strict_low_part (subreg:M (reg:N R) 0))'
     This expression code is used in only one context: as the
     destination operand of a 'set' expression.  In addition, the
     operand of this expression must be a non-paradoxical 'subreg'
     expression.
but on the testcase below that triggers UB at runtime
store_integral_bit_field emits exactly that.

The following patch fixes it by ensuring the requirement is satisfied.

2021-07-23  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/101562
* expmed.c (store_integral_bit_field): Only use movstrict_optab
if the operand isn't paradoxical.

* gcc.c-torture/compile/pr101562.c: New test.

(cherry picked from commit 8408d34570c9fe9f3d22a25a76df2a4c64f08477)

openmp: Fix up omp_check_private [PR101535]

The target data construct shouldn't affect omp_check_private, unless
the decl there is privatized (use_device_* clauses).  The routine
had some code for that, but it just did continue; in a loop that looped
only if the region type is one of selected 4 kinds, so effectively resulted
in return false; instead of looping again.  And not diagnosing lastprivate
(or reduction etc.) on a variable that is private to containing parallel
results in ICEs later on, as there is no original list item to which store
the last result.
The target construct is unclear as it has an implicit parallel region
and it is not obvious if the data privatization clauses on the construct
shall be treated as data privatization on the implicit parallel or just
on the target.  For now treat those as privatization on the implicit
parallel, but treat map clauses as shared on the implicit parallel.

2021-07-21  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/101535
* gimplify.c (omp_check_private): Properly skip ORT_TARGET_DATA
contexts in which decl isn't privatized and for ORT_TARGET return
false if decl is mapped.

* c-c++-common/gomp/pr101535-1.c: New test.
* c-c++-common/gomp/pr101535-2.c: New test.

(cherry picked from commit b136b7a78774107943fe94051c42b5a968a3ad3f)

c++: Ensure OpenMP reduction with reference type references complete type [PR101516]

The following testcase ICEs because we haven't verified if reduction decl
has reference type that TREE_TYPE of the reference is a complete type,
require_complete_type on the decl doesn't ensure that.

2021-07-21 Jakub Jelinek <jakub@redhat.com>

PR c++/101516
* semantics.c (finish_omp_reduction_clause): Also call
complete_type_or_else and return true if it fails.

* g++.dg/gomp/pr101516.C: New test.

(cherry picked from commit aea199f96cf116ba4c81426207acde371556610c)

rs6000: Fix up easy_vector_constant_msb handling [PR101384]

The following gcc.dg/pr101384.c testcase is miscompiled on
powerpc64le-linux.
easy_altivec_constant has code to try construct vector constants with
different element sizes, perhaps different from CONST_VECTOR's mode. But as
written, that works fine for vspltis[bhw] cases, but not for the vspltisw
x,-1; vsl[bhw] x,x,x case, because that creates always a V16QImode, V8HImode
or V4SImode constant containing broadcasted constant with just the MSB set.
The vspltis_constant function etc. expects the vspltis[bhw] instructions
where the small [-16..15] or even [-32..30] constant is sign-extended to the
remaining step bytes, but that is not the case for the 0x80...00 constants,
with step 1 we can't handle e.g.
{ 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff }
vectors but do want to handle e.g.
{ 0, 0, 0, 0x80, 0, 0, 0, 0x80, 0, 0, 0, 0x80, 0, 0, 0, 0x80 }
and similarly with copies 1 we do want to handle e.g.
{ 0x80808080, 0x80808080, 0x80808080, 0x80808080 }.

This is a simpler version of the fix for backports, which limits the EASY_VECTOR_MSB case
matching to step == 1 && copies == 1, because that is the only case the
splitter handles correctly.

2021-07-20 Jakub Jelinek <jakub@redhat.com>

PR target/101384
* config/rs6000/rs6000.c (vspltis_constant): Accept EASY_VECTOR_MSB
only if step and copies are equal to 1.

* gcc.dg/pr101384.c: New test.

(cherry picked from commit dc386b020869ad0095cf58f8c76a40ea457e7a2c)

openmp - Fix up && and || reductions [PR94366]

As the testcase shows, the special treatment of && and || reduction combiners
where we expand them as omp_out = (omp_out != 0) && (omp_in != 0) (or with ||)
is not needed just for &&/|| on floating point or complex types, but for all
&&/|| reductions - when expanded as omp_out = omp_out && omp_in (not in C but
GENERIC) it is actually gimplified into NOP_EXPRs to bool from both operands,
which turns non-zero values multiple of 2 into 0 rather than 1.

This patch just treats all &&/|| the same and furthermore uses bool type
instead of int for the comparisons.

2021-07-01 Jakub Jelinek <jakub@redhat.com>

PR middle-end/94366
gcc/
* omp-low.c (lower_rec_input_clauses): Rename is_fp_and_or to
is_truth_op, set it for TRUTH_*IF_EXPR regardless of new_var's type,
use boolean_type_node instead of integer_type_node as NE_EXPR type.
(lower_reduction_clauses): Likewise.
libgomp/
* testsuite/libgomp.c-c++-common/pr94366.c: New test.

(cherry picked from commit 91c771ec8a3b649765de3e0a7b04cf946c6649ef)

OpenMP: Support complex/float in && and || reduction

C/C++ permit logical AND and logical OR also with floating-point or complex
arguments by doing an unequal zero comparison; the result is an 'int' with
value one or zero. Hence, those are also permitted as reduction variable,
even though it is not the most sensible thing to do.

gcc/c/ChangeLog:

* c-typeck.c (c_finish_omp_clauses): Accept float + complex
for || and && reductions.

gcc/cp/ChangeLog:

* semantics.c (finish_omp_reduction_clause): Accept float + complex
for || and && reductions.

gcc/ChangeLog:

* omp-low.c (lower_rec_input_clauses, lower_reduction_clauses): Handle
&& and || with floating-point and complex arguments.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/clause-1.c: Use 'reduction(&:..)' instead of '...(&&:..)'.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/reduction-1.c: New test.
* testsuite/libgomp.c-c++-common/reduction-2.c: New test.
* testsuite/libgomp.c-c++-common/reduction-3.c: New test.

(cherry picked from commit 1580fc764423bf89e9b853aaa8c65999e37ccb8b)