git.ipfire.org Git - thirdparty/gcc.git/log

c++: Fix template-introduction tentative parsing in class bodies by clearing colon_corrects_to_scope_p [PR105061]

The concepts support (in particular template introductions from the
Concepts TS) broke the following testcase: valid unnamed bitfields with
dependent types (or even just typedefs) were diagnosed as typos (: instead
of the correct ::) in a template introduction during their tentative parsing.
The following patch fixes that by not doing this : to :: correction when
member_p is true.
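
For illustration, a minimal sketch of the kind of valid code affected (not necessarily the committed testcase; the names A, C, uns, u and roundtrip are made up here): unnamed bitfields whose type is a template parameter or a typedef, so the member declaration starts with an identifier followed by a single colon.

```cpp
#include <cassert>

// Unnamed bitfield with a dependent type (T), followed by a named one.
template <typename T, int U, int V>
struct A { T : V, u : U; };

typedef unsigned uns;

// Unnamed bitfield whose type is just a typedef name.
template <int U, int V>
struct C { uns : V, u : U; };

// Store a value in the named bitfield u and read it back.
template <typename S>
unsigned roundtrip(unsigned value) {
  S s = S();
  s.u = value;
  return s.u;
}
```

Both templates should be accepted without any ": vs. ::" diagnostics; roundtrip only shows that the named bitfield after the unnamed one is usable.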

2022-03-30 Jakub Jelinek <jakub@redhat.com>

PR c++/105061
* parser.c (cp_parser_template_introduction): If member_p, temporarily
clear parser->colon_corrects_to_scope_p around tentative parsing of
nested name specifier.

* g++.dg/concepts/pr105061.C: New test.

(cherry picked from commit 4f2795218a6ba6a7b7b9b18ca7a6e390661e1608)

c++: Fix up __builtin_convertvector parsing

Jonathan reported on IRC that we don't parse
__builtin_bit_cast (type, val).field
etc.
The problem is that for these 2 builtins we return from
cp_parser_postfix_expression instead of setting postfix_expression
to the cp_build_* value and falling through into the postfix expression
suffix handling loop.
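
A small sketch of what should parse once the result goes through the suffix loop, assuming the GNU vector extensions are available (the typedef and function names are made up for illustration):

```cpp
#include <cassert>

// GNU vector extension types; 16-byte vectors of four elements each.
typedef int v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));

// The subscript is a postfix-expression suffix applied directly to the
// builtin call, so the parser must not return right after the builtin.
float element_as_float(v4si v, int i) {
  return __builtin_convertvector(v, v4sf)[i];
}
```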

2022-03-26 Jakub Jelinek <jakub@redhat.com>

* parser.c (cp_parser_postfix_expression)
<case RID_BUILTIN_CONVERTVECTOR>: Don't
return cp_build_vec_convert result right away, instead
set postfix_expression to it and break.

* c-c++-common/builtin-convertvector-3.c: New test.

(cherry picked from commit 1806829e08f14e4cacacec43d7845cc2dad2ddc8)

c++: extern thread_local declarations in constexpr [PR104994]

C++14 to C++20 apparently should allow extern thread_local declarations in
constexpr functions, however useless they are there (because accessing
such vars is not valid in a constant expression, except perhaps in
sizeof/decltype).
P2242 changed that for C++23 so that what matters is control passing
through the declaration, but
https://cplusplus.github.io/CWG/issues/2552.html
was filed for it yesterday.

2022-03-24 Jakub Jelinek <jakub@redhat.com>

PR c++/104994
* constexpr.c (potential_constant_expression_1): Don't diagnose extern
thread_local declarations.
* decl.c (start_decl): Likewise.

* g++.dg/cpp2a/constexpr-nonlit7.C: New test.

(cherry picked from commit 72124f487ccb5c8065dd5f7b8fba254600b7e611)

i386: Don't emit pushf;pop for __builtin_ia32_readeflags_u* with unused lhs [PR104971]

__builtin_ia32_readeflags_u* aren't marked const or pure I think
intentionally, so that they aren't CSEd from different regions of a function
etc. because we don't and can't easily track all dependencies between
it and surrounding code (if somebody looks at the condition flags, it is
dependent on the vast majority of instructions).
But the builtin itself doesn't have any side-effects, so if we ignore the
result of the builtin, there is no point to emit anything.

There is an LRA bug that miscompiles the testcase which this patch makes
latent, which is certainly worth fixing too, but IMHO this change
(and maybe ix86_gimple_fold_builtin too, which would fold it even earlier
when it loses its lhs) is worth it as well.

2022-03-19 Jakub Jelinek <jakub@redhat.com>

PR middle-end/104971
* config/i386/i386-expand.c
(ix86_expand_builtin) <case IX86_BUILTIN_READ_FLAGS>: If ignore,
don't push/pop anything and just return const0_rtx.

* gcc.target/i386/pr104971.c: New test.

(cherry picked from commit b60bc913cca7439d29a7ec9e9a7f448d8841b43c)

c++: Fix up constexpr evaluation of new with zero sized types [PR104568]

The new expression constant expression evaluation right now tries to
deduce how many elts the array it uses for the heap or heap [] vars
should have (or how many elts should its trailing array have if it has
cookie at the start).  As new is lowered at that point to
(some_type *) ::operator new (size)
or so, it computes it by subtracting cookie size if any from size, then
divides the result by sizeof (some_type).
This works fine for most types, except when sizeof (some_type) is 0,
then we divide by zero; size is then equal to cookie_size (or if there
is no cookie, to 0).
The following patch special cases those cases so that we don't divide
by zero and also recover the original outer_nelts from the expression
by forcing the size not to be folded in that case but be explicit
0 * outer_nelts or cookie_size + 0 * outer_nelts.
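
The arithmetic can be sketched as follows. This is a hypothetical helper, not the code in constexpr.c; explicit_nelts stands in for the outer_nelts recovered from the unfolded 0 * outer_nelts term.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements of the heap array, given the size
// passed to operator new, the cookie size (0 if none) and the element size.
constexpr std::size_t heap_array_nelts(std::size_t size,
                                       std::size_t cookie_size,
                                       std::size_t elt_size,
                                       std::size_t explicit_nelts) {
  return elt_size == 0
             ? explicit_nelts                    // dividing would be by zero
             : (size - cookie_size) / elt_size;  // the usual computation
}

static_assert(heap_array_nelts(8 + 3 * 4, 8, 4, 0) == 3, "cookie + 3 elts");
static_assert(heap_array_nelts(8, 8, 0, 5) == 5, "zero sized elts");
```

With zero sized elements the size is just the cookie size, so the element count cannot be derived from it and has to come from the expression itself.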

Note, we have further issues, we accept-invalid various cases, for both
zero sized elt_type and even non-zero sized elts, we aren't able to
diagnose out of bounds POINTER_PLUS_EXPR like:
constexpr bool
foo ()
{
  auto p = new int[2];
  auto q1 = &p[0];
  auto q2 = &p[1];
  auto q3 = &p[2];
  auto q4 = &p[3];
  delete[] p;
  return true;
}
constexpr bool a = foo ();
That doesn't look like a regression so I think we should resolve that for
GCC 13, but there are 2 problems.  Figure out why
cxx_fold_pointer_plus_expression doesn't deal with the &heap []
etc. cases, and for the zero sized arrays, I think we really need to preserve
whether user wrote an array ref or pointer addition, because in the
&p[3] case if sizeof(p[0]) == 0 we know that if it has 2 elements it is
out of bounds, while if we see p p+ 0 the information if it was
p + 2 or p + 3 in the source is lost.
clang++ seems to handle it fine even in the zero sized cases or with
new expressions.

2022-03-18  Jakub Jelinek  <jakub@redhat.com>

PR c++/104568
* init.c (build_new_constexpr_heap_type): Remove FULL_SIZE
argument and its handling, instead add ITYPE2 argument.  Only
support COOKIE_SIZE != NULL.
(build_new_1): If size is 0, change it to 0 * outer_nelts if
outer_nelts is non-NULL.  Pass type rather than elt_type to
maybe_wrap_new_for_constexpr.
* constexpr.c (build_new_constexpr_heap_type): New function.
(cxx_eval_constant_expression) <case CONVERT_EXPR>:
If elt_size is zero sized type, try to recover outer_nelts from
the size argument to operator new/new[] and pass that as
arg_size to build_new_constexpr_heap_type.  Pass ctx,
non_constant_p and overflow_p to that call too.

* g++.dg/cpp2a/constexpr-new22.C: New test.

(cherry picked from commit 0a0c2c3f06227d46b5e9542dfdd4e0fd2d67d894)

aarch64: Fix up RTL sharing bug in aarch64_load_symref_appropriately [PR104910]

We unshare all RTL created during expansion, but when
aarch64_load_symref_appropriately is called after expansion like in the
following testcases, we use imm in both HIGH and LO_SUM operands.
If imm is some RTL that shouldn't be shared like a non-sharable CONST,
we get at least with --enable-checking=rtl a checking ICE, otherwise might
just get silently wrong code.

The following patch fixes that by copying it if it can't be shared.

2022-03-16 Jakub Jelinek <jakub@redhat.com>

PR target/104910
* config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Copy
imm rtx.

* gcc.dg/pr104910.c: New test.

(cherry picked from commit 952155629ca1a4dfe7c7b26e53d118a9b853ed4a)

ifcvt: Punt if not onlyjump_p for find_if_case_{1,2} [PR104814]

find_if_case_{1,2} implicitly assumes conditional jumps and rewrites them,
so if they have extra side-effects or are say asm goto, things don't work
well, either the side-effects are lost or we could ICE.
In particular, the testcase below on s390x has there a doloop instruction
that decrements a register in addition to testing it for non-zero and
conditionally jumping based on that.

The following patch fixes that by punting for !onlyjump_p case, i.e.
if there are side-effects in the jump instruction or it isn't a plain PC
setter.

Also, it assumes BB_END (test_bb) will be always non-NULL, because basic
blocks with 2 non-abnormal successor edges should always have some instruction
at the end that determines which edge to take.

2022-03-15 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/104814
* ifcvt.c (find_if_case_1, find_if_case_2): Punt if test_bb doesn't
end with onlyjump_p. Assume BB_END (test_bb) is always non-NULL.

* gcc.c-torture/execute/pr104814.c: New test.

(cherry picked from commit a2645cd8fb33b36d737b310e26f4c47401305c7b)

c, c++, c-family: -Wshift-negative-value and -Wshift-overflow* tweaks for -fwrapv and C++20+ [PR104711]

As mentioned in the PR, different standards have different definitions
of what is a UB left shift.  They all agree on out of bounds (including
negative) shift count.
The rules used by ubsan are:
C99-C2x ((unsigned) x >> (uprecm1 - y)) != 0 then UB
C++11-C++17 x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1 then UB
C++20 and later everything is well defined
Now, for C++20, I've in the P1236R1 implementation added an early
exit for -Wshift-overflow* warning so that it never warns, but apparently
-Wshift-negative-value remained as is.  As it is well defined in C++20,
the following patch doesn't enable -Wshift-negative-value from -Wextra
anymore for C++20 and later; if users want the warning for compatibility
with C++17 and earlier, they can still get it by using
-Wshift-negative-value explicitly.
Another thing is -fwrapv, that is an extension to the standards, so it is up
to us how exactly we define that case.  Our ubsan code treats
TYPE_OVERFLOW_WRAPS (type0) and cxx_dialect >= cxx20 the same as only
diagnosing out of bounds shift count and nothing else and IMHO it is most
sensical to treat -fwrapv signed left shifts the same as C++20 treats
them, https://eel.is/c++draft/expr.shift#2
"The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N,
where N is the width of the type of the result.
[Note 1: E1 is left-shifted E2 bit positions; vacated bits are zero-filled.
— end note]"
with no UB dependent on the E1 values.  The UB is only
"The behavior is undefined if the right operand is negative, or greater
than or equal to the width of the promoted left operand."
Under the hood (except for FEs and ubsan from FEs) GCC middle-end doesn't
consider UB in left shifts dependent on the first operand's value, only
the out of bounds shifts.
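
The three rule sets quoted above can be written down as a small predicate. This is only a sketch for a 32-bit int, not the ubsan implementation; the Dialect enum and the function name are made up here.

```cpp
#include <cassert>
#include <cstdint>

enum class Dialect { C99ToC2x, Cxx11ToCxx17, Cxx20 };

// Returns true if x << y would be UB for a 32-bit signed x under the rules
// listed above.  Out of bounds shift counts are UB in every dialect.
constexpr bool left_shift_is_ub(Dialect d, std::int32_t x, int y) {
  constexpr int prec = 32;
  constexpr int uprecm1 = prec - 1;
  if (y < 0 || y >= prec)
    return true;
  const std::uint32_t ux = static_cast<std::uint32_t>(x);
  switch (d) {
    case Dialect::C99ToC2x:
      return (ux >> (uprecm1 - y)) != 0;
    case Dialect::Cxx11ToCxx17:
      return x < 0 || (ux >> (uprecm1 - y)) > 1;
    case Dialect::Cxx20:
      return false;
  }
  return false;
}
```

For example, 1 << 31 is UB under the C rule but not under the C++11 to C++17 rule, and -1 << 1 is UB in both but well defined in C++20.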

While this change isn't a regression, I'd think it is useful for GCC 12,
it doesn't add new warnings, but just removes warnings that aren't
appropriate.

2022-03-09  Jakub Jelinek  <jakub@redhat.com>

PR c/104711
gcc/
* doc/invoke.texi (-Wextra): Document that -Wshift-negative-value
is enabled by it only for C++11 to C++17 rather than for C++03 or
later.
(-Wshift-negative-value): Similarly (except here we stated
that it is enabled for C++11 or later).
gcc/c-family/
* c-opts.c (c_common_post_options): Don't enable
-Wshift-negative-value from -Wextra for C++20 or later.
* c-ubsan.c (ubsan_instrument_shift): Adjust comments.
* c-warn.c (maybe_warn_shift_overflow): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
gcc/c/
* c-fold.c (c_fully_fold_internal): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
* c-typeck.c (build_binary_op): Likewise.
gcc/cp/
* constexpr.c (cxx_eval_check_shift_p): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
* typeck.c (cp_build_binary_op): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
gcc/testsuite/
* c-c++-common/Wshift-negative-value-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-negative-value-2.c: Likewise.
* c-c++-common/Wshift-negative-value-3.c: Likewise.
* c-c++-common/Wshift-negative-value-4.c: Likewise.
* c-c++-common/Wshift-negative-value-7.c: New test.
* c-c++-common/Wshift-negative-value-8.c: New test.
* c-c++-common/Wshift-negative-value-9.c: New test.
* c-c++-common/Wshift-negative-value-10.c: New test.
* c-c++-common/Wshift-overflow-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-overflow-2.c: Likewise.
* c-c++-common/Wshift-overflow-5.c: Likewise.
* c-c++-common/Wshift-overflow-6.c: Likewise.
* c-c++-common/Wshift-overflow-7.c: Likewise.
* c-c++-common/Wshift-overflow-8.c: New test.
* c-c++-common/Wshift-overflow-9.c: New test.
* c-c++-common/Wshift-overflow-10.c: New test.
* c-c++-common/Wshift-overflow-11.c: New test.
* c-c++-common/Wshift-overflow-12.c: New test.

(cherry picked from commit d76511138dc816ef66fd16f71531f48c37dac3b4)

c++: Don't suggest cdtor or conversion op identifiers in spelling hints [PR104806]

On the following testcase, we emit "did you mean '__dt '?" in the error
message. "__dt " shows there because it is dtor_identifier, but we
shouldn't suggest those to the user, they are purely internal and can't
be really typed by the user because of the final space in it.
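
The filtering idea, as a hypothetical standalone predicate (this is not the fuzzy_lookup_field code, and is_suggestable is a made-up name):

```cpp
#include <cassert>
#include <string>

// Names that end in a space are internal (e.g. "__dt ") and can't be typed
// by the user, so they should never be offered as a spelling hint.
// Treating the empty name as not suggestable is an assumption of this sketch.
bool is_suggestable(const std::string &name) {
  return !name.empty() && name.back() != ' ';
}
```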

2022-03-08 Jakub Jelinek <jakub@redhat.com>

PR c++/104806
* search.c (lookup_field_fuzzy_info::fuzzy_lookup_field): Ignore
identifiers with space at the end.

* g++.dg/spellcheck-pr104806.C: New test.

(cherry picked from commit e480c3c06d20874fd7504bfdcca0b829f8000389)

s390: Fix up *cmp_and_trap_unsigned_int<mode> constraints [PR104775]

The following testcase fails to assemble due to clgte %r6,0(%r1,%r10)
insn not being accepted by assembler.
My rough understanding is that in the RSY-b insn format the spot
used in other formats for index registers is used instead for M3, which
encodes what kind of comparison it is, so this patch follows what other
similar instructions use for the constraint (i.e. one without an index
register).

2022-03-07 Jakub Jelinek <jakub@redhat.com>

PR target/104775
* config/s390/s390.md (*cmp_and_trap_unsigned_int<mode>): Use
S constraint instead of T in the last alternative.

* gcc.target/s390/pr104775.c: New test.

(cherry picked from commit 2472dcaa8cb9e02e902f83d419c3ee7e0f3d9041)

match.pd: Further complex simplification fixes [PR104675]

Marc mentioned in the PR 2 further simplifications that also ICE
with complex types.
For these, eventually (but IMO GCC 13 material) we could support it
for vector types if they are uniform vector constants.
Currently integer_pow2p is true only for INTEGER_CSTs and COMPLEX_CSTs
and we can't use bit_and etc. for complex type.
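
For unsigned integers the two simplifications are plain identities, which the following sketch checks (the function names are made up; nothing here is match.pd syntax):

```cpp
#include <cassert>

// t * 2U / 2 -> t & (~0 / 2): the multiplication drops the top bit.
constexpr unsigned mul_then_div(unsigned t) { return t * 2U / 2; }
constexpr unsigned mask_high(unsigned t) { return t & (~0U / 2); }

// t / 2U * 2 -> t & ~1: the division drops the low bit.
constexpr unsigned div_then_mul(unsigned t) { return t / 2U * 2; }
constexpr unsigned mask_low(unsigned t) { return t & ~1U; }

static_assert(mul_then_div(~0U) == mask_high(~0U), "top bit cleared");
static_assert(div_then_mul(7U) == mask_low(7U), "low bit cleared");
```

The right-hand sides need a bitwise AND, which is why they only make sense for INTEGRAL_TYPE_P operands.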

2022-02-25 Jakub Jelinek <jakub@redhat.com>
Marc Glisse <marc.glisse@inria.fr>

PR tree-optimization/104675
* match.pd (t * 2U / 2 -> t & (~0 / 2), t / 2U * 2 -> t & ~1):
Restrict simplifications to INTEGRAL_TYPE_P.

* gcc.dg/pr104675-3.c: New test.

(cherry picked from commit f62115c9b770a66c5378f78a2d5866243d560573)

rs6000: Use rs6000_emit_move in movmisalign<mode> expander [PR104681]

The following testcase ICEs, because for some strange reason it decides to use
movmisaligntf during expansion where the destination is MEM and source is
CONST_DOUBLE. For normal mov<mode> expanders the rs6000 backend uses
rs6000_emit_move to ensure that if one operand is a MEM, the other is a REG
and a few other things, but for movmisalign<mode> nothing enforced this.
The middle-end documents that movmisalign<mode> shouldn't fail, so we can't
force that through predicates or condition on the expander.

2022-02-25 Jakub Jelinek <jakub@redhat.com>

PR target/104681
* config/rs6000/vector.md (movmisalign<mode>): Use rs6000_emit_move.

* g++.dg/opt/pr104681.C: New test.

(cherry picked from commit 3885a122f817a1b6dca4a84ba9e020d5ab2060af)

match.pd: Don't create BIT_NOT_EXPRs for COMPLEX_TYPE [PR104675]

We don't support BIT_{AND,IOR,XOR,NOT}_EXPR on complex types,
&/|/^ are just rejected for them, and ~ is parsed as CONJ_EXPR.
So, we should avoid simplifications which turn valid complex type
expressions into something that will ICE during expansion.
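
For integers the simplification is just the two's complement identity, sketched here on unsigned so that no overflow is involved (function names are made up):

```cpp
#include <cassert>

// -A - 1 -> ~A and -1 - A -> ~A, valid for integer operands.
constexpr unsigned neg_minus_one(unsigned a) { return -a - 1U; }
constexpr unsigned minus_one_minus(unsigned a) { return -1 - a; }
constexpr unsigned bit_not(unsigned a) { return ~a; }

static_assert(neg_minus_one(5U) == bit_not(5U), "-A - 1 == ~A");
static_assert(minus_one_minus(5U) == bit_not(5U), "-1 - A == ~A");
```

For a complex operand there is no such identity to use, since ~ there means conjugation rather than bitwise negation.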

2022-02-25 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/104675
* match.pd (-A - 1 -> ~A, -1 - A -> ~A): Don't simplify for
COMPLEX_TYPE.

* gcc.dg/pr104675-1.c: New test.
* gcc.dg/pr104675-2.c: New test.

(cherry picked from commit 758671b88b78d7629376b118ec6ca6bcfbabbd36)

libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]

On
#define A(n) int foo1##n(void) { return 1##n; }
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.
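
A sketch of the distinction, using the constant values from the ELF specification. The helper names and the new_index table are hypothetical and much simpler than what simple-object-elf.c does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Values from the ELF specification.
constexpr std::uint32_t kShnUndef = 0;
constexpr std::uint32_t kShnLoReserve = 0xff00;
constexpr std::uint32_t kShnXindex = 0xffff;

// Remap a 32-bit sh_link (or sh_info, when it holds a section index):
// new_index[i] is the output index of input section i.  Indexes that are
// numerically in the reserved range are remapped like any other.
std::uint32_t remap_section_link(std::uint32_t old_index,
                                 const std::vector<std::uint32_t> &new_index) {
  if (old_index == kShnUndef || old_index >= new_index.size())
    return old_index;  // nothing to remap in this sketch
  return new_index[old_index];
}

// The 16-bit st_shndx can't hold indexes >= SHN_LORESERVE; SHN_XINDEX is
// stored instead and the real index goes into the .symtab_shndx section.
std::uint16_t encode_st_shndx(std::uint32_t section_index) {
  return section_index >= kShnLoReserve
             ? static_cast<std::uint16_t>(kShnXindex)
             : static_cast<std::uint16_t>(section_index);
}
```

The sh_link value 65321 from the readelf output above is 0xff29, i.e. inside the reserved range, which is why it was left unadjusted before this change.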

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

PR lto/104617
* simple-object-elf.c (simple_object_elf_match): Fix up URL
in comment.
(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
range (inclusive).

(cherry picked from commit 2f59f067610f22c3f2ec9b1516e24b85836676ed)

asan: Mark instrumented vars addressable [PR102656]

We ICE on the following testcase, because the asan1 pass decides to
instrument
  <retval>.x = 0;
and does that by
  _13 = &<retval>.x;
  .ASAN_CHECK (7, _13, 4, 4);
  <retval>.x = 0;
and later sanopt pass turns that into:
  _39 = (unsigned long) &<retval>.x;
  _40 = _39 >> 3;
  _41 = _40 + 2147450880;
  _42 = (signed char *) _41;
  _43 = *_42;
  _44 = _43 != 0;
  _45 = _39 & 7;
  _46 = (signed char) _45;
  _47 = _46 + 3;
  _48 = _47 >= _43;
  _49 = _44 & _48;
  if (_49 != 0)
    goto <bb 10>; [0.05%]
  else
    goto <bb 9>; [99.95%]

  <bb 10> [local count: 536864]:
  __builtin___asan_report_store4 (_39);

  <bb 9> [local count: 1073741824]:
  <retval>.x = 0;
The problem is during expansion, <retval> isn't marked TREE_ADDRESSABLE,
even when we take its address in (unsigned long) &<retval>.x.

Now, instrument_derefs has code to avoid the instrumentation altogether
if we can prove the access is within bounds of an automatic variable in the
current function and the var isn't TREE_ADDRESSABLE (or we don't instrument
use after scope), but we do it solely for VAR_DECLs.

I think we should treat RESULT_DECLs exactly like that too, which is what
the following patch does.  I must say I'm unsure about PARM_DECLs, those can
have different cases, either they are fully or partially passed in
registers, then if we take parameter's address, they are in a local copy
inside of a function and so work like those automatic vars.  But if they
are fully passed in memory, we typically just take address of the slot
and in that case they live in the caller's frame.  It is true we don't
(can't) put any asan padding in between the arguments, so all asan could
detect in that case is if caller passes fewer on stack arguments or smaller
arguments than callee accepts.  Anyway, as I'm unsure, I haven't added
PARM_DECLs to that case.

And another thing is, when we actually build_fold_addr_expr, we need to
mark_addressable the inner if it isn't addressable already.

2022-02-19  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/102656
* asan.c (instrument_derefs): If inner is a RESULT_DECL and access is
known to be within bounds, treat it like automatic variables.
If instrumenting access and inner is {VAR,PARM,RESULT}_DECL from
current function and !TREE_STATIC which is not TREE_ADDRESSABLE, mark
it addressable.

(cherry picked from commit 9e3bbb4a8024121eb0fa675cb1f074218c1345a6)

valtrack: Avoid creating raw SUBREGs with VOIDmode argument [PR104557]

After the recent r12-7240 simplify_immed_subreg changes, we bail on more
simplify_subreg calls than before, e.g. apparently for decimal modes
in the NaN representations we almost never preserve anything except the
canonical {q,s}NaNs.
simplify_gen_subreg will punt in such cases because a SUBREG with VOIDmode
is not valid, but debug_lowpart_subreg wants to attempt even harder, even
if e.g. target indicates certain mode combinations aren't valid for the
backend, dwarf2out can still handle them.  But a SUBREG from a VOIDmode
operand is just too much, the inner mode is lost there.  We'd need some
new rtx that would be able to represent those cases.
For now, just punt in those cases.

2022-02-17  Jakub Jelinek  <jakub@redhat.com>

PR debug/104557
* valtrack.c (debug_lowpart_subreg): Don't call gen_rtx_raw_SUBREG
if expr has VOIDmode.

* gcc.dg/dfp/pr104557.c: New test.

(cherry picked from commit 1c2b44b52364cb5661095b346de794bc7ff02866)

combine: Fix up -fcompare-debug issue in the combiner [PR104544]

On the following testcase on aarch64-linux, we behave differently
with -g and -g0.

The problem is that on:
(insn 10011 10010 10012 2 (set (reg:CC 66 cc)
        (compare:CC (reg:DI 105)
            (const_int 0 [0]))) "pr104544.c":18:3 407 {cmpdi}
     (expr_list:REG_DEAD (reg:DI 105)
        (nil)))
(insn 10012 10011 10013 2 (set (reg:SI 109)
        (eq:SI (reg:CC 66 cc)
            (const_int 0 [0]))) "pr104544.c":18:3 444 {aarch64_cstoresi}
     (expr_list:REG_DEAD (reg:CC 66 cc)
        (nil)))
(insn 10013 10012 10016 2 (set (reg:DI 110)
        (zero_extend:DI (reg:SI 109))) "pr104544.c":18:3 111 {*zero_extendsidi2_aarch64}
     (expr_list:REG_DEAD (reg:SI 109)
        (nil)))
(insn 10016 10013 10017 2 (parallel [
            (set (reg:CC 66 cc)
                (compare:CC (const_int 0 [0])
                    (reg:DI 110)))
            (set (reg:DI 111)
                (neg:DI (reg:DI 110)))
        ]) "pr104544.c":18:3 281 {negdi_carryout}
     (expr_list:REG_DEAD (reg:DI 110)
        (nil)))
...
(debug_insn 6 5 7 2 (var_location:SI y (debug_expr:SI D#5)) "pr104544.c":18:3 -1
     (nil))
(debug_insn 7 6 10033 2 (debug_marker) "pr104544.c":11:3 -1
     (nil))
(insn 10033 7 10034 2 (set (reg:DI 117 [ _14 ])
        (ior:DI (reg:DI 111)
            (reg:DI 112))) "pr104544.c":11:6 496 {iordi3}
     (expr_list:REG_DEAD (reg:DI 112)
        (expr_list:REG_DEAD (reg:DI 111)
            (nil))))
we successfully split 3 insns into two:

Trying 10011, 10013 -> 10016:
10011: cc:CC=cmp(r105:DI,0)
      REG_DEAD r105:DI
10013: r110:DI=cc:CC==0
      REG_DEAD cc:CC
10016: {cc:CC=cmp(0,r110:DI);r111:DI=-r110:DI;}
      REG_DEAD r110:DI
Failed to match this instruction:
(parallel [
        (set (reg:CC 66 cc)
            (compare:CC (reg:DI 105)
                (const_int 0 [0])))
        (set (reg:DI 111)
            (neg:DI (eq:DI (reg:DI 105)
                    (const_int 0 [0]))))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:CC 66 cc)
            (compare:CC (reg:DI 105)
                (const_int 0 [0])))
        (set (reg:DI 111)
            (neg:DI (eq:DI (reg:DI 105)
                    (const_int 0 [0]))))
    ])
Successfully matched this instruction:
(set (reg:DI 111)
    (neg:DI (eq:DI (reg:DI 105)
            (const_int 0 [0]))))
Successfully matched this instruction:
(set (reg:CC 66 cc)
    (compare:CC (reg:DI 105)
        (const_int 0 [0])))
Successfully matched this instruction:
(set (reg:DI 112)
    (neg:DI (eq:DI (reg:CC 66 cc)
            (const_int 0 [0]))))
allowing combination of insns 10011, 10013 and 10016
original costs 4 + 4 + 4 = 16
replacement costs 4 + 4 = 12
deferring deletion of insn with uid = 10011.

but the code that searches forward for insns to update their log
links (before the change there is a link from insn 10033 to insn 10016
for pseudo 111) only finds insn 10033 and updates the log link if
-g isn't enabled, otherwise it stops earlier because there are debug insns
in between.  So, with -g LOG_LINKS of 10033 isn't updated, points eventually
to NOTE_INSN_DELETED and so we do not attempt to combine 10033 with other
insns, while with -g0 we do.

The following patch fixes that by instead ignoring debug insns during the
searching.  We can still check BLOCK_FOR_INSN (insn) on those, because
if we notice DEBUG_INSN in a following basic block, necessarily there won't
be any further normal insns in the current block after it.
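
The search can be sketched like this, with a hypothetical and much simplified stand-in for insns (the real loop in try_combine has further stop conditions that are left out here):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, much simplified stand-in for an insn.
struct Insn {
  int uid;
  int block;      // basic block number
  bool is_debug;  // debug insns exist only with -g
  bool uses_reg;  // whether the insn references the pseudo of interest
};

// Returns the uid of the first non-debug insn after position start, in the
// same block, that uses the register, or -1 if there is none.
int find_link_user(const std::vector<Insn> &insns, std::size_t start) {
  const int block = insns[start].block;
  for (std::size_t i = start + 1; i < insns.size(); ++i) {
    if (insns[i].block != block)
      break;     // left the block; this check is fine on debug insns too
    if (insns[i].is_debug)
      continue;  // skip over debug insns instead of stopping at them
    if (insns[i].uses_reg)
      return insns[i].uid;
  }
  return -1;
}
```

With the skip in place the same insn is found whether or not debug insns sit in between, which is what -fcompare-debug needs.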

2022-02-16  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/104544
* combine.c (try_combine): When looking for insn whose links
should be updated from i3 to i2, don't stop on debug insns, instead
skip over them.

* gcc.dg/pr104544.c: New test.

(cherry picked from commit f997eef5654f782bedb985c9285862c4d76b3209)

c-family: Fix up shorten_compare for decimal vs. non-decimal float comparison [PR104510]

The comment in shorten_compare says:
/* If either arg is decimal float and the other is float, fail. */
but the callers of shorten_compare don't expect anything like failure
as a possibility from the function, callers require that the function
promotes the operands to the same type, whether the original selected
*restype_ptr one or some shortened.
So, if we choose not to shorten, we should still promote to the original
*restype_ptr.

2022-02-16 Jakub Jelinek <jakub@redhat.com>

PR c/104510
* c-common.c (shorten_compare): Convert original arguments to
the original *restype_ptr when mixing binary and decimal float.

* gcc.dg/dfp/pr104510.c: New test.

(cherry picked from commit 6e74122f0de6748b3fd0ed9183090cd7c61fb53e)

sanitizer: Use glibc _thread_db_sizeof_pthread symbol if present

I've cherry-picked the following fix from llvm-project. Recent glibcs
have _thread_db_sizeof_pthread symbol variable which contains the
size of struct pthread, so that sanitizers don't need to guess that
and risk that it will change again.

2022-02-15 Jakub Jelinek <jakub@redhat.com>

* sanitizer_common/sanitizer_linux_libcdep.cpp: Cherry-pick
llvm-project revision ef14b78d9a144ba81ba02083fe21eb286a88732b.

(cherry picked from commit c4c0aa60891daeb4ea5a7c265bd681038f6d8271)

openmp: Make finalize_task_copyfn order reproducible [PR104517]

The following testcase fails -fcompare-debug, because finalize_task_copyfn
was invoked from splay tree destruction, whose order can in some cases
depend on -g/-g0. The fix is to queue the task stmts that need copyfn
in a vector and run finalize_task_copyfn on elements of that vector.

2022-02-15 Jakub Jelinek <jakub@redhat.com>

PR debug/104517
* omp-low.c (task_cpyfns): New variable.
(delete_omp_context): Don't call finalize_task_copyfn from here.
(create_task_copyfn): Push task_stmt into task_cpyfns.
(execute_lower_omp): Call finalize_task_copyfn here on entries from
task_cpyfns vector and release the vector.

(cherry picked from commit 6a0d6e7ca9b9e338e82572db79c26168684a7441)

c++: Don't reject GOTO_EXPRs to cdtor_label in potential_constant_expression_1 [PR104513]

On targets where targetm.cxx.cdtor_returns_this () is true, like arm, a
return in a ctor is emitted as GOTO_EXPR cdtor_label, and at cdtor_label
a RETURN_EXPR with this is emitted.
In all dtors, regardless of targetm.cxx.cdtor_returns_this (), a return
is emitted the same way.

potential_constant_expression_1 was rejecting these gotos and so we
incorrectly rejected these testcases, but actual cxx_eval* is apparently
handling these just fine.  I was a little bit worried that for the
destruction of bases we wouldn't evaluate something we should, but as the
testcase shows, that is evaluated through try ... finally and there is
nothing after the cdtor_label.  For arm there is RETURN_EXPR this; but we
don't really care about the return value from ctors and dtors during the
constexpr evaluation.
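
A sketch of the kind of code that was rejected on such targets, assuming C++14 or later (this is an illustration, not necessarily the committed testcase):

```cpp
#include <cassert>

struct A {
  int a1;
  short a2, a3;
  // The explicit return is what becomes a jump to cdtor_label on targets
  // where constructors return this.
  constexpr A() : a1(42), a2(42), a3(42) { return; }
};

constexpr A a;
static_assert(a.a1 == 42 && a.a2 == 42 && a.a3 == 42, "constant-initialized");
```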

I must say I don't see much point in cdtor_labels at all, I'd think
that with try ... finally around it for non-arm we could just RETURN_EXPR
instead of the GOTO_EXPR and the try/finally gimplification would DTRT,
and we could just add the right return value for the arm case.

2022-02-14  Jakub Jelinek  <jakub@redhat.com>

PR c++/104513
* constexpr.c (potential_constant_expression_1) <case GOTO_EXPR>:
Don't punt if returns (target).

* g++.dg/cpp1y/constexpr-104513.C: New test.
* g++.dg/cpp2a/constexpr-dtor12.C: New test.

(cherry picked from commit 02a981a8e512934a990d1427d14e8e884409fade)

asan: Fix up address sanitizer instrumentation of __builtin_alloca* if it can throw [PR104449]

With -fstack-check* __builtin_alloca* can throw and the asan
instrumentation of this builtin wasn't prepared for that case.
The following patch fixes that by replacing the builtin with the
replacement builtin and emitting any further insns on the fallthru
edge.

I haven't touched the hwasan code which most likely suffers from the
same problem.

2022-02-12 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/104449
* asan.c: Include tree-eh.h.
(handle_builtin_alloca): Handle the case when __builtin_alloca or
__builtin_alloca_with_align can throw.

* gcc.dg/asan/pr104449.c: New test.
* g++.dg/asan/pr104449.C: New test.

(cherry picked from commit f0c7367b8802c47efaad87b1f2126fe6350d8b47)

i386: Fix up cvtsd2ss splitter [PR104502]

The following testcase ICEs, because AVX512F is enabled, AVX512VL is not,
and the cvtsd2ss insn has %xmm0-15 as output operand and %xmm16-31 as
input operand. For output operand %xmm16+ the splitter just gives up
in such case, but for such input it just emits vmovddup which requires
AVX512VL if either operand is EXT_REX_SSE_REG_P (when it is 128-bit).

The following patch fixes it by treating that case like the pre-SSE3
output != input case - move the input to output and do everything on
the output reg which is known to be < %xmm16.

2022-02-12 Jakub Jelinek <jakub@redhat.com>

PR target/104502
* config/i386/i386.md (cvtsd2ss splitter): If operands[1] is xmm16+
and AVX512VL isn't available, move operands[1] to operands[0] first.

* gcc.target/i386/pr104502.c: New test.

(cherry picked from commit 0538d42cdd68f6b65d72ed7768f1d00ba44f8631)

c++: Fix up constant expression __builtin_convertvector folding [PR104472]

The following testcase ICEs, because due to -frounding-math
fold_const_call fails, i.e. it returns NULL, and returning NULL from
cxx_eval* is wrong; all the callers rely on them to either return the folded
value or the original with *non_constant_p = true.

The following patch does that, and additionally falls through into the
default case where there is diagnostics for the !ctx->quiet case too.
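
The contract the callers rely on, sketched with hypothetical stand-ins (Expr, try_fold and eval are made up for illustration and are not GCC's types or functions):

```cpp
#include <cassert>

// Hypothetical stand-in for an expression.
struct Expr {
  double value;
  bool foldable;
};

// Like a folder that may give up: returns nullptr when it can't fold.
const Expr *try_fold(const Expr *e) { return e->foldable ? e : nullptr; }

// Evaluator following the contract: never return nullptr; return either the
// folded result, or the original expression with *non_constant_p set.
const Expr *eval(const Expr *t, bool *non_constant_p) {
  if (const Expr *folded = try_fold(t))
    return folded;
  *non_constant_p = true;
  return t;
}
```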

2022-02-11 Jakub Jelinek <jakub@redhat.com>

PR c++/104472
* constexpr.c (cxx_eval_internal_function) <case IFN_VEC_CONVERT>:
Only return fold_const_call result if it is non-NULL. Otherwise
fall through into the default: case to return t, set *non_constant_p
and emit diagnostics if needed.

* g++.dg/cpp0x/constexpr-104472.C: New test.

(cherry picked from commit 84993d94e13ad2ab3aee151bb5a5e767cf75d51e)

combine: Fix ICE with substitution of CONST_INT into PRE_DEC argument [PR104446]

The following testcase ICEs, because combine substitutes
(insn 10 9 11 2 (set (reg/v:SI 7 sp [ a ])
        (const_int 0 [0])) "pr104446.c":9:5 81 {*movsi_internal}
     (nil))
(insn 13 11 14 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [0  S4 A32])
        (reg:SI 85)) "pr104446.c":10:3 56 {*pushsi2}
     (expr_list:REG_DEAD (reg:SI 85)
        (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
            (nil))))
forming
(insn 13 11 14 2 (set (mem/f:SI (pre_dec:SI (const_int 0 [0])) [0  S4 A32])
        (reg:SI 85)) "pr104446.c":10:3 56 {*pushsi2}
     (expr_list:REG_DEAD (reg:SI 85)
        (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
            (nil))))
which is invalid RTL (pre_dec's argument must be a REG).
I know substitution creates various forms of invalid RTL and hopes that
invalid RTL just won't recog.
But unfortunately in this case we ICE before we get to recog, as
try_combine does:
  if (n_auto_inc)
    {
      int new_n_auto_inc = 0;
      for_each_inc_dec (newpat, count_auto_inc, &new_n_auto_inc);

      if (n_auto_inc != new_n_auto_inc)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "Number of auto_inc expressions changed\n");
          undo_all ();
          return 0;
        }
    }
and for_each_inc_dec under the hood will do e.g. for the PRE_DEC case:
    case PRE_DEC:
    case POST_DEC:
      {
        poly_int64 size = GET_MODE_SIZE (GET_MODE (mem));
        rtx r1 = XEXP (x, 0);
        rtx c = gen_int_mode (-size, GET_MODE (r1));
        return fn (mem, x, r1, r1, c, data);
      }
and that code rightfully expects that the PRE_DEC operand has non-VOIDmode
(as it needs to be a REG) - gen_int_mode for VOIDmode results in ICE.
I think it is better not to emit the clearly invalid RTL during substitution,
like we do for other cases, than to add workarounds for invalid IL
created by combine to rtlanal.cc and perhaps elsewhere.
As for the testcase, of course it is UB at runtime to modify sp that way,
but if such code is never reached, we must compile it, not ICE on it.
And I don't see why on other targets which use the autoinc rtxes much more
it couldn't happen with other registers.

2022-02-11  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/104446
* combine.c (subst): Don't substitute CONST_INTs into RTX_AUTOINC
operands.

* gcc.target/i386/pr104446.c: New test.

(cherry picked from commit fb76c0ad35f96505ecd9213849ebc3df6163a0f7)

rs6000: Fix up vspltis_shifted [PR102140]

The following testcase ICEs, because
(const_vector:V4SI [
                (const_int 0 [0]) repeated x3
                (const_int -2147483648 [0xffffffff80000000])
            ])
is recognized as valid easy_vector_constant in between split1 pass and
end of RA.
The problem is that such constants need to be split, and the only
splitter for that is:
(define_split
  [(set (match_operand:VM 0 "altivec_register_operand")
        (match_operand:VM 1 "easy_vector_constant_vsldoi"))]
  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && can_create_pseudo_p ()"
There is only a single splitting pass before RA, so after that finishes,
if something gets matched in between that and end of RA (after that
can_create_pseudo_p () would be no longer true), it will never be
successfully split and we ICE at final.cc time or earlier.

The i386 backend (and a few others) already use
(cfun->curr_properties & PROP_rtl_split_insns)
as a test for split1 pass finished, so that some insns that should be split
during split1 and shouldn't be matched afterwards are properly guarded.

So, the following patch does that for vspltis_shifted too.

2022-02-08  Jakub Jelinek  <jakub@redhat.com>

PR target/102140
* config/rs6000/rs6000.c (vspltis_shifted): Return false also if
split1 pass has finished already.

* gcc.dg/pr102140.c: New test.

(cherry picked from commit 0c3e491a4e5ae74bfbed6d167d403d262b5a4adc)

libgomp: Fix segfault with posthumous orphan tasks [PR104385]

The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits.  Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed.  The following patch fixes that
by clearing the parent field when we schedule such task for running
by adding it into the queues and we know that the sibling task which
is about to finish has NULL parent.
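
A much simplified model of the fix, assuming a hypothetical struct task
and release_dependent helper (libgomp's struct gomp_task and its queues
are far richer than this):

```c
#include <stddef.h>

/* Hypothetical, heavily simplified task; not libgomp's struct gomp_task.  */
struct task
{
  struct task *parent;
  int queued;   /* Non-zero once the task has been made runnable.  */
};

/* Called when FINISHED completes and WAITER, a sibling that depended on
   it, becomes runnable.  If FINISHED no longer has a parent, the common
   parent is gone, so WAITER must not keep a pointer to it.  */
void
release_dependent (struct task *finished, struct task *waiter)
{
  if (finished->parent == NULL)
    waiter->parent = NULL;
  waiter->queued = 1;
}
```

The point of the sketch is only that the stale parent pointer is dropped
at the moment the waiting task is queued, rather than relying on the
earlier pass over the parent's children.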

2022-02-08  Jakub Jelinek  <jakub@redhat.com>

PR libgomp/104385
* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
clear task->parent.
* testsuite/libgomp.c/pr104385.c: New test.

(cherry picked from commit 0af7ef050aed9f678d70d79931ede38374fde863)

libcpp: Fix up padding handling in funlike_invocation_p [PR104147]

As mentioned in the PR, in some cases we preprocess incorrectly when we
encounter an identifier which is defined as function-like macro, followed
by at least 2 CPP_PADDING tokens and then some other identifier.
On the following testcase, the problem is in the 3rd funlike_invocation_p,
the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared token),
CPP_PADDING (one created with padding_token, val.source is non-NULL and
val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
funlike_invocation_p remembers there was a padding token, but remembers the
first one because of its condition, then the next token is the CPP_NAME,
which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
can't easily backup more tokens, it pushes into a new context the padding
token (the pfile->avoid_paste one).  The net effect is that when Y is not
defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
(the second avoid_paste is because that is how we handle end of a context).
Now, for stringify_arg that is unfortunately a significant difference,
which handles CPP_PADDING tokens with:
      if (token->type == CPP_PADDING)
        {
          if (source == NULL
              || (!(source->flags & PREV_WHITE)
                  && token->val.source == NULL))
            source = token->val.source;
          continue;
        }
and later on
      /* Leading white space?  */
      if (dest - 1 != BUFF_FRONT (pfile->u_buff))
        {
          if (source == NULL)
            source = token;
          if (source->flags & PREV_WHITE)
            *dest++ = ' ';
        }
      source = NULL;
(and c-ppoutput.cc has similar code).
So, when Y is not fun-like macro, ' ' is added because padding_token's
val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
macro, we don't add ' ' in between, because source is NULL and so
used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.

Now, the funlike_invocation_p condition
       if (padding == NULL
           || (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
        padding = token;
looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
the intent was to do the same thing and pick the right padding.
But there are significant differences.  Both stringify_arg and c-ppoutput.cc
don't remember the CPP_PADDING token, but its val.source instead, while
in funlike_invocation_p we want to remember the padding token that has the
significant information for stringify_arg/c-ppoutput.cc.
So, IMHO we want to overwrite padding if:
1) padding == NULL (remember that there was any padding at all)
2) padding->val.source == NULL (this matches the source == NULL
   case in stringify_arg)
3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
   (this matches the !(source->flags & PREV_WHITE) && token->val.source == NULL
   case in stringify_arg)
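
The three conditions above can be sketched as a small predicate.  This is
an illustration under assumptions: struct tok, its source field (standing
in for val.source), the PREV_WHITE value and should_replace_padding are
all made up for the example and are not the libcpp definitions.

```c
#include <stdbool.h>
#include <stddef.h>

#define PREV_WHITE 1u   /* Assumed flag value, for illustration only.  */

/* Hypothetical minimal token; libcpp's cpp_token has many more fields.  */
struct tok
{
  unsigned flags;
  const struct tok *source;   /* Models val.source of a padding token.  */
};

/* Should TOKEN replace the currently remembered PADDING token?  */
bool
should_replace_padding (const struct tok *padding, const struct tok *token)
{
  if (padding == NULL)            /* 1) no padding remembered yet  */
    return true;
  if (padding->source == NULL)    /* 2) remembered one has no source  */
    return true;
  /* 3) remembered source lacks PREV_WHITE and TOKEN has no source  */
  return !(padding->source->flags & PREV_WHITE) && token->source == NULL;
}
```

For the sequence discussed above (avoid_paste followed by a padding token
whose source has PREV_WHITE set), the second padding token replaces the
first and is then kept.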

2022-02-01  Jakub Jelinek  <jakub@redhat.com>

PR preprocessor/104147
* macro.c (funlike_invocation_p): For padding prefer a token
with val.source non-NULL especially if it has PREV_WHITE set
on val.source->flags.  Add gcc_assert that CPP_PADDING tokens
don't have PREV_WHITE set in flags.

* c-c++-common/cpp/pr104147.c: New test.

(cherry picked from commit 95ac5635409606386259d2ff21fb61738858ca4a)

libcpp: Avoid PREV_WHITE and other random content on CPP_PADDING tokens

The funlike_invocation_p assert never triggered, but the other
asserts did on some tests, see below for a full list.
This seems to be caused by #pragma/_Pragma handling.
do_pragma does:
          pfile->directive_result.src_loc = pragma_token_virt_loc;
          pfile->directive_result.type = CPP_PRAGMA;
          pfile->directive_result.flags = pragma_token->flags;
          pfile->directive_result.val.pragma = p->u.ident;
when it sees a pragma, while start_directive does:
  pfile->directive_result.type = CPP_PADDING;
and so does _cpp_do__Pragma.
Now, for #pragma lex.cc will just ignore directive_result if
it has CPP_PADDING type:
              if (_cpp_handle_directive (pfile, result->flags & PREV_WHITE))
                {
                  if (pfile->directive_result.type == CPP_PADDING)
                    continue;
                  result = &pfile->directive_result;
                }
but destringize_and_run does not:
  if (pfile->directive_result.type == CPP_PRAGMA)
    {
...
    }
  else
    {
      count = 1;
      toks = XNEW (cpp_token);
      toks[0] = pfile->directive_result;
and from there it will copy type member of CPP_PADDING, but all the
other members from the last CPP_PRAGMA before it.
Small testcase for it with no option (at least no -fopenmp or -fopenmp-simd).
#pragma GCC push_options
#pragma GCC ignored "-Wformat"
#pragma GCC pop_options
void
foo ()
{
  _Pragma ("omp simd")
  for (int i = 0; i < 64; i++)
    ;
}

Here is a patch that replaces those
      toks = XNEW (cpp_token);
      toks[0] = pfile->directive_result;
lines with
      toks = &pfile->avoid_paste;

2022-02-01  Jakub Jelinek  <jakub@redhat.com>

* directives.c (destringize_and_run): Push &pfile->avoid_paste
instead of a copy of pfile->directive_result for the CPP_PADDING
case.

(cherry picked from commit efc46b550f035281e51c340f73fbc9a79655e852)

store-merging: Fix up a -fcompare-debug bug in get_status_for_store_merging [PR104263]

As mentioned in the PR, the following testcase fails, because the last
stmt of a bb with -g is a debug stmt and get_status_for_store_merging
uses gimple_seq_last_stmt (bb_seq (bb)) when testing if it is valid
for store merging. The debug stmt isn't valid, while a stmt at that
position with -g0 is valid and so the divergence.

As we walk the whole bb already, this patch just remembers the last
non-debug stmt, so that we don't need to skip backwards debug stmts at the
end of the bb to find last real stmt.
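
A minimal sketch of that idea, assuming a hypothetical struct stmt and
last_nondebug_stmt helper rather than the real gimple iterators:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement record; not GCC's gimple.  */
struct stmt
{
  bool is_debug;
  bool valid_for_store_merging;
};

/* Walk the block once and return the last non-debug statement,
   or NULL if the block has none.  */
const struct stmt *
last_nondebug_stmt (const struct stmt *stmts, size_t n)
{
  const struct stmt *last = NULL;
  for (size_t i = 0; i < n; i++)
    if (!stmts[i].is_debug)
      last = &stmts[i];
  return last;
}
```

Because trailing debug statements are never remembered, the same
statement is examined with and without -g.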

2022-01-28 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/104263
* gimple-ssa-store-merging.c (get_status_for_store_merging): For
cfun->can_throw_non_call_exceptions && cfun->eh test whether
last non-debug stmt in the bb is store_valid_for_store_merging_p
rather than last stmt.

* gcc.dg/pr104263.c: New test.

(cherry picked from commit a591c71b41e18e4ff86852a974592af4962aef57)

optabs: Don't create pseudos in prepare_cmp_insn when not allowed [PR102478]

cond traps can be created during ce3 after reload (and e.g. PR103028
recently fixed some ce3 cond trap related bug, so I think often that
works fine and we shouldn't disable cond traps after RA altogether),
but it calls prepare_cmp_insn.  This function can fail, so I don't
see why we couldn't make it work after RA (in most cases it already
just works).  The first hunk is just an optimization which doesn't
make sense after RA, so I've guarded it with can_create_pseudo_p.
The second hunk is just a theoretical case, I don't have a testcase for it.
prepare_cmp_insn has some other spots that can create pseudos, like when
both operands have VOIDmode, or when it is BLKmode comparison, or
not OPTAB_DIRECT, but I think none of that applies to ce3, we punt on
BLKmode earlier, use OPTAB_DIRECT and shouldn't be comparing two
VOIDmode CONST_INTs.

2022-01-21  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/102478
* optabs.c (prepare_cmp_insn): If !can_create_pseudo_p (), don't
force_reg constants and for -fnon-call-exceptions fail if copy_to_reg
would be needed.

* gcc.dg/pr102478.c: New test.

(cherry picked from commit c2d9159717b474f9c06dde4d32b48b87164deb50)

match.pd, optabs: Avoid vectorization of {FLOOR,CEIL,ROUND}_{DIV,MOD}_EXPR [PR102860]

power10 has modv4si3 expander and so vectorizes the following testcase
where Fortran modulo is FLOOR_MOD_EXPR.
optabs_for_tree_code indicates that the optab for all the *_MOD_EXPR
variants is umod_optab or smod_optab, but that isn't true, that optab
actually expands just TRUNC_MOD_EXPR.  For the other tree codes expmed.cc
has code how to adjust the TRUNC_MOD_EXPR into those by emitting some
extra comparisons and conditional updates.  Similarly for *_DIV_EXPR,
except in that case it actually needs both division and modulo.

While it would be possible to handle it in expmed.cc for vectors as well,
we'd need to be sure all the vector operations we need for that are
available, and furthermore we wouldn't account for that in the costing.

So, IMHO it is better to stop pretending those non-truncating (and
non-exact) div/mod operations have an optab.  For GCC 13, we should
IMHO pattern match these in tree-vect-patterns.cc and transform them
to truncating div/mod with follow-up adjustments and let the vectorizer
vectorize that.  As written in the PR, for signed operands:
r = x %[fl] y;
is
r = x % y; if (r && (x ^ y) < 0) r += y;
and
d = x /[fl] y;
is
r = x % y; d = x / y; if (r && (x ^ y) < 0) --d;
and
r = x %[cl] y;
is
r = x % y; if (r && (x ^ y) >= 0) r -= y;
and
d = x /[cl] y;
is
r = x % y; d = x / y; if (r && (x ^ y) >= 0) ++d;
(too lazy to figure out rounding div/mod now).  I'll create a PR
for that.
The patch also extends a match.pd optimization that floor_mod on
unsigned operands is actually trunc_mod.
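
For reference, the signed identities above written out as scalar C
helpers.  The function names are made up for this sketch; callers are
assumed to pass y != 0 and to avoid INT_MIN / -1.

```c
/* Signed floor/ceiling division and modulo built from C's truncating
   / and % operators.  */

int
floor_mod (int x, int y)
{
  int r = x % y;
  if (r && (x ^ y) < 0)   /* Non-zero remainder, operands differ in sign.  */
    r += y;
  return r;
}

int
floor_div (int x, int y)
{
  int r = x % y, d = x / y;
  if (r && (x ^ y) < 0)
    --d;
  return d;
}

int
ceil_mod (int x, int y)
{
  int r = x % y;
  if (r && (x ^ y) >= 0)  /* Non-zero remainder, operands agree in sign.  */
    r -= y;
  return r;
}

int
ceil_div (int x, int y)
{
  int r = x % y, d = x / y;
  if (r && (x ^ y) >= 0)
    ++d;
  return d;
}
```

E.g. floor_div (-7, 2) is -4 with floor_mod (-7, 2) being 1, while
ceil_div (7, 2) is 4 with ceil_mod (7, 2) being -1.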

2022-01-19  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/102860
* match.pd (x %[fl] y -> x % y): New simplification for
unsigned integral types.
* optabs-tree.c (optab_for_tree_code): Return unknown_optab
for {CEIL,FLOOR,ROUND}_{DIV,MOD}_EXPR with VECTOR_TYPE.

* gfortran.dg/pr102860.f90: New test.

(cherry picked from commit ffc7f200adbdf47f14b3594d9b21855c19cf797a)

c++: Fix handling of temporaries with consteval ctors and non-trivial dtors [PR104055]

The following testcase is miscompiled.  We see the constructor is immediate,
in build_over_call we trigger:
          if (obj_arg && is_dummy_object (obj_arg))
            {
              call = build_cplus_new (DECL_CONTEXT (fndecl), call, complain);
              obj_arg = NULL_TREE;
            }
which makes call a TARGET_EXPR with the dtor in TARGET_EXPR_CLEANUP,
but then call cxx_constant_value on it.  In cxx_eval_outermost_constant_expr
it triggers the:
      else if (TREE_CODE (t) != CONSTRUCTOR)
        {
          r = get_target_expr_sfinae (r, tf_warning_or_error | tf_no_cleanup);
          TREE_CONSTANT (r) = true;
        }
which wraps the CONSTRUCTOR r into a new TARGET_EXPR, but one without
dtors (I think we need e.g. the TREE_CONSTANT for the callers),
and finally build_over_call uses that.

The following patch fixes that by using get_target_expr instead
of get_target_expr_sfinae + TREE_CONSTANT (r) = true if t is
a TARGET_EXPR with non-NULL TARGET_EXPR_CLEANUP.

2022-01-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/104055
* constexpr.c (cxx_eval_outermost_constant_expr): If t is a
TARGET_EXPR with TARGET_EXPR_CLEANUP, use get_target_expr rather
than get_target_expr_sfinae with tf_no_cleanup, and don't set
TREE_CONSTANT.

* g++.dg/cpp2a/consteval27.C: New test.

(cherry picked from commit 1a5145f1e3adf8b2ba4ad416a5ddef59a1e34d48)

c++: Silence -Wuseless-cast warnings during move [PR103480]

This is maybe just a shot in the dark, but IMHO we shouldn't be diagnosing
-Wuseless-cast on casts the compiler adds on its own when calling its move
function.  We don't seem to warn when user calls std::move either.
We call move on elinit (*NON_LVALUE_EXPR <(struct C[2] &&) &D.2497->b>)[0]
so it is already an xvalue_p and try to static_cast it to struct C &&.
But we don't warn e.g. on std::move (std::move (whatever)).

Fixed by not doing the static cast and just returning expr from move
if expr is already an xvalue.

2022-01-11  Jakub Jelinek  <jakub@redhat.com>
    Jason Merrill  <jason@redhat.com>

PR c++/103480
* tree.c (move): If expr is xvalue_p, just return expr without
build_static_cast.

* g++.dg/warn/Wuseless-cast2.C: New test.

(cherry picked from commit 6bba184ccbf47368eaea27ee2c1e7b850526640b)

c-family: Fix up -W*conversion on bitwise &/|/^ [PR101537]

The following testcases emit a bogus -Wconversion warning.  This is because
conversion_warning function doesn't handle BIT_*_EXPR (only unsafe_conversion_p
that is called during the default: case does), and that one doesn't handle
SAVE_EXPRs added because the unsigned char & or | operands promoted to int
have side-effects and |= or &= is used.

The patch handles BIT_IOR_EXPR/BIT_XOR_EXPR like the last 2 operands of
COND_EXPR by recursing on the two operands, if either of them doesn't fit
into the narrower type, complain.  BIT_AND_EXPR too, but first it needs to
handle some special cases that unsafe_conversion_p does, namely when one
of the two operands is a constant.

This fixes completely the pr101537.c test and for C also pr103881.c
and doesn't regress anything in the testsuite, for C++ pr103881.c still
emits the bogus warnings.
This is because while the C FE emits in that case a SAVE_EXPR that
conversion_warning can handle already, C++ FE emits
TARGET_EXPR <D.whatever, ...>, something | D.whatever
etc. and conversion_warning handles COMPOUND_EXPR by "recursing" on the
rhs.  To handle that case, we'd need for TARGET_EXPR on the lhs remember
in some hash map the mapping from D.whatever to the TARGET_EXPR and when
we see D.whatever, use corresponding TARGET_EXPR initializer instead.

2022-01-11  Jakub Jelinek  <jakub@redhat.com>

PR c/101537
PR c/103881
gcc/c-family/
* c-warn.c (conversion_warning): Handle BIT_AND_EXPR, BIT_IOR_EXPR
and BIT_XOR_EXPR.
gcc/testsuite/
* c-c++-common/pr101537.c: New test.
* c-c++-common/pr103881.c: New test.

(cherry picked from commit 20e4a5e573e76f4379b353cc736215a5f10cdb84)

c++: Ensure some more that immediate functions aren't gimplified [PR103912]

Immediate functions should never be emitted into assembly, the FE doesn't
genericize them and does various things to ensure they aren't gimplified.
But the following testcase ICEs anyway due to that, because the consteval
function returns a lambda, and operator() of the lambda has
decl_function_context of the consteval function.  cgraphunit.c then
does:
              /* Preserve a functions function context node.  It will
                 later be needed to output debug info.  */
              if (tree fn = decl_function_context (decl))
                {
                  cgraph_node *origin_node = cgraph_node::get_create (fn);
                  enqueue_node (origin_node);
                }
which enqueues the immediate function and then tries to gimplify it,
which results in ICE because it hasn't been genericized.

When I try a similar testcase with constexpr instead of consteval and
static constinit auto instead of auto in main, what happens is that
the functions are gimplified, later ipa.c discovers they aren't reachable
and sets body_removed to true for them (and clears other flags) and we end
up with a debug info which has the foo and bar functions without
DW_AT_low_pc and other code specific attributes, just stuff from its BLOCK
structure and in there the lambda with DW_AT_low_pc etc.

The following patch attempts to emulate that behavior early, so that cgraph
doesn't try to gimplify those and pretends they were already gimplified
and found unused and optimized away.

2022-01-10  Jakub Jelinek  <jakub@redhat.com>

PR c++/103912
* semantics.c (expand_or_defer_fn): For immediate functions, set
node->body_removed to true and clear analyzed, definition and
force_output.
* decl2.c (c_parse_final_cleanups): Ignore immediate functions for
expand_or_defer_fn.

* g++.dg/cpp2a/consteval26.C: New test.

(cherry picked from commit 54fa7daefe35cacf4a933947d1802318da193c01)

ifcvt: Check for asm goto at the end of then_bb/else_bb in ifcvt [PR103908]

On the following testcase, RTL ifcvt sees then_bb
(note 7 6 8 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 8 7 9 3 (set (mem/c:SI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7fdccf5b0cf0 b>) [1 b+0 S4 A32])
        (const_int 1 [0x1])) "pr103908.c":6:7 81 {*movsi_internal}
     (nil))
(jump_insn 9 8 13 3 (parallel [
            (asm_operands/v ("# insn 1") ("") 0 []
                 []
                 [
                    (label_ref:DI 21)
                ] pr103908.c:7)
            (clobber (reg:CC 17 flags))
        ]) "pr103908.c":7:5 -1
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil))
-> 21)
and similarly else_bb (just with a different asm_operands template).
It checks that those basic blocks have a single successor and
uses last_active_insn which intentionally skips over JUMP_INSNs, sees
both basic blocks contain the same set and merges them (or if the
sets are different, attempts some other noce optimization).
But we can't assume that the jump, even when it has only a single successor,
has no side-effects.

The following patch fixes it by punting if test_bb ends with a JUMP_INSN
that isn't onlyjump_p.

2022-01-06  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/103908
* ifcvt.c (bb_valid_for_noce_process_p): Punt on bbs ending with
asm goto.

* gcc.target/i386/pr103908.c: New test.

(cherry picked from commit 80ad67e2af0620d58d57d0406dc22693cf5b8ca9)

libcpp: Fix up ##__VA_OPT__ handling [PR89971]

In the following testcase we incorrectly error about pasting / token
with padding token (which is a result of __VA_OPT__); instead we should
like e.g. for ##arg where arg is empty macro argument clear PASTE_LEFT
flag of the previous token if __VA_OPT__ doesn't add any real tokens
(which can happen either because the macro doesn't have any tokens
passed to ... (i.e. __VA_ARGS__ expands to empty) or when __VA_OPT__
doesn't have any tokens in between ()s).
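
A small illustration of the intended behavior (not the testcase from the
PR; the macro and variable names are invented, and a preprocessor with
__VA_OPT__ support that includes this fix is assumed):

```c
/* Requires a preprocessor with __VA_OPT__ support (C23 / C++20, also
   accepted as an extension by recent GCC in older modes).  */
#define SUFFIXED(name, ...) name ## __VA_OPT__(_extra)

/* No variadic arguments: __VA_OPT__ contributes no tokens, so the
   paste is dropped and this declares `plain'.  */
int SUFFIXED (plain) = 1;

/* With variadic arguments the paste happens and this declares
   `tagged_extra'.  */
int SUFFIXED (tagged, 0) = 2;
```

The first use is the case described above: nothing real follows the ##,
so the left operand has to survive unchanged instead of being pasted with
a padding token.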

2021-12-30 Jakub Jelinek <jakub@redhat.com>

PR preprocessor/89971
libcpp/
* macro.c (replace_args): For ##__VA_OPT__, if __VA_OPT__ expands
to no tokens at all, drop PASTE_LEFT flag from the previous token.
gcc/testsuite/
* c-c++-common/cpp/va-opt-9.c: New test.

(cherry picked from commit 5545d1edcbdb1701443f94dde7ec97c5ce3e1a6c)

shrink-wrapping: Fix up prologue block discovery [PR103860]

The following testcase is miscompiled, because a prologue which
contains subq $8, %rsp instruction is emitted at the start of
a basic block which contains conditional jump that depends on
flags register set in an earlier basic block, the prologue instruction
then clobbers those flags.
Normally this case is checked by can_get_prologue predicate, but this
is done only at the start of the loop. If we update pro later in the
loop (because some bb shouldn't be duplicated) and then don't push
anything further into vec and the vec is already empty (this can happen
when the new pro is already in bb_with bitmask and either has no successors
(that is the case in the testcase where that bb ends with a trap) or
all the successors are already in bb_with), then the loop doesn't iterate
further and can_get_prologue will not be checked.

The following simple patch makes sure we call can_get_prologue even after
the last former iteration when vec is already empty and only break from
the loop afterwards (and only if the updating of pro done because of
!can_get_prologue didn't push anything into vec again).

2021-12-30 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/103860
* shrink-wrap.c (try_shrink_wrapping): Make sure can_get_prologue is
called on pro even if nothing further is pushed into vec.

* gcc.dg/pr103860.c: New test.

(cherry picked from commit 1820137ba624d7eb2004a10f9632498b6bc1696a)

loop-invariant: Fix -fcompare-debug failure [PR103837]

In the following testcase we have a -fcompare-debug failure, because
can_move_invariant_reg doesn't ignore DEBUG_INSNs in its decisions.
In the testcase we have due to uninitialized variable:
  loop_header
    debug_insn using pseudo84
    pseudo84 = invariant
    insn using pseudo84
  end loop
and with -g decide not to move the pseudo84 = invariant before the
loop header; in this case not resetting the debug insns might be fine.
But, we could have also:
  pseudo84 = whatever
  loop_header
    debug_insn using pseudo84
    pseudo84 = invariant
    insn using pseudo84
  end loop
and in that case not resetting the debug insns would result in wrong-debug.
And we generally don't have a good substitute for what pseudo84
contains, as it could inherit various values from different paths.
So, the following patch ignores DEBUG_INSNs in the decisions, and if there
are any that previously prevented the optimization, resets them before
returning true.

2021-12-28  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/103837
* loop-invariant.c (can_move_invariant_reg): Ignore DEBUG_INSNs in
the decisions whether to return false or continue and right before
returning true reset those debug insns that previously caused
returning false.

* gcc.dg/pr103837.c: New test.

(cherry picked from commit 3c5fd3616f73fbcd241cc3a5e09275c2b0c49bd4)

c: Fix ICE on deferred pragma in unknown attribute arguments [PR103587]

We ICE on the following testcase, because c_parser_balanced_token_sequence
when encountering a deferred pragma will just use c_parser_consume_token
which the FE doesn't allow for CPP_PRAGMA tokens (and if that wasn't
the case, it could ICE on CPP_PRAGMA_EOL similarly).
We don't know in what exact context the pragma appears when we don't
know what those arguments semantically mean, so I think we should just
skip over them, like e.g. the C++ FE does. And, I think (/[/{ vs. )/]/}
from outside of the pragma shouldn't be paired with those inside of
the pragma and it doesn't seem to be necessary to check that inside of
the pragma line itself all the paren kinds are balanced.
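
A toy model of that skipping strategy, assuming a hypothetical enum tok
and skip_balanced helper with only one bracket kind (the real parser
works on c_token and handles (), [] and {}):

```c
#include <stddef.h>

/* Hypothetical token kinds; the real front end uses cpp_ttype.  */
enum tok
{
  T_OPEN, T_CLOSE, T_OTHER, T_PRAGMA, T_PRAGMA_EOL, T_EOF
};

/* TOKS[POS] must be T_OPEN.  Return the index just past its matching
   T_CLOSE, or the index at which the input ended.  Everything from
   T_PRAGMA through T_PRAGMA_EOL is skipped without pairing brackets.  */
size_t
skip_balanced (const enum tok *toks, size_t n, size_t pos)
{
  size_t depth = 0;
  while (pos < n && toks[pos] != T_EOF)
    {
      switch (toks[pos])
        {
        case T_OPEN:
          depth++;
          break;
        case T_CLOSE:
          if (--depth == 0)
            return pos + 1;
          break;
        case T_PRAGMA:
          /* Silently skip to the end of the pragma line.  */
          while (pos < n && toks[pos] != T_PRAGMA_EOL && toks[pos] != T_EOF)
            pos++;
          if (pos >= n || toks[pos] == T_EOF)
            return pos;
          break;
        default:
          break;
        }
      pos++;
    }
  return pos;
}
```

An opening or closing bracket that appears on the pragma line therefore
neither opens nor closes anything in the surrounding attribute arguments.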

2021-12-14 Jakub Jelinek <jakub@redhat.com>

PR c/103587
* c-parser.c (c_parser_balanced_token_sequence): For CPP_PRAGMA,
consume the pragma and silently skip to the pragma eol.

* gcc.dg/pr103587.c: New test.

(cherry picked from commit e163dbbc4433e598cad7e6011b255d1d6ad93a3b)

bswap: Fix UB in find_bswap_or_nop_finalize [PR103435]

On gcc.c-torture/execute/pr103376.c in the following code we trigger UB
in the compiler.  n->range is 8 because it is 64-bit load and rsize is 0
because it is a bswap sequence with load and known to be 0:
  /* Find real size of result (highest non-zero byte).  */
  if (n->base_addr)
    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
  else
    rsize = n->range;
The shifts then shift uint64_t by 64 bits.  For this case mask is 0
and we want both *cmpxchg and *cmpnop as 0, the operation can be done as
both nop and bswap and callers will prefer nop.
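
The underlying C rule is that shifting a 64-bit value by 64 or more bits
is undefined, so the full-width case has to be answered separately.  A
sketch of such a guard, with drop_markers being a made-up helper rather
than the code in find_bswap_or_nop_finalize:

```c
#include <stdint.h>

#define BITS_PER_MARKER 8   /* One marker per byte, as in the excerpt.  */

/* Drop the COUNT lowest markers from CMP.  When that would mean
   shifting by the full width of the type, the shift is not performed
   and the result is simply 0.  */
uint64_t
drop_markers (uint64_t cmp, unsigned count)
{
  if (count * BITS_PER_MARKER >= 64)
    return 0;
  return cmp >> (count * BITS_PER_MARKER);
}
```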

2021-11-27  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/103435
* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Avoid UB if
n->range - rsize == 8, just clear both *cmpnop and *cmpxchg in that
case.

(cherry picked from commit 567d5f3d62fba2a23a9e975f7e7c7b61bb67cf24)

openmp: Fix up handling of kind(host) and kind(nohost) in ACCEL_COMPILERs [PR103384]

As the testcase shows, we weren't handling kind(host) and kind(nohost) properly
in the ACCEL_COMPILERs, the code written in there is valid for the host
compiler only, where if we are maybe offloaded, we defer resolution after IPA,
otherwise return 0 for kind(nohost) and accept it for kind(host).  Note,
omp_maybe_offloaded is false after IPA.  If ACCEL_COMPILER is defined, it is
the other way around, but also we know we are after IPA.
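
The decision described above, reduced to a standalone function.  This is
a simplified model: enum kind_result and kind_selector_matches are
hypothetical, ACCEL_COMPILER and omp_maybe_offloaded are replaced by
plain parameters, and "match" here stands for "keep checking the other
selector properties".

```c
#include <stdbool.h>

/* Possible answers: the selector does not match, it matches so far,
   or the decision has to wait until after IPA.  */
enum kind_result { KIND_NO_MATCH = 0, KIND_MATCH = 1, KIND_DEFER = -1 };

/* IS_HOST_KIND is true for kind(host) and false for kind(nohost).  */
enum kind_result
kind_selector_matches (bool is_host_kind, bool accel_compiler,
                       bool maybe_offloaded)
{
  if (accel_compiler)
    /* Always after IPA, and the roles are swapped.  */
    return is_host_kind ? KIND_NO_MATCH : KIND_MATCH;
  if (maybe_offloaded)
    return KIND_DEFER;
  return is_host_kind ? KIND_MATCH : KIND_NO_MATCH;
}
```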

2021-11-24  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/103384
gcc/
* omp-general.c (omp_context_selector_matches): For ACCEL_COMPILER,
return 0 for kind(host) and continue for kind(nohost).
libgomp/
* testsuite/libgomp.c/declare-variant-2.c: New test.

(cherry picked from commit 5bca26742cf3357bf4e20ec97eee4c7f7de17ce0)

openmp: Fix up handling of reduction clauses on the loop construct [PR102431]

We were using unshare_expr and walk_tree_without_duplicate replacement
of the placeholder vars.  The OMP_CLAUSE_REDUCTION_{INIT,MERGE} can contain
other trees that need to be duplicated though, e.g. BLOCKs referenced in
BIND_EXPR(s), or local VAR_DECLs.  This patch uses the inliner code to copy
all of that.  There is a slight complication that those local VAR_DECLs or
placeholders don't have DECL_CONTEXT set, they will get that only when
they are gimplified later on, so this patch sets DECL_CONTEXT for those
temporarily and resets it afterwards.

2021-11-23  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/102431
* gimplify.c (replace_reduction_placeholders): Remove.
(note_no_context_vars): New function.
(gimplify_omp_loop): For OMP_PARALLEL's BIND_EXPR create a new
BLOCK.  Use copy_tree_body_r with walk_tree instead of unshare_expr
and replace_reduction_placeholders for duplication of
OMP_CLAUSE_REDUCTION_{INIT,MERGE} expressions.  Ensure all mentioned
automatic vars have DECL_CONTEXT set to non-NULL before doing so
and reset it afterwards for those vars and their corresponding
vars.

* c-c++-common/gomp/pr102431.c: New test.
* g++.dg/gomp/pr102431.C: New test.

(cherry picked from commit 5e9b973bd60185f221222022f56db7df3d92250e)

fortran, debug: Fix up DW_AT_rank [PR103315]

For DW_AT_rank we were emitting
        .uleb128 0x4    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x1c
        .byte   0x6     # DW_OP_deref
on 64-bit and
        .uleb128 0x4    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x10
        .byte   0x6     # DW_OP_deref
on 32-bit.  I think this is wrong, as dtype.rank field in the descriptor
has unsigned char type, not pointer type nor pointer sized integral.
E.g. if we have a
    REAL :: a(..)
dummy argument, which is passed as a reference to the array descriptor,
we want to evaluate a->dtype.rank.  The above DWARF expressions perform
*(uintptr_t *)(a + 0x1c)
and
*(uintptr_t *)(a + 0x10)
respectively.  The following patch changes those to:
        .uleb128 0x5    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x1c
        .byte   0x94    # DW_OP_deref_size
        .byte   0x1
and
        .uleb128 0x5    # DW_AT_rank
        .byte   0x97    # DW_OP_push_object_address
        .byte   0x23    # DW_OP_plus_uconst
        .uleb128 0x10
        .byte   0x94    # DW_OP_deref_size
        .byte   0x1
which perform
*(unsigned char *)(a + 0x1c)
and
*(unsigned char *)(a + 0x10)
respectively.
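
The difference between the two reads can be reproduced outside of DWARF
with a byte buffer standing in for the descriptor.  The helper names and
the buffer layout are assumptions made for this sketch only.

```c
#include <stdint.h>
#include <string.h>

/* Model of DW_OP_deref: read a pointer-sized value at BASE + OFFSET.  */
uintptr_t
read_pointer_sized (const unsigned char *base, size_t offset)
{
  uintptr_t v;
  memcpy (&v, base + offset, sizeof v);
  return v;
}

/* Model of DW_OP_deref_size 1: read a single byte at BASE + OFFSET.  */
unsigned char
read_one_byte (const unsigned char *base, size_t offset)
{
  return base[offset];
}
```

If the byte at offset 0x1c holds the rank 3 and the following byte is
non-zero, read_one_byte yields 3 while read_pointer_sized also picks up
the neighbouring bytes and yields something else.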

2021-11-21  Jakub Jelinek  <jakub@redhat.com>

PR debug/103315
* trans-types.c (gfc_get_array_descr_info): Use DW_OP_deref_size 1
instead of DW_OP_deref for DW_AT_rank.

(cherry picked from commit da17c304e22ba256eba0b03710aa329115163b08)

c++: Fix up -fstrong-eval-order handling of call arguments [PR70796]

For -fstrong-eval-order (default for C++17 and later) we make sure to
gimplify arguments in the right order, but as the following testcase
shows that is not enough.
The problem is that some lvalues can satisfy the is_gimple_val / fb_rvalue
predicate used by gimplify_arg for is_gimple_reg_type typed expressions,
or is_gimple_lvalue / fb_either used for other types.
E.g. in foo we have:
  C::C (&p,  ++i,  ++i)
before gimplification where i is an automatic int variable and without this
patch gimplify that as:
  i = i + 1;
  i = i + 1;
  C::C (&p, i, i);
which means that the ctor is called with the original i value incremented
by 2 in both arguments, while because the call is CALL_EXPR_ORDERED_ARGS
the first argument should be different.  Similarly in qux we have:
  B::B (&p, TARGET_EXPR <D.2274, *(const struct A &) A::operator++ (&i)>,
        TARGET_EXPR <D.2275, *(const struct A &) A::operator++ (&i)>)
and gimplify it as:
      _1 = A::operator++ (&i);
      _2 = A::operator++ (&i);
      B::B (&p, MEM[(const struct A &)_1], MEM[(const struct A &)_2]);
but because A::operator++ returns the passed in argument, again we have
the same value in both cases due to gimplify_arg doing:
      /* Also strip a TARGET_EXPR that would force an extra copy.  */
      if (TREE_CODE (*arg_p) == TARGET_EXPR)
        {
          tree init = TARGET_EXPR_INITIAL (*arg_p);
          if (init
              && !VOID_TYPE_P (TREE_TYPE (init)))
            *arg_p = init;
        }
which is perfectly fine optimization for calls with unordered arguments,
but breaks the ordered ones.
Lastly, in corge, we have before gimplification:
  D::foo (NON_LVALUE_EXPR <p>, 3,  ++p)
and gimplify it as
  p = p + 4;
  D::foo (p, 3, p);
which is again wrong, because the this argument isn't before the
side-effects but after it.
The following patch adds cp_gimplify_arg wrapper, which if ordered
and is_gimple_reg_type forces non-SSA_NAME is_gimple_variable
result into a temporary, and if ordered, not is_gimple_reg_type
and argument is TARGET_EXPR bypasses the gimplify_arg optimization.
So, in foo with this patch we gimplify it as:
  i = i + 1;
  i.0_1 = i;
  i = i + 1;
  C::C (&p, i.0_1, i);
in qux as:
      _1 = A::operator++ (&i);
      D.2312 = MEM[(const struct A &)_1];
      _2 = A::operator++ (&i);
      B::B (&p, D.2312, MEM[(const struct A &)_2]);
where D.2312 is a temporary and in corge as:
  p.9_1 = p;
  p = p + 4;
  D::foo (p.9_1, 3, p);
The is_gimple_reg_type forcing into a temporary should be really cheap
(I think even at -O0 it should be optimized if there is no modification in
between), the aggregate copies might be more expensive but I think e.g. SRA
or FRE should be able to deal with those if there are no intervening
changes.  But still, the patch tries to avoid those when it is cheaply
provable that nothing bad happens (if no argument following it in the
strong evaluation order has TREE_SIDE_EFFECTS, then even VAR_DECLs
etc. shouldn't be modified after it).  There is also an optimization to
avoid doing that for this or for arguments with reference types as nothing
can modify the parameter values during evaluation of other argument's
side-effects.
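
The foo case can be replayed by hand.  The sketch below is not compiler
output, it merely executes the two gimplified statement sequences quoted
above on a plain int; Args, lower_without_temporary and lower_with_temporary
are hypothetical names.

```cpp
// The two values that end up being passed for "++i, ++i".
struct Args {
  int first;
  int second;
};

// Sequence without the patch: both increments run, then i is read twice.
constexpr Args lower_without_temporary(int i) {
  i = i + 1;
  i = i + 1;
  return Args{i, i};
}

// Sequence with the patch: the first argument is copied into a temporary
// (i.0_1 in the dump) before the second increment runs.
constexpr Args lower_with_temporary(int i) {
  i = i + 1;
  int i_0 = i;
  i = i + 1;
  return Args{i_0, i};
}

static_assert(lower_without_temporary(0).first == 2, "first argument sees both increments");
static_assert(lower_with_temporary(0).first == 1, "first argument is snapshotted in time");
static_assert(lower_with_temporary(0).second == 2, "second argument is unchanged");
```

Starting from i == 0, the first sequence passes (2, 2) while the second
passes (1, 2), which is what ordered argument evaluation requires.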

I've tried if e.g.
  int i = 1;
  return i << ++i;
doesn't suffer from this problem as well, but it doesn't, the FE uses
  SAVE_EXPR <i>, SAVE_EXPR <i> << ++i;
in that case which gimplifies the way we want (temporary in the first
operand).

2021-11-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/70796
* cp-gimplify.c (cp_gimplify_arg): New function.
(cp_gimplify_expr): Use cp_gimplify_arg instead of gimplify_arg,
pass true as last argument to it if there are any following
arguments in strong evaluation order with side-effects.

* g++.dg/cpp1z/eval-order11.C: New test.

(cherry picked from commit a84177aff7ca86f501d6aa5ef407fac5e71f56fb)

lim: Reset flow sensitive info even for pointers [PR103192]

Since 2014, lim has been clearing SSA_NAME_RANGE_INFO for integral SSA_NAMEs
when moving them from conditional contexts inside of a loop into unconditional
code before the loop, but as the miscompilation of gimplify.c shows, we need to
treat pointers the same way: for them too we need to reset whether the pointer
can/can't be null and the recorded pointer alignment.

This fixes
-FAIL: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c (internal compiler error)
-FAIL: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c (test for excess errors)
-UNRESOLVED: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c compilation failed to produce executable
-FAIL: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c (internal compiler error)
-FAIL: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c (test for excess errors)
-UNRESOLVED: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c compilation failed to produce executable
-FAIL: libgomp.c++/target-in-reduction-2.C (internal compiler error)
-FAIL: libgomp.c++/target-in-reduction-2.C (test for excess errors)
-UNRESOLVED: libgomp.c++/target-in-reduction-2.C compilation failed to produce executable
on both x86_64 and i686.

2021-11-17 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/103192
* tree-ssa-loop-im.c (move_computations_worker): Use
reset_flow_sensitive_info instead of manually clearing
SSA_NAME_RANGE_INFO and do it for all SSA_NAMEs, not just ones
with integral types.

(cherry picked from commit 077425c890927eefacb765ab5236060de9859e82)

i386: Fix up x86 atomic_bit_test* expanders for !TARGET_HIMODE_MATH [PR103205]

With !TARGET_HIMODE_MATH, the OPTAB_DIRECT expand_simple_binop calls fail and so
we ICE. We don't really care if they are done promoted in SImode instead.
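
For orientation, this is the kind of source that operates on a 16-bit
(HImode) atomic.  It is a sketch under the assumption that a
"fetch_or (mask) & mask" idiom is what reaches these expanders; pr103205.c
itself is not reproduced here and test_and_set_bit is a hypothetical helper.

```cpp
#include <atomic>
#include <cassert>

// Atomically set bit `bit` in a 16-bit flag word and report whether it was
// already set beforehand.
inline bool test_and_set_bit(std::atomic<unsigned short>& flags, unsigned bit) {
  assert(bit < 16);  // keep the shift within the 16-bit operand
  const unsigned short mask = static_cast<unsigned short>(1u << bit);
  return (flags.fetch_or(mask) & mask) != 0;
}
```

The first call for a given bit returns false and sets the bit, later calls
for the same bit return true.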

2021-11-15 Jakub Jelinek <jakub@redhat.com>

PR target/103205
* config/i386/sync.md (atomic_bit_test_and_set<mode>,
atomic_bit_test_and_complement<mode>,
atomic_bit_test_and_reset<mode>): Use OPTAB_WIDEN instead of
OPTAB_DIRECT.

* gcc.target/i386/pr103205.c: New test.

(cherry picked from commit 625eef42e32e65b3da0e65e23a706d228896d01c)

dwarf2out: Fix up field_byte_offset [PR101378]

For PCC_BITFIELD_TYPE_MATTERS field_byte_offset has quite large code
to deal with it since many years ago (see it e.g. in GCC 3.2, although it
used to be on HOST_WIDE_INTs, then on double_ints, now on offset_ints).
But that code apparently isn't able to cope with members with empty class
types with [[no_unique_address]] attribute, because the empty classes have
non-zero type size but zero decl size and so one can end up from the
computation with negative offset or offset 1 byte smaller than it should be.
For !PCC_BITFIELD_TYPE_MATTERS, we just use
    tree_result = byte_position (decl);
which seems exactly right even for the empty classes or anything which is
not a bitfield (and for which we don't add DW_AT_bit_offset attribute).
So, instead of trying to handle those no_unique_address members in the
current already very complicated code, this limits it to bitfields.
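
A small illustration of the "non-zero type size but zero decl size"
situation, assuming a compiler that honours [[no_unique_address]]; Empty and
Holder are made-up names.  Whether Holder really ends up the size of a single
int is ABI-dependent, so the static_asserts only state what the language
guarantees.

```cpp
// An empty class still has a non-zero size as a complete type.
struct Empty {};

struct Holder {
  [[no_unique_address]] Empty tag;  // may overlap `value`, i.e. occupy no bytes
  int value;
};

static_assert(sizeof(Empty) >= 1, "a complete object type never has size zero");
static_assert(sizeof(Holder) >= sizeof(int), "Holder must at least hold its int");
```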

stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only
bitfields, twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE.

As discussed, this patch uses DECL_BIT_FIELD_TYPE check, because
DECL_BIT_FIELD might be cleared for some bitfields with bitsizes
multiple of BITS_PER_UNIT and e.g.
struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s;
struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t;

int
main ()
{
  s.c = 0x55;
  s.d = 0xaaaa;
  t.c = 0x55;
  t.d = 0xaaaa;
  s.e++;
}
has different debug info with DECL_BIT_FIELD check.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

PR debug/101378
* dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS
handling only for DECL_BIT_FIELD_TYPE decls.

* g++.dg/debug/dwarf2/pr101378.C: New test.

(cherry picked from commit 10db7573014008ff867098206f51012d501ab57b)

openmp: For default(none) ignore variables created by ubsan_create_data [PR64888]

We weren't ignoring the ubsan variables created by c-ubsan.c before gimplification
(others are added later). One way to fix this would be to introduce further
UBSAN_ internal functions and lower them later (sanopt pass) like other ifns;
this patch instead recognizes those magic vars by name/name of type and DECL_ARTIFICIAL
and TYPE_ARTIFICIAL.

2021-10-21 Jakub Jelinek <jakub@redhat.com>

PR middle-end/64888
gcc/c-family/
* c-omp.c (c_omp_predefined_variable): Return true also for
ubsan_create_data created artificial variables.
gcc/testsuite/
* c-c++-common/ubsan/pr64888.c: New test.

(cherry picked from commit 40dd9d839e52f679d8eabc1c5ca0ca17a5ccfd14)

c++: Don't reject calls through PMF during constant evaluation [PR102786]

On the following testcase we incorrectly reject the c initializer.
While in the s.*a case cxx_eval_* sees .__pfn reads etc.,
in the s.*&S::foo case get_member_function_from_ptrfunc creates
expressions which use INTEGER_CSTs with type of pointer to METHOD_TYPE,
and cxx_eval_constant_expression rejects any INTEGER_CSTs with pointer
type if they aren't 0.
Either we'd need to make sure we defer such folding till cp_fold, but the
function and pfn_from_ptrmemfunc are used from lots of places, or, as
the following patch does, reject non-zero INTEGER_CSTs with pointer
types only if they don't point to METHOD_TYPE, in the hope that
all such INTEGER_CSTs with POINTER_TYPE to METHOD_TYPE are the result of
folding valid pointer-to-member function expressions.
I don't immediately see how one could create such INTEGER_CSTs otherwise,
cast of integers to PMF is rejected and would have the PMF RECORD_TYPE
anyway, etc.
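
The two call forms mentioned above look like this.  The sketch is not the
PR testcase: that one lives under cpp2a as constexpr-virtual19.C and so
presumably involves a virtual function, whereas this version uses a
non-virtual member to stay valid C++11 and only shows the syntax of the
s.*a and s.*&S::foo cases.

```cpp
struct S {
  constexpr int foo() const { return 42; }
};

constexpr S s{};

// Call through a named pointer-to-member-function variable.
constexpr int (S::*a)() const = &S::foo;
constexpr int b = (s.*a)();

// Call through an address-of expression used directly.
constexpr int c = (s.*&S::foo)();

static_assert(b == 42, "call through the PMF variable is a constant expression");
static_assert(c == 42, "call through &S::foo is a constant expression");
```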

2021-10-19 Jakub Jelinek <jakub@redhat.com>

PR c++/102786
* constexpr.c (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.

* g++.dg/cpp2a/constexpr-virtual19.C: New test.

(cherry picked from commit f45610a45236e97616726ca042898d6ac46a082e)

openmp: Fix up handling of OMP_PLACES=threads(1)

When writing the places-*.c tests, I've noticed that we mishandle threads
abstract name with specified num-places if num-places isn't a multiple of
number of hw threads in a core. It then happily ignores the maximum count
and overwrites for the remaining hw threads in a core further places that
haven't been allocated.

2021-10-15 Jakub Jelinek <jakub@redhat.com>

* config/linux/affinity.c (gomp_affinity_init_level_1): For level 1
after creating count places clean up and return immediately.
* testsuite/libgomp.c/places-6.c: New test.
* testsuite/libgomp.c/places-7.c: New test.
* testsuite/libgomp.c/places-8.c: New test.

(cherry picked from commit 4764049dd620affcd3e2658dc7f03a6616370a29)

var-tracking: Fix a wrong-debug issue caused by my r10-7665 var-tracking change [PR102441]

Since my r10-7665-g33c45e51b4914008064d9b77f2c1fc0eea1ad060 change, we get
wrong-debug on e.g. the following testcase at -O2 -g on x86_64-linux for the
x parameter:
void bar (int *r);
int
foo (int x)
{
  int r = 0;
  bar (&r);
  return r;
}
At the start of function, we have
        subq    $24, %rsp
        leaq    12(%rsp), %rdi
instructions.  The x parameter is passed in %rdi, but isn't used in the
function and so the leaq instruction overwrites %rdi without remembering
%rdi anywhere.  Before the r10-7665 change (which was trying to fix a large
(3% for 32-bit, 1% for 64-bit x86-64) debug info/loc growth introduced with
r10-7515), the leaq insn above resulted in a MO_VAL_SET micro-operation that
said that the value of sp + 12, a cselib_sp_derived_value_p, is stored into
the %rdi register.  The r10-7665 change added a change to add_stores that
added no micro-operation for the leaq store, with the rationale that the sp
based values can be and will be always computable some other more compact
and primarily more stable way (cfa based expression like DW_OP_fbreg, that
is the same in the whole function).  That is true.  But by throwing the
micro-operation on the floor, we miss another important part of the
MO_VAL_SET, in particular that the destination of the store, %rdi in this
case, now has a different value from what it had before, so the vt_*
dataflow code thinks that even after the leaq instruction %rdi still holds
the x argument value (and changes it to DW_OP_entry_value (%rdi) only in the
middle of the call to bar).  Previously and with the patches below,
the location for x changes already at the end of leaq instruction to
DW_OP_entry_value (%rdi).

My first attempt to fix this was instead of dropping the MO_VAL_SET add
a MO_CLOBBER operation:
--- gcc/var-tracking.c.jj       2021-05-04 21:02:24.196799586 +0200
+++ gcc/var-tracking.c  2021-09-24 19:23:16.420154828 +0200
@@ -6133,7 +6133,9 @@ add_stores (rtx loc, const_rtx expr, voi
     {
       if (preserve)
        preserve_value (v);
-      return;
+      mo.type = MO_CLOBBER;
+      mo.u.loc = loc;
+      goto log_and_return;
     }

   nloc = replace_expr_with_values (oloc);
so don't track that the value lives in the loc destination, but track
that the previous value doesn't live there anymore.  That failed bootstrap
miserably, the vt_* code isn't prepared to see MO_CLOBBER of a MEM that
isn't tracked (e.g. has MEM_EXPR on it that the var-tracking code wants
to track, i.e. track_p in add_stores).  On the other side, thinking about
it more, in the most common case where a cselib_sp_derived_value_p value
is stored into the sp register (and which is the reason why PR94495
testcase got larger), dropping the micro-operation on the floor is the
right thing, because we have that cselib_sp_derived_value_p tracking, any
reads from the sp hard register will be treated as
cselib_sp_derived_value_p.
Then I've tried 3 different patches described below and in the end
what is committed is patch2.
Additionally, I've gathered statistics from cc1plus by always reverting the
var-tracking.c change after finished bootstrap/regtest and rebuilding the
stage3 var-tracking.o and cc1plus, such that it would be comparable.
dwlocstat and .debug_{info,loclists} section sizes detailed below.
patch3 uses MO_VAL_SET (i.e. essentially reversion of the r10-7665
change) when destination is not a REG_P and !track_p, otherwise if
destination is sp drops the micro-operation on the floor (i.e. no change),
otherwise adds a MO_CLOBBER.
patch1 is similar, except it checks for destination not equal to sp and
!track_p, i.e. for !track_p REG_P destinations other than sp it will use
MO_VAL_SET rather than MO_CLOBBER.
Finally, patch2, the shortest patch, uses MO_VAL_SET whenever destination
is not sp and otherwise drops the micro-operation on the floor.
None of the 3 patches affects the PR94495 testcase, all the changes
there were caused by stores of sp based values into %rsp.

While the patch2 (and patch1 which results in exactly the same sizes)
causes the largest debug loclists/info growth from the 3, it is still quite
minor (0.651% on 64-bit and 0.114% on 32-bit) compared
to the 1% and 3% PR94495 was trying to solve, and I actually think it is the
best thing to do.  Because, if we have say
  int q[10];
  int *p = &q[0];
or similar and we load the &q[0] sp based value into some hard register,
by noting in the debug info that p lives in some hard reg for some part
of the function and a user is trying to change the p var in the debugger,
if we say it lives in some register or memory, there is some chance that
the changing of the value could work successfully (of course, nothing
is guaranteed, we don't have tracking of where each var lives at which
moment for changing purposes (i.e. what register, memory or else you need
to change in order to change behavior of the code)), while if we just say
that p's location is DW_OP_fbreg 16 DW_OP_stack_value, that is a read-only
value one can just print but not change.  Now, for stores of variable
values into the sp register, I don't think we have such an issue, you don't
want debugger to change your stack pointer when user asks to change value
of some variable whose value lives in the stack pointer, that would pretty
much always result in misbehavior of the program.
So, my preference from these 3 is patch2 and that is being committed.

64-bit cc1plus
==============
vanilla
cov%    samples cumul
0..10   1064665/37%     1064665/37%
11..20  35972/1%        1100637/38%
21..30  47969/1%        1148606/40%
31..40  45787/1%        1194393/42%
41..50  57529/2%        1251922/44%
51..60  53974/1%        1305896/46%
61..70  112055/3%       1417951/50%
71..80  79420/2%        1497371/52%
81..90  126225/4%       1623596/57%
91..100 1206682/42%     2830278/100%
  [34] .debug_info       PROGBITS        0000000000000000 2f1c74c a44949f 00      0   0  1
  [38] .debug_loclists   PROGBITS        0000000000000000 ff5d046 506e947 00      0   0  1
patch1 (same as patch2)
cov%    samples cumul
0..10   1064685/37%     1064685/37%
11..20  36011/1%        1100696/38%
21..30  47975/1%        1148671/40%
31..40  45799/1%        1194470/42%
41..50  57566/2%        1252036/44%
51..60  54011/1%        1306047/46%
61..70  112068/3%       1418115/50%
71..80  79421/2%        1497536/52%
81..90  126171/4%       1623707/57%
91..100 1206571/42%     2830278/100%
  [34] .debug_info       PROGBITS        0000000000000000 2f1c74c a448f27 00      0   0  1
  [38] .debug_loclists   PROGBITS        0000000000000000 ff608bc 52070dd 00      0   0  1
patch3
cov%    samples cumul
0..10   1064698/37%     1064698/37%
11..20  36018/1%        1100716/38%
21..30  47977/1%        1148693/40%
31..40  45804/1%        1194497/42%
41..50  57562/2%        1252059/44%
51..60  54018/1%        1306077/46%
61..70  112071/3%       1418148/50%
71..80  79424/2%        1497572/52%
81..90  126172/4%       1623744/57%
91..100 1206534/42%     2830278/100%
  [34] .debug_info       PROGBITS        0000000000000000 2f1c74c a449548 00      0   0  1
  [38] .debug_loclists   PROGBITS        0000000000000000 ff5df39 507acd8 00      0   0  1
So, size of .debug_info+.debug_loclists grows for vanilla -> patch1 (or patch2) by
0.651% and for vanilla -> patch3 by 0.020%.

32-bit cc1plus
==============
vanilla
cov%    samples cumul
0..10   1061892/37%     1061892/37%
11..20  34002/1%        1095894/39%
21..30  43513/1%        1139407/40%
31..40  41667/1%        1181074/42%
41..50  59144/2%        1240218/44%
51..60  47009/1%        1287227/45%
61..70  105069/3%       1392296/49%
71..80  72990/2%        1465286/52%
81..90  125988/4%       1591274/56%
91..100 1208726/43%     2800000/100%
  [33] .debug_info       PROGBITS        00000000 351ab10 8b1c83d 00      0   0  1
  [37] .debug_loclists   PROGBITS        00000000 ebc816e 3fe44fd 00      0   0  1
patch1 (same as patch2)
cov%    samples cumul
0..10   1061999/37%     1061999/37%
11..20  34065/1%        1096064/39%
21..30  43557/1%        1139621/40%
31..40  41690/1%        1181311/42%
41..50  59191/2%        1240502/44%
51..60  47143/1%        1287645/45%
61..70  105045/3%       1392690/49%
71..80  73021/2%        1465711/52%
81..90  125885/4%       1591596/56%
91..100 1208404/43%     2800000/100%
  [33] .debug_info       PROGBITS        00000000 351ab10 8b1c597 00      0   0  1
  [37] .debug_loclists   PROGBITS        00000000 ebca915 401ffad 00      0   0  1
patch3
cov%    samples cumul
0..10   1062006/37%     1062006/37%
11..20  34073/1%        1096079/39%
21..30  43559/1%        1139638/40%
31..40  41693/1%        1181331/42%
41..50  59189/2%        1240520/44%
51..60  47142/1%        1287662/45%
61..70  105054/3%       1392716/49%
71..80  73027/2%        1465743/52%
81..90  125874/4%       1591617/56%
91..100 1208383/43%     2800000/100%
  [33] .debug_info       PROGBITS        00000000 351ab10 8b1c690 00      0   0  1
  [37] .debug_loclists   PROGBITS        00000000 ebca40a 4020a6e 00      0   0  1
So, size of .debug_info+.debug_loclists grows for vanilla -> patch1 (or patch2) by
0.114% and for vanilla -> patch3 by 0.116%.
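
The percentages can be recomputed from the section sizes in the readelf
lines above (the second hexadecimal column).  The helper below is just
arithmetic; growth and the constant names are made up for this note.

```cpp
// Relative growth of .debug_info + .debug_loclists between two builds.
constexpr double growth(unsigned long long info_before,
                        unsigned long long loclists_before,
                        unsigned long long info_after,
                        unsigned long long loclists_after) {
  return static_cast<double>((info_after + loclists_after) -
                             (info_before + loclists_before)) /
         static_cast<double>(info_before + loclists_before);
}

// 64-bit cc1plus: vanilla -> patch1/patch2 and vanilla -> patch3.
constexpr double k64Patch1 = growth(0xa44949f, 0x506e947, 0xa448f27, 0x52070dd);
constexpr double k64Patch3 = growth(0xa44949f, 0x506e947, 0xa449548, 0x507acd8);

// 32-bit cc1plus: vanilla -> patch1/patch2 and vanilla -> patch3.
constexpr double k32Patch1 = growth(0x8b1c83d, 0x3fe44fd, 0x8b1c597, 0x401ffad);
constexpr double k32Patch3 = growth(0x8b1c83d, 0x3fe44fd, 0x8b1c690, 0x4020a6e);

static_assert(k64Patch1 > 0.0065 && k64Patch1 < 0.0066, "about 0.651%");
static_assert(k32Patch1 > 0.0011 && k32Patch1 < 0.0012, "about 0.114%");
```

Note that .debug_info itself shrinks slightly with patch1/patch2; the growth
comes from .debug_loclists.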

2021-10-10  Jakub Jelinek  <jakub@redhat.com>

PR debug/102441
* var-tracking.c (add_stores): For cselib_sp_derived_value_p values
use MO_VAL_SET if loc is not sp.

(cherry picked from commit 9583b26f3701ea0456405d84f9a898451a2f7452)

c++: Fix apply_identity_attributes [PR102548]

The following testcase ICEs on x86_64-linux with -m32 due to a bug in
apply_identity_attributes.  The function is being smart and attempts not
to duplicate the chain unnecessarily: if either there are no attributes
that affect type identity or there is a possibly empty set of attributes
that do not affect type identity in the chain followed by attributes
that do affect type identity, it reuses that attribute chain.

The function mishandles the cases where in the chain an attribute affects
type identity and is followed by one or more attributes that don't
affect type identity (and then perhaps some further ones that do).

There are two bugs.  One is that when we notice first attribute that
doesn't affect type identity after first attribute that does affect type
identity (with perhaps some further such attributes in the chain after it),
we want to put into the new chain just attributes starting from
(inclusive) first_ident and up to (exclusive) the current attribute a,
but the code puts into the chain all attributes starting with first_ident,
including the ones that do not affect type identity and if e.g. we have
doesn't0 affects1 doesn't2 affects3 affects4 sequence of attributes, the
resulting sequence would have
affects1 doesn't2 affects3 affects4 affects3 affects4
attributes, i.e. one attribute that shouldn't be there and two attributes
duplicated.  That is fixed by the a2 -> a2 != a change.

The second one is that we ICE once we see second attribute that doesn't
affect type identity after an attribute that affects it.  That is because
first_ident is set to error_mark_node after handling the first attribute
that doesn't affect type identity (i.e. after we've copied the
[first_ident, a) set of attributes to the new chain) to denote that from
that time on, each attribute that affects type identity should be copied
whenever it is seen (the if (as && as->affects_type_identity) code does
that correctly).  But when that condition is false and first_ident is
error_mark_node, we enter else if (first_ident) and use TREE_PURPOSE
/TREE_VALUE/TREE_CHAIN on error_mark_node, which ICEs.  When
first_ident is error_mark_node and a doesn't affect type identity,
we want to do nothing.  So that is the && first_ident != error_mark_node
chunk.
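
The intended result for the example chain can be stated as a plain
order-preserving filter.  The sketch below is a model only: Attr, Kept and
identity_attributes are hypothetical names, and unlike the real function it
always copies instead of sharing an unchanged part of the TREE_LIST chain.

```cpp
#include <cstddef>

struct Attr {
  int id;                 // position in the original chain, for readability
  bool affects_identity;  // stand-in for as->affects_type_identity
};

template <std::size_t N>
struct Kept {
  int ids[N];
  std::size_t count;
};

// Keep exactly the attributes that affect type identity, in their
// original order and without duplicates.
template <std::size_t N>
constexpr Kept<N> identity_attributes(const Attr (&chain)[N]) {
  Kept<N> out{};
  for (std::size_t i = 0; i < N; ++i)
    if (chain[i].affects_identity)
      out.ids[out.count++] = chain[i].id;
  return out;
}

// doesn't0 affects1 doesn't2 affects3 affects4
constexpr Attr kChain[] = {
    {0, false}, {1, true}, {2, false}, {3, true}, {4, true}};
constexpr Kept<5> kKept = identity_attributes(kChain);

static_assert(kKept.count == 3, "affects1 affects3 affects4, nothing else");
```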

2021-10-05  Jakub Jelinek  <jakub@redhat.com>

PR c++/102548
* tree.c (apply_identity_attributes): Fix handling of the
case where an attribute in the list doesn't affect type
identity but some attribute before it does.

* g++.target/i386/pr102548.C: New test.

(cherry picked from commit 737f95bab557584d876f02779ab79fe3cfaacacf)

ubsan: Use -f{,no-}sanitize-recover=float-divide-by-zero for float division by zero recovery [PR102515]

We've been using
-f{,no-}sanitize-recover=integer-divide-by-zero to decide on the float
-fsanitize=float-divide-by-zero instrumentation _abort suffix.
This patch fixes it to use -f{,no-}sanitize-recover=float-divide-by-zero
for it instead.

2021-10-01 Jakub Jelinek <jakub@redhat.com>
Richard Biener <rguenther@suse.de>

PR sanitizer/102515
gcc/c-family/
* c-ubsan.c (ubsan_instrument_division): Check the right
flag_sanitize_recover bit, depending on which sanitization
is done.
gcc/testsuite/
* c-c++-common/ubsan/float-div-by-zero-2.c: New test.

(cherry picked from commit 9c1a633d96926357155d4702b66f8a0ec856a81f)

i386: Don't emit fldpi etc. if -frounding-math [PR102498]

i387 has instructions to store some transcendental numbers into the top of
stack.  The problem is that what exact bit in the last place one gets for
those depends on the current rounding mode, as the CPU knows the number with
slightly higher precision.  The compiler assumes rounding to nearest when
comparing them against constants in the IL, but at runtime the rounding
can be different and so, depending on the rounding mode and the constant,
some of these could be 1 ulp higher or smaller than expected.
We only support changing the rounding mode at runtime if the non-default
-frounding-math option is used, so the following patch just disables
using those constants if that flag is on.

2021-09-28  Jakub Jelinek  <jakub@redhat.com>

PR target/102498
* config/i386/i386.c (standard_80387_constant_p): Don't recognize
special 80387 instruction XFmode constants if flag_rounding_math.

* gcc.target/i386/pr102498.c: New test.

(cherry picked from commit 3b7041e8345c2f1030e58620f28e22d64b2c196b)

c++: Fix handling of decls with flexible array members initialized with side-effects [PR88578]

> > Note, if the flexible array member is initialized only with non-constant
> > initializers, we have a worse bug that this patch doesn't solve, the
> > splitting of initializers into constant and dynamic initialization removes
> > the initializer and we don't have just wrong DECL_*SIZE, but nothing is
> > emitted when emitting those vars into assembly either and so the dynamic
> > initialization clobbers other vars that may overlap the variable.
> > I think we need keep an empty CONSTRUCTOR elt in DECL_INITIAL for the
> > flexible array member in that case.
>
> Makes sense.

So, the following patch fixes that.

The typeck2.c change makes sure we keep those CONSTRUCTORs around (although
they should be empty because all their elts had side-effects/were
non-constant if they were removed earlier), and the varasm.c change is to avoid
ICEs on those as well as ICEs on other flex array members that had some
initializers without side-effects, but not on the last array element.

The code was already asserting that the (index of the last elt in the
CONSTRUCTOR + 1) times elt size is equal to TYPE_SIZE_UNIT of the local->val
type, which is true for C flex arrays or for C++ if they don't have any
side-effects or the last elt doesn't have side-effects; this patch changes
that to an assertion that the TYPE_SIZE_UNIT is greater than or equal to the
offset of the end of last element in the CONSTRUCTOR and uses TYPE_SIZE_UNIT
(int_size_in_bytes) in the code later on.

2021-09-15 Jakub Jelinek <jakub@redhat.com>

PR c++/88578
PR c++/102295
gcc/
* varasm.c (output_constructor_regular_field): Instead of assertion
that array_size_for_constructor result is equal to size of
TREE_TYPE (local->val) in bytes, assert that the type size is greater
or equal to array_size_for_constructor result and use type size as
fieldsize.
gcc/cp/
* typeck2.c (split_nonconstant_init_1): Don't throw away empty
initializers of flexible array members if they have non-zero type
size.
gcc/testsuite/
* g++.dg/ext/flexary39.C: New test.
* g++.dg/ext/flexary40.C: New test.

(cherry picked from commit e5d1af8a07ae9fcc40ea5c781c3ad46d20ea12a6)

c++: Update DECL_*SIZE for objects with flexible array members with initializers [PR102295]

The C FE has been updating DECL_*SIZE for vars which have initializers for
flexible array members for many years, but C++ FE kept DECL_*SIZE the same as the
type size (i.e. as if there were zero elements in the flexible array
member). This results e.g. in ELF symbol sizes being too small.

Note, if the flexible array member is initialized only with non-constant
initializers, we have a worse bug that this patch doesn't solve, the
splitting of initializers into constant and dynamic initialization removes
the initializer and we don't have just wrong DECL_*SIZE, but nothing is
emitted when emitting those vars into assembly either and so the dynamic
initialization clobbers other vars that may overlap the variable.
I think we need keep an empty CONSTRUCTOR elt in DECL_INITIAL for the
flexible array member in that case.

2021-09-14 Jakub Jelinek <jakub@redhat.com>

PR c++/102295
* decl.c (layout_var_decl): For aggregates ending with a flexible
array member, add the size of the initializer for that member to
DECL_SIZE and DECL_SIZE_UNIT.

* g++.target/i386/pr102295.C: New test.

(cherry picked from commit 818c505188ff5cd8eb048eb0e614c4ef732225bd)

c++: Fix __is_*constructible/assignable for templates [PR102305]

is_xible_helper returns error_mark_node (i.e. false from the traits)
for abstract classes by testing ABSTRACT_CLASS_TYPE_P (to) early.
Unfortunately, as the testcase shows, that doesn't work on class templates
that haven't been instantiated yet, ABSTRACT_CLASS_TYPE_P for them is false
until it is instantiated, which is done when the routine later constructs
a dummy object with that type.

The following patch fixes this by calling complete_type first, so that
ABSTRACT_CLASS_TYPE_P test will work properly, while keeping the handling
of arrays with unknown bounds, or incomplete types where it is done
currently.
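
A reduced illustration of the situation, not the actual pr102305.C: Shape is
a made-up class template whose specializations are abstract, and the traits
are queried on a specialization that nothing else has instantiated yet.

```cpp
#include <type_traits>

// Every specialization of Shape is abstract because area() is pure virtual.
template <typename T>
struct Shape {
  virtual ~Shape() = default;
  virtual T area() const = 0;
};

// Shape<int> has not been instantiated before these queries, so the trait
// itself has to complete the type before it can tell that it is abstract.
static_assert(!std::is_constructible<Shape<int>>::value,
              "an abstract class is not default constructible");
static_assert(!std::is_constructible<Shape<int>, const Shape<int>&>::value,
              "nor copy constructible");
```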

2021-09-14 Jakub Jelinek <jakub@redhat.com>

PR c++/102305
* method.c (is_xible_helper): Call complete_type on to.

* g++.dg/cpp0x/pr102305.C: New test.

(cherry picked from commit f008fd3a480e3718436156697ebe7eeb47841457)

i386: Fix up @xorsign<mode>3_1 [PR102224]

As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
operands are in the same register, because the splitter overwrites op1
with op1 & mask before using op0.

For dest = xorsign op0, op0 we can actually simplify it from
dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).
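
Both the simplification and the miscompilation can be checked on bit
patterns.  This is a sketch assuming IEEE single precision with the sign in
bit 31 and a split that consists of exactly the two steps described above;
the function names are made up.

```cpp
#include <cstdint>

constexpr std::uint32_t kSignMask = 0x80000000u;

// xorsign on raw float bits: flip the sign of op0 if op1 is negative.
constexpr std::uint32_t xorsign_bits(std::uint32_t op0, std::uint32_t op1) {
  return op0 ^ (op1 & kSignMask);
}

// abs on raw float bits: clear the sign bit.
constexpr std::uint32_t abs_bits(std::uint32_t op0) {
  return op0 & ~kSignMask;
}

// Model of the split when op0 and op1 live in the same register.
constexpr std::uint32_t split_with_shared_register(std::uint32_t reg) {
  reg &= kSignMask;  // op1 = op1 & mask, which also clobbers op0
  reg ^= reg;        // dest = op1 ^ op0, but op0 no longer holds the input
  return reg;
}

// 0xBF800000 is -1.0f, 0x3F800000 is 1.0f.
static_assert(xorsign_bits(0xBF800000u, 0xBF800000u) == abs_bits(0xBF800000u),
              "xorsign op0, op0 equals abs op0");
static_assert(split_with_shared_register(0xBF800000u) == 0u,
              "the shared-register split loses the value entirely");
```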

The expander change is an optimization improvement, if we at expansion
time know it is xorsign op0, op0, we can emit abs right away and get better
code through that.

The @xorsign<mode>3_1 is a fix for the case where xorsign wouldn't be known
to have same operands during expansion, but during RTL optimizations they
would appear. We need to use earlyclobber: we require dest and op1 to be
the same, but op0 must be different because we overwrite
op1 first.

2021-09-08 Jakub Jelinek <jakub@redhat.com>

PR target/102224
* config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
operands[2], emit abs<mode>2 instead.
(@xorsign<mode>3_1): Add early-clobber for output operand.

* gcc.dg/pr102224.c: New test.
* gcc.target/i386/avx-pr102224.c: New test.

(cherry picked from commit a7b626d98a9a821ffb33466818d6aa86cac1d6fd)

dwarf2out: Emit DW_AT_location for global register vars during early dwarf [PR101905]

The following patch emits DW_AT_location for global register variables
already during early dwarf, since usually late_global_decl hook isn't even
called for those, as nothing needs to be emitted for them.

2021-08-23 Jakub Jelinek <jakub@redhat.com>

PR debug/101905
* dwarf2out.c (gen_variable_die): Add DW_AT_location for global
register variables already during early_dwarf if possible.

* gcc.dg/guality/pr101905.c: New test.

(cherry picked from commit b284053bb75661fc1bf13c275f3ba5364bb17608)

ubsan: Fix ICEs with DECL_REGISTER tests [PR101624]

The following testcase ICEs, because the base is a CONST_DECL for
the Fortran parameter, and ubsan/sanopt uses DECL_REGISTER macro on it.
/* In VAR_DECL and PARM_DECL nodes, nonzero means declared `register'. */
#define DECL_REGISTER(NODE) (DECL_WRTL_CHECK (NODE)->decl_common.decl_flag_0)
while CONST_DECL doesn't satisfy DECL_WRTL_CHECK.

The following patch checks explicitly for VAR_DECL/PARM_DECL/RESULT_DECL
only before using DECL_REGISTER, and assumes other decls aren't DECL_REGISTER.
Not really sure about RESULT_DECL but it at least satisfies DECL_WRTL_CHECK...

2021-07-28 Jakub Jelinek <jakub@redhat.com>

PR middle-end/101624
* ubsan.c (maybe_instrument_pointer_overflow,
instrument_object_size): Only test DECL_REGISTER on VAR_DECLs,
PARM_DECLs or RESULT_DECLs.
* sanopt.c (maybe_optimize_ubsan_ptr_ifn): Likewise.

* gfortran.dg/ubsan/ubsan.exp: New file.
* gfortran.dg/ubsan/pr101624.f90: New test.

(cherry picked from commit 49e28c02a95a4bee981e69a80950309869580151)

expmed: Fix store_integral_bit_field [PR101562]

Our documentation says that paradoxical subregs shouldn't appear
in strict_low_part:
'(strict_low_part (subreg:M (reg:N R) 0))'
     This expression code is used in only one context: as the
     destination operand of a 'set' expression.  In addition, the
     operand of this expression must be a non-paradoxical 'subreg'
     expression.
but on the testcase below that triggers UB at runtime
store_integral_bit_field emits exactly that.

The following patch fixes it by ensuring the requirement is satisfied.

2021-07-23  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/101562
* expmed.c (store_integral_bit_field): Only use movstrict_optab
if the operand isn't paradoxical.

* gcc.c-torture/compile/pr101562.c: New test.

(cherry picked from commit 8408d34570c9fe9f3d22a25a76df2a4c64f08477)

openmp: Fix up omp_check_private [PR101535]

The target data construct shouldn't affect omp_check_private, unless
the decl there is privatized (use_device_* clauses).  The routine
had some code for that, but it just did continue; in a loop that looped
only if the region type is one of selected 4 kinds, so effectively resulted
in return false; instead of looping again.  And not diagnosing lastprivate
(or reduction etc.) on a variable that is private to containing parallel
results in ICEs later on, as there is no original list item in which to store
the last result.
The target construct is unclear as it has an implicit parallel region
and it is not obvious if the data privatization clauses on the construct
shall be treated as data privatization on the implicit parallel or just
on the target.  For now treat those as privatization on the implicit
parallel, but treat map clauses as shared on the implicit parallel.

2021-07-21  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/101535
* gimplify.c (omp_check_private): Properly skip ORT_TARGET_DATA
contexts in which decl isn't privatized and for ORT_TARGET return
false if decl is mapped.

* c-c++-common/gomp/pr101535-1.c: New test.
* c-c++-common/gomp/pr101535-2.c: New test.

(cherry picked from commit b136b7a78774107943fe94051c42b5a968a3ad3f)

c++: Ensure OpenMP reduction with reference type references complete type [PR101516]

The following testcase ICEs because, when the reduction decl has reference
type, we haven't verified that TREE_TYPE of the reference is a complete type;
require_complete_type on the decl doesn't ensure that.

2021-07-21 Jakub Jelinek <jakub@redhat.com>

PR c++/101516
* semantics.c (finish_omp_reduction_clause): Also call
complete_type_or_else and return true if it fails.

* g++.dg/gomp/pr101516.C: New test.

(cherry picked from commit aea199f96cf116ba4c81426207acde371556610c)

rs6000: Fix up easy_vector_constant_msb handling [PR101384]

The following gcc.dg/pr101384.c testcase is miscompiled on
powerpc64le-linux.
easy_altivec_constant has code to try to construct vector constants with
different element sizes, perhaps different from CONST_VECTOR's mode. But as
written, that works fine for vspltis[bhw] cases, but not for the vspltisw
x,-1; vsl[bhw] x,x,x case, because that creates always a V16QImode, V8HImode
or V4SImode constant containing broadcasted constant with just the MSB set.
The vspltis_constant function etc. expects the vspltis[bhw] instructions
where the small [-16..15] or even [-32..30] constant is sign-extended to the
remaining step bytes, but that is not the case for the 0x80...00 constants,
with step 1 we can't handle e.g.
{ 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff }
vectors but do want to handle e.g.
{ 0, 0, 0, 0x80, 0, 0, 0, 0x80, 0, 0, 0, 0x80, 0, 0, 0, 0x80 }
and similarly with copies 1 we do want to handle e.g.
{ 0x80808080, 0x80808080, 0x80808080, 0x80808080 }.

This is a simpler version of the fix for backports, which limits the EASY_VECTOR_MSB case
matching to step == 1 && copies == 1, because that is the only case the
splitter handles correctly.
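
As a rough illustration of the arithmetic behind the vspltisw x,-1;
vsl[bhw] x,x,x sequence, the scalar sketch below models a single element:
it starts as all ones and is shifted left by the low bits of its own value,
which leaves only the MSB set.  The helper names are hypothetical and this
is only a model of one lane, not rs6000 code.

```c
#include <stdint.h>

/* Model of one element after "vspltisw x,-1; vsl[bhw] x,x,x": the element
   is all ones and the shift count is the low log2(width) bits of the
   element itself.  Helper names are made up for this sketch.  */

static uint8_t
msb_elt8 (void)
{
  uint8_t x = 0xff;
  return (uint8_t) (x << (x & 7));      /* shift count is 7 */
}

static uint16_t
msb_elt16 (void)
{
  uint16_t x = 0xffff;
  return (uint16_t) (x << (x & 15));    /* shift count is 15 */
}

static uint32_t
msb_elt32 (void)
{
  uint32_t x = 0xffffffffu;
  return x << (x & 31);                 /* shift count is 31 */
}
```

Each helper yields a value with just the top bit of the element set, i.e.
the broadcast 0x80...00 constant described above, with no sign extension
into neighbouring bytes.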

2021-07-20 Jakub Jelinek <jakub@redhat.com>

PR target/101384
* config/rs6000/rs6000.c (vspltis_constant): Accept EASY_VECTOR_MSB
only if step and copies are equal to 1.

* gcc.dg/pr101384.c: New test.

(cherry picked from commit dc386b020869ad0095cf58f8c76a40ea457e7a2c)

openmp - Fix up && and || reductions [PR94366]

As the testcase shows, the special treatment of && and || reduction combiners
where we expand them as omp_out = (omp_out != 0) && (omp_in != 0) (or with ||)
is not needed just for &&/|| on floating point or complex types, but for all
&&/|| reductions - when expanded as omp_out = omp_out && omp_in (not in C but
GENERIC) it is actually gimplified into NOP_EXPRs to bool from both operands,
which turns non-zero values that are multiples of 2 into 0 rather than 1.

This patch just treats all &&/|| the same and furthermore uses bool type
instead of int for the comparisons.
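
A small C model of the difference, assuming the conversion to bool simply
keeps the lowest bit (the helpers are hypothetical and only simulate the two
expansions, they are not what omp-low.c emits):

```c
/* Simulates a truncating conversion to a 1-bit boolean: keeps bit 0 only.
   This is an assumption made for illustration.  */
static int
truncate_to_bool (int v)
{
  return v & 1;
}

/* Combiner expanded as omp_out = omp_out && omp_in with both operands
   merely truncated to bool.  */
static int
combine_truncating (int omp_out, int omp_in)
{
  return truncate_to_bool (omp_out) && truncate_to_bool (omp_in);
}

/* Combiner expanded as omp_out = (omp_out != 0) && (omp_in != 0).  */
static int
combine_compare (int omp_out, int omp_in)
{
  return (omp_out != 0) && (omp_in != 0);
}
```

With omp_out = 2 and omp_in = 4 the truncating variant gives 0 while the
comparing variant gives 1, which is the result a && reduction should have.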

2021-07-01 Jakub Jelinek <jakub@redhat.com>

PR middle-end/94366
gcc/
* omp-low.c (lower_rec_input_clauses): Rename is_fp_and_or to
is_truth_op, set it for TRUTH_*IF_EXPR regardless of new_var's type,
use boolean_type_node instead of integer_type_node as NE_EXPR type.
(lower_reduction_clauses): Likewise.
libgomp/
* testsuite/libgomp.c-c++-common/pr94366.c: New test.

(cherry picked from commit 91c771ec8a3b649765de3e0a7b04cf946c6649ef)

OpenMP: Support complex/float in && and || reduction

C/C++ permit logical AND and logical OR also with floating-point or complex
arguments by doing an unequal zero comparison; the result is an 'int' with
value one or zero. Hence, those are also permitted as reduction variable,
even though it is not the most sensible thing to do.
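
A minimal sketch of the language rule being relied on (plain C, no OpenMP
involved; the function names are made up for the example):

```c
#include <complex.h>

/* && on floating-point operands: each operand is compared unequal to zero
   and the result has type int.  */
static int
both_nonzero_fp (double a, double b)
{
  return a && b;
}

/* || on complex operands works the same way; a complex value is "true"
   if it compares unequal to zero.  */
static int
either_nonzero_cplx (double complex a, double complex b)
{
  return a || b;
}
```

The results are always the int values 0 or 1, so such variables can serve as
reduction list items the same way integers do.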

gcc/c/ChangeLog:

* c-typeck.c (c_finish_omp_clauses): Accept float + complex
for || and && reductions.

gcc/cp/ChangeLog:

* semantics.c (finish_omp_reduction_clause): Accept float + complex
for || and && reductions.

gcc/ChangeLog:

* omp-low.c (lower_rec_input_clauses, lower_reduction_clauses): Handle
&& and || with floating-point and complex arguments.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/clause-1.c: Use 'reduction(&:..)' instead of '...(&&:..)'.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/reduction-1.c: New test.
* testsuite/libgomp.c-c++-common/reduction-2.c: New test.
* testsuite/libgomp.c-c++-common/reduction-3.c: New test.

(cherry picked from commit 1580fc764423bf89e9b853aaa8c65999e37ccb8b)

c++: Optimize away NULLPTR_TYPE comparisons [PR101443]

Comparisons of NULLPTR_TYPE operands cause all kinds of problems in the
middle-end and in fold-const.c, various optimizations assume that if they
see e.g. a non-equality comparison with one of the operands being
INTEGER_CST and it is not INTEGRAL_TYPE_P (which has TYPE_{MIN,MAX}_VALUE),
they can build_int_cst (type, 1) to find a successor.

The following patch fixes it by making sure they don't appear in the IL,
optimize them away at cp_fold time as all can be folded.

Though, I've just noticed that clang++ rejects the non-equality comparisons
instead, foo () > 0 with
invalid operands to binary expression ('decltype(nullptr)' (aka 'nullptr_t') and 'int')
and foo () > nullptr with
invalid operands to binary expression ('decltype(nullptr)' (aka 'nullptr_t') and 'nullptr_t')

Shall we reject those too, in addition or instead of parts of this patch?
If so, wouldn't this patch be still useful for backports, I bet we don't
want to start rejecting it on the release branches when we used to accept it.

2021-07-15 Jakub Jelinek <jakub@redhat.com>

PR c++/101443
* cp-gimplify.c (cp_fold): For comparisons with NULLPTR_TYPE
operands, fold them right away to true or false.

* g++.dg/cpp0x/nullptr46.C: New test.

(cherry picked from commit 7094a69bd62a14dfa311eaa2fea468f221c7c9f3)

godump: Fix -fdump-go-spec= reproducibility issue [PR101407]

pot_dummy_types is a hash_set from whose traversal the code prints some type
lines. hash_set normally uses default_hash_traits which for pointer types
(the hash set hashes const char *) uses pointer_hash which hashes the
addresses of the pointers except for the least significant 3 bits.
With address space randomization, that results in non-determinism in the
-fdump-go-spec= generated file, each invocation can have different order of
the lines emitted from pot_dummy_types traversal.

This patch fixes it by hashing the string contents instead to make the
hashes reproducible.
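
To illustrate the idea only, the sketch below contrasts an address-based
hash with a content-based one.  The content hash here is FNV-1a, chosen for
brevity; it is not the hash function godump.c uses, and both helper names
are hypothetical.

```c
#include <stdint.h>

/* Content-based hash: FNV-1a over the bytes of a NUL-terminated string.
   Equal strings hash equally no matter where they are stored.  */
static uint32_t
hash_string_contents (const char *s)
{
  uint32_t h = 2166136261u;
  for (; *s; s++)
    {
      h ^= (unsigned char) *s;
      h *= 16777619u;
    }
  return h;
}

/* Address-based hash in the spirit of pointer hashing: the address with
   the least significant 3 bits dropped.  Its value depends on where the
   string happens to live, so it varies from run to run under ASLR.  */
static uintptr_t
hash_pointer (const void *p)
{
  return (uintptr_t) p >> 3;
}
```

Two separate copies of the same type name get the same content hash, so the
traversal order of a set keyed this way no longer depends on addresses.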

2021-07-14 Jakub Jelinek <jakub@redhat.com>

PR go/101407
* godump.c (godump_str_hash): New type.
(godump_container::pot_dummy_types): Use string_hash instead of
ptr_hash in the hash_set.

(cherry picked from commit 3be762c2ed79e36b9c8faaea2be04725c967a34e)

libgomp: Don't include limits.h inside of hidden visibility block

sem.h is included in between # pragma GCC visibility push(hidden)
and # pragma GCC visibility pop and includes limits.h there, which
since the introduction of sysconf declaration in recent glibcs
in there causes trouble.  libgomp assumes it is compiled by gcc,
so we don't really need to include limits.h there and can use
-__INT_MAX__ - 1 instead (which clang and icc support too for years).
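
A quick way to convince oneself the replacement is equivalent, assuming a
compiler that predefines __INT_MAX__ (the macro name SEM_WAIT_VALUE is made
up here; limits.h is included only to compare against INT_MIN):

```c
#include <limits.h>

/* Smallest int value spelled without limits.h, using the compiler's
   predefined __INT_MAX__.  */
#define SEM_WAIT_VALUE (-__INT_MAX__ - 1)

/* Returns nonzero if the predefined-macro spelling matches INT_MIN.  */
static int
sem_wait_matches_int_min (void)
{
  return SEM_WAIT_VALUE == INT_MIN;
}
```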

2021-07-13  Jakub Jelinek  <jakub@redhat.com>
    Florian Weimer  <fweimer@redhat.com>

* config/linux/sem.h: Don't include limits.h.
(SEM_WAIT): Define to -__INT_MAX__ - 1 instead of INT_MIN.
* config/linux/affinity.c: Include limits.h.

(cherry picked from commit 42f10ba5b57250506d69a0391ea7771c843ea286)

dwarf2out: Handle COMPOUND_LITERAL_EXPR in loc_list_from_tree_1 [PR101266]

In this case dwarf2out_decl is called from the FEs with GENERIC but not
yet gimplified expressions in it.

As loc_list_from_tree_1 has an exhaustive list of tree codes it wants to
handle and, with checking enabled, asserts that no other codes make it in,
we should handle even GENERIC trees that shouldn't be valid in GIMPLE.

The following patch handles COMPOUND_LITERAL_EXPR by handling it like the
underlying VAR_DECL temporary.

Verified the emitted DWARF is correct (but unoptimized, we emit
DW_OP_lit1 DW_OP_lit1 DW_OP_minus for the upper bound).

2021-07-01 Jakub Jelinek <jakub@redhat.com>

PR debug/101266
* dwarf2out.c (loc_list_from_tree_1): Handle COMPOUND_LITERAL_EXPR.

* gcc.dg/pr101266.c: New test.

(cherry picked from commit b0ab968999c9af88d45acf552ca673ef3960306a)

match.pd: Avoid (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST opt in GENERIC when sanitizing [PR101210]

When we have (intptr_t) x == cst where x has REFERENCE_TYPE, this
optimization creates x == cst out of it where cst has REFERENCE_TYPE.
If it is done in GENERIC folding, it can result in ubsan failures
where the INTEGER_CST with REFERENCE_TYPE is instrumented.

Fixed by deferring it to GIMPLE folding in this case.

2021-06-29 Jakub Jelinek <jakub@redhat.com>

PR c++/101210
* match.pd ((intptr_t)x eq/ne CST to x eq/ne (typeof x) CST): Don't
perform the optimization in GENERIC when sanitizing and x has a
reference type.

* g++.dg/ubsan/pr101210.C: New test.

(cherry picked from commit 53fd7544aff6d0a18869017cb9bb921a7f5dcd04)

c: Fix up c_parser_has_attribute_expression [PR101176]

This function keeps src_range member of the result uninitialized, which at
least under valgrind can show up later when those uninitialized location_t's
can make it into the IL or location_t hash tables.

2021-06-24 Jakub Jelinek <jakub@redhat.com>

PR c/101176
* c-parser.c (c_parser_has_attribute_expression): Set source range for
the result.

(cherry picked from commit 178fb8df9315f2f8f45b7fe5faf11a9c2912cc28)

c: Fix C cast error-recovery [PR101171]

The following testcase ICEs during error-recovery, as build_c_cast calls
note_integer_operands on error_mark_node and that wraps it into
C_MAYBE_CONST_EXPR which is unexpected and causes ICE later on.
Seems most other callers of note_integer_operands check early if something
is error_mark_node and return before calling note_integer_operands on it.

The following patch fixes it by not calling on error_mark_node, another
possibility would be to handle error_mark_node in note_integer_operands and
just return it.

2021-06-24 Jakub Jelinek <jakub@redhat.com>

PR c/101171
* c-typeck.c (build_c_cast): Don't call note_integer_operands on
error_mark_node.

* gcc.dg/pr101171.c: New test.

(cherry picked from commit fdc5522fb04b4a820b28c4d1f16f54897f5978de)

openmp: Fix up *_reduction clause handling with UDRs on PARM_DECLs [PR101167]

The following testcase FAILs, because the UDR combiner is invoked incorrectly.
lower_omp_rec_clauses expects that when it sets
DECL_VALUE_EXPR/DECL_HAS_VALUE_EXPR_P
for both the placeholder and the var that everything will be properly
regimplified, but as the variable in question is a PARM_DECL rather than
VAR_DECL, lower_omp_regimplify_p doesn't say that it should be regimplified
and so it is not.

2021-06-23 Jakub Jelinek <jakub@redhat.com>

PR middle-end/101167
* omp-low.c (lower_omp_regimplify_p): Regimplify also PARM_DECLs
and RESULT_DECLs that have DECL_HAS_VALUE_EXPR_P set.

* testsuite/libgomp.c-c++-common/task-reduction-15.c: New test.

(cherry picked from commit 679506c3830ea1a93c755413609bfac3538e2cbd)

inline-asm: Fix ICE with bitfields in "m" operands [PR100785]

Bitfields, while they live in memory, aren't something inline-asm can easily
operate on.
For C and "=m" or "+m", we were diagnosing bitfields in the past in the
FE, where c_mark_addressable had:
      case COMPONENT_REF:
        if (DECL_C_BIT_FIELD (TREE_OPERAND (x, 1)))
          {
            error
              ("cannot take address of bit-field %qD", TREE_OPERAND (x, 1));
            return false;
          }
but that check got moved in GCC 6 to build_unary_op instead and now we
emit an error during expansion and ICE afterwards (i.e. error-recovery).
For "m" it used to be diagnosed in c_mark_addressable too, but since
GCC 6 it is ice-on-invalid.
For C++, this was never diagnosed in the FE, but used to be diagnosed
in the gimplifier and/or during expansion before 4.8.

The following patch does multiple things:
1) diagnoses it in the FEs
2) simplifies during expansion the inline asm if any errors have been
   reported (similarly how e.g. vregs pass if it detects errors on
   inline-asm either deletes them or simplifies to bare minimum -
   just labels), so that we don't have error-recovery ICEs there
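
For user code that trips the new diagnostic, one possible rewrite is to go
through an ordinary addressable temporary.  The sketch below assumes GNU
inline asm syntax and uses an empty asm template as a stand-in for real
instructions; the struct and function names are made up.

```c
struct flags
{
  unsigned int ready : 1;
  unsigned int count : 7;
};

/* Increment the 7-bit count field while letting an (empty) asm statement
   see the value through an ordinary addressable temporary rather than
   through the bit-field itself.  */
static unsigned int
bump_count (struct flags *f)
{
  unsigned int tmp = f->count;          /* copy the bit-field out */
  __asm__ volatile ("" : "+m" (tmp));   /* "m" operand on the temporary */
  f->count = tmp + 1;                   /* copy the result back */
  return f->count;
}
```

Writing "+m" (f->count) directly is the form that is now rejected in the
front ends.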

2021-06-11  Jakub Jelinek  <jakub@redhat.com>

PR inline-asm/100785
gcc/
* cfgexpand.c (expand_asm_stmt): If errors are emitted,
remove all inputs, outputs and clobbers from the asm and
set template to "".
gcc/c/
* c-typeck.c (c_mark_addressable): Diagnose trying to make
bit-fields addressable.
gcc/cp/
* typeck.c (cxx_mark_addressable): Diagnose trying to make
bit-fields addressable.
gcc/testsuite/
* c-c++-common/pr100785.c: New test.

(cherry picked from commit 644c2cc5f2c09506a7bfef293a7f90efa8d7e5fa)

stor-layout: Don't create DECL_BIT_FIELD_REPRESENTATIVE for QUAL_UNION_TYPE [PR101062]

> The following patch does create them, but treats all such bitfields as if
> they were in a structure where the particular bitfield is the only field.

While the patch passed bootstrap/regtest on the trunk, when trying to
backport it to 11 branch the bootstrap failed with
atree.ads:3844:34: size for "Node_Record" too small
errors.  Turns out the error is not about size being too small, but actually
about size being non-constant, and comes from:
/* In a FIELD_DECL of a RECORD_TYPE, this is a pointer to the storage
    representative FIELD_DECL.  */
#define DECL_BIT_FIELD_REPRESENTATIVE(NODE) \
   (FIELD_DECL_CHECK (NODE)->field_decl.qualifier)

/* For a FIELD_DECL in a QUAL_UNION_TYPE, records the expression, which
    if nonzero, indicates that the field occupies the type.  */
  #define DECL_QUALIFIER(NODE) (FIELD_DECL_CHECK (NODE)->field_decl.qualifier)
so by setting up DECL_BIT_FIELD_REPRESENTATIVE in QUAL_UNION_TYPE we
actually set or modify DECL_QUALIFIER and then construct size as COND_EXPRs
with those bit field representatives (e.g. with array type) as conditions
which doesn't fold into constant.

The following patch fixes it by not creating DECL_BIT_FIELD_REPRESENTATIVEs
for QUAL_UNION_TYPE as there is nowhere to store them.

Shall we change tree.h to document that DECL_BIT_FIELD_REPRESENTATIVE
is valid also on UNION_TYPE?
I see:
tree-ssa-alias.c-  if (TREE_CODE (type1) == RECORD_TYPE
tree-ssa-alias.c:      && DECL_BIT_FIELD_REPRESENTATIVE (field1))
tree-ssa-alias.c:    field1 = DECL_BIT_FIELD_REPRESENTATIVE (field1);
tree-ssa-alias.c-  if (TREE_CODE (type2) == RECORD_TYPE
tree-ssa-alias.c:      && DECL_BIT_FIELD_REPRESENTATIVE (field2))
tree-ssa-alias.c:    field2 = DECL_BIT_FIELD_REPRESENTATIVE (field2);
Shall we change that to || == UNION_TYPE or do we assume all fields
are overlapping in a UNION_TYPE already?
At other spots (asan, ubsan, expr.c) it is unclear what will happen
if they see a QUAL_UNION_TYPE with a DECL_QUALIFIER (or does the Ada FE
lower that somehow)?

2021-06-18  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/101062
* stor-layout.c (finish_bitfield_layout): Don't add bitfield
representatives in QUAL_UNION_TYPE.

(cherry picked from commit 3587c2c241eda0f3ab54ea60d46e9caf12d69b5a)

stor-layout: Create DECL_BIT_FIELD_REPRESENTATIVE even for bitfields in unions [PR101062]

The following testcase is miscompiled on x86_64-linux, the bitfield store
is implemented as a RMW 64-bit operation at d+24 when the d variable has
size of only 28 bytes and scheduling moves in between the R and W part
a store to a different variable that happens to be right after the d
variable.

The reason for this is that we weren't creating
DECL_BIT_FIELD_REPRESENTATIVEs for bitfields in unions.

The following patch does create them, but treats all such bitfields as if
they were in a structure where the particular bitfield is the only field.

2021-06-16 Jakub Jelinek <jakub@redhat.com>

PR middle-end/101062
* stor-layout.c (finish_bitfield_representative): For fields in unions
assume nextf is always NULL.
(finish_bitfield_layout): Compute bit field representatives also in
unions, but handle it as if each bitfield was the only field in the
aggregate.

* gcc.dg/pr101062.c: New test.

(cherry picked from commit b4b50bf2864e09f028a39a3f460222632c4d7348)

testsuite: Use noipa attribute instead of noinline, noclone

I've noticed this test now on various arches sometimes FAILs, sometimes
PASSes (the line 12 test in particular).

The problem is that a = 0; initialization in the caller no longer happens
before the f(&a) call as what the argument points to is only used in
debug info.

Making the function noipa forces the caller to initialize it and still
tests what the test wants to test, namely that we don't consider *p as
valid location for the c variable at line 18 (after it has been overwritten
with *p = 1;).

2021-06-16 Jakub Jelinek <jakub@redhat.com>

* gcc.dg/guality/pr49888.c (f): Use noipa attribute instead of
noinline, noclone.

(cherry picked from commit a490b1dc0b3c26bff2ee00ac0da2d606d2009e3a)

libffi: Fix up x86_64 classify_argument

As the following testcase shows, libffi didn't properly handle
classify_argument for structures at byte offsets not divisible by
UNITS_PER_WORD.  The following patch adjusts it to match what
config/i386/ classify_argument does for that and also ports the
PR38781 fix there (the second chunk).
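
The word count part of the change can be sketched as follows, assuming
UNITS_PER_WORD is 8 as on x86_64 and that byte_offset is the offset within
the first word; the helper names are hypothetical and this is not the
ffi64.c code itself.

```c
#include <stddef.h>

#define UNITS_PER_WORD 8

/* Old computation: words needed for size bytes, ignoring where the
   structure starts.  */
static size_t
words_for_size_only (size_t size)
{
  return (size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
}

/* New computation: words needed for size + byte_offset bytes.  */
static size_t
words_with_offset (size_t size, size_t byte_offset)
{
  return (size + byte_offset + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
}
```

An 8-byte structure starting 4 bytes into a word straddles two words, which
the size-only computation reports as one.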

This has been committed to upstream libffi already:
https://github.com/libffi/libffi/commit/5651bea284ad0822eafe768e3443c2f4d7da2c8f

2021-06-16  Jakub Jelinek  <jakub@redhat.com>

* src/x86/ffi64.c (classify_argument): For FFI_TYPE_STRUCT set words
to number of words needed for type->size + byte_offset bytes rather
than just type->size bytes.  Compute pos before the loop and check
total size of the structure.
* testsuite/libffi.call/nested_struct12.c: New test.

(cherry picked from commit 041f74177072df1d66502319205990a4d970c92a)

expr: Fix up VEC_PACK_TRUNC_EXPR expansion [PR101046]

The following testcase ICEs, because we have a mode mismatch.
VEC_PACK_TRUNC_EXPR's operands have different modes from the result
(same vector mode size but twice as large element),
but we were passing non-NULL subtarget with the mode of the result
to the expansion of its arguments, so the VEC_PERM_EXPR in one of the
operands which had V8SImode operands and result had V16HImode target.

Fixed by clearing the subtarget if we are changing mode.

2021-06-15 Jakub Jelinek <jakub@redhat.com>

PR target/101046
* expr.c (expand_expr_real_2) <case VEC_PACK_FIX_TRUNC_EXPR,
case VEC_PACK_TRUNC_EXPR>: Clear subtarget when changing mode.

(cherry picked from commit 008153c8435ca3bf587e11654c31f05c0f99b43a)

simplify-rtx: Fix up simplify_logical_relational_operation for vector IOR [PR101008]

simplify_relational_operation callees typically return just const0_rtx
or const_true_rtx and then simplify_relational_operation attempts to fix
that up if the comparison result has vector mode, or floating mode,
or punt if it has scalar mode and vector mode operands (it doesn't know how
exactly to deal with the scalar masks).
But, simplify_logical_relational_operation has a special case, where
it attempts to fold (x < y) | (x >= y) etc. and if it determines it is
always true, it just returns const_true_rtx, without doing the dances that
simplify_relational_operation does.
That results in an ICE on the following testcase, where such folding happens
during expansion (of debug stmts into DEBUG_INSNs) and we ICE because
all of a sudden a VOIDmode rtx appears where it expects a vector (V4SImode)
rtx.

The following patch fixes that by moving the adjustment into a separate
helper routine and using it from both simplify_relational_operation and
simplify_logical_relational_operation.

2021-06-11 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/101008
* simplify-rtx.c (relational_result): New function.
(simplify_logical_relational_operation,
simplify_relational_operation): Use it.

(cherry picked from commit 4bdcdd8fa8d7659e5a19a930cf2f0332127f8a46)

fold-const: Fix up fold_read_from_vector [PR100887]

The callers of fold_read_from_vector expect that the index they pass is
an index of an element in the vector and the function does that most of the
time. But we allow CONSTRUCTORs with VECTOR_TYPE to have VECTOR_TYPE
elements and in that case every CONSTRUCTOR element represents not just one
index (with the exception of V1 vectors), but multiple.
So returning zero vector if i >= CONSTRUCTOR_NELTS or returning some
CONSTRUCTOR_ELT's value might not be what the callers expect.
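
To make the index mismatch concrete, the sketch below shows how an element
index would have to be split when every CONSTRUCTOR element is itself a
vector with a fixed number of lanes.  It is only an illustration with made-up
names; the patch does not do this mapping.

```c
/* Position of a scalar element inside a CONSTRUCTOR whose elements are
   themselves vectors.  Hypothetical helper type for this sketch.  */
struct elt_pos
{
  unsigned ctor_index;  /* which CONSTRUCTOR element */
  unsigned lane;        /* which lane inside that element */
};

/* Split scalar element index I, assuming each CONSTRUCTOR element covers
   LANES_PER_ELT scalar elements (1 for scalar or V1 elements).  */
static struct elt_pos
locate_element (unsigned i, unsigned lanes_per_elt)
{
  struct elt_pos p = { i / lanes_per_elt, i % lanes_per_elt };
  return p;
}
```

For a four-element vector built from two two-lane halves, element 3 lives in
CONSTRUCTOR element 1, lane 1, so using 3 directly as a CONSTRUCTOR index is
out of range even though the element exists.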

Fixed by punting if the first element has vector type.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

In theory we could instead recurse (and assert that for CONSTRUCTORs of
vector elements we have always all elements specified like tree-cfg.c
verifies?) after adjusting the index appropriately.

2021-06-07 Jakub Jelinek <jakub@redhat.com>

PR target/100887
* fold-const.c (fold_read_from_vector): Return NULL if trying to
read from a CONSTRUCTOR with vector type elements.

(cherry picked from commit e1521b170b44be5cd5d36a98b6b760457b68f566)

tree-inline: Fix up __builtin_va_arg_pack handling [PR100898]

The following testcase ICEs, because gimple_call_arg_ptr (..., 0)
asserts that there is at least one argument, while we were using
it even if we didn't copy anything just to get a pointer from/to which
the zero arguments should be copied.

Fixed by guarding the memcpy calls.  Also, the code was calling
gimple_call_num_args too many times - 5 times instead of 2, so the patch
adds two temporaries for those.
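
The shape of the guard, reduced to plain C with int arguments instead of
GIMPLE operands (copy_args is a hypothetical stand-in, not the copy_bb code):

```c
#include <stddef.h>
#include <string.h>

/* Copy NARGS arguments from SRC to DST.  When there is nothing to copy the
   pointers are never touched, so callers need not be able to produce a
   valid pointer to a "first" argument that does not exist.  */
static void
copy_args (int *dst, const int *src, size_t nargs)
{
  if (nargs)
    memcpy (dst, src, nargs * sizeof *src);
}
```

The count is computed once by the caller and passed in, mirroring the
temporaries the patch adds for gimple_call_num_args.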

2021-06-07  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/100898
* tree-inline.c (copy_bb): Only use gimple_call_arg_ptr if memcpy
should copy any arguments.  Don't call gimple_call_num_args
on id->call_stmt or call_stmt more than once.

* g++.dg/ext/va-arg-pack-3.C: New test.

(cherry picked from commit d66a703c8ba86f3ca04cc10c3071696e6d014de6)

x86: Fix ix86_expand_vector_init for V*TImode [PR100887]

We have vec_initv4tiv2ti and vec_initv2titi patterns which call
ix86_expand_vector_init and assume it works for those modes.  For the
case of construction from two half-sized vectors, the code assumes it
will always succeed, but we have only insn patterns with SImode and DImode
element types.  QImode and HImode element types are already handled
by performing it with same sized vectors with SImode elements and the
following patch extends that to V*TImode vectors.

2021-06-04  Jakub Jelinek  <jakub@redhat.com>

PR target/100887
* config/i386/i386-expand.c (ix86_expand_vector_init): Handle
concatenation from half-sized modes with TImode elements.

(cherry picked from commit b7dd2e4eeb44bc8678ecde8a6c7401de85e63561)

c++: Avoid -Wunused-value false positives on nullptr passed to ellipsis [PR100666]

When passing expressions with decltype(nullptr) type with side-effects to
ellipsis, we pass (void *)0 instead, but for the side-effects evaluate them
on the lhs of a COMPOUND_EXPR. Unfortunately that means we warn about it
if the expression is a call to nodiscard marked function, even when the
result is really used, just needs to be transformed.

Fixed by adding a warning_sentinel.

2021-05-25 Jakub Jelinek <jakub@redhat.com>

PR c++/100666
* call.c (convert_arg_to_ellipsis): For expressions with NULLPTR_TYPE
and side-effects, temporarily disable -Wunused-result warning when
building COMPOUND_EXPR.

* g++.dg/cpp1z/nodiscard8.C: New test.
* g++.dg/cpp1z/nodiscard9.C: New test.

(cherry picked from commit ad52d89808a947264397e920d7483090d4108f7b)

function: Set dummy DECL_ASSEMBLER_NAME in push_dummy_function [PR100580]

Last year I added cgraph_node::get_create calls for the dummy
functions used for -fdump-passes, so that it interacts well with pass
disabling/enabling which is cgraph uid based.
Unfortunately, as the following testcase shows, when assembler hash
is present, that wants to compute DECL_ASSEMBLER_NAME and the C++ FE
is unprepared to handle it on the dummy functions which don't have
DECL_NAME etc.
The following patch fixes it by setting up a dummy DECL_ASSEMBLER_NAME
on these, so that the FEs don't need to compute it.

2021-05-18 Jakub Jelinek <jakub@redhat.com>

PR c++/100580
* function.c (push_dummy_function): Set DECL_ARTIFICIAL and
DECL_ASSEMBLER_NAME on the fn_decl.

* g++.dg/other/pr100580.C: New test.

(cherry picked from commit 978b62e554ffb4b34844c72d259ce71fcbd87591)

regcprop: Fix another cprop_hardreg bug [PR100342]

On Tue, Jan 19, 2021 at 04:10:33PM +0000, Richard Sandiford via Gcc-patches wrote:
> Ah, ok, thanks for the extra context.
>
> So AIUI the problem when recording xmm2<-di isn't just:
>
>  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>
> but also that:
>
>  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
>
> For example, all registers in this sequence can be part of the same chain:
>
>     (set (reg:HI R1) (reg:HI R0))
>     (set (reg:SI R2) (reg:SI R1)) // [A]
>     (set (reg:DI R3) (reg:DI R2)) // [A]
>     (set (reg:SI R4) (reg:SI R[0-3]))
>     (set (reg:HI R5) (reg:HI R[0-4]))
>
> But:
>
>     (set (reg:SI R1) (reg:SI R0))
>     (set (reg:HI R2) (reg:HI R1))
>     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
>
> is problematic because it dips below the precision of the oldest regno
> and then increases again.
>
> When this happens, I guess we have two choices:
>
> (1) what the patch does: treat R3 as the start of a new chain.
> (2) pretend that the copy occurred in vd->e[sr].mode instead
>     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
>
> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
> Maybe the optimisation provided by (2) compared to (1) isn't common
> enough to be worth the complication.
>
> I think we should test [B] as well as [A] though.  The pass is set
> up to do some quite elaborate mode changes and I think rejecting
> [A] on its own would make some of the other code redundant.
> It also feels like it should be a separate “if” or “else if”,
> with its own comment.

Unfortunately, we now have a testcase that shows that testing also [B]
is a problem (unfortunately now latent on the trunk, only reproduces
on 10 and 11 branches).

The comment in the patch tries to list just the interesting instructions,
we have a 64-bit value, copy the low 8 bits of it to another register,
copy full 64 bits to another register and then clobber the original register.
Before that (set (reg:DI r14) (const_int ...)) we have a chain
DI r14, QI si, DI bp; that instruction drops the DI r14 from that chain, so
we have QI si, DI bp, with si being the oldest_regno.
Next DI si is copied into DI dx.  Only the low 8 bits of that are defined,
the rest is unspecified, but we would add DI dx into that same chain at the
end, so QI si, DI bp, DI dx [*].  Next si is overwritten, so the chain is
DI bp, DI dx.  And then we see (set (reg:DI dx) (reg:DI bp)) and remove it
as redundant, because we think bp and dx are already equivalent, when in
reality that is true only for the lowpart 8 bits.
I believe the [*] marked step above is where the bug is.
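
The sequence can be modelled with plain 64-bit integers.  In the sketch
below the unspecified upper bits of si after the QImode copy are simulated
by a caller-supplied garbage value; the struct and function are hypothetical
and only mirror the register names used above.

```c
#include <stdint.h>

/* Register file for the model; names follow the description above.  */
struct regs
{
  uint64_t r14, si, bp, dx;
};

/* Run the interesting instructions on VALUE.  GARBAGE stands in for the
   unspecified upper 56 bits of si after the 8-bit copy.  */
static struct regs
run_sequence (uint64_t value, uint64_t garbage)
{
  struct regs r = { 0, 0, 0, 0 };
  r.r14 = value;
  /* QImode copy r14 -> si: only the low 8 bits are defined.  */
  r.si = (garbage & ~UINT64_C (0xff)) | (r.r14 & 0xff);
  /* DImode copy r14 -> bp: all 64 bits are defined.  */
  r.bp = r.r14;
  /* (set (reg:DI r14) (const_int ...)) clobbers the original.  */
  r.r14 = 0;
  /* DImode copy si -> dx: inherits the unspecified upper bits.  */
  r.dx = r.si;
  return r;
}
```

After the sequence dx and bp agree in their low 8 bits only, so treating a
later (set (reg:DI dx) (reg:DI bp)) as redundant changes the value of dx.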

The committed regcprop.c (copy_value) change (but only committed to
trunk/11, not to 10) added
  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
           && partial_subreg_p (vd->e[sr].mode,
                                vd->e[vd->e[sr].oldest_regno].mode))
    return;
and while the first partial_subreg_p call returns true, the second one
doesn't; before the (set (reg:DI r14) (const_int ...)) insn it would be
true and we'd return, but as that reg got clobbered, si became the oldest
regno in the chain and so vd->e[vd->e[sr].oldest_regno].mode is QImode
and vd->e[sr].mode is QImode too, so the second partial_subreg_p is false.
But as the testcase shows, what is the oldest_regno in the chain is
something that changes over time, so relying on it for anything is
problematic, something could have a different oldest_regno and later
on get a different oldest_regno (perhaps with different mode) because
the oldest_regno got overwritten and it can change both ways.

The following patch effectively implements your (2) above.

2021-05-15  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/100342
* regcprop.c (copy_value): When copying a source reg in a wider
mode than it has recorded for the value, adjust recorded destination
mode too or punt if !REG_CAN_CHANGE_MODE_P.

* gcc.target/i386/pr100342.c: New test.

(cherry picked from commit 425ad87dcfacbb326d8f448a0f2b4d6b53dcd98f)

Fix incorrect optimization by cprop_hardreg.

If SRC had been assigned a mode narrower than the copy, we can't
always link DEST into the chain even if they have the same
hard_regno_nregs (e.g. HImode/SImode in the i386 backend).

For example:
        kmovw   %k0, %edi
        vmovd   %edi, %xmm2
        vpshuflw        $0, %xmm2, %xmm0
        kmovw   %k0, %r8d
        kmovd   %k0, %r9d
...
- movl %r9d, %r11d
+ vmovd %xmm2, %r11d

gcc/ChangeLog:

PR rtl-optimization/98694
* regcprop.c (copy_value): If SRC had been assigned a mode
narrower than the copy, we can't link DEST into the chain even
if they have the same hard_regno_nregs (e.g. HImode/SImode in the
i386 backend).

gcc/testsuite/ChangeLog:

PR rtl-optimization/98694
* gcc.target/i386/pr98694.c: New test.

(cherry picked from commit e711b67a9081ae84c66174a50705dc98ba993a43)

testsuite: Add testcase for already fixed PR

2021-05-14 Jakub Jelinek <jakub@redhat.com>

* g++.dg/cpp1y/pr88872.C: New test.

(cherry picked from commit f05627d404038368b99e92ac4df4c29f4ae4a5fa)

expand: Don't reuse DEBUG_EXPRs with vector type if they have different modes [PR100508]

The inliner doesn't remap DEBUG_EXPR_DECLs, so the same decls can appear
in multiple functions.
Furthermore, expansion reuses corresponding DEBUG_EXPRs too, so they again
can be reused in multiple functions.
Neither of those is a major problem, DEBUG_EXPRs are just magic value holders
and what value they stand for is independent in each function and driven by
what debug stmts or DEBUG_INSNs they are bound to.
Except for DEBUG_EXPR*s with vector types, TYPE_MODE can be either BLKmode
or some vector mode depending on whether current function's enabled ISAs
support that vector mode or not. On the following testcase, we expand it
first in foo function without AVX2 enabled and so the DEBUG_EXPR is
BLKmode, but later the same DEBUG_EXPR_DECL is used in a simd clone with
AVX2 enabled and expansion ICEs because of a mode mismatch.

The following patch fixes that by forcing recreation of a DEBUG_EXPR if
there is a mode mismatch for vector typed DEBUG_EXPR_DECL, DEBUG_EXPRs
will be still reused in between functions otherwise and within the same
function the mode should be always the same.

2021-05-12 Jakub Jelinek <jakub@redhat.com>

PR middle-end/100508
* cfgexpand.c (expand_debug_expr): For DEBUG_EXPR_DECL with vector
type, don't reuse DECL_RTL if it has different mode, instead force
creation of a new DEBUG_EXPR.

* gcc.dg/gomp/pr100508.c: New test.

(cherry picked from commit 19040050aa2c8ee890fc58dda48639fc91bf0af0)

openmp: Fix up taskloop reduction ICE if taskloop has no iterations [PR100471]

When a taskloop doesn't have any iterations, GOMP_taskloop* takes an early
return, doesn't create any tasks and more importantly, doesn't create
a taskgroup and doesn't register task reductions.  But, the code emitted
in the callers assumes task reductions have been registered and performs
the reduction handling and task reduction unregistration.  The pointer
to the task reduction private variables is reused: on input it is the alignment
and only on output it is the pointer, so in the case of a taskloop with no iterations
the caller attempts to dereference the alignment value as if it were a pointer
and crashes.  We could in the early returns register the task reductions
only to have them looped over and unregistered in the caller, but I think
it is better to tell the caller there is nothing to task reduce and bypass
all that.
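
For illustration, the kind of code affected looks roughly like the sketch
below.  It is not the testcase added by this commit; the function name
sum_below is hypothetical.  Calling it with n == 0 gives a taskloop with a
reduction and no iterations.

```cpp
// Hypothetical reproducer shape, not the PR's actual testcase.
// Build with -fopenmp to exercise the OpenMP path; without it the pragmas
// are ignored and the loop simply runs serially.
long sum_below(long n)
{
  long sum = 0;
  #pragma omp parallel
  #pragma omp single
  {
    // When n == 0 the taskloop has no iterations, so the runtime takes the
    // early return described above while the caller still runs its
    // reduction handling code.
    #pragma omp taskloop reduction(+ : sum)
    for (long i = 0; i < n; ++i)
      sum += i;
  }
  return sum;
}
```

The zero-iteration call should simply return 0, exactly as the serial loop
would.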

2021-05-11  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/100471
* omp-low.c (lower_omp_task_reductions): For OMP_TASKLOOP, if data
is 0, bypass the reduction loop including
GOMP_taskgroup_reduction_unregister call.

* taskloop.c (GOMP_taskloop): If GOMP_TASK_FLAG_REDUCTION and not
GOMP_TASK_FLAG_NOGROUP, when doing early return clear the task
reduction pointer.
* testsuite/libgomp.c/task-reduction-4.c: New test.

(cherry picked from commit 98acbb3111fcb5e57d5e63d46c0d92f4e53e3c2a)

Fix internal error with vectorization on SPARC

This is a regression present since the 10.x series, but the underlying issue
has been there since the TARGET_VEC_PERM_CONST hook was implemented, in the
form of an ICE when expanding a constant VEC_PERM_EXPR in V4QI, while the
back-end only supports V8QI constant VEC_PERM_EXPRs.

gcc/
PR target/105292
* config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Return
true only for 8-byte vector modes.

gcc/testsuite/
* gcc.target/sparc/20220510-1.c: New test.

Daily bump.

c++: ICE with requires-expr and -Wsequence-point [PR105304]

Here we're crashing from verify_sequence_points for this requires-expr
condition because it contains a templated CAST_EXPR with empty operand,
and verify_tree doesn't ignore this empty operand only because the
manual tail recursion that it performs for unary expression trees skips
the NULL test.
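
The pattern at issue, manual tail recursion with a restart label, can be
sketched as follows.  Node and count_nodes are hypothetical names used only
for illustration; this is not the code of verify_tree.  The point is that
the label must precede the null test, so that an empty operand reached via
the jump is filtered out just like one reached via a normal call.

```cpp
#include <cstddef>

// Hypothetical minimal tree node; GCC's real `tree` type is far richer.
struct Node {
  int value;
  Node *left;   // may be null
  Node *right;  // may be null
};

// Counts nodes.  The right child is handled by manual tail recursion:
// instead of calling count_nodes(n->right) we reassign n and jump back.
// Because the label sits before the null test, a null right child is
// rejected on the next pass rather than dereferenced.
std::size_t count_nodes(const Node *n)
{
  std::size_t total = 0;
restart:
  if (n == nullptr)
    return total;
  ++total;
  total += count_nodes(n->left);  // ordinary recursion for one child
  n = n->right;                   // manual tail call for the other
  goto restart;
}
```

If the label were placed after the null test instead, the jump would skip
the check and the next pass would dereference a null pointer.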

PR c++/105304

gcc/c-family/ChangeLog:

* c-common.c (verify_tree) [restart]: Move up to before the
NULL test.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires30.C: New test.

(cherry picked from commit c83b9c54d9dee2dce5d8268472a745b013d166cc)

c++: ICE when building builtin operator->* set [PR103455]

Here when constructing the builtin operator->* candidate set according
to the available conversion functions for the operand types, we end up
considering a candidate with C1=T (through B's dependent conversion
function) and C2=F, during which we crash from DERIVED_FROM_P because
dependent_type_p sees a TEMPLATE_TYPE_PARM outside of a template
context.

Sidestepping the question of whether we should be considering such a
dependent conversion function here in the first place, it seems futile
to test DERIVED_FROM_P for anything other than an actual class type, so
this patch fixes this ICE by simply guarding the DERIVED_FROM_P test
with CLASS_TYPE_P instead of MAYBE_CLASS_TYPE_P.

PR c++/103455

gcc/cp/ChangeLog:

* call.c (add_builtin_candidate) <case MEMBER_REF>: Test
CLASS_TYPE_P instead of MAYBE_CLASS_TYPE_P.

gcc/testsuite/ChangeLog:

* g++.dg/overload/builtin6.C: New test.

(cherry picked from commit 04f19580e8dbdbc7366d0f5fd068aa0cecafdc9d)

c++: double non-dep folding from finish_compound_literal [PR104565]

In finish_compound_literal, we perform non-dependent expr folding before
the call to check_narrowing ever since r9-5973. But ever since r10-7096,
check_narrowing also performs non-dependent expr folding of its own.
This double folding means tsubst will see non-templated trees during the
second folding, which causes a spurious error in the below testcase.

This patch removes the former folding operation; it seems obviated by
the latter one.

PR c++/104565

gcc/cp/ChangeLog:

* semantics.c (finish_compound_literal): Don't perform
non-dependent expr folding before calling check_narrowing.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent22.C: New test.

(cherry picked from commit 6bbd8afee0036c274f5ebb5b48d6fdc2091bd046)

c++: deleted fn and noexcept inst [PR101532, PR104225]

Here when attempting to use B's implicitly deleted default constructor,
mark_used rightfully returns false, but for the wrong reason: it
tries to instantiate the synthesized noexcept specifier which then only
silently fails because get_defaulted_eh_spec suppresses diagnostics
for deleted functions. This lack of diagnostics causes us to crash on
the first testcase below (thanks to the assert in finish_expr_stmt), and
silently accept the second testcase.

To fix this, this patch makes mark_used avoid attempting to instantiate
the noexcept specifier of a deleted function, so that we'll instead
directly reject (and diagnose) the function due to its deletedness.
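
As a reminder of what a deleted default constructor looks like at the source
level, here is a minimal sketch.  It is not one of the testcases added by
this commit (those involve NSDMIs in templates), and the types A and B here
are illustrative only.

```cpp
#include <type_traits>

struct A {
  A(int);  // a user-declared constructor suppresses A's implicit default ctor
};

struct B {
  B() = default;  // defined as deleted: member `a` cannot be default-initialized
  A a;
};

// Any attempt to use B() must be rejected because the constructor is deleted.
static_assert(!std::is_default_constructible<B>::value,
              "B's default constructor is deleted");
```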

PR c++/101532
PR c++/104225

gcc/cp/ChangeLog:

* decl2.c (mark_used): Don't consider maybe_instantiate_noexcept
on a deleted function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-template21.C: New test.
* g++.dg/cpp0x/nsdmi-template21a.C: New test.

(cherry picked from commit bc90dd0ecf02e11d47d1af7f627e2e2acaa40106)

libstdc++: Use LTLIBICONV when linking libstdc++.so [PR93602]

This fixes missing libiconv symbols when libstdc++ is built on a system
that has libiconv installed. If the libiconv headers are found then
libstdc++ depends on libiconv_open etc instead of libc's iconv_open. But
without this fix libstdc++ is not linked to the libiconv library that
provides the definitions of those symbols.

As discussed in PR 93602 this change means that libstdc++.so.6 might
have an rpath pointing to the location of the libiconv.so library. If
that is not desired, then GCC must be configured to link to a static
libiconv.a instead, using either --with-libiconv-type=static or an
in-tree build of libiconv.

libstdc++-v3/ChangeLog:

PR libstdc++/93602
* doc/xml/manual/prerequisites.xml: Document libiconv
workarounds.
* doc/html/manual/setup.html: Regenerate.
* src/Makefile.am (CXXLINK): Add $(LTLIBICONV).
* src/Makefile.in: Regenerate.

(cherry picked from commit c644b7df11afc818d6308c3776813e50e4ebe551)