git.ipfire.org Git - thirdparty/gcc.git/log

Update ChangeLog and version files for release

Daily bump.

aarch64: Fix ICE in final_scan_insn_1 [PR87839]

2021-05-07 Jakub Jelinek <jakub@redhat.com>

PR target/87839
* config/aarch64/atomics.md (aarch64_compare_and_swap<mode>): Use
rIJ constraint for aarch64_plus_operand rather than rn.

* gcc.target/aarch64/pr87839.c: New test.

libcpp: Fix up pragma preprocessing [PR100450]

Since the r0-85991-ga25a8f3be322fe0f838947b679f73d6efc2a412c
https://gcc.gnu.org/legacy-ml/gcc-patches/2008-02/msg01329.html
changes, so that we handle macros inside of pragmas that should expand
macros, during preprocessing we print those pragmas token by token,
with CPP_PRAGMA printed as
      fputs ("#pragma ", print.outf);
      if (space)
        fprintf (print.outf, "%s %s", space, name);
      else
        fprintf (print.outf, "%s", name);
where name is some identifier (so e.g. print
#pragma omp parallel
or
#pragma omp for
etc.).  Because it ends in an identifier, we need to handle it like
an identifier (i.e. CPP_NAME) for the decision whether a space needs
to be emitted in between that #pragma whatever or #pragma whatever whatever
and following token, otherwise the attached testcase is preprocessed as
#pragma omp forreduction(+:red)
rather than
#pragma omp for reduction(+:red)
The cpp_avoid_paste function is only called for this purpose.

2021-05-07  Jakub Jelinek  <jakub@redhat.com>

PR c/100450
* lex.c (cpp_avoid_paste): Handle token1 CPP_PRAGMA like CPP_NAME.

* c-c++-common/gomp/pr100450.c: New test.

(cherry picked from commit 170c850e4bd46745e2a5130b5eb09f9fceb98416)

aarch64: Fix gcc.target/aarch64/pr99808.c for ILP32

Fix test for -mabi=ilp32

gcc/testsuite/ChangeLog:

PR target/99808
* gcc.target/aarch64/pr99808.c: Use ULL constant suffix.

aarch64: PR target/99037 Fix RTL represntation in move_lo_quad patterns

This patch fixes the RTL representation of the move_lo_quad patterns to use aarch64_simd_or_scalar_imm_zero
for the zero part rather than a vec_duplicate of zero or a const_int 0.
The expander that generates them is also adjusted so that we use and match the correct const_vector forms throughout.

Co-Authored-By: Jakub Jelinek <jakub@redhat.com>
gcc/ChangeLog:

PR target/99037
PR target/100441
* config/aarch64/aarch64-simd.md (move_lo_quad_internal_<mode>): Use
aarch64_simd_or_scalar_imm_zero to match zeroes. Remove pattern
matching const_int 0.
(move_lo_quad_internal_be_<mode>): Likewise.
(move_lo_quad_<mode>): Update for the above.
* config/aarch64/iterators.md (VQ_2E): Delete.

gcc/testsuite/ChangeLog:

PR target/99808
* gcc.target/aarch64/pr99808.c: New test.

Daily bump.

modulo-sched: skip loops with strange register defs [PR100225]

PR84878 fix adds an assertion which can fail, e.g. when stack pointer
is adjusted inside the loop. We have to prevent it and search earlier
for any 'strange' instruction. The solution is to skip the whole loop
if using 'note_stores' we found that one of hard registers is in
'df->regular_block_artificial_uses' set.

Also patch properly prohibit not single-set instruction in loop body.

gcc/ChangeLog:

PR rtl-optimization/100225
PR rtl-optimization/84878
* modulo-sched.c (sms_schedule): Use note_stores to skip loops
where we have an instruction which touches (writes) any hard
register from df->regular_block_artificial_uses set.
Allow not-single-set instruction only right before basic block
tail.

gcc/testsuite/ChangeLog:

PR rtl-optimization/100225
PR rtl-optimization/84878
* gcc.dg/pr100225.c: New test.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c: New test.

(cherry picked from commit 4cf3b10f27b1994cf4a9eb12079d85412ebc7cad)

Daily bump.

PR rtl-optimization/100263: Ensure register can change mode

For move2add_valid_value_p we also have to ask the target whether a
register can be accessed in a different mode than it was set before.

gcc/ChangeLog:

PR rtl-optimization/100263
* postreload.c (move2add_valid_value_p): Ensure register can
change mode.

(cherry picked from commit bb283170e7a1f39bf533651418daf10ad18eccfc)

tree-optimization/98786 - fix issue with phiopt and abnormals

This fixes factor_out_conditional_conversion to avoid creating overlapping
lifetimes for abnormals.

2021-01-22 Richard Biener <rguenther@suse.de>

PR tree-optimization/98786
* tree-ssa-phiopt.c (factor_out_conditional_conversion): Avoid
adding new uses of abnormals.

* gcc.dg/torture/pr98786.c: New testcase.

(cherry picked from commit 329f730fd1daa7cdae4a637244d4e215f9bb9a8c)

early-remat.c: Fix new/delete mismatch [PR100230]

This simple patch fixes a mistmatched operator new/delete in
early-remat.c which triggers ASan errors on (at least) AArch64 when
compiling SVE code.

gcc/ChangeLog:

PR rtl-optimization/100230
* early-remat.c (early_remat::sort_candidates): Use delete[]
instead of delete for array allocated with new[].

(cherry picked from commit 5d87c2251c441f056e0a44f928ffcb8a8a679b6b)

Daily bump.

nvptx: Fix up nvptx build against latest libstdc++ [PR100375]

The r12-220-gd96db15967e78d7cecea3b1cf3169ceb924678ac change
deprecated some non-standard std::pair constructors and that apparently
broke nvptx.c build, where pseudo_node_t is std::pair<struct basic_block_def *, int>
and so nullptr (or NULL) needs to be used for the first argument of the
ctors instead of 0.

2021-05-02 Jakub Jelinek <jakub@redhat.com>

PR target/100375
* config/nvptx/nvptx.c (nvptx_sese_pseudo): Use NULL instead of 0
as first argument of pseudo_node_t constructors.

(cherry picked from commit 7911a905276781c20f704f5a91b5125e0184d072)

aarch64: Fix ICE in aarch64_add_offset_1_temporaries [PR100302]

In PR94121 I've changed aarch64_add_offset_1 to use absu_hwi instead of
abs_hwi because offset can be HOST_WIDE_INT_MIN. As can be seen with
the testcase below, aarch64_add_offset_1_temporaries suffers from the same
problem and should be in sync with aarch64_add_offset_1, i.e. for
HOST_WIDE_INT_MIN it needs a temporary.

2021-04-29 Jakub Jelinek <jakub@redhat.com>

PR target/100302
* config/aarch64/aarch64.c (aarch64_add_offset_1_temporaries): Use
absu_hwi instead of abs_hwi.

(cherry picked from commit 1bb3e2c0ce6ed363c72caf814a6ba6d7b17c3e0a)

cfgcleanup: Fix -fcompare-debug issue in outgoing_edges_match [PR100254]

The following testcase fails with -fcompare-debug. The problem is that
outgoing_edges_match behaves differently between -g0 and -g, if
some load/store with REG_EH_REGION is followed by DEBUG_INSNs, the
REG_EH_REGION check is not done, while when there are no DEBUG_INSNs, it is
done.

We already compute last1 and last2 as BB_END (bb{1,2}) with skipped debug
insns and notes, so this patch just uses those.

2021-04-27 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/100254
* cfgcleanup.c (outgoing_edges_match): Check REG_EH_REGION on
last1 and last2 insns rather than BB_END (bb1) and BB_END (bb2) insns.

* g++.dg/opt/pr100254.C: New test.

(cherry picked from commit e600df51a15b2ec7a72731921a2464ffe59cf5ab)

vmsdbgout: Remove useless register keywords

register keyword was removed in C++17, and in vmsdbgout.c it served no
useful purpose.

2021-04-26 Jakub Jelinek <jakub@redhat.com>

PR debug/100255
* vmsdbgout.c (ASM_OUTPUT_DEBUG_STRING, vmsdbgout_begin_block,
vmsdbgout_end_block, lookup_filename, vmsdbgout_source_line): Remove
register keywords.

(cherry picked from commit 297bfacdb448c0d29b8dfac2818350b90902bc75)

cprop: Fix -fcompare-debug bug in constprop_register [PR100148]

The following testcase shows different behavior between -g and -g0
in constprop_register, if a flags register setter is separated
from a conditional jump using those flags with -g by a DEBUG_INSN.
As it uses just NEXT_INSN, for -g it will look at the DEBUG_INSN which is
not a conditional jump, while otherwise it would look at the conditional
jump and call cprop_jump.

2021-04-21 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/100148
* cprop.c (constprop_register): Use next_nondebug_insn instead of
NEXT_INSN.

* g++.dg/opt/pr100148.C: New test.

(cherry picked from commit 022f6ee3ad67ee30f62c8c2aeeb4156494f3284e)

Add missing alignment checks in epilogue loop vectorisation (PR 86877)

Epilogue loop vectorisation skips vect_enhance_data_refs_alignment
since it doesn't make sense to version or peel the epilogue loop
(that will already have happened for the main loop).  But this means
that it also fails to check whether the accesses are suitably aligned
for the new vector subarch.

We don't seem to carry alignment information from the (potentially
peeled or versioned) main loop to the epilogue loop, which would be
good to fix at some point.  I think we want this patch regardless,
since there's no guarantee that the alignment requirements are the
same for every subarch.

2018-09-20  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
PR tree-optimization/86877
* tree-vect-loop.c (vect_analyze_loop_2): Call
vect_verify_datarefs_alignment.

gcc/testsuite/
PR tree-optimization/86877
* gfortran.dg/vect/vect-8-epilogue.F90: New test.

(cherry picked from commit 508a909eca536f7f6a60af9bd7ecea761bd2e8f1)

re PR tree-optimization/86159 (g++ ICE at -O1 and above on valid code: incorrect type of vector CONSTRUCTOR elements)

2018-06-15 Richard Biener <rguenther@suse.de>

PR middle-end/86159
* tree-cfg.c (gimplify_build3): Do not strip sign conversions,
leave useless conversion stripping to force_gimple_operand_gsi.
(gimplify_build2): Likewise.
(gimplify_build1): Likewise.

* g++.dg/pr86159.C: New testcase.

(cherry picked from commit fa6852317327d978d4069175952109505204f6ae)

haifa-sched: handle fallthru edge to EXIT block (PR 85899)

2019-03-01 Alexander Monakov <amonakov@ispras.ru>

PR rtl-optimization/85899
* haifa-sched.c (find_fallthru_edge_from): Relax assert to account for
fallthru edges leading to the exit block.

* gcc.dg/pr85899.c: New test.

(cherry picked from commit 5055060066723e409519376c8e571e51cff1eb30)

re PR rtl-optimization/81025 (gcc ICE while building glibc for MIPS soft-float multi-lib variant)

2019-04-03 Jeff Law <law@redhat.com>

PR rtl-optimization/81025
* reorg.c (skip_consecutive_labels): Do not skip past a BARRIER.

(cherry picked from commit 9427422ddacdf1c2914adfb6e8edca87f250fdfc)

Daily bump.

libstdc++: Fix inconsistent feature test macro

The __cpp_lib_constexpr_string feature test macro is not defined
consistently in <version> and <string>.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (__cpp_lib_constexpr_string):
Only define for C++17 and later.

(cherry picked from commit 3215d4f5b3d08e0087a88df9e155c779927ace1a)

testsuite: Adjust expected error in a testcase [PR98358]

On Fri, Apr 23, 2021 at 05:24:00PM -0400, Jason Merrill via Gcc wrote:
> > We have one P1 bug though - P98358 - it is unclear if just changing the
> > testcase is all that is needed or if some C++ FE changes are needed.
>
> The error message quoted in the PR looks correct, so I think changing
> the testcase is sufficient.

2021-04-30 Jakub Jelinek <jakub@redhat.com>

PR c++/98358
* g++.dg/template/pr98297.C: Expect error about shadowing template template
parameter rather than does not declare anything error.

c++/98032 - add testcase

This adds another testcase for PR95719.

2021-04-30 Richard Biener <rguenther@suse.de>

PR c++/98032
* g++.dg/pr98032.C: New testcase.

(cherry picked from commit dfc70841eb0ca42637826177f329cf6c98ee00ad)

c++: Fix ICE with using and virtual function. [PR95719]

conversion_path points to the base where we found the using-declaration, not
where the function is actually a member; look up the actual base. And then
maybe look back to the derived class if the base is primary.

gcc/cp/ChangeLog:

PR c++/95719
* call.c (build_over_call): Look up the overrider in base_binfo.
* class.c (lookup_vfn_in_binfo): Look through BINFO_PRIMARY_P.

gcc/testsuite/ChangeLog:

PR c++/95719
* g++.dg/tree-ssa/final4.C: New test.

(cherry picked from commit 710df943389a854ef00c15ed0ad219b41db236aa)

bootstrap/87338 - gcc 8.2 fails to bootstrap on ia64

This uses ASM_OUTPUT_DEBUG_LABEL instead of ASM_GENERATE_INTERNAL_LABEL
and ASM_OUTPUT_LABEL.

2019-05-21 James Clarke <jrtc27@jrtc27.com>

PR bootstrap/87338
* dwarf2out.c (dwarf2out_inline_entry): Use ASM_OUTPUT_DEBUG_LABEL
instead of ASM_GENERATE_INTERNAL_LABEL and ASM_OUTPUT_LABEL.

(cherry picked from commit fe18e4e0845c1821ca51e0a12c284b24a2b60f3a)

Daily bump.

libstdc++: Define __cpp_lib_constexpr_string macro

As noted in r11-1339-gb6ab9ecd550227684643b41e9e33a4d3466724d8 we define
a non-standard __cpp_lib_constexpr_char_traits feature test macro to
indicate support for P0426R1 and P1032R1. At some point last year the
__cpp_lib_constexpr_string macro was retconned to indicate support for
those papers. This adds the new macro (which we didn't previously
define, because it referred to P0980R1 "Making std::string constexpr"
which we don't support).

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (__cpp_lib_constexpr_string): Define.
* testsuite/21_strings/char_traits/requirements/constexpr_functions_c++17.cc:
Check for __cpp_lib_constexpr_string.

(cherry picked from commit 3da80ed7efd582575e7850a403ce693ec882d087)

tree-optimization/99954 - fix loop distribution memcpy classification

This fixes bogus classification of a copy as memcpy.  We cannot use
plain dependence analysis to decide between memcpy and memmove when
it computes no dependence.  Instead we have to try harder later which
the patch does for the gcc.dg/tree-ssa/ldist-24.c testcase by resorting
to tree-affine to compute the difference between src and dest and
compare against the copy size.

2021-04-07  Richard Biener  <rguenther@suse.de>

PR tree-optimization/99954
* tree-loop-distribution.c: Include tree-affine.h.
(generate_memcpy_builtin): Try using tree-affine to prove
non-overlap.
(loop_distribution::classify_builtin_ldst): Always classify
as PKIND_MEMMOVE.

* gcc.dg/torture/pr99954.c: New testcase.

(cherry picked from commit b091cb1efa1881e93fb2e264daaab8876acf6800)

Daily bump.

i386: Fix atomic FP peepholes [PR100182]

64bit loads to/stores from x87 and SSE registers are atomic also on 32-bit
targets, so there is no need for additional atomic moves to a temporary
register.

Introduced load peephole2 patterns assume that there won't be any additional
loads from the load location outside the peepholed sequence and wrongly
removed the source location initialization.

OTOH, introduced store peephole2 patterns assume there won't be any additional
loads from the stored location outside the peepholed sequence and wrongly
removed the destination location initialization.  Note that we can't use plain
x87 FST instruction to initialize destination location because FST converts
the value to the double-precision format, changing bits during move.

The patch restores removed initializations in load and store patterns.
Additionally, plain x87 FST in store peephole2 patterns is prevented by
limiting the store operand source to SSE registers.

2021-04-23  Uroš Bizjak  <ubizjak@gmail.com>

gcc/
PR target/100182
* config/i386/sync.md (FILD_ATOMIC/FIST_ATOMIC FP load peephole2):
Copy operand 3 to operand 4.  Use sse_reg_operand
as operand 3 predicate.
(FILD_ATOMIC/FIST_ATOMIC FP load peephole2 with mem blockage): Ditto.
(LDX_ATOMIC/STX_ATOMIC FP load peephole2): Ditto.
(LDX_ATOMIC/LDX_ATOMIC FP load peephole2 with mem blockage): Ditto.
(FILD_ATOMIC/FIST_ATOMIC FP store peephole2):
Copy operand 1 to operand 0.
(FILD_ATOMIC/FIST_ATOMIC FP store peephole2 with mem blockage): Ditto.
(LDX_ATOMIC/STX_ATOMIC FP store peephole2): Ditto.
(LDX_ATOMIC/LDX_ATOMIC FP store peephole2 with mem blockage): Ditto.

gcc/testsuite/

PR target/100182
* gcc.target/i386/pr100182.c: New test.
* gcc.target/i386/pr71245-1.c (dg-final): Xfail scan-assembler-not.
* gcc.target/i386/pr71245-2.c (dg-final): Ditto.

(cherry picked from commit d2324a5ab3ff097864ae6828cb1db4dd013c70d1)

Daily bump.

[PATCH] Backport fix for PR target/989r2

The test in the PowerPC 32-bit trampoline support is backwards.  It aborts
if the trampoline size is greater than the expected size.  It should abort
when the trampoline size is less than the expected size.  I fixed the test
so the operands are reversed.  I then folded the load immediate into the
compare instruction.

I verified this by creating a 32-bit trampoline program and manually
changing the size of the trampoline to be 48 instead of 40.  The program
aborted with the larger size.  I updated this code and ran the test again
and it passed.

I added a test case that runs on PowerPC 32-bit Linux systems and it calls
the __trampoline_setup function with a larger buffer size than the
compiler uses.  The test is not run on 64-bit systems, since the function
__trampoline_setup is not called.  I also limited the test to just Linux
systems, in case trampolines are handled differently in other systems.

libgcc/
2021-04-26  Michael Meissner  <meissner@linux.ibm.com>

PR target/98952
* config/rs6000/tramp.S (__trampoline_setup, elfv1 #ifdef): Fix
trampoline size comparison in 32-bit by reversing test and
combining load immediate with compare.  Fix backported from trunk
change on 4/23, 886b6c1e8af502b69e3f318b9830b73b88215878.
(__trampoline_setup, elfv2 #ifdef): Fix trampoline size comparison
in 32-bit by reversing test and combining load immediate with
compare.

gcc/testsuite/
2021-04-26  Michael Meissner  <meissner@linux.ibm.com>

PR target/98952
* gcc.target/powerpc/pr98952.c: New test.  Test backported from
trunk change on 4/23, 886b6c1e8af502b69e3f318b9830b73b88215878.

lto/96385 - avoid unused global UNDEFs in debug objects

Unused global UNDEFs can have side-effects in some circumstances so
the following patch avoids them by treating them the same as other
to be discarded DEFs - make them local.

2020-08-03 Richard Biener <rguenther@suse.de>

PR lto/96385
libiberty/
* simple-object-elf.c
(simple_object_elf_copy_lto_debug_sections): Localize global
UNDEFs and reuse the prevailing name.

(cherry picked from commit b32c5d0b72fda2588b4e170e75a9c64e4bf266c7)

lto/96591 - walk VECTOR_CST elements in walk_tree

This implements walking of VECTOR_CST elements in walk_tree, mimicing
the walk of COMPLEX_CST elements. Without this free-lang-data fails
to see some types in case they are only refered to via tree constants
used only as VECTOR_CST elements.

2021-02-08 Richard Biener <rguenther@suse.de>

PR lto/96591
* tree.c (walk_tree_1): Walk VECTOR_CST elements.

* g++.dg/lto/pr96591_0.C: New testcase.

(cherry picked from commit d4536e431316b4568e236afd7a6017e5efd1b0a1)

tree-optimization/98117 - fix range set by vectorization on niter IVs

This avoids the degenerate case of a TYPE_MAX_VALUE latch iteration
count value causing wrong range info for the vector IV. There's
still the case of VF == 1 where if we don't know whether we hit the
above case we cannot emit a range.

2020-12-07 Richard Biener <rguenther@suse.de>

PR tree-optimization/98117
* tree-vect-loop-manip.c (vect_gen_vector_loop_niters):
Properly handle degenerate niter when setting the vector
loop IV range.

* gcc.dg/torture/pr98117.c: New testcase.

(cherry picked from commit 69894ce172412996c10c89838717980ede7c9003)

Check for matching CONST_VECTOR encodings [PR99929]

PR99929 is one of those “how did we get away with this for so long”
bugs: the equality routines weren't checking whether two variable-length
CONST_VECTORs had the same encoding.  This meant that:

   { 1, 0, 0, 0, 0, 0, ... }

would appear to be equal to:

   { 1, 0, 1, 0, 1, 0, ... }

since both are represented using the elements { 1, 0 }.

gcc/
PR rtl-optimization/99929
* rtl.h (same_vector_encodings_p): New function.
* cse.c (exp_equiv_p): Check that CONST_VECTORs have the same encoding.
* cselib.c (rtx_equal_for_cselib_1): Likewise.
* jump.c (rtx_renumbered_equal_p): Likewise.
* lra-constraints.c (operands_match_p): Likewise.
* reload.c (operands_match_p): Likewise.
* rtl.c (rtx_equal_p_cb, rtx_equal_p): Likewise.

(cherry picked from commit a87d3f964df31d4fbceb822c6d293e85c117d992)

aarch64: Tweak post-RA handling of CONST_INT moves [PR98136]

This PR is a regression caused by r8-5967, where we replaced
a call to aarch64_internal_mov_immediate in aarch64_add_offset
with a call to aarch64_force_temporary, which in turn uses the
normal emit_move_insn{,_1} routines.

The problem is that aarch64_add_offset can be called while
outputting a thunk, where we require all instructions to be
valid without splitting. However, the move expanders were
not splitting CONST_INT moves themselves.

I think the right fix is to make the move expanders work
even in this scenario, rather than require callers to handle
it as a special case.

gcc/
PR target/98136
* config/aarch64/aarch64.md (mov<mode>): Pass multi-instruction
CONST_INTs to aarch64_expand_mov_immediate when called after RA.

gcc/testsuite/
PR target/98136
* g++.dg/pr98136.C: New test.

(cherry picked from commit 48c79f054bf435051c95ee093c45a0f8c9de5b4e)

early-remat: Handle sets of multiple candidate regs [PR94605]

early-remat.c:process_block wasn't handling insns that set multiple
candidate registers, which led to an assertion failure at the end
of the main loop.

Instructions that set two pseudos aren't rematerialisation candidates in
themselves, but we still need to track them if another instruction that
sets the same register is a rematerialisation candidate.

gcc/
PR rtl-optimization/94605
* early-remat.c (early_remat::process_block): Handle insns that
set multiple candidate registers.

gcc/testsuite/
PR rtl-optimization/94605
* gcc.target/aarch64/sve/pr94605.c: New test.

(cherry picked from commit 3c3f12e2a7625c9a2f5d74a47dbacb2fd1ae5643)

Daily bump.

sanitizer: Fix asan against glibc 2.34 [PR100114]

As mentioned in the PR, SIGSTKSZ is no longer a compile time constant in
glibc 2.34 and later, so
static const uptr kAltStackSize = SIGSTKSZ * 4;
needs dynamic initialization, but is used by a function called indirectly
from .preinit_array and therefore before the variable is constructed.
This results in using 0 size instead and all asan instrumented programs
die with:
==91==ERROR: AddressSanitizer failed to allocate 0x0 (0) bytes of SetAlternateSignalStack (error code: 22)

Here is a cherry-pick from upstream to fix this.

2021-04-17 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/100114
* sanitizer_common/sanitizer_posix_libcdep.cc: Cherry-pick
llvm-project revisions 82150606fb11d28813ae6da1101f5bda638165fe
and b93629dd335ffee2fc4b9b619bf86c3f9e6b0023.

(cherry picked from commit 950bac27d63c1c2ac3a6ed867692d6a13f21feb3)

intl: Add --enable-host-shared support [PR100096]

As mentioned in the PR, building gcc with jit enabled and
--enable-host-shared doesn't work on NetBSD/i?86, as libgccjit.so.0
has text relocations.
The r0-125846-g459260ecf8b420b029601a664cdb21c185268ecb changes
added --enable-host-shared support to various libraries, but didn't
add it to intl/ subdirectory; on Linux it isn't really needed, because
all: all-no
all-no: #nothing
but on other OSes intl/libintl.a is built.

The following patch makes sure it is built with -fPIC when
--enable-host-shared is used.

2021-04-16 Jakub Jelinek <jakub@redhat.com>

PR jit/100096
* configure.ac: Add --enable-host-shared support.
* Makefile.in: Update copyright. Add @PICFLAG@ to CFLAGS.
* configure: Regenerated.

(cherry picked from commit a11f31102706e33f66b60367d6863613ab3bd051)

c++: Fix up handling of structured bindings in extract_locals_r [PR99833]

The following testcase ICEs in tsubst_decomp_names because the assumptions
that the structured binding artificial var is followed in DECL_CHAIN by
the corresponding structured binding vars is violated.
I've tracked it to extract_locals* which is done for the constexpr
IF_STMT. extract_locals_r when it sees a DECL_EXPR adds that decl
into a hash set so that such decls aren't returned from extract_locals*,
but in the case of a structured binding that just means the artificial var
and not the vars corresponding to structured binding identifiers.
The following patch fixes it by pushing not just the artificial var
for structured bindings but also the other vars.

2021-04-16 Jakub Jelinek <jakub@redhat.com>

PR c++/99833
* pt.c (extract_locals_r): When handling DECL_EXPR of a structured
binding, add to data.internal also all corresponding structured
binding decls.

* g++.dg/cpp1z/pr99833.C: New test.

(cherry picked from commit 06d50ebc9fb2761ed2bdda5e76adb4d47a8ca983)

combine: Fix up expand_compound_operation [PR99905]

The following testcase is miscompiled on x86_64-linux.
expand_compound_operation is called on
(zero_extract:DI (mem/c:TI (reg/f:DI 16 argp) [3 i+0 S16 A128])
    (const_int 16 [0x10])
    (const_int 63 [0x3f]))
so mode is DImode, inner_mode is TImode, pos 63, len 16 and modewidth 64.

A couple of lines above the problematic spot we have:
  if (modewidth >= pos + len)
    {
      tem = gen_lowpart (mode, XEXP (x, 0));
where the code uses gen_lowpart and then shift left/right to extract it
in mode.  But the guarding condition is false - 64 >= 63 + 16
and so we enter the next condition, where the code shifts XEXP (x, 0)
right by pos and then adds AND.  It does so incorrectly though.
Given the modewidth < pos + len, inner_mode must be necessarily larger
than mode and XEXP (x, 0) has the innermode, but it was calling
simplify_shift_const with mode rather than inner_mode, which meant
inconsistent arguments to simplify_shift_const and in this case made
a DImode MEM shift out of it.

The following patch fixes it, by doing the shift in inner_mode properly
and then after the shift doing the lowpart subreg and masking already
in mode.

2021-04-13  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/99905
* combine.c (expand_compound_operation): If pos + len > modewidth,
perform the right shift by pos in inner_mode and then convert to mode,
instead of trying to simplify a shift of rtx with inner_mode by pos
as if it was a shift in mode.

* gcc.target/i386/pr99905.c: New test.

(cherry picked from commit c965254e5af9dc68444e0289250c393ae0cd6131)

combine: Don't fold away side-effects in simplify_and_const_int_1 [PR99830]

Here is an alternate patch for the PR99830 bug.
As discussed on IRC and in the PR, the reason why a (clobber:TI (const_int 0))
has been propagated into the debug insns is that it got optimized away
during simplification from the i3 instruction pattern.

And that happened because
simplify_and_const_int_1 (SImode, varop, 255)
with varop of
(ashift:SI (subreg:SI (and:TI (clobber:TI (const_int 0 [0]))
                              (const_int 255 [0xff])) 0)
           (const_int 16 [0x10]))
was called and through nonzero_bits determined that (whatever << 16) & 255
is const0_rtx.
It is, but if there are side-effects in varop and such clobbers are
considered as such, we shouldn't optimize those away.

2021-04-13  Jakub Jelinek  <jakub@redhat.com>

PR debug/99830
* combine.c (simplify_and_const_int_1): Don't optimize varop
away if it has side-effects.

* gcc.dg/pr99830.c: New test.

(cherry picked from commit 4ac7483ede91fef7cfd548ff6e30e46eeb9d95ae)

c: Avoid clobbering TREE_TYPE (error_mark_node) [PR99990]

The following testcase ICEs during error recovery, because finish_decl
overwrites TREE_TYPE (error_mark_node), which better should stay always
to be error_mark_node.

2021-04-10 Jakub Jelinek <jakub@redhat.com>

PR c/99990
* c-decl.c (finish_decl): Don't overwrite TREE_TYPE of
error_mark_node.

* gcc.dg/pr99990.c: New test.

(cherry picked from commit 91e076f3a66c1c9f6aa51e9d53d07803606e3bf1)

expand: Fix up LTO ICE with COMPOUND_LITERAL_EXPR [PR99849]

The gimplifier optimizes away COMPOUND_LITERAL_EXPRs, but they can remain
in the form of ADDR_EXPR of COMPOUND_LITERAL_EXPRs in static initializers.
By the TREE_STATIC check I meant to check that the underlying decl of
the compound literal is a global rather than automatic variable which
obviously can't be referenced in static initializers, but unfortunately
with LTO it might end up in another partition and thus be DECL_EXTERNAL
instead.

2021-04-10 Jakub Jelinek <jakub@redhat.com>

PR lto/99849
* expr.c (expand_expr_addr_expr_1): Test is_global_var rather than
just TREE_STATIC on COMPOUND_LITERAL_EXPR_DECLs.

* gcc.dg/lto/pr99849_0.c: New test.

(cherry picked from commit 2e57bc7eedb084869d17fe07b538d907b8fee819)

rtlanal: Another fix for VOIDmode MEMs [PR98601]

This is a sequel to the PR85022 changes, inline-asm can (unfortunately)
introduce VOIDmode MEMs and in PR85022 they have been changed so that
we don't pretend we know their size (as opposed to assuming they have
zero size).

This time we ICE in rtx_addr_can_trap_p_1 because it assumes that
all memory but BLKmode has known size.  The patch just treats VOIDmode
MEMs like BLKmode in that regard.  And, the STRICT_ALIGNMENT change
is needed because VOIDmode has GET_MODE_SIZE of 0 and we don't want to
check if something is a multiple of 0.

2021-04-10  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/98601
* rtlanal.c (rtx_addr_can_trap_p_1): Allow in assert unknown size
not just for BLKmode, but also for VOIDmode.  For STRICT_ALIGNMENT
unaligned_mems handle VOIDmode like BLKmode.

* gcc.dg/torture/pr98601.c: New test.

(cherry picked from commit e68ac8c2b46997af1464f2549ac520a192c928b1)

dse: Fix up hard reg conflict checking in replace_read [PR99863]

Since PR37922 fix RTL DSE has hard register conflict checking
in replace_read, so that if the replacement sequence sets (or typically just
clobbers) some hard register (usually condition codes) we verify that
hard register is not live.
Unfortunately, it compares the hard reg set clobbered/set by the sequence
(regs_set) against the currently live hard register set, but it then
emits the insn sequence not at the current insn position, but before
store_insn->insn.
So, we should not compare against the current live hard register set,
but against the hard register live set at the point of the store insn.
Fortunately, we already have that remembered in store_insn->fixed_regs_live.

In addition to bootstrapping/regtesting this patch on x86_64-linux and
i686-linux, I've also added statistics gathering and it seems the only
place where we end up rejecting the replace_read is the newly added
testcase (the PR37922 is no longer effective at that) and fixed_regs_live
has been always non-NULL at the if (store_insn->fixed_regs_live) spot.
Rather than having there an assert, I chose to just keep regs_set
as is, which means in that hypothetical case where fixed_regs_live wouldn't
be computed for some store we'd still accept sequences that don't
clobber/set any hard registers and just punt on those that clobber/set
those.

2021-04-03 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/99863
* dse.c (replace_read): Drop regs_live argument. Instead of
regs_live, use store_insn->fixed_regs_live if non-NULL,
otherwise punt if insns sequence clobbers or sets any hard
registers.

* gcc.target/i386/pr99863.c: New test.

(cherry picked from commit 7a2f91d413eb7a3eb0ba52c7ac9618a35addd12a)

c++: Fix ICE on PTRMEM_CST in lambda in inline var initializer [PR99790]

The following testcase ICEs (since the addition of inline var support),
because the lambda contains PTRMEM_CST but finish_function is called for the
lambda quite early during parsing it (from finish_lambda_function) when
the containing class is still incomplete. That means that during
genericization cplus_expand_constant keeps the PTRMEM_CST unmodified, but
later nothing lowers it when the class is finalized.
Using sizeof etc. on the class in such contexts is rejected by both g++ and
clang++, and when the PTRMEM_CST appears e.g. in static var initializers
rather than in functions, we handle it correctly because c_parse_final_cleanups
-> lower_var_init will handle those cplus_expand_constant when all classes
are already finalized.

The following patch fixes it by calling cplus_expand_constant again during
gimplification, as we are now unconditionally unit at a time, I'd think
everything that could be completed will be before we start gimplification.

2021-03-30 Jakub Jelinek <jakub@redhat.com>

PR c++/99790
* cp-gimplify.c (cp_gimplify_expr): Handle PTRMEM_CST.

* g++.dg/cpp1z/pr99790.C: New test.

(cherry picked from commit 7cdd30b43a63832d6f908b2dd64bd19a0817cd7b)

fold-const: Fix ICE in extract_muldiv_1 [PR99777]

extract_muldiv{,_1} is apparently only prepared to handle scalar integer
operations, the callers ensure it by only calling it if the divisor or
one of the multiplicands is INTEGER_CST and because neither multiplication
nor division nor modulo are really supported e.g. for pointer types, nullptr
type etc.  But the CASE_CONVERT handling doesn't really check if it isn't
a cast from some other type kind, so on the testcase we end up trying to
build MULT_EXPR in POINTER_TYPE which ICEs.  A few years ago Marek has
added ANY_INTEGRAL_TYPE_P checks to two spots, but the code uses
TYPE_PRECISION which means something completely different for vector types,
etc.
So IMNSHO we should just punt on conversions from non-integrals or
non-scalar integrals.

2021-03-29  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/99777
* fold-const.c (extract_muldiv_1): For conversions, punt on casts from
types other than scalar integral types.

* g++.dg/torture/pr99777.C: New test.

(cherry picked from commit afe9a630eae114665e77402ea083201c9d406e99)

dwarf2cfi: Defer queued register saves some more [PR99334]

On the testcase in the PR with
-fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign
we have prologue:
0000000000000000 <_func_with_dwarf_issue_>:
   0:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
   5:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
   9:   41 ff 72 f8             pushq  -0x8(%r10)
   d:   55                      push   %rbp
   e:   48 89 e5                mov    %rsp,%rbp
  11:   41 57                   push   %r15
  13:   41 56                   push   %r14
  15:   41 55                   push   %r13
  17:   41 54                   push   %r12
  19:   41 52                   push   %r10
  1b:   53                      push   %rbx
  1c:   48 83 ec 20             sub    $0x20,%rsp
and emit
00000000 0000000000000014 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 1
  Data alignment factor: -8
  Return address column: 16
  Augmentation data:     1b
  DW_CFA_def_cfa: r7 (rsp) ofs 8
  DW_CFA_offset: r16 (rip) at cfa-8
  DW_CFA_nop
  DW_CFA_nop

00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
  DW_CFA_advance_loc: 5 to 0000000000000005
  DW_CFA_def_cfa: r10 (r10) ofs 0
  DW_CFA_advance_loc: 9 to 000000000000000e
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 13 to 000000000000001b
  DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
  DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
  DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
  DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
  DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
...
unwind info for that.  The problem is when async signal
(or stepping through in the debugger) stops after the pushq %rbp
instruction and before movq %rsp, %rbp, the unwind info says that
caller's %rbp is saved there at *%rbp, but that is not true, caller's
%rbp is either still available in the %rbp register, or in *%rsp,
only after executing the next instruction - movq %rsp, %rbp - the
location for %rbp is correct.  So, either we'd need to temporarily
say:
  DW_CFA_advance_loc: 9 to 000000000000000e
  DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
  DW_CFA_advance_loc: 3 to 0000000000000011
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 10 to 000000000000001b
or to me it seems more compact to just say:
  DW_CFA_advance_loc: 12 to 0000000000000011
  DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
  DW_CFA_advance_loc: 10 to 000000000000001b

I've tried instead to deal with it through REG_FRAME_RELATED_EXPR
from the backend, but that failed miserably as explained in the PR,
dwarf2cfi.c has some rules (Rule 16 to Rule 19) that are specific to the
dynamic stack realignment using drap register that only the i386 backend
does right now, and by using REG_FRAME_RELATED_EXPR or REG_CFA* notes we
can't emulate those rules.  The following patch instead does the deferring
of the hard frame pointer save rule in dwarf2cfi.c Rule 18 handling and
emits it on the (set hfp sp) assignment that must appear shortly after it
and adds assertion that it is the case.

The difference before/after the patch on the assembly is:
--- pr99334.s~  2021-03-26 15:42:40.881749380 +0100
+++ pr99334.s   2021-03-26 17:38:05.729161910 +0100
@@ -11,8 +11,8 @@ _func_with_dwarf_issue_:
        andq    $-16, %rsp
        pushq   -8(%r10)
        pushq   %rbp
-       .cfi_escape 0x10,0x6,0x2,0x76,0
        movq    %rsp, %rbp
+       .cfi_escape 0x10,0x6,0x2,0x76,0
        pushq   %r15
        pushq   %r14
        pushq   %r13
i.e. does just what we IMHO need, after pushq %rbp %rbp
still contains parent's frame value and so the save rule doesn't
need to be overridden there, ditto at the start of the next insn
before the side-effect took effect, and we override it only after
it when %rbp already has the right value.

If some other target adds dynamic stack realignment in the future and
the offset 0 case wouldn't be true there, the code can be adjusted so that
it works on all the drap architectures, I'm pretty sure the code would
need other adjustments too.

For the rule 18 and for the (set hfp sp) after it we already have asserts
for the drap cases that check whether the code looks the way i?86/x86_64
emit it currently.

2021-03-26  Jakub Jelinek  <jakub@redhat.com>

PR debug/99334
* dwarf2out.h (struct dw_fde_node): Add rule18 member.
* dwarf2cfi.c (dwarf2out_frame_debug_expr): When handling (set hfp sp)
assignment with drap_reg active, queue reg save for hfp with offset 0
and flush queued reg saves.  When handling a push with rule18,
defer queueing reg save for hfp and just assert the offset is 0.
(scan_trace): Assert that fde->rule18 is false.

(cherry picked from commit f5df18504c1790413f293bfb50d40faa7f1ea860)

c++: Diagnose bare parameter packs in bitfield widths [PR99745]

The following invalid tests ICE because we don't diagnose (and drop) bare
parameter packs in bitfield widths.

2021-03-25 Jakub Jelinek <jakub@redhat.com>

PR c++/99745
* decl2.c (grokbitfield): Diagnose bitfields containing bare parameter
packs and don't set DECL_BIT_FIELD_REPRESENTATIVE in that case.

* g++.dg/cpp0x/variadic181.C: New test.

(cherry picked from commit f8780caf07340f5d5e55cf5fb1b2be07cabab1ea)

c++: Diagnose references to void in structured bindings [PR99650]

We ICE on the following testcase, because std::tuple_element<...,...>::type
is void and for structured bindings we therefore need to create
void & or void && which is invalid. We created such REFERENCE_TYPE and
later ICEd in the middle-end.
The following patch fixes it by diagnosing that.

2021-03-23 Jakub Jelinek <jakub@redhat.com>

PR c++/99650
* decl.c (cp_finish_decomp): Diagnose void initializers when
using tuple_element and get.

* g++.dg/cpp1z/decomp55.C: New test.

(cherry picked from commit d5e379e3fe19362442b5d0ac608fb8ddf67fecd3)

dwarf2out: Fix debug info for 2 byte floats [PR99388]

Aarch64, ARM and a couple of other architectures have 16-bit floats, HFmode.
As can be seen e.g. on
void
foo (void)
{
  __fp16 a = 1.0;
  asm ("nop");
  a = 2.0;
  asm ("nop");
  a = 3.0;
  asm ("nop");
}
testcase, GCC mishandles this on the dwarf2out.c side by assuming all
floating point types have sizes in multiples of 4 bytes, so what GCC emits
is it says that e.g. the DW_OP_implicit_value will be 2 bytes but then
doesn't emit anything and so anything emitted after it is treated by
consumers as the value and then they get out of sync.
real_to_target which insert_float uses indeed fills it that way, but putting
into an array of long 32 bits each time, but for the half floats it puts
everything into the least significant 16 bits of the first long no matter
what endianity host or target has.

The following patch fixes it.  With the patch the -g -O2 -dA output changes
(in a cross without .uleb128 support):
        .byte   0x9e    // DW_OP_implicit_value
        .byte   0x2     // uleb128 0x2
+       .2byte  0x3c00  // fp or vector constant word 0
        .byte   0x7     // DW_LLE_start_end (*.LLST0)
        .8byte  .LVL1   // Location list begin address (*.LLST0)
        .8byte  .LVL2   // Location list end address (*.LLST0)
        .byte   0x4     // uleb128 0x4; Location expression size
        .byte   0x9e    // DW_OP_implicit_value
        .byte   0x2     // uleb128 0x2
+       .2byte  0x4000  // fp or vector constant word 0
        .byte   0x7     // DW_LLE_start_end (*.LLST0)
        .8byte  .LVL2   // Location list begin address (*.LLST0)
        .8byte  .LFE0   // Location list end address (*.LLST0)
        .byte   0x4     // uleb128 0x4; Location expression size
        .byte   0x9e    // DW_OP_implicit_value
        .byte   0x2     // uleb128 0x2
+       .2byte  0x4200  // fp or vector constant word 0
        .byte   0       // DW_LLE_end_of_list (*.LLST0)

Bootstrapped/regtested on x86_64-linux, aarch64-linux and
armv7hl-linux-gnueabi, ok for trunk?

I fear the CONST_VECTOR case is still broken, while HFmode elements of vectors
should be fine (it uses eltsize of the element sizes) and likewise SFmode could
be fine, DFmode vectors are emitted as two 32-bit ints regardless of endianity
and I'm afraid it can't be right on big-endian.  But I haven't been able to
create a testcase that emits a CONST_VECTOR, for e.g. unused vector vars
with constant operands we emit CONCATN during expansion and thus ...
DW_OP_*piece for each element of the vector and for
DW_TAG_call_site_parameter we give up (because we handle CONST_VECTOR only
in loc_descriptor, not mem_loc_descriptor).

2021-03-21  Jakub Jelinek  <jakub@redhat.com>

PR debug/99388
* dwarf2out.c (insert_float): Change return type from void to
unsigned, handle GET_MODE_SIZE (mode) == 2 and return element size.
(mem_loc_descriptor, loc_descriptor, add_const_value_attribute):
Adjust callers.

(cherry picked from commit d3dd3703f1d42b14c88b91e51a2a775fe00a2974)

c: Fix up -Wunused-but-set-* warnings for _Atomics [PR99588]

As the following testcases show, compared to -D_Atomic= case we have many
-Wunused-but-set-* warning false positives.
When an _Atomic variable/parameter is read, we call mark_exp_read on it in
convert_lvalue_to_rvalue, but build_atomic_assign does not.
For consistency with the non-_Atomic case where we mark_exp_read the lhs
for lhs op= ... but not for lhs = ..., this patch does that too.
But furthermore we need to pattern match the trees emitted by _Atomic store,
so that _Atomic store itself is not marked as being a variable read, but
when the result of the store is used, we mark it.

2021-03-19 Jakub Jelinek <jakub@redhat.com>

PR c/99588
* c-typeck.c (mark_exp_read): Recognize what build_atomic_assign
with modifycode NOP_EXPR produces and mark the _Atomic var as read
if found.
(build_atomic_assign): For modifycode of NOP_EXPR, use COMPOUND_EXPRs
rather than STATEMENT_LIST. Otherwise call mark_exp_read on lhs.
Set TREE_SIDE_EFFECTS on the TARGET_EXPR.

* gcc.dg/Wunused-var-5.c: New test.
* gcc.dg/Wunused-var-6.c: New test.

(cherry picked from commit b1fc1f1c4b2e9005c40ed476b067577da2d2ce84)

c++: Ensure correct destruction order of local statics [PR99613]

As mentioned in the PR, if end of two constructions of local statics
is strongly ordered, their destructors should be run in the reverse order.
As we run __cxa_guard_release before calling __cxa_atexit, it is possible
that we have two threads that access two local statics in the same order
for the first time, one thread wins the __cxa_guard_acquire on the first
one but is rescheduled in between the __cxa_guard_release and __cxa_atexit
calls, then the other thread is scheduled and wins __cxa_guard_acquire
on the second one and calls __cxa_quard_release and __cxa_atexit and only
afterwards the first thread calls its __cxa_atexit.  This means a variable
whose completion of the constructor strongly happened after the completion
of the other one will be destructed after the other variable is destructed.

The following patch fixes that by swapping the __cxa_guard_release and
__cxa_atexit calls.

2021-03-16  Jakub Jelinek  <jakub@redhat.com>

PR c++/99613
* decl.c (expand_static_init): For thread guards, call __cxa_atexit
before calling __cxa_guard_release rather than after it.  Formatting
fixes.

(cherry picked from commit 1703937a05b8b95bc29d2de292387dfd9eb7c9a3)

expand: Fix ICE in store_bit_field_using_insv [PR93235]

The following testcase ICEs on aarch64. The problem is that
op0 is (subreg:HI (reg:HF ...) 0) and because we can't create a SUBREG of a
SUBREG and aarch64 doesn't have HImode insv, only SImode insv,
store_bit_field_using_insv tries to create (subreg:SI (reg:HF ...) 0)
which is not valid for the target and so gen_rtx_SUBREG ICEs.

The following patch fixes it by punting if the to be created SUBREG
doesn't validate, callers of store_bit_field_using_insv can handle
the fallback.

2021-03-04 Jakub Jelinek <jakub@redhat.com>

PR middle-end/93235
* expmed.c (store_bit_field_using_insv): Return false of xop0 is a
SUBREG and a SUBREG to op_mode can't be created.

* gcc.target/aarch64/pr93235.c: New test.

(cherry picked from commit 510ff5def87c70836fdbf832228661ae28e524b6)

c++: Fix -fstrong-eval-order for operator &&, || and , [PR82959]

P0145R3 added
"However, the operands are sequenced in the order prescribed for the built-in
operator" rule for overloaded operator calls when using the operator syntax.
op_is_ordered follows that, but added just the overloaded operators
added in that paper. &&, || and comma operators had rules that
lhs is sequenced before rhs already in C++98.
The following patch adds those cases to op_is_ordered.

2021-03-03 Jakub Jelinek <jakub@redhat.com>

PR c++/82959
* call.c (op_is_ordered): Handle TRUTH_ANDIF_EXPR, TRUTH_ORIF_EXPR
and COMPOUND_EXPR.

* g++.dg/cpp1z/eval-order10.C: New test.

(cherry picked from commit 529e3b3402bd2a97b02318bd834df72815be5f0f)

c-family: Avoid ICE on va_arg [PR99324]

build_va_arg calls the middle-end mark_addressable, which e.g. requires that
cfun is non-NULL.  The following patch calls instead c_common_mark_addressable_vec
which is the c-family variant similarly to the FE c_mark_addressable and
cxx_mark_addressable, except that it doesn't error on addresses of register
variables.  As the taking of the address is artificial for the .VA_ARG
ifn and when that is lowered goes away, it is similar case to the vector
subscripting for which c_common_mark_addressable_vec has been added.

2021-03-03  Jakub Jelinek  <jakub@redhat.com>

PR c/99324
* c-common.c (build_va_arg): Call c_common_mark_addressable_vec
instead of mark_addressable.  Fix a comment typo -
neutrallly -> neutrally.

* gcc.c-torture/compile/pr99324.c: New test.

(cherry picked from commit 0e87dc86eb56f732a41af2590f0b807031003fbe)

c++: Fix operator() lookup in lambdas [PR95451]

During name lookup, name-lookup.c uses:
            if (!(!iter->type && HIDDEN_TYPE_BINDING_P (iter))
                && (bool (want & LOOK_want::HIDDEN_LAMBDA)
                    || !is_lambda_ignored_entity (iter->value))
                && qualify_lookup (iter->value, want))
              binding = iter->value;
Unfortunately as the following testcase shows, this doesn't work in
generic lambdas, where we on the auto b = ... lambda ICE and on the
auto d = lambda reject it even when it should be valid.  The problem
is that the binding doesn't have a FUNCTION_DECL with
LAMBDA_FUNCTION_P for the operator(), but an OVERLOAD with
TEMPLATE_DECL for such FUNCTION_DECL.

The following patch fixes that in is_lambda_ignored_entity, other
possibility would be to do that before calling is_lambda_ignored_entity
in name-lookup.c.

2021-02-26  Jakub Jelinek  <jakub@redhat.com>

PR c++/95451
* lambda.c (is_lambda_ignored_entity): Before checking for
LAMBDA_FUNCTION_P, use OVL_FIRST.  Drop FUNCTION_DECL check.

* g++.dg/cpp1y/lambda-generic-95451.C: New test.

(cherry picked from commit 8f9308936cf1df134d5aac1f890eb67266530ab5)

fold-const: Fix up ((1 << x) & y) != 0 folding for vectors [PR99225]

This optimization was written purely with scalar integers in mind,
can work fine even with vectors, but we can't use build_int_cst but
need to use build_one_cst instead.

2021-02-24 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/99225
* fold-const.c (fold_binary_loc) <case NE_EXPR>: In (x & (1 << y)) != 0
to ((x >> y) & 1) != 0 simplifications use build_one_cst instead of
build_int_cst (..., 1). Formatting fixes.

* gcc.c-torture/compile/pr99225.c: New test.

(cherry picked from commit 4de402ab60c54fff48cb7371644b024d10d7e5bb)

fold-const: Fix ICE in fold_read_from_constant_string on invalid code [PR99204]

fold_read_from_constant_string and expand_expr_real_1 have code to optimize
constant reads from string (tree vs. rtl).
If the STRING_CST array type has zero low bound, index is fold converted to
sizetype and so the compare_tree_int works fine, but if it has some other
low bound, it calls size_diffop_loc and that function from 2 sizetype
operands creates a ssizetype difference. expand_expr_real_1 then uses
tree_fits_uhwi_p + compare_tree_int and so works fine, but fold-const.c
only checked if index is INTEGER_CST and calls compare_tree_int, which means
for negative index it will succeed and result in UB in the compiler.

2021-02-23 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/99204
* fold-const.c (fold_read_from_constant_string): Check that
tree_fits_uhwi_p (index) rather than just that index is INTEGER_CST.

* gfortran.dg/pr99204.f90: New test.

(cherry picked from commit f53a9b563b5017af179f1fd900189c0ba83aa2ec)

libstdc++: Fix up constexpr std::char_traits<char>::compare [PR99181]

Because of LWG 467, std::char_traits<char>::lt compares the values
cast to unsigned char rather than char, so even when char is signed
we get unsigned comparision.  std::char_traits<char>::compare uses
__builtin_memcmp and that works the same, but during constexpr evaluation
we were calling __gnu_cxx::char_traits<char_type>::compare.  As
char_traits::lt is not virtual, __gnu_cxx::char_traits<char_type>::compare
used __gnu_cxx::char_traits<char_type>::lt rather than
std::char_traits<char>::lt and thus compared chars as signed if char is
signed.
This change fixes it by inlining __gnu_cxx::char_traits<char_type>::compare
into std::char_traits<char>::compare by hand, so that it calls the right
lt method.

2021-02-23  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/99181
* include/bits/char_traits.h (char_traits<char>::compare): For
constexpr evaluation don't call
__gnu_cxx::char_traits<char_type>::compare but do the comparison loop
directly.

* testsuite/21_strings/char_traits/requirements/char/99181.cc: New
test.

(cherry picked from commit 311c57f6d8f285d69e44bf94152c753900cb1a0a)

tree-cfg: Fix up gimple_merge_blocks FORCED_LABEL handling [PR99034]

The verifiers require that DECL_NONLOCAL or EH_LANDING_PAD_NR
labels are always the first label if there is more than one label.

When merging blocks, we don't honor that though.
On the following testcase, we try to merge blocks:
<bb 13> [count: 0]:
<L2>:
S::~S (&s);

and
<bb 15> [count: 0]:
<L0>:
resx 1

where <L2> is landing pad and <L0> is FORCED_LABEL. And the code puts
the FORCED_LABEL before the landing pad label, violating the verification
requirements.

The following patch fixes it by moving the FORCED_LABEL after the
DECL_NONLOCAL or EH_LANDING_PAD_NR label if it is the first label.

2021-02-19 Jakub Jelinek <jakub@redhat.com>

PR ipa/99034
* tree-cfg.c (gimple_merge_blocks): If bb a starts with eh landing
pad or non-local label, put FORCED_LABELs from bb b after that label
rather than before it.

* g++.dg/opt/pr99034.C: New test.

(cherry picked from commit 33be24d77d3d8f0c992eb344ce63f78e14cf753d)

c: Fix ICE with -fexcess-precision=standard [PR99136]

The following testcase ICEs on i686-linux, because c_finish_return wraps
c_fully_folded retval back into EXCESS_PRECISION_EXPR, but when the function
return type is void, we don't call convert_for_assignment on it that would
then be fully folded again, but just put the retval into RETURN_EXPR's
operand, so nothing removes it anymore and during gimplification we
ICE as EXCESS_PRECISION_EXPR is not handled.

This patch fixes it by not adding that EXCESS_PRECISION_EXPR in functions
returning void, the return value is ignored and all we need is evaluate any
side-effects of the expression.

2021-02-18 Jakub Jelinek <jakub@redhat.com>

PR c/99136
* c-typeck.c (c_finish_return): Don't wrap retval into
EXCESS_PRECISION_EXPR in functions that return void.

* gcc.dg/pr99136.c: New test.

(cherry picked from commit 3d7ce7ce6c03165ca1041b38e02428c925254968)

c++: Fix up build_zero_init_1 once more [PR99106]

My earlier build_zero_init_1 patch for flexible array members created
an empty CONSTRUCTOR. As the following testcase shows, that doesn't work
very well because the middle-end doesn't expect CONSTRUCTOR elements with
incomplete type (that the empty CONSTRUCTOR at the end of outer CONSTRUCTOR
had).

The following patch just doesn't add any CONSTRUCTOR for the flexible array
members, it doesn't seem to be needed.

2021-02-17 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/99106
* init.c (build_zero_init_1): For flexible array members just return
NULL_TREE instead of returning empty CONSTRUCTOR with non-complete
ARRAY_TYPE.

* g++.dg/ubsan/pr99106.C: New test.

(cherry picked from commit af868e89ec21340d1cafd26eaed356ce4b0104c3)

match.pd: Fix up A % (cast) (pow2cst << B) simplification [PR99079]

The (mod @0 (convert?@3 (power_of_two_cand@1 @2))) simplification
uses tree_nop_conversion_p (type, TREE_TYPE (@3)) condition, but I believe
it doesn't check what it was meant to check.  On convert?@3
TREE_TYPE (@3) is not the type of what it has been converted from, but
what it has been converted to, which needs to be (because it is operand
of normal binary operation) equal or compatible to type of the modulo
result and first operand - type.
I could fix that by using && tree_nop_conversion_p (type, TREE_TYPE (@1))
and be done with it, but actually most of the non-nop conversions are IMHO
ok and so we would regress those optimizations.
In particular, if we have say narrowing conversions (foo5 and foo6 in
the new testcase), I think we are fine, either the shift of the power of two
constant after narrowing conversion is still that power of two (or negation
of that) and then it will still work, or the result of narrowing conversion
is 0 and then we would have UB which we can ignore.
Similarly, widening conversions where the shift result is unsigned are fine,
or even widening conversions where the shift result is signed, but we sign
extend to a signed wider divisor, the problematic case of INT_MIN will
become x % (long long) INT_MIN and we can still optimize that to
x & (long long) INT_MAX.
What doesn't work is the case in the pr99079.c testcase, widening conversion
of a signed shift result to wider unsigned divisor, where if the shift
is negative, we end up with x % (unsigned long long) INT_MIN which is
x % 0xffffffff80000000ULL where the divisor is not a power of two and
we can't optimize that to x & 0x7fffffffULL.

So, the patch rejects only the single problematic case.

Furthermore, when the shift result is signed, we were introducing UB into
a program which previously didn't have one (well, left shift into the sign
bit is UB in some language/version pairs, but it is definitely valid in
C++20 - wonder if I shouldn't move the gcc.c-torture/execute/pr99079.c
testcase to g++.dg/torture/pr99079.C and use -std=c++20), by adding that
subtraction of 1, x % (1 << 31) in C++20 is well defined, but
x & ((1 << 31) - 1) triggers UB on the subtraction.
So, the patch performs the subtraction in the unsigned type if it isn't
wrapping.

2021-02-15  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/99079
* match.pd (A % (pow2pcst << N) -> A & ((pow2pcst << N) - 1)): Remove
useless tree_nop_conversion_p (type, TREE_TYPE (@3)) check.  Instead
require both type and TREE_TYPE (@1) to be integral types and either
type having smaller or equal precision, or TREE_TYPE (@1) being
unsigned type, or type being signed type.  If TREE_TYPE (@1)
doesn't have wrapping overflow, perform the subtraction of one in
unsigned type.

* gcc.dg/fold-modpow2-2.c: New test.
* gcc.c-torture/execute/pr99079.c: New test.

(cherry picked from commit 45de8afb2d534e3b38b4d1898686b20c29cc6a94)

c++: Fix zero initialization of flexible array members [PR99033]

array_type_nelts returns error_mark_node for type of flexible array members
and build_zero_init_1 was placing an error_mark_node into the CONSTRUCTOR,
on which e.g. varasm ICEs.  I think there is nothing erroneous on zero
initialization of flexible array members though, such arrays should simply
get no elements, like they do if such classes are constructed (everything
except when some larger initializer comes from an explicit initializer).

So, this patch handles [] arrays in zero initialization like [0] arrays
and fixes handling of the [0] arrays - the
tree_int_cst_equal (max_index, integer_minus_one_node) check
didn't do what it thought it would do, max_index is typically unsigned
integer (sizetype) and so it is never equal to a -1.

What the patch doesn't do and maybe would be desirable is if it returns
error_mark_node for other reasons let the recursive callers not stick that
into CONSTRUCTOR but return error_mark_node instead.  But I don't have a
testcase where that would be needed right now.

2021-02-11  Jakub Jelinek  <jakub@redhat.com>

PR c++/99033
* init.c (build_zero_init_1): Handle zero initialiation of
flexible array members like initialization of [0] arrays.
Use integer_minus_onep instead of comparison to integer_minus_one_node
and integer_zerop instead of comparison against size_zero_node.
Formatting fixes.

* g++.dg/ext/flexary38.C: New test.

(cherry picked from commit ea535f59b19f65e5b313c990ee6c194a7b055bd7)

varasm: Fix ICE with -fsyntax-only [PR99035]

My FE change from 2 years ago uses TREE_ASM_WRITTEN in -fsyntax-only
mode more aggressively to avoid "expanding" functions multiple times.
With -fsyntax-only nothing is really expanded, so I think it is acceptable
to adjust the assert and allow declare_weak at any time, with -fsyntax-only
we know it is during parsing only anyway.

2021-02-10 Jakub Jelinek <jakub@redhat.com>

PR c++/99035
* varasm.c (declare_weak): For -fsyntax-only, allow even
TREE_ASM_WRITTEN function decls.

* g++.dg/ext/weak6.C: New test.

(cherry picked from commit a964f494cd5a90f631b8c0c01777a9899e0351ce)

openmp: Temporarily disable into_ssa when gimplifying OpenMP reduction clauses [PR99007]

gimplify_scan_omp_clauses was already calling gimplify_expr with false as
last argument to make sure it is not an SSA_NAME, but as the testcases show,
that is not enough, SSA_NAME temporaries created during that gimplification
can be reused too and we can't allow SSA_NAMEs to be used across OpenMP
region boundaries, as we can only firstprivatize decls.

Fixed by temporarily disabling into_ssa.

2021-02-10 Jakub Jelinek <jakub@redhat.com>

PR middle-end/99007
* gimplify.c (gimplify_scan_omp_clauses): For MEM_REF on reductions,
temporarily disable gimplify_ctxp->into_ssa around gimplify_expr
calls.

* g++.dg/gomp/pr99007.C: New test.
* gcc.dg/gomp/pr99007-1.c: New test.
* gcc.dg/gomp/pr99007-2.c: New test.
* gcc.dg/gomp/pr99007-3.c: New test.

(cherry picked from commit deba6b20a3889aa23f0e4b3a5248de4172a0167d)

c++: Fix ICE with structured binding initialized to incomplete array [PR97878]

We ICE on the following testcase, for incomplete array a on auto [b] { a }; without
giving any kind of diagnostics, with auto [c] = a; during error-recovery.
The problem is that we get too far through check_initializer and e.g.
store_init_value -> constexpr stuff can't deal with incomplete array types.

As the type of the structured binding artificial variable is always deduced,
I think it is easiest to diagnose this early, even if they have array types
we'll need their deduced type to be complete rather than just its element
type.

2021-02-05 Jakub Jelinek <jakub@redhat.com>

PR c++/97878
* decl.c (check_array_initializer): For structured bindings, require
the array type to be complete.

* g++.dg/cpp1z/decomp54.C: New test.

(cherry picked from commit 8b7f2d3eae16dd629ae7ae40bb76f4bb0099f441)

ifcvt: Avoid ICEs trying to force_operand random RTL [PR97487]

As the testcase shows, RTL ifcvt can throw random RTL (whatever it found in
some insns) at expand_binop or expand_unop and expects it to do something
(and then will check if it created valid insns and punts if not).
These functions in the end if the operands don't match try to
copy_to_mode_reg the operands, which does
if (!general_operand (x, VOIDmode))
  x = force_operand (x, temp);
but, force_operand is far from handling all possible RTLs, it will ICE for
all more unusual RTL codes.  Basically handles just simple arithmetic and
unary RTL operations if they have an optab and
expand_simple_binop/expand_simple_unop ICE on others.

The following patch fixes it by adding some operand verification (whether
there is a hope that copy_to_mode_reg will succeed on those).  It is added
both to noce_emit_move_insn (not needed for this exact testcase,
that function simply tries to recog the insn as is and if it fails,
handles some simple binop/unop cases; the patch performs the verification
of their operands) and noce_try_sign_mask.

2021-02-03  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/97487
* ifcvt.c (noce_can_force_operand): New function.
(noce_emit_move_insn): Use it.
(noce_try_sign_mask): Likewise.  Formatting fix.

* gcc.dg/pr97487-1.c: New test.
* gcc.dg/pr97487-2.c: New test.

(cherry picked from commit 025a0ee3911c0866c69f841df24a558c7c8df0eb)

expand: Fix up find_bb_boundaries [PR98331]

When expansion emits some control flow insns etc. inside of a former GIMPLE
basic block, find_bb_boundaries needs to split it into multiple basic
blocks.
The code needs to ignore debug insns in decisions how many splits to do or
where in between some non-debug insns the split should be done, but it can
decide where to put debug insns if they can be kept and otherwise throws
them away (they can't stay outside of basic blocks).
On the following testcase, we end up in the bb from expander with
control flow insn
debug insns
barrier
some other insn
(the some other insn is effectively dead after __builtin_unreachable and
we'll optimize that out later).
Without debug insns, we'd do the split when encountering some other insn
and split after PREV_INSN (some other insn), i.e. after barrier (and the
splitting code then moves the barrier in between basic blocks).
But if there are debug insns, we actually split before the first debug insn
that appeared after the control flow insn, so after control flow insn,
and get a basic block that starts with debug insns and then has a barrier
in the middle that nothing moves it out of the bb. This leads to ICEs and
even if it wouldn't, different behavior from -g0.
The reason for treating debug insns that way is a different case, e.g.
control flow insn
debug insns
some other insn
or even
control flow insn
barrier
debug insns
some other insn
where splitting before the first such debug insn allows us to keep them
while otherwise we would have to drop them on the floor, and in those
situations we behave the same with -g0 and -g.

So, the following patch fixes it by resetting debug_insn not just when
splitting the blocks (it is set only after seeing a control flow insn and
before splitting for it if needed), but also when seeing a barrier,
which effectively means we always throw away debug insns after a control
flow insn and before following barrier if any, but there is no way around
that, control flow insn must be the last in the bb (BB_END) and BARRIER
after it, debug insns aren't allowed outside of bb.
We still handle the other cases fine (when there is no barrier or when
debug insns appear only after the barrier).

2021-01-29 Jakub Jelinek <jakub@redhat.com>

PR debug/98331
* cfgbuild.c (find_bb_boundaries): Reset debug_insn when seeing
a BARRIER.

* gcc.dg/pr98331.c: New test.

(cherry picked from commit ea0e1eaa30f42e108f6c716745347cc1dcfdc475)

c++: Fix up handling of register ... asm ("...") vars in templates [PR33661, PR98847]

As the testcase shows, for vars appearing in templates, we don't attach
the asm spec string to the pattern decls, nor pass it back to cp_finish_decl
during instantiation.

The following patch does that.

2021-01-28 Jakub Jelinek <jakub@redhat.com>

PR c++/33661
PR c++/98847
* decl.c (cp_finish_decl): For register vars with asmspec in templates
call set_user_assembler_name and set DECL_HARD_REGISTER.
* pt.c (tsubst_expr): When instantiating DECL_HARD_REGISTER vars,
pass asmspec_tree to cp_finish_decl.

* g++.dg/opt/pr98847.C: New test.

(cherry picked from commit cf93f94b3498f3925895fb0bbfd4b64232b9987a)

aarch64: Tighten up checks for ubfix [PR98681]

The testcase in the patch doesn't assemble, because the instruction requires
that the penultimate operand (lsb) range is [0, 32] (or [0, 64]) and the last
operand's range is [1, 32 - lsb] (or [1, 64 - lsb]).
The INTVAL (shft_amnt) < GET_MODE_BITSIZE (mode) will accept the lsb operand
to be in range [MIN, 32] (or [MIN, 64]) and then we invoke UB in the
compiler and sometimes it will make it through.
The patch changes all the INTVAL uses in that function to UINTVAL,
which isn't strictly necessary, but can be done (e.g. after the
UINTVAL (shft_amnt) < GET_MODE_BITSIZE (mode) check we know it is not
negative and thus INTVAL (shft_amnt) and UINTVAL (shft_amnt) then behave the
same.  But, I had to add INTVAL (mask) > 0 check in that case, otherwise we
risk (hypothetically) emitting instruction that doesn't assemble.
The problem is with masks that have the MSB bit set, while the instruction
can handle those, e.g.
ubfiz w1, w0, 13, 19
will do
(w0 << 13) & 0xffffe000
in RTL we represent SImode constants with MSB set as negative HOST_WIDE_INT,
so it will actually be HOST_WIDE_INT_C (0xffffffffffffe000), and
the instruction uses %P3 to print the last operand, which calls
asm_fprintf (f, "%u", popcount_hwi (INTVAL (x)))
to print that.  But that will not print 19, but 51 instead, will include
there also all the copies of the sign bit.
Not supporting those masks with MSB set isn't a big loss though, they really
shouldn't appear normally, as both GIMPLE and RTL optimizations should
optimize those away (one isn't masking any bits off with such masks, so
just w0 << 13 will do too).

2021-01-26  Jakub Jelinek  <jakub@redhat.com>

PR target/98681
* config/aarch64/aarch64.c (aarch64_mask_and_shift_for_ubfiz_p):
Use UINTVAL (shft_amnt) and UINTVAL (mask) instead of INTVAL (shft_amnt)
and INTVAL (mask).  Add && INTVAL (mask) > 0 condition.

* gcc.c-torture/execute/pr98681.c: New test.

(cherry picked from commit fb09d7242a25971b275292332337a56b86637f2c)

rs6000: Fix up __m64 typedef in mmintrin.h [PR97301]

The x86 __m64 type is defined as:
/* The Intel API is flexible enough that we must allow aliasing with other
vector types, and their scalar components. */
typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
and so matches the comment above it in that reads and stores through
pointers to __m64 can alias anything.
But in the rs6000 headers that is the case only for __m128, but not __m64.

The following patch adds that attribute, which fixes the
FAIL: gcc.target/powerpc/sse-movhps-1.c execution test
FAIL: gcc.target/powerpc/sse-movlps-1.c execution test
regressions that appeared when Honza improved ipa-modref.

2021-01-23 Jakub Jelinek <jakub@redhat.com>

PR testsuite/97301
* config/rs6000/mmintrin.h (__m64): Add __may_alias__ attribute.

(cherry picked from commit db9a3ce7b83ce3ed3e0ffe7eb7a918595640e161)

c++: Fix up ubsan false positives on references [PR95693]

Alex' 2 years old change to build_zero_init_1 to return NULL pointer with
reference type for references breaks the sanitizers, the assignment of NULL
to a reference typed member is then instrumented before it is overwritten
with a non-NULL address later on.
That change has been done to fix error recovery ICE during
process_init_constructor_record, where we:
          if (TYPE_REF_P (fldtype))
            {
              if (complain & tf_error)
                error ("member %qD is uninitialized reference", field);
              else
                return PICFLAG_ERRONEOUS;
            }
a few lines earlier, but then continue and ICE when build_zero_init returns
NULL.

The following patch reverts the build_zero_init_1 change and instead creates
the NULL with reference type constants during the error recovery.

The pr84593.C testcase Alex' change was fixing still works as before.

2021-01-22  Jakub Jelinek  <jakub@redhat.com>

PR sanitizer/95693
* init.c (build_zero_init_1): Revert the 2018-03-06 change to
return build_zero_cst for reference types.
* typeck2.c (process_init_constructor_record): Instead call
build_zero_cst here during error recovery instead of build_zero_init.

* g++.dg/ubsan/pr95693.C: New test.

(cherry picked from commit e5750f847158e7f9bdab770fd9c5fff58c5074d3)

match.pd: Replace incorrect simplifications into copysign [PR90248]

In the PR Andrew said he has implemented a simplification that has been
added to LLVM, but that actually is not true, what is in there are
X * (X cmp 0.0 ? +-1.0 : -+1.0) simplifications into +-abs(X)
but what has been added into GCC are (X cmp 0.0 ? +-1.0 : -+1.0)
simplifications into copysign(1, +-X) and then
X * copysign (1, +-X) into +-abs (X).
The problem is with the (X cmp 0.0 ? +-1.0 : -+1.0) simplifications,
they don't work correctly when X is zero.
E.g.
(X > 0.0 ? 1.0 : -1.0)
is -1.0 when X is either -0.0 or 0.0, but copysign will make it return
1.0 for 0.0 and -1.0 only for -0.0.
(X >= 0.0 ? 1.0 : -1.0)
is 1.0 when X is either -0.0 or 0.0, but copysign will make it return
still 1.0 for 0.0 and -1.0 for -0.0.
The simplifications were guarded on !HONOR_SIGNED_ZEROS, but as discussed in
the PR, that option doesn't mean that -0.0 will not ever appear as operand
of some operation, it is hard to guarantee that without compiler adding
canonicalizations of -0.0 to 0.0 after most of the operations and thus
making it very slow, but that the user asserts that he doesn't care if the result
of operations will be 0.0 or -0.0.  Not to mention that some of the
transformations are incorrect even for positive 0.0.

So, instead of those simplifications this patch recognizes patterns where
those ?: expressions are multiplied by X, directly into +-abs.
That works fine even for 0.0 and -0.0 (as long as we don't care about
whether the result is exactly 0.0 or -0.0 in those cases), because
whether the result of copysign is -1.0 or 1.0 doesn't matter when it is
multiplied by 0.0 or -0.0.

As a follow-up, maybe we should add the simplification mentioned in the PR,
in particular doing copysign by hand through
VIEW_CONVERT_EXPR <int, float_X> < 0 ? -float_constant : float_constant
into copysign (float_constant, float_X).  But I think that would need to be
done in phiopt.

2021-01-22  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/90248
* match.pd (X cmp 0.0 ? 1.0 : -1.0 -> copysign(1, +-X),
X cmp 0.0 ? -1.0 : +1.0 -> copysign(1, -+X)): Remove
simplifications.
(X * (X cmp 0.0 ? 1.0 : -1.0) -> +-abs(X),
X * (X cmp 0.0 ? -1.0 : 1.0) -> +-abs(X)): New simplifications.

* gcc.dg/tree-ssa/copy-sign-1.c: Don't expect any copysign
builtins.
* gcc.dg/pr90248.c: New test.

(cherry picked from commit dd92986ea6d2d363146e1726817a84910453fdc8)

tree-cfg: Allow enum types as result of POINTER_DIFF_EXPR [PR98556]

As conversions between signed integers and signed enums with the same
precision are useless in GIMPLE, it seems strange that we require that
POINTER_DIFF_EXPR result must be INTEGER_TYPE.

If we really wanted to require that, we'd need to change the gimplifier
to ensure that, which it isn't the case on the following testcase.
What is going on during the gimplification is that when we have the
(enum T) (p - q) cast, it is stripped through
      /* Strip away as many useless type conversions as possible
         at the toplevel.  */
      STRIP_USELESS_TYPE_CONVERSION (*expr_p);
and when the MODIFY_EXPR is gimplified, the *to_p has enum T type,
while *from_p has intptr_t type and as there is no conversion in between,
we just create GIMPLE_ASSIGN from that.

2021-01-09  Jakub Jelinek  <jakub@redhat.com>

PR c++/98556
* tree-cfg.c (verify_gimple_assign_binary): Allow lhs of
POINTER_DIFF_EXPR to be any integral type.

* c-c++-common/pr98556.c: New test.

(cherry picked from commit 0188eab844eacda5edc6257771edb771844ae069)

wide-int: Fix wi::to_mpz [PR98474]

The following testcase is miscompiled, because niter analysis miscomputes
the number of iterations to 0.
The problem is that niter analysis uses mpz_t (wonder why, wouldn't
widest_int do the same job?) and when wi::to_mpz is called e.g. on the
TYPE_MAX_VALUE of __uint128_t, it initializes the mpz_t result with wrong
value.
wi::to_mpz has code to handle negative wide_ints in signed types by
inverting all bits, importing to mpz and complementing it, which is fine,
but doesn't handle correctly the case when the wide_int's len (times
HOST_BITS_PER_WIDE_INT) is smaller than precision when wi::neg_p.
E.g. the 0xffffffffffffffffffffffffffffffff TYPE_MAX_VALUE is represented
in wide_int as 0xffffffffffffffff len 1, and wi::to_mpz would create
0xffffffffffffffff mpz_t value from that.
This patch handles it by adding the needed -1 host wide int words (and has
also code to deal with precision that aren't multiple of
HOST_BITS_PER_WIDE_INT).

2020-12-31 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/98474
* wide-int.cc (wi::to_mpz): If wide_int has MSB set, but type
is unsigned and excess negative, append set bits after len until
precision.

* gcc.c-torture/execute/pr98474.c: New test.

(cherry picked from commit a4d191d08c6acb24034af4182b3524e6ef97546c)

gimplify: Gimplify value in gimplify_init_ctor_eval_range [PR98353]

gimplify_init_ctor_eval_range wasn't gimplifying value, so if it wasn't
a gimple val, verification at the end of gimplification would ICE (or with
release checking some random pass later on would ICE or misbehave).

2020-12-21 Jakub Jelinek <jakub@redhat.com>

PR c++/98353
* gimplify.c (gimplify_init_ctor_eval_range): Gimplify value before
storing it into cref.

* g++.dg/opt/pr98353.C: New test.

(cherry picked from commit f3113a85f098df8165624321cc85d20219fb2ada)

openmp: Don't optimize shared to firstprivate on task with depend clause

The attached testcase is miscompiled, because we optimize shared clauses
to firstprivate when task body can't modify the variable even when the
task has depend clause. That is wrong, because firstprivate means the
variable will be copied immediately when the task is created, while with
depend clause some other task might change it later before the dependencies
are satisfied and the task should observe the value only after the change.

2020-12-18 Jakub Jelinek <jakub@redhat.com>

* gimplify.c (struct gimplify_omp_ctx): Add has_depend member.
(gimplify_scan_omp_clauses): Set it to true if OMP_CLAUSE_DEPEND
appears on OMP_TASK.
(gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses): Force
GOVD_WRITTEN on shared variables if task construct has depend clause.

* testsuite/libgomp.c/task-6.c: New test.

(cherry picked from commit 99ddd36e800a24fcb744a14ff9adabb0a7ef72c8)

openmp, openacc: Fix up handling of data regions [PR98183]

While the data regions (target data and OpenACC counterparts) aren't
standalone directives, unlike most other OpenMP/OpenACC constructs
we allow (apparently as an extension) exceptions and goto out of
the block. During gimplification we place an *end* call into a finally
block so that it is reached even on exceptions or goto out etc.).
During omplower pass we then add paired #pragma omp return for them,
but due to the exceptions because the region is not SESE we can end up
with #pragma omp return appearing only conditionally in the CFG etc.,
which the ompexp pass can't handle.
For the ompexp pass, we actually don't care about the end part or about
target data nesting, so we can treat it as standalone directive.

2020-12-12 Jakub Jelinek <jakub@redhat.com>

PR middle-end/98183
* omp-low.c (lower_omp_target): Don't add OMP_RETURN for
data regions.
* omp-expand.c (expand_omp_target): Don't try to remove
OMP_RETURN for data regions.
(build_omp_regions_1, omp_make_gimple_edges): Don't expect
OMP_RETURN for data regions.

* gcc.dg/gomp/pr98183.c: New test.
* gcc.dg/goacc/pr98183.c: New test.

(cherry picked from commit 8c1ed7223ad1bc19ed9c936ba496220c8ef673bc)

openmp: Fix ICE with broken doacross loop [PR98205]

If the loop body doesn't ever continue, we don't have a bb to insert the
updates. Fixed by not adding them at all in that case.

2020-12-10 Jakub Jelinek <jakub@redhat.com>

PR middle-end/98205
* omp-expand.c (expand_omp_for_generic): Fix up broken_loop handling.

* c-c++-common/gomp/doacross-4.c: New test.

(cherry picked from commit c925d4cebf817905c237aa2d93887f254b4a74f4)