git.ipfire.org Git - thirdparty/gcc.git/log

Add new test [PR78969].

gcc/testsuite/ChangeLog:
PR tree-optimization/78969
* gcc.dg/tree-ssa/builtin-snprintf-warn-6.c: New test.

configure: Account CXXFLAGS in gcc-plugin.m4.

We now use a C++ compiler, so we need to process
CXXFLAGS as well as CFLAGS in the gcc-plugin config
fragment.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
config/ChangeLog:

* gcc-plugin.m4: Save and process CXXFLAGS.

gcc/ChangeLog:

* configure: Regenerate.

libcc1/ChangeLog:

* configure: Regenerate.

nvptx: Add -misa=sm_75 and -misa=sm_80

Add new target macros TARGET_SM75 and TARGET_SM80. Add support for
__builtin_tanhf, HFmode exp2/tanh and also for HFmode min/max, controlled by
TARGET_SM75 and TARGET_SM80 respectively.

The following has been tested on nvptx-none, hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures.

gcc/ChangeLog:

* config/nvptx/nvptx-opts.h (ptx_isa): PTX_ISA_SM75 and PTX_ISA_SM80
ISA levels.
* config/nvptx/nvptx.opt: Add sm_75 and sm_80 to -misa.
* config/nvptx/nvptx.h (TARGET_SM75, TARGET_SM80):
New helper macros to conditionalize functionality on target ISA.
* config/nvptx/nvptx-c.c (nvptx_cpu_cpp_builtins): Add __PTX_SM__
support for the new ISA levels.
* config/nvptx/nvptx.c (nvptx_file_start): Add support for TARGET_SM75
and TARGET_SM80.
* config/nvptx/nvptx.md (define_c_enum "unspec"): New UNSPEC_TANH.
(define_mode_iterator HSFM): New iterator for HFmode and SFmode.
(exp2hf2): New define_insn controlled by TARGET_SM75.
(tanh<mode>2): New define_insn controlled by TARGET_SM75.
(sminhf3, smaxhf3): New define_insns controlled by TARGET_SM80.

gcc/testsuite/ChangeLog:

* gcc.target/nvptx/float16-2.c: New test case.
* gcc.target/nvptx/tanh-1.c: New test case.

[nvptx] Add -mptx=7.0

Add support for ptx isa version 7.0, required for the addition of -misa=sm_75
and -misa=sm_80.

Tested by setting the default ptx isa version to 7.0, and doing a build and
libgomp test run.

gcc/ChangeLog:

* config/nvptx/nvptx-opts.h (enum ptx_version): Add PTX_VERSION_7_0.
* config/nvptx/nvptx.c (nvptx_file_start): Handle TARGET_PTX_7_0.
* config/nvptx/nvptx.h (TARGET_PTX_7_0): New macro.
* config/nvptx/nvptx.opt (ptx_version): Add 7.0.

aarch64: Don't classify vector pairs as short vectors [PR103094]

In this PR we were wrongly classifying a pair of 8-byte vectors
as a 16-byte “short vector” (in the AAPCS64 sense). As the
comment in the patch says, this stems from an old condition
in aarch64_short_vector_p that is too loose, but that would
be difficult to tighten now.

We can still do the right thing for the newly-added modes though,
since there are no backwards compatibility concerns there.

Co-authored-by: Tamar Christina <tamar.christina@arm.com>
gcc/
PR target/103094
* config/aarch64/aarch64.c (aarch64_short_vector_p): Return false
for structure modes, rather than ignoring the type in that case.

gcc/testsuite/
PR target/103094
* gcc.target/aarch64/pr103094.c: New test.

c++: Fix warning word splitting [PR103713]

PR c++/103713

gcc/cp/ChangeLog:

* tree.c (maybe_warn_parm_abi): Fix warning word splitting.

middle-end: REE should always check all vector usages, even if it finds a defining def. [PR103350]

This and the report in PR103632 are caused by a bug in REE where it generates
incorrect code.

It's trying to eliminate the following zero extension

(insn 54 90 102 2 (set (reg:V4SI 33 v1)
        (zero_extend:V4SI (reg/v:V4HI 40 v8)))
     (nil))

by folding it in the definition of `v8`:

(insn 2 5 104 2 (set (reg/v:V4HI 40 v8)
        (reg:V4HI 32 v0 [156]))
     (nil))

which is fine, except that `v8` is also used by the extracts, e.g.:

(insn 11 10 12 2 (set (reg:SI 1 x1)
        (zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8)
                (parallel [
                        (const_int 3)
                    ]))))
     (nil))

REE replaces insn 2 by folding insn 54 and placing it at the definition site of
insn 2, so before insn 11.

Trying to eliminate extension:
(insn 54 90 102 2 (set (reg:V4SI 33 v1)
        (zero_extend:V4SI (reg/v:V4HI 40 v8)))
     (nil))
Tentatively merged extension with definition (copy needed):
(insn 2 5 104 2 (set (reg:V4SI 33 v1)
        (zero_extend:V4SI (reg:V4HI 32 v0)))
     (nil))

to produce

(insn 2 5 110 2 (set (reg:V4SI 33 v1)
        (zero_extend:V4SI (reg:V4HI 32 v0)))
     (nil))
(insn 110 2 104 2 (set (reg:V4SI 40 v8)
        (reg:V4SI 33 v1))
     (nil))

The new insn 2 using v0 directly is correct, but the insn 110 it creates is
wrong: `v8` should still be V4HI.

Alternatively, it also needs to eliminate the zero extension from the extracts,
so instead of

(insn 11 10 12 2 (set (reg:SI 1 x1)
        (zero_extend:SI (vec_select:HI (reg/v:V4HI 40 v8)
                (parallel [
                        (const_int 3)
                    ]))))
     (nil))

it should be

(insn 11 10 12 2 (set (reg:SI 1 x1)
        (vec_select:SI (reg/v:V4SI 40 v8)
                (parallel [
                        (const_int 3)
                    ])))
     (nil))

Without doing so, the indices have been remapped in the extension and so we
extract the wrong elements.

At any optimization level other than -Os, REE seems to abort, so this doesn't
trigger:

Trying to eliminate extension:
(insn 54 90 101 2 (set (reg:V4SI 32 v0)
        (zero_extend:V4SI (reg/v:V4HI 40 v8)))
     (nil))
Elimination opportunities = 2 realized = 0

purely due to the ordering of instructions. REE doesn't check uses of `v8`
because it assumes that with a zero extended value, you still have access to the
lower bits by using the bottom part of the register.

This is true for scalar but not for vector.  This would have been fine as well
if REE had eliminated the zero_extend on insn 11 and the rest but it doesn't do
so since REE can only handle cases where the SRC values are REG_P.
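
To illustrate why the low-part assumption holds for scalars but not for
vectors, here is a small standalone sketch. It is not part of the patch, and
the helper names are made up for the example. It zero-extends four 16-bit
elements to 32 bits and then reads the widened data back as 16-bit lanes using
the old index.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: zero-extend each 16-bit element to 32 bits,
// analogous to (zero_extend:V4SI (reg:V4HI ...)).
std::array<std::uint32_t, 4>
zero_extend_v4hi (const std::array<std::uint16_t, 4> &in)
{
  std::array<std::uint32_t, 4> out{};
  for (std::size_t i = 0; i < in.size (); ++i)
    out[i] = in[i];
  return out;
}

// Hypothetical helper: view the widened vector as eight 16-bit lanes and
// return lane IDX, i.e. what an extract would see if it kept using the
// old V4HI index on the widened register.
std::uint16_t
narrow_lane (const std::array<std::uint32_t, 4> &wide, std::size_t idx)
{
  std::array<std::uint16_t, 8> lanes{};
  std::memcpy (lanes.data (), wide.data (), sizeof (wide));
  return lanes[idx];
}
```

For a single element, truncating the widened value gives back the original
(the scalar case). For the whole vector, lane 3 of the widened data is no
longer the original element 3, whichever the byte order of the host.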

It does try to do this in add_removable_extension:

      /* For vector mode extensions, ensure that all uses of the
         XEXP (src, 0) register are in insn or debug insns, as unlike
         integral extensions lowpart subreg of the sign/zero extended
         register are not equal to the original register, so we have
         to change all uses or none and the current code isn't able
         to change them all at once in one transaction.  */

However, this code doesn't trigger for the example because REE doesn't check the
uses if the defining instruction doesn't feed into another extension.

Which is bogus. For vectors it should always check all usages.

r12-2288-g8695bf78dad1a42636775843ca832a2f4dba4da3 simply exposed this as it now
lowers VEC_SELECT 0 into the RTL canonical form subreg 0 which causes REE to run
more often.

gcc/ChangeLog:

PR rtl-optimization/103350
* ree.c (add_removable_extension): Don't stop at first definition but
inspect all.

gcc/testsuite/ChangeLog:

PR rtl-optimization/103350
* gcc.target/aarch64/pr103350-1.c: New test.
* gcc.target/aarch64/pr103350-2.c: New test.

testsuite: Fix up cpp23/auto-fncast11.C testcase [PR103408]

This test fails:
+FAIL: g++.dg/cpp23/auto-fncast11.C  -std=c++2b  (test for errors, line 19)
+FAIL: g++.dg/cpp23/auto-fncast11.C  -std=c++2b (test for excess errors)
because the regex in dg-error was missing an indefinite article.

2021-12-15  Jakub Jelinek  <jakub@redhat.com>

PR c++/103408
* g++.dg/cpp23/auto-fncast11.C: Fix expected diagnostic wording.

dwarf2cfi: Improve cfa_reg comparisons [PR103619]

On Tue, Dec 14, 2021 at 10:32:21AM -0700, Jeff Law wrote:
> I think the attached testcase should trigger on c6x with -mbig-endian -O2 -g

Thanks.  Finally I see what's going on.  c6x doesn't really need the CFA
with span > 1 (and I bet neither does armbe), the only reason why
dwf_cfa_reg is called is that the code in 13 cases just tries to compare
the CFA against dwf_cfa_reg (some_reg).  And that dwf_cfa_reg on some reg
that usually isn't a CFA reg results in targetm.dwarf_register_span hook
call, which on targets like c6x or armeb and others for some registers
creates a PARALLEL with various REGs in it, then the loop with the assertion
and finally operator== which just notes that the reg is different and fails.

This seems inefficient in both compile-time memory and compile time.

The following so far untested patch instead adds an extra operator== and !=
for comparison of cfa_reg with rtx, which has the most common case where it
is a different register number done early without actually invoking
dwf_cfa_reg.  This means the assertion in dwf_cfa_reg can stay as is (at
least until some big endian target needs to have hard frame pointer or stack
pointer with span > 1 as well).
I've removed a different assertion there because it is redundant - dwf_regno
already has exactly that assertion in it too.
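
The idea of the early-out comparison can be sketched as follows. This is a
simplified standalone illustration, not the dwarf2cfi.c code: reg_rtx and
cfa_reg_sketch are hypothetical stand-ins for a REG rtx and struct cfa_reg,
to_cfa_reg stands in for dwf_cfa_reg, and the sketch assumes the register
number can be compared directly without any DWARF register mapping.

```cpp
// Hypothetical stand-ins for a REG rtx and for struct cfa_reg.
struct reg_rtx { unsigned regno; };
struct cfa_reg_sketch { unsigned reg; unsigned short span; };

// Counts how often the expensive conversion runs.
static unsigned expensive_conversions = 0;

// Stand-in for dwf_cfa_reg.  In GCC the real function may invoke the
// dwarf_register_span hook; here it only counts calls.
cfa_reg_sketch
to_cfa_reg (const reg_rtx &r)
{
  ++expensive_conversions;
  return cfa_reg_sketch{r.regno, 1};
}

bool
operator== (const cfa_reg_sketch &a, const cfa_reg_sketch &b)
{
  return a.reg == b.reg && a.span == b.span;
}

// Compare a CFA register with a REG directly.  The common case, a
// different register number, returns before any conversion happens.
bool
operator== (const cfa_reg_sketch &cfa, const reg_rtx &r)
{
  if (cfa.reg != r.regno)
    return false;
  return cfa == to_cfa_reg (r);
}

bool
operator!= (const cfa_reg_sketch &cfa, const reg_rtx &r)
{
  return !(cfa == r);
}
```

With this shape, comparing against an unrelated register never reaches the
conversion, so the assertion inside it cannot fire for such registers.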

And I've included those 2 tweaks to avoid creating a REG in GC memory when
we can use {stack,hard_frame}_pointer_rtx which is already initialized to
the same REG we need by init_emit_regs.

On Tue, Dec 14, 2021 at 03:05:37PM -0700, Jeff Law wrote:
> So if someone is unfamiliar with the underlying issues here and needs to
> twiddle dwarf2cfi, how are they supposed to know if they should compare
> directly or use dwf_cfa_reg?

Comparison without dwf_cfa_reg should be used whenever possible, because
for registers which are never CFA related that won't call
targetm.dwarf_register_span uselessly.

The only comparisons with dwf_cfa_reg I've kept are the:
            regno = dwf_cfa_reg (XEXP (XEXP (dest, 0), 0));

            if (cur_cfa->reg == regno)
              offset -= cur_cfa->offset;
            else if (cur_trace->cfa_store.reg == regno)
              offset -= cur_trace->cfa_store.offset;
            else
              {
                gcc_assert (cur_trace->cfa_temp.reg == regno);
                offset -= cur_trace->cfa_temp.offset;
              }
and
            struct cfa_reg regno = dwf_cfa_reg (XEXP (dest, 0));

            if (cur_cfa->reg == regno)
              offset = -cur_cfa->offset;
            else if (cur_trace->cfa_store.reg == regno)
              offset = -cur_trace->cfa_store.offset;
            else
              {
                gcc_assert (cur_trace->cfa_temp.reg == regno);
                offset = -cur_trace->cfa_temp.offset;
              }
and there are 2 reasons for it:
1) there is an assertion, which guarantees it must compare equal to one of
those 3 cfa related struct cfa_reg structs, so it must be some CFA related
register (so, right now, targetm.dwarf_register_span shouldn't return
non-NULL in those on anything but gcn)
2) it is compared 3 times in a row, so for the GCN case doing
            if (cur_cfa->reg == XEXP (XEXP (dest, 0), 0))
              offset -= cur_cfa->offset;
            else if (cur_trace->cfa_store.reg == XEXP (XEXP (dest, 0), 0))
              offset -= cur_trace->cfa_store.offset;
            else
              {
                gcc_assert (cur_trace->cfa_temp.reg == XEXP (XEXP (dest, 0), 0));
                offset -= cur_trace->cfa_temp.offset;
              }
could actually create more GC allocated garbage than the way it is written
now.  But doing it that way would work fine.

I think for most of the comparisons even comparing with dwf_cfa_reg would
work but be less compile time/memory efficient (e.g. those assertions that
it is equal to some CFA related cfa_reg or in any spots where only the CFA
related regs may appear in the frame related patterns).

I'm aware just of a single spot where comparison with dwf_cfa_reg doesn't
work (when the assert is in dwf_cfa_reg), that is the spot that was ICEing
on your testcase, where we save arbitrary call saved register:
      if (REG_P (src)
          && REGNO (src) != STACK_POINTER_REGNUM
          && REGNO (src) != HARD_FRAME_POINTER_REGNUM
          && cur_cfa->reg == src)

2021-12-15  Jakub Jelinek  <jakub@redhat.com>

PR debug/103619
* dwarf2cfi.c (dwf_cfa_reg): Remove gcc_assert.
(operator==, operator!=): New overloaded operators.
(dwarf2out_frame_debug_adjust_cfa, dwarf2out_frame_debug_cfa_offset,
dwarf2out_frame_debug_expr): Compare vars with cfa_reg type directly
with REG rtxes rather than with dwf_cfa_reg results on those REGs.
(create_cie_data): Use stack_pointer_rtx instead of
gen_rtx_REG (Pmode, STACK_POINTER_REGNUM).
(execute_dwarf2_frame): Use hard_frame_pointer_rtx instead of
gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM).

i386: Fix emission of __builtin_cpu_supports.

PR target/103661

gcc/ChangeLog:

* config/i386/i386-builtins.c (fold_builtin_cpu): Compare to 0
as API expects that non-zero values are returned (do that
if mask == 31).
For "avx512vbmi2" argument, we now return 1 << 31, which is a
negative integer value.
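
To see why the comparison against 0 matters, consider the following standalone
sketch. The helper names are hypothetical and this is not the code in
fold_builtin_cpu. It assumes a 32-bit int where converting a value with bit 31
set yields a negative number, as on the usual GCC targets.

```cpp
#include <cstdint>

// Hypothetical helper: return the masked feature bit converted to int.
// For bit 31 the result is 1 << 31, which is negative as a 32-bit int.
int
supports_raw (std::uint32_t features, unsigned bit)
{
  return static_cast<int> (features & (std::uint32_t{1} << bit));
}

// Hypothetical helper: compare the masked bit to 0 first, so the result
// is always 0 or 1 regardless of which bit was tested.
int
supports_nonzero (std::uint32_t features, unsigned bit)
{
  return (features & (std::uint32_t{1} << bit)) != 0;
}
```

A caller that tests the result with "> 0" gets the wrong answer from the raw
variant for bit 31, while the compared variant behaves the same for every bit.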

openmp: Avoid calling operand_equal_p on OMP_CLAUSEs [PR103704]

On OMP_CLAUSEs we reuse TREE_TYPE as CP_OMP_CLAUSE_INFO in the C++ FE.
This confuses the hashing code that operand_equal_p does when checking.
There is really no reason to compare OMP_CLAUSEs against expressions
like captured this, they will never compare equal.

2021-12-15 Jakub Jelinek <jakub@redhat.com>

PR c++/103704
* semantics.c (finish_omp_target_clauses_r): For OMP_CLAUSEs
just walk subtrees.

* g++.dg/gomp/pr103704.C: New test.

libstdc++: Poor man's case insensitive comparisons in time_get [PR71557]

This patch uses the same not completely correct case insensitive comparisons
as used elsewhere in the same header. Proper comparisons that would handle
even multi-byte characters would be harder, but I don't see them implemented
in __ctype's methods.
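
A minimal sketch of this kind of per-character comparison is shown below. It is
an illustration only: starts_with_ignore_case is a hypothetical helper, not a
function in locale_facets_nonio.tcc, and the exact comparison used by the patch
may differ. It compares one char at a time through the ctype facet, which is
why multi-byte characters are not handled.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: true if INPUT begins with NAME, ignoring case as
// defined by the ctype<char> facet of LOC.  Works on single chars only.
bool
starts_with_ignore_case (const std::string &input, const std::string &name,
                         const std::locale &loc = std::locale::classic ())
{
  const std::ctype<char> &ct = std::use_facet<std::ctype<char>> (loc);
  if (input.size () < name.size ())
    return false;
  for (std::size_t i = 0; i < name.size (); ++i)
    if (ct.tolower (input[i]) != ct.tolower (name[i]))
      return false;
  return true;
}
```

For example, an input of "MONDAY 12" matches the name "Monday" under this
comparison, whereas an exact comparison would reject it.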

2021-12-15 Jakub Jelinek <jakub@redhat.com>

PR libstdc++/71557
* include/bits/locale_facets_nonio.tcc (_M_extract_via_format):
Compare characters other than format specifiers and whitespace
case insensitively.
(_M_extract_name): Compare characters case insensitively.
* testsuite/22_locale/time_get/get/char/71557.cc: New test.
* testsuite/22_locale/time_get/get/wchar_t/71557.cc: New test.

Add combine splitter to transform vashr/vlshr/vashl_optab to ashr/lshr/ashl_optab for const vector duplicate operand.

gcc/ChangeLog:

PR target/101796
* config/i386/predicates.md (const_vector_operand):
Add new predicate.
* config/i386/sse.md (<insn><mode>3<mask_name>):
Add new define_split below.

gcc/testsuite/ChangeLog:

PR target/101796
* gcc.target/i386/pr101796-1.c: New test.

Generate XXSPLTIDP for scalars on power10.

This patch implements XXSPLTIDP support for SFmode and DFmode scalar
constants.  The previous patch added support for vector constants.

I added 2 new tests to test loading up SF and DF scalar constants.

2021-12-15 Michael Meissner <meissner@the-meissners.org>

gcc/

* config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
(UNSPEC_XXSPLTIW_CONST): New unspec.
(movsf_hardfloat): Add support for generating XXSPLTIDP.
(mov<mode>_hardfloat32): Likewise.
(mov<mode>_hardfloat64): Likewise.
(xxspltidp_<mode>_internal): New insns.
(xxspltiw_<mode>_internal): New insns.
(splitters for SF/DFmode): Add new splitters for XXSPLTIDP.

gcc/testsuite/

* gcc.target/powerpc/vec-splat-constant-df.c: New test.
* gcc.target/powerpc/vec-splat-constant-sf.c: New test.

Generate XXSPLTIDP for vectors on power10.

This patch implements XXSPLTIDP support for all vector constants.  The
XXSPLTIDP instruction is given a 32-bit immediate that is converted to a vector
of two DFmode constants.  The immediate is in SFmode format, so only constants
that fit as SFmode values can be loaded with XXSPLTIDP.
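
The "fits as an SFmode value" condition can be approximated by a round trip
through float, as in the sketch below. This is a simplified illustration:
fits_in_single is a hypothetical helper, and the check performed by
constant_generates_xxspltidp may apply further restrictions that are not
modelled here.

```cpp
// Hypothetical predicate: true when D is exactly representable as a
// float, i.e. converting to float and back yields the same double.
bool
fits_in_single (double d)
{
  float f = static_cast<float> (d);
  return static_cast<double> (f) == d;
}
```

Under this check 1.5 qualifies, while 0.1 does not because its double value
has more significant bits than a float can hold.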

The constraint (eP) added in the previous patch for XXSPLTIW is also used
for XXSPLTIDP.

DImode scalar constants are not handled.  This is because the majority of DImode
constants will be in the GPR registers.  With vector registers, you have the
problem that XXSPLTIDP splats the double word into both elements of the
vector.  However, if TImode is loaded with an integer constant, it wants a full
128-bit constant.

SFmode and DFmode scalar constants are not handled in this patch.  The
support for those constants will be in the next patch.

I have added a temporary switch (-msplat-float-constant) to control whether or
not the XXSPLTIDP instruction is generated.

I added 2 new tests to test loading up V2DI and V2DF vector constants.

2021-12-14  Michael Meissner  <meissner@the-meissners.org>

gcc/

* config/rs6000/predicates.md (easy_fp_constant): Add support for
generating XXSPLTIDP.
(vsx_prefixed_constant): Likewise.
(easy_vector_constant): Likewise.
* config/rs6000/rs6000-protos.h (constant_generates_xxspltidp):
New declaration.
* config/rs6000/rs6000.c (output_vec_const_move): Add support for
generating XXSPLTIDP.
(prefixed_xxsplti_p): Likewise.
(constant_generates_xxspltidp): New function.
* config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.

gcc/testsuite/

* gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
regex for power10.
* gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
* gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Generate XXSPLTIW on power10.

This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
instruction for V8HImode, V4SImode, and V4SFmode vectors. It does this by
adding support for vector constants that can be used, and adding a
VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.

Add the eP constraint to recognize constants that can be loaded into
vector registers with a single prefixed instruction such as xxspltiw and
xxspltidp.

I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
constants.

2021-12-14 Michael Meissner <meissner@linux.ibm.com>

gcc/

* config/rs6000/constraints.md (eP): Update comment.
* config/rs6000/predicates.md (easy_fp_constant): Add support for
generating XXSPLTIW.
(vsx_prefixed_constant): New predicate.
(easy_vector_constant): Add support for
generating XXSPLTIW.
* config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
declaration.
(constant_generates_xxspltiw): Likewise.
* config/rs6000/rs6000.c (xxspltib_constant_p): Generate XXSPLTIW
if possible instead of XXSPLTIB and sign extending the constant.
(output_vec_const_move): Add support for XXSPLTIW.
(prefixed_xxsplti_p): New function.
(constant_generates_xxspltiw): New function.
* config/rs6000/rs6000.md (prefixed attribute): Add support to
mark XXSPLTI* instructions as being prefixed.
* config/rs6000/rs6000.opt (-msplat-word-constant): New debug
switch.
* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
generating XXSPLTIW or XXSPLTIDP.
(vsx_mov<mode>_32bit): Likewise.
* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
eP constraint.

gcc/testsuite/

* gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
* gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
* gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
* gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
* gcc.target/powerpc/vec-splati-runnable.c: Update insn count.

Add LXVKQ support.

This patch adds support to generate the LXVKQ instruction to load specific
IEEE-128 floating point constants.

Compared to the last time I submitted this patch, I modified it so that it
uses the bit pattern of the vector to see if it can generate the LXVKQ
instruction.  This means on a little endian Power<xxx> system, the
following code will generate a LXVKQ 34,16 instruction:

    vector long long foo (void)
    {
      return (vector long long) { 0x0000000000000000, 0x8000000000000000 };
    }

because that vector pattern is the same bit pattern as -0.0F128.

2021-12-14  Michael Meissner  <meissner@the-meissners.org>

gcc/

* config/rs6000/constraints.md (eQ): New constraint.
* config/rs6000/predicates.md (easy_fp_constant): Add support for
generating the LXVKQ instruction.
(easy_vector_constant_ieee128): New predicate.
(easy_vector_constant): Add support for generating the LXVKQ
instruction.
* config/rs6000/rs6000-protos.h (constant_generates_lxvkq): New
declaration.
* config/rs6000/rs6000.c (output_vec_const_move): Add support for
generating LXVKQ.
(constant_generates_lxvkq): New function.
* config/rs6000/rs6000.opt (-mieee128-constant): New debug
option.
* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
generating LXVKQ.
(vsx_mov<mode>_32bit): Likewise.
* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
eQ constraint.

gcc/testsuite/

* gcc.target/powerpc/float128-constant.c: New test.

Add new constant data structure.

This patch provides the data structure and function to convert a constant
(CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a constant) to
an array of bytes, half-words, words, and double words that can be loaded
into a 128-bit vector register.
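
As a rough sketch of what such a structure could look like, the following
standalone example keeps the same 128 bits in four views. The names
vec128_sketch and from_bytes are hypothetical, the layout is not the one of
vec_const_128bit_type, and the big-endian element order is an assumption made
for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical structure: one 128-bit constant in four element sizes.
struct vec128_sketch
{
  std::array<std::uint8_t, 16> bytes;
  std::array<std::uint16_t, 8> half_words;
  std::array<std::uint32_t, 4> words;
  std::array<std::uint64_t, 2> double_words;
};

// Build all views from 16 bytes given in big-endian order (assumption):
// each wider element is formed by shifting in successive bytes.
vec128_sketch
from_bytes (const std::array<std::uint8_t, 16> &b)
{
  vec128_sketch v{};
  v.bytes = b;
  for (std::size_t i = 0; i < b.size (); ++i)
    {
      v.half_words[i / 2]
        = static_cast<std::uint16_t> ((v.half_words[i / 2] << 8) | b[i]);
      v.words[i / 4] = (v.words[i / 4] << 8) | b[i];
      v.double_words[i / 8] = (v.double_words[i / 8] << 8) | b[i];
    }
  return v;
}
```

Having every view precomputed lets a later check ask, for instance, whether
all four words are equal without redoing the conversion.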

The next patches will use this data structure to generate code that
generates load of the vector/floating point registers using the XXSPLTIDP,
XXSPLTIW, and LXVKQ instructions that were added in power10.

2021-12-15 Michael Meissner <meissner@the-meissners.org>

gcc/

* config/rs6000/rs6000-protos.h (VECTOR_128BIT_BITS): New macro.
(VECTOR_128BIT_BYTES): Likewise.
(VECTOR_128BIT_HALF_WORDS): Likewise.
(VECTOR_128BIT_WORDS): Likewise.
(VECTOR_128BIT_DOUBLE_WORDS): Likewise.
(vec_const_128bit_type): New structure type.
(vec_const_128bit_to_bytes): New declaration.
* config/rs6000/rs6000.c (constant_int_to_128bit_vector): New
helper function.
(constant_fp_to_128bit_vector): New helper function.
(vec_const_128bit_to_bytes): New function.

[PR100518] store by mult pieces: keep addr in Pmode

The conversion of a MEM address to ptr_mode in
try_store_by_multiple_pieces was misguided: copy_addr_to_reg expects
Pmode for addresses.

for gcc/ChangeLog

PR target/100518
* builtins.c (try_store_by_multiple_pieces): Drop address
conversion to ptr_mode.

for gcc/testsuite/ChangeLog

PR target/100518
* gcc.target/aarch64/pr100518.c: New.

[PR100843] store by mult pieces: punt on max_len < min_len

The testcase confuses the code that detects min and max len for the
memset, so max_len ends up less than min_len.  That shouldn't be
possible, but the testcase requires us to handle this case.

The store-by-mult-pieces algorithm actually relies on min and max
lengths, so if we find them to be inconsistent, the best we can do is
punt.

for  gcc/ChangeLog

PR middle-end/100843
* builtins.c (try_store_by_multiple_pieces): Fail if min_len
is greater than max_len.

for  gcc/testsuite/ChangeLog

PR middle-end/100843
* gcc.dg/pr100843.c: New.

Daily bump.

Fix ICE. [PR103682]

Check is_gimple_assign before gimple_assign_rhs_code.

gcc/ChangeLog:

PR target/103682
* tree-ssa-ccp.c (optimize_atomic_bit_test_and): Check
is_gimple_assign before gimple_assign_rhs_code.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr103682.c: New test.

libstdc++: Support old and new T_FMT for en_HK locale [PR103687]

This checks whether the locale data for en_HK includes %p and adjusts
the string being tested accordingly. To account for Jakub's fix to make
%I parse "12" as 0 instead of 12, we need to change the expected value
for the case where the locale format doesn't include %p. Also change the
time from 12:00:00 to 12:02:01 so we can tell if the minutes and seconds
get mixed up.

libstdc++-v3/ChangeLog:

PR libstdc++/103687
* testsuite/22_locale/time_get/get_date/wchar_t/4.cc: Restore
original locale before returning.
* testsuite/22_locale/time_get/get_time/char/2.cc: Check for %p
in locale's T_FMT and adjust accordingly.
* testsuite/22_locale/time_get/get_time/wchar_t/2.cc: Likewise.

[PATCH] stddef.h: add support for musl typedef macro guards

The stddef.h header checks/sets various hardcoded toolchain/os specific
macro guards to prevent redefining types such as ptrdiff_t, wchar_t, or
size_t. However, without this patch, the file does not check/set the
typedef macro guards for musl libc. This causes types such as size_t to
be defined twice for files which include both musl's stdlib.h as well as
GCC's ginclude/stddef.h. This is, for example, the case for
libgo/sysinfo.c. If libgo/sysinfo.c has multiple typedefs for size_t
this confuses -fdump-go-spec and causes size_t not to be included in the
generated type definitions thereby causing a gcc-go compilation failure
on Alpine Linux Edge (which uses musl libc) with the following error:

sysinfo.go:7765:13: error: use of undefined type '_size_t'
7765 | type Size_t _size_t
      |             ^
libcall_posix.go:49:35: error: non-integer len argument in make
   49 |                 b := make([]byte, len)
      |

This commit fixes this issue by ensuring that ptrdiff_t, wchar_t, and size_t
are only defined once in the pre-processed libgo/sysinfo.c file by enhancing
gcc/ginclude/stddef.h with musl-specific typedef macro guards.
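
The guard pattern can be sketched in isolation as follows. This is an
illustration with hypothetical names (DEMO_DEFINED_size_t, demo_size_t), not
the text of stddef.h or of musl's headers. The two guarded regions stand for
two headers that both want to provide the same typedef.

```cpp
// First "header" (stands in for the libc): defines the type and the guard.
#ifndef DEMO_DEFINED_size_t
#define DEMO_DEFINED_size_t
typedef unsigned long demo_size_t;
#define DEMO_LIBC_TYPEDEF 1
#endif

// Second "header" (stands in for the compiler's header): checks the same
// guard, so its typedef is skipped when the first header came first.
#ifndef DEMO_DEFINED_size_t
#define DEMO_DEFINED_size_t
typedef unsigned long demo_size_t;
#define DEMO_COMPILER_TYPEDEF 1
#endif

// Records whether the second typedef was emitted.
#ifdef DEMO_COMPILER_TYPEDEF
constexpr bool compiler_header_defined_type = true;
#else
constexpr bool compiler_header_defined_type = false;
#endif
```

Because both regions test and set the same macro, the preprocessed output
contains the typedef only once, whichever header is included first.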

gcc/ChangeLog:

* ginclude/stddef.h (__DEFINED_ptrdiff_t): Add support for musl
libc typedef macro guard.
(__DEFINED_size_t): Ditto.
(__DEFINED_wchar_t): Ditto.

regrename: Skip renaming if instruction is noop move.

gcc/
* regrename.c (find_rename_reg): Return satisfied regno
if instruction is noop move.

libstdc++: Fix handling of invalid ranges in std::regex [PR102447]

std::regex currently allows invalid bracket ranges such as [\w-a] which
are only allowed by ECMAScript when in web browser compatibility mode.
It should be an error, because the start of the range is a character
class, not a single character. The current implementation of
_Compiler::_M_expression_term does not provide a way to reject this,
because we only remember a previous character, not whether we just
processed a character class (or collating symbol etc.).

This patch replaces the pair<bool, CharT> used to emulate
optional<CharT> with a custom class closer to pair<tribool,CharT>. That
allows us to track three states, so that we can tell when we've just
seen a character class.

With this additional state the code in _M_expression_term for processing
the _S_token_bracket_dash can be improved to correctly reject the [\w-a]
case, without regressing for valid cases such as [\w-] and [----].
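
The three-state idea can be sketched as below. This is a simplified
illustration with hypothetical names (bracket_state_sketch, can_start_range),
not the _BracketState class added by the patch.

```cpp
// Hypothetical three-state tracker for the previous item in a bracket
// expression: nothing yet, a single character, or a character class.
class bracket_state_sketch
{
public:
  void set_char (char c) { kind_ = kind::character; ch_ = c; }
  void set_class () { kind_ = kind::char_class; }
  void reset () { kind_ = kind::none; }

  bool is_none () const { return kind_ == kind::none; }
  bool is_char () const { return kind_ == kind::character; }
  bool is_class () const { return kind_ == kind::char_class; }
  char get () const { return ch_; }

private:
  enum class kind { none, character, char_class };
  kind kind_ = kind::none;
  char ch_ = '\0';
};

// A dash can only begin a range when the previous item was a single
// character.  After a class such as \w, the range must be rejected.
bool
can_start_range (const bracket_state_sketch &prev)
{
  return prev.is_char ();
}
```

A pair<bool, CharT> can only distinguish "have a character" from "have none",
so the state after \w looks the same as the state at the start of the bracket.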

libstdc++-v3/ChangeLog:

PR libstdc++/102447
* include/bits/regex_compiler.h (_Compiler::_BracketState): New
class.
(_Compiler::_BracketMatcher): New alias template.
(_Compiler::_M_expression_term): Change pair<bool, CharT>
parameter to _BracketState. Process first character for
ECMAScript syntax as well as POSIX.
* include/bits/regex_compiler.tcc
(_Compiler::_M_insert_bracket_matcher): Pass _BracketState.
(_Compiler::_M_expression_term): Use _BracketState to store
state between calls. Improve handling of dashes in ranges.
* testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
Add more tests for ranges containing dashes. Check invalid
ranges with character class at the beginning.

libstdc++: Simplify typedefs by using __UINTPTR_TYPE__

libstdc++-v3/ChangeLog:

* include/ext/pointer.h (_Relative_pointer_impl::_UIntPtrType):
Rename to uintptr_t and define as __UINTPTR_TYPE__.

libstdc++: Simplify definition of std::regex_constants variables

This removes the __syntax_option and __match_flag enumeration types,
which are only used to define enumerators with successive values that
are then used to initialize the std::regex_constants global variables.

By defining enumerators in the syntax_option_type and match_flag_type
enumeration types with the correct values for the globals we get rid of
two useless enumeration types that just count from 0 to N, and we
improve the debugging experience. Because the enumeration types now have
enumerators defined, GDB will print values in terms of those enumerators
e.g.

$6 = (std::regex_constants::_S_ECMAScript | std::regex_constants::_S_multiline)

Previously this would have been shown as simply 0x810 because there were
no enumerators of that type.
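
The approach can be sketched with a small standalone enumeration. The names
below are hypothetical, and the bit positions are an assumption chosen so that
the two values combine to the 0x810 mentioned above (0x010 | 0x800).

```cpp
// Hypothetical bitmask enumeration with enumerators that carry the
// final flag values directly, instead of counting from 0 to N.
enum syntax_option_sketch : unsigned
{
  opt_ecmascript = 1u << 4,  // 0x010
  opt_multiline = 1u << 11   // 0x800
};

constexpr syntax_option_sketch
operator| (syntax_option_sketch a, syntax_option_sketch b)
{
  return static_cast<syntax_option_sketch> (static_cast<unsigned> (a)
                                            | static_cast<unsigned> (b));
}

// constexpr compound assignment, usable in constant expressions
// from C++14 on.
constexpr syntax_option_sketch &
operator|= (syntax_option_sketch &a, syntax_option_sketch b)
{
  return a = a | b;
}

static_assert ((opt_ecmascript | opt_multiline) == 0x810u,
               "the two flags combine to 0x810");
```

Since the enumeration now has named enumerators for each bit, a debugger has
names available to describe a combined value.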

This changes the type and value of enumerators such as _S_grep, but
users should never be referring to them directly anyway.

libstdc++-v3/ChangeLog:

* include/bits/regex_constants.h (__syntax_option, __match_flag):
Remove.
(syntax_option_type, match_flag_type): Define enumerators.
Use to initialize globals. Add constexpr to compound assignment
operators.
* include/bits/regex_error.h (error_type): Add comment.
* testsuite/28_regex/constants/constexpr.cc: Remove comment.
* testsuite/28_regex/constants/error_type.cc: Improve comment.
* testsuite/28_regex/constants/match_flag_type.cc: Check bitmask
requirements.
* testsuite/28_regex/constants/syntax_option_type.cc: Likewise.

rs6000: Rename arrays to remove temporary _x suffix

While we had two sets of built-in infrastructure at once, I added _x as a
suffix to two arrays to disambiguate the old and new versions.  Time to fix
that also.

2021-12-06  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-c.c (altivec_build_resolved_builtin): Rename
rs6000_builtin_decls_x to rs6000_builtin_decls.
(altivec_resolve_overloaded_builtin): Likewise.  Also rename
rs6000_builtin_info_x to rs6000_builtin_info.
* config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Rename
rs6000_builtin_info_x to rs6000_builtin_info.
(rs6000_builtin_is_supported): Likewise.
(rs6000_gimple_fold_mma_builtin): Likewise.  Also rename
rs6000_builtin_decls_x to rs6000_builtin_decls.
(rs6000_gimple_fold_builtin): Rename rs6000_builtin_info_x to
rs6000_builtin_info.
(cpu_expand_builtin): Likewise.
(rs6000_expand_builtin): Likewise.
(rs6000_init_builtins): Likewise.  Also rename rs6000_builtin_decls_x
to rs6000_builtin_decls.
(rs6000_builtin_decl): Rename rs6000_builtin_decls_x to
rs6000_builtin_decls.
* config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
rename rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
rs6000_builtin_info_x to rs6000_builtin_info.
(write_bif_static_init): In generated code, rename
rs6000_builtin_info_x to rs6000_builtin_info.
(write_init_bif_table): In generated code, rename
rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
rs6000_builtin_info_x to rs6000_builtin_info.
(write_init_ovld_table): In generated code, rename
rs6000_builtin_decls_x to rs6000_builtin_decls.
(write_init_file): Likewise.
* config/rs6000/rs6000.c (rs6000_builtin_vectorized_function):
Likewise.
(rs6000_builtin_md_vectorized_function): Likewise.
(rs6000_builtin_reciprocal): Likewise.
(add_condition_to_bb): Likewise.
(rs6000_atomic_assign_expand_fenv): Likewise.

rs6000: Rename functions with "new" in their names

While we had two sets of built-in functionality at the same time, I put "new"
in the names of quite a few functions.  Time to undo that.

2021-12-02  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-c.c (altivec_resolve_new_overloaded_builtin):
Remove forward declaration.
(rs6000_new_builtin_type_compatible): Rename to
rs6000_builtin_type_compatible.
(rs6000_builtin_type_compatible): Remove.
(altivec_resolve_overloaded_builtin): Remove.
(altivec_build_new_resolved_builtin): Rename to
altivec_build_resolved_builtin.
(altivec_resolve_new_overloaded_builtin): Rename to
altivec_resolve_overloaded_builtin.  Remove static keyword.  Adjust
called function names.
* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): Remove
forward declaration.
(rs6000_gimple_fold_new_builtin): Likewise.
(rs6000_invalid_new_builtin): Rename to rs6000_invalid_builtin.
(rs6000_gimple_fold_builtin): Remove.
(rs6000_new_builtin_valid_without_lhs): Rename to
rs6000_builtin_valid_without_lhs.
(rs6000_new_builtin_is_supported): Rename to
rs6000_builtin_is_supported.
(rs6000_gimple_fold_new_mma_builtin): Rename to
rs6000_gimple_fold_mma_builtin.
(rs6000_gimple_fold_new_builtin): Rename to
rs6000_gimple_fold_builtin.  Remove static keyword.  Adjust called
function names.
(rs6000_expand_builtin): Remove.
(new_cpu_expand_builtin): Rename to cpu_expand_builtin.
(new_mma_expand_builtin): Rename to mma_expand_builtin.
(new_htm_spr_num): Rename to htm_spr_num.
(new_htm_expand_builtin): Rename to htm_expand_builtin.  Change name
of called function.
(rs6000_expand_new_builtin): Rename to rs6000_expand_builtin.  Remove
static keyword.  Adjust called function names.
(rs6000_new_builtin_decl): Rename to rs6000_builtin_decl.  Remove
static keyword.
(rs6000_builtin_decl): Remove.
* config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
rename rs6000_new_builtin_is_supported to rs6000_builtin_is_supported.
* config/rs6000/rs6000-internal.h (rs6000_invalid_new_builtin): Rename
to rs6000_invalid_builtin.
* config/rs6000/rs6000.c (rs6000_new_builtin_vectorized_function):
Rename to rs6000_builtin_vectorized_function.
(rs6000_new_builtin_md_vectorized_function): Rename to
rs6000_builtin_md_vectorized_function.
(rs6000_builtin_vectorized_function): Remove.
(rs6000_builtin_md_vectorized_function): Remove.

rs6000: Remove rs6000-builtin.def and associated data and functions

2021-12-02 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-builtin.def: Delete.
* config/rs6000/rs6000-call.c (builtin_compatibility): Delete.
(builtin_description): Delete.
(builtin_hash_struct): Delete.
(builtin_hasher): Delete.
(builtin_hash_table): Delete.
(builtin_hasher::hash): Delete.
(builtin_hasher::equal): Delete.
(rs6000_builtin_info_type): Delete.
(rs6000_builtin_info): Delete.
(bdesc_compat): Delete.
(bdesc_3arg): Delete.
(bdesc_4arg): Delete.
(bdesc_dst): Delete.
(bdesc_2arg): Delete.
(bdesc_altivec_preds): Delete.
(bdesc_abs): Delete.
(bdesc_1arg): Delete.
(bdesc_0arg): Delete.
(bdesc_htm): Delete.
(bdesc_mma): Delete.
(rs6000_overloaded_builtin_p): Delete.
(rs6000_overloaded_builtin_name): Delete.
(htm_spr_num): Delete.
(rs6000_builtin_is_supported_p): Delete.
(rs6000_gimple_fold_mma_builtin): Delete.
(gt-rs6000-call.h): Remove include directive.
* config/rs6000/rs6000-protos.h (rs6000_overloaded_builtin_p): Delete.
(rs6000_builtin_is_supported_p): Delete.
(rs6000_overloaded_builtin_name): Delete.
* config/rs6000/rs6000.c (rs6000_builtin_decls): Delete.
(rs6000_debug_reg_global): Remove reference to RS6000_BUILTIN_COUNT.
* config/rs6000/rs6000.h (rs6000_builtins): Delete.
(altivec_builtin_types): Delete.
(rs6000_builtin_decls): Delete.
* config/rs6000/t-rs6000 (TM_H): Don't add rs6000-builtin.def.

rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def

2021-12-02 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-builtin-new.def: Rename to...
* config/rs6000/rs6000-builtins.def: ...this.
* config/rs6000/rs6000-gen-builtins.c: Adjust header commentary.
* config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Rename
rs6000-builtin-new.def to rs6000-builtins.def.
(rs6000-builtins.c): Likewise.

rs6000: Remove altivec_overloaded_builtins array and initialization

2021-12-06 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Remove.
* config/rs6000/rs6000.h (altivec_overloaded_builtins): Remove.

rs6000: Do not allow combining of multiple assemble quads [PR103548]

The compiler will gladly CSE the result of two __builtin_mma_build_acc
calls with the same four vector arguments, leading to illegal MMA
code being generated.  The fix here is to make the mma_assemble_acc
pattern use an unspec_volatile to stop the CSE from happening.

2021-12-14  Peter Bergner  <bergner@linux.ibm.com>

gcc/
PR target/103548
* config/rs6000/mma.md (UNSPEC_MMA_ASSEMBLE): Rename unspec from this...
(UNSPEC_VSX_ASSEMBLE): ...to this.
(UNSPECV_MMA_ASSEMBLE): New unspecv.
(vsx_assemble_pair): Use UNSPEC_VSX_ASSEMBLE.
(*vsx_assemble_pair): Likewise.
(mma_assemble_acc): Use UNSPECV_MMA_ASSEMBLE.
(*mma_assemble_acc): Likewise.
* config/rs6000/rs6000.c (rs6000_split_multireg_move): Handle
UNSPEC_VOLATILE.  Use UNSPEC_VSX_ASSEMBLE and UNSPECV_MMA_ASSEMBLE.

gcc/testsuite/
PR target/103548
* gcc.target/powerpc/mma-builtin-10-pair.c: New test.
* gcc.target/powerpc/mma-builtin-10-quad.c: New test.

Fortran: prevent NULL pointer dereference in check of passed do-loop variable

gcc/fortran/ChangeLog:

PR fortran/103717
* frontend-passes.c (doloop_code): Prevent NULL pointer
dereference when checking for passing a do-loop variable to a
contained procedure with an interface mismatch.

gcc/testsuite/ChangeLog:

PR fortran/103717
* gfortran.dg/do_check_19.f90: New test.

Fortran: prevent NULL pointer dereferences checking do-loop contained stuff

gcc/fortran/ChangeLog:

PR fortran/103718
PR fortran/103719
* frontend-passes.c (doloop_contained_procedure_code): Add several
checks to prevent NULL pointer dereferences on valid and invalid
code called within do-loops.

gcc/testsuite/ChangeLog:

PR fortran/103718
PR fortran/103719
* gfortran.dg/do_check_18.f90: New test.

i386: Implement VxHF vector set/insert/extract with lower ABI levels

This is a preparation patch that moves VxHF vector set/insert/extract
expansions from AVX512FP16 ABI to lower ABIs. There are no functional
changes for -mavx512fp16 and a follow-up patch is needed to actually
enable VxHF vector modes for lower ABIs.

2021-12-14 Uroš Bizjak <ubizjak@gmail.com>

gcc/ChangeLog:

PR target/103571
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate)
<case E_V8HFmode>: Implement for TARGET_SSE2.
<case E_V16HFmode>: Implement for TARGET_AVX.
<case E_V32HFmode>: Implement for TARGET_AVX512F.
(ix86_expand_vector_set_var): Handle V32HFmode
without TARGET_AVX512BW.
(ix86_expand_vector_extract)
<case E_V8HFmode>: Implement for TARGET_SSE2.
<case E_V16HFmode>: Implement for TARGET_AVX.
<case E_V32HFmode>: Implement for TARGET_AVX512BW.
(expand_vec_perm_broadcast_1) <case E_V8HFmode>: New.
* config/i386/sse.md (VI12HF_AVX512VL): Remove
TARGET_AVX512FP16 condition.
(V): Ditto.
(V_256_512): Ditto.
(avx_vbroadcastf128_<mode>): Use V_256H mode iterator.

rs6000: Remove new_builtins_are_live and dead code it was guarding

To allow for a sane switch-over from the old built-in infrastructure to the
new, both sets of code have co-existed, with the enabled one under the control
of the boolean variable new_builtins_are_live. As a first step in removing the
old code, remove this variable and the now-dead code it was guarding.

2021-12-06 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/darwin.h (SUBTARGET_INIT_BUILTINS): Remove
test for new_builtins_are_live and simplify.
* config/rs6000/rs6000-c.c (altivec_build_resolved_builtin): Remove
dead function.
(altivec_resolve_overloaded_builtin): Remove test for
new_builtins_are_live and simplify.
* config/rs6000/rs6000-call.c (altivec_init_builtins): Remove forward
declaration.
(builtin_function_type): Likewise.
(rs6000_common_init_builtins): Likewise.
(htm_init_builtins): Likewise.
(mma_init_builtins): Likewise.
(def_builtin): Remove dead function.
(rs6000_expand_zeroop_builtin): Likewise.
(rs6000_expand_mtfsf_builtin): Likewise.
(rs6000_expand_mtfsb_builtin): Likewise.
(rs6000_expand_set_fpscr_rn_builtin): Likewise.
(rs6000_expand_set_fpscr_drn_builtin): Likewise.
(rs6000_expand_unop_builtin): Likewise.
(altivec_expand_abs_builtin): Likewise.
(rs6000_expand_binop_builtin): Likewise.
(altivec_expand_lxvr_builtin): Likewise.
(altivec_expand_lv_builtin): Likewise.
(altivec_expand_stxvl_builtin): Likewise.
(altivec_expand_stv_builtin): Likewise.
(mma_expand_builtin): Likewise.
(htm_expand_builtin): Likewise.
(cpu_expand_builtin): Likewise.
(rs6000_expand_quaternop_builtin): Likewise.
(rs6000_expand_ternop_builtin): Likewise.
(altivec_expand_dst_builtin): Likewise.
(altivec_expand_vec_sel_builtin): Likewise.
(altivec_expand_builtin): Likewise.
(rs6000_invalid_builtin): Likewise.
(rs6000_builtin_valid_without_lhs): Likewise.
(rs6000_gimple_fold_builtin): Remove test for new_builtins_are_live and
simplify.
(rs6000_expand_builtin): Likewise.
(rs6000_init_builtins): Remove tests for new_builtins_are_live and
simplify.
(rs6000_builtin_decl): Likewise.
(altivec_init_builtins): Remove dead function.
(mma_init_builtins): Likewise.
(htm_init_builtins): Likewise.
(builtin_quaternary_function_type): Likewise.
(builtin_function_type): Likewise.
(rs6000_common_init_builtins): Likewise.
* config/rs6000/rs6000-gen-builtins.c (write_header_file): Don't
declare new_builtins_are_live.
(write_init_bif_table): In generated code, remove test for
new_builtins_are_live and simplify.
(write_init_ovld_table): Likewise.
(write_init_file): Don't initialize new_builtins_are_live.
* config/rs6000/rs6000.c (rs6000_builtin_vectorized_function): Remove
test for new_builtins_are_live and simplify.
(rs6000_builtin_md_vectorized_function): Likewise.
(rs6000_builtin_reciprocal): Likewise.
(add_condition_to_bb): Likewise.
(rs6000_atomic_assign_expand_fenv): Likewise.

rs6000: Builtins for doubleword compare should be in [power8-vector] (PR103625)

2021-12-13 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
PR target/103625
* config/rs6000/rs6000-builtin-new.def (__builtin_altivec_vcmpequd):
Move to power8-vector stanza.
(__builtin_altivec_vcmpequd_p): Likewise.
(__builtin_altivec_vcmpgtsd): Likewise.
(__builtin_altivec_vcmpgtsd_p): Likewise.
(__builtin_altivec_vcmpgtud): Likewise.
(__builtin_altivec_vcmpgtud_p): Likewise.

rs6000: Some builtins require IBM-128 long double format (PR103623)

2021-12-14 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
PR target/103623
* config/rs6000/rs6000-builtin-new.def (__builtin_pack_longdouble): Add
ibmld attribute.
(__builtin_unpack_longdouble): Likewise.
* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): Add special
handling for ibmld attribute.
* config/rs6000/rs6000-gen-builtins.c (attrinfo): Add isibmld.
(parse_bif_attrs): Handle ibmld.
(write_decls): Likewise.
(write_bif_static_init): Likewise.

Add support for global rvalue initialization and constructors

This patch adds support for initialization of global variables
with rvalues and creating constructors for array, struct and
union types which can be used as rvalues.

Signed-off-by:
2021-12-14 Petter Tomner <tomner@kth.se>

gcc/jit/
* jit-common.h: New enum
* jit-playback.c : Folding and setting initial
(global_new_decl) : Handle const global generation
(new_global) : New flag
(global_set_init_rvalue) : New
(new_ctor) : New
(new_global_initialized) : Flag
(as_truth_value) : Fold
(new_unary_op) : Fold
(new_binary_op) : Fold
(new_comparison) : Fold
(new_array_access) : Fold
(new_dereference) : Fold
(get_address) : Fold
* jit-playback.h :
(global_set_init_rvalue) : New
(new_ctor) : New
* jit-recording.c :
* jit-recording.h :
(new_global_init_rvalue) : New
(new_ctor) : New
(ctor) : New, inherits rvalue
(global_init_rvalue) : New, inherits memento
(type::is_union) : New
* libgccjit++.h : New entrypoints, see C-header
* libgccjit.c : See .h
* libgccjit.h : New entrypoints
(gcc_jit_context_new_array_constructor) : New
(gcc_jit_context_new_struct_constructor) : New
(gcc_jit_context_new_union_constructor) : New
(gcc_jit_global_set_initializer_rvalue) : New
(LIBGCCJIT_HAVE_CTORS) : New feature macro
* libgccjit.map : New entrypoints added to ABI 19
* docs/topics/expressions.rst : Updated docs

gcc/testsuite/
* jit.dg/all-non-failing-tests.h: Added two tests
* jit.dg/test-error-ctor-array-wrong-obj.c: New
* jit.dg/test-error-ctor-struct-too-big.c: New
* jit.dg/test-error-ctor-struct-wrong-field-obj.c: New
* jit.dg/test-error-ctor-struct-wrong-type.c: New
* jit.dg/test-error-ctor-struct-wrong-type2.c: New
* jit.dg/test-error-ctor-union-wrong-field-name.c: New
* jit.dg/test-error-global-already-init.c: New
* jit.dg/test-error-global-common-section.c: New
* jit.dg/test-error-global-init-too-small-array.c: New
* jit.dg/test-error-global-lvalue-init.c: New
* jit.dg/test-error-global-nonconst-init.c: New
* jit.dg/test-global-init-rvalue.c: New
* jit.dg/test-local-init-rvalue.c: New

Fortran: PACK intrinsic should not try to read from zero-sized array

libgfortran/ChangeLog:

PR libfortran/103634
* intrinsics/pack_generic.c (pack_internal): Handle case when the
array argument of PACK has one or more extents of size zero to
avoid invalid reads.
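
The guard described above can be sketched in portable C. This is not the libgfortran code: example_pack and its parameters are hypothetical, and the sketch only shows the idea of returning before any element is read when an extent is zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: copy the elements of 'array' whose 'mask' entry is
   nonzero into 'result' and return how many were copied.  'extent' holds
   the size of each of the 'rank' dimensions.  */
size_t
example_pack (const int *array, const int *mask,
              const size_t *extent, size_t rank, int *result)
{
  size_t total = 1;

  for (size_t r = 0; r < rank; r++)
    {
      /* A zero extent means the array has no elements at all, so return
         before touching 'array' or 'mask'.  */
      if (extent[r] == 0)
        return 0;
      total *= extent[r];
    }

  /* Only a non-empty array needs valid data pointers.  */
  assert (array != NULL && mask != NULL && result != NULL);

  size_t n = 0;
  for (size_t i = 0; i < total; i++)
    if (mask[i])
      result[n++] = array[i];
  return n;
}
```

With an extent list such as { 3, 0 } the function returns 0 without dereferencing its data pointers, which is the situation the fix addresses.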

gcc/testsuite/ChangeLog:

PR libfortran/103634
* gfortran.dg/intrinsic_pack_6.f90: New test.

Determine global memory accesses in ipa-modref

As discussed in PR103585, fatigue2 is now the only benchmark from my usual testing
set (SPEC2k6, SPEC2k17, CPP benchmarks, polyhedron, Firefox, clang) which sees an
important regression when inlining of functions called once is limited.  This
prevents us from solving runtime issues in roms benchmarks and elsewhere.

The problem is that there is a perdida function that takes many arguments, and
some of them are array descriptors.  We constant propagate most of their fields
but still keep their initialization.  Because perdida is quite fast, the call
overhead dominates, since we need over 100 memory stores consuming about 35%
of the overall benchmark runtime.

The memory stores would be eliminated if perdida did not call Fortran I/O, which
makes modref think that the array descriptors could be accessed.  We are
quite close to discovering that they can't, because they do not escape from the
function.  This patch makes modref distinguish between global memory accesses
(only things that escape) and unknown accesses (which may also access
non-escaping things reaching the function).  This improves disambiguation for
functions containing error handling.
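
The distinction can be illustrated with a small C sketch. The names (struct descriptor, element_count, count_range) are hypothetical and stand in for the Fortran array descriptors and perdida; the sketch only shows the shape of the code, not what the optimizers do with it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a Fortran array descriptor.  */
struct descriptor
{
  int lower;
  int upper;
  int stride;
};

/* Reads only the descriptor it is handed.  On error it calls into stdio,
   which may touch global (escaped) memory but has no way to reach a
   caller's local object whose address was never stored anywhere.  */
static int
element_count (const struct descriptor *d)
{
  assert (d != NULL);
  if (d->stride <= 0)
    {
      fputs ("element_count: bad stride\n", stderr);
      return -1;
    }
  return (d->upper - d->lower) / d->stride + 1;
}

int
count_range (int lower, int upper, int stride)
{
  /* 'd' is local and its address is only passed to element_count, which
     does not save it, so 'd' does not escape.  */
  struct descriptor d = { lower, upper, stride };
  return element_count (&d);
}
```

Here the fputs call is the "error handling" case: it can only be a global memory access, so it cannot be the reason the stores into d must be kept.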

Unfortunately the patch hits two semi-latent issues in the Fortran frontend.
The first is wrong code in gfortran.dg/unlimited_polymorphic_3.f03.  This can be
turned into a wrong-code testcase on both mainline and gcc11 if the runtime
call is removed, so I filed PR 103662 for it.  There is a TBAA mismatch for a
structure produced in the FE.

The second is an issue with GOMP, where Fortran marks certain parameters as
non-escaping and then makes them escape via GOMP_parallel.  For this I disabled
the use of escape info in verify_arg, which also disables the useful transform on
perdida but still does useful work for e.g. GCC error handling.  I will work on
this incrementally.

Bootstrapped/regtested x86_64-linux, lto-bootstrapped and also tested with
clang build.  I plan to commit this tomorrow if there are no complaints
(the patch is not completely short but conceptually simple and handles a lot
of common cases).

gcc/ChangeLog:

2021-12-12  Jan Hubicka  <hubicka@ucw.cz>

PR ipa/103585
* ipa-modref-tree.c (modref_access_node::range_info_useful_p): Handle
MODREF_GLOBAL_MEMORY_PARM.
(modref_access_node::dump): Likewise.
(modref_access_node::get_call_arg): Likewise.
* ipa-modref-tree.h (enum modref_special_parms): Add
MODREF_GLOBAL_MEMORY_PARM.
(modref_access_node::useful_for_kill): Handle
MODREF_GLOBAL_MEMORY_PARM.
(modref:tree::merge): Add promote_unknown_to_global.
* ipa-modref.c (verify_arg): New function.
(may_access_nonescaping_parm_p): New function.
(modref_access_analysis::record_global_memory_load): New member
function.
(modref_access_analysis::record_global_memory_store): Likewise.
(modref_access_analysis::process_fnspec): Distinguish global and local
memory.
(modref_access_analysis::analyze_call): Likewise.
* tree-ssa-alias.c (ref_may_access_global_memory_p): New function.
(modref_may_conflict): Use it.

gcc/testsuite/ChangeLog:

2021-12-12  Jan Hubicka  <hubicka@ucw.cz>

* gcc.dg/analyzer/data-model-1.c: Disable ipa-modref.
* gcc.dg/uninit-38.c: Likewise.
* gcc.dg/uninit-pr98578.c: Likewise.

testsuite: Silence conversion warnings for MIN1 and MAX1

gcc/testsuite/ChangeLog:

PR fortran/91497
* gfortran.dg/pr91497.f90: Adjust test to use
dg-require-effective-target directive.
* gfortran.dg/pr91497_2.f90: New test to cover all targets.
Cover MAX1 and MIN1 intrinsics.

fortran: Silence conversion warnings for MIN1 and MAX1

gcc/fortran/ChangeLog:

PR fortran/91497
* simplify.c (simplify_min_max): Disable conversion warnings for
MIN1 and MAX1.

[PR99531] Do not scan push insn for ia32 in the test

The patch stops the test from scanning for push insns on ia32, since the absence of push insns is expected only for the x86_64 Linux ABI.

gcc/testsuite/ChangeLog:

PR target/99531
* gcc.target/i386/pr99531.c: Do not scan for ia32.

MAINTAINERS: Add myself to write after approval

Changelog:

* MAINTAINERS: Add myself to write after approval.

aarch64: Add LS64 extension and intrinsics

This patch is adding support for LS64 (Armv8.7-A Load/Store 64 Byte extension)
which is part of Armv8.7-A architecture. Changes include missing plumbing for
TARGET_LS64, LS64 data structure and intrinsics defined in ACLE. Machine
description of intrinsics is using new V8DI mode added in a separate patch.
__ARM_FEATURE_LS64 is defined if the Armv8.7-A LS64 instructions for atomic
64-byte access to device memory are supported.

New compiler internal type is added wrapping ACLE struct data512_t:

typedef struct {
uint64_t val[8];
} __arm_data512_t;
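
As a rough illustration of that layout, here is a portable C sketch. The example_* names are hypothetical, and example_load64 is a plain memcpy stand-in, not the atomic 64-byte access that the LS64 instructions provide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable mirror of the struct shown above.  */
typedef struct
{
  uint64_t val[8];
} example_data512_t;

/* Eight 64-bit words make up the 64 bytes (512 bits) of one LS64 access.  */
_Static_assert (sizeof (example_data512_t) == 64,
                "example_data512_t should be 64 bytes");

/* Non-atomic stand-in for a 64-byte load: copy 64 bytes from 'src'.  */
example_data512_t
example_load64 (const void *src)
{
  example_data512_t out;
  assert (src != NULL);
  memcpy (&out, src, sizeof out);
  return out;
}

/* Report whether the compiler advertises LS64 support through the
   __ARM_FEATURE_LS64 macro mentioned in the commit message.  */
int
example_have_ls64 (void)
{
#ifdef __ARM_FEATURE_LS64
  return 1;
#else
  return 0;
#endif
}
```

Code that wants the real intrinsics would presumably guard their use on __ARM_FEATURE_LS64 in the same way and fall back otherwise.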

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (enum aarch64_builtins):
Define AARCH64_LS64_BUILTIN_LD64B, AARCH64_LS64_BUILTIN_ST64B,
AARCH64_LS64_BUILTIN_ST64BV, AARCH64_LS64_BUILTIN_ST64BV0.
(aarch64_init_ls64_builtin_decl): Helper function.
(aarch64_init_ls64_builtins): Helper function.
(aarch64_init_ls64_builtins_types): Helper function.
(aarch64_general_init_builtins): Init LS64 intrinsics for
TARGET_LS64.
(aarch64_expand_builtin_ls64): LS64 intrinsics expander.
(aarch64_general_expand_builtin): Handle aarch64_expand_builtin_ls64.
(ls64_builtins_data): New helper struct.
(v8di_UP): New define.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_LS64.
* config/aarch64/aarch64.c (aarch64_classify_address): Enforce the
V8DI range (7-bit signed scaled) for both ends of the range.
* config/aarch64/aarch64-simd.md (movv8di): New pattern.
(aarch64_movv8di): New pattern.
* config/aarch64/aarch64.h (AARCH64_ISA_LS64): New define.
(TARGET_LS64): New define.
* config/aarch64/aarch64.md: Add UNSPEC_LD64B, UNSPEC_ST64B,
UNSPEC_ST64BV and UNSPEC_ST64BV0.
(ld64b): New define_insn.
(st64b): New define_insn.
(st64bv): New define_insn.
(st64bv0): New define_insn.
* config/aarch64/arm_acle.h (data512_t): New type derived from
__arm_data512_t.
(__arm_data512_t): New internal type.
(__arm_ld64b): New intrinsic.
(__arm_st64b): New intrinsic.
(__arm_st64bv): New intrinsic.
(__arm_st64bv0): New intrinsic.
* config/arm/types.md: Add new type ls64.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/ls64_asm.c: New test.
* gcc.target/aarch64/acle/ls64_ld64b.c: New test.
* gcc.target/aarch64/acle/ls64_ld64b-2.c: New test.
* gcc.target/aarch64/acle/ls64_ld64b-3.c: New test.
* gcc.target/aarch64/acle/ls64_st64b.c: New test.
* gcc.target/aarch64/acle/ls64_ld_st_o0.c: New test.
* gcc.target/aarch64/acle/ls64_st64b-2.c: New test.
* gcc.target/aarch64/acle/ls64_st64bv.c: New test.
* gcc.target/aarch64/acle/ls64_st64bv-2.c: New test.
* gcc.target/aarch64/acle/ls64_st64bv-3.c: New test.
* gcc.target/aarch64/acle/ls64_st64bv0.c: New test.
* gcc.target/aarch64/acle/ls64_st64bv0-2.c: New test.
* gcc.target/aarch64/acle/ls64_st64bv0-3.c: New test.
* gcc.target/aarch64/pragma_cpp_predefs_2.c: Add checks
for __ARM_FEATURE_LS64.

testsuite: fix ASAN errors

The tests failed on my machine as they contain out-of-bounds
access.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx2-psraq-1.c: Use ARRAY_SIZE.
* gcc.target/i386/m128-check.h: Move it to the top-level
context.
* gcc.target/i386/sse2-psraq-1.c: Use ARRAY_SIZE.
* gcc.target/i386/sse4_2-check.h: Include the header with
ARRAY_SIZE definition.
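
For readers unfamiliar with the idiom, a minimal sketch of an ARRAY_SIZE-bounded loop follows. The macro definition is the common C idiom and is assumed here, not copied from the testsuite header; samples, sample_count and sum_samples are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Common idiom, assumed here: number of elements in a true array.  */
#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

static const long long samples[] = { -8, -1, 0, 1, 1024 };

/* The element count is derived from the array itself, so it cannot drift
   out of sync with the initializer.  */
size_t
sample_count (void)
{
  return ARRAY_SIZE (samples);
}

/* Bounding the loop with ARRAY_SIZE keeps every access inside the array,
   which is what avoids the out-of-bounds reads ASAN reports.  */
long long
sum_samples (void)
{
  long long total = 0;
  assert (ARRAY_SIZE (samples) > 0);
  for (size_t i = 0; i < ARRAY_SIZE (samples); i++)
    total += samples[i];
  return total;
}
```

The macro only works on real arrays; applied to a pointer it silently yields the wrong count.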

libstdc++: Fix non-reserved name in <regex> header

libstdc++-v3/ChangeLog:

* include/bits/regex_compiler.tcc (_Compiler::_M_match_token):
Use reserved name for parameter.
* testsuite/17_intro/names.cc: Check "token".

c++: processing_template_decl vs template depth [PR103408]

We use processing_template_decl in two slightly different ways: as a
flag to signal that we're dealing with templated trees, and as a measure
of the current syntactic template nesting depth.  This overloaded
meaning of p_t_d is conceptually confusing and leads to bugs that we end
up working around in an ad-hoc fashion.

This patch replaces all uses of processing_template_decl that care about
its magnitude to instead look at the depth of current_template_parms
via a new macro current_template_depth.  This allows us to eliminate 3
workarounds in the concepts code: two about non-templated
requires-expressions (in constraint.cc) and one about lambdas inside
constraints (in cp_parser_requires_clause_expression etc).  This also
fixes the testcase in PR103408 about auto(x) used inside a non-templated
requires-expression.

The replacement was mostly mechanical, aside from two issues:

  * In synthesize_implicit_template_parm, when introducing a new template
    parameter list for an abbreviated function template, we need to add
    the new level of current_template_parms sooner, before calling
    process_template_parm, since this latter function now looks at
    current_template_depth to determine the level of the new parameter.

  * In instantiate_class_template_1 after substituting a template
    friend declaration, we currently increment processing_template_decl
    around the call to make_friend_class so that the friend_depth
    computation within this subroutine yields a nonzero value.  We could
    just replace this with an equivalent manipulation of
    current_template_depth, but this patch instead rewrites the
    friend_depth calculation within make_friend_class to not depend on
    p_t_d / c_t_d at all when called from instantiate_class_template_1.

PR c++/103408

gcc/cp/ChangeLog:

* constraint.cc (type_deducible_p): Remove workaround for
non-templated requires-expressions.
(normalize_placeholder_type_constraints): Likewise.
* cp-tree.h (current_template_depth): Define.
(PROCESSING_REAL_TEMPLATE_DECL): Inspect current_template_depth
instead of the magnitude of processing_template_decl.
* decl.c (start_decl): Likewise.
(grokfndecl): Likewise.
(grokvardecl): Likewise.
(grokdeclarator): Likewise.
* friend.c (make_friend_class): Likewise.  Calculate
friend_depth differently when called at instantiation time
instead of parse time.
(do_friend): Likewise.
* parser.c (cp_parser_requires_clause_expression): Remove
workaround for lambdas inside constraints.
(cp_parser_constraint_expression): Likewise.
(cp_parser_requires_expression): Likewise.
(synthesize_implicit_template_parm): Add to current_template_parms
before calling process_template_parm.
* pt.c (inline_needs_template_parms): Inspect
current_template_depth instead of the magnitude of
processing_template_decl.
(push_inline_template_parms_recursive): Likewise.
(maybe_begin_member_template_processing): Likewise.
(begin_template_parm_list): Likewise.
(process_template_parm): Likewise.
(end_template_parm_list): Likewise.
(push_template_decl): Likewise.
(add_inherited_template_parms): Likewise.
(instantiate_class_template_1): Don't adjust
processing_template_decl around the call to make_friend_class.
Rename adjust_processing_template_decl to adjust_template_depth.  Set
current_template_parms instead of processing_template_decl when
adjust_template_depth.
(make_auto_1): Inspect current_template_depth instead of the
magnitude of processing_template_decl.
(splice_late_return_type): Likewise.
* semantics.c (fixup_template_type): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic18.C: Expect a "constraints on a
non-templated function" error.
* g++.dg/cpp23/auto-fncast11.C: New test.

Remove fpic multilib on x86_64-vxworks

The addition of fPIC for shared libraries is performed
independently from multilibs and fpic multilibs have
no other particular purpose for VxWorks at this stage.

They incur extra build time, complexify the install tree
and are a bit tricky because -fpic is not supported for kernel
mode.

2021-12-14 Olivier Hainque <hainque@adacore.com>

gcc/
* config/i386/t-vxworks: Drop the fPIC multilibs.

c++: don't leak 'arglist' in build_new_op

gcc/cp/ChangeLog:

* call.c (build_new_op): Use releasing_vec for arglist. Declare
conv in the scope it's used.

c++: remove COMPOUND_EXPR_OVERLOADED flag

This flag is never set because non-dependent COMPOUND_EXPRs that resolve
to an overload are expressed as a CALL_EXPR at template definition time
(in build_x_compound_expr) ever since r6-5772.

gcc/cp/ChangeLog:

* cp-tree.h (COMPOUND_EXPR_OVERLOADED): Remove.
* pt.c (build_non_dependent_expr): Don't inspect the flag.
* tree.c (build_min_non_dep): Don't set the flag.

Drop the fpic multilib for powerpc*-vxworks*

The addition of fPIC for shared libraries is performed
independently from multilibs and the fpic multilibs have
no other particular purpose. They incur extra build time,
complexify the install tree and are a bit tricky because
-fpic is not supported for kernel mode.

2020-11-06 Fred Konrad <konrad@adacore.com>

gcc/
* config/rs6000/t-vxworks: Drop the fPIC multilib.

c: Fix ICE on deferred pragma in unknown attribute arguments [PR103587]

We ICE on the following testcase, because c_parser_balanced_token_sequence
when encountering a deferred pragma will just use c_parser_consume_token
which the FE doesn't allow for CPP_PRAGMA tokens (and if that wasn't
the case, it could ICE on CPP_PRAGMA_EOL similarly).
We don't know in what exact context the pragma appears when we don't
know what those arguments semantically mean, so I think we should just
skip over them, like e.g. the C++ FE does. And, I think (/[/{ vs. )/]/}
from outside of the pragma shouldn't be paired with those inside of
the pragma and it doesn't seem to be necessary to check that inside of
the pragma line itself all the paren kinds are balanced.

2021-12-14 Jakub Jelinek <jakub@redhat.com>

PR c/103587
* c-parser.c (c_parser_balanced_token_sequence): For CPP_PRAGMA,
consume the pragma and silently skip to the pragma eol.

* gcc.dg/pr103587.c: New test.

Adjust 'gfortran.dg/goacc/privatization-1-*' [PR103576, PR103697]

... for the recent commit 494ebfa7c9aacaeb6ec1fccc47a0e49f31eb2bb8
"Fortran: Handle compare in OpenMP atomic", which changes the GIMPLE IR
such that a temporary is no longer used; 'original' dump:

             x = *a;
    -        {
    -          integer(kind=4) D.4237;
    -
    -          D.4237 = *a;
               #pragma omp atomic relaxed
    -            &y = D.4237;
    -        }
    +          &y = *a;
           }

(I'm not familiar enough to comment on whether that's correct, but it appears
that the difference again disappears in later compiler passes.)

These OpenACC test cases verify behavior re OpenACC privatization levels, and
have to be adjusted accordingly.

gcc/testsuite/
PR fortran/103576
PR testsuite/103697
* gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.

Daily bump.

[PR99531] Modify pseudo class cost calculation when processing move involving the pseudo and a hard register

Pseudo class calculated on the 1st iteration should not have a
special treatment in cost calculation when processing move involving
the pseudo and a hard register.

gcc/ChangeLog:

PR target/99531
* ira-costs.c (record_operand_costs): Do not take pseudo class
calculated on the 1st iteration into account when processing move
involving the pseudo and a hard register.

gcc/testsuite/ChangeLog:

PR target/99531
* gcc.target/i386/pr99531.c: New test.

x86: Avoid generating orb $0, %ah

I'll post my proposed fix for PR target/103611 shortly, but this patch
fixes another missed optimization opportunity revealed by that PR.
Occasionally, reload materializes integer constants during register
allocation, resulting in unnecessary instructions such as:

(insn 23 31 24 2 (parallel [
            (set (reg:SI 0 ax [99])
                (ior:SI (reg:SI 0 ax [99])
                    (const_int 0 [0])))
            (clobber (reg:CC 17 flags))
        ]) "pr103611.c":18:73 550 {*iorsi_1}
     (nil))

These then get "optimized" during the split2 pass, which realizes that
no bits outside of 0xff00 are set, so this operation can be implemented
by operating on just the highpart of a QIreg_operand, i.e. %ah, %bh, %ch
etc., which leads to the useless "orb $0, %ah" seen in the reported PR.

This fix catches the case of const0_rtx in the relevant splitter, either
eliminating the instruction or turning it into a simple move.
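
The decision can be modelled in plain C. This is only a sketch of the reasoning, not the define_split from i386.md; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of looking at the constant operand of an OR.  */
enum example_split
{
  EXAMPLE_SPLIT_NONE,        /* constant has bits outside 0xff00 */
  EXAMPLE_SPLIT_HIGHPART,    /* can operate on the high byte only */
  EXAMPLE_SPLIT_NOP_OR_MOVE  /* constant is zero: nothing to OR in */
};

enum example_split
example_classify_ior (uint32_t c)
{
  if ((c & ~UINT32_C (0xff00)) != 0)
    return EXAMPLE_SPLIT_NONE;
  /* Zero also has no bits outside 0xff00, which is how the useless
     "orb $0, %ah" arose; treat it specially.  */
  if (c == 0)
    return EXAMPLE_SPLIT_NOP_OR_MOVE;
  return EXAMPLE_SPLIT_HIGHPART;
}

/* Model of the high-byte form: OR only bits 8..15 of x with those of c.  */
uint32_t
example_ior_highpart (uint32_t x, uint32_t c)
{
  uint32_t high = (x >> 8) & 0xff;
  assert ((c & ~UINT32_C (0xff00)) == 0);
  high |= (c >> 8) & 0xff;
  return (x & ~UINT32_C (0xff00)) | (high << 8);
}
```

For any constant confined to 0xff00 the high-byte form gives the same result as the full-width OR, and for a zero constant the value is unchanged, so the instruction can be dropped or reduced to a move.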

2021-12-13  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (define_split any_or:SWI248 -> orb %?h):
Optimize the case where the integer constant operand is zero.

gcc/testsuite/ChangeLog
* gcc.target/i386/pr103611-1.c: New test case.

Rework VXWORKS_LINK_SPEC for shared objects support

Split LINK_SPEC as BASE_LINK_SPEC + EXTRA_LINK_SPEC,
with an overridable LINK_OS component that cpu ports may
redefine.
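
The layering can be pictured with string-literal macros, which is how GCC spec strings are written. The EXAMPLE_* names and the placeholder contents below are hypothetical; the real VXWORKS_* specs hold actual linker options.

```c
#include <assert.h>
#include <string.h>

/* A cpu port may define its own OS component before this point; otherwise
   the generic default is used.  Placeholder contents throughout.  */
#ifndef EXAMPLE_LINK_OS_SPEC
#define EXAMPLE_LINK_OS_SPEC "<os-part>"
#endif

/* BASE pulls in the overridable OS component.  */
#define EXAMPLE_BASE_LINK_SPEC "<base-part> " EXAMPLE_LINK_OS_SPEC

#define EXAMPLE_EXTRA_LINK_SPEC "<extra-part>"

/* The full spec is the combination of BASE and EXTRA.  */
#define EXAMPLE_LINK_SPEC EXAMPLE_BASE_LINK_SPEC " " EXAMPLE_EXTRA_LINK_SPEC

const char *
example_link_spec (void)
{
  assert (strlen (EXAMPLE_LINK_SPEC) > 0);
  return EXAMPLE_LINK_SPEC;
}
```

Because adjacent string literals concatenate at compile time, a port that overrides only the OS component changes the combined spec without restating the rest.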

Leverage the latter on powerpc for VxWorks 7, where we incorporate
our specific bits in the linux os configuration as the system compiler
is now very close to a standard linux one.

The split allows supporting shared objects (shared libs and
non-static rtps) on recent versions of VxWorks while retaining
compatibility with older VxWorks targets which could link with
shared libraries but not build them.

2021-12-07 Doug Rupp <rupp@adacore.com>
Olivier Hainque <hainque@adacore.com>

gcc/
* config/vxworks.h (VXWORKS_LINK_OS_SPEC): New spec.
(VXWORKS_BASE_LINK_SPEC): New spec, using the former.
(VXWORKS_EXTRA_LINK_SPEC): New spec for old and new VxWorks.
(VXWORKS_LINK_SPEC): Combo of BASE and EXTRA specs.
* config/rs6000/vxworks.h (VXWORKS_LINK_OS_SPEC): Empty.
(LINK_OS_EXTRA_SPEC32): Use VXWORKS_LINK_SPEC.
(LINK_OS_EXTRA_SPEC64): Likewise.

Remove ppc*-vxworks7* inadequate libgcc Makefile fragments

t-linux assigns .so version numbers to a set of
symbols, some of which aren't included in the VxWorks libgcc
on powerpc (from ibm-ldouble.c, in particular).

t-slibgcc-libgcc yields a kind of .so file that the default
loader can't handle. This sort of extension to tmake_file for
shared libs will be better handled in a grouped fashion for
all targets anyway.

2021-12-13 Olivier Hainque <hainque@adacore.com>

* config.host (powerpc*-*-vxworks7*): Remove
rs6000/t-linux and t-slibgcc-libgcc from tmake_file.

Remove special case for arm-vxworks on the use of vxcrtstuff

Not needed any more after the recent cleanups issued for the
support of shared libraries.

2021-12-13 Olivier Hainque <hainque@adacore.com>

libgcc/
* config.host (*vxworks*): Remove special case for
arm on the use of vxcrtstuff.

Tighten libc_internal and crtstuff for VxWorks shared objects

This change tightens and documents the use of libc_internal, then
strengthens the VxWorks crtstuff objects for the support of shared
libraries. In particular:

- Define __dso_handle, which libstdc++.so requires,

- Provide _init and _fini functions to run through the init/fini arrays
  for shared libs in configurations which HAVE_INITFINI_ARRAY_SUPPORT.
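
What "running through the init/fini arrays" means can be sketched in portable C. The example_* names are hypothetical and the arrays are passed explicitly; a real _init/_fini would walk linker-provided array bounds. Running finalizers in reverse order follows the usual ELF convention and is assumed here.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*example_fn) (void);

/* Log of which hypothetical constructors/destructors ran, in order.  */
int example_log[8];
size_t example_log_len;

static void
example_record (int id)
{
  assert (example_log_len < sizeof example_log / sizeof example_log[0]);
  example_log[example_log_len++] = id;
}

void example_ctor_1 (void) { example_record (1); }
void example_ctor_2 (void) { example_record (2); }
void example_dtor_1 (void) { example_record (-1); }
void example_dtor_2 (void) { example_record (-2); }

/* Call every entry of [begin, end) in forward order.  */
void
example_init (example_fn *begin, example_fn *end)
{
  for (example_fn *f = begin; f < end; f++)
    (*f) ();
}

/* Call every entry of [begin, end) in reverse order.  */
void
example_fini (example_fn *begin, example_fn *end)
{
  for (example_fn *f = end; f > begin;)
    (*--f) ();
}
```

Calling example_init on { example_ctor_1, example_ctor_2 } and then example_fini on { example_dtor_1, example_dtor_2 } runs the constructors first to last and the destructors last to first.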

The init/fini functions are provided by libc_internal.a for static links
but with slightly different names and we don't want to risk dragging other
libc_internal contents in the closure accidentally so make sure we don't
link with it.

As for the !vxworks crtstuff, the new shared libs specific bits are
conditioned by a CRTSTUFFS_O macro, for which we provide a new Makefile
fragment.

The bits to actually use the fragment and the shared objects will
be added by a forthcoming change, as part of a more general configury
update for shared libs.

The change also guards the eh table registration code
in vxcrtstuff so the objects can be used for either init/fini
or eh tables independently.

2021-12-07  Fred Konrad  <konrad@adacore.com>
    Olivier Hainque  <hainque@adacore.com>

gcc/
* config/vxworks.h (VXWORKS_BASE_LIBS_RTP): Guard -lc_internal
on !shared+!non-static and document.
(VXWORKS_LIB_SPEC): Remove the bits intended to drag the
init/fini functions from libc_internal in the shared lib case.
(VX_CRTBEGIN_SPEC/VX_CRTEND_SPEC): Use vxcrtstuff objects also in
configurations with shared lib and INITFINI_ARRAY support.

libgcc/
* config/t-vxcrtstuffS: New Makefile fragment.
* config/vxcrtstuff.c: Provide __dso_handle. Provide _init/_fini
functions for INITFINI_ARRAY support in shared libs and guard
the definition of eh table registration functions on conditions
indicating they are needed.

VxWorks config fixes for shared objects

This strengthens the VxWorks configuration files for the support
of shared objects, which encompasses a VxWorks specific "non-static"
mode for RTPs (in addition to -static and -shared).

2020-11-06 Fred Konrad <konrad@adacore.com>
Olivier Hainque <hainque@adacore.com>

gcc/
* config/vx-common.h: Define REAL_LIBGCC_SPEC since the
'-non-static' option is not standard.
* config/vxworks.h (VXWORKS_LIBGCC_SPEC): Implement the LIBGCC_SPEC
since REAL_LIBGCC_SPEC is used now.
(STARTFILE_PREFIX_SPEC): Use the PIC VSB when building shared libraries
or non-static binaries.

Preserve cpu specific CRTSTUFF_T_CFLAGS on powerpc-vxworks7

The unconditional assignment performed in t-vxworks to handle
include flags currently overrides what specific cpu ports had
for the regular (!vxworks) crtstuff objects.

This was not done on purpose and the proposed change adjusts the
configuration bits to apply the vxworks specific flags on top of
the cpu ones instead.

2021-12-07 Olivier Hainque <hainque@adacore.com>

* config.host (powerpc*-wrs-vxworks7*): Place t-crtstuff
ahead of the other files in tmake_files.
* config/t-vxworks: Add to CRTSTUFF_T_CFLAGS instead of
overriding it.

Add -fipa-strict-aliasing

gcc/ChangeLog:

2021-12-13 Jan Hubicka <hubicka@ucw.cz>

* common.opt: Add -fipa-strict-aliasing.
* doc/invoke.texi: Document -fipa-strict-aliasing.
* ipa-modref.c (modref_access_analysis::record_access): Honor
-fipa-strict-aliasing.
(modref_access_analysis::record_access_lto): Likewise.

aarch64: Add command-line support for Armv8.8-a

This final patch in the series is much simpler and adds command-line support for -march=armv8.8-a,
making use of the +mops features added in the previous patches.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (armv8.8-a): Define.
* config/aarch64/aarch64.h (AARCH64_FL_V8_8): Define.
(AARCH64_FL_FOR_ARCH8_8): Define.
* doc/invoke.texi: Document -march=armv8.8-a.

aarch64: Use +mops to inline memset operations

This 3rd patch in the series adds an inline sequence for the memset operation.
The aarch64-mops-memset-size-threshold param is added to control the size threshold for the sequence.
Its default setting is 256, which may seem a bit high, but it is consistent with the current
SIMD memset inline sequence limit, and future CPU tunings can override it easily as needed.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_expand_setmem_mops): Define.
(aarch64_expand_setmem): Adjust for TARGET_MOPS.
* config/aarch64/aarch64.h (CLEAR_RATIO): Adjust for TARGET_MOPS.
(SET_RATIO): Likewise.
* config/aarch64/aarch64.md ("unspec"): Add UNSPEC_SETMEM.
(aarch64_setmemdi): Define.
(setmemdi): Adjust for TARGET_MOPS.
* config/aarch64/aarch64.opt (aarch64-mops-memset-size-threshold):
New param.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/mops_3.c: New test.

aarch64: Add memmove expansion for +mops

This second patch in the series adds an inline movmem expansion for TARGET_MOPS
that emits the recommended sequence.

A new param aarch64-mops-memmove-size-threshold is added to control the memmove size threshold
for this expansion. Its default value is zero to be consistent with the current behaviour where
we always emit a libcall, as we don't currently have a movmem inline expansion
(we should add a compatible-everywhere inline expansion, but that's for the future), so we should
always prefer to emit the MOPS sequence when available in lieu of a libcall.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_movmemdi): Define.
(movmemdi): Define.
(unspec): Add UNSPEC_MOVMEM.
* config/aarch64/aarch64.opt (aarch64-mops-memmove-size-threshold):
New param.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/mops_2.c: New test.

aarch64: Add support for Armv8.8-a memory operations and memcpy expansion

This patch adds the +mops architecture extension flag from the 2021 Arm Architecture extensions, Armv8.8-a.
The +mops extensions introduce instructions to accelerate the memcpy, memset, memmove standard functions.
The first patch here uses the instructions in the inline memcpy expansion.
Further patches in the series will use similar instructions to inline memmove and memset.

A new param, aarch64-mops-memcpy-size-threshold, is introduced to control the size threshold above which to
emit the new sequence. Its default setting is 256 bytes, which is the same as the current threshold above
which we'd emit a libcall.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (mops): Define.
* config/aarch64/aarch64.c (aarch64_expand_cpymem_mops): Define.
(aarch64_expand_cpymem): Define.
* config/aarch64/aarch64.h (AARCH64_FL_MOPS): Define.
(AARCH64_ISA_MOPS): Define.
(TARGET_MOPS): Define.
(MOVE_RATIO): Adjust for TARGET_MOPS.
* config/aarch64/aarch64.md ("unspec"): Add UNSPEC_CPYMEM.
(aarch64_cpymemdi): New pattern.
(cpymemdi): Adjust for TARGET_MOPS.
* config/aarch64/aarch64.opt (aarch64-mops-memcpy-size-threshold):
New param.
* doc/invoke.texi (AArch64 Options): Document +mops.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/mops_1.c: New test.
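
A minimal sketch of the kind of code this expansion targets, assuming a
copy whose size is a compile-time constant above the 256-byte default
threshold. The function name is hypothetical. Whether the inline sequence
is actually emitted depends on the target flags (for example
-march=armv8.8-a or the +mops extension) and on the value given to
--param aarch64-mops-memcpy-size-threshold=; on other targets this is an
ordinary memcpy.

```cpp
#include <cstring>

// Hypothetical example function: a constant-size copy of 512 bytes,
// which is above the 256-byte default threshold described above.
void copy_block(char* dst, const char* src) {
    std::memcpy(dst, src, 512);
}
```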

inline: fix ICE with -fprofile-generate

PR ipa/103636

gcc/ChangeLog:

* ipa-inline.c (can_inline_edge_p): Move logic checking
no_profile_instrument_function to ...
(can_early_inline_edge_p): ... here.

Include yvals.h for VxWorks < 7 RTPs as well

For -mrtp on VxWorks 6.9, at least inttypes.h ends up #including
system headers checking that _BITS_BYTES is 8, which the system yvals.h
defines. We do pre-include _yvals.h ahead of inttypes.h for this kind of
purpose, but it currently assumes that only VxWorks >= 7 provides yvals.h.

This results in unexpected configure check failures, complaining about
_BITS_BYTES not being 8, spotted while inspecting libstdc++ config.log for
unrelated reasons.

This change relaxes the guard in _yvals.h to include yvals.h for
__RTP__ in addition to version >= 7.

2021-12-13 Olivier Hainque <hainque@adacore.com>

* config/vxworks/_yvals.h: #include yvals.h also if
defined(__RTP__).

Ensure VxWorks headers expose C99 features for C++

C++ relies on C99 features since C++11 and libstdc++ down to c++98
checks for C99 features at configure time. Simpler is to request C99
features from system headers unconditionally.

2021-12-11 Olivier Hainque <hainque@adacore.com>

* config/vxworks.h (VXWORKS_OS_CPP_BUILTINS): Define
_C99 for C++.

Leverage sysroot for VxWorks

The build of a VxWorks toolchain relies a lot on system headers
and VxWorks has a few very specific features that require special
processing. For example, different sets of headers for the kernel
vs the rtp modes, which the compiler knows about by way of -mrtp
on the command line.

If we manage to avoid the need for fixincludes on recent versions
of VxWorks (>= 7), we still need to handle at least VxWorks 6.9 at
this stage.

We sort of get away with locating the correct headers at
run-time thanks to environment variables and various tests for
-mrtp in cpp specs, but getting fixincludes to work for old
configurations has always been tricky and getting a toolchain
to build with c++/libstdc++ support gets trickier with every
move to a more recent release.

sysroot_headers_suffix_spec is a pretty powerful device to help
address such issues, and this patch introduces changes that let
us take advantage of it.

The general idea is to leverage the assumption that compilations
occur with --sysroot=$VSB_DIR on vx7 or --sysroot=$WIND_BASE/target
prior to that.

For the toolchains we build, this is achieved with a few
configure options like:

  --with-sysroot
  --with-build-sysroot=${WIND_BASE}/target
  --with-specs=%{!sysroot=*:--sysroot=%:getenv(WIND_BASE /target)}

This also allows simplifying the libgcc compilation flags control
and we take the opportunity to merge t-vxworks7 into t-vxworks as
the two files were differing only on the libgcc2 flags part.

2021-12-09  Olivier Hainque  <hainque@adacore.com>

gcc/
* config/t-vxworks: Clear NATIVE_SYSTEM_HEADER_DIR.
* config/vxworks.h (SYSROOT_HEADERS_SUFFIX_SPEC): Define, for
VxWorks 7 and earlier.
(VXWORKS_ADDITIONAL_CPP_SPEC): Simplify accordingly.
(STARTFILE_PREFIX_SPEC): Adjust accordingly.
* config/rs6000/vxworks.h (STARTFILE_PREFIX_SPEC): Adjust.

libgcc/
* config/t-vxworks (LIBGCC2_INCLUDES): Simplify and handle
both VxWorks7 and earlier.
* config/t-vxworks7: Remove.
* config.host: Remove special case for vxworks7.

libstdc++: Add support for '?' in linker script globs

The scripts/make_exports.pl script used for darwin only replaces '*'
wildcards in globs, it doesn't handle '?'. This means the recent changes
to std::__timepunct exports broke darwin.

Rather than use mangled names in the linker script, this adds support
for '?' to the perl script.

This also removes some unnecessary escaping of the replacement strings
in s// substitutions.

libstdc++-v3/ChangeLog:

* scripts/make_exports.pl: Replace '?' with '.' when turning
a glob into a regex.
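
The glob handling described above can be sketched as follows. This is
not the Perl code from make_exports.pl; it is a C++ illustration with
hypothetical helper names, and the set of characters it escapes is an
assumption.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turn a linker-script style glob into a regex.
// '*' matches any run of characters and '?' matches exactly one
// character; regex metacharacters are escaped so they match literally.
std::string glob_to_regex(const std::string& glob) {
    std::string out;
    for (char c : glob) {
        switch (c) {
        case '*': out += ".*"; break;
        case '?': out += '.'; break;
        case '.': case '+': case '(': case ')': case '[': case ']':
        case '{': case '}': case '^': case '$': case '|': case '\\':
            out += '\\';
            out += c;
            break;
        default: out += c; break;
        }
    }
    return out;
}

// True if NAME matches GLOB in its entirety.
bool glob_match(const std::string& glob, const std::string& name) {
    return std::regex_match(name, std::regex(glob_to_regex(glob)));
}
```

With only '*' handled, a pattern containing '?' would be matched as a
literal question mark and would never match a mangled symbol name.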

Fortran: Handle compare in OpenMP atomic

gcc/fortran/ChangeLog:

PR fortran/103576
* openmp.c (is_scalar_intrinsic_expr): Fix condition.
(resolve_omp_atomic): Fix/update checks, accept compare.
* trans-openmp.c (gfc_trans_omp_atomic): Handle compare.

libgomp/ChangeLog:

* libgomp.texi (OpenMP 5.1): Set Fortran support for atomic to 'Y'.
* testsuite/libgomp.fortran/atomic-19.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/atomic-25.f90: Remove sorry, fix + add checks.
* gfortran.dg/gomp/atomic-26.f90: Likewise.
* gfortran.dg/gomp/atomic-21.f90: New test.

libstdc++: Make ranges::size and ranges::empty check for unbounded arrays

Passing IncompleteType(&)[] to ranges::begin produces an error outside
the immediate context, which is fine for ranges::begin, but it means
that we fail to enforce the SFINAE-able constraints for ranges::size and
ranges::empty. They should not be callable for any array of unknown
bound, whether the type is complete or not. Because we don't enforce
that in their constraints, we get a hard error when they try to use
ranges::begin.

This simply adds explicit checks for arrays of unknown bound to the
constraints for ranges::size and ranges::empty. We only need to check it
for the __sentinel_size and __eq_iter_empty concepts, because those are
the ones that are relevant to arrays, and which try to use
ranges::begin.

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (ranges::size, ranges::empty): Add
explicit check for unbounded arrays before using ranges::begin.
* testsuite/std/ranges/access/empty.cc: Check handling of unbounded
arrays.
* testsuite/std/ranges/access/size.cc: Likewise.
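
The explicit check can be sketched with a small trait. The names below
are hypothetical and this is not the libstdc++ implementation, which
uses concepts; the point is only that an array of unknown bound can be
detected from the type alone, without requiring the element type to be
complete.

```cpp
#include <type_traits>

// Hypothetical trait: true only for arrays of unknown bound such as T[].
template <typename T>
struct is_unbounded_array_sketch : std::false_type {};

template <typename T>
struct is_unbounded_array_sketch<T[]> : std::true_type {};

// Declared but never defined, so it stays an incomplete type.
struct Incomplete;

// The check needs no definition of Incomplete.
static_assert(is_unbounded_array_sketch<Incomplete[]>::value,
              "array of unknown bound is detected");
static_assert(!is_unbounded_array_sketch<int[4]>::value,
              "array of known bound is not");
```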

libstdc++: Fix std::regex_replace for strings with embedded null [PR103664]

The overload of std::regex_replace that takes a std::basic_string as the
fmt argument (for the replacement string) is implemented in terms of the
one taking a const C*, which uses std::char_traits to find the length.
That means it stops at a null character, even though the basic_string
might have additional characters beyond that.

Rather than duplicate the implementation of the const C* one for the
std::basic_string case, this moves that implementation to a new
__regex_replace function which takes a const C* and a length. Then both
the std::basic_string and const C* overloads can call that (with the
latter using char_traits to find the length to pass to the new
function).

libstdc++-v3/ChangeLog:

PR libstdc++/103664
* include/bits/regex.h (__regex_replace): Declare.
(regex_replace): Use it.
* include/bits/regex.tcc (__regex_replace): Replace regex_replace
definition with __regex_replace.
* testsuite/28_regex/algorithms/regex_replace/char/103664.cc: New test.
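
The refactoring pattern can be sketched without any regex machinery.
The functions below are hypothetical stand-ins, not the libstdc++ code:
one shared implementation takes a pointer and an explicit length, and
the two overloads differ only in how they obtain that length.

```cpp
#include <cstddef>
#include <string>

// Hypothetical shared implementation: pointer plus explicit length.
std::string wrap_impl(const char* fmt, std::size_t len) {
    return "[" + std::string(fmt, len) + "]";
}

// C-string overload: char_traits finds the length, so it stops at the
// first null character.
std::string wrap(const char* fmt) {
    return wrap_impl(fmt, std::char_traits<char>::length(fmt));
}

// std::string overload: passes the real size, so characters after an
// embedded null are preserved.
std::string wrap(const std::string& fmt) {
    return wrap_impl(fmt.data(), fmt.size());
}
```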

docs: add missing @item for the first item

gcc/ChangeLog:

* doc/extend.texi: Use @item for the first @itemx entry.

pch: Small cleanup

> Fixed thusly, compile tested on x86_64-linux, committed to trunk.

Here is a small cleanup.  IMHO we should use gt_pointer_operator instead of
specifying manually void (*) (void *, void *) or
void (*) (void *, void *, void *) so that next time we want to change it,
we don't have to trace all the spots.  I was afraid it wouldn't work due to
header dependencies, but it works well.  gengtype generated files also
use gt_pointer_operator.

2021-12-13  Jakub Jelinek  <jakub@redhat.com>

* machmode.h (gt_pch_nx): Use gt_pointer_operator as type of second
argument instead of equivalent void (*) (void *, void *, void *).
* poly-int.h (gt_pch_nx): Likewise.
* wide-int.h (gt_pch_nx): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (gt_pch_nx): Likewise.
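
A minimal sketch of why the typedef helps, using hypothetical names and
the three-argument signature quoted above. Declarations that use the
typedef keep compiling unchanged if the signature is later changed in
one place.

```cpp
#include <type_traits>

// Stand-in for gt_pointer_operator, assuming the three-argument form.
typedef void (*pointer_operator_sketch)(void*, void*, void*);

// The typedef and the spelled-out type are the same type.
static_assert(std::is_same<pointer_operator_sketch,
                           void (*)(void*, void*, void*)>::value,
              "typedef is equivalent to the manual spelling");

// Hypothetical walker declared with the typedef instead of the raw type.
inline void walk_field(void** field, pointer_operator_sketch op,
                       void* cookie) {
    op(field, nullptr, cookie);
}

// Hypothetical callback that counts how often it is invoked.
inline void count_call(void*, void*, void* cookie) {
    ++*static_cast<int*>(cookie);
}
```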

Do not ICE on ternary expressions when calculating value ranges

gcc/ChangeLog:

2021-12-12 Jan Hubicka <hubicka@ucw.cz>

PR ipa/103513
* ipa-fnsummary.c (evaluate_conditions_for_known_args): Do not ICE
on ternary expression.

gcc/testsuite/ChangeLog:

2021-12-12 Jan Hubicka <hubicka@ucw.cz>

PR ipa/103513
* gcc.c-torture/compile/pr103513.c: New test.

pragma: Update target option node when optimization changes [PR103515]

For a function with an optimize pragma, the target options may
change as the optimization options change.  Currently we create an
optimization option node when parsing the optimize pragma, but we
do not create a target option node for possible target option
changes.  As a result, later processing does not detect that the
target options can change and does not update them accordingly.

This patch checks whether the target options have changed when
creating the optimization option node for pragma optimize, and
creates a target option node if needed.  The associated test case
shows the difference.  Without this patch, the function foo1 will
perform unrolling, which is unexpected.  The reason is that the flag
unroll_only_small_loops isn't correctly set for it.  The value
is updated after parsing function foo2, but doesn't get restored
later since neither decl has DECL_FUNCTION_SPECIFIC_TARGET
set and the hook thinks we don't need to switch.  With this patch,
there is no unrolling for foo1, which is also consistent with the
behavior obtained by replacing the pragma with the attribute, both
with and without this patch.

As Martin noted, this change does something similar to what his
previous commit r12-1039 did.

gcc/ChangeLog:

PR target/103515
* attribs.c (decl_attributes): Check if target options change and
create one node if so.

gcc/testsuite/ChangeLog:

PR target/103515
* gcc.target/powerpc/pr103515.c: New test.
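
For readers unfamiliar with the two spellings compared above, here is a
sketch of the optimize pragma next to the optimize attribute. It is not
the pr103515.c test case; the function names and the choice of "O3" are
assumptions. Both forms are GCC extensions and other compilers may
ignore them with a warning.

```cpp
#pragma GCC push_options
#pragma GCC optimize ("O3")
// Compiled with the optimization options set by the pragma above.
int sum_with_pragma(const int* a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}
#pragma GCC pop_options

// The same request expressed per function with the attribute.
__attribute__((optimize("O3")))
int sum_with_attribute(const int* a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}
```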

Daily bump.

Replace gnu::unique_ptr with std::unique_ptr

Now that GCC is compiled as C++11 there is no need to keep the C++03
implementation of gnu::unique_ptr.

This removes the unique-ptr.h header and replaces it with <memory> in
system.h, and changes the INCLUDE_UNIQUE_PTR macro to INCLUDE_MEMORY.
Uses of gnu::unique_ptr and gnu::move can be replaced with
std::unique_ptr and std::move. There are no uses of unique_xmalloc_ptr
or xmalloc_deleter in GCC.

gcc/analyzer/ChangeLog:

* engine.cc: Define INCLUDE_MEMORY instead of INCLUDE_UNIQUE_PTR.

gcc/c-family/ChangeLog:

* known-headers.cc: Define INCLUDE_MEMORY instead of
INCLUDE_UNIQUE_PTR.
* name-hint.h: Likewise.
(class name_hint): Use std::unique_ptr instead of gnu::unique_ptr.

gcc/c/ChangeLog:

* c-decl.c: Define INCLUDE_MEMORY instead of INCLUDE_UNIQUE_PTR.
* c-parser.c: Likewise.

gcc/cp/ChangeLog:

* error.c: Define INCLUDE_MEMORY instead of
INCLUDE_UNIQUE_PTR.
* lex.c: Likewise.
* name-lookup.c: Likewise.
(class namespace_limit_reached): Use std::unique_ptr instead of
gnu::unique_ptr.
(suggest_alternatives_for): Use std::move instead of gnu::move.
(suggest_alternatives_in_other_namespaces): Likewise.
* parser.c: Define INCLUDE_MEMORY instead of INCLUDE_UNIQUE_PTR.

gcc/ChangeLog:

* Makefile.in: Remove unique-ptr-tests.o.
* selftest-run-tests.c (selftest::run_tests): Remove
unique_ptr_tests_cc_tests.
* selftest.h (unique_ptr_tests_cc_tests): Remove.
* system.h: Check INCLUDE_MEMORY instead of INCLUDE_UNIQUE_PTR
and include <memory> instead of "unique-ptr.h".
* unique-ptr-tests.cc: Removed.

include/ChangeLog:

* unique-ptr.h: Removed.
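
The replacement is mechanical. A minimal sketch with hypothetical types
(not the GCC classes touched above) shows the C++11 spellings:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical payload type.
struct hint_sketch {
    std::string text;
};

// C++11 has no std::make_unique, so the pointer is built from new.
std::unique_ptr<hint_sketch> make_hint(const std::string& s) {
    return std::unique_ptr<hint_sketch>(new hint_sketch{s});
}

// Hypothetical owner: takes ownership by value and moves it into place.
struct holder_sketch {
    std::unique_ptr<hint_sketch> m_hint;
    explicit holder_sketch(std::unique_ptr<hint_sketch> h)
        : m_hint(std::move(h)) {}
};
```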

libgccjit: Add support for setting the link section of global variables [PR100688]

2021-12-12 Antoni Boucher <bouanto@zoho.com>

gcc/jit/
PR target/100688
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_18): New ABI
tag.
* docs/topics/expressions.rst: Add documentation for the
function gcc_jit_lvalue_set_link_section.
* jit-playback.h: New function (set_link_section).
* jit-recording.c: New function (set_link_section) and
support for setting the link section.
* jit-recording.h: New function (set_link_section) and new
field m_link_section.
* libgccjit.c: New function (gcc_jit_lvalue_set_link_section).
* libgccjit.h: New function (gcc_jit_lvalue_set_link_section).
* libgccjit.map (LIBGCCJIT_ABI_18): New ABI tag.

gcc/testsuite/
PR target/100688
* jit.dg/all-non-failing-tests.h: Mention new test
link-section-assembler.
* jit.dg/test-link-section-assembler.c: New test.
* jit.dg/jit.exp: New helper function to test that the
assembly contains a pattern.

nvptx: Add (experimental) support for HFmode with -misa=sm_53

The recent flurry of activity around HFmode on gcc-patches intrigued me
to investigate adding HFmode support to the nvptx backend. NVidia GPUs
with an SM ISA above 5.3 support IEEE 16-bit floating point instructions.
Hence, this patch adds support for -misa=sm_53, and implements some
backend patterns/insns sufficient for a proof-of-concept prototype.

The following has been tested on nvptx-none, hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures.

gcc/ChangeLog:

* config/nvptx/nvptx-opts.h (ptx_isa): Add PTX_ISA_SM53 ISA level
to enumeration.
* config/nvptx/nvptx.opt: Add sm_53 to -misa.
* config/nvptx/nvptx-modes.def: Add support for HFmode.
* config/nvptx/nvptx.h (TARGET_SM53):
New helper macro to conditionalize functionality on target ISA.
* config/nvptx/nvptx-c.c (nvptx_cpu_cpp_builtins): Add __PTX_SM__
support for the new ISA levels.
* config/nvptx/nvptx.c (nvptx_ptx_type_from_mode): Support new HFmode
with the ".f16" suffix/qualifier.
(nvptx_file_start): Add support for TARGET_SM53.
(nvptx_omp_device_kind_arch_isa): Add support for TARGET_SM53
and tweak TARGET_SM35.
(nvptx_scalar_mode_supported_p): Target hook with conditional
HFmode support on TARGET_SM53 and higher.
(nvptx_libgcc_floating_mode_supported_p): Likewise.
(TARGET_SCALAR_MODE_SUPPORTED_P): Use nvptx_scalar_mode_supported_p.
(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Likewise, use new hook.
* config/nvptx/nvptx.md (*movhf_insn): New define_insn.
(movhf): New define_expand for HFmode moves.
(addhf3, subhf3, mulhf, extendhf<mode>2, trunc<mode>hf2): New
instructions conditional on TARGET_SM53 (i.e. -misa=sm_53).

gcc/testsuite/ChangeLog:

* gcc.target/nvptx/float16-1.c: New test case.

Terminate BB analysis on NULL memory access in ipa-pure-const and ipa-modref

As discussed in the PR, we miss some optimization because
gimple-ssa-isolate-paths turns NULL memory accesses to volatile and adds
__builtin_trap after them.  This is seen as a side-effect by IPA analysis
and additionally the (fully unreachable) builtin_trap is believed to load
all global memory.

I think we should think of less intrusive gimple representation of this, but
it is also easy enough to special case that in IPA analysers as done in
this patch.  This is a win even if we improve the representation since
gimple-ssa-isolate-paths is run late and this way we improve optimization
early.

This affects 1623 functions during cc1plus link.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

2021-12-12  Jan Hubicka  <hubicka@ucw.cz>

PR ipa/103665
* ipa-modref.c (modref_access_analysis::analyze): Terminate BB
analysis on NULL memory access.
* ipa-pure-const.c (analyze_function): Likewise.

Daily bump.

libgccjit: Add support for TLS variable [PR95415]

2021-12-11 Antoni Boucher <bouanto@zoho.com>

gcc/jit/
PR target/95415
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_17): New ABI
tag.
* docs/topics/expressions.rst: Add documentation for the function
gcc_jit_lvalue_set_tls_model.
* jit-playback.h: New function (set_tls_model).
* jit-recording.c: New function (set_tls_model), new
variables (tls_models and tls_model_enum_strings) and support
for setting the tls model.
* jit-recording.h: New function (set_tls_model) and new
field m_tls_model.
* libgccjit.c: New function (gcc_jit_lvalue_set_tls_model).
* libgccjit.h: New function (gcc_jit_lvalue_set_tls_model)
and new enum (gcc_jit_tls_model).
* libgccjit.map (LIBGCCJIT_ABI_17): New ABI tag.

gcc/testsuite/
PR target/95415
* jit.dg/all-non-failing-tests.h: Add test-tls.c.
* jit.dg/test-tls.c: New test.

libgccjit: Add support for types used by atomic builtins [PR96066] [PR96067]

2021-12-11 Antoni Boucher <bouanto@zoho.com>

gcc/jit/
PR target/96066
PR target/96067
* jit-builtins.c: Implement missing types for builtins.
* jit-recording.c: Allow sending a volatile const void * as
argument.
* jit-recording.h: New functions (is_volatile, is_const) and
allow comparing qualified types.

gcc/testsuite/
PR target/96066
PR target/96067
* jit.dg/all-non-failing-tests.h: Add test-builtin-types.c.
* jit.dg/test-builtin-types.c
* jit.dg/test-error-bad-assignment.c
* jit.dg/test-fuzzer.c: Add fuzzing for type qualifiers.

Signed-off-by: Antoni Boucher <bouanto@zoho.com>

Fortran: fix checking of elemental functions of type CLASS

gcc/fortran/ChangeLog:

PR fortran/103606
* resolve.c (resolve_fl_procedure): Do not access CLASS components
before class container has been built.

gcc/testsuite/ChangeLog:

PR fortran/103606
* gfortran.dg/pr103606.f90: New test.

Avoid updating hot bb threshold in call speculation code

This patch removes an apparently forgotten debugging hack (which got in during
the speculative call patchset) which reduces the hot bb threshold.  This does not
make sense since it is set and reset randomly as the summaries are processed.
One problem is that we set the BB threshold to make certain BBs hot and then
unrolling or vectorization may reduce it to some fraction of the count that
makes it cold.  We may want to add some buffer and divide the value by,
say 32, but that should be done independently of speculative calls.

gcc/ChangeLog:

2021-12-11  Jan Hubicka  <hubicka@ucw.cz>

* ipa-profile.c (ipa_profile): Do not update hot bb threshold.

Fix handling of thunks in ipa-modref

Thunks are not transparent for ipa-modref summary since it cares about offsets
from pointer parameters and also, for virtual thunks, about the read from memory
in there.  However, we use function_or_virtual_thunk_symbol to get the summary,
which may lead to wrong code (and does in two testsuite testcases with a patch
I am working on).  This is a first aid fix that is backportable to gcc 11.
We could easily produce summary for thunk on demand.  I will look into it
incrementally.  It is not very important since we usually inline the thunk when
we devirtualize...

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

2021-12-11  Jan Hubicka  <hubicka@ucw.cz>

* ipa-modref.c (get_modref_function_summary): Use ultimate_alias_target.
(ignore_edge): Likewise.
(compute_parm_map): Likewise.
(modref_propagate_in_scc): Likewise.
(modref_propagate_flags_in_scc): Likewise.

libgcc: vxcrtstuff.c: make ctor/dtor functions static

When the translation unit itself creates pointers to the ctors/dtors
in a specific section handled by the linker (whether .init_array or
.ctors.*), there's no reason for the functions to have external
linkage. That ends up polluting the symbol table in the running
kernel.

This makes vxcrtstuff.c on par with the generic crtstuff.c which also
defines e.g. frame_dummy and __do_global_dtors_aux static.

libgcc/
* config/vxcrtstuff.c: Make constructor and destructor
functions static when possible.
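
A minimal sketch of the idea, assuming a GCC-compatible compiler. The
names are hypothetical and this is not the vxcrtstuff.c code. The
constructor attribute makes the compiler place a pointer to the function
in the init section, so the function itself needs no external linkage.

```cpp
// Set by the constructor below; constant-initialized to 0 at load time.
static int g_initialized = 0;

// static: the symbol stays local to this translation unit, yet the
// function still runs before main because of the attribute.
static void __attribute__((constructor)) init_sketch(void) {
    g_initialized = 1;
}

// Matching destructor, run at exit; also kept local.
static void __attribute__((destructor)) fini_sketch(void) {
    g_initialized = 0;
}

// Accessor used to observe that the constructor has already run.
int initialized_before_main() {
    return g_initialized;
}
```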

libgcc: vxcrtstuff.c: remove ctor/dtor declarations

These declarations prevent the priority given in the
constructor/destructor attributes from taking effect, thus emitting
the function pointers in the ordinary (lowest-priority)
.init_array/.fini_array sections.

libgcc/
* config/vxcrtstuff.c: Remove constructor/destructor
declarations.

libstdc++: check length in string append [PR103534]

In the testcase for 103534 we get a warning about append leading to memcpy
of a very large number of bytes overflowing the buffer. This turns out to
be because we weren't calling _M_check_length for string append. Rather
than do that directly, let's go through the public pointer append that calls
it.

PR c++/103534

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (append (basic_string)): Call pointer
append instead of _M_append directly.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wstringop-overflow-8.C: New test.

Daily bump.

libgcc, Darwin: Update darwin10 unwinder shim dependencies.

We include libgcc_tm.h to provide a prototype for this shim
so add that to the make dependencies.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libgcc/ChangeLog:

* config/t-darwin: Add libgcc_tm.h to the dependencies
for darwin10-unwind-find-enc-func.

jit: set DECL_CONTEXT of RESULT_DECL [PR103562]

libgccjit was failing to set the DECL_CONTEXT of function RESULT_DECLs,
leading to them failing to be properly handled by the inlining machinery.
Fixed thusly.

gcc/jit/ChangeLog:
PR jit/103562
* jit-playback.c (gcc::jit::playback::context::new_function): Set
DECL_CONTEXT of the result_decl.

gcc/testsuite/ChangeLog:
PR jit/103562
* jit.dg/all-non-failing-tests.h: Add comment about...
* jit.dg/test-pr103562.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>