git.ipfire.org Git - thirdparty/gcc.git/log

PR fortran/93308/93963/94327/94331/97046 problems raised by descriptor handling

Fortran: Fix attributes and bounds in ISO_Fortran_binding.

2021-07-26 José Rui Faustino de Sousa <jrfsousa@gmail.com>
Tobias Burnus <tobias@codesourcery.com>

PR fortran/93308
PR fortran/93963
PR fortran/94327
PR fortran/94331
PR fortran/97046

gcc/fortran/ChangeLog:

* trans-decl.c (convert_CFI_desc): Only copy out the descriptor
if necessary.
* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Updated attribute
handling which reflect a previous intermediate version of the
standard. Only copy out the descriptor if necessary.

libgfortran/ChangeLog:

* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc): Add code
to verify the descriptor. Correct bounds calculation.
(gfc_desc_to_cfi_desc): Add code to verify the descriptor.

gcc/testsuite/ChangeLog:

* gfortran.dg/ISO_Fortran_binding_1.f90: Add pointer attribute,
this test is still erroneous but now it compiles.
* gfortran.dg/bind_c_array_params_2.f90: Update regex to match
code changes.
* gfortran.dg/PR93308.f90: New test.
* gfortran.dg/PR93963.f90: New test.
* gfortran.dg/PR94327.c: New test.
* gfortran.dg/PR94327.f90: New test.
* gfortran.dg/PR94331.c: New test.
* gfortran.dg/PR94331.f90: New test.
* gfortran.dg/PR97046.f90: New test.

Abstract out conditional simplification out of execute_vrp.

VRP simplifies conditionals involving casted values outside of the main
folding mechanism, because this optimization inhibits the VRP jump
threader from threading through the comparison.

As part of replacing VRP with an evrp instance, I am making sure we do
everything VRP does.  Hence, I am abstracting this functionality out so
we can call it from from elsewhere.

ISTM that when the proposed ranger-based jump threader can handle
everything the forward threader does, there will be no need for this
optimization to be done outside of the evrp folder.  Perhaps we can fold
this into the substitute_using_ranges class.  But that's further down
the line.

Also, there is no need to pass a vr_values around, when the base
range_query class will do.  I fixed this, at it makes it trivial to pass
down a ranger or evrp instance.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-vrp.c (vrp_simplify_cond_using_ranges): Rename vr_values
with range_query.
(execute_vrp): Abstract out simplification of conditionals...
(simplify_casted_conds): ...here.

Pass gimple context to array_bounds_checker.

I have changed the use of the array_bounds_checker in VRP to use a
ranger in my local tree to make sure there are no regressions when using
either VRP or the ranger. In doing so I noticed that the checker
does not pass context to get_value_range, which causes the ranger to miss a
few cases. This patch fixes the oversight.

Tested on x86-64 Linux using the array bounds checker both with VRP and
the ranger.

gcc/ChangeLog:

* gimple-array-bounds.cc (array_bounds_checker::get_value_range):
Add gimple argument.
(array_bounds_checker::check_array_ref): Same.
(array_bounds_checker::check_addr_expr): Same.
(array_bounds_checker::check_array_bounds): Pass statement to
check_array_bounds and check_addr_expr.
* gimple-array-bounds.h (check_array_bounds): Add gimple argument.
(check_addr_expr): Same.
(get_value_range): Same.

AArch64: correct dot-product RTL patterns for aarch64.

The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where intrinsics expects them.

The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.

However operand[3] is not expected to be written to so the current pattern is
bogus.

Instead I rewrite the RTL to be in canonical ordering and merge them.

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (sdot, udot): Rename to..
(sdot_prod, udot_prod): ... This.
* config/aarch64/aarch64-simd.md (aarch64_<sur>dot<vsi2qi>): Merged
into...
(<sur>dot_prod<vsi2qi>): ... this.
(aarch64_<sur>dot_lane<vsi2qi>, aarch64_<sur>dot_laneq<vsi2qi>):
Change operands order.
(<sur>sadv16qi): Use new operands order.
* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32,
vdotq_s32): Use new RTL ordering.

AArch64: correct usdot vectorizer and intrinsics optabs

There's a slight mismatch between the vectorizer optabs and the intrinsics
patterns for NEON. The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

This means we need different patterns here. This adds a separate usdot
vectorizer pattern which just shuffles around the RTL params.

There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
patterns which is not corrected here.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SUSS,
aarch64_types_ternop_suss_qualifiers): New.
* config/aarch64/aarch64-simd-builtins.def (usdot_prod): Use it.
* config/aarch64/aarch64-simd.md (usdot_prod<vsi2qi>): Re-organize RTL.
* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Use it.

openmp: Add support for omp attributes section and scan directives

This patch adds support for expressing the section and scan directives
using the attribute syntax and additionally fixes some bugs in the attribute
syntax directive handling.
For now it requires that the scan and section directives appear as the only
attribute, not combined with other OpenMP or non-OpenMP attributes on the same
statement.

2021-07-26  Jakub Jelinek  <jakub@redhat.com>

* parser.h (struct cp_lexer): Add orphan_p member.
* parser.c (cp_parser_statement): Don't change in_omp_attribute_pragma
upon restart from CPP_PRAGMA handling.  Fix up condition when a lexer
should be destroyed and adjust saved_tokens if it records tokens from
the to be destroyed lexer.
(cp_parser_omp_section_scan): New function.
(cp_parser_omp_scan_loop_body): Use it.  If
parser->lexer->in_omp_attribute_pragma, allow optional comma
after scan.
(cp_parser_omp_sections_scope): Use cp_parser_omp_section_scan.

* g++.dg/gomp/attrs-1.C: Use attribute syntax even for section
and scan directives.
* g++.dg/gomp/attrs-2.C: Likewise.
* g++.dg/gomp/attrs-6.C: New test.
* g++.dg/gomp/attrs-7.C: New test.
* g++.dg/gomp/attrs-8.C: New test.

Daily bump.

[Ada] Declare time_t uniformly based on a system parameter #2

gcc/ada/

* libgnat/s-osprim__x32.adb: Add missing with clause.

Daily bump.

include: Fix -Wundef warnings in ansidecl.h

This quashes -Wundef warnings in ansidecl.h when compiled in C or C++.
In C, __cpp_constexpr and __cplusplus aren't defined so we evaluate
them to 0; conversely, __STDC_VERSION__ is not defined in C++.
This has caused grief when -Wundef is used with -Werror.

I've also tested -traditional-cpp.

include/ChangeLog:

* ansidecl.h: Check if __cplusplus is defined before checking
the value of __cpp_constexpr and __cplusplus. Don't check
__STDC_VERSION__ in C++.

Daily bump.

Fortran: extend check for array arguments and reject CLASS array elements.

gcc/fortran/ChangeLog:

PR fortran/101536
* check.c (array_check): Adjust check for the case of CLASS
arrays.

gcc/testsuite/ChangeLog:

PR fortran/101536
* gfortran.dg/pr101536.f90: New test.

expmed: Fix store_integral_bit_field [PR101562]

Our documentation says that paradoxical subregs shouldn't appear
in strict_low_part:
'(strict_low_part (subreg:M (reg:N R) 0))'
     This expression code is used in only one context: as the
     destination operand of a 'set' expression.  In addition, the
     operand of this expression must be a non-paradoxical 'subreg'
     expression.
but on the testcase below that triggers UB at runtime
store_integral_bit_field emits exactly that.

The following patch fixes it by ensuring the requirement is satisfied.

2021-07-23  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/101562
* expmed.c (store_integral_bit_field): Only use movstrict_optab
if the operand isn't paradoxical.

* gcc.c-torture/compile/pr101562.c: New test.

Use range_query object in array bounds class.

Now that all dependencies of array_bounds_checker take a range_query, we
can sever the relationship with vr_values. Changing this will allow us
to use the array_bounds_checker with VRP, evrp, or the ranger.

Tested on x86-64 Linux.

gcc/ChangeLog:

* gimple-array-bounds.h (class array_bounds_checker): Change
ranges type to range_query.

aarch64: Use memcpy to copy vector tables in vst1[q]_x2 intrinsics

Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst1[q]_x2
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x2 intrinsics.

gcc/ChangeLog:

2021-07-23 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vst1_s64_x2): Use
__builtin_memcpy instead of constructing
__builtin_aarch64_simd_oi one vector at a time.
(vst1_u64_x2): Likewise.
(vst1_f64_x2): Likewise.
(vst1_s8_x2): Likewise.
(vst1_p8_x2): Likewise.
(vst1_s16_x2): Likewise.
(vst1_p16_x2): Likewise.
(vst1_s32_x2): Likewise.
(vst1_u8_x2): Likewise.
(vst1_u16_x2): Likewise.
(vst1_u32_x2): Likewise.
(vst1_f16_x2): Likewise.
(vst1_f32_x2): Likewise.
(vst1_p64_x2): Likewise.
(vst1q_s8_x2): Likewise.
(vst1q_p8_x2): Likewise.
(vst1q_s16_x2): Likewise.
(vst1q_p16_x2): Likewise.
(vst1q_s32_x2): Likewise.
(vst1q_s64_x2): Likewise.
(vst1q_u8_x2): Likewise.
(vst1q_u16_x2): Likewise.
(vst1q_u32_x2): Likewise.
(vst1q_u64_x2): Likewise.
(vst1q_f16_x2): Likewise.
(vst1q_f32_x2): Likewise.
(vst1q_f64_x2): Likewise.
(vst1q_p64_x2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.

aarch64: Use memcpy to copy vector tables in vst1[q]_x3 intrinsics

Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst1[q]_x3
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x3 intrinsics.

gcc/ChangeLog:

2021-07-23 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vst1_s64_x3): Use
__builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vst1_u64_x3): Likewise.
(vst1_f64_x3): Likewise.
(vst1_s8_x3): Likewise.
(vst1_p8_x3): Likewise.
(vst1_s16_x3): Likewise.
(vst1_p16_x3): Likewise.
(vst1_s32_x3): Likewise.
(vst1_u8_x3): Likewise.
(vst1_u16_x3): Likewise.
(vst1_u32_x3): Likewise.
(vst1_f16_x3): Likewise.
(vst1_f32_x3): Likewise.
(vst1_p64_x3): Likewise.
(vst1q_s8_x3): Likewise.
(vst1q_p8_x3): Likewise.
(vst1q_s16_x3): Likewise.
(vst1q_p16_x3): Likewise.
(vst1q_s32_x3): Likewise.
(vst1q_s64_x3): Likewise.
(vst1q_u8_x3): Likewise.
(vst1q_u16_x3): Likewise.
(vst1q_u32_x3): Likewise.
(vst1q_u64_x3): Likewise.
(vst1q_f16_x3): Likewise.
(vst1q_f32_x3): Likewise.
(vst1q_f64_x3): Likewise.
(vst1q_p64_x3): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.

x86: Don't return hard register when LRA is in progress

Don't return hard register in ix86_gen_scratch_sse_rtx when LRA is in
progress to avoid ICE when there are no available hard registers for
LRA.

gcc/

PR target/101504
* config/i386/i386.c (ix86_gen_scratch_sse_rtx): Don't return
hard register when LRA is in progress.

gcc/testsuite/

PR target/101504
* gcc.target/i386/pr101504.c: New test.

libstdc++: Reduce headers included by <future>

The <future> header only needs std::atomic_flag, so can include
<bits/atomic_base.h> instead of the whole of <atomic>.

libstdc++-v3/ChangeLog:

* include/std/future: Include <bits/atomic_base.h> instead of
<atomic>.

aarch64: Use memcpy to copy vector tables in vst1[q]_x4 intrinsics

Use __builtin_memcpy to copy vector structures instead of using a
union in each of the vst1[q]_x4 Neon intrinsics in arm_neon.h.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x4 intrinsics.

gcc/ChangeLog:

2021-07-21 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vst1_s8_x4): Use
__builtin_memcpy instead of using a union.
(vst1q_s8_x4): Likewise.
(vst1_s16_x4): Likewise.
(vst1q_s16_x4): Likewise.
(vst1_s32_x4): Likewise.
(vst1q_s32_x4): Likewise.
(vst1_u8_x4): Likewise.
(vst1q_u8_x4): Likewise.
(vst1_u16_x4): Likewise.
(vst1q_u16_x4): Likewise.
(vst1_u32_x4): Likewise.
(vst1q_u32_x4): Likewise.
(vst1_f16_x4): Likewise.
(vst1q_f16_x4): Likewise.
(vst1_f32_x4): Likewise.
(vst1q_f32_x4): Likewise.
(vst1_p8_x4): Likewise.
(vst1q_p8_x4): Likewise.
(vst1_p16_x4): Likewise.
(vst1q_p16_x4): Likewise.
(vst1_s64_x4): Likewise.
(vst1_u64_x4): Likewise.
(vst1_p64_x4): Likewise.
(vst1q_s64_x4): Likewise.
(vst1q_u64_x4): Likewise.
(vst1q_p64_x4): Likewise.
(vst1_f64_x4): Likewise.
(vst1q_f64_x4): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.

aarch64: Use memcpy to copy vector tables in vst2[q] intrinsics

Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst2[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst2q intrinsics.

gcc/ChangeLog:

2021-07-21 Jonathan Wrightt <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vst2_s64): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vst2_u64): Likewise.
(vst2_f64): Likewise.
(vst2_s8): Likewise.
(vst2_p8): Likewise.
(vst2_s16): Likewise.
(vst2_p16): Likewise.
(vst2_s32): Likewise.
(vst2_u8): Likewise.
(vst2_u16): Likewise.
(vst2_u32): Likewise.
(vst2_f16): Likewise.
(vst2_f32): Likewise.
(vst2_p64): Likewise.
(vst2q_s8): Likewise.
(vst2q_p8): Likewise.
(vst2q_s16): Likewise.
(vst2q_p16): Likewise.
(vst2q_s32): Likewise.
(vst2q_s64): Likewise.
(vst2q_u8): Likewise.
(vst2q_u16): Likewise.
(vst2q_u32): Likewise.
(vst2q_u64): Likewise.
(vst2q_f16): Likewise.
(vst2q_f32): Likewise.
(vst2q_f64): Likewise.
(vst2q_p64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.

aarch64: Use memcpy to copy vector tables in vst3[q] intrinsics

Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst3[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst3q intrinsics.

gcc/ChangeLog:

2021-07-21 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vst3_s64): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_ci one vector
at a time.
(vst3_u64): Likewise.
(vst3_f64): Likewise.
(vst3_s8): Likewise.
(vst3_p8): Likewise.
(vst3_s16): Likewise.
(vst3_p16): Likewise.
(vst3_s32): Likewise.
(vst3_u8): Likewise.
(vst3_u16): Likewise.
(vst3_u32): Likewise.
(vst3_f16): Likewise.
(vst3_f32): Likewise.
(vst3_p64): Likewise.
(vst3q_s8): Likewise.
(vst3q_p8): Likewise.
(vst3q_s16): Likewise.
(vst3q_p16): Likewise.
(vst3q_s32): Likewise.
(vst3q_s64): Likewise.
(vst3q_u8): Likewise.
(vst3q_u16): Likewise.
(vst3q_u32): Likewise.
(vst3q_u64): Likewise.
(vst3q_f16): Likewise.
(vst3q_f32): Likewise.
(vst3q_f64): Likewise.
(vst3q_p64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.

aarch64: Use memcpy to copy vector tables in vst4[q] intrinsics

Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst4[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst4q intrinsics.

gcc/ChangeLog:

2021-07-20 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vst4_s64): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_xi one vector
at a time.
(vst4_u64): Likewise.
(vst4_f64): Likewise.
(vst4_s8): Likewise.
(vst4_p8): Likewise.
(vst4_s16): Likewise.
(vst4_p16): Likewise.
(vst4_s32): Likewise.
(vst4_u8): Likewise.
(vst4_u16): Likewise.
(vst4_u32): Likewise.
(vst4_f16): Likewise.
(vst4_f32): Likewise.
(vst4_p64): Likewise.
(vst4q_s8): Likewise.
(vst4q_p8): Likewise.
(vst4q_s16): Likewise.
(vst4q_p16): Likewise.
(vst4q_s32): Likewise.
(vst4q_s64): Likewise.
(vst4q_u8): Likewise.
(vst4q_u16): Likewise.
(vst4q_u32): Likewise.
(vst4q_u64): Likewise.
(vst4q_f16): Likewise.
(vst4q_f32): Likewise.
(vst4q_f64): Likewise.
(vst4q_p64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.

aarch64: Use memcpy to copy vector tables in vtbx4 intrinsics

Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vtbx4
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

gcc/ChangeLog:

2021-07-19 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vtbx4_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vtbx4_u8): Likewise.
(vtbx4_p8): Likewise.

aarch64: Use memcpy to copy vector tables in vtbl[34] intrinsics

Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vtbl[34]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

gcc/ChangeLog:

2021-07-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vtbl3_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vtbl3_u8): Likewise.
(vtbl3_p8): Likewise.
(vtbl4_s8): Likewise.
(vtbl4_u8): Likewise.
(vtbl4_p8): Likewise.

aarch64: Use memcpy to copy vector tables in vqtbx[234] intrinsics

Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vqtbx[234]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbx[234] intrinsics.

gcc/ChangeLog:

2021-07-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vqtbx2_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vqtbx2_u8): Likewise.
(vqtbx2_p8): Likewise.
(vqtbx2q_s8): Likewise.
(vqtbx2q_u8): Likewise.
(vqtbx2q_p8): Likewise.
(vqtbx3_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vqtbx3_u8): Likewise.
(vqtbx3_p8): Likewise.
(vqtbx3q_s8): Likewise.
(vqtbx3q_u8): Likewise.
(vqtbx3q_p8): Likewise.
(vqtbx4_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_xi one vector at a time.
(vqtbx4_u8): Likewise.
(vqtbx4_p8): Likewise.
(vqtbx4q_s8): Likewise.
(vqtbx4q_u8): Likewise.
(vqtbx4q_p8): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: New tests.

aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics

Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vqtbl[234]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbl[234] intrinsics.

gcc/ChangeLog:

2021-07-08 Jonathan Wright <jonathan.wright@arm.com>

* config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vqtbl2_u8): Likewise.
(vqtbl2_p8): Likewise.
(vqtbl2q_s8): Likewise.
(vqtbl2q_u8): Likewise.
(vqtbl2q_p8): Likewise.
(vqtbl3_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vqtbl3_u8): Likewise.
(vqtbl3_p8): Likewise.
(vqtbl3q_s8): Likewise.
(vqtbl3q_u8): Likewise.
(vqtbl3q_p8): Likewise.
(vqtbl4_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_xi one vector at a time.
(vqtbl4_u8): Likewise.
(vqtbl4_p8): Likewise.
(vqtbl4q_s8): Likewise.
(vqtbl4q_u8): Likewise.
(vqtbl4q_p8): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: New test.

libstdc++: Update documentation comments for namespace rel_ops

The comments in <bits/stl_relops.h> describe problems that were solved
years ago (for GCC 3.1). The comparison operators in <iterator> are no
longer ambiguous with the rel_ops ones, so the linked mailing list
thread and FAQ entry aren't relevant now. The reference to std_utility.h
is also outdated as it's just called utility now, both in the source
tree and when installed.

The use of rel_ops is still frowned upon though, so replace the
discussion of ambiguities within libstdc++ headers with adminition about
using rel_ops in user code.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/stl_relops.h: Update documentation comments.

openmp: Add support for __has_attribute(omp::directive) and __has_attribute(omp::sequence)

Now that the C++ FE supports these attributes, but not through registering
them in the attributes tables (they work quite differently from other
attributes), this teaches c_common_has_attributes about those.

2021-07-23 Jakub Jelinek <jakub@redhat.com>

* c-lex.c (c_common_has_attribute): Call canonicalize_attr_name also
on attr_id. Return 1 for omp::directive or omp::sequence in C++11
and later.

* c-c++-common/gomp/attrs-1.c: New test.
* c-c++-common/gomp/attrs-2.c: New test.
* c-c++-common/gomp/attrs-3.c: New test.

openmp: Diagnose invalid mixing of the attribute and pragma syntax directives

The OpenMP 5.1 spec says that the attribute and pragma syntax directives
should not be mixed on the same statement.  The following patch adds diagnostic
for that,
  [[omp::directive (...)]]
  #pragma omp ...
is always an error and for the other order
  #pragma omp ...
  [[omp::directive (...)]]
it depends on whether the pragma directive is an OpenMP construct
(then it is an error because it needs a structured block or loop
or statement as body) or e.g. a standalone directive (then it is fine).

Only block scope is handled for now though, namespace scope and class scope
still needs implementing even the basic support.

2021-07-23  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP__START_ and
PRAGMA_OMP__LAST_ enumerators.
gcc/cp/
* parser.h (struct cp_parser): Add omp_attrs_forbidden_p member.
* parser.c (cp_parser_handle_statement_omp_attributes): Diagnose
mixing of attribute and pragma syntax directives when seeing
omp::directive if parser->omp_attrs_forbidden_p or if attribute syntax
directives are followed by OpenMP pragma.
(cp_parser_statement): Clear parser->omp_attrs_forbidden_p after
the cp_parser_handle_statement_omp_attributes call.
(cp_parser_omp_structured_block): Add disallow_omp_attrs argument,
if true, set parser->omp_attrs_forbidden_p.
(cp_parser_omp_scan_loop_body, cp_parser_omp_sections_scope): Pass
false as disallow_omp_attrs to cp_parser_omp_structured_block.
(cp_parser_omp_parallel, cp_parser_omp_task): Set
parser->omp_attrs_forbidden_p.
gcc/testsuite/
* g++.dg/gomp/attrs-4.C: New test.
* g++.dg/gomp/attrs-5.C: New test.

testsuite: mips: pass -finline/-fnoinline through

gcc/testsuite/

* gcc.target/mips/mips.exp (mips_option_groups): add
-finline and -fno-inline.

Revert "testsuite: mips: use noinline attribute instead of -fno-inline"

This reverts commit 3b33b1136d5ba1903a56fa601a848accc3db46ef.

analyzer: fix feasibility false +ve with overly complex svalues

gcc/analyzer/ChangeLog:
* diagnostic-manager.cc
(class auto_disable_complexity_checks): New.
(epath_finder::explore_feasible_paths): Use it to disable
complexity checks whilst processing the worklist.
* region-model-manager.cc
(region_model_manager::region_model_manager): Initialize
m_check_complexity.
(region_model_manager::reject_if_too_complex): Bail if
m_check_complexity is false.
* region-model.h
(region_model_manager::enable_complexity_check): New.
(region_model_manager::disable_complexity_check): New.
(region_model_manager::m_check_complexity): New.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/feasibility-3.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Fix execution failure of parity_1.f90 on P10 [PR100952]

gcc/
PR target/100952
* config/rs6000/rs6000.md (cstore<mode>4): Fix wrong fall through.

Daily bump.

Bind(c): signed char is not a Fortran character type

CFI_allocate and CFI_select_part were incorrectly treating
CFI_type_signed_char as a Fortran character type for the purpose of
deciding whether or not to use the elem_len argument. It is a Fortran
integer type per table 18.2 in the 2018 Fortran standard.

Other functions in ISO_Fortran_binding.c appeared to handle this case
correctly already.

2021-07-15 Sandra Loosemore <sandra@codesourcery.com>

libgfortran/
* runtime/ISO_Fortran_binding.c (CFI_allocate): Don't use elem_len
for CFI_type_signed_char.
(CFI_select_part): Likewise.

libstdc++: Fix non-default constructors for hash containers [PR101583]

When I added the new mixin to _Hashtable, I forgot to explicitly
construct it in each non-default constructor. That means you can't
use any constructors unless all three of the hash function, equality
function, and allocator are all default constructible.

libstdc++-v3/ChangeLog:

PR libstdc++/101583
* include/bits/hashtable.h (_Hashtable): Replace mixin with
_Enable_default_ctor. Construct it explicitly in all
non-forwarding, non-defaulted constructors.
* testsuite/23_containers/unordered_map/cons/default.cc: Check
non-default constructors can be used.
* testsuite/23_containers/unordered_set/cons/default.cc:
Likewise.

Add new test for PR65178.

gcc/testsuite/ChangeLog:
PR tree-optimization/65178
* gcc.dg/uninit-pr65178.c: New test.

Remove an invalid defintion [PR101568].

Resolves:
PR testsuite/101568 - g++.dg/ipa/pr82352.C fails

gcc/testsuite/ChangeLog:
PR testsuite/101568
* g++.dg/ipa/pr82352.C

Fix PR 10153: tail recusion for vector types.

The problem here is we try to an initialized value
from a scalar constant. For vectors we need to do
a vect_dup instead. This fixes that issue by using
build_{one,zero}_cst instead of integer_{one,zero}_node
when calling create_tailcall_accumulator.

Changes from v1:
* v2: Use build_{one,zero}_cst and get the correct type before.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/10153
* tree-tailcall.c (create_tailcall_accumulator):
Don't call fold_convert as the type should be correct already.
(tree_optimize_tail_calls_1): Use build_{one,zero}_cst instead
of integer_{one,zero}_node for the call of create_tailcall_accumulator.

gcc/testsuite/ChangeLog:

PR tree-optimization/10153
* gcc.c-torture/compile/pr10153-1.c: New test.
* gcc.c-torture/compile/pr10153-2.c: New test.

Allow non-null adjustments for pointers even when there is a known range.

Fix non_null_ref::adjust_range so it always adjust ranges, not just
varying ranges. This will allow pointers that have a range, but are not
necessarily non-null, to be adjusted.

gcc/ChangeLog:

* gimple-range-cache.cc (non_null_ref::adjust_range): Replace
varying_p check for null/non-null check.

aix: Protect AIX math.h overloads with new macro.

AIX math.h provides C++ overloaded inlined math functions, which should
not be present for G++. The definitions have been guaded by
__COMPATMATH__, but that macro had other uses in IBM xlC++. A new
macro has been introduced with the sole purpose of guarding the functions.
This patch updates libstdc++ os_defines.h to define the additional macro.
The earlier macro definition is retained to guard the functions in the
math.h header of earlier AIX releases.

libstdc++-v3/ChangeLog:

* config/os/aix/os_defines.h (__LIBC_NO_CPP_MATH_OVERLOADS__): Define.

libstdc++: Use __builtin_operator_new when available [PR94295]

Clang provides __builtin_operator_new and __builtin_operator_delete,
which have the same semantics as ::operator new and ::operator delete
except that the compiler is allowed to elide calls to them. This changes
std::allocator to use those built-in functions so that memory allocated
by std::allocator can be optimized away when using Clang. This avoids an
abstraction penalty for using std::allocator to allocate storage rather
than a new-expression.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/94295
* include/ext/new_allocator.h (_GLIBCXX_OPERATOR_NEW)
(_GLIBCXX_OPERATOR_DELETE, _GLIBCXX_SIZED_DEALLOC): Define.
(allocator::allocate, allocator::deallocate): Use new macros.

libstdc++: Use std::addressof in ranges::uninitialized_xxx [PR101571]

Make the ranges::uninitialized_xxx algorithms use std::addressof to
protect against iterator types that overload operator&.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/101571
* include/bits/ranges_uninitialized.h (_DestroyGuard): Change
constructor parameter to reference and use addressof.
* testsuite/util/testsuite_iterators.h: Define deleted operator&
overloads for test iterators.

libstdc++: Initialize all subobjects of std::function

The std::function::swap member swaps each data member unconditionally,
resulting in -Wmaybe-uninitialized warnings for a default constructed
object. This happens because the _M_invoker and _M_functor members are
only initialized if the function has a target.

This change ensures that all subobjects are zero-initialized on
construction.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/std_function.h (_Function_base): Add
default member initializers and define constructor as defaulted.
(function::_M_invoker): Add default member initializer.

libstdc++: Restore __gnu_debug::array [PR100682]

As the PR points out, we removed the debug version of std::array without
any period of deprecation. Although std::array contains all the actual
debug checks now, removing the <debug/arrray> header breaks any code
that was using that explicitly. The manual still lists doing that as
supported.

This restores the <debug/array> header, but simply defines
__gnu_debug::array as an alias for std::array, and declares the alias
with the deprecated attribute. The docs are updated to match.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/100682
* doc/xml/manual/debug_mode.xml: Update documentation about
debug capability of std::array.
* doc/html/*: Regenerate.
* include/debug/array: New file.

Allow non-symmetrical equivalences.

Don't trap if equivalences are processed out of DOM order, and aren't
completely symmetrical. We will eventually resolve this, but its OK for now.

gcc/
PR tree-optimization/101511
* value-relation.cc (relation_oracle::query_relation): Check if ssa1
is in ssa2's equiv set, and don't trap if so.

gcc/testsuite/
* g++.dg/pr101511.C: New.

Check for undefined on COND_EXPR before querying type.

gcc/
PR tree-optimization/101497
* gimple-range-fold.cc (fold_using_range::range_of_cond_expr): Check
for undefined.

gcc/testsuite
* gcc.dg/pr101497.c: New.

Only call vrp_visit_cond_stmt if range_of_stmt doesn't resolve to a const.

Eevntually all functionality will be subsumed. Until then, call it only
if needed.

gcc/
PR tree-optimization/101496
* vr-values.c (simplify_using_ranges::fold_cond): Call range_of_stmt
first, then vrp_visit_cond_Stmt.

gcc/testsuite
* gcc.dg/pr101496.c: New.

Remove pass_cpb which is related to enable avx512 embedded broadcast from constant pool.

By optimizing vector movement to broadcast in ix86_expand_vector_move
during pass_expand, pass_reload/LRA can automatically generate an avx512
embedded broadcast, pass_cpb is not needed.

Considering that in the absence of avx512f, broadcast from memory is
still slightly faster than loading the entire memory, so always enable
broadcast.

benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vaddps/broadcast

The performance diff

strategy    : cycles
memory      : 1046611188
memory      : 1255420817
memory      : 1044720793
memory      : 1253414145
average     : 1097868397

broadcast   : 1044430688
broadcast   : 1044477630
broadcast   : 1253554603
broadcast   : 1044561934
average     : 1096756213

But however broadcast has larger size.

the size diff

size broadcast.o
   text    data     bss     dec     hex filename
    137       0       0     137      89 broadcast.o

size memory.o
   text    data     bss     dec     hex filename
    115       0       0     115      73 memory.o

gcc/ChangeLog:

* config/i386/i386-expand.c
(ix86_broadcast_from_integer_constant): Rename to ..
(ix86_broadcast_from_constant): .. this, and extend it to
handle float mode.
(ix86_expand_vector_move): Extend to float mode.
* config/i386/i386-features.c
(replace_constant_pool_with_broadcast): Remove.
(remove_partial_avx_dependency_gate): Ditto.
(constant_pool_broadcast): Ditto.
(class pass_constant_pool_broadcast): Ditto.
(make_pass_constant_pool_broadcast): Ditto.
(remove_partial_avx_dependency): Adjust gate.
* config/i386/i386-passes.def: Remove pass_constant_pool_broadcast.
* config/i386/i386-protos.h
(make_pass_constant_pool_broadcast): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/i386/fuse-caller-save-xmm.c: Adjust testcase.

Support logic shift left/right for avx512 mask type.

gcc/ChangeLog:

* config/i386/constraints.md (Wb): New constraint.
(Ww): Ditto.
* config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
shift.
(*ashlqi3_1): Ditto.
(*<insn><mode>3_1): Split to ..
(*ashr<mode>3_1): this, ...
(*lshr<mode>3_1): and this, also extend this pattern to avx512
mask registers.
(*<insn><mode>3_1): Split to ..
(*ashr<mode>3_1): this, ...
(*lshrqi3_1): and this, also extend this pattern to avx512
mask registers.
(*lshrhi3_1): And this, also extend this pattern to avx512
mask registers.
* config/i386/sse.md (k<code><mode>): New define_split after
it to convert generic shift pattern to mask shift ones.

gcc/testsuite/ChangeLog:

* gcc.target/i386/mask-shift.c: New test.

Daily bump.

analyzer: bulletproof -Wanalyzer-file-leak [PR101547]

gcc/analyzer/ChangeLog:
PR analyzer/101547
* sm-file.cc (file_leak::emit): Handle m_arg being NULL.
(file_leak::describe_final_event): Handle ev.m_expr being NULL.

gcc/testsuite/ChangeLog:
PR analyzer/101547
* gcc.dg/analyzer/pr101547.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fix ICE in binding_cluster::purge_state_involving [PR101522]

gcc/analyzer/ChangeLog:
PR analyzer/101522
* store.cc (binding_cluster::purge_state_involving): Don't change
m_map whilst iterating through it.

gcc/testsuite/ChangeLog:
PR analyzer/101522
* g++.dg/analyzer/pr101522.C: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

OpenACC 'nohost' clause

Do not "compile a version of this procedure for the host".

gcc/
* tree-core.h (omp_clause_code): Add 'OMP_CLAUSE_NOHOST'.
* tree.c (omp_clause_num_ops, omp_clause_code_name, walk_tree_1):
Handle it.
* tree-pretty-print.c (dump_omp_clause): Likewise.
* omp-general.c (oacc_verify_routine_clauses): Likewise.
* gimplify.c (gimplify_scan_omp_clauses)
(gimplify_adjust_omp_clauses): Likewise.
* tree-nested.c (convert_nonlocal_omp_clauses)
(convert_local_omp_clauses): Likewise.
* omp-low.c (scan_sharing_clauses): Likewise.
* omp-offload.c (execute_oacc_device_lower): Update.
gcc/c-family/
* c-pragma.h (pragma_omp_clause): Add 'PRAGMA_OACC_CLAUSE_NOHOST'.
gcc/c/
* c-parser.c (c_parser_omp_clause_name): Handle 'nohost'.
(c_parser_oacc_all_clauses): Handle 'PRAGMA_OACC_CLAUSE_NOHOST'.
(OACC_ROUTINE_CLAUSE_MASK): Add 'PRAGMA_OACC_CLAUSE_NOHOST'.
* c-typeck.c (c_finish_omp_clauses): Handle 'OMP_CLAUSE_NOHOST'.
gcc/cp/
* parser.c (cp_parser_omp_clause_name): Handle 'nohost'.
(cp_parser_oacc_all_clauses): Handle 'PRAGMA_OACC_CLAUSE_NOHOST'.
(OACC_ROUTINE_CLAUSE_MASK): Add 'PRAGMA_OACC_CLAUSE_NOHOST'.
* pt.c (tsubst_omp_clauses): Handle 'OMP_CLAUSE_NOHOST'.
* semantics.c (finish_omp_clauses): Likewise.
gcc/fortran/
* dump-parse-tree.c (show_attr): Update.
* gfortran.h (symbol_attribute): Add 'oacc_routine_nohost' member.
(gfc_omp_clauses): Add 'nohost' member.
* module.c (ab_attribute): Add 'AB_OACC_ROUTINE_NOHOST'.
(attr_bits, mio_symbol_attribute): Update.
* openmp.c (omp_mask2): Add 'OMP_CLAUSE_NOHOST'.
(gfc_match_omp_clauses): Handle 'OMP_CLAUSE_NOHOST'.
(OACC_ROUTINE_CLAUSES): Add 'OMP_CLAUSE_NOHOST'.
(gfc_match_oacc_routine): Update.
* trans-decl.c (add_attributes_to_decl): Update.
* trans-openmp.c (gfc_trans_omp_clauses): Likewise.
gcc/testsuite/
* c-c++-common/goacc/classify-routine-nohost.c: New file.
* c-c++-common/goacc/classify-routine.c: Update.
* c-c++-common/goacc/routine-2.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: New file.
* c-c++-common/goacc/routine-nohost-2.c: Likewise.
* g++.dg/goacc/template.C: Update.
* gfortran.dg/goacc/classify-routine-nohost.f95: New file.
* gfortran.dg/goacc/classify-routine.f95: Update.
* gfortran.dg/goacc/pure-elemental-procedures-2.f90: Likewise.
* gfortran.dg/goacc/routine-6.f90: Likewise.
* gfortran.dg/goacc/routine-intrinsic-2.f: Likewise.
* gfortran.dg/goacc/routine-module-1.f90: Likewise.
* gfortran.dg/goacc/routine-module-2.f90: Likewise.
* gfortran.dg/goacc/routine-module-3.f90: Likewise.
* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
* gfortran.dg/goacc/routine-multiple-directives-1.f90: Likewise.
* gfortran.dg/goacc/routine-multiple-directives-2.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: New
file.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2_2.c:
Likewise.
* testsuite/libgomp.oacc-fortran/routine-nohost-1.f90: Likewise.

Co-Authored-By: Joseph Myers <joseph@codesourcery.com>
Co-Authored-By: Cesar Philippidis <cesar@codesourcery.com>

[OpenACC] Fix '#pragma atomic update' typo in 'g++.dg/goacc/template.C'

    [...]/g++.dg/goacc/template.C:58: warning: ignoring ‘#pragma atomic update’ [-Wunknown-pragmas]
       58 | #pragma atomic update
          |

Small fix-up for r229832 (commit 7a5e4956cc026cba54159d5c764486ac4151db85)
"[openacc] tile, independent, default, private and firstprivate support in
c/++".

gcc/testsuite/
* g++.dg/goacc/template.C: Fix '#pragma atomic update' typo.

analyzer: fix issues with phi handling

The analyzer's state purging code was overzealously purging state
for ssa names that might be used within phi nodes, leading to
false positives from -Wanalyzer-use-of-uninitialized-value.

This patch updates phi handling in the analyzer to fix these issues.

gcc/analyzer/ChangeLog:
* region-model.cc (region_model::handle_phi): Add "old_state"
param and use it.
(region_model::update_for_phis): Update so that all of the phi
stmts are effectively handled simultaneously, rather than in
order.
* region-model.h (region_model::handle_phi): Add "old_state"
param.
* state-purge.cc (self_referential_phi_p): Replace with...
(name_used_by_phis_p): ...this new function.
(state_purge_per_ssa_name::process_point): Update to use the
above, so that all phi stmts at a basic block are effectively
considered simultaneously, and only consider the phi arguments for
the pertinent in-edge.
* supergraph.cc (cfg_superedge::get_phi_arg_idx): New.
(cfg_superedge::get_phi_arg): Use the above.
* supergraph.h (cfg_superedge::get_phi_arg_idx): New decl.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/explode-2.c: Remove xfail.
* gcc.dg/analyzer/explode-2a.c: Remove expected leak warning on
while stmt.
* gcc.dg/analyzer/phi-2.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fixes to -fdump-analyzer-state-purge for phi nodes

gcc/analyzer/ChangeLog:
* state-purge.cc (state_purge_annotator::add_node_annotations):
Rather than erroneously always using the NULL in-edge, determine
each relevant in-edge, and print the appropriate data for each
in-edge. Use print_needed to print the data as comma-separated
lists of SSA names.
(print_vec_of_names): Add "within_table" param and use it.
(state_purge_annotator::add_stmt_annotations): Factor out
collation and printing code into...
(state_purge_annotator::print_needed): ...this new function.
* state-purge.h (state_purge_annotator::print_needed): New decl.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: show BB index in BEFORE_SUPERNODE's in-edge

This is useful for debugging how the analyzer handles phi nodes.

gcc/analyzer/ChangeLog:
* program-point.cc (function_point::print): Show src BB index at
BEFORE_SUPERNODE.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: tweak dumping of min_expr/max_expr

gcc/analyzer/ChangeLog:
* svalue.cc (infix_p): New.
(binop_svalue::dump_to_pp): Use it to print MIN_EXPR and MAX_EXPR
in prefix form, rather than infix.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Fix typos in a comment.

gcc/ChangeLog:
* tree-ssa-alias.c (walk_aliased_vdefs_1): Fix typos in a comment.

rs6000: Add int128 target check to pr101129.c (PR101531)

2021-07-21 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/testsuite/
PR target/101531
* gcc.target/powerpc/pr101129.c: Adjust.

rs6000: Write output to the builtins init file, part 2 of 3

2021-07-21 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-gen-builtins.c (write_init_bif_table):
Implement.

rs6000: Write output to the builtins init file, part 1 of 3

2021-07-21 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-gen-builtins.c (write_fntype): New
callback function.
(write_fntype_init): New stub function.
(write_init_bif_table): Likewise.
(write_init_ovld_table): New function.
(write_init_file): Implement.

rs6000: Write output to the builtins header file

2021-07-21 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-gen-builtins.c
(write_autogenerated_header): New function.
(write_decls): Likewise.
(write_extern_fntype): New callback function.
(write_header_file): Implement.

rs6000: Write output to the builtin definition include file

2021-06-07 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-gen-builtins.c (write_defines_file):
Implement.

rs6000: Build and store function type identifiers

2021-07-21 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-gen-builtins.c (complete_vector_type): New
function.
(complete_base_type): Likewise.
(construct_fntype_id): Likewise.
(parse_bif_entry): Call contruct_fntype_id.
(parse_ovld_entry): Likewise.

rs6000: Parsing of overload input file

2021-06-07 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-gen-builtins.c (ovld_stanza): New struct.
(MAXOVLDSTANZAS): New macro.
(ovld_stanzas): New variable.
(curr_ovld_stanza): Likewise.
(MAXOVLDS): New macro.
(ovlddata): New struct.
(ovlds): New variable.
(curr_ovld): Likewise.
(max_ovld_args): Likewise.
(parse_ovld_entry): New function.
(parse_ovld_stanza): Likewise.
(parse_ovld): Implement.

rs6000: Parsing built-in input file, part 3 of 3

2021-07-21 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-gen-builtins.c (parse_bif_attrs):
Implement.

rs6000: Parsing built-in input file, part 2 of 3

2021-07-21 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-gen-builtins.c (parse_args): New function.
(parse_prototype): Implement.

rs6000: Parsing built-in input file, part 1 of 3

2021-07-20 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-gen-builtins.c (bif_stanza): New enum.
(curr_bif_stanza): New variable.
(stanza_entry): New struct.
(stanza_map): New initialized variable.
(enable_string): Likewise.
(fnkinds): New enum.
(typelist): New struct.
(attrinfo): Likewise.
(MAXRESTROPNDS): New macro.
(prototype): New struct.
(MAXBIFS): New macro.
(bifdata): New struct.
(bifs): New variable.
(curr_bif): Likewise.
(bif_order): Likewise.
(bif_index): Likewise.
(fatal): New function.
(stanza_name_to_stanza): Likewise.
(parse_bif_attrs): New stub function.
(parse_prototype): Likewise.
(parse_bif_entry): New function.
(parse_bif_stanza): Likewise.
(parse_bif): Implement.
(set_bif_order): New function.
(create_bif_order): Implement.

rs6000: Main function with stubs for parsing and output

2021-07-20 Bill Schmidt <wschmidt@linux.ibm.com>

gcc/
* config/rs6000/rs6000-gen-builtins.c (rbtree.h): New #include.
(num_bifs): New variable.
(num_ovld_stanzas): Likewise.
(num_ovlds): Likewise.
(parse_codes): New enum.
(bif_rbt): New variable.
(ovld_rbt): Likewise.
(fntype_rbt): Likewise.
(bifo_rbt): Likewise.
(parse_bif): New stub function.
(create_bif_order): Likewise.
(parse_ovld): Likewise.
(write_header_file): Likewise.
(write_init_file): Likewise.
(write_defines_file): Likewise.
(delete_output_files): New function.
(main): Likewise.

x86: Remove OPTION_MASK_ISA_SSE4_2 from CRC32 _builtin functions

Since

commit 39671f87b2df6a1894cc11a161e4a7949d1ddccd
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Apr 15 05:59:48 2021 -0700

x86: Use crc32 target option for CRC32 intrinsics

enabled OPTION_MASK_ISA_CRC32 for -msse4 and removed TARGET_SSE4_2 check
in sse4_2_crc32<mode> pattens, remove OPTION_MASK_ISA_SSE4_2 from CRC32
_builtin functions.

gcc/

PR target/101549
* config/i386/i386-builtin.def: Remove OPTION_MASK_ISA_SSE4_2
from CRC32 _builtin functions.

gcc/testsuite/

PR target/101549
* gcc.target/i386/crc32-6.c: New test.

Fortran: ICE, OOM while calculating sizes of derived type array components

gcc/fortran/ChangeLog:

PR fortran/101514
* target-memory.c (gfc_interpret_derived): Size of array component
of derived type can only be computed here for explicit shape.
* trans-types.c (gfc_get_nodesc_array_type): Do not dereference
NULL pointers.

gcc/testsuite/ChangeLog:

PR fortran/101514
* gfortran.dg/pr101514.f90: New test.

Adjust macro to avoid warning [PR101379].

Resolves:
PR bootstrap/101379 - libatomic arm build failure after r12-2132 due to -Warray-bounds on a constant address

libatomic/ChangeLog:
PR bootstrap/101379
* config/linux/arm/host-config.h (__kernel_helper_version): New
function. Adjust shadow macro.

libstdc++: Make __gnu_cxx::sequence_buffer move-aware [PR101542]

The PR explains that Clang trunk now selects a different constructor
when a non-const sequence_buffer is returned in a context where it
qualifies as an implicitly-movable entity. Because lookup is first
performed using an rvalue, the sequence_buffer(const sequence_buffer&)
constructor gets chosen, which makes a copy instead of a "pseudo-move"
via the sequence_buffer(sequence_buffer&) constructor. The problem isn't
seen with GCC because as noted in the r11-2412 commit log, GCC actually
implements a slightly modified rule that avoids breaking exactly this
type of code.

This patch adds a move constructor to sequence_buffer, so that implicit
or explicit moves will have the same effect, calling the
sequence_buffer(sequence_buffer&) constructor. A move assignment
operator is also added to make move assignment work similarly.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/101542
* include/ext/rope (sequence_buffer): Add move constructor and
move assignment operator.
* testsuite/ext/rope/101542.cc: New test.

c++tools, configury: Configure with C++; test checking status [PR98821].

The c++tools configure fragments need to be built with a C++ compiler.

In addition, the stand-alone server uses diagnostic mechanisms in common
with GCC, but needs to define implementations for gcc_assert and
supporting output functions.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/98821 - modules : c++tools configures with CC but code fragments assume CXX.

PR c++/98821

c++tools/ChangeLog:

* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Configure using C++. Pull logic to
detect enabled checking modes; default to release
checking.
* server.cc (AI_NUMERICSERV): Define a fallback value.
(gcc_assert): New.
(gcc_unreachable): New.
(fancy_abort): Only build when checking is enabled.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>

gcov: Fix use of profile info section

If the -fprofile-info-section is used, then the gcov information is registered
in a linker set. This is done by build_gcov_info_var_registration(). The
compiler generated object placed in the section was not marked as referenced,
so once optimization was enabled, this object was optimized away. Mark it as
referenced.

gcc/
* coverage.c (build_gcov_info_var_registration): Mark the object placed
in the linker set as referenced so that it does not get optimized away.

Revert "RISC-V: Detect python and pick best one for calling multilib-generator"

This reverts commit e695f0101a8cacbc29353c5a000731e50b2627e6.

openmp: Fix up omp_check_private [PR101535]

The target data construct shouldn't affect omp_check_private, unless
the decl there is privatized (use_device_* clauses).  The routine
had some code for that, but it just did continue; in a loop that looped
only if the region type is one of selected 4 kinds, so effectively resulted
in return false; instead of looping again.  And not diagnosing lastprivate
(or reduction etc.) on a variable that is private to containing parallel
results in ICEs later on, as there is no original list item to which store
the last result.
The target construct is unclear as it has an implicit parallel region
and it is not obvious if the data privatization clauses on the construct
shall be treated as data privatization on the implicit parallel or just
on the target.  For now treat those as privatization on the implicit
parallel, but treat map clauses as shared on the implicit parallel.

2021-07-21  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/101535
* gimplify.c (omp_check_private): Properly skip ORT_TARGET_DATA
contexts in which decl isn't privatized and for ORT_TARGET return
false if decl is mapped.

* c-c++-common/gomp/pr101535-1.c: New test.
* c-c++-common/gomp/pr101535-2.c: New test.

c++: Ensure OpenMP reduction with reference type references complete type [PR101516]

The following testcase ICEs because we haven't verified if reduction decl
has reference type that TREE_TYPE of the reference is a complete type,
require_complete_type on the decl doesn't ensure that.

2021-07-21 Jakub Jelinek <jakub@redhat.com>

PR c++/101516
* semantics.c (finish_omp_reduction_clause): Also call
complete_type_or_else and return true if it fails.

* g++.dg/gomp/pr101516.C: New test.

Fortran: Fix bind(C) character length checks

gcc/fortran/ChangeLog:

* decl.c (gfc_verify_c_interop_param): Update for F2008 + F2018
changes; reject unsupported bits with 'Error: Sorry,'.
* trans-expr.c (gfc_conv_procedure_call): Fix condition to
For using CFI descriptor with characters.

gcc/testsuite/ChangeLog:

* gfortran.dg/iso_c_binding_char_1.f90: Update dg-error.
* gfortran.dg/pr32599.f03: Use -std=-f2003 + update comment.
* gfortran.dg/bind_c_char_10.f90: New test.
* gfortran.dg/bind_c_char_6.f90: New test.
* gfortran.dg/bind_c_char_7.f90: New test.
* gfortran.dg/bind_c_char_8.f90: New test.
* gfortran.dg/bind_c_char_9.f90: New test.

unroll: Run VN on unrolled-and-jammed loops

Unroll and jam can sometimes leave redundancies.  E.g. for:

  for (int j = 0; j < 100; ++j)
    for (int i = 0; i < 100; ++i)
      x[i] += y[i] * z[j][i];

the new loop will do the equivalent of:

  for (int j = 0; j < 100; j += 2)
    for (int i = 0; i < 100; ++i)
      {
        x[i] += y[i] * z[j][i];
        x[i] += y[i] * z[j + 1][i];
      }

with two reads of y[i] and with a round trip through memory for x[i].

At the moment these redundancies survive till vectorisation, so if
vectorisation succeeds, we're reliant on being able to remove the
redundancies from the vector form.  This can be hard to do if
a vector loop uses predication.  E.g. on SVE we end up with:

.L3:
        ld1w    z3.s, p0/z, [x3, x0, lsl 2]
        ld1w    z0.s, p0/z, [x5, x0, lsl 2]
        ld1w    z1.s, p0/z, [x2, x0, lsl 2]
        mad     z1.s, p1/m, z0.s, z3.s
        ld1w    z2.s, p0/z, [x4, x0, lsl 2]
        st1w    z1.s, p0, [x3, x0, lsl 2]    // store to x[i]
        ld1w    z1.s, p0/z, [x3, x0, lsl 2]  // load back from x[i]
        mad     z0.s, p1/m, z2.s, z1.s
        st1w    z0.s, p0, [x3, x0, lsl 2]
        add     x0, x0, x6
        whilelo p0.s, w0, w1
        b.any   .L3

This patch runs a value-numbering pass on loops after a successful
unroll-and-jam, which gets rid of the unnecessary load and gives
a more accurate idea of vector costs.  Unfortunately the redundant
store still persists without a pre-vect DSE, but that feels like
a separate issue.

Note that the pass requires the loop to have a single exit,
hence the simple calculation of exit_bbs.

gcc/
* gimple-loop-jam.c: Include tree-ssa-sccvn.h.
(tree_loop_unroll_and_jam): Run value-numbering on a loop that
has been successfully unrolled.

gcc/testsuite/
* gcc.dg/unroll-10.c: New test.

unroll: Avoid unnecessary tail loops for constant niters

unroll and jam can decide to unroll the outer loop of a nest like:

  for (int j = 0; j < n; ++j)
    for (int i = 0; i < n; ++i)
      x[i] += __builtin_expf (y[j][i]);

It then uses a tail loop to handle any left-over iterations.

However, the code is structured so that this tail loop is always used.
If n is a multiple of the unroll factor UF, the final UF iterations will
use the tail loop rather than the unrolled loop.

“Fixing” that for variable loop counts would mean introducing another
runtime test: a branch around the tail loop if there are no more
iterations.  There's at least an argument that the overhead of doing
that test might not pay for itself.

But we use this structure even if the iteration count is provably
a multiple of UF at compile time.  E.g. with s/n/100/ and an
unroll factor of 2, the first 98 iterations use the unrolled loop
and the final 2 iterations use the original loop.

This patch makes the unroller avoid a tail loop in that case.
The end result seemed easier to follow if variables were declared
at the point of initialisation, so that it's more obvious which
ones are meaningful even when there's no tail loop.

gcc/
* tree-ssa-loop-manip.c (determine_exit_conditions): Return a null
exit condition if no tail loop is needed, and if the original exit
condition should therefore be kept as-is.
(tree_transform_and_unroll_loop): Handle that case here too.

gcc/testsuite/
* gcc.dg/unroll-9.c: New test/

predcom: Refactor more using auto_vec

This patch follows Martin's suggestion at the link[1] to do more
refactorings by:
  - Adding m_ prefix for class pcom_worker member variables.
  - Using auto_vec instead of vec among class pcom_worker,
    chain, component and comp_ptrs.

The changes in tree-data-ref.[ch] is required, without it the
destruction of auto_vec instance could try to double free the
memory pointed by m_vec.

Bootstrapped and regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped on ppc64le P9 with bootstrap-O3 config.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573424.html

gcc/ChangeLog:

* tree-data-ref.c (free_dependence_relations): Adjust to pass vec
by reference.
(free_data_refs): Likewise.
* tree-data-ref.h (free_dependence_relations): Likewise.
(free_data_refs): Likewise.
* tree-predcom.c (struct chain): Use auto_vec instead of vec for
members.
(struct component): Likewise.
(pcom_worker::pcom_worker): Adjust for auto_vec and renaming changes.
(pcom_worker::~pcom_worker): Likewise.
(pcom_worker::release_chain): Adjust as auto_vec changes.
(pcom_worker::loop): Rename to ...
(pcom_worker::m_loop): ... this.
(pcom_worker::datarefs): Rename to ...
(pcom_worker::m_datarefs): ... this.  Use auto_vec instead of vec.
(pcom_worker::dependences): Rename to ...
(pcom_worker::m_dependences): ... this.  Use auto_vec instead of vec.
(pcom_worker::chains): Rename to ...
(pcom_worker::m_chains): ... this.  Use auto_vec instead of vec.
(pcom_worker::looparound_phis): Rename to ...
(pcom_worker::m_looparound_phis): ... this.  Use auto_vec instead of
vec.
(pcom_worker::cache): Rename to ...
(pcom_worker::m_cache): ... this.  Use auto_vec instead of vec.
(pcom_worker::release_chain): Adjust for auto_vec changes.
(pcom_worker::release_chains): Adjust for auto_vec and renaming
changes.
(release_component): Remove.
(release_components): Adjust for release_component removal.
(component_of): Adjust to use vec.
(merge_comps): Likewise.
(pcom_worker::aff_combination_dr_offset): Adjust for renaming changes.
(pcom_worker::determine_offset): Likewise.
(class comp_ptrs): Remove.
(pcom_worker::split_data_refs_to_components): Adjust for renaming
changes, for comp_ptrs removal with auto_vec.
(pcom_worker::suitable_component_p): Adjust for renaming changes.
(pcom_worker::filter_suitable_components): Adjust for release_component
removal.
(pcom_worker::valid_initializer_p): Adjust for renaming changes.
(pcom_worker::find_looparound_phi): Likewise.
(pcom_worker::add_looparound_copies): Likewise.
(pcom_worker::determine_roots_comp): Likewise.
(pcom_worker::single_nonlooparound_use): Likewise.
(pcom_worker::execute_pred_commoning_chain): Likewise.
(pcom_worker::execute_pred_commoning): Likewise.
(pcom_worker::try_combine_chains): Likewise.
(pcom_worker::prepare_initializers_chain): Likewise.
(pcom_worker::prepare_initializers): Likewise.
(pcom_worker::prepare_finalizers_chain): Likewise.
(pcom_worker::prepare_finalizers): Likewise.
(pcom_worker::tree_predictive_commoning_loop): Likewise.

Daily bump.

libsanitizer: Bump asan/tsan versions

Bump asan/tsan versions for the upstream commit:

commit acf0a6428681dccac803984bfbb1e3e54248f090
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Fri Jul 2 02:42:38 2021 +0200

    [sanitizer] Fix __sanitizer_kernel_sigset_t endianness issue

    setuid(0) hangs on SystemZ under TSan because TSan's BackgroundThread
    ignores SIGSETXID. This in turn happens because internal_sigdelset()
    messes up the mask bits on big-endian system due to how
    __sanitizer_kernel_sigset_t is defined.

    Commit d9a1a53b8d80 ("[ESan] [MIPS] Fix workingset-signal-posix.cpp on
    MIPS") fixed this for MIPS by adjusting the __sanitizer_kernel_sigset_t
    definition. Generalize this by defining __SANITIZER_KERNEL_NSIG based
    on kernel's _NSIG and using uptr[] for __sanitizer_kernel_sigset_t.sig
    on all platforms.

    Reviewed By: dvyukov

    Differential Revision: https://reviews.llvm.org/D105629

which changed __sanitizer_kernel_sigset_t and changed the ABI for function

void __sanitizer_syscall_post_impl_rt_sigaction
  (long int, long int,
   const __sanitizer::__sanitizer_kernel_sigaction_t*,
   __sanitizer::__sanitizer_kernel_sigaction_t*,
   SIZE_T);

* asan/libtool-version: Bump version.
* tsan/libtool-version: Likewise.

libsanitizer: Update LOCAL_PATCHES

* LOCAL_PATCHES: Update to the corresponding revision.

libsanitizer: Apply local patches

libsanitizer: Merge with upstream

Merged revision: 7704fedfff6ef5676adb6415f3be0ac927d1a746

Correct stpcpy offset computation for -Warray-bounds et al. [PR101397].

Resolves:
PR middle-end/101397 - spurious warning writing to the result of stpcpy minus 1

gcc/ChangeLog:

PR middle-end/101397
* builtins.c (gimple_call_return_array): Add argument. Correct
offsets for memchr, mempcpy, stpcpy, and stpncpy.
(compute_objsize_r): Adjust offset computation for argument returning
built-ins.

gcc/testsuite/ChangeLog:

PR middle-end/101397
* gcc.dg/Warray-bounds-80.c: New test.
* gcc.dg/Warray-bounds-81.c: New test.
* gcc.dg/Warray-bounds-82.c: New test.
* gcc.dg/Warray-bounds-83.c: New test.
* gcc.dg/Warray-bounds-84.c: New test.
* gcc.dg/Wstringop-overflow-46.c: Adjust expected output.

libstdc++: Fix create_directories to resolve symlinks [PR101510]

When filesystem__create_directories checks to see if the path already
exists and resovles to a directory, it uses filesystem::symlink_status,
which means it reports an error if the path is a symlink. It should use
filesystem::status, so that the target directory is detected, and no
error is reported.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/101510
* src/c++17/fs_ops.cc (fs::create_directories): Use status
instead of symlink_status.
* src/filesystem/ops.cc (fs::create_directories): Likewise.
* testsuite/27_io/filesystem/operations/create_directories.cc:
* testsuite/27_io/filesystem/operations/create_directory.cc: Do
not test with symlinks on Windows.
* testsuite/experimental/filesystem/operations/create_directories.cc:
* testsuite/experimental/filesystem/operations/create_directory.cc:
Do not test with symlinks on Windows.

Handle all UBSAN built-ins in -Wuninitialized [PR101300].

Resolves:
PR middle-end/101300 - -fsanitize=undefined suppresses -Wuninitialized for a VLA read at -O0

gcc/ChangeLog:
PR middle-end/101300
* tree-ssa-uninit.c (check_defs): Handle UBSAN built-ins.

gcc/testsuite/ChangeLog:
PR middle-end/101300
* gcc.dg/uninit-pr101300.c: New test.

Attach MEM_EXPR information when flushing BLKmode args to the stack - V2

gcc/
* function.c (assign_parm_setup_block): Use adjust_address instead
of change_address to preserve MEM_EXPR and friends.

Adjust by-value function vec arguments to by-reference.

gcc/c-family/ChangeLog:

* c-common.c (c_build_shufflevector): Adjust by-value argument to
by-const-reference.
* c-common.h (c_build_shufflevector): Same.

gcc/c/ChangeLog:

* c-tree.h (c_build_function_call_vec): Adjust by-value argument to
by-const-reference.
* c-typeck.c (c_build_function_call_vec): Same.

gcc/ChangeLog:

* cfgloop.h (single_likely_exit): Adjust by-value argument to
by-const-reference.
* cfgloopanal.c (single_likely_exit): Same.
* cgraph.h (struct cgraph_node): Same.
* cgraphclones.c (cgraph_node::create_virtual_clone): Same.
* genautomata.c (merge_states): Same.
* genextract.c (VEC_char_to_string): Same.
* genmatch.c (dt_node::gen_kids_1): Same.
(walk_captures): Adjust by-value argument to by-reference.
* gimple-ssa-store-merging.c (check_no_overlap): Adjust by-value argument
to by-const-reference.
* gimple.c (gimple_build_call_vec): Same.
(gimple_build_call_internal_vec): Same.
(gimple_build_switch): Same.
(sort_case_labels): Same.
(preprocess_case_label_vec_for_gimple): Adjust by-value argument to
by-reference.
* gimple.h (gimple_build_call_vec): Adjust by-value argument to
by-const-reference.
(gimple_build_call_internal_vec): Same.
(gimple_build_switch): Same.
(sort_case_labels): Same.
(preprocess_case_label_vec_for_gimple): Adjust by-value argument to
by-reference.
* haifa-sched.c (calc_priorities): Adjust by-value argument to
by-const-reference.
(sched_init_luids): Same.
(haifa_init_h_i_d): Same.
* ipa-cp.c (ipa_get_indirect_edge_target_1): Same.
(adjust_callers_for_value_intersection): Adjust by-value argument to
by-reference.
(find_more_scalar_values_for_callers_subset): Adjust by-value argument to
by-const-reference.
(find_more_contexts_for_caller_subset): Same.
(find_aggregate_values_for_callers_subset): Same.
(copy_useful_known_contexts): Same.
* ipa-fnsummary.c (remap_edge_summaries): Same.
(remap_freqcounting_predicate): Same.
* ipa-inline.c (add_new_edges_to_heap): Adjust by-value argument to
by-reference.
* ipa-predicate.c (predicate::remap_after_inlining): Adjust by-value argument
to by-const-reference.
* ipa-predicate.h (predicate::remap_after_inlining): Same.
* ipa-prop.c (ipa_find_agg_cst_for_param): Same.
* ipa-prop.h (ipa_find_agg_cst_for_param): Same.
* ira-build.c (ira_loop_tree_body_rev_postorder): Same.
* read-rtl.c (add_overload_instance): Same.
* rtl.h (native_decode_rtx): Same.
(native_decode_vector_rtx): Same.
* sched-int.h (sched_init_luids): Same.
(haifa_init_h_i_d): Same.
* simplify-rtx.c (native_decode_vector_rtx): Same.
(native_decode_rtx): Same.
* tree-call-cdce.c (gen_shrink_wrap_conditions): Same.
(shrink_wrap_one_built_in_call_with_conds): Same.
(shrink_wrap_conditional_dead_built_in_calls): Same.
* tree-data-ref.c (create_runtime_alias_checks): Same.
(compute_all_dependences): Same.
* tree-data-ref.h (compute_all_dependences): Same.
(create_runtime_alias_checks): Same.
(index_in_loop_nest): Same.
* tree-if-conv.c (mask_exists): Same.
* tree-loop-distribution.c (class loop_distribution): Same.
(loop_distribution::create_rdg_vertices): Same.
(dump_rdg_partitions): Same.
(debug_rdg_partitions): Same.
(partition_contains_all_rw): Same.
(loop_distribution::distribute_loop): Same.
* tree-parloops.c (oacc_entry_exit_ok_1): Same.
(oacc_entry_exit_single_gang): Same.
* tree-ssa-loop-im.c (hoist_memory_references): Same.
(loop_suitable_for_sm): Same.
* tree-ssa-loop-niter.c (bound_index): Same.
* tree-ssa-reassoc.c (update_ops): Same.
(swap_ops_for_binary_stmt): Same.
(rewrite_expr_tree): Same.
(rewrite_expr_tree_parallel): Same.
* tree-ssa-sccvn.c (ao_ref_init_from_vn_reference): Same.
* tree-ssa-sccvn.h (ao_ref_init_from_vn_reference): Same.
* tree-ssa-structalias.c (process_all_all_constraints): Same.
(make_constraints_to): Same.
(handle_lhs_call): Same.
(find_func_aliases_for_builtin_call): Same.
(sort_fieldstack): Same.
(check_for_overlaps): Same.
* tree-vect-loop-manip.c (vect_create_cond_for_align_checks): Same.
(vect_create_cond_for_unequal_addrs): Same.
(vect_create_cond_for_lower_bounds): Same.
(vect_create_cond_for_alias_checks): Same.
* tree-vect-slp-patterns.c (vect_validate_multiplication): Same.
* tree-vect-slp.c (vect_analyze_slp_instance): Same.
(vect_make_slp_decision): Same.
(vect_slp_bbs): Same.
(duplicate_and_interleave): Same.
(vect_transform_slp_perm_load): Same.
(vect_schedule_slp): Same.
* tree-vectorizer.h (vect_transform_slp_perm_load): Same.
(vect_schedule_slp): Same.
(duplicate_and_interleave): Same.
* tree.c (build_vector_from_ctor): Same.
(build_vector): Same.
(check_vector_cst): Same.
(check_vector_cst_duplicate): Same.
(check_vector_cst_fill): Same.
(check_vector_cst_stepped): Same.
* tree.h (build_vector_from_ctor): Same.

PR 100167: Fix vector long long multiply/divide tests on power10.

This patch updates the vector long long multiply and divide tests to
supply the correct code information if power10 code generation is used.

2021-06-18 Michael Meissner <meissner@linux.ibm.com>

gcc/testsuite/
PR testsuite/100167
* gcc.target/powerpc/fold-vec-div-longlong.c: Fix expected code
generation on power10.
* gcc.target/powerpc/fold-vec-mult-longlong.c: Likewise.

rs6000: Fix up easy_vector_constant_msb handling [PR101384]

The following gcc.dg/pr101384.c testcase is miscompiled on
powerpc64le-linux.
easy_altivec_constant has code to try construct vector constants with
different element sizes, perhaps different from CONST_VECTOR's mode. But as
written, that works fine for vspltis[bhw] cases, but not for the vspltisw
x,-1; vsl[bhw] x,x,x case, because that creates always a V16QImode, V8HImode
or V4SImode constant containing broadcasted constant with just the MSB set.
The vspltis_constant function etc. expects the vspltis[bhw] instructions
where the small [-16..15] or even [-32..30] constant is sign-extended to the
remaining step bytes, but that is not the case for the 0x80...00 constants,
with step > 1 we can't handle e.g.
{ 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff, 0x80, 0xff, 0xff, 0xff }
vectors but do want to handle e.g.
{ 0, 0, 0, 0x80, 0, 0, 0, 0x80, 0, 0, 0, 0x80, 0, 0, 0, 0x80 }
and similarly with copies > 1 we do want to handle e.g.
{ 0x80808080, 0x80808080, 0x80808080, 0x80808080 }.

2021-07-20 Jakub Jelinek <jakub@redhat.com>

PR target/101384
* config/rs6000/rs6000-protos.h (easy_altivec_constant): Change return
type from bool to int.
* config/rs6000/rs6000.c (vspltis_constant): Fix up handling the
EASY_VECTOR_MSB case if either step or copies is not 1.
(vspltis_shifted): Fix comment typo.
(easy_altivec_constant): Change return type from bool to int, instead
of returning true return byte size of the element mode that should be
used to synthetize the constant.
* config/rs6000/predicates.md (easy_vector_constant_msb): Require
that vspltis_shifted is 0, handle the case where easy_altivec_constant
assumes using different vector mode from CONST_VECTOR's mode.
* config/rs6000/altivec.md (easy_vector_constant_msb splitter): Use
easy_altivec_constant to determine mode in which -1 >> -1 should be
performed, use rs6000_expand_vector_init instead of gen_vec_initv4sisi.

* gcc.dg/pr101384.c: New test.
* gcc.target/powerpc/pr101384-1.c: New test.
* gcc.target/powerpc/pr101384-2.c: New test.

libstdc++: fix is_default_constructible for hash containers [PR 100863]

The recent change to _Hashtable_ebo_helper for this PR broke the
is_default_constructible trait for a hash container with a non-default
constructible allocator. That happens because the constructor needs to
be user-provided in order to initialize the member, and so is not
defined as deleted when the type is not default constructible.

By making _Hashtable derive from _Enable_special_members we can ensure
that the default constructor for the std::unordered_xxx containers is
deleted when it would be ill-formed. This makes the trait give the
correct answer.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/100863
* include/bits/hashtable.h (_Hashtable): Conditionally delete
default constructor by deriving from _Enable_special_members.
* testsuite/23_containers/unordered_map/cons/default.cc: New test.
* testsuite/23_containers/unordered_set/cons/default.cc: New test.

aarch64: Tweak old vect-* tests to avoid new FAILs

I'm not sure what these test were originally designed to test.
vaddv and vmaxv seem to be testing for vectorisation, with associated
scan-assembler tests.  But they use arm_neon.h functions to test
the results, which would presumably also trip many of the scans.
That was probably what the split into vect-fmax-fmin.c and
vect-fmaxv-fminv-compile.c was supposed to avoid.

Anyway, the tests started failing after the recent change to allow
staged reductions for epilogue loops.  And epilogues came into play
because the reduction loops iterate LANES-1 rather than LANES times.
(vmaxv was trying to iterate LANES times, but the gimple optimisers
outsmarted it.  The other two explicitly had a count of LANES-1.)

Just suppressing epilogues causes other issues for vaddv and vmaxv.
The easiest fix therefore seemed to be to use an asm to hide the
initial value of the vmaxv loop (so that it really does iterate
LANES times) and then make the others match that style.

gcc/testsuite/
PR testsuite/101506
* gcc.target/aarch64/vect-vmaxv.c: Use an asm to hide the
true initial value of the reduction from the vectorizer.
* gcc.target/aarch64/vect-vaddv.c: Likewise.  Make the vector
loop operate on exactly LANES (rather than LANES-1) iterations.
* gcc.target/aarch64/vect-fmaxv-fminv.x: Likewise.

libstdc++: Add more tests for filesystem::create_directory [PR101510]

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

PR libstdc++/101510
* src/c++17/fs_ops.cc (create_dir): Adjust whitespace.
* testsuite/27_io/filesystem/operations/create_directory.cc:
Test creating directory with name of existing symlink to
directory.
* testsuite/experimental/filesystem/operations/create_directory.cc:
Likewise.

debug/101473 - apply debug prefix maps before checksumming DIEs

The following makes sure to apply the debug prefix maps to filenames
before checksumming DIEs to create the global symbol for the CU DIE
used by LTO to link the late debug to the early debug. This avoids
binary differences (in said symbol) when compiling with toolchains
installed under a different path and that compensated with appropriate
-fdebug-prefix-map options.

The easiest and most scalable way is to record both the unmapped
and the remapped filename in the dwarf_file_data so the remapping
process takes place at a single point and only once (otherwise it
creates GC garbage at each point doing that).

2021-07-20 Richard Biener <rguenther@suse.de>

PR debug/101473
* dwarf2out.h (dwarf_file_data): Add key member.
* dwarf2out.c (dwarf_file_hasher::equal): Compare key.
(dwarf_file_hasher::hash): Hash key.
(lookup_filename): Remap the filename and store it in the
filename member of dwarf_file_data when creating a new
dwarf_file_data.
(file_name_acquire): Do not remap the filename again.
(maybe_emit_file): Likewise.