Eric Botcazou [Tue, 14 Sep 2021 09:33:05 +0000 (11:33 +0200)]
Fix PR ada/101970
This is a regression present on the mainline and 11 branch in the form of an
ICE for an enumeration type with a full signed representation for its size.
gcc/ada/
PR ada/101970
* exp_attr.adb (Expand_N_Attribute_Reference) <Attribute_Enum_Rep>:
Use an unchecked conversion instead of a regular conversion in the
enumeration case and remove Conversion_OK flag in the integer case.
<Attribute_Pos>: Remove superfluous test.
Jakub Jelinek [Tue, 14 Sep 2021 08:38:17 +0000 (10:38 +0200)]
testsuite: Use sync_long_long instead of sync_int_long for atomic-29.c test
As discussed, the test tests atomics on doubles, which are 64-bit, so we
should use the sync_long_long effective target instead of sync_int_long, which
covers 64-bit atomics only on 64-bit arches. I've added -march=pentium
to follow what is documented for sync_long_long. I guess -march=zarch should
be added for s390* too, but I haven't tested that.
And using sync_long_long found a syntax error in that effective target
implementation, so I've fixed that too.
2021-09-14 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/gomp/atomic-29.c: Add -march=pentium
dg-additional-options for ia32. Use sync_long_long effective target
instead of sync_int_long.
* lib/target-supports.exp (check_effective_target_sync_long_long): Fix
a syntax error.
Jakub Jelinek [Tue, 14 Sep 2021 08:31:42 +0000 (10:31 +0200)]
openmp: Add testing checks (whether lhs appears in operands at all) to more trees
This patch adds testing checks (goa_stabilize_expr with NULL pre_p) for more
tree codes, so that we don't gimplify their operands individually unless lhs
appears in them. Also, so that we don't have exponential compile time complexity
with the added checks, I've added a depth computation: we don't expect lhs
to be found at depth 8 or above, as all the atomic forms must have the x expression
in specific places in the expressions.
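The depth cap described above can be pictured with a toy model. This is only a sketch: Expr, kMaxDepth and mentions_lhs are hypothetical names for illustration, not GCC's tree representation or the actual goa_stabilize_expr code.

```cpp
#include <vector>

// Toy expression node (hypothetical; GCC uses its own tree type).
struct Expr {
    std::vector<const Expr*> operands;
};

// The message says lhs is not expected at depth 8 or above,
// so the walk gives up once depth exceeds 7.
constexpr int kMaxDepth = 7;

// Testing-only walk: reports whether `lhs` occurs at depth 0..kMaxDepth
// below `expr`, without modifying anything.
inline bool mentions_lhs(const Expr* expr, const Expr* lhs, int depth = 0) {
    if (depth > kMaxDepth) return false;  // bound the work per query
    if (expr == lhs) return true;
    for (const Expr* op : expr->operands)
        if (mentions_lhs(op, lhs, depth + 1)) return true;
    return false;
}
```

With the cap in place, a query never descends more than a fixed number of levels, regardless of how deep the whole expression is.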
2021-09-14 Jakub Jelinek <jakub@redhat.com>
* gimplify.c (goa_stabilize_expr): Add depth argument, propagate
it to recursive calls, for depth above 7 just gimplify or return.
Perform a test even for MODIFY_EXPR, ADDR_EXPR, COMPOUND_EXPR with
__builtin_clear_padding and TARGET_EXPR.
(gimplify_omp_atomic): Adjust goa_stabilize_expr callers.
Eric Botcazou [Tue, 14 Sep 2021 08:32:00 +0000 (10:32 +0200)]
Implement PR ada/101385
For consistency's sake with -Wall & -w, this makes -Werror imply -gnatwe.
gcc/ada/
PR ada/101385
* doc/gnat_ugn/building_executable_programs_with_gnat.rst
(-Wall): Minor fixes.
(-w): Likewise.
(-Werror): Document that it also sets -gnatwe by default.
* gcc-interface/lang-specs.h (ada): Expand -gnatwe if -Werror is
passed and move expansion of -gnatw switches to before -gnatez.
Eric Botcazou [Tue, 14 Sep 2021 07:41:36 +0000 (09:41 +0200)]
Give more informative error message for by-reference types
Recent compilers enforce more strictly the RM C.6(18) clause, which says
that volatile record types are by-reference types. This changes the typical
error message now given in these cases.
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity) <is_type>: Declare new
constant. Adjust error message issued by validate_size in the case
of by-reference types.
(validate_size): Always use the error strings passed by the caller.
* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.
Jason Merrill [Tue, 14 Sep 2021 02:35:18 +0000 (22:35 -0400)]
c++: Fix warning on 32-bit x86
My C++17 hardware interference sizes patch caused a bogus warning on 32-bit
x86, where we have a default L1 cache line size of 0, and the front end
complained that the default constructive interference size of 64 was larger
than that.
gcc/cp/ChangeLog:
* decl.c (cxx_init_decl_processing): Don't warn if L1 cache line
size is smaller than maxalign.
Harald Anlauf [Mon, 13 Sep 2021 17:28:10 +0000 (19:28 +0200)]
Fortran - ensure simplification of bounds of array-valued named constants
gcc/fortran/ChangeLog:
PR fortran/82314
* decl.c (add_init_expr_to_sym): For proper initialization of
array-valued named constants the array bounds need to be
simplified before adding the initializer.
gcc/testsuite/ChangeLog:
PR fortran/82314
* gfortran.dg/pr82314.f90: New test.
Thomas Schwinge [Mon, 30 Aug 2021 20:36:47 +0000 (22:36 +0200)]
Don't maintain a warning spec for 'UNKNOWN_LOCATION'/'BUILTINS_LOCATION' [PR101574]
This resolves PR101574 "gcc/sparseset.h:215:20: error: suggest parentheses
around assignment used as truth value [-Werror=parentheses]", as (bogusly)
reported at commit a61f6afbee370785cf091fe46e2e022748528307:
In file included from [...]/source-gcc/gcc/lra-lives.c:43:
[...]/source-gcc/gcc/lra-lives.c: In function ‘void make_hard_regno_dead(int)’:
[...]/source-gcc/gcc/sparseset.h:215:20: error: suggest parentheses around assignment used as truth value [-Werror=parentheses]
215 | && (((ITER) = sparseset_iter_elm (SPARSESET)) || 1); \
| ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[...]/source-gcc/gcc/lra-lives.c:304:3: note: in expansion of macro ‘EXECUTE_IF_SET_IN_SPARSESET’
304 | EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason Merrill [Thu, 15 Jul 2021 19:30:17 +0000 (15:30 -0400)]
c++: implement C++17 hardware interference size
The last missing piece of the C++17 standard library is the hardware
interference size constants. Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.
In principle, both of these values should be the same as the target's L1
cache line size. When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.
From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.
To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.
A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable. The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.
JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74
I implement this by adding new --params for the two sizes. Targets can
override these values in targetm.target_option.override() to support a range
of values for the generic target; otherwise, both will default to the L1
cache line size.
64 bytes still seems correct for all x86.
I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.
He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
cache line, I've changed that to 64/256.
Other arch maintainers are invited to set ranges for their generic targets
if that seems better than using the default cache line size for both values.
With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to be,
and we can feel free to adjust them as CPUs with different cache lines
become more and less common.
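As an illustration of the intended kind of use (inside a single translation unit, where ABI stability does not matter), here is a minimal sketch. PaddedCounter, kDestructive and bump are hypothetical names, and the 64-byte fallback for libraries without the constant is an assumption of this sketch.

```cpp
#include <atomic>
#include <cstddef>
#include <new>  // std::hardware_destructive_interference_size (C++17)

// Use the library constant when available; otherwise assume 64 bytes.
// The fallback value is an assumption of this sketch.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kDestructive = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kDestructive = 64;
#endif

// Each counter occupies its own cache line, so threads updating
// neighbouring counters should not falsely share a line.
struct alignas(kDestructive) PaddedCounter {
    std::atomic<long> value{0};
};

// Hypothetical helper: increment one counter and return the new value.
inline long bump(PaddedCounter* counters, std::size_t index) {
    return counters[index].value.fetch_add(1, std::memory_order_relaxed) + 1;
}
```

Since the value may change with -mtune, a type like this is best kept out of headers shared across separately built components.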
gcc/ChangeLog:
* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.
* g++.dg/warn/Winterference.H: New file.
* g++.dg/warn/Winterference.C: New test.
* g++.target/aarch64/interference.C: New test.
* g++.target/arm/interference.C: New test.
* g++.target/i386/interference.C: New test.
Martin Liska [Thu, 12 Aug 2021 13:20:43 +0000 (15:20 +0200)]
i386: support micro-levels in target{,_clone} attrs [PR101696]
As mentioned in the PR, we lack support for the x86-64 micro-architecture
levels in the target and target_clones attributes. While the levels
x86-64 x86-64-v2 x86-64-v3 x86-64-v4 are supported values of the -march
option, they are actually only aliases for the k8 CPU. That said, they are
closer to the __builtin_cpu_supports function, so we decided to implement
them there.
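A minimal sketch of querying the levels at run time follows. The level strings are the ones listed above; the function name highest_x86_64_level and the preprocessor guard (x86-64 GCC 12 or later, otherwise a placeholder result) are assumptions of this sketch.

```cpp
#include <string>

// Report the highest x86-64 micro-architecture level of the running CPU.
// On compilers or targets without this support, return a placeholder.
inline std::string highest_x86_64_level() {
#if defined(__x86_64__) && defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 12
    __builtin_cpu_init();  // make sure the CPU feature data is populated
    if (__builtin_cpu_supports("x86-64-v4")) return "x86-64-v4";
    if (__builtin_cpu_supports("x86-64-v3")) return "x86-64-v3";
    if (__builtin_cpu_supports("x86-64-v2")) return "x86-64-v2";
    if (__builtin_cpu_supports("x86-64")) return "x86-64";
    return "unknown";
#else
    return "unavailable";
#endif
}
```

The same level names are what the target and target_clones attributes are meant to accept after this change.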
PR target/101696
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (cpu_indicator_init): Add support
for x86-64 micro levels for __builtin_cpu_supports.
* common/config/i386/i386-cpuinfo.h (enum feature_priority):
Add priorities for the micro-arch levels.
(enum processor_features): Add new features.
* common/config/i386/i386-isas.h: Add micro-arch features.
* config/i386/i386-builtins.c (get_builtin_code_for_version):
Support the micro-arch levels by calling
__builtin_cpu_supports.
* doc/extend.texi: Document that the levels are supported by
__builtin_cpu_supports.
gcc/testsuite/ChangeLog:
* g++.target/i386/mv30.C: New test.
* gcc.target/i386/mvc16.c: New test.
* gcc.target/i386/builtin_target.c (CHECK___builtin_cpu_supports):
New.
Andrew Pinski [Thu, 2 Sep 2021 07:08:22 +0000 (07:08 +0000)]
[aarch64] Fix target/95969: __builtin_aarch64_im_lane_boundsi interferes with gimple
This patch adds simple folding of __builtin_aarch64_im_lane_boundsi where
we are not going to error out. It fixes the problem by the removal
of the function from the IR.
OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
Andrew Pinski [Mon, 13 Sep 2021 06:56:57 +0000 (06:56 +0000)]
Remove m32r{,le}-*-linux* support from GCC
m32r support never made it into glibc, and m32r support was removed from
the Linux kernel in 4.18. This does not remove much, but there is no reason
to keep around a port that never worked or whose support in other
projects is gone.
OK? Checked to make sure m32r-linux and m32rle-linux were rejected
when building.
contrib/ChangeLog:
* config-list.mk: Remove m32r-linux and m32rle-linux
from the list.
gcc/ChangeLog:
* config.gcc: Add m32r-*-linux* and m32rle-*-linux*
to the Unsupported targets list.
Remove support for m32r-*-linux* and m32rle-*-linux*.
* config/m32r/linux.h: Removed.
* config/m32r/t-linux: Removed.
aarch64: PR target/102252 Invalid addressing mode for SVE load predicate
In the testcase we generate invalid assembly for an SVE load predicate instruction.
The RTL for the insn is:
(insn 9 8 10 (set (reg:VNx16BI 68 p0)
(mem:VNx16BI (plus:DI (mult:DI (reg:DI 1 x1 [93])
(const_int 8 [0x8]))
(reg/f:DI 0 x0 [92])) [2 work_3(D)->array[offset_4(D)]+0 S8 A16]))
That addressing mode is not valid for the instruction [1] as it only accepts the addressing mode:
[<Xn|SP>{, #<imm>, MUL VL}]
This patch rejects the register index form for SVE predicate modes.
Bootstrapped and tested on aarch64-none-linux-gnu.
Now that the jump thread back registry has been split into the generic
copier and the custom (old) copier, it becomes trivial to remove the
FSM bits from the jump threaders.
First, there's no need for an EDGE_FSM_THREAD type. The only reason
we were looking at the threading type was to determine what type of
copier to use, and now that the copier has been split, there's no need
to even look. However, there is one check in register_jump_thread
where we verify that only the generic copier can thread through
back-edges. I've removed that check in favor of a flag passed to the
constructor.
I've also removed all the FSM references from the code and tests.
Interestingly, some tests weren't even testing the right thing. They
were testing for "FSM" which would catch jump thread paths as well as
the backward threader *failing* on registering a path. *big eye roll*
The only remaining code that was actually checking for EDGE_FSM_THREAD
was adjust_paths_after_duplication, and the checks could be written
without looking at the edge type at all. For the record, the code
there is horrible: it's convoluted, hard to read, and doesn't have any
tests. I'd smack myself if I could go back in time.
All that remains are the FSM references in the --param's themselves.
I think we should s/fsm/threader/, since I envision a day when we can
share the cost basis code between the threaders. However, I don't
know what the proper procedure is for renaming existing compiler
options.
By the way, param_fsm_maximum_phi_arguments is no longer relevant
after the rewrite. We can nuke that one right away.
Patrick Palka [Mon, 13 Sep 2021 14:29:32 +0000 (10:29 -0400)]
c++: parameter pack inside constexpr if [PR101764]
Here when partially instantiating the first pack expansion, substitution
into the condition of the constexpr if yields a still-dependent tree, so
tsubst_expr returns an IF_STMT with an unsubstituted IF_COND and with
IF_STMT_EXTRA_ARGS added to it. Hence after partial instantiation the pack
expansion pattern still refers to the unlowered parameter pack 'ts' of
level 2, and it's thus recorded in the new PACK_EXPANSION_PARAMETER_PACKS.
During the subsequent final instantiation of the regenerated lambda we
crash in tsubst_pack_expansion because it can't find an argument pack
for this unlowered 'ts', due to the level mismatch. (Likewise when the
constexpr if is replaced by a requires-expr, which also uses the extra
args mechanism for avoiding partial instantiation.)
So essentially, a pack expansion pattern that contains an "extra args"
tree doesn't play well with partial instantiation. This patch fixes
this by forcing such pack expansions to use the extra args mechanism as
well.
PR c++/101764
gcc/cp/ChangeLog:
* cp-tree.h (PACK_EXPANSION_FORCE_EXTRA_ARGS_P): New accessor
macro.
* pt.c (has_extra_args_mechanism_p): New function.
(find_parameter_pack_data::found_extra_args_tree_p): New data
member.
(find_parameter_packs_r): Set ppd->found_extra_args_tree_p
appropriately.
(make_pack_expansion): Set PACK_EXPANSION_FORCE_EXTRA_ARGS_P if
ppd.found_extra_args_tree_p.
(use_pack_expansion_extra_args_p): Return true if there were
unsubstituted packs and PACK_EXPANSION_FORCE_EXTRA_ARGS_P.
(tsubst_pack_expansion): Pass the pack expansion to
use_pack_expansion_extra_args_p.
H.J. Lu [Thu, 26 Aug 2021 12:31:50 +0000 (05:31 -0700)]
x86: Add TARGET_AVX256_[MOVE|STORE]_BY_PIECES
1. Add TARGET_AVX256_MOVE_BY_PIECES to perform move by-pieces operation
with 256-bit AVX instructions.
2. Add TARGET_AVX256_STORE_BY_PIECES to perform move and store by-pieces
operations with 256-bit AVX instructions.
They are enabled only for Intel Alder Lake and Intel processors with
AVX512.
gcc/
PR target/101935
* config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): New.
(TARGET_AVX256_STORE_BY_PIECES): Likewise.
(MOVE_MAX): Check TARGET_AVX256_MOVE_BY_PIECES and
TARGET_AVX256_STORE_BY_PIECES instead of
TARGET_AVX256_SPLIT_UNALIGNED_LOAD and
TARGET_AVX256_SPLIT_UNALIGNED_STORE.
(STORE_MAX_PIECES): Check TARGET_AVX256_STORE_BY_PIECES instead
of TARGET_AVX256_SPLIT_UNALIGNED_STORE.
* config/i386/x86-tune.def (X86_TUNE_AVX256_MOVE_BY_PIECES): New.
(X86_TUNE_AVX256_STORE_BY_PIECES): Likewise.
We need to use the pointer equivalence tracking from evrp in the jump
threader. Instead of moving it to some *evrp.h header, it's cleaner for
it to live in its own file, since it's completely independent and not
evrp specific.
The current restriction on folding memcpy to a single element of size
MOVE_MAX is excessively cautious on most machines and limits some
significant further optimizations. So relax the restriction provided
the copy size does not exceed MOVE_MAX * MOVE_RATIO and that a SET
insn exists for moving the value into machine registers.
Note that there were already checks in place for having misaligned
move operations when one or more of the operands were unaligned.
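For illustration, this is the kind of small fixed-size copy the relaxed restriction is about. It is only a sketch: Pair and copy_pair are hypothetical names, and whether the call is actually folded depends on the target's MOVE_MAX and MOVE_RATIO and on the compiler version.

```cpp
#include <cstdint>
#include <cstring>

// A 16-byte aggregate: larger than MOVE_MAX on targets where MOVE_MAX is 8,
// but still a small constant-size copy.
struct Pair {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Copy via memcpy with a compile-time-constant length. The result is the
// same whether the compiler emits a library call or folds the copy.
inline Pair copy_pair(const Pair& src) {
    Pair dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}
```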
arm: expand handling of movmisalign for DImode [PR102125]
DImode is currently handled only for machines with vector modes
enabled, but this is unduly restrictive and is generally better done
in core registers.
gcc/ChangeLog:
PR target/102125
* config/arm/arm.md (movmisaligndi): New define_expand.
* config/arm/vec-common.md (movmisalign<mode>): Iterate over VDQ mode.
rtl: directly handle MEM in gen_highpart [PR102125]
gen_lowpart_general handles forming a lowpart of a MEM by using
adjust_address to rework and validate a new version of the MEM.
Do the same for gen_highpart rather than calling simplify_gen_subreg
for this case.
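As a source-level analogy of taking the high part of a value that lives in memory, the sketch below reads the upper 32 bits of a 64-bit object by adjusting the address rather than loading the whole value. The names is_little_endian and load_highpart are hypothetical; this is not the emit-rtl.c code, only an illustration of the idea of a narrower access at an adjusted address.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Detect byte order at run time by inspecting the first byte of a known value.
inline bool is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Load the most significant 32 bits of a 64-bit value directly from memory.
// On little-endian targets the high part sits at byte offset 4, on
// big-endian targets at offset 0.
inline std::uint32_t load_highpart(const std::uint64_t* mem) {
    const std::size_t offset = is_little_endian() ? 4 : 0;
    std::uint32_t hi;
    std::memcpy(&hi, reinterpret_cast<const unsigned char*>(mem) + offset, sizeof hi);
    return hi;
}
```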
gcc/ChangeLog:
PR target/102125
* emit-rtl.c (gen_highpart): Use adjust_address to handle
MEM rather than calling simplify_gen_subreg.