IBM zSystems: Fix function_ok_for_sibcall [PR106355]
For a parameter with BLKmode we cannot use REG_NREGS to determine
the number of consecutive registers. Align the computation with
the implementation of s390_function_arg.
Fix some indentation whitespace, too.
gcc/ChangeLog:
PR target/106355
* config/s390/s390.c (s390_call_saved_register_used): For a
parameter with BLKmode fix determining number of consecutive
registers.
gcc/testsuite/ChangeLog:
* gcc.target/s390/pr106355.h: Common code for new tests.
* gcc.target/s390/pr106355-1.c: New test.
* gcc.target/s390/pr106355-2.c: New test.
* gcc.target/s390/pr106355-3.c: New test.
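As a rough illustration of the BLKmode point above: a BLKmode parameter has no machine mode from which a register count could be read, so the count has to come from the argument size in bytes. The sketch below is a stand-alone model, not the s390 back-end code; `UNITS_PER_LONG_MODEL` and `nregs_for_size` are hypothetical names, and the 8-byte register width is an assumption for a 64-bit target.

```c
#include <assert.h>

/* Assumed width of one general-purpose register in bytes (64-bit target).  */
#define UNITS_PER_LONG_MODEL 8

/* Hypothetical helper: number of consecutive registers needed to hold an
   argument of SIZE bytes, rounding up to whole registers.  */
static int
nregs_for_size (int size)
{
  assert (size > 0);
  return (size + UNITS_PER_LONG_MODEL - 1) / UNITS_PER_LONG_MODEL;
}
```

Under these assumptions a 9-byte aggregate occupies two registers, which is exactly the kind of case a mode-based count cannot express.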
Martin Liska [Mon, 24 Oct 2022 13:34:39 +0000 (15:34 +0200)]
x86: fix VENDOR_MAX enum value
PR target/107364
gcc/ChangeLog:
* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
Reorder enum values as BUILTIN_VENDOR_MAX should not point
in the middle of the valid enum values.
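A minimal sketch of why the position of a MAX sentinel matters, assuming range checks of the form "value < MAX". The enum and its members are hypothetical stand-ins, not the real processor_vendor definition.

```c
#include <assert.h>

/* Hypothetical vendor enum for illustration only.  */
enum vendor_model
{
  VENDOR_MODEL_A = 1,
  VENDOR_MODEL_B,
  VENDOR_MODEL_C,
  /* Sentinel: placed after every valid value so that VALUE < MAX holds
     for all of them.  */
  VENDOR_MODEL_MAX
};

/* Compile-time check that the last valid value is below the sentinel.  */
static_assert (VENDOR_MODEL_C < VENDOR_MODEL_MAX,
               "sentinel must follow all valid values");

/* Range check that relies on the sentinel being last.  */
static int
vendor_is_valid (int v)
{
  return v >= VENDOR_MODEL_A && v < VENDOR_MODEL_MAX;
}
```

If the sentinel sat between VENDOR_MODEL_B and VENDOR_MODEL_C, the same check would reject a perfectly valid value.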
Bit of a brown-paper-bag bug, but: GCC was generating
non-existent merging forms of BRKAS and BRKBS. Those
instructions only support zero predication (although
BRKA and BRKB support both).
https://github.com/ARM-software/acle/pull/199 adds a new feature
macro for RCPC, for use in things like inline assembly. This patch
adds the associated support to GCC.
Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a
entry didn't include it. This was probably harmless in practice
since GCC simply ignored the extension until now. (The GAS
definition is OK.)
gcc/
* config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH8_3): Add
AARCH64_FL_RCPC.
(AARCH64_ISA_RCPC): New macro.
* config/aarch64/aarch64-cores.def (thunderx3t110, zeus, neoverse-v1)
(neoverse-512tvb, saphira): Remove RCPC from these Armv8.3-A+ cores.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_RCPC when appropriate.
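A small sketch of how user code might test the feature macro named above. Only `__ARM_FEATURE_RCPC` comes from the commit; the helper `have_rcpc` is a hypothetical name, and the sketch deliberately avoids any inline assembly.

```c
#include <assert.h>

/* Returns 1 when the compiler advertises RCPC through the ACLE feature
   macro, 0 otherwise.  The result depends on the target and the -march
   or -mcpu options in effect.  */
static int
have_rcpc (void)
{
#ifdef __ARM_FEATURE_RCPC
  return 1;
#else
  return 0;
#endif
}
```

Code that wants to use RCPC instructions in inline assembly could guard that path with the same preprocessor test and fall back to a generic sequence otherwise.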
Kewen Lin [Mon, 26 Sep 2022 05:33:18 +0000 (00:33 -0500)]
rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]
As PR96072 shows, the code adding the REG_CFA_DEF_CFA reg note
assumes that an insn restoring the frame pointer has been
emitted previously. That part of the code used to be guarded by
the flag frame_pointer_needed, which was consistent, but since
commit r10-7981 it is guarded by the flag
frame_pointer_needed_indeed instead. This caused an ICE due to
an unexpected NULL insn.
PR target/96072
gcc/ChangeLog:
* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): Update the
condition for adding REG_CFA_DEF_CFA reg note with
frame_pointer_needed_indeed.
Pat Haugen [Mon, 17 Oct 2022 19:53:11 +0000 (14:53 -0500)]
Fix register count when not splitting Complex IEEE 128-bit args.
For ABI_V4, we do not split complex args. This created a problem because
even though an arg would be passed in two VSX regs, we were only advancing the
function arg counter by one VSX register. Fixed with this patch.
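The counting rule described above can be modelled as follows. This is a simplified sketch, not the rs6000 code; both function names are hypothetical, and the model only covers the unsplit case described in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: registers consumed by an IEEE 128-bit argument when
   complex arguments are not split.  A complex value has a real and an
   imaginary part, so it takes two registers; a scalar takes one.  */
static int
regs_for_ieee128_arg (bool is_complex)
{
  return is_complex ? 2 : 1;
}

/* Advance the running argument-register counter past one argument.  */
static int
advance_arg_counter (int next_reg, bool is_complex)
{
  assert (next_reg >= 0);
  return next_reg + regs_for_ieee128_arg (is_complex);
}
```

Advancing by one for a complex argument would leave the counter pointing at the register that holds its second half.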
Richard Biener [Mon, 8 Aug 2022 07:07:23 +0000 (09:07 +0200)]
lto/106540 - fix LTO tree input wrt dwarf2out_register_external_die
I've revisited the earlier two workarounds for dwarf2out_register_external_die
getting duplicate entries. It turns out that r11-525-g03d90a20a1afcb
added dref_queue pruning to lto_input_tree but decl reading uses that
to stream in DECL_INITIAL even when in the middle of SCC streaming.
When that SCC then gets thrown away we can end up with debug nodes
registered which isn't supposed to happen. The following adjusts
the DECL_INITIAL streaming to go the in-SCC way, using lto_input_tree_1,
since no SCCs are expected at this point, just refs.
PR lto/106540
PR lto/106334
* lto-streamer-in.c (lto_read_tree_1): Use lto_input_tree_1
to input DECL_INITIAL, which avoids committing drefs.
Richard Biener [Tue, 19 Jul 2022 07:57:22 +0000 (09:57 +0200)]
middle-end/106331 - fix mem attributes for string op arguments
get_memory_rtx tries hard to come up with a MEM_EXPR to record
in the memory attributes, but in the last fallback it fails to properly
account for an unknown offset. As a result, as visible in this testcase,
set_mem_attributes computes an incorrect alignment. The following
rectifies both parts.
PR middle-end/106331
* builtins.c (get_memory_rtx): Compute alignment from
the original address and set MEM_OFFSET to unknown when
we create a MEM_EXPR from the base object of the address.
Richard Biener [Thu, 30 Jun 2022 08:33:40 +0000 (10:33 +0200)]
tree-optimization/106131 - wrong code with FRE rewriting
The following makes sure to not use the original TBAA type for
looking up a value across an aggregate copy when we had to offset
the read.
2022-06-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/106131
* tree-ssa-sccvn.c (vn_reference_lookup_3): Force alias-set
zero when offsetting the read looking through an aggregate
copy.
Eric Botcazou [Fri, 14 Oct 2022 09:52:04 +0000 (11:52 +0200)]
Fix PR target/107248
This is the infamous PR rtl-optimization/38644 rearing its ugly head for
leaf functions on SPARC more than a decade later... Richard E.'s generic
solution has never been implemented so let's do as other RISC back-ends did.
gcc/
PR target/107248
* config/sparc/sparc.c (sparc_expand_prologue): Emit a frame
blockage for leaf functions.
(sparc_flat_expand_prologue): Emit frame instead of full blockage.
(sparc_expand_epilogue): Emit a frame blockage for leaf functions.
(sparc_flat_expand_epilogue): Emit frame instead of full blockage.
Mikael Morin [Sat, 3 Sep 2022 09:58:47 +0000 (11:58 +0200)]
fortran: Move clobbers after evaluation of all arguments [PR106817]
For actual arguments whose dummy is INTENT(OUT), we used to generate
clobbers on them at the same time we generated the argument reference
for the function call. This was wrong if the value expression of a
later argument depended on the value of the just-clobbered argument:
in that case we passed an undefined value.
With this change, clobbers are collected separately and appended
to the procedure call preliminary code after all the arguments have been
evaluated.
PR fortran/106817
gcc/fortran/ChangeLog:
* trans-expr.c (gfc_conv_procedure_call): Collect all clobbers
to their own separate block. Append the block of clobbers to
the procedure preliminary block after the argument evaluation
codes for all the arguments.
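The ordering problem can be modelled outside Fortran. The sketch below imitates a call whose first dummy is INTENT(OUT) and whose second argument is the expression "a + 1". It is only a model: a real clobber marks the variable as undefined rather than storing a value, so the `POISON` constant and both function names are assumptions made for illustration.

```c
#include <assert.h>

#define POISON (-1)   /* stands in for "undefined after the clobber" */

/* Old order: the clobber is emitted together with the reference to the
   first argument, so the later argument expression reads the clobbered
   value.  Returns the value passed as second argument.  */
static int
second_arg_old_order (int a)
{
  a = POISON;          /* clobber emitted with the first argument */
  return a + 1;        /* second argument evaluated afterwards */
}

/* New order: every argument is evaluated first, then the collected
   clobbers are applied just before the call.  */
static int
second_arg_new_order (int a)
{
  int arg2 = a + 1;    /* all argument values computed first */
  a = POISON;          /* clobbers appended after evaluation */
  (void) a;
  return arg2;
}
```

With a equal to 41, the new order passes 42 as the second argument, while the old order passes a value derived from the clobbered variable.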
Mikael Morin [Mon, 29 Aug 2022 09:19:29 +0000 (11:19 +0200)]
fortran: Fix invalid function decl clobber ICE [PR105012]
For a function without a declared result symbol, the Fortran frontend
uses the function symbol itself as the result symbol. This caused
an invalid clobber of a function decl to be emitted, leading to an
ICE, whereas the intended behaviour was to clobber the function result
variable. This change fixes the problem by getting the decl from the
just-retrieved variable reference after the call to
gfc_conv_expr_reference, instead of copying it from the frontend symbol.
PR fortran/105012
gcc/fortran/ChangeLog:
* trans-expr.c (gfc_conv_procedure_call): Retrieve variable
from the just calculated variable reference.
Mikael Morin [Wed, 31 Aug 2022 09:00:45 +0000 (11:00 +0200)]
fortran: Move the clobber generation code
This change inlines the clobber generation code from
gfc_conv_expr_reference to the single caller from where the add_clobber
flag can be true, and removes the add_clobber argument.
What motivates this is the standard making the procedure call a cause
for a variable to become undefined, which translates to a clobber
generation, so clobber generation should be closely related to procedure
call generation, whereas it is rather orthogonal to variable reference
generation. Thus the generation of the clobber feels more appropriate
in gfc_conv_procedure_call than in gfc_conv_expr_reference.
Behaviour remains unchanged.
gcc/fortran/ChangeLog:
* trans.h (gfc_conv_expr_reference): Remove add_clobber
argument.
* trans-expr.c (gfc_conv_expr_reference): Ditto. Inline code
depending on add_clobber and conditions controlling it ...
(gfc_conv_procedure_call): ... to here.
Fortran: Fix ICE and wrong code for assumed-rank arrays [PR100029, PR100040]
gcc/fortran/ChangeLog:
PR fortran/100040
PR fortran/100029
* trans-expr.c (gfc_conv_class_to_class): Add code to have
assumed-rank arrays recognized as full arrays and fix the type
of the array assignment.
(gfc_conv_procedure_call): Change order of code blocks such that
the free of ALLOCATABLE dummy arguments with INTENT(OUT) occurs
first.
gcc/testsuite/ChangeLog:
PR fortran/100029
* gfortran.dg/PR100029.f90: New test.
PR fortran/100040
* gfortran.dg/PR100040.f90: New test.
Harald Anlauf [Tue, 27 Sep 2022 18:54:28 +0000 (20:54 +0200)]
Fortran: error recovery while simplifying intrinsic UNPACK [PR107054]
gcc/fortran/ChangeLog:
PR fortran/107054
* simplify.c (gfc_simplify_unpack): Replace assert by condition
that terminates simplification when there are not enough elements
in the constructor of argument VECTOR.
gcc/testsuite/ChangeLog:
PR fortran/107054
* gfortran.dg/pr107054.f90: New test.
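The change from an assert to a condition that stops simplification can be sketched on plain arrays. This is a simplified model of UNPACK semantics, not the gfc_simplify_unpack code; `unpack_model` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of UNPACK (VECTOR, MASK, FIELD) on int arrays of N
   result elements.  Returns false, with OUT only partially written, when
   VECTOR has fewer elements than MASK has true entries; the caller is
   then expected to give up instead of hitting an assertion.  */
static bool
unpack_model (const int *vector, size_t nvec,
              const bool *mask, const int *field, size_t n, int *out)
{
  assert (out != NULL);
  size_t next = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (mask[i])
        {
          if (next >= nvec)
            return false;   /* not enough elements: stop gracefully */
          out[i] = vector[next++];
        }
      else
        out[i] = field[i];
    }
  return true;
}
```

The early return plays the role of terminating simplification, leaving the invalid code to be diagnosed elsewhere.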
Harald Anlauf [Tue, 23 Aug 2022 20:16:14 +0000 (22:16 +0200)]
Fortran: improve error recovery while simplifying size of bad array [PR103694]
gcc/fortran/ChangeLog:
PR fortran/103694
* simplify.c (simplify_size): The size expression of an array cannot
be simplified if an error occurs while resolving the array spec.
gcc/testsuite/ChangeLog:
PR fortran/103694
* gfortran.dg/pr103694.f90: New test.
This is a GCC 10 version of https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602105.html
There are no tuning structures to rename so the patch is (even) smaller than the GCC 12 and trunk
versions.
Bootstrapped and tested on aarch64-none-linux-gnu.
It turns out that GTY(()) markers in definitions like:
GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
are not effective and are silently ignored. The GTY(()) has
to come after an extern or static.
The externs associated with the SVE ACLE GTY variables are in
aarch64-sve-builtins.h. This file is not in tm_include_list because
we don't want every target-facing file to include it. It therefore
isn't in the list of GC header files either.
In this case that's a blessing in disguise, since the variables
belong to a namespace and gengtype doesn't understand namespaces.
I think the fix is instead to add an extra extern before each
variable declaration, similarly to varasm.cc and vtable-verify.cc.
(This works due to a "using namespace" at the end of the file.)
gcc/
PR target/106491
* config/aarch64/aarch64-sve-builtins.cc (scalar_types)
(acle_vector_types, acle_svpattern, acle_svprfop): Add GTY
markup to (new) extern declarations instead of to the main
definition.
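The declaration pattern described above looks roughly like this. The sketch uses stand-ins so that it compiles outside GCC: the empty `GTY` macro mirrors the fact that the marker is meant for gengtype rather than the compiler, while `tree_model`, `NUM_VECTOR_TYPES_MODEL` and `scalar_types_model` are hypothetical names, and the namespace aspect of the real file is left out.

```c
#include <assert.h>

/* Stand-ins for illustration: the marker expands to nothing for the
   compiler and is only meaningful to the gengtype tool.  */
#define GTY(x)
typedef void *tree_model;
#define NUM_VECTOR_TYPES_MODEL 4

/* The marker goes on an extern declaration ...  */
extern GTY(()) tree_model scalar_types_model[NUM_VECTOR_TYPES_MODEL];

/* ... and the definition itself carries no marker.  */
tree_model scalar_types_model[NUM_VECTOR_TYPES_MODEL];
```

Both lines name the same object, so the extra extern changes nothing for the compiler; it only gives the marker a place where it is not ignored.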
Fix PR target/99184: Wrong cast from double to 16-bit and 32-bit ints
This patch fixes PR target/99184, where conversions from 64-bit
(long) double to 16-bit and 32-bit integers were rounded incorrectly.
The patch just removes the respective roundings from
libf7-asm.sx::to_integer and ::to_unsigned. Luckily, LibF7 does not
use these functions internally except in libf7.c::f7_exp,
which reads
f7_round (qq, qq);
int16_t q = f7_get_s16 (qq);
so that f7_get_s16() operates on an already rounded value, and therefore
this code works unaltered with or without rounding in to_integer.
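For reference, C defines the conversion from a floating-point value to an integer type as truncation toward zero, which is why an extra rounding step gives wrong results. The sketch below contrasts the two behaviours on a host compiler; both function names are hypothetical and none of this is LibF7 code.

```c
#include <assert.h>
#include <stdint.h>

/* Conversion to an integer type discards the fractional part.  */
static_assert ((int16_t) 2.9 == 2, "conversion truncates toward zero");

/* Plain C conversion: truncation toward zero.  */
static int16_t
to_s16_truncating (double x)
{
  return (int16_t) x;
}

/* What an additional rounding step before the conversion would produce
   (round half away from zero); shown only for contrast.  */
static int16_t
to_s16_rounding (double x)
{
  return (int16_t) (x < 0 ? x - 0.5 : x + 0.5);
}
```

The two helpers agree on values that are already integral, which matches the observation that f7_get_s16 is unaffected when its input has been rounded beforehand.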
Tom de Vries [Fri, 28 Jan 2022 09:28:59 +0000 (10:28 +0100)]
[nvptx] Add uniform_warp_check insn
On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O2 execution test
...
which minimizes to the same test-case as listed in commit "[nvptx]
Update default ptx isa to 6.3".
The problem is again that the first diverging branch is not handled as such in
SASS, which causes problems with a subsequent shfl insn, but given that we
have -mptx=3.1 we can't use the bar.warp.sync insn.
Given that the default is now -mptx=6.3, and consequently -mptx=3.1 is of
lesser importance, implement the next best thing: abort when detecting
non-convergence using this insn:
...
{ .reg.b32 act;
vote.ballot.b32 act,1;
.reg.pred uni;
setp.eq.b32 uni,act,0xffffffff;
@ !uni trap;
@ !uni exit;
}
...
Interestingly, the effect of this is that rather than aborting, the test-case
now passes.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (nvptx_single): Use nvptx_uniform_warp_check.
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_UNIFORM_WARP_CHECK.
(define_insn "nvptx_uniform_warp_check"): New define_insn.
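The logic of the uniform_warp_check sequence quoted above can be modelled on the host. The sketch assumes a warp of 32 lanes and represents lane activity as a boolean array; `ballot_model` and `warp_is_uniform` are hypothetical names, and nothing here runs on a GPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WARP_SIZE_MODEL 32

/* Model of vote.ballot.b32 with predicate 1: every active lane
   contributes a set bit at its lane index.  */
static uint32_t
ballot_model (const bool active[WARP_SIZE_MODEL])
{
  assert (active != NULL);
  uint32_t mask = 0;
  for (int lane = 0; lane < WARP_SIZE_MODEL; lane++)
    if (active[lane])
      mask |= (uint32_t) 1 << lane;
  return mask;
}

/* The warp counts as uniform (converged) when all 32 lanes voted, which
   is the setp.eq.b32 comparison against 0xffffffff.  */
static bool
warp_is_uniform (const bool active[WARP_SIZE_MODEL])
{
  return ballot_model (active) == 0xffffffffu;
}
```

In the PTX sequence the non-uniform case leads to trap and exit; in this model it is simply a false return value.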
Tom de Vries [Thu, 27 Jan 2022 14:03:59 +0000 (15:03 +0100)]
[nvptx] Add bar.warp.sync
On a GT 1030 (sm_61), with driver version 470.94 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O2 execution test
...
which minimizes to the same test-case as listed in commit "[nvptx] Update
default ptx isa to 6.3".
The first divergent branch looks like:
...
{
.reg .u32 %x;
mov.u32 %x,%tid.x;
setp.ne.u32 %r59,%x,0;
}
@ %r59 bra $L15;
mov.u64 %r48,%ar0;
mov.u32 %r22,2;
ld.u64 %r53,[%r48];
mov.u32 %r55,%r22;
mov.u32 %r54,1;
$L15:
...
and when inspecting the generated SASS, the branch is not set up as a divergent
branch, but instead as a regular branch.
This causes us to execute a shfl.sync insn in divergent mode, which is likely
to cause trouble given a remark in the ptx isa version 6.3, which mentions
that for .target sm_6x or below, all threads must execute the same
shfl.sync instruction in convergence.
Fix this by placing a "bar.warp.sync 0xffffffff" at the desired convergence
point (in the example above, after $L15).
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (nvptx_single): Use nvptx_warpsync.
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_WARPSYNC.
(define_insn "nvptx_warpsync"): New define_insn.
Peter Bergner [Thu, 1 Sep 2022 02:14:36 +0000 (21:14 -0500)]
rs6000: Don't ICE when we disassemble an MMA variable [PR101322]
When we expand an MMA disassemble built-in with C++ using a pointer that
is cast to a valid MMA type, the type isn't passed down to the expand
machinery and we end up using the base type of the pointer which leads to
an ICE. This patch enforces we always use the correct MMA type regardless
of the pointer type being used.
2022-08-31 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/101322
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_mma_builtin):
Enforce the use of a valid MMA pointer type.
gcc/testsuite/
PR target/101322
* g++.target/powerpc/pr101322.C: New test.
Kewen Lin [Wed, 7 Sep 2022 01:37:57 +0000 (20:37 -0500)]
rs6000/test: Fix empty TU in some cases of effective targets [PR106345]
As the failure of test case gcc.target/powerpc/pr92398.p9-.c in
PR106345 shows, some test sources for some powerpc effective
targets wrongly use an empty translation unit. The test sources
may be compiled with options like "-ansi -pedantic-errors", in
which case those effective target checks fail unexpectedly with
error messages like:
error: ISO C forbids an empty translation unit [-Wpedantic]
This patch fixes the empty TUs by adding one dummy function
definition to each.
PR testsuite/106345
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_has_arch_pwr5): Add
a function definition to avoid pedwarn about empty translation unit.
(check_effective_target_has_arch_pwr6): Likewise.
(check_effective_target_has_arch_pwr7): Likewise.
(check_effective_target_has_arch_pwr8): Likewise.
(check_effective_target_has_arch_pwr9): Likewise.
(check_effective_target_has_arch_ppc64): Likewise.
(check_effective_target_ppc_float128): Likewise.
(check_effective_target_ppc_float128_insns): Likewise.
(check_effective_target_powerpc_vsx): Likewise.
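A minimal sketch of the kind of test source the fix leads to, assuming the dummy definition is the only content needed. The function name `dummy` is an assumption; the actual sources in target-supports.exp may use a different name or keep additional preprocessor checks.

```c
/* ISO C requires a translation unit to contain at least one external
   declaration; this dummy function keeps the unit non-empty so that
   -pedantic-errors has nothing to complain about.  */
int
dummy (void)
{
  return 0;
}
```

Any external declaration would satisfy the requirement; a function definition is just a compact and self-explanatory choice.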