Patrick Palka [Thu, 28 Oct 2021 14:05:14 +0000 (10:05 -0400)]
c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]
In the testcase below the two left fold expressions each expand into a
constant logical expression with 1024 terms, for which potential_const_expr
takes more than a minute to return true. This happens because p_c_e_1
performs trial evaluation of the first operand of a &&/|| in order to
determine whether to consider the potentiality of the second operand.
And because the expanded expression is left-associated, this trial
evaluation causes p_c_e_1 to be quadratic in the number of terms of the
expression.
This patch fixes this quadratic behavior by making p_c_e_1 preemptively
compute potentiality of the second operand of a &&/||, and perform trial
evaluation of the first operand only if the second operand isn't
potentially constant. We must be careful to avoid emitting bogus
diagnostics during the preemptive computation; to that end, we perform
this shortcut only when tf_error is cleared, and when tf_error is set we
now first check potentiality of the whole expression quietly and replay
the check noisily for diagnostics.
Apart from fixing the quadraticness for left-associated logical exprs,
this change also reduces compile time for the libstdc++ testcase
20_util/variant/87619.cc by about 15% even though our <variant> uses
right folds instead of left folds. Likewise for the testcase in the PR,
for which compile time is reduced by 30%. The reason for these speedups
is that p_c_e_1 no longer performs expensive trial evaluation of each term
of large constant logical expressions when determining their potentiality.
PR c++/102780
PR c++/108138
gcc/cp/ChangeLog:
* constexpr.c (potential_constant_expression_1) <case TRUTH_*_EXPR>:
When tf_error isn't set, preemptively check potentiality of the
second operand before performing trial evaluation of the first
operand.
(potential_constant_expression_1): When tf_error is set, first check
potentiality quietly and return true if successful, otherwise
proceed noisily to give errors.
Sebastian Pop [Wed, 30 Nov 2022 19:45:24 +0000 (19:45 +0000)]
AArch64: Add UNSPECV_PATCHABLE_AREA [PR98776]
Currently patchable area is at the wrong place on AArch64. It is placed
immediately after function label, before .cfi_startproc. This patch
adds UNSPECV_PATCHABLE_AREA for pseudo patchable area instruction and
modifies aarch64_print_patchable_function_entry to avoid placing
patchable area before .cfi_startproc.
Iain Buclaw [Sat, 10 Dec 2022 18:12:43 +0000 (19:12 +0100)]
d: Fix internal compiler error: in visit, at d/imports.cc:72 (PR108050)
The visitor for lowering IMPORTED_DECLs did not have an override for
dealing with importing OverloadSet symbols. This has now been
implemented in the code generator.
PR d/108050
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (Import *)): Handle build_import_decl
returning a TREE_LIST.
* imports.cc (ImportVisitor::visit (OverloadSet *)): New override.
liuhongt [Mon, 28 Nov 2022 01:59:47 +0000 (09:59 +0800)]
Fix unrecognizable insn due to illegal immediate_operand (const_int 255) of QImode.
For __builtin_ia32_vec_set_v16qi (a, -1, 2) with
!flag_signed_char. it's transformed to
__builtin_ia32_vec_set_v16qi (_4, 255, 2) in the gimple,
and expanded to (const_int 255) in the rtl. But for immediate_operand,
it expects (const_int 255) to be signed extended to
(const_int -1). The mismatch caused an unrecognizable insn error.
The patch converts (const_int 255) to (const_int -1) in the backend
expander.
Iain Buclaw [Fri, 11 Nov 2022 23:54:47 +0000 (00:54 +0100)]
d: Fix ICE on named continue label in an unrolled loop [PR107592]
Continue labels in an unrolled loop require a unique label per
iteration. Previously this used the Statement body node for each
unrolled iteration to generate a new entry in the label hash table.
This does not work when the continue label has an identifier, as said
named label is pointing to the outer UnrolledLoopStatement node.
What would happen is that during the lowering of `continue label', an
automatic label associated with the unrolled loop would be generated,
and a jump to that label inserted, but because it was never pushed by
the visitor for the loop itself, it subsequently never gets emitted.
To fix, correctly use the UnrolledLoopStatement as the key to look up
and store the break/continue label pair, but remove the continue label
from the value entry after every loop to force a new label to be
generated by the next call to `push_continue_label'
PR d/107592
gcc/d/ChangeLog:
* toir.cc (IRVisitor::push_unrolled_continue_label): New method.
(IRVisitor::pop_unrolled_continue_label): New method.
(IRVisitor::visit (UnrolledLoopStatement *)): Use them instead of
push_continue_label and pop_continue_label.
While most PA 2.0 instructions support both 32 and 64-bit traps
and conditions, the addi and subi instructions only support 32-bit
traps and conditions. Thus, we need to force immediate operands
to register operands on the 64-bit target and use the add/sub
instructions which can trap on 64-bit signed overflow.
2022-11-30 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (addvdi3): Force operand 2 to a register.
Remove "addi,tsv,*" instruction from unamed pattern.
(subvdi3): Force operand 1 to a register.
Remove "subi,tsv" instruction from from unamed pattern.
Thomas Schwinge [Fri, 18 Nov 2022 22:57:52 +0000 (23:57 +0100)]
nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel'
A recent nvptx-tools change: commit 886a95faf66bf66a82fc0fe7d2a9fd9e9fec2820
"ld: Don't search for input files in '-L'directories" (of
<https://github.com/MentorEmbedded/nvptx-tools/pull/38>
"Match standard 'ld' "search" behavior") in GCC/nvptx target testing
generally causes linking to fail with:
error opening crt0.o
collect2: error: ld returned 1 exit status
compiler exited with status 1
Indeed per GCC '-v' output, there is an undecorated 'crt0.o' on the linker
('collect2') command line:
..., and the fix, as used by numerous other GCC targets, is to instead use
'crt0.o%s'; for '%s' means, per 'gcc/gcc.cc', "The Specs Language":
%s current argument is the name of a library or startup file of some sort.
Search for that file in a standard list of directories
and substitute the full name found.
With that, we get the expected path to 'crt0.o'.
gcc/
* config/nvptx/nvptx.h (STARTFILE_SPEC): Fix 'crt0.o' for
'-mmainkernel'.
Richard Biener [Mon, 14 Nov 2022 16:19:20 +0000 (17:19 +0100)]
tree-optimization/107485 - fix non-call exception ICE with inlining
Inlining performs a wrong non-call exception fixup for VEC_COND_EXPRs
which on the branch fail to properly have the condition split out in
the first place.
PR tree-optimization/107485
* tree-inline.c (remap_gimple_stmt): Use correct type for
split out condition of [VEC_]COND_EXPRs.
H.J. Lu [Wed, 19 Oct 2022 19:53:35 +0000 (12:53 -0700)]
Always use TYPE_MODE instead of DECL_MODE for vector field
e034c5c8957 re PR target/78643 (ICE in convert_move, at expr.c:230)
fixed the case where DECL_MODE of a vector field is BLKmode and its
TYPE_MODE is a vector mode because of target attribute. Remove the
BLKmode check for the case where DECL_MODE of a vector field is a vector
mode and its TYPE_MODE isn't a vector mode because of target attribute.
gcc/
PR target/107304
* expr.c (get_inner_reference): Always use TYPE_MODE for vector
field with vector raw mode.
gcc/testsuite/
PR target/107304
* gcc.target/i386/pr107304.c: New test.
IBM zSystems: Fix function_ok_for_sibcall [PR106355]
For a parameter with BLKmode we cannot use REG_NREGS in order to
determine the number of consecutive registers. Streamlined this with
the implementation of s390_function_arg.
Fix some indentation whitespace, too.
gcc/ChangeLog:
PR target/106355
* config/s390/s390.c (s390_call_saved_register_used): For a
parameter with BLKmode fix determining number of consecutive
registers.
gcc/testsuite/ChangeLog:
* gcc.target/s390/pr106355.h: Common code for new tests.
* gcc.target/s390/pr106355-1.c: New test.
* gcc.target/s390/pr106355-2.c: New test.
* gcc.target/s390/pr106355-3.c: New test.
Martin Liska [Mon, 24 Oct 2022 13:34:39 +0000 (15:34 +0200)]
x86: fix VENDOR_MAX enum value
PR target/107364
gcc/ChangeLog:
* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
Reorder enum values as BUILTIN_VENDOR_MAX should not point
in the middle of the valid enum values.
Bit of a brown-paper-bag bug, but: GCC was generating
non-existent merging forms of BRKAS and BRKBS. Those
instructions only support zero predication (although
BRKA and BRKB support both).
https://github.com/ARM-software/acle/pull/199 adds a new feature
macro for RCPC, for use in things like inline assembly. This patch
adds the associated support to GCC.
Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a
entry didn't include it. This was probably harmless in practice
since GCC simply ignored the extension until now. (The GAS
definition is OK.)
gcc/
* config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH8_3): Add
AARCH64_FL_RCPC.
(AARCH64_ISA_RCPC): New macro.
* config/aarch64/aarch64-cores.def (thunderx3t110, zeus, neoverse-v1)
(neoverse-512tvb, saphira): Remove RCPC from these Armv8.3-A+ cores.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_RCPC when appropriate.
Kewen Lin [Mon, 26 Sep 2022 05:33:18 +0000 (00:33 -0500)]
rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]
As PR96072 shows, the code adding REG_CFA_DEF_CFA reg note
makes one assumption that we have emitted one insn which
restores the frame pointer previously. That part of code
was guarded with flag frame_pointer_needed before, it was
consistent, but it was replaced with flag
frame_pointer_needed_indeed since commit r10-7981. It
caused ICE due to unexpected NULL insn.
PR target/96072
gcc/ChangeLog:
* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): Update the
condition for adding REG_CFA_DEF_CFA reg note with
frame_pointer_needed_indeed.
Pat Haugen [Mon, 17 Oct 2022 19:53:11 +0000 (14:53 -0500)]
Fix register count when not splitting Complex IEEE 128-bit args.
For ABI_V4, we do not split complex args. This created a problem because
even though an arg would be passed in two VSX regs, we were only advancing the
function arg counter by one VSX register. Fixed with this patch.
Richard Biener [Mon, 8 Aug 2022 07:07:23 +0000 (09:07 +0200)]
lto/106540 - fix LTO tree input wrt dwarf2out_register_external_die
I've revisited the earlier two workarounds for dwarf2out_register_external_die
getting duplicate entries. It turns out that r11-525-g03d90a20a1afcb
added dref_queue pruning to lto_input_tree but decl reading uses that
to stream in DECL_INITIAL even when in the middle of SCC streaming.
When that SCC then gets thrown away we can end up with debug nodes
registered which isn't supposed to happen. The following adjusts
the DECL_INITIAL streaming to go the in-SCC way, using lto_input_tree_1,
since no SCCs are expected at this point, just refs.
PR lto/106540
PR lto/106334
* lto-streamer-in.c (lto_read_tree_1): Use lto_input_tree_1
to input DECL_INITIAL, avoiding to commit drefs.
Richard Biener [Tue, 19 Jul 2022 07:57:22 +0000 (09:57 +0200)]
middle-end/106331 - fix mem attributes for string op arguments
get_memory_rtx tries hard to come up with a MEM_EXPR to record
in the memory attributes but in the last fallback fails to properly
account for an unknown offset and thus, as visible in this testcase,
incorrect alignment computed from set_mem_attributes. The following
rectifies both parts.
PR middle-end/106331
* builtins.c (get_memory_rtx): Compute alignment from
the original address and set MEM_OFFSET to unknown when
we create a MEM_EXPR from the base object of the address.
Richard Biener [Thu, 30 Jun 2022 08:33:40 +0000 (10:33 +0200)]
tree-optimization/106131 - wrong code with FRE rewriting
The following makes sure to not use the original TBAA type for
looking up a value across an aggregate copy when we had to offset
the read.
2022-06-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/106131
* tree-ssa-sccvn.c (vn_reference_lookup_3): Force alias-set
zero when offsetting the read looking through an aggregate
copy.
Eric Botcazou [Fri, 14 Oct 2022 09:52:04 +0000 (11:52 +0200)]
Fix PR target/107248
This is the infamous PR rtl-optimization/38644 rearing its ugly head for
leaf functions on SPARC more than a decade later... Richard E.'s generic
solution has never been implemented so let's do as other RISC back-ends did.
gcc/
PR target/107248
* config/sparc/sparc.c (sparc_expand_prologue): Emit a frame
blockage for leaf functions.
(sparc_flat_expand_prologue): Emit frame instead of full blockage.
(sparc_expand_epilogue): Emit a frame blockage for leaf functions.
(sparc_flat_expand_epilogue): Emit frame instead of full blockage.
Mikael Morin [Sat, 3 Sep 2022 09:58:47 +0000 (11:58 +0200)]
fortran: Move clobbers after evaluation of all arguments [PR106817]
For actual arguments whose dummy is INTENT(OUT), we used to generate
clobbers on them at the same time we generated the argument reference
for the function call. This was wrong if for an argument coming
later, the value expression was depending on the value of the just-
clobbered argument, and we passed an undefined value in that case.
With this change, clobbers are collected separatedly and appended
to the procedure call preliminary code after all the arguments have been
evaluated.
PR fortran/106817
gcc/fortran/ChangeLog:
* trans-expr.c (gfc_conv_procedure_call): Collect all clobbers
to their own separate block. Append the block of clobbers to
the procedure preliminary block after the argument evaluation
codes for all the arguments.
Mikael Morin [Mon, 29 Aug 2022 09:19:29 +0000 (11:19 +0200)]
fortran: Fix invalid function decl clobber ICE [PR105012]
The fortran frontend, as result symbol for a function without
declared result symbol, uses the function symbol itself. This caused
an invalid clobber of a function decl to be emitted, leading to an
ICE, whereas the intended behaviour was to clobber the function result
variable. This change fixes the problem by getting the decl from the
just-retrieved variable reference after the call to
gfc_conv_expr_reference, instead of copying it from the frontend symbol.
PR fortran/105012
gcc/fortran/ChangeLog:
* trans-expr.c (gfc_conv_procedure_call): Retrieve variable
from the just calculated variable reference.
Mikael Morin [Wed, 31 Aug 2022 09:00:45 +0000 (11:00 +0200)]
fortran: Move the clobber generation code
This change inlines the clobber generation code from
gfc_conv_expr_reference to the single caller from where the add_clobber
flag can be true, and removes the add_clobber argument.
What motivates this is the standard making the procedure call a cause
for a variable to become undefined, which translates to a clobber
generation, so clobber generation should be closely related to procedure
call generation, whereas it is rather orthogonal to variable reference
generation. Thus the generation of the clobber feels more appropriate
in gfc_conv_procedure_call than in gfc_conv_expr_reference.
Behaviour remains unchanged.
gcc/fortran/ChangeLog:
* trans.h (gfc_conv_expr_reference): Remove add_clobber
argument.
* trans-expr.c (gfc_conv_expr_reference): Ditto. Inline code
depending on add_clobber and conditions controlling it ...
(gfc_conv_procedure_call): ... to here.