Thomas Schwinge [Fri, 18 Nov 2022 22:57:52 +0000 (23:57 +0100)]
nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel'
A recent nvptx-tools change: commit 886a95faf66bf66a82fc0fe7d2a9fd9e9fec2820
"ld: Don't search for input files in '-L'directories" (of
<https://github.com/MentorEmbedded/nvptx-tools/pull/38>
"Match standard 'ld' "search" behavior") in GCC/nvptx target testing
generally causes linking to fail with:
error opening crt0.o
collect2: error: ld returned 1 exit status
compiler exited with status 1
Indeed, per GCC's '-v' output, there is an undecorated 'crt0.o' on the linker
('collect2') command line:
..., and the fix, as used by numerous other GCC targets, is to instead use
'crt0.o%s', since '%s' means, per 'gcc/gcc.cc', "The Specs Language":
%s current argument is the name of a library or startup file of some sort.
Search for that file in a standard list of directories
and substitute the full name found.
With that, we get the expected path to 'crt0.o'.
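As a rough illustration of what '%s' does, the following C sketch searches an ordered list of directories for a startup file and substitutes the first full path found, falling back to the bare name otherwise. It is not GCC's implementation: find_startfile, the file_exists_fn callback, demo_exists and the '/usr/lib/nvptx' path are all hypothetical, and the callback stands in for a real file-system check so the search logic can be exercised on its own.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical predicate: true if PATH names an existing, readable file.  */
typedef bool (*file_exists_fn) (const char *path);

/* Search DIRS in order for NAME.  On success, write "DIR/NAME" to OUT and
   return true.  Otherwise write the bare NAME to OUT and return false.  */
static bool
find_startfile (const char *name, const char *const *dirs, size_t ndirs,
                file_exists_fn exists, char *out, size_t outlen)
{
  for (size_t i = 0; i < ndirs; i++)
    {
      int n = snprintf (out, outlen, "%s/%s", dirs[i], name);
      if (n < 0 || (size_t) n >= outlen)
        continue;               /* Path truncated; try the next directory.  */
      if (exists (out))
        return true;
    }
  snprintf (out, outlen, "%s", name);
  return false;
}

/* Example predicate for illustration: pretend only this one file exists.  */
static bool
demo_exists (const char *path)
{
  return strcmp (path, "/usr/lib/nvptx/crt0.o") == 0;
}
```

With the fallback, an unresolved name still reaches the linker undecorated, which is exactly the kind of bare 'crt0.o' the linker then fails to open.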
gcc/
* config/nvptx/nvptx.h (STARTFILE_SPEC): Fix 'crt0.o' for
'-mmainkernel'.
Philipp Tomsich [Mon, 7 Nov 2022 13:22:21 +0000 (14:22 +0100)]
aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU
This patch adds support for Ampere-1A CPU:
- recognizes the name of the core and provides detection for -mcpu=native,
- updates extra_costs,
- adds a new fusion pair for (A+B+1 and A-B-1).
Ampere-1A and Ampere-1 have more timing differences than the extra
costs indicate, but these don't propagate through to the headline
items in our extra costs (e.g. the change in latency for scalar sqrt
doesn't have a corresponding table entry).
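The new fusion pair pairs an add/subtract of two registers with a following +1/-1 on its result. The C sketch below shows the shape of such an idiom matcher on a deliberately simplified, hypothetical instruction model (struct insn, enum op and fusion_pair_addsub_2reg_const1 are illustrative names, not GCC's RTL or the actual aarch_macro_fusion_pair_p code). It assumes the pair only needs a data dependency through the first instruction's destination.

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction model; not GCC's RTL.  */
enum op { OP_ADD_REG, OP_SUB_REG, OP_ADD_IMM, OP_SUB_IMM, OP_OTHER };

struct insn
{
  enum op op;
  int dst;    /* destination register number */
  int src1;   /* first source register */
  int src2;   /* second source register (register forms only) */
  long imm;   /* immediate operand (immediate forms only) */
};

/* True if PREV/CURR form "A+B, then +1" or "A-B, then -1": PREV combines
   two registers and CURR adds/subtracts the immediate 1 to PREV's result.  */
static bool
fusion_pair_addsub_2reg_const1 (const struct insn *prev,
                                const struct insn *curr)
{
  if (curr->src1 != prev->dst || curr->imm != 1)
    return false;
  if (prev->op == OP_ADD_REG && curr->op == OP_ADD_IMM)
    return true;   /* A + B + 1 */
  if (prev->op == OP_SUB_REG && curr->op == OP_SUB_IMM)
    return true;   /* A - B - 1 */
  return false;
}
```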
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
* config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
registers and then +1/-1).
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Implement
idiom-matcher for the new fusion pair.
* doc/invoke.texi: Add ampere1a.
H.J. Lu [Wed, 19 Oct 2022 19:53:35 +0000 (12:53 -0700)]
Always use TYPE_MODE instead of DECL_MODE for vector field
Commit e034c5c8957 ("re PR target/78643 (ICE in convert_move, at expr.c:230)")
fixed the case where DECL_MODE of a vector field is BLKmode and its
TYPE_MODE is a vector mode because of the target attribute. Remove the
BLKmode check so that we also handle the case where DECL_MODE of a vector
field is a vector mode and its TYPE_MODE isn't a vector mode because of the
target attribute.
gcc/
PR target/107304
* expr.c (get_inner_reference): Always use TYPE_MODE for vector
field with vector raw mode.
gcc/testsuite/
PR target/107304
* gcc.target/i386/pr107304.c: New test.
Eric Botcazou [Fri, 4 Nov 2022 10:23:12 +0000 (11:23 +0100)]
Fix recent thinko in operand_equal_p
There is a thinko in a recent improvement made to operand_equal_p where
the code just looks at operand 2 of COMPONENT_REF, if it is present, to
compare addresses. That's wrong because operand 2 contains the number of
DECL_OFFSET_ALIGN-bit-sized words so, when DECL_OFFSET_ALIGN > 8, not all
the bytes are included and some of them are in DECL_FIELD_BIT_OFFSET, see
get_inner_reference for the model computation.
In other words, you would need to compare operand 2 and DECL_OFFSET_ALIGN
and DECL_FIELD_BIT_OFFSET in this situation, but I'm not sure this is worth
the hassle in practice so the fix just removes this alternate handling.
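To make the arithmetic concrete, here is a simplified C model of the position computation the text points to in get_inner_reference: the field's bit position is operand 2 times DECL_OFFSET_ALIGN plus DECL_FIELD_BIT_OFFSET. The struct field_pos and field_bit_position are hypothetical stand-ins for the tree accessors. Two fields with identical operand 2 can still start at different addresses, which is why comparing operand 2 alone is not enough.

```c
#include <stdint.h>

/* Simplified model of one COMPONENT_REF field position (hypothetical;
   GCC keeps these values in trees).  */
struct field_pos
{
  uint64_t op2;              /* operand 2: count of DECL_OFFSET_ALIGN-bit words */
  uint64_t offset_align;     /* DECL_OFFSET_ALIGN, in bits */
  uint64_t field_bit_offset; /* DECL_FIELD_BIT_OFFSET, in bits */
};

/* Bit position of the field from the start of the containing object.  */
static uint64_t
field_bit_position (struct field_pos f)
{
  return f.op2 * f.offset_align + f.field_bit_offset;
}
```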
gcc/
* fold-const.c (operand_compare::operand_equal_p) <COMPONENT_REF>:
Do not take into account operand 2.
(operand_compare::hash_operand) <COMPONENT_REF>: Likewise.
gcc/testsuite/
* gnat.dg/opt99.adb: New test.
* gnat.dg/opt99_pkg1.ads, gnat.dg/opt99_pkg1.adb: New helper.
* gnat.dg/opt99_pkg2.ads: Likewise.
Jakub Jelinek [Mon, 24 Oct 2022 15:53:16 +0000 (17:53 +0200)]
c, c++: Fix up excess precision handling of scalar_to_vector conversion [PR107358]
As mentioned earlier in the C++ excess precision support mail, the following
testcase is broken with excess precision both in C and C++ (though only in C++
was it triggered in real-world code).
scalar_to_vector is called in both FEs after the excess precision promotions
(or stripping of EXCESS_PRECISION_EXPR), so we can then get invalid
diagnostics saying that float vector + float involves truncation (on ia32,
from long double to float).
The following patch fixes that by calling scalar_to_vector on the operands
before the excess precision promotions, letting scalar_to_vector only do the
diagnostics (it does e.g. fold_for_warn, so it will fold
EXCESS_PRECISION_EXPR around REAL_CST to constants etc.), while the actual
conversions are then done on the excess precision promoted operands.
So, if we have vector double + (float + float), we don't actually do
  vector double + (float) ((long double) float + (long double) float)
but
  vector double + (double) ((long double) float + (long double) float)
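Why the target type of that final conversion matters can be seen with plain scalars. In this sketch (hypothetical function names), the long double argument stands in for an excess-precision float + float; narrowing through float first discards bits that a direct conversion to double keeps. It only illustrates the scalar conversion step, not the vector operation itself.

```c
/* FLOAT_SUM stands in for a float + float evaluated in long double.  */
static double
convert_via_float (long double float_sum)
{
  return (double) (float) float_sum;  /* rounds to float precision first */
}

static double
convert_direct (long double float_sum)
{
  return (double) float_sum;          /* keeps the extra bits */
}
```

For example, with a = 1.0f and b = 0x1p-30f, the exact sum 1 + 2^-30 survives convert_direct but becomes 1.0 through convert_via_float, since float carries only 24 significand bits.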
2022-10-24 Jakub Jelinek <jakub@redhat.com>
PR c++/107358
gcc/c/
* c-typeck.c (build_binary_op): Pass operands before excess precision
promotions to scalar_to_vector call.
gcc/testsuite/
* c-c++-common/pr107358.c: New test.
Jakub Jelinek [Mon, 24 Oct 2022 14:25:29 +0000 (16:25 +0200)]
c++: Fix up constexpr handling of char/signed char/short pre/post inc/decrement [PR105774]
signed char, char or short int pre/post inc/decrement are represented by
normal {PRE,POST}_{INC,DEC}REMENT_EXPRs in the FE, and only gimplification
ensures that the {PLUS,MINUS}_EXPR is done in the unsigned version of those
types:
    case PREINCREMENT_EXPR:
    case PREDECREMENT_EXPR:
    case POSTINCREMENT_EXPR:
    case POSTDECREMENT_EXPR:
      {
        tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 0));
        if (INTEGRAL_TYPE_P (type) && c_promoting_integer_type_p (type))
          {
            if (!TYPE_OVERFLOW_WRAPS (type))
              type = unsigned_type_for (type);
            return gimplify_self_mod_expr (expr_p, pre_p, post_p, 1, type);
          }
        break;
      }
This means during constant evaluation we need to do it similarly (either
using unsigned_type_for or using widening to integer_type_node).
The following patch does the latter.
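A minimal C sketch of the widening strategy (the function names are hypothetical; this is not the cxx_eval_increment_expression code): the addition or subtraction is done in int, so it cannot overflow, and only the final conversion back to signed char wraps. That conversion is implementation-defined in ISO C; GCC reduces the value modulo 2^8.

```c
/* ++c on a signed char, evaluated by widening to int.  */
static signed char
preincrement_schar (signed char c)
{
  int wide = (int) c + 1;   /* PLUS_EXPR in int: no signed overflow */
  return (signed char) wide;
}

/* --c on a signed char, evaluated by widening to int.  */
static signed char
predecrement_schar (signed char c)
{
  int wide = (int) c - 1;   /* MINUS_EXPR in int */
  return (signed char) wide;
}
```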
2022-10-24 Jakub Jelinek <jakub@redhat.com>
PR c++/105774
* constexpr.c (cxx_eval_increment_expression): For signed types
that promote to int, evaluate PLUS_EXPR or MINUS_EXPR in int type.
Jakub Jelinek [Wed, 12 Oct 2022 15:54:08 +0000 (17:54 +0200)]
libgomp: Fix up creation of artificial teams
When not in an explicit parallel/target/teams construct, we in some cases
create an artificial parallel with a single thread (either to handle target
nowait or for task reduction purposes). In those cases, the code already
handled an artificially created implicit task (created by gomp_new_icv for
cases where we needed to write to some ICVs), but, as the testcases show, it
didn't take into account the possibility of this being done from explicit
task(s). The code would destroy/free the previous task and replace it with
the new implicit task. If the task is an explicit task (when teams is NULL,
all explicit tasks behave like if (0)), it is a pointer to a local stack
variable, so freeing it doesn't work; additionally, we shouldn't lose the
explicit tasks: the new implicit task should instead replace the ancestor
task that is the first implicit one.
2022-10-12 Jakub Jelinek <jakub@redhat.com>
* task.c (gomp_create_artificial_team): Fix up handling of invocations
from within explicit task.
* target.c (GOMP_target_ext): Likewise.
* testsuite/libgomp.c/task-7.c: New test.
* testsuite/libgomp.c/task-8.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.
Jakub Jelinek [Sat, 24 Sep 2022 07:24:26 +0000 (09:24 +0200)]
openmp: Fix ICE with taskgroup at -O0 -fexceptions [PR107001]
The following testcase ICEs because with -O0 -fexceptions the
GOMP_taskgroup_end call isn't directly followed by a GOMP_RETURN statement;
instead there are some conditionals to handle exceptions, and we fail to
find the correct GOMP_RETURN.
The fix is to treat taskgroup similarly to target data. Both of these
constructs emit a try { body } finally { end_call } around the construct's
body during gimplification. We need to see proper construct nesting during
gimplification and omp lowering (including the nesting of regions checks),
but during omp expansion we don't really need their nesting anymore: all we
need is to emit something at the start of the region, and the end of the
region is the end API call we've already emitted during gimplification.
For target data, we weren't adding a GOMP_RETURN statement during omp
lowering, so after that pass it is treated merely like a stand-alone omp
directive. This patch does the same for taskgroup too.
2022-09-24 Jakub Jelinek <jakub@redhat.com>
PR c/107001
* omp-low.c (lower_omp_taskgroup): Don't add GOMP_RETURN statement
at the end.
* omp-expand.c (build_omp_regions_1): Clarify GF_OMP_TARGET_KIND_DATA
is not stand-alone directive. For GIMPLE_OMP_TASKGROUP, also don't
update parent.
(omp_make_gimple_edges) <case GIMPLE_OMP_TASKGROUP>: Reset
cur_region back after new_omp_region.
Jakub Jelinek [Sat, 24 Sep 2022 07:19:26 +0000 (09:19 +0200)]
openmp, c: Tighten up c_tree_equal [PR106981]
This patch changes c_tree_equal to work more like cp_tree_equal and be
stricter in what it accepts. The ICE on the first testcase was
due to the INTEGER_CST wi::wide (t1) == wi::wide (t2) comparison, which
ICEs if the two constants have different precision, but, as the second
testcase shows, being too lenient in it can also lead to miscompilation
of valid OpenMP programs where we think a certain expression is the same
even when it isn't and can be guaranteed at runtime to represent a different
memory location. So, the patch looks through only NON_LVALUE_EXPRs,
and for constants as well as casts requires that the types match before
actually comparing the constant values or recursing on the cast operands.
2022-09-24 Jakub Jelinek <jakub@redhat.com>
PR c/106981
gcc/c/
* c-typeck.c (c_tree_equal): Only strip NON_LVALUE_EXPRs at the
start. For CONSTANT_CLASS_P or CASE_CONVERT: return false if t1 and
t2 have different types.
gcc/testsuite/
* c-c++-common/gomp/pr106981.c: New test.
libgomp/
* testsuite/libgomp.c-c++-common/pr106981.c: New test.
Jakub Jelinek [Wed, 24 Aug 2022 07:57:09 +0000 (09:57 +0200)]
i386: Fix up mode iterators that weren't expanded [PR106721]
Currently, when the md file reader sees <something> and something is a valid
mode (or code) attribute that doesn't include a case for the current mode
(or code), it just leaves the <something> untouched.
I went through all cases matching <[a-zA-Z] in tmp-mddump.md after make mddump.
One of the cases was related to the V*HF mode additions and there was one typo.
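The audit described above amounts to scanning the md dump for the regex <[a-zA-Z]. A minimal C sketch of that scan follows (has_unexpanded_attr is a hypothetical helper, not part of GCC); as with the manual audit, a match is only a candidate and still needs human review, since legitimate text can contain such sequences.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if TEXT contains '<' immediately followed by an ASCII letter,
   i.e. a match for <[a-zA-Z], which is how a leftover, unexpanded
   <attr> shows up in an md dump.  */
static bool
has_unexpanded_attr (const char *text)
{
  for (const char *p = text; *p != '\0'; p++)
    if (p[0] == '<' && isalpha ((unsigned char) p[1]))
      return true;   /* p[1] is '\0' at the end, which is not alphabetic */
  return false;
}
```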
From what I can see, this has been voted in as a DR and as it means
we warn less often than before in -std={gnu,c}++2{0,3} modes or with
-Wvolatile, I wonder if it shouldn't be backported to affected release
branches as well.
2022-08-16 Jakub Jelinek <jakub@redhat.com>
* typeck.c (cp_build_modify_expr): Implement
P2327R1 - De-deprecating volatile compound operations. Don't warn
for |=, &= or ^= with volatile lhs.
* expr.c (mark_use) <case MODIFY_EXPR>: Adjust warning wording,
leave out simple.
* g++.dg/cpp2a/volatile1.C: Adjust for de-deprecation of volatile
compound |=, &= and ^= operations.
* g++.dg/cpp2a/volatile3.C: Likewise.
* g++.dg/cpp2a/volatile5.C: Likewise.
Jakub Jelinek [Wed, 27 Jul 2022 10:06:22 +0000 (12:06 +0200)]
cgraphunit: Don't emit asm thunks for -dx [PR106261]
When the -dx option is used (I didn't know we had it and have no idea what it
is useful for), we just expand functions to RTL and then omit all further
RTL passes, so the normal functions aren't actually emitted into assembly,
just variables.
The following testcase ICEs because we don't emit the methods, but do
emit thunks pointing to them, and those thunks have unwind info and rely on
at least some real functions being emitted (which is normally the case, since
thunks are only emitted for locally defined functions); otherwise there are
no CIEs, only FDEs, and dwarf2out is upset about it.
The following patch fixes that by not emitting assembly thunks for -dx
either.
Jakub Jelinek [Fri, 1 Jul 2022 09:17:41 +0000 (11:17 +0200)]
wide-int: Fix up wi::shifted_mask [PR106144]
As the following self-test testcase shows, wi::shifted_mask sometimes
doesn't create canonicalized wide_ints, which then fail to compare equal
to canonicalized wide_ints with the same value.
In particular, wi::mask (128, false, 128) gives { -1 } with len 1 and prec 128,
while wi::shifted_mask (0, 128, false, 128) gives { -1, -1 } with len 2
and prec 128.
The problem is that the code is written with the assumption that there are
3 bit blocks (or 2 if start is 0), but doesn't consider the possibility
where there are 2 bit blocks (or 1 if start is 0) where the highest block
isn't present. In that case, there is the optional block of negate ? 0 : -1
elts, followed by just one elt (either one from the if (shift) or just
negate ? -1 : 0) and the rest is implicit sign-extension.
Only if end < prec are there one or more bits above it with a different
value, so that we need to emit all the elts up to end and then one more elt.
if (end == prec) would work too, because we have:
  if (width > prec - start)
    width = prec - start;
  unsigned int end = start + width;
so end <= prec is guaranteed; I don't know which is preferred.
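The canonical-form rule that the buggy result violated can be modelled in a few lines of C. This is a simplified sketch (canonical_len is a hypothetical name, not wi::canonize, and it ignores precisions that are not a multiple of 64): the value is stored as 64-bit blocks, least significant first, everything above the top block is its implicit sign extension, and a top block equal to the sign extension of the block below it is redundant. Under this rule { -1, -1 } and { -1 } denote the same 128-bit all-ones value, but only len 1 is canonical.

```c
#include <stdint.h>

/* Return the canonical number of blocks for VAL[0..LEN-1]: drop top
   blocks that merely repeat the sign of the block below them.  */
static unsigned
canonical_len (const int64_t *val, unsigned len)
{
  while (len > 1)
    {
      int64_t sign_of_below = val[len - 2] < 0 ? -1 : 0;
      if (val[len - 1] != sign_of_below)
        break;
      len--;
    }
  return len;
}
```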
2022-07-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106144
* wide-int.cc (wi::shifted_mask): If end >= prec, return right after
emitting element for shift or if shift is 0 first element after start.
(wide_int_cc_tests): Add tests for equivalency of wi::mask and
wi::shifted_mask with 0 start.
Jakub Jelinek [Tue, 21 Jun 2022 09:40:16 +0000 (11:40 +0200)]
ifcvt: Don't introduce trapping or faulting reads in noce_try_sign_mask [PR106032]
noce_try_sign_mask as documented will optimize
  if (c < 0)
    x = t;
  else
    x = 0;
into x = (c >> bitsm1) & t;
The optimization is done if either t is unconditional
(e.g. for
  x = t;
  if (c >= 0)
    x = 0;
) or if it is cheap. We already check that t doesn't have side-effects,
but if t is conditional, we need to punt also if it may trap or fault,
as we make it unconditional.
I've briefly skimmed other noce_try* optimizations and didn't find one that
would suffer from the same problem.
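Both shapes of the transformation can be written out in C. In this sketch (hypothetical function names), c >> (bits - 1) yields all ones for negative c and zero otherwise, so the AND selects either t or 0. GCC implements >> on a negative int as an arithmetic shift; ISO C leaves that implementation-defined.

```c
#include <limits.h>

/* The branchy form noce_try_sign_mask targets.  */
static int
select_branchy (int c, int t)
{
  int x;
  if (c < 0)
    x = t;
  else
    x = 0;
  return x;
}

/* The branchless form: the sign mask of C ANDed with T.  */
static int
select_sign_mask (int c, int t)
{
  const int bitsm1 = (int) (sizeof (int) * CHAR_BIT) - 1;
  return (c >> bitsm1) & t;
}
```

Here t is an already-computed value, so the rewrite is harmless. The PR's concern is a t such as a load *p that the original code only performed when c < 0: the branchless form evaluates t unconditionally, so if p may be invalid when c >= 0 the transformed code can fault, hence the new punt when t is not unconditional and may trap or fault.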
2022-06-21 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/106032
* ifcvt.c (noce_try_sign_mask): Punt if !t_unconditional, and
t may_trap_or_fault_p, even if it is cheap.
Jakub Jelinek [Tue, 21 Jun 2022 09:38:59 +0000 (11:38 +0200)]
expand: Fix up expand_cond_expr_using_cmove [PR106030]
If expand_cond_expr_using_cmove can't find a cmove optab for a particular
mode, it tries to promote the mode and perform the cmove in the promoted
mode.
The testcase in the patch ICEs on arm because in that case we pass temp which
has the promoted mode (SImode) as target to expand_operands where the
operands have the non-promoted mode (QImode).
Later on the function uses paradoxical subregs:
  if (GET_MODE (op1) != mode)
    op1 = gen_lowpart (mode, op1);
  if (GET_MODE (op2) != mode)
    op2 = gen_lowpart (mode, op2);
to change the operand modes.
The following patch fixes it by passing NULL_RTX as target if it has
promoted mode.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106030
* expr.c (expand_cond_expr_using_cmove): Pass NULL_RTX instead of
temp to expand_operands if mode has been promoted.
Jakub Jelinek [Tue, 21 Jun 2022 15:51:08 +0000 (17:51 +0200)]
libgomp: Fix up target-31.c test [PR106045]
The i variable is used inside of the parallel in:
  #pragma omp simd safelen(32) private (v)
  for (i = 0; i < 64; i++)
    {
      v = 3 * i;
      ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i];
    }
where i is predetermined linear (so while inside the body it is a safe,
per-SIMD-lane private variable, its final value is written to the shared
variable), and in:
  for (i = 0; i < 64; i++)
    if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i)
      #pragma omp atomic write
      err = 1;
which is a normal loop and so it isn't in any way privatized there.
So we have a data race, fixed by adding private (i) clause to the
parallel.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
Paul Iannetta <piannetta@kalrayinc.com>
Fortran: Add missing TKR initialization to class variables [PR100097, PR100098]
gcc/fortran/ChangeLog:
PR fortran/100097
PR fortran/100098
* trans-array.c (gfc_trans_class_array): New function to
initialize class descriptor's TKR information.
* trans-array.h (gfc_trans_class_array): Add function prototype.
* trans-decl.c (gfc_trans_deferred_vars): Add calls to the new
function for both pointers and allocatables.
gcc/testsuite/ChangeLog:
PR fortran/100097
PR fortran/100098
* gfortran.dg/PR100097.f90: New test.
* gfortran.dg/PR100098.f90: New test.
In commit 081c96621da, the call to resize_reg_info() was moved before
the call to remove_scratches(), but the latter can increase the
number of regs, which would cause an out-of-bounds access to the
reg_renumber global array.
Without this patch, the following testcase randomly fails with:
during RTL pass: ira
In file included from /src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c:13:
/src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c: In function 'checkgSf13':
/src/gcc/testsuite/gcc.dg/compat/fp-struct-test-by-value-y.h:28:1: internal compiler error: Segmentation fault
/src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c:22:1: note: in expansion of macro 'TEST'
Philipp Tomsich [Sun, 7 Aug 2022 22:30:52 +0000 (00:30 +0200)]
aarch64: update Ampere-1 core definition
This brings the extensions detected by -mcpu=native on Ampere-1 systems
in sync with the defaults generated for -mcpu=ampere1.
Note that some early kernel versions on Ampere1 may misreport the
presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth'
and 'nopredres').
Philipp Tomsich [Mon, 3 Oct 2022 19:59:50 +0000 (21:59 +0200)]
aarch64: fix off-by-one in reading cpuinfo
Fixes: 341573406b39
Don't subtract one from the result of strnlen() when trying to point
to the first character after the current string. This issue would
cause individual characters (where the 128-byte buffers are stitched
together) to be lost.
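The off-by-one is easy to see in a small stitching routine. In this C sketch, append_chunk is a hypothetical helper, not the code from the aarch64 driver: the first free byte after a NUL-terminated string in BUF is BUF + strnlen (BUF, CAP), and subtracting one from that would overwrite the last character already stored, losing one character at every seam.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Append CHUNK to the NUL-terminated string in BUF (capacity CAP),
   truncating if necessary.  */
static void
append_chunk (char *buf, size_t cap, const char *chunk)
{
  size_t used = strnlen (buf, cap);
  if (used >= cap)
    return;                      /* no room (or BUF not terminated) */
  char *end = buf + used;        /* first byte after the string; no "- 1" */
  size_t room = cap - used;      /* at least 1, reserved for the NUL */
  size_t n = strnlen (chunk, room - 1);
  memcpy (end, chunk, n);
  end[n] = '\0';
}
```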
IBM zSystems: Fix function_ok_for_sibcall [PR106355]
For a parameter with BLKmode we cannot use REG_NREGS in order to
determine the number of consecutive registers. Streamlined this with
the implementation of s390_function_arg.
Fix some indentation whitespace, too.
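For a BLKmode parameter, the count of consecutive registers has to come from the argument's size in bytes rather than from REG_NREGS. A common way to derive it is to round the size up to whole registers; the sketch below assumes 8-byte GPRs (as with 64-bit s390) and uses the hypothetical names REG_BYTES and regs_for_arg. It illustrates the arithmetic only and is not the code of s390_function_arg.

```c
#include <stddef.h>

/* Assumed register width in bytes (8 for 64-bit GPRs).  */
enum { REG_BYTES = 8 };

/* Number of consecutive registers occupied by a SIZE-byte argument.  */
static size_t
regs_for_arg (size_t size)
{
  return (size + REG_BYTES - 1) / REG_BYTES;
}
```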
gcc/ChangeLog:
PR target/106355
* config/s390/s390.c (s390_call_saved_register_used): For a
parameter with BLKmode fix determining number of consecutive
registers.
gcc/testsuite/ChangeLog:
* gcc.target/s390/pr106355.h: Common code for new tests.
* gcc.target/s390/pr106355-1.c: New test.
* gcc.target/s390/pr106355-2.c: New test.
* gcc.target/s390/pr106355-3.c: New test.
Martin Liska [Mon, 24 Oct 2022 13:34:39 +0000 (15:34 +0200)]
x86: fix VENDOR_MAX enum value
PR target/107364
gcc/ChangeLog:
* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
Reorder enum values as BUILTIN_VENDOR_MAX should not point
in the middle of the valid enum values.
Specifically, the "class T::foo" bit. There, class_decl_loc_t::add gets
a TYPENAME_TYPE as TYPE, rather than a class/union type, so checking
TYPE_BEING_DEFINED will crash. I think it's OK to allow a TYPENAME_TYPE to
slip into that function; we just shouldn't consider the 'class' tag redundant
(which works as a 'typename'). In fact, every other compiler *requires* it.
Bit of a brown-paper-bag bug, but: GCC was generating
non-existent merging forms of BRKAS and BRKBS. Those
instructions only support zero predication (although
BRKA and BRKB support both).
https://github.com/ARM-software/acle/pull/199 adds a new feature
macro for RCPC, for use in things like inline assembly. This patch
adds the associated support to GCC.
Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a
entry didn't include it. This was probably harmless in practice
since GCC simply ignored the extension until now. (The GAS
definition is OK.)
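As an example of the inline-assembly use the feature macro enables, the following sketch selects LDAPR (load-acquire RCpc) when __ARM_FEATURE_RCPC is defined and otherwise falls back to GCC's __atomic_load_n builtin. The function name load_acquire_u64 is hypothetical; the sketch only tests whether the macro is defined and does not rely on any particular value of it.

```c
#include <stdint.h>

/* Acquire load of *P, using LDAPR when the compiler advertises RCPC.  */
static inline uint64_t
load_acquire_u64 (const uint64_t *p)
{
#if defined (__aarch64__) && defined (__ARM_FEATURE_RCPC)
  uint64_t v;
  __asm__ volatile ("ldapr %0, [%1]" : "=r" (v) : "r" (p) : "memory");
  return v;
#else
  return __atomic_load_n (p, __ATOMIC_ACQUIRE);
#endif
}
```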
gcc/
* config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH8_3): Add
AARCH64_FL_RCPC.
(AARCH64_ISA_RCPC): New macro.
* config/aarch64/aarch64-cores.def (thunderx3t110, zeus, neoverse-v1)
(neoverse-512tvb, saphira): Remove RCPC from these Armv8.3-A+ cores.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_RCPC when appropriate.
Kewen Lin [Mon, 26 Sep 2022 05:33:18 +0000 (00:33 -0500)]
rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]
As PR96072 shows, the code adding the REG_CFA_DEF_CFA reg note
assumes that we have previously emitted an insn which
restores the frame pointer. That part of the code
was guarded with the flag frame_pointer_needed, which used to be
consistent with that insn, but since commit r10-7981 the latter has
been guarded with frame_pointer_needed_indeed instead. This mismatch
caused an ICE due to an unexpected NULL insn.
PR target/96072
gcc/ChangeLog:
* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): Update the
condition for adding REG_CFA_DEF_CFA reg note with
frame_pointer_needed_indeed.
Pat Haugen [Mon, 17 Oct 2022 20:11:42 +0000 (15:11 -0500)]
Fix register count when not splitting Complex IEEE 128-bit args.
For ABI_V4, we do not split complex args. This created a problem because
even though an arg would be passed in two VSX regs, we were only advancing the
function arg counter by one VSX register. Fixed with this patch.
Richard Biener [Wed, 14 Sep 2022 07:00:35 +0000 (09:00 +0200)]
tree-optimization/106934 - avoid BIT_FIELD_REF of bitfields
The following avoids creating BIT_FIELD_REF of bitfields in
update-address-taken. The patch doesn't implement punning to
a full-precision integer type but leaves a comment to that
effect.
Richard Biener [Fri, 9 Sep 2022 10:06:38 +0000 (12:06 +0200)]
tree-optimization/106892 - avoid invalid pointer association in predcom
When predictive commoning builds a reference for iteration N it
prematurely associates a constant offset into the MEM_REF offset
operand which can be invalid if the base pointer then points
outside of an object which alias-analysis does not consider valid.
PR tree-optimization/106892
* tree-predcom.c (ref_at_iteration): Do not associate the
constant part of the offset into the MEM_REF offset
operand, across a non-zero offset.
Richard Biener [Mon, 25 Jul 2022 15:24:57 +0000 (17:24 +0200)]
tree-optimization/106189 - avoid division by zero exception
The diagnostic code can end up with zero-sized array elements
with T[][0], and the code nicely avoids exceptions when dividing by zero
in one codepath, which uses wide-int, but not in another. The following
fixes the exception by using wide-int in both paths.
PR tree-optimization/106189
* gimple-array-bounds.cc (array_bounds_checker::check_mem_ref):
Divide using offset_ints.
Eric Botcazou [Fri, 14 Oct 2022 09:52:04 +0000 (11:52 +0200)]
Fix PR target/107248
This is the infamous PR rtl-optimization/38644 rearing its ugly head for
leaf functions on SPARC more than a decade later... Richard E.'s generic
solution has never been implemented so let's do as other RISC back-ends did.
gcc/
PR target/107248
* config/sparc/sparc.c (sparc_expand_prologue): Emit a frame
blockage for leaf functions.
(sparc_flat_expand_prologue): Emit frame instead of full blockage.
(sparc_expand_epilogue): Emit a frame blockage for leaf functions.
(sparc_flat_expand_epilogue): Emit frame instead of full blockage.
Richard Biener [Mon, 8 Aug 2022 07:07:23 +0000 (09:07 +0200)]
lto/106540 - fix LTO tree input wrt dwarf2out_register_external_die
I've revisited the earlier two workarounds for dwarf2out_register_external_die
getting duplicate entries. It turns out that r11-525-g03d90a20a1afcb
added dref_queue pruning to lto_input_tree but decl reading uses that
to stream in DECL_INITIAL even when in the middle of SCC streaming.
When that SCC then gets thrown away we can end up with debug nodes
registered which isn't supposed to happen. The following adjusts
the DECL_INITIAL streaming to go the in-SCC way, using lto_input_tree_1,
since no SCCs are expected at this point, just refs.
PR lto/106540
PR lto/106334
* lto-streamer-in.c (lto_read_tree_1): Use lto_input_tree_1
to input DECL_INITIAL, avoiding committing drefs.
Richard Biener [Tue, 19 Jul 2022 07:57:22 +0000 (09:57 +0200)]
middle-end/106331 - fix mem attributes for string op arguments
get_memory_rtx tries hard to come up with a MEM_EXPR to record
in the memory attributes but in the last fallback fails to properly
account for an unknown offset, and thus, as visible in this testcase,
set_mem_attributes computes an incorrect alignment. The following
rectifies both parts.
PR middle-end/106331
* builtins.c (get_memory_rtx): Compute alignment from
the original address and set MEM_OFFSET to unknown when
we create a MEM_EXPR from the base object of the address.
Richard Biener [Thu, 30 Jun 2022 08:33:40 +0000 (10:33 +0200)]
tree-optimization/106131 - wrong code with FRE rewriting
The following makes sure to not use the original TBAA type for
looking up a value across an aggregate copy when we had to offset
the read.
2022-06-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/106131
* tree-ssa-sccvn.c (vn_reference_lookup_3): Force alias-set
zero when offsetting the read looking through an aggregate
copy.
Richard Biener [Mon, 20 Jun 2022 11:40:50 +0000 (13:40 +0200)]
middle-end/106027 - fix types in needle folding
The fold_to_nonsharp_ineq_using_bound folding ends up creating invalid
typed IL which confuses later foldings. The following fixes that.
2022-06-20 Richard Biener <rguenther@suse.de>
PR middle-end/106027
* fold-const.c (fold_to_nonsharp_ineq_using_bound): Use the
type of the prevailing comparison for the new comparison type.
(fold_binary_loc): Use proper types for the A < X && A + 1 > Y
to A < X && A >= Y folding.
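The PR concerns the types used in the folded IL, not the arithmetic, but the folding itself is easy to sanity-check for same-typed int operands. In this sketch (hypothetical function names), A + 1 is only evaluated when A < X, so A < INT_MAX and the addition cannot overflow; for integers, A + 1 > Y is then equivalent to A >= Y.

```c
#include <limits.h>
#include <stdbool.h>

/* Original condition.  */
static bool
cond_original (int a, int x, int y)
{
  return a < x && a + 1 > y;
}

/* Folded condition.  */
static bool
cond_folded (int a, int x, int y)
{
  return a < x && a >= y;
}
```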
Mikael Morin [Sat, 3 Sep 2022 09:58:47 +0000 (11:58 +0200)]
fortran: Move clobbers after evaluation of all arguments [PR106817]
For actual arguments whose dummy is INTENT(OUT), we used to generate
clobbers on them at the same time we generated the argument reference
for the function call. This was wrong if, for an argument coming
later, the value expression depended on the value of the just-clobbered
argument: we passed an undefined value in that case.
With this change, clobbers are collected separately and appended
to the procedure call preliminary code after all the arguments have been
evaluated.
PR fortran/106817
gcc/fortran/ChangeLog:
* trans-expr.c (gfc_conv_procedure_call): Collect all clobbers
to their own separate block. Append the block of clobbers to
the procedure preliminary block after the argument evaluation
codes for all the arguments.
Mikael Morin [Mon, 29 Aug 2022 09:19:29 +0000 (11:19 +0200)]
fortran: Fix invalid function decl clobber ICE [PR105012]
For a function without a declared result symbol, the Fortran frontend uses
the function symbol itself as the result symbol. This caused an invalid
clobber of a function decl to be emitted, leading to an ICE, whereas the
intended behaviour was to clobber the function result variable. This change
fixes the problem by getting the decl from the just-retrieved variable
reference after the call to gfc_conv_expr_reference, instead of copying it
from the frontend symbol.
PR fortran/105012
gcc/fortran/ChangeLog:
* trans-expr.c (gfc_conv_procedure_call): Retrieve variable
from the just calculated variable reference.
Mikael Morin [Wed, 31 Aug 2022 09:00:45 +0000 (11:00 +0200)]
fortran: Move the clobber generation code
This change inlines the clobber generation code from
gfc_conv_expr_reference to the single caller from where the add_clobber
flag can be true, and removes the add_clobber argument.
What motivates this is the standard making the procedure call a cause
for a variable to become undefined, which translates to a clobber
generation, so clobber generation should be closely related to procedure
call generation, whereas it is rather orthogonal to variable reference
generation. Thus the generation of the clobber feels more appropriate
in gfc_conv_procedure_call than in gfc_conv_expr_reference.
Behaviour remains unchanged.
gcc/fortran/ChangeLog:
* trans.h (gfc_conv_expr_reference): Remove add_clobber
argument.
* trans-expr.c (gfc_conv_expr_reference): Ditto. Inline code
depending on add_clobber and conditions controlling it ...
(gfc_conv_procedure_call): ... to here.
Fortran: Fix ICE and wrong code for assumed-rank arrays [PR100029, PR100040]
gcc/fortran/ChangeLog:
PR fortran/100040
PR fortran/100029
* trans-expr.c (gfc_conv_class_to_class): Add code to have
assumed-rank arrays recognized as full arrays and fix the type
of the array assignment.
(gfc_conv_procedure_call): Change order of code blocks such that
the free of ALLOCATABLE dummy arguments with INTENT(OUT) occurs
first.
gcc/testsuite/ChangeLog:
PR fortran/100029
* gfortran.dg/PR100029.f90: New test.
PR fortran/100040
* gfortran.dg/PR100040.f90: New test.