Xi Ruoyao [Wed, 20 Mar 2024 07:09:21 +0000 (15:09 +0800)]
mips: Fix C23 (...) functions returning large aggregates [PR114175]
We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument. This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
fail.
Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
gcc/ChangeLog:
PR target/114175
* config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
Xi Ruoyao [Mon, 18 Mar 2024 09:18:34 +0000 (17:18 +0800)]
LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]
We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
arguments and there is nothing to advance, but that is not the case
for (...) functions returning by hidden reference which have one such
artificial argument. This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.
Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
gcc/ChangeLog:
PR target/114175
* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Only skip
loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
Jakub Jelinek [Thu, 28 Mar 2024 14:00:44 +0000 (15:00 +0100)]
profile-count: Avoid overflows into uninitialized [PR112303]
The testcase in the patch ICEs with
--- gcc/tree-scalar-evolution.cc
+++ gcc/tree-scalar-evolution.cc
@@ -3881,7 +3881,7 @@ final_value_replacement_loop (class loop *loop)
/* Propagate constants immediately, but leave an unused initialization
around to avoid invalidating the SCEV cache. */
- if (CONSTANT_CLASS_P (def) && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt))
+ if (0 && CONSTANT_CLASS_P (def) && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt))
replace_uses_by (rslt, def);
/* Create the replacement statements. */
(the addition of the above made the ICE latent), because profile_count
addition doesn't check for overflows and if unlucky, we can even overflow
into the uninitialized value.
Getting really huge profile counts is very easy even when not using
recursive inlining in loops, e.g.
__attribute__((noipa)) void
bar (void)
{
__builtin_exit (0);
}
__attribute__((noipa)) void
foo (void)
{
for (int i = 0; i < 1000; ++i)
for (int j = 0; j < 1000; ++j)
for (int k = 0; k < 1000; ++k)
for (int l = 0; l < 1000; ++l)
for (int m = 0; m < 1000; ++m)
for (int n = 0; n < 1000; ++n)
for (int o = 0; o < 1000; ++o)
for (int p = 0; p < 1000; ++p)
for (int q = 0; q < 1000; ++q)
for (int r = 0; r < 1000; ++r)
for (int s = 0; s < 1000; ++s)
for (int t = 0; t < 1000; ++t)
for (int u = 0; u < 1000; ++u)
for (int v = 0; v < 1000; ++v)
for (int w = 0; w < 1000; ++w)
for (int x = 0; x < 1000; ++x)
for (int y = 0; y < 1000; ++y)
for (int z = 0; z < 1000; ++z)
for (int a = 0; a < 1000; ++a)
for (int b = 0; b < 1000; ++b)
bar ();
}
int
main ()
{
foo ();
}
reaches the maximum count already on the 11th loop.
Some other methods of profile_count like apply_scale already
do use MIN (val, max_count) before assignment to m_val, this patch
just extends that to operator{+,+=} methods.
Furthermore, one overload of apply_probability wasn't using
safe_scale_64bit and so could very easily overflow as well
- prob is required to be [0, 10000] and if m_val is near the max_count,
it can overflow even with multiplications by 8.
2024-03-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112303
* profile-count.h (profile_count::operator+): Perform
addition in uint64_t variable and set m_val to MIN of that
val and max_count.
(profile_count::operator+=): Likewise.
(profile_count::operator-=): Formatting fix.
(profile_count::apply_probability): Use safe_scale_64bit
even in the int overload.
Jakub Jelinek [Tue, 26 Mar 2024 10:21:38 +0000 (11:21 +0100)]
fold-const: Punt on MULT_EXPR in extract_muldiv MIN/MAX_EXPR case [PR111151]
As I've tried to explain in the comments, the extract_muldiv_1
MIN/MAX_EXPR optimization is wrong for code == MULT_EXPR.
If the multiplication is done in unsigned type or in signed
type with -fwrapv, it is fairly obvious that max (a, b) * c
in many cases isn't equivalent to max (a * c, b * c) (or min if c is
negative) due to overflows, but even for signed with undefined overflow,
the optimization could turn something without UB in it (where
say a * c invokes UB, but max (or min) picks the other operand where
b * c doesn't).
As for division/modulo, I think it is in most cases safe, except if
the problematic INT_MIN / -1 case could be triggered, but we can
just punt for MAX_EXPR because for MIN_EXPR if one operand is INT_MIN,
we'd pick that operand already. It is just for completeness, match.pd
already has an optimization which turns x / -1 into -x, so the division
by zero is mostly theoretical. That is also why in the testcase the
i case isn't actually miscompiled without the patch, while the c and f
cases are.
2024-03-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/111151
* fold-const.cc (extract_muldiv_1) <case MAX_EXPR>: Punt for
MULT_EXPR altogether, or for MAX_EXPR if c is -1.
Jakub Jelinek [Tue, 26 Mar 2024 10:06:15 +0000 (11:06 +0100)]
tsan: Don't instrument non-generic AS accesses [PR111736]
Similar to the asan and ubsan changes, we shouldn't instrument non-generic
address space accesses with tsan, because we just have library functions
which take address of the objects as generic address space pointers, so they
can't handle anything else.
2024-03-26 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/111736
* tsan.cc (instrument_expr): Punt on non-generic address space
accesses.
Jakub Jelinek [Sat, 23 Mar 2024 10:17:44 +0000 (11:17 +0100)]
predcom: Punt for steps which aren't multiples of access size [PR111683]
On the following testcases, there is no overlap between data references
within a single iteration, but the data references have size which is twice
as large as the step, which means the data references overlap with the next
iteration which predcom doesn't take into account.
As discussed in the PR, even if the reference size is smaller than step,
if step isn't a multiple of the reference size, there could be overlaps with
some other iteration later on.
The initial version of the patch regressed (test still passed, but predcom
didn't optimize anymore) pr71083.c which has a packed char, short structure
and was reading/writing the short 2 bytes in there with step 3.
The following patch deals with that by retrying for COMPONENT_REFs also the
aggregate sizes etc., so that it then compares 3 bytes against step 3.
In make check-gcc/check-g++ this patch I believe affects code generation
for only the 2 new testcases according to statistics I've gathered.
2024-03-23 Jakub Jelinek <jakub@redhat.com>
PR middle-end/111683
* tree-predcom.cc (pcom_worker::suitable_component_p): If has_write
and comp_step is RS_NONZERO, return false if any reference in the
component doesn't have DR_STEP a multiple of access size.
* gcc.dg/pr111683-1.c: New test.
* gcc.dg/pr111683-2.c: New test.
On x86 and avr some address spaces allow 0 pointers (on avr actually
even generic as, but libsanitizer isn't ported to it and
I'm not convinced we should completely kill -fsanitize=null in that
case).
The following patch makes sure those aren't diagnosed for -fsanitize=null,
though they are still sanitized for -fsanitize=alignment.
2024-03-22 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/111736
* ubsan.cc (ubsan_expand_null_ifn, instrument_mem_ref): Avoid
SANITIZE_NULL instrumentation for non-generic address spaces
for which targetm.addr_space.zero_address_valid (as) is true.
Jakub Jelinek [Wed, 20 Mar 2024 16:00:51 +0000 (17:00 +0100)]
visium: Fix up visium_setup_incoming_varargs [PR114175]
Like for x86-64, alpha or rs6000, visium seems to be affected too.
Just visually checked differences in c23-stdarg-9.c assembly in a cross
without/with the patch, committed to trunk.
2024-03-20 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/visium/visium.cc (visium_setup_incoming_varargs): Only skip
TARGET_FUNCTION_ARG_ADVANCE for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
Jakub Jelinek [Wed, 20 Mar 2024 16:00:08 +0000 (17:00 +0100)]
nios2: Fix up nios2_setup_incoming_varargs [PR114175]
Like for x86-64, alpha or rs6000, nios2 seems to be affected too.
Just visually checked differences in c23-stdarg-9.c assembly in a cross
without/with the patch, committed to trunk.
2024-03-20 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/nios2/nios2.cc (nios2_setup_incoming_varargs): Only skip
nios2_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
Jakub Jelinek [Wed, 20 Mar 2024 15:59:56 +0000 (16:59 +0100)]
nds32: Fix up nds32_setup_incoming_varargs [PR114175]
Like for x86-64, alpha or rs6000, nds32 seems to be affected too.
Just visually checked differences in c23-stdarg-9.c assembly in a cross
without/with the patch, committed to trunk.
2024-03-20 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/nds32/nds32.cc (nds32_setup_incoming_varargs): Only skip
function arg advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
Jakub Jelinek [Wed, 20 Mar 2024 15:59:43 +0000 (16:59 +0100)]
m32r: Fix up m32r_setup_incoming_varargs [PR114175]
Like for x86-64, alpha or rs6000, m32r seems to be affected too.
Just visually checked differences in c23-stdarg-9.c assembly in a cross
without/with the patch, committed to trunk.
2024-03-20 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/m32r/m32r.cc (m32r_setup_incoming_varargs): Only skip
function arg advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
Jakub Jelinek [Wed, 20 Mar 2024 15:59:32 +0000 (16:59 +0100)]
ft32: Fix up ft32_setup_incoming_varargs [PR114175]
Like for x86-64, alpha or rs6000, ft32 seems to be affected too.
Just visually checked differences in c23-stdarg-9.c assembly in a cross
without/with the patch, committed to trunk.
2024-03-20 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/ft32/ft32.cc (ft32_setup_incoming_varargs): Only skip
function arg advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
Jakub Jelinek [Wed, 20 Mar 2024 15:59:21 +0000 (16:59 +0100)]
epiphany: Fix up epiphany_setup_incoming_varargs [PR114175]
Like for x86-64, alpha or rs6000, epiphany seems to be affected too.
Just visually checked differences in c23-stdarg-9.c assembly in a cross
without/with the patch, committed to trunk.
2024-03-20 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/epiphany/epiphany.cc (epiphany_setup_incoming_varargs): Only
skip function arg advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
Jakub Jelinek [Wed, 20 Mar 2024 15:59:08 +0000 (16:59 +0100)]
csky: Fix up csky_setup_incoming_varargs [PR114175]
Like for x86-64, alpha or rs6000, csky seems to be affected too.
Just visually checked differences in c23-stdarg-9.c assembly in a cross
without/with the patch, committed to trunk.
2024-03-20 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/csky/csky.cc (csky_setup_incoming_varargs): Only skip
csky_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
Jakub Jelinek [Wed, 20 Mar 2024 09:34:51 +0000 (10:34 +0100)]
system.h: rename vec_step to workaround powerpc/clang bug [PR114369]
On Sat, Jul 20, 2019 at 05:26:57PM +0100, Richard Sandiford wrote:
> Gerald Pfeifer <gerald@pfeifer.com> writes:
> > I have seen an increasing number of reports of GCC failing to
> > build with clang on powerpc (on FreeBSD, though that's probably
> > immaterial).
> >
> > Turns out that clang has vec_step as a reserved word on powerpc
> > with AltiVec.
> >
> > We OTOH use vec_step s as a variable name in gcc/tree-vect-loop.c.
> >
> >
> > The best approach I can see is to rename vec_step. Before I prepare
> > a patch: what alternate name/spelling would you prefer?
>
> Would it work to #define vec_step to vec_step_ or something on affected
> hosts, say in system.h?
>
> I'd prefer that to renmaing since "vec_step" does seem the most natural
> name for the variable. The equivalent scalar variable is "step" and
> other vector values in the surrounding code also use the "vec_" prefix.
So like this?
If/when clang finally fixes https://github.com/llvm/llvm-project/issues/85579
on their side, we can then limit it to clang versions which still have the
bug.
I've git grepped for vec_set and appart from altivec.h it is just used in
tree-vect-loop.cc, some Ada files which aren't preprocessed, ChangeLogs,
rs6000-vecdefines.h (but that header is only included from altivec.h and
vec_step is then redefined to the function-like macro) and in rs6000-overload.def
but that file is processed with a generator, not included in C/C++ sources.
2024-03-20 Jakub Jelinek <jakub@redhat.com>
PR bootstrap/114369
* system.h (vec_step): Define to vec_step_ when compiling
with clang on PowerPC.
Jakub Jelinek [Tue, 19 Mar 2024 08:49:59 +0000 (09:49 +0100)]
arc: Fix up arc_setup_incoming_varargs [PR114175]
Like for x86-64, alpha or rs6000, arc seems to be affected too.
2024-03-19 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/arc/arc.cc (arc_setup_incoming_varargs): Only skip
arc_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
Like in the r14-9503 change on x86-64, I think Alpha also needs to
function_arg_advance after the hidden return pointer argument if
any.
At least, the following patch changes the assembly of s1-s6 functions
on the https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647956.html
c23-stdarg-9.c testcase, and eyeballing the assembly for int f8 (...)
the ... args are passed in 16..21 registers and then on the stack,
while for struct S s8 (...) have hidden return pointer passed in 16
register and ... args in 17..21 registers and then on the stack, and
seems without this patch the incoming varargs setup does the wrong thing
(but I can't test on alpha easily).
Many targets seem to be unaffected, e.g. aarch64, arm, s390*, so I'm not
trying to change all targets together because such a change clearly isn't
needed e.g. for targets which use special register for the hidden return
pointer.
2024-03-19 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/alpha/alpha.cc (alpha_setup_incoming_varargs): Only skip
function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
Jakub Jelinek [Tue, 19 Mar 2024 08:13:32 +0000 (09:13 +0100)]
rs6000: Fix up setup_incoming_varargs [PR114175]
The c23-stdarg-8.c test (as well as the new test below added to cover even
more cases) FAIL on powerpc64le-linux and presumably other powerpc* targets
as well.
Like in the r14-9503-g218d174961 change on x86-64 we need to advance
next_cum after the hidden return pointer argument even in case where
there are no user arguments before ... in C23.
The following patch does that.
There is another TYPE_NO_NAMED_ARGS_STDARG_P use later on:
if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
&& targetm.calls.must_pass_in_stack (arg))
first_reg_offset += rs6000_arg_size (TYPE_MODE (arg.type), arg.type);
but I believe it was added there in r13-3549-g4fe34cdc unnecessarily,
when there is no hidden return pointer argument, arg.type is NULL and
must_pass_in_stack_var_size as well as must_pass_in_stack_var_size_or_pad
return false in that case, and for the TYPE_NO_NAMED_ARGS_STDARG_P
case with hidden return pointer argument that argument should have pointer
type and it is the first argument, so must_pass_in_stack shouldn't be true
for it either.
2024-03-19 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/rs6000/rs6000-call.cc (setup_incoming_varargs): Only skip
rs6000_function_arg_advance_1 for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
Jakub Jelinek [Sat, 16 Mar 2024 14:16:33 +0000 (15:16 +0100)]
i386: Fix setup of incoming varargs for (...) functions which return large aggregates [PR114175]
The c23-stdarg-6.c testcase I've added recently apparently works fine with
-O0 but aborts with -O1 and higher on x86_64-linux.
The problem is in setup of incoming varargs.
Like function.cc before r14-9249 even ix86_setup_incoming_varargs assumes
that TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and there
is nothing to advance, but that is not the case for (...) functions
returning by hidden reference which have one such artificial argument.
If the setup_incoming_varargs hook is called from the
if (TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (fndecl))
&& fnargs.is_empty ())
{
struct assign_parm_data_one data = {};
assign_parms_setup_varargs (&all, &data, false);
}
spot, i.e. where there is no hidden return argument passed, arg.type
is always NULL, while when it is called in the
if (cfun->stdarg && !DECL_CHAIN (parm))
assign_parms_setup_varargs (&all, &data, false);
spot, even when it is TYPE_NO_NAMED_ARGS_STDARG_P arg.type will be non-NULL.
The tree-stdarg.cc pass in f in c23-stdarg-6.cc at -O1 or higher determines
that va_arg is used on integral types at most twice (loads 2 words),
and because ix86_setup_incoming_varargs doesn't advance, the code saves
just the %rdi and %rsi registers to the save area. But that isn't correct,
it should save %rsi and %rdx because %rdi is the hidden return argument.
With -O0 tree-stdarg.cc doesn't attempt to optimize and we save all the
registers, so it works fine in that case.
Now, I think we'll need the same fix also on
aarch64, alpha, arc, csky, ia64, loongarch, mips, mmix, nios2, riscv, visium
which have pretty much the similarly looking snippet in their hooks
changed by the r13-3549 commit.
Then arm, epiphany, fr30, frv, ft32, m32r, mcore, nds32, rs6000, sh
have different changes but most likely need something similar too.
I don't have access to most of those, could test aarch64 and rs6000 I guess.
2024-03-16 Jakub Jelinek <jakub@redhat.com>
PR target/114175
* config/i386/i386.cc (ix86_setup_incoming_varargs): Only skip
ix86_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.
* gcc.dg/c23-stdarg-7.c: New test.
* gcc.dg/c23-stdarg-8.c: New test.
Harald Anlauf [Tue, 13 Feb 2024 19:19:10 +0000 (20:19 +0100)]
Fortran: fix passing of optional dummies to bind(c) procedures [PR113866]
PR fortran/113866
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_conv_procedure_call): When passing an optional
dummy argument to an optional dummy argument of a bind(c) procedure
and the dummy argument is passed via a CFI descriptor, no special
presence check and passing of a default NULL pointer is needed.
Paul Thomas [Tue, 23 May 2023 05:46:37 +0000 (06:46 +0100)]
Fortran: Fix assumed length chars and len inquiry [PR103716]
2023-05-23 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/103716
* resolve.cc (gfc_resolve_ref): Conversion of array_ref into an
element should be done for all characters without a len expr,
not just deferred lens, and for integer expressions.
* trans-expr.cc (conv_inquiry): For len and kind inquiry refs,
set the se string_length to NULL_TREE.
gcc/testsuite/
PR fortran/103716
* gfortran.dg/pr103716.f90 : New test.
asan: Handle poly-int sizes in ASAN_MARK [PR97696]
This patch makes the expansion of IFN_ASAN_MARK let through
poly-int-sized objects. The expansion itself was already generic
enough, but the tests for the fast path were too strict.
gcc/
PR sanitizer/97696
* asan.cc (asan_expand_mark_ifn): Allow the length to be a poly_int.
gcc/testsuite/
PR sanitizer/97696
* gcc.target/aarch64/sve/pr97696.c: New test.
Richard Biener [Fri, 4 Aug 2023 09:24:49 +0000 (11:24 +0200)]
tree-optimization/110838 - less aggressively fold out-of-bound shifts
The following adjusts the shift simplification patterns to avoid
touching out-of-bound shift value arithmetic right shifts of
possibly negative values. While simplifying those to zero isn't
wrong it's violating the principle of least surprise.
PR tree-optimization/110838
* match.pd (([rl]shift @0 out-of-bounds) -> zero): Restrict
the arithmetic right-shift case to non-negative operands.
Richard Biener [Thu, 27 Jul 2023 11:08:32 +0000 (13:08 +0200)]
tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C
The following fixes the lack of simplification of a vector shift
by an out-of-bounds shift value. For scalars this is done both
by CCP and VRP but vectors are not handled there. This results
in PR91838 differences in outcome dependent on whether a vector
shift ISA is available and thus vector lowering does or does not
expose scalar shifts here.
The following adds a match.pd pattern to catch uniform out-of-bound
shifts, simplifying them to zero when not sanitizing shift amounts.
PR tree-optimization/91838
* gimple-match-head.cc: Include attribs.h and asan.h.
* generic-match-head.cc: Likewise.
* match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.
Joseph Myers [Wed, 31 Jan 2024 21:39:53 +0000 (21:39 +0000)]
c: Fix ICE for nested enum redefinitions with/without fixed underlying type [PR112571]
Bug 112571 reports an ICE-on-invalid for cases where an enum is
defined, without a fixed underlying type, inside the enum type
specifier for a definition of that same enum with a fixed underlying
type.
The ultimate cause is attempting to access ENUM_UNDERLYING_TYPE in a
case where it is NULL. Avoid this by clearing
ENUM_FIXED_UNDERLYING_TYPE_P in thie case of inconsistent definitions.
Bootstrapped wth no regressions for x86_64-pc-linux-gnu.
(Note: for this GCC 13 branch backport, the tests were changed to use
-std=c2x not -std=c23, and c23-enum-9.c was changed to expect
different diagnostics because GCC 13 branch doesn't have the C23 tag
compatibility support for redefinitions of tagged types and
enumerators.)
PR c/112571
gcc/c/
* c-decl.cc (start_enum): Clear ENUM_FIXED_UNDERLYING_TYPE_P when
defining without a fixed underlying type an enumeration previously
declared with a fixed underlying type.
gcc/testsuite/
* gcc.dg/c23-enum-9.c, gcc.dg/c23-enum-10.c: New tests.
Richard Biener [Tue, 5 Mar 2024 09:55:56 +0000 (10:55 +0100)]
tree-optimization/114231 - use patterns for BB SLP discovery root stmts
The following makes sure to use recognized patterns when vectorizing
roots during BB SLP discovery. We need to apply those late since
during root discovery we've not yet done pattern recognition.
All parts of the vectorizer assume patterns get used, for the testcase
we mix this up when doing live lane computation.
PR tree-optimization/114231
* tree-vect-slp.cc (vect_analyze_slp): Lookup patterns when
processing a BB SLP root.
Richard Biener [Wed, 13 Dec 2023 13:23:31 +0000 (14:23 +0100)]
tree-optimization/112793 - SLP of constant/external code-generated twice
The following makes the attempt at code-generating a constant/external
SLP node twice well-formed as that can happen when partitioning BB
vectorization attempts where we keep constants/externals unpartitioned.
PR tree-optimization/112793
* tree-vect-slp.cc (vect_schedule_slp_node): Already
code-generated constant/external nodes are OK.
Richard Biener [Mon, 29 Jan 2024 08:47:31 +0000 (09:47 +0100)]
middle-end/113622 - allow .VEC_SET and .VEC_EXTRACT for global hard regs
The following expands .VEC_SET and .VEC_EXTRACT instruction selection
to global hard registers, not only automatic variables (possibly)
promoted to registers. This can avoid some ICEs later and create
better code.
PR middle-end/113622
* gimple-isel.cc (gimple_expand_vec_set_extract_expr):
Also allow DECL_HARD_REGISTER variables.
For precision less than int we apply the adjustment to make it defined
at zero after the adjustment to make it compute CLZ rather than CTZ.
That's wrong.
PR tree-optimization/114203
* tree-ssa-loop-niter.cc (build_cltz_expr): Apply CTZ->CLZ
adjustment before making the result defined at zero.
Richard Biener [Thu, 29 Feb 2024 08:22:19 +0000 (09:22 +0100)]
middle-end/114070 - VEC_COND_EXPR folding
The following amends the PR114070 fix to optimistically allow
the folding when we cannot expand the current vec_cond using
vcond_mask and we're still before vector lowering. This leaves
a small window between vectorization and lowering where we could
break vec_conds that can be expanded via vcond{,u,eq}, most
susceptible is the loop unrolling pass which applies VN and thus
possibly folding to the unrolled body of a vectorized loop.
This gets back the folding for targets that cannot do vectorization.
It doesn't get back the folding for x86 with AVX512 for example
since that can handle the original IL but not the folded since
it misses some vcond_mask expanders.
PR middle-end/114070
* match.pd ((c ? a : b) op d --> c ? (a op d) : (b op d)):
Allow the folding if before lowering and the current IL
isn't supported with vcond_mask.
The following properly guards the simplifications that move
operations into VEC_CONDs, in particular when that changes the
type constraints on this operation.
This needed a genmatch fix which was recording spurious implicit fors
when tcc_comparison is used in a C expression.
PR middle-end/114070
* genmatch.cc (parser::parse_c_expr): Do not record operand
lists but only mark operators used.
* match.pd ((c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e)):
Properly guard the case of tcc_comparison changing the VEC_COND
value operand type.
When we classify a conditional reduction chain as CONST_COND_REDUCTION
we fail to verify all involved conditionals have the same constant.
That's a quite unlikely situation so the following simply disables
such classification when there's more than one reduction statement.
PR tree-optimization/114027
* tree-vect-loop.cc (vecctorizable_reduction): Use optimized
condition reduction classification only for single-element
chains.
Richard Biener [Wed, 14 Feb 2024 11:33:13 +0000 (12:33 +0100)]
tree-optimization/113910 - huge compile time during PTA
For the testcase in PR113910 we spend a lot of time in PTA comparing
bitmaps for looking up equivalence class members. This points to
the very weak bitmap_hash function which effectively hashes set
and a subset of not set bits.
The major problem with it is that it simply truncates the
BITMAP_WORD sized intermediate hash to hashval_t which is
unsigned int, effectively not hashing half of the bits.
This reduces the compile-time for the testcase from tens of minutes
to 42 seconds and PTA time from 99% to 46%.
PR tree-optimization/113910
* bitmap.cc (bitmap_hash): Mix the full element "hash" to
the hashval_t hash.
Richard Biener [Mon, 22 Jan 2024 14:42:59 +0000 (15:42 +0100)]
debug/112718 - reset all type units with -ffat-lto-objects
When mixing -flto, -ffat-lto-objects and -fdebug-type-section we
fail to reset all type units after early output resulting in an
ICE when attempting to add then duplicate sibling attributes.
PR debug/112718
* dwarf2out.cc (dwarf2out_finish): Reset all type units
for the fat part of an LTO compile.
Jeevitha [Thu, 21 Mar 2024 04:34:46 +0000 (23:34 -0500)]
rs6000: Don't ICE when compiling the __builtin_vsx_splat_2di [PR113950]
When we expand the __builtin_vsx_splat_2di built-in, we were allowing immediate
value for second operand which causes an unrecognizable insn ICE. Even though
the immediate value was forced into a register, it wasn't correctly assigned
to the second operand. So corrected the assignment of op1 to operands[1].
François Dumont [Sun, 17 Mar 2024 18:06:55 +0000 (19:06 +0100)]
libstdc++: Fix N3344 behavior on _Safe_iterator::_M_can_advance
We shall be able to advance from a 0 offset a value-initialized iterator.
libstdc++-v3/ChangeLog:
* include/debug/safe_iterator.tcc (_Safe_iterator<>::_M_can_advance):
Accept 0 offset advance on value-initialized iterator.
* testsuite/23_containers/vector/debug/n3644.cc: New test case.
Unordered container local_iterator range shall not contain any singular
iterator unless both iterators are both value-initialized.
libstdc++-v3/ChangeLog:
* include/debug/safe_local_iterator.tcc
(_Safe_local_iterator::_M_valid_range): Add _M_value_initialized and
_M_singular checks.
* testsuite/23_containers/unordered_set/debug/114316.cc: New test case.
Recent PR111822 fix implemented REG_EH_REGION note copying to a STV converted
preload instruction in general_scalar_chain::convert_op. However, the same
issue remains in timode_scalar_chain::convert_op. Instead of copying the
newly introduced code to timode_scalar_chain::convert_op, the patch unifies
both functions to a common function.
PR target/111822
gcc/ChangeLog:
* config/i386/i386-features.cc (smode_convert_cst): New function
to handle SImode, DImode and TImode immediates.
(scalar_chain::convert_op): Unify from
general_scalar_chain::convert_op and timode_scalar_chain::convert_op.
(general_scalar_chain::convert_op): Remove.
(timode_scalar_chain::convert_op): Remove.
* config/i386/i386-features.h (class scalar_chain):
Redeclare convert_op as protected class member.
(class general_calar_chain): Remove convert_op.
(class timode_scalar_chain): Ditto.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr111822.C (dg-do): Compile only for ia32 targets.
(dg-options): Add -march=x86-64.
Jonathan Wakely [Wed, 13 Mar 2024 10:02:12 +0000 (10:02 +0000)]
libstdc++: Move test error_category to global scope
A recent GDB change causes this test to fail due to missing RTTI for the
custom_cast type. This is presumably because the custom_cat type was
defined as a local class, so has no linkage. Moving it to local scope
seems to fix the test regressions, and probably makes the test more
realistic as a local class with no linkage isn't practical to use as an
error category that almost certainly needs to be referred to in other
scopes.
libstdc++-v3/ChangeLog:
* testsuite/libstdc++-prettyprinters/cxx11.cc: Move custom_cat
to namespace scope.
The current implementation triggers an assertion in
dwarf2out_frame_debug_cfa_offset() under certain circumstances.
The standard code uses REG_FRAME_RELATED_EXPR notes instead
of REG_CFA_OFFSET notes when saving registers on the stack.
So let's do this as well.
gcc/ChangeLog:
PR target/114160
* config/riscv/thead.cc (th_mempair_save_regs):
Emit REG_FRAME_RELATED_EXPR notes in prologue.
François Dumont [Thu, 14 Mar 2024 21:13:57 +0000 (22:13 +0100)]
libstdc++: Implement N3644 on _Safe_iterator<> [PR114316]
Consider range of value-initialized iterators as valid and empty.
libstdc++-v3/ChangeLog:
PR libstdc++/114316
* include/debug/safe_iterator.tcc (_Safe_iterator<>::_M_valid_range):
First check if both iterators are value-initialized before checking if
singular.
* testsuite/23_containers/set/debug/114316.cc: New test case.
* testsuite/23_containers/vector/debug/114316.cc: New test case.
Jonathan Wakely [Tue, 15 Aug 2023 15:35:22 +0000 (16:35 +0100)]
libstdc++: Simplify chrono::__units_suffix using std::format
For std::chrono formatting we can simplify __units_suffix by using
std::format_to to generate the "[n/m]s" suffix with the correct
character type and write directly to the output iterator, so it doesn't
need to be widened using ctype. We can't remove the use of ctype::widen
for formatting a time zone abbreviation as a wide string, because that
can contain arbitrary characters that can't be widened by
__to_wstring_numeric.
This also fixes a bug in the chrono formatter for %Z which created a
dangling wstring_view.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__units_suffix_misc): Remove.
(__units_suffix): Return a known suffix as string view, do not
write unknown suffixes to a buffer.
(__fmt_units_suffix): New function that formats the suffix using
std::format_to.
(operator<<, __chrono_formatter::_M_q): Use __fmt_units_suffix.
(__chrono_formatter::_M_Z): Correct lifetime of wstring.
I caused a regression with commit r10-908 by adding a constraint to the
non-explicit allocator-extended default constructor, but seemingly
forgot to add an explicit overload with the corresponding constraint.
Jakub Jelinek [Thu, 14 Mar 2024 08:57:13 +0000 (09:57 +0100)]
gimple-iterator: Some gsi_safe_insert_*before fixes
When trying to use the gsi_safe_insert*before APIs in bitint lowering,
I've discovered 3 issues and the following patch addresses those:
1) both split_block and split_edge update CDI_DOMINATORS if they are
available, but because edge_before_returns_twice_call first splits
and then adds an extra EDGE_ABNORMAL edge and then removes another
one, the immediate dominators of both the new bb and the bb with
returns_twice call need to change
2) the new EDGE_ABNORMAL edge had uninitialized probability; this patch
copies the probability from the edge that is going to be removed
and similarly copies other flags (EDGE_EXECUTABLE, EDGE_DFS_BACK,
EDGE_IRREDUCIBLE_LOOP etc.)
3) if edge_before_returns_twice_call splits a block, then the bb with
returns_twice call changes, so the gimple_stmt_iterator for it is
no longer accurate, it points to the right statement, but gsi_bb
and gsi_seq are no longer correct; the patch updates it
2024-03-14 Jakub Jelinek <jakub@redhat.com>
* gimple-iterator.cc (edge_before_returns_twice_call): Copy all
flags and probability from ad_edge to e edge. If CDI_DOMINATORS
are computed, recompute immediate dominator of other_edge->src
and other_edge->dest.
(gsi_safe_insert_before, gsi_safe_insert_seq_before): Update *iter
for the returns_twice call case to the gsi_for_stmt (stmt) to deal
with update it for bb splitting.
Jakub Jelinek [Wed, 13 Mar 2024 08:19:05 +0000 (09:19 +0100)]
asan: Fix ICE during instrumentation of returns_twice calls [PR112709]
The following patch on top of the previously posted ubsan/gimple-iterator
one handles asan the same. While the case of returning by hidden reference
is handled differently because of the first recently posted asan patch,
this deals with instrumentation of the aggregates returned in registers
case as well as instrumentation of loads from aggregate memory in the
function arguments of returns_twice calls.
2024-03-13 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/112709
* asan.cc (maybe_create_ssa_name, maybe_cast_to_ptrmode,
build_check_stmt, maybe_instrument_call, asan_expand_mark_ifn): Use
gsi_safe_insert_before instead of gsi_insert_before.
Jakub Jelinek [Wed, 13 Mar 2024 08:16:45 +0000 (09:16 +0100)]
gimple-iterator, ubsan: Fix ICE during instrumentation of returns_twice calls [PR112709]
ubsan, asan (both PR112709) and _BitInt lowering (PR113466) want to
insert some instrumentation or adjustment statements before some statement.
This unfortunately creates invalid IL if inserting before a returns_twice
call, because we require that such calls are the first statement in a basic
block and that we have an edge from the .ABNORMAL_DISPATCHER block to
the block containing the returns_twice call (in addition to other edge(s)).
The following patch adds helper functions for such insertions and uses it
for now in ubsan (I'll post a follow up which uses it in asan and will
work later on the _BitInt lowering PR).
In particular, if the bb with returns_twice call at the start has just
2 edges, one EDGE_ABNORMAL from .ABNORMAL_DISPATCHER and another
(non-EDGE_ABNORMAL/EDGE_EH) from some other bb, it just inserts the
statement or sequence on that other edge.
If the bb has more predecessor edges or the one not from
.ABNORMAL_DISPATCHER is e.g. an EH edge (this latter case likely shouldn't
happen, one would need labels or something like that), the patch splits the
block with returns_twice call such that there is just one edge next to
.ABNORMAL_DISPATCHER edge and adjusts PHIs as needed to make it happen.
The functions also replace uses of PHIs from the returns_twice bb with
the corresponding PHI arguments, because otherwise it would be invalid IL.
E.g. in ubsan/pr112709-2.c (qux) we have before the ubsan pass
<bb 10> :
# .MEM_5(ab) = PHI <.MEM_4(9), .MEM_25(ab)(11)>
# _7(ab) = PHI <_20(9), _8(ab)(11)>
# .MEM_21(ab) = VDEF <.MEM_5(ab)>
_22 = bar (*_7(ab));
where bar is returns_twice call and bb 11 has .ABNORMAL_DISPATCHER call,
this patch instruments it like:
<bb 9> :
# .MEM_4 = PHI <.MEM_17(ab)(4), .MEM_10(D)(5), .MEM_14(ab)(8)>
# DEBUG BEGIN_STMT
# VUSE <.MEM_4>
_20 = p;
# .MEM_27 = VDEF <.MEM_4>
.UBSAN_NULL (_20, 0B, 0);
# VUSE <.MEM_27>
_2 = __builtin_dynamic_object_size (_20, 0);
# .MEM_28 = VDEF <.MEM_27>
.UBSAN_OBJECT_SIZE (_20, 1024, _2, 0);
<bb 10> :
# .MEM_5(ab) = PHI <.MEM_28(9), .MEM_25(ab)(11)>
# _7(ab) = PHI <_20(9), _8(ab)(11)>
# .MEM_21(ab) = VDEF <.MEM_5(ab)>
_22 = bar (*_7(ab));
The edge from .ABNORMAL_DISPATCHER is there just to represent the
returning for 2nd and later times, the instrumentation can't be
done at that point as there is no code executed during that point.
The ubsan/pr112709-1.c testcase includes non-virtual PHIs to cover
the handling of those as well.
2024-03-13 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/112709
* gimple-iterator.h (gsi_safe_insert_before,
gsi_safe_insert_seq_before): Declare.
* gimple-iterator.cc: Include gimplify.h.
(edge_before_returns_twice_call, adjust_before_returns_twice_call,
gsi_safe_insert_before, gsi_safe_insert_seq_before): New functions.
* ubsan.cc (instrument_mem_ref, instrument_pointer_overflow,
instrument_nonnull_arg, instrument_nonnull_return): Use
gsi_safe_insert_before instead of gsi_insert_before.
(maybe_instrument_pointer_overflow): Use force_gimple_operand,
gimple_seq_add_seq_without_update and gsi_safe_insert_seq_before
instead of force_gimple_operand_gsi.
(instrument_object_size): Likewise. Use gsi_safe_insert_before
instead of gsi_insert_before.
* gcc.dg/ubsan/pr112709-1.c: New test.
* gcc.dg/ubsan/pr112709-2.c: New test.
Jakub Jelinek [Fri, 15 Mar 2024 09:46:47 +0000 (10:46 +0100)]
i386: Fix a pasto in ix86_expand_int_sse_cmp [PR114339]
In r13-3803-gfa271afb58 I've added an optimization for LE/LEU/GE/GEU
comparison against CONST_VECTOR. As the comments say:
/* x <= cst can be handled as x < cst + 1 unless there is
wrap around in cst + 1. */
...
/* For LE punt if some element is signed maximum. */
...
/* For LEU punt if some element is unsigned maximum. */
and
/* x >= cst can be handled as x > cst - 1 unless there is
wrap around in cst - 1. */
...
/* For GE punt if some element is signed minimum. */
...
/* For GEU punt if some element is zero. */
Apparently I wrote the GE/GEU (second case) first and then
copied/adjusted it for LE/LEU, most of the adjustments look correct, but
I've left if (code == GE) comparison when testing if it should punt for
signed maximum. That condition is never true, because this is in
switch (code) { ... case LE: case LEU: block and we really meant to
be what the comment says, for LE punt if some element is signed maximum,
as then cst + 1 wraps around.
The following patch fixes the pasto.
2024-03-15 Jakub Jelinek <jakub@redhat.com>
PR target/114339
* config/i386/i386-expand.cc (ix86_expand_int_sse_cmp) <case LE>: Fix
a pasto, compare code against LE rather than GE.
Jakub Jelinek [Thu, 14 Mar 2024 16:48:30 +0000 (17:48 +0100)]
icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]
AFAIK we have no code in LTO streaming to stream out or in
SSA_NAME_{RANGE,PTR}_INFO, so LTO effectively throws it all away
and let vrp1 and alias analysis after IPA recompute that. There is
just one spot, for IPA VRP and IPA bit CCP we save/restore ranges
and set SSA_NAME_{PTR,RANGE}_INFO e.g. on parameters depending on what
we saved and propagated, but that is after streaming in bodies for the
post IPA optimizations.
Now, without LTO SSA_NAME_{RANGE,PTR}_INFO is already computed from
earlier in many cases (er.g. evrp and early alias analysis but other spots
too), but IPA ICF is ignoring the ranges and points-to details when
comparing the bodies. I think ignoring that is just fine, that is
effectively what we do for LTO where we throw that information away
before the analysis, and not ignoring it could lead to fewer ICF merging
possibilities.
So, the following patch instead verifies that for LTO SSA_NAME_{PTR,RANGE}_INFO
just isn't there on SSA_NAMEs in functions into which other functions have
been ICFed, and for non-LTO throws that information away (which matches the
LTO behavior).
Another possibility would be to remember the SSA_NAME <-> SSA_NAME mapping
vector (just one of the 2) on successful sem_function::equals on the
sem_function which is not the chosen leader (e.g. how SSA_NAMEs in the
leader map to SSA_NAMEs in the other function) and use that vector
to union the ranges in sem_function::merge. I can implement that for
comparison, but wanted to post this first if there is an agreement on
doing that or if Honza thinks we should take SSA_NAME_{RANGE,PTR}_INFO
into account. I think we can compare SSA_NAME_RANGE_INFO, but have
no idea how to try to compare points to info. And I think it will result
in less effective ICF for non-LTO vs. LTO unnecessarily.
2024-03-12 Jakub Jelinek <jakub@redhat.com>
PR middle-end/113907
* ipa-icf.cc (sem_item_optimizer::merge_classes): Reset
SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO on successfully ICF merged
functions.
Jakub Jelinek [Thu, 14 Mar 2024 13:09:20 +0000 (14:09 +0100)]
aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]
The following testcase ICEs with LSE atomics.
The problem is that the @atomic_compare_and_swap<mode> expander uses
aarch64_reg_or_zero predicate for the desired operand, which is fine,
given that for most of the modes and even for TImode in some cases
it can handle zero immediate just fine, but the TImode
@aarch64_compare_and_swap<mode>_lse just uses register_operand for
that operand instead, again intentionally so, because the casp,
caspa, caspl and caspal instructions need to use a pair of consecutive
registers for the operand and xzr is just one register and we can't
just store zero into the link register to emulate pair of zeros.
So, the following patch fixes that by forcing the newval operand into
a register for the TImode LSE case.
2024-03-14 Jakub Jelinek <jakub@redhat.com>
PR target/114310
* config/aarch64/aarch64.cc (aarch64_expand_compare_and_swap): For
TImode force newval into a register.
Jakub Jelinek [Thu, 7 Mar 2024 09:02:49 +0000 (10:02 +0100)]
bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]
The following testcase ICEs, because fix_crossing_unconditional_branches
thinks that asm goto is an unconditional jump and removes it, replacing it
with unconditional jump to one of the labels.
This doesn't happen on x86 because the function in question isn't invoked
there at all:
/* If the architecture does not have unconditional branches that
can span all of memory, convert crossing unconditional branches
into indirect jumps. Since adding an indirect jump also adds
a new register usage, update the register usage information as
well. */
if (!HAS_LONG_UNCOND_BRANCH)
fix_crossing_unconditional_branches ();
I think for the asm goto case, for the non-fallthru edge if any we should
handle it like any other fallthru (and fix_crossing_unconditional_branches
doesn't really deal with those, it only looks at explicit branches at the
end of bbs and we are in cfglayout mode at that point) and for the labels
we just pass the labels as immediates to the assembly and it is up to the
user to figure out how to store them/branch to them or whatever they want to
do.
So, the following patch fixes this by not treating asm goto as a simple
unconditional jump.
I really think that on the !HAS_LONG_UNCOND_BRANCH targets we have a bug
somewhere else, where outofcfglayout or whatever should actually create
those indirect jumps on the crossing edges instead of adding normal
unconditional jumps, I see e.g. in
__attribute__((cold)) int bar (char *);
__attribute__((hot)) int baz (char *);
void qux (int x) { if (__builtin_expect (!x, 1)) goto l1; bar (""); goto l1; l1: baz (""); }
void corge (int x) { if (__builtin_expect (!x, 0)) goto l1; baz (""); l2: return; l1: bar (""); goto l2; }
with -O2 -freorder-blocks-and-partition on aarch64 before/after this patch
just b .L? jumps which I believe are +-32MB, so if .text is larger than
32MB, it could fail to link, but this patch doesn't address that.
Jakub Jelinek [Tue, 5 Mar 2024 09:32:38 +0000 (10:32 +0100)]
lower-subreg: Fix ROTATE handling [PR114211]
On the following testcase, we have
(insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
(rotate:TI (reg/v:TI 106 [ h ])
(const_int 64 [0x40]))) "pr114211.c":8:5 1042 {rotl64ti2_doubleword}
(nil))
before subreg1 and the pass decides to use
(reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ])
register pair instead of (reg/v:TI 106 [ h ]).
resolve_operand_for_swap_move_operator implements it by pretending it is
an assignment from
(concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ]))
to
(concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ]))
The problem is that if the rotate argument is the same as destination or
if there is even an overlap between the first half of the destination with
second half of the source we emit incorrect code, because the store to
(reg:DI 128 [ h+8 ]) overwrites what we need for source of the second
move. The following patch detects that case and uses a temporary pseudo
to hold the original (reg:DI 128 [ h+8 ]) value across the first store.
2024-03-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114211
* lower-subreg.cc (resolve_simple_move): For double-word
rotates by BITS_PER_WORD if there is overlap between source
and destination use a temporary.
Jakub Jelinek [Mon, 4 Mar 2024 09:04:19 +0000 (10:04 +0100)]
i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]
The Intel extended format has the various weird number categories,
pseudo denormals, pseudo infinities, pseudo NaNs and unnormals.
Those are not representable in the GCC real_value and so neither
GIMPLE nor RTX VIEW_CONVERT_EXPR/SUBREG folding folds those into
constants.
As can be seen on the following testcase, because it isn't folded
(since GCC 12, before that we were folding it) we can end up with
a SUBREG of a CONST_VECTOR or similar constant, which isn't valid
general_operand, so we ICE during vregs pass trying to recognize
the move instruction.
Initially I thought it is a middle-end bug, the movxf instruction
has general_operand predicate, but the middle-end certainly never
tests that predicate, seems moves are special optabs.
And looking at other mov optabs, e.g. for vector modes the i386
patterns use nonimmediate_operand predicate on the input, yet
ix86_expand_vector_move deals with CONSTANT_P and SUBREG of CONSTANT_P
arguments which if the predicate was checked couldn't ever make it through.
The following patch handles this case similarly to the
ix86_expand_vector_move's SUBREG of CONSTANT_P case, does it just for XFmode
because I believe that is the only mode that needs it from the scalar ones,
others should just be folded.
2024-03-04 Jakub Jelinek <jakub@redhat.com>
PR target/114184
* config/i386/i386-expand.cc (ix86_expand_move): If XFmode op1
is SUBREG of CONSTANT_P, force the SUBREG_REG into memory or
register.
Harald Anlauf [Mon, 11 Mar 2024 21:05:51 +0000 (22:05 +0100)]
Fortran: handle procedure pointer component in DT array [PR110826]
gcc/fortran/ChangeLog:
PR fortran/110826
* array.cc (gfc_array_dimen_size): When walking the ref chain of an
array and the ultimate component is a procedure pointer, do not try
to figure out its dimension even if it is a array-valued function.
gcc/testsuite/ChangeLog:
PR fortran/110826
* gfortran.dg/proc_ptr_comp_53.f90: New test.
Harald Anlauf [Mon, 4 Dec 2023 21:44:53 +0000 (22:44 +0100)]
Fortran: allow RESTRICT qualifier also for optional arguments [PR100988]
gcc/fortran/ChangeLog:
PR fortran/100988
* gfortran.h (IS_PROC_POINTER): New macro.
* trans-types.cc (gfc_sym_type): Use macro in determination if the
restrict qualifier can be used for a dummy variable. Fix logic to
allow the restrict qualifier also for optional arguments, and to
not apply it to pointer or proc_pointer arguments.
Harald Anlauf [Fri, 1 Mar 2024 18:21:27 +0000 (19:21 +0100)]
Fortran: improve checks of NULL without MOLD as actual argument [PR104819]
gcc/fortran/ChangeLog:
PR fortran/104819
* check.cc (gfc_check_null): Handle nested NULL()s.
(is_c_interoperable): Check for MOLD argument of NULL() as part of
the interoperability check.
* interface.cc (gfc_compare_actual_formal): Extend checks for NULL()
actual arguments for presence of MOLD argument when required by
Interp J3/22-146.
gcc/testsuite/ChangeLog:
PR fortran/104819
* gfortran.dg/assumed_rank_9.f90: Adjust testcase use of NULL().
* gfortran.dg/pr101329.f90: Adjust testcase to conform to interp.
* gfortran.dg/null_actual_4.f90: New test.
we must copy the REG_EH_REGION note to the first insn and split the block
after the newly added insn. The REG_EH_REGION on the second insn will be
removed later since it no longer traps.
Marc Poulhiès [Mon, 6 Mar 2023 11:15:13 +0000 (12:15 +0100)]
ada: Fix (again) incorrect handling of Aggregate aspect
Previous fix stopped the processing of the Aggregate aspect early,
skipping the call to Record_Rep_Item, making later call to
Resolve_Container_Aggregate fail.
Also, the previous fix would not handle correctly the case where the
type is private and the check for non-array type can only be done at the
freeze point with the full type.
Adapt the resolving of the aspect when the input is not correct and the
parameters can't be resolved.
gcc/ada/
* sem_ch13.adb (Analyze_One_Aspect): Call Record_Rep_Item.
(Check_Aspect_At_Freeze_Point): Check the aspect is specified on
non-array type only...
(Analyze_One_Aspect): ... instead of doing it too early here.
* sem_aggr.adb (Resolve_Container_Aggregate): Do nothing in case
the parameters failed to resolve.
On arm-none-eabi, the test case fails with
.../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:63:65: warning: converting a packed 'enum obj_type' pointer (alignment 1) to a 'struct connection' pointer (alignment 4) may result in an unaligned pointer value [-Waddress-of-packed-member]
The error was fixed in basepoints/gcc-14-6517-gb7e4a4c626e, but it
was considered to be a too big change to be backported and thus, the
failing test is marked xfail in GCC13.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
Added dg-bogus with xfail on offending line for short_enums.
Jonathan Wakely [Fri, 1 Mar 2024 20:55:10 +0000 (20:55 +0000)]
libstdc++: Update expiry times for leap seconds lists
The list in tzdb.cc isn't the only hardcoded list of leap seconds in the
library, there's the one defined inline in <chrono> (to avoid loading
the tzdb for the common case) and another in a testcase. This updates
them to note that there are no new leap seconds in 2024 either, until at
least 2024-12-28.
libstdc++-v3/ChangeLog:
* include/std/chrono (__get_leap_second_info): Update expiry
time for hardcoded list of leap seconds.
* testsuite/std/time/tzdb/leap_seconds.cc: Update comment.
Jonathan Wakely [Wed, 28 Feb 2024 15:05:08 +0000 (15:05 +0000)]
libstdc++: Fix std::basic_format_arg::handle for BasicFormatters
std::basic_format_arg::handle is supposed to format its value as const
if that is valid, to reduce the number of instantiations of the
formatter's format function. I made a silly typo so that it checks
formattable_with<TD, Context> not formattable_with<const TD, Context>,
which breaks support for BasicFormatters i.e. ones that can only format
non-const types.
There's a static_assert in the handle constructor which is supposed to
improve diagnostics for trying to format a const argument with a
formatter that doesn't support it. That condition can't fail, because
the std::basic_format_arg constructor is already constrained to check
that the argument type is formattable. The static_assert can be removed.
libstdc++-v3/ChangeLog:
* include/std/format (basic_format_arg::handle::__maybe_const_t):
Fix condition to check if const type is formattable.
(basic_format_arg::handle::handle(T&)): Remove redundant
static_assert.
* testsuite/std/format/formatter/basic.cc: New test.
Jonathan Wakely [Sun, 7 Jan 2024 22:21:08 +0000 (22:21 +0000)]
libstdc++: Implement P2905R2 "Runtime format strings" for C++20
This change makes std::make_format_args refuse to create dangling
references to temporaries. This makes the std::vformat API safer. This
was approved in Kona 2023 as a DR for C++20 so the change is implemented
unconditionally.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono): Always use
lvalue arguments to make_format_args.
* include/std/format (make_format_args): Change parameter pack
from forwarding references to lvalue references. Remove use of
remove_reference_t which is now unnecessary.
(format_to, formatted_size): Remove incorrect forwarding of
arguments.
* testsuite/20_util/duration/io.cc: Use lvalues as arguments to
make_format_args.
* testsuite/std/format/arguments/args.cc: Likewise.
* testsuite/std/format/arguments/lwg3810.cc: Likewise.
* testsuite/std/format/functions/format.cc: Likewise.
* testsuite/std/format/functions/vformat_to.cc: Likewise.
* testsuite/std/format/string.cc: Likewise.
* testsuite/std/time/day/io.cc: Likewise.
* testsuite/std/time/month/io.cc: Likewise.
* testsuite/std/time/weekday/io.cc: Likewise.
* testsuite/std/time/year/io.cc: Likewise.
* testsuite/std/time/year_month_day/io.cc: Likewise.
* testsuite/std/format/arguments/args_neg.cc: New test.