Nathaniel Shead [Thu, 29 Feb 2024 11:49:13 +0000 (22:49 +1100)]
c++: Ensure DECL_CONTEXT is set for temporary vars [PR114005]
Modules streaming requires DECL_CONTEXT to be set for anything streamed.
This patch ensures that 'create_temporary_var' does set a DECL_CONTEXT
for these variables (such as the backing storage for initializer_lists)
even if not inside a function declaration.
PR c++/114005
gcc/cp/ChangeLog:
* init.cc (create_temporary_var): Use current_scope instead of
current_function_decl.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr114005_a.C: New test.
* g++.dg/modules/pr114005_b.C: New test.
Jeff Law [Fri, 1 Mar 2024 21:54:04 +0000 (14:54 -0700)]
[14 regression] Fix insn types in risc-v port
So one of the broad goals we've had over the last few months has been to ensure
that every insn has a scheduling type and that every insn is associated with an
insn reservation in the scheduler.
This avoids some amazingly bad behavior in the scheduler. I won't go through
the gory details.
I was recently analyzing a code quality regression with dhrystone (ugh!) and
one of the issues was poor scheduling which lengthened the lifetime of a pseudo
and ultimately resulted in needing an additional callee saved register
save/restore.
This was ultimately tracked down to incorrect types on a few patterns. So I did
an audit of all the patterns that had types added/changed as part of this
effort and found a variety of problems, primarily in the various move patterns
and extension patterns. This is a regression relative to gcc-13.
Naturally the change in types affects scheduling, which in turn changes the
precise code we generate and causes some testsuite fallout.
I considered updating the regexps since the change in the resulting output is
pretty consistent. But of course the test would still be sensitive to things
like load latency. So instead I just turned off the 2nd phase scheduler in the
affected tests.
Patrick Palka [Fri, 1 Mar 2024 21:50:20 +0000 (16:50 -0500)]
c++/modules: complete_vars ICE with non-exported constexpr var
Here after stream-in of the non-exported constexpr global 'A a' we call
maybe_register_incomplete_var, which we'd expect to be a no-op here but
it manages to take its second branch and pushes {a, NULL_TREE} onto
incomplete_vars. Later after defining B we ICE from complete_vars due
to this pushed NULL_TREE class context.
Judging by the two commits that introduced/modified this part of
maybe_register_incomplete_var, r196852 and r214333, it seems this second
branch is only concerned with constexpr static data members (whose
initializer may contain a pointer-to-member for a not-yet-complete class).
So this patch restricts this branch accordingly so it's not inadvertently
taken during stream-in.
gcc/cp/ChangeLog:
* decl.cc (maybe_register_incomplete_var): Restrict second
branch to static data members from a not-yet-complete class.
gcc/testsuite/ChangeLog:
* g++.dg/modules/cexpr-4_a.C: New test.
* g++.dg/modules/cexpr-4_b.C: New test.
Marek Polacek [Thu, 25 Jan 2024 21:38:51 +0000 (16:38 -0500)]
c++: implement [[gnu::no_dangling]] [PR110358]
Since -Wdangling-reference has false positives that can't be
prevented, we should offer an easy way to suppress the warning.
Currently, that is only possible by using a #pragma, either around the
enclosing class or around the call site. But #pragma GCC diagnostic tends
to be onerous. A better solution would be to have an attribute.
To that end, this patch adds a new attribute, [[gnu::no_dangling]].
This attribute takes an optional bool argument to support cases like:
template <typename T>
struct [[gnu::no_dangling(std::is_reference_v<T>)]] S {
// ...
};
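As a rough illustration of the function-level form (not taken from the patch
or its tests), the attribute can mark a function that returns a reference
unrelated to its const-reference parameter. The names table, element_at and
third_element are made up for this sketch; compilers that do not know the
attribute are expected to ignore it, possibly with a warning.

```cpp
#include <cassert>

// Hypothetical lookup table; any object that outlives the call would do.
static int table[4] = {10, 20, 30, 40};

// Returns a reference into `table`, never to the parameter itself, so a
// temporary bound to `index` cannot make the result dangle.
[[gnu::no_dangling]] const int &element_at(const int &index) {
  return table[index];
}

int third_element() {
  // The literal 2 materializes a temporary int that dies at the end of the
  // full expression; `r` still refers to table[2].
  const int &r = element_at(2);
  assert(&r == &table[2]);
  return r;
}
```

Without the attribute, this is the shape of call that the -Wdangling-reference
heuristic is prone to flag even though nothing dangles.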
PR c++/110358
PR c++/109642
gcc/cp/ChangeLog:
* call.cc (no_dangling_p): New.
(reference_like_class_p): Use it.
(do_warn_dangling_reference): Use it. Don't warn when the function
or its enclosing class has attribute gnu::no_dangling.
* tree.cc (cxx_gnu_attributes): Add gnu::no_dangling.
(handle_no_dangling_attribute): New.
* g++.dg/ext/attr-no-dangling1.C: New test.
* g++.dg/ext/attr-no-dangling2.C: New test.
* g++.dg/ext/attr-no-dangling3.C: New test.
* g++.dg/ext/attr-no-dangling4.C: New test.
* g++.dg/ext/attr-no-dangling5.C: New test.
* g++.dg/ext/attr-no-dangling6.C: New test.
* g++.dg/ext/attr-no-dangling7.C: New test.
* g++.dg/ext/attr-no-dangling8.C: New test.
* g++.dg/ext/attr-no-dangling9.C: New test.
David Faust [Fri, 1 Mar 2024 18:43:24 +0000 (10:43 -0800)]
testsuite: ctf: make array in ctf-file-scope-1 fixed length
The array member of struct SFOO in the ctf-file-scope-1 caused the test
to fail for the BPF target, since BPF does not support dynamic stack
allocation. The array does not need to be variable length for the sake of
the test, so make it fixed length instead to allow the test to run
successfully for the bpf-unknown-none target.
gcc/testsuite/
* gcc.dg/debug/ctf/ctf-file-scope-1.c (SFOO): Make array member
fixed-length.
Harald Anlauf [Fri, 1 Mar 2024 18:21:27 +0000 (19:21 +0100)]
Fortran: improve checks of NULL without MOLD as actual argument [PR104819]
gcc/fortran/ChangeLog:
PR fortran/104819
* check.cc (gfc_check_null): Handle nested NULL()s.
(is_c_interoperable): Check for MOLD argument of NULL() as part of
the interoperability check.
* interface.cc (gfc_compare_actual_formal): Extend checks for NULL()
actual arguments for presence of MOLD argument when required by
Interp J3/22-146.
gcc/testsuite/ChangeLog:
PR fortran/104819
* gfortran.dg/assumed_rank_9.f90: Adjust testcase use of NULL().
* gfortran.dg/pr101329.f90: Adjust testcase to conform to interp.
* gfortran.dg/null_actual_4.f90: New test.
In r12-6773-g09845ad7569bac we gave CTAD placeholders a level of 0 and
ensured we never replaced them via tsubst. It turns out that autos
representing an explicit cast need the same treatment and for the same
reason: such autos appear in an expression context and so their level
gets easily messed up after partial substitution, leading to premature
replacement via an incidental tsubst instead of via do_auto_deduction.
This patch fixes this by extending the r12-6773 approach to auto(x).
PR c++/110025
PR c++/114138
gcc/cp/ChangeLog:
* cp-tree.h (make_cast_auto): Declare.
* parser.cc (cp_parser_functional_cast): If the type is an auto,
replace it with a level-less one via make_cast_auto.
* pt.cc (find_parameter_packs_r): Don't treat level-less auto
as a type parameter pack.
(tsubst) <case TEMPLATE_TYPE_PARM>: Generalize CTAD placeholder
auto handling to all level-less autos.
(make_cast_auto): Define.
(do_auto_deduction): Handle replacement of a level-less auto.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/auto-fncast16.C: New test.
* g++.dg/cpp23/auto-fncast17.C: New test.
* g++.dg/cpp23/auto-fncast18.C: New test.
Jakub Jelinek [Fri, 1 Mar 2024 15:59:08 +0000 (16:59 +0100)]
c++: Fix up decltype of non-dependent structured binding decl in template [PR92687]
finish_decltype_type uses DECL_HAS_VALUE_EXPR_P (expr) check for
DECL_DECOMPOSITION_P (expr) to determine if it is
array/struct/vector/complex etc. subobject proxy case vs. structured
binding using std::tuple_{size,element}.
For non-templates or when templates are already instantiated, that works
correctly, finalized DECL_DECOMPOSITION_P non-base vars indeed have
DECL_VALUE_EXPR in the former case and don't have it in the latter.
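To make the two cases concrete, here is a small sketch outside of any
template, where the behaviour is uncontroversial. Point and
binding_kinds_demo are names invented for the example; the first binding is
the struct subobject proxy case, the second goes through
std::tuple_{size,element}.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

struct Point { int x; long y; };  // hypothetical aggregate for the sketch

int binding_kinds_demo() {
  Point p{1, 2};
  auto [px, py] = p;  // subobject proxy case: names refer to members of a copy
  static_assert(std::is_same_v<decltype(px), int>);
  static_assert(std::is_same_v<decltype(py), long>);

  std::tuple<int, char> t{3, 'c'};
  auto [t0, t1] = t;  // tuple case: types come from std::tuple_element
  static_assert(std::is_same_v<decltype(t0), int>);
  static_assert(std::is_same_v<decltype(t1), char>);

  assert(t1 == 'c');
  return px + static_cast<int>(py) + t0;
}
```

In both cases decltype of the binding names the declared element type, not a
reference; the PR is about getting the same answer when such code appears,
non-dependently, inside a template.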
It works fine for dependent structured bindings as well, cp_finish_decomp in
that case creates DECLTYPE_TYPE tree and defers the handling until
instantiation.
As the testcase shows, this doesn't work for the non-dependent structured
binding case in templates, because DECL_HAS_VALUE_EXPR_P is set in that case
always; cp_finish_decomp ends with:
  if (processing_template_decl)
    {
      for (unsigned int i = 0; i < count; i++)
        if (!DECL_HAS_VALUE_EXPR_P (v[i]))
          {
            tree a = build_nt (ARRAY_REF, decl, size_int (i),
                               NULL_TREE, NULL_TREE);
            SET_DECL_VALUE_EXPR (v[i], a);
            DECL_HAS_VALUE_EXPR_P (v[i]) = 1;
          }
    }
and those artificial ARRAY_REFs are used in various places during
instantiation to find out what base the DECL_DECOMPOSITION_P VAR_DECLs
have and their positions.
The following patch fixes that by changing lookup_decomp_type so that it
doesn't ICE when called on a DECL_DECOMPOSITION_P var which isn't in the
hash table, but returns NULL_TREE in that case. For processing_template_decl,
finish_decltype_type now asserts DECL_HAS_VALUE_EXPR_P is true and just calls
lookup_decomp_type.
If it returns non-NULL, it is a structured binding using tuple and its result
is returned, otherwise it falls through to returning unlowered_expr_type (expr)
because it is an array, structure etc. subobject proxy.
For !processing_template_decl it keeps doing what it did before,
DECL_HAS_VALUE_EXPR_P meaning it is an array/structure etc. subobject proxy,
otherwise the tuple case.
2024-03-01 Jakub Jelinek <jakub@redhat.com>
PR c++/92687
* decl.cc (lookup_decomp_type): Return NULL_TREE if decomp_type_table
doesn't have entry for V.
* semantics.cc (finish_decltype_type): If ptds.saved, assert
DECL_HAS_VALUE_EXPR_P is true and decide on tuple vs. non-tuple based
on if lookup_decomp_type is NULL or not.
Jakub Jelinek [Fri, 1 Mar 2024 16:26:42 +0000 (17:26 +0100)]
OpenMP/C++: Fix (first)private clause with member variables [PR110347]
OpenMP permits '(first)private' for C++ member variables, which GCC handles
by tagging those by DECL_OMP_PRIVATIZED_MEMBER, adding a temporary VAR_DECL
and DECL_VALUE_EXPR pointing to the 'this->member_var' in the C++ front end.
The idea is that in omp-low.cc, the DECL_VALUE_EXPR is used before the
region (for 'firstprivate'; ignored for 'private') while in the region,
the DECL itself is used.
In gimplify, the value expansion is suppressed and deferred if the
lang_hooks.decls.omp_disregard_value_expr (decl, shared)
returns true - which is never the case if 'shared' is true. In OpenMP 4.5,
only 'map' and 'use_device_ptr' were permitted for the 'target' directive.
When OpenMP 5.0's 'private'/'firstprivate' clauses were added, the update
needed because the 'shared' argument could now be false was missed. The
respective check has now been added.
2024-03-01 Jakub Jelinek <jakub@redhat.com>
Tobias Burnus <tburnus@baylibre.com>
PR c++/110347
gcc/ChangeLog:
* gimplify.cc (omp_notice_variable): Fix 'shared' arg to
lang_hooks.decls.omp_disregard_value_expr for
(first)private in target regions.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-lambda-3.C: Moved from
gcc/testsuite/g++.dg/gomp/ and fixed is-mapped handling.
* testsuite/libgomp.c++/target-lambda-1.C: Modify to also
work without offloading.
* testsuite/libgomp.c++/firstprivate-1.C: New test.
* testsuite/libgomp.c++/firstprivate-2.C: New test.
* testsuite/libgomp.c++/private-1.C: New test.
* testsuite/libgomp.c++/private-2.C: New test.
* testsuite/libgomp.c++/target-lambda-4.C: New test.
* testsuite/libgomp.c++/use_device_ptr-1.C: New test.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/target-lambda-1.C: Moved to become a
run-time test under testsuite/libgomp.c++.
Jakub Jelinek [Fri, 1 Mar 2024 14:42:52 +0000 (15:42 +0100)]
calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR114136]
On Tue, Feb 27, 2024 at 04:41:32PM +0000, Richard Earnshaw wrote:
> On Arm the PR107453 change is causing all anonymous arguments to be passed on the
> stack, which is incorrect per the ABI. On a target that uses
> 'pretend_outgoing_vararg_named', why is it correct to set n_named_args to
> zero? Is it enough to guard both the statements you've added with
> !targetm.calls.pretend_outgoing_args_named?
The TYPE_NO_NAMED_ARGS_STDARG_P functions (C23 fns like void foo (...) {})
have NULL type_arg_types, so the list_length (type_arg_types) isn't done for
it, but it should be handled as if it was non-NULL but list length was 0.
So, for the
  if (type_arg_types != 0)
    n_named_args
      = (list_length (type_arg_types)
         /* Count the struct value address, if it is passed as a parm. */
         + structure_value_addr_parm);
  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
    n_named_args = 0;
  else
    /* If we know nothing, treat all args as named. */
    n_named_args = num_actuals;
case, I think guarding it by any target hooks is wrong, although
I guess it should have been
n_named_args = structure_value_addr_parm;
instead of
n_named_args = 0;
For the second
  if (type_arg_types != 0
      && targetm.calls.strict_argument_naming (args_so_far))
    ;
  else if (type_arg_types != 0
           && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
    /* Don't include the last named arg. */
    --n_named_args;
  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
    n_named_args = 0;
  else
    /* Treat all args as named. */
    n_named_args = num_actuals;
I think we should treat those as if type_arg_types was non-NULL
with 0 elements in the list, except that the --n_named_args would, for
!structure_value_addr_parm, lead to n_named_args = -1; I think we want
0 for that case.
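The intended outcome can be modelled outside the compiler. The sketch below is
only my reading of the snippets above together with the ChangeLog entry, not
code from calls.cc: CallShape, initial_named_args and final_named_args are
hypothetical names, and the two target hooks are reduced to plain bools.

```cpp
#include <cassert>

// Hypothetical stand-in for the facts expand_call looks at; not GCC's types.
struct CallShape {
  bool has_arg_types;             // models type_arg_types != 0
  int arg_type_count;             // models list_length (type_arg_types)
  bool no_named_args_stdarg;      // models TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
  int structure_value_addr_parm;  // 1 if the return slot address is a parm
  int num_actuals;                // number of actual arguments at the call
};

// First computation, done before INIT_CUMULATIVE_ARGS.
int initial_named_args(const CallShape &c) {
  if (c.has_arg_types)
    return c.arg_type_count + c.structure_value_addr_parm;
  if (c.no_named_args_stdarg)
    return c.structure_value_addr_parm;  // was 0 before the fix
  return c.num_actuals;                  // know nothing: all args named
}

// Second computation, consulting the two target hooks (passed as bools here).
int final_named_args(const CallShape &c, bool strict_argument_naming,
                     bool pretend_outgoing_varargs_named) {
  const int n = initial_named_args(c);
  const bool prototyped = c.has_arg_types || c.no_named_args_stdarg;
  if (prototyped && strict_argument_naming)
    return n;                       // leave the count untouched
  if (prototyped && !pretend_outgoing_varargs_named)
    return c.has_arg_types ? n - 1  // don't include the last named arg
                           : 0;     // (...) case: clear, never go to -1
  return c.num_actuals;             // treat all args as named
}
```

For a C23 `struct S foo (...)` called with a hidden return pointer plus two
anonymous arguments, this model yields 1 named argument under strict argument
naming, 0 when the target neither is strict nor pretends, and num_actuals
otherwise.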
2024-03-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114136
* calls.cc (expand_call): For TYPE_NO_NAMED_ARGS_STDARG_P set
n_named_args initially before INIT_CUMULATIVE_ARGS to
structure_value_addr_parm rather than 0, after it don't modify
it if strict_argument_naming and clear only if
!pretend_outgoing_varargs_named.
Jakub Jelinek [Fri, 1 Mar 2024 13:57:15 +0000 (14:57 +0100)]
dwarf2out: Don't move variable sized aggregates to comdat [PR114015]
The following testcase ICEs, because we decide to move that
struct { char a[n]; } DW_TAG_structure_type into .debug_types section
/ DW_UT_type DWARF5 unit, but refer from there to a DW_TAG_variable
(created artificially for the array bounds).
Even with non-bitint, I think it is just wrong to use .debug_types
section / DW_UT_type for something that uses DW_OP_fbreg and similar
in it, things clearly dependent on a particular function.
In most cases, is_nested_in_subprogram (die) check results in such
aggregates not being moved, but in the function parameter type case
that is not the case.
The following patch fixes it by returning false from should_move_die_to_comdat
for non-constant sized aggregate types, i.e. when either we gave up on
adding DW_AT_byte_size for it because it wasn't expressible, or when
it is something non-constant (location description, reference, ...).
2024-03-01 Jakub Jelinek <jakub@redhat.com>
PR debug/114015
* dwarf2out.cc (should_move_die_to_comdat): Return false for
aggregates without DW_AT_byte_size attribute or with non-constant
DW_AT_byte_size.
Richard Biener [Thu, 29 Feb 2024 08:22:19 +0000 (09:22 +0100)]
middle-end/114070 - VEC_COND_EXPR folding
The following amends the PR114070 fix to optimistically allow
the folding when we cannot expand the current vec_cond using
vcond_mask and we're still before vector lowering. This leaves
a small window between vectorization and lowering where we could
break vec_conds that can be expanded via vcond{,u,eq}; most
susceptible is the loop unrolling pass, which applies VN and thus
possibly folding to the unrolled body of a vectorized loop.
This gets back the folding for targets that cannot do vectorization.
It doesn't get back the folding for x86 with AVX512, for example,
since that can handle the original IL but not the folded one, because
it lacks some vcond_mask expanders.
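For readers unfamiliar with the transform, the following lane-wise model shows
that the two forms compute the same values, which is why the fold is purely a
question of what the target can expand. Vec4, Mask4, lane_select and lane_add
are invented for the sketch, and addition stands in for the generic "op".

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;    // hypothetical 4-lane vector
using Mask4 = std::array<bool, 4>;  // hypothetical 4-lane condition

// Models VEC_COND_EXPR: per-lane c ? a : b.
Vec4 lane_select(const Mask4 &c, const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = c[i] ? a[i] : b[i];
  return r;
}

// Models the binary "op" applied lane by lane (addition here).
Vec4 lane_add(const Vec4 &x, const Vec4 &y) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = x[i] + y[i];
  return r;
}

// (c ? a : b) op d
Vec4 unfolded(const Mask4 &c, const Vec4 &a, const Vec4 &b, const Vec4 &d) {
  return lane_add(lane_select(c, a, b), d);
}

// c ? (a op d) : (b op d)
Vec4 folded(const Mask4 &c, const Vec4 &a, const Vec4 &b, const Vec4 &d) {
  return lane_select(c, lane_add(a, d), lane_add(b, d));
}

bool forms_agree() {
  const Mask4 c{true, false, true, false};
  const Vec4 a{1, 2, 3, 4}, b{10, 20, 30, 40}, d{100, 100, 100, 100};
  assert(unfolded(c, a, b, d) == folded(c, a, b, d));
  return true;
}
```

The folded form needs a select on the results of the two ops, which is where
the availability of vcond_mask on the target comes in.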
PR middle-end/114070
* match.pd ((c ? a : b) op d --> c ? (a op d) : (b op d)):
Allow the folding if before lowering and the current IL
isn't supported with vcond_mask.
xuli [Fri, 1 Mar 2024 09:10:12 +0000 (09:10 +0000)]
RISC-V: Add riscv_vector_cc function attribute
By default, the standard vector calling convention variant is only enabled
when a function has a vector argument or return value. However, a user may
also want to call a function without either from a vectorized loop, which
otherwise causes a huge performance penalty due to vector register
store/restore.
So a user can declare a function with the riscv_vector_cc attribute, as below,
to force it to use the standard vector calling convention variant.
void foo() __attribute__((riscv_vector_cc));
[[riscv::vector_cc]] void foo(); // For C++11 and C23
For more details, please refer to the link below.
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67
gcc/ChangeLog:
* config/riscv/riscv.cc (TARGET_GNU_ATTRIBUTES): Add riscv_vector_cc
attribute to riscv_attribute_table.
(riscv_vector_cc_function_p): Return true if FUNC is a riscv_vector_cc function.
(riscv_fntype_abi): Add riscv_vector_cc attribute check.
* doc/extend.texi: Add riscv_vector_cc attribute description.
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C: New test.
* gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c: New test.
* gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c: New test.
Pan Li [Fri, 23 Feb 2024 07:37:28 +0000 (15:37 +0800)]
RISC-V: Introduce gcc option mrvv-vector-bits for RVV
This patch introduces a new gcc option for RVV that specifies the size in
bits of one RVV vector register. Valid arguments to
'-mrvv-vector-bits=' are:
* scalable
* zvl
'scalable' picks up the zvl*b in the march as the minimal vlen.
For example, the minimal vlen will be 512 when march=rv64gcv_zvl512b
and mrvv-vector-bits=scalable.
'zvl' picks up the zvl*b in the march as the exact vlen.
For example, the vlen will be exactly 1024 when march=rv64gcv_zvl1024b
and mrvv-vector-bits=zvl.
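The two examples can be restated as a small lookup. This is only an
illustration of the rule as described here, not how riscv.cc implements it:
VectorBits, VlenInfo and describe_vlen are hypothetical names, and the sketch
looks at the first "_zvl<N>b" token only and reports nothing when none is
present.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

enum class VectorBits { Scalable, Zvl };  // models -mrvv-vector-bits=

struct VlenInfo {
  unsigned bits;  // N taken from zvl<N>b
  bool exact;     // true: vlen is exactly N; false: N is only the minimum
};

// Extracts N from the first "_zvl<N>b" in an -march string (sketch only).
std::optional<VlenInfo> describe_vlen(const std::string &march,
                                      VectorBits mode) {
  const std::string tag = "_zvl";
  std::size_t pos = march.find(tag);
  if (pos == std::string::npos)
    return std::nullopt;
  pos += tag.size();
  const std::size_t start = pos;
  unsigned bits = 0;
  while (pos < march.size()
         && std::isdigit(static_cast<unsigned char>(march[pos]))) {
    bits = bits * 10 + static_cast<unsigned>(march[pos] - '0');
    ++pos;
  }
  if (pos == start || pos >= march.size() || march[pos] != 'b')
    return std::nullopt;
  return VlenInfo{bits, mode == VectorBits::Zvl};
}
```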
The internal option --param=riscv-autovec-preference will be replaced
by option -mrvv-vector-bits. Aka:
You can also use -fno-tree-vectorize in place of --param=riscv-autovec-preference=none.
The internal option --param=riscv-autovec-preference is unavailable after this
patch.
With -march=rv64gcv_zvl128b -mrvv-vector-bits=scalable we have (for min_vlen >= 128)
csrr t0,vlenb
sub sp,sp,t0
def v1
vs1r.v v1,0(sp)
vl1re32.v v1,0(sp)
use v1
csrr t0,vlenb
add sp,sp,t0
jr ra
With -march=rv64gcv_zvl128b -mrvv-vector-bits=zvl we have (for vlen = 128)
addi sp,sp,-16
def v1
vs1r.v v1,0(sp)
vl1re32.v v1,0(sp)
use v1
addi sp,sp,16
jr ra
The tests below passed for this patch.
* The full riscv regression test.
PR target/112817
gcc/ChangeLog:
* config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Replace
RVV_FIXED_VLMAX to RVV_VECTOR_BITS_ZVL.
* config/riscv/riscv-opts.h (enum riscv_autovec_preference_enum): Remove.
(enum rvv_vector_bits_enum): New enum for different RVV vector bits.
* config/riscv/riscv-selftests.cc (riscv_run_selftests): Update
comments for option replacement.
* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Replace enum of
riscv_autovec_preference to rvv_vector_bits.
(vls_mode_valid_p): Ditto.
(estimated_poly_value): Ditto.
* config/riscv/riscv.cc (riscv_convert_vector_chunks): Rename to
vector chunks and honor new option mrvv-vector-bits.
(riscv_override_options_internal): Update comments and rename the
vector chunks.
* config/riscv/riscv.opt: Add option mrvv-vector-bits and remove
internal option param=riscv-autovec-preference.
Jakub Jelinek [Fri, 1 Mar 2024 10:07:36 +0000 (11:07 +0100)]
function: Fix another TYPE_NO_NAMED_ARGS_STDARG_P spot
When looking at PR114175 (although that bug seems to be now a riscv backend
bug), I've noticed that for the TYPE_NO_NAMED_ARGS_STDARG_P functions which
return value through hidden reference, like
#include <stdarg.h>

struct S { char a[64]; };
int n;

struct S
foo (...)
{
  struct S s = {};
  va_list ap;
  va_start (ap);
  for (int i = 0; i < n; ++i)
    if ((i & 1))
      s.a[0] += va_arg (ap, double);
    else
      s.a[0] += va_arg (ap, int);
  va_end (ap);
  return s;
}
we were incorrectly calling assign_parms_setup_varargs twice, once
at the start of the function and once in
if (cfun->stdarg && !DECL_CHAIN (parm))
assign_parms_setup_varargs (&all, &data, false);
where parm is the last and only "named" parameter.
The first call, guarded with TYPE_NO_NAMED_ARGS_STDARG_P, was added in
r13-3549 and is needed for int bar (...) etc. functions using
va_start/va_arg/va_end, otherwise the
FOR_EACH_VEC_ELT (fnargs, i, parm)
in which the other call is will not iterate at all. But we shouldn't
be doing that if we have the hidden return pointer.
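A toy model of the control flow may help to see where the second call came
from. It is a simplification written for this note, not code from
function.cc: FnShape and count_varargs_setups are hypothetical, and fnargs is
reduced to a count that includes the hidden return pointer when there is one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of the function being expanded.
struct FnShape {
  bool no_named_args_stdarg;  // models TYPE_NO_NAMED_ARGS_STDARG_P
  bool is_stdarg;             // models cfun->stdarg
  std::size_t fnargs;         // entries in fnargs, hidden return pointer included
};

// Counts how often the varargs setup would run, before and after the patch.
int count_varargs_setups(const FnShape &f, bool patched) {
  int calls = 0;
  // Early call: originally guarded only by TYPE_NO_NAMED_ARGS_STDARG_P;
  // with the patch it additionally requires fnargs to be empty.
  if (f.no_named_args_stdarg && (!patched || f.fnargs == 0))
    ++calls;
  // Per-parameter loop: the setup fires on the last parameter of a stdarg
  // function, i.e. when there is no DECL_CHAIN successor.
  for (std::size_t i = 0; i < f.fnargs; ++i)
    if (f.is_stdarg && i + 1 == f.fnargs)
      ++calls;
  return calls;
}
```

With `struct S foo (...)` modelled as one entry in fnargs, the unpatched count
is 2 and the patched count is 1, while `int bar (...)` with empty fnargs stays
at 1 either way.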
With the following patch on the above testcase with -O0 -std=c23 the
assembly difference is:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rbx
subq $192, %rsp
.cfi_offset 3, -24
- movq %rdi, -192(%rbp)
- movq %rsi, -184(%rbp)
- movq %rdx, -176(%rbp)
- movq %rcx, -168(%rbp)
- movq %r8, -160(%rbp)
- movq %r9, -152(%rbp)
- testb %al, %al
- je .L2
- movaps %xmm0, -144(%rbp)
- movaps %xmm1, -128(%rbp)
- movaps %xmm2, -112(%rbp)
- movaps %xmm3, -96(%rbp)
- movaps %xmm4, -80(%rbp)
- movaps %xmm5, -64(%rbp)
- movaps %xmm6, -48(%rbp)
- movaps %xmm7, -32(%rbp)
-.L2:
movq %rdi, -312(%rbp)
movq %rdi, -192(%rbp)
movq %rsi, -184(%rbp)
movq %rdx, -176(%rbp)
movq %rcx, -168(%rbp)
movq %r8, -160(%rbp)
movq %r9, -152(%rbp)
testb %al, %al
- je .L13
+ je .L12
movaps %xmm0, -144(%rbp)
movaps %xmm1, -128(%rbp)
movaps %xmm2, -112(%rbp)
movaps %xmm3, -96(%rbp)
movaps %xmm4, -80(%rbp)
movaps %xmm5, -64(%rbp)
movaps %xmm6, -48(%rbp)
movaps %xmm7, -32(%rbp)
-.L13:
+.L12:
plus some renumbering of labels later on which clearly shows
that because of this bug, we were saving all the registers twice
rather than once. With -O2 -std=c23 some of it is DCEd, but we still get
subq $160, %rsp
.cfi_def_cfa_offset 168
- testb %al, %al
- je .L2
- movaps %xmm0, 24(%rsp)
- movaps %xmm1, 40(%rsp)
- movaps %xmm2, 56(%rsp)
- movaps %xmm3, 72(%rsp)
- movaps %xmm4, 88(%rsp)
- movaps %xmm5, 104(%rsp)
- movaps %xmm6, 120(%rsp)
- movaps %xmm7, 136(%rsp)
-.L2:
movq %rdi, -24(%rsp)
movq %rsi, -16(%rsp)
movq %rdx, -8(%rsp)
movq %rcx, (%rsp)
movq %r8, 8(%rsp)
movq %r9, 16(%rsp)
testb %al, %al
- je .L13
+ je .L12
movaps %xmm0, 24(%rsp)
movaps %xmm1, 40(%rsp)
movaps %xmm2, 56(%rsp)
movaps %xmm3, 72(%rsp)
movaps %xmm4, 88(%rsp)
movaps %xmm5, 104(%rsp)
movaps %xmm6, 120(%rsp)
movaps %xmm7, 136(%rsp)
-.L13:
+.L12:
difference, i.e. this time not all, but the floating point args
were conditionally all saved twice.
2024-03-01 Jakub Jelinek <jakub@redhat.com>
* function.cc (assign_parms): Only call assign_parms_setup_varargs
early for TYPE_NO_NAMED_ARGS_STDARG_P functions if fnargs is empty.
Jakub Jelinek [Fri, 1 Mar 2024 10:04:51 +0000 (11:04 +0100)]
bitint: Handle VCE from large/huge _BitInt SSA_NAME from load [PR114156]
When adding checks in which case not to merge a VIEW_CONVERT_EXPR from
large/huge _BitInt to vector/complex etc., I missed the case of loads.
Those are handled differently later.
Anyway, I think the load case is something we can handle just fine,
so the following patch does that instead of preventing the merging
gimple_lower_bitint; we'd then copy from memory to memory and do the
vce only on the second one, it is just better to vce the first one.
2024-03-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114156
* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Allow
rhs1 of a VCE to have no underlying variable if it is a load and
handle that case.
* elf.c (elf_add): Add the symbol table from a debuginfo file.
* Makefile.am (MAKETESTS): Add buildidfull and gnudebuglinkfull
variants of buildid and gnudebuglink tests.
(%_gnudebuglinkfull, %_buildidfull): New patterns.
* Makefile.in: Regenerate.
David Malcolm [Thu, 29 Feb 2024 22:57:08 +0000 (17:57 -0500)]
analyzer: fix ICE in call summarization [PR114159]
PR analyzer/114159 reports an ICE inside playback of call summaries
for very low values of --param=analyzer-max-svalue-depth=VAL.
Root cause is that call_summary_edge_info's ctor tries to evaluate
the function ptr of a gimple call stmt and assumes it gets a function *,
but with low values of --param=analyzer-max-svalue-depth=VAL we get
back an UNKNOWN svalue, rather than a pointer to a specific function.
Fix by adding a new call_info ctor that passes a specific
const function & from the call_summary_edge_info, rather than trying
to compute the function.
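The general idiom of passing a const reference where the callee is known, and
keeping a pointer only where "unknown" is a legitimate answer, looks roughly
like this. function_info, describe and describe_if_known are hypothetical
stand-ins; none of them is an analyzer API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for GCC's `function`; only a name is modelled.
struct function_info {
  std::string name;
};

// Must-be-non-null and never modified: take a const reference.
std::string describe(const function_info &fn) {
  return "function " + fn.name;
}

// Can-be-null: keep a pointer-to-const and handle the unknown case.
std::string describe_if_known(const function_info *fn) {
  return fn ? describe(*fn) : std::string("<unknown function>");
}

bool describe_demo() {
  const function_info callee{"test_fn"};
  assert(describe_if_known(&callee) == describe(callee));
  return true;
}
```

The reference-taking signature moves the null check to the caller, so a
callee that cannot be resolved is handled once at the boundary.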
In doing so, I noticed that the analyzer was using "function *" despite
not modifying functions, and was sloppy about can-be-null versus
must-be-non-null function pointers, so I "constified" the function, and
converted the many places where the function must be non-null to be
"const function &".
gcc/analyzer/ChangeLog:
PR analyzer/114159
* analyzer.cc: Include "tree-dfa.h".
(get_ssa_default_def): New decl.
* analyzer.h (get_ssa_default_def): New.
* call-info.cc (call_info::call_info): New ctor taking an explicit
called_fn.
* call-info.h (call_info::call_info): Likewise.
* call-summary.cc (call_summary_replay::call_summary_replay):
Convert param from function * to const function &.
* call-summary.h (call_summary_replay::call_summary_replay):
Likewise.
* checker-event.h (state_change_event::get_dest_function):
Constify return value.
* engine.cc (point_and_state::validate): Update for conversion to
const function &.
(exploded_node::on_stmt): Likewise.
(call_summary_edge_info::call_summary_edge_info): Likewise.
Pass in called_fn to call_info ctor.
(exploded_node::replay_call_summaries): Update for conversion to
const function &. Convert per_function_data from * to &.
(exploded_node::replay_call_summary): Update for conversion to
const function &.
(exploded_graph::add_function_entry): Likewise.
(toplevel_function_p): Likewise.
(add_tainted_args_callback): Likewise.
(exploded_graph::build_initial_worklist): Likewise.
(exploded_graph::maybe_create_dynamic_call): Likewise.
(maybe_update_for_edge): Likewise.
(exploded_graph::on_escaped_function): Likewise.
* exploded-graph.h (exploded_node::replay_call_summaries):
Likewise.
(exploded_node::replay_call_summary): Likewise.
(exploded_graph::add_function_entry): Likewise.
* program-point.cc (function_point::from_function_entry):
Likewise.
(program_point::from_function_entry): Likewise.
* program-point.h (function_point::from_function_entry): Likewise.
(program_point::from_function_entry): Likewise.
* program-state.cc (program_state::push_frame): Likewise.
(program_state::get_current_function): Constify return type.
* program-state.h (program_state::push_frame): Update for
conversion to const function &.
(program_state::get_current_function): Likewise.
* region-model-manager.cc
(region_model_manager::get_frame_region): Likewise.
* region-model-manager.h
(region_model_manager::get_frame_region): Likewise.
* region-model.cc (region_model::called_from_main_p): Likewise.
(region_model::update_for_gcall): Likewise.
(region_model::push_frame): Likewise.
(region_model::get_current_function): Constify return type.
(region_model::pop_frame): Update for conversion to
const function &.
(selftest::test_stack_frames): Likewise.
(selftest::test_get_representative_path_var): Likewise.
(selftest::test_state_merging): Likewise.
(selftest::test_alloca): Likewise.
* region-model.h (region_model::push_frame): Likewise.
(region_model::get_current_function): Likewise.
* region.cc (frame_region::dump_to_pp): Likewise.
(frame_region::get_region_for_local): Likewise.
* region.h (class frame_region): Likewise.
* sm-signal.cc (signal_unsafe_call::describe_state_change):
Likewise.
(update_model_for_signal_handler): Likewise.
(signal_delivery_edge_info_t::update_model): Likewise.
(register_signal_handler::impl_transition): Likewise.
* state-purge.cc (class gimple_op_visitor): Likewise.
(state_purge_map::state_purge_map): Likewise.
(state_purge_map::get_or_create_data_for_decl): Likewise.
(state_purge_per_ssa_name::state_purge_per_ssa_name): Likewise.
(state_purge_per_ssa_name::add_to_worklist): Likewise.
(state_purge_per_ssa_name::process_point): Likewise.
(state_purge_per_decl::add_to_worklist): Likewise.
(state_purge_annotator::print_needed): Likewise.
* state-purge.h
(state_purge_map::get_or_create_data_for_decl): Likewise.
(class state_purge_per_tree): Likewise.
(class state_purge_per_ssa_name): Likewise.
(class state_purge_per_decl): Likewise.
* supergraph.cc (supergraph::dump_dot_to_pp): Likewise.
* supergraph.h
(supergraph::get_node_for_function_entry): Likewise.
(supergraph::get_node_for_function_exit): Likewise.
Georg-Johann Lay [Thu, 29 Feb 2024 16:19:27 +0000 (17:19 +0100)]
AVR: target/114100 - Better indirect accesses for reduced Tiny
The Reduced Tiny core does not support indirect addressing with offset,
which basically means that every indirect memory access with a size
of more than one byte is effectively POST_INC or PRE_DEC. The lack of
that addressing mode is currently handled by pretending to support it,
and then letting the insn printers add and subtract offsets again as needed.
For example, the following C code
- A post-reload split into "real" instructions during .split2.
- A new avr-specific mini pass .avr-fuse-add that runs before
RTL peephole and that tries to combine the generated pointer
additions into memory accesses to form POST_INC or PRE_DEC.
gcc/
PR target/114100
* doc/invoke.texi (AVR Options) <-mfuse-add>: Document.
* config/avr/avr.opt (-mfuse-add=): New target option.
* common/config/avr/avr-common.cc (avr_option_optimization_table)
[OPT_LEVELS_1_PLUS]: Set -mfuse-add=1.
[OPT_LEVELS_2_PLUS]: Set -mfuse-add=2.
* config/avr/avr-passes.def (avr_pass_fuse_add): Insert new pass.
* config/avr/avr-protos.h (avr_split_tiny_move)
(make_avr_pass_fuse_add): New protos.
* config/avr/avr.md [AVR_TINY]: New post-reload splitter uses
avr_split_tiny_move to split indirect memory accesses.
(gen_move_clobbercc): New define_expand helper.
* config/avr/avr.cc (avr_pass_data_fuse_add): New pass data.
(avr_pass_fuse_add): New class from rtl_opt_pass.
(make_avr_pass_fuse_add, avr_split_tiny_move): New functions.
(reg_seen_between_p, emit_move_ccc, emit_move_ccc_after): New functions.
(avr_legitimate_address_p) [AVR_TINY]: Don't restrict offsets
of PLUS addressing for AVR_TINY.
(avr_regno_mode_code_ok_for_base_p) [AVR_TINY]: Ignore -mstrict-X.
(avr_out_plus_1) [AVR_TINY]: Tweak ++Y and --Y.
(avr_mode_code_base_reg_class) [AVR_TINY]: Always return POINTER_REGS.
Jonathan Wakely [Wed, 28 Feb 2024 15:05:08 +0000 (15:05 +0000)]
libstdc++: Fix std::basic_format_arg::handle for BasicFormatters
std::basic_format_arg::handle is supposed to format its value as const
if that is valid, to reduce the number of instantiations of the
formatter's format function. I made a silly typo so that it checks
formattable_with<TD, Context> not formattable_with<const TD, Context>,
which breaks support for BasicFormatters i.e. ones that can only format
non-const types.
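A stripped-down model of the selection, with callables in place of
std::formatter, shows why the const-qualified type is the one to test. All
names below (formattable_with, maybe_const_t, Counter, ReadOnlyFmt,
MutatingFmt, format_once, format_demo) are defined only for this sketch and
are not the libstdc++ internals.

```cpp
#include <cassert>
#include <type_traits>

// Sketch of the check: can Fmt be called with an lvalue of type T?
template <typename T, typename Fmt>
inline constexpr bool formattable_with = std::is_invocable_v<Fmt &, T &>;

// Prefer `const T` when that is formattable; otherwise fall back to `T`.
// Testing plain `T` here instead would pick `const T` for every formatter.
template <typename T, typename Fmt>
using maybe_const_t =
    std::conditional_t<formattable_with<const T, Fmt>, const T, T>;

struct Counter { int n = 0; };

// Accepts const arguments.
struct ReadOnlyFmt {
  int operator()(const Counter &c) const { return c.n; }
};

// Only accepts non-const arguments, like a BasicFormatter in this analogy.
struct MutatingFmt {
  int operator()(Counter &c) const { return ++c.n; }
};

static_assert(std::is_same_v<maybe_const_t<Counter, ReadOnlyFmt>, const Counter>);
static_assert(std::is_same_v<maybe_const_t<Counter, MutatingFmt>, Counter>);

template <typename Fmt>
int format_once(Counter &c, Fmt fmt) {
  using Arg = maybe_const_t<Counter, Fmt>;
  return fmt(static_cast<Arg &>(c));
}

int format_demo() {
  Counter c;
  assert(format_once(c, ReadOnlyFmt{}) == 0);
  assert(format_once(c, MutatingFmt{}) == 1);
  return c.n;
}
```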
There's a static_assert in the handle constructor which is supposed to
improve diagnostics for trying to format a const argument with a
formatter that doesn't support it. That condition can't fail, because
the std::basic_format_arg constructor is already constrained to check
that the argument type is formattable. The static_assert can be removed.
libstdc++-v3/ChangeLog:
* include/std/format (basic_format_arg::handle::__maybe_const_t):
Fix condition to check if const type is formattable.
(basic_format_arg::handle::handle(T&)): Remove redundant
static_assert.
* testsuite/std/format/formatter/basic.cc: New test.
Jonathan Wakely [Tue, 27 Feb 2024 17:50:34 +0000 (17:50 +0000)]
libstdc++: Fix conditions for using memcmp in std::lexicographical_compare_three_way [PR113960]
The change in r11-2981-g2f983fa69005b6 meant that
std::lexicographical_compare_three_way started to use memcmp for
unsigned integers on big endian targets, but for that to be valid we
need the two value types to have the same size and we need to use that
size to compute the length passed to memcmp.
I already defined a __is_memcmp_ordered_with trait that does the right
checks, std::lexicographical_compare_three_way just needs to use it.
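The two requirements can be sketched as follows. This is not the libstdc++
trait: memcmp_ordered_with_sketch, target_is_big_endian and compare_prefix are
made-up names, the endianness test relies on the GCC/Clang predefined
__BYTE_ORDER__ macros and conservatively assumes "not big endian" when they
are missing, and details such as excluding bool are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) \
    && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
inline constexpr bool target_is_big_endian = true;
#else
inline constexpr bool target_is_big_endian = false;  // conservative default
#endif

// memcmp order matches value order only for unsigned integers of the same
// size, and for multi-byte elements only on big endian targets.
template <typename T, typename U>
inline constexpr bool memcmp_ordered_with_sketch =
    std::is_integral_v<T> && std::is_unsigned_v<T>
    && std::is_integral_v<U> && std::is_unsigned_v<U>
    && sizeof(T) == sizeof(U)
    && (sizeof(T) == 1 || target_is_big_endian);

// Three-way compares the first `count` elements; returns -1, 0 or 1.
template <typename T, typename U>
int compare_prefix(const T *a, const U *b, std::size_t count) {
  if constexpr (memcmp_ordered_with_sketch<T, U>) {
    // The length is in bytes, so scale by the (common) element size.
    const int r = std::memcmp(a, b, count * sizeof(T));
    return (r > 0) - (r < 0);
  } else {
    for (std::size_t i = 0; i < count; ++i)
      if (a[i] != b[i])
        return a[i] < b[i] ? -1 : 1;
    return 0;
  }
}

int mixed_width_demo() {
  const unsigned char bytes[] = {1, 2, 3};
  const unsigned int words[] = {1, 2, 4};
  // Element sizes differ, so the element-wise loop must be used.
  return compare_prefix(bytes, words, 3);
}

int same_width_demo() {
  const unsigned short x[] = {0x0100};
  const unsigned short y[] = {0x0001};
  assert(compare_prefix(x, x, 1) == 0);
  return compare_prefix(x, y, 1);
}
```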
libstdc++-v3/ChangeLog:
PR libstdc++/113960
* include/bits/stl_algobase.h (__is_byte_iter): Replace with ...
(__memcmp_ordered_with): New concept.
(lexicographical_compare_three_way): Use __memcmp_ordered_with
instead of __is_byte_iter. Use correct length for memcmp.
* testsuite/25_algorithms/lexicographical_compare_three_way/113960.cc:
New test.
Georg-Johann Lay [Thu, 29 Feb 2024 17:08:45 +0000 (18:08 +0100)]
AVR: target/114132 - Code sets up a frame pointer without need.
The condition CUMULATIVE_ARGS.nregs == 0 in avr_frame_pointer_required_p()
means that no more argument registers are left, but that's not the same
condition that tells whether an argument pointer is required.
PR target/114132
gcc/
* config/avr/avr.h (CUMULATIVE_ARGS) <has_stack_args>: New field.
* config/avr/avr.cc (avr_init_cumulative_args): Initialize it.
(avr_function_arg): Set it.
(avr_frame_pointer_required_p): Use it instead of .nregs.
gcc/testsuite/
* gcc.target/avr/pr114132-1.c: New test.
* gcc.target/avr/torture/pr114132-2.c: New test.
Marek Polacek [Tue, 20 Feb 2024 20:55:55 +0000 (15:55 -0500)]
c++: -Wuninitialized when binding a ref to uninit DM [PR113987]
This PR asks that our -Wuninitialized for mem-initializers does
not warn when binding a reference to an uninitialized data member.
We already check !INDIRECT_TYPE_P in find_uninit_fields_r, but
that won't catch binding a parameter of a reference type to an
uninitialized field, as in:
  struct S { S (int&); };
  struct T {
    T() : s(i) {}
    S s;
    int i;
  };
This patch adds a new function to handle this case.
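A self-contained variant of the snippet above may help show why no warning is wanted: the constructor of S only binds the reference and never reads the referenced value, so nothing uninitialized is used. This is a sketch for illustration (the function read_through_ref is made up), not the test case added by the patch.

```cpp
// Binding a reference to a member that is not yet initialized is fine,
// as long as nothing reads through it before the member is initialized.
struct S {
    int& ref;
    explicit S(int& r) : ref(r) {}   // only binds, never reads r
};

struct T {
    S s;     // initialized first (declaration order)
    int i;   // initialized second
    T() : s(i), i(42) {}   // s(i) binds a reference to the uninitialized i
};

// Reads i through the stored reference after construction has finished.
int read_through_ref()
{
    T t;
    return t.s.ref;
}
```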
Tom Tromey [Tue, 27 Feb 2024 01:21:03 +0000 (18:21 -0700)]
Fix PR libcc1/113977
PR libcc1/113977 points out a case where a simple expression is
rejected with a compiler error message. The bug here is that gdb does
not inform the plugin of the correct alignment -- in fact, there is no
way to do that.
This patch adds a new method to allow the alignment to be set, and
bumps the C front end protocol version.
It also includes some updates to various comments in 'include', done
here to simplify the merge to binutils-gdb.
Tom Tromey [Fri, 23 Feb 2024 02:36:53 +0000 (19:36 -0700)]
Fix version negotiation in libcc1 plugins
This fixes version negotiation in the libcc1 plugins. It's done in a
simple way: the version number from the context object is now passed
to base_gdb_plugin.
The idea behind this is that when the client (gdb) requests version N,
the plugin should respond with the newest version that it knows of
that is backward compatible to N. That is, the connection can be
upgraded. Note that the protocol does not change much, and no
backward incompatibilities have ever been needed.
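The rule described above can be sketched roughly as follows. The function name and the use of -1 for a request the plugin cannot satisfy are assumptions made for illustration; they are not taken from libcc1.

```cpp
// Hypothetical model of the negotiation rule; not libcc1 code.
// requested:    version N asked for by the client (gdb)
// newest_known: newest version the plugin implements
// Returns the version the plugin answers with, or -1 (an assumption of
// this sketch) when the client asks for something the plugin does not know.
int negotiate_version(int requested, int newest_known)
{
    if (requested < 0 || requested > newest_known)
        return -1;
    // Every known version is assumed to be backward compatible with all
    // older ones, so the connection can be upgraded to the newest one.
    return newest_known;
}
```

With this model, a client requesting version 0 from a plugin that knows version 1 is answered with version 1.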
The C plugin is also changed to advertise GCC_C_FE_VERSION_1.
The version negotiation approach should of course be documented, but I
did that in a subsequent patch, in order to only have one patch
touching the 'include' directory and thus needing a merge to
binutils-gdb.
libcc1
* libcp1.cc (libcp1::libcp1): Use FE version number from context.
* libcc1.cc (libcc1::libcc1): Use FE version number from context.
(c_vtable): Use GCC_C_FE_VERSION_1.
Tom Tromey [Fri, 23 Feb 2024 02:34:23 +0000 (19:34 -0700)]
Change 'v1' float and int code to fall back to v0
While working on another patch, I discovered that the libcc1 plugin
code never did version negotiation correctly. So, the patches to
introduce v1 never did anything -- the new code, as far as I know, has
never been run.
Making version negotiation work shows that the existing code causes
crashes. For example, safe_lookup_builtin_type might return
error_mark_node in some cases, which the callers aren't prepared to
accept.
Looking into it some more, I couldn't find any justification for this
v1 code for the C compiler plugin. Since it's not run at all, it's
also clear that removing it doesn't cause any regressions in gdb.
However, rather than remove it, this patch changes it to handle
ERROR_MARK better, and then to fall back to the v0 code if the new
code fails to find the type it's looking for.
libcc1
* libcc1plugin.cc (safe_lookup_builtin_type): Handle ERROR_MARK.
(plugin_int_type): Fall back to plugin_int_type_v0.
(plugin_float_type): Fall back to plugin_float_type_v0.
Gaius Mulley [Thu, 29 Feb 2024 13:42:30 +0000 (13:42 +0000)]
PR modula2/102344 TestLong4.mod FAILs
This is a testsuite fix for TestLong4.mod so that it
succeeds on 32 bit systems. The original TestLong4.mod has
been rewritten as testing MAX(LONGCARD) into the variable l.
The new testlong4.mod has been added to cpp/pass. The new
testcode uses the C preprocessor to select the appropriate
constant literal depending upon __SIZEOF_LONG__.
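The same selection technique, shown in C++ rather than Modula-2 as a minimal sketch: the preprocessor picks the literal that matches the size of long. The constant name max_longcard is made up for this example, and __SIZEOF_LONG__ is assumed to be predefined, as it is with GCC and Clang.

```cpp
#include <climits>

// __SIZEOF_LONG__ is predefined by GCC and Clang; with other compilers
// this sketch stops at the #error below.
#if !defined(__SIZEOF_LONG__)
#  error "__SIZEOF_LONG__ is not defined by this compiler"
#elif __SIZEOF_LONG__ == 8
static const unsigned long max_longcard = 18446744073709551615UL;
#elif __SIZEOF_LONG__ == 4
static const unsigned long max_longcard = 4294967295UL;
#else
#  error "unexpected size of long"
#endif
```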
gcc/testsuite/ChangeLog:
PR modula2/102344
* gm2/pim/pass/TestLong4.mod: Rewrite.
* gm2/cpp/pass/testlong4.mod: New test.
Andrew Pinski [Thu, 29 Feb 2024 06:39:32 +0000 (22:39 -0800)]
aarch64: Fix memtag builtins vs GC [PR108174]
The memtag builtins were being GC'ed away so we end up
with a crash sometimes (maybe even wrong code).
This fixes that issue by adding GTY on the variable/struct
aarch64_memtag_builtin_data.
Committed as obvious after a build/test for aarch64-linux-gnu.
PR target/108174
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_memtag_builtin_data): Make
static and mark with GTY.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/memtag_4.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Xi Ruoyao [Sun, 25 Feb 2024 12:44:34 +0000 (20:44 +0800)]
LoongArch: Remove unneeded sign extension after crc/crcc instructions
The specification of crc/crcc instructions is clear that the output is
sign-extended to GRLEN. Add a define_insn to tell the compiler this
fact and allow it to remove the unneeded sign extension on crc/crcc
output. As crc/crcc instructions are usually used in a tight loop,
this should produce a significant performance gain.
gcc/ChangeLog:
* config/loongarch/loongarch.md
(loongarch_<crc>_w_<size>_w_extended): New define_insn.
Nathaniel Shead [Fri, 16 Feb 2024 04:52:48 +0000 (15:52 +1100)]
c++: Support lambdas attached to more places in modules [PR111710]
The fix for PR107398 weakened the restrictions that lambdas must belong
to namespace scope. However this was not sufficient: we also need to
allow lambdas attached to FIELD_DECLs, PARM_DECLs, and TYPE_DECLs.
For field decls we key the lambda to its class rather than the field
itself. Otherwise we can run into issues when deduplicating the lambda's
TYPE_DECL, because when loading its context we load the containing field
before we've deduplicated the keyed lambda, causing mismatches; by
keying to the class instead we defer checking keyed declarations until
deduplication has completed.
Additionally, by [basic.link] p15.2 a lambda defined anywhere in a
class-specifier should not be TU-local, which includes base-class
declarations, so ensure that lambdas declared there are keyed
appropriately as well.
Because this now requires 'DECL_MODULE_KEYED_DECLS_P' to be checked on a
fairly large number of different kinds of DECLs, and that in general
it's safe to just get 'false' as a result of a check on an unexpected
DECL type, this patch also removes the tree checking from the accessor.
Finally, to handle deduplicating templated lambda fields, we need to
ensure that we can determine that two lambdas from different field decls
match, so we ensure that we also deduplicate LAMBDA_EXPRs on stream in.
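For orientation, two of the attachment points named above look like this in source code. This is only a sketch with made-up names, assuming the usual reading that a lambda in a default member initializer belongs to the field and a lambda in a default argument belongs to the parameter; it is not one of the new tests.

```cpp
// A lambda in a default member initializer is attached to the field.
struct Widget {
    int size = [] { return 16; }();
};

// A lambda in a default argument is attached to the parameter.
inline int scaled(int factor = [] { return 3; }())
{
    return factor * 2;
}
```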
PR c++/111710
gcc/cp/ChangeLog:
* cp-tree.h (DECL_MODULE_KEYED_DECLS_P): Remove tree checking.
(struct lang_decl_base): Update comments and fix whitespace.
* module.cc (trees_out::lang_decl_bools): Always write
module_keyed_decls_p flag...
(trees_in::lang_decl_bools): ...and always read it.
(trees_out::decl_value): Handle all kinds of keyed decls.
(trees_in::decl_value): Likewise.
(trees_in::tree_value): Deduplicate LAMBDA_EXPRs.
(maybe_key_decl): Also support lambdas attached to fields,
parameters, and types. Key lambdas attached to fields to their
class.
(trees_out::get_merge_kind): Likewise.
(trees_out::key_mergeable): Likewise.
(trees_in::key_mergeable): Support keyed decls in a TYPE_DECL
container.
* parser.cc (cp_parser_class_head): Start a lambda scope when
parsing base classes.
gcc/testsuite/ChangeLog:
* g++.dg/modules/lambda-7.h: New test.
* g++.dg/modules/lambda-7_a.H: New test.
* g++.dg/modules/lambda-7_b.C: New test.
* g++.dg/modules/lambda-7_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Kito Cheng [Wed, 28 Feb 2024 08:01:52 +0000 (16:01 +0800)]
RISC-V: Fix __atomic_compare_exchange with 32 bit value on RV64
atomic_compare_and_swapsi will use lr.w to obtain the original value,
which sign extends to DI. RV64 only has DI comparisons, so we also need
to sign extend the expected value to DI as otherwise the comparison will
fail when the expected value has the 32nd bit set.
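The mismatch can be reproduced with plain integer arithmetic. The sketch below models 64-bit registers as std::uint64_t; the helper names are made up for illustration and are not part of the patch.

```cpp
#include <cstdint>

// A 32-bit value placed in a 64-bit register without sign extension.
std::uint64_t zero_extend32(std::uint32_t v)
{
    return v;
}

// A 32-bit value sign-extended to 64 bits, as lr.w does on RV64.
std::uint64_t sign_extend32(std::uint32_t v)
{
    // Flip the sign bit, then subtract 2^31: well defined for every input.
    const std::int64_t s =
        static_cast<std::int64_t>(v ^ 0x80000000u) - 0x80000000LL;
    return static_cast<std::uint64_t>(s);
}

// RV64 compares full 64-bit registers.
bool registers_equal(std::uint64_t loaded, std::uint64_t expected)
{
    return loaded == expected;
}
```

For 0x80000000 the loaded register holds 0xFFFFFFFF80000000, so comparing it with a zero-extended expected value fails even though the 32-bit values are equal; sign-extending the expected value makes the comparison succeed.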
gcc/ChangeLog:
PR target/114130
* config/riscv/sync.md (atomic_compare_and_swap<mode>): Sign
extend the expected value if needed.
Andrew Pinski [Thu, 29 Feb 2024 02:36:12 +0000 (02:36 +0000)]
Add libcc1 to bug components
As found by Tom Tromey in https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646807.html
libcc1 is not listed as a bug component even though it is there in bugzilla.
This fixes that oversight.
Committed as obvious after testing using git gcc-verify on a patch.
This patch allows parameterized derived types to compile successfully
when typebound procedures are specified in the type specification.
Furthermore, it allows function calls for PDTs by setting the
f2k_derived space of PDT instances to reference their original template,
thereby giving it referential access to the typebound procedures of the
template. Lastly, it adds a check for deferred length parameters of
PDTs in CLASS declaration statements, and correctly throws an error if
such declarations are missing POINTER or ALLOCATABLE attributes.
2024-02-25 Alexander Westbrooks <alexanderw@gcc.gnu.org>
gcc/fortran/ChangeLog:
PR fortran/82943
PR fortran/86148
PR fortran/86268
* decl.cc (gfc_get_pdt_instance): Set the PDT instance field
'f2k_derived', if not set already, to point to the given
PDT template 'f2k_derived' namespace in order to give the
PDT instance referential access to the typebound procedures
of the template.
* gfortran.h (gfc_pdt_is_instance_of): Add prototype.
* resolve.cc (resolve_typebound_procedure): If the derived type
does not have the attribute 'pdt_template' set, compare the
dummy argument to the 'resolve_bindings_derived' type like usual.
If the derived type is a 'pdt_template', then check if the
dummy argument is an instance of the PDT template. If the derived
type is a PDT template, and the dummy argument is an instance of
that template, but the dummy argument 'param_list' is not
SPEC_ASSUMED, check if there are any LEN parameters in the
dummy argument. If there are no LEN parameters, then this implies
that there are only KIND parameters in the dummy argument.
If there are LEN parameters, this would be an error, for all
LEN parameters for the dummy argument MUST be assumed for
typebound procedures of PDTs.
(resolve_pdt): Add a check for ALLOCATABLE and POINTER attributes for
SPEC_DEFERRED parameters of PDT class symbols. ALLOCATABLE and
POINTER attributes for a PDT class symbol are stored in the
'class_pointer' and 'allocatable' attributes of the '_data'
component respectively.
* symbol.cc (gfc_pdt_is_instance_of): New function.
gcc/testsuite/ChangeLog:
PR fortran/82943
PR fortran/86148
PR fortran/86268
* gfortran.dg/pdt_4.f03: Update modified error message.
* gfortran.dg/pdt_34.f03: New test.
* gfortran.dg/pdt_35.f03: New test.
* gfortran.dg/pdt_36.f03: New test.
* gfortran.dg/pdt_37.f03: New test.
Signed-off-by: Alexander Westbrooks <alexanderw@gcc.gnu.org>
Jakub Jelinek [Wed, 28 Feb 2024 22:20:13 +0000 (23:20 +0100)]
c++: Fix explicit instantiation of const variable templates after earlier implicit instantiation [PR113976]
Already previously instantiated const variable templates had
cp_apply_type_quals_to_decl called when they were instantiated,
but if they need runtime initialization, their TREE_READONLY flag
has been subsequently cleared.
Explicit variable template instantiation calls grokdeclarator which
calls cp_apply_type_quals_to_decl on them again, setting TREE_READONLY
flag again, but nothing clears it afterwards, so we emit such
instantiations into rodata sections and segfault when the dynamic
initialization attempts to initialize them.
The following patch fixes that by not calling cp_apply_type_quals_to_decl
on already instantiated variable declarations.
2024-02-28 Jakub Jelinek <jakub@redhat.com>
Patrick Palka <ppalka@redhat.com>
The kernel verifier complains in some cases about the missing func_info
implementation in .BTF.ext. This patch implements it.
Strings are cached locally in coreout.cc to avoid adding duplicated
strings in the string list. This string deduplication should eventually
be moved to the CTFC functions such that this happens widely.
With this implementation, the CO-RE relocations information was also
simplified and integrated with the FuncInfo structures.
gcc/ChangeLog:
PR target/113453
* config/bpf/bpf.cc (bpf_function_prologue): Define target
hook.
* config/bpf/coreout.cc (brf_ext_info_section)
(btf_ext_info): Move from coreout.h
(btf_ext_funcinfo, btf_ext_lineinfo): Add struct.
(bpf_core_reloc): Rename to btf_ext_core_reloc.
(btf_ext): Add static variable.
(btfext_info_sec_find_or_add, SEARCH_NODE_AND_RETURN)
(bpf_create_or_find_funcinfo, bpt_create_core_reloc)
(btf_ext_add_string, btf_funcinfo_type_callback)
(btf_add_func_info_for, btf_validate_funcinfo)
(btf_ext_info_len, output_btfext_func_info): Add function.
(output_btfext_header, bpf_core_reloc_add)
(output_btfext_core_relocs, btf_ext_init, btf_ext_output):
Change to support new structs.
* config/bpf/coreout.h (btf_ext_funcinfo, btf_ext_lineinfo):
Move and change in coreout.cc.
(btf_add_func_info_for, btf_ext_add_string): Add prototypes.
bpf: Always emit .BTF.ext section if generating BTF
BPF applications, when generating BTF information, should always create a
.BTF.ext section.
The current implementation was only creating it when the -mco-re option was used.
This patch makes .BTF.ext always be generated for BPF target objects.
The patch also adds conditions around btf_finalize function call
such that BTF deallocation happens later for BPF target.
For BPF, btf_finalize is only called after .BTF.ext is generated.
gcc/ChangeLog:
* config/bpf/bpf.cc (bpf_option_override): Make .BTF.ext
enabled by default for BPF.
(bpf_file_end): Call BTF deallocation.
(bpf_asm_init_sections): Correct condition.
* dwarf2ctf.cc (ctf_debug_finalize): Conditionally execute BTF
deallocation.
(ctf_debug_finish): Correct condition for calling
ctf_debug_finalize.
This patch corrects the addition of +1 on the type id, which originally
was done in the wrong location and led to func_dtd->dtd_type for
BTF_KIND_FUNC struct data to contain the type id of the previous entry.
gcc/ChangeLog:
* btfout.cc (btf_collect_dataset): Corrects BTF type id.
Richard Biener [Wed, 28 Feb 2024 11:37:07 +0000 (12:37 +0100)]
tree-optimization/114121 - wrong VN with context sensitive range info
When VN ends up exploiting range-info specifying the ao_ref offset
and max_size we have to make sure to reflect this in the hashtable
entry for the recorded expression. The PR113831 fix handled the
case where we can encode this in the operands themselves but this
bug shows the issue is more widespread.
So instead of altering the operands the following instead records
this extra info that's possibly used, only throwing it away when
the value-numbering didn't come up with a non-VARYING value which
is an important detail to preserve CSE as opposed to constant
folding which is where all cases currently known popped up.
With this the original PR113831 fix can be reverted.
PR tree-optimization/114121
* tree-ssa-sccvn.h (vn_reference_s::offset,
vn_reference_s::max_size): New fields.
(vn_reference_insert_pieces): Adjust prototype.
* tree-ssa-pre.cc (phi_translate_1): Preserve offset/max_size.
* tree-ssa-sccvn.cc (vn_reference_eq): Compare offset and
size, allow using "don't know" state.
(vn_walk_cb_data::finish): Pass along offset/max_size.
(vn_reference_lookup_or_insert_for_pieces): Take offset and
max_size as argument and use it.
(vn_reference_lookup_3): Properly adjust offset and max_size
according to the adjusted ao_ref.
(vn_reference_lookup_pieces): Initialize offset and max_size.
(vn_reference_lookup): Likewise.
(vn_reference_lookup_call): Likewise.
(vn_reference_insert): Likewise.
(visit_reference_op_call): Likewise.
(vn_reference_insert_pieces): Take offset and max_size
as argument and use it.
Jonathan Wakely [Thu, 22 Feb 2024 13:06:59 +0000 (13:06 +0000)]
libstdc++: Test error handling in std::print
The standard requires an exception if std::print fails to write to a
FILE*. When writing to a std::ostream, failure to format the arguments
doesn't affect the stream state, but failure to write to the stream
sets badbit.
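The stream side of this behaviour can be observed without std::print, which needs C++23. The sketch below uses operator<< instead and a made-up stream buffer (FailingBuf) whose writes always fail; it only illustrates how a failed write surfaces as badbit and is not the test added here.

```cpp
#include <ostream>
#include <streambuf>

// A stream buffer whose every write fails, standing in for a broken sink.
struct FailingBuf : std::streambuf {
    int_type overflow(int_type) override { return traits_type::eof(); }
};

// Writes to a stream backed by FailingBuf and reports whether badbit is set.
bool write_sets_badbit()
{
    FailingBuf buf;
    std::ostream os(&buf);
    os << "hello";
    return os.bad();
}
```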
libstdc++-v3/ChangeLog:
* testsuite/27_io/basic_ostream/print/1.cc: Check error
handling.
* testsuite/27_io/print/1.cc: Likewise.
Jakub Jelinek [Wed, 28 Feb 2024 11:09:04 +0000 (12:09 +0100)]
testsuite: XFAIL ssa-sink-18.c also on powerpc64 [PR111462]
powerpc64-linux apparently (not very surprisingly) behaves the same
way as powerpc64le-linux and has 4 sunk statements rather than 5,
so we should xfail it on powerpc64*-*-* rather than just powerpc64le-*-*.
powerpc-linux has 3 sunk statements, but the scan pattern is done for
lp64 only as the comment explains.
2024-02-28 Jakub Jelinek <jakub@redhat.com>
PR testsuite/111462
* gcc.dg/tree-ssa/ssa-sink-18.c: XFAIL also on powerpc64.
Juergen Christ [Mon, 19 Feb 2024 09:10:35 +0000 (10:10 +0100)]
Only emulate integral vectors.
The emulation via word mode tries to perform integer arithmetic on floating
point values instead of floating point arithmetic. This leads to
mis-compilations.
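A small sketch of why this goes wrong: adding the bit patterns of two floats as integers does not yield the bit pattern of their floating point sum. The function name is made up, and float is assumed to be a 32-bit IEEE 754 type.

```cpp
#include <cstdint>
#include <cstring>

// Adds two floats by adding their bit patterns as 32-bit integers, which is
// effectively what an integer emulation in word mode would do.
// Assumes float is a 32-bit IEEE 754 type.
float add_as_integers(float a, float b)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "32-bit float expected");
    std::uint32_t ia, ib;
    std::memcpy(&ia, &a, sizeof ia);
    std::memcpy(&ib, &b, sizeof ib);
    const std::uint32_t sum = ia + ib;   // integer add, wraps modulo 2^32
    float r;
    std::memcpy(&r, &sum, sizeof r);
    return r;
}
```

Under that assumption, 1.0f has the bit pattern 0x3F800000, so the integer sum of two of them is 0x7F000000, which is the float 2^127 rather than 2.0f.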
Failure occurred on s390x on these existing test cases:
gcc.dg/vect/tsvc/vect-tsvc-s112.c
gcc.dg/vect/tsvc/vect-tsvc-s113.c
gcc.dg/vect/tsvc/vect-tsvc-s119.c
gcc.dg/vect/tsvc/vect-tsvc-s121.c
gcc.dg/vect/tsvc/vect-tsvc-s131.c
gcc.dg/vect/tsvc/vect-tsvc-s132.c
gcc.dg/vect/tsvc/vect-tsvc-s2233.c
gcc.dg/vect/tsvc/vect-tsvc-s421.c
gcc.dg/vect/vect-alias-check-14.c
gcc.target/s390/vector/partial/s390-vec-length-epil-run-1.c
gcc.target/s390/vector/partial/s390-vec-length-epil-run-3.c
gcc.target/s390/vector/partial/s390-vec-length-full-run-3.c
gcc/ChangeLog:
PR tree-optimization/114075
* tree-vect-stmts.cc (vectorizable_operation): Don't emulate floating
point vectors
Jakub Jelinek [Wed, 28 Feb 2024 08:59:45 +0000 (09:59 +0100)]
graphite: Fix non-INTEGER_TYPE integral comparison handling [PR114041]
The following testcases are miscompiled, because graphite ignores boolean,
enumerated or _BitInt comparisons, rewrites the code as if the comparisons
were always true or always false.
The INTEGER_TYPE checks were initially added in r6-2239 but at that point
it was both in add_conditions_to_domain and in parameter_index_in_region.
Later on the check was also added to stmt_simple_for_scop_p, and finally
r8-3931 changed the stmt_simple_for_scop_p check to INTEGRAL_TYPE_P
and turned the parameter_index_in_region -> assign_parameter_index_in_region
into INTEGRAL_TYPE_P assertion, but the add_conditions_to_domain check
for INTEGER_TYPE remained.
The following patch uses INTEGRAL_TYPE_P to complete the change.
2024-02-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114041
* graphite-sese-to-poly.cc (add_conditions_to_domain): Check for
INTEGRAL_TYPE_P rather than INTEGER_TYPE.
* gcc.dg/graphite/run-id-pr114041-1.c: New test.
* gcc.dg/graphite/run-id-pr114041-2.c: New test.
Jakub Jelinek [Wed, 28 Feb 2024 08:40:15 +0000 (09:40 +0100)]
gimple-fold: Use bitwise vector types rather than barely supported huge integral types in memcpy etc. folding [PR113988]
The following patch changes the memcpy etc. folding to use bitwise vector
types rather than huge INTEGER_TYPEs for copying of > MAX_FIXED_MODE_SIZE
lengths. The problem with the huge INTEGER_TYPEs is that they aren't
supported very much, usually there are just optabs to handle moves of them,
perhaps misaligned moves and that is it, so they pose problems e.g. to
BITINT_TYPE lowering.
2024-02-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113988
* stor-layout.h (bitwise_mode_for_size): Declare.
* stor-layout.cc (bitwise_mode_for_size): New function.
* gimple-fold.cc (gimple_fold_builtin_memory_op): Use it.
Use bitwise_type_for_mode instead of build_nonstandard_integer_type.
Use BITS_PER_UNIT instead of 8.
Jakub Jelinek [Wed, 28 Feb 2024 08:26:51 +0000 (09:26 +0100)]
testsuite: Add c23-stdarg-4.c test variant where all functions return large struct
I think we have no coverage for the case where structure_value_addr_parm and
TYPE_NO_NAMED_ARGS_STDARG_P are both true. The
  if (type_arg_types != 0)
    n_named_args
      = (list_length (type_arg_types)
         /* Count the struct value address, if it is passed as a parm. */
         + structure_value_addr_parm);
  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
    n_named_args = 0;
  else
    /* If we know nothing, treat all args as named. */
    n_named_args = num_actuals;
code should probably have n_named_args = structure_value_addr_parm;
instead of n_named_args = 0;, this testcase is an attempt to see if
it is broken on any target.
Nathaniel Shead [Wed, 28 Feb 2024 00:20:53 +0000 (11:20 +1100)]
c++: Revert deferring emission of inline variables [PR114013]
This is a (partial) reversion of r14-8987-gdd9d14f7d53 to return to
eagerly emitting inline variables to the middle-end when they are
declared. 'import_export_decl' will still continue to accept them, as
allowing this is a pure extension and doesn't seem to cause issues with
modules, but otherwise deferring the emission of inline variables
appears to cause issues on some targets and prevents some code using
inline variable templates from correctly linking.
There might be a more targeted way to support this, but due to the
complexity of handling linkage and emission I'd prefer to wait till
GCC 15 to explore our options.
David Malcolm [Tue, 27 Feb 2024 19:49:42 +0000 (14:49 -0500)]
analyzer: use correct format code for string literal indices [PR110483,PR111802]
On e.g. gcc211 the use of "%li" with unsigned HOST_WIDE_INT led to this warning:
../../src/gcc/analyzer/access-diagram.cc: In member function ‘void ana::string_literal_spatial_item::add_column_for_byte(text_art::table&, const ana::bit_to_table_map&, text_art::style_manager&, ana::byte_offset_t, ana::byte_offset_t, int, int) const’:
../../src/gcc/analyzer/access-diagram.cc:1909:40: warning: format ‘%li’ expects argument of type ‘long int’, but argument 3 has type ‘long long unsigned int’ [-Wformat=]
byte_idx_within_string.ulow ()));
^
and to all values being erroneously printed as "0".
Fixed thusly.
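The same class of mistake exists with the standard printf family, where the conversion specifier must match the argument type. The sketch below uses standard "%llu" with unsigned long long; the patch itself uses GCC's internal "%wu", which is not a printf specifier. The function name is made up.

```cpp
#include <cstdio>
#include <string>

// Formats an index held in an unsigned long long. "%llu" matches that type;
// "%li" would not, and a mismatched specifier is undefined behaviour in
// printf-style functions.
std::string format_byte_index(unsigned long long idx)
{
    char buf[32];   // large enough for 20 digits plus brackets
    std::snprintf(buf, sizeof buf, "[%llu]", idx);
    return buf;
}
```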
gcc/analyzer/ChangeLog:
PR analyzer/110483
PR analyzer/111802
* access-diagram.cc
(string_literal_spatial_item::add_column_for_byte): Use %wu for
printing unsigned HOST_WIDE_INT.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Eric Botcazou [Tue, 27 Feb 2024 17:01:00 +0000 (18:01 +0100)]
Fix internal error on non-byte-aligned reference in GIMPLE DSE
This is a regression present on the mainline, 13 and 12 branches. For the
attached Ada case, it's a tree checking failure on the mainline at -O:
+===========================GNAT BUG DETECTED==============================+
| 14.0.1 20240226 (experimental) [master r14-9171-g4972f97a265] GCC error:|
| tree check: expected tree that contains 'decl common' structure, |
| have 'component_ref' in tree_could_trap_p, at tree-eh.cc:2733 |
| Error detected around /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt104.adb:
Time is a 10-byte record and Packed_Rec.T is placed at bit-offset 65 because
of the packing, so tree-ssa-dse.cc:setup_live_bytes_from_ref has computed a
const_size of 88 from ref->offset of 65 and ref->max_size of 80.
Then in tree-ssa-dse.cc:compute_trims:
411 int last_live = bitmap_last_set_bit (live);
(gdb) next
412 if (ref->size.is_constant (&const_size))
(gdb)
414 int last_orig = (const_size / BITS_PER_UNIT) - 1;
(gdb)
418 *trim_tail = last_orig - last_live;
(gdb) call debug_bitmap (live)
n_bits = 256, set = {0 1 2 3 4 5 6 7 8 9 10 }
(gdb) p last_live
$33 = 10
(gdb) p const_size
$34 = 80
(gdb) p last_orig
$35 = 9
(gdb) p *trim_tail
$36 = -1
In other words, compute_trims is overlooking the alignment adjustments that
setup_live_bytes_from_ref applied earlier. Moreover it reads:
/* We use sbitmaps biased such that ref->offset is bit zero and the bitmap
extends through ref->size. So we know that in the original bitmap
bits 0..ref->size were true. We don't actually need the bitmap, just
the REF to compute the trims. */
but setup_live_bytes_from_ref used ref->max_size instead of ref->size.
It appears that all the callers of compute_trims assume that ref->offset is
byte aligned and that the trimmed bytes are relative to ref->size, so the
patch simply adds an early return if either condition is not fulfilled.
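The numbers in the gdb session can be reproduced with a few lines of arithmetic. The sketch below is a reconstruction under the assumption that the alignment adjustment rounds the start down and the end up to a byte boundary; the function names are made up and this is not the code in tree-ssa-dse.cc.

```cpp
const int kBitsPerUnit = 8;   // stands in for BITS_PER_UNIT

// Size in bits of the smallest byte-aligned span that covers
// [offset, offset + size), both given in bits.
int byte_aligned_span_bits(int offset, int size)
{
    const int start = offset / kBitsPerUnit * kBitsPerUnit;   // round down
    const int end =
        (offset + size + kBitsPerUnit - 1) / kBitsPerUnit * kBitsPerUnit;   // round up
    return end - start;
}

// Mirrors the two statements quoted from compute_trims.
int trim_tail(int const_size_bits, int last_live_byte)
{
    const int last_orig = const_size_bits / kBitsPerUnit - 1;
    return last_orig - last_live_byte;
}
```

With offset 65 and size 80 the aligned span is 88 bits, i.e. 11 live bytes with last_live = 10. Feeding the unadjusted size of 80 into trim_tail gives 9 - 10 = -1, the negative trim seen above, whereas the adjusted size of 88 would give 0.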
gcc/
* tree-ssa-dse.cc (compute_trims): Fix description. Return early
if either ref->offset is not byte aligned or ref->size is not known
to be equal to ref->max_size.
(maybe_trim_complex_store): Fix description.
(maybe_trim_constructor_store): Likewise.
(maybe_trim_partially_dead_store): Likewise.
gcc/testsuite/
* gnat.dg/opt104.ads, gnat.dg/opt104.adb: New test.
These routines map simply to the C counterpart and are now
defined in OpenACC 3.3. (There are additional routine changes,
including the Fortran addition of acc_attach/acc_detach, that
require more work than a simple addition of an interface and
are therefore excluded.)
libgomp/ChangeLog:
* libgomp.texi (OpenACC Runtime Library Routines): Document new 3.3
routines that simply map to their C counterpart.
* openacc.f90 (openacc): Add them.
* openacc_lib.h: Likewise.
* testsuite/libgomp.oacc-fortran/acc_host_device_ptr.f90: New test.
* testsuite/libgomp.oacc-fortran/acc-memcpy.f90: New test.
* testsuite/libgomp.oacc-fortran/acc-memcpy-2.f90: New test.
* testsuite/libgomp.oacc-c-c++-common/lib-59.c: Crossref to f90 test.
* testsuite/libgomp.oacc-c-c++-common/lib-60.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-95.c: Likewise.
David Malcolm [Tue, 27 Feb 2024 13:36:58 +0000 (08:36 -0500)]
analyzer: fix ICE on floating-point bounds [PR111881]
gcc/analyzer/ChangeLog:
PR analyzer/111881
* constraint-manager.cc (bound::ensure_closed): Assert that
m_constant has integral type.
(range::add_bound): Bail out on floating point constants.
gcc/testsuite/ChangeLog:
PR analyzer/111881
* c-c++-common/analyzer/conditionals-pr111881.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Richard Earnshaw [Mon, 26 Feb 2024 17:20:58 +0000 (17:20 +0000)]
arm: warn about deprecation of iwmmx in mmintrin.h
GCC 13's changes file documents that iwmmx is deprecated. Raise the bar
by warning when the mmintrin.h header is included by users, but provide
a way to suppress the warning.
gcc:
* config/arm/mmintrin.h: Warn if this header is included without
defining __ENABLE_DEPRECATED_IWMMXT.
Richard Biener [Mon, 26 Feb 2024 12:33:21 +0000 (13:33 +0100)]
tree-optimization/114074 - CHREC multiplication and undefined overflow
When folding a multiply, CHRECs are handled as if {a, +, b} * c
is {a*c, +, b*c}, but that isn't generally correct when overflow
invokes undefined behavior. The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.
I've used simple early outs for INTEGER_CSTs and otherwise use
a range-query since we lack a tree_expr_nonpositive_p and
get_range_pos_neg isn't a good fit.
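A concrete case may make the problem clearer. The sketch below evaluates everything in 64 bits and checks what would fit in a 32-bit signed int; the helper names are made up and this is not code from tree-chrec.cc. Take a = -2^30, b = 2^30, c = 2: the original product (a + b*i) * c at i = 1 is 0, but the folded step b*c is 2^31, which overflows a signed 32-bit int. In unsigned arithmetic the folded form wraps and still gives the right value.

```cpp
#include <cstdint>
#include <limits>

// Value of the chrec {a, +, b} at iteration i, computed in 64 bits so the
// sketch itself never overflows.
std::int64_t chrec_at(std::int64_t a, std::int64_t b, std::int64_t i)
{
    return a + b * i;
}

// Whether v is representable as a 32-bit signed integer.
bool fits_int32(std::int64_t v)
{
    return v >= std::numeric_limits<std::int32_t>::min()
        && v <= std::numeric_limits<std::int32_t>::max();
}

// The folded form {a*c, +, b*c} at iteration i in unsigned 32-bit
// arithmetic, where wrap-around is well defined.
std::uint32_t folded_unsigned(std::uint32_t a, std::uint32_t b,
                              std::uint32_t c, std::uint32_t i)
{
    return a * c + (b * c) * i;
}
```

Here a and b have opposite signs, which is one of the cases where the patch falls back to unsigned arithmetic.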
PR tree-optimization/114074
* tree-chrec.h (chrec_convert_rhs): Default at_stmt arg to NULL.
* tree-chrec.cc (chrec_fold_multiply): Canonicalize inputs.
Handle poly vs. non-poly multiplication correctly with respect
to undefined behavior on overflow.
* gcc.dg/torture/pr114074.c: New testcase.
* gcc.dg/pr68317.c: Adjust expected location of diagnostic.
* gcc.dg/vect/vect-early-break_119-pr114068.c: Do not expect
loop to be vectorized.
Jakub Jelinek [Tue, 27 Feb 2024 08:52:07 +0000 (09:52 +0100)]
expand: Add trivial folding for bit query builtins at expansion time [PR114044]
While it seems a lot of places in various optimization passes fold
bit query internal functions with INTEGER_CST arguments to INTEGER_CST
when there is a lhs, when lhs is missing, all the removals of such dead
stmts are guarded with -ftree-dce, so with -fno-tree-dce those unfolded
ifn calls remain in the IL until expansion. If they have large/huge
BITINT_TYPE arguments, there is no BLKmode optab and so expansion ICEs,
and bitint lowering doesn't touch such calls because it doesn't know they
need touching, functions only containing those will not even be further
processed by the pass because there are no non-small BITINT_TYPE SSA_NAMEs
+ the 2 exceptions (stores of BITINT_TYPE INTEGER_CSTs and conversions
from BITINT_TYPE INTEGER_CSTs to floating point SSA_NAMEs) and when walking
there is no special case for calls with BITINT_TYPE INTEGER_CSTs either,
those are for normal calls normally handled at expansion time.
So, the following patch adjusts the expansion of these 6 ifns by doing
nothing if there is no lhs. In case the user disabled all
possible passes that would fold this, it also handles the case of setting lhs
to an ifn call with an INTEGER_CST argument.
2024-02-27 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114044
* internal-fn.def (CLRSB, CLZ, CTZ, FFS, PARITY): Use
DEF_INTERNAL_INT_EXT_FN macro rather than DEF_INTERNAL_INT_FN.
* internal-fn.h (expand_CLRSB, expand_CLZ, expand_CTZ, expand_FFS,
expand_PARITY): Declare.
* internal-fn.cc (expand_bitquery, expand_CLRSB, expand_CLZ,
expand_CTZ, expand_FFS, expand_PARITY): New functions.
(expand_POPCOUNT): Use expand_bitquery.
Jakub Jelinek [Mon, 26 Feb 2024 16:55:07 +0000 (17:55 +0100)]
varasm: Handle private COMDAT function symbol reference in readonly data section [PR113617]
If default_elf_select_rtx_section is called to put a reference to some
local symbol defined in a comdat section into memory, which happens more often
since the r14-4944 RA change, linking might fail.
default_elf_select_rtx_section puts such constants into .data.rel.ro.local
etc. sections and if linker chooses comdat sections from some other TU
and discards the one to which a relocation in .data.rel.ro.local remains,
linker diagnoses error. References to private comdat symbols can only appear
from functions or data objects in the same comdat group, so the following
patch arranges using .data.rel.ro.local.pool.<comdat_name> and similar sections.
2024-02-26 Jakub Jelinek <jakub@redhat.com>
H.J. Lu <hjl.tools@gmail.com>
PR rtl-optimization/113617
* varasm.cc (default_elf_select_rtx_section): For
references to private symbols in comdat sections
use .data.relro.local.pool.<comdat>, .data.relro.pool.<comdat>
or .rodata.<comdat> comdat sections.
* g++.dg/other/pr113617.C: New test.
* g++.dg/other/pr113617.h: New test.
* g++.dg/other/pr113617-aux.cc: New test.
Jakub Jelinek [Mon, 26 Feb 2024 15:30:16 +0000 (16:30 +0100)]
c: Improve some diagnostics for __builtin_stdc_bit_* [PR114042]
The PR complains that for the __builtin_stdc_bit_* "builtins" the
diagnostics don't mention the name of the builtin the user used, but
__builtin_{clz,ctz,popcount}g instead (which is what the FE
immediately lowers it to).
The following patch repeats the checks from check_builtin_function_arguments
which are there done on BUILT_IN_{CLZ,CTZ,POPCOUNT}G, such that they
diagnose it with the name of the "builtin" user actually used before it
is gone.
2024-02-26 Jakub Jelinek <jakub@redhat.com>
PR c/114042
* c-parser.cc (c_parser_postfix_expression): Diagnose
__builtin_stdc_bit_* argument with ENUMERAL_TYPE or BOOLEAN_TYPE
type or if signed here rather than on the replacement builtins
in check_builtin_function_arguments.
* gcc.dg/builtin-stdc-bit-2.c: Adjust testcase for actual builtin
names rather than names of builtin replacements.
Richard Biener [Mon, 26 Feb 2024 11:27:42 +0000 (12:27 +0100)]
tree-optimization/114099 - virtual LC PHIs and early exit vect
In some cases exits can lack LC PHI nodes for the virtual operand.
We have to create them when the epilog loop requires them which also
allows us to remove some only halfway correct fixups. This is the
variant triggering for alternate exits.
PR tree-optimization/114099
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Create and fill in a needed virtual LC PHI for the alternate
exits. Remove code dealing with that missing.
* gcc.dg/vect/vect-early-break_120-pr114099.c: New testcase.
Richard Biener [Mon, 26 Feb 2024 10:25:50 +0000 (11:25 +0100)]
tree-optimization/114068 - missed virtual LC PHI after vect peeling
When we choose the IV exit to be one leading to no virtual use we
fail to have a virtual LC PHI even though we need it for the epilog
entry. The following makes sure to create it so that later updating
works.
PR tree-optimization/114068
* tree-vect-loop-manip.cc (get_live_virtual_operand_on_edge):
New function.
(slpeel_tree_duplicate_loop_to_edge_cfg): Add a virtual LC PHI
on the main exit if needed. Remove band-aid for the case
it was missing.
* gcc.dg/vect/vect-early-break_118-pr114068.c: New testcase.
* gcc.dg/vect/vect-early-break_119-pr114068.c: Likewise.
Eric Botcazou [Mon, 26 Feb 2024 12:13:34 +0000 (13:13 +0100)]
Finalization of object allocated by anonymous access designating local type
The finalization of objects dynamically allocated through an anonymous
access type is deferred to the enclosing library unit in the current
implementation and a warning is given on each of them.
However this cannot be done if the designated type is local, because this
would generate dangling references to the local finalization routine, so
the finalization needs to be dropped in this case and the warning adjusted.
gcc/ada/
PR ada/113893
* exp_ch7.adb (Build_Anonymous_Master): Do not build the master
for a local designated type.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Force Needs_Fin
to false if no finalization master is attached to an access type
and assert that it is anonymous in this case.
* sem_res.adb (Resolve_Allocator): Mention that the object might
not be finalized at all in the warning given when the type is an
anonymous access-to-controlled type.
H.J. Lu [Sun, 25 Feb 2024 21:14:39 +0000 (13:14 -0800)]
x86: Check interrupt instead of noreturn attribute
ix86_set_func_type checks noreturn attribute to avoid incompatible
attribute error in LTO1 on interrupt functions. Since TREE_THIS_VOLATILE
is set also for _Noreturn without noreturn attribute, check interrupt
attribute for interrupt functions instead.
Jakub Jelinek [Mon, 26 Feb 2024 10:12:39 +0000 (11:12 +0100)]
i386: Enable _BitInt support on ia32
Given the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113837#c9
comment, the following patch just attempts to implement what I think
is best for ia32.
Compared to https://gitlab.com/x86-psABIs/i386-ABI/-/issues/5 ,
like that patch for _BitInt(64) or smaller it uses the smallest containing
{,un}signed {char,short,int,long long} for passing/returning and
layout of variables including in structures for alignment/size, with any
extra bits unspecified.
Unlike the above proposal, for larger _BitInt (i.e. _BitInt(65)+), it uses
passing/returning/layout/alignment of structure containing minimum needed
number of 32-bit limbs, again with the extra bits unspecified.
This is because most operations (except copy or bitwise ops) on _BitInts
aren't really vectorizable and will be under the hood implemented in loops
over 32-bit limbs anyway (using 64-bit limbs under the hood would mean
often using library implementation for the basic operations) and because
ia32 doesn't align even long long/double in structures to 64-bit I think
it is better to just use 32-bit alignment for that. And I don't see
a reason to waste 32 bits say for _BitInt(224) or _BitInt(288) on ia32.
So, effectively it is like the x86-64 _BitInt ABI with everything divided by
2, the only exception is that in x86-64 psABI _BitInt(128) is said to be
already a structure of 2 limbs, which happens to be passed mostly the same
as __int128 (except for alignment).
2024-02-26 Jakub Jelinek <jakub@redhat.com>
* config/i386/i386.cc (ix86_bitint_type_info): Add support for
!TARGET_64BIT.
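
The sizing rule described in this commit message can be sketched as a small
model. This is only an illustration of the arithmetic, not code from the patch:
the function names are hypothetical, and the 64-bit-limb variant is included
purely as a comparison point for the _BitInt(224) and _BitInt(288) remark.

```python
def ia32_bitint_storage_bits(n: int) -> int:
    """Storage size in bits for _BitInt(n) under the ia32 scheme described above.

    Hypothetical helper: it models the rule from the commit message only.
    """
    if n < 1:
        raise ValueError("_BitInt width must be positive")
    if n <= 64:
        # Smallest containing {char, short, int, long long}.
        for container in (8, 16, 32, 64):
            if n <= container:
                return container
    # _BitInt(65) and wider: minimum needed number of 32-bit limbs.
    limbs = (n + 31) // 32
    return limbs * 32


def storage_bits_with_64bit_limbs(n: int) -> int:
    """Comparison only: the same width rounded up to whole 64-bit limbs."""
    return ((n + 63) // 64) * 64


for width in (224, 288):
    saved = storage_bits_with_64bit_limbs(width) - ia32_bitint_storage_bits(width)
    print(f"_BitInt({width}): {saved} bits saved by using 32-bit limbs")
```

Under this model, widths that are a multiple of 32 but not of 64 are where the
32-bit limb choice avoids padding.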
Jakub Jelinek [Mon, 26 Feb 2024 09:08:45 +0000 (10:08 +0100)]
match.pd: Guard 2 simplifications on integral TYPE_OVERFLOW_UNDEFINED [PR114090]
These 2 patterns are incorrect on floating point, or for -fwrapv, or
for -ftrapv, or the first one for unsigned types (the second one is
mathematically correct, but we ought to just fold that to 0 instead).
So, the following patch properly guards this.
I think we don't need && !TYPE_OVERFLOW_SANITIZED (type) because
in both simplifications there would be UB before and after on
signed integer minimum.
Jakub Jelinek [Mon, 26 Feb 2024 09:07:39 +0000 (10:07 +0100)]
fold-const: Avoid infinite recursion in +-*&|^minmax reassociation [PR114084]
In the following testcase we infinitely recurse during BIT_IOR_EXPR
reassociation.
One operand is (unsigned _BitInt(31)) a << 4 and another operand 2147483647 >> 1 | 80 where both the right shift and the | 80
trees have TREE_CONSTANT set, but weren't folded because of delayed
folding, where some foldings are apparently done even in that case
unfortunately.
Now, the fold_binary_loc reassociation code splits both operands into
variable part, minus variable part, constant part, minus constant part,
literal part and minus literal part; to prevent infinite recursion it
punts if there are just 2 parts altogether from the 2 operands, and then goes
on with reassociation, merging first the corresponding parts from both
operands and then doing some further merges.
The problem with the above expressions is that we get 3 different objects,
var0 (the left shift), con1 (the right shift) and lit1 (80), so the infinite
recursion prevention doesn't trigger, and we eventually merge con1 with
lit1, which effectively reconstructs the original op1 and then associate
that with var0 which is original op0, and associate_trees for that case
calls fold_binary. There are some casts involved there too (the T typedef
type and the underlying _BitInt type which are stripped with STRIP_NOPS).
The following patch attempts to prevent this infinite recursion by tracking
the origin of each var (0 if it comes from nothing, 1 from op0, 2 from op1,
3 from both) and propagates it through all the associate_trees calls which
merge the vars.
If near the end we'd try to merge what comes solely from op0 with what comes
solely from op1 (or vice versa), the patch punts, because then it isn't any
kind of reassociation between the two operands, if anything it should be
handled when folding the suboperands.
2024-02-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114084
* fold-const.cc (fold_binary_loc): Avoid the final associate_trees
if all subtrees of var0 come from one of the op0 or op1 operands
and all subtrees of con0 come from the other one. Don't clear
variables which are never used afterwards.
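
The origin tracking described above can be modelled as a two-bit mask per part.
The sketch below is a simplified illustration under that assumption, not the
fold-const.cc code: the constant and function names are hypothetical, and real
trees are replaced by bare origin values.

```python
# Origin of a part: bit 0 set if it came from op0, bit 1 set if from op1.
FROM_NONE, FROM_OP0, FROM_OP1, FROM_BOTH = 0, 1, 2, 3


def merge_origin(a: int, b: int) -> int:
    """Origin of the result of merging two parts: the union of their origins."""
    return a | b


def should_punt_final_merge(var_origin: int, con_origin: int) -> bool:
    """True if the final merge would only glue all of one operand to all of the other.

    That case is not a reassociation between op0 and op1, so the model punts.
    """
    return (var_origin, con_origin) in (
        (FROM_OP0, FROM_OP1),
        (FROM_OP1, FROM_OP0),
    )


# The case from the commit message: var0 comes from op0, con1 and lit1 from op1.
var0 = FROM_OP0
con0 = merge_origin(FROM_OP1, FROM_OP1)  # con1 merged with lit1
print(should_punt_final_merge(var0, con0))  # prints True
```

If both operands had contributed a variable part, the merged var origin would
be FROM_BOTH and the final merge would go ahead in this model.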
The following properly guards the simplifications that move
operations into VEC_CONDs, in particular when that changes the
type constraints on this operation.
This needed a genmatch fix which was recording spurious implicit fors
when tcc_comparison is used in a C expression.
PR middle-end/114070
* genmatch.cc (parser::parse_c_expr): Do not record operand
lists but only mark operators used.
* match.pd ((c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e)):
Properly guard the case of tcc_comparison changing the VEC_COND
value operand type.
Jakub Jelinek [Mon, 26 Feb 2024 06:30:05 +0000 (07:30 +0100)]
i386: Fix up x86_function_profiler -masm=intel support [PR114094]
In my r14-8214 changes I apparently forgot one \n at the end of an instruction.
The corresponding AT&T line looks like:
"1:\tcall\t*%s@GOTPCREL(%%rip)\n"
but the Intel variant was
"1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]"
and the memory operand size is 1 byte. As a result, the remaining 511
bytes are ignored by GCC. Implement ldtilecfg and sttilecfg intrinsics
with a pointer to XImode to honor the 512-byte memory block.
gcc/ChangeLog:
PR target/114098
* config/i386/amxtileintrin.h (_tile_loadconfig): Use
__builtin_ia32_ldtilecfg.
(_tile_storeconfig): Use __builtin_ia32_sttilecfg.
* config/i386/i386-builtin.def (BDESC): Add
__builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
* config/i386/i386-expand.cc (ix86_expand_builtin): Handle
IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
* config/i386/i386.md (ldtilecfg): New pattern.
(sttilecfg): Likewise.
gcc/testsuite/ChangeLog:
PR target/114098
* gcc.target/i386/amxtile-4.c: New test.
Jerry DeLisle [Sun, 25 Feb 2024 22:50:07 +0000 (14:50 -0800)]
libgfortran: Propagate user defined iostat and iomsg.
PR libfortran/105456
libgfortran/ChangeLog:
* io/list_read.c (list_formatted_read_scalar): Add checks
for the case where a user defines their own error codes
and error messages and generate the runtime error.
* io/transfer.c (st_read_done): Whitespace.
Gaius Mulley [Sun, 25 Feb 2024 11:08:37 +0000 (11:08 +0000)]
PR modula2/113749 m2 enabled build times out on i686-gnu-hurd
The bug fix changes the FIO module to use the target O_RDONLY,
O_WRONLY, SEEK_SET and SEEK_END (now available from the module wrapc).
Also rebuilt are the bootstrap tools mc and pge as they have their
own wrapc and C translations of FIO.