Jason Merrill [Fri, 25 Jul 2025 19:57:26 +0000 (15:57 -0400)]
c++: constexpr uninitialized union [PR120577]
This was failing for two reasons:
1) We were wrongly treating the basic_string constructor as
zero-initializing the object, which it doesn't.
2) Given that, when we went to look for a value for the anonymous union,
we concluded that it was value-initialized, and trying to evaluate that
broke because we weren't setting ctx->ctor for it.
This patch fixes both issues, #1 by setting CONSTRUCTOR_NO_CLEARING and #2
by inserting a new CONSTRUCTOR for the member rather than evaluate it out of
context, which is consistent with cxx_eval_store_expression.
PR c++/120577
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_call_expression): Set
CONSTRUCTOR_NO_CLEARING on initial value for ctor.
(cxx_eval_component_reference): Make value-initialization
of an aggregate member explicit.
Bob Duff [Wed, 16 Jul 2025 14:29:01 +0000 (10:29 -0400)]
ada: Bug in Indefinite_Holders instance passed to formal package
Fix bug when an instance of Indefinite_Holders with a class-wide type is
passed as a generic formal package; Program_Error was raised when
dealing with the implicit "=" function.
The fix is to disable legality checks in formal packages when the
entity is an E_Subprogram_Body, because these are implicitly generated
for class-wide predefined functions when passed to generics.
gcc/ada/ChangeLog:
* sem_ch12.adb (Check_Formal_Package_Instance):
Do nothing in case of E_Subprogram_Body.
ada: Fix regression of finalization primitive selection
A recent patch introduced a new flag to mark the types for which looking
up finalization primitives needs special handling. But there was one
place in Build_Derived_Record_Type where the flag was not set when it
should have been, which introduced a regression in some cases.
This patch adds the missing setting of the flag.
gcc/ada/ChangeLog:
* sem_ch3.adb (Build_Derived_Record_Type): Set flag appropriately.
Patrick Palka [Wed, 23 Jul 2025 12:31:46 +0000 (08:31 -0400)]
c++: fix __is_invocable for std::reference_wrapper [PR121055]
Our implementation of the INVOKE spec ([func.require]) was incorrectly
treating reference_wrapper<T>::get() as returning T instead of T&, which
notably makes a difference when invoking a ref-qualified memfn pointer.
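To illustrate why the value category of get() matters, here is a minimal sketch. It does not use the __is_invocable built-in; callable_via_get is a hypothetical hand-written detector for the reference_wrapper case of INVOKE, and Counter is an illustrative type, not taken from the PR testcase.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

struct Counter {
  int value = 0;
  int bump() & { return ++value; }   // callable on lvalues only
  int take() && { return value; }    // callable on rvalues only
};

// Hypothetical detector: the call is well-formed iff (w.get().*pmf)() is.
template<class Pmf, class Wrapper, class = void>
struct callable_via_get : std::false_type {};

template<class Pmf, class Wrapper>
struct callable_via_get<
    Pmf, Wrapper,
    std::void_t<decltype(
        (std::declval<Wrapper>().get().*std::declval<Pmf>())())>>
    : std::true_type {};

using Wrapped = std::reference_wrapper<Counter>;

// get() yields Counter&, an lvalue, so only the &-qualified member fits.
static_assert(callable_via_get<int (Counter::*)() &, Wrapped>::value);
static_assert(!callable_via_get<int (Counter::*)() &&, Wrapped>::value);

// Runtime counterpart: std::invoke reaches the object through get().
inline int bump_through_wrapper(Counter& c) {
  int result = std::invoke(&Counter::bump, std::ref(c));
  assert(result == c.value);
  return result;
}
```

If get() were treated as returning T rather than T&, the two static_asserts would be expected to flip, which is the kind of difference the commit message refers to.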
PR c++/121055
gcc/cp/ChangeLog:
* method.cc (build_invoke): Correct reference_wrapper handling.
Patrick Palka [Tue, 15 Jul 2025 19:17:23 +0000 (15:17 -0400)]
libstdc++: Add missing initializers for __maybe_present_t members [PR119962]
Data members of type __maybe_present_t where the conditionally present
type might be an aggregate or fundamental type need to be explicitly
value-initialized (rather than implicitly default-initialized), so that
default-initialization of the containing class always results in a
completely initialized object.
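A minimal sketch of the idea, assuming a simplified stand-in for the library alias: maybe_present_t, Absent and Cursor below are hypothetical names for illustration, not the libstdc++ definitions.

```cpp
#include <cassert>
#include <type_traits>

struct Absent {};  // hypothetical placeholder for a member that is not needed

// Hypothetical stand-in for the library's internal alias.
template<bool Present, class T>
using maybe_present_t = std::conditional_t<Present, T, Absent>;

template<bool Present>
struct Cursor {
  // "= T()" value-initializes, so a pointer member becomes null even when
  // the enclosing object is only default-initialized.
  maybe_present_t<Present, const int*> outer
      = maybe_present_t<Present, const int*>();
};

inline const int* default_cursor_outer() {
  Cursor<true> c;              // default-initialization of the container
  assert(c.outer == nullptr);  // well-defined thanks to the initializer
  return c.outer;
}
```

Without the explicit initializer, reading c.outer after default-initialization would read an indeterminate pointer value.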
PR libstdc++/119962
libstdc++-v3/ChangeLog:
* include/std/ranges (join_view::_Iterator::_M_outer): Initialize.
(lazy_split_view::_OuterIter::_M_current): Initialize.
(join_with_view::_Iterator::_M_outer_it): Initialize.
* testsuite/std/ranges/adaptors/join.cc (test15): New test.
* testsuite/std/ranges/adaptors/join_with/1.cc (test05): New test.
* testsuite/std/ranges/adaptors/lazy_split.cc (test13): New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 0828600f586e75a2056a4fc7eb0a340c363d6c66)
When we have a vector shift with a scalar the shift operand can be
external - in that case we should not use the shift operand def
as hint where to place the vector shift instruction. The ICE
in the PR is because stmt dominance queries only work inside of
the vector region. But we should also never place stmts outside
of it.
PR tree-optimization/121202
* tree-vect-slp.cc (vect_schedule_slp_node): Do not take
an out-of-region stmt as "last".
Nathaniel Shead [Sat, 24 May 2025 00:56:22 +0000 (10:56 +1000)]
c++/modules: Support re-streaming TU_LOCAL_ENTITYs [PR120412]
When emitting a primary module interface, we must re-stream any TU-local
entities that we saw in a partition. This patch adds the missing
members from core_vals.
As a drive-by fix, in some cases we might have a typedef referring to a
TU-local entity; we need to handle that case as well.
Martin Jambor [Fri, 18 Jul 2025 10:42:11 +0000 (12:42 +0200)]
tree-sra: Fix grp_covered flag computation when totally scalarizing (PR117423)
Testcase of PR 117423 shows a flaw in the fancy way we do "total
scalarization" in SRA now. We use the types encountered in the
function body and not in type declaration (allowing us to totally
scalarize when only one union field is ever used, since we effectively
"skip" the union then) and can accommodate pre-existing accesses that
happen to fall into padding.
In this case, we skipped the union (bypassing the
totally_scalarizable_type_p check) and the access falling into the
"padding" is an aggregate and so not a candidate for SRA but actually
containing data.  Arguably total scalarization should just bail out
when it encounters this situation (but I decided not to depend on this,
mainly because we'd need to detect all cases when we eventually cannot
scalarize, such as when a scalar access has children accesses).  The
actual bug is that the check of whether all data in an aggregate is
indeed covered by replacements simply assumes that this is always the
case when total scalarization triggers, which may not hold in cases
like this, and perhaps others.
This patch fixes the bug by just assuming that all padding is taken
care of when total scalarization triggered, not that every access was
actually scalarized.
gcc/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered
flag.
gcc/testsuite/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* gcc.dg/tree-ssa/pr117423.c: New test.
ada: Fix generation of Initialize and Adjust calls
Before this patch, Make_Init_Call and Make_Adjust_Call made the
assumption that if the type they were called with was untagged and a
derived type, it was the untagged private view of a tagged type. That
assumption made it possible to inspect the root type's primitives to
handle the case where the underlying type was implicitly generated by
the compiler without all inherited primitives.
The introduction of the Finalizable aspect broke that assumption, so
this patch adds a new field to type entities that makes the generated
full view stand out, and updates Make_Init_Call and Make_Adjust_Call to
only jump to the root type when they're passed one of those generated
types.
Make_Final_Call and Finalize_Address are two other subprograms that
perform the same test on the types they're passed. They did not suffer
from the same bug as Make_Init_Call and Make_Adjust_Call because of an
earlier, more ad hoc fix, but this patch switches them over to the newly
introduced mechanism for the sake of consistency.
gcc/ada/ChangeLog:
* gen_il-fields.ads (Is_Implicit_Full_View): New field.
* gen_il-gen-gen_entities.adb (Type_Kind): Use new field.
* einfo.ads (Is_Implicit_Full_View): Document new field.
* exp_ch7.adb (Make_Adjust_Call, Make_Init_Call, Make_Final_Call): Use
new field.
* exp_util.adb (Finalize_Address): Likewise.
* sem_ch3.adb (Copy_And_Build): Set new field.
Eric Botcazou [Tue, 8 Jul 2025 19:40:44 +0000 (21:40 +0200)]
ada: Remove obsolete code from Safe_Unchecked_Type_Conversion
That's a kludge added to work around the limitations of the stack checking
mechanism used in the early days.
gcc/ada/ChangeLog:
* exp_util.ads (May_Generate_Large_Temp): Delete.
* exp_util.adb (May_Generate_Large_Temp): Likewise.
(Safe_Unchecked_Type_Conversion): Do not take stack checking into
account to compute the result.
What happens is that Remove_Side_Effects uses a renaming to remove the side
effects of L but, at the end, the renamed object is substituted back for the
renaming in the node by Expand_Renaming, which is invoked because the
Is_Renaming_Of_Object flag is set on the renaming after Evaluate_Name has
been invoked on its Name.
This is a general discrepancy between Evaluate_Name and Side_Effect_Free of
Exp_Util, coming from the call to Safe_Unchecked_Type_Conversion present in
Side_Effect_Free in this case. The long term goal is probably to remove the
call but, in the meantime, this change is sufficient to fix the failure.
gcc/ada/ChangeLog:
* exp_util.adb (Safe_Unchecked_Type_Conversion): Always return True
if the expression is the prefix of an N_Selected_Component.
Eric Botcazou [Wed, 2 Jul 2025 13:25:55 +0000 (15:25 +0200)]
ada: Fix wrong indirect access to bit-packed array in iterated loop
This comes from a missing expansion of the bit-packed array reference in
the loop, because the actual subtype created for the dereference lacks a
Packed_Array_Impl_Type as it is ultimately created by the Preanalyze_Range
call present in Analyze_Loop_Statement.
gcc/ada/ChangeLog:
* sem_util.adb (Get_Actual_Subtype): Only create a new subtype when
the expander is active. Remove a useless test of type inequality,
as well as a useless call to Set_Has_Delayed_Freeze on the subtype.
ada: exp_util.adb: prevent infinite loop in case of broken code
A recent commit modified exp_util.adb in order to fix the selection of
Finalize subprograms in the case of untagged objects.
This introduced regressions for GNATSAS in fixedbugs by causing
GNAT2SCIL to loop over the same type over and over in case of broken
code.
We fix this by simply checking that the loop is making progress, and if
it doesn't, assume that we're done.
and the latter, libstdc++-v3/include/bits/ostream.tcc, has:
// Inhibit implicit instantiations for required instantiations,
// which are defined via explicit instantiations elsewhere.
#if _GLIBCXX_EXTERN_TEMPLATE
extern template class basic_ostream<char>;
extern template ostream& endl(ostream&);
Before this commit, omp_discover_declare_target_tgt_fn_r marked 'endl'
as (implicitly) declare target - but not the calls in it due to the
'extern' (DECL_EXTERNAL).
Thanks to inlining, 'endl' is (therefore) not used and is, hence,
discarded by the linker, so this works with -O0 and -O1.  However,
as the (unused) function still exists, IPA CP (enabled by -O2) will try
to do constant-value propagation and fails as the definition of 'widen'
is not available.
The solution is to still walk 'endl' despite being an 'extern(al)' decl;
this has been restricted for now to DECL_DECLARED_INLINE_P.
gcc/ChangeLog:
* omp-offload.cc (omp_discover_declare_target_tgt_fn_r): Also
walk external functions that are declared inline (and have a
DECL_SAVED_TREE).
libgomp/ChangeLog:
* testsuite/libgomp.c++/declare_target-2.C: New test.
Thomas Schwinge [Mon, 26 May 2025 11:31:54 +0000 (13:31 +0200)]
Avoid SIGSEGV in nvptx 'mkoffload' for voluminous PTX code
In commit 50be486dff4ea2676ed022e9524ef190b92ae2b1
"nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup", some
additional tracking of the PTX code was added, and this assumes that
potentially every single character of PTX code needs to be tracked as a new
chunk of PTX code. That's problematic if we're dealing with voluminous PTX
code (for example, non-trivial C++ code), and the 'file_idx' 'alloca'tion then
causes stack overflow. For example:
FAIL: libgomp.c++/target-std__valarray-1.C (test for excess errors)
UNRESOLVED: libgomp.c++/target-std__valarray-1.C compilation failed to produce executable
lto-wrapper: fatal error: [...]/build-gcc/gcc//accel/nvptx-none/mkoffload terminated with signal 11 [Segmentation fault], core dumped
gcc/
* config/nvptx/mkoffload.cc (process): Use an 'auto_vec' for
'file_idx'.
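As a rough illustration of the 'file_idx' change, the sketch below sizes a worst-case per-character index on the heap instead of the stack. std::vector stands in for GCC's 'auto_vec', and chunk_starts and its delimiter parameter are hypothetical; this is not the mkoffload code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Records where each chunk starts, splitting on a hypothetical delimiter.
// The worst case is one chunk per character, so the index lives on the heap
// (std::vector here, standing in for auto_vec) rather than in an alloca.
inline std::vector<std::size_t> chunk_starts(const std::string& code,
                                             char delim) {
  std::vector<std::size_t> starts;
  starts.reserve(code.size() + 1);  // heap allocation: fine for large inputs
  starts.push_back(0);
  for (std::size_t i = 0; i < code.size(); ++i)
    if (code[i] == delim)
      starts.push_back(i + 1);
  assert(starts.size() <= code.size() + 1);
  return starts;
}
```

A stack allocation of code.size() + 1 entries would grow with the input and can exceed the stack limit for multi-megabyte PTX; the heap-backed container avoids that.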
Waffl3x [Mon, 26 May 2025 08:38:27 +0000 (02:38 -0600)]
Add 'libgomp.c++/target-flex-[...].C' test cases
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-flex-10.C: New test.
* testsuite/libgomp.c++/target-flex-100.C: New test.
* testsuite/libgomp.c++/target-flex-101.C: New test.
* testsuite/libgomp.c++/target-flex-11.C: New test.
* testsuite/libgomp.c++/target-flex-12.C: New test.
* testsuite/libgomp.c++/target-flex-2000.C: New test.
* testsuite/libgomp.c++/target-flex-2001.C: New test.
* testsuite/libgomp.c++/target-flex-2002.C: New test.
* testsuite/libgomp.c++/target-flex-2003.C: New test.
* testsuite/libgomp.c++/target-flex-30.C: New test.
* testsuite/libgomp.c++/target-flex-300.C: New test.
* testsuite/libgomp.c++/target-flex-31.C: New test.
* testsuite/libgomp.c++/target-flex-32.C: New test.
* testsuite/libgomp.c++/target-flex-33.C: New test.
* testsuite/libgomp.c++/target-flex-41.C: New test.
* testsuite/libgomp.c++/target-flex-60.C: New test.
* testsuite/libgomp.c++/target-flex-61.C: New test.
* testsuite/libgomp.c++/target-flex-62.C: New test.
* testsuite/libgomp.c++/target-flex-70.C: New test.
* testsuite/libgomp.c++/target-flex-80.C: New test.
* testsuite/libgomp.c++/target-flex-81.C: New test.
* testsuite/libgomp.c++/target-flex-90.C: New test.
* testsuite/libgomp.c++/target-flex-common.h: New test.
/* Handle empty records as per the x86-64 psABI. */
TYPE_EMPTY_P (type) = targetm.calls.empty_record_p (type);
(Indeed x86_64 is still the only target to define 'TARGET_EMPTY_RECORD_P',
calling 'gcc/tree.cc-default_is_empty_record'.)
And so it happens that for an empty struct used in code offloaded from x86_64
host (but not powerpc64le host, for example), we get to see 'TYPE_EMPTY_P' in
offloading compilation (where the offload targets (currently?) don't use it
themselves, and therefore aren't prepared to handle it).
For nvptx offloading compilation, this causes wrong code generation:
'ptxas [...] error : Call has wrong number of parameters', as nvptx code
generation for function definition doesn't pay attention to this flag (say, in
'gcc/config/nvptx/nvptx.cc:pass_in_memory', or wherever else would be
appropriate to handle that), but the generic code 'gcc/calls.cc:expand_call'
via 'gcc/function.cc:aggregate_value_p' does pay attention to it, and we thus
get mismatching function definition vs. function call.
This issue apparently isn't a problem for GCN offloading, but I don't know if
that's by design or by accident.
Richard Biener:
> It looks like TYPE_EMPTY_P is only used during RTL expansion for ABI
> purposes, so computing it during layout_type is premature as shown here.
>
> I would suggest to simply re-compute it at offload stream-in time.
(For avoidance of doubt, the additions to 'gcc.target/nvptx/abi-struct-arg.c',
'gcc.target/nvptx/abi-struct-ret.c' are not dependent on the offload streaming
code changes, but are just to mirror the changes to
'libgomp.oacc-c-c++-common/abi-struct-1.c'.)
Thomas Schwinge [Wed, 16 Jul 2025 20:13:46 +0000 (22:13 +0200)]
GCN, nvptx offloading: Restrain 'WARNING: program timed out.' while in 'dynamic_cast' only for effective-target 'offload_device' [PR119692]
In PR119692 "C++ 'typeinfo', 'vtable' vs. OpenACC, OpenMP 'target' offloading":
> --- Comment #8 from Rainer Orth <ro at gcc dot gnu.org> ---
> The last commit made things worse on sparc-sun-solaris2.11: since that one
> (dg-timeout 10) I regularly get
>
> WARNING: libgomp.c++/target-exceptions-bad_cast-1.C (test for excess errors)
> program timed out.
> FAIL: libgomp.c++/target-exceptions-bad_cast-1.C (test for excess errors)
> UNRESOLVED: libgomp.c++/target-exceptions-bad_cast-1.C compilation failed to produce executable
> UNRESOLVED: libgomp.c++/target-exceptions-bad_cast-1.C scan-tree-dump-times optimized "gimple_call <__cxa_bad_cast, " 1
>
> Before that, the test had no issue. Compiling the test on an unloaded system
> usually takes less than 1 sec, but when fully loaded, times can go up.
To keep things simple, let's restrict this temporary (yeah...) workaround to
apply only for effective-target 'offload_device', just like the
'dg-xfail-run-if' itself.
Thomas Schwinge [Fri, 18 Jul 2025 10:56:13 +0000 (12:56 +0200)]
Adjust 'libgomp.c++/target-cdtor-{1,2}.C' for 'targetm.cxx.use_aeabi_atexit' [PR119853, PR119854]
Fix-up for commit aafe942227baf8c2bcd4cac2cb150e49a4b895a9
"GCN, nvptx offloading: Host/device compatibility: Itanium C++ ABI, DSO Object Destruction API [PR119853, PR119854]":
we need to adjust for 'targetm.cxx.use_aeabi_atexit':
gcc/config/arm/arm.cc:/* The EABI says __aeabi_atexit should be used to register static
gcc/config/arm/arm.cc- destructors. */
gcc/config/arm/arm.cc-
gcc/config/arm/arm.cc-static bool
gcc/config/arm/arm.cc:arm_cxx_use_aeabi_atexit (void)
gcc/config/arm/arm.cc-{
gcc/config/arm/arm.cc- return TARGET_AAPCS_BASED;
gcc/config/arm/arm.cc-}
..., which 'gcc/cp/decl.cc:get_atexit_node' then acts on: call '__aeabi_atexit'
instead of '__cxa_atexit', and swap two arguments.
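The argument swap can be sketched with mock functions. mock_cxa_atexit, mock_aeabi_atexit, Registration and last_registration are hypothetical names used only for illustration; the parameter orders mirror the Itanium C++ ABI and ARM EABI registration functions, which live in the C++ runtime and are not called here.

```cpp
#include <cassert>

// Mock registry entry (hypothetical, for illustration only).
struct Registration {
  void (*destructor)(void*);
  void* object;
  void* dso_handle;
};

inline Registration last_registration{};

// Itanium C++ ABI order: destructor first, then object.
inline int mock_cxa_atexit(void (*destructor)(void*), void* object,
                           void* dso_handle) {
  assert(destructor != nullptr);
  last_registration = {destructor, object, dso_handle};
  return 0;
}

// ARM EABI order: object first, then destructor.
inline int mock_aeabi_atexit(void* object, void (*destructor)(void*),
                             void* dso_handle) {
  return mock_cxa_atexit(destructor, object, dso_handle);
}
```

A test that scans for the registration call therefore has to expect the first two arguments in the opposite order on such configurations.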
Andrew Pinski [Mon, 21 Apr 2025 22:32:26 +0000 (22:32 +0000)]
GCN: Properly switch sections in 'gcn_hsa_declare_function_name' [PR119737]
There are GCN/C++ target as well as offloading codes, where the hard-coded
section names in 'gcn_hsa_declare_function_name' do not fit, and assembly thus
fails:
LLVM ERROR: Size expression must be absolute.
This commit progresses GCN target:
[-FAIL: g++.dg/init/call1.C -std=gnu++17 (internal compiler error: Aborted signal terminated program as)-]
[-FAIL:-]{+PASS:+} g++.dg/init/call1.C -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/init/call1.C -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[-FAIL: g++.dg/init/call1.C -std=gnu++26 (internal compiler error: Aborted signal terminated program as)-]
[-FAIL:-]{+PASS:+} g++.dg/init/call1.C -std=gnu++26 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/init/call1.C -std=gnu++26 [-compilation failed to produce executable-]{+execution test+}
UNSUPPORTED: g++.dg/init/call1.C -std=gnu++98: exception handling not supported
..., and GCN offloading:
[-XFAIL: libgomp.c++/target-exceptions-throw-1.C (internal compiler error: Aborted signal terminated program as)-]
[-XFAIL: libgomp.c++/target-exceptions-throw-1.C PR119737 at line 7 (test for bogus messages, line )-]
[-XFAIL:-]{+PASS:+} libgomp.c++/target-exceptions-throw-1.C (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} libgomp.c++/target-exceptions-throw-1.C [-compilation failed to produce executable-]{+execution test+}
{+PASS: libgomp.c++/target-exceptions-throw-1.C output pattern test+}
[-XFAIL: libgomp.c++/target-exceptions-throw-2.C (internal compiler error: Aborted signal terminated program as)-]
[-XFAIL: libgomp.c++/target-exceptions-throw-2.C PR119737 at line 7 (test for bogus messages, line )-]
[-XFAIL:-]{+PASS:+} libgomp.c++/target-exceptions-throw-2.C (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} libgomp.c++/target-exceptions-throw-2.C [-compilation failed to produce executable-]{+execution test+}
{+PASS: libgomp.c++/target-exceptions-throw-2.C output pattern test+}
[-XFAIL: libgomp.oacc-c++/exceptions-throw-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (internal compiler error: Aborted signal terminated program as)-]
[-XFAIL: libgomp.oacc-c++/exceptions-throw-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 PR119737 at line 7 (test for bogus messages, line )-]
[-XFAIL:-]{+PASS:+} libgomp.oacc-c++/exceptions-throw-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} libgomp.oacc-c++/exceptions-throw-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 [-compilation failed to produce executable-]{+execution test+}
{+PASS: libgomp.oacc-c++/exceptions-throw-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 output pattern test+}
[-XFAIL: libgomp.oacc-c++/exceptions-throw-2.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (internal compiler error: Aborted signal terminated program as)-]
[-XFAIL: libgomp.oacc-c++/exceptions-throw-2.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 PR119737 at line 9 (test for bogus messages, line )-]
[-XFAIL:-]{+PASS:+} libgomp.oacc-c++/exceptions-throw-2.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} libgomp.oacc-c++/exceptions-throw-2.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 [-compilation failed to produce executable-]{+execution test+}
{+PASS: libgomp.oacc-c++/exceptions-throw-2.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 output pattern test+}
Thomas Schwinge [Tue, 22 Apr 2025 11:41:22 +0000 (13:41 +0200)]
Adjust 'libgomp.c++/target-exceptions-pr118794-1.C' for 'targetm.arm_eabi_unwinder' [PR118794]
Fix-up for commit aa3e72f943032e5f074b2bd2fd06d130dda8760b
"Add test cases for exception handling constructs in dead code for GCN, nvptx target and OpenMP 'target' offloading [PR118794]":
we need to adjust for configurations with 'targetm.arm_eabi_unwinder', as per:
..., which for ARM is conditional to '#if ARM_UNWIND_INFO' (defined in
'gcc/config/arm/bpabi.h', used for various GCC configurations), and for
C6x unconditional.
aarch64: Tweak handling of general SVE permutes [PR121027]
This PR is partly about a code quality regression that was triggered
by g:caa7a99a052929d5970677c5b639e1fa5166e334. That patch taught the
gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
upon either (a) the original permutations not being "native" operations
or (b) the combined permutation being a "native" operation.
Whether something is a "native" operation is tested by calling
can_vec_perm_const_p with allow_variable_p set to false. This requires
the permutation to be supported directly by TARGET_VECTORIZE_VEC_PERM_CONST,
rather than falling back to the general vec_perm optab.
This exposed a problem with the way that we handled general 2-input
permutations for SVE. Unlike Advanced SIMD, base SVE does not have
an instruction to do general 2-input permutations. We do still implement
the vec_perm optab for SVE, but only when the vector length is known at
compile time. The general expansion is pretty expensive: an AND, a SUB,
two TBLs, and an ORR. It certainly couldn't be considered a "native"
operation.
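For intuition, the AND/SUB/TBL/TBL/ORR sequence can be modelled in scalar code roughly as follows. This is a sketch under assumptions: a fixed 16-lane byte vector, a TBL model in which out-of-range indices yield zero, and hypothetical names (Vec, tbl, permute2). It is not the aarch64 expander.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;  // assumed fixed vector length
using Vec = std::array<std::uint8_t, kLanes>;

// Scalar model of TBL: out-of-range indices produce zero.
inline Vec tbl(const Vec& data, const Vec& idx) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = idx[i] < kLanes ? data[idx[i]] : std::uint8_t{0};
  return out;
}

inline Vec permute2(const Vec& a, const Vec& b, Vec sel) {
  for (auto& s : sel)
    s = static_cast<std::uint8_t>(s & (2 * kLanes - 1));  // AND: wrap selector
  const Vec from_a = tbl(a, sel);                          // TBL, first input
  for (auto& s : sel)
    s = static_cast<std::uint8_t>(s - kLanes);             // SUB: rebase
  const Vec from_b = tbl(b, sel);                          // TBL, second input
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = static_cast<std::uint8_t>(from_a[i] | from_b[i]);  // ORR
  return out;
}
```

Indices that select from the first input wrap to large values after the SUB and so contribute zero to the second TBL, which is why a plain ORR merges the two halves.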
However, if a VEC_PERM_EXPR has a constant selector, the indices can
be wider than the elements being permuted. This is not true for the
vec_perm optab, where the indices and permuted elements must have the
same precision.
This leads to one case where we cannot leave a general 2-input permutation
to be handled by the vec_perm optab: when permuting bytes on a target
with 2048-bit vectors. In that case, the indices of the elements in
the second vector are in the range [256, 511], which cannot be stored
in a byte index.
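The arithmetic behind that special case, as a small compile-time check (the helper names are hypothetical):

```cpp
#include <cassert>

constexpr unsigned elements_per_vector(unsigned vector_bits,
                                       unsigned element_bits) {
  return vector_bits / element_bits;
}

// Largest selector index needed to address two concatenated input vectors.
constexpr unsigned max_two_input_index(unsigned vector_bits,
                                       unsigned element_bits) {
  return 2 * elements_per_vector(vector_bits, element_bits) - 1;
}

// True when every index fits in an unsigned integer of element_bits bits.
// Assumes element_bits < 32.
constexpr bool indices_fit_in_element(unsigned vector_bits,
                                      unsigned element_bits) {
  return max_two_input_index(vector_bits, element_bits)
         < (1u << element_bits);
}

static_assert(max_two_input_index(2048, 8) == 511);
static_assert(!indices_fit_in_element(2048, 8));  // bytes, 2048-bit vectors
static_assert(indices_fit_in_element(1024, 8));   // [0, 255] still fits
static_assert(indices_fit_in_element(2048, 16));  // wider elements are fine
```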
TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
permutations for one specific case. Rather than check for that
specific case, the code went ahead and used the vec_perm expansion
whenever it worked. But that undermines the !allow_variable_p
handling in can_vec_perm_const_p; it becomes impossible for
target-independent code to distinguish "native" operations from
the worst-case fallback.
This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
cases that it has to handle. It fixes the PR for all vector lengths
except 2048 bits.
A better fix would be to introduce some sort of costing mechanism,
which would allow us to reject the new VEC_PERM_EXPR even for
2048-bit targets. But that would be a significant amount of work
and would not be backportable.
gcc/
PR target/121027
* config/aarch64/aarch64.cc (aarch64_evpc_sve_tbl): Punt on 2-input
operations that can be handled by vec_perm.
gcc/testsuite/
PR target/121027
* gcc.target/aarch64/sve/acle/general/perm_1.c: New test.
aarch64: Fix LD1Q and ST1Q failures for big-endian
LD1Q gathers and ST1Q scatters are unusual in that they operate
on 128-bit blocks (effectively VNx1TI). However, we don't have
modes or ACLE types for 128-bit integers, and 128-bit integers
are not the intended use case. Instead, the instructions are
intended to be used in "hybrid VLA" operations, where each 128-bit
block is an Advanced SIMD vector.
The normal SVE modes therefore capture the intended use case better
than VNx1TI would. For example, VNx2DI is effectively N copies
of V2DI, VNx4SI N copies of V4SI, etc.
Since there is only one LD1Q instruction and one ST1Q instruction,
the ACLE support used a single pattern for each, with the loaded or
stored data having mode VNx2DI. The ST1Q pattern was generated by:
where the force_lowpart_subreg bitcast the stored data to VNx2DI.
But such subregs require an element reverse on big-endian targets
(see the comment at the head of aarch64-sve.md), which wasn't the
intention. The code should have used aarch64_sve_reinterpret instead.
which always returns a VNx2DI value, leaving the caller to bitcast
that to the correct mode. That bitcast again uses subregs and has
the same problem as above.
However, for the reasons explained in the comment, using
aarch64_sve_reinterpret does not work well for LD1Q. The patch
instead parameterises the LD1Q based on the required data mode.
gcc/
* config/aarch64/aarch64-sve2.md (aarch64_gather_ld1q): Replace with...
(@aarch64_gather_ld1q<mode>): ...this, parameterizing based on mode.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svld1q_gather_impl::expand): Update accordingly.
(svst1q_scatter_impl::expand): Use aarch64_sve_reinterpret
instead of force_lowpart_subreg.
testsuite: Add -funwind-tables to sve*/pfalse* tests
The SVE svpfalse folding tests use CFI directives to delimit the
function bodies. That requires -funwind-tables to be enabled,
which is true by default for *-linux-gnu targets, but not for *-elf.
TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1
"hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions.
This matching was conditional on !BYTES_BIG_ENDIAN.
The ACLE code also lowered the associated SVE2.1 intrinsics into
suitable VEC_PERM_EXPRs. This lowering was not conditional on
!BYTES_BIG_ENDIAN.
The mismatch led to lots of ICEs in the ACLE tests on big-endian
targets: we lowered to VEC_PERM_EXPRs that are not supported.
I think the !BYTES_BIG_ENDIAN restriction was unnecessary.
SVE maps the first memory element to the least significant end of
the register for both endiannesses, so no endian correction or lane
number adjustment is necessary.
This is in some ways a bit counterintuitive. ZIPQ1 is conceptually
"apply Advanced SIMD ZIP1 to each 128-bit block" and endianness does
matter when choosing between Advanced SIMD ZIP1 and ZIP2. For example,
the V4SI permute selector { 0, 4, 1, 5 } corresponds to ZIP1 for little-
endian and ZIP2 for big-endian. But the difference between the hybrid
VLA and Advanced SIMD permute selectors is a consequence of the
difference between the SVE and Advanced SIMD element orders.
The same thing applies to ACLE intrinsics. The current lowering of
svzipq1 etc. is correct for both endiannesses. If ACLE code does:
2x svld1_s32 + svzipq1_s32 + svst1_s32
then the byte-for-byte result is the same for both endiannesses.
On big-endian targets, this is different from using the Advanced SIMD
sequence below for each 128-bit block:
depends on endianness, since the quadword gathers and scatters use
Advanced SIMD byte ordering for each 128-bit block. This gather/scatter
sequence behaves in the same way as the Advanced SIMD LDR+ZIP1+STR
sequence for both endiannesses.
Programmers writing ACLE code have to be aware of this difference
if they want to support both endiannesses.
The patch includes some new execution tests to verify the expansion
of the VEC_PERM_EXPRs.
aarch64: Fix endianness of DFmode vector constants
aarch64_simd_valid_imm tries to decompose a constant into a repeating
series of 64 bits, since most Advanced SIMD and SVE immediate forms
require that. (The exceptions are handled first.) It does this by
building up a byte-level register image, lsb first. If the image does
turn out to repeat every 64 bits, it loads the first 64 bits into an
integer.
At this point, endianness has mostly been dealt with. Endianness
applies to transfers between registers and memory, whereas at this
point we're dealing purely with register values.
However, one of the things we try is to bitcast the value to a float
and use FMOV. This involves splitting the value into 32-bit chunks
(stored as longs) and passing them to real_from_target. The problem
being fixed by this patch is that, when a value spans multiple 32-bit
chunks, real_from_target expects them to be in memory rather than
register order. Thus index 0 is the most significant chunk if
FLOAT_WORDS_BIG_ENDIAN and the least significant chunk otherwise.
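The chunk ordering can be sketched as below. chunks_in_memory_order is a hypothetical helper, and float_words_big_endian is an ordinary parameter standing in for the target macro; the sketch only models the ordering described above, not real_from_target itself.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Splits a 64-bit register image into two 32-bit chunks, ordered the way a
// memory-order consumer expects: index 0 is the most significant chunk when
// float words are big-endian, and the least significant chunk otherwise.
inline std::array<std::uint32_t, 2>
chunks_in_memory_order(std::uint64_t bits, bool float_words_big_endian) {
  const auto low = static_cast<std::uint32_t>(bits);
  const auto high = static_cast<std::uint32_t>(bits >> 32);
  std::array<std::uint32_t, 2> out{low, high};
  if (float_words_big_endian)
    out = {high, low};
  assert(((std::uint64_t{high} << 32) | low) == bits);
  return out;
}
```

For the bit pattern of the double 1.0 (0x3FF0000000000000), the nonzero chunk lands at index 1 in the little-endian case and at index 0 in the big-endian case.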
This fixes aarch64/sve/cond_fadd_1.c and various other tests
for aarch64_be-elf.
gcc/
* config/aarch64/aarch64.cc (aarch64_simd_valid_imm): Account
for FLOAT_WORDS_BIG_ENDIAN when building a floating-point value.
When using SVE INDEX to load an Advanced SIMD vector, we need to
take account of the different element ordering for big-endian
targets. For example, when big-endian targets store the V4SI
constant { 0, 1, 2, 3 } in registers, 0 becomes the most
significant element, whereas INDEX always operates from the
least significant element. A big-endian target would therefore
load V4SI { 0, 1, 2, 3 } using:
INDEX Z0.S, #3, #-1
rather than little-endian's:
INDEX Z0.S, #0, #1
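The adjustment can be written down as a small helper. IndexOperands and index_operands are hypothetical names; the sketch assumes a linear series given in GCC element order and an instruction that fills lanes starting from the least significant one.

```cpp
#include <cassert>

struct IndexOperands {
  long base;
  long step;
};

// For a series base, base + step, ... of nelts elements in GCC element
// order, returns the (base, step) pair for an instruction that always
// starts at the least significant lane.
inline IndexOperands index_operands(long base, long step, int nelts,
                                    bool big_endian) {
  assert(nelts > 0);
  if (!big_endian)
    return {base, step};
  // Element 0 is the most significant lane, so start from the last element
  // and walk backwards.
  return {base + (nelts - 1) * step, -step};
}
```

With base 0, step 1 and four elements this gives (3, -1) for big-endian and (0, 1) for little-endian, matching the two INDEX forms above.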
While there, I noticed that we would only check the first vector
in a multi-vector SVE constant, which would trigger an ICE if the
other vectors turned out to be invalid. This is pretty difficult to
trigger at the moment, since we only allow single-register modes to be
used as frontend & middle-end vector modes, but it can be seen using
the RTL frontend.
gcc/
* config/aarch64/aarch64.cc (aarch64_sve_index_series_p): New
function, split out from...
(aarch64_simd_valid_imm): ...here. Account for the different
SVE and Advanced SIMD element orders on big-endian targets.
Check each vector in a structure mode.
gcc/testsuite/
* gcc.dg/rtl/aarch64/vec-series-1.c: New test.
* gcc.dg/rtl/aarch64/vec-series-2.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_2.c: Fix expected
output for this big-endian test.
* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
* gcc.target/aarch64/sve/vec_init_3.c: Restrict to little-endian
targets and add more tests.
* gcc.target/aarch64/sve/vec_init_4.c: New big-endian version
of vec_init_3.c.
While working on a new testcase that uses the RTL frontend,
I hit a bug where a (reg ...) that spans multiple hard registers
had REG_NREGS set to 1. This caused various things to misbehave.
For example, if the (reg ...) in question was used as crtl->return_rtx,
only the first register in the group would be marked as live on exit.
gcc/
* read-rtl-function.cc (function_reader::read_rtx_operand_r): Use
hard_regno_nregs to work out REG_NREGS for hard registers.
aarch64: Fix ZIP1 order in aarch64_expand_vector_init [PR118891]
aarch64_expand_vector_init contains some divide-and-conquer code
that tries to load the odd and even elements into 64-bit registers
and then ZIP them together. On big-endian targets, the even elements
are more significant than the odd elements and so should come second
in the ZIP.
This fixes many execution failures on aarch64_be-elf, including
gcc.c-torture/execute/pr28982a.c.
gcc/
PR target/118891
* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Fix the
ZIP1 operand order for big-endian targets.
aarch64: Fix neon-sve-bridge.c failures for big-endian
Lowpart subregs are generally disallowed on big-endian SVE vector
registers, since the first memory element is stored at the least
significant end of the register, rather than the most significant end.
(See the comment at the head of aarch64-sve.md for details,
and aarch64_modes_compatible_p for the implementation.)
This means that arm_sve_neon_bridge.h needs to use custom define_insns
for big-endian targets, in lieu of using lowpart subregs. However,
one of those define_insns relied on the prohibited lowparts internally,
to convert an Advanced SIMD register to an SVE register. Since the
lowpart is not allowed, the lowpart_subreg would return null, leading
to a later ICE.
The simplest fix seems to be to use %Z instead, to force the Advanced
SIMD register to be written as an SVE register.
gcc/
* config/aarch64/aarch64-sve.md (@aarch64_sve_set_neonq_<mode>):
Use %Z instead of lowpart_subreg. Tweak formatting.
    if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ())
      {
        bit = subreg_lsb (dst).to_constant ();
        if (bit >= HOST_BITS_PER_WIDE_INT)
          bit = HOST_BITS_PER_WIDE_INT - 1;
        dst = SUBREG_REG (dst);
But a constant SUBREG_BYTE doesn't guarantee a constant subreg_lsb.
If the SUBREG_REG is a pair of N-bit registers on a big-endian target,
the most significant end has a SUBREG_BYTE of 0 but a subreg_lsb of N.
This N would then be non-constant for variable-length registers.
The patch fixes gcc.dg/torture/pr120276.c and other failures on
aarch64_be-elf.
gcc/
* ext-dce.cc (ext_dce_process_uses): Apply is_constant directly
to the subreg_lsb.
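
The mismatch between a constant SUBREG_BYTE and a non-constant subreg_lsb can be illustrated with a toy polynomial value. This is a sketch only: ToyPoly is a hypothetical stand-in for GCC's poly_int (value = coeff0 + coeff1 * x, with x unknown at compile time), and toy_subreg_lsb models just the plain big-endian case described above.

```cpp
#include <cassert>

// Toy stand-in for a poly_int: value = coeff0 + coeff1 * x, where x is
// the runtime vector-length multiplier.  This is not the real GCC API.
struct ToyPoly
{
  long coeff0;
  long coeff1;
  bool is_constant() const { return coeff1 == 0; }
};

ToyPoly operator-(ToyPoly a, ToyPoly b)
{
  return {a.coeff0 - b.coeff0, a.coeff1 - b.coeff1};
}

// Big-endian model: byte offset 0 selects the most significant end, so
// the least significant bit of the subreg is inner - outer - 8 * byte.
ToyPoly toy_subreg_lsb(ToyPoly inner_bits, ToyPoly outer_bits, ToyPoly byte)
{
  ToyPoly byte_bits{byte.coeff0 * 8, byte.coeff1 * 8};
  return inner_bits - outer_bits - byte_bits;
}
```

With an N-bit register where N = 128 + 128x and a 2N-bit inner value, a byte offset of 0 is constant but the resulting lsb is N, which is not. Testing is_constant on the lsb itself is therefore the check that matters.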
vect: Fix VEC_WIDEN_PLUS_HI/LO choice for big-endian [PR118891]
In the tree codes and optabs, the "hi" in a vector hi/lo pair means
"most significant" and the "lo" means "least significant", with
significance following GCC's normal endian expectations. Thus on
big-endian targets, the hi part handles the first half of the elements
in memory order and the lo part handles the second half.
For tree codes, supportable_widening_operation first chooses hi/lo
pairs based on little-endian order and then uses:
    if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
      std::swap (c1, c2);
to adjust. However, the handling for internal functions was missing
an equivalent fixup. This led to several execution failures in vect.exp
on aarch64_be-elf.
If the hi/lo code fails, the internal function handling goes on to try
even/odd. But I couldn't see anything obvious that would put the even/
odd results back into the right order later, so there might be a latent
bug there too.
gcc/
PR tree-optimization/118891
* tree-vect-stmts.cc (supportable_widening_operation): Swap the
hi and lo internal functions on big-endian targets.
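
As a minimal sketch of the hi/lo convention described above (hypothetical names, not the code in supportable_widening_operation): the pair is first chosen in little-endian order and then swapped for big-endian targets, so that the hi part ends up handling the first half of the elements in memory order.

```cpp
#include <cassert>
#include <utility>

// "Lo" means least significant, "Hi" means most significant.
enum class Half { Lo, Hi };

struct HalfPair
{
  Half first_in_memory;   // operation applied to the first half of the elements
  Half second_in_memory;  // operation applied to the second half
};

HalfPair choose_halves(bool bytes_big_endian)
{
  // Start from little-endian order: lo covers the first elements in memory.
  Half c1 = Half::Lo;
  Half c2 = Half::Hi;
  // The same fixup has to be applied on every path that picks a hi/lo
  // pair; the bug was a path that skipped it.
  if (bytes_big_endian)
    std::swap(c1, c2);
  return {c1, c2};
}
```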
Martin Jambor [Mon, 23 Jun 2025 21:52:20 +0000 (23:52 +0200)]
rust: Silence a clang warning in borrow-checker-diagnostics
When compiling
gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
with clang, it emits the following warning:
gcc/rust/checks/errors/borrowck/rust-borrow-checker-diagnostics.cc:145:46: warning: non-constant-expression cannot be narrowed from type 'Polonius::Loan' (aka 'unsigned long') to 'uint32_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing]
I'd hope that for indexing that is never really a problem,
nevertheless if narrowing is taking place, I guess it can be argued it
should be made explicit.
gcc/rust/ChangeLog:
2025-06-23 Martin Jambor <mjambor@suse.cz>
* checks/errors/borrowck/rust-borrow-checker-diagnostics.cc
(BorrowCheckerDiagnostics::get_loan): Type cast loan to uint32_t.
* checks/errors/borrowck/rust-bir-place.h
(IndexVec::size_type): Add.
(IndexVec::MAX_INDEX): Add.
(IndexVec::size): Change the return type to the type of the
internal value used by the index type.
(PlaceDB::lookup_or_add_variable): Use the return value from the
PlaceDB::add_place call.
* checks/errors/borrowck/rust-bir.h
(struct BasicBlockId): Move this definition before the
definition of the struct Function.
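
The kind of construct clang complains about, and the explicit cast that silences it, can be sketched as follows. The types here (Loan, LoanIndex, to_index) are hypothetical stand-ins, not the borrow checker's actual definitions, and the range assert is an addition for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; the real types live in the borrow checker.
using Loan = unsigned long;

struct LoanIndex
{
  std::uint32_t value;
};

LoanIndex to_index(Loan loan)
{
  // LoanIndex{loan} would be a narrowing conversion inside braces, which
  // clang reports under -Wc++11-narrowing.  The cast states the intent.
  assert(loan <= UINT32_MAX);
  return LoanIndex{static_cast<std::uint32_t>(loan)};
}
```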
Thomas Schwinge [Sat, 19 Apr 2025 13:49:34 +0000 (15:49 +0200)]
Disable parallel testing for 'rust/compile/nr2/compile.exp' [PR119508]
..., using the standard idiom. This '*.exp' file doesn't adhere to the
parallel testing protocol as defined in 'gcc/testsuite/lib/gcc-defs.exp'.
This also restores proper behavior for '*.exp' files executing after (!) this
one; previously it erroneously caused hundreds or even thousands of individual
test cases to get duplicated vs. skipped, randomly, depending on the '-jN' level.
Besides this commit working as a release-branch fix for the
PR, code inspection shows slightly better code for TImode
libgcc functions, and a modified
gcc.c-torture/execute/arith-rand-ll.c (basically s/long
long/__int128 and cutting out the non-128-bit cases) shows a
1.4% improvement. (Coremark code is identical, as
expected.)
Richard Biener [Fri, 18 Jul 2025 07:02:09 +0000 (09:02 +0200)]
tree-optimization/120924 - up --param uninit-max-chain-len
The PR shows that the uninit analysis limits are set too low in
cases where we lower switches to ifs, as happens on s390x for a linux
kernel TU. This causes false positive uninit diagnostics as we
abort the attempt to prove that a value is initialized on all
paths. The new testcase would only require upping to 9.
PR tree-optimization/120924
* params.opt (uninit-max-chain-len): Up from 8 to 12.
Richard Biener [Mon, 14 Jul 2025 12:09:28 +0000 (14:09 +0200)]
tree-optimization/121059 - fixup loop mask query
When we opportunistically mask an operand of an AND with an already
available loop mask we need to query that set with the correct number
of masks we expect.
PR tree-optimization/121059
* tree-vect-stmts.cc (vectorizable_operation): Query
scalar_cond_masked_set with the correct number of masks.
Richard Biener [Wed, 16 Jul 2025 13:07:58 +0000 (15:07 +0200)]
tree-optimization/121049 - avoid loop masking with even/odd reduction
The following disables loop masking when we are using an even/odd
widening operation in a reduction because the loop mask then aligns
to the wrong elements.
PR tree-optimization/121049
* internal-fn.h (widening_evenodd_fn_p): Declare.
* internal-fn.cc (widening_evenodd_fn_p): New function.
* tree-vect-stmts.cc (vectorizable_conversion): When using
an even/odd widening function disable loop masking.
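
Why a loop mask can line up with the wrong elements is easiest to see in a toy model. The sketch below is an assumption-laden illustration, not the vectorizer's data layout: it treats the loop mask as built for contiguous halves (wide lane j of half h covers scalar element 4h + j) and reuses those masks on even/odd results. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Narrow = std::array<int, 8>;
using Wide = std::array<long, 4>;
using Mask = std::array<bool, 4>;

Wide widen_lo(const Narrow &v) { return {v[0], v[1], v[2], v[3]}; }
Wide widen_hi(const Narrow &v) { return {v[4], v[5], v[6], v[7]}; }
Wide widen_even(const Narrow &v) { return {v[0], v[2], v[4], v[6]}; }
Wide widen_odd(const Narrow &v) { return {v[1], v[3], v[5], v[7]}; }

// Mask for contiguous halves: lane j of half h covers scalar element 4h + j.
Mask contiguous_mask(std::size_t half, std::size_t active)
{
  Mask m{};
  for (std::size_t j = 0; j < 4; ++j)
    m[j] = half * 4 + j < active;
  return m;
}

long masked_sum(const Wide &v, const Mask &m)
{
  long sum = 0;
  for (std::size_t j = 0; j < 4; ++j)
    if (m[j])
      sum += v[j];
  return sum;
}

// Masks match the element layout: sums exactly the active elements.
long sum_lo_hi(const Narrow &v, std::size_t active)
{
  return masked_sum(widen_lo(v), contiguous_mask(0, active))
         + masked_sum(widen_hi(v), contiguous_mask(1, active));
}

// Same masks applied to even/odd results: the wrong elements are selected.
long sum_even_odd_same_masks(const Narrow &v, std::size_t active)
{
  return masked_sum(widen_even(v), contiguous_mask(0, active))
         + masked_sum(widen_odd(v), contiguous_mask(1, active));
}
```

With the values 1 to 8 and five active elements, the lo/hi variant sums 1+2+3+4+5, while the even/odd variant picks up 1, 3, 5, 7 and 2 instead. In this model the simplest safe choice is the one the patch takes: do not mask the loop when an even/odd widening operation feeds the reduction.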
Richard Biener [Wed, 16 Jul 2025 18:19:44 +0000 (20:19 +0200)]
tree-optimization/121035 - handle stray VN values without expression
When VN iterates we can end up with unreachable inserted expressions
in the expression tables which in turn will not be added to their
value by PRE's compute_avail. This will later ICE when we pick
them up and want to generate them. Deal with this by giving up.
PR tree-optimization/121035
* tree-ssa-pre.cc (find_or_generate_expression): Handle
values without expression.
Gaius Mulley [Fri, 18 Jul 2025 07:48:22 +0000 (08:48 +0100)]
[PATCH] [PR modula2/117203] Followup add Delete procedure function
This patch provides GetFileName procedure function for
FIO.File, FileSystem.File and IOChan.ChanId. The
return result from these procedures can be passed into
StringFileSysOp.Unlink to complete the required delete.
gcc/m2/ChangeLog:
PR modula2/117203
* gm2-libs-log/FileSystem.def (GetFileName): New
procedure function.
(WriteString): New procedure.
* gm2-libs-log/FileSystem.mod (GetFileName): New
procedure function.
(WriteString): New procedure.
* gm2-libs/SFIO.def (GetFileName): New procedure function.
* gm2-libs/SFIO.mod (GetFileName): New procedure function.
* gm2-libs-iso/IOChanUtils.def: New file.
* gm2-libs-iso/IOChanUtils.mod: New file.
PR modula2/117203
* gm2/isolib/run/pass/testdelete2.mod: New test.
* gm2/pimlib/logitech/run/pass/testdelete2.mod: New test.
* gm2/pimlib/run/pass/testdelete.mod: New test.
Jakub Jelinek [Fri, 18 Jul 2025 07:20:30 +0000 (09:20 +0200)]
gimple-fold: Fix up big endian _BitInt adjustment [PR121131]
The following testcase ICEs because SCALAR_INT_TYPE_MODE of course
doesn't work for large BITINT_TYPE types which have BLKmode.
native_encode* as well as e.g. r14-8276 use in cases like these
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE ()) and TREE_INT_CST_LOW (TYPE_SIZE_UNIT
()) for the BLKmode ones.
In this case, it wants bits rather than bytes, so I've used
GET_MODE_BITSIZE like before and TYPE_SIZE otherwise.
Furthermore, the patch only computes encoding_size for big endian
targets, for little endian we don't really adjust anything, so there
is no point computing it.
2025-07-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/121131
* gimple-fold.cc (fold_nonarray_ctor_reference): Use
TREE_INT_CST_LOW (TYPE_SIZE ()) instead of
GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE ()) for BLKmode BITINT_TYPEs.
Don't compute encoding_size at all for little endian targets.
Gaius Mulley [Fri, 18 Jul 2025 07:02:52 +0000 (08:02 +0100)]
[PATCH] [PR modula2/120731] error in Strings.Pos causing sigsegv
This patch corrects the m2log library procedure function
Strings.Pos which incorrectly sliced the wrong component
of the source string. The incorrect slice could cause
a sigsegv if negative slice indices were generated.
Gaius Mulley [Fri, 18 Jul 2025 00:47:40 +0000 (01:47 +0100)]
[PATCH] PR modula2/120673: Mutually dependent types crash the compiler
This patch fixes an ICE which will occur if cyclic dependent types
are used when declaring a variable. This patch detects the
cyclic dependency and issues an error message for each outstanding
component.
gcc/m2/ChangeLog:
PR modula2/120673
* gm2-compiler/M2GCCDeclare.mod (ErrorDepList): New
global variable set containing every errant dependency symbol.
(mystop): Remove.
(EmitCircularDependancyError): Replace with ...
(EmitCircularDependencyError): ... this.
(AssertAllTypesDeclared): Rewrite.
(DoVariableDeclaration): Ditto.
(TypeDependentsDeclared): New procedure function.
(PrepareGCCVarDeclaration): Ditto.
(DeclareVariable): Remove assert.
(DeclareLocalVariable): Ditto.
(Constructor): Initialize ErrorDepList.
* gm2-compiler/M2MetaError.mod (doErrorScopeProc): Rewrite
and ensure that a symbol with a module scope does not lookup
from a definition module.
* gm2-compiler/P2SymBuild.mod (BuildType): Rewrite so that
a synonym type is created using the token referring to the name
on the lhs.
gcc/testsuite/ChangeLog:
PR modula2/120673
* gm2/pim/fail/badmodvar.mod: New test.
* gm2/pim/fail/cyclictypes.mod: New test.
* gm2/pim/fail/cyclictypes2.mod: New test.
* gm2/pim/fail/cyclictypes4.mod: New test.
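
One way to find the "outstanding" symbols of a cyclic type dependency is to keep declaring every type whose dependents are already declared and report whatever is left. The sketch below shows that general technique only; it is not a translation of M2GCCDeclare.mod, and DependencyMap and outstanding_symbols are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each type name to the type names it depends on.  Every dependency
// is assumed to appear as a key as well.
using DependencyMap = std::map<std::string, std::vector<std::string>>;

// Returns the symbols that can never be declared because they sit on, or
// depend on, a dependency cycle.
std::set<std::string> outstanding_symbols(const DependencyMap &deps)
{
  std::set<std::string> declared;
  bool progress = true;
  while (progress)
    {
      progress = false;
      for (const auto &entry : deps)
        {
          if (declared.count(entry.first))
            continue;
          bool ready = true;
          for (const std::string &dep : entry.second)
            if (!declared.count(dep))
              ready = false;
          if (ready)
            {
              declared.insert(entry.first);
              progress = true;
            }
        }
    }

  // Whatever could not be declared is reported, one entry per symbol.
  std::set<std::string> outstanding;
  for (const auto &entry : deps)
    if (!declared.count(entry.first))
      outstanding.insert(entry.first);
  return outstanding;
}
```

A caller would then issue one error per element of the returned set instead of asserting that every type has been declared.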
Gaius Mulley [Thu, 17 Jul 2025 19:41:10 +0000 (20:41 +0100)]
[PATCH] PR modula2/120606: FOR loop ICE if the last expression uses an array
This patch fixes the ICE which occurs if the last expression is an array.
It ensures that the start and end values of the for loop expressions are
dereferenced.
gcc/m2/ChangeLog:
PR modula2/120606
* gm2-compiler/M2Quads.mod (ForLoopLastIterator): Dereference
start and end expressions e1 and e2 respectively.
gcc/testsuite/ChangeLog:
PR modula2/120606
* gm2/pim/pass/forarray.mod: New test.
Gaius Mulley [Thu, 17 Jul 2025 16:51:03 +0000 (17:51 +0100)]
[PATCH] [PR modula2/119650, PR modula2/117203]: WriteString and Delete are missing from base libraries
This patch introduces a Write procedure for an array of char,
the string and char datatype. It uses the m2r10 style of
naming the module on the datatype. This uncovered a bug
in the import handling inside Quadident. It also includes
an Unlink procedure from a new module FileSysOp and a String
interface to this module.
gcc/m2/ChangeLog:
PR modula2/119650
PR modula2/117203
* gm2-compiler/P2Build.bnf (CheckModuleQualident): New
procedure.
(Qualident): Rewrite.
* gm2-compiler/P3Build.bnf (PushTFQualident): New procedure.
(CheckModuleQualident): Ditto.
(Qualident): Rewrite.
* gm2-compiler/PCBuild.bnf (PushTFQualident): New procedure.
(CheckModuleQualident): Ditto.
(Qualident): Rewrite.
* gm2-compiler/PHBuild.bnf (PushTFQualident): New procedure.
(CheckModuleQualident): Ditto.
(Qualident): Rewrite.
* gm2-libs/ARRAYOFCHAR.def: New file.
* gm2-libs/ARRAYOFCHAR.mod: New file.
* gm2-libs/CFileSysOp.def: New file.
* gm2-libs/CHAR.def: New file.
* gm2-libs/CHAR.mod: New file.
* gm2-libs/FileSysOp.def: New file.
* gm2-libs/FileSysOp.mod: New file.
* gm2-libs/String.def: New file.
* gm2-libs/String.mod: New file.
* gm2-libs/StringFileSysOp.def: New file.
* gm2-libs/StringFileSysOp.mod: New file.
libgm2/ChangeLog:
PR modula2/119650
PR modula2/117203
* libm2pim/Makefile.am (M2MODS): Add ARRAYOFCHAR,
CHAR.mod, StringFileSysOp.mod and String.mod.
(M2DEFS): Add ARRAYOFCHAR, CHAR.mod,
StringFileSysOp.mod and String.mod.
(libm2pim_la_SOURCES): Add CFileSysOp.c.
* libm2pim/Makefile.in: Regenerate.
* libm2pim/CFileSysOp.cc: New file.
gcc/testsuite/ChangeLog:
PR modula2/119650
* gm2/iso/fail/CHAR.mod: New test.
* gm2/iso/run/pass/CHAR.mod: New test.
* gm2/iso/run/pass/importself.mod: New test.
* gm2/pimlib/run/pass/testwrite.mod: New test.
* gm2/pimlib/run/pass/testwritechar.mod: New test.
Gaius Mulley [Thu, 17 Jul 2025 12:57:52 +0000 (13:57 +0100)]
[PATCH] PR modula2/120542: Return statement in the main procedure crashes the compiler
The patch checks whether a return statement is allowed. It also checks
to see that a return expression is allowed.
gcc/m2/ChangeLog:
PR modula2/120542
* gm2-compiler/M2Quads.mod (BuildReturnLower): New procedure.
(BuildReturn): Allow return without an expression from
module initialization blocks. Generate an error if an
expression is provided. Call BuildReturnLower if no error
was seen.
gcc/testsuite/ChangeLog:
PR modula2/120542
* gm2/iso/fail/badreturn.mod: New test.
* gm2/iso/fail/badreturn2.mod: New test.
* gm2/iso/pass/modulereturn.mod: New test.
* gm2/iso/pass/modulereturn2.mod: New test.
Gaius Mulley [Wed, 16 Jul 2025 20:17:51 +0000 (21:17 +0100)]
[PATCH] PR modula2/120497: error is generated for good code when returning a pointer var variable
The return type checking needs to skip over the Lvalue part of the VAR
parameter or variable.
gcc/m2/ChangeLog:
PR modula2/120497
* gm2-compiler/M2Range.mod (IsAssignmentCompatible): Remove from
import list.
(FoldTypeReturnFunc): Rewrite to skip the Lvalue of a var
variable.
(CodeTypeReturnFunc): Ditto.
(CodeTypeIndrX): Call AssignmentTypeCompatible rather than
IsAssignmentCompatible.
(FoldTypeIndrX): Ditto.
gcc/testsuite/ChangeLog:
PR modula2/120497
* gm2/pim/pass/ReturnType.mod: New test.
* gm2/pim/pass/ReturnType2.mod: New test.
Gaius Mulley [Wed, 16 Jul 2025 18:33:37 +0000 (19:33 +0100)]
[PATCH] PR modula2/120389 Assigning wrong type to an array causes an ICE
Although cherry picked as described, the cherry pick does not include
the command option (-fm2-strict-type-reason) introduced in
gcc/m2/gm2-lang.cc, gcc/m2/lang.opt and gcc/doc/gm2.texi by the
original patch.
This patch provides follow on fixes for undetected type violations
which can occur when Lvalues are generated during assignment,
for example in array accesses and with statements. The type checker
M2Check.mod has been overhauled and cleaned up.
gcc/m2/ChangeLog:
PR modula2/120389
* gm2-compiler/M2Check.def (AssignmentTypeCompatible): Add new
parameter enableReason.
* gm2-compiler/M2Check.mod (EquivalenceProcedure): New type.
(falseReason2): New procedure function.
(falseReason1): Ditto.
(falseReason0): Ditto.
(checkTypeEquivalence): Rewrite.
(checkUnboundedArray): Ditto.
(checkUnbounded): Ditto.
(checkArrayTypeEquivalence): Ditto.
(checkCharStringTypeEquivalence): Ditto.
(buildError4): Add false reason.
(buildError2): Ditto.
(IsTyped): Use GetDType.
(IsTypeEquivalence): New procedure function.
(checkVarTypeEquivalence): Ditto.
(checkVarEquivalence): Rewrite.
(checkConstMeta): Ditto.
(checkEnumField): New procedure function.
(checkEnumFieldEquivalence): Ditto.
(checkSubrangeTypeEquivalence): Rewrite.
(checkSystemEquivalence): Ditto.
(checkTypeKindViolation): Ditto.
(doCheckPair): Ditto.
(InitEquivalenceArray): New procedure.
(addEquivalence): Ditto.
(checkProcType): Rewrite.
(deconstruct): Deallocate reason string.
(AssignmentTypeCompatible): Initialize reason and reasonEnable
fields.
(ParameterTypeCompatible): Ditto.
(doExpressionTypeCompatible): Ditto.
* gm2-compiler/M2GenGCC.mod (CodeIndrX): Rewrite.
(CheckBinaryExpressionTypes): Rewrite and simplify now that the
type checker is more robust.
(CheckElementSetTypes): Ditto.
(CodeXIndr): Add new range assignment type check.
* gm2-compiler/M2MetaError.def: Correct comments.
* gm2-compiler/M2Options.def (SetStrictTypeAssignment): New procedure.
(SetStrictTypeReason): Ditto.
* gm2-compiler/M2Options.mod (SetStrictTypeAssignment): New procedure.
(SetStrictTypeReason): Ditto.
(StrictTypeReason): Initialize.
(StrictTypeAssignment): Ditto.
* gm2-compiler/M2Quads.mod (CheckBreak): Delete.
(BreakQuad): New global variable.
(BreakAtQuad): Delete.
(gdbhook): New procedure.
(BreakWhenQuadCreated): Ditto.
(CheckBreak): Ditto.
(Init): Call BreakWhenQuadCreated and gdbhook.
(doBuildAssignment): Add type assignment range check.
(CheckProcTypeAndProcedure): Only check if the procedure
types differ.
(doIndrX): Add type IndrX range check.
(CheckReturnType): Add range return type check.
* gm2-compiler/M2Range.def (InitTypesIndrXCheck): New procedure
function.
(InitTypesReturnTypeCheck): Ditto.
* gm2-compiler/M2Range.mod (InitTypesIndrXCheck): New procedure
function.
(InitTypesReturnTypeCheck): Ditto.
(HandlerExists): Add new clauses.
(FoldAssignment): Pass extra FALSE parameter to
AssignmentTypeCompatible.
(FoldTypeReturnFunc): New procedure.
(FoldTypeAssign): Ditto.
(FoldTypeIndrX): Ditto.
(CodeTypeAssign): Rewrite.
(CodeTypeIndrX): New procedure.
(CodeTypeReturnFunc): Ditto.
(FoldTypeCheck): Add new case clauses.
(CodeTypeCheck): Ditto.
(FoldRangeCheckLower): Ditto.
(IssueWarning): Ditto.
* gm2-gcc/m2options.h (M2Options_SetStrictTypeAssignment): New
function prototype.
(M2Options_SetStrictTypeReason): Ditto.
gcc/testsuite/ChangeLog:
PR modula2/120389
* gm2/pim/fail/testcharint.mod: New test.
* gm2/pim/fail/testindrx.mod: New test.
* gm2/pim/pass/testxindr.mod: New test.
* gm2/pim/pass/testxindr2.mod: New test.
* gm2/pim/pass/testxindr3.mod: New test.
i386: Decouple AMX-AVX512 from AVX10.2 and imply AVX512F
In ISE058, the AVX10.2 implication is removed from AMX-AVX512. This
leads to reconsidering what AMX-AVX512 should imply.
Since it is using zmm register and using zmm register only, we
need to at least imply AVX512F. AVX512VL is not needed.
On the other hand, if we imply AVX10.1 for AMX-AVX512, it will
cause -mno-avx10.1 disabling AMX-AVX512. This would be a surprise
for users.
Based on the two reasons above, the patch decouples AMX-AVX512
from AVX10.2 and makes it imply AVX512F.
gcc/ChangeLog:
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AMX_AVX512_SET): Do not set AVX10.2.
(OPTION_MASK_ISA2_AVX10_2_UNSET): Remove AMX-AVX512 unset.
(OPTION_MASK_ISA2_AVX512F_UNSET): Unset AMX-AVX512.
(ix86_handle_option): Imply AVX512F for AMX-AVX512.
gcc/testsuite/ChangeLog:
* gcc.target/i386/amxavx512-cvtrowd2ps-2.c: Add -mavx512fp16 to
use FP16 related intrins for convert.
* gcc.target/i386/amxavx512-cvtrowps2bf16-2.c: Ditto.
* gcc.target/i386/amxavx512-cvtrowps2ph-2.c: Ditto.
* gcc.target/i386/amxavx512-movrow-2.c: Ditto.
Gaius Mulley [Tue, 15 Jul 2025 18:38:04 +0000 (19:38 +0100)]
[PATCH] PR modula2/120389 ICE if assigning a constant char to an integer array
This patch fixes an ICE which occurs if a constant char is assigned
into an integer array. The fix is to introduce type checking in
M2GenGCC.mod:CodeXIndr.
gcc/m2/ChangeLog:
PR modula2/120389
* gm2-compiler/M2GenGCC.mod (CodeXIndr): Check to see that
the type of left is assignment compatible with the type of
right.
gcc/testsuite/ChangeLog:
PR modula2/120389
* gm2/iso/fail/badarray3.mod: New test.
openmp, fortran: Fix ICE when the procedure name cannot be found in declare variant directives [PR104428]
The result of searching for the procedure name symbol should be checked in
case the symbol cannot be found to avoid a null dereference.
gcc/fortran/
PR fortran/104428
* trans-openmp.cc (gfc_trans_omp_declare_variant): Check that proc_st
is non-NULL before dereferencing. Add line number to error message.
Fortran: Ensure finalizers are created correctly [PR120637]
Finalize_component freed an expression that it used to remember which
components in which context it had already finalized. While it makes
sense to free the copy of the expression if it is unused, doing so causes
issues when comparing to a non-existent expression. This is now
detected by returning true when the expression has been used.
PR fortran/120637
gcc/fortran/ChangeLog:
* class.cc (finalize_component): Return true when a finalizable
component was detected and do not free it.
Andrew Pinski [Sun, 6 Jul 2025 17:20:26 +0000 (10:20 -0700)]
crc: Error out on non-constant poly arguments for the crc builtins [PR120709]
These builtins require a constant integer for the third argument, but currently
there is an assert rather than an error. This fixes that and updates the documentation too.
It uses the same terms as were being used for the __builtin_prefetch arguments.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/120709
gcc/ChangeLog:
* builtins.cc (expand_builtin_crc_table_based): Error out
instead of asserting the 3rd argument is an integer constant.
* internal-fn.cc (expand_crc_optab_fn): Likewise.
* doc/extend.texi (crc): Document requirement of the poly argument
being a constant.
aarch64: PR target/120999: Adjust operands for movprfx alternative of NBSL implementation of NOR
While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
due to its tied operands, the destination of the movprfx cannot be also
a source operand. But the offending pattern in aarch64-sve2.md tries
to do exactly that for the "=?&w,w,w" alternative and gas warns for the
attached testcase.
This patch adjusts that alternative to avoid taking operand 0 as an input
in the NBSL again.
So for the testcase in the patch we now generate:
nor_z:
        movprfx z0, z1
        nbsl    z0.d, z0.d, z2.d, z1.d
        ret
instead of the previous:
nor_z:
        movprfx z0, z1
        nbsl    z0.d, z0.d, z2.d, z0.d
        ret
which generated a gas warning.
Bootstrapped and tested on aarch64-none-linux-gnu.
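
That the adjusted operands still compute a NOR can be checked with a bitwise model of a single 64-bit lane. This is a sketch under the assumption that NBSL selects bits of its first source where the third source is 1, bits of its second source where it is 0, and then inverts the result; nbsl and nor_via_nbsl are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise model of NBSL on one 64-bit lane: take bits of zdn where zk is
// 1 and bits of zm where zk is 0, then invert the result.
std::uint64_t nbsl(std::uint64_t zdn, std::uint64_t zm, std::uint64_t zk)
{
  return ~((zdn & zk) | (zm & ~zk));
}

// NOR as emitted after the fix: movprfx copies a into the destination,
// and the third source reads the original register, not the destination.
std::uint64_t nor_via_nbsl(std::uint64_t a, std::uint64_t b)
{
  std::uint64_t dest = a;   // movprfx z0, z1
  return nbsl(dest, b, a);  // nbsl z0.d, z0.d, z2.d, z1.d
}
```

Since the destination holds a copy of a, (a & a) | (b & ~a) reduces to a | b, so the inverted result is ~(a | b) whichever of the two registers supplies the third source. The fix therefore changes only which register is named, not the value computed.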
Richard Earnshaw [Mon, 14 Apr 2025 15:41:16 +0000 (16:41 +0100)]
aarch64: Fix up commutative and early-clobber markers on compact insns
For constraints there are operand modifiers and constraint qualifiers.
Operand modifiers apply to all alternatives and must appear, in
traditional syntax before the first alternative. Constraint
qualifiers, on the other hand must appear in each alternative to which
they apply.
There's no easy way to validate the distinction in the traditional md
format, but when using the new compact format we can enforce some
semantic checking of these characters to avoid some potentially
surprising code generation.
Fortunately, all of these errors are benign, but the two misplaced
early-clobber markers were quite suspicious at first sight - it's only
by luck that the second alternative does not need an early-clobber.
The syntax checking will be added in the following patch, but first of
all, fix up the errors in aarch64.md.
Jeff Law [Mon, 30 Jun 2025 20:38:33 +0000 (14:38 -0600)]
[committed] [PR rtl-optimization/120242] Fix SUBREG_PROMOTED_VAR_P after ext-dce's actions
I've gone back and forth on these problems multiple times. We have two passes,
ext-dce and combine which eliminate extensions using totally different
mechanisms.
ext-dce looks for cases where the state of upper bits in an object aren't
observable and if they aren't observable, then eliminates extensions which set
those bits.
combine looks for cases where we know the state of the upper bits and can prove
an extension is just setting those bits to their prior value. Combine also
looks for cases where the precise extension isn't really important, just the
knowledge that the upper bits are zero or sign extended from a narrower mode
is needed.
Combine relies heavily on the SUBREG_PROMOTED_VAR state to do its job. If the
actions of ext-dce (or any other pass for that matter) make
SUBREG_PROMOTED_VAR's state inconsistent with combine's expectations, then
combine can end up generating incorrect code.
--
When ext-dce eliminates an extension, it turns it into a subreg copy (without
any known SUBREG_PROMOTED_VAR state). Since we can no longer guarantee the
destination object has any known extension state, we scurry around and wipe
SUBREG_PROMOTED_VAR state for the destination object.
That's fine and dandy, but ultimately insufficient. Consider if the
destination of the optimized extension was used as a source in a simple copy
insn. Furthermore assume that the destination of that copy is used within a
SUBREG expression with SUBREG_PROMOTED_VAR set. ext-dce's actions have
clobbered the SUBREG_PROMOTED_VAR state on the destination of that copy, albeit
indirectly.
This patch addresses this problem by taking the set of pseudos directly
impacted by ext-dce's actions and expands that set by building a transitive
closure for pseudos connected via copies. We then scurry around finding
SUBREG_PROMOTED_VAR state to wipe for everything in that expanded set of
pseudos. Voila, everything just works.
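
The closure step can be sketched with a simple worklist. This is an illustration of the idea only, not the body of expand_changed_pseudos: registers are plain integers, copies are (source, destination) pairs, and propagation is assumed to run from the source of a copy to its destination. The names Reg, Copy and expand_over_copies are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Reg = unsigned;
using Copy = std::pair<Reg, Reg>;  // (source, destination) of a simple copy

// Expand the set of directly changed pseudos with every pseudo reachable
// from it through a chain of copies.
std::set<Reg> expand_over_copies(std::set<Reg> changed,
                                 const std::vector<Copy> &copies)
{
  std::vector<Reg> worklist(changed.begin(), changed.end());
  while (!worklist.empty())
    {
      Reg reg = worklist.back();
      worklist.pop_back();
      for (const Copy &copy : copies)
        // insert() reports whether the destination is new to the set, so
        // each register is queued at most once.
        if (copy.first == reg && changed.insert(copy.second).second)
          worklist.push_back(copy.second);
    }
  return changed;
}
```

Everything in the returned set would then have its SUBREG_PROMOTED_VAR state wiped, including pseudos that are only connected to the optimized extension through one or more copies.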
--
The other approach here would be to further expand the liveness sets inside
ext-dce. That's a simpler path forward, but ultimately regresses the quality
of code we do care about.
One good piece of news is that with the transitive closure bits in place, we
can eliminate a bit of the live set expansion we had in place for
SUBREG_PROMOTED_VAR objects.
--
So let's take one case of the 5 that have been reported.
Combine will do its thing on insns 30/31. Essentially the sign extension is
not necessary in this context, assuming the promoted subreg status in insn 30
-- the equality test doesn't really care about the kind of extension, just
knowing the value is extended is enough to safely elide the extension.
And now we've come to the crux of the problem. That promotion state needs to be
adjusted. The new ext-dce code will see that copy at insn 26 and add (reg 144)
to the set of registers that need promotion state wiped. And everything is
happy after that.
The other cases are similar in nature.
--
This has been bootstrapped and regression tested on x86_64 and aarch64.
Variants have bootstrapped & regression tested on several other platforms and
it's survived testing on the crosses as well.
* ext-dce.cc (ext_dce_process_uses): Remove some cases where we
unnecessarily expanded live sets for promoted subregs.
(expand_changed_pseudos): New function.
(reset_subreg_promoted_p): Use it.
RISC-V: prefetch: fix LRA failing to allocate reg [PR118241]
prefetch was recently fixed/tightened (with Q reg constraint) to only
support the right address patterns (REG or REG+D with lower 5 bits clear).
However in some cases that's too restrictive for LRA and it fails to
allocate a reg, resulting in the following ICE...
Jeff Law [Sat, 21 Jun 2025 14:24:58 +0000 (08:24 -0600)]
[RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V
The RISC-V prefetch support is broken in a few ways. This addresses the data
side prefetch problems. I'd mistakenly thought this BZ was prefetch.i
related (which has deeper problems).
The basic problem is we were accepting any valid address when in fact there are
restrictions. This patch more precisely defines the predicate such that we
allow
REG
REG+D
Where D must have the low 5 bits clear. Note that absolute addresses fall into
the REG+D form using the x0 for the register operand since it always has the
value zero. The test verifies REG, REG+D, ABS addressing modes that are valid
as well as REG+D and ABS which must be reloaded into a REG because the
displacement has low bits set.
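
The displacement rule can be written down as a one-line check. This is a sketch of the condition stated above, not the prefetch_operand predicate itself: it looks only at the low 5 bits and does not model the displacement range or the RTL form of the address. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if a displacement can be used directly in REG+D form, i.e. its
// low 5 bits are clear.  A plain REG address corresponds to D == 0, and
// an absolute address to x0 plus D.
bool prefetch_disp_ok(std::int64_t disp)
{
  // Work on the unsigned representation so that negative displacements
  // are handled without relying on signed bitwise behaviour.
  return (static_cast<std::uint64_t>(disp) & 0x1f) == 0;
}
```

An address that fails this check, such as REG+36, has to be reloaded into a register first so that the prefetch sees a plain REG operand.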
An earlier version of this patch has gone through testing in my tester on rv32
and rv64. Obviously I'll wait for pre-commit CI to do its thing before moving
forward.
This is a good backport candidate after simmering on the trunk for a bit.
PR target/118241
gcc/
* config/riscv/predicates.md (prefetch_operand): New predicate.
* config/riscv/constraints.md (Q): New constraint.
* config/riscv/riscv.md (prefetch): Use new predicate and constraint.
(riscv_prefetchi_<mode>): Similarly.
gcc/testsuite/
* gcc.target/riscv/pr118241.c: New test.