The initial signal handling code introduced for aarch64-android
overlooked details of the tasking runtime, which was outside the
initial testing perimeter.
Specifically, a reference to __gnat_sigtramp from __gnat_error_handler,
initially introduced for the arm port, was prevented if !arm on the
grounds that other ports would rely on kernel CFI. aarch64-android
does provide kernel CFI, so __gnat_sigtramp was not provided for this
configuration.
But there is a similar reference from s-intman__android, which kicks in
as soon as the tasking runtime gets activated, triggering link failures.
Testing for more precise target specific parameters from Ada
code is inconvenient and replicating the logic is not attractive in
any case, so this change addresses the problem in the following
fashion:
- Always provide a __gnat_sigtramp entry point, common to the
tasking and non-tasking signal handling code for all the Android
configurations,
- There (C code), from target definition macros, select a path
that either routes directly to the actual signal handler or goes
through the intermediate layer providing hand crafted CFI
information which allows unwinding up to the interrupted code.
- Similarly to what was done for VxWorks, move the arm specific
definitions to a separate header file to make the general structure
of the common C code easier to grasp,
- Adjust the comments in the common sigtramp.h header to
account for such an organisation possibility.
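
The selection described above can be sketched as follows. This is only an
illustration under stated assumptions, not the libgnat source: every name
prefixed with sketch_ or SKETCH_ is hypothetical, and the "CFI layer" is a
plain function standing in for what would really be hand-written assembly.

```cpp
#include <cassert>

// Hypothetical handler type; the real runtime declares its own.
using sketch_handler_t = void (*)(int signo, void *siginfo, void *ucontext);

// Assumed target definition macro: 1 when the target needs hand-crafted
// CFI to unwind through a signal frame, 0 when kernel CFI is enough.
#ifndef SKETCH_NEEDS_CFI_LAYER
# if defined(__arm__)
#  define SKETCH_NEEDS_CFI_LAYER 1
# else
#  define SKETCH_NEEDS_CFI_LAYER 0
# endif
#endif

#if SKETCH_NEEDS_CFI_LAYER
// Stand-in for the intermediate layer. Here it only forwards the call;
// it carries no actual CFI.
static void sketch_cfi_layer(int signo, void *si, void *uc,
                             sketch_handler_t handler)
{
  handler(signo, si, uc);
}
#endif

// Single entry point, always provided, so that both tasking and
// non-tasking callers can reference it on every configuration.
void sketch_sigtramp(int signo, void *si, void *uc, sketch_handler_t handler)
{
#if SKETCH_NEEDS_CFI_LAYER
  sketch_cfi_layer(signo, si, uc, handler);  // indirect path
#else
  handler(signo, si, uc);                    // neutral relay
#endif
}

// Demo handler used to observe that the relay reaches the handler.
static int sketch_last_signal = 0;
static void sketch_record_handler(int signo, void *, void *)
{
  sketch_last_signal = signo;
}
```

Either way the caller sees the same symbol, which is what removes the link
failure: only the body of the entry point depends on the target.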
gcc/ada/ChangeLog:
* sigtramp-armdroid.c: Refactor into ...
* sigtramp-android.c, sigtramp-android-asm.h: New files.
* Makefile.rtl (arm/aarch64-android section): Add
sigtramp-android.o to EXTRA_LIBGNAT_OBJS unconditionally. Add
sigtramp.h and sigtramp-android-asm.h to EXTRA_LIBGNAT_SRCS.
* init.c (android section, __gnat_error_handler): Defer to
__gnat_sigtramp unconditionally again.
* sigtramp.h: Adjust comments to allow neutral signal handling
relays, merely forwarding to the underlying handler without any
intermediate CFI magic.
Eric Botcazou [Thu, 12 Sep 2024 10:45:27 +0000 (12:45 +0200)]
ada: Fix bogus Constraint_Error for 'Wide_Wide_Value on wide enumeration literal
The problem is that 'Wide_Wide_Value is piggybacked on 'Value and the latter
invokes System.Val_Util.Normalize_String, which incorrectly normalizes the
input string in the presence of enumeration literals with wide characters.
gcc/ada/ChangeLog:
PR ada/115507
* exp_imgv.adb (Expand_Valid_Value_Attribute): Add actual parameter
for Is_Wide formal in the call to Valid_Value_Enumeration_NN.
(Expand_Value_Attribute): Likewise.
* libgnat/s-vaen16.ads (Value_Enumeration_16): Add Is_Wide formal.
(Valid_Value_Enumeration_16): Likewise.
* libgnat/s-vaen32.ads (Value_Enumeration_32): Likewise.
(Valid_Value_Enumeration_32): Likewise.
* libgnat/s-vaenu8.ads (Value_Enumeration_8): Likewise.
(Valid_Value_Enumeration_8): Likewise.
* libgnat/s-valboo.adb (Value_Boolean): Pass True for To_Upper_Case
formal parameter in call to Normalize_String.
* libgnat/s-valcha.adb (Value_Character): Likewise.
* libgnat/s-valuen.ads (Value_Enumeration): Add Is_Wide formal.
(Valid_Value_Enumeration): Likewise.
* libgnat/s-valuen.adb (Value_Enumeration_Pos): Likewise and pass
its negation for To_Upper_Case formal in call to Normalize_String.
(Valid_Value_Enumeration): Add Is_Wide formal and forward it in
call to Value_Enumeration_Pos.
(Value_Enumeration): Likewise.
* libgnat/s-valuti.ads (Normalize_String): Add To_Upper_Case formal
parameter and adjust post-condition accordingly.
* libgnat/s-valuti.adb (Normalize_String): Add To_Upper_Case formal
parameter and adjust implementation accordingly.
* libgnat/s-valwch.adb (Value_Wide_Wide_Character): Pass False for
To_Upper_Case formal parameter in call to Normalize_String.
Eric Botcazou [Wed, 11 Sep 2024 17:42:03 +0000 (19:42 +0200)]
ada: Fix bogus error in instantiation with formal package
The compiler reports that an actual does not match the formal when there
is a defaulted formal discrete type because Check_Formal_Package_Instance
fails to skip the implicit base type generated by the compiler.
gcc/ada/ChangeLog:
PR ada/114636
* sem_ch12.adb (Check_Formal_Package_Instance): For a defaulted
formal discrete type, skip the generated implicit base type.
Eric Botcazou [Wed, 11 Sep 2024 17:37:08 +0000 (19:37 +0200)]
ada: Fix negative value returned by 'Image for array with nonnegative component
The problem is that Exp_Put_Image.Build_Elementary_Put_Image_Call uses the
signedness of the base type but the size of the first subtype, hence the
discrepancy between them.
gcc/ada/ChangeLog:
PR ada/115535
* exp_put_image.adb (Build_Elementary_Put_Image_Call): Use the size
of the underlying type to find the support type.
Eric Botcazou [Wed, 11 Sep 2024 17:26:18 +0000 (19:26 +0200)]
ada: Fix internal error on elsif part of if-statement containing if-expression
The problem occurs when the compiler is trying to find a context to which
it can hoist finalization actions coming from the if-expression, because
Find_Hook_Context incorrectly returns the N_Elsif_Part node.
gcc/ada/ChangeLog:
PR ada/114640
* exp_util.adb (Find_Hook_Context): For a node present within a
conditional expression, do not return an N_Elsif_Part node.
A container aggregate can either be empty, contain only
positional elements or named element associations. Reject the
scenario where the latter two are both used.
gcc/ada/ChangeLog:
* diagnostics-constructors.adb
(Make_Mixed_Container_Aggregate_Error): New function for the error
message
(Record_Mixed_Container_Aggregate_Error): New function for the
error message.
* diagnostics-constructors.ads: Likewise.
* diagnostics-repository.ads: Register new diagnostics id.
* diagnostics.ads: Add new diagnostics id.
* errout.adb (First_And_Last_Node): Detect the span for component
associations.
* sem_aggr.adb (Resolve_Container_Aggregate): reject container
aggregates that have both named and positional elements.
ada: Add mechanism to test internal error machinery
This patch adds a pragma that triggers an internal compiler error when
analyzed. It is not externally documented and makes it possible to test
the code that runs when the compiler encounters an internal error.
gcc/ada/ChangeLog:
* snames.ads-tmpl: Add new pragma definition.
* par-prag.adb (Prag): Handle new pragma.
* sem_prag.adb (Analyze_Pragma): Implement new pragma.
This patch puts a comment explaining the absence of Storage_Size in an
alphabetically sorted list at the spot where Storage_Size would be in
that list.
gcc/ada/ChangeLog:
* snames.ads-tmpl: Tweak position of comment.
gcc/ada/ChangeLog:
* doc/gnat_rm/gnat_language_extensions.rst: replace
references to RFC's with appropriate text from the rfc
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
ada: Add dependency lines for External_Initialization
When a file included through External_Initialization has been modified,
the unit including it must be recompiled. This patch adds the
generation of dependency lines to the handling of the
External_Initialization aspect, to signal that fact to gnatmake and
other tools that invoke GNAT.
ada: Use correct capacity with two dimensional arrays
Previously when a bounded list was initialized with an array aggregate
then we used the correct size only if the array was one dimensional.
This patch adds support for deriving the size for multidimensional array
types as well.
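
As a rough illustration of the idea (not the Exp_Aggr code), the element
count of a multidimensional array is the product of the lengths of its
dimensions. The names dim_bounds and aggregate_capacity below are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one array dimension: inclusive bounds.
struct dim_bounds {
  long low;
  long high;
};

// Number of elements the container must be able to hold: the product
// of the per-dimension lengths. A null range contributes length 0.
std::size_t aggregate_capacity(const std::vector<dim_bounds> &dims)
{
  std::size_t total = 1;
  for (const dim_bounds &d : dims) {
    std::size_t len =
      d.high >= d.low ? static_cast<std::size_t>(d.high - d.low + 1) : 0;
    total *= len;
  }
  return total;
}
```

For a single dimension this reduces to the plain length, which is the case
that was already handled.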
gcc/ada/ChangeLog:
* exp_aggr.adb (Build_Siz_Exp): Support deriving the size of the
container aggregate with multi-dimensional arrays. Make the
function return a node of an expression instead of an integer.
Additionally calculate the size expression for
Component_Associations.
(To_Int): Make this function available to more callers.
(Aggregate_Size): Relocate the calculation of
Component_Associations to Build_Siz_Exp.
Eric Botcazou [Tue, 10 Sep 2024 12:58:21 +0000 (14:58 +0200)]
ada: Add Is_Rep_To_Pos predicate and export it for use in gigi
This is modeled on the existing Is_Init_Proc predicate.
gcc/ada/ChangeLog:
* exp_tss.ads (Is_Rep_To_Pos): New function declaration.
* exp_tss.adb (Is_Rep_To_Pos): New function body.
* fe.h (Is_Rep_To_Pos): New macro and extern declaration.
Eric Botcazou [Tue, 10 Sep 2024 10:09:48 +0000 (12:09 +0200)]
ada: Avoid dependency on Long_Long_Long_Integer and System.Img_LLLI for 'Image
When the Image attribute is applied directly to another attribute returning
Universal_Integer, for example Enum_Rep, it is converted to the equivalent
of Universal_Integer'Image, which is implemented by Long_Long_Long_Integer
and thus triggers a dependency on System.Img_LLLI, both being unnecessary
in most practical cases.
gcc/ada/ChangeLog:
* exp_imgv.adb (Rewrite_Object_Image): When the prefix is a type
conversion to Universal_Integer, use its expression directly. When
the prefix is an integer literal with Universal_Integer type, try
to compute a narrower type.
Raphaël AMIARD [Thu, 29 Aug 2024 10:43:54 +0000 (12:43 +0200)]
ada: Use semantics from the RFC for declarative items mixed with statements
We want to allow statements lists with declarations *and* an exception
handler. What follows from this is that declarations declared in the
statement list are *not* visible from the exception handler, and that
the following code:
declare
A : Integer := 12;
begin
A : Integer := 15;
<stmts>
exception
when others => ...
Roughly expands to:
declare
A : Integer := 12;
begin
declare
A : Integer := 15;
begin
<stmts>
exception
when others => ...
As such, in the code above, there is no more error triggered for
conflicting declarations of `A`.
Move "Local declarations without block" into curated extensions
Restrict legal local decls in statement lists
Only accept object declarations & renamings, as well as use clauses for
gcc/ada/ChangeLog:
* par-ch11.adb (P_Sequence_Of_Statements): Remove Handled
parameter. Always wrap the statements in a block when there are
declarations in it.
* par-ch5.adb: Adapt call to P_Sequence_Of_Statements. Update
outdated comment, remove useless `Style_Checks` pragma.
(P_Sequence_Of_Statements): Don't emit an error in core extensions
mode. Emit an error when a non valid declaration is parsed in
sequence of statements.
* par.adb: Adapt P_Sequence_Of_Statements' signature
* doc/gnat_rm/gnat_language_extensions.rst: Adapt documentation
now.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
Steve Baird [Thu, 5 Sep 2024 20:42:20 +0000 (13:42 -0700)]
ada: Improved support for incomplete parameter types
Fix two bugs uncovered by a recent ACATS test C3A1005: a freezing problem
and a case where a user-defined equality function for an incomplete type
was incorrectly hidden from use-clause visibility by the "corresponding"
predefined op (which doesn't actually exist).
gcc/ada/ChangeLog:
* sem_ch6.adb (Analyze_Subprogram_Body_Helper): Don't freeze here
if Has_Delayed_Freeze returns True.
* sem_type.adb (Valid_Equality_Arg): Treat an incomplete type like
a limited type because neither has an implicitly-defined equality
primitive.
(Covers): If either argument is an incomplete type
whose full view is available, then look through to the full view.
* sem_res.adb (Resolve_Actuals): If the actual parameter type is
complete and the formal parameter type is not, then update the
formal parameter type to use the complete view.
squirek [Tue, 13 Aug 2024 11:42:41 +0000 (11:42 +0000)]
ada: Early freezing of types with 'Size'Class
This patch fixes an issue in the compiler whereby declarations of derived types
whose parent is a mutably tagged type cause early freezing of the parent type -
leading to spurious compile-time errors.
gcc/ada/ChangeLog:
* sem_ch3.adb (Derived_Type_Declaration): Modify generation of
compile time check.
Eric Botcazou [Fri, 6 Sep 2024 08:05:58 +0000 (10:05 +0200)]
ada: Print the load address in symbolic backtraces
The load address of PIE executables is printed in non-symbolic backtraces
(-E binder switch) but it makes sense to print it in symbolic backtraces
(-Es binder switch) too, because symbolic backtraces may degenerate into
non-symbolic ones when the executable is stripped for example.
gcc/ada/ChangeLog:
* libgnat/s-trasym__dwarf.adb (LDAD_Header): New String constant.
(Symbolic_Traceback): Print the load address of the executable at
the beginning if it is not null.
If a limited private partial view of a type has an access discriminant with
a default expression, and if the type (perhaps tagged, perhaps not) is
completed by deriving from an immutably limited type, then the default
discriminant expression should not be rejected.
gcc/ada/ChangeLog:
* sem_ch6.adb (Check_Discriminant_Conformance): In testing whether
a default expression is permitted for an access discriminant, we
need to know whether the discriminated type is immutably limited.
Handle another part of this test that cannot easily be handled in
Sem_Aux.Is_Immutably_Limited. This involves declaring a new local
function, Is_Derived_From_Immutably_Limited_Type.
Steve Baird [Thu, 29 Aug 2024 22:17:54 +0000 (15:17 -0700)]
ada: Missing constraint check for 'Length attribute reference
In some cases involving a universal-integer-valued attribute reference
(typically a 'Length attribute reference) occurring as an actual parameter
in a call, the runtime check that the constraints of the formal parameter
are satisfied is incorrectly not performed.
gcc/ada/ChangeLog:
* sem_attr.adb (Resolve_Attribute): When setting the Etype of a
universal-integer-valued attribute reference to the subtype
determined by its context, use the basetype of that subtype
instead of the subtype itself if there is a possibility that the
attribute value will not satisfy the constraints of that subtype.
Otherwise the compiler is, in effect, assuming something that
might not be true. Except use the subtype in the case of a
not-from-source 'Pos attribute reference in order to avoid
breaking things.
This patch adds a way to have the adareducer tool run on an appropriate
set of files when GNAT crashes. This feature is behind the -gnatd_m
debugging switch.
gcc/ada/ChangeLog:
* comperr.adb (Compiler_Abort): Add call to
Generate_Minimal_Reproducer and replace call to Namet.Unlock with
call to Unlock_If_Locked.
* debug.adb: Document new purpose of -gnatd_m and -gnatd_M.
* fname-uf.adb (Instantiate_SFN_Pattern): New procedure.
(Get_Default_File_Name): New function.
(Get_File_Name): Replace inline code with call to
Instantiate_SFN_Pattern.
* fname-uf.ads (Get_Default_File_Name): New function.
* generate_minimal_reproducer.adb (Generate_Minimal_Reproducer):
New procedure.
* namet.adb (Unlock_If_Locked): New function.
* namet.ads (Unlock_If_Locked): Likewise.
* par-prag.adb (Prag): Add special behavior with -gnatd_M.
* set_targ.adb: Minor fixes to comments.
* gcc-interface/Make-lang.in: Update list of object files.
Eric Botcazou [Wed, 4 Sep 2024 22:19:25 +0000 (00:19 +0200)]
ada: Fix wrong finalization of anonymous array aggregate
The issue arises when the aggregate consists only of iterated associations
because, in this case, its expansion uses a 2-pass mechanism which creates
a temporary that needs a fully-fledged initialization, thus running afoul
of the optimization that avoids building the initialization procedure in
the anonymous array case.
gcc/ada/ChangeLog:
* exp_aggr.ads (Is_Two_Pass_Aggregate): New function declaration.
* exp_aggr.adb (Is_Two_Pass_Aggregate): New function body.
(Expand_Array_Aggregate): Call Is_Two_Pass_Aggregate to detect the
aggregates that need the 2-pass expansion.
* exp_ch3.adb (Expand_Freeze_Array_Type): In the anonymous array
case, build the initialization procedure if the initial value in
the object declaration is a 2-pass aggregate.
This patch introduces a GNAT extension that adds a new aspect,
External_Initialization. A section is added to the reference
manual with a description of what the aspect does.
The implementation reuses existing mechanisms, in particular
Sinput.L.Load_Source_File and Sem_Res.Set_String_Literal_Subtype.
A new node kind is added, and nodes of that type are present in what
is passed to the back ends. That makes it necessary to update the back
ends to handle the new node type. The C interface is extended to make
that possible.
gcc/ada/ChangeLog:
* aspects.ads: Add entities for External_Initialization.
* checks.adb (Selected_Length_Checks): Add support for
N_External_Initializer nodes.
* doc/gnat_rm/gnat_language_extensions.rst: Add section for the added
extension.
* exp_util.adb (Insert_Actions): Add support for N_External_Initializer
nodes.
* fe.h (C_Source_Buffer): New function.
* gen_il-fields.ads: Add new field.
* gen_il-gen-gen_nodes.adb: Add N_External_Initializer node kind.
* gen_il-gen.adb: Add new field type.
* gen_il-types.ads: Add new node kind and new field type.
* pprint.adb (Expr_Name): Handle new node kind.
* sem.adb (Analyze): Add support for N_External_Initializer nodes.
* sem_ch13.adb (Analyze_Aspect_Specifications, Check_Aspect_At_Freeze_Point):
Add support for External_Initialization aspect.
* sem_ch3.adb (Apply_External_Initialization): New subprogram.
(Analyze_Object_Declaration): Add support for External_Initialization aspect.
* sem_res.adb (Resolve_External_Initializer): New procedure.
(Resolve): Add support for N_External_Initializer nodes.
(Set_String_Literal_Subtype): Extend to handle N_External_Initializer nodes.
* sinfo-utils.adb (Is_In_Union_Id): Adapt to new field addition.
* sinfo.ads: Add documentation for new node kind and new field.
* sinput.adb, sinput.ads (C_Source_Buffer): Add new C interface function.
* snames.ads-tmpl: Add new aspect identifier.
* sprint.adb (Sprint_Node_Actual): Add nop handling of N_External_Initializer
nodes.
* types.ads: Modify type to allow for new C interface.
* gcc-interface/trans.cc (gnat_to_gnu): Handle new GNAT node type.
* gcc-interface/Make-lang.in: Update list of stage1 run-time library units.
* gnat-style.texi: Regenerate.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
Olivier Hainque [Fri, 16 Aug 2024 17:04:37 +0000 (19:04 +0200)]
ada: Use a-nallfl__wraplf.ads for Android
This is the most common definition. Otherwise, from the default:
a-nallfl.ads:51:13: ... intrinsic binding type mismatch on result
a-nallfl.ads:51:13: ... intrinsic binding type mismatch on parameter 1
a-nallfl.ads:51:13: ... profile of "Sin" doesn't match the builtin it binds
gcc/ada/ChangeLog:
* Makefile.rtl (arm/aarch64-android): Associate a-nallfl.ads with
libgnat/a-nallfl__wraplf.ads.
```
union {
  sighandler_t sa_handler;
  void (*sa_sigaction)(int, struct siginfo*, void*);
};
sigset_t sa_mask;
int sa_flags;
void (*sa_restorer)(void);
```
gcc/ada/ChangeLog:
* libgnarl/s-linux__android-arm.ads: New file, renaming of ...
* libgnarl/s-linux__android.ads: ... this file.
* libgnarl/s-linux__android-aarch64.ads: New file. Based on the
-arm variant, with sa_ field positions adjusted.
* Makefile.rtl (arm/aarch64-android pairs): Adjust accordingly.
* libgnarl/s-osinte__android.ads: Rather than making assumptions
on the actual type of the C sigset_t, use
Os_Constants.SIZEOF_sigset_t to define an Ada sigset_t type of the
proper size. Use C.int instead of unsigned_long for sa_flags.
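
The "size from the C side" approach can be pictured with a small sketch,
assuming a POSIX <signal.h>. The type sketch_sigset_t is hypothetical; it
plays the role of a binding-side type that knows only the size of the C
sigset_t and nothing about its layout.

```cpp
#include <cassert>
#include <signal.h>  // POSIX: sigset_t

// Opaque stand-in for sigset_t: same size and alignment, no assumption
// about which integer or array type the C library actually uses.
struct sketch_sigset_t {
  alignas(sigset_t) unsigned char bytes[sizeof(sigset_t)];
};

static_assert(sizeof(sketch_sigset_t) == sizeof(sigset_t),
              "opaque type must match the C sigset_t size");
```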
Olivier Hainque [Fri, 16 Aug 2024 15:10:59 +0000 (17:10 +0200)]
ada: Account for aarch64 in init.c section for Android
Unlike the ARM port already there, aarch64 is dwarf CFI based
for unwinding and Android-Linux exposes kernel CFI for signal
handlers.
gcc/ada/ChangeLog:
* init.c (__gnat_error_handler): Map signals straight to Ada
exceptions, without a local CFI trampoline.
(__gnat_adjust_context_for_raise): Guard arm specific code on __arm__
compilation. Do nothing otherwise, relying on libgcc's signal
frame recognition for PC/RA adjustments.
Olivier Hainque [Fri, 16 Aug 2024 15:12:13 +0000 (17:12 +0200)]
ada: Extend arm-android section of Makefile.rtl to aarch64
gcc/ada/ChangeLog:
* Makefile.rtl: Extend arm-android section to aarch64, in a similar
fashion as other arm/aarch64 configurations. Introduce pair
selection guards to prevent match of aarch64-linux-android on the
regular aarch64-linux% cross as well.
ada: sem_prag.adb: ignore compile_time_{warning,error} in CodePeer mode
GNAT sometimes needs help from the GCC back-end in order to check
whether Compile_Time_{Warning,Error} are true. As CodePeer does not have
access to a GCC back-end, it is unable to perform these checks. Thus we
need to remove said pragmas from the tree.
gcc/ada/ChangeLog:
* sem_prag.adb (Process_Compile_Time_Warning_Or_Error): Turn
Compile_Time pragmas into null nodes
Recompute TYPE_MODE and DECL_MODE for vector_type for accelerator.
gcc/ChangeLog:
PR ipa/96265
* lto-streamer-in.cc (lto_read_tree_1): Set TYPE_MODE and DECL_MODE
for vector_type if offloading is enabled.
(lto_input_mode_table): Remove handling of vector modes.
* tree-streamer-out.cc (pack_ts_decl_common_value_fields): Stream out
VOIDmode for vector_type if offloading is enabled.
(pack_ts_decl_common_value_fields): Likewise.
Xi Ruoyao [Thu, 11 Jul 2024 11:43:48 +0000 (19:43 +0800)]
LoongArch: Add support to annotate tablejump
This is per the request from the kernel developers. For generating the
ORC unwind info, the objtool program needs to analyze the control flow
of a .o file. If a jump table is used, objtool has to correlate the
jump instruction with the table.
On x86 (where objtool was initially developed) it's simple: a relocation
entry naturally correlates them because one single instruction is used
for table-based jump. But on a RISC machine objtool would have to
reconstruct the data flow if it must find out the correlation on its
own.
So, emit an additional section to store the correlation info as pairs of
addresses, each pair contains the address of a jump instruction (jr) and
the address of the jump table. This is very trivial to implement in
GCC.
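
From a consumer's point of view, such a section can be pictured as an array
of address pairs. The sketch below is an assumption about how a tool might
model and query it; the type and function names are hypothetical, and the
actual section name and on-disk encoding are not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One annotation entry: address of the jump instruction (jr) and
// address of the jump table it indexes.
struct tablejump_annotation {
  std::uintptr_t jump_insn;
  std::uintptr_t jump_table;
};

// Return the jump table correlated with the jump at jump_addr,
// or 0 when the instruction has no annotation.
std::uintptr_t find_jump_table(const std::vector<tablejump_annotation> &notes,
                               std::uintptr_t jump_addr)
{
  for (const tablejump_annotation &n : notes)
    if (n.jump_insn == jump_addr)
      return n.jump_table;
  return 0;
}
```

With the pairs available, the tool needs a lookup rather than a data flow
reconstruction.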
gcc/ChangeLog:
* config/loongarch/genopts/loongarch.opt.in
(mannotate-tablejump): New option.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.md (tablejump<mode>): Emit
additional correlation info between the jump instruction and the
jump table, if -mannotate-tablejump.
* doc/invoke.texi: Document -mannotate-tablejump.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/jump-table-annotate.c: New test.
Suggested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
There is a description in <https://github.com/riscv/riscv-isa-manual/blob/main/src/zawrs.adoc>:
"The instructions in the Zawrs extension are only useful in conjunction
with the LR instruction, which is provided by the Zalrsc component
of the A extension."
Tobias Burnus [Mon, 7 Oct 2024 21:57:42 +0000 (23:57 +0200)]
Move gfortran.dg/gomp/allocate-static.f90 to libgomp.fortran/
The testcase was turned into a 'dg-do run' check to check for the alignment,
but this only works in testsuite/gfortran.dg, causing link errors for
out-of-tree testing. The test was added in r15-4104-ga8caeaacf499d5.
gcc/testsuite/:
* gfortran.dg/gomp/allocate-static.f90: Move to libgomp/testsuite/.
libgomp/:
* testsuite/libgomp.fortran/allocate-static.f90: Moved from
gcc/testsuite/ as it is a dg-do run test; use real omp_lib_kinds
instead of local definition
Jakub Jelinek [Mon, 7 Oct 2024 19:25:22 +0000 (21:25 +0200)]
libcpp: Use constexpr for _cpp_trigraph_map initialization for C++14
The _cpp_trigraph_map initialization used to be done for C99+ using
designated initializers, but can't be done that way for C++ because
array designators are only an extension in C++ and allow neither
skipping elements nor going backwards.
But, we can get the same effect using C++14 constexpr constructor.
With the following patch we get rid of the runtime initialization
and the array can be in .rodata.
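
A minimal sketch of the technique, assuming C++14 or later. The names
below are hypothetical and do not reproduce the libcpp macros; the mapping
is the standard trigraph table, indexed by the third character of the
trigraph.

```cpp
#include <cassert>
#include <climits>

// Index helper: avoids indexing an array with a plain char.
constexpr unsigned char uc(char c) { return static_cast<unsigned char>(c); }

struct sketch_trigraph_map_s {
  unsigned char map[UCHAR_MAX + 1];

  // C++14 allows assignments in a constexpr constructor, so sparse
  // entries can be filled in any order after zero-initialization.
  constexpr sketch_trigraph_map_s() : map{}
  {
    map[uc('=')] = '#';   map[uc(')')] = ']';   map[uc('!')] = '|';
    map[uc('(')] = '[';   map[uc('\'')] = '^';  map[uc('>')] = '}';
    map[uc('/')] = '\\';  map[uc('<')] = '{';   map[uc('-')] = '~';
  }
};

// Constant-initialized: no runtime initialization is required.
constexpr sketch_trigraph_map_s sketch_trigraph_map_d{};

static_assert(sketch_trigraph_map_d.map[uc('=')] == '#',
              "table is usable at compile time");
```

Because the object is constant-initialized, a compiler is free to place it
in read-only data, which is the effect described above.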
2024-10-07 Jakub Jelinek <jakub@redhat.com>
* internal.h (_cpp_trigraph_map_s): New type for C++14 or later.
(_cpp_trigraph_map_d): New variable for C++14 or later.
(_cpp_trigraph_map): Define to _cpp_trigraph_map_d.map for C++14 or
later.
* init.cc (init_trigraph_map): Define to nothing for C++14 or later.
(TRIGRAPH_MAP, END, s): Define differently for C++14 or later.
Thomas Koenig [Sat, 5 Oct 2024 12:17:49 +0000 (14:17 +0200)]
Implement MAXLOC and MINLOC for unsigned.
gcc/fortran/ChangeLog:
* check.cc (gfc_check_minloc_maxloc): Handle BT_UNSIGNED.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Likewise.
* gfortran.texi: Document MAXLOC and MINLOC for UNSIGNED.
libgfortran/ChangeLog:
* Makefile.am: Add files for unsigned MINLOC and MAXLOC.
* Makefile.in: Regenerated.
* gfortran.map: Add files for unsigned MINLOC and MAXLOC.
* generated/maxloc0_16_m1.c: New file.
* generated/maxloc0_16_m16.c: New file.
* generated/maxloc0_16_m2.c: New file.
* generated/maxloc0_16_m4.c: New file.
* generated/maxloc0_16_m8.c: New file.
* generated/maxloc0_4_m1.c: New file.
* generated/maxloc0_4_m16.c: New file.
* generated/maxloc0_4_m2.c: New file.
* generated/maxloc0_4_m4.c: New file.
* generated/maxloc0_4_m8.c: New file.
* generated/maxloc0_8_m1.c: New file.
* generated/maxloc0_8_m16.c: New file.
* generated/maxloc0_8_m2.c: New file.
* generated/maxloc0_8_m4.c: New file.
* generated/maxloc0_8_m8.c: New file.
* generated/maxloc1_16_m1.c: New file.
* generated/maxloc1_16_m2.c: New file.
* generated/maxloc1_16_m4.c: New file.
* generated/maxloc1_16_m8.c: New file.
* generated/maxloc1_4_m1.c: New file.
* generated/maxloc1_4_m16.c: New file.
* generated/maxloc1_4_m2.c: New file.
* generated/maxloc1_4_m4.c: New file.
* generated/maxloc1_4_m8.c: New file.
* generated/maxloc1_8_m1.c: New file.
* generated/maxloc1_8_m16.c: New file.
* generated/maxloc1_8_m2.c: New file.
* generated/maxloc1_8_m4.c: New file.
* generated/maxloc1_8_m8.c: New file.
* generated/minloc0_16_m1.c: New file.
* generated/minloc0_16_m16.c: New file.
* generated/minloc0_16_m2.c: New file.
* generated/minloc0_16_m4.c: New file.
* generated/minloc0_16_m8.c: New file.
* generated/minloc0_4_m1.c: New file.
* generated/minloc0_4_m16.c: New file.
* generated/minloc0_4_m2.c: New file.
* generated/minloc0_4_m4.c: New file.
* generated/minloc0_4_m8.c: New file.
* generated/minloc0_8_m1.c: New file.
* generated/minloc0_8_m16.c: New file.
* generated/minloc0_8_m2.c: New file.
* generated/minloc0_8_m4.c: New file.
* generated/minloc0_8_m8.c: New file.
* generated/minloc1_16_m1.c: New file.
* generated/minloc1_16_m16.c: New file.
* generated/minloc1_16_m2.c: New file.
* generated/minloc1_16_m4.c: New file.
* generated/minloc1_16_m8.c: New file.
* generated/minloc1_4_m1.c: New file.
* generated/minloc1_4_m16.c: New file.
* generated/minloc1_4_m2.c: New file.
* generated/minloc1_4_m4.c: New file.
* generated/minloc1_4_m8.c: New file.
* generated/minloc1_8_m1.c: New file.
* generated/minloc1_8_m16.c: New file.
* generated/minloc1_8_m2.c: New file.
* generated/minloc1_8_m4.c: New file.
* generated/minloc1_8_m8.c: New file.
Jeff Law [Mon, 7 Oct 2024 17:49:21 +0000 (11:49 -0600)]
[RISC-V] Add splitters to restore condops generation after recent phiopt changes
V2:
Fix typo in ChangeLog.
Remove now extraneous comment in cset-sext.c.
Throttle back branch cost to 1 in various tests
--
Andrew P's recent improvements to phiopt regressed on the riscv testsuite.
Essentially the new code presented to the RTL optimizers is straightline code rather than branchy for the CE pass to analyze and optimize. In the absence of conditional move support or sfb, the new code would be better.
Unfortunately the presented form isn't a great fit for xventanacondops, zicond or xtheadcondmov. The net is the resulting code is actually slightly worse than before. Essentially sne+czero turned into sne+sne+and.
As the RHS of a set. We can use a 3->2 splitter to guide combine on how to profitably rewrite the sequence in a form suitable for condops. Just splitting that would be enough to fix the regression, but I'm fairly confident that other cases need to be handled and would have regressed had the testsuite been more thorough.
One arm of the AND is going to turn into an sCC instruction. We have a variety of those that we support. The codes vary as do the allowed operands of the sCC. That produces a set of new splitters to handle those cases.
The other arm is going to turn into a czero (or similar) instruction. That one can be generalized to eq/ne. So another set for that generalization.
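
The equivalence the splitters rely on can be checked with a small model.
This is a sketch, not the machine description: the function names are
hypothetical, and czero_eqz models the Zicond czero.eqz semantics (result
is 0 when the condition register is 0, otherwise the source operand).

```cpp
#include <cassert>
#include <cstdint>

// Model of an sCC of the "set if not equal to zero" kind.
constexpr std::int64_t set_ne_zero(std::int64_t x) { return x != 0 ? 1 : 0; }

// Model of czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1.
constexpr std::int64_t czero_eqz(std::int64_t rs1, std::int64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

// Three-operation form: sne + sne + and.
constexpr std::int64_t and_of_sccs(std::int64_t a, std::int64_t b)
{
  return set_ne_zero(a) & set_ne_zero(b);
}

// Two-operation form: sne + czero.
constexpr std::int64_t scc_then_czero(std::int64_t a, std::int64_t b)
{
  return czero_eqz(set_ne_zero(a), b);
}

static_assert(and_of_sccs(3, 0) == scc_then_czero(3, 0), "");
static_assert(and_of_sccs(3, 7) == scc_then_czero(3, 7), "");
```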
We can remove a couple of XFAILs in the rv32 space as it's behaving much more like rv64 at this point.
For SFB targets it's unclear if the new code is better or worse. In both cases it's a 3 instruction sequence. So I just adjusted the test. If the new code is worse for SFB, someone with an understanding of the tradeoffs for an SFB target will need to make adjustments.
Tested in my tester on rv64gcv and rv32gc. Will wait for the pre-commit testers to render their verdict before moving forward.
gcc/
* config/riscv/iterators.md (scc_0): New code iterator.
* config/riscv/zicond.md: New splitters to improve code generated for
cases like (and (scc) (scc)) for zicond, xventanacondops, xtheadcondmov.
gcc/testsuite/
* gcc.target/riscv/cset-sext-sfb.c: Turn off ssa-phiopt.
* gcc.target/riscv/cset-sext-thead.c: Do not check CE output anymore.
* gcc.target/riscv/cset-sext-ventana.c: Similarly. Adjust branch cost.
* gcc.target/riscv/cset-sext-zicond.c: Similarly.
* gcc.target/riscv/cset-sext.c: Similarly. No longer allow
"neg" in asm output.
When handling the counted_by attribute, if the corresponding field
doesn't exist, in addition to issuing an error, we should also remove
the already added non-existing "counted_by" attribute from the
field_decl.
PR c/116735
gcc/c/ChangeLog:
* c-decl.cc (verify_counted_by_attribute): Remove the attribute
when error.
Jason Merrill [Sat, 5 Oct 2024 02:23:04 +0000 (22:23 -0400)]
c++: -Wmismatched-tags and modules
In Wmismatched-tags-6.C, we try to compare two declarations of the Cp alias
template, and ICE trying to check whether they're in module purview. We
need to check DECL_LANG_SPECIFIC like elsewhere in the compiler.
gcc/cp/ChangeLog:
* decl.cc (duplicate_decls): Only check PURVIEW_P if
DECL_LANG_SPECIFIC.
Jason Merrill [Fri, 4 Oct 2024 14:33:16 +0000 (10:33 -0400)]
c++: modules don't require preprocessor output
init_modules has rejected -M -fmodules-ts on the premise that module
dependency analysis requires macro expansion, but this is no longer
accurate; P1857 prohibited module directives produced by macro expansion.
They can still be dependent on #if directives, but those are still handled
with -fdirectives-only.
What wasn't working was -M or -dM, because cpp_scan_nooutput never called
module_token_pre to implement the import. The simplest fix is to use the
-fdirectives-only scan when modules are enabled and teach directives_only_cb
about flag_no_output.
gcc/cp/ChangeLog:
* module.cc (init_modules): Don't warn about -M.
gcc/c-family/ChangeLog:
* c-ppoutput.cc (preprocess_file): For modules,
use directives-only scan even with flag_no_output.
(directives_only_cb): Respect flag_no_output.
gcc/ChangeLog:
* doc/invoke.texi (C++ Module Preprocessing): Allow -M,
refer to -fdeps.
gcc/testsuite/ChangeLog:
* g++.dg/modules/macro-8_a.H: New test.
* g++.dg/modules/macro-8_b.C: New test.
* g++.dg/modules/macro-8_c.C: New test.
* g++.dg/modules/macro-8_d.C: New test.
Jakub Jelinek [Mon, 7 Oct 2024 12:30:21 +0000 (14:30 +0200)]
gcc: Remove executable permissions of testcases and *.md files
I've noticed some files were marked as executable, as can be
seen with
find . \( -name \*.[chSC] -o -name \*.md -o -name \*.cc \) -a -perm /111 | xargs ls -l
middle-end: reorder masking priority of math functions
Given the categorization of math built-in functions as `ECF_CONST',
when if-converting their uses, their calls are not masked and are thus
called with an all-true predicate.
This, however, is not appropriate where built-ins have library
equivalents, wherein they may exhibit highly architecture-specific
behaviors. For example, vectorized implementations may delegate the
computation of values outside a certain acceptable numerical range to
special (non-vectorized) routines which considerably slow down
computation.
As numerical simulation programs often do bounds check on input values
prior to math calls, conditionally assigning default output values for
out-of-bounds input and skipping the math call altogether, these
fallback implementations should seldom be called in the execution of
vectorized code. If, however, we don't apply any masking to these
math functions, we end up effectively executing both if and else
branches for these values, leading to considerable performance
degradation on scientific workloads.
We therefore invert the order of handling of math function calls in
`if_convertible_stmt_p' to prioritize the handling of their
library-provided implementations over the equivalent internal function.
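The source pattern described above, a bounds check that guards a math call and assigns a default value otherwise, might look like the following. This is only an illustrative sketch: the function name `guarded_exp`, the range limits and the default value are hypothetical and are not taken from the patch or from any testcase.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical example of a loop that is a candidate for if-conversion.
// Inputs outside [lo, hi] get a default value and should not reach exp().
std::vector<double> guarded_exp(const std::vector<double>& in,
                                double lo, double hi, double dflt)
{
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] >= lo && in[i] <= hi)
            out[i] = std::exp(in[i]);   // math call guarded by the bounds check
        else
            out[i] = dflt;              // out-of-bounds input: skip the call
    }
    return out;
}
```

If the call to exp is not masked after if-conversion, the vectorized loop evaluates it for every lane, including the out-of-bounds ones that the scalar code would have skipped.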
gcc/ChangeLog:
* tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
function declaration before IFN fallback.
vect: Add more dump messages for VLA SLP permutation [PR116583]
Taking the !repeating_p route for VLA vectors causes analysis
to fail, but it wasn't clear from the dump files when this
had happened, and which node caused it.
gcc/
PR tree-optimization/116583
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Add more
dump messages.
vect: Support more VLA SLP permutations [PR116583]
This is the main patch for PR116583. Previously, we only
supported VLA SLP permutations for which the output and inputs
have the same number of lanes, and for which that number of
lanes divides the number of vector elements.
The patch extends this to handle:
(1) "packs" of a single 2N-vector input into an N-vector output
(2) "unpacks" of N-vector inputs into an XN-vector output
Hopefully the comments in the code explain the approach.
The contents of the:
for (unsigned i = 0; i < ncopies; ++i)
loop do not change; the patch simply adds an outer loop around it.
The patch removes the XFAIL in slp-13.c and also improves
the SVE vect.exp results with vect-force-slp=1. I haven't
added new tests specifically for this, since presumably the
existing ones will cover it once the SLP switch is flipped.
vect: Restructure repeating_p case for SLP permutations [PR116583]
The repeating_p case previously handled the specific situation
in which the inputs have N lanes and the output has N lanes,
where N divides the number of vector elements. In that case,
every output uses the same permute vector.
The code was therefore structured so that the outer loop only
constructed one permute vector, with an inner loop generating
as many VEC_PERM_EXPRs from it as required.
However, the main patch for PR116583 adds support for cycling
through N permute vectors, rather than just having one.
The current structure doesn't really handle that case well.
(We'd need to interleave the results after generating them,
which sounds a bit fragile.)
This patch instead makes the transform phase calculate each output
vector's permutation explicitly, like for the !repeating_p path.
As a bonus, it gets rid of one use of SLP_TREE_NUMBER_OF_VEC_STMTS.
This arguably undermines one of the justifications for using repeating_p
for constant-length vectors: that the repeating_p path involved less
work than the !repeating_p path. That justification does still hold for
the analysis phase, though, and that should be the more time-sensitive
part. And the other justification -- to get more coverage of the code --
still applies. So I'd prefer that we continue to use repeating_p for
constant-length vectors unless that causes a known missed optimisation.
gcc/
PR tree-optimization/116583
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
the noutputs_per_mask inner loop and instead generate a
separate permute vector for each output.
Testing gcc.target/aarch64/sve/permute_2.c without the associated GCC
patch triggered an unrecognisable insn ICE for the svbfloat16_t tests.
This was because the implementation of general two-vector permutes
requires two TBLs and an ORR, with the ORR being represented as an
unspec for floating-point modes. The associated pattern did not
cover VNx8BF.
gcc/
* config/aarch64/iterators.md (SVE_I): Move further up file.
(SVE_F): New mode iterator.
(SVE_ALL): Redefine in terms of SVE_I and SVE_F.
* config/aarch64/aarch64-sve.md (*<LOGICALF:optab><mode>3): Extend
to all SVE_F.
gcc/testsuite/
* gcc.target/aarch64/sve/permute_5.c: New test.
aarch64: Handle SVE modes in aarch64_evpc_reencode [PR116583]
For Advanced SIMD modes, aarch64_evpc_reencode tests whether
a permute in a narrow element mode can be done more cheaply
in a wider mode. For example, { 0, 1, 8, 9, 4, 5, 12, 13 }
on V8HI is a natural TRN1 on V4SI ({ 0, 4, 2, 6 }).
This patch extends the code to handle SVE data and predicate
modes as well. This is a prerequisite to getting good results
for PR116583.
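The re-encoding idea from the example above can be sketched outside the compiler as follows. The helper name `widen_permute` and its interface are hypothetical; this is not the code of aarch64_evpc_reencode or aarch64_coalesce_units, only a minimal illustration of checking whether a selector on narrow elements can be expressed on elements twice as wide.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: try to re-express a permute on narrow elements as a
// permute on elements twice as wide.  This succeeds only if every pair of
// adjacent selector entries picks an aligned pair of adjacent narrow elements.
bool widen_permute(const std::vector<unsigned>& narrow,
                   std::vector<unsigned>& wide)
{
    if (narrow.size() % 2 != 0)
        return false;
    std::vector<unsigned> result;
    for (std::size_t i = 0; i < narrow.size(); i += 2) {
        unsigned first = narrow[i];
        unsigned second = narrow[i + 1];
        // The pair must start on an even index and be consecutive.
        if (first % 2 != 0 || second != first + 1)
            return false;
        result.push_back(first / 2);
    }
    wide = result;
    return true;
}
```

Applied to { 0, 1, 8, 9, 4, 5, 12, 13 }, the helper produces { 0, 4, 2, 6 }, the V4SI selector quoted above.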
gcc/
PR target/116583
* config/aarch64/aarch64.cc (aarch64_coalesce_units): New function,
extending the Advanced SIMD handling from...
(aarch64_evpc_reencode): ...here to SVE data and predicate modes.
Before running a test with specific torture options, gcc-dg-runtest
sets the global variable torture_current_flags to the set of torture
options that will be used. However, it never unset the variable
afterwards, which meant that the last options would hang around
and potentially confuse later non-torture tests.
I saw this with a follow-on patch to check-function-bodies, but it's
probably possible to construct artificial test combinations that
expose it with check-function-bodies's existing flag filtering.
gcc/testsuite/
* lib/gcc-dg.exp (gcc-dg-runtest): Unset torture_current_flags
after each test.
Richard Biener [Mon, 7 Oct 2024 09:05:17 +0000 (11:05 +0200)]
tree-optimization/116982 - analyze scalar loop exit early
The following makes sure to discover the scalar loop IV exit during
analysis as failure to do so (if DCE and friends are disabled this
can happen due to if-conversion doing DCE and FRE on the if-converted
loop) would ICE later.
I refrained from larger refactoring to be able to eventually backport.
PR tree-optimization/116982
* tree-vectorizer.h (vect_analyze_loop): Pass in .LOOP_VECTORIZED
call.
(vect_analyze_loop_form): Likewise.
* tree-vect-loop.cc (vect_analyze_loop_form): Reject loops where we
cannot determine an IV exit for the scalar loop.
(vect_analyze_loop): Adjust.
* tree-vectorizer.cc (try_vectorize_loop_1): Likewise.
* tree-parloops.cc (gather_scalar_reductions): Likewise.
which passes on most targets. However, on PowerPC, the loop in main
gets unrolled too, causing the scan-ltrans-rtl-dump-times check to fail
as the statement now appears twice in the dump. I think the extra
unrolling is due to different unrolling heuristics in the rs6000 port.
This patch therefore explicitly tries to block the unrolling in main with an
appropriate #pragma.
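As a rough illustration of blocking unrolling with a pragma, the sketch below uses `#pragma GCC unroll 1`; the GCC manual documents that the values 0 and 1 block unrolling of the following loop. The function `fill_sequence` is hypothetical, and the exact pragma and loop used in the testcase are not shown in this message, so treat both as assumptions.

```cpp
#include <cstddef>

// Hypothetical setup loop.  "#pragma GCC unroll 1" asks GCC not to unroll
// the loop that follows; other compilers may ignore the pragma.
void fill_sequence(int* data, std::size_t n)
{
#pragma GCC unroll 1
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<int>(i);
}
```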
gcc/testsuite/ChangeLog:
PR testsuite/116683
* g++.dg/ext/pragma-unroll-lambda-lto.C (main): Add #pragma to
prevent unrolling of the setup loop.
The PR notes that we don't emit optimal code for C++ spaceship
operator if the result is returned as an integer rather than the
result just being compared against different values and different
code executed based on that.
So e.g. for
template <typename T>
auto foo (T x, T y) { return x <=> y; }
for both floating point types, signed integer types and unsigned integer
types. auto in that case is std::strong_ordering or std::partial_ordering,
which are fancy C++ abstractions around struct with signed char member
which is -1, 0, 1 for the strong ordering and -1, 0, 1, 2 for the partial
ordering (but for -ffast-math 2 is never the case).
I'm afraid functions like that are fairly common and unless they are
inlined, we really need to map the comparison to those -1, 0, 1 or
-1, 0, 1, 2 values.
Now, for floating point spaceship I've in the past already added an
optimization (with tree-ssa-math-opts.cc discovery and named optab, the
optab only defined on x86 though right now), which ensures there is just
a single comparison instruction and then just tests based on flags.
Now, if we have code like:
auto a = x <=> y;
if (a == std::partial_ordering::less)
bar ();
else if (a == std::partial_ordering::greater)
baz ();
else if (a == std::partial_ordering::equivalent)
qux ();
else if (a == std::partial_ordering::unordered)
corge ();
etc., that results in decent code generation, the spaceship named pattern
on x86 optimizes for the jumps, so emits comparisons on the flags, followed
by setting the result to -1, 0, 1, 2 and subsequent jump pass optimizes that
well. But if the result needs to be stored into an integer and just
returned that way or there are no immediate jumps based on it (or turned
into some non-standard integer values like -42, 0, 36, 75 etc.), then CE
doesn't do a good job for that, we end up with say
comiss %xmm1, %xmm0
jp .L4
seta %al
movl $0, %edx
leal -1(%rax,%rax), %eax
cmove %edx, %eax
ret
.L4:
movl $2, %eax
ret
The jp is good, that is the unlikely case and can't be easily handled in
straight line code due to the layout of the flags, but the rest uses cmov
which often isn't a win, and weird math.
With the patch below we can get instead
xorl %eax, %eax
comiss %xmm1, %xmm0
jp .L2
seta %al
sbbl $0, %eax
ret
.L2:
movl $2, %eax
ret
The patch changes the discovery in the generic code, by detecting if
the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or
-1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag in
a new argument to .SPACESHIP ifn, so that the named pattern is told whether
it should optimize for branches or for loading the result into a -1, 0, 1
(, 2) integer. Additionally, it doesn't detect just floating point <=>
anymore, but also integer and unsigned integer, but in those cases only
if an integer -1, 0, 1 is wanted (otherwise == and > or similar comparisons
result in good code).
The backend then can for those integer or unsigned integer <=>s return
effectively (x > y) - (x < y) in a way that is efficient on the target
(so for x86 with ensuring zero initialization first when needed before
setcc; one for floating point and unsigned, where there is just one setcc
and the second one optimized into sbb instruction, two for the signed int
case). So e.g. for signed int we now emit
xorl %edx, %edx
xorl %eax, %eax
cmpl %esi, %edi
setl %dl
setg %al
subl %edx, %eax
ret
and for unsigned
xorl %eax, %eax
cmpl %esi, %edi
seta %al
sbbb $0, %al
ret
Note, I wonder if other targets wouldn't benefit from defining the
named optab too...
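At the source level, the integer mapping described above, effectively (x > y) - (x < y), can be written roughly as follows. The function names are hypothetical, and the floating-point variant simply follows the -1, 0, 1, 2 encoding mentioned in the text; it is a sketch, not the library's implementation of the ordering types.

```cpp
#include <cmath>

// Branch-free three-way comparison: yields -1, 0 or 1.
// Each comparison converts to 0 or 1, so the difference is the ordering.
template <typename T>
int three_way(T x, T y)
{
    return (x > y) - (x < y);
}

// Floating-point variant: 2 stands for "unordered" (either operand is NaN),
// following the -1, 0, 1, 2 encoding described for partial ordering.
inline int three_way_fp(double x, double y)
{
    if (std::isnan(x) || std::isnan(y))
        return 2;
    return (x > y) - (x < y);
}
```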
2024-10-07 Jakub Jelinek <jakub@redhat.com>
PR middle-end/116896
* optabs.def (spaceship_optab): Use spaceship$a4 rather than
spaceship$a3.
* internal-fn.cc (expand_SPACESHIP): Expect 3 call arguments
rather than 2, expand the last one, expect 4 operands of
spaceship_optab.
* tree-ssa-math-opts.cc: Include cfghooks.h.
(optimize_spaceship): Check if a single PHI is initialized to
-1, 0, 1, 2 or -1, 0, 1 values, in that case pass 1 as last (new)
argument to .SPACESHIP and optimize away the comparisons,
otherwise pass 0. Also check for integer comparisons rather than
floating point, in that case do it only if there is a single PHI
with -1, 0, 1 values and pass 1 to last argument of .SPACESHIP
if the <=> is signed, 2 if unsigned.
* config/i386/i386-protos.h (ix86_expand_fp_spaceship): Add
another rtx argument.
(ix86_expand_int_spaceship): Declare.
* config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Add
arg3 argument, if it is const0_rtx, expand like before, otherwise
emit optimized sequence for setting the result into a GPR.
(ix86_expand_int_spaceship): New function.
* config/i386/i386.md (UNSPEC_SETCC_SI_SLP): New UNSPEC code.
(setcc_si_slp): New define_expand.
(*setcc_si_slp): New define_insn_and_split.
(setcc + setcc + movzbl): New define_peephole2.
(spaceship<mode>3): Renamed to ...
(spaceship<mode>4): ... this. Add an extra operand, pass it
to ix86_expand_fp_spaceship.
(spaceshipxf3): Renamed to ...
(spaceshipxf4): ... this. Add an extra operand, pass it
to ix86_expand_fp_spaceship.
(spaceship<mode>4): New define_expand for SWI modes.
* doc/md.texi (spaceship@var{m}3): Renamed to ...
(spaceship@var{m}4): ... this. Document the meaning of last
operand.
* g++.target/i386/pr116896-1.C: New test.
* g++.target/i386/pr116896-2.C: New test.
Tobias Burnus [Mon, 7 Oct 2024 08:45:14 +0000 (10:45 +0200)]
OpenMP: Allocate directive for static vars, clean up
For the 'allocate' directive, remove the sorry for static variables and
just keep using normal memory, but honor the requested alignment and set
a DECL_ATTRIBUTE in case a target may want to make use of this later on.
The documentation is updated accordingly.
The C diagnostic to check for predefined allocators (req. for static vars)
failed to accept GCC's ompx_gnu_... allocator, now fixed. (Fortran was
already okay; but both now use new common #defined value for checking.)
And while Fortran common block variables are still rejected, the check
has been improved as before the sorry diagnostic did not work for
common blocks in modules.
Finally, for 'allocate' clause on the target/task/taskloop directives,
there is now a warning for omp_thread_mem_alloc (i.e. predefined allocator
with access = thread), which is undefined behavior according to the
OpenMP specification.
And, last, testing showed that var decl + static_assert sets TREE_USED
but does not produce a statement list in C, which did run into an assert
in gimplify. This special case is now also handled.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_allocate): Set alignment for alignof;
accept static variables and fix predef allocator check.
gcc/fortran/ChangeLog:
* openmp.cc (is_predefined_allocator): Use gomp-constants.h consts.
* trans-common.cc (translate_common): Reject OpenMP allocate directives.
* trans-decl.cc (gfc_finish_var_decl): Handle allocate directive
for static variables.
(gfc_trans_deferred_vars): Update for the latter.
gcc/ChangeLog:
* gimplify.cc (gimplify_bind_expr): Fix corner case for OpenMP
allocate directive.
(gimplify_scan_omp_clauses): Warn if omp_thread_mem_alloc is used
as allocator with the target/task/taskloop directive.
include/ChangeLog:
* gomp-constants.h (GOMP_OMP_PREDEF_ALLOC_MAX,
GOMP_OMPX_PREDEF_ALLOC_MIN, GOMP_OMPX_PREDEF_ALLOC_MAX,
GOMP_OMP_PREDEF_ALLOC_THREADS): New defines.
libgomp/ChangeLog:
* allocator.c: Add static asserts for new
GOMP_OMP{,X}_PREDEF_ALLOC_{MIN,MAX} range values.
* libgomp.texi (OpenMP Impl. Status): Allocate directive for
static vars is now supported. Refer to PR for allocate clause.
(Memory allocation): Update for static vars; minor word tweaking.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/allocate-9.c: Update for removed sorry.
* gfortran.dg/gomp/allocate-15.f90: Likewise.
* gfortran.dg/gomp/allocate-pinned-1.f90: Likewise.
* gfortran.dg/gomp/allocate-4.f90: Likewise; add dg-error for
previously missing diagnostic.
* c-c++-common/gomp/allocate-18.c: New test.
* c-c++-common/gomp/allocate-19.c: New test.
* gfortran.dg/gomp/allocate-clause.f90: New test.
* gfortran.dg/gomp/allocate-static-2.f90: New test.
* gfortran.dg/gomp/allocate-static.f90: New test.
Thomas Schwinge [Thu, 3 Oct 2024 10:52:30 +0000 (12:52 +0200)]
Handle non-grouped stores as single-lane SLP: adjust 'gcc.dg/vect/slp-26.c', GCN
As of commit d34cda720988674bcf8a24267c9e1ec61335d6de
"Handle non-grouped stores as single-lane SLP", we see for
'--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100'):
PASS: gcc.dg/vect/slp-26.c (test for excess errors)
PASS: gcc.dg/vect/slp-26.c execution test
PASS: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 1 loops" 1
[-PASS:-]{+FAIL:+} gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
gcc.dg/vect/slp-26.c: pattern found 2 times
Apply the same change to 'amdgcn-*-*' as done for 'riscv_v'.
Thomas Schwinge [Mon, 28 Nov 2022 12:49:06 +0000 (13:49 +0100)]
nvptx: Disable effective-target 'freestanding'
After 2014's commit 157e859ffe3b5d43db1e19475711c1a3d21ab57a "remove picochip",
the effective-target 'freestanding' (later) was only ever used for nvptx.
However, the relevant I/O library functions have long been implemented in nvptx
newlib.
These test cases generally PASS, just a few need to get XFAILed; see
<https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/#system-calls>,
and then supposedly
<https://docs.nvidia.com/cuda/cuda-c-programming-guide/#formatted-output> for
description of the non-standard PTX 'vprintf' return value:
> Unlike the C-standard 'printf()', which returns the number of characters
> printed, CUDA's 'printf()' returns the number of arguments parsed. If no
> arguments follow the format string, 0 is returned. If the format string is
> NULL, -1 is returned. If an internal error occurs, -2 is returned.
(I've tried a few variants to confirm that PTX 'vprintf' -- which supposedly is
underlying the CUDA 'printf' -- is what's implementing this behavior.)
Probably, we ought to fix that up in nvptx newlib.
hppa: Use stack slot SP-40 to copy between integer and floating-point registers
2024-10-06 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa-64.h (PA_SECONDARY_MEMORY_NEEDED): Define
to false. Update comment.
* config/pa/pa.md: Modify 64-bit move patterns to support
copying between integer and floating-point registers using
stack slot SP-40.
Added by P2985R0 for C++26. This simply exposes the compiler
builtin, and adds the feature-testing macro.
libstdc++-v3/ChangeLog:
* include/bits/version.def: Added the feature-testing macro.
* include/bits/version.h: Regenerated.
* include/std/type_traits: Add support for
std::is_virtual_base_of and std::is_virtual_base_of_v,
implemented in terms of the compiler builtin.
* testsuite/20_util/is_virtual_base_of/value.cc: New test.
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Eric Botcazou [Sat, 5 Oct 2024 12:39:14 +0000 (14:39 +0200)]
Fix various issues of -ftrivial-auto-var-init=zero with Ada
This polishes a few rough edges that prevent -ftrivial-auto-var-init=zero
from working in Ada:
- build_common_builtin_nodes declares BUILT_IN_CLEAR_PADDING with 3
instead of 2 parameters, whereas gimple_fold_builtin_clear_padding contains
the assertion:
gcc_assert (gimple_call_num_args (stmt) == 2)
This causes gimple_builtin_call_types_compatible_p to always return false
in Ada (this works in C/C++ because another declaration is used).
- gimple_add_init_for_auto_var uses EXPR_LOCATION to fetch the location
of a DECL node, which always returns UNKNOWN_LOCATION.
- the machinery attempts to initialize Out parameters.
gcc/
PR middle-end/116933
* gimplify.cc (gimple_add_init_for_auto_var): Use the correct macro
to fetch the source location of the variable.
* tree.cc (common_builtin_nodes): Remove the 3rd parameter in the
type of BUILT_IN_CLEAR_PADDING.
gcc/ada/
PR middle-end/116933
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Out_Parameter>: Add
the "uninitialized" attribute on Out parameters.
* gcc-interface/utils.cc (gnat_internal_attributes): Add entry for
the "uninitialized" attribute.
(handle_uninitialized_attribute): New function.
gcc/testsuite/
* gnat.dg/auto_var_init.adb: New test.
Richard Biener [Fri, 4 Oct 2024 09:13:58 +0000 (11:13 +0200)]
Improve load permutation lowering
The following makes sure the emitted even/odd extraction scheme
follows one that ends up with actual trivial even/odd extract permutes.
When we choose a level 2 extract we generate { 0, 1, 4, 5, ... }
which for example the x86 backend doesn't recognize with just SSE
and QImode elements. So this now follows what the non-SLP interleaving
code would do which is element granular even/odd extracts.
This resolves gcc.dg/vect/vect-strided[-a]-u8-i8-gap*.c FAILs with
--param vect-force-slp=1 on x86_64.
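The difference between the two selector shapes mentioned above can be illustrated with a small sketch. The helper `even_extract_selector` and its `group` parameter are hypothetical and only show how the index patterns differ; they do not reflect how the vectorizer builds its permutes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: build the selector that keeps the "even" groups of
// `group` consecutive elements out of `nelts` input elements.
//   group == 1  ->  { 0, 2, 4, 6, ... }   (element-granular even extract)
//   group == 2  ->  { 0, 1, 4, 5, ... }   (the coarser form mentioned above)
std::vector<unsigned> even_extract_selector(std::size_t nelts, std::size_t group)
{
    std::vector<unsigned> sel;
    if (group == 0)
        return sel;                       // nothing sensible to select
    for (std::size_t i = 0; i < nelts; ++i)
        if ((i / group) % 2 == 0)
            sel.push_back(static_cast<unsigned>(i));
    return sel;
}
```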
David Malcolm [Fri, 4 Oct 2024 22:31:17 +0000 (18:31 -0400)]
diagnostics: bulletproof opening of SARIF output [PR116978]
Introduce a new RAII class diagnostic_output_file to track ownership
of the FILE * for SARIF output.
In particular, the .sarif file is now opened immediately, rather
than at the end of the compile, and so will fail earlier if the
file can't be opened.
Doing so fixes a couple of ICEs in -fdiagnostics-format=sarif-file when
invoking, say, cc1 directly, rather than from the driver.
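The ownership idea can be sketched as below. The class name `owned_output_file` and its members are hypothetical and are not taken from diagnostic-output-file.h; the sketch only shows a move-only owner of a FILE * that closes the stream exactly once.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical RAII wrapper; names and layout are illustrative only.
class owned_output_file
{
public:
    owned_output_file() = default;
    owned_output_file(std::FILE* f, std::string name)
        : m_file(f), m_name(std::move(name)) {}

    owned_output_file(const owned_output_file&) = delete;
    owned_output_file& operator=(const owned_output_file&) = delete;

    owned_output_file(owned_output_file&& other) noexcept
        : m_file(other.m_file), m_name(std::move(other.m_name))
    {
        other.m_file = nullptr;   // ownership moves to the new object
    }

    ~owned_output_file()
    {
        if (m_file)
            std::fclose(m_file);  // closed exactly once, by the owner
    }

    std::FILE* get() const { return m_file; }
    const std::string& name() const { return m_name; }
    explicit operator bool() const { return m_file != nullptr; }

private:
    std::FILE* m_file = nullptr;
    std::string m_name;
};
```

With such a wrapper, the creator can open the file up front, report a failure immediately if the stream is null, and hand the owner to the output format object.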
gcc/ChangeLog:
PR other/116978
* diagnostic-format-sarif.cc (sarif_builder::sarif_builder):
Gracefully handle "main_input_filename_" being NULL.
(sarif_output_format::sarif_output_format): Replace param
"base_file_name" with "output_file" and assert that the file
was opened successfully and has a non-NULL filename.
(sarif_output_format::~sarif_file_output_format): Move
responsibility for building the filename and opening the file from
here to the creator of the instance.
(sarif_output_format::m_base_file_name): Replace with...
(sarif_output_format::m_output_file): ...this.
(diagnostic_output_format_init_sarif_file): Make "line_maps" param
non-const. Gracefully handle "base_file_name" being NULL.
Construct the filename and open the file here, rather than in
~sarif_file_output_format, and handle failures immediately here,
rather than at the end of the compile.
* diagnostic-format-sarif.h: Include "diagnostic-output-file.h".
(diagnostic_output_format_init_sarif_file): Make "line_maps" param
non-const.
* diagnostic-output-file.h: New file.
* diagnostic.cc (diagnostic_context::emit_diagnostic): New.
(diagnostic_context::emit_diagnostic_va): New.
* diagnostic.h (diagnostic_context::emit_diagnostic): New decl.
(diagnostic_context::emit_diagnostic_va): New decl.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
I should have put the new unspecs in SVE_COND_FP_MAXMIN but I put them in
SVE_COND_FP_BINARY_REG instead. That was incorrect because the
SVE_COND_FP_MAXMIN iterator is being used for predicated floating-point
maximum/minimum, not SVE_COND_FP_BINARY_REG.
Also added a testcase to validate the new change.
Regression tested on aarch64-unknown-linux-gnu and found no regressions.
There are some test cases with "libitm" in their directory names which
appear in compare_tests output as changed tests but it looks like they
are in the output just because of changed build directories, like from
build-patched/aarch64-unknown-linux-gnu/./libitm/* to
build-pristine/aarch64-unknown-linux-gnu/./libitm/*. I didn't think it
was a cause of concern and have pushed this for review.
gcc/ChangeLog:
PR target/116934
* config/aarch64/iterators.md: Move UNSPEC_COND_SMAX and
UNSPEC_COND_SMIN to correct iterators.
gcc/testsuite/ChangeLog:
PR target/116934
* gcc.target/aarch64/sve2/pr116934.c: New test.
AVR: target/116953 - ICE due to operands clobber in avr_out_sbxx_branch.
PR target/116953
gcc/
* config/avr/avr.cc (avr_out_sbxx_branch): Work on a copy of
the operands rather than on operands itself, which is just
recog_data.operand and may be clobbered by jump_over_one_insn_p.
gcc/testsuite/
* gcc.target/avr/torture/pr116953.c: New test.
testsuite - Some float64 and float32x tests require double64plus.
Some of the float64 and float32x test cases are using double built-ins
and hence require double64plus resp. that double is at least as good
as float32x (double_float32xplus).
testsuite: Fix fallout of turning warnings into errors on 32-bit Arm
Since commits 2c3db94d9fd ("c: Turn int-conversion warnings into
permerrors") and 55e94561e97e ("c: Turn -Wimplicit-function-declaration
into a permerror") these tests fail with errors such as:
FAIL: gcc.target/arm/pr59858.c (test for excess errors)
FAIL: gcc.target/arm/pr65647.c (test for excess errors)
FAIL: gcc.target/arm/pr65710.c (test for excess errors)
FAIL: gcc.target/arm/pr97969.c (test for excess errors)
Here's one example of the excess errors:
FAIL: gcc.target/arm/pr65647.c (test for excess errors)
Excess errors:
/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:17: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:51: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:62: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:7:48: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:8:9: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:24:5: error: initialization of 'int' from 'int *' makes integer from pointer without a cast [-Wint-conversion]
/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:25:5: error: initialization of 'int' from 'struct S1 *' makes integer from pointer without a cast [-Wint-conversion]
/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:41:3: error: implicit declaration of function 'fn3'; did you mean 'fn2'? [-Wimplicit-function-declaration]
/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:46:3: error: implicit declaration of function 'fn5'; did you mean 'fn4'? [-Wimplicit-function-declaration]
/path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:57:16: error: implicit declaration of function 'fn6'; did you mean 'fn4'? [-Wimplicit-function-declaration]
PR rtl-optimization/59858 and PR target/65710 test the fix of an ICE.
PR target/65647 and PR target/97969 test for a compilation infinite loop.
Therefore, add -fpermissive so that the tests behave as they did previously.
Tested on armv8l-linux-gnueabihf.
Tsung Chun Lin [Fri, 4 Oct 2024 14:02:07 +0000 (08:02 -0600)]
[PATCH] RISC-V/libgcc: Fix incorrect .cfi_offset for saving ra in __riscv_save_[0-3] on ilp32e.
From 8b3c5ebe8aacbcc4ddf1be8dea9a555e7e1bcc39 Mon Sep 17 00:00:00 2001
From: Jim Lin <jim@andestech.com>
Date: Fri, 4 Oct 2024 14:48:12 +0800
Subject: [PATCH] RISC-V/libgcc: Fix incorrect .cfi_offset for saving ra in
__riscv_save_[0-3] on ilp32e.
libgcc/ChangeLog:
* config/riscv/save-restore.S: Fix .cfi_offset for saving ra in
__riscv_save_[0-3] on ilp32e.
Patrick Palka [Fri, 4 Oct 2024 14:01:39 +0000 (10:01 -0400)]
libstdc++/ranges: Implement various small LWG issues
This implements the following small LWG issues:
3848. adjacent_view, adjacent_transform_view and slide_view missing base accessor
3851. chunk_view::inner-iterator missing custom iter_move and iter_swap
3947. Unexpected constraints on adjacent_transform_view::base()
4001. iota_view should provide empty
4012. common_view::begin/end are missing the simple-view check
4013. lazy_split_view::outer-iterator::value_type should not provide default constructor
4035. single_view should provide empty
4053. Unary call to std::views::repeat does not decay the argument
4054. Repeating a repeat_view should repeat the view
libstdc++-v3/ChangeLog:
* include/std/ranges (single_view::empty): Define as per LWG 4035.
(iota_view::empty): Define as per LWG 4001.
(lazy_split_view::_OuterIter::value_type): Remove default
constructor and make other constructor private as per LWG 4013.
(common_view::begin): Disable non-const overload for simple
views as per LWG 4012.
(common_view::end): Likewise.
(adjacent_view::base): Define as per LWG 3848.
(adjacent_transform_view::base): Likewise.
(chunk_view::_InnerIter::iter_move): Define as per LWG 3851.
(chunk_view::_InnerIter::iter_swap): Likewise.
(slide_view::base): Define as per LWG 3848.
(repeat_view): Adjust deduction guide as per LWG 4053.
(_Repeat::operator()): Adjust single-parameter overload as per
LWG 4054.
* testsuite/std/ranges/adaptors/adjacent/1.cc: Verify existence
of base member function.
* testsuite/std/ranges/adaptors/adjacent_transform/1.cc: Likewise.
* testsuite/std/ranges/adaptors/chunk/1.cc: Test LWG 3851 example.
* testsuite/std/ranges/adaptors/slide/1.cc: Verify existence of
base member function.
* testsuite/std/ranges/iota/iota_view.cc: Test LWG 4001 example.
* testsuite/std/ranges/repeat/1.cc: Test LWG 4053/4054 examples.
Jakub Jelinek [Fri, 4 Oct 2024 13:24:24 +0000 (15:24 +0200)]
testsuite: Fix up unevalstr2.C test
The CWG2521 changes adjusted the unevalstr1.C test but didn't adjust
unevalstr2.C test, which now FAILs in C++23 mode.
The intent in both of those tests was to test the separate (now deprecated)
syntax, so instead of removing the space between closing " and _ I've
adjusted the testcase to expect those 17 extra warnings. And I've also
adjusted the unevalstr1.C testcase to do the same; when the syntax is removed
in C++29 or whatever, that can just be guarded by #if.
But it is actually useful to also test the UDL variant without space between
closing " and _, so I've added new test coverage for that too to both tests.
2024-10-04 Jakub Jelinek <jakub@redhat.com>
* g++.dg/cpp26/unevalstr1.C: Revert the 2024-10-03 changes, instead
expect extra warnings. Add another set of tests without space
between " and _.
* g++.dg/cpp26/unevalstr2.C: Expect extra warnings for C++23. Add
another set of tests without space between " and _.
aarch64: Set Armv9-A generic L1 cache line size to 64 bytes
I'd like to use a value of 64 bytes for the L1 cache line size for Armv9-A
generic tuning.
As described in g:9a99559a478111f7fbeec29bd78344df7651c707 this value is used
to set the std::hardware_destructive_interference_size value which we want to
be not overly large when running concurrent applications on large core-count
systems.
The generic value for Armv8-A systems and the port baseline is 256 bytes
because that's what the A64FX CPU has, as set de-facto in
aarch64_override_options_internal.
But for Armv9-A CPUs as far as I know there isn't anything larger
than 64 bytes, so we should be able to use the smaller value here and reduce
the size of concurrent structs that use
std::hardware_destructive_interference_size to pad their fields.
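To make the size effect concrete, the sketch below pads two fields to an assumed line size. The constant `kAssumedLineSize` and the struct `padded_counters` are hypothetical; real code would normally use std::hardware_destructive_interference_size from <new> instead of a hard-coded value.

```cpp
#include <cstddef>

// Assumed line size for illustration only.
constexpr std::size_t kAssumedLineSize = 64;

// Two counters meant to be updated by different threads.  Aligning each one
// to the line size keeps them on separate cache lines.
struct padded_counters
{
    alignas(kAssumedLineSize) long produced;
    alignas(kAssumedLineSize) long consumed;
};

static_assert(sizeof(padded_counters) == 2 * kAssumedLineSize,
              "each counter occupies one full line");
```

With a 64-byte value the struct takes 128 bytes; with the same layout and a 256-byte value it would take 512 bytes.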
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
* config/aarch64/tuning_models/generic_armv9_a.h
(generic_armv9a_prefetch_tune): Define.
(generic_armv9_a_tunings): Use the above.
Andre Vieira [Fri, 4 Oct 2024 12:43:46 +0000 (13:43 +0100)]
arm: Fix missed CE optimization for armv8.1-m.main [PR 116444]
This patch restores optimizations for armv8.1-m.main targets that were
lost when the generation of csinc, csinv and csneg was enabled for those
targets by the patch series containing:
[PATCH 2/5][Arm] New pattern for CSINV instructions
The original patch series makes use of the "noce" machinery to transform RTL
into patterns that later match the Armv8.1-M Mainline, by getting the target
hook TARGET_HAVE_CONDITIONAL_EXECUTION, to return FALSE for such targets prior
to reload_completed. The same machinery however was transforming other RTL
patterns which were later on causing the "ce" pass post reload_completed to no
longer optimize conditional execution opportunities, which was causing the
regression observed in PR target/116444, a regression of 'testsuite/gcc.target/arm/thumb-ifcvt-2.c'
when run for an Armv8.1-M Mainline target.
This patch implements the target hook TARGET_NOCE_CONVERSION_PROFITABLE_P to
only allow "noce" to generate patterns that match CSINV, CSINC and CSNEG. This
ensures that the early "ce" passes do not ruin things for later ones.
gcc/ChangeLog:
PR target/116444
* config/arm/arm-protos.h (arm_noce_conversion_profitable_p): New
declaration.
* config/arm/arm.cc (arm_is_v81m_cond_insn): New helper function used
in ...
(arm_noce_conversion_profitable_p): ... here. New function to implement
...
(TARGET_NOCE_CONVERSION_PROFITABLE_P): ... this target hook. New define.