Georg-Johann Lay [Wed, 11 Dec 2024 12:28:47 +0000 (13:28 +0100)]
AVR: target/118001 - Add __flashx as 24-bit named address space.
This patch adds __flashx as a new named address space that allocates
objects in .progmemx.data. The handling is mostly the same or similar
to that of 24-bit space __memx, except that the asm routines are
simpler and more efficient. Loads are emitted inline when ELPMX or
LPMX is available. The address space uses 24-bit addresses even
on devices with a program memory size of 64 KiB or less.
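A minimal sketch of how an object might be placed in the new address space from C. The FLASHX wrapper macro, the table and the function names are hypothetical and only serve the illustration. The sketch also assumes that the compiler defines a __FLASHX built-in macro in the same way it defines __FLASH and __MEMX for the older address spaces; when that macro is missing (or on a non-AVR host) the qualifier is simply dropped so the code still builds.

```c
#include <stdint.h>

/* FLASHX is a hypothetical convenience macro, not something GCC provides.
   Assumption: avr-gcc defines __FLASHX when __flashx is supported.  */
#if defined(__AVR__) && defined(__FLASHX)
#define FLASHX __flashx
#else
#define FLASHX /* host build: ordinary const data */
#endif

/* Objects in a named address space must be const.  With __flashx the
   table is allocated in .progmemx.data.  */
static const FLASHX uint8_t gamma_table[4] = { 0, 16, 64, 255 };

/* Read one entry; the index is masked so it always stays in range.  */
uint8_t gamma_lookup(uint8_t index)
{
    return gamma_table[index & 3u];
}

/* A pointer into the address space is 24 bits wide on AVR.  */
uint16_t table_sum(const FLASHX uint8_t *data, uint8_t count)
{
    uint16_t sum = 0;
    for (uint8_t i = 0; i < count; i++)
        sum += data[i];
    return sum;
}

uint16_t gamma_sum(void)
{
    return table_sum(gamma_table, 4);
}
```

On a host compiler the qualifier expands to nothing, so this only demonstrates the source-level usage, not the generated LPM/ELPM code.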
PR target/118001
gcc/
* doc/extend.texi (AVR Named Address Spaces): Document __flashx.
* config/avr/avr.h (ADDR_SPACE_FLASHX): New enum value.
* config/avr/avr-protos.h (avr_out_fload, avr_mem_flashx_p)
(avr_fload_libgcc_p, avr_load_libgcc_mem_p)
(avr_load_libgcc_insn_p): New.
* config/avr/avr.cc (avr_addrspace): Add ADDR_SPACE_FLASHX.
(avr_decl_flashx_p, avr_mem_flashx_p, avr_fload_libgcc_p)
(avr_load_libgcc_mem_p, avr_load_libgcc_insn_p, avr_out_fload):
New functions.
(avr_adjust_insn_length) [ADJUST_LEN_FLOAD]: Handle case.
(avr_progmem_p) [avr_decl_flashx_p]: return 2.
(avr_addr_space_legitimate_address_p) [ADDR_SPACE_FLASHX]:
Has the same behavior as ADDR_SPACE_MEMX.
(avr_addr_space_convert): Use pointer sizes rather than ASes.
(avr_addr_space_contains): New function.
(avr_convert_to_type): Use it.
(avr_emit_cpymemhi): Handle ADDR_SPACE_FLASHX.
* config/avr/avr.md (adjust_len) <fload>: New attr value.
(gen_load<mode>_libgcc): Renamed from load<mode>_libgcc.
(xload8<mode>_A): Iterate over MOVMODE rather than over ALL1.
(fxmov<mode>_A): New from xloadv<mode>_A.
(xmov<mode>_8): New from xload<mode>_A.
(fmov<mode>): New insns.
(fxload<mode>_A): New from xload<mode>_A.
(fxload_<mode>_libgcc): New from xload_<mode>_libgcc.
(*fxload_<mode>_libgcc): New from *xload_<mode>_libgcc.
(mov<mode>) [avr_mem_flashx_p]: Handle ADDR_SPACE_FLASHX.
(cpymemx_<mode>): Make sure the address space is not lost
when splitting.
(*cpymemx_<mode>) [ADDR_SPACE_FLASHX]: Use __movmemf_<mode> for asm.
(*ashlqi.1.zextpsi_split): New combine pattern.
* config/avr/predicates.md (nox_general_operand): Don't match
when avr_mem_flashx_p is true.
* config/avr/avr-passes.cc (AVR_LdSt_Props):
ADDR_SPACE_FLASHX has no post_inc.
gcc/testsuite/
* gcc.target/avr/torture/addr-space-1.h [AVR_HAVE_ELPM]:
Use a function to bump .progmemx.data to a high address.
* gcc.target/avr/torture/addr-space-2.h: Same.
* gcc.target/avr/torture/addr-space-1-fx.c: New test.
* gcc.target/avr/torture/addr-space-2-fx.c: New test.
libgcc/
* config/avr/t-avr (LIB1ASMFUNCS): Add _fload_1, _fload_2,
_fload_3, _fload_4, _movmemf.
* config/avr/lib1funcs.S (.branch_plus): New .macro.
(__xload_1, __xload_2, __xload_3, __xload_4): When the address is
located in flash, then forward to...
(__fload_1, __fload_2, __fload_3, __fload_4): ...these new
functions, respectively.
(__movmemx_hi): When the address is located in flash, forward to...
(__movmemf_hi): ...this new function.
Martin Uecker [Sat, 23 Nov 2024 07:04:05 +0000 (08:04 +0100)]
Fix type compatibility for types with flexible array member 2/2 [PR113688,PR114713,PR117724]
For checking or computing TYPE_CANONICAL, ignore the array size when it is
the last element of a structure or union. To not get errors because of
an inconsistent number of members, zero-sized arrays which are the last
element are not ignored anymore when checking the fields of a struct.
testsuite: arm: Check that a far jump is used in thumb1-far-jump-2.c
With the changes in r15-1579-g792f97b44ff, the code used as "padding" in
the test case is optimized away. Prevent this optimization by forcing a
read of the volatile memory.
Also, validate that there is a far jump in the generated assembler.
Without this patch, the generated assembler is reduced to:
f3:
cmp r0, #0
beq .L1
ldr r4, .L6
.L1:
bx lr
.L7:
.align 2
.L6:
.word g_0_1
With the patch, the generated assembler is:
f3:
movs r2, #1
ldr r3, .L6
push {lr}
str r2, [r3]
cmp r0, #0
bne .LCB10
bl .L1 @far jump
.LCB10:
b .L7
.L8:
.align 2
.L6:
.word .LANCHOR0
.L7:
str r2, [r3]
...
str r2, [r3]
.L1:
pop {pc}
gcc/testsuite/ChangeLog:
* gcc.target/arm/thumb1-far-jump-2.c: Write to volatile memory
in macro to avoid optimization.
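The padding idea can be sketched as follows. This is not the code of thumb1-far-jump-2.c; the macro names, the variable and the function are hypothetical. It only shows why an access to a volatile object in the padding macro keeps the compiler from deleting the padding.

```c
/* Hypothetical stand-in for the volatile object used by the test.  */
volatile int pad_sink;

/* Each expansion is one volatile store, which the compiler must keep.  */
#define PAD1   pad_sink = 1;
#define PAD4   PAD1 PAD1 PAD1 PAD1
#define PAD16  PAD4 PAD4 PAD4 PAD4

/* The early return has to jump over all sixteen stores.  With enough
   padding the branch target is out of range of a short branch.  */
int branch_over_padding(int skip)
{
    if (skip)
        return 0;
    PAD16
    return 16;
}
```

Sixteen stores are far too few to force a far jump on Thumb-1; the amount of padding a real test needs is not shown here.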
testsuite: arm: Use effective-target for its.c test [PR94531]
The test case gcc.target/arm/its.c was created together with restriction
of IT blocks for Cortex-M7. As the test case fails on all tunes that
do not match Cortex-M7, explicitly test it for Cortex-M7. To have some
additional faith that GCC does the correct thing, I also added another
variant of the test for Cortex-M3 that should allow longer IT blocks.
gcc/testsuite/ChangeLog:
PR testsuite/94531
* gcc.target/arm/its.c: Removed.
* gcc.target/arm/its-1.c: Copy of gcc.target/arm/its.c. Use
effective-target arm_cpu_cortex_m7.
* gcc.target/arm/its-2.c: Copy of gcc.target/arm/its.c. Use
effective-target arm_cpu_cortex_m3.
Eric Botcazou [Thu, 21 Nov 2024 14:28:43 +0000 (15:28 +0100)]
ada: Elide the copy for bit-packed aggregates in object declarations
The in-place expansion has been historically disabled for them, but there
does not seem to be any good reason left for this. However, this requires
a small trick in order for the expanded code not to be flagged as using the
object uninitialized by the code generator.
gcc/ada/ChangeLog:
* exp_aggr.adb (Convert_Aggr_In_Object_Decl): Clear the component
referenced on the right-hand side of the first assignment generated
for a bit-packed array, if any.
(Expand_Array_Aggregate): Do not exclude aggregates of bit-packed
array types in object declarations from in-place expansion.
* sem_eval.adb (Eval_Indexed_Component): Do not attempt a constant
evaluation for a bit-packed array type.
A recently fixed bug caused an infinite loop when assertions were not
checked. With assertions checked, the symptom was just an internal
error caused by an assertion failure. This patch makes it so that if
another bug ever causes the same condition to fail, there will never be
an infinite loop with any assertion policy.
gcc/ada/ChangeLog:
* sem_ch3.adb (Access_Subprogram_Declaration): Replace assertion with
more defensive code.
GNAT implements a format with trailing '*' signs for the Image attribute
of NaN, +inf and -inf. It was probably always intended to be the same
length as the image of 1.0, but one '*' was actually missing. This patch
fixes this.
gcc/ada/ChangeLog:
* libgnat/s-imager.adb (Image_Floating_Point): Tweak display of
invalid floating point values.
Access parameters are not allowed in specifications of task entries.
Before this patch, the compiler failed to detect that case in accept
statements that were not directly in their task body's scopes. This
patch fixes this issue.
gcc/ada/ChangeLog:
* sem_ch3.adb (Access_Definition): Remove test for task entry context.
* sem_ch6.adb (Process_Formals): Add improved test for task entry
context.
Piotr Trojanek [Wed, 20 Nov 2024 15:22:05 +0000 (16:22 +0100)]
ada: Fix internal error on loop parameter specifications
Originally loop parameter specification only occurred in loops, but now
it also occurs in quantified expressions. This patch guards against
flagging non-loop nodes as null loop statements. This was causing
internal compiler errors that were only visible with switch -gnatdk,
which happens to be default in GNATprove testsuite.
gcc/ada/ChangeLog:
* sem_ch5.adb (Analyze_Loop_Parameter_Specification): Only set
flag Is_Null_Loop when loop parameter specification comes from
a loop and not from a quantified expression.
ada: Accept static strings with External_Initialization
Before this patch, the argument to the External_Initialization aspect
had to be a string literal. This patch extends the possibilities so that
any static string is accepted.
A new helper function, Is_OK_Static_Expression_Of_Type, is introduced,
and in addition to the main change of this patch a couple of calls to
that helper function are added in other places to replace equivalent
inline code.
gcc/ada/ChangeLog:
* sem_eval.ads (Is_OK_Static_Expression_Of_Type): New function.
* sem_eval.adb (Is_OK_Static_Expression_Of_Type): Likewise.
* sem_ch13.adb (Check_Expr_Is_OK_Static_Expression): Use new function.
* sem_prag.adb (Check_Expr_Is_OK_Static_Expression): Likewise.
* sem_ch3.adb (Apply_External_Initialization): Accept static strings
for the parameter.
The clauses in section 3.5 of the reference manual were moved around
along the different Ada versions, which caused some comments in our
source code to go out of date. This patch updates the references in
those comments.
Eric Botcazou [Tue, 19 Nov 2024 08:19:22 +0000 (09:19 +0100)]
ada: Minor refactoring in expansion of array aggregates
This just moves a couple of checks done in conjunction with the predicate
Aggr_Assignment_OK_For_Backend into its body and adds a couple of comments.
No functional changes.
gcc/ada/ChangeLog:
* exp_aggr.adb (Aggr_Assignment_OK_For_Backend): Add Target formal
parameter and check that it is not a bit-aligned component or slice.
Return False in CodePeer mode as well.
(Build_Array_Aggr_Code): Remove redundant tests done in conjunction
with a call to Aggr_Assignment_OK_For_Backend.
(Expand_Array_Aggregate): Likewise. Add a couple of comments and
improve formatting.
Before this patch, the machinery to generate validity checks got
confused in some situations involving private views of types, and ended
up generating incorrect conversions from floating point types to integer
types. This patch fixes this.
gcc/ada/ChangeLog:
* exp_attr.adb (Expand_N_Attribute_Reference): Fix computation of type
category.
Eric Botcazou [Sun, 17 Nov 2024 19:00:42 +0000 (20:00 +0100)]
ada: Add minimal support for other delayed aspects on controlled objects
This extends the processing done for the Address aspect to other delayed
aspects. The External_Name aspect is also reclassified as a representation
aspect and the three representation aspects External_Name, Link_Name and
Linker_Section are moved from the Always_Delay to the Rep_Aspect category,
which makes it possible not to delay them in most cases with a small tweak.
gcc/ada/ChangeLog:
* aspects.ads (Is_Representation_Aspect): True for External_Name.
(Aspect_Delay): Use Rep_Aspect for External_Name, Link_Name and
Linker_Section.
* einfo.ads (Initialization_Statements): Document extended usage.
* exp_util.adb (Needs_Initialization_Statements): Return True for
all delayed aspects.
* freeze.adb (Check_Address_Clause): Do not move the initialization
expression here...
(Freeze_Object_Declaration): ...but here instead, as well as for all
delayed aspects. Remove test for pragma Linker_Section.
* sem_ch13.adb (Analyze_One_Aspect): Do not delay in the Rep_Aspect
case if the expression is a string literal.
Bob Duff [Fri, 15 Nov 2024 20:18:46 +0000 (15:18 -0500)]
ada: Crash on assignment of task allocator with expanded name
The compiler crashes on an assignment statement of the form
"X.Y := new T;", where X.Y is an expanded name (i.e. not a record
component or similar) and T is a type containing tasks.
gcc/ada/ChangeLog:
* exp_util.adb (Build_Task_Image_Decls):
Deal properly with the case of an expanded name.
Minor cleanup: use a case statement instead of if/elsif chain.
Eric Botcazou [Sun, 17 Nov 2024 19:26:53 +0000 (20:26 +0100)]
ada: Lift technical limitation in expansion of record aggregates
The mechanism deferring the expansion of record aggregates nested in other
aggregates with intermediate conditional expressions is disabled in the
case where they contain self-references, because of a technical limitation
in the replacements done by Build_Record_Aggr_Code. This change lifts it.
gcc/ada/ChangeLog:
* exp_aggr.adb (Traverse_Proc_For_Aggregate): New generic procedure.
(Replace_Discriminants): Instantiate it instead of Traverse_Proc.
(Replace_Self_Reference): Likewise.
(Convert_To_Assignments): Remove limitation for nested aggregates
that contain self-references.
Eric Botcazou [Fri, 15 Nov 2024 20:29:18 +0000 (21:29 +0100)]
ada: Small improvements to expansion of conditional expressions
They comprise using a nonnull access type for the indirect expansion to
avoid useless checks, simplifying the expansion of if expressions whose
condition is known at compile time to avoid an N_Expression_With_Actions,
and using the indirect expansion for them in the indefinite case too, which
makes the special case for an unconstrained array type obsolete.
No functional changes.
gcc/ada/ChangeLog:
* exp_ch4.adb (Expand_N_Case_Expression): Remove obsolete comment
about C code generation. Do not create a useless target type if
the parent statement is rewritten instead of the expression. Use
a nonnull access type for the expansion done for composite types.
(Expand_N_If_Expression): Simplify the expansion when the condition
is known at compile time. Apply the expansion done for by-reference
types to indefinite types and remove the obsolete special case for
unconstrained array types. Use a nonnull access type in this case.
Rename New_If local variable to If_Stmt for the sake of consistency.
Eric Botcazou [Fri, 15 Nov 2024 17:40:02 +0000 (18:40 +0100)]
ada: Fix wrong finalization with private unconstrained array type
The address passed to the routine attaching a controlled object to the
finalization master must be that of its dope vector for an object whose
nominal subtype is an unconstrained array type, but this is not the case
when this subtype has a private declaration.
gcc/ada/ChangeLog:
* exp_ch7.adb (Make_Address_For_Finalize): Look at the underlying
subtype to detect the unconstrained array type case.
* sprint.adb (Write_Itype) <E_Private_Subtype>: New case.
This patch slightly widens the set of filenames that the compiler
considers predefined. That makes it possible to build the GNAT runtime
using only the file mapping facilities of the compiler, without having
to rename files.
Before this patch, External_Initialization looked for files in all
directories of the source search path, which led to inconsistencies in
some cases. This patch restricts the file lookup so the argument is
interpreted as relative to the current source file's directory only.
Eric Botcazou [Thu, 14 Nov 2024 19:33:34 +0000 (20:33 +0100)]
ada: Clean up and restrict usage of Initialization_Statements
This mechanism is the only producer of N_Compound_Statement in the expanded
code and parks the statements generated for the in-place initialization of
objects by an aggregate, so that they can be moved to the freeze point if
there is an address aspect/clause, or even cancelled if the aggregate has
been generated for Initialize_Scalars/Normalize_Scalars before a subsequent
pragma Import for the object is encountered.
The main condition for its triggering is that the object be not yet frozen,
but that's always the case when its declaration is being processed, so the
mechanism is triggered unnecessarily and the change restricts this but, on
the other hand, it also extends its usage to the in-place initialization by
a function call, which was implemented by means of a custom deferral.
There should be no functional changes.
gcc/ada/ChangeLog:
* einfo.ads (Initialization_Statements): Document usage precisely.
* exp_aggr.adb (Convert_Aggr_In_Object_Decl): Do not create a
compound statement in most cases, do it only if necessary.
* exp_ch3.adb (Expand_N_Object_Declaration): Remove a couple of
useless statements.
* exp_ch6.adb (Make_Build_In_Place_Call_In_Object_Declaration):
Use the Initialization_Statements mechanism if necessary.
* exp_ch7.adb: Remove clauses for Aspects package.
(Insert_Actions_In_Scope_Around): Use the support code of Exp_Util
for the Initialization_Statements mechanism.
* exp_prag.adb (Undo_Initialization): Remove obsolete code.
* exp_util.ads (Move_To_Initialization_Statements): New procedure.
(Needs_Initialization_Statements): New function.
* exp_util.adb (Move_To_Initialization_Statements): New procedure.
(Needs_Initialization_Statements): New predicate.
Viljar Indus [Thu, 14 Nov 2024 13:22:44 +0000 (15:22 +0200)]
ada: Avoid expanding LHS assignments for controlled types
Expanding a function call that returns a controlled type
on the left-hand side of an assignment should be avoided.
Otherwise we will miss the diagnostic for
trying to assign something to a non-variable element.
gcc/ada/ChangeLog:
* exp_ch6.adb (Expand_Ctrl_Function_Call): Avoid expansion
of controlled types when the LHS is a function call.
ada: Ensure minimum stack size for preallocated task stacks
On targets with preallocated task stacks the minimum stack size is
defined as a constant in System.Parameters. When adding preallocated
tasks to the expanded code the compiler does not have direct access to
that value. Instead generate the expression
Max (Task_Size, Minimum_Task_Size) in the expanded tree and let it be
resolved later in the compilation process.
gcc/ada/ChangeLog:
* exp_ch9.adb (Expand_N_Task_Type_Declaration): Take
Minimum_Stack_Size into account when preallocating task stacks.
* rtsfind.ads (RE_Id, RE_Unit_Table): Add RE_Minimum_Stack_Size.
Sandra Loosemore [Thu, 12 Dec 2024 04:20:37 +0000 (04:20 +0000)]
Fix misplaced x86 -mstack-protector-guard-symbol documentation [PR117150]
Commit e1769bdd4cef522ada32aec863feba41116b183a accidentally inserted
the documentation for the x86 -mstack-protector-guard-symbol option in the
wrong place. Fixed thusly.
gcc/ChangeLog
PR target/117150
* doc/invoke.texi (RS/6000 and PowerPC Options): Move description
of -mstack-protector-guard-symbol from here...
(x86 Options): ...to here.
Jonathan Wakely [Tue, 10 Dec 2024 10:56:41 +0000 (10:56 +0000)]
libstdc++: Disable __gnu_debug::__is_singular(T*) in constexpr [PR109517]
Because of PR c++/85944 we have several bugs where _GLIBCXX_DEBUG causes
errors for constexpr code. Although Bug 117966 could be fixed by
avoiding redundant debug checks in std::span, and Bug 106212 could be
fixed by avoiding redundant debug checks in std::array, there are many
more cases where similar __glibcxx_requires_valid_range checks fail to
compile and removing the checks everywhere isn't desirable.
This just disables the __gnu_debug::__check_singular(T*) check during
constant evaluation. Attempting to dereference a null pointer will
certainly fail during constant evaluation (if it doesn't fail then it's
a compiler bug and not the library's problem). Disabling this check
during constant evaluation shouldn't do any harm.
libstdc++-v3/ChangeLog:
PR libstdc++/109517
PR libstdc++/109976
* include/debug/helper_functions.h (__valid_range_aux): Treat
all input iterator ranges as valid during constant evaluation.
Jonathan Wakely [Mon, 9 Dec 2024 17:35:24 +0000 (17:35 +0000)]
libstdc++: Skip redundant assertions in std::array equality [PR106212]
As PR c++/106212 shows, the Debug Mode checks cause a compilation error
for equality comparisons involving std::array prvalues in constant
expressions. Those Debug Mode checks are redundant when
comparing two std::array objects, because we already know we have a
valid range. We can also avoid the unnecessary step of using
std::__niter_base to do __normal_iterator unwrapping, which isn't needed
because our std::array iterators are just pointers. Using
std::__equal_aux1 instead of std::equal avoids the redundant checks in
std::equal and std::__equal_aux.
libstdc++-v3/ChangeLog:
PR libstdc++/106212
* include/std/array (operator==): Use std::__equal_aux1 instead
of std::equal.
* testsuite/23_containers/array/comparison_operators/106212.cc:
New test.
Jonathan Wakely [Mon, 9 Dec 2024 17:35:24 +0000 (17:35 +0000)]
libstdc++: Skip redundant assertions in std::span construction [PR117966]
As PR c++/117966 shows, the Debug Mode checks cause a compilation error
for a global constexpr std::span. Those debug checks are redundant when
constructing from an array or a range, because we already know we have a
valid range and we know its size. Instead of delegating to the
std::span(contiguous_iterator, contiguous_iterator) constructor, just
initialize the data members directly.
libstdc++-v3/ChangeLog:
PR libstdc++/117966
* include/std/span (span(T (&)[N])): Do not delegate to
constructor that performs redundant checks.
(span(array<T, N>&), span(const array<T, N>&)): Likewise.
(span(Range&&), span(const span<T, N>&)): Likewise.
* testsuite/23_containers/span/117966.cc: New test.
Jonathan Wakely [Wed, 11 Dec 2024 10:44:33 +0000 (10:44 +0000)]
libstdc++: Remove constraints on std::generator::promise_type::operator new
This was approved in Wrocław as LWG 3900, so that passing an incorrect
argument intended as an allocator will be ill-formed, instead of
silently ignored.
This also renames the template parameters and function parameters for
the allocators, to align with the names in the standard. I found it too
confusing to have a parameter _Alloc which doesn't correspond to Alloc
in the standard. Rename _Alloc to _Allocator (which the standard calls
Allocator) and rename _Na to _Alloc (which the standard calls Alloc).
libstdc++-v3/ChangeLog:
* include/std/generator (_Promise_alloc): Rename template
parameter. Use __alloc_rebind to rebind allocator.
(_Promise_alloc::operator new): Replace constraints with a
static_assert in the body. Rename allocator parameter.
(_Promise_alloc<void>::_M_allocate): Rename allocator parameter.
Use __alloc_rebind to rebind allocator.
(_Promise_alloc<void>::operator new): Rename allocator
parameter.
* testsuite/24_iterators/range_generators/alloc.cc: New test.
* testsuite/24_iterators/range_generators/lwg3900.cc: New test.
[PR116778][LRA]: Check pseudos assigned to FP after rematerialization to build live ranges
This is a better fix of the PR permitting to avoid building live
ranges after rematerialization. It checks that FP can not be
eliminated now and that pseudos assigned to FP will be spilled. In
this case we need to build live ranges after rematerialization for
correct assignments of stack slots to spilled pseudos involved in
rematerialization.
gcc/ChangeLog:
PR rtl-optimization/116778
* ira-int.h (x_ira_class_hard_reg_index): Fix comment typo.
* lra-eliminations.cc (lra_fp_pseudo_p): New function.
* lra-int.h (lra_fp_pseudo_p): External declaration.
* lra-spills.cc (lra_need_for_spills_p): Fix formatting.
* lra.cc (lra): Use lra_fp_pseudo_p in lra_create_live_range after
lra_remat.
Paul Thomas [Wed, 11 Dec 2024 16:14:05 +0000 (16:14 +0000)]
Fortran: Add DECL_EXPR for variable length assoc name [PR117901]
2024-12-11 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/117901
* trans-stmt.cc (trans_associate_var): A variable character
length array associate name must generate a DECL expression for
the data pointer type.
gcc/testsuite/
PR fortran/117901
* gfortran.dg/pr117901.f90: New test.
Filip Kastl [Wed, 11 Dec 2024 18:57:04 +0000 (19:57 +0100)]
gimple: Add limit after which slower switchlower algs are used [PR117091] [PR117352]
This patch adds a limit on the number of cases of a switch. When this
limit is exceeded, switch lowering decides to use faster but less
powerful algorithms.
In particular this means that for finding bit tests switch lowering
decides between the old dynamic programming O(n^2) algorithm and the
new greedy algorithm that Andi Kleen recently added but then reverted
due to PR117352. It also means that switch lowering may bail out on
finding jump tables if the switch is too large (Btw it also may not
bail! It can happen that the greedy algorithm finds some bit tests
which then basically split the switch into multiple smaller switches and
those may be small enough to fit under the limit.)
The limit is implemented as --param switch-lower-slow-alg-max-cases.
Exceeding the limit is reported through -Wdisabled-optimization.
This patch fixes the issue with the greedy algorithm described in
PR117352. The problem was incorrect usage of the is_beneficial()
heuristic.
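To illustrate what a "bit test" is, here is a small sketch with hypothetical function names. It shows a switch whose cases all lead to the same result, next to a hand-written equivalent that uses one range check and one mask test. It is not GCC's implementation, and whether GCC picks this form for a given switch depends on its heuristics and, with this patch, on --param switch-lower-slow-alg-max-cases.

```c
/* All listed cases share one outcome, so the whole switch can be
   answered by testing a single bit of a constant mask.  */
int is_selected_switch(unsigned char c)
{
    switch (c) {
    case 1: case 3: case 4: case 9: case 12:
        return 1;
    default:
        return 0;
    }
}

/* Hand-written equivalent: range check first, then the mask test.  */
int is_selected_bittest(unsigned char c)
{
    const unsigned mask = (1u << 1) | (1u << 3) | (1u << 4)
                          | (1u << 9) | (1u << 12);
    return c < 16 && ((mask >> c) & 1u);
}

/* Returns 1 when both versions agree for every possible input.  */
int versions_agree(void)
{
    for (unsigned v = 0; v < 256; v++)
        if (is_selected_switch((unsigned char)v)
            != is_selected_bittest((unsigned char)v))
            return 0;
    return 1;
}
```

Finding the best grouping of cases into such masks is the part that gets expensive for very large switches, which is where the limit applies.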
gcc/ChangeLog:
PR middle-end/117091
PR middle-end/117352
* doc/invoke.texi: Add switch-lower-slow-alg-max-cases.
* params.opt: Add switch-lower-slow-alg-max-cases.
* tree-switch-conversion.cc (jump_table_cluster::find_jump_tables):
Note in a comment that we are looking for jump tables in
case sequences delimited by the already found bit tests.
(bit_test_cluster::find_bit_tests): Decide between
find_bit_tests_fast() and find_bit_tests_slow().
(bit_test_cluster::find_bit_tests_fast): New function.
(bit_test_cluster::find_bit_tests_slow): New function.
(switch_decision_tree::analyze_switch_statement): Report
exceeding the limit.
* tree-switch-conversion.h: Add find_bit_tests_fast() and
find_bit_tests_slow().
Co-Authored-By: Andi Kleen <ak@gcc.gnu.org>
Signed-off-by: Filip Kastl <fkastl@suse.cz>
Jakub Jelinek [Wed, 11 Dec 2024 16:28:47 +0000 (17:28 +0100)]
c++: allow stores to anon union vars to change current union member in constexpr [PR117614]
Since r14-4771 the FE tries to differentiate between cases where the lhs
of a store allows changing the current union member and cases where it
doesn't, and cases where it doesn't includes everything that has gone
through the cxx_eval_constant_expression path on the lhs.
As the testcase shows, DECL_ANON_UNION_VAR_P vars were handled like that
too, even when stores to them are the only way how to change the current
union member in the sources.
So, the following patch just handles that case manually without calling
cxx_eval_constant_expression and without setting evaluated to true.
2024-12-11 Jakub Jelinek <jakub@redhat.com>
PR c++/117614
* constexpr.cc (cxx_eval_store_expression): For stores to
DECL_ANON_UNION_VAR_P vars just continue with DECL_VALUE_EXPR
of it, without setting evaluated to true or full
cxx_eval_constant_expression.
David Malcolm [Wed, 11 Dec 2024 15:32:14 +0000 (10:32 -0500)]
c++: tweak colorization of incompatible declspecs
Introduce a helper function for complaining about "signed unsigned"
and "short long". Add colorization there so that e.g. the 'signed'
and 'unsigned' are given consistent contrasting colors in both the
message and the quoted source.
gcc/cp/ChangeLog:
* decl.cc: Add #include "diagnostic-highlight-colors.h"
and #include "pretty-print-markup.h".
(complain_about_incompatible_declspecs): New.
(grokdeclarator): Use it when complaining about both 'signed' and
'unsigned', and both 'long' and 'short'.
gcc/ChangeLog:
* diagnostic-highlight-colors.h: Tweak comment.
* pretty-print-markup.h (class pp_element_quoted_string): New,
based on pretty-print.cc's selftest::test_element, adding an
optional highlight color.
* pretty-print.cc (class test_element): Drop.
(selftest::test_pp_format): Use pp_element_quoted_string.
(selftest::test_urlification): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/long-short-colorization.C: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 11 Dec 2024 15:24:26 +0000 (10:24 -0500)]
c++: print z candidate count and number them (v2)
Changed in v2: changed wording to "there is"/"there are" rather
than "we found".
This patch is a followup to:
"c++: use diagnostic nesting [PR116253]"
Following Sy Brand's UX suggestions in P2429R0 for example 1, this patch
tweaks print_z_candidates to add a note about the number of candidates,
and adds a candidate number to each one.
Various examples of output can be seen in the testsuite part of the
patch.
gcc/cp/ChangeLog:
* call.cc (print_z_candidates): Count the number of
candidates and issue a note stating the count at an
intermediate nesting level. Number the individual
candidates.
David Malcolm [Wed, 11 Dec 2024 15:21:35 +0000 (10:21 -0500)]
diagnostics: tweak output for nested messages [PR116253]
When printing nested messages with
-fdiagnostics-set-output=text:experimental-nesting=yes
avoid printing a line such as the "cc1plus:" in the following:
• note: set ‘-fconcepts-diagnostics-depth=’ to at least 2 for more detail
cc1plus:
for "special" locations such as UNKNOWN_LOCATION.
gcc/ChangeLog:
PR other/116253
* diagnostic-format-text.cc (on_report_diagnostic): When showing
locations for nested messages on new lines, don't print
UNKNOWN_LOCATION or BUILTINS_LOCATION.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Martin Jambor [Wed, 11 Dec 2024 13:55:27 +0000 (14:55 +0100)]
ipa: Update value range jump functions during inlining
When inlining (during the analysis phase) a call graph edge, we update
all pass-through jump functions corresponding to edges going out of
the newly inlined function to be relative to the function into which
we are inlining or to expose the information originally captured for
the edge that is being inlined.
Similarly, we can combine the value range information in pass-through
jump functions corresponding to both edges, which is what this patch
adds - at least for the case when the inlined pass-through is a
simple, non-arithmetic one, which is the case that we also handle for
constant and aggregate jump function parts.
gcc/ChangeLog:
2024-11-01 Martin Jambor <mjambor@suse.cz>
* ipa-cp.h: Forward declare class ipa_vr.
(ipa_vr_operation_and_type_effects): Declare.
* ipa-cp.cc (ipa_vr_operation_and_type_effects): Make public.
* ipa-prop.cc (update_jump_functions_after_inlining): Also update
value range jump functions.
middle-end: Add initial support for poly_int64 BIT_FIELD_REF in expand pass [PR96342]
While `poly_int64' has been the default representation of bitfield size
and offset for some time, there was a lack of support for the use of
non-constant `poly_int64' values for those values throughout the
compiler, limiting the applicability of the BIT_FIELD_REF rtl expression
for variable length vectors, such as those used by SVE.
This patch starts work on extending the functionality of relevant
functions in the expand pass such as to enable their use by the compiler
for such vectors.
middle-end: add vec_init support for variable length subvector concatenation. [PR96342]
For architectures where the vector length is a compile-time variable
that represents a runtime constant, as is the case with SVE, it is
perfectly reasonable that such a vector be made up of two (or more) subvector
components of a compatible sub-length variable.
One example of this would be the concatenation of two VNx4QI vectors
into a single VNx8QI vector.
This patch adds initial support for the enablement of this feature in
the middle-end, removing the `.is_constant()' constraint on the vector's
number of elements, instead making the constant no. of elements the
multiple of the number of subvectors (which must then also be of
variable length, such that their polynomial ratio then results in a
compile-time constant) required to fill the vector.
gcc/ChangeLog:
PR target/96342
* expr.cc (store_constructor): add support for variable-length
vectors.
Co-authored-by: Tamar Christina <tamar.christina@arm.com>
middle-end: Fix mask length arg in call to vect_get_loop_mask [PR96342]
When issuing multiple calls to a simdclone in a vectorized loop,
TYPE_VECTOR_SUBPARTS(vectype) gives the incorrect number when compared
to the TYPE_VECTOR_SUBPARTS result we get from the mask type derived
from the relevant `rgroup_controls' entry within `vect_get_loop_mask'.
By passing `masktype' instead, we are able to get the correct number of
vector subparts and thus eliminate the ICE in the call to
`vect_get_loop_mask' when the data type for which we retrieve the mask
is wider than the one used when defining the mask at mask registration
time.
gcc/ChangeLog:
PR target/96342
* tree-vect-stmts.cc (vectorizable_simd_clone_call):
s/vectype/masktype/ in call to vect_get_loop_mask.
Andre Vieira [Wed, 11 Dec 2024 11:50:22 +0000 (11:50 +0000)]
middle-end: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE [PR96342]
This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
target can reject a simd_clone based on the vector mode it is using.
This is needed because for VLS SVE vectorization the vectorizer accepts
Advanced SIMD simd clones when vectorizing using SVE types because the simdlens
might match. This will cause type errors later on.
Other targets do not currently need to use this argument.
gcc/ChangeLog:
PR target/96342
* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass stmt_info to
call TARGET_SIMD_CLONE_USABLE.
* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
and use it to reject the use of SVE simd clones with Advanced SIMD
modes.
* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
* doc/tm.texi: Regenerate
Co-authored-by: Victor Do Nascimento <victor.donascimento@arm.com>
Co-authored-by: Tamar Christina <tamar.christina@arm.com>
Tamar Christina [Wed, 11 Dec 2024 11:47:49 +0000 (11:47 +0000)]
middle-end: use two's complement equality when comparing IVs during candidate selection [PR114932]
IVOPTS normally uses affine trees to perform comparisons between different IVs,
but these seem to have been missing in two key spots, where normal tree
equivalences were used instead.
In some cases where we have a two's complement equivalence but not a strict
signedness equivalence, we end up generating both a signed and an unsigned IV
for the same candidate.
This patch implements a new OEP flag called OEP_ASSUME_WRAPV. This flag will
check if the operands would produce the same bit values after the computations
even if the final sign is different.
This happens quite a lot with Fortran but can also happen in C because this same
code is unable to figure out when one expression is a multiple of another,
which means that in e.g. exchange2 we generate a lot of duplicate code.
This is because candidate 6 and 8 are equivalent under two's complement but have
different signs.
This patch changes it so that if two IVs are affine equivalent, we just pick
one over the other. IVOPTS already has code for this, so the patch just
uses affine trees instead of trees for the check.
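
The difference between signedness equality and two's complement equality can be sketched in plain C. This is only an illustration of the idea, not the OEP_ASSUME_WRAPV implementation; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of a signed IV "base + step * i", computed with wrap-around
   semantics by doing the arithmetic on the unsigned representation
   (this avoids signed-overflow undefined behaviour).  */
uint32_t iv_bits_signed(int32_t base, int32_t step, uint32_t i)
{
    return (uint32_t)base + (uint32_t)step * i;
}

/* Bits of the unsigned IV with the same base and step bit patterns.  */
uint32_t iv_bits_unsigned(uint32_t base, uint32_t step, uint32_t i)
{
    return base + step * i;
}

void iv_equivalence_demo(void)
{
    /* -8 and 0xFFFFFFF8 differ as values but share one bit pattern,
       and so do -4 and 0xFFFFFFFC.  */
    for (uint32_t i = 0; i < 16; i++)
        assert(iv_bits_signed(-8, -4, i)
               == iv_bits_unsigned(0xFFFFFFF8u, 0xFFFFFFFCu, i));
}
```

Because both IVs produce the same bits at every iteration, keeping both the signed and the unsigned one is redundant.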
Tamar Christina [Wed, 11 Dec 2024 11:45:36 +0000 (11:45 +0000)]
middle-end: refactor type to be explicit in operand_equal_p [PR114932]
This is a refactoring with no expected behavioral change.
The goal with this is to make the type of the expressions being used explicit.
I did not change all the recursive calls to operand_equal_p () to recurse
directly to the new function but instead this goes through the top level call
which re-extracts the types.
This was done because in most of the cases where we recurse type == arg.
The second patch makes use of this new flexibility to implement an overload
of operand_equal_p which checks for equality under two's complement.
gcc/ChangeLog:
PR tree-optimization/114932
* fold-const.cc (operand_compare::operand_equal_p): Split into one that
takes explicit type parameters and use that in public one.
* fold-const.h (class operand_compare): Add operand_equal_p private
overload.
Matthieu Longo [Tue, 28 May 2024 09:49:41 +0000 (10:49 +0100)]
autoupdate: replace obsolete macros in libiberty
Autoreconf-2.72 warns about obsolete macros. This patch aims at removing
the noise from a future upgrade to autoreconf-2.72 or later. This is in
no way a complete patch allowing the upgrade to autoreconf-2.72.
- AC_GNU_SOURCE by AC_USE_SYSTEM_EXTENSIONS
https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.72/autoconf.html#index-AC_005fGNU_005fSOURCE-1
- AC_CONFIG_HEADER by AC_CONFIG_HEADERS
https://www.gnu.org/software/automake/manual/1.12.2/html_node/Obsolete-Macros.html#index-AM_005fCONFIG_005fHEADER
Those fixes were originally submitted in a patch series in binutils.
https://inbox.sourceware.org/binutils/878qthm6a0.fsf@gentoo.org/
libiberty/getopt.c file is defining _NO_PROTO, which causes
conflicting declarations for the functions in AIX header files
like stdio.h & stdlib.h.
It looks like the _NO_PROTO define was added long ago and the conflicting
declarations were always present until the C23 standard uncovered them.
Remove the block defining _NO_PROTO as both Tru64 UNIX (ex-OSF/1)
and AIX 3.2 are no longer supported.
This patch also adds the following overload:
aarch64_ptrue_reg (machine_mode pred_mode, machine_mode data_mode)
Depending on the data mode, the function returns a predicate with the
appropriate bits set.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_ptrue_reg): New overload.
* config/aarch64/aarch64-protos.h (aarch64_ptrue_reg): Likewise.
* config/aarch64/aarch64-sve.md: Extended sdiv_pow2<mode>3
and *sdiv_pow2<mode>3 to support Neon modes.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/sve-asrd.c: New test.
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
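
For context, sdiv_pow2 is a signed division by a power of two, which rounds toward zero, so it cannot be a bare arithmetic shift. A minimal scalar sketch of the operation follows. `sdiv_pow2_model` is a hypothetical name, and the code assumes that `>>` on a negative value is an arithmetic shift (implementation-defined in C, but what GCC does).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of signed division by 2^shift, rounding toward zero.
   Negative inputs get a bias of 2^shift - 1 before the shift so that
   the result is rounded up instead of down.  */
int32_t sdiv_pow2_model(int32_t x, unsigned shift)
{
    assert(shift >= 1 && shift <= 30);
    int32_t bias = (x < 0) ? (int32_t)((1u << shift) - 1) : 0;
    return (x + bias) >> shift;   /* assumes arithmetic right shift */
}

void sdiv_pow2_demo(void)
{
    assert(sdiv_pow2_model(-7, 1) == -7 / 2);    /* both are -3 */
    assert(sdiv_pow2_model(-7, 1) != (-7 >> 1)); /* plain shift gives -4 */
    assert(sdiv_pow2_model(7, 1) == 3);
}
```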
Soumya AR [Wed, 11 Dec 2024 04:02:35 +0000 (09:32 +0530)]
aarch64: Extend SVE2 bit-select instructions for Neon modes.
NBSL, BSL1N, and BSL2N are bit-select instructions on SVE2 with certain operands
inverted. These can be extended to work with Neon modes.
Since these instructions are unpredicated, duplicate patterns were added with
the predicate removed to generate these instructions for Neon modes.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64-sve2.md
(*aarch64_sve2_nbsl_unpred<mode>): New pattern to match unpredicated
form.
(*aarch64_sve2_bsl1n_unpred<mode>): Likewise.
(*aarch64_sve2_bsl2n_unpred<mode>): Likewise.
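
As a reading aid, the three operations can be modelled on 64-bit scalars. This is a sketch based on the usual description of these instructions (inverted result, first input inverted, second input inverted); the function names and operand order are illustrative and are not the patterns from aarch64-sve2.md.

```c
#include <assert.h>
#include <stdint.h>

/* Plain bit-select: bits of a where sel is 1, bits of b elsewhere.  */
uint64_t bsl(uint64_t a, uint64_t b, uint64_t sel)
{
    return (a & sel) | (b & ~sel);
}

/* NBSL: the whole selected result is inverted.  */
uint64_t nbsl(uint64_t a, uint64_t b, uint64_t sel)
{
    return ~bsl(a, b, sel);
}

/* BSL1N: the first input is inverted before selecting.  */
uint64_t bsl1n(uint64_t a, uint64_t b, uint64_t sel)
{
    return bsl(~a, b, sel);
}

/* BSL2N: the second input is inverted before selecting.  */
uint64_t bsl2n(uint64_t a, uint64_t b, uint64_t sel)
{
    return bsl(a, ~b, sel);
}

void bitselect_demo(void)
{
    const uint64_t a = 0xF0F0, b = 0x00FF, sel = 0xFF00;

    assert(bsl(a, b, sel) == 0xF0FF);
    assert(nbsl(a, b, sel) == ~UINT64_C(0xF0FF));
    assert(bsl1n(a, b, sel) == 0x0FFF);
    assert(bsl2n(a, b, sel) == UINT64_C(0xFFFFFFFFFFFFF000));
}
```

None of these depends on a predicate, which is why the unpredicated forms can be matched for Neon modes as well.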
liuhongt [Mon, 2 Dec 2024 09:54:59 +0000 (01:54 -0800)]
Fix inaccuracy in cunroll/cunrolli when considering what's innermost loop.
r15-919-gef27b91b62c3aa removed 1 / 3 size reduction for innermost
loop, but it doesn't accurately remember what's "innermost" for 2
testcases in PR117888.
1) For pass_cunroll, the "innermost" loop could be an originally outer
loop with inner loop completely unrolled by cunrolli. The patch moves
local variable cunrolli to parameter of tree_unroll_loops_completely
and passes it directly from execute of the pass.
2) For pass_cunrolli, cunrolli is set to false when the sibling loop
of an innermost loop is completely unrolled, and it inaccurately
takes the innermost loop as an "outer" loop. The patch adds another
parameter innermost to help recognize the "original" innermost loop.
gcc/ChangeLog:
PR tree-optimization/117888
* tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Use
cunrolli instead of cunrolli && !loop->inner to check if it's
innermost loop.
(canonicalize_loop_induction_variables): Add new parameter
const_sbitmap innermost, and pass
cunrolli
&& (unsigned) loop->num < SBITMAP_SIZE (innermost)
&& bitmap_bit_p (innermost, loop->num) as "cunrolli" to
try_unroll_loop_completely
(canonicalize_induction_variables): Pass innermost to
canonicalize_loop_induction_variables.
(tree_unroll_loops_completely_1): Add new parameter
const_sbitmap innermost.
(tree_unroll_loops_completely): Move local variable cunrolli
to parameter to indicate it's from pass cunrolli, also track
all "original" innermost loop at the beginning.
gcc/testsuite/ChangeLog:
* gcc.dg/pr117888-2.c: New test.
* gcc.dg/vect/pr117888-1.c: Ditto.
* gcc.dg/tree-ssa/pr83403-1.c: Add
--param max-completely-peeled-insns=300 for arm*-*-*.
* gcc.dg/tree-ssa/pr83403-2.c: Ditto.
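
The idea of remembering the "original" innermost loops can be shown with a toy model. The sketch below does not use GCC's loop or sbitmap types; `toy_loop`, `toy_record_innermost` and `toy_treat_as_innermost` are hypothetical, and a 64-bit word stands in for the bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_LOOPS 64  /* capacity of the toy bitmap */

struct toy_loop { unsigned num; bool has_inner; };

/* Record, before any unrolling, which loops have no inner loop.  */
uint64_t toy_record_innermost(const struct toy_loop *loops, unsigned n)
{
    uint64_t bits = 0;
    for (unsigned i = 0; i < n; i++)
        if (!loops[i].has_inner && loops[i].num < TOY_MAX_LOOPS)
            bits |= UINT64_C(1) << loops[i].num;
    return bits;
}

/* Same shape as the condition in the ChangeLog entry: the flag, a bounds
   check on the loop number, and a bit test.  */
bool toy_treat_as_innermost(bool cunrolli, const struct toy_loop *loop,
                            uint64_t innermost)
{
    return cunrolli
           && loop->num < TOY_MAX_LOOPS
           && ((innermost >> loop->num) & 1) != 0;
}

void toy_innermost_demo(void)
{
    struct toy_loop loops[2] = { { 1, true }, { 2, false } };
    uint64_t innermost = toy_record_innermost(loops, 2);

    /* Loop 2 is completely unrolled, so loop 1 no longer has an inner
       loop, but the recorded bitmap still says it was an outer loop.  */
    loops[0].has_inner = false;
    assert(!toy_treat_as_innermost(true, &loops[0], innermost));
    assert(toy_treat_as_innermost(true, &loops[1], innermost));
}
```

The bounds check matters because loops created after the snapshot have numbers the bitmap never covered.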
Gaius Mulley [Tue, 10 Dec 2024 20:47:36 +0000 (20:47 +0000)]
PR modula2/117120: case ch with a nul char constant causes ICE
This patch fixes the ICE caused when a case clause contains
a character constant ''. The fix was to walk the caselist and
convert any 0 length string into a char constant of value 0.
gcc/m2/ChangeLog:
PR modula2/117120
* gm2-compiler/M2CaseList.mod (CaseBoundsResolved): Rewrite.
(ConvertNulStr2NulChar): New procedure function.
(NulStr2NulChar): Ditto.
(GetCaseExpression): Ditto.
(OverlappingCaseBound): Rewrite.
* gm2-compiler/M2GCCDeclare.mod (CheckResolveSubrange): Allow
'' to be used as the subrange low limit.
* gm2-compiler/M2GenGCC.mod (FoldConvert): Rewrite.
(PopKindTree): Ditto.
(BuildHighFromString): Reformat.
* gm2-compiler/SymbolTable.mod (PushConstString): Add test for
length 0 and PushChar (nul).
gcc/testsuite/ChangeLog:
PR modula2/117120
* gm2/pim/pass/forloopnulchar.mod: New test.
* gm2/pim/pass/nulcharcase.mod: New test.
* gm2/pim/pass/nulcharvar.mod: New test.
[PR117946][LRA]: When assigning hard reg use biggest mode to check ira_prohibited_class_mode_regs
A pseudo in the PR test case gets hard reg 43, which is x86 r15 (the
xmm regs follow r15). The pseudo is of INT_SSE_CLASS and SImode but is
used in TImode as a paradoxical subreg. r15 in TImode is wrong and does
not satisfy constraint 'r'. Therefore LRA creates moves involving the
pseudo in TImode until the limit of reload insns is reached.
Unfortunately x86 hard_regno_mode_ok (like some hooks for other targets)
says that it is ok to use r15 for a TImode pseudo. Therefore LRA uses
ira_prohibited_class_mode_regs for such cases but it was checked
against native pseudo mode. The patch fixes it by using the biggest
pseudo mode.
gcc/ChangeLog:
PR rtl-optimization/117946
* lra-assigns.cc (find_hard_regno_for_1): Use the biggest mode to
check ira_prohibited_class_mode_regs.
Jerry DeLisle [Tue, 10 Dec 2024 04:11:23 +0000 (20:11 -0800)]
Fortran: Fix READ with padding in BLANK ZERO mode.
PR fortran/117819
libgfortran/ChangeLog:
* io/read.c (read_decimal): If the read value is short of the
specified width and pad mode is PAD yes, check for BLANK ZERO
and adjust the value accordingly.
(read_decimal_unsigned): Likewise.
(read_radix): Likewise.
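
To illustrate the behaviour being fixed, here is a toy model of reading a decimal field that is shorter than its width. It is not libgfortran's read_decimal; `toy_read_decimal` is a hypothetical helper, and it assumes that padding supplies trailing blanks and that BLANK ZERO mode treats those blanks as the digit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Parse an unsigned decimal field of `width` characters from `rec`,
   which holds only `len` characters.  Missing characters are padded
   with blanks.  With blank_zero set, a blank counts as the digit 0;
   otherwise blanks are skipped.  */
long toy_read_decimal(const char *rec, size_t len, size_t width,
                      bool blank_zero)
{
    long value = 0;
    for (size_t i = 0; i < width; i++) {
        char c = (i < len) ? rec[i] : ' ';  /* pad short records */
        if (c == ' ') {
            if (blank_zero)
                value *= 10;                /* blank acts as a zero digit */
            continue;
        }
        assert(c >= '0' && c <= '9');
        value = value * 10 + (c - '0');
    }
    return value;
}

void toy_read_demo(void)
{
    /* Record "12" read with a field width of 5.  */
    assert(toy_read_decimal("12", 2, 5, true) == 12000);
    assert(toy_read_decimal("12", 2, 5, false) == 12);
}
```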
even in a template. So one way to fix the ICE would be to check
!processing_template_decl. But we can also do the following and
continue warning even in templates.
This ICE appeared with the removal of NON_DEPENDENT_EXPR; before,
operand_equal_p would bail on this code so there was no problem.
PR c++/117880
gcc/ChangeLog:
* fold-const.cc (operand_compare::operand_equal_p) <case tcc_unary>:
Use OP_SAME_WITH_NULL instead of OP_SAME.
Wilco Dijkstra [Tue, 10 Dec 2024 14:22:48 +0000 (14:22 +0000)]
arm: Fix LDRD register overlap [PR117675]
The register-indexed variants of LDRD have complex register overlap constraints
which make them hard to use without using output_move_double (which can't be
used for atomics as it doesn't guarantee to emit atomic LDRD/STRD when required).
Add a new predicate and constraint for plain LDRD/STRD with base or base+imm.
This blocks register indexing and fixes PR117675.
gcc:
PR target/117675
* config/arm/arm.cc (arm_ldrd_legitimate_address): New function.
* config/arm/arm-protos.h (arm_ldrd_legitimate_address): New prototype.
* config/arm/constraints.md: Add new Uo constraint.
* config/arm/predicates.md (arm_ldrd_memory_operand): Add new predicate.
* config/arm/sync.md (arm_atomic_loaddi2_ldrd): Use
arm_ldrd_memory_operand and Uo.
gcc/testsuite:
PR target/117675
* gcc.target/arm/pr117675.c: Add new test.
Wilco Dijkstra [Thu, 14 Nov 2024 14:28:10 +0000 (14:28 +0000)]
AArch64: Add baseline tune
Cleanup the extra tune defines by introducing AARCH64_EXTRA_TUNE_BASE as a
common base supported by all modern cores. Initially set it to
AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND. No change in generated code.
Wilco Dijkstra [Tue, 1 Oct 2024 16:51:14 +0000 (16:51 +0000)]
AArch64: Cleanup alignment macros
Change the AARCH64_EXPAND_ALIGNMENT macro into proper function calls to make
future changes easier. Use the existing alignment settings, however avoid
overaligning small arrays or structs to 64 bits when there is no benefit.
The lower alignment gives a small reduction in data and stack size.
Using 32-bit alignment for small char arrays still improves performance of
string functions since it can be loaded in full by the first 8/16-byte load.
gcc:
* config/aarch64/aarch64.h (AARCH64_EXPAND_ALIGNMENT): Remove.
(DATA_ALIGNMENT): Use aarch64_data_alignment.
(LOCAL_ALIGNMENT): Use aarch64_stack_alignment.
* config/aarch64/aarch64.cc (aarch64_data_alignment): New function.
(aarch64_stack_alignment): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_data_alignment): New prototype.
(aarch64_stack_alignment): Likewise.
Wilco Dijkstra [Fri, 10 May 2024 17:13:40 +0000 (17:13 +0000)]
AArch64: Use LDP/STP for large struct types
Use LDP/STP for large struct types as they have useful immediate offsets and
are typically faster. This removes differences between little and big endian
and allows use of LDP/STP without UNSPEC.
gcc:
* config/aarch64/aarch64.cc (aarch64_classify_address): Treat SIMD structs
identically in little and big-endian.
* config/aarch64/aarch64-simd.md (aarch64_mov<mode>): Remove VSTRUCT
instructions.
(aarch64_be_mov<mode>): Allow little-endian, rename to aarch64_mov<mode>.
(aarch64_be_movoi): Allow little-endian, rename to aarch64_movoi.
(aarch64_be_movci): Allow little-endian, rename to aarch64_movci.
(aarch64_be_movxi): Allow little-endian, rename to aarch64_movxi.
Remove big-endian special case in define_split variants.
gcc/testsuite:
* gcc.target/aarch64/torture/simd-abi-8.c: Update to check for LDP/STP.
This gives people working on coroutines, as well as those writing tests
for coroutines, insight into the results and inputs of the
coroutine transformation passes, which is quite essential to
understanding what happens in the coroutine transformation. Currently,
the information dumped is the pre-transform function (which is not
otherwise available), the generated ramp function, the generated frame
type, the transformed actor/resumer, and the destroyer stub.
While debugging this, I've also encountered a minor bug in
c-pretty-print.cc, where it tried to check DECL_REGISTER of DECLs that
did not support it. I've added a check for that.
Similarly, I found one in pp_cxx_template_parameter, where TREE_TYPE was
called on the list cell the template parameter was in rather than on the
parameter itself. I've fixed that.
And, lastly, there appeared to be no way to pretty-print a FIELD_DECL,
so I added support to cxx_pretty_printer::declaration for it (by reusing
the VAR_DECL path).
* c-pretty-print.cc (c_pretty_printer::storage_class_specifier):
Check that we're looking at a PARM_DECL or VAR_DECL before
looking at DECL_REGISTER.
gcc/cp/ChangeLog:
* coroutines.cc (dump_record_fields): New helper. Iterates a
RECORD_TYPE's TYPE_FIELDS and pretty-prints them.
(dmp_str): New. The lang-coro dump stream.
(coro_dump_id): New. ID of the lang-coro dump.
(coro_dump_flags): New. Flags passed to the lang-coro dump.
(coro_maybe_dump_initial_function): New helper. Prints, if
dumping is enabled, the fndecl passed to it as the original
function.
(coro_maybe_dump_ramp): New. Prints the ramp function passed to
it, if dumping is enabled.
(coro_maybe_dump_transformed_functions): New.
(cp_coroutine_transform::apply_transforms): Initialize the
lang-coro dump. Call coro_maybe_dump_initial_function on the
original function, as well as coro_maybe_dump_ramp, after the
transformation into the ramp is finished.
(cp_coroutine_transform::finish_transforms): Call
coro_maybe_dump_transformed_functions on the built actor and
destroy.
* cp-objcp-common.cc (cp_register_dumps): Register the coroutine
dump.
* cp-tree.h (coro_dump_id): Declare as extern.
* cxx-pretty-print.cc (pp_cxx_template_parameter): Don't call
TREE_TYPE on a TREE_LIST cell.
(cxx_pretty_printer::declaration): Handle FIELD_DECL similar to
VAR_DECL.
gcc/ChangeLog:
* dumpfile.cc (FIRST_ME_AUTO_NUMBERED_DUMP): Bump to 6 for sake
of the coroutine dump.
Marek Polacek [Wed, 27 Nov 2024 23:00:24 +0000 (18:00 -0500)]
c++: P2865R5, Remove Deprecated Array Comparisons from C++26 [PR117788]
This patch implements P2865R5 by promoting the warning to permerror in
C++26 only.
In C++20 we should warn even without -Wall. Jason fixed this in r15-5713
but let's add a test that doesn't use -Wall.
This caused a FAIL in conditionally_borrowed.cc because we end up
comparing two array types in equality_comparable_with ->
__weakly_eq_cmp_with. That could be fixed in libstdc++, perhaps by
adding std::decay in the appropriate place.
PR c++/117788
gcc/c-family/ChangeLog:
* c-warn.cc (do_warn_array_compare): Emit a permerror in C++26.
gcc/cp/ChangeLog:
* typeck.cc (cp_build_binary_op) <case EQ_EXPR>: Don't check
warn_array_compare. Check tf_warning_or_error instead of just
tf_warning. Maybe return an error_mark_node in C++26.
<case LE_EXPR>: Likewise.
gcc/testsuite/ChangeLog:
* c-c++-common/Warray-compare-1.c: Expect an error in C++26.
* c-c++-common/Warray-compare-3.c: Likewise.
* c-c++-common/Warray-compare-4.c: New test.
* c-c++-common/Warray-compare-5.c: New test.
* g++.dg/warn/Warray-compare-1.C: New test.
libstdc++-v3/ChangeLog:
* testsuite/std/ranges/adaptors/conditionally_borrowed.cc: Add a
FIXME, adjust.
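
For readers updating code that trips the new diagnostic: comparing two arrays with == compares the addresses of their first elements, not their contents. The sketch below shows explicit spellings of both intents; the array and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static int arr1[3] = { 1, 2, 3 };
static int arr2[3] = { 1, 2, 3 };

/* What "arr1 == arr2" actually asks: are these the same object?
   Taking the address of the first element makes that explicit.  */
bool arrays_same_address(void)
{
    return &arr1[0] == &arr2[0];
}

/* What is usually meant: do the arrays hold the same values?  */
bool arrays_same_contents(void)
{
    return memcmp(arr1, arr2, sizeof arr1) == 0;
}

void array_compare_demo(void)
{
    assert(!arrays_same_address());   /* two distinct objects */
    assert(arrays_same_contents());   /* with equal elements */
}
```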
Tobias Burnus [Tue, 10 Dec 2024 15:16:04 +0000 (16:16 +0100)]
plugin/plugin-gcn.c: Fix error handling of GOMP_OFFLOAD_openacc_async_construct
Follow up to r15-5392-g884637b6362391. As the name implies,
GOMP_OFFLOAD_openacc_async_construct is also externally called.
Hence, partially revert previous commit to permit unlocking handling
in oacc-async.c's lookup_goacc_asyncqueue by not failing fatally.
Hence, also the other (indirect) callers had to be updated:
GOMP_OFFLOAD_dev2dev fails now with 'false' and
GOMP_OFFLOAD_async_run fatally.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_async_run):
Handle omp_async_queue == NULL after call to maybe_init_omp_async.
(GOMP_OFFLOAD_openacc_async_construct): Use error not fatal error,
partially reverting r15-5392.
PR117973 covers the aspect of
non-LOGICAL_OP_NON_SHORT_CIRCUIT targets for PR111456, for
which the test-case gcc.dg/tree-ssa/pr111456-1.c started
failing as described in PR117954.
testsuite: Mark gcc.c-torture/execute/memcpy-a?.c tests expensive
These tests can take several seconds per compilation to complete, taking
total elapsed time measured in minutes. Mark them as expensive so as to
let people skip them where they want to save on testing time.
gcc/testsuite/
* gcc.c-torture/execute/memcpy-a1.c: Mark as expensive.
* gcc.c-torture/execute/memcpy-a2.c: Likewise.
* gcc.c-torture/execute/memcpy-a4.c: Likewise.
* gcc.c-torture/execute/memcpy-a8.c: Likewise.
This patch removes the remaining traces of the vcond{,u,eq} optabs.
Earlier patches removed the target-independent uses and I couldn't
find any direct references to either the *_optabs or the ifns
in target-specific code.
Saurabh Jha [Tue, 10 Dec 2024 13:21:21 +0000 (13:21 +0000)]
aarch64: Add support for fp8fma instructions
The AArch64 FEAT_FP8FMA extension introduces instructions for
multiply-add of vectors.
This patch introduces the following intrinsics:
1. {vmlalbq|vmlaltq}_f16_mf8_fpm.
2. {vmlalbq|vmlaltq}_lane{q}_f16_mf8_fpm.
3. {vmlallbbq|vmlallbtq|vmlalltbq|vmlallttq}_f32_mf8_fpm.
4. {vmlallbbq|vmlallbtq|vmlalltbq|vmlallttq}_lane{q}_f32_mf8_fpm.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(aarch64_pragma_builtins_checker::require_immediate_lane_index): New
overload.
(aarch64_pragma_builtins_checker::check): Add support for FP8FMA
intrinsics.
(aarch64_expand_pragma_builtins): Likewise.
* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): Conditionally define TARGET_FP8FMA.
* config/aarch64/aarch64-simd-pragma-builtins.def: Add the FP8FMA
intrinsics.
* config/aarch64/aarch64-simd.md
(@aarch64_<FMLAL_FP8_HF:insn><mode>): New pattern.
(@aarch64_<FMLAL_FP8_HF:insn>_lane<V8HF_ONLY:mode><VB:mode>):
Likewise.
(@aarch64_<FMLALL_FP8_SF:insn><mode>): Likewise.
(@aarch64_<FMLALL_FP8_SF:insn>_lane<V8HF_ONLY:mode><VB:mode>):
Likewise.
* config/aarch64/iterators.md (V8HF_ONLY): New mode iterator.
(SVE2_FP8_TERNARY_VNX8HF): Rename to...
(FMLAL_FP8_HF): ...this.
(SVE2_FP8_TERNARY_LANE_VNX8HF): Delete in favor of FMLAL_FP8_HF.
(SVE2_FP8_TERNARY_VNX4SF): Rename to...
(FMLALL_FP8_SF): ...this.
(SVE2_FP8_TERNARY_LANE_VNX4SF): Delete in favor of FMLALL_FP8_SF.
(sve2_fp8_fma_op_vnx8hf, sve2_fp8_fma_op_vnx4sf): Fold into...
(insn): ...here.
* config/aarch64/aarch64-sve2.md: Update uses accordingly.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Test TARGET_FP8FMA.
* gcc.target/aarch64/simd/vmla_fpm.c: New test.
* gcc.target/aarch64/simd/vmla_lane_indices_1.c: Likewise.
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Saurabh Jha [Tue, 10 Dec 2024 13:21:20 +0000 (13:21 +0000)]
aarch64: Add support for fp8dot2 and fp8dot4
The AArch64 FEAT_FP8DOT2 and FEAT_FP8DOT4 extensions introduce
instructions for dot product of vectors.
This patch introduces the following intrinsics:
1. vdot{q}_{fp16|fp32}_mf8_fpm.
2. vdot{q}_lane{q}_{fp16|fp32}_mf8_fpm.
We added a new aarch64_builtin_signature variant, ternary_lane, and added
support for it in the functions aarch64_fntype and
aarch64_expand_pragma_builtin.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(enum class): Add ternary_lane.
(aarch64_fntype): Handle ternary_lane.
(aarch64_pragma_builtins_checker::require_immediate_lane_index): New
function.
(aarch64_pragma_builtins_checker::check): Handle the new intrinsics.
(aarch64_expand_pragma_builtin): Likewise.
* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): Define TARGET_FP8DOT2 and
TARGET_FP8DOT4.
* config/aarch64/aarch64-simd-pragma-builtins.def: Define vdot
and vdot_lane intrinsics.
* config/aarch64/aarch64-simd.md
(@aarch64_<fpm_uns_op><mode>): New pattern.
(@aarch64_<fpm_uns_op>_lane<VQ_HSF_VDOT:mode><VB:mode>): Likewise.
* config/aarch64/iterators.md (VQ_HSF_VDOT): New mode iterator.
(UNSPEC_VDOT, UNSPEC_VDOT_LANE): New unspecs.
(fpm_uns_op): Handle them.
(VNARROWB, Vnbtype): New mode attributes.
(FPM_VDOT, FPM_VDOT_LANE): New int iterators.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Test fp8dot2 and fp8dot4.
* gcc.target/aarch64/simd/vdot2_fpm.c: New test.
* gcc.target/aarch64/simd/vdot4_fpm.c: New test.
* gcc.target/aarch64/simd/vdot_lane_indices_1.c: New test.
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Saurabh Jha [Tue, 10 Dec 2024 13:21:20 +0000 (13:21 +0000)]
aarch64: Add support for fp8 convert and scale
The AArch64 FEAT_FP8 extension introduces instructions for conversion
and scaling.
This patch introduces the following intrinsics:
1. vcvt{1|2}_{bf16|high_bf16|low_bf16}_mf8_fpm.
2. vcvt{q}_mf8_f16_fpm.
3. vcvt_{high}_mf8_f32_fpm.
4. vscale{q}_{f16|f32|f64}.
We introduced two aarch64_builtin_signatures enum variants, unary and
ternary, and added support for these variants in the functions
aarch64_fntype and aarch64_expand_pragma_builtin.
We added new simd_types for integers (s32, s32q, and s64q) and for
floating points (f8 and f8q).
Because we added support for fp8 intrinsics here, we modified the check
in acle/fp8.c that was checking that __ARM_FEATURE_FP8 macro is not
defined.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(FLAG_USES_FPMR, FLAG_FP8): New flags.
(ENTRY): Modified to support ternary operations.
(enum class): New variants to support new signatures.
(struct aarch64_pragma_builtins_data): Extend types to 4 elements.
(aarch64_fntype): Handle new signatures.
(aarch64_get_low_unspec): New function.
(aarch64_convert_to_v64): New function, split out from...
(aarch64_expand_pragma_builtin): ...here. Handle new signatures.
* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): New flag for FP8.
* config/aarch64/aarch64-simd-pragma-builtins.def: Define new fp8
intrinsics.
(ENTRY_BINARY, ENTRY_BINARY_LANE): Update for new ENTRY interface.
(ENTRY_UNARY, ENTRY_TERNARY, ENTRY_UNARY_FPM): New macros.
(ENTRY_BINARY_VHSDF_SIGNED): Likewise.
* config/aarch64/aarch64-simd.md
(@aarch64_<fpm_uns_op><mode>): New pattern.
(@aarch64_<fpm_uns_op><mode>_high): Likewise.
(@aarch64_<fpm_uns_op><mode>_high_be): Likewise.
(@aarch64_<fpm_uns_op><mode>_high_le): Likewise.
* config/aarch64/iterators.md (V4SF_ONLY, VQ_BHF): New mode iterators.
(UNSPEC_FCVTN_FP8, UNSPEC_FCVTN2_FP8, UNSPEC_F1CVTL_FP8)
(UNSPEC_F1CVTL2_FP8, UNSPEC_F2CVTL_FP8, UNSPEC_F2CVTL2_FP8)
(UNSPEC_FSCALE): New unspecs.
(VPACKB, VPACKBtype): New mode attributes.
(b): Add support for V[48][BH]F.
(FPM_UNARY_UNS, FPM_BINARY_UNS, SCALE_UNS): New int iterators.
(insn): New int attribute.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/fp8.c: Remove check that fp8 feature
macro doesn't exist and...
* gcc.target/aarch64/pragma_cpp_predefs_4.c: ...test that it does here.
* gcc.target/aarch64/simd/scale_fpm.c: New test.
* gcc.target/aarch64/simd/vcvt_fpm.c: New test.
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Jonathan Wakely [Tue, 10 Dec 2024 09:48:57 +0000 (09:48 +0000)]
libstdc++: Revert change to __bitwise_relocatable
This reverts r15-6060-ge4a0157c2397c9 so that __is_bitwise_relocatable
depends only on is_trivial. To avoid the deprecation warnings for C++26,
use the __is_trivial built-in directly instead of std::is_trivial.
We need to be sure that the type is trivially copyable, not just
trivially constructible and trivially assignable. Otherwise we get
-Wclass-memaccess diagnostics for e.g. std::vector<std::pair<A*, B*>>.
We could add is_trivially_copyable to the conditions, but this isn't
really an appropriate change for stage 3 anyway (it affects all modes
from C++11 upwards). Just revert to using is_trivial, and we can revisit
the condition for GCC 16.
libstdc++-v3/ChangeLog:
* include/bits/stl_uninitialized.h (__is_bitwise_relocatable):
Revert to depending on is_trivial.
Richard Biener [Thu, 5 Dec 2024 09:47:13 +0000 (10:47 +0100)]
tree-optimization/117912 - bogus address equivalences for __builtin_object_size
VN again is the culprit for exploiting address equivalences before
__builtin_object_size got the chance to do its job. This time
it isn't about union members but adjacent structure fields where
an address to one after the last element of an array field can
spill over to the next field.
The following protects all out-of-bound accesses on the upper bound
side (singling out TYPE_MAX_VALUE + 1 is more expensive). It
ignores other out-of-bound addresses that would invoke UB.
Zero-sized arrays are a bit awkward because C++ represents them
with a -1U upper bound.
There's a similar issue for zero-sized components whose address can
be the same as the adjacent field in C.
PR tree-optimization/117912
* tree-ssa-sccvn.cc (copy_reference_ops_from_ref): For addresses
of zero-sized components do not set ->off if the object size pass
didn't run.
For OOB ARRAY_REF accesses in address expressions avoid setting
->off if the object size pass didn't run.
(valueize_refs_1): Likewise.
* c-c++-common/torture/pr117912-1.c: New testcase.
* c-c++-common/torture/pr117912-2.c: Likewise.
* c-c++-common/torture/pr117912-3.c: Likewise.
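
The address equivalence described above can be seen in a small example. This is only an illustration of why the two addresses coincide, not one of the new testcases; `toy_pair` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two adjacent array fields with no padding between them.  */
struct toy_pair { char a[4]; char b[4]; };

/* The one-past-the-end address of a is numerically the address of b.  */
bool one_past_a_is_b(const struct toy_pair *p)
{
    return (const char *)p->a + sizeof p->a == (const char *)p->b;
}

/* The same fact expressed through field offsets.  */
bool offsets_coincide(void)
{
    return offsetof(struct toy_pair, a) + sizeof(((struct toy_pair *)0)->a)
           == offsetof(struct toy_pair, b);
}

void adjacent_fields_demo(void)
{
    struct toy_pair p = { "abc", "def" };
    assert(offsets_coincide());
    assert(one_past_a_is_b(&p));
}
```

Although the two addresses are equal, an object-size query on the end of a and one on the start of b should give different answers, so replacing one address with the other too early loses information.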