Piotr Trojanek [Mon, 5 Feb 2024 18:41:50 +0000 (19:41 +0100)]
ada: Cleanup repeated code in expansion of stream attributes
In expansion of various attributes, in particular for the Input/Output
and Read/Write attributes, we can use constants that are already used
for expansion of many other attributes.
gcc/ada/
* exp_attr.adb (Expand_N_Attribute_Reference): Use constants
declared at the beginning of subprogram; tune layout.
* exp_ch3.adb (Predefined_Primitive_Bodies): Tune layout.
Piotr Trojanek [Tue, 19 Mar 2024 10:22:40 +0000 (11:22 +0100)]
ada: Simplify check for type without stream operations
Recursive routine Type_Without_Stream_Operation was checking restriction
No_Default_Stream_Attributes at every call, which was confusing and
inefficient.
This routine is only called from two places: Check_Stream_Attribute,
which already checks if this restriction is active, and
Stream_Operation_OK, where we add such a check.
Cleanup related to extending the use of No_Streams restriction.
gcc/ada/
* exp_ch3.adb (Stream_Operation_OK): Check restriction
No_Default_Stream_Attributes before call to
Type_Without_Stream_Operation.
* sem_util.adb (Type_Without_Stream_Operation): Remove static
condition from recursive routine.
Piotr Trojanek [Fri, 5 Apr 2024 12:22:34 +0000 (14:22 +0200)]
ada: Enable inlining for subprograms with multiple return statements
With the support for forward GOTO statements in the GNATprove backend,
we can now inline subprograms with multiple return statements in the
frontend.
Also, fix inconsistent source locations in the inlined code, which were
now triggering assertion violations in the code for GNATprove
counterexamples.
gcc/ada/
* inline.adb (Has_Single_Return_In_GNATprove_Mode): Remove.
(Process_Formals): When rewriting an occurrence of a formal
parameter, use location of the occurrence, not of the inlined
call.
Piotr Trojanek [Thu, 11 Apr 2024 08:04:19 +0000 (10:04 +0200)]
ada: Add switch to disable expansion of assertions in CodePeer mode
A new debug switch -gnatd_k is added, which only has an effect in CodePeer
mode. When enabled, assertion expressions are no longer expanded (which
is the default in the CodePeer mode); instead, their expansion needs to
be explicitly enabled by pragma Assertion_Policy.
gcc/ada/
* debug.adb (d_k): Use first available debug switch.
* gnat1drv.adb (Adjust_Global_Switches): If new debug switch is
active then don't expand assertion expressions by default.
Piotr Trojanek [Fri, 1 Dec 2023 17:47:01 +0000 (18:47 +0100)]
ada: Refactor common code for dynamic and static class-wide preconditions
Code cleanup; semantics is unaffected.
gcc/ada/
* exp_ch6.adb (Install_Class_Preconditions_Check): Refactor
common code for checking if precondition fails, since the
difference is only in raising an exception or calling the
Raise_Assert_Failure procedure.
Piotr Trojanek [Mon, 8 Apr 2024 14:26:02 +0000 (16:26 +0200)]
ada: Fix handling of aspects CPU and Interrupt_Priority
When resolving aspect expression, aspects CPU and Interrupt_Priority
should be handled like the aspect Priority; in particular, all these
expressions can reference discriminants of the annotated task type.
gcc/ada/
* sem_ch13.adb (Check_Aspect_At_End_Of_Declarations): Make
discriminants visible when analyzing aspect Interrupt_Priority.
(Freeze_Entity_Checks): Likewise.
(Resolve_Aspect_Expressions): Likewise for both aspects CPU and
Interrupt_Priority.
Piotr Trojanek [Mon, 8 Apr 2024 16:00:05 +0000 (18:00 +0200)]
ada: Refactor checks for Refined_Depends in generic instances
Code cleanup; semantics is unaffected.
gcc/ada/
* sem_prag.adb (Check_Dependency_Clause, Check_Output_States,
Report_Extra_Clauses): Remove multiple checks for being inside
an instance.
(Analyze_Refined_Depends_In_Decl_Part): Add single check for
being inside an instance.
Piotr Trojanek [Mon, 8 Apr 2024 11:54:22 +0000 (13:54 +0200)]
ada: Refactor checks for Refined_Global in generic instances
Code cleanup; semantics is unaffected.
gcc/ada/
* sem_prag.adb (Check_In_Out_States, Check_Input_States,
Check_Output_States, Check_Proof_In_States,
Check_Refined_Global_List, Report_Extra_Constituents,
Report_Missing_Items): Remove multiple checks for being inside
an instance.
(Analyze_Refined_Global_In_Decl_Part): Add single check for
being inside an instance.
Andreas Krebbel [Mon, 10 Jun 2024 07:09:10 +0000 (09:09 +0200)]
IBM Z: Fix ICE in expand_perm_as_replicate
The current implementation assumes that it is always invoked with register
operands, although for memory operands we even have an instruction
(vlrep). With the patch we try this first and only if it fails
force the input into a register and continue.
vec_splats generation fails for single element 128bit types which are
allowed for vec_splat. This is something to sort out with another
patch I guess.
gcc/ChangeLog:
* config/s390/s390.cc (expand_perm_as_replicate): Handle memory
operands.
* config/s390/vx-builtins.md (vec_splats<mode>): Turn into parameterized expander.
(@vec_splats<mode>): New expander.
Richard Biener [Fri, 7 Jun 2024 10:15:31 +0000 (12:15 +0200)]
tree-optimization/115383 - EXTRACT_LAST_REDUCTION with multiple stmt copies
The EXTRACT_LAST_REDUCTION code isn't ready to deal with multiple stmt
copies but SLP no longer checks for this. The following adjusts
code generation to handle the situation.
PR tree-optimization/115383
* tree-vect-stmts.cc (vectorizable_condition): Handle
generating a chain of .FOLD_EXTRACT_LAST.
Andreas Tobler [Sun, 9 Jun 2024 21:18:04 +0000 (23:18 +0200)]
FreeBSD: Stop linking _p libs for -pg as of FreeBSD 14
As of FreeBSD version 14, FreeBSD no longer provides profiled system
libraries like libc_p and libpthread_p. Stop linking against them if
the FreeBSD major version is 14 or more.
gcc:
* config/freebsd-spec.h: Change fbsd-lib-spec for FreeBSD > 13,
do not link against profiled system libraries if -pg is invoked.
Add a define to note about this change.
* config/aarch64/aarch64-freebsd.h: Use the note to inform if
-pg is invoked on FreeBSD > 13.
* config/arm/freebsd.h: Likewise.
* config/i386/freebsd.h: Likewise.
* config/i386/freebsd64.h: Likewise.
* config/riscv/freebsd.h: Likewise.
* config/rs6000/freebsd64.h: Likewise.
* config/rs6000/sysv4.h: Likewise.
Andreas noted we were getting an uninit warning after the recent constant
synthesis changes. Essentially there's no way for the uninit analysis code to
know the first entry in the CODES array is a UNKNOWN which will set X before
its first use.
So trivial initialization with NULL_RTX is the obvious fix.
Roger Sayle [Sun, 9 Jun 2024 01:47:08 +0000 (19:47 -0600)]
[middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.
This patch tweaks RTL expansion of multi-word shifts and rotates to use
PLUS rather than IOR for disjunctive operations. During expansion of
these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
where the constants C1 and C2 guarantee that bits don't overlap.
Hence the IOR can be performed by any any_or_plus operation, such as
IOR, XOR or PLUS; for word-size operations where carry chains aren't
an issue these should all be equally fast (single-cycle) instructions.
The benefit of this change is that targets with shift-and-add insns,
like x86's lea, can benefit from the LSHIFT-ADD form.
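The equivalence the paragraph above relies on can be checked with a small sketch. The helper names below are hypothetical and are not taken from expmed.cc or optabs.cc; they only illustrate that when the two shifted parts occupy disjoint bits, IOR, PLUS and XOR all produce the same word.

```c
#include <stdint.h>

/* Hypothetical helpers, not GCC code: form the high word of a
   double-word left shift from two 32-bit halves.  All three assume
   0 < c < 32 so that neither shift count is out of range.  */

/* Combine the two parts with IOR.  */
uint32_t shl_hi_ior(uint32_t hi, uint32_t lo, unsigned c)
{
    return (hi << c) | (lo >> (32 - c));
}

/* Combine with PLUS: (hi << c) has its low c bits clear and
   (lo >> (32 - c)) only uses the low c bits, so no carry occurs.  */
uint32_t shl_hi_plus(uint32_t hi, uint32_t lo, unsigned c)
{
    return (hi << c) + (lo >> (32 - c));
}

/* Combine with XOR: disjoint bits never cancel each other.  */
uint32_t shl_hi_xor(uint32_t hi, uint32_t lo, unsigned c)
{
    return (hi << c) ^ (lo >> (32 - c));
}
```

Because the results are identical, a target is free to pick whichever form maps best onto its instructions, which is the motivation given for preferring PLUS on targets with shift-and-add insns.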
An example of a backend that benefits is ARC, which is demonstrated
by these two simple functions:
unsigned long long foo(unsigned long long x) { return x<<2; }
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?
gcc/ChangeLog
* expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
to generate PLUS instead of IOR when unioning disjoint bitfields.
* optabs.cc (expand_subword_shift): Likewise.
(expand_binop): Likewise for double-word rotate.
Peter Bergner [Fri, 7 Jun 2024 21:03:08 +0000 (16:03 -0500)]
rs6000: Update ELFv2 stack frame comment showing the correct ROP save location
The ELFv2 stack frame layout comment in rs6000-logue.cc shows the ROP
hash save slot in the wrong location. Update the comment to show the
correct ROP hash save location in the frame.
Simon Martin [Fri, 7 Jun 2024 09:21:07 +0000 (11:21 +0200)]
c++: Make *_cast<*> parsing more robust to errors [PR108438]
We ICE upon the following when trying to emit a -Wlogical-not-parentheses
warning:
=== cut here ===
template <typename T> T foo (T arg, T& ref, T* ptr) {
int a = 1;
return static_cast<T!>(a);
}
=== cut here ===
This patch makes *_cast<*> parsing more robust by skipping to the closing '>'
upon error in the target type.
Successfully tested on x86_64-pc-linux-gnu.
PR c++/108438
gcc/cp/ChangeLog:
* parser.cc (cp_parser_postfix_expression): Use
cp_parser_require_end_of_template_parameter_list to skip to the closing
'>' upon error parsing the target type of *_cast<*> expressions.
Before this patch:
sat_u_sub_uint64_t_fmt_1:
bltu a0,a1,.L2
sub a0,a0,a1
ret
.L2:
li a0,0
ret
After this patch:
sat_u_sub_uint64_t_fmt_1:
sltu a5,a0,a1
addi a5,a5,-1
sub a0,a0,a1
and a0,a5,a0
ret
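The branchless sequence in the "After this patch" listing can be read as the following C sketch. The function name is hypothetical and this is only an interpretation of the four instructions shown, not the code of riscv_expand_ussub.

```c
#include <stdint.h>

/* Hypothetical C rendering of the branchless sequence above.  */
uint64_t sat_u_sub_branchless(uint64_t x, uint64_t y)
{
    /* sltu + addi -1: the mask is 0 when x < y, all ones otherwise.  */
    uint64_t mask = (uint64_t)(x < y) - 1;

    /* sub + and: keep the difference only when no underflow occurred.  */
    return (x - y) & mask;
}
```

The subtraction is always performed; the mask then either passes the difference through or clamps it to zero, so no branch is needed.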
ToDo:
Only the above 2 forms of .SAT_SUB are supported for now; we will
support more forms of .SAT_SUB in the middle-end in the near future.
The below test suites are passed for this patch.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_expand_ussub): Add new func
decl for ussub expanding.
* config/riscv/riscv.cc (riscv_expand_ussub): Ditto but for impl.
* config/riscv/riscv.md (ussub<mode>3): Add new pattern ussub
for scalar modes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test macros and comments.
* gcc.target/riscv/sat_u_sub-1.c: New test.
* gcc.target/riscv/sat_u_sub-2.c: New test.
* gcc.target/riscv/sat_u_sub-3.c: New test.
* gcc.target/riscv/sat_u_sub-4.c: New test.
* gcc.target/riscv/sat_u_sub-5.c: New test.
* gcc.target/riscv/sat_u_sub-6.c: New test.
* gcc.target/riscv/sat_u_sub-7.c: New test.
* gcc.target/riscv/sat_u_sub-8.c: New test.
* gcc.target/riscv/sat_u_sub-run-1.c: New test.
* gcc.target/riscv/sat_u_sub-run-2.c: New test.
* gcc.target/riscv/sat_u_sub-run-3.c: New test.
* gcc.target/riscv/sat_u_sub-run-4.c: New test.
* gcc.target/riscv/sat_u_sub-run-5.c: New test.
* gcc.target/riscv/sat_u_sub-run-6.c: New test.
* gcc.target/riscv/sat_u_sub-run-7.c: New test.
* gcc.target/riscv/sat_u_sub-run-8.c: New test.
Roger Sayle [Sat, 8 Jun 2024 04:01:38 +0000 (05:01 +0100)]
analyzer: Restore g++ 4.8 bootstrap; use std::move to return std::unique_ptr.
This patch restores bootstrap when using g++ 4.8 as a host compiler.
Returning a std::unique_ptr requires a std::move on C++ compilers
(pre-C++17) that don't guarantee copy elision/return value optimization.
2024-06-08 Roger Sayle <roger@nextmovesoftware.com>
David Malcolm [Fri, 7 Jun 2024 20:14:29 +0000 (16:14 -0400)]
analyzer: add logging to get_representative_path_var
This was very helpful when debugging the cast_region::m_original_region
removal, but is probably too verbose to enable except by hand on
specific calls to get_representative_tree.
gcc/analyzer/ChangeLog:
* engine.cc (impl_region_model_context::on_state_leak): Pass nullptr
to get_representative_path_var.
* region-model.cc (region_model::get_representative_path_var_1):
Add logger param and use it in both overloads.
(region_model::get_representative_path_var): Likewise.
(region_model::get_representative_tree): Likewise.
(selftest::test_get_representative_path_var): Pass nullptr to
get_representative_path_var.
* region-model.h (region_model::get_representative_tree): Add
optional logger param to both overloads.
(region_model::get_representative_path_var): Add logger param to
both overloads.
(region_model::get_representative_path_var_1): Likewise.
* store.cc (binding_cluster::get_representative_path_vars): Add
logger param and use it.
(store::get_representative_path_vars): Likewise.
* store.h (binding_cluster::get_representative_path_vars): Add
logger param.
(store::get_representative_path_vars): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
cast_region had its own field m_original_region, rather than
simply using region::m_parent, leading to lots of pointless
special-casing of RK_CAST.
Remove the field and simply use the parent region.
Doing so revealed a bug (seen in gcc.dg/analyzer/taint-alloc-4.c)
where region_model::get_representative_path_var_1's RK_CAST case
was always failing, due to using the "parent region" (actually
that of the original region's parent), rather than the original region;
the patch fixes the bug by removing the distinction.
gcc/analyzer/ChangeLog:
* call-summary.cc
(call_summary_replay::convert_region_from_summary_1): Update
for removal of cast_region::m_original_region.
* region-model-manager.cc
(region_model_manager::get_or_create_initial_value): Likewise.
* region-model.cc (region_model::get_store_value): Likewise.
* region.cc (region::get_base_region): Likewise.
(region::descendent_of_p): Likewise.
(region::maybe_get_frame_region): Likewise.
(region::get_memory_space): Likewise.
(region::calc_offset): Likewise.
(cast_region::accept): Delete.
(cast_region::dump_to_pp): Update for removal of
cast_region::m_original_region.
(cast_region::add_dump_widget_children): Delete.
* region.h (struct cast_region::key_t): Rename "original_region"
to "parent".
(cast_region::cast_region): Likewise. Update for removal of
cast_region::m_original_region.
(cast_region::accept): Delete.
(cast_region::add_dump_widget_children): Delete.
(cast_region::get_original_region): Delete.
(cast_region::m_original_region): Delete.
* sm-taint.cc (region_model::check_region_for_taint): Remove
special-casing for RK_CAST.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/taint-alloc-4.c: Update expected result to
reflect change in message due to
region_model::get_representative_path_var_1 now handling RK_CAST.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
demo.c: In function ‘test_invalid_calc_of_array_size’:
demo.c:9:20: warning: undefined behavior when subtracting pointers [CWE-469] [-Wanalyzer-undefined-behavior-ptrdiff]
9 | return &sentinel - arr;
| ^
events 1-2
│
│ 3 | int arr[42];
│ | ~~~
│ | |
│ | (2) underlying object for right-hand side of subtraction created here
│ 4 | int sentinel;
│ | ^~~~~~~~
│ | |
│ | (1) underlying object for left-hand side of subtraction created here
│
└──> ‘test_invalid_calc_of_array_size’: event 3
│
│ 9 | return &sentinel - arr;
│ | ^
│ | |
│ | (3) ⚠️ subtraction of pointers has undefined behavior if they do not point into the same array object
│
gcc/analyzer/ChangeLog:
PR analyzer/105892
* analyzer.opt (Wanalyzer-undefined-behavior-ptrdiff): New option.
* analyzer.opt.urls: Regenerate.
* region-model.cc (class undefined_ptrdiff_diagnostic): New.
(check_for_invalid_ptrdiff): New.
(region_model::get_gassign_result): Call it for POINTER_DIFF_EXPR.
Simon Martin [Tue, 4 Jun 2024 19:20:23 +0000 (21:20 +0200)]
c++: Handle erroneous DECL_LOCAL_DECL_ALIAS in duplicate_decls [PR107575]
We currently ICE upon the following because we don't properly handle local
functions with an error_mark_node as DECL_LOCAL_DECL_ALIAS in duplicate_decls.
=== cut here ===
void f (void) {
virtual int f (void) const;
virtual int f (void);
}
=== cut here ===
This patch fixes this by checking for error_mark_node.
Successfully tested on x86_64-pc-linux-gnu.
PR c++/107575
gcc/cp/ChangeLog:
* decl.cc (duplicate_decls): Check for error_mark_node
DECL_LOCAL_DECL_ALIAS.
Jason Merrill [Wed, 5 Jun 2024 02:27:56 +0000 (22:27 -0400)]
c++: -include and header unit translation
Within a source file, #include is translated to import if a suitable header
unit is available, but this wasn't working with -include. This turned out
to be because we suppressed the translation before the beginning of the
main file. After removing that, I had to tweak libcpp file handling to
accommodate the way it moves from an -include to the main file.
Patrick Palka [Fri, 7 Jun 2024 16:12:30 +0000 (12:12 -0400)]
c++: lambda in pack expansion [PR115378]
Here find_parameter_packs_r is incorrectly treating the 'auto' return
type of a lambda as a parameter pack due to Concepts-TS specific logic
added in r6-4517, leading to confusion later when expanding the pattern.
Since we intend on removing Concepts TS support soon anyway, this patch
fixes this by restricting the problematic logic with flag_concepts_ts.
Doing so revealed that add_capture was relying on this logic to set
TEMPLATE_TYPE_PARAMETER_PACK for the 'auto' type of a pack expansion
init-capture, which we now need to do explicitly.
PR c++/115378
gcc/cp/ChangeLog:
* lambda.cc (lambda_capture_field_type): Set
TEMPLATE_TYPE_PARAMETER_PACK on the auto type of an init-capture
pack expansion.
* pt.cc (find_parameter_packs_r) <case TEMPLATE_TYPE_PARM>:
Restrict TEMPLATE_TYPE_PARAMETER_PACK promotion with
flag_concepts_ts.
Roger Sayle [Fri, 7 Jun 2024 13:03:20 +0000 (14:03 +0100)]
i386: PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.
This patch addresses PR target/115351, which is a code quality regression
on x86 when passing floating point complex numbers. The ABI considers
these arguments to have TImode, requiring interunit moves to place the
FP values (which are actually passed in SSE registers) into the upper
and lower parts of a TImode pseudo, and then similar moves back again
before they can be used.
The cause of the regression is that changes in how TImode initialization
is represented in RTL now prevent the RTL optimizers from eliminating
these redundant moves. The specific cause is that the *concatditi3
pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default)
rtx_cost, preventing fwprop1 from propagating it. This pattern just
sets the hipart and lopart of a double-word register, typically two
instructions (fewer if reload can allocate things appropriately) but
the current ix86_rtx_costs actually returns INSN_COSTS(13), i.e. 52.
This issue is resolved by having ix86_rtx_costs return more reasonable
values for these (place-holder) patterns.
2024-06-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/115351
* config/i386/i386.cc (ix86_rtx_costs): Provide estimates for
the *concatditi3 and *insvti_highpart patterns, about two insns.
gcc/testsuite/ChangeLog
PR target/115351
* g++.target/i386/pr115351.C: New test case.
with the patch below, we now generate a single instruction:
foo: vpternlogq $232, %ymm2, %ymm1, %ymm0
ret
The AVX512 vpternlog[qd] instructions are a very cool addition to the
x86 instruction set, that can calculate any Boolean function of three
inputs in a single fast instruction. As the truth table for any
three-input function has 8 rows, any specific function can be represented
by specifying those bits, i.e. by an 8-bit byte, an immediate integer
between 0 and 255.
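As a rough illustration of how such an index can be derived, the sketch below evaluates a three-input function on the constant columns 0xF0, 0xCC and 0xAA, which is the commonly documented convention for the vpternlog immediate. The helper names are hypothetical and this is not the code of ix86_ternlog_idx.

```c
#include <stdint.h>

/* Hypothetical sketch, not GCC code.  Each bit position of the three
   constants encodes one row of the truth table, so evaluating the
   function once on them yields the 8-bit index.  */
typedef uint8_t (*ternary_fn)(uint8_t a, uint8_t b, uint8_t c);

uint8_t ternlog_index(ternary_fn f)
{
    return f(0xF0, 0xCC, 0xAA);
}

/* A few sample three-input Boolean functions.  */
uint8_t fn_and3(uint8_t a, uint8_t b, uint8_t c)
{
    return a & b & c;
}

uint8_t fn_xor3(uint8_t a, uint8_t b, uint8_t c)
{
    return a ^ b ^ c;
}

uint8_t fn_majority(uint8_t a, uint8_t b, uint8_t c)
{
    return (a & b) | (a & c) | (b & c);
}
```

Under this convention the three-way AND maps to 0x80, the three-way XOR to 0x96, and the majority function to 0xE8 (232 in decimal).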
Examples of ternary functions and their indices are given below:
A naive implementation (in many compilers) might be to add define_insn
patterns for all 256 different functions. The situation is even
worse as many of these Boolean functions don't have a "canonical form"
(as produced by simplify_rtx) and would each need multiple patterns.
See the space-separated equivalent expressions in the table above.
This need to provide instruction "templates" might explain why GCC,
LLVM and ICC all exhibit similar coverage problems in their ability
to recognize x86 ternlog ternary functions.
Perhaps a unique feature of GCC's design is that in addition to regular
define_insn templates, machine descriptions can also perform pattern
matching via a match_operator (and its corresponding predicate).
This patch introduces a ternlog_operand predicate that matches a
(possibly infinite) set of expression trees, identifying those that
have at most three unique operands. This then allows a
define_insn_and_split to recognize suitable expressions and then
transform them into the appropriate UNSPEC_VTERNLOG as a pre-reload
splitter. This design allows combine to smash together arbitrarily
complex Boolean expressions, then transform them into an UNSPEC
before register allocation. As an "optimization", where possible
ix86_expand_ternlog generates a simpler binary operation, using
AND, XOR, IOR or ANDN where possible, and in a few cases attempts
to "canonicalize" the ternlog, by reordering or duplicating operands,
so that later CSE passes have a hope of spotting equivalent values.
This patch leaves the existing ternlog patterns in sse.md (for now),
many of which are made obsolete by these changes. In theory we now
only need one define_insn for UNSPEC_VTERNLOG. One complication from
these previous variants was that they inconsistently used decimal vs.
hexadecimal to specify the immediate constant operand in assembly
language, making the list of tweaks to the testsuite with this patch
larger than it might have been. I propose to remove the vestigial
patterns in a follow-up patch, once this approach has baked (proven
to be stable) on mainline.
2024-06-07 Roger Sayle <roger@nextmovesoftware.com>
Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_args_builtin): Call
fixup_modeless_constant before testing predicates. Only call
copy_to_mode_reg on memory operands (after the first one).
(ix86_gen_bcst_mem): Helper function to convert a CONST_VECTOR
into a VEC_DUPLICATE if possible.
(ix86_ternlog_idx): Convert an RTX expression into a ternlog
index between 0 and 255, recording the operands in ARGS, if
possible or return -1 if this is not possible/valid.
(ix86_ternlog_leaf_p): Helper function to identify "leaves"
of a ternlog expression, e.g. REG_P, MEM_P, CONST_VECTOR, etc.
(ix86_ternlog_operand_p): Test whether an expression is suitable
for and preferred as an UNSPEC_TERNLOG.
(ix86_expand_ternlog_binop): Helper function to construct the
binary operation corresponding to a sufficiently simple ternlog.
(ix86_expand_ternlog_andnot): Helper function to construct an
ANDN operation corresponding to a sufficiently simple ternlog.
(ix86_expand_ternlog): Expand a 3-operand ternary logic
expression, constructing either an UNSPEC_TERNLOG or simpler
rtx expression. Called from builtin expanders and pre-reload
splitters.
* config/i386/i386-protos.h (ix86_ternlog_idx): Prototype here.
(ix86_ternlog_operand_p): Likewise.
(ix86_expand_ternlog): Likewise.
* config/i386/predicates.md (ternlog_operand): New predicate
that calls ix86_ternlog_operand_p.
* config/i386/sse.md (<avx512>_vpternlog<mode>_0): New
define_insn_and_split that recognizes a SET_SRC of ternlog_operand
and expands it via ix86_expand_ternlog pre-reload.
(<avx512>_vternlog<mode>_mask): Convert from define_insn to
define_expand. Use ix86_expand_ternlog if the mask operand is
~0 (or 255 or -1).
(*<avx512>_vternlog<mode>_mask): define_insn renamed from above.
* gcc.target/i386/avx512f-vpternlogd-3.c: New 128-bit test case.
* gcc.target/i386/avx512f-vpternlogd-4.c: New 256-bit test case.
* gcc.target/i386/avx512f-vpternlogd-5.c: New 512-bit test case.
* gcc.target/i386/avx512f-vpternlogq-3.c: New test case.
Michal Jires [Fri, 17 Nov 2023 20:17:18 +0000 (21:17 +0100)]
lto: Implement cache partitioning
This patch implements new cache partitioning. It tries to keep symbols
from a single source file together to minimize propagation of divergence.
It starts with symbols already grouped by source files. If reasonably
possible it only either combines several files into one final partition,
or, if a file is large, splits the file into several final partitions.
The intermediate representation is partition_set, which contains a set of
groups of symbols (each group corresponding to an original source file) and
the number of final partitions this partition_set should be split into.
First partition_fixed_split splits partition_set into a constant number of
partition_sets with an equal number of symbol groups. If for example there
are 39 source files, the resulting partition_sets will contain 10, 10,
10, and 9 source files. This splitting intentionally ignores estimated
instruction counts to minimize propagation of divergence.
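The 10, 10, 10, 9 figures above correspond to an even split of 39 groups into 4 parts. A minimal sketch of that arithmetic follows; the helper name is hypothetical, the part count of 4 is inferred from the example, and this is not the code of partition_fixed_split.

```c
#include <stddef.h>

/* Hypothetical helper, not GCC code: number of symbol groups placed in
   part INDEX when GROUPS groups are divided into PARTS nearly equal
   parts, with the larger parts first.  Assumes PARTS > 0.  */
size_t fixed_split_size(size_t groups, size_t parts, size_t index)
{
    size_t base = groups / parts;   /* every part gets at least this many */
    size_t extra = groups % parts;  /* the first EXTRA parts get one more */

    return base + (index < extra ? 1 : 0);
}
```

Note that only the group count enters the computation; estimated instruction counts play no role, matching the description of this step.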
Second partition_over_target_split separates too large files and splits
them into individual symbols to be combined back into several smaller
files in the next step.
Third partition_binary_split splits partition_set into two halves until
it should be split into only one final partition, at which point the
remaining symbols are joined into one final partition.
Alexandre Oliva [Fri, 7 Jun 2024 10:00:11 +0000 (07:00 -0300)]
[libstdc++] drop workaround for clang<=7
In response to a request in the review of the patch that introduced
_GLIBCXX_CLANG, this patch removes from std/variant an obsolete
workaround for clang 7-.
Richard Biener [Fri, 7 Jun 2024 07:41:11 +0000 (09:41 +0200)]
Fix fold-left reduction vectorization with multiple stmt copies
There's a typo when code generating the mask operand for conditional
fold-left reductions in the case we have multiple stmt copies. The
latter is now allowed for SLP and possibly disabled for non-SLP by
accident.
This fixes the observed run-FAIL for
gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c with AVX512
and 256bit sized vectors.
Jonathan Wakely [Mon, 18 Mar 2024 16:58:23 +0000 (16:58 +0000)]
libstdc++: Optimize std::to_address
We can use if-constexpr and variable templates to simplify and optimize
std::to_address. This should compile faster (and run faster for -O0)
than dispatching to the pre-C++20 std::__to_address overloads.
Add finalizer creation to array constructor for functions of derived type.
PR fortran/90068
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_trans_array_ctor_element): Eval non-
variable expressions once only.
(gfc_trans_array_constructor_value): Add statements of
final block.
(trans_array_constructor): Detect when final block is required.
Jakub Jelinek [Fri, 7 Jun 2024 08:32:08 +0000 (10:32 +0200)]
bitint: Fix up lower_addsub_overflow [PR115352]
The following testcase is miscompiled because of a flawed optimization.
If one changes the 65 in the testcase to e.g. 66, one gets:
...
_25 = .USUBC (0, _24, _14);
_12 = IMAGPART_EXPR <_25>;
_26 = REALPART_EXPR <_25>;
if (_23 >= 1)
goto <bb 8>; [80.00%]
else
goto <bb 11>; [20.00%]
<bb 11> :
# _17 = PHI <_30(9), _22(7), _33(10)>
# _19 = PHI <_29(9), _18(7), _18(10)>
...
so there is one path for limbs below the boundary (in this case there are
actually no limbs there, maybe we could consider optimizing that further,
say with simply folding that _23 >= 1 condition to 1 == 1 and letting
cfg cleanup handle it), another case where it is exactly the limb on the
boundary (that is the bb 9 handling where it extracts the interesting
bits (the first 3 statements) and then checks if it is zero or all ones), and
finally the case of limbs above that where it compares the current result
limb against the previously recorded 0 or all ones and ors differences into
accumulated result.
Now, the optimization which the first hunk removes was based on the idea
that for that case the extraction of the interesting bits from the limb
don't need anything special, so the _27/_28/_29 statements above aren't
needed, the whole limb is interesting bits, so it handled the >= 1
case like the bb 9 above without the first 3 statements and bb 10 wasn't
there at all. There are 2 problems with that, for the higher limbs it
only checks if the result limb bits are all zeros or all ones, but
doesn't check if they are the same as the other extension bits, and
it forgets the previous flag whether there was an overflow.
First I wanted to fix it just by adding the _33 = _22 | _30; statement
to the end of bb 9 above, which fixed the originally filed huge testcase
and the first 2 foo calls in the testcase included in the patch, it no
longer forgets about previously checked differences from 0/1.
But as the last 2 foo calls show, it still didn't check whether each
even (or each odd depending on the exact position) result limb is
equal to the first one, so every second limb it could choose some other
0 vs. all ones value and as long as it repeated in another limb above it
it would be ok.
So, the optimization just can't work properly and the following patch
removes it.
2024-06-07 Jakub Jelinek <jakub@redhat.com>
PR middle-end/115352
* gimple-lower-bitint.cc (lower_addsub_overflow): Don't disable
single_comparison if cmp_code is GE_EXPR.
Rainer Orth [Fri, 7 Jun 2024 08:14:23 +0000 (10:14 +0200)]
go: Fix gccgo -v on Solaris with ld
The Go testsuite's go.sum file ends in
Couldn't determine version of /var/gcc/regression/master/11.4-gcc-64/build/gcc/gccgo
on Solaris. It turns out this happens because gccgo -v is confused:
[...]
gcc version 15.0.0 20240531 (experimental) [master a0d60660f2aae2d79685f73d568facb2397582d8] (GCC)
COMPILER_PATH=./:/usr/ccs/bin/
LIBRARY_PATH=./:/lib/amd64/:/usr/lib/amd64/:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-g1' '-B' './' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a.'
./collect2 -V -M ./libgcc-unwind.map -Qy /usr/lib/amd64/crt1.o ./crtp.o /usr/lib/amd64/crti.o /usr/lib/amd64/values-Xa.o /usr/lib/amd64/values-xpg6.o ./crtbegin.o -L. -L/lib/amd64 -L/usr/lib/amd64 -t -lgcc_s -lgcc -lc -lgcc_s -lgcc ./crtend.o /usr/lib/amd64/crtn.o
ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.3297
Undefined first referenced
symbol in file
main /usr/lib/amd64/crt1.o
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status
trying to invoke the linker without adding any object file. This only
happens when Solaris ld is in use. gccgo passes -t to the linker in
that case, but does it unconditionally, even with -v.
When configured to use GNU ld, gccgo -v is fine instead.
This patch avoids this by restricting the -t to actually linking.
Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11 (ld and gld).
The test SEGVs because it tries a stack access way beyond the stack
area. As Ian analyzed in the PR, the testcase currently requires
split-stack support, so this patch requires just that.
Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.
Fix returned type to be allocatable for user-functions.
The returned type of user-defined function returning a
class object was not detected and handled correctly, which
led to memory leaks.
PR fortran/90072
gcc/fortran/ChangeLog:
* expr.cc (gfc_is_alloc_class_scalar_function): Detect
allocatable class return types also for user-defined
functions.
* trans-expr.cc (gfc_conv_procedure_call): Same.
(trans_class_vptr_len_assignment): Compute vptr len
assignment correctly for user-defined functions.
Alexandre Oliva [Wed, 29 May 2024 05:52:07 +0000 (02:52 -0300)]
enable adjustment of return_pc debug attrs
This patch introduces infrastructure for targets to add an offset to
the label issued after the call_insn to set the call_return_pc
attribute. This will be used on rs6000, which sometimes issues another
instruction after the call proper as part of a call insn.
for gcc/ChangeLog
* target.def (call_offset_return_label): New hook.
* doc/tm.texi.in (TARGET_CALL_OFFSET_RETURN_LABEL): Add
placeholder.
* doc/tm.texi: Rebuild.
* dwarf2out.cc (struct call_arg_loc_node): Record call_insn
instead of call_arg_loc_note.
(add_AT_lbl_id): Add optional offset argument.
(gen_call_site_die): Compute and pass on a return pc offset.
(gen_subprogram_die): Move call_arg_loc_note computation...
(dwarf2out_var_location): ... from here. Set call_insn.
Pan Li [Mon, 3 Jun 2024 02:43:10 +0000 (10:43 +0800)]
RISC-V: Add testcases for scalar unsigned SAT_ADD form 5
After the middle-end supports form 5 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 5 of unsigned .SAT_ADD.
Form 5:
#define SAT_ADD_U_5(T) \
T sat_add_u_5_##T(T x, T y) \
{ \
return (T)(x + y) < x ? -1 : (x + y); \
}
Passed the riscv full regression tests.
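For reference, the form 5 macro above can be instantiated as in the following sketch. The choice of uint8_t and uint32_t is only an assumption for illustration; the element types actually covered by the new tests are not listed here.

```c
#include <stdint.h>

/* Form 5 as given above: the truncated sum is smaller than x exactly
   when the addition wrapped, in which case the result saturates.  */
#define SAT_ADD_U_5(T) \
T sat_add_u_5_##T(T x, T y) \
{ \
  return (T)(x + y) < x ? -1 : (x + y); \
}

/* Illustrative instantiations (assumed types).  */
SAT_ADD_U_5(uint8_t)
SAT_ADD_U_5(uint32_t)
```

The -1 is converted to the return type T, so it yields the maximum value of the unsigned type.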
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test macro for form 5.
* gcc.target/riscv/sat_u_add-21.c: New test.
* gcc.target/riscv/sat_u_add-22.c: New test.
* gcc.target/riscv/sat_u_add-23.c: New test.
* gcc.target/riscv/sat_u_add-24.c: New test.
* gcc.target/riscv/sat_u_add-run-21.c: New test.
* gcc.target/riscv/sat_u_add-run-22.c: New test.
* gcc.target/riscv/sat_u_add-run-23.c: New test.
* gcc.target/riscv/sat_u_add-run-24.c: New test.
Pan Li [Mon, 3 Jun 2024 02:33:15 +0000 (10:33 +0800)]
RISC-V: Add testcases for scalar unsigned SAT_ADD form 4
Now that the middle-end supports form 4 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 4 of unsigned .SAT_ADD.
Form 4:
#define SAT_ADD_U_4(T) \
T sat_add_u_4_##T (T x, T y) \
{ \
T ret; \
return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
}
Passed the rv64gcv full regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test macro for form 4.
* gcc.target/riscv/sat_u_add-17.c: New test.
* gcc.target/riscv/sat_u_add-18.c: New test.
* gcc.target/riscv/sat_u_add-19.c: New test.
* gcc.target/riscv/sat_u_add-20.c: New test.
* gcc.target/riscv/sat_u_add-run-17.c: New test.
* gcc.target/riscv/sat_u_add-run-18.c: New test.
* gcc.target/riscv/sat_u_add-run-19.c: New test.
* gcc.target/riscv/sat_u_add-run-20.c: New test.
Pan Li [Mon, 3 Jun 2024 02:24:47 +0000 (10:24 +0800)]
RISC-V: Add testcases for scalar unsigned SAT_ADD form 3
Now that the middle-end supports form 3 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 3 of unsigned .SAT_ADD.
Form 3:
#define SAT_ADD_U_3(T) \
T sat_add_u_3_##T (T x, T y) \
{ \
T ret; \
return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
}
Passed the rv64gcv full regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test macro for form 3.
* gcc.target/riscv/sat_u_add-13.c: New test.
* gcc.target/riscv/sat_u_add-14.c: New test.
* gcc.target/riscv/sat_u_add-15.c: New test.
* gcc.target/riscv/sat_u_add-16.c: New test.
* gcc.target/riscv/sat_u_add-run-13.c: New test.
* gcc.target/riscv/sat_u_add-run-14.c: New test.
* gcc.target/riscv/sat_u_add-run-15.c: New test.
* gcc.target/riscv/sat_u_add-run-16.c: New test.
Pan Li [Mon, 3 Jun 2024 01:35:49 +0000 (09:35 +0800)]
RISC-V: Add testcases for scalar unsigned SAT_ADD form 2
Now that the middle-end supports form 2 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 2 of unsigned .SAT_ADD.
Form 2:
#define SAT_ADD_U_2(T) \
T sat_add_u_2_##T(T x, T y) \
{ \
T ret; \
T overflow = __builtin_add_overflow (x, y, &ret); \
return (T)(-overflow) | ret; \
}
Passed the rv64gcv full regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test macro for form 2.
* gcc.target/riscv/sat_u_add-10.c: New test.
* gcc.target/riscv/sat_u_add-11.c: New test.
* gcc.target/riscv/sat_u_add-12.c: New test.
* gcc.target/riscv/sat_u_add-9.c: New test.
* gcc.target/riscv/sat_u_add-run-10.c: New test.
* gcc.target/riscv/sat_u_add-run-11.c: New test.
* gcc.target/riscv/sat_u_add-run-12.c: New test.
* gcc.target/riscv/sat_u_add-run-9.c: New test.
Pan Li [Wed, 29 May 2024 06:15:45 +0000 (14:15 +0800)]
RISC-V: Add testcases for scalar unsigned SAT_ADD form 1
Now that the middle-end supports form 1 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 1 of unsigned .SAT_ADD.
Form 1:
#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
return (T)(x + y) >= x ? (x + y) : -1; \
}
Passed the riscv full regression tests.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add helper macro for form 1.
* gcc.target/riscv/sat_u_add-5.c: New test.
* gcc.target/riscv/sat_u_add-6.c: New test.
* gcc.target/riscv/sat_u_add-7.c: New test.
* gcc.target/riscv/sat_u_add-8.c: New test.
* gcc.target/riscv/sat_u_add-run-5.c: New test.
* gcc.target/riscv/sat_u_add-run-6.c: New test.
* gcc.target/riscv/sat_u_add-run-7.c: New test.
* gcc.target/riscv/sat_u_add-run-8.c: New test.
Pan Li [Thu, 6 Jun 2024 01:19:53 +0000 (09:19 +0800)]
Match: Support more form for scalar unsigned SAT_ADD
Now that we support one gassign form of the unsigned .SAT_ADD, we
would like to support more forms, both branch and
branchless. There are 5 other forms of .SAT_ADD, listed below:
Form 1:
#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
return (T)(x + y) >= x ? (x + y) : -1; \
}
Form 2:
#define SAT_ADD_U_2(T) \
T sat_add_u_2_##T(T x, T y) \
{ \
T ret; \
T overflow = __builtin_add_overflow (x, y, &ret); \
return (T)(-overflow) | ret; \
}
Form 3:
#define SAT_ADD_U_3(T) \
T sat_add_u_3_##T (T x, T y) \
{ \
T ret; \
return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
}
Form 4:
#define SAT_ADD_U_4(T) \
T sat_add_u_4_##T (T x, T y) \
{ \
T ret; \
return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
}
Form 5:
#define SAT_ADD_U_5(T) \
T sat_add_u_5_##T(T x, T y) \
{ \
return (T)(x + y) < x ? -1 : (x + y); \
}
Before this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
long unsigned int _1;
long unsigned int _2;
uint64_t _3;
__complex__ long unsigned int _6;
The following test suites passed for this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.
gcc/ChangeLog:
* doc/match-and-simplify.texi: Add doc for the matching flag '^'.
* genmatch.cc (cmp_operand): Add match_phi comparison.
(dt_node::gen_kids_1): Add cond_expr bool flag for phi match.
(dt_operand::gen_phi_on_cond): Add new func to gen phi matching
on cond_expr.
(parser::parse_expr): Add handling for the expr flag '^'.
* match.pd: Add more form for unsigned .SAT_ADD.
* tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
new func impl to build call for phi gimple.
(match_unsigned_saturation_add): Add new func impl to match the
.SAT_ADD for phi gimple.
(math_opts_dom_walker::after_dom_children): Add phi matching
try for all gimple phi stmt.
Jakub Jelinek [Thu, 6 Jun 2024 20:12:11 +0000 (22:12 +0200)]
c: Fix up pointer types to may_alias structures [PR114493]
The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of
gcc_assert (TYPE_CANONICAL (t2) != t2
&& TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
assertion doesn't hold.
This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
of a pointer type is a type created with can_alias_all = false argument),
while later on, on the struct definition, the may_alias attribute was used.
fld_incomplete_type_of then creates an incomplete distinct copy of the
structure (but with the original attributes) but pointers created for it
are because of the "may_alias" attribute TYPE_REF_CAN_ALIAS_ALL, including
their TYPE_CANONICAL, because while that is created with !can_alias_all
argument, we later set it because of the "may_alias" attribute on the
to_type.
This doesn't ICE with C++ since PR70512 fix because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when the may_alias is added.
The following patch does that in the C FE as well.
2024-06-06 Jakub Jelinek <jakub@redhat.com>
PR c/114493
* c-decl.cc (c_fixup_may_alias): New function.
(finish_struct): Call it if "may_alias" attribute is
specified.
* gcc.dg/pr114493-1.c: New test.
* gcc.dg/pr114493-2.c: New test.
Pengxuan Zheng [Fri, 31 May 2024 00:53:23 +0000 (17:53 -0700)]
aarch64: Add vector floating point extend pattern [PR113880, PR113869]
This patch adds vector floating point extend pattern for V2SF->V2DF and
V4HF->V4SF conversions by renaming the existing aarch64_float_extend_lo_<Vwide>
pattern to the standard optab one, i.e., extend<mode><Vwide>2. This allows the
vectorizer to vectorize certain floating point widening operations for the
aarch64 target.
PR target/113880
PR target/113869
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (VAR1): Remap float_extend_lo_
builtin codes to standard optab ones.
* config/aarch64/aarch64-simd.md (aarch64_float_extend_lo_<Vwide>): Rename
to...
(extend<mode><Vwide>2): ... This.
This patch simplifies the real type build functions by using
the default float_type_node, double_type_node rather than create
new nodes. It also uses the default GCC long_double_type_node
or float128_type_node for longreal.
gcc/m2/ChangeLog:
* gm2-gcc/m2type.cc (build_m2_short_real_node): Rewrite
to use the default float_type_node.
(build_m2_real_node): Rewrite to use the default
double_type_node.
(build_m2_long_real_node): Rewrite to use the default
long_double_type_node or float128_type_node.
Co-Authored-By: Kewen.Lin <linkw@linux.ibm.com>
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
Richard Ball [Thu, 6 Jun 2024 15:10:14 +0000 (16:10 +0100)]
arm: Fix CASE_VECTOR_SHORTEN_MODE for thumb2.
The CASE_VECTOR_SHORTEN_MODE query is missing some equals signs
which causes suboptimal codegen due to missed optimisation
opportunities. This patch also adds a test for thumb2
switch statements as none exist currently.
Andre Vieira [Thu, 6 Jun 2024 15:02:50 +0000 (16:02 +0100)]
arm: Add .type and .size to __gnu_cmse_nonsecure_call [PR115360]
This patch adds missing assembly directives to the CMSE library wrapper to call
functions with attribute cmse_nonsecure_call. Without the .type directive the
linker will fail to produce the correct veneer if a call to this wrapper
function is too far from the wrapper itself. The .size was added for
completeness, though we don't necessarily have a use case for it.
libgcc/ChangeLog:
PR target/115360
* config/arm/cmse_nonsecure_call.S: Add .type and .size directives.
Tamar Christina [Thu, 6 Jun 2024 13:35:48 +0000 (14:35 +0100)]
AArch64: correct constraint on Upl early clobber alternatives
I made an oversight in the previous patch, where I added a ?Upa
alternative to the Upl cases. This causes it to create the tie
between the larger register file rather than the constrained one.
This fixes the affected patterns.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (@aarch64_pred_cmp<cmp_op><mode>,
*cmp<cmp_op><mode>_cc, *cmp<cmp_op><mode>_ptest,
@aarch64_pred_cmp<cmp_op><mode>_wide,
*aarch64_pred_cmp<cmp_op><mode>_wide_cc,
*aarch64_pred_cmp<cmp_op><mode>_wide_ptest): Fix Upl tie alternative.
* config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>): Fix
Upl tie alternative.
Thomas Schwinge [Wed, 5 Jun 2024 11:11:04 +0000 (13:11 +0200)]
nvptx, libgcc: Stub unwinding implementation
Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.
The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.
libgcc/ChangeLog:
* config/nvptx/t-nvptx: Add unwind-nvptx.c.
* config/nvptx/unwind-nvptx.c: New file.
Thomas Schwinge [Fri, 10 May 2024 10:50:23 +0000 (12:50 +0200)]
nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution, via 'vote.all.pred'
For example, this allows for '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'): the '0xffffffff'
bitmask isn't correct if not all 32 threads of a warp are active. The same
issue/fix, I suppose but have not verified, would apply if we were to allow for
OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.
We use 'nvptx_uniform_warp_check' only for PTX ISA version less than 6.0.
Otherwise we're using 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
which evidently appears to do the right thing. (I've tested '-muniform-simt'
code executing single-threaded.)
The change that I proposed on 2022-12-15 was to emit PTX code to calculate
'(1 << %ntid.x) - 1' as the actual bitmask to use instead of '0xffffffff'.
This works, but the PTX JIT generates SASS code to do this computation.
In turn, this change now uses PTX 'vote.all.pred' -- which even simplifies upon
the original code a little bit, see the following exemplary SASS 'diff' before
vs. after this change:
The following test suites passed for this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression tests.
gcc/ChangeLog:
* match.pd: Add new form for vector mode recog.
* tree-vect-patterns.cc (gimple_unsigned_integer_sat_sub): Add
new match func decl;
(vect_recog_build_binary_gimple_call): Extract helper func to
build gcall with given internal_fn.
(vect_recog_sat_sub_pattern): Add new func impl to recog .SAT_SUB.
Michal Jires [Tue, 9 Jan 2024 16:49:34 +0000 (17:49 +0100)]
lto: Remove random_seed from section name.
This patch removes suffixes from section names during LTO linking.
These suffixes were originally added for ld -r to work (PR lto/44992).
They were added to all LTO object files, but are only useful before WPA.
After that they waste space, and if kept random, make LTO caching impossible.
Bootstrapped/regtested on x86_64-pc-linux-gnu
gcc/ChangeLog:
* lto-streamer.cc (lto_get_section_name): Remove suffixes after WPA.
gcc/lto/ChangeLog:
* lto-common.cc (lto_section_with_id): Don't load suffix during LTRANS.
Hongyu Wang [Wed, 8 May 2024 03:08:42 +0000 (11:08 +0800)]
[APX CCMP] Support ccmp for float compare
The ccmp insn itself doesn't support fp compare, but x86 has fp comi
insns that change EFLAGS, which can be the scc input to ccmp. Allow
scalar fp compare in ix86_gen_ccmp_first except for ORDERED/UNORDERED
compares, which cannot be identified in ccmp.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_gen_ccmp_first):
Add fp compare and check the allowed fp compare type.
(ix86_gen_ccmp_next): Adjust compare_code input to ccmp for
fp compare.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-ccmp-1.c: Add test for fp compare.
* gcc.target/i386/apx-ccmp-2.c: Likewise.
Hongyu Wang [Tue, 9 Apr 2024 08:05:26 +0000 (16:05 +0800)]
[APX CCMP] Adjust strategy for selecting ccmp candidates
For the general ccmp scenario, the tree sequence looks like
_1 = (a < b)
_2 = (c < d)
_3 = _1 & _2
The current ccmp expansion tries to swap the compare order for _1 and _2,
compares the expansion costs cost/cost2 for expanding _1 or _2 first, then
returns the sequence with the lower cost.
It is possible that one expansion succeeds and the other fails.
For example, x86 has int ccmp but not fp ccmp, so a combined fp and
int comparison must be ordered such that the fp comparison happens
first. The costs are not meaningful for failed expansions.
Check the expand_ccmp_next results ret and ret2, and return the valid one
before cost comparison.
gcc/ChangeLog:
* ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
expand_ccmp_next, returns the valid one first instead of
comparing cost.
Hongyu Wang [Wed, 27 Mar 2024 02:13:06 +0000 (10:13 +0800)]
[APX CCMP] Support APX CCMP
The APX CCMP feature implements a conditional compare, which executes the
compare when EFLAGS matches a certain condition.
CCMP introduces a default flags value (dfv): when the conditional compare
does not execute, it directly sets the flags according to dfv.
The instruction looks like
ccmpeq {dfv=sf,of,cf,zf} %rax, %r16
This instruction tests whether EFLAGS matches the condition
code EQ. If yes, it compares %rax and %r16 like a legacy cmp. If no,
EFLAGS is updated according to dfv, which here means SF, OF, CF and ZF are
set. PF will be set according to CF in dfv, and AF will always be
cleared.
The dfv part can be any combination of sf,of,cf,zf, like {dfv=cf,zf}, which
sets only CF and ZF and clears the others, or {dfv=}, which clears all of them.
To enable CCMP, we implemented the target hooks TARGET_GEN_CCMP_FIRST and
TARGET_GEN_CCMP_NEXT to reuse the current ccmp infrastructure. We also
extended the cstorem4 optab to support storing different CCmodes to fit
the current ccmp infrastructure.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_gen_ccmp_first): New function
that tests if the first compare can be generated.
(ix86_gen_ccmp_next): New function to emit a single compare and ccmp
sequence.
* config/i386/i386-opts.h (enum apx_features): Add apx_ccmp.
* config/i386/i386-protos.h (ix86_gen_ccmp_first): New proto
declare.
(ix86_gen_ccmp_next): Likewise.
(ix86_get_flags_cc): Likewise.
* config/i386/i386.cc (ix86_flags_cc): New enum.
(ix86_ccmp_dfv_mapping): New string array to map conditional
code to dfv.
(ix86_print_operand): Handle special dfv flag for CCMP.
(ix86_get_flags_cc): New function to return x86 CC enum.
(TARGET_GEN_CCMP_FIRST): Define.
(TARGET_GEN_CCMP_NEXT): Likewise.
* config/i386/i386.h (TARGET_APX_CCMP): Define.
* config/i386/i386.md (@ccmp<mode>): New define_insn to support
ccmp.
(UNSPEC_APX_DFV): New unspec for ccmp dfv.
(ALL_CC): New mode iterator.
(cstorecc4): Change to ...
(cstore<mode>4) ... this, use ALL_CC to loop through all
available CCmodes.
* config/i386/i386.opt (apx_ccmp): Add enum value for ccmp.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-ccmp-1.c: New compile test.
* gcc.target/i386/apx-ccmp-2.c: New runtime test.
Hongyu Wang [Thu, 6 Jun 2024 05:00:26 +0000 (13:00 +0800)]
[APX] Adjust target-support check [PR 115341]
The current target apxf check does not specify the sub-features that the
assembler supports, so the check with older binutils will fail at the assemble
stage for new apx features like NF, CCMP or CFCMOV. Adjust the assembler check
for all apx subfeatures.
gcc/testsuite/ChangeLog:
PR target/115341
* lib/target-supports.exp (check_effective_target_apxf):
Check for all apx sub-features.
Richard Biener [Tue, 5 Mar 2024 14:46:24 +0000 (15:46 +0100)]
Allow single-lane SLP in-order reductions
The single-lane case isn't different from non-SLP, no re-association
implied. But the transform stage cannot handle a conditional reduction
op which isn't checked during analysis - this makes it work, exercised
with a single-lane non-reduction-chain by gcc.target/i386/pr112464.c
Alexandre Oliva [Thu, 6 Jun 2024 01:43:54 +0000 (22:43 -0300)]
[libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__
A proprietary embedded operating system that uses clang as its primary
compiler ships headers that require __clang__ to be defined. Defining
that macro causes libstdc++ to adopt workarounds that work for clang
but that break for GCC.
So, introduce a _GLIBCXX_CLANG macro, and a convention to test for it
rather than for __clang__, so that a GCC variant that adds -D__clang__
to satisfy system headers can also -D_GLIBCXX_CLANG=0 to avoid
workarounds that are not meant for GCC.
I've left fast_float and ryu files alone, their tests for __clang__
don't seem to be harmful for GCC, they don't include bits/c++config,
and patching such third-party files would just make trouble for
updating them without visible benefit. pstl_config.h, though also
imported, required adjustment.
Adjust rtx_cost for MEM to enable more simplification
For CONST_VECTOR_DUPLICATE_P in constant_pool, it is just broadcast or
variants in ix86_vector_duplicate_simode_const.
Adjust the cost to COSTS_N_INSNS (2) + speed which should be a little
bit larger than broadcast.
Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.
When mask is ((1 << (prec - imm)) - 1), which is used to clear the upper bits
of A, the expression can be simplified to LSHIFTRT.
i.e. simplify
(and:v8hi
(ashiftrt:v8hi A 8)
(const_vector 0xff x8))
to
(lshiftrt:v8hi A 8)
gcc/ChangeLog:
PR target/114428
* simplify-rtx.cc
(simplify_context::simplify_binary_operation_1):
Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
specific mask.