Juzhe-Zhong [Mon, 22 May 2023 14:05:18 +0000 (22:05 +0800)]
RISC-V: Add "m_" prefix for private member
The current framework is hard to maintain and hard to use for
possible future auto-vectorization patterns.
We will need to keep adding more helpers and arguments while adding
auto-vectorization support. We should refactor the framework
now for future use, since we don't support many
auto-vectorization
patterns for now.
Start with this simple patch, which adds the "m_" prefix to
private members.
Juzhe-Zhong [Mon, 22 May 2023 10:38:26 +0000 (18:38 +0800)]
RISC-V: Fix typo of multiple_rgroup-2.h
Just noticed the following failures in the regression run:
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c (test for
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c (test
for excess errors)
Eric Botcazou [Fri, 10 Feb 2023 18:07:33 +0000 (19:07 +0100)]
ada: Small cleanup in support for protected subprograms
This moves the propagation of the Uses_Sec_Stack flag, from the original to
the rewritten subprogram, to the point where the latter is expanded, along
with the propagation of the Has_Nested_Subprogram flag, as well as addresses
a ??? comment in the same block of code. No functional changes.
gcc/ada/
* inline.adb (Cleanup_Scopes): Do not propagate the Uses_Sec_Stack
flag from original to rewritten protected subprograms here...
* exp_ch9.adb (Expand_N_Protected_Body) <N_Subprogram_Body>:
...but here instead. Add local variables and remove a useless
test.
Piotr Trojanek [Fri, 10 Feb 2023 15:28:41 +0000 (16:28 +0100)]
ada: Fix source location for crashes in expanded Loop_Entry attributes
Historically, Loop_Entry attributes were expanded while expanding their
corresponding loops, so it was easier to use location of these loops for
expanded code. Now, these attributes are expanded where they appear, so
we can easily use the location of the attribute reference for expanded
code.
This matters when there is a crash in the expanded code, e.g. because of
a stack overflow in the declaration of a constant object that captures
the Loop_Entry prefix. Now backtrace will point to the source location
of the attribute, which is more helpful than the location of the loop.
gcc/ada/
* exp_attr.adb (Expand_Loop_Entry_Attribute): Use location of the
attribute reference, not of the loop statement.
Justin Squirek [Thu, 9 Feb 2023 17:00:46 +0000 (17:00 +0000)]
ada: Incorrect constant folding in postcondition involving 'Old
The following patch fixes an issue in the compiler whereby certain flavors of
access comparisons may be incorrectly constant-folded out of contract
expressions - notably in postcondition expressions featuring a reference to
'Old.
gcc/ada/
* checks.adb (Install_Null_Excluding_Check): Avoid non-null
optimizations when assertions are enabled.
Marc Poulhiès [Thu, 9 Feb 2023 08:36:14 +0000 (09:36 +0100)]
ada: Fix crash caused by incorrect expansion of iterated component
The way iterated components are expanded could lead to an inconsistent tree.
This change fixes 2 issues:
- in an early step during Pre_Analyze, the loop variable still has
Any_Type and the compiler must not emit an error. A later full Analyze
is supposed to correctly set the Etype, and only then should the
compiler emit an error if Any_Type is still used.
- when expanding into a loop with assignment statements, the expression
is analyzed in an early context (where the loop variable still has
Any_Type Etype) and then copied. The compiler would crash because this
Any_Type is never changed because the expression node has its Analyzed
flag set. Resetting the flag ensures the later Analyze call also
analyzes these nodes and sets Etype correctly.
gcc/ada/
* exp_aggr.adb (Process_Transient_Component): Reset Analyzed flag
for the copy of the initialization expression.
* sem_attr.adb (Validate_Non_Static_Attribute_Function_Call): Skip
error emission during Pre_Analyze.
Eric Botcazou [Thu, 9 Feb 2023 15:05:16 +0000 (16:05 +0100)]
ada: Fix missing finalization in separate package body
This directly comes from a loophole in the implementation.
gcc/ada/
* exp_ch7.adb (Process_Package_Body): New procedure taken from...
(Build_Finalizer.Process_Declarations): ...here. Call the above
procedure to deal with both package bodies and package body stubs.
The concept of extended nodes was retired with the introduction of
variable-sized node types, but a reference to that concept was left
over in a comment. This change removes that reference.
Eric Botcazou [Wed, 8 Feb 2023 21:16:23 +0000 (22:16 +0100)]
ada: Fix missing finalization in library-unit instance spec
This fixes the missing finalization of objects declared in the spec of
package instances that are library units (and only them, i.e. not all
library-level package instances) when the instances have a package body.
The finalization is done when there is no package body, and supporting
this case precisely broke the other case because of a thinko or a typo.
This also requires a small adjustment to the routine writing ALI files.
gcc/ada/
* exp_ch7.adb (Build_Finalizer): Reverse the test comparing the
instantiation and declaration nodes of a package instance, and
therefore bail out only when they are equal. Adjust comments.
(Expand_N_Package_Declaration): Do not clear the Finalizer field.
* lib-writ.adb: Add with and use clauses for Sem_Util.
(Write_Unit_Information): Look at unit nodes to find finalizers.
* sem_ch12.adb (Analyze_Package_Instantiation): Beef up the comment
about the rewriting of the instantiation node into a declaration.
Eric Botcazou [Wed, 8 Feb 2023 15:26:46 +0000 (16:26 +0100)]
ada: Fix spurious freezing error on nonabstract null extension
This prevents the wrapper function created for each nonoverridden inherited
function with a controlling result of nonabstract null extensions of tagged
types from causing premature freezing of types referenced in its profile.
gcc/ada/
* exp_ch3.adb (Make_Controlling_Function_Wrappers): Create the body
as the expanded body of an expression function.
Bob Duff [Mon, 6 Feb 2023 16:58:52 +0000 (11:58 -0500)]
ada: Add Is_Past_Self_Hiding_Point flag
This patch adds a flag Is_Past_Self_Hiding_Point. When False,
this will replace E_Void as the indicator for a premature use of
a declaration within itself -- for example, "X : T := X;".
One might think this flag should be called something like
Is_Hidden_From_All_Visibility, reversing the sense of
Is_Past_Self_Hiding_Point. We don't do that because we want
Is_Past_Self_Hiding_Point to be initially False by default (and we have
no mechanism for defaulting to True), and because it doesn't exactly
match the RM definition of "hidden from all visibility" (for
example, for record components).
This is work in progress; more changes are needed before we
can remove all Mutate_Ekind(..., E_Void).
gcc/ada/
* einfo.ads (Is_Past_Self_Hiding_Point): Document.
* gen_il-fields.ads (Is_Past_Self_Hiding_Point): Add to list of
fields.
* gen_il-gen-gen_entities.adb (Is_Past_Self_Hiding_Point): Declare
in all entities.
* exp_aggr.adb: Set Is_Past_Self_Hiding_Point as appropriate.
* sem.adb: Likewise.
* sem_aggr.adb: Likewise.
* sem_ch11.adb: Likewise.
* sem_ch12.adb: Likewise.
* sem_ch5.adb: Likewise.
* sem_ch7.adb: Likewise.
* sem_prag.adb: Likewise.
* sem_ch6.adb: Likewise.
(Set_Formal_Mode): Minor cleanup: Move from spec.
* sem_ch6.ads:
(Set_Formal_Mode): Minor cleanup: Move to body.
* cstand.adb: Call Set_Is_Past_Self_Hiding_Point on all entities
as soon as they are created.
* comperr.adb (Compiler_Abort): Minor cleanup -- use 'in' instead
of 'or else'.
* debug.adb: Minor comment cleanups.
Steve Baird [Fri, 3 Feb 2023 01:33:53 +0000 (17:33 -0800)]
ada: Accept Assert pragmas in expression functions
gcc/ada/
* sem_ch4.adb (Analyze_Expression_With_Actions.Check_Action_Ok):
Accept an executable pragma occurring in a declare expression as
per AI22-0045. This means Assert and Inspection_Point pragmas as
well as any implementation-defined pragmas that the implementation
chooses to categorize as executable. Currently Assume and Debug
are the only such pragmas.
Piotr Trojanek [Mon, 6 Feb 2023 13:45:22 +0000 (14:45 +0100)]
ada: Add warning on frontend inlining of Subprogram_Variant
We already warned when contracts like pre/postcondition appear together
with pragma Inline_Always and they are ignored by the frontend inlining.
For consistency we now also warn for Subprogram_Variant, which is
similarly ignored even though this contract is only meaningful for
recursive subprograms and those can't be inlined anyway (but error about
this might only be emitted when full compilation is done).
gcc/ada/
* sem_prag.adb
(Check_Postcondition_Use_In_Inlined_Subprogram): Mention
Subprogram_Variant in the comment.
(Analyze_Subprogram_Variant_In_Decl_Part): Warn when contract is
ignored because of pragma Inline_Always and frontend inlining.
Piotr Trojanek [Mon, 6 Feb 2023 13:40:26 +0000 (14:40 +0100)]
ada: Fix spurious warning on Inline_Always and contracts
Warnings about pre/postconditions being ignored with Inline_Always were
only true for the obsolete frontend inlining. With the current backend
pre/postconditions work fine with Inline_Always.
gcc/ada/
* sem_prag.adb (Check_Postcondition_Use_In_Inlined_Subprogram): Only
emit warning when frontend inlining is enabled.
Piotr Trojanek [Wed, 1 Feb 2023 12:24:43 +0000 (13:24 +0100)]
ada: Remove redundant protection against empty lists
Calls to List_Length on No_List intentionally return 0 (and likewise
calls to First on No_List intentionally return Empty), so explicit guards
against No_List are unnecessary. Code cleanup; semantics is unaffected.
Routine Is_Actual_Tagged_Parameter was added to detect unsupported SPARK
2005 constructs, but this feature was deconstructed in favor of SPARK
2014 and its SPARK_Mode aspects.
Joffrey Huguet [Mon, 16 Jan 2023 15:44:14 +0000 (16:44 +0100)]
ada: Add contracts to Ada.Strings.Unbounded library
This patch adds contracts to the conversions between
Unbounded_String and String, the Element function and the
equality between two Unbounded_String, or between
Unbounded_String and String.
This patch also disallows the use of a function in SPARK, because
it returns an uninitialized Unbounded_String.
gcc/ada/
* libgnat/a-strunb.ads, libgnat/a-strunb__shared.ads
(To_Unbounded_String): Add postcondition. Add aspect SPARK_Mode
Off on the version that takes a Natural as parameter.
(To_String): Complete postcondition.
(Set_Unbounded_String): Add postcondition.
(Element): Likewise.
("="): Likewise.
Eric Botcazou [Wed, 1 Feb 2023 13:15:19 +0000 (14:15 +0100)]
ada: Implement conversions from Big_Integer to large types
This implements the conversion from Big_Integer to Long_Long_Unsigned on
32-bit platforms and to Long_Long_Long_{Integer,Unsigned} on 64-bit ones.
gcc/ada/
* libgnat/s-genbig.ads (From_Bignum): New overloaded declarations.
* libgnat/s-genbig.adb (LLLI): New subtype.
(LLLI_Is_128): New boolean constant.
(From_Bignum): Change the return type of the signed implementation
to Long_Long_Long_Integer and add support for the case where its
size is 128 bits. Add a wrapper around it for Long_Long_Integer.
Add an unsigned implementation returning Unsigned_128 and a wrapper
around it for Unsigned_64.
(To_Bignum): Test LLLI_Is_128 instead of its size.
(To_String.Image): Add qualification to calls to From_Bignum.
* libgnat/a-nbnbin.adb (To_Big_Integer): Likewise.
(Signed_Conversions.From_Big_Integer): Likewise.
(Unsigned_Conversions): Likewise.
Eric Botcazou [Wed, 1 Feb 2023 11:35:08 +0000 (12:35 +0100)]
ada: Fix error and crash on imported function with precondition and 'Base
This fixes a spurious error on an imported function with a precondition
and a parameter declared with a 'Base formal type, and even a crash in
the case where this function is declared in a generic package.
gcc/ada/
* freeze.adb (Wrap_Imported_Subprogram): Use Copy_Subprogram_Spec
to copy the spec from the subprogram to the generated subprogram
body.
(Freeze_Entity): Do not wrap imported subprograms inside generics.
Steve Baird [Tue, 31 Jan 2023 01:05:13 +0000 (17:05 -0800)]
ada: Reject illegal declarations in expression functions
gcc/ada/
* sem_ch4.adb (Analyze_Expression_With_Actions.Check_Action_Ok):
If Comes_From_Source (A) is False, then look at Original_Node (A)
instead of A. In particular, if an (illegal) expression function
is transformed into a "vanilla" function, we don't want to allow
it just because Comes_From_Source is now False.
Steve Baird [Mon, 30 Jan 2023 23:41:48 +0000 (15:41 -0800)]
ada: Better error message if non-Ada2022 code declares No_Return function
When a feature that is legal in Ada2022 but not in earlier Ada versions
is used, we typically want to call Error_Msg_Ada_2022_Feature in order to
generate an informative message in the error case. Specifying No_Return
for a function (as opposed to a procedure) is no exception to this rule.
gcc/ada/
* sem_prag.adb (Analyze_Pragma): In Check_No_Return, call
Error_Msg_Ada_2022_Feature in the case of a function. Remove code
outside of Check_No_Return that was querying Ada_Version.
Piotr Trojanek [Tue, 31 Jan 2023 12:45:23 +0000 (13:45 +0100)]
ada: Fix traversal for the rightmost node of a pretty-printed expression
When getting the rightmost node of a pretty-printed expression we
incorrectly traversed some composite nodes, which caused the expression
image to be chopped.
gcc/ada/
* pprint.adb (Expression_Image): Reduce scope of local variables; inline
local uncommented constant From_Source; concatenate string with a single
character, as it is likely to execute faster; add missing cases to
traversal for the rightmost node and assertion to demonstrate that the
??? comment is no longer relevant.
Piotr Trojanek [Mon, 30 Jan 2023 11:21:24 +0000 (12:21 +0100)]
ada: Restrict expression pretty-printer to subexpressions
When pretty-printing expressions with CASE alternatives we can qualify
the call to Nkind using N_Subexpr, so that we will get compile-time
errors when new node kinds are added (e.g. Ada 2022 case expressions).
gcc/ada/
* pprint.adb (Expr_Name): Qualify CASE expression with N_Subexpr; add
missing alternative for N_Raise_Storage_Error; remove dead alternatives;
explicitly list unsupported alternatives.
Piotr Trojanek [Fri, 27 Jan 2023 11:37:25 +0000 (12:37 +0100)]
ada: Update Controlling_Argument when copying trees
When copying the AST we need to update fields that carry semantic
meaning and not just copy them. We already updated some of them,
e.g. the First/Next_Named_Association chain, but failed to update
the Controlling_Argument.
This fix doesn't appear to change anything for the compiler, but it is
needed for GNATprove, where we no longer want to expand expression
functions and instead we want to copy their preanalyzed expressions.
gcc/ada/
* sem_util.ads (New_Copy_Tree): Update comment.
* sem_util.adb (New_Copy_Tree): Update Controlling_Argument, very
much like we update the First/Next_Named_Association.
Bob Duff [Mon, 30 Jan 2023 21:56:08 +0000 (16:56 -0500)]
ada: update Ada_Version_Type in fe.h to match opt.ads
Remove Ada_With_Extensions, which is not used on the C side.
Do not add Ada_With_Core_Extensions and Ada_With_All_Extensions,
which are also not used on the C side, and on the Ada side
are always used via functions All_Extensions_Allowed and
Core_Extensions_Allowed. Explain this in comments.
Move the functions closer to the type declaration,
so the usage style is clearer.
Cleanup only -- no change in compiler behavior.
gcc/ada/
* fe.h: Remove Ada_With_Extensions and add commentary.
* opt.ads: Rearrange code and add commentary.
Bob Duff [Mon, 30 Jan 2023 16:25:08 +0000 (11:25 -0500)]
ada: prevent infinite recursion in Collect_Types_In_Hierarchy
In (illegal) mutually-dependent type declarations, it is possible for
Etype (Etype (Typ)) to point back to Typ. This patch stops the recursion
in such cases.
Ju-Zhe Zhong [Mon, 22 May 2023 08:35:37 +0000 (16:35 +0800)]
VECT: Fix bug of multiple-rgroup for length is counting elements
Address comments from Richard by splitting out the fix for
multiple-rgroup handling when the length counts elements.
This patch fixes the handling of multiple rgroups when the length
counts elements.
Before this patch, the multiple-rgroup run tests fail:
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c
execution test
After this patch, all of these tests pass.
gcc/ChangeLog:
* tree-vect-loop.cc (vect_get_loop_len): Fix issue for
multiple-rgroup of length.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (vect_get_loop_len): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c: New
test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h: New
test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c: New
test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h: New
test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c:
New test.
Kewen Lin [Mon, 22 May 2023 02:19:02 +0000 (21:19 -0500)]
vect: Refactor code for index == count in vect_transform_slp_perm_load_1
This patch refactors the handling of the case (index
== count) in a loop of vect_transform_slp_perm_load_1, in
order to prepare for a subsequent adjustment of *n_perms. This
patch doesn't make any functional changes.
Basically, it rewrites the two ifs below:
  if (index == count && !noop_p)
    {
      // A ...
      // ++*n_perms;
    }
  if (index == count)
    {
      if (!analyze_only)
        {
          if (!noop_p)
            // B1 ...
          // B2 ...
          for ...
            {
              if (!noop_p)
                // B3 building VEC_PERM_EXPR
              else
                // B4 building nothing (no uses for B2 and its seq)
            }
        }
      // B5
    }
into one hunk below:
  if (index == count)
    {
      if (!noop_p)
        {
          // A ...
          // ++*n_perms;
          if (!analyze_only)
            {
              // B1 ...
              // B2 ...
              for ...
                // B3 building VEC_PERM_EXPR
            }
        }
      else if (!analyze_only)
        {
          // no B2 since it has no further uses here.
          for ...
            // B4 building nothing
        }
      // B5 ...
    }
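The equivalence of the two shapes can be checked mechanically. The following is a minimal sketch, not the GCC code: the steps A and B1 to B5 are replaced by hypothetical flag bits, `at_count` stands for `index == count`, and the `for ...` loop is assumed to run exactly once. B2 is ignored in the no-op case, where the message says its result has no uses.

```c
#include <stdbool.h>

/* Hypothetical markers standing in for the steps named in the message.  */
enum
{
  STEP_A = 1, STEP_B1 = 2, STEP_B2 = 4, STEP_B3 = 8, STEP_B4 = 16, STEP_B5 = 32
};

/* Shape before the refactoring: two separate "index == count" tests.  */
unsigned
steps_before (bool at_count, bool noop_p, bool analyze_only)
{
  unsigned done = 0;
  if (at_count && !noop_p)
    done |= STEP_A;
  if (at_count)
    {
      if (!analyze_only)
        {
          if (!noop_p)
            done |= STEP_B1;
          done |= STEP_B2;
          /* Loop body, assumed to run once in this sketch.  */
          if (!noop_p)
            done |= STEP_B3;
          else
            done |= STEP_B4;
        }
      done |= STEP_B5;
    }
  return done;
}

/* Shape after the refactoring: a single "index == count" hunk.  */
unsigned
steps_after (bool at_count, bool noop_p, bool analyze_only)
{
  unsigned done = 0;
  if (at_count)
    {
      if (!noop_p)
        {
          done |= STEP_A;
          if (!analyze_only)
            done |= STEP_B1 | STEP_B2 | STEP_B3;
        }
      else if (!analyze_only)
        done |= STEP_B4;  /* B2 dropped: its result is unused here.  */
      done |= STEP_B5;
    }
  return done;
}

/* Return true when both shapes agree for all eight input combinations,
   ignoring B2 when noop_p is set.  */
bool
shapes_agree (void)
{
  for (int i = 0; i < 8; i++)
    {
      bool at_count = (i & 1) != 0;
      bool noop_p = (i & 2) != 0;
      bool analyze_only = (i & 4) != 0;
      unsigned before = steps_before (at_count, noop_p, analyze_only);
      unsigned after = steps_after (at_count, noop_p, analyze_only);
      if (noop_p)
        before &= ~(unsigned) STEP_B2;
      if (before != after)
        return false;
    }
  return true;
}
```

Calling `shapes_agree ()` walks all combinations of the three conditions, which is enough here because the control flow depends on nothing else.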
gcc/ChangeLog:
* tree-vect-slp.cc (vect_transform_slp_perm_load_1): Refactor the
handling for the case index == count.
Tobias Burnus [Sun, 21 May 2023 18:36:19 +0000 (20:36 +0200)]
libgomp: Honor OpenMP's nteams-var ICV as upper limit on num teams [PR109875]
The nteams-var ICV exists per device and can be set either via the routine
omp_set_num_teams or as environment variable (OMP_NUM_TEAMS with optional
_ALL/_DEV/_DEV_<num> suffix); it is default-initialized to zero. The number
of teams created is described under the num_teams clause. If the clause is
absent, the number of teams is implementation defined but at least
one team must exist and, if nteams-var is positive, at most nteams-var
teams may exist.
The latter condition was not honored in a target region before this
commit, such that too many teams were created.
Already before this commit, both the num_teams([lower:]upper) clause
(on the host and in target regions) and, only on the host, the nteams-var
ICV were honored. And as only one team is created for host fallback,
unless the clause specifies otherwise, the nteams-var ICV was and is
effectively honored.
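The rule described above can be sketched as a small selection function. This is an illustration only, not the libgomp implementation: `choose_num_teams` and its parameters are hypothetical names, an absent num_teams clause and an unset ICV are both modeled as 0, and the clause is assumed to take precedence over the ICV.

```c
/* Hypothetical helper, not the libgomp code.  NUM_TEAMS_UPPER is the upper
   bound from a num_teams clause (0 when the clause is absent), NTEAMS_VAR is
   the nteams-var ICV (0 when unset) and DEVICE_DEFAULT is the
   implementation-defined team count the device would otherwise use.  */
unsigned
choose_num_teams (unsigned num_teams_upper, unsigned nteams_var,
                  unsigned device_default)
{
  /* A clause, when present, decides the team count in this sketch.  */
  if (num_teams_upper != 0)
    return num_teams_upper;

  /* Without a clause, at least one team must exist.  */
  unsigned n = device_default != 0 ? device_default : 1;

  /* A positive nteams-var is an upper limit on the team count.  */
  if (nteams_var != 0 && n > nteams_var)
    n = nteams_var;
  return n;
}
```

With a device default of 120 teams and nteams-var set to 4, the sketch yields 4 teams; with nteams-var left at zero it yields the default of 120.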
libgomp/ChangeLog:
PR libgomp/109875
* config/gcn/target.c (GOMP_teams4): Honor nteams-var ICV.
* config/nvptx/target.c (GOMP_teams4): Likewise.
* testsuite/libgomp.c-c++-common/teams-nteams-icv-1.c: New test.
* testsuite/libgomp.c-c++-common/teams-nteams-icv-2.c: New test.
* testsuite/libgomp.c-c++-common/teams-nteams-icv-3.c: New test.
* testsuite/libgomp.c-c++-common/teams-nteams-icv-4.c: New test.
Georg-Johann Lay [Sun, 21 May 2023 16:54:21 +0000 (18:54 +0200)]
target/90622: __builtin_avr_insert_bits: Use BLD/BST for one bit in place.
If just one bit is inserted in the same position, as with:
__builtin_avr_insert_bits (0xFFFFF2FF, src, dst);
a BLD/BST sequence is better than XOR/AND/XOR. Thus, don't fold that
case to the latter sequence.
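For readers without an AVR toolchain, the nibble-map semantics can be modeled in portable C. This is a sketch under assumptions: `insert_bits_model` is a hypothetical stand-in that follows the documented behavior of the builtin (nibble 0xf keeps the bit of the last argument, nibbles 0 to 7 select a bit of the second argument), and result bits that the builtin leaves undefined are simply set to 0. The exact XOR/AND/XOR expression is likewise an assumed form of the folded sequence.

```c
#include <stdint.h>

/* Portable model of the nibble map: nibble n of MAP decides result bit n.  */
uint8_t
insert_bits_model (uint32_t map, uint8_t bits, uint8_t val)
{
  uint8_t result = 0;
  for (int n = 0; n < 8; n++)
    {
      unsigned x = (map >> (4 * n)) & 0xf;
      unsigned bit;
      if (x == 0xf)
        bit = (val >> n) & 1u;   /* keep bit n of val */
      else if (x < 8)
        bit = (bits >> x) & 1u;  /* take bit x of bits */
      else
        bit = 0;                 /* undefined in the builtin; 0 in this model */
      result |= (uint8_t) (bit << n);
    }
  return result;
}

/* Assumed XOR/AND/XOR form for the map 0xFFFFF2FF: copy bit 2 of SRC into
   DST and leave all other bits of DST alone.  */
uint8_t
insert_bit2_xor_and_xor (uint8_t src, uint8_t dst)
{
  return (uint8_t) (((src ^ dst) & 0x04) ^ dst);
}
```

For the map 0xFFFFF2FF only nibble 2 differs from 0xf and it holds the value 2, so bit 2 of the source lands in bit 2 of the destination, which is the single-bit, same-position case the patch leaves to BLD/BST.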
gcc/
PR target/90622
* config/avr/avr.cc (avr_fold_builtin) [AVR_BUILTIN_INSERT_BITS]:
Don't fold to XOR / AND / XOR if just one bit is copied to the
same position.
Roger Sayle [Sun, 21 May 2023 14:06:52 +0000 (15:06 +0100)]
nvptx: Add support for __builtin_nvptx_brev intrinsic.
This patch adds support for (a pair of) bit reversal intrinsics
__builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
and 64-bit bit reversal (using nvptx's brev instruction) matching
the __brev and __brevll intrinsics provided by NVidia's nvcc compiler.
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html
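As a reference for what a bit reversal computes, here is a portable C sketch. The functions `bit_reverse32` and `bit_reverse64` are hypothetical helpers written for illustration; they are not the builtins and do not use the brev instruction, but they produce the value a 32-bit or 64-bit bit reversal is expected to return.

```c
#include <stdint.h>

/* Reverse the order of the 32 bits of X: bit 0 becomes bit 31 and so on.  */
uint32_t
bit_reverse32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      r = (r << 1) | (x & 1u);  /* append the lowest remaining bit of x */
      x >>= 1;
    }
  return r;
}

/* Same operation on 64 bits.  */
uint64_t
bit_reverse64 (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    {
      r = (r << 1) | (x & 1u);
      x >>= 1;
    }
  return r;
}
```

Reversing twice gives back the input, which makes a convenient sanity check when comparing a target-specific implementation against this loop.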
2023-05-21 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/nvptx/nvptx.cc (nvptx_expand_brev): Expand target
builtin for bit reversal using brev instruction.
(enum nvptx_builtins): Add NVPTX_BUILTIN_BREV and
NVPTX_BUILTIN_BREVLL.
(nvptx_init_builtins): Define "brev" and "brevll".
(nvptx_expand_builtin): Expand NVPTX_BUILTIN_BREV and
NVPTX_BUILTIN_BREVLL via nvptx_expand_brev function.
* doc/extend.texi (Nvidia PTX Built-in Functions): New
section, document __builtin_nvptx_brev{,ll}.
gcc/testsuite/ChangeLog
* gcc.target/nvptx/brev-1.c: New 32-bit test case.
* gcc.target/nvptx/brev-2.c: Likewise.
* gcc.target/nvptx/brevll-1.c: New 64-bit test case.
* gcc.target/nvptx/brevll-2.c: Likewise.
Jakub Jelinek [Sun, 21 May 2023 11:36:56 +0000 (13:36 +0200)]
match.pd: Ensure (op CONSTANT_CLASS_P CONSTANT_CLASS_P) is simplified [PR109505]
On the following testcase we hang, because POLY_INT_CST is CONSTANT_CLASS_P,
but BIT_AND_EXPR with it and INTEGER_CST doesn't simplify and the
(x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2)
simplification actually relies on the (CST1 & CST2) simplification,
otherwise it is a deoptimization, trading 2 ops for 3 and furthermore
running into
/* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
operands are another bit-wise operation with a common input. If so,
distribute the bit operations to save an operation and possibly two if
constants are involved. For example, convert
(A | B) & (A | C) into A | (B & C)
Further simplification will occur if B and C are constants. */
simplification which simplifies that
(x & CST2) | (CST1 & CST2) back to
CST2 & (x | CST1).
I went through all other places I could find where we have a simplification
with 2 CONSTANT_CLASS_P operands and perform some operation on those two,
while the other spots aren't that severe (just trade 2 operations for
another 2 if the two constants don't simplify, rather than as in the above
case trading 2 ops for 3), I still think all those spots really intend
to optimize only if the 2 constants simplify.
So, the following patch adds to those a ! modifier to ensure that,
even at GENERIC that modifier means !EXPR_P which is exactly what we want
IMHO.
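The operation counts behind this argument can be seen in plain C. The sketch below uses hypothetical function names and ordinary integer constants; it only illustrates the identity and is not match.pd syntax. The two forms always agree in value, but the distributed form saves an operation only when `cst1 & cst2` folds to a single constant at compile time.

```c
#include <stdint.h>

/* Original form: two operations on x.  */
uint32_t
or_then_and (uint32_t x, uint32_t cst1, uint32_t cst2)
{
  return (x | cst1) & cst2;
}

/* Distributed form: three operations as written.  If cst1 & cst2 can be
   folded to one constant, only two remain; if it cannot (as with the
   POLY_INT_CST operand described above), the rewrite makes things worse.  */
uint32_t
distributed (uint32_t x, uint32_t cst1, uint32_t cst2)
{
  return (x & cst2) | (cst1 & cst2);
}
```

For example, with x = 0xF0F0, cst1 = 0x00FF and cst2 = 0x0FF0 both functions return 0x00F0.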
2023-05-21 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109505
* match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2),
Combine successive equal operations with constants,
(A +- CST1) +- CST2 -> A + CST3, (CST1 - A) +- CST2 -> CST3 - A,
CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2 CONSTANT_CLASS_P
operands.
Andrew Pinski [Sun, 21 May 2023 04:01:46 +0000 (21:01 -0700)]
Fix expand_single_bit_test for big-endian
I had thought extract_bit_field bitpos argument was the shifted position
and not the bitposition like BIT_FIELD_REF so I had removed the code which
would use the correct bitposition for BYTES_BIG_ENDIAN.
Committed as obvious; I checked big-endian MIPS to make sure we are now
producing the correct code.
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Correct bitpos for big-endian.
These APIs help the users to convert vector LMUL=1 integer to
vbool[2-64]_t. According to the RVV intrinsic SPEC as below,
the reinterpret intrinsics only change the types of the underlying
contents.
For example, given the code below:
vbool64_t test_vreinterpret_v_u8m1_b64 (vuint8m1_t src) {
return __riscv_vreinterpret_v_u8m1_b64 (src);
}
It will generate assembly code similar to the following:
vsetvli a5,zero,e8,mf8,ta,ma
vlm.v v1,0(a1)
vsm.v v1,0(a0)
ret
Please NOTE that the test files do not cover all the possible combinations
of the intrinsic APIs introduced by this PATCH because there are too many.
The reinterpret from vbool*_t to v{u}int*_t with lmul=1 will be covered
in another PATCH.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (BOOL_SIZE_LIST): Add the
rest bool size, aka 2, 4, 8, 16, 32, 64.
* config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
Register vbool[2|4|8|16|32|64] interpret function.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_BOOL2_INTERPRET_OPS):
New macro for vbool2_t.
(DEF_RVV_BOOL4_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL8_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL16_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL32_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL64_INTERPRET_OPS): Likewise.
(vint8m1_t): Add the type to bool[2|4|8|16|32|64]_interpret_ops.
(vint16m1_t): Likewise.
(vint32m1_t): Likewise.
(vint64m1_t): Likewise.
(vuint8m1_t): Likewise.
(vuint16m1_t): Likewise.
(vuint32m1_t): Likewise.
(vuint64m1_t): Likewise.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_BOOL2_INTERPRET_OPS):
New macro for vbool2_t.
(DEF_RVV_BOOL4_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL8_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL16_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL32_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL64_INTERPRET_OPS): Likewise.
(required_extensions_p): Add vbool[2|4|8|16|32|64] interpret case.
* config/riscv/riscv-vector-builtins.def (bool2_interpret): Add
vbool2_t interpret to base type.
(bool4_interpret): Likewise.
(bool8_interpret): Likewise.
(bool16_interpret): Likewise.
(bool32_interpret): Likewise.
(bool64_interpret): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: Add
test cases for vbool[2|4|8|16|32|64]_t.
Andrew Pinski [Sat, 20 May 2023 21:14:23 +0000 (21:14 +0000)]
Fix PR 109919: ICE in emit_move_insn with some bit tests
The problem is I used expand_expr with the target but
we don't want to use the target here as it is the wrong
mode for the original expression. The testcase would ICE
deep down while trying to do a move to use the target.
Anyways just calling expand_expr with NULL_EXPR fixes
the issue.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR middle-end/109919
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Don't use the
target for expand_expr.
Pan Li [Fri, 19 May 2023 23:49:00 +0000 (07:49 +0800)]
Mode-Switching: Fix local array maybe uninitialized warning
There are 2 local arrays in function optimize_mode_switching. They are
initialized conditionally at the beginning but then always consumed in
another loop. This may trigger the maybe-uninitialized warning, and may
result in a build failure when -Werror (warnings as errors) is enabled.
This patch initializes the local arrays to zero explicitly at
declaration.
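The pattern can be shown with a reduced example. This is a sketch with hypothetical names (`N_ENTITIES` with an arbitrary size of 4, `sum_enabled`), not the code from mode-switching.cc: an array is filled conditionally in one loop and read unconditionally in a second loop, so the initializer at the declaration is what makes every read well defined.

```c
#define N_ENTITIES 4  /* hypothetical size, chosen for the example */

/* Fill MAP only for enabled entries, then read every element.  */
int
sum_enabled (const int enabled[N_ENTITIES])
{
  int map[N_ENTITIES] = { 0 };  /* zero-initialize at the declaration */
  int n = 0;

  for (int e = 0; e < N_ENTITIES; e++)
    if (enabled[e])
      map[n++] = e + 1;         /* conditional initialization */

  int sum = 0;
  for (int j = 0; j < N_ENTITIES; j++)
    sum += map[j];              /* unconditional read of all elements */
  return sum;
}
```

Without the `= { 0 }` initializer the second loop could read elements the first loop never wrote, which is the situation a maybe-uninitialized diagnostic is meant to flag.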
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* mode-switching.cc (entity_map): Initialize the array to zero.
(bb_info): Ditto.
Triffid Hunter [Sat, 20 May 2023 05:50:00 +0000 (07:50 +0200)]
target/105753: Fix ICE in add_clobbers due to extra PARALLEL in insn.
This patch removes the superfluous parallel in [u]divmod patterns in
the AVR backend. Effect of extra parallel is that add_clobbers reaches
gcc_unreachable() because the clobbers for [u]divmod are missing.
If an insn has multiple parts like clobbers, the parallel around the
parts of the insn pattern is implicit.
gcc/
PR target/105753
* config/avr/avr.md (divmodpsi, udivmodpsi, divmodsi, udivmodsi):
Remove superfluous "parallel" in insn pattern.
([u]divmod<mode>4): Tidy code. Use gcc_unreachable() instead of
printing error text to assembly.
gcc/testsuite/
PR target/105753
* gcc.target/avr/torture/pr105753.c: New test.
Andrew Pinski [Fri, 19 May 2023 22:09:04 +0000 (22:09 +0000)]
Expand directly for single bit test
Instead of creating trees for the expansion,
just expand directly, which makes the code a little simpler
and also reduces how much GC memory is used during the expansion.
gcc/ChangeLog:
* expr.cc (fold_single_bit_test): Rename to ...
(expand_single_bit_test): This and expand directly.
(do_store_flag): Update for the renamed function.
Andrew Pinski [Fri, 19 May 2023 19:44:35 +0000 (19:44 +0000)]
Use BIT_FIELD_REF inside fold_single_bit_test
Instead of depending on combine to do the extraction,
let's create a tree which will expand directly into
the extraction. This improves code generation on some
targets.
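At the source level, the relationship between a single bit test and a one-bit field extraction looks as follows. The helpers are hypothetical and written in portable C for illustration; they model the value being computed, not the trees or RTL that GCC builds, and bit positions use little-endian numbering with `pos` assumed to be below 32.

```c
#include <stdint.h>

/* Single bit test written as a mask and compare: (A & C) != 0.  */
int
test_mask (uint32_t a, unsigned bitnum)
{
  return (a & (UINT32_C (1) << bitnum)) != 0;
}

/* The same test written as shift and and.  */
int
test_shift_and (uint32_t a, unsigned bitnum)
{
  return (int) ((a >> bitnum) & 1u);
}

/* Model of a zero-extended bit-field extraction of WIDTH bits at POS.
   A one-bit field at BITNUM yields the same value as the two tests.  */
uint32_t
extract_field (uint32_t a, unsigned pos, unsigned width)
{
  uint32_t mask = width >= 32 ? UINT32_MAX : (UINT32_C (1) << width) - 1;
  return (a >> pos) & mask;
}
```

All three produce 1 for `a = 0x10` and bit 4, and 0 for `a = 0x08` and bit 4, which is why the extraction can stand in for the shift/and pair.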
gcc/ChangeLog:
* expr.cc (fold_single_bit_test): Use BIT_FIELD_REF
instead of shift/and.
Andrew Pinski [Fri, 19 May 2023 18:52:45 +0000 (18:52 +0000)]
Simplify fold_single_bit_test with respect to code
Since we know that fold_single_bit_test is now only passed
NE_EXPR or EQ_EXPR, we can simplify it and just use a gcc_assert
to assert that is the code that is being passed.
gcc/ChangeLog:
* expr.cc (fold_single_bit_test): Add an assert
and simplify based on code being NE_EXPR or EQ_EXPR.
Andrew Pinski [Fri, 19 May 2023 18:36:39 +0000 (18:36 +0000)]
Simplify fold_single_bit_test slightly
Now that the only use of fold_single_bit_test is in do_store_flag,
we can change it to take the inner arg and bitnum
instead of building a tree. There are no code generation changes
due to this change, only a decrease in the GC memory that is produced
during expansion.
gcc/ChangeLog:
* expr.cc (fold_single_bit_test): Take inner and bitnum
instead of arg0 and arg1. Update the code.
(do_store_flag): Don't create a tree when calling
fold_single_bit_test; instead, just call it with the bitnum
and the inner tree.
Andrew Pinski [Fri, 19 May 2023 18:21:59 +0000 (18:21 +0000)]
Use get_def_for_expr in fold_single_bit_test
The code in fold_single_bit_test checks if
the inner was a right shift and improves the bitnum
based on that. But since the inner will always be an
SSA_NAME at this point, the code is dead. Move it over
to use the helper function get_def_for_expr instead.
gcc/ChangeLog:
* expr.cc (fold_single_bit_test): Use get_def_for_expr
instead of checking the inner's code.
Andrew Pinski [Fri, 19 May 2023 17:47:14 +0000 (17:47 +0000)]
Inline and simplify fold_single_bit_test_into_sign_test into fold_single_bit_test
Since the last use of fold_single_bit_test_into_sign_test is fold_single_bit_test,
we can inline it and even simplify the inlined version. This has
no behavior change.
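For context, the sign-test variant covers the case where the tested bit C in `(A & C) != 0` is the sign bit of A, in which case the test can be written as `A < 0`. A minimal C sketch of that equivalence for 32-bit values, with hypothetical helper names:

```c
#include <stdint.h>

/* Test the sign bit with a mask: (A & C) != 0 with C = 0x80000000.  */
int
sign_bit_set_mask (int32_t a)
{
  return ((uint32_t) a & UINT32_C (0x80000000)) != 0;
}

/* The same test as a signed comparison: A < 0.  */
int
sign_bit_set_compare (int32_t a)
{
  return a < 0;
}
```

Both functions return 1 for every negative input and 0 otherwise; the `== 0` form corresponds to `A >= 0` in the same way.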
gcc/ChangeLog:
* expr.cc (fold_single_bit_test_into_sign_test): Inline into ...
(fold_single_bit_test): This and simplify.
Andrew Pinski [Fri, 19 May 2023 16:20:23 +0000 (16:20 +0000)]
Move fold_single_bit_test to expr.cc from fold-const.cc
This is part 1 of an N-patch set that will change the expansion
of `(A & C) != 0` from using trees to directly expanding so later
on we can do some cost analysis.
Since the only user of fold_single_bit_test is now
expand, move it there.
gcc/ChangeLog:
* fold-const.cc (fold_single_bit_test_into_sign_test): Move to
expr.cc.
(fold_single_bit_test): Likewise.
* expr.cc (fold_single_bit_test_into_sign_test): Move from fold-const.cc
(fold_single_bit_test): Likewise and make static.
* fold-const.h (fold_single_bit_test): Remove declaration.
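For readers unfamiliar with the `(A & C) != 0` form, here is a minimal sketch of the equivalent source-level forms of a single-bit test. It is an illustration only, not the expander's code; it assumes a 32-bit operand, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask form: (A & C) != 0, where C has exactly one bit set. */
bool bit_test_mask(uint32_t a, uint32_t c)
{
    assert(c != 0 && (c & (c - 1)) == 0);   /* C must be a power of two */
    return (a & c) != 0;
}

/* Shift form: move the tested bit down to bit 0 and mask it. */
bool bit_test_shift(uint32_t a, unsigned bitnum)
{
    assert(bitnum < 32);
    return ((a >> bitnum) & 1u) != 0;
}

/* Sign form: when C is the top bit, the test is a signed "less than zero".
   The unsigned-to-signed conversion is implementation-defined in ISO C;
   GCC and Clang both wrap modulo 2^32. */
bool bit_test_sign(uint32_t a)
{
    return (int32_t)a < 0;
}
```

Which of these forms is cheapest depends on the target, which is the kind of question a cost analysis at expansion time can answer.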
Die Li [Sat, 20 May 2023 05:00:13 +0000 (23:00 -0600)]
Fix riscv_expand_conditional_move.
Two issues have been observed in the current riscv_expand_conditional_move
implementation.
1. Before the introduction of TARGET_XTHEADCONDMOV, op0 of the comparison expression
was used for the mode comparison with word_mode, but after TARGET_XTHEADCONDMOV
was merged with TARGET_SFB_ALU, dest of the if-then-else is used for the mode comparison with
word_mode, and according to the md file the mode of dest is DI or SI, which can differ from
word_mode in RV64.
2. TARGET_XTHEADCONDMOV cannot be generated when the mode of the comparison is E_VOID.
This patch solves the issues above.
Provide an example from the newly added test case.
Testcase:
int ConNmv_reg_reg_reg(int x, int y, int z, int n){
if (x != y) return z;
return n;
}
We were not able to match the CTZ sign extend pattern on RISC-V
because it gets optimized to zero extend and/or to ANDI patterns.
For the ANDI case, combine scrambles the RTL and generates the
extension by using subregs.
gcc/ChangeLog:
PR target/106888
* config/riscv/bitmanip.md
(<bitmanip_optab>disi2): Match with any_extend.
(<bitmanip_optab>disi2_sext): New pattern to match
with sign extend using an ANDI instruction.
gcc/testsuite/ChangeLog:
PR target/106888
* gcc.target/riscv/pr106888.c: New test.
* gcc.target/riscv/zbbw.c: Check for ANDI.
Martin Uecker [Fri, 19 May 2023 14:15:17 +0000 (16:15 +0200)]
c: Remove dead code related to type compatibility across TUs.
Code to detect struct/unions across the same TU is not needed
anymore. Code for determining compatibility of tagged types is
preserved as it will be used for C2X. Some errors in the unused
code are fixed.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c/
* c-decl.cc (set_type_context): Remove.
(pop_scope, diagnose_mismatched_decls, pushdecl):
Remove dead code.
* c-typeck.cc (comptypes_internal): Remove dead code.
(same_translation_unit_p): Remove.
(tagged_types_tu_compatible_p): Some fixes.
Andrew Pinski [Fri, 19 May 2023 06:12:49 +0000 (06:12 +0000)]
Fix driver/33980: Precompiled header file not removed on error
So the problem here is that in the spec files, we were not marking the pch
output file to be removed on error.
The way to fix this is to mark the --output-pch argument as the output
file argument.
For the C++ specs file, we had to move around where the %V was located
such that it would be after the %w marker as %V marker clears the outputfiles.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/cp/ChangeLog:
PR driver/33980
* lang-specs.h ("@c++-header"): Add %w after
the --output-pch.
("@c++-system-header"): Likewise.
("@c++-user-header"): Likewise.
gcc/ChangeLog:
PR driver/33980
* gcc.cc (default_compilers["@c-header"]): Add %w
after the --output-pch.
Vineet Gupta [Tue, 9 May 2023 23:22:08 +0000 (16:22 -0700)]
RISC-V: improve codegen for large constants with same 32-bit lo and hi parts [2]
[part #2 of PR/109279]
SPEC2017 deepsjeng uses large constants, which currently generate less than
ideal code. This fix improves codegen for large constants which have
the same low and high 32-bit parts, e.g.
long long f(void) { return 0x0101010101010101ull; }
Before
li a5,0x1010000
addi a5,a5,0x101
mv a0,a5
slli a5,a5,32
add a0,a5,a0
ret
With patch
li a5,0x1010000
addi a5,a5,0x101
slli a0,a5,32
add a0,a0,a5
ret
This is testsuite clean.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_split_integer): if loval is equal
to hival, ASHIFT the corresponding regs.
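The arithmetic behind the shorter sequence can be sketched in C as follows. This is a model of the idea only, not riscv_split_integer itself; it assumes the low half has bit 31 clear (as in the example constant), so that sign extension of the loaded value does not matter, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the high and low 32-bit halves of the constant are identical. */
bool halves_equal(uint64_t value)
{
    return (uint32_t)(value >> 32) == (uint32_t)value;
}

/* Build the 64-bit constant from its low half alone. */
uint64_t build_from_low_half(uint32_t lo)
{
    assert((lo & 0x80000000u) == 0);  /* assumption: no sign-extension effect */
    uint64_t reg = lo;                /* li + addi: materialize the low half */
    uint64_t shifted = reg << 32;     /* slli: the same value moved to the high half */
    return shifted + reg;             /* add: combine both halves */
}
```

Because the high half is derived from the register that already holds the low half, the extra `mv` in the "Before" sequence is not needed.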
Patrick Palka [Fri, 19 May 2023 13:58:20 +0000 (09:58 -0400)]
c++: simplify norm_cache manipulation
We can avoid performing two norm_cache lookups during normalization of a
concept-id by allocating and inserting a norm_entry* before rather than
after the fact, which is simpler and cheaper.
gcc/cp/ChangeLog:
* constraint.cc (normalize_concept_check): Avoid having to do
two norm_cache lookups. Remove unnecessary early exit for an
ill-formed concept definition.
Patrick Palka [Fri, 19 May 2023 13:40:16 +0000 (09:40 -0400)]
c++: scoped variable template-id of reference type [PR97340]
lookup_and_finish_template_variable calls convert_from_reference, which
means for a variable template-id of reference type the function wraps
the corresponding VAR_DECL in an INDIRECT_REF. But the downstream logic
of two callers, tsubst_qualified_id and finish_class_member_access_expr,
expect a DECL_P result and this unexpected INDIRECT_REF leads to an ICE
resolving such a (dependently scoped) template-id as in the first testcase.
(Note these two callers eventually call convert_from_reference on the
result anyway, so calling it earlier seems redundant in this case.)
This patch fixes this by pulling out the convert_from_reference call
from lookup_and_finish_template_variable and into the callers that
actually need it, which turns out to only be tsubst_copy_and_build
(if we got rid of the call there we'd mishandle the second testcase).
PR c++/97340
gcc/cp/ChangeLog:
* pt.cc (lookup_and_finish_template_variable): Don't call
convert_from_reference.
(tsubst_copy_and_build) <case TEMPLATE_ID_EXPR>: Call
convert_from_reference on the result of
lookup_and_finish_template_variable.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/var-templ80.C: New test.
* g++.dg/cpp1y/var-templ81.C: New test.
Jakub Jelinek [Fri, 19 May 2023 10:58:32 +0000 (12:58 +0200)]
tree-ssa-math-opts: Pattern recognize some further hand written forms of signed __builtin_mul_overflow{,_p} [PR105776]
In the pattern recognition of signed __builtin_mul_overflow{,_p} we
check for result of unsigned division (which follows unsigned
multiplication) being equality compared against one of the multiplication's
argument (the one not used in the division) and check for the comparison
to be done against same precision cast of the argument (because
division's result is unsigned and the argument is signed).
But as shown in this PR, one can write it equally as comparison done in
the signed type, i.e. compare division's result cast to corresponding
signed type against the argument.
The following patch handles even those cases.
2023-05-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/105776
* tree-ssa-math-opts.cc (arith_overflow_check_p): If cast_stmt is
non-NULL, allow division statement to have a cast as single imm use
rather than comparison/condition.
(match_arith_overflow): In that case remove the cast stmt in addition
to the division statement.
Jakub Jelinek [Fri, 19 May 2023 10:57:31 +0000 (12:57 +0200)]
tree-ssa-math-opts: Pattern recognize hand written __builtin_mul_overflow_p with same unsigned types even when target just has highpart umul [PR101856]
As can be seen on the following testcase, we pattern recognize it on
i?86/x86_64 as return __builtin_mul_overflow_p (x, y, 0UL) and avoid
that way the extra division, but don't do it e.g. on aarch64 or ppc64le,
even when return __builtin_mul_overflow_p (x, y, 0UL); actually produces
there better code. The reason for testing the presence of the optab
handler is to make sure the generated code for it is short to ensure
we don't actually pessimize code instead of optimizing it.
But, we have one case that the internal-fn.cc .MUL_OVERFLOW expansion
handles nicely, and that is when arguments/result is the same mode
TYPE_UNSIGNED type, we only use IMAGPART_EXPR of it (i.e.
__builtin_mul_overflow_p rather than __builtin_mul_overflow) and
umul_highpart_optab supports the particular mode, in that case
we emit comparison of the highpart umul result against zero.
So, the following patch matches what we do in internal-fn.cc and
also pattern matches __builtin_mul_overflow_p if
1) we only need the flag whether it overflowed (i.e. !use_seen)
2) it is unsigned (i.e. !cast_stmt)
3) umul_highpart is supported for the mode
2023-05-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/101856
* tree-ssa-math-opts.cc (match_arith_overflow): Pattern detect
unsigned __builtin_mul_overflow_p even when umulv4_optab doesn't
support it but umul_highpart_optab does.
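A minimal sketch of the two equivalent unsigned checks discussed above, using 32-bit operands so that the high part can be computed portably with a 64-bit multiply. This illustrates the idea only; it is not the pass's code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hand-written check: multiply, divide back, compare with the other operand. */
bool mul_overflows_div(uint32_t x, uint32_t y)
{
    assert(sizeof(unsigned int) >= sizeof(uint32_t)); /* no promotion to signed int */
    return x != 0 && (uint32_t)(x * y) / x != y;
}

/* High-part check: the product overflows 32 bits exactly when the
   upper half of the full 64-bit product is nonzero. */
bool mul_overflows_highpart(uint32_t x, uint32_t y)
{
    uint64_t wide = (uint64_t)x * y;
    return (uint32_t)(wide >> 32) != 0;
}
```

The second form needs no division, which is why recognizing the first form is worthwhile even on targets that only provide a high-part unsigned multiply.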
Jakub Jelinek [Fri, 19 May 2023 08:13:14 +0000 (10:13 +0200)]
libgomp: Fix up -static -fopenmp linking [PR109904]
When an OpenMP program with target regions is linked statically,
it fails to link on various arches (doesn't when using recent glibc
because it has libdl stuff in libc), because libgomp.a(target.o) uses
dlopen/dlsym/dlclose, but we aren't linking against -ldl (unless
user asked for that). We already have libgomp.spec so that we
can supply extra libraries to link against in the -static case,
this patch adds -ldl to that if plugins are supported.
2023-05-19 Jakub Jelinek <jakub@redhat.com>
PR libgomp/109904
* configure.ac (link_gomp): Include also $DL_LIBS.
* configure: Regenerated.
* config.host: Arrange to set min Darwin OS versions from
the configured host version.
* config/darwin10-unwind-find-enc-func.c: Do not use current
headers, but declare the necessary structures locally to the
versions in use for Mac OSX 10.6.
* config/t-darwin: Amend to handle configured min OS
versions.
* config/t-darwin-min-1: New.
* config/t-darwin-min-5: New.
* config/t-darwin-min-8: New.
Eric Botcazou [Fri, 19 May 2023 07:00:11 +0000 (09:00 +0200)]
Fix internal error on small array with negative lower bound
Ada supports arrays with negative indices, although the internal index type
is sizetype like in other languages, which is unsigned. This means that
negative values are represented by very large numbers, which works with a
bit of care. This plugs a small loophole in output_constructor_bitfield.
gcc/
* varasm.cc (output_constructor_bitfield): Call tree_to_uhwi instead
of tree_to_shwi on array indices. Minor tweaks.
gcc/testsuite/
* gnat.dg/specs/array6.ads: New test.
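The representation described above can be illustrated with plain C integers. This is a sketch of the modular arithmetic only, assuming a 64-bit unsigned index type standing in for sizetype; it does not use GCC's tree API, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store a (possibly negative) array index in an unsigned index type.
   Negative values wrap to very large unsigned numbers. */
uint64_t as_unsigned_index(int64_t index)
{
    return (uint64_t)index;
}

/* Zero-based position of `index` in an array starting at `lower_bound`.
   The subtraction wraps modulo 2^64, so it still gives the right answer
   when both operands are "very large" encodings of negative numbers. */
uint64_t element_position(uint64_t index, uint64_t lower_bound)
{
    assert((int64_t)index >= (int64_t)lower_bound);  /* index is within the array */
    return index - lower_bound;
}
```

For example, with a lower bound of -3, index -1 is stored as 2^64 - 1 and still maps to position 2. Reading such a stored value back as a signed quantity is where the care mentioned above is needed.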
Joseph Myers [Fri, 19 May 2023 00:42:07 +0000 (00:42 +0000)]
c: Do not allow thread-local tentative definitions for C2x
C2x makes it clear that thread-local declarations can never be
tentative definitions (the legacy feature of C where you can e.g. do
"int i;" more than once at file scope, possibly with one of the
declarations initialized, and it counts as exactly one definition),
but are always definitions in the absence of "extern". The wording
about external definitions was unclear in the thread-local case in C11
/ C17 (both about what counts as a tentative definition, and what is a
"definition" at all), not having been updated to cover the addition of
thread-local storage.
Implement this C2x requirement. Arguably this is a defect fix that
would be appropriate to apply for all standard versions, but for now
the change is conditional on flag_isoc2x (however, it doesn't handle
_Thread_local / thread_local any differently from GNU __thread). Making
the change unconditional results in various TLS tests failing to
compile (gcc.dg/c11-thread-local-1.c gcc.dg/tls/thr-init-1.c
gcc.dg/tls/thr-init-2.c gcc.dg/torture/tls/thr-init-2.c
objc.dg/torture/tls/thr-init.m), though it's not clear if those tests
reflect any real code similarly trying to make use of thread-local
tentative definitions.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c/
* c-decl.cc (diagnose_mismatched_decls): Do not handle
thread-local declarations as tentative definitions for C2x.
(finish_decl): Do not allow thread-local definition with
incomplete type for C2x.
gcc/testsuite/
* gcc.dg/c2x-thread-local-2.c: New test.
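For reference, the legacy tentative-definition feature mentioned above looks like this for an ordinary (non-thread-local) object. This is a minimal sketch with hypothetical names; per the description above, the same pattern with a thread-local object is what is no longer accepted in C2x mode.

```c
#include <assert.h>

int counter;        /* tentative definition */
int counter = 42;   /* definition with an initializer */
int counter;        /* another tentative definition of the same object */

/* All three declarations name one object, defined exactly once. */
int read_counter(void)
{
    return counter;
}

/* Small self-check that the initializer from the middle declaration applies. */
void check_counter(void)
{
    assert(counter == 42);
}
```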