Jakub Jelinek [Tue, 28 Nov 2023 12:14:05 +0000 (13:14 +0100)]
libiberty: Use x86 HW optimized sha1
Nick has approved this patch (+ small ld change to use it for --build-id=),
so I'm committing it to GCC master as well.
If anyone from ARM would be willing to implement it similarly with
vsha1{cq,mq,pq,h,su0q,su1q}_u32 intrinsics, it could be a useful linker
speedup on those hosts as well. The intent in sha1.c was that the
sha1_hw_process_bytes and sha1_hw_process_block functions
would be defined whenever
defined (HAVE_X86_SHA1_HW_SUPPORT) || defined (HAVE_WHATEVERELSE_SHA1_HW_SUPPORT)
holds, while the bodies of sha1_hw_process_block and sha1_choose_process_bytes
would then have #elif defined (HAVE_WHATEVERELSE_SHA1_HW_SUPPORT) for
the other arch support, and similarly for any target attributes on
sha1_hw_process_block if needed.
2023-11-28 Jakub Jelinek <jakub@redhat.com>
include/
* sha1.h (sha1_process_bytes_fn): New typedef.
(sha1_choose_process_bytes): Declare.
libiberty/
* configure.ac (HAVE_X86_SHA1_HW_SUPPORT): New check.
* sha1.c: If HAVE_X86_SHA1_HW_SUPPORT is defined, include x86intrin.h
and cpuid.h.
(sha1_hw_process_bytes, sha1_hw_process_block,
sha1_choose_process_bytes): New functions.
* config.in: Regenerated.
* configure: Regenerated.
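The dispatch scheme described in this commit can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
libiberty code: every demo_* name and DEMO_* macro is hypothetical, the
"hashing" bodies are placeholders that merely record which path ran, and
the runtime CPU check is stubbed out. Only the idea of a chooser that
returns a function pointer of a dedicated typedef comes from the ChangeLog.

```c
#include <stddef.h>

/* Hypothetical stand-in for a "process bytes" function pointer typedef:
   consume LEN bytes from BUF and update CTX.  */
typedef void (*demo_process_bytes_fn) (const void *buf, size_t len,
                                       void *ctx);

/* Portable fallback.  A real implementation would hash the data; this
   placeholder only records that the software path ran.  */
static void
demo_sw_process_bytes (const void *buf, size_t len, void *ctx)
{
  (void) buf;
  (void) len;
  *(int *) ctx = 1;
}

#if defined (DEMO_HAVE_X86_SHA1_HW_SUPPORT) \
    || defined (DEMO_HAVE_WHATEVERELSE_SHA1_HW_SUPPORT)
/* Accelerated path, defined whenever any HW support macro is set.  The
   arch-specific body is selected with #if/#elif inside the function.  */
static void
demo_hw_process_bytes (const void *buf, size_t len, void *ctx)
{
  (void) buf;
  (void) len;
#if defined (DEMO_HAVE_X86_SHA1_HW_SUPPORT)
  *(int *) ctx = 2;  /* Placeholder for an x86 body.  */
#elif defined (DEMO_HAVE_WHATEVERELSE_SHA1_HW_SUPPORT)
  *(int *) ctx = 3;  /* Placeholder for another architecture.  */
#endif
}

/* Placeholder runtime check; real code would query the CPU features.  */
static int
demo_hw_available (void)
{
  return 0;
}
#endif

/* Pick the routine once; callers then go through the returned pointer.  */
demo_process_bytes_fn
demo_choose_process_bytes (void)
{
#if defined (DEMO_HAVE_X86_SHA1_HW_SUPPORT) \
    || defined (DEMO_HAVE_WHATEVERELSE_SHA1_HW_SUPPORT)
  if (demo_hw_available ())
    return demo_hw_process_bytes;
#endif
  return demo_sw_process_bytes;
}
```

A caller would fetch the pointer once and reuse it for every buffer, so the
feature check is not repeated per call.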
FAIL: gcc.dg/vect/pr71259.c -flto -ffat-lto-objects (internal compiler error: in SET_TYPE_VECTOR_SUBPARTS, at tree.h:4248)
FAIL: gcc.dg/vect/pr71259.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-14.c (internal compiler error: in SET_TYPE_VECTOR_SUBPARTS, at tree.h:4248)
FAIL: gcc.dg/vect/vect-alias-check-14.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-14.c -flto -ffat-lto-objects (internal compiler error: in SET_TYPE_VECTOR_SUBPARTS, at tree.h:4248)
FAIL: gcc.dg/vect/vect-alias-check-14.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-9.c (internal compiler error: in SET_TYPE_VECTOR_SUBPARTS, at tree.h:4248)
FAIL: gcc.dg/vect/vect-alias-check-9.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-9.c -flto -ffat-lto-objects (internal compiler error: in SET_TYPE_VECTOR_SUBPARTS, at tree.h:4248)
FAIL: gcc.dg/vect/vect-alias-check-9.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-cond-arith-6.c (internal compiler error: in SET_TYPE_VECTOR_SUBPARTS, at tree.h:4248)
FAIL: gcc.dg/vect/vect-cond-arith-6.c (test for excess errors)
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects (internal compiler error: in SET_TYPE_VECTOR_SUBPARTS, at tree.h:4248)
FAIL: gcc.dg/vect/vect-cond-arith-6.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-gather-5.c (internal compiler error: in SET_TYPE_VECTOR_SUBPARTS, at tree.h:4248)
FAIL: gcc.dg/vect/vect-gather-5.c (test for excess errors)
FAIL: gcc.dg/vect/vect-gather-5.c -flto -ffat-lto-objects (internal compiler error: in SET_TYPE_VECTOR_SUBPARTS, at tree.h:4248)
FAIL: gcc.dg/vect/vect-gather-5.c -flto -ffat-lto-objects (test for excess errors)
Vectors of poly size (1, 1) cannot be allowed to use interleaved VLA SLP, since interleaved VLA SLP assumes that the VF holds at least 2 elements,
whereas a poly size (1, 1) vector may hold only 1 element.
Steve Baird [Tue, 14 Nov 2023 20:06:36 +0000 (12:06 -0800)]
ada: Error compiling reduction expression with overloaded reducer subprogram
In some cases involving a reduction expression with an overloaded reducer
subprogram, the accumulator type is not determined correctly. This can lead
to spurious compile-time errors.
gcc/ada/
* exp_attr.adb (Expand_N_Attribute_Reference): In the case of a
Reduce attribute reference, fix bugs in initializing Accum_Typ.
The previous version was incorrect in the case where E1 refers to
the first of multiple possible overload resolution candidates and
that candidate does not turn out to be the right one. The previous
version also had code to compute Accum_Typ via a different method
if the initial computation turned out to yield a universal numeric
type. Delete that initial computation and use the second method in
all cases.
Gary Dismukes [Wed, 15 Nov 2023 23:57:47 +0000 (23:57 +0000)]
ada: Errors on instance of Multiway_Trees with discriminated type
The compiler may report various type conflicts on an instantiation
of the generic package Ada.Containers.Multiway_Trees with an actual
for Element_Type that is a nonprivate actual type with discriminants
that has a discriminant-dependent component of a private type (such
as a Bounded_Vector type). The type errors occur on an aggregate
of the implementation type Tree_Node_Type within the body of
Multiway_Trees, where the aggregate has a box-defaulted association
for the Element component. (Such type errors could of course arise
in other cases of generic instantiations that follow a similar type
model.)
In the case where the discriminant-dependent component type has a
default-initialization procedure (init proc), the compiler was handling
box associations for such components by expanding the topmost box
association into subaggregates that themselves have box associations,
and didn't properly account for discriminant-dependent subcomponents of
private types. This could be fixed internally in Propagate_Discriminants,
but it seems that the entire machinery for dealing with such subcomponent
associations is unnecessary, and the topmost component association can
be handled directly as a default-initialized box association.
gcc/ada/
* sem_aggr.adb (Add_Discriminant_Values): Remove this procedure.
(Propagate_Discriminants): Remove this procedure.
(Resolve_Record_Aggregate): Remove code (the Capture_Discriminants
block statement) related to propagating discriminants and
generating initializations for subcomponents of a
discriminant-dependent box-defaulted subcomponent of a nonprivate
record type with discriminants, and handle all top-level
components that have a non-null base init proc directly, by
calling Add_Association with "Is_Box_Present => True". Also,
combine that elsif clause with the immediately preceding elsif
clause, since they now both contain the same statement (calls to
Add_Association with the same actuals).
Eric Botcazou [Thu, 9 Nov 2023 14:00:56 +0000 (15:00 +0100)]
ada: Further cleanup in finalization machinery
When transient scopes are being materialized, they can give rise to a block
created around the construct being wrapped or not, depending on the kind of
construct. In both cases finalization actions for the transient objects of
the scope are generated the same way, with normal finalization done manually
immediately after the construct and exceptional finalization deferred to the
enclosing scope by means of a hooking mechanism.
Now when the block is generated, it becomes this enclosing scope, so the
normal finalization that comes with it would also be done immediately after
the construct, even without normal finalization generated manually.
Therefore this change gets rid of the manual finalization as well as of the
hooking in the cases where the block is generated, leading to a significant
streamlining of the expanded code in these cases. This requires fixing a
small inaccuracy of the Within_Case_Or_If_Expression predicate, which must
only be concerned with the dependent expressions, since those are the only
ones to be treated specially by the finalization machinery.
It also contains a small cleanup for the description of the transient scope
management present at the beginning of the exp_ch7.adb file.
gcc/ada/
* exp_ch7.ads (Expand_Cleanup_Actions): Move declaration to the
Finalization Management section.
* exp_ch7.adb (Transient Scope Management): Move description down to
after that of the general finalization and make a few changes.
(Insert_Actions_In_Scope_Around): Call Process_Transients_In_Scope
only if cleanups are being handled.
(Process_Transients_In_Scope): Remove redundant test on Clean.
* exp_util.ads (Within_Case_Or_If_Expression): Adjust description.
* exp_util.adb (Within_Case_Or_If_Expression): Only return true if
within the dependent expressions of the conditional expressions.
Eric Botcazou [Wed, 8 Nov 2023 22:29:01 +0000 (23:29 +0100)]
ada: Fix premature finalization for nested return within extended one
The return object is incorrectly finalized when the nested return is taken,
because the special flag attached to the return object is not updated.
gcc/ada/
* exp_ch6.adb (Build_Flag_For_Function): New function made up of the
code building the special flag for return object present...
(Expand_N_Extended_Return_Statement): ...in there. Replace the code
with a call to Build_Flag_For_Function. Add assertion for the flag.
(Expand_Non_Function_Return): For a nested return, if the return
object needs finalization actions, update the special flag.
When emitting code for architectures with tagged pointers, it is useful
to be able to recognize values representing addresses because they
require special handling. This commit adds the predicate
Is_Address_Compatible_Type, which differs from the node attribute
Is_Descendant_Of_Address by also taking Standard_Address into account.
gcc/ada/
* einfo-utils.ads, einfo-utils.adb (Is_Address_Compatible_Type):
New function.
Gary Dismukes [Tue, 7 Nov 2023 22:16:31 +0000 (22:16 +0000)]
ada: Type error on container aggregate with loop_parameter_specification
The compiler incorrectly reported a type error on a container aggregate
for a Vector type with a loop_parameter_specification specifying a
nonstatic upper bound, complaining that it expected the Vector index
type, but instead found type Count_Type. The expansion of the aggregate
was incorrectly passing a size temporary of type Count_Type to the
function associated with the New_Indexed part of the container type's
Aggregate aspect (New_Vector in the case of Vectors), which has two
formals of the container index type. The fix is to convert the size
temporary to the expected index type.
gcc/ada/
* exp_aggr.adb (Expand_Container_Aggregate): Apply a conversion to the
size temp object passed as the second actual parameter on the call to
the New_Indexed_Subp function, to convert it to the index type of the
container type (taken from the first formal parameter of the function).
Eric Botcazou [Mon, 6 Nov 2023 22:34:20 +0000 (23:34 +0100)]
ada: Fix internal error on declare expression in expression function
When the expression function is not a completion, its (return) expression
does not cause freezing so analyzing the declare expression in this context
must not freeze the type of the object.
The change also contains another fix, which makes it so that the compiler
does not evaluate a nonstatic representation attribute of a scalar subtype
in the same context if the subtype is not already frozen.
gcc/ada/
* sem_attr.adb (Eval_Attribute): Do not proceed in a spec expression
for nonstatic representation attributes of a scalar subtype when the
subtype is not frozen.
* sem_ch3.adb (Analyze_Object_Declaration): Do not freeze the type
of the object in a spec expression.
Yannick Moy [Fri, 3 Nov 2023 13:49:30 +0000 (14:49 +0100)]
ada: Remove dependency on System.Val_Bool in System.Img_Bool
In order to facilitate the certification of System.Img_Bool, remove
its dependency on unit System.Val_Bool. Modify the definition of
ghost function Is_Boolean_Image_Ghost to take the expected boolean
value and move it to System.Val_Spec.
gcc/ada/
* libgnat/s-imgboo.adb: Remove with_clause now in spec file.
* libgnat/s-imgboo.ads: Remove dependency on System.Val_Bool.
(Image_Boolean): Replace call to Value_Boolean by passing value V
to updated ghost function Is_Boolean_Image_Ghost.
* libgnat/s-valboo.ads (Is_Boolean_Image_Ghost): Move to other
unit.
(Value_Boolean): Update precondition.
* libgnat/s-valspe.ads (Is_Boolean_Image_Ghost): Move here. Add
new parameter for expected boolean value.
Tucker Taft [Sat, 4 Nov 2023 13:53:28 +0000 (13:53 +0000)]
ada: Fix predicate failure that occurred in a test case
The CodePeer test case illustrating a problem where a "high"
precondition failure was expected, died in the GNAT FE on
input_reading.adb. The problem was in Check_SCIL, where
it didn't properly handle a discriminant_specification.
Jakub Jelinek [Tue, 28 Nov 2023 09:16:47 +0000 (10:16 +0100)]
testsuite: Fix up pr111754.c test
On Tue, Nov 28, 2023 at 03:56:47PM +0800, juzhe.zhong@rivai.ai wrote:
> Hi, there is a regression in RISC-V caused by this patch:
>
> FAIL: gcc.dg/vect/pr111754.c -flto -ffat-lto-objects scan-tree-dump optimized "return { 0.0, 9.0e\\+0, 0.0, 0.0 }"
> FAIL: gcc.dg/vect/pr111754.c scan-tree-dump optimized "return { 0.0, 9.0e\\+0, 0.0, 0.0 }"
>
> I have checked the dump is :
> F foo (F a, F b)
> {
> <bb 2> [local count: 1073741824]:
> <retval> = { 0.0, 9.0e+0, 0.0, 0.0 };
> return <retval>;
>
> }
>
> The dump IR seems reasonable to me.
> I wonder whether we should walk around in RISC-V backend to generate the same IR as ARM SVE ?
> Or we should adjust the test ?
Note, the test also FAILs on i686-linux (but not e.g. on x86_64-linux):
/home/jakub/src/gcc/obj67/gcc/xgcc -B/home/jakub/src/gcc/obj67/gcc/ /home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c -fdiagnostics-plain-output -O2 -fdump-tree-optimized -S
+-o pr111754.s
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c: In function 'foo':
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:7:1: warning: SSE vector return without SSE enabled changes the ABI [-Wpsabi]
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:6:3: note: the ABI for passing parameters with 16-byte alignment has changed in GCC 4.6
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:6:3: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi]
FAIL: gcc.dg/vect/pr111754.c (test for excess errors)
Excess errors:
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:7:1: warning: SSE vector return without SSE enabled changes the ABI [-Wpsabi]
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/vect/pr111754.c:6:3: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi]
So, I think it is wrong to specify
/* { dg-options "-O2 -fdump-tree-optimized" } */
in the test, should be dg-additional-options instead, so that it gets
the implied vector compilation options e.g. for i686-linux (-msse2 in that
case at least), question is if -Wno-psabi should be added as well or not,
and certainly the scan-tree-dump needs to be guarded by appropriate
vect_* effective target (but dunno which, one which asserts support for
V4SFmode and returning it).
Alternatively, perhaps don't check optimized dump but some earlier one
before generic vector lowering, then hopefully it could match on all
targets? Maybe with the <retval> = ... vs. return ... variants.
2023-11-28 Jakub Jelinek <jakub@redhat.com>
PR middle-end/111754
* gcc.dg/vect/pr111754.c: Use dg-additional-options rather than
dg-options, add -Wno-psabi and use -fdump-tree-forwprop1 rather than
-fdump-tree-optimized. Scan forwprop1 dump rather than optimized and
scan for either direct return or setting of <retval> to the vector.
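As a rough illustration of the directive change, a test header using
dg-additional-options might look like the sketch below. This is a
hypothetical stand-in, not the actual pr111754.c: the function body builds
the same constant vector { 0.0, 9.0, 0.0, 0.0 } with plain element
subscripts, and the scan-tree-dump directive is left out because its exact
pattern is not given here.

```c
/* { dg-do compile } */
/* dg-additional-options appends to the flags the vect.exp harness already
   passes (e.g. -msse2 on i686-linux) instead of replacing them.  */
/* { dg-additional-options "-O2 -fdump-tree-forwprop1 -Wno-psabi" } */

typedef float __attribute__ ((__vector_size__ (16))) F;

/* Stand-in body: permute the constant vector { 9, 0, 0, 0 } with the
   selector { 1, 0, 1, 2 }, which yields { 0, 9, 0, 0 }.  */
F
foo (F a, F b)
{
  F v = (F) { 9 };
  (void) a;
  (void) b;
  return (F) { v[1], v[0], v[1], v[2] };
}
```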
When looking around, I've noticed we have a similar simplification
for parity (with ^ rather than +). Note, unlike the popcount one,
this one doesn't check for INTEGRAL_TYPE_P (type) (which rules out
vector simplification), so I've used the old handling for types_match and
otherwise do it only for scalar argument types and handle different
precision in there.
The testcase ICEs without the previous patch on the first function,
but strangely not on the second which tests parity. The reason
is that in this case there is no wi::bit_and test like for popcount
and for BITINT_TYPEs build_call_internal actually refuses to create it
and thus the whole simplification fails. While
.{CLZ,CTZ,CLRSB,FFS,POPCOUNT,PARITY} ifns are direct optab ifns for
normal integer and vector types (and thus it is desirable to punt if
there is no supported optab for them), they have this large/huge _BitInt
extension before bitint lowering, so the patch also adjusts
build_call_internal to allow that case.
2023-11-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112719
* match.pd (parity(X)^parity(Y) -> parity(X^Y)): Handle case of
mismatched types.
* gimple-match-exports.cc (build_call_internal): Add special-case for
bit query ifns on large/huge BITINT_TYPE before bitint lowering.
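The identity behind this simplification can be checked with a small
sketch. It is not the match.pd pattern itself: the demo_* names are
hypothetical, it uses the GCC/Clang __builtin_parity builtin, and it
assumes unsigned operands so that widening the narrower one is a
zero-extension.

```c
#include <stdint.h>

/* Parity of each operand computed separately, then combined with ^.  */
int
demo_parity_separate (uint8_t x, uint32_t y)
{
  return __builtin_parity (x) ^ __builtin_parity (y);
}

/* Single parity of the xor, after widening X to the type of Y.  */
int
demo_parity_combined (uint8_t x, uint32_t y)
{
  return __builtin_parity ((uint32_t) x ^ y);
}

/* Count the inputs for which the two forms disagree.  */
int
demo_parity_mismatches (void)
{
  int bad = 0;
  for (unsigned int i = 0; i < 256; i++)
    {
      uint8_t x = (uint8_t) i;
      uint32_t y = (i * 0x01010101u) ^ 0x12345678u;
      if (demo_parity_separate (x, y) != demo_parity_combined (x, y))
        bad++;
    }
  return bad;
}
```

The operands have different precisions (8 and 32 bits), which is the
mismatched-types situation the ChangeLog entry refers to.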
Since my PR112566 r14-5557 changes the following testcase ICEs, because
.POPCOUNT (x) + .POPCOUNT (y) has a simplification attempted even when
x and y have incompatible types (different precisions).
Note, with _BitInt it can ICE already starting with r14-5435 and
I think as a latent problem it exists for years, because IFN_POPCOUNT
calls inherently can have different argument types and return type
is always the same.
The following patch fixes it by using widest_int during the analysis
(which is where it was ICEing) and if it is optimizable, casting to
the wider type so that bit_ior has matching argument types.
2023-11-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112719
* match.pd (popcount (X) + popcount (Y) -> POPCOUNT (X | Y)): Deal
with argument types with different precisions.
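A minimal sketch of the arithmetic involved, assuming unsigned operands
of different widths that share no set bits (the sum of the two population
counts only equals the count of the bitwise OR in that case). The demo_*
names are hypothetical and the code uses the GCC/Clang __builtin_popcount
builtin; it is not the match.pd implementation.

```c
#include <stdint.h>

/* popcount (X) + popcount (Y) with operands of different precision.  */
int
demo_popcount_sum (uint8_t x, uint32_t y)
{
  return __builtin_popcount (x) + __builtin_popcount (y);
}

/* popcount (X | Y), after casting X to the wider type so that the
   bitwise OR has matching argument types.  */
int
demo_popcount_merged (uint8_t x, uint32_t y)
{
  return __builtin_popcount ((uint32_t) x | y);
}

/* Count disagreements over inputs where X and Y have no common bits.  */
int
demo_popcount_mismatches (void)
{
  int bad = 0;
  for (unsigned int i = 0; i < 256; i++)
    {
      uint8_t x = (uint8_t) i;
      /* Low byte is the complement of X; the fixed high bits lie above
         bit 7, so (x & y) == 0 always holds.  */
      uint32_t y = (uint32_t) (~i & 0xffu) | 0xabc00u;
      if (demo_popcount_sum (x, y) != demo_popcount_merged (x, y))
        bad++;
    }
  return bad;
}
```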
Lewis Hyatt [Mon, 27 Nov 2023 17:08:41 +0000 (12:08 -0500)]
libcpp: Fix unsigned promotion for unevaluated divide by zero [PR112701]
When libcpp encounters a divide by zero while processing a constant
expression "x/y", it returns "x" as a fallback. The value of the fallback is
not normally important, since an error will be generated anyway, but if the
expression appears in an unevaluated context, such as "0 ? 0/0u : -1", then
there will be no error, and the fallback value will be meaningful to the
extent that it may cause promotion from signed to unsigned of an operand
encountered later. As the PR notes, libcpp does not do the unsigned
promotion correctly in this case; fix it by making the fallback return value
unsigned as necessary.
libcpp/ChangeLog:
PR preprocessor/112701
* expr.cc (num_div_op): Set unsignedp appropriately when returning a
stub value for divide by 0.
gcc/testsuite/ChangeLog:
PR preprocessor/112701
* gcc.dg/cpp/expr.c: Add additional tests to cover divide by 0 in an
unevaluated context, where the unsignedness still matters.
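The promotion rule at stake can be illustrated with a small sketch. It
deliberately divides by 1u rather than 0u so that it behaves the same with
and without the fix; the demo_* names are hypothetical. In both the
preprocessor and the language proper, an unsigned operand in one arm of
?: makes the whole conditional unsigned, so the -1 in the other arm
becomes a large positive value.

```c
/* Preprocessor arithmetic: the unevaluated arm 0 / 1u is unsigned, so
   -1 is promoted to unsigned and compares greater than 0.  */
#if (0 ? 0 / 1u : -1) > 0
# define DEMO_PP_COND_IS_UNSIGNED 1
#else
# define DEMO_PP_COND_IS_UNSIGNED 0
#endif

int
demo_pp_cond_is_unsigned (void)
{
  return DEMO_PP_COND_IS_UNSIGNED;
}

/* The same expression in C proper follows the usual arithmetic
   conversions and gives the same answer.  */
int
demo_c_cond_is_unsigned (void)
{
  return (0 ? 0 / 1u : -1) > 0;
}
```

With a zero divisor in the unevaluated arm, the fallback value returned by
the division has to carry the same unsignedness for the preprocessor result
to stay consistent with this.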
Juzhe-Zhong [Mon, 27 Nov 2023 13:24:12 +0000 (21:24 +0800)]
RISC-V: Fix VSETVL PASS regression
This patch is a regression fix, not an optimization:
trunk GCC generates more redundant vsetvl instructions than GCC 13 does.
This is the case:
bb 2:
def a2 (vsetvl a2, zero)
bb 3:
use a2
bb 4:
use a2 (vle)
before this patch:
bb 2:
vsetvl a2 zero
bb 3:
vsetvl zero, zero ----> should be eliminated.
bb 4:
vle.v
The root cause is that, due to incorrect code, we didn't mark bb 3 as transparent.
bb 3 doesn't modify "a2", it only uses it, so the VSETVL status from bb 2 is available to bb 3 and bb 4:
bb 2 -> bb 3 -> bb 4.
Another regression fix is in the anticipation calculation:
bb 4:
use a5 (sub)
use a5 (vle)
The VSETVL status of the vle should be considered anticipated as long as the a5 used by both sub and vle comes from the same def.
David Malcolm [Tue, 28 Nov 2023 01:09:35 +0000 (20:09 -0500)]
diagnostics: don't print annotation lines when there's no column info
gcc/ChangeLog:
* diagnostic-show-locus.cc (layout::maybe_add_location_range):
Don't print annotation lines for ranges when there's no column
info.
(selftest::test_one_liner_no_column): New.
(selftest::test_diagnostic_show_locus_one_liner): Call it.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Andrew Pinski [Sun, 26 Nov 2023 23:25:51 +0000 (23:25 +0000)]
aarch64: Improve cost of `a ? {-,}1 : b`
While looking into PR 112454, I found the cost for
`(if_then_else (cmp) (const_int 1) (reg))` was being recorded as 8
(or `COSTS_N_INSNS (2)`) but it should have been 4 (or `COSTS_N_INSNS (1)`).
This improves the cost by not adding the cost of `(const_int 1)` to
the total cost.
It does not fully fix PR 112454 as that requires other changes to forwprop
the `(const_int 1)` earlier than combine. Though we do fix the loop case where the
constant was only used once.
Bootstrapped and tested on aarch64-linux-gnu with no regressions.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_if_then_else_costs):
Handle csinv/csinc case of 1/-1.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/csinc-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
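For readers unfamiliar with the instructions involved, the selects in
question can be modelled with the sketch below. It assumes the usual
semantics of the AArch64 csinc/csinv instructions (result is Rn if the
condition holds, otherwise Rm + 1 or ~Rm) and the convention that one
instruction costs 4 units, as the 8 vs. 4 figures above imply. All demo_*
and DEMO_* names are hypothetical; this is not the aarch64.cc cost code.

```c
#include <stdint.h>

/* Cost unit convention: one instruction costs 4.  */
#define DEMO_COSTS_N_INSNS(n) ((n) * 4)

/* Model of csinc: COND ? RN : RM + 1.  */
static uint64_t
demo_csinc (int cond, uint64_t rn, uint64_t rm)
{
  return cond ? rn : rm + 1;
}

/* Model of csinv: COND ? RN : ~RM.  */
static uint64_t
demo_csinv (int cond, uint64_t rn, uint64_t rm)
{
  return cond ? rn : ~rm;
}

/* a ? 1 : b, as one csinc with the inverted condition and a zero
   register operand: no separate constant needs to be materialised.  */
int64_t
demo_select_one (int a, int64_t b)
{
  return (int64_t) demo_csinc (!a, (uint64_t) b, 0);
}

/* a ? -1 : b, as one csinv with the inverted condition.  */
int64_t
demo_select_minus_one (int a, int64_t b)
{
  return (int64_t) demo_csinv (!a, (uint64_t) b, 0);
}
```

Since the constant is folded into the single conditional instruction, the
cost of the constant should not be added on top of it.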
This testcase started to fail after r14-5628-g53ba8d669550d3 because
IPA-VRP can now figure out that the functions return a constant
value, so there was nothing left for profiling to profile.
This disables IPA-VRP for this testcase to be able to profile again.
Bootstrapped/tested on x86_64-linux-gnu with no regressions.
Richard Earnshaw [Mon, 27 Nov 2023 17:59:39 +0000 (17:59 +0000)]
arm: libgcc: tweak warning from __sync_synchronize
My previous patch to add an implementation of __sync_synchronize with
a warning trips a testsuite failure in fortran (and possibly other
languages as well) as the framework expects no blank lines in the
output, but this warning was generating one. So remove the newline
from the end of the message and rely on the one added by the linker
instead.
Since we're there, remove the trailing period from the message as
well, since the convention seems to be not to have one.
PR111754: Rework encoding of result for VEC_PERM_EXPR with constant input vectors.
gcc/ChangeLog:
PR middle-end/111754
* fold-const.cc (fold_vec_perm_cst): Set result's encoding to sel's
encoding, and set res_nelts_per_pattern to 2 if sel contains stepped
sequence but input vectors do not.
(test_nunits_min_2): New test Case 8.
(test_nunits_min_4): New tests Case 8 and Case 9.
Szabolcs Nagy [Wed, 26 Apr 2023 13:28:32 +0000 (14:28 +0100)]
aarch64: Use br instead of ret for eh_return
The expected way to handle eh_return is to pass the stack adjustment
offset and landing pad address via
EH_RETURN_STACKADJ_RTX
EH_RETURN_HANDLER_RTX
to the epilogue that is shared between normal return paths and the
eh_return paths. EH_RETURN_HANDLER_RTX is the stack slot of the
return address that is overwritten with the landing pad in the
eh_return case and EH_RETURN_STACKADJ_RTX is a register added to sp
right before return and it is set to 0 in the normal return case.
The issue with this design is that eh_return and normal return may
require different return sequence but there is no way to distinguish
the two cases in the epilogue (the stack adjustment may be 0 in the
eh_return case too).
The reason eh_return and normal return requires different return
sequence is that control flow integrity hardening may need to treat
eh_return as a forward-edge transfer (it is not returning to the
previous stack frame) and normal return as a backward-edge one.
In case of AArch64 forward-edge is protected by BTI and requires br
instruction and backward-edge is protected by PAUTH or GCS and
requires ret (or authenticated ret) instruction.
This patch resolves the issue by introducing EH_RETURN_TAKEN_RTX that
is a flag set to 1 in the eh_return path and 0 in normal return paths.
Branching on the EH_RETURN_TAKEN_RTX flag, the right return sequence
can be used in the epilogue.
The handler could be passed the old way via clobbering the return
address, but since now the eh_return case can be distinguished, the
handler can be in a different register than x30 and no stack frame
is needed for eh_return.
This patch fixes a return to anywhere gadget in the unwinder with
existing standard branch protection as well as makes EH return
compatible with the Guarded Control Stack (GCS) extension.
Some tests are adjusted because eh_return no longer prevents pac-ret
in the normal return path.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_eh_return_handler_rtx):
Remove.
* config/aarch64/aarch64.cc (aarch64_return_address_signing_enabled):
Sign return address even in functions with eh_return.
(aarch64_expand_epilogue): Conditionally return with br or ret.
(aarch64_eh_return_handler_rtx): Remove.
* config/aarch64/aarch64.h (EH_RETURN_TAKEN_RTX): Define.
(EH_RETURN_STACKADJ_RTX): Change to R5.
(EH_RETURN_HANDLER_RTX): Change to R6.
* df-scan.cc: Handle EH_RETURN_TAKEN_RTX.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document EH_RETURN_TAKEN_RTX.
* except.cc (expand_eh_return): Handle EH_RETURN_TAKEN_RTX.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/return_address_sign_1.c: Move func4 to ...
* gcc.target/aarch64/return_address_sign_2.c: ... here and fix the
scan asm check.
* gcc.target/aarch64/return_address_sign_b_1.c: Move func4 to ...
* gcc.target/aarch64/return_address_sign_b_2.c: ... here and fix the
scan asm check.
Thomas Schwinge [Wed, 22 Nov 2023 16:35:23 +0000 (17:35 +0100)]
GCN: Tag '-march=[...]', '-mtune=[...]' as 'Negative' of themselves [PR112669]
Certain other command-line flags are mutually exclusive (random example: GCN
'-march=gfx906', '-march=gfx908'). If they're not appropriately marked up,
this does disturb the multilib selection machinery, for example:
In the last invocation, '-march=gfx900 -march=gfx906', for example, in
'gcc/gcc.cc:set_multilib_dir' we see both flags -- which there doesn't exist a
matching multilib for, therefore we "fail" to the default ('.'). Tagged as
'Negative', only the last flag survives, and we, for example, get the expected:
aarch64: Remove redundant zeroing/merging in SVE intrinsics [PR106326]
Many predicated SVE intrinsics provide three forms of predication:
zeroing, merging, and any/dont-care. All three are equivalent when
the predicate is all-true, so this patch drops the zeroing and
merging in that case.
gcc/
PR target/106326
* config/aarch64/aarch64-sve-builtins.h (is_ptrue): Declare.
* config/aarch64/aarch64-sve-builtins.cc (is_ptrue): New function.
(gimple_folder::redirect_pred_x): Likewise.
(gimple_folder::fold): Use it.
gcc/testsuite/
PR target/106326
* gcc.target/aarch64/sve/acle/general/pr106326_1.c: New test.
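The equivalence can be illustrated with a scalar model of a predicated
add. This is a sketch with hypothetical demo_* names, not SVE intrinsics
code. It assumes the usual ACLE conventions: zeroing sets inactive lanes
to 0, merging keeps the first operand in inactive lanes, and "don't care"
leaves inactive lanes unspecified (the model simply computes the sum
there, which is one permitted choice).

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_predication { DEMO_ZEROING, DEMO_MERGING, DEMO_DONT_CARE };

/* Lane-by-lane model of OUT = A + B under predicate PG.  */
void
demo_predicated_add (enum demo_predication kind, const bool *pg,
                     const int *a, const int *b, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (pg[i] || kind == DEMO_DONT_CARE)
        out[i] = a[i] + b[i];  /* Active lane, or unspecified lane.  */
      else if (kind == DEMO_ZEROING)
        out[i] = 0;            /* Inactive lane is zeroed.  */
      else
        out[i] = a[i];         /* Inactive lane keeps the first operand.  */
    }
}
```

With an all-true predicate the inactive-lane branches are never taken, so
all three forms produce the same result and the extra zeroing or merging
work is redundant.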
The fix for PR106329 needs a way of testing for a ptrue of a particular
element size. We already had such a function for svlast, so this patch
moves it to common code and generalises it to work with all kinds of
vectors.
Richard Biener [Mon, 27 Nov 2023 09:20:02 +0000 (10:20 +0100)]
tree-optimization/112653 - PTA and return
The following separates the escape solution for return stmts not
only during points-to solving but also for later querying. This
requires adjusting the points-to-global tests to include escapes
through returns. Technically the patch replaces the existing
post-processing which computes the transitive closure of the
returned value solution by a proper artificial variable with
transitive closure constraints. Instead of adding the solution
to escaped we track it separately.
PR tree-optimization/112653
* gimple-ssa.h (gimple_df): Add escaped_return solution.
* tree-ssa.cc (init_tree_ssa): Reset it.
(delete_tree_ssa): Likewise.
* tree-ssa-structalias.cc (escaped_return_id): New.
(find_func_aliases): Handle non-IPA return stmts by
adding to ESCAPED_RETURN.
(set_uids_in_ptset): Adjust HEAP escaping to also cover
escapes through return.
(init_base_vars): Initialize ESCAPED_RETURN.
(compute_points_to_sets): Replace ESCAPED post-processing
with recording the ESCAPED_RETURN solution.
* tree-ssa-alias.cc (ref_may_alias_global_p_1): Check
the ESCAPED_RETURN solution.
(dump_alias_info): Dump it.
* cfgexpand.cc (update_alias_info_with_stack_vars): Update it.
* ipa-icf.cc (sem_item_optimizer::fixup_points_to_sets):
Likewise.
* tree-inline.cc (expand_call_inline): Reset it.
* tree-parloops.cc (parallelize_loops): Likewise.
* tree-sra.cc (maybe_add_sra_candidate): Check it.
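To make the distinction concrete, the sketch below contrasts a pointer
that leaves a function only through its return value with one that is also
stored to global memory. The demo_* names are hypothetical and the code
says nothing about how the points-to solver represents either case; it
only shows the two source-level situations the description separates.

```c
#include <stdlib.h>

/* Hypothetical global, used only for illustration.  */
int *demo_global;

/* The allocation leaves this function only through the return value.  */
int *
demo_escape_via_return (void)
{
  int *p = malloc (sizeof *p);
  if (p)
    *p = 1;
  return p;
}

/* Here the allocation is additionally stored to global memory, so it is
   reachable by any code that can read demo_global.  */
int *
demo_escape_via_global (void)
{
  int *p = malloc (sizeof *p);
  if (p)
    *p = 2;
  demo_global = p;
  return p;
}
```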
vect: Avoid duplicate_and_interleave for uniform vectors [PR112661]
can_duplicate_and_interleave_p checks whether we know a way of
building a particular VLA SLP invariant. g:60034ecf25597bd515f
skipped that test for booleans, to support MASK_LEN_GATHER_LOAD
calls with a dummy all-ones mask. But there's nothing fundamentally
different about VLA masks vs VLA data vectors. If we have a VLA mask
that isn't all-ones, we need some way of loading it. This ultimately
led to the ICE in the PR.
This patch fixes it by applying can_duplicate_and_interleave_p
to masks, while also adding a special path for uniform vectors
(of all kinds) to support the MASK_LEN_GATHER_LOAD usage. This
also fixes an XFAIL in pr36648.cc for SVE.
The patch is mostly Richard's. My only changes were to skip
redundant conversions and to use gimple_build_vector_from_val
for all eligible vectors.
2023-11-27 Richard Biener <rguenther@suse.de>
Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/112661
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Defer duplicate-and-
interleave test to...
(vect_build_slp_tree_2): ...here, once we have all the operands.
Skip the test for uniform vectors.
(vect_create_constant_vectors): Detect uniform vectors. Avoid
redundant conversions in that case. Use gimple_build_vector_from_val
to build the vector.
gcc/testsuite/
* g++.dg/vect/pr36648.cc: Remove XFAIL for VLA load-lanes.
Commit 248df13b966f46649e16dc3c8c92b263790ef503 restricted the rotate
count to immediates. Although the documentation of vec_rli (Vector
Element Rotate Left Immediate) can be read as if it were restricted to
immediates, this is not the case. Thus, revert this commit.
In order to finally allow register operands, the rotate count must be of
type unsigned char since the expander expects it to be of mode QI. The
previously used type unsigned integer worked out for immediates since
those are of VOID mode anyway.
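As an illustration of a rotate whose count is a run-time value of type
unsigned char, consider the scalar model below. It is a sketch with
hypothetical demo_* names, not the s390 builtin, and it assumes that the
count is taken modulo the element width.

```c
#include <stdint.h>

/* Rotate one 32-bit element left by COUNT bits (modulo 32).  */
static uint32_t
demo_rotl32 (uint32_t x, unsigned char count)
{
  unsigned int n = count % 32u;
  return n == 0 ? x : (x << n) | (x >> (32u - n));
}

/* Apply the same rotate count, held in a variable, to every element.  */
void
demo_rotate_elements (uint32_t v[4], unsigned char count)
{
  for (int i = 0; i < 4; i++)
    v[i] = demo_rotl32 (v[i], count);
}
```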
Alex Coplan [Fri, 17 Mar 2023 16:30:51 +0000 (16:30 +0000)]
c-family: Implement __has_feature and __has_extension [PR60512]
This patch implements clang's __has_feature and __has_extension in GCC.
Currently the patch aims to implement all documented features (and some
undocumented ones) following the documentation at
https://clang.llvm.org/docs/LanguageExtensions.html with the exception
of the legacy features for C++ type traits. These are omitted, since as
the clang documentation notes, __has_builtin is the correct "modern" way
to query for these (which GCC already implements).
gcc/c-family/ChangeLog:
PR c++/60512
* c-common.cc (struct hf_feature_info): New.
(c_common_register_feature): New.
(init_has_feature): New.
(has_feature_p): New.
* c-common.h (c_common_has_feature): New.
(c_family_register_lang_features): New.
(c_common_register_feature): New.
(has_feature_p): New.
* c-lex.cc (init_c_lex): Plumb through has_feature callback.
(c_common_has_builtin): Generalize and move common part ...
(c_common_lex_availability_macro): ... here.
(c_common_has_feature): New.
* c-ppoutput.cc (init_pp_output): Plumb through has_feature.
PR c++/60512
* c-c++-common/has-feature-common.c: New test.
* c-c++-common/has-feature-pedantic.c: New test.
* g++.dg/ext/has-feature.C: New test.
* gcc.dg/asan/has-feature-asan.c: New test.
* gcc.dg/has-feature.c: New test.
* gcc.dg/ubsan/has-feature-ubsan.c: New test.
* obj-c++.dg/has-feature.mm: New test.
* objc.dg/has-feature.m: New test.
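Typical usage follows the idiom from the clang documentation, sketched
below. The fallback definitions keep the code compiling on compilers that
lack these operators; the demo_built_with_asan function is hypothetical,
and address_sanitizer is used as the feature name because it is a
documented clang feature and the new asan test suggests it is covered.

```c
/* Fallbacks for compilers without __has_feature / __has_extension.  */
#ifndef __has_feature
# define __has_feature(x) 0
#endif
#ifndef __has_extension
# define __has_extension(x) __has_feature (x)
#endif

/* Return 1 if this translation unit was compiled with AddressSanitizer
   enabled according to __has_feature, and 0 otherwise.  */
int
demo_built_with_asan (void)
{
#if __has_feature (address_sanitizer)
  return 1;
#else
  return 0;
#endif
}
```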
Currently for an unsigned 16-bit comparison between memory and an
immediate where the high bit is set, a clc is emitted. This is because
the constant is created for mode HI and therefore sign extended. This
means constraint D does not hold anymore. Since the mode already
restricts the immediate to 16 bit, it is enough to make use of
constraint n and chop off the high bits in the output template.
gcc/ChangeLog:
* config/s390/s390.md (*cmphi_ccu): For immediate operand 1 make
use of constraint n instead of D and chop off the high bits in the
output template.
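The sign-extension effect can be seen in a small sketch. The demo_* names
are hypothetical and this models only the arithmetic, not the machine
description: a 16-bit constant with the high bit set becomes negative when
held sign-extended in a wider integer, and masking with 0xffff recovers
the unsigned 16-bit value.

```c
#include <stdint.h>

/* Hold a 16-bit pattern the way a sign-extended HImode constant would be
   held in a wider host integer.  The conversion to int16_t relies on the
   modulo behaviour GCC documents for out-of-range values.  */
int64_t
demo_hi_const (uint16_t bits)
{
  return (int16_t) bits;
}

/* Chop off everything above the low 16 bits.  */
unsigned int
demo_chop_to_16 (int64_t value)
{
  return (unsigned int) (value & 0xffff);
}
```

For example, the pattern 0x8000 is held as -32768, which no longer fits an
unsigned 16-bit range check, while the chopped value is 0x8000 again.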
Jakub Jelinek [Mon, 27 Nov 2023 08:22:20 +0000 (09:22 +0100)]
mips: Fix up mips*-sde-elf* build [PR112300]
As reported in the PR, mipsisa64r2-sde-elf doesn't build because HEAP_TRAMPOLINES_INIT
macro isn't defined anywhere.
It is normally defined by
# Figure out if we need to enable heap trampolines by default
case ${target} in
*-*-darwin2*)
# Currently, we do this for macOS 11 and above.
tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=1"
;;
*)
tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=0"
;;
esac
in config.gcc, but mips*-sde-elf* is the only target which overwrites
tm_defines shell variable rather than just appending to it (or in one case
prepending), all other targets append something to it, including other
mips* triplets.
I believe (just from looking at config.gcc) that the difference is that
LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3 LIBC_MUSL=4 HEAP_TRAMPOLINES_INIT=0
isn't defined without the patch and is with the patch.
I think defining those first 4 shouldn't cause any harm and defining the
last one is required for it to actually build at all.
2023-11-27 Jakub Jelinek <jakub@redhat.com>
PR target/112300
* config.gcc (mips*-sde-elf*): Append to tm_defines rather than
overwriting them.
Juzhe-Zhong [Sat, 25 Nov 2023 08:24:32 +0000 (16:24 +0800)]
RISC-V: Remove incorrect function gate gather_scatter_valid_offset_mode_p
While reviewing the gather/scatter code again, I noticed that gather_scatter_valid_offset_mode_p looks odd.
gather_scatter_valid_offset_mode_p is supposed to block vluxei64/vsuxei64 on RV32 systems.
However, it fails to do that since it is passed the data mode instead of the index mode:
riscv_vector::gather_scatter_valid_offset_mode_p (<RATIO2:MODE>mode)
It should be RATIO2I instead of RATIO2.
So we have the following iterators, which can already block this situation:
Tsukasa OI [Sat, 21 Oct 2023 05:29:06 +0000 (05:29 +0000)]
RISC-V: Initial RV64E and LP64E support
Along with RV32E, RV64E is ratified. Though the ILP32E and LP64E ABIs are
still drafts, they are worth supporting.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc
(riscv_ext_version_table): Set version to ratified 2.0.
(riscv_subset_list::parse_std_ext): Allow RV64E.
* config.gcc: Parse base ISA 'rv64e' and ABI 'lp64e'.
* config/riscv/arch-canonicalize: Parse base ISA 'rv64e'.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Define different macro per XLEN. Add handling for ABI_LP64E.
* config/riscv/riscv-d.cc (riscv_d_handle_target_float_abi):
Add handling for ABI_LP64E.
* config/riscv/riscv-opts.h (enum riscv_abi_type): Add ABI_LP64E.
* config/riscv/riscv.cc (riscv_option_override): Enhance error
handling to support RV64E and LP64E.
(riscv_conditional_register_usage): Change "RV32E" in a comment
to "RV32E/RV64E".
* config/riscv/riscv.h
(UNITS_PER_FP_ARG): Add handling for ABI_LP64E.
(STACK_BOUNDARY): Ditto.
(ABI_STACK_BOUNDARY): Ditto.
(MAX_ARGS_IN_REGISTERS): Ditto.
(ABI_SPEC): Add support for "lp64e".
* config/riscv/riscv.opt: Parse -mabi=lp64e as ABI_LP64E.
* doc/invoke.texi: Add documentation of the LP64E ABI.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-1.c: Test for __riscv_64e.
* gcc.target/riscv/predef-2.c: Ditto.
* gcc.target/riscv/predef-3.c: Ditto.
* gcc.target/riscv/predef-4.c: Ditto.
* gcc.target/riscv/predef-5.c: Ditto.
* gcc.target/riscv/predef-6.c: Ditto.
* gcc.target/riscv/predef-7.c: Ditto.
* gcc.target/riscv/predef-8.c: Ditto.
* gcc.target/riscv/predef-9.c: New test for RV64E and LP64E,
based on predef-7.c.
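A minimal sketch of how code could key off the per-XLEN macros: __riscv_64e is the macro named in the testsuite entries above, while __riscv_32e as its RV32E counterpart is an assumption here, and the helper name is hypothetical.

```cpp
#include <cassert>

// Returns the XLEN of the E base ISA in use, or 0 when the target is not
// an RV32E/RV64E configuration (including non-RISC-V hosts).
constexpr int riscv_e_base_xlen() {
#if defined(__riscv_64e)
  return 64;
#elif defined(__riscv_32e)  // assumed RV32E counterpart
  return 32;
#else
  return 0;
#endif
}
```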
Jose E. Marchesi [Mon, 27 Nov 2023 06:20:21 +0000 (07:20 +0100)]
bpf: remove bpf-helpers.h
Now that we are finally able to use the kernel provided bpf_helpers.h
file and associated machinery, there is no longer any need to distribute
our own version.
This patch removes bpf-helpers.h and deletes most of the associated
tests from the gcc.target/bpf testsuite. Two tests are adapted and
retained: one testing the kernel_helper attribute, which is still
useful, and the other making sure that proper constant propagation is
performed with -O2, which is necessary to use the helpers defined as
static pointers in the kernel's bpf_helpers.h.
Regtested in target bpf-unknown-none and host x86_64-linux-gnu.
testsuite/gcc.dg/uninit-pred-9_b.c:20: Fix XPASS for various targets
The xfail for "*-*-*" here, set in r14-4089-gd45ddc2c04e471
"tree-optimization/111294 - backwards threader PHI costing"
was somewhat too general and made this test XPASS for a
number of targets. The common factor for those targets is
that they either explicitly or by default define
LOGICAL_OP_NON_SHORT_CIRCUIT as 0 (see fold-const.cc).
Instead of changing *-*-* to a seemingly random set of
xfailed targets or inventing a new testsuite
effective-target predicate for logical-op-short-circuited
targets or the opposite, let's just force a setting that
removes the need for the xfail for all targets, by
overriding with --param=logical-op-non-short-circuit=0.
* gcc.dg/uninit-pred-9_b.c: Remove xfail for line 20. Pass
--param=logical-op-non-short-circuit=0. Comment why.
testsuite/gcc.dg/uninit-pred-9_b.c:23: Un-xfail for MMIX
In a recent all-target test-round investigating XPASSes for
this file, I noticed this line XPASSing for MMIX. From the
commit history it's obvious it was left out from related
target-xfail tweaks, now the last target xfailing a bogus
warning for this line.
* gcc.dg/uninit-pred-9_b.c: Remove xfail for MMIX from line 23.
The new test at gcc.target/i386/pr112686.c fails on darwin with:
Excess errors:
cc1: error: '-fsplit-stack' currently only supported on GNU/Linux
cc1: error: '-fsplit-stack' is not supported by this compiler configuration
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112686.c: Add a requirement for split_stack.
Andrew Pinski [Sun, 26 Nov 2023 04:24:56 +0000 (20:24 -0800)]
Fix gcc.dg/vla-1.c
r14-5628-g53ba8d669550d3 added noipa to f1 but `-fno-ipa-vrp` should have been used
instead. The testcase tests the clone of f1, so turning off IPA VRP is the
correct approach here rather than turning off IPA on the function.
gcc/testsuite/ChangeLog:
PR testsuite/112691
* gcc.dg/vla-1.c: Add -fno-ipa-vrp.
Remove noipa from f1.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 26 Nov 2023 02:50:46 +0000 (18:50 -0800)]
Fix gcc.target/aarch64/simd/vmulxd_{f64,f32}_2.c after IPA-VRP improvement for return values
Just like the patch against gcc.target/aarch64/movk.c, the issue here
is that the two functions, foo32 and foo64, need to be marked as noipa so
that IPA-VRP cannot propagate the return value.
gcc/testsuite/ChangeLog:
PR testsuite/112688
* gcc.target/aarch64/simd/vmulx.x (foo32): Mark as noipa rather
than noinline.
(foo4): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Fri, 24 Nov 2023 02:55:30 +0000 (18:55 -0800)]
Fix contracts-tmpl-spec2.C on targets where plain char is unsigned by default
Since contracts-tmpl-spec2.C is just testing contracts, I thought it would be better
to just add `-fsigned-char` to the options rather than change the testcase to support
both cases.
Committed after testing on aarch64-linux-gnu.
gcc/testsuite/ChangeLog:
PR testsuite/108321
* g++.dg/contracts/contracts-tmpl-spec2.C: Add -fsigned-char
to options.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
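For context, whether plain char is signed can be observed as in the sketch below (an illustration only, unrelated to the contracts test itself; the helper names are hypothetical). Building with -fsigned-char makes the signed branch apply on every target.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// CHAR_MIN is negative exactly when plain char is a signed type.
constexpr bool plain_char_is_signed() { return CHAR_MIN < 0; }

static_assert(plain_char_is_signed() == std::is_signed<char>::value,
              "CHAR_MIN and std::is_signed<char> must agree");

// Widening a char exposes the difference: the byte 0xff becomes -1 when
// char is signed and 255 when it is unsigned.
int widen(char c) { return c; }
```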
Andrew Pinski [Wed, 22 Nov 2023 02:25:24 +0000 (18:25 -0800)]
Fix gcc.target/aarch64/movk.c testcase after IPA-VRP improvement for return values
The problem here is that dummy_number_generator returns a constant, which IPA VRP is
now able to propagate, so we need to mark the function as noipa to stop that.
gcc/testsuite/ChangeLog:
PR testsuite/112688
* gcc.target/aarch64/movk.c: Add noipa on dummy_number_generator
and remove -fno-inline option.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
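The pattern these testsuite fixes rely on looks roughly like the sketch below. The function body and the returned value are placeholders rather than the contents of movk.c, and the TEST_NOIPA macro is hypothetical; only the noipa attribute itself comes from the commit.

```cpp
#include <cassert>

// noipa is a GCC attribute; expand to nothing elsewhere.
#if defined(__GNUC__) && !defined(__clang__)
#define TEST_NOIPA __attribute__((noipa))
#else
#define TEST_NOIPA
#endif

// Without noipa, interprocedural passes may learn that this function
// always returns the same constant and fold it into the caller.
TEST_NOIPA unsigned long long dummy_number_generator() {
  return 0x1234ULL;  // placeholder value
}

// With noipa on the callee, the call below is kept and its result is
// treated as unknown when compiling the caller.
unsigned long long caller() { return dummy_number_generator() + 1; }
```

noinline alone is not enough for this purpose, because the return value range can still be propagated without inlining.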
Jakub Jelinek [Sat, 25 Nov 2023 09:31:55 +0000 (10:31 +0100)]
i386: Fix up *jcc_bt*_mask{,_1} [PR111408]
The following testcase is miscompiled in GCC 14 because the
*jcc_bt<mode>_mask and *jcc_bt<SWI48:mode>_mask_1 patterns have just
one argument in (match_operator 0 "bt_comparison_operator" [...])
but as bt_comparison_operator is eq,ne, we need two.
The md readers don't warn about it, after all, some checks can
be done in the predicate rather than specified explicitly, and the
behavior is that anything is accepted as the second argument.
I went through all other i386.md match_operator uses and all others
looked right (extract_operator using 3 operands, all others 2).
I think we'll want to fix this at different spots in older releases
because I think the bug was introduced already in 2008, though most
likely just latent.
2023-11-25 Jakub Jelinek <jakub@redhat.com>
PR target/111408
* config/i386/i386.md (*jcc_bt<mode>_mask,
*jcc_bt<SWI48:mode>_mask_1): Add (const_int 0) as expected
second operand of bt_comparison_operator.
Jakub Jelinek [Sat, 25 Nov 2023 09:30:39 +0000 (10:30 +0100)]
aarch64: Fix up aarch64_simd_stp<mode> [PR109977]
The aarch64_simd_stp<mode> pattern uses w constraint in one alternative and
r in another, but for the latter incorrectly uses <vw> iterator in %<vw>1 which
expands to %d1 for V2DF and %s1 for V2SF and V4SF (this one not relevant to
the pattern) and %w1 for others, so it ICEs if the alternative is selected
during final. Compared to this, <vwcore> macro has the same values for all
modes but uses w for V2DF and V2SF.
2023-11-24 Andrew Pinski <pinskia@gmail.com>
Jakub Jelinek <jakub@redhat.com>
PR target/109977
* config/aarch64/aarch64-simd.md (aarch64_simd_stp<mode>): Use <vwcore>
rather than %<vw> for alternative with r constraint on input operand.
Nathaniel Shead [Fri, 10 Nov 2023 03:28:40 +0000 (14:28 +1100)]
c++: more checks for exporting names with using-declarations
Currently only functions are directly checked for validity when
exporting via a using-declaration. This patch also checks exporting
non-external names of variables, types, and enumerators. This also
prevents ICEs with `export using enum` for internal-linkage enums.
While we're at it this patch also improves the error messages for these
cases to provide more context about what went wrong.
gcc/cp/ChangeLog:
* name-lookup.cc (check_can_export_using_decl): New.
(do_nonmember_using_decl): Use above to check if names can be
exported.
gcc/testsuite/ChangeLog:
* g++.dg/modules/using-10.C: New test.
* g++.dg/modules/using-enum-2.C: New test.
Nathaniel Shead [Mon, 13 Nov 2023 05:48:36 +0000 (16:48 +1100)]
c++: Allow exporting a typedef redeclaration [PR102341]
A typedef doesn't create a new entity, and thus should be allowed to be
exported even if it has been previously declared un-exported. See the
example in [module.interface] p6:
export module M;
struct S { int n; };
typedef S S;
export typedef S S; // OK, does not redeclare an entity
PR c++/102341
gcc/cp/ChangeLog:
* decl.cc (duplicate_decls): Allow exporting a redeclaration of
a typedef.
gcc/testsuite/ChangeLog:
* g++.dg/modules/export-1.C: Adjust test.
* g++.dg/modules/export-2_a.C: New test.
* g++.dg/modules/export-2_b.C: New test.
Lewis Hyatt [Wed, 1 Nov 2023 17:01:12 +0000 (13:01 -0400)]
preprocessor: Reinitialize frontend parser after loading a PCH [PR112319]
Since r14-2893, the frontend parser object needs to exist when running in
preprocess-only mode, because pragma_lex() is now called in that mode and
needs to make use of it. This is handled by calling c_init_preprocess() at
startup. If -fpch-preprocess is in effect (commonly, because of
-save-temps), a PCH file may be loaded during preprocessing, in which
case the parser will be destroyed, causing the issue noted in the
PR. Resolve it by reinitializing the frontend parser after loading the PCH.
gcc/c-family/ChangeLog:
PR pch/112319
* c-ppoutput.cc (cb_read_pch): Reinitialize the frontend parser
after loading a PCH.
gcc/testsuite/ChangeLog:
PR pch/112319
* g++.dg/pch/pr112319.C: New test.
* g++.dg/pch/pr112319.Hs: New test.
* gcc.dg/pch/pr112319.c: New test.
* gcc.dg/pch/pr112319.hs: New test.
Martin Jambor [Fri, 24 Nov 2023 16:32:35 +0000 (17:32 +0100)]
sra: SRA of non-escaped aggregates passed by reference to calls
PR109849 shows that a loop that heavily pushes and pops from a stack
implemented by a C++ std::vector results in slow code, mainly because the
vector structure is not split by SRA and so we end up with many loads
and stores into it. This is because it is passed by reference
to (re)allocation methods and so needs to live in memory, even though
it does not escape from them and so we could SRA it if we
re-constructed it before the call and then separated it to distinct
replacements afterwards.
This patch does exactly that, first relaxing the selection of
candidates to also include those which are addressable but do not
escape and then adding code to deal with the calls. The
micro-benchmark that is also the (scan-dump) testcase in this patch
runs twice as fast with it than with current trunk. Honza measured
its effect on the libjxl benchmark and it almost closes the
performance gap between Clang and GCC while not requiring excessive
inlining and thus code growth.
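The kind of loop in question can be sketched as follows. This is not the testcase from the patch, just an assumed minimal shape of it, and churn_stack is a hypothetical name: the vector is local, its address is passed to the (re)allocation path of push_back, but it does not escape.

```cpp
#include <cassert>
#include <vector>

// Pushes two values and pops them again in every round. The std::vector
// object itself (its begin/end/capacity pointers) is what SRA would like
// to keep in scalar replacements across the calls.
long churn_stack(int rounds) {
  std::vector<int> stack;
  long sum = 0;
  for (int i = 0; i < rounds; ++i) {
    stack.push_back(i);
    stack.push_back(i + 1);
    sum += stack.back();  // i + 1
    stack.pop_back();
    sum += stack.back();  // i
    stack.pop_back();
  }
  return sum;  // sum of (2 * i + 1) for i in [0, rounds), i.e. rounds * rounds
}
```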
The patch disallows creation of replacements for such aggregates which
are also accessed with a precision smaller than their size because I
have observed that this led to excessive zero-extending of data
leading to slow-downs of perlbench (on some CPUs). Apart from this
case I have not noticed any regressions, at least not so far.
Gimple call argument flags can tell if an argument is unused (and then
we do not need to generate any statements for it) or if it is not
written to and then we do not need to generate statements loading
replacements from the original aggregate after the call statement.
Unfortunately, we cannot symmetrically use flags saying that an aggregate
is not read to avoid re-constructing the aggregate before the call,
because the flags don't tell which parts of the aggregate were not
written to, so we load all replacements, and so all of them need to have
the correct value before the call.
This version of the patch also takes care to avoid attempts to modify
abnormal edges, something which was missing in the previous version.
gcc/ChangeLog:
2023-11-23 Martin Jambor <mjambor@suse.cz>
PR middle-end/109849
* tree-sra.cc (passed_by_ref_in_call): New.
(sra_initialize): Allocate passed_by_ref_in_call.
(sra_deinitialize): Free passed_by_ref_in_call.
(create_access): Add decl pool candidates only if they are not
already candidates.
(build_access_from_expr_1): Bail out on ADDR_EXPRs.
(build_access_from_call_arg): New function.
(asm_visit_addr): Rename to scan_visit_addr, change the
disqualification dump message.
(scan_function): Check taken addresses for all non-call statements,
including phi nodes. Process all call arguments, including the static
chain, build_access_from_call_arg.
(maybe_add_sra_candidate): Relax need_to_live_in_memory check to allow
non-escaped local variables.
(sort_and_splice_var_accesses): Disallow smaller-than-precision
replacements for aggregates passed by reference to functions.
(sra_modify_expr): Use a separate stmt iterator for adding statements
before the processed statement and after it.
(enum out_edge_check): New type.
(abnormal_edge_after_stmt_p): New function.
(sra_modify_call_arg): New function.
(sra_modify_assign): Adjust calls to sra_modify_expr.
(sra_modify_function_body): Likewise, use sra_modify_call_arg to
process call arguments, including the static chain.
Tobias Burnus [Fri, 24 Nov 2023 14:31:08 +0000 (15:31 +0100)]
OpenMP: Add -Wopenmp and use it
The new warning has two purposes: First, it makes clearer to the
user that it is about OpenMP and, secondly and more importantly,
it permits the use of -Wno-openmp.
The newly added -Wopenmp is enabled by default and replaces the
'0' (always warning) in several OpenMP-related warning calls.
For code shared with OpenACC, it only uses OPT_Wopenmp for
'flag_openmp | flag_openmp_simd'.
* lang.opt (Wopenmp): Add, enabled by default and documented in C.
* openmp.cc (gfc_match_omp_declare_target, resolve_positive_int_expr,
resolve_nonnegative_int_expr, resolve_omp_clauses,
gfc_resolve_omp_do_blocks): Use OPT_Wopenmp with gfc_warning{,_now}.
Richard Earnshaw [Mon, 20 Nov 2023 14:04:17 +0000 (14:04 +0000)]
arm: libgcc: provide implementations of __sync_synchronize
Prior to Armv6 there was no architected method to synchronize data
across processors. Armv6 saw the first introduction of
multi-processor support, using a CP15 operation; but this was
deprecated in Armv7 and is not supported on m-profile devices of any
form. Armv7 (and armv6-m) and later support data synchronization via
the DMB instruction.
This all leads to difficulties when linking programs as the user
generally needs to know which synchronization method is needed, but
there seems no easy way around this, when there are no OS-related
primitives available.
I've addressed this by adding multiple variants of __sync_synchronize
to libgcc, one for each of the above use cases. I've named these
__sync_synchronize_none, __sync_synchronize_cp15dmb and
__sync_synchronize_dmb. I've also added three specs files that can be
used to direct the linker to pick the appropriate implementation.
Using specs fragments for this is preferable to directing the user to
directly use --defsym as the latter has to be placed at the correct
position on the command line to be effective and the spec rule ensures
this automatically.
I've also added a default implementation of __sync_synchronize. The
default implementation will use DMB if that is available in the target
ISA, or fall back to a null implementation if it isn't. In the latter
case it will cause the linker (GNU LD) to emit a warning that
specifies how to pick a specific implementation. I've chosen not to
permit this default to use the CP15 solution as that has been
deprecated.
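For reference, a caller of the builtin might look like the sketch below. It only illustrates where the full barrier is placed and is exercised single-threaded; the variable and function names are hypothetical, and new code would normally use std::atomic instead of this legacy style. On targets where the compiler cannot expand the barrier inline, the call is expected to resolve to one of the libgcc implementations described above.

```cpp
#include <cassert>

static volatile int payload;
static volatile int ready;

// Store the data, issue a full barrier, then set the flag.
void publish(int value) {
  payload = value;
  __sync_synchronize();
  ready = 1;
}

// Check the flag, issue a full barrier, then read the data.
bool try_consume(int *out) {
  if (!ready)
    return false;
  __sync_synchronize();
  *out = payload;
  return true;
}
```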
libgcc:
* config.host (arm*-*-eabi* | arm*-*-rtems*):
Add arm/t-sync to the makefile rules.
* config/arm/lib1funcs.S (__sync_synchronize_none)
(__sync_synchronize_cp15dmb, __sync_synchronize_dmb)
(__sync_synchronize): New functions.
* config/arm/t-sync: New file.
* config/arm/sync-none.specs: Likewise.
* config/arm/sync-dmb.specs: Likewise.
* config/arm/sync-cp15dmb.specs: Likewise.
Tobias Burnus [Fri, 24 Nov 2023 14:10:49 +0000 (15:10 +0100)]
OpenMP: Accept argument to depobj's destroy clause
Since OpenMP 5.2, the destroy clause takes a depend object as its argument;
for the depobj directive, the new argument is optional but, if present,
it must be identical to the directive's argument.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_depobj): Accept optionally an argument
to the destroy clause.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_depobj): Accept optionally an argument
to the destroy clause.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_depobj): Accept optionally an argument
to the destroy clause.
libgomp/ChangeLog:
* libgomp.texi (5.2 Impl. Status): An argument to the destroy clause
is now supported.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/depobj-3.c: New test.
* gfortran.dg/gomp/depobj-3.f90: New test.
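A sketch of the newly accepted spelling, not copied from depobj-3.c: the function name is hypothetical, and the OpenMP part is only compiled when _OPENMP is defined, in which case a compiler that accepts the OpenMP 5.2 form is assumed.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

int depobj_roundtrip() {
  int x = 0;
#ifdef _OPENMP
  omp_depend_t obj;
  // Initialize the depend object with an inout dependence on x.
  #pragma omp depobj(obj) depend(inout: x)
  // Use it on a task; shared(x) makes the task write the caller's x.
  #pragma omp task depend(depobj: obj) shared(x)
  x = 1;
  #pragma omp taskwait
  // OpenMP 5.2 form: the optional argument repeats the directive's argument.
  #pragma omp depobj(obj) destroy(obj)
#else
  x = 1;  // Built without OpenMP: nothing to synchronize.
#endif
  return x;
}
```

Writing destroy without an argument remains valid; writing destroy with a different object than the one named on the directive is what the new check is meant to reject.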
By [basic.link] p3.2.1, a non-template non-volatile const-qualified
variable is not necessarily internal linkage in a module declaration,
and rather may have module linkage (or external linkage if it is
exported, see p4.8).
PR c++/99232
gcc/cp/ChangeLog:
* decl.cc (grokvardecl): Don't mark variables attached to
modules as internal.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr99232_a.C: New test.
* g++.dg/modules/pr99232_b.C: New test.
Juzhe-Zhong [Fri, 24 Nov 2023 08:34:28 +0000 (16:34 +0800)]
RISC-V: Fix inconsistency among all vectorization hooks
This patch fixes 200+ ICEs exposed by testing with rv64gc_zve64d.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112694
The root cause is that we disallow poly (1,1) size vectorization in preferred_simd_mode
with the following code:
- if (TARGET_MIN_VLEN < 128 && TARGET_MAX_LMUL < RVV_M2)
- return word_mode;
However, we allow poly (1,1) size in the hooks:
TARGET_VECTORIZE_RELATED_MODE
TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
and also enable it in all vectorization patterns.
I added this to preferred_simd_mode because a poly (1,1) size mode causes an
ICE in can_duplicate_and_interleave_p.
So the alternative approach would be to block poly (1,1) size in both the TARGET_VECTORIZE_RELATED_MODE
and TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES hooks and in all vectorization patterns,
which is an ugly approach that requires too many code changes.
Now, after investigation, I find that the loop vectorizer can automatically block poly (1,1)
size vectors in interleave vectorization with this commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=730909fa858bd691095bc23655077aa13b7941a9
So we don't need to worry about ICEs in interleave vectorization and can allow poly (1,1) size vectors
in vectorization, which fixes 200+ ICEs with the zve64d march.
Our system.h and configure.ac try to accommodate valgrind-3.1, but it is
more than 15 years old at this point. As Valgrind-based checking is a
developer-oriented feature, drop the compatibility stuff and streamline
the detection.
When top-level configure has either --enable-checking=valgrind or
--enable-valgrind-annotations, we want to activate a couple of workarounds
in libcpp. They do not use anything from the Valgrind API, so just
delete all detection.
Jakub Jelinek [Fri, 24 Nov 2023 11:12:20 +0000 (12:12 +0100)]
i386: Fix ICE during cbranchv16qi4 expansion [PR112681]
The following testcase ICEs, because cbranchv16qi4 expansion calls
ix86_expand_branch with op1 being a pre-AVX unaligned memory and
ix86_expand_branch emits a xorv16qi3 instruction without making sure
the operand predicates are satisfied.
I could manually check whether the argument (or both?) doesn't match
the vector_operand predicate (apparently this one or bcst_vector_operand
is used in all integral 16+ bytes *xorv*3 instructions) and force it into a
register, but as all gen_xorv*3 expanders call
ix86_expand_vector_logical_operator, it seems easier to just call that
function, which ensures the right thing happens. Calling the individual
gen_xorv*3 functions would mean an ugly switch on the modes, and using the
high-level expand_simple_binop here seems too high level to me.
2023-11-24 Jakub Jelinek <jakub@redhat.com>
PR target/112681
* config/i386/i386-expand.cc (ix86_expand_branch): Use
ix86_expand_vector_logical_operator to expand vector XOR rather than
gen_rtx_SET on gen_rtx_XOR.
Alex Coplan [Wed, 1 Nov 2023 21:45:39 +0000 (21:45 +0000)]
rtl-ssa: Add some helpers for removing accesses
This adds some helpers to access-utils.h for removing accesses from an
access_array. This is needed by the upcoming aarch64 load/store pair
fusion pass.
gcc/ChangeLog:
* rtl-ssa/access-utils.h (filter_accesses): New.
(remove_regno_access): New.
(check_remove_regno_access): New.
* rtl-ssa/accesses.cc (rtl_ssa::remove_note_accesses_base): Use
new filter_accesses helper.
Alex Coplan [Sat, 16 Sep 2023 08:23:52 +0000 (09:23 +0100)]
rtl-ssa: Support for inserting new insns
The upcoming aarch64 load pair pass needs to form store pairs, and can
re-order stores over loads when alias analysis determines this is safe.
In the case that both mem defs have uses in the RTL-SSA IR, and both
stores require re-ordering over their uses, we represent that as
(tentative) deletion of the original store insns and creation of a new
insn, to prevent requiring repeated re-parenting of uses during the
pass. We then update all mem uses that require re-parenting in one go
at the end of the pass.
To support this, RTL-SSA needs to handle inserting new insns (rather
than just changing existing ones), so this patch adds support for that.
New insns (and new accesses) are temporaries, allocated above a temporary
obstack_watermark, such that the user can easily back out of a change without
awkward bookkeeping.
gcc/ChangeLog:
* rtl-ssa/accesses.cc (function_info::create_set): New.
* rtl-ssa/accesses.h (access_info::is_temporary): New.
* rtl-ssa/changes.cc (move_insn): Handle new (temporary) insns.
(function_info::finalize_new_accesses): Handle new/temporary
user-created accesses.
(function_info::apply_changes_to_insn): Ensure m_is_temp flag
on new insns gets cleared.
(function_info::change_insns): Handle new/temporary insns.
(function_info::create_insn): New.
* rtl-ssa/changes.h (class insn_change): Make function_info a
friend class.
* rtl-ssa/functions.h (function_info): Declare new entry points:
create_set, create_insn. Declare new change_alloc helper.
* rtl-ssa/insns.cc (insn_info::print_full): Identify temporary insns in
dump.
* rtl-ssa/insns.h (insn_info): Add new m_is_temp flag and accompanying
is_temporary accessor.
* rtl-ssa/internals.inl (insn_info::insn_info): Initialize m_is_temp to
false.
* rtl-ssa/member-fns.inl (function_info::change_alloc): New.
* rtl-ssa/movement.h (restrict_movement_for_defs_ignoring): Add
handling for temporary defs.
Jakub Jelinek [Fri, 24 Nov 2023 10:32:28 +0000 (11:32 +0100)]
match.pd: Avoid simplification into invalid BIT_FIELD_REFs [PR112673]
The following testcase is lowered by the bitint lowering pass, then
vectorizer vectorizes one of the loops in it, so we have
vect__18.6_34 = VIEW_CONVERT_EXPR<vector(4) unsigned long>(x_35(D));
_8 = BIT_FIELD_REF <vect__18.6_34, 64, 0>;
...
_18 = BIT_FIELD_REF <vect__18.6_34, 64, 64>;
etc. where x_35(D) is a _BitInt(256) argument. That is a valid BIT_FIELD_REF,
the first argument is a vector and it extracts the vector elements from it.
Then comes forwprop4 and simplifies that using match.pd into
_8 = (unsigned long) x_35(D);
...
_18 = BIT_FIELD_REF <x_35(D), 64, 64>;
and tree-cfg verification ICEs on the latter (though, even the first cast
is kind of undesirable after bitint lowering, we want large/huge bitints
lowered). The ICE is because if BIT_FIELD_REF's first argument has
INTEGRAL_TYPE_P, we require type_has_mode_precision_p, but that is not the
case of _BitInt(256), it has BLKmode.
The following patch fixes it by doing the BIT_FIELD_REF with VCE to
BIT_FIELD_REF simplification only if the result is valid.
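As a plain C++ model of what the valid form computes (an illustration of the semantics only, not GIMPLE or compiler code; Wide256 and extract_lane64 are hypothetical names): BIT_FIELD_REF <v, 64, N> on a vector(4) unsigned long reads the 64-bit element that starts at bit N.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for a 256-bit value viewed as four 64-bit vector elements.
using Wide256 = std::array<std::uint64_t, 4>;

// Models BIT_FIELD_REF <v, 64, bit_offset> for element-aligned offsets.
std::uint64_t extract_lane64(const Wide256 &v, std::size_t bit_offset) {
  assert(bit_offset % 64 == 0 && bit_offset < 256);
  std::uint64_t lane;
  std::memcpy(&lane,
              reinterpret_cast<const unsigned char *>(v.data()) + bit_offset / 8,
              sizeof lane);
  return lane;
}
```

The extraction is well defined because the source is a vector of elements; the invalid case in the PR arises once the same reference is applied directly to the scalar _BitInt(256) operand.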
2023-11-24 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112673
* match.pd (bit_field_ref (vce @0) -> bit_field_ref @0): Only simplify
if either @0 doesn't have scalar integral type or if it has mode
precision.