Jose E. Marchesi [Fri, 14 Jul 2023 11:54:06 +0000 (13:54 +0200)]
bpf: enable instruction scheduling
This patch adds a dummy FSM to bpf.md in order to get INSN_SCHEDULING
defined. If the latter is not defined, the `combine' pass generates
paradoxical subregs of mems, which seems to then be mishandled by LRA,
resulting in invalid code.
Tested on bpf-unknown-none.
gcc/ChangeLog:
2023-07-14 Jose E. Marchesi <jose.marchesi@oracle.com>
Mikael Morin [Fri, 14 Jul 2023 12:15:51 +0000 (14:15 +0200)]
fortran: Reorder array argument evaluation parts [PR92178]
In the case of an array actual arg passed to a polymorphic array dummy
with INTENT(OUT) attribute, reorder the argument evaluation code to
the following:
- first evaluate arguments' values, and data references,
- deallocate data references associated with an allocatable,
intent(out) dummy,
- create a class container using the freed data references.
The ordering used to be incorrect between the first two items,
when one argument was deallocated before a later argument evaluated
its expression depending on the former argument. r14-2395-gb1079fc88f082d3c5b583c8822c08c5647810259 fixed it by treating
arguments associated with an allocatable, intent(out) dummy in a
separate, later block. This, however, wasn't working either if the data
reference of such an argument was depending on its own content, as
the class container initialization was trying to use deallocated
content.
This change generates class container initialization code in a separate
block, so that it is moved after the deallocation block without moving
the rest of the argument evaluation code.
This alone is not sufficient to fix the problem, because the class
container generation code repeatedly uses the full expression of
the argument at a place where deallocation might have happened
already. This is non-optimal, but may also be invalid, because the data
reference may depend on its own content. In that case the expression
can't be evaluated after the data has been deallocated.
As in the scalar case previously treated, this is fixed by saving
the data reference to a pointer before any deallocation happens,
and then only referring to the pointer. gfc_reset_vptr is updated
to take into account the already evaluated class container if it's
available.
Contrary to the scalar case, one hunk is needed to wrap the parameter
evaluation in a conditional, to avoid regressing in
optional_class_2.f90. This used to be handled by the class wrapper
construction which wrapped the whole code in a conditional. With
this change the class wrapper construction can't see the parameter
evaluation code, so the latter is updated with an additional handling
for optional arguments.
PR fortran/92178
gcc/fortran/ChangeLog:
* trans.h (gfc_reset_vptr): Add class_container argument.
* trans-expr.cc (gfc_reset_vptr): Ditto. If a valid vptr can
be obtained through class_container argument, bypass evaluation
of e.
(gfc_conv_procedure_call): Wrap the argument evaluation code
in a conditional if the associated dummy is optional. Evaluate
the data reference to a pointer now, and replace later
references with usage of the pointer.
Mikael Morin [Fri, 14 Jul 2023 12:15:21 +0000 (14:15 +0200)]
fortran: Factor data references for scalar class argument wrapping [PR92178]
In the case of a scalar actual arg passed to a polymorphic assumed-rank
dummy with INTENT(OUT) attribute, avoid repeatedly evaluating the actual
argument reference by saving a pointer to it. This is non-optimal, but
may also be invalid, because the data reference may depend on its own
content. In that case the expression can't be evaluated after the data
has been deallocated.
There are two ways redundant expressions are generated:
- parmse.expr, which contains the actual argument expression, is
reused to get or set subfields in gfc_conv_class_to_class.
- gfc_conv_class_to_class, to get the virtual table pointer associated
with the argument, generates a new expression from scratch starting
with the frontend expression.
The first part is fixed by saving parmse.expr to a pointer and using
the pointer instead of the original expression.
The second part is fixed by adding a separate field to gfc_se that
is set to the class container expression when the expression to
evaluate is polymorphic. This needs the same field in gfc_ss_info
so that its value can be propagated to gfc_conv_class_to_class which
is modified to use that value. Finally gfc_conv_procedure saves the
expression in that field to a pointer in between to avoid the same
problem as for the first part.
PR fortran/92178
gcc/fortran/ChangeLog:
* trans.h (struct gfc_se): New field class_container.
(struct gfc_ss_info): Ditto.
(gfc_evaluate_data_ref_now): New prototype.
* trans.cc (gfc_evaluate_data_ref_now): Implement it.
* trans-array.cc (gfc_conv_ss_descriptor): Copy class_container
field from gfc_se struct to gfc_ss_info struct.
(gfc_conv_expr_descriptor): Copy class_container field from
gfc_ss_info struct to gfc_se struct.
* trans-expr.cc (gfc_conv_class_to_class): Use class container
set in class_container field if available.
(gfc_conv_variable): Set class_container field on encountering
class variables or components, clear it on encountering
non-class components.
(gfc_conv_procedure_call): Evaluate data ref to a pointer now,
and replace later references by usage of the pointer.
Mikael Morin [Fri, 14 Jul 2023 12:15:07 +0000 (14:15 +0200)]
fortran: defer class wrapper initialization after deallocation [PR92178]
If an actual argument is associated with an INTENT(OUT) dummy, and code
to deallocate it is generated, generate the class wrapper initialization
after the actual argument deallocation.
This is achieved by passing a cleaned up expression to
gfc_conv_class_to_class, so that the class wrapper initialization code
can be isolated and moved independently after the deallocation.
PR fortran/92178
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_conv_procedure_call): Use a separate gfc_se
struct, initialized from parmse, to generate the class wrapper.
After the class wrapper code has been generated, copy it back
depending on whether parameter deallocation code has been
generated.
libgomp/
* libgomp.texi (OMP_ALLOCATOR): Document the default values for
the traits. Add crossref to 'Memory allocation'.
(Memory allocation): Refer to OMP_ALLOCATOR for the available
traits and allocators/mem spaces; document the default value
for the pool_size trait.
All tree arguments to the PHI have the same number of occurrences, namely 1;
however, it makes a big difference which comparison we test first.
Sorting only on occurrences we'll pick the compares coming from BB 18 and BB 17.
This means we end up generating 4 comparisons, while 2 would have been enough.
By keeping track of the "complexity" of the COND in each BB, (i.e. the number
of comparisons needed to traverse from the start [BB 15] to end [BB 19]) and
using a key tuple of <occurrences, complexity> we end up selecting the compare
from BB 16 and BB 18 first. BB 16 only requires 1 compare, and BB 18, after we
test BB 16 also only requires one additional compare. This change paired with
the one previous above results in the optimal 2 compares.
Most of the comparisons are still needed because the conditions in the
chain do not negate each other; e.g. _80 is _73 & vr_15 >= -20 and
_85 is _73 & vr_15 < -20. Clearly, given that _73 needs to be true in both
branches, the only additional test needed is on vr_15, where one test is the
negation of the other. So we don't need to do the comparison of _73 twice.
The changes in the patch reduces the overall number of compares by one, but has
a bigger effect on the dependency chain.
Previously we would generate a chain of 5 instructions:
As explained in that commit, ifconvert sorts PHI args in increasing number of
occurrences in order to reduce the number of comparisons done while
traversing the tree.
The remaining task that this patch fixes is dealing with the long chain of
comparisons that can be created from phi nodes, particularly when they share
any common successor (classical example is a diamond node).
On a PHI node the true and false branches each carry a condition: the true
edge carries `a` and the false edge `~a`. The issue is that at the moment GCC
tests both `a` and `~a` when the PHI node has more than 2 arguments. Clearly
this isn't needed. The deeper the nesting of PHI nodes, the larger the repetition.
As an example, for
foo (int *f, int d, int e)
{
for (int i = 0; i < 1024; i++)
{
int a = f[i];
int t;
if (a < 0)
t = 1;
else if (a < e)
t = 1 - a * d;
else
t = 0;
f[i] = t;
}
}
Which correctly elides the test of _21. This is done by borrowing the
vectorizer's helper functions to limit predicate mask usages. Ifcvt will chain
conditionals on the false edge (unless specifically inverted), so this patch,
on creating cond a ? b : c, will register ~a when traversing c. If c is a
conditional then c will be simplified to the smallest possible predicate given
the assumptions we already know to be true.
gcc/ChangeLog:
PR tree-optimization/109154
* tree-if-conv.cc (gen_simplified_condition,
gen_phi_nest_statement): New.
(gen_phi_arg_condition, predicate_scalar_phi): Use it.
gcc/testsuite/ChangeLog:
PR tree-optimization/109154
* gcc.dg/vect/vect-ifcvt-19.c: New test.
Richard Biener [Fri, 14 Jul 2023 08:01:39 +0000 (10:01 +0200)]
Provide extra checking for phi argument access from edge
The following adds checking that the edge we query an associated
PHI arg for is related to the PHI node. Triggered by questionable
code in one of my reviews.
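A sketch of the new accessor's likely shape (assumed for illustration, not the verbatim GCC source):

/* Fetch the PHI argument incoming via edge E, checking that E really
   is an incoming edge of the PHI's basic block.  */
static inline tree
gimple_phi_arg_def_from_edge (const gphi *phi, const edge e)
{
  gcc_checking_assert (e->dest == gimple_bb (phi));
  return gimple_phi_arg_def (phi, e->dest_idx);
}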
* gimple.h (gimple_phi_arg): New const overload.
(gimple_phi_arg_def): Make gimple arg const.
(gimple_phi_arg_def_from_edge): New inline function.
* tree-phinodes.h (gimple_phi_arg_imm_use_ptr_from_edge):
Likewise.
* tree-ssa-operands.h (PHI_ARG_DEF_FROM_EDGE): Direct to
new inline function.
(PHI_ARG_DEF_PTR_FROM_EDGE): Likewise.
libgomp: Fix allocator handling for Linux when libnuma is not available
Follow up to r14-2462-g450b05ce54d3f0. The case that libnuma was not
available at runtime was not properly handled; now it falls back to
the normal malloc.
libgomp/
* allocator.c (omp_init_allocator): Check whether symbol from
dlopened libnuma is available before using libnuma for
allocations.
Die Li [Fri, 14 Jul 2023 02:02:05 +0000 (02:02 +0000)]
RISC-V: Remove the redundant expressions in the and<mode>3.
When generating the gen_and<mode>3 function based on the and<mode>3
template, it produces the expression emit_insn (gen_rtx_SET (operand0,
gen_rtx_AND (<mode>, operand1, operand2))), which is identical to the
portion I removed in this patch. Therefore, the redundant portion can be
deleted.
Signed-off-by: Die Li <lidie@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/riscv.md: Remove redundant portion in and<mode>3.
When trying to bootstrap current trunk on macOS 14.0 beta 3 with Xcode
15 beta 4, the build failed running mklink in stage 2:
unset CC ; m2/boot-bin/mklink -s --langc++ --exit --name m2/mc-boot/main.cc
/vol/gcc/src/hg/master/darwin/gcc/m2/init/mcinit
dyld[55825]: Library not loaded: /vol/gcc/lib/libstdc++.6.dylib
While it's unclear to me why this only happens on macOS 14, the problem
is clear: unlike other C++ executables, mklink isn't linked with
-static-libstdc++ which is passed in from toplevel in LDFLAGS.
This patch fixes that and allows the build to continue.
Bootstrapped on x86_64-apple-darwin23.0.0, i386-pc-solaris2.11, and
sparc-sun-solaris2.11.
Mikael Morin [Thu, 13 Jul 2023 19:23:44 +0000 (21:23 +0200)]
fortran: Release symbols in reversed order [PR106050]
Release symbols in reversed order wrt the order they were allocated.
This fixes an error recovery ICE in the case of a misplaced
derived type declaration. Such a declaration creates nested
symbols, one for the derived type and one for each type parameter,
which should be immediately released as the declaration is
rejected. This breaks if the derived type is released first.
As the type parameter symbols are in the namespace of the derived
type, releasing the derived type releases the type parameters, so
one can't access them after that, even to release them. Hence,
the type parameters should be released first.
PR fortran/106050
gcc/fortran/ChangeLog:
* symbol.cc (gfc_restore_last_undo_checkpoint): Release symbols
in reverse order.
Darwin: Use -platform_version when available [PR110624].
Later versions of the static linker support a more flexible flag to
describe the OS, OS version and SDK used to build the code. This
replaces the functionality of '-mmacosx_version_min' (which is now
deprecated, leading to the diagnostic described in the PR).
We now use the platform_version flag when available which avoids the
diagnostic.
Carl Love [Thu, 13 Jul 2023 17:44:43 +0000 (13:44 -0400)]
rs6000, Add return value to __builtin_set_fpscr_rn
Change the return value from void to double for __builtin_set_fpscr_rn.
The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI
and RN, at their bit positions. A new test file, powerpc/test_fpscr_rn_builtin_2.c,
is added to test the new return value of the built-in.
The value __SET_FPSCR_RN_RETURNS_FPSCR__ is defined if
__builtin_set_fpscr_rn returns a double.
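A hedged usage sketch (the restore path is omitted and the exact field layout is target-defined):

void
set_round_to_nearest (void)
{
#ifdef __SET_FPSCR_RN_RETURNS_FPSCR__
  double saved = __builtin_set_fpscr_rn (0); /* new: captures old FPSCR fields */
  (void) saved;                              /* keep for a later restore */
#else
  __builtin_set_fpscr_rn (0);                /* older GCC: returns void */
#endif
}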
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/test_fpscr_rn_builtin.c: Rename to
test_fpscr_rn_builtin_1.c. Add comment.
* gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
return value of __builtin_set_fpscr_rn builtin.
Jonathan Wakely [Thu, 13 Jul 2023 09:44:57 +0000 (10:44 +0100)]
libstdc++: std::stoi etc. do not need C99 <stdlib.h> support [PR110653]
std::stoi, std::stol, std::stoul, and std::stod only depend on C89
functions, so they don't need to be guarded by _GLIBCXX_USE_C99_STDLIB.
std::stoll and std::stoull don't need C99 strtoll and strtoull if
sizeof(long) == sizeof(long long).
std::stold doesn't need C99 strtold if DBL_MANT_DIG == LDBL_MANT_DIG.
This only applies to the narrow character overloads; the wchar_t
overloads depend on a separate _GLIBCXX_USE_C99_WCHAR macro, and none of
them can be implemented in C89 easily.
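A sketch of the width-based fallback idea used for stoll (function name hypothetical, mirroring the guard in the ChangeLog below):

#include <stdlib.h>

/* When long and long long have the same width, C89 strtol covers the
   full range, so C99 strtoll is not needed.  */
long long
stoll_fallback (const char *s, char **end, int base)
{
#if __LONG_WIDTH__ == __LONG_LONG_WIDTH__
  return strtol (s, end, base);
#else
  return strtoll (s, end, base);
#endif
}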
libstdc++-v3/ChangeLog:
PR libstdc++/110653
* include/bits/basic_string.h (stoi, stol, stoul, stod): Do not
depend on _GLIBCXX_USE_C99_STDLIB.
[__LONG_WIDTH__ == __LONG_LONG_WIDTH__] (stoll, stoull): Define
in terms of stol and stoul respectively.
[__DBL_MANT_DIG__ == __LDBL_MANT_DIG__] (stold): Define in terms
of stod.
Andrew Pinski [Wed, 12 Jul 2023 07:33:14 +0000 (00:33 -0700)]
Fix part of PR 110293: `A NEEQ (A NEEQ CST)` part
This fixes part of PR 110293, for the outer comparison case
being `!=` or `==`. In turn PR 110539 is able to be optimized
again as the if statement for `(a&1) == ((a & 1) != 0)` gets optimized
to `false` early enough to allow FRE/DOM to do a CSE for memory store/load.
OK? Bootstrapped and tested on x86_64-linux with no regressions.
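A minimal example of the kind of expression this pattern family folds (a sketch, not taken from the PRs):

/* Folds to a constant: for x == 0 the test is 0 != 1, and for any
   nonzero x it is x != 0, so the result is always 1.  */
int
always_one (int x)
{
  return x != (x == 0);
}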
gcc/ChangeLog:
PR tree-optimization/110293
PR tree-optimization/110539
* match.pd: Expand the `x != (typeof x)(x == 0)`
pattern to handle cases where the inner and outer comparisons
are either `!=` or `==` and handle constants
other than 0.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr110293-1.c: New test.
* gcc.dg/tree-ssa/pr110539-1.c: New test.
* gcc.dg/tree-ssa/pr110539-2.c: New test.
* gcc.dg/tree-ssa/pr110539-3.c: New test.
* gcc.dg/tree-ssa/pr110539-4.c: New test.
[RA][PR109520]: Catch error when there are not enough registers for asm insn
An asm insn, unlike other insns, can have so many operands that their
constraints cannot be satisfied. This results in LRA cycling for such a
test case. The following patch catches this situation and reports the
problem.
PR middle-end/109520
gcc/ChangeLog:
* lra-int.h (lra_insn_recog_data): Add member asm_reloads_num.
(lra_asm_insn_error): New prototype.
* lra.cc: Include rtl_error.h.
(lra_set_insn_recog_data): Initialize asm_reloads_num.
(lra_asm_insn_error): New func whose code is taken from ...
* lra-assigns.cc (lra_split_hard_reg_for): ... here. Use lra_asm_insn_error.
* lra-constraints.cc (curr_insn_transform): Check reloads number for asm.
SSA MATH: Support COND_LEN_FMA for floating-point math optimization
Hi, Richard and Richi.
In a previous patch we supported COND_LEN_* binary operations; however, we
didn't support COND_LEN_* ternary operations.
Now, this patch supports COND_LEN_* ternary operations. Consider the following case:
__attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
TYPE *__restrict a, \
TYPE *__restrict b,\
TYPE *__restrict c, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] += a[i] * b[i]; \
}
David Edelsohn [Wed, 12 Jul 2023 18:31:20 +0000 (14:31 -0400)]
testsuite: dg-require LTO for libgomp LTO tests
Some test cases in libgomp testsuite pass -flto as an option, but
the testcases do not require LTO target support. This patch adds
the necessary DejaGNU requirement for LTO support to the testcases.
Pan Li [Wed, 12 Jul 2023 05:38:42 +0000 (13:38 +0800)]
RISC-V: Refactor riscv mode after for VXRM and FRM
When investigating the FRM dynamic rounding mode, we found that the global
unknown status is quite different between fixed-point and
floating-point. Thus, we separate the unknown handling and extract
some common inner functions.
We will also prepare more test cases in another patch.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv.cc (vxrm_rtx): New static var.
(frm_rtx): Ditto.
(global_state_unknown_p): Removed.
(riscv_entity_mode_after): Removed.
(asm_insn_p): New function.
(vxrm_unknown_p): New function for fixed-point.
(riscv_vxrm_mode_after): Ditto.
(frm_unknown_dynamic_p): New function for floating-point.
(riscv_frm_mode_after): Ditto.
(riscv_mode_after): Leverage new functions.
Pan Li [Wed, 12 Jul 2023 15:01:39 +0000 (23:01 +0800)]
RISC-V: Add more tests for RVV floating-point FRM.
Add more test cases include both the asm check and run for RVV FRM.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-frm-insert-10.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-8.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-9.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-1.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-2.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-3.c: New test.
Kewen Lin [Thu, 13 Jul 2023 02:23:22 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS
This patch adjusts the cost handling on VMAT_CONTIGUOUS in
function vectorizable_load. We don't call function
vect_model_load_cost for it any more. It removes function
vect_model_load_cost which becomes useless and unreachable
now.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_load_cost): Remove.
(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS without
calling vect_model_load_cost.
Kewen Lin [Thu, 13 Jul 2023 02:23:22 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE
This patch adjusts the cost handling on
VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load. We
don't call function vect_model_load_cost for it any more.
As the affected test case gcc.target/i386/pr70021.c shows,
the previous costing can under-cost the total generated
vector loads, since for VMAT_CONTIGUOUS_PERMUTE function
vect_model_load_cost doesn't consider the group size, which
is used as vec_num during the transformation.
This patch makes the count of vector loads in costing
consistent with what we generate during the transformation.
To be more specific, for the given test case, for the memory
access b[i_20], we costed 2 vector loads before;
with this patch it costs 8 instead, matching the final
count of vector loads generated from b. This costing
change makes the cost model analysis conclude it's not profitable
to vectorize the first loop, so this patch adjusts the test
case to run without the vect cost model.
Note that this test case also exposes something we can
improve further: although the numbers of vector
permutations we costed and generated are consistent,
DCE can further optimize some unused permutations out;
it would be good if we could predict that and generate only
the necessary permutations.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_load_cost): Assert this function only
handle memory_access_type VMAT_CONTIGUOUS, remove some
VMAT_CONTIGUOUS_PERMUTE related handlings.
(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE
without calling vect_model_load_cost.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr70021.c: Adjust with -fno-vect-cost-model.
Kewen Lin [Thu, 13 Jul 2023 02:23:22 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE
This patch adjusts the cost handling on
VMAT_CONTIGUOUS_REVERSE in function vectorizable_load. We
don't call function vect_model_load_cost for it any more.
This change makes us not miscount some required vector
permutation as the associated test case shows.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_load_cost): Assert it won't get
VMAT_CONTIGUOUS_REVERSE any more.
(vectorizable_load): Adjust the costing handling on
VMAT_CONTIGUOUS_REVERSE without calling vect_model_load_cost.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c: New test.
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES
This patch adjusts the cost handling on
VMAT_LOAD_STORE_LANES in function vectorizable_load. We
don't call function vect_model_load_cost for it any more.
It follows what we do in the function vect_model_load_cost,
and shouldn't have any functional changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on
VMAT_LOAD_STORE_LANES without calling vect_model_load_cost.
(vectorizable_load): Remove VMAT_LOAD_STORE_LANES related handling and
assert it will never get VMAT_LOAD_STORE_LANES.
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER
This patch adjusts the cost handling on VMAT_GATHER_SCATTER
in function vectorizable_load. We don't call function
vect_model_load_cost for it any more.
It's mainly for gather loads with IFN or emulated gather
loads; it follows the handling in function
vect_model_load_cost. This patch shouldn't have any
functional changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on
VMAT_GATHER_SCATTER without calling vect_model_load_cost.
(vect_model_load_cost): Adjust the assertion on VMAT_GATHER_SCATTER,
remove VMAT_GATHER_SCATTER related handlings and the related parameter
gs_info.
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
This patch adjusts the cost handling on VMAT_ELEMENTWISE
and VMAT_STRIDED_SLP in function vectorizable_load. We
don't call function vect_model_load_cost for them any more.
As PR82255 shows, we don't always need a vector construction
there; moving costing next to the transform lets us cost the
vector construction only when it's actually needed.
Besides, it can count the number of loads consistently for
some cases.
PR tree-optimization/82255
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling
on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP without calling
vect_model_load_cost.
(vect_model_load_cost): Assert it won't get VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP any more, and remove their related handlings.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New test.
2023-06-13 Bill Schmidt <wschmidt@linux.ibm.com>
Kewen Lin <linkw@linux.ibm.com>
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_INVARIANT
This patch adjusts the cost handling on VMAT_INVARIANT in
function vectorizable_load. We don't call function
vect_model_load_cost for it any more.
To make the costing on VMAT_INVARIANT better, this patch
queries hoist_defs_of_uses for the hoisting decision, and adds
costs for the different "where" values based on it. Currently function
hoist_defs_of_uses would always hoist the defs of all SSA
uses; adding one argument HOIST_P avoids the actual
hoisting during the costing phase.
gcc/ChangeLog:
* tree-vect-stmts.cc (hoist_defs_of_uses): Add one argument HOIST_P.
(vectorizable_load): Adjust the handling on VMAT_INVARIANT to respect
hoisting decision and without calling vect_model_load_cost.
(vect_model_load_cost): Assert it won't get VMAT_INVARIANT any more
and remove VMAT_INVARIANT related handlings.
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl
This patch adds one extra argument cost_vec to function
vect_build_gather_load_calls, so that we can do costing
next to the transform in vect_build_gather_load_calls.
For now, the implementation just follows the handling in
vect_model_load_cost; it isn't ideal, so a FIXME is placed
there for further improvement. This patch should not
cause any functional changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_build_gather_load_calls): Add the handlings
on costing with one extra argument cost_vec.
(vectorizable_load): Adjust the call to vect_build_gather_load_calls.
(vect_model_load_cost): Assert it won't get VMAT_GATHER_SCATTER with
gs_info.decl set any more.
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Move vect_model_load_cost next to the transform in vectorizable_load
This is an initial patch to move costing next to the
transform. It still adopts vect_model_load_cost for costing,
but moves and duplicates it down according to the handling
of the different vect_memory_access_types, hopefully making the
subsequent patches easier to review. This patch should not
have any functional changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Move and duplicate the call
to vect_model_load_cost down to some different transform paths
according to the handlings of different vect_memory_access_types.
Kewen Lin [Thu, 13 Jul 2023 02:22:26 +0000 (21:22 -0500)]
tree: Hide wi::from_mpz from GENERATOR_FILE
Similar to r0-85707-g34917a102a4e0c for PR35051, the uses
of mpz_t should be guarded with "#ifndef GENERATOR_FILE".
This patch fixes that and avoids some possible build errors.
gcc/ChangeLog:
* tree.h (wi::from_mpz): Hide from GENERATOR_FILE.
mklog: Add --append option to automatically add the generated ChangeLog to the patch file
This tiny patch adds an --append option to mklog.py that appends the
generated ChangeLog to the corresponding patch file. With this option there
is no need to manually copy the generated ChangeLog into the patch file, e.g.:
running `mklog.py --append /path/to/this/patch` will add the generated
ChangeLog at the right place in the /path/to/this/patch file.
RISC-V: Support gather_load/scatter_store RVV auto-vectorization
This patch fully supports gather_load/scatter_store:
1. Support single-rgroup on both RV32/RV64.
2. Support indexed element widths equal to or smaller than Pmode.
3. Support VLA SLP with gather/scatter.
4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8, both VLA and VLS.
5. Fix a bug in handling (subreg:SI (const_poly_int:DI)).
6. Fix a bug in vec_perm, which is used by gather/scatter SLP.
All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
We fully support these 4 kinds of gather/scatter:
1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask (Full vector).
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
3. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
4. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
Based on the discussions with Richards, we don't lower vlse/vsse in the RISC-V backend for strided load/store.
Instead, we leave it to the middle-end to handle that.
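A typical loop shape that now auto-vectorizes as a gather load (illustrative, not one of the added tests):

void
gather (float *restrict dst, float *restrict src,
        int *restrict idx, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[idx[i]];   /* indexed load -> LEN_MASK_GATHER_LOAD */
}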
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c:
New test.
Jonathan Wakely [Wed, 12 Jul 2023 13:40:19 +0000 (14:40 +0100)]
libstdc++: Check conversion from filesystem::path to wide strings [PR95048]
The testcase added for this bug only checks conversion from wide strings
on construction, but the fix also covered conversion to wide strings via
path::wstring(). Add checks for that, and for u16string() and u32string().
Jonathan Wakely [Fri, 7 Jul 2023 13:36:06 +0000 (14:36 +0100)]
libstdc++: Compile basic_file_stdio.cc for LFS
Instead of using fopen64, lseek64, and fstat64 we can just include
<bits/largefile-config.h> which defines _FILE_OFFSET_BITS=64 (and
similar target-specific macros). Then we can just use fopen, lseek and
fstat as normal, and they'll be the LFS versions if supported by the
target.
libstdc++-v3/ChangeLog:
* config/io/basic_file_stdio.cc: Define LFS macros.
(__basic_file<char>::open): Use fopen unconditionally.
(get_file_offset): Use lseek unconditionally.
(__basic_file<char>::seekoff): Likewise.
(__basic_file<char>::showmanyc): Use fstat unconditionally.
When configured with --enable-cstdio=stdio_pure we need to consistently
use fseek and not mix seeks on the file descriptor with reads and writes
on the FILE stream.
There are also a number of bugs related to error handling and return
values, because fread and fwrite return 0 on error, not -1, and fseek
returns 0 on success, not the file offset.
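A sketch of the get_file_offset idea (assumed shape, not the exact implementation):

#include <stdio.h>

/* Report the stream position via ftello/ftell instead of misreading
   fseek's return value, which is 0 on success rather than an offset.  */
static long long
get_file_offset (FILE *f)
{
#ifdef _GLIBCXX_USE_FSEEKO_FTELLO
  return ftello (f);   /* off_t-sized result, works with LFS */
#else
  return ftell (f);
#endif
}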
libstdc++-v3/ChangeLog:
PR libstdc++/110574
* acinclude.m4 (GLIBCXX_CHECK_LFS): Check for fseeko and ftello
and define _GLIBCXX_USE_FSEEKO_FTELLO.
* config.h.in: Regenerate.
* configure: Regenerate.
* config/io/basic_file_stdio.cc (xwrite) [_GLIBCXX_USE_STDIO_PURE]:
Check for fwrite error correctly.
(__basic_file<char>::xsgetn) [_GLIBCXX_USE_STDIO_PURE]: Check for
fread error correctly.
(get_file_offset): New function.
(__basic_file<char>::seekoff) [_GLIBCXX_USE_STDIO_PURE]: Use
fseeko if available. Use get_file_offset instead of return value
of fseek.
(__basic_file<char>::showmanyc): Use get_file_offset.
IRA+LRA: Change return type of predicate functions from int to bool
gcc/ChangeLog:
* ira.cc (equiv_init_varies_p): Change return type from int to bool
and adjust function body accordingly.
(equiv_init_movable_p): Ditto.
(memref_used_between_p): Ditto.
* lra-constraints.cc (valid_address_p): Ditto.
Aldy Hernandez [Thu, 29 Jun 2023 09:27:49 +0000 (11:27 +0200)]
[range-op] Enable value/mask propagation in range-op.
Throw the switch in range-ops to make full use of the value/mask
information instead of only the nonzero bits. This will cause most of
the operators implemented in range-ops to use the value/mask
information calculated by CCP's bit_value_binop() function which
range-ops uses. This opens up more optimization opportunities.
In follow-up patches I will change the global range setter
(set_range_info) to be able to save the value/mask pair, and make both
CCP and IPA be able to save the known ones bit info, instead of
throwing it away.
gcc/ChangeLog:
* range-op.cc (irange_to_masked_value): Remove.
(update_known_bitmask): Update irange value/mask pair instead of
only updating nonzero bits.
Jan Hubicka [Wed, 12 Jul 2023 15:22:03 +0000 (17:22 +0200)]
Improve profile update in loop-ch
Improve the profile update in loop-ch to handle the situation where the duplicated
header has a loop-invariant test. In this case we know that all of the count of the
exit edge belongs to the duplicated loop header edge and we can update probabilities
accordingly. Since we already do all the work to track this information from analysis
to duplication, I also added code to turn those conditionals into constants so we do
not need a later jump threading pass to clean up.
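A sketch of the shape in question (illustrative, not from the patch): the header test includes a loop-invariant operand, so after header duplication the in-loop copy of that test can be folded to a constant.

void
f (int *a, int n, int inv)
{
  /* `inv` is loop-invariant; the duplicated header decides it once.  */
  for (int i = 0; i < n && inv; i++)
    a[i] = 0;
}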
Working on this, I found that the propagation was buggy in a few aspects:
1) it handled every PHI as a PHI in the header and incorrectly classified
some PHIs as IV-like when they are not;
2) it did not check for novops calls, which are not required to return the
same value on every invocation;
3) I also added a check for asm statements, since those are not necessarily
reproducible either.
I would like to make more changes, but tried to prevent this patch from
snowballing. The analysis of which statements will remain after duplication can
be improved. I think we should use the ranger query for blocks other than the
first basic block too, and possibly drop the IV heuristics then. Also it seems
that a lot of this logic is pretty much the same as the analysis in the peeling
pass, so unifying the two would be nice.
I also think I should move the profile update out of
gimple_duplicate_sese_region (it is now very specific to ch) and rename it,
since those regions are single entry multiple exit.
Bootstrapped/regtested on x86_64-linux, OK?
Honza
gcc/ChangeLog:
* tree-cfg.cc (gimple_duplicate_sese_region): Add ORIG_ELIMINATED_EDGES
parameter and rewrite the profile updating code to handle edge elimination.
* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.cc (loop_invariant_op_p): New function.
(loop_iv_derived_p): New function.
(should_duplicate_loop_header_p): Track invariant exit edges; fix handling
of PHIs and propagation of IV derived variables.
(ch_base::copy_headers): Pass around the invariant edges hash set.
Recently, two identical XTheadCondMov tests have been added, which both fail.
Let's fix that by changing the following:
* Merge both files into one (no need for separate tests for rv32 and rv64)
* Drop unrelated attribute check test (we already test for `th.mveqz`
and `th.mvnez` instructions, so there is little additional value)
* Fix the pattern to allow matching
ifcvt: Change return type of predicate functions from int to bool
Also change some internal variables and function arguments from int to bool.
gcc/ChangeLog:
* ifcvt.cc (cond_exec_changed_p): Change variable to bool.
(last_active_insn): Change "skip_use_p" function argument to bool.
(noce_operand_ok): Change return type from int to bool.
(find_cond_trap): Ditto.
(block_jumps_and_fallthru_p): Change "fallthru_p" and
"jump_p" variables to bool.
(noce_find_if_block): Change return type from int to bool.
(cond_exec_find_if_block): Ditto.
(find_if_case_1): Ditto.
(find_if_case_2): Ditto.
(dead_or_predicable): Ditto. Change "reversep" function arg to bool.
(block_jumps_and_fallthru): Rename from block_jumps_and_fallthru_p.
(cond_exec_process_insns): Change return type from int to bool.
Change "mod_ok" function arg to bool.
(cond_exec_process_if_block): Change return type from int to bool.
Change "do_multiple_p" function arg to bool. Change "then_mod_ok"
variable to bool.
(noce_emit_store_flag): Change return type from int to bool.
Change "reversep" function arg to bool. Change "cond_complex"
variable to bool.
(noce_try_move): Change return type from int to bool.
(noce_try_ifelse_collapse): Ditto.
(noce_try_store_flag): Ditto. Change "reversep" variable to bool.
(noce_try_addcc): Change return type from int to bool. Change
"subtract" variable to bool.
(noce_try_store_flag_constants): Change return type from int to bool.
(noce_try_store_flag_mask): Ditto. Change "reversep" variable to bool.
(noce_try_cmove): Change return type from int to bool.
(noce_try_cmove_arith): Ditto. Change "is_mem" variable to bool.
(noce_try_minmax): Change return type from int to bool. Change
"unsignedp" variable to bool.
(noce_try_abs): Change return type from int to bool. Change
"negate" variable to bool.
(noce_try_sign_mask): Change return type from int to bool.
(noce_try_move): Ditto.
(noce_try_store_flag_constants): Ditto.
(noce_try_cmove): Ditto.
(noce_try_cmove_arith): Ditto.
(noce_try_minmax): Ditto. Change "unsignedp" variable to bool.
(noce_try_bitop): Change return type from int to bool.
(noce_operand_ok): Ditto.
(noce_convert_multiple_sets): Ditto.
(noce_convert_multiple_sets_1): Ditto.
(noce_process_if_block): Ditto.
(check_cond_move_block): Ditto.
(cond_move_process_if_block): Ditto. Change "success_p"
variable to bool.
(rest_of_handle_if_conversion): Change return type to void.
VECT: Apply COND_LEN_* into vectorizable_operation
Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* to the following situation in
"vectorizable_operation":
/* If operating on inactive elements could generate spurious traps,
we need to restrict the operation to active lanes. Note that this
specifically doesn't apply to unhoisted invariants, since they
operate on the same value for every lane.
Similarly, if this operation is part of a reduction, a fully-masked
loop should only change the active lanes of the reduction chain,
keeping the inactive lanes as-is. */
bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
|| reduc_idx >= 0);
When mask_out_inactive is true with length loop control,
we can handle the following 2 cases:
1. Integer division:
#define TEST_TYPE(TYPE) \
__attribute__((noipa)) \
void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] = a[i] % b[i]; \
}
#define TEST_ALL() \
TEST_TYPE(int8_t) \
TEST_ALL()
* libgomp.texi (OpenMP 5.0): Replace '... stub' by @ref to
'Memory allocation' section which contains the full status.
(TR11): Remove differently worded duplicated entry.
Roger Sayle [Wed, 12 Jul 2023 13:14:15 +0000 (14:14 +0100)]
i386: Fix FAIL of gcc.target/i386/pr91681-1.c
The recent change in TImode parameter passing on x86_64 results in the
FAIL of pr91681-1.c. The issue is that with the extra flexibility,
the combine pass is now spoilt for choice between using either the
*add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
patterns, when one operand is a *concat and the other is a zero_extend.
The solution proposed below is to provide an *add<dwi>3_doubleword_concat_zext
define_insn_and_split, that can benefit both from the register allocation
of *concat, and still avoid the xor normally required by zero extension.
I'm investigating a follow-up refinement to improve register allocation
further by avoiding the early clobber in the =&r, and handling (custom)
reloads explicitly, but this piece resolves the testcase failure.
2023-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/91681
* config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
define_insn_and_split derived from *add<dwi>3_doubleword_concat
and *add<dwi>3_doubleword_zext.
This patch fixes the regression PR target/110598 caused by my recent
addition of a peephole2. The intention of that optimization was to
simplify zeroing a register, followed by an IOR, XOR or PLUS operation
on it into a move, or as described in the comment:
;; Peephole2 rega = 0; rega op= regb into rega = regb.
The issue is that I'd failed to consider the (rare and unusual) case,
where regb is rega, where the transformation leads to the incorrect
"rega = rega", when it should be "rega = 0". The minimal fix is to
add a !reg_mentioned_p check to the recent peephole2.
In addition to resolving the regression, I've added a second peephole2
to optimize the problematic case above, which contains a false
dependency and is therefore tricky to optimize elsewhere. This is an
improvement over GCC 13, for example, which generates the redundant:
xorl %edx, %edx
xorq %rdx, %rdx
2023-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/110598
* config/i386/i386.md (peephole2): Check !reg_mentioned_p when
optimizing rega = 0; rega op= regb for op in [XOR,IOR,PLUS].
(peephole2): Simplify rega = 0; rega op= rega cases.
gcc/testsuite/ChangeLog
PR target/110598
* gcc.target/i386/pr110598.c: New test case.
Roger Sayle [Wed, 12 Jul 2023 13:09:54 +0000 (14:09 +0100)]
i386: Tweak ix86_expand_int_compare to use PTEST for vector equality.
I've come up with an alternate/complementary/supplementary fix to the
patch https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
for generating the PTEST during RTL expansion, rather than relying on
this being caught/optimized later during STV.
You'll notice in this patch, the tests for TARGET_SSE4_1 and TImode
appear last. When I was writing this, I initially also added support
for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
support 256-bit OImode (which also explains why we don't have an OImode
to V1OImode scalar-to-vector pass). Retaining this clause ordering
should minimize the lines changed if things change in future.
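The kind of comparison affected (a sketch, not the PR's testcase): a 128-bit value living in a vector register tested against zero, which can now expand directly to PTEST.

typedef unsigned long long v2di __attribute__ ((vector_size (16)));

/* Compare the 128-bit vector value against zero via its TImode view.  */
int
is_zero (v2di v)
{
  unsigned __int128 x;
  __builtin_memcpy (&x, &v, sizeof (x));
  return x == 0;
}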
2023-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_int_compare): If
testing a TImode SUBREG of a 128-bit vector register against
zero, use a PTEST instruction instead of first moving it to
a pair of scalar registers.
Robin Dapp [Mon, 10 Jul 2023 20:00:08 +0000 (22:00 +0200)]
genopinit: Allow more than 256 modes.
Upcoming changes for RISC-V will have us exceed 255 modes or 8 bits.
This patch increases the limit to 10 bits and adjusts the hashing
function for the gen* and optabs-query lookups accordingly.
Consequently, the number of optabs is limited to 4095.
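A sketch of the resulting lookup-key layout, inferred from the ChangeLog below (helper name hypothetical):

/* Pack an optab/mode-pair key: the optab index sits above two 10-bit
   mode fields, allowing up to 1024 machine modes and 4095 optabs.  */
static inline unsigned int
optab_key (unsigned int op, unsigned int mode0, unsigned int mode1)
{
  return (op << 20) | (mode0 << 10) | mode1;
}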
gcc/ChangeLog:
* genopinit.cc (main): Adjust maximal number of optabs and
machine modes.
* gensupport.cc (find_optab): Shift optab by 20 and mode by
10 bits.
* optabs-query.h (optab_handler): Ditto.
(convert_optab_handler): Ditto.
libgomp: Use libnuma for OpenMP's partition=nearest allocation trait
As with the memkind library, it is only used when found at runtime;
it does not need to be present when building GCC.
The included testcase does not check whether the memory has been placed
on the nearest node as the Linux kernel memory handling too often ignores
that hint, using a different node for the allocation. However, when
running with 'numactl --preferred=<node> ./executable', it is clearly
visible that the feature works by comparing malloc/default vs. nearest
placement (using get_mempolicy to obtain the node for a mem addr).
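A hedged usage sketch with the standard OpenMP allocator API (the omp.h names are standard; the libnuma backing is what this patch adds):

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  /* Ask for allocation on the NUMA node nearest to the allocating
     thread; with this patch libgomp backs this with libnuma's
     numa_alloc_local when the library is available at runtime.  */
  omp_alloctrait_t traits[] = { { omp_atk_partition, omp_atv_nearest } };
  omp_allocator_handle_t al
    = omp_init_allocator (omp_default_mem_space, 1, traits);
  double *p = (double *) omp_alloc (1024 * sizeof (double), al);
  printf ("allocated at %p\n", (void *) p);
  omp_free (p, al);
  omp_destroy_allocator (al);
  return 0;
}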
libgomp/ChangeLog:
* allocator.c: Add ifdef for LIBGOMP_USE_LIBNUMA.
(enum gomp_numa_memkind_kind): Renamed from gomp_memkind_kind;
add GOMP_MEMKIND_LIBNUMA.
(struct gomp_libnuma_data, gomp_init_libnuma, gomp_get_libnuma): New.
(omp_init_allocator): Handle partition=nearest with libnuma if avail.
(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
numa_alloc_local (+ memset), numa_free, and numa_realloc calls as
needed.
* config/linux/allocator.c (LIBGOMP_USE_LIBNUMA): Define.
* libgomp.texi: Fix a typo; use 'fi' instead of its ligature char.
(Memory allocation): Renamed from 'Memory allocation with libmemkind';
updated for libnuma usage.
* testsuite/libgomp.c-c++-common/alloc-11.c: New test.
* testsuite/libgomp.c-c++-common/alloc-12.c: New test.
gfortran: Allow ref'ing PDT's len() in parameter-initializer.
Fix the case where declaring a parameter initialized using a pdt_len
reference did not simplify the reference to a constant.
2023-07-12 Andre Vehreschild <vehre@gcc.gnu.org>
gcc/fortran/ChangeLog:
PR fortran/102003
* expr.cc (find_inquiry_ref): Replace len of pdt_string by
constant.
(simplify_ref_chain): Ensure input to find_inquiry_ref is
NULL.
(gfc_match_init_expr): Prevent PDT analysis for function calls.
(gfc_pdt_find_component_copy_initializer): Get the initializer
value for given component.
* gfortran.h (gfc_pdt_find_component_copy_initializer): New
function.
* simplify.cc (gfc_simplify_len): Replace len() of PDT with pdt
component ref or constant.
Richard Biener [Wed, 12 Jul 2023 09:19:58 +0000 (11:19 +0200)]
tree-optimization/110630 - enhance SLP permute support
The following enhances the existing lowpart extraction support for
SLP VEC_PERM nodes to cover all vector aligned extractions. This
allows the existing bb-slp-pr95839.c testcase to be vectorized
with mips -mpaired-single and the new bb-slp-pr95839-3.c testcase
with SSE2.
PR tree-optimization/110630
* tree-vect-slp.cc (vect_add_slp_permutation): New
offset parameter, honor that for the extract code generation.
(vectorizable_slp_permutation_1): Handle offsetted identities.
* gcc.dg/vect/bb-slp-pr95839.c: Make stricter.
* gcc.dg/vect/bb-slp-pr95839-3.c: New variant testcase.
RISC-V: Support integer mult highpart auto-vectorization
This patch adds an obvious missing mult_high auto-vectorization pattern.
Consider the following case:
void __attribute__ ((noipa)) \
mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count) \
{ \
for (int i = 0; i < count; ++i) \
dst[i] = src[i] / 17; \
}
T (int32_t) \
TEST_ALL (DEF_LOOP)
Before this patch:
mod_int32_t:
ble a2,zero,.L5
li a5,17
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.x v2,a5
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v1,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli a4,a5,2
vdiv.vv v1,v1,v2
sub a2,a2,a5
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret
After this patch:
mod_int32_t:
ble a2,zero,.L5
li a5,2021163008
addiw a5,a5,-1927
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.x v3,a5
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v2,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli a4,a5,2
vmulh.vv v1,v2,v3
sub a2,a2,a5
vsra.vi v2,v2,31
vsra.vi v1,v1,3
vsub.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret
Even though a single "vdiv" is lowered into "1 vmulh + 2 vsra + 1 vsub" and
4 more instructions are generated, we believe it's much better than before,
since division is very slow in the hardware.
gcc/ChangeLog:
* config/riscv/autovec.md (smul<mode>3_highpart): New pattern.
(umul<mode>3_highpart): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/mulh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c: New test.
Jan Beulich [Wed, 12 Jul 2023 10:16:08 +0000 (12:16 +0200)]
x86: improve fast bfloat->float conversion
There's nothing AVX512BW-ish in here, so no reason to use Yw as the
constraints for the AVX alternative. Furthermore, by using the 512-bit
form of VPSLLD (in a new alternative) all 32 registers can be used
directly by the insn without AVX512VL needing to be enabled.
Also adjust the originally last alternative's "prefix" attribute to
maybe_evex.
gcc/
* config/i386/i386.md (extendbfsf2_1): Add new AVX512F
alternative. Adjust original last alternative's "prefix"
attribute to maybe_evex.
Jan Beulich [Wed, 12 Jul 2023 10:14:08 +0000 (12:14 +0200)]
x86: make better use of VBROADCASTSS / VPBROADCASTD
... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
never longer (yet sometimes shorter) than the corresponding VSHUFPS /
VPSHUFD, due to the immediate operand of the shuffle insns balancing the
(uniform) need for VEX3 in the broadcast ones. When EVEX encoding is
in effect, the broadcast insns are always shorter.
Add new alternatives to cover the AVX2 and AVX512 cases as appropriate.
While touching this anyway, switch to consistently using "sseshuf1" in
the "type" attributes for all shuffle forms.
gcc/
* config/i386/sse.md (vec_dupv4sf): Make first alternative use
vbroadcastss for AVX2. New AVX512F alternative.
(*vec_dupv4si): New AVX2 and AVX512F alternatives using
vpbroadcastd. Replace sselog1 by sseshuf1 in "type" attribute.
riscv: thead: Factor out XThead*-specific peepholes
This patch moves the XThead*-specific peephole passes
into thead-peephole.md with the intent of keeping vendor-specific
code separated from standard RISC-V code.
This patch does not contain any functional changes.
gcc/ChangeLog:
* config/riscv/peephole.md: Remove XThead* peephole passes.
* config/riscv/thead.md: Include thead-peephole.md.
* config/riscv/thead-peephole.md: New file.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
RISC-V currently does not support index registers.
However, there are some vendor extensions that specify them.
Let's do the necessary changes in the backend so that we can
add support for such a vendor extension in the future.
This is a non-functional change without any intended side-effects.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_regno_ok_for_index_p):
New prototype.
(riscv_index_reg_class): Likewise.
* config/riscv/riscv.cc (riscv_regno_ok_for_index_p): New function.
(riscv_index_reg_class): New function.
* config/riscv/riscv.h (INDEX_REG_CLASS): Call new function
riscv_index_reg_class().
(REGNO_OK_FOR_INDEX_P): Call new function
riscv_regno_ok_for_index_p().
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
riscv: Move address classification info types to riscv-protos.h
enum riscv_address_type and struct riscv_address_info are used
to store address classification information. Let's move these types
into our common header file in order to share them with other
compilation units.
This is a non-functional change without any intended side-effects.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum riscv_address_type):
New location of type definition.
(struct riscv_address_info): Likewise.
* config/riscv/riscv.cc (enum riscv_address_type):
Old location of type definition.
(struct riscv_address_info): Likewise.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Define an Xmode macro that specifies the register size (XLEN),
similar to Pmode. This allows backend code to be written generically
for RV32/RV64 C code (under certain circumstances).
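An illustrative backend use (a sketch with a hypothetical helper, mirroring how Pmode is used):

/* An XLEN-wide scratch created in Xmode works unchanged on RV32 and
   RV64, just as Pmode does for pointer-sized values.  */
static rtx
riscv_force_xlen_const (HOST_WIDE_INT val)
{
  rtx tmp = gen_reg_rtx (Xmode);
  emit_move_insn (tmp, gen_int_mode (val, Xmode));
  return tmp;
}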
gcc/ChangeLog:
* config/riscv/riscv.h (Xmode): New macro.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
We have the following situation for MEM RTX objects:
* TARGET_PRINT_OPERAND expands to riscv_print_operand()
* This falls into the default case (unknown or on letter) of the outer
switch-case-block and the MEM case of the inner switch-case-block and
calls output_address() in final.cc with XEXP (op, 0) (the address)
* This calls targetm.asm_out.print_operand_address() which is
riscv_print_operand_address()
* riscv_print_operand_address() is targeting the address of a MEM RTX
* riscv_print_operand_address() calls riscv_print_operand() for the offset
and directly prints the register if the address is classified as ADDRESS_REG
* This falls into the default case (unknown or no letter) of the outer
switch-case-block and the default case of the inner switch-case-block and
calls output_addr_const().
However, since we know that the offset must be a CONST_INT (which will be
followed by a '(<reg>)' string), there is no need to call
riscv_print_operand() for the offset.
Instead we can take a shortcut and use output_addr_const().
This change also brings the code in riscv_print_operand_address()
in line with the other cases, where output_addr_const() is used
to print offsets.
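The resulting ADDRESS_REG case plausibly looks like this (a sketch
using the riscv_address_info field names; the real code may differ):
case ADDRESS_REG:
  /* The offset is known to be a CONST_INT here, so print it via
     output_addr_const and append the base register directly.  */
  output_addr_const (file, addr.offset);
  fprintf (file, "(%s)", reg_names[REGNO (addr.reg)]);
  return;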
Tested with the GCC regression test suite and SPEC intrate.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_print_operand_address): Use
output_addr_const rather than riscv_print_operand.
The current implementation triggers an assertion in
dwarf2out_frame_debug_cfa_offset() under certain circumstances.
The standard code uses REG_FRAME_RELATED_EXPR notes instead
of REG_CFA_OFFSET notes when saving registers on the stack.
So let's do this as well.
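A hedged sketch of the note-emission pattern (the insn/mem/reg
variables stand in for the actual prologue code in
th_mempair_save_regs):
/* Describe the store with a REG_FRAME_RELATED_EXPR note; dwarf2cfi
   then derives the CFA offsets from the SET itself.  */
rtx set = gen_rtx_SET (mem, reg);
RTX_FRAME_RELATED_P (set) = 1;
add_reg_note (insn, REG_FRAME_RELATED_EXPR, set);
RTX_FRAME_RELATED_P (insn) = 1;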
gcc/ChangeLog:
* config/riscv/thead.cc (th_mempair_save_regs):
Emit REG_FRAME_RELATED_EXPR notes in prologue.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
riscv: xtheadbb: Add sign/zero extension support for th.ext and th.extu
The current support of the bitfield-extraction instructions
th.ext and th.extu (XTheadBb extension) only covers sign_extract
and zero_extract. This patch adds support for sign_extend and
zero_extend to avoid any shifts for sign or zero extensions.
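For example (assuming RV64 with XTheadBb; the expected instructions in
the comments are a guess from the extension spec), both of these
should now become a single bitfield extract instead of a shift pair:
long
sext16 (long x)
{
  /* Sign-extend the low 16 bits: th.ext a0, a0, 15, 0.  */
  return (short) x;
}
unsigned long
zext32 (unsigned long x)
{
  /* Zero-extend the low 32 bits: th.extu a0, a0, 31, 0.  */
  return (unsigned int) x;
}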
gcc/ChangeLog:
* config/riscv/riscv.md: No base-ISA extension splitter for XThead*.
* config/riscv/thead.md (*extend<SHORT:mode><SUPERQI:mode>2_th_ext):
New XThead extension INSN.
(*zero_extendsidi2_th_extu): New XThead extension INSN.
(*zero_extendhi<GPR:mode>2_th_extu): New XThead extension INSN.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-ext-1.c: New test.
* gcc.target/riscv/xtheadbb-extu-1.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
liuhongt [Thu, 29 Jun 2023 06:25:28 +0000 (14:25 +0800)]
Break false dependence for vpternlog by inserting vpxor or setting constraint of input operand to '0'
A false dependency arises when the destination is only written by
vpternlog. There is no false dependency when the destination is also
used as a source. So either a vpxor should be inserted, or an input
operand should be tied with constraint '0'.
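As an illustration (not from the patch), one common way such a
vpternlog arises:
#include <immintrin.h>
/* The all-ones constant is typically materialized via vpternlogd with
   immediate 0xff, which writes the destination without reading it
   architecturally, yet still carries a false dependence on the
   register's previous value.  */
__m512i
all_ones (void)
{
  return _mm512_set1_epi32 (-1);
}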
gcc/ChangeLog:
PR target/110438
PR target/110202
* config/i386/predicates.md
(int_float_vector_all_ones_operand): New predicate.
* config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep): New
define_insn.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep):
Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep):
Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>): Adjust to
define_insn_and_split to avoid false dependence.
(*<avx512>_cvtmask2<ssemodesuffix><mode>): Ditto.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Adjust constraint
of operands 1 to '0' to avoid false dependence.
(*andnot<mode>3): Ditto.
(iornot<mode>3): Ditto.
(*<nlogic><mode>3): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110438.c: New test.
* gcc.target/i386/pr100711-6.c: Adjust testcase.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test.
Harald Anlauf [Tue, 11 Jul 2023 19:21:25 +0000 (21:21 +0200)]
Fortran: formal symbol attributes for intrinsic procedures [PR110288]
gcc/fortran/ChangeLog:
PR fortran/110288
* symbol.cc (gfc_copy_formal_args_intr): When deriving the formal
argument attributes from the actual ones for intrinsic procedure
calls, take special care of CHARACTER arguments so that we do not
wrongly treat them as deferred-length.
gcc/testsuite/ChangeLog:
PR fortran/110288
* gfortran.dg/findloc_10.f90: New test.
cfg+gcse: Change return type of predicate functions from int to bool
Also change some internal variables from int to bool.
gcc/ChangeLog:
* cfghooks.cc (verify_flow_info): Change "err" variable to bool.
* cfghooks.h (struct cfg_hooks): Change return type of
verify_flow_info from integer to bool.
* cfgrtl.cc (can_delete_note_p): Change return type from int to bool.
(can_delete_label_p): Ditto.
(rtl_verify_flow_info): Change return type from int to bool
and adjust function body accordingly. Change "err" variable to bool.
(rtl_verify_flow_info_1): Ditto.
(free_bb_for_insn): Change return type to void.
(rtl_merge_blocks): Change "b_empty" variable to bool.
(try_redirect_by_replacing_jump): Change "fallthru" variable to bool.
(verify_hot_cold_block_grouping): Change return type from int to bool.
Change "err" variable to bool.
(rtl_verify_edges): Ditto.
(rtl_verify_bb_insns): Ditto.
(rtl_verify_bb_pointers): Ditto.
(rtl_verify_bb_insn_chain): Ditto.
(rtl_verify_fallthru): Ditto.
(rtl_verify_bb_layout): Ditto.
(purge_all_dead_edges): Change "purged" variable to bool.
* cfgrtl.h (free_bb_for_insn): Change return type from int to void.
* postreload-gcse.cc (expr_hasher::equal): Change "equiv_p" to bool.
(load_killed_in_block_p): Change return type from int to bool
and adjust function body accordingly.
(oprs_unchanged_p): Return true/false.
(rest_of_handle_gcse2): Change return type to void.
* tree-cfg.cc (gimple_verify_flow_info): Change return type from
int to bool. Change "err" variable to bool.
Carl Love [Tue, 11 Jul 2023 16:28:47 +0000 (12:28 -0400)]
rs6000: Update the vsx-vector-6.* tests.
The vsx-vector-6.h file is included in the processor-specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c. The .h file
contains a large number of vsx vector built-in tests. The
processor-specific files contain the number of instructions that the
tests are expected to generate for that processor. The tests are compile
only.
This patch reworks the tests into a series of files, each covering a
group of related tests.
The new tests consist of a runnable test to verify the built-in argument
types and the functional correctness of each built-in. There is also a
compile only test that verifies the built-ins generate the expected number
of instructions for the various built-in tests.
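The runnable tests presumably follow the usual pattern of comparing a
built-in's result against expected values, roughly like this
hypothetical sketch (not one of the new files):
#include <altivec.h>
extern void abort (void);
int
main (void)
{
  vector double arg = { 2.0, -4.0 };
  vector double res = vec_abs (arg);
  /* Check the functional correctness of the built-in.  */
  if (res[0] != 2.0 || res[1] != 4.0)
    abort ();
  return 0;
}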
gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
testsuite: Require vectors of doubles for pr97428.c
The pr97428.c test assumes support for vectors of doubles, but some
targets only support vectors of floats, causing this test to fail with
such targets. So limit this test to targets that support vectors of
doubles.
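In DejaGnu terms this is presumably the usual effective-target gate:
/* { dg-require-effective-target vect_double } */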
gcc/testsuite/
* gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.
Gaius Mulley [Tue, 11 Jul 2023 14:28:42 +0000 (15:28 +0100)]
[modula2] Improve uninitialized variable analysis by combining basic blocks
This patch combines basic blocks for the static analysis of uninitialized
variables, provided that they are not the top of a loop, are not reached
by a conditional, and are not reached after a procedure call. It also
avoids checking array accesses during static analysis. Finally, the patch
adds switch modifiers to allow the static analysis to include conditional
branches in subsequent basic block analysis.
gcc/ChangeLog:
* doc/gm2.texi (-Wuninit-variable-checking=): New item.
gcc/m2/ChangeLog:
* gm2-compiler/M2BasicBlock.def (InitBasicBlocksFromRange): New
parameter ScopeSym.
* gm2-compiler/M2BasicBlock.mod (ConvertQuads2BasicBlock): New
parameter ScopeSym.
(InitBasicBlocksFromRange): New parameter ScopeSym. Call
ConvertQuads2BasicBlock with ScopeSym.
(DisplayBasicBlocks): Uncomment.
* gm2-compiler/M2Code.mod: Replace VariableAnalysis with
ScopeBlockVariableAnalysis.
(InitialDeclareAndOptiomize): Add parameter scope.
(SecondDeclareAndOptimize): Add parameter scope.
* gm2-compiler/M2GCCDeclare.mod (DeclareConstructor): Add scope
parameter to DeclareTypesConstantsProceduresInRange.
(DeclareTypesConstantsProceduresInRange): New parameter scope.
Pass scope to DisplayQuadRange. Reformatted.
* gm2-compiler/M2GenGCC.def (ConvertQuadsToTree): New parameter
scope.
* gm2-compiler/M2GenGCC.mod (ConvertQuadsToTree): New parameter
scope.
* gm2-compiler/M2Optimize.mod (KnownReachable): New parameter
scope.
* gm2-compiler/M2Options.def (SetUninitVariableChecking): Add
arg parameter.
* gm2-compiler/M2Options.mod (SetUninitVariableChecking): Add
arg parameter and set boolean UninitVariableChecking and
UninitVariableConditionalChecking.
(UninitVariableConditionalChecking): New boolean set to FALSE.
* gm2-compiler/M2Quads.def (IsGoto): New procedure function.
(DisplayQuadRange): Add scope parameter.
(LoopAnalysis): Add scope parameter.
* gm2-compiler/M2Quads.mod: Import PutVarArrayRef.
(IsGoto): New procedure function.
(LoopAnalysis): Add scope parameter and use MetaErrorT1 instead
of WarnStringAt.
(BuildStaticArray): Call PutVarArrayRef.
(BuildDynamicArray): Call PutVarArrayRef.
(DisplayQuadRange): Add scope parameter.
(GetM2OperatorDesc): Add relational condition cases.
* gm2-compiler/M2Scope.def (ScopeProcedure): Add parameter.
* gm2-compiler/M2Scope.mod (DisplayScope): Pass scopeSym to
DisplayQuadRange.
(ForeachScopeBlockDo): Pass scopeSym to p.
* gm2-compiler/M2SymInit.def (VariableAnalysis): Rename to ...
(ScopeBlockVariableAnalysis): ... this.
* gm2-compiler/M2SymInit.mod (ScopeBlockVariableAnalysis): Add
scope parameter.
(bbEntry): New pointer to record.
(bbArray): New array.
(bbFreeList): New variable.
(errorList): New list.
(IssueConditional): New procedure.
(GenerateNoteFlow): New procedure.
(IssueWarning): New procedure.
(IsUniqueWarning): New procedure.
(CheckDeferredRecordAccess): Re-implement.
(CheckBinary): Add warning and lst parameters.
(CheckUnary): Add warning and lst parameters.
(CheckXIndr): Add warning and lst parameters.
(CheckIndrX): Add warning and lst parameters.
(CheckBecomes): Add warning and lst parameters.
(CheckComparison): Add warning and lst parameters.
(CheckReadBeforeInitQuad): Add warning and lst parameters to all
Check procedures. Add all case quadruple clauses.
(FilterCheckReadBeforeInitQuad): Add warning and lst parameters.
(CheckReadBeforeInitFirstBasicBlock): Add warning and lst parameters.
(bbArrayKill): New procedure.
(DumpBBEntry): New procedure.
(DumpBBArray): New procedure.
(DumpBBSequence): New procedure.
(TestBBSequence): New procedure.
(CreateBBPermultations): New procedure.
(ScopeBlockVariableAnalysis): New procedure.
(GetOp3): New procedure.
(GenerateCFG): New procedure.
(NewEntry): New procedure.
(AppendEntry): New procedure.
(init): Initialize bbFreeList and errorList.
* gm2-compiler/SymbolTable.def (PutVarArrayRef): New procedure.
(IsVarArrayRef): New procedure function.
* gm2-compiler/SymbolTable.mod (SymVar): ArrayRef new field.
(MakeVar): Set ArrayRef to FALSE.
(PutVarArrayRef): New procedure.
(IsVarArrayRef): New procedure function.
* gm2-gcc/init.cc (_M2_M2SymInit_init): New prototype.
(init_PerCompilationInit): Add call to _M2_M2SymInit_init.
* gm2-gcc/m2options.h (M2Options_SetUninitVariableChecking):
New definition.
* gm2-lang.cc (gm2_langhook_handle_option): Add new case
OPT_Wuninit_variable_checking_.
* lang.opt: Wuninit-variable-checking= new entry.
gcc/testsuite/ChangeLog:
* gm2/switches/uninit-variable-checking/cascade/fail/cascadedif.mod: New test.
* gm2/switches/uninit-variable-checking/cascade/fail/switches-uninit-variable-checking-cascade-fail.exp:
New test.
* allocator.c (omp_init_allocator): Use malloc for
omp_high_bw_mem_space when the memkind lib is unavailable
instead of returning omp_null_allocator.
* libgomp.texi (OpenMP 5.0): Fix typo.
(Memory allocation with libmemkind): Document implementation
in more detail.
Patrick Palka [Tue, 11 Jul 2023 14:05:19 +0000 (10:05 -0400)]
c++: coercing variable template from current inst [PR110580]
Here during ahead of time coercion of the variable template-id v1<int>,
since we pass only the innermost arguments to coerce_template_parms (and
outer arguments are still dependent at this point), substitution of the
default template argument V=U just lowers U from level 2 to level 1 rather
than replacing it with int as expected. Thus after coercion we incorrectly
end up with (effectively) v1<int, T> instead of v1<int, int>.
Coercion of a class/alias template-id, on the other hand, always passes
all levels of arguments, which avoids this issue. So this patch makes us
do the same for variable template-ids.
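A hypothetical reduction of the problem (not the PR testcase): inside
the current instantiation, coercing v1<int> must substitute V = U with
U = int rather than merely lowering U to the enclosing level:
template<typename T>
struct A
{
  template<typename U, typename V = U>
  static constexpr bool v1 = sizeof (V) > 0;
  // With the bug this effectively instantiated v1<int, T>; for T = void
  // below, sizeof (V) would then be ill-formed.
  static_assert (v1<int>, "");
};
template struct A<void>;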
PR c++/110580
gcc/cp/ChangeLog:
* pt.cc (lookup_template_variable): Pass all levels of arguments
to coerce_template_parms, and use the parameters from the most
general template.
Notice here, we use dummy_mask = { -1, -1, ..., -1 }
2. Integer conditional division:
Similar to case (1), but with a condition:
void
f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * cond, int n)
{
for (int i = 0; i < n; ++i)
{
if (cond[i])
a[i] = b[i] / c[i];
}
}
Here we expect COND_LEN_DIV predicated by a real mask, which is the outcome of the comparison: mask__29.10_58 = vect__4.9_56 != { 0, ... };
and a real length, which is produced by the loop control: loop_len_55 = SELECT_VL
3. Conditional floating-point operations (without -ffast-math):
void
f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
{
for (int i = 0; i < n; ++i)
{
if (cond[i])
a[i] = b[i] + a[i];
}
}
ARM SVE IR:
max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });
4. Conditional reduction:
int32_t
f (int32_t *restrict a,
int32_t *restrict cond, int n)
{
int32_t result = 0;
for (int i = 0; i < n; ++i)
{
if (cond[i])
result += a[i];
}
return result;
}
I name these patterns "cond_len_*" since I want the length operand to come after the mask operand, with all other operands (except the length) in the
same order as in the "cond_*" patterns. Such an order will make life easier in the following loop vectorizer support.
Richard Biener [Tue, 11 Jul 2023 08:40:19 +0000 (10:40 +0200)]
tree-optimization/110614 - SLP splat and re-align (optimized)
The following properly guards the re-align (optimized) paths used
on old power CPUs for the added case of SLP splats from non-grouped
loads. Testcases already exist in dg-torture.
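The shape in question is presumably along these lines (a guessed
reduction, not one of the existing testcases):
void
f (double *__restrict out, const double *__restrict in, int n)
{
  for (int i = 0; i < n; i++)
    {
      /* Both lanes of the grouped store read the same non-grouped
         load, i.e. an SLP splat.  */
      out[2 * i] = in[i];
      out[2 * i + 1] = in[i];
    }
}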
PR tree-optimization/110614
* tree-vect-data-refs.cc (vect_supportable_dr_alignment):
SLP splats are not suitable for re-align ops.
Bob Duff [Mon, 3 Jul 2023 16:01:01 +0000 (12:01 -0400)]
ada: Avoid renaming_decl in case of constrained array
This patch avoids rewriting "X: S := F(...);" as "X: S renames F(...);".
That rewrite is incorrect if S is a constrained array subtype,
because it changes the semantics. In the original, the
bounds of X are that of S. But constraints are ignored in
renamings, so the bounds of X would come from F'Result.
This can cause spurious Constraint_Errors in some obscure
cases. It causes unnecessary checks to be inserted, and even
when such checks pass (the more common case), they might be less
efficient.
gcc/ada/
* exp_ch3.adb (Expand_N_Object_Declaration): Avoid transforming to
a renaming in case of constrained array that comes from source.
Eric Botcazou [Sun, 2 Jul 2023 22:33:18 +0000 (00:33 +0200)]
ada: Fix wrong resolution for hidden discriminant in predicate
The problem occurs for hidden discriminants of private discriminated types.
gcc/ada/
* sem_ch13.adb (Replace_Type_References_Generic.Visible_Component):
In the case of private discriminated types, return a discriminant
only if it is listed in the discriminant part of the declaration.
Peter Bergner [Mon, 10 Jul 2023 22:51:23 +0000 (17:51 -0500)]
rs6000: Remove redundant MEM_P predicate usage
The quad_memory_operand and vsx_quad_dform_memory_operand predicates contain
a (match_code "mem") test, making their MEM_P usage redundant. Remove them.
- Import dmd v2.104.1.
- Deprecation phase ended for access to private method when
overloaded with public method.
D runtime changes:
- Import druntime v2.104.1.
- Linux input header translations were added to druntime.
- Integration with the Valgrind `memcheck' tool has been added
to the garbage collector.
Phobos changes:
- Import phobos v2.104.1.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd a88e1335f7.
* dmd/VERSION: Bump version to v2.104.1.