Pan Li [Wed, 12 Jul 2023 15:01:39 +0000 (23:01 +0800)]
RISC-V: Add more tests for RVV floating-point FRM.
Add more test cases, including both the asm check and run tests, for RVV FRM.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-frm-insert-10.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-8.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-9.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-1.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-2.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-3.c: New test.
Kewen Lin [Thu, 13 Jul 2023 02:23:22 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS
This patch adjusts the cost handling on VMAT_CONTIGUOUS in
function vectorizable_load. We no longer call function
vect_model_load_cost for it, and remove function
vect_model_load_cost, which has become useless and
unreachable.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_load_cost): Remove.
(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS without
calling vect_model_load_cost.
Kewen Lin [Thu, 13 Jul 2023 02:23:22 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE
This patch adjusts the cost handling on
VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load. We
don't call function vect_model_load_cost for it any more.
As the affected test case gcc.target/i386/pr70021.c shows,
the previous costing could under-count the total number of
generated vector loads, since for VMAT_CONTIGUOUS_PERMUTE
function vect_model_load_cost doesn't consider the group
size, which is taken as vec_num during the transformation.
This patch makes the count of vector loads in costing
consistent with what we generate during the transformation.
To be more specific, for the given test case, the memory
access b[i_20] was costed as 2 vector loads before; with
this patch it costs 8 instead, matching the final count of
vector loads generated based on b. This costing change
makes the cost model analysis conclude that vectorizing the
first loop is not profitable, so this patch adjusts the
test case to run without the vect cost model.
Note that this test case also exposes something we can
improve further: although the numbers of vector
permutations we costed and generated are consistent, DCE
can optimize some unused permutations out, and it would be
good if we could predict that and generate only the
necessary permutations.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_load_cost): Assert this function only
handles memory_access_type VMAT_CONTIGUOUS, remove some
VMAT_CONTIGUOUS_PERMUTE related handlings.
(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE
without calling vect_model_load_cost.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr70021.c: Adjust with -fno-vect-cost-model.
Kewen Lin [Thu, 13 Jul 2023 02:23:22 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE
This patch adjusts the cost handling on
VMAT_CONTIGUOUS_REVERSE in function vectorizable_load. We
don't call function vect_model_load_cost for it any more.
This change stops us from miscounting some required vector
permutations, as the associated test case shows.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_model_load_cost): Assert it won't get
VMAT_CONTIGUOUS_REVERSE any more.
(vectorizable_load): Adjust the cost handling on
VMAT_CONTIGUOUS_REVERSE without calling vect_model_load_cost.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c: New test.
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES
This patch adjusts the cost handling on
VMAT_LOAD_STORE_LANES in function vectorizable_load. We
don't call function vect_model_load_cost for it any more.
It follows what function vect_model_load_cost does, and
shouldn't cause any functional changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on
VMAT_LOAD_STORE_LANES without calling vect_model_load_cost.
(vect_model_load_cost): Remove VMAT_LOAD_STORE_LANES related handling and
assert it will never get VMAT_LOAD_STORE_LANES.
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER
This patch adjusts the cost handling on VMAT_GATHER_SCATTER
in function vectorizable_load. We don't call function
vect_model_load_cost for it any more.
It mainly concerns gather loads with IFN or emulated
gather loads, and follows the handling in function
vect_model_load_cost. This patch shouldn't cause any
functional changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on
VMAT_GATHER_SCATTER without calling vect_model_load_cost.
(vect_model_load_cost): Adjust the assertion on VMAT_GATHER_SCATTER,
remove VMAT_GATHER_SCATTER related handlings and the related parameter
gs_info.
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
This patch adjusts the cost handling on VMAT_ELEMENTWISE
and VMAT_STRIDED_SLP in function vectorizable_load. We
don't call function vect_model_load_cost for them any more.
As PR82255 shows, we don't always need a vector
construction there; moving costing next to the transform
lets us cost the vector construction only when it's
actually needed. Besides, it counts the number of loads
consistently in some cases.
PR tree-optimization/82255
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling
on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP without calling
vect_model_load_cost.
(vect_model_load_cost): Assert it won't get VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP any more, and remove their related handlings.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New test.
2023-06-13 Bill Schmidt <wschmidt@linux.ibm.com>
Kewen Lin <linkw@linux.ibm.com>
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_INVARIANT
This patch adjusts the cost handling on VMAT_INVARIANT in
function vectorizable_load. We don't call function
vect_model_load_cost for it any more.
To make the costing on VMAT_INVARIANT better, this patch
queries hoist_defs_of_uses for the hoisting decision and
adds costs for the different "where" values based on it.
Currently function hoist_defs_of_uses would always hoist
the defs of all SSA uses; adding one argument HOIST_P
avoids the actual hoisting during the costing phase.
gcc/ChangeLog:
* tree-vect-stmts.cc (hoist_defs_of_uses): Add one argument HOIST_P.
(vectorizable_load): Adjust the handling on VMAT_INVARIANT to respect
hoisting decision and without calling vect_model_load_cost.
(vect_model_load_cost): Assert it won't get VMAT_INVARIANT any more
and remove VMAT_INVARIANT related handlings.
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl
This patch adds one extra argument cost_vec to function
vect_build_gather_load_calls, so that we can do costing
next to the transform in vect_build_gather_load_calls.
For now, the implementation just follows the handling in
vect_model_load_cost; it isn't ideal, so a FIXME is placed
there for further improvement. This patch should not
cause any functional changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_build_gather_load_calls): Add the handlings
on costing with one extra argument cost_vec.
(vectorizable_load): Adjust the call to vect_build_gather_load_calls.
(vect_model_load_cost): Assert it won't get VMAT_GATHER_SCATTER with
gs_info.decl set any more.
Kewen Lin [Thu, 13 Jul 2023 02:23:21 +0000 (21:23 -0500)]
vect: Move vect_model_load_cost next to the transform in vectorizable_load
This is an initial patch to move costing next to the
transform. It still adopts vect_model_load_cost for
costing but moves and duplicates it down according to the
handling of the different vect_memory_access_types, in the
hope of making the subsequent patches easier to review.
This patch should not cause any functional changes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Move and duplicate the call
to vect_model_load_cost down to some different transform paths
according to the handlings of different vect_memory_access_types.
Kewen Lin [Thu, 13 Jul 2023 02:22:26 +0000 (21:22 -0500)]
tree: Hide wi::from_mpz from GENERATOR_FILE
Similar to r0-85707-g34917a102a4e0c for PR35051, the uses
of mpz_t should be guarded with "#ifndef GENERATOR_FILE".
This patch fixes that and avoids some possible build
errors.
gcc/ChangeLog:
* tree.h (wi::from_mpz): Hide from GENERATOR_FILE.
mklog: Add --append option to auto-add the generated ChangeLog to a patch file
This tiny patch adds an --append option to mklog.py that
supports appending the generated ChangeLog to the
corresponding patch file. With this option there is no
need to manually copy the generated ChangeLog to the patch
file, e.g. running `mklog.py --append /path/to/this/patch`
will add the generated ChangeLog at the right place in the
/path/to/this/patch file.
RISC-V: Support gather_load/scatter_store RVV auto-vectorization
This patch fully supports gather_load/scatter_store:
1. Support single-rgroup on both RV32/RV64.
2. Support indexed element widths that are the same as or smaller than Pmode.
3. Support VLA SLP with gather/scatter.
4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8, both VLA and VLS.
5. Fix a bug in handling (subreg:SI (const_poly_int:DI)).
6. Fix a bug in vec_perm, which is used by gather/scatter SLP.
All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
We fully support these 4 kinds of gather/scatter:
1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask (Full vector).
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
3. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
4. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
Based on the discussions with the Richards, we don't lower vlse/vsse in the RISC-V backend for strided load/store.
Instead, we leave it to the middle-end to handle that.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c:
New test.
Jonathan Wakely [Wed, 12 Jul 2023 13:40:19 +0000 (14:40 +0100)]
libstdc++: Check conversion from filesystem::path to wide strings [PR95048]
The testcase added for this bug only checks conversion from wide strings
on construction, but the fix also covered conversion to wide strings via
path::wstring(). Add checks for that, and u16string() and u32string().
Jonathan Wakely [Fri, 7 Jul 2023 13:36:06 +0000 (14:36 +0100)]
libstdc++: Compile basic_file_stdio.cc for LFS
Instead of using fopen64, lseek64, and fstat64 we can just include
<bits/largefile-config.h> which defines _FILE_OFFSET_BITS=64 (and
similar target-specific macros). Then we can just use fopen, lseek and
fstat as normal, and they'll be the LFS versions if supported by the
target.
libstdc++-v3/ChangeLog:
* config/io/basic_file_stdio.cc: Define LFS macros.
(__basic_file<char>::open): Use fopen unconditionally.
(get_file_offset): Use lseek unconditionally.
(__basic_file<char>::seekoff): Likewise.
(__basic_file<char>::showmanyc): Use fstat unconditionally.
When configured with --enable-cstdio=stdio_pure we need to consistently
use fseek and not mix seeks on the file descriptor with reads and writes
on the FILE stream.
There are also a number of bugs related to error handling and return
values, because fread and fwrite return 0 on error, not -1, and fseek
returns 0 on success, not the file offset.
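A minimal sketch of the corrected checks (illustrative only, with hypothetical helper names, not the actual library code):

#include <stdio.h>

/* fwrite returns the number of items written; an error shows up as a
   short count with ferror() set, not as a -1 return value.  */
static long
xwrite_checked (FILE *f, const char *buf, size_t count)
{
  size_t n = fwrite (buf, 1, count, f);
  if (n == 0 && ferror (f))
    return -1;
  return (long) n;
}

/* fseek returns 0 on success, not the new offset; the offset has to be
   queried separately (with ftello where available, here plain ftell).  */
static long
seek_checked (FILE *f, long off, int whence)
{
  if (fseek (f, off, whence) != 0)
    return -1;
  return ftell (f);
}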
libstdc++-v3/ChangeLog:
PR libstdc++/110574
* acinclude.m4 (GLIBCXX_CHECK_LFS): Check for fseeko and ftello
and define _GLIBCXX_USE_FSEEKO_FTELLO.
* config.h.in: Regenerate.
* configure: Regenerate.
* config/io/basic_file_stdio.cc (xwrite) [_GLIBCXX_USE_STDIO_PURE]:
Check for fwrite error correctly.
(__basic_file<char>::xsgetn) [_GLIBCXX_USE_STDIO_PURE]: Check for
fread error correctly.
(get_file_offset): New function.
(__basic_file<char>::seekoff) [_GLIBCXX_USE_STDIO_PURE]: Use
fseeko if available. Use get_file_offset instead of return value
of fseek.
(__basic_file<char>::showmanyc): Use get_file_offset.
IRA+LRA: Change return type of predicate functions from int to bool
gcc/ChangeLog:
* ira.cc (equiv_init_varies_p): Change return type from int to bool
and adjust function body accordingly.
(equiv_init_movable_p): Ditto.
(memref_used_between_p): Ditto.
* lra-constraints.cc (valid_address_p): Ditto.
Aldy Hernandez [Thu, 29 Jun 2023 09:27:49 +0000 (11:27 +0200)]
[range-op] Enable value/mask propagation in range-op.
Throw the switch in range-ops to make full use of the value/mask
information instead of only the nonzero bits. This will cause most of
the operators implemented in range-ops to use the value/mask
information calculated by CCP's bit_value_binop() function which
range-ops uses. This opens up more optimization opportunities.
In follow-up patches I will change the global range setter
(set_range_info) to be able to save the value/mask pair, and make both
CCP and IPA be able to save the known ones bit info, instead of
throwing it away.
gcc/ChangeLog:
* range-op.cc (irange_to_masked_value): Remove.
(update_known_bitmask): Update irange value/mask pair instead of
only updating nonzero bits.
Jan Hubicka [Wed, 12 Jul 2023 15:22:03 +0000 (17:22 +0200)]
Improve profile update in loop-ch
Improve the profile update in loop-ch to handle the situation where the
duplicated header has a loop-invariant test. In this case we know that all of
the count of the exit edge belongs to the duplicated loop header edge and we
can update probabilities accordingly.
Since we also do all the work to track this information from analysis to
duplication, I also added code to turn those conditionals into constants so
we do not need a later jump threading pass to clean up.
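As an illustration (my own reduced example, not taken from the patch), a loop of this shape has a loop-invariant test in its header:

/* "flag" never changes inside the loop, so after header duplication the
   copied exit test is statically known, and all of the exit count
   belongs to the duplicated header edge.  */
int
f (int *a, int n, int flag)
{
  int i = 0;
  while (flag && i < n)
    a[i++] = 0;
  return i;
}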
This made me work out that the propagation was buggy in a few aspects:
1) it handled every PHI as a PHI in the header and incorrectly assigned
some PHIs to be IV-like when they are not;
2) it did not check for novops calls, which are not required to return
the same value on every invocation;
3) I also added a check for asm statements, since those are not
necessarily reproducible either.
I would like to do more changes, but tried to prevent this patch from
snowballing. The analysis of what statements will remain after duplication can
be improved. I think we should use the ranger query for basic blocks other
than the first, too, and possibly drop the IV heuristics then. Also it seems
that a lot of this logic is pretty much the same as the analysis in the
peeling pass, so unifying the two would be nice.
I also think I should move the profile update out of
gimple_duplicate_sese_region (it is now very specific to ch) and rename it,
since those regions are single entry multiple exit.
Bootstrapped/regtested x86_64-linux, OK?
Honza
gcc/ChangeLog:
* tree-cfg.cc (gimple_duplicate_sese_region): Add ORIG_ELIMINATED_EDGES
parameter and rewrite profile updating code to handle edges elimination.
* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.cc (loop_invariant_op_p): New function.
(loop_iv_derived_p): New function.
(should_duplicate_loop_header_p): Track invariant exit edges; fix handling
of PHIs and propagation of IV derived variables.
(ch_base::copy_headers): Pass around the invariant edges hash set.
Recently, two identical XTheadCondMov tests have been added, which both fail.
Let's fix that by changing the following:
* Merge both files into one (no need for separate tests for rv32 and rv64)
* Drop unrelated attribute check test (we already test for `th.mveqz`
and `th.mvnez` instructions, so there is little additional value)
* Fix the pattern to allow matching
ifcvt: Change return type of predicate functions from int to bool
Also change some internal variables and function arguments from int to bool.
gcc/ChangeLog:
* ifcvt.cc (cond_exec_changed_p): Change variable to bool.
(last_active_insn): Change "skip_use_p" function argument to bool.
(noce_operand_ok): Change return type from int to bool.
(find_cond_trap): Ditto.
(block_jumps_and_fallthru_p): Change "fallthru_p" and
"jump_p" variables to bool.
(noce_find_if_block): Change return type from int to bool.
(cond_exec_find_if_block): Ditto.
(find_if_case_1): Ditto.
(find_if_case_2): Ditto.
(dead_or_predicable): Ditto. Change "reversep" function arg to bool.
(block_jumps_and_fallthru): Rename from block_jumps_and_fallthru_p.
(cond_exec_process_insns): Change return type from int to bool.
Change "mod_ok" function arg to bool.
(cond_exec_process_if_block): Change return type from int to bool.
Change "do_multiple_p" function arg to bool. Change "then_mod_ok"
variable to bool.
(noce_emit_store_flag): Change return type from int to bool.
Change "reversep" function arg to bool. Change "cond_complex"
variable to bool.
(noce_try_move): Change return type from int to bool.
(noce_try_ifelse_collapse): Ditto.
(noce_try_store_flag): Ditto. Change "reversep" variable to bool.
(noce_try_addcc): Change return type from int to bool. Change
"subtract" variable to bool.
(noce_try_store_flag_constants): Change return type from int to bool.
(noce_try_store_flag_mask): Ditto. Change "reversep" variable to bool.
(noce_try_cmove): Change return type from int to bool.
(noce_try_cmove_arith): Ditto. Change "is_mem" variable to bool.
(noce_try_minmax): Change return type from int to bool. Change
"unsignedp" variable to bool.
(noce_try_abs): Change return type from int to bool. Change
"negate" variable to bool.
(noce_try_sign_mask): Change return type from int to bool.
(noce_try_move): Ditto.
(noce_try_store_flag_constants): Ditto.
(noce_try_cmove): Ditto.
(noce_try_cmove_arith): Ditto.
(noce_try_minmax): Ditto. Change "unsignedp" variable to bool.
(noce_try_bitop): Change return type from int to bool.
(noce_operand_ok): Ditto.
(noce_convert_multiple_sets): Ditto.
(noce_convert_multiple_sets_1): Ditto.
(noce_process_if_block): Ditto.
(check_cond_move_block): Ditto.
(cond_move_process_if_block): Ditto. Change "success_p"
variable to bool.
(rest_of_handle_if_conversion): Change return type to void.
VECT: Apply COND_LEN_* into vectorizable_operation
Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* to the following situation:
Support for the situation that in "vectorizable_operation":
/* If operating on inactive elements could generate spurious traps,
we need to restrict the operation to active lanes. Note that this
specifically doesn't apply to unhoisted invariants, since they
operate on the same value for every lane.
Similarly, if this operation is part of a reduction, a fully-masked
loop should only change the active lanes of the reduction chain,
keeping the inactive lanes as-is. */
bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
|| reduc_idx >= 0);
This is for when mask_out_inactive is true with length loop control.
So we can handle the following cases:
1. Integer division:
#define TEST_TYPE(TYPE) \
__attribute__((noipa)) \
void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
{ \
for (int i = 0; i < n; i++) \
dst[i] = a[i] % b[i]; \
}
#define TEST_ALL() \
TEST_TYPE(int8_t) \
TEST_ALL()
* libgomp.texi (OpenMP 5.0): Replace '... stub' by @ref to
'Memory allocation' section which contains the full status.
(TR11): Remove differently worded duplicated entry.
Roger Sayle [Wed, 12 Jul 2023 13:14:15 +0000 (14:14 +0100)]
i386: Fix FAIL of gcc.target/i386/pr91681-1.c
The recent change in TImode parameter passing on x86_64 results in the
FAIL of pr91681-1.c. The issue is that with the extra flexibility,
the combine pass is now spoilt for choice between using either the
*add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
patterns, when one operand is a *concat and the other is a zero_extend.
The solution proposed below is to provide an *add<dwi>3_doubleword_concat_zext
define_insn_and_split that can benefit both from the register allocation
of *concat, and still avoid the xor normally required by zero extension.
I'm investigating a follow-up refinement to improve register allocation
further by avoiding the early clobber in the =&r, and handling (custom)
reloads explicitly, but this piece resolves the testcase failure.
2023-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/91681
* config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
define_insn_and_split derived from *add<dwi>3_doubleword_concat
and *add<dwi>3_doubleword_zext.
This patch fixes the regression PR target/110598 caused by my recent
addition of a peephole2. The intention of that optimization was to
simplify zeroing a register, followed by an IOR, XOR or PLUS operation
on it into a move, or as described in the comment:
;; Peephole2 rega = 0; rega op= regb into rega = regb.
The issue is that I'd failed to consider the (rare and unusual) case
where regb is rega, in which the transformation leads to the incorrect
"rega = rega" when it should be "rega = 0". The minimal fix is to
add a !reg_mentioned_p check to the recent peephole2.
In addition to resolving the regression, I've added a second peephole2
to optimize the problematic case above, which contains a false
dependency and is therefore tricky to optimize elsewhere. This is an
improvement over GCC 13, which, for example, generates the redundant:
xorl %edx, %edx
xorq %rdx, %rdx
2023-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/110598
* config/i386/i386.md (peephole2): Check !reg_mentioned_p when
optimizing rega = 0; rega op= regb for op in [XOR,IOR,PLUS].
(peephole2): Simplify rega = 0; rega op= rega cases.
gcc/testsuite/ChangeLog
PR target/110598
* gcc.target/i386/pr110598.c: New test case.
Roger Sayle [Wed, 12 Jul 2023 13:09:54 +0000 (14:09 +0100)]
i386: Tweak ix86_expand_int_compare to use PTEST for vector equality.
I've come up with an alternate/complementary/supplementary fix to the
patch https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
for generating the PTEST during RTL expansion, rather than rely on
this being caught/optimized later during STV.
You'll notice in this patch, the tests for TARGET_SSE4_1 and TImode
appear last. When I was writing this, I initially also added support
for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
support 256-bit OImode (which also explains why we don't have an OImode
to V1OImode scalar-to-vector pass). Retaining this clause ordering
should minimize the lines changed if things change in future.
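For illustration, a reduced example of the kind of comparison affected (my own sketch, not the PR testcase):

typedef unsigned long long v2di __attribute__ ((vector_size (16)));

int
is_zero (v2di x)
{
  /* Viewing the vector as a 128-bit integer gives a TImode SUBREG of
     the vector register; the compare against zero can now expand to a
     single ptest instead of moving to a pair of scalar registers.  */
  unsigned __int128 i;
  __builtin_memcpy (&i, &x, sizeof (i));
  return i == 0;
}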
2023-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_int_compare): If
testing a TImode SUBREG of a 128-bit vector register against
zero, use a PTEST instruction instead of first moving it to
a pair of scalar registers.
Robin Dapp [Mon, 10 Jul 2023 20:00:08 +0000 (22:00 +0200)]
genopinit: Allow more than 256 modes.
Upcoming changes for RISC-V will have us exceed 255 modes or 8 bits.
This patch increases the limit to 10 bits and adjusts the hashing
function for the gen* and optabs-query lookups accordingly.
Consequently, the number of optabs is limited to 4095.
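A sketch of the resulting key layout (hypothetical helper name; the real code lives in optabs-query.h and gensupport.cc):

/* Pack an optab and two machine modes into one lookup key: the modes
   now take 10 bits each (up to 1023), the optab the bits above 20,
   which caps the number of optabs at 4095 for a 32-bit key.  */
static inline unsigned int
pack_optab_key (unsigned int op, unsigned int mode1, unsigned int mode2)
{
  return (op << 20) | (mode1 << 10) | mode2;
}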
gcc/ChangeLog:
* genopinit.cc (main): Adjust maximal number of optabs and
machine modes.
* gensupport.cc (find_optab): Shift optab by 20 and mode by
10 bits.
* optabs-query.h (optab_handler): Ditto.
(convert_optab_handler): Ditto.
libgomp: Use libnuma for OpenMP's partition=nearest allocation trait
As with the memkind library, it is only used when found at runtime;
it does not need to be present when building GCC.
The included testcase does not check whether the memory has been placed
on the nearest node as the Linux kernel memory handling too often ignores
that hint, using a different node for the allocation. However, when
running with 'numactl --preferred=<node> ./executable', it is clearly
visible that the feature works by comparing malloc/default vs. nearest
placement (using get_mempolicy to obtain the node for a mem addr).
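A minimal usage sketch with the standard OpenMP 5.x allocator API (whether the libnuma-backed path is taken is decided at runtime):

#include <omp.h>
#include <stdlib.h>

int
main (void)
{
  /* Request memory partitioned nearest to the allocating thread.  */
  omp_alloctrait_t traits[] = { { omp_atk_partition, omp_atv_nearest } };
  omp_allocator_handle_t al
    = omp_init_allocator (omp_default_mem_space, 1, traits);
  int *p = (int *) omp_alloc (1024 * sizeof (int), al);
  if (p)
    {
      p[0] = 42;
      omp_free (p, al);
    }
  omp_destroy_allocator (al);
  return 0;
}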
libgomp/ChangeLog:
* allocator.c: Add ifdef for LIBGOMP_USE_LIBNUMA.
(enum gomp_numa_memkind_kind): Renamed from gomp_memkind_kind;
add GOMP_MEMKIND_LIBNUMA.
(struct gomp_libnuma_data, gomp_init_libnuma, gomp_get_libnuma): New.
(omp_init_allocator): Handle partition=nearest with libnuma if avail.
(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
numa_alloc_local (+ memset), numa_free, and numa_realloc calls as
needed.
* config/linux/allocator.c (LIBGOMP_USE_LIBNUMA): Define.
* libgomp.texi: Fix a typo; use 'fi' instead of its ligature char.
(Memory allocation): Renamed from 'Memory allocation with libmemkind';
updated for libnuma usage.
* testsuite/libgomp.c-c++-common/alloc-11.c: New test.
* testsuite/libgomp.c-c++-common/alloc-12.c: New test.
gfortran: Allow ref'ing PDT's len() in parameter-initializer.
Fix the case where declaring a parameter initialized using a pdt_len
reference did not simplify the reference to a constant.
2023-07-12 Andre Vehreschild <vehre@gcc.gnu.org>
gcc/fortran/ChangeLog:
PR fortran/102003
* expr.cc (find_inquiry_ref): Replace len of pdt_string by
constant.
(simplify_ref_chain): Ensure input to find_inquiry_ref is
NULL.
(gfc_match_init_expr): Prevent PDT analysis for function calls.
(gfc_pdt_find_component_copy_initializer): Get the initializer
value for given component.
* gfortran.h (gfc_pdt_find_component_copy_initializer): New
function.
* simplify.cc (gfc_simplify_len): Replace len() of PDT with pdt
component ref or constant.
Richard Biener [Wed, 12 Jul 2023 09:19:58 +0000 (11:19 +0200)]
tree-optimization/110630 - enhance SLP permute support
The following enhances the existing lowpart extraction support for
SLP VEC_PERM nodes to cover all vector aligned extractions. This
allows the existing bb-slp-pr95839.c testcase to be vectorized
with mips -mpaired-single and the new bb-slp-pr95839-3.c testcase
with SSE2.
PR tree-optimization/110630
gcc/ChangeLog:
* tree-vect-slp.cc (vect_add_slp_permutation): New
offset parameter, honor that for the extract code generation.
(vectorizable_slp_permutation_1): Handle offsetted identities.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-pr95839.c: Make stricter.
* gcc.dg/vect/bb-slp-pr95839-3.c: New variant testcase.
RISC-V: Support integer mult highpart auto-vectorization
This patch adds an obviously missing mult_high auto-vectorization pattern.
Consider the following case:
void __attribute__ ((noipa)) \
mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count) \
{ \
for (int i = 0; i < count; ++i) \
dst[i] = src[i] / 17; \
}
T (int32_t) \
TEST_ALL (DEF_LOOP)
Before this patch:
mod_int32_t:
ble a2,zero,.L5
li a5,17
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.x v2,a5
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v1,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli a4,a5,2
vdiv.vv v1,v1,v2
sub a2,a2,a5
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret
After this patch:
mod_int32_t:
ble a2,zero,.L5
li a5,2021163008
addiw a5,a5,-1927
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.x v3,a5
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v2,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli a4,a5,2
vmulh.vv v1,v2,v3
sub a2,a2,a5
vsra.vi v2,v2,31
vsra.vi v1,v1,3
vsub.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret
Even though a single "vdiv" is lower into "1 vmulh + 2 vsra + 1 vsub",
4 more instructions are generated, we belive it's much better than before
since division is very slow in the hardward.
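For reference, a scalar sketch of the same strength reduction (my own illustration; 0x78787879 is the well-known magic multiplier for signed division by 17):

#include <stdint.h>

int32_t
div17 (int32_t x)
{
  /* vmulh: take the high 32 bits of the 64-bit product.  */
  int32_t hi = (int32_t) (((int64_t) x * 0x78787879LL) >> 32);
  /* vsra + vsra + vsub: shift the quotient and correct for the sign.  */
  return (hi >> 3) - (x >> 31);
}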
gcc/ChangeLog:
* config/riscv/autovec.md (smul<mode>3_highpart): New pattern.
(umul<mode>3_highpart): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/mulh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c: New test.
Jan Beulich [Wed, 12 Jul 2023 10:16:08 +0000 (12:16 +0200)]
x86: improve fast bfloat->float conversion
There's nothing AVX512BW-ish in here, so no reason to use Yw as the
constraints for the AVX alternative. Furthermore by using the 512-bit
form of VPSLLD (in a new alternative) all 32 registers can be used
directly by the insn without AVX512VL needing to be enabled.
Also adjust the originally last alternative's "prefix" attribute to
maybe_evex.
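For reference, the scalar semantics of this conversion (my own sketch): a bfloat16 value is the upper half of an IEEE single, so a 16-bit left shift reconstructs the float bit pattern.

#include <string.h>

float
bf16_to_float (unsigned short b)
{
  /* Place the 16 bfloat16 payload bits into the upper half of a
     32-bit word, then reinterpret that word as a float.  */
  unsigned int u = (unsigned int) b << 16;
  float f;
  memcpy (&f, &u, sizeof (f));
  return f;
}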
gcc/
* config/i386/i386.md (extendbfsf2_1): Add new AVX512F
alternative. Adjust original last alternative's "prefix"
attribute to maybe_evex.
Jan Beulich [Wed, 12 Jul 2023 10:14:08 +0000 (12:14 +0200)]
x86: make better use of VBROADCASTSS / VPBROADCASTD
... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
never longer (yet sometimes shorter) than the corresponding VSHUFPS /
VPSHUFD, due to the immediate operand of the shuffle insns balancing the
(uniform) need for VEX3 in the broadcast ones. When EVEX encoding is
required, the broadcast insns are always shorter.
Add new alternatives to cover the AVX2 and AVX512 cases as appropriate.
While touching this anyway, switch to consistently using "sseshuf1" in
the "type" attributes for all shuffle forms.
gcc/
* config/i386/sse.md (vec_dupv4sf): Make first alternative use
vbroadcastss for AVX2. New AVX512F alternative.
(*vec_dupv4si): New AVX2 and AVX512F alternatives using
vpbroadcastd. Replace sselog1 by sseshuf1 in "type" attribute.
riscv: thead: Factor out XThead*-specific peepholes
This patch moves the XThead*-specific peephole passes
into thead-peephole.md with the intent of keeping vendor-specific
code separate from standard RISC-V code.
This patch does not contain any functional changes.
gcc/ChangeLog:
* config/riscv/peephole.md: Remove XThead* peephole passes.
* config/riscv/thead.md: Include thead-peephole.md.
* config/riscv/thead-peephole.md: New file.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
RISC-V currently does not support index registers.
However, there are some vendor extensions that specify them.
Let's make the necessary changes in the backend so that we can
add support for such a vendor extension in the future.
This is a non-functional change without any intended side-effects.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_regno_ok_for_index_p):
New prototype.
(riscv_index_reg_class): Likewise.
* config/riscv/riscv.cc (riscv_regno_ok_for_index_p): New function.
(riscv_index_reg_class): New function.
* config/riscv/riscv.h (INDEX_REG_CLASS): Call new function
riscv_index_reg_class().
(REGNO_OK_FOR_INDEX_P): Call new function
riscv_regno_ok_for_index_p().
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
riscv: Move address classification info types to riscv-protos.h
enum riscv_address_type and struct riscv_address_info are used
to store address classification information. Let's move these types
into our common header file in order to share them with other
compilation units.
This is a non-functional change without any intended side-effects.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum riscv_address_type):
New location of type definition.
(struct riscv_address_info): Likewise.
* config/riscv/riscv.cc (enum riscv_address_type):
Old location of type definition.
(struct riscv_address_info): Likewise.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Define an Xmode macro that specifies the register size (XLEN),
similar to Pmode. This allows the backend code to write generic
RV32/RV64 C code (under certain circumstances).
gcc/ChangeLog:
* config/riscv/riscv.h (Xmode): New macro.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
We have the following situation for MEM RTX objects:
* TARGET_PRINT_OPERAND expands to riscv_print_operand()
* This falls into the default case (unknown or no letter) of the outer
switch-case-block and the MEM case of the inner switch-case-block and
calls output_address() in final.cc with XEXP (op, 0) (the address)
* This calls targetm.asm_out.print_operand_address() which is
riscv_print_operand_address()
* riscv_print_operand_address() is targeting the address of a MEM RTX
* riscv_print_operand_address() calls riscv_print_operand() for the offset
and directly prints the register if the address is classified as ADDRESS_REG
* This falls into the default case (unknown or no letter) of the outer
switch-case-block and the default case of the inner switch-case-block and
calls output_addr_const().
However, since we know that offset must be a CONST_INT (which will be
followed by a '(<reg>)' string), there is no need to call
riscv_print_operand() for the offset.
Instead we can take the shortcut and use output_addr_const().
This change also brings the code in riscv_print_operand_address()
in line with the other cases, where output_addr_const() is used
to print offsets.
Tested with GCC regression test suite and SPEC intrate.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_print_operand_address): Use
output_addr_const rather than riscv_print_operand.
The current implementation triggers an assertion in
dwarf2out_frame_debug_cfa_offset() under certain circumstances.
The standard code uses REG_FRAME_RELATED_EXPR notes instead
of REG_CFA_OFFSET notes when saving registers on the stack.
So let's do this as well.
gcc/ChangeLog:
* config/riscv/thead.cc (th_mempair_save_regs):
Emit REG_FRAME_RELATED_EXPR notes in prologue.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
riscv: xtheadbb: Add sign/zero extension support for th.ext and th.extu
The current support of the bitfield-extraction instructions
th.ext and th.extu (XTheadBb extension) only covers sign_extract
and zero_extract. This patch adds support for sign_extend and
zero_extend to avoid any shifts for sign or zero extensions.
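For illustration (my own examples, assuming an -march string that includes _xtheadbb), extensions that previously needed a shift pair can now become a single instruction:

int
sext16 (int x)
{
  /* Previously slli 16 + srai 16; now a single th.ext rd,rs,15,0.  */
  return (short) x;
}

unsigned int
zext16 (unsigned int x)
{
  /* Previously a shift pair; now a single th.extu rd,rs,15,0.  */
  return (unsigned short) x;
}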
gcc/ChangeLog:
* config/riscv/riscv.md: No base-ISA extension splitter for XThead*.
* config/riscv/thead.md (*extend<SHORT:mode><SUPERQI:mode>2_th_ext):
New XThead extension INSN.
(*zero_extendsidi2_th_extu): New XThead extension INSN.
(*zero_extendhi<GPR:mode>2_th_extu): New XThead extension INSN.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-ext-1.c: New test.
* gcc.target/riscv/xtheadbb-extu-1.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
liuhongt [Thu, 29 Jun 2023 06:25:28 +0000 (14:25 +0800)]
Break false dependence for vpternlog by inserting vpxor or setting constraint of input operand to '0'
A false dependency happens when the destination is only updated by
pternlog. There is no false dependency when the destination is also
used as a source. So either a pxor should be inserted, or the input
operand should be set with constraint '0'.
gcc/ChangeLog:
PR target/110438
PR target/110202
* config/i386/predicates.md
(int_float_vector_all_ones_operand): New predicate.
* config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep): New
define_insn.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep):
Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep):
Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>): Adjust to
define_insn_and_split to avoid false dependence.
(*<avx512>_cvtmask2<ssemodesuffix><mode>): Ditto.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Adjust constraint
of operands 1 to '0' to avoid false dependence.
(*andnot<mode>3): Ditto.
(iornot<mode>3): Ditto.
(*<nlogic><mode>3): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110438.c: New test.
* gcc.target/i386/pr100711-6.c: Adjust testcase.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test.
Harald Anlauf [Tue, 11 Jul 2023 19:21:25 +0000 (21:21 +0200)]
Fortran: formal symbol attributes for intrinsic procedures [PR110288]
gcc/fortran/ChangeLog:
PR fortran/110288
* symbol.cc (gfc_copy_formal_args_intr): When deriving the formal
argument attributes from the actual ones for intrinsic procedure
calls, take special care of CHARACTER arguments so that we do not
wrongly treat them formally as deferred-length.
gcc/testsuite/ChangeLog:
PR fortran/110288
* gfortran.dg/findloc_10.f90: New test.
cfg+gcse: Change return type of predicate functions from int to bool
Also change some internal variables from int to bool.
gcc/ChangeLog:
* cfghooks.cc (verify_flow_info): Change "err" variable to bool.
* cfghooks.h (struct cfg_hooks): Change return type of
verify_flow_info from integer to bool.
* cfgrtl.cc (can_delete_note_p): Change return type from int to bool.
(can_delete_label_p): Ditto.
(rtl_verify_flow_info): Change return type from int to bool
and adjust function body accordingly. Change "err" variable to bool.
(rtl_verify_flow_info_1): Ditto.
(free_bb_for_insn): Change return type to void.
(rtl_merge_blocks): Change "b_empty" variable to bool.
(try_redirect_by_replacing_jump): Change "fallthru" variable to bool.
(verify_hot_cold_block_grouping): Change return type from int to bool.
Change "err" variable to bool.
(rtl_verify_edges): Ditto.
(rtl_verify_bb_insns): Ditto.
(rtl_verify_bb_pointers): Ditto.
(rtl_verify_bb_insn_chain): Ditto.
(rtl_verify_fallthru): Ditto.
(rtl_verify_bb_layout): Ditto.
(purge_all_dead_edges): Change "purged" variable to bool.
* cfgrtl.h (free_bb_for_insn): Change return type from int to void.
* postreload-gcse.cc (expr_hasher::equal): Change "equiv_p" to bool.
(load_killed_in_block_p): Change return type from int to bool
and adjust function body accordingly.
(oprs_unchanged_p): Return true/false.
(rest_of_handle_gcse2): Change return type to void.
* tree-cfg.cc (gimple_verify_flow_info): Change return type from
int to bool. Change "err" variable to bool.
Carl Love [Tue, 11 Jul 2023 16:28:47 +0000 (12:28 -0400)]
rs6000: Update the vsx-vector-6.* tests.
The vsx-vector-6.h file is included in the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c. The .h file
contains a large number of vsx vector built-in tests. The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor. The tests are compile only.
This patch reworks the tests into a series of files for related tests.
The new tests consist of a runnable test to verify the built-in argument
types and the functional correctness of each built-in. There is also a
compile only test that verifies the built-ins generate the expected number
of instructions for the various built-in tests.
gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test
file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
testsuite: Require vectors of doubles for pr97428.c
The pr97428.c test assumes support for vectors of doubles, but some
targets only support vectors of floats, causing this test to fail with
such targets. Limit this test to targets that support vectors of
doubles then.
gcc/testsuite/
* gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.
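The guard is presumably the standard effective-target directive, i.e. something like:

/* { dg-require-effective-target vect_double } */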
Gaius Mulley [Tue, 11 Jul 2023 14:28:42 +0000 (15:28 +0100)]
[modula2] Improve uninitialized variable analysis by combining basic blocks
This patch combines basic blocks for static analysis of uninitialized
variables providing that they are not the top of a loop, are not reached
by a conditional and are not reached after a procedure call. It also
avoids checking array accesses for static analysis. Finally the patch
adds switch modifiers to allow static analysis to include conditional
branches for subsequent basic block analysis.
gcc/ChangeLog:
* doc/gm2.texi (-Wuninit-variable-checking=) New item.
gcc/m2/ChangeLog:
* gm2-compiler/M2BasicBlock.def (InitBasicBlocksFromRange): New
parameter ScopeSym.
* gm2-compiler/M2BasicBlock.mod (ConvertQuads2BasicBlock): New
parameter ScopeSym.
(InitBasicBlocksFromRange): New parameter ScopeSym. Call
ConvertQuads2BasicBlock with ScopeSym.
(DisplayBasicBlocks): Uncomment.
* gm2-compiler/M2Code.mod: Replace VariableAnalysis with
ScopeBlockVariableAnalysis.
(InitialDeclareAndOptimize): Add parameter scope.
(SecondDeclareAndOptimize): Add parameter scope.
* gm2-compiler/M2GCCDeclare.mod (DeclareConstructor): Add scope
parameter to DeclareTypesConstantsProceduresInRange.
(DeclareTypesConstantsProceduresInRange): New parameter scope.
Pass scope to DisplayQuadRange. Reformatted.
* gm2-compiler/M2GenGCC.def (ConvertQuadsToTree): New parameter
scope.
* gm2-compiler/M2GenGCC.mod (ConvertQuadsToTree): New parameter
scope.
* gm2-compiler/M2Optimize.mod (KnownReachable): New parameter
scope.
* gm2-compiler/M2Options.def (SetUninitVariableChecking): Add
arg parameter.
* gm2-compiler/M2Options.mod (SetUninitVariableChecking): Add
arg parameter and set boolean UninitVariableChecking and
UninitVariableConditionalChecking.
(UninitVariableConditionalChecking): New boolean set to FALSE.
* gm2-compiler/M2Quads.def (IsGoto): New procedure function.
(DisplayQuadRange): Add scope parameter.
(LoopAnalysis): Add scope parameter.
* gm2-compiler/M2Quads.mod: Import PutVarArrayRef.
(IsGoto): New procedure function.
(LoopAnalysis): Add scope parameter and use MetaErrorT1 instead
of WarnStringAt.
(BuildStaticArray): Call PutVarArrayRef.
(BuildDynamicArray): Call PutVarArrayRef.
(DisplayQuadRange): Add scope parameter.
(GetM2OperatorDesc): Add relational condition cases.
* gm2-compiler/M2Scope.def (ScopeProcedure): Add parameter.
* gm2-compiler/M2Scope.mod (DisplayScope): Pass scopeSym to
DisplayQuadRange.
(ForeachScopeBlockDo): Pass scopeSym to p.
* gm2-compiler/M2SymInit.def (VariableAnalysis): Rename to ...
(ScopeBlockVariableAnalysis): ... this.
* gm2-compiler/M2SymInit.mod (ScopeBlockVariableAnalysis): Add
scope parameter.
(bbEntry): New pointer to record.
(bbArray): New array.
(bbFreeList): New variable.
(errorList): New list.
(IssueConditional): New procedure.
(GenerateNoteFlow): New procedure.
(IssueWarning): New procedure.
(IsUniqueWarning): New procedure.
(CheckDeferredRecordAccess): Re-implement.
(CheckBinary): Add warning and lst parameters.
(CheckUnary): Add warning and lst parameters.
(CheckXIndr): Add warning and lst parameters.
(CheckIndrX): Add warning and lst parameters.
(CheckBecomes): Add warning and lst parameters.
(CheckComparison): Add warning and lst parameters.
(CheckReadBeforeInitQuad): Add warning and lst parameters to all
Check procedures. Add all case quadruple clauses.
(FilterCheckReadBeforeInitQuad): Add warning and lst parameters.
(CheckReadBeforeInitFirstBasicBlock): Add warning and lst parameters.
(bbArrayKill): New procedure.
(DumpBBEntry): New procedure.
(DumpBBArray): New procedure.
(DumpBBSequence): New procedure.
(TestBBSequence): New procedure.
(CreateBBPermultations): New procedure.
(ScopeBlockVariableAnalysis): New procedure.
(GetOp3): New procedure.
(GenerateCFG): New procedure.
(NewEntry): New procedure.
(AppendEntry): New procedure.
(init): Initialize bbFreeList and errorList.
* gm2-compiler/SymbolTable.def (PutVarArrayRef): New procedure.
(IsVarArrayRef): New procedure function.
* gm2-compiler/SymbolTable.mod (SymVar): ArrayRef new field.
(MakeVar): Set ArrayRef to FALSE.
(PutVarArrayRef): New procedure.
(IsVarArrayRef): New procedure function.
* gm2-gcc/init.cc (_M2_M2SymInit_init): New prototype.
(init_PerCompilationInit): Add call to _M2_M2SymInit_init.
* gm2-gcc/m2options.h (M2Options_SetUninitVariableChecking):
New definition.
* gm2-lang.cc (gm2_langhook_handle_option): Add new case
OPT_Wuninit_variable_checking_.
* lang.opt: Wuninit-variable-checking= new entry.
gcc/testsuite/ChangeLog:
* gm2/switches/uninit-variable-checking/cascade/fail/cascadedif.mod: New test.
* gm2/switches/uninit-variable-checking/cascade/fail/switches-uninit-variable-checking-cascade-fail.exp:
New test.
* allocator.c (omp_init_allocator): Use malloc for
omp_high_bw_mem_space when the memkind lib is unavailable
instead of returning omp_null_allocator.
* libgomp.texi (OpenMP 5.0): Fix typo.
(Memory allocation with libmemkind): Document implementation
in more detail.
Patrick Palka [Tue, 11 Jul 2023 14:05:19 +0000 (10:05 -0400)]
c++: coercing variable template from current inst [PR110580]
Here during ahead of time coercion of the variable template-id v1<int>,
since we pass only the innermost arguments to coerce_template_parms (and
outer arguments are still dependent at this point), substitution of the
default template argument V=U just lowers U from level 2 to level 1 rather
than replacing it with int as expected. Thus after coercion we incorrectly
end up with (effectively) v1<int, T> instead of v1<int, int>.
Coercion of a class/alias template-id on the other hand always passes
all levels of arguments, which avoids this issue. So this patch makes us
do the same for variable template-ids.
PR c++/110580
gcc/cp/ChangeLog:
* pt.cc (lookup_template_variable): Pass all levels of arguments
to coerce_template_parms, and use the parameters from the most
general template.
Notice that here we use dummy_mask = { -1, -1, ..., -1 }.
2. Integer conditional division:
Similar to case (1), but with a condition:
void
f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * cond, int n)
{
for (int i = 0; i < n; ++i)
{
if (cond[i])
a[i] = b[i] / c[i];
}
}
Here we expect COND_LEN_DIV predicated by a real mask, which is the outcome of the comparison: mask__29.10_58 = vect__4.9_56 != { 0, ... };
and a real length, which is produced by the loop control: loop_len_55 = SELECT_VL
3. Conditional floating-point operations (without -ffast-math):
void
f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
{
for (int i = 0; i < n; ++i)
{
if (cond[i])
a[i] = b[i] + a[i];
}
}
ARM SVE IR:
max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });
int32_t
f (int32_t *restrict a,
int32_t *restrict cond, int n)
{
int32_t result = 0;
for (int i = 0; i < n; ++i)
{
if (cond[i])
result += a[i];
}
return result;
}
I name these patterns as "cond_len_*" since I want the length operand comes after mask operand and all other operands except length operand
same order as "cond_*" patterns. Such order will make life easier in the following loop vectorizer support.
Richard Biener [Tue, 11 Jul 2023 08:40:19 +0000 (10:40 +0200)]
tree-optimization/110614 - SLP splat and re-align (optimized)
The following properly guards the re-align (optimized) paths used
on old power CPUs for the added case of SLP splats from non-grouped
loads. Testcases already exist in dg-torture.
PR tree-optimization/110614
* tree-vect-data-refs.cc (vect_supportable_dr_alignment):
SLP splats are not suitable for re-align ops.
Bob Duff [Mon, 3 Jul 2023 16:01:01 +0000 (12:01 -0400)]
ada: Avoid renaming_decl in case of constrained array
This patch avoids rewriting "X: S := F(...);" as "X: S renames F(...);".
That rewrite is incorrect if S is a constrained array subtype,
because it changes the semantics. In the original, the
bounds of X are that of S. But constraints are ignored in
renamings, so the bounds of X would come from F'Result.
This can cause spurious Constraint_Errors in some obscure
cases. It causes unnecessary checks to be inserted, and even
when such checks pass (more common case), they might be less
efficient.
gcc/ada/
* exp_ch3.adb (Expand_N_Object_Declaration): Avoid transforming to
a renaming in case of constrained array that comes from source.
Eric Botcazou [Sun, 2 Jul 2023 22:33:18 +0000 (00:33 +0200)]
ada: Fix wrong resolution for hidden discriminant in predicate
The problem occurs for hidden discriminants of private discriminated types.
gcc/ada/
* sem_ch13.adb (Replace_Type_References_Generic.Visible_Component):
In the case of private discriminated types, return a discriminant
only if it is listed in the discriminant part of the declaration.
Peter Bergner [Mon, 10 Jul 2023 22:51:23 +0000 (17:51 -0500)]
rs6000: Remove redundant MEM_P predicate usage
The quad_memory_operand and vsx_quad_dform_memory_operand predicates contain
a (match_code "mem") test, making their MEM_P usage redundant. Remove them.
- Import dmd v2.104.1.
- Deprecation phase ended for access to private method when
overloaded with public method.
D runtime changes:
- Import druntime v2.104.1.
- Linux input header translations were added to druntime.
- Integration with the Valgrind `memcheck' tool has been added
to the garbage collector.
Phobos changes:
- Import phobos v2.104.1.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd a88e1335f7.
* dmd/VERSION: Bump version to v2.104.1.
reorg: Change return type of predicate functions from int to bool
Also change some internal variables and function arguments from int to bool.
gcc/ChangeLog:
* reorg.cc (stop_search_p): Change return type from int to bool
and adjust function body accordingly.
(resource_conflicts_p): Ditto.
(insn_references_resource_p): Change return type from int to bool.
(insn_sets_resource_p): Ditto.
(redirect_with_delay_slots_safe_p): Ditto.
(condition_dominates_p): Change return type from int to bool
and adjust function body accordingly.
(redirect_with_delay_list_safe_p): Ditto.
(check_annul_list_true_false): Ditto. Change "annul_true_p"
function argument to bool.
(steal_delay_list_from_target): Change "pannul_p" function
argument to bool pointer. Change "must_annul" and "used_annul"
variables from int to bool.
(steal_delay_list_from_fallthrough): Ditto.
(own_thread_p): Change return type from int to bool and adjust
function body accordingly. Change "allow_fallthrough" function
argument to bool.
(reorg_redirect_jump): Change return type from int to bool.
(fill_simple_delay_slots): Change "non_jumps_p" function
argument from int to bool. Change "maybe_never" variable to bool.
(fill_slots_from_thread): Change "likely", "thread_if_true" and
"own_thread" function arguments to bool. Change "lose" and
"must_annul" variables to bool.
(delete_from_delay_slot): Change "had_barrier" variable to bool.
(try_merge_delay_insns): Change "annul_p" variable to bool.
(fill_eager_delay_slots): Change "own_target" and "own_fallthrough"
variables to bool.
(rest_of_handle_delay_slots): Change return type from int to void
and adjust function body accordingly.
Patrick Palka [Mon, 10 Jul 2023 14:59:40 +0000 (10:59 -0400)]
c++: redeclare_class_template and ttps [PR110523]
Now that we cache level-lowered ttps we can end up processing the same
ttp multiple times via (multiple calls to) redeclare_class_template, so
we can't assume a ttp's DECL_CONTEXT is initially empty.
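An illustrative shape (hedged; not the PR testcase):
template<class T>
struct A
{
  /* Each instantiation of A redeclares B, so the cached level-lowered
     ttp TT can reach redeclare_class_template more than once, with its
     DECL_CONTEXT already set the second time around.  */
  template<template<class> class TT> struct B;
  template<template<class> class TT> struct B;
};
template struct A<int>;
template struct A<char>;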
PR c++/110523
gcc/cp/ChangeLog:
* pt.cc (redeclare_class_template): Relax the ttp DECL_CONTEXT
assert, and downgrade it to a checking assert.
Kito Cheng [Mon, 10 Jul 2023 13:57:36 +0000 (21:57 +0800)]
doc: Add doc for RISC-V Operand Modifiers
Document the `z` and `i` operand modifiers. We have many more modifiers
than those two, but they are the only two implemented in both
GCC and LLVM. Considering compatibility, I would like to document those
two first, and then review the other modifiers later to see whether any
others should be exposed and implemented in RISC-V LLVM too.
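For instance, a hedged inline-asm sketch of the two modifiers (the
function and the rJ/rI constraint choices are illustrative): `%i` emits
an "i" suffix when the operand is an immediate, and `%z` substitutes x0
when the operand is constant zero.
long
add_op (long a, long b)
{
  long r;
  /* Emits "add" or "addi" depending on whether b is an immediate,
     and uses the zero register when a is the constant 0.  */
  asm ("add%i2 %0, %z1, %2" : "=r" (r) : "rJ" (a), "rI" (b));
  return r;
}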
GCSE: Export 'insert_insn_end_basic_block' as global function
The VSETVL pass in the RISC-V port uses the common part of 'insert_insn_end_basic_block (struct gcse_expr *expr, basic_block bb)',
and we will also use this helper function in riscv.cc for the following patches.
So extract the common part of 'insert_insn_end_basic_block (struct gcse_expr *expr, basic_block bb)' into a new function;
the new function covering the common part is also called 'insert_insn_end_basic_block (rtx_insn *pat, basic_block bb)' but takes different arguments.
Call 'insert_insn_end_basic_block (rtx_insn *pat, basic_block bb)' from both 'insert_insn_end_basic_block (struct gcse_expr *expr, basic_block bb)'
and the VSETVL pass in the RISC-V port.
Also remove the now-redundant code from the VSETVL pass in the RISC-V port.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (add_label_notes): Remove it.
(insert_insn_end_basic_block): Ditto.
(pass_vsetvl::commit_vsetvls): Adapt for new helper function.
* gcse.cc (insert_insn_end_basic_block): Export as global function.
* gcse.h (insert_insn_end_basic_block): Ditto.
Christophe Lyon [Mon, 26 Jun 2023 14:39:47 +0000 (14:39 +0000)]
arm: Fix MVE intrinsics support with LTO (PR target/110268)
After the recent MVE intrinsics re-implementation, LTO stopped working
because the intrinsics would no longer be defined.
The main part of the patch is simple and similar to what we do for
AArch64:
- call handle_arm_mve_h() from arm_init_mve_builtins to declare the
intrinsics when the compiler is in LTO mode
- actually implement arm_builtin_decl for MVE.
It was just a bit tricky to handle __ARM_MVE_PRESERVE_USER_NAMESPACE:
its value in the user code cannot be guessed at LTO time, so we always
have to assume that it was not defined. This led to a few fixes in the
way we register MVE builtins as placeholders or not. Without this
patch, we would just omit some versions of the intrinsics when
__ARM_MVE_PRESERVE_USER_NAMESPACE is true. In fact, like for the C/C++
placeholders, we need to always keep entries for all of them to ensure
that we have a consistent numbering scheme.
testsuite: Add _link flavor for several arm_arch* and arm* effective-targets
For arm targets, we generate many effective-targets with
check_effective_target_FUNC_multilib and
check_effective_target_arm_arch_FUNC_multilib which check if we can
link and execute a simple program with a given set of flags/multilibs.
In some cases, however, it's possible to link but not execute a
program, so this patch adds similar _link effective-targets which only
check whether linking succeeds.
The patch does not update the documentation, as it already lacks the
numerous existing related effective-targets.
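For example, a test that must link but need not run could use the new
flavor like this (a hedged sketch; the exact effective-target name
depends on the FUNC being tested):
/* { dg-do link } */
/* { dg-require-effective-target arm_arch_v7a_link } */
int
main (void)
{
  return 0;
}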
Tom Tromey [Thu, 29 Jun 2023 14:52:59 +0000 (08:52 -0600)]
ada: Add typedefs to snames.h-tmpl
A future patch will change snames.h-tmpl to use enums rather than
preprocessor defines. In order to do this, first introduce some
typedefs that can be used in gcc-interface.
gcc/ada/
* snames.h-tmpl (Name_Id, Attribute_Id, Convention_Id)
(Pragma_Id): New typedefs.
(Get_Attribute_Id, Get_Pragma_Id): Use typedef.
Bob Duff [Thu, 22 Jun 2023 17:43:00 +0000 (13:43 -0400)]
ada: Documentation for mixed declarations and statements
This patch documents the new feature that allows declarations mixed with
statements, primarily by referring to the RFC.
gcc/ada/
* doc/gnat_rm/gnat_language_extensions.rst
(Local Declarations Without Block): Document the feature very
briefly, and refer the reader to the RFC for details and examples.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
Yannick Moy [Wed, 28 Jun 2023 13:56:26 +0000 (13:56 +0000)]
ada: Adapt proof of System.Arith_Double to remove CVC4
The proof of System.Arith_Double still required the use of
CVC4, now replaced by its successor cvc5. Adapt the proof so that
CVC4 can be removed from the proof of run-time units.
gcc/ada/
* libgnat/s-aridou.adb (Lemma_Div_Mult): New simple lemma.
(Lemma_Powers_Of_2_Commutation): State post in else branch.
(Lemma_Div_Pow2): Introduce local lemma and use it.
(Scaled_Divide): Use cut operations in assertions, lemmas, new
assertions. Introduce local lemma and use it.
Roger Sayle [Mon, 10 Jul 2023 08:06:52 +0000 (09:06 +0100)]
i386: Add new insvti_lowpart_1 and insvdi_lowpart_1 patterns.
This patch implements another of Uros' suggestions, to investigate an
insvti_lowpart_1 pattern to improve TImode parameter passing on x86_64.
In PR 88873, the RTL the middle-end expands for passing V2DF in TImode
is subtly different from what it does for V2DI in TImode, sufficiently so
that my explanations for why insvti_lowpart_1 isn't required don't apply
in this case.
This patch adds an insvti_lowpart_1 pattern, complementing the existing
insvti_highpart_1 pattern, and also a 32-bit variant, insvdi_lowpart_1.
Because the middle-end represents 128-bit constants using CONST_WIDE_INT
and 64-bit constants using CONST_INT, it's easiest to treat these as
different patterns, rather than attempt <dwi> parameterization.
This patch also includes a peephole2 (actually a pair) to transform
xchg instructions into mov instructions, when one of the destinations
is unused. This optimization is required to produce the optimal code
sequences below.
For the 64-bit case:
__int128 foo(__int128 x, unsigned long long y)
{
  __int128 m = ~((__int128)~0ull);
  __int128 t = x & m;
  __int128 r = t | y;
  return r;
}
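The xchg-to-mov peephole mentioned above applies when one exchanged
result is dead; a hedged sketch of the idea (registers illustrative):
xchgq %rdx, %rax    # %rdx's new value carries a REG_UNUSED note
->
movq  %rdx, %rax    # plain move once the dead half is dropped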
2023-07-10 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (peephole2): Transform xchg insn with a
REG_UNUSED note to a (simple) move.
(*insvti_lowpart_1): New define_insn_and_split.
(*insvdi_lowpart_1): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/i386/insvdi_lowpart-1.c: New test case.
* gcc.target/i386/insvti_lowpart-1.c: Likewise.
Roger Sayle [Mon, 10 Jul 2023 08:04:29 +0000 (09:04 +0100)]
i386: Add AVX512 support for STV of SI/DImode rotation by constant.
Following Uros' suggestion, this patch adds support for AVX512VL's
vpro[lr][dq] instructions to the recently added scalar-to-vector (STV)
enhancements to handle DImode and SImode rotations by a constant.
For the test cases:
unsigned long long rot1(unsigned long long x) {
  return (x>>1) | (x<<63);
}
void mem1(unsigned long long *p) {
  *p = rot1(*p);
}
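For mem1, the expected benefit is a scalar-to-vector sequence along
these lines (a hedged sketch, not the exact output; vprorq is the
AVX512VL rotate-right-by-immediate, and rotating right by 1 matches
rot1 above):
vmovq  (%rdi), %xmm0
vprorq $1, %xmm0, %xmm0
vmovq  %xmm0, (%rdi)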
2023-07-10 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Tweak
gains/costs for ROTATE/ROTATERT by integer constant on AVX512VL.
(general_scalar_chain::convert_rotate): On TARGET_AVX512F generate
avx512vl_rolv2di or avx512vl_rolv4si when appropriate.
gcc/testsuite/ChangeLog
* gcc.target/i386/avx512vl-stv-rotatedi-1.c: New test case.
Add pre_reload splitter to detect fp min/max pattern.
We have ix86_expand_sse_fp_minmax to detect min/max semantics, but
it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false. For
the testcase in the PR, there's an extra move from cmp_op0 to if_true,
so it fails ix86_expand_sse_fp_minmax.
This patch adds pre_reload splitter to detect the min/max pattern.
Operand order in MINSS matters for signed zeros and NaNs, since the
instruction always returns the second operand when either operand is a
NaN or both operands are zero.
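In scalar terms, the selection rule described above can be sketched as
follows (illustrative, not GCC code):
/* MINSS computes "a < b ? a : b"; the comparison is false for NaNs and
   for two zeros (of either sign), so the second operand is returned in
   both cases.  */
static inline float
minss_semantics (float a, float b)
{
  return a < b ? a : b;
}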
gcc/ChangeLog:
PR target/110170
* config/i386/i386.md (*ieee_max<mode>3_1): New pre_reload
splitter to detect fp max pattern.
(*ieee_min<mode>3_1): Ditto, but for fp min pattern.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr110170.C: New test.
* gcc.target/i386/pr110170.c: New test.
- Import dmd v2.104.0-beta.1.
- Better error message when attribute inference fails down the
call stack.
- Using `;' as an empty statement has been turned into an error.
- Using `in' parameters with non-`extern(D)' or `extern(C++)'
functions is deprecated.
- `in ref' on parameters has been deprecated in favor of
`-preview=in'.
- Throwing `immutable', `const', `inout', and `shared' qualified
objects is now deprecated.
- User Defined Attributes now parse Template Arguments.
D runtime changes:
- Import druntime v2.104.0-beta.1.
Phobos changes:
- Import phobos v2.104.0-beta.1.
- Better static assert messages when instantiating
`std.algorithm.comparison.clamp' with wrong inputs.
- `std.typecons.Rebindable' now supports all types.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 28a3b24c2e.
* dmd/VERSION: Bump version to v2.104.0-beta.1.
* d-codegen.cc (build_bounds_slice_condition): Update for new
front-end interface.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_post_options): Initialize global.compileEnv.
* expr.cc (ExprVisitor::visit (CatExp *)): Replace code generation
with new front-end lowering.
(ExprVisitor::visit (LoweredAssignExp *)): New method.
(ExprVisitor::visit (StructLiteralExp *)): Don't generate static
initializer symbols for structs defined in C sources.
* runtime.def (ARRAYCATT): Remove.
(ARRAYCATNTX): Remove.
Jan Hubicka [Sun, 9 Jul 2023 13:14:54 +0000 (15:14 +0200)]
Improve dumping of profile_count
Dumps of profile_count are quite hard to interpret since they are 64-bit
fixed-point values. In many cases one looks at a single function, and it is
better to think of basic block frequency, that is, how many times the block is
executed per invocation. This patch makes CFG dumps also print this info.
For example:
main()
{
  for (int i = 0; i < 10; i++)
    t();
}
the -fdump-tree-optimized-blocks-details now prints:
int main ()
{
  unsigned int ivtmp_1;
  unsigned int ivtmp_2;
Paul Thomas [Sat, 8 Jul 2023 17:13:23 +0000 (18:13 +0100)]
Fortran: Fix default type bugs in gfortran [PR99139, PR99368]
2023-07-08 Steve Kargl <sgk@troutmask.apl.washington.edu>
gcc/fortran
PR fortran/99139
PR fortran/99368
* match.cc (gfc_match_namelist): Check for host associated or
defined types before applying default type.
(gfc_match_select_rank): Apply default type to selector of
unknown type if possible.
* resolve.cc (resolve_fl_variable): Do not apply local default
initialization to assumed rank entities.
gcc/testsuite/
PR fortran/99139
* gfortran.dg/pr99139.f90 : New test
PR fortran/99368
* gfortran.dg/pr99368.f90 : New test