Jason Merrill [Wed, 20 Nov 2024 09:43:30 +0000 (10:43 +0100)]
c++: modules and debug marker stmts
21_strings/basic_string/operations/contains/nonnull.cc was failing because
the module was built with debug markers and the testcase was built not
expecting debug markers, so we crashed in lower_stmt. Let's accommodate
this by discarding debug marker statements we don't want.
gcc/cp/ChangeLog:
* module.cc (trees_in::core_vals) [STATEMENT_LIST]: Skip
DEBUG_BEGIN_STMT if !MAY_HAVE_DEBUG_MARKER_STMTS.
Jason Merrill [Wed, 20 Nov 2024 12:51:10 +0000 (13:51 +0100)]
c++: modules and tsubst_friend_class
In 20_util/function_objects/mem_fn/constexpr.cc we start to instantiate
_Mem_fn_base's friend declaration of _Bind_check_arity before we've loaded
the namespace-scope declaration, so lookup_imported_hidden_friend doesn't
find it. But then we load the namespace-scope declaration in
lookup_template_class during substitution, and so when we get around to
pushing the result of substitution, they conflict. Fixed by calling
lazy_load_pendings in lookup_imported_hidden_friend.
Georg-Johann Lay [Wed, 20 Nov 2024 11:25:18 +0000 (12:25 +0100)]
AVR: target/117726 - Better optimizations of ASHIFT:SI insns.
This patch improves the 4-byte ASHIFT insns.
1) It adds a "r,r,C15" alternative for improved long << 15.
2) It adds 3-operand alternatives (depending on options) and
splits them after peephole2 / before avr-fuse-move into
a 3-operand byte shift and a 2-operand residual bit shift.
For better control, it introduces a new option -msplit-bit-shift
that's activated by default at -O2 and higher. 2) is even
performed with -Os, but not with -Oz.
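As a rough illustration of the split described in 2), the sketch below decomposes a 32-bit left shift into a whole-byte shift followed by a residual bit shift. It is a plain C model, not the RTL the patch emits; the function name split_shift_left is hypothetical, and which offsets are actually split is decided by the patch's own predicates, which are not modelled here.

```c
#include <stdint.h>

/* Shift a 32-bit value left by n (0..31) in two steps: a whole-byte
   shift followed by a residual shift of 0..7 bits.  */
uint32_t
split_shift_left (uint32_t x, unsigned n)
{
  unsigned byte_part = n & ~7u;   /* 0, 8, 16 or 24 */
  unsigned bit_part = n & 7u;     /* 0..7 */
  uint32_t tmp = x << byte_part;  /* models the 3-operand byte shift */
  return tmp << bit_part;         /* models the 2-operand residual bit shift */
}
```

For an offset of 15, this performs a shift by 8 followed by a shift by 7.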
PR target/117726
gcc/
* config/avr/avr.opt (-msplit-bit-shift): Add new optimization option.
* common/config/avr/avr-common.cc (avr_option_optimization_table)
[OPT_LEVELS_2_PLUS]: Turn on -msplit-bit-shift.
* config/avr/avr.h (machine_function.n_avr_fuse_add_executed):
New bool component.
* config/avr/avr.md (attr "isa") <2op, 3op>: Add new values.
(attr "enabled"): Handle them.
(ashlsi3, *ashlsi3, *ashlsi3_const): Add "r,r,C15" alternative.
Add "r,0,C4l" and "r,r,C4l" alternatives (depending on 2op / 3op).
(define_split) [avr_split_bit_shift]: Add 2 new ashift:ALL4 splitters.
(define_peephole2) [ashift:ALL4]: Add (match_dup 3) so that the scratch
won't overlap with the output operand of the matched insn.
(*ashl<mode>3_const_split): Remove unused ashift:ALL4 splitter.
* config/avr/avr-passes.cc (emit_valid_insn)
(emit_valid_move_clobbercc): Move out of anonymous namespace.
(make_avr_pass_fuse_add) <gate>: Don't override.
<execute>: Set n_avr_fuse_add_executed according to
func->machine->n_avr_fuse_add_executed.
(pass_data avr_pass_data_split_after_peephole2): New object.
(avr_pass_split_after_peephole2): New rtl_opt_pass.
(avr_emit_shift): New static function.
(avr_shift_is_3op, avr_split_shift_p, avr_split_shift)
(make_avr_pass_split_after_peephole2): New functions.
* config/avr/avr-passes.def (avr_pass_split_after_peephole2):
Insert new pass after pass_peephole2.
* config/avr/avr-protos.h
(n_avr_fuse_add_executed, avr_shift_is_3op, avr_split_shift_p)
(avr_split_shift, avr_optimize_size_level)
(make_avr_pass_split_after_peephole2): New prototypes.
* config/avr/avr.cc (n_avr_fuse_add_executed): New global variable.
(avr_optimize_size_level): New function.
(avr_set_current_function): Set n_avr_fuse_add_executed
according to cfun->machine->n_avr_fuse_add_executed.
(ashlsi3_out) [case 15]: Output optimized code for this offset.
(avr_rtx_costs_1) [ASHIFT, SImode]: Adjust costs of offsets 15, 16.
* config/avr/constraints.md (C4a, C4r, C4l): New constraints.
* pass_manager.h (pass_manager): Adjust comments.
Jeff Law [Thu, 21 Nov 2024 15:24:10 +0000 (08:24 -0700)]
[RISC-V][PR target/116590] Avoid emitting multiple instructions from fmacc patterns
So much like my patch from last week, this removes alternatives that
create multiple instructions that we really should have never needed.
In this case it fixes one of two bugs in pr116590. In particular we
don't want vmvNr instructions for thead-vector. Those instructions were
emitted as part of those two instruction sequences.
I've tested this in my tester and assuming the pre-commit tester is
happy, I'll push it to the trunk.
Pan Li [Mon, 11 Nov 2024 08:44:24 +0000 (16:44 +0800)]
Match: Refactor the unsigned SAT_ADD match pattern [NFC]
This patch would like to refactor the unsigned SAT_ADD pattern by:
* Extract type check outside.
* Extract common sub pattern.
* Re-arrange the related match pattern forms together.
* Remove unnecessary helper pattern matches.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Refactor sorts of unsigned SAT_ADD match pattern.
Signed-off-by: Pan Li <pan2.li@intel.com>
Tamar Christina [Thu, 21 Nov 2024 12:49:35 +0000 (12:49 +0000)]
middle-end: Pass along SLP node when costing vector loads/stores
With the support to SLP only we now pass the VMAT through the SLP node, however
the majority of the costing calls inside vectorizable_load and
vectorizable_store do not pass the SLP node along. Due to this the backend costing
never sees the VMAT for these cases anymore.
Additionally the helper around record_stmt_cost when both SLP and stmt_vinfo are
passed would only pass the SLP node along. However the SLP node doesn't contain
all the info available in the stmt_vinfo and we'd have to go through the
SLP_TREE_REPRESENTATIVE anyway. As such I changed the function to just always
pass both along. Unlike the VMAT changes, I don't believe there to be a
correctness issue here, but this minimizes the amount of churn in the backend
costing until vectorizer costing as a whole is revisited in GCC 16.
These changes re-enable the cost model on AArch64 and also correctly find the
VMATs on loads and stores fixing testcases such as sve_iters_low_2.c.
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_get_data_access_cost): Pass NULL for SLP
node.
* tree-vect-stmts.cc (record_stmt_cost): Expose.
(vect_get_store_cost, vect_get_load_cost): Extend with SLP node.
(vectorizable_store, vectorizable_load): Pass SLP node to all costing.
* tree-vectorizer.h (record_stmt_cost): Always pass both SLP node and
stmt_vinfo to costing.
(vect_get_load_cost, vect_get_store_cost): Extend with SLP node.
elfos.h (ASM_DECLARE_OBJECT_NAME): Use decl size instead of type size.
was applied, those were missed. At the same time, the testcase was
restricted to Linux though there's nothing Linux-specific in there, so
the error remained undetected.
This patch fixes the definitions to match elfos.h and enables the test
on Solaris, too.
Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.
forwprop: Try to blend two isomorphic VEC_PERM sequences
This extends forwprop by yet another VEC_PERM optimization:
It attempts to blend two isomorphic vector sequences by using the
redundancy in the lane utilization in these sequences.
This redundancy in lane utilization comes from the way specific
scalar statements end up vectorized: two VEC_PERMs on top, binary operations
on both of them, and a final VEC_PERM to create the result.
Here is an example of this sequence:
To remove the redundancy, lanes 2 and 3 can be freed, which allows
changing the last statement into:
v_out' = VEC_PERM <v_x, v_y, {0, 1, 4, 5}>
// v_out' = {e0+e1, e2+e3, e0-e1, e2-e3}
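As an illustration of the narrowed form, here is a minimal scalar model in C. The two leading permutes and their selectors are assumptions chosen for this sketch (only the final VEC_PERM is taken from the text above), and vec_perm and narrowed_sequence are hypothetical names. Lanes 2 and 3 of the intermediate vectors are "don't care" and are filled with lane 0 here.

```c
#include <stddef.h>

#define LANES 4

/* Model of a 4-lane VEC_PERM <a, b, sel>: lane i of the result is taken
   from the concatenation {a, b}, indexed by sel[i] modulo 2 * LANES.  */
void
vec_perm (const int *a, const int *b, const unsigned *sel, int *out)
{
  for (size_t i = 0; i < LANES; i++)
    {
      unsigned idx = sel[i] % (2 * LANES);
      out[i] = idx < LANES ? a[idx] : b[idx - LANES];
    }
}

/* Compute {e0+e1, e2+e3, e0-e1, e2-e3} while using only lanes 0 and 1
   of the intermediate vectors.  v_out must not alias v_in.  */
void
narrowed_sequence (const int *v_in, int *v_out)
{
  /* Assumed selectors; lanes 2 and 3 are unused and set to 0.  */
  static const unsigned sel_even[LANES] = { 0, 2, 0, 0 };
  static const unsigned sel_odd[LANES] = { 1, 3, 0, 0 };
  /* Selector from the text: lanes 0,1 of v_x, then lanes 0,1 of v_y.  */
  static const unsigned sel_out[LANES] = { 0, 1, 4, 5 };
  int v_1[LANES], v_2[LANES], v_x[LANES], v_y[LANES];

  vec_perm (v_in, v_in, sel_even, v_1);
  vec_perm (v_in, v_in, sel_odd, v_2);
  for (size_t i = 0; i < LANES; i++)
    {
      v_x[i] = v_1[i] + v_2[i];
      v_y[i] = v_1[i] - v_2[i];
    }
  vec_perm (v_x, v_y, sel_out, v_out);
}
```

Because lanes 2 and 3 of v_x and v_y are never read by the final permute, a second isomorphic sequence could place its own data there.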
The cost of eliminating the redundancy in the lane utilization is that
lowering the VEC_PERM expression could get more expensive because of
tighter packing of the lanes. Therefore this optimization is not done
alone, but only in case we identify two such sequences that can be
blended.
Once all candidate sequences have been identified, we try to blend them,
so that we can use the freed lanes for the second sequence.
On success we convert 2x (2x BINOP + 1x VEC_PERM) to
2x VEC_PERM + 2x BINOP + 2x VEC_PERM traded for 4x VEC_PERM + 2x BINOP.
The implemented transformation reuses (rewrites) the statements
of the first sequence and the last VEC_PERM of the second sequence.
The remaining four statements of the second sequence are left untouched
and will be eliminated by DCE later.
This targets x264_pixel_satd_8x4, which calculates the sum of absolute
transformed differences (SATD) using Hadamard transformation.
We have seen 8% speedup on SPEC's x264 on a 5950X (x86-64) and 7%
speedup on an AArch64 machine.
Bootstrapped and reg-tested on x86-64 and AArch64 (all languages).
gcc/ChangeLog:
* tree-ssa-forwprop.cc (struct _vec_perm_simplify_seq): New data
structure to store analysis results of a vec perm simplify sequence.
(get_vect_selector_index_map): Helper to get an index map from the
provided vector permute selector.
(recognise_vec_perm_simplify_seq): Helper to recognise a
vec perm simplify sequence.
(narrow_vec_perm_simplify_seq): Helper to pack the lanes more
tightly.
(can_blend_vec_perm_simplify_seqs_p): Test if two vec perm
sequences can be blended.
(calc_perm_vec_perm_simplify_seqs): Helper to calculate the new
permutation indices.
(blend_vec_perm_simplify_seqs): Helper to blend two vec perm
simplify sequences.
(process_vec_perm_simplify_seq_list): Helper to process a list
of vec perm simplify sequences.
(append_vec_perm_simplify_seq_list): Helper to add a vec perm
simplify sequence to the list.
(pass_forwprop::execute): Integrate new functionality.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/satd-hadamard.c: New test.
* gcc.dg/tree-ssa/vector-10.c: New test.
* gcc.dg/tree-ssa/vector-8.c: New test.
* gcc.dg/tree-ssa/vector-9.c: New test.
* gcc.target/aarch64/sve/satd-hadamard.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
H.J. Lu [Thu, 21 Nov 2024 11:08:03 +0000 (19:08 +0800)]
apx-ndd-tls-1[ab].c: Add -std=gnu17
Since GCC 15 defaults to -std=gnu23, add -std=gnu17 to apx-ndd-tls-1[ab].c
to avoid:
gcc.target/i386/apx-ndd-tls-1a.c: In function ‘k’:
gcc.target/i386/apx-ndd-tls-1a.c:29:7: error: too many arguments to function ‘l’
gcc.target/i386/apx-ndd-tls-1a.c:25:5: note: declared here
Rainer Orth [Thu, 21 Nov 2024 10:46:36 +0000 (11:46 +0100)]
libgomp: testsuite: Fix libgomp.c/alloc-pinned-3.c etc. for C23 on non-Linux
Since the switch to a C23 default, three libgomp tests FAIL on Solaris:
FAIL: libgomp.c/alloc-pinned-3.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-3.c compilation failed to produce executable
FAIL: libgomp.c/alloc-pinned-4.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-4.c compilation failed to produce executable
FAIL: libgomp.c/alloc-pinned-6.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-6.c compilation failed to produce executable
Excess errors:
/vol/gcc/src/hg/master/local/libgomp/testsuite/libgomp.c/alloc-pinned-3.c:104:3: error: too many arguments to function 'set_pin_limit'
Fixed by adding the missing size argument to the stub functions.
Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.
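A minimal sketch of the kind of stub fix described above, assuming a stub that was previously declared with an empty parameter list. The parameter type and the recorded_limit variable are assumptions for illustration, not taken from the testcases; a real stub would most likely ignore the argument.

```c
/* Under C23, `void set_pin_limit () { }` declares a function with no
   parameters, so a call passing a size is rejected with "too many
   arguments".  Giving the stub the same parameter list as the real
   implementation keeps it valid under both gnu17 and gnu23.  */
int recorded_limit = -1;  /* hypothetical, only so the call is observable */

void
set_pin_limit (int size)
{
  recorded_limit = size;
}
```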
Jakub Jelinek [Thu, 21 Nov 2024 09:17:03 +0000 (10:17 +0100)]
include: Add new post-DWARF 5 DW_LANG_* enumerators
DWARF changed the language code assignment to be on a web page and
after DWARF 5 has been published already 27 codes have been assigned.
We have some of those already in the header, but most of them were missing,
including one added just yesterday (DW_LANG_C23).
Note, this is really post-DWARF 5 stuff rather than DWARF 6, because
DWARF 6 plans to switch from DW_AT_language to DW_AT_language_{name,version}
pair where we'll say DW_LNAME_C with 202311 version instead of this.
2024-11-21 Jakub Jelinek <jakub@redhat.com>
* dwarf2.h (enum dwarf_source_language): Add comment where
the post DWARF 5 additions start. Refresh list from
https://dwarfstd.org/languages.html.
Richard Biener [Thu, 21 Nov 2024 08:14:53 +0000 (09:14 +0100)]
tree-optimization/117720 - check alignment for VMAT_STRIDED_SLP
While vectorizable_store was already checking the alignment requirement
of the stores and falling back to elementwise accesses if it was not
honored, the vectorizable_load path wasn't doing this. After the previous
change to disregard alignment checking for VMAT_STRIDED_SLP in
get_group_load_store_type this now tripped on power.
PR tree-optimization/117720
* tree-vect-stmts.cc (vectorizable_load): For VMAT_STRIDED_SLP
verify the chosen load type is OK with regard to alignment.
Jakub Jelinek [Thu, 21 Nov 2024 08:40:37 +0000 (09:40 +0100)]
c-family, docs: Adjust descriptions/documentation for C23 publication
As C23 has been published already https://www.iso.org/standard/82075.html
we don't need to say that it is expected to be published etc.
Furthermore, standards.texi was still documenting that -std=gnu17
is the default.
2024-11-21 Jakub Jelinek <jakub@redhat.com>
gcc/
* doc/invoke.texi (-std=c23): Adjust documentation for
publication of the ISO/IEC 9899:2024 standard.
* doc/standards.texi: Likewise. Document -std=gnu17 and
-std=gnu23 options. Mention that -std=gnu23 rather than
-std=gnu17 is now the default for C.
gcc/c-family/
* c.opt (std=c23, std=gnu23, std=iso9899:2024): Adjust description
for publication of the ISO/IEC 9899:2024 standard.
Jakub Jelinek [Thu, 21 Nov 2024 08:39:06 +0000 (09:39 +0100)]
phiopt: Improve spaceship_replacement for HONOR_NANS [PR117612]
The following patch optimizes spaceship followed by comparisons of the
spaceship value even for floating point spaceship when NaNs can appear.
operator<=> for this emits roughly
signed char c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; else c = 2;
and I believe the
/* The optimization may be unsafe due to NaNs. */
comment just isn't true.
Sure, the i == j comparison doesn't raise exceptions on qNaNs, but if
one of the operands is qNaN, then i == j is false and i < j or i > j
is then executed and raises exceptions even on qNaNs.
And we can safely optimize say
c == -1 comparison after the above into i < j, that also raises
exceptions like before and handles NaNs the same way as the original.
The only unsafe transformation would be c == 0 or c != 0: turning it
into i == j or i != j wouldn't raise an exception, so I'm not doing that
optimization (but other parts of the compiler optimize the i < j comparison
away anyway).
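The value-level equivalence argued above can be sketched in plain C. The function names are hypothetical, and the sketch only checks results; it does not model floating-point exception flags, which are the other half of the argument.

```c
#include <math.h>

/* The shape operator<=> roughly expands to, as quoted above;
   2 stands for the unordered (NaN) case.  */
signed char
spaceship (double i, double j)
{
  signed char c;
  if (i == j)
    c = 0;
  else if (i < j)
    c = -1;
  else if (i > j)
    c = 1;
  else
    c = 2;
  return c;
}

/* Comparison of the spaceship result against -1 ...  */
int
less_via_spaceship (double i, double j)
{
  return spaceship (i, j) == -1;
}

/* ... and the direct comparison it can be replaced with.  */
int
less_direct (double i, double j)
{
  return i < j;
}
```

Both forms return 0 when either operand is a NaN, since the unordered case yields 2 rather than -1.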
Anyway, to match the HONOR_NANS case, we need to verify that the
second comparison has true edge to the phi_bb (yielding there -1 or 1),
it can't be the false edge because when NaNs are honored, the false
edge is for both the case where the inverted comparison is true or when
one of the operands is NaN. Similarly we need to ensure that the two
non-equality comparisons are the opposite, while for -ffast-math we can in
some cases get one comparison x >= 5.0 and the other x > 5.0 and it is fine,
because NaN is UB, when NaNs are honored, they must be different to leave
the unordered case with 2 value as the last one remaining.
The patch also punts if HONOR_NANS and the phi has just 3 arguments instead
of 4.
When NaNs are honored, we also in some cases need to perform some comparison
and then invert its result (so that exceptions are properly thrown and we
get the correct result).
2024-11-21 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94589
PR tree-optimization/117612
* tree-ssa-phiopt.cc (spaceship_replacement): Handle
HONOR_NANS (TREE_TYPE (lhs1)) case when possible.
* gcc.dg/pr94589-5.c: New test.
* gcc.dg/pr94589-6.c: New test.
* g++.dg/opt/pr94589-5.C: New test.
* g++.dg/opt/pr94589-6.C: New test.
Jakub Jelinek [Thu, 21 Nov 2024 08:38:01 +0000 (09:38 +0100)]
phiopt: Fix a pasto in spaceship_replacement [PR117612]
When working on the PR117612 fix, I've noticed a pasto in
tree-ssa-phiopt.cc (spaceship_replacement).
The code is
if (absu_hwi (tree_to_shwi (arg2)) != 1)
return false;
if (e1->flags & EDGE_TRUE_VALUE)
{
if (tree_to_shwi (arg0) != 2
|| absu_hwi (tree_to_shwi (arg1)) != 1
|| wi::to_widest (arg1) == wi::to_widest (arg2))
return false;
}
else if (tree_to_shwi (arg1) != 2
|| absu_hwi (tree_to_shwi (arg0)) != 1
|| wi::to_widest (arg0) == wi::to_widest (arg1))
return false;
where arg{0,1,2,3} are PHI args and wants to ensure that if e1 is a
true edge, then arg0 is 2 and one of arg{1,2} is -1 and one is 1,
otherwise arg1 is 2 and one of arg{0,2} is -1 and one is 1.
But due to a pasto, the latter case doesn't verify that arg0
is different from arg2; they could be both -1 or both 1 and we wouldn't
punt. The wi::to_widest (arg0) == wi::to_widest (arg1) test
is always false when we've made sure in the earlier conditions that
arg1 is 2 and arg0 is -1 or 1, so never 2.
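A standalone model of the intended check, using plain integers instead of trees and a bool instead of the edge flag. This is a sketch of the condition described above, not the actual patch; phi_args_ok and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Return true if the PHI arguments have the expected shape: one of
   them is 2 (which one depends on the edge), and the other two are
   -1 and 1 in some order.  */
bool
phi_args_ok (bool e1_is_true_edge, long arg0, long arg1, long arg2)
{
  if (labs (arg2) != 1)
    return false;
  if (e1_is_true_edge)
    {
      if (arg0 != 2 || labs (arg1) != 1 || arg1 == arg2)
        return false;
    }
  /* Corrected comparison: arg0 against arg2, not arg0 against arg1.  */
  else if (arg1 != 2 || labs (arg0) != 1 || arg0 == arg2)
    return false;
  return true;
}
```

With the comparison against arg1 instead, the call phi_args_ok (false, 1, 2, 1) would have been accepted even though both remaining arguments are 1.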
2024-11-21 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94589
PR tree-optimization/117612
* tree-ssa-phiopt.cc (spaceship_replacement): Fix up
a pasto in check when arg1 is 2.
Kewen Lin [Thu, 21 Nov 2024 07:41:34 +0000 (07:41 +0000)]
rs6000: Adjust FLOAT128 signbit2 expander for P8 LE [PR114567]
As the associated test case shows, signbit generated assembly
is sub-optimal for _Float128 argument from memory on P8 LE.
On P8 LE, the p8swap pass puts an explicit AND -16 on the memory,
which causes mode_dependent_address_p to consider it invalid
to change its mode, so combine fails to make use of the
existing pattern signbit<SIGNBIT:mode>2_dm_mem. Considering
it's always more efficient to make use of 8 bytes load and
shift on P8 LE, this patch is to adjust the current expander
and treat it specially.
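The "8 bytes load and shift" idea can be sketched in portable C on a little-endian byte image of a 128-bit IEEE value, where the sign bit is the top bit of the byte at offset 15. This is only a model of the approach, not the code the expander generates, and the function name is hypothetical.

```c
#include <stdint.h>

/* Extract the sign bit from the little-endian image of a 128-bit
   floating-point value by assembling only its upper 8 bytes and
   shifting the top bit down.  */
int
signbit_from_le_image (const unsigned char bytes[16])
{
  uint64_t high = 0;
  for (int i = 15; i >= 8; i--)
    high = (high << 8) | bytes[i];  /* models the 8-byte load */
  return (int) (high >> 63);        /* models the shift */
}
```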
Kewen Lin [Thu, 21 Nov 2024 07:41:33 +0000 (07:41 +0000)]
rs6000: Use standard name {add,sub}v1ti3 for altivec_v{add,sub}uqm
This patch is to adjust define_insn altivec_v{add,sub}uqm
with standard names, as the associated test case shows, w/o
this patch, it ends up with scalar {add,subf}c/{add,subf}e,
the standard names help to exploit v{add,sub}uqm.
gcc/ChangeLog:
* config/rs6000/altivec.md (altivec_vadduqm): Rename to ...
(addv1ti3): ... this.
(altivec_vsubuqm): Rename to ...
(subv1ti3): ... this.
* config/rs6000/rs6000-builtins.def (__builtin_altivec_vadduqm):
Replace bif expander altivec_vadduqm with addv1ti3.
(__builtin_altivec_vsubuqm): Replace bif expander altivec_vsubuqm with
subv1ti3.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/p8vector-int128-3.c: New test.
Kewen Lin [Thu, 21 Nov 2024 07:41:33 +0000 (07:41 +0000)]
rs6000: Remove entry for V1TImode from VI_unit
When making a patch to adjust VECTOR_P8_VECTOR rs6000_vector
enum, I noticed that V1TImode's mode attribute in VI_unit
VECTOR_UNIT_ALTIVEC_P (V1TImode) is never true, since
VECTOR_UNIT_ALTIVEC_P checks if vector_unit[V1TImode] is
equal to VECTOR_ALTIVEC, but vector_unit[V1TImode] can only
be VECTOR_NONE or VECTOR_P8_VECTOR, there is no chance to be
VECTOR_ALTIVEC:
rs6000_vector_unit[V1TImode]
= (TARGET_P8_VECTOR) ? VECTOR_P8_VECTOR : VECTOR_NONE;
By checking all uses of VI_unit, the used mode iterator is
one of VI2, VI, VP_small and VP, none of them has V1TImode,
so the entry for V1TImode is useless. I guessed it was
designed to have one mode attribute to cover all integer
vector modes, but later we separated V1TI handlings to its
own patterns (those guarded with TARGET_VADDUQM). Anyway,
this patch is to remove this useless and confusing entry.
gcc/ChangeLog:
* config/rs6000/altivec.md (mode attr for V1TI in VI_unit): Remove.
Kewen Lin [Thu, 21 Nov 2024 07:41:33 +0000 (07:41 +0000)]
rs6000: Add veqv support to *eqv<mode>3_internal1
When making a patch to replace TARGET_P8_VECTOR, I noticed that
for *eqv<BOOL_128:mode>3_internal1, unlike the other logical
operations, we only exploited the vsx version. I think it
is an oversight, so this patch is to consider veqv as well.
gcc/ChangeLog:
* config/rs6000/rs6000.md (*eqv<BOOL_128:mode>3_internal1): Generate
insn veqv if TARGET_ALTIVEC and operands are altivec_register_operand.
Kewen Lin [Thu, 21 Nov 2024 07:41:33 +0000 (07:41 +0000)]
rs6000: Remove ISA_3_0_MASKS_IEEE and check P9_VECTOR instead
When working to get rid of mask bit OPTION_MASK_P8_VECTOR,
I noticed that the check on ISA_3_0_MASKS_IEEE is actually
to check TARGET_P9_VECTOR, since we check all three mask
bits together and p9 vector guarantees p8 vector and vsx
should be enabled. So this patch is to adjust this first
as preparatory patch for the following patch to change
all uses of OPTION_MASK_P8_VECTOR and TARGET_P8_VECTOR.
Kewen Lin [Thu, 21 Nov 2024 07:41:33 +0000 (07:41 +0000)]
rs6000: Simplify some conditions or code related to TARGET_DIRECT_MOVE
When I was making a patch to rework TARGET_P8_VECTOR, I
noticed that there are some redundant checks and dead code
related to TARGET_DIRECT_MOVE, so I made this patch as one
separated preparatory patch, it consists of:
- Check either TARGET_DIRECT_MOVE or TARGET_P8_VECTOR only
according to the context, rather than checking both of
them since they are actually the same (TARGET_DIRECT_MOVE
is defined as TARGET_P8_VECTOR).
- Simplify TARGET_VSX && TARGET_DIRECT_MOVE as
TARGET_DIRECT_MOVE since direct move ensures VSX enabled.
- Replace some TARGET_POWERPC64 && TARGET_DIRECT_MOVE as
TARGET_DIRECT_MOVE_64BIT to simplify it.
- Remove some dead code guarded with TARGET_DIRECT_MOVE
but the condition never holds here.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Simplify
TARGET_P8_VECTOR && TARGET_DIRECT_MOVE as TARGET_P8_VECTOR.
(rs6000_output_move_128bit): Simplify TARGET_VSX && TARGET_DIRECT_MOVE
as TARGET_DIRECT_MOVE.
* config/rs6000/rs6000.h (TARGET_XSCVDPSPN): Simplify conditions
TARGET_DIRECT_MOVE || TARGET_P8_VECTOR as TARGET_P8_VECTOR.
(TARGET_XSCVSPDPN): Likewise.
(TARGET_DIRECT_MOVE_128): Simplify TARGET_DIRECT_MOVE &&
TARGET_POWERPC64 as TARGET_DIRECT_MOVE_64BIT.
(TARGET_VEXTRACTUB): Likewise.
(TARGET_DIRECT_MOVE_64BIT): Simplify TARGET_P8_VECTOR &&
TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE.
* config/rs6000/rs6000.md (signbit<mode>2, @signbit<mode>2_dm,
*signbit<mode>2_dm_mem, floatsi<mode>2_lfiwax,
floatsi<SFDF:mode>2_lfiwax_<QHI:mode>_mem_zext,
floatunssi<mode>2_lfiwzx, float<QHI:mode><SFDF:mode>2,
*float<QHI:mode><SFDF:mode>2_internal, floatuns<QHI:mode><SFDF:mode>2,
*floatuns<QHI:mode><SFDF:mode>2_internal, p8_mtvsrd_v16qidi2,
p8_mtvsrd_df, p8_xxpermdi_<mode>, reload_vsx_from_gpr<mode>,
p8_mtvsrd_sf, reload_vsx_from_gprsf, p8_mfvsrd_3_<mode>,
reload_gpr_from_vsx<mode>, reload_gpr_from_vsxsf, unpack<mode>_dm):
Simplify TARGET_DIRECT_MOVE && TARGET_POWERPC64 as
TARGET_DIRECT_MOVE_64BIT.
(unpack<mode>_nodm): Simplify !TARGET_DIRECT_MOVE || !TARGET_POWERPC64
as !TARGET_DIRECT_MOVE_64BIT.
(fix_trunc<mode>si2, fix_trunc<mode>si2_stfiwx,
fix_trunc<mode>si2_internal): Simplify TARGET_P8_VECTOR &&
TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE.
(fix_trunc<mode>si2_stfiwx, fixuns_trunc<mode>si2_stfiwx): Remove some
dead code as the guard TARGET_DIRECT_MOVE there never holds.
(fixuns_trunc<mode>si2_stfiwx): Change TARGET_P8_VECTOR with
TARGET_DIRECT_MOVE which is a better fit.
* config/rs6000/vsx.md (define_peephole2 for SFmode in GPR): Simplify
TARGET_DIRECT_MOVE && TARGET_POWERPC64 as TARGET_DIRECT_MOVE_64BIT.
Lewis Hyatt [Fri, 25 Oct 2024 18:55:09 +0000 (14:55 -0400)]
tree-cfg: Fix call to next_discriminator_for_locus()
While testing future 64-bit location_t support, I ran into an
-fcompare-debug issue that was traced back here. Despite the name,
next_discriminator_for_locus() is meant to take an integer line number
argument, not a location_t. There is one call site which has been passing a
location_t instead. For the most part that is harmless, although in case
there are two CALL stmts on the same line with different location_t, it may
fail to generate a unique discriminator where it should. If/when location_t
changes to be 64-bit, however, it will produce an -fcompare-debug
failure. Fix it by passing the line number rather than the location_t.
I am not aware of a testcase that demonstrates any observable wrong
behavior, but the file debug/pr53466.C is an example where the discriminator
assignment is indeed different before and after this change.
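A toy model of why the key matters. It assumes a discriminator counter kept per key in a small table, which is a simplification for illustration rather than the actual data structure; the function name, the table, and the encoded location values are all hypothetical.

```c
#define MAX_KEYS 64

struct counter
{
  unsigned key;
  unsigned count;
};

static struct counter table[MAX_KEYS];
static unsigned used;

/* Return the next discriminator for KEY, starting at 1.  Returns 0 if
   the toy table is full.  */
unsigned
next_discriminator (unsigned key)
{
  for (unsigned i = 0; i < used; i++)
    if (table[i].key == key)
      return ++table[i].count;
  if (used == MAX_KEYS)
    return 0;
  table[used].key = key;
  table[used].count = 1;
  used++;
  return 1;
}
```

Keyed by line number, two calls on line 42 receive 1 and 2. Keyed by two distinct location values that both map to line 42, each call receives 1, so the discriminators are not unique within the line.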
gcc/ChangeLog:
* tree-cfg.cc (assign_discriminators): Fix incorrect value passed to
next_discriminator_for_locus().
Harald Anlauf [Wed, 20 Nov 2024 20:59:22 +0000 (21:59 +0100)]
Fortran: fix checking of protected variables in submodules [PR83135]
When a symbol was use-associated in the ancestor of a submodule, a
PROTECTED attribute was ignored in the submodule or its descendants.
Find the real ancestor of symbols when used in a variable definition
context in a submodule.
PR fortran/83135
gcc/fortran/ChangeLog:
* expr.cc (sym_is_from_ancestor): New helper function.
(gfc_check_vardef_context): Refine checking of PROTECTED attribute
of symbols that are indirectly use-associated in a submodule.
Joseph Myers [Wed, 20 Nov 2024 21:29:48 +0000 (21:29 +0000)]
c: Diagnose compound literal for empty array [PR114266]
As reported in bug 114266, GCC fails to pedwarn for a compound
literal, whose type is an array of unknown size, initialized with an
empty initializer. This case is disallowed by C23 (which doesn't have
zero-size objects); the case of a named object is diagnosed as
expected, but not that for compound literals. (Before C23, the
pedwarn for empty initializers sufficed.) Add a check for this
specific case with a pedwarn.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/114266
gcc/c/
* c-decl.cc (build_compound_literal): Diagnose array of unknown
size with empty initializer for C23.
gcc/testsuite/
* gcc.dg/c23-empty-init-4.c: New test.
Antoni Boucher [Thu, 18 Jan 2024 22:54:59 +0000 (17:54 -0500)]
libgccjit: Add support for creating temporary variables
gcc/jit/ChangeLog:
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_33): New ABI tag.
* docs/topics/functions.rst: Document gcc_jit_function_new_temp.
* jit-playback.cc (new_local): Add support for temporary
variables.
* jit-recording.cc (recording::function::new_temp): New method.
(recording::local::write_reproducer): Support temporary
variables.
* jit-recording.h (new_temp): New method.
* libgccjit.cc (gcc_jit_function_new_temp): New function.
* libgccjit.h (gcc_jit_function_new_temp): New function.
* libgccjit.map: New function.
gcc/testsuite/ChangeLog:
* jit.dg/all-non-failing-tests.h: Mention test-temp.c.
* jit.dg/test-temp.c: New test.
[PR116587][LRA]: Fix last chance reload pseudo allocation
On i686 PR116587 test compilation resulted in LRA failure to find
registers for a reload insn pseudo. The insn requires 6 regs for 4
reload insn pseudos where two of them require 2 regs each. But we
have only 5 free regs as sp is a fixed reg, bp is fixed because of
-fno-omit-frame-pointer, bx is assigned to pic_offset_table_pseudo
because of -fPIC. LRA spills pic_offset_table_pseudo as the last
chance approach to allocate registers to the reload pseudo. Although
it makes 2 free registers for the unallocated reload pseudo requiring
also 2 regs, the pseudo still can not be allocated as the 2 free regs
are disjoint. The patch spills all pseudos conflicting with the
unallocated reload pseudo including already allocated reload insn
pseudos, then standard LRA code allocates spilled pseudos requiring
more than one register first and avoids the situation of disjoint regs for
reload pseudos requiring more than one reg.
gcc/ChangeLog:
PR target/116587
* lra-assigns.cc (find_all_spills_for): Consider all pseudos whose
classes intersect given pseudo class.
gcc/testsuite/ChangeLog:
PR target/116587
* gcc.target/i386/pr116587.c: New test.
Antoni Boucher [Mon, 23 Jan 2023 22:21:15 +0000 (17:21 -0500)]
libgccjit: Add support for machine-dependent builtins
gcc/jit/ChangeLog:
PR jit/108762
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_32): New ABI tag.
* docs/topics/functions.rst: Add documentation for the function
gcc_jit_context_get_target_builtin_function.
* dummy-frontend.cc: Include headers target.h, jit-recording.h,
print-tree.h, unordered_map and string, new variables (target_builtins,
target_function_types, and target_builtins_ctxt), new function
(tree_type_to_jit_type).
* jit-builtins.cc: Specify that the function types are not from
target builtins.
* jit-playback.cc: New argument is_target_builtin to new_function.
* jit-playback.h: New argument is_target_builtin to
new_function.
* jit-recording.cc: New argument is_target_builtin to
new_function_type, function_type constructor and function
constructor, new function
(get_target_builtin_function).
* jit-recording.h: Include headers string and unordered_map, new
variable target_function_types, new argument is_target_builtin
to new_function_type, function_type and function, new functions
(get_target_builtin_function, copy).
* libgccjit.cc: New function
(gcc_jit_context_get_target_builtin_function).
* libgccjit.h: New function
(gcc_jit_context_get_target_builtin_function).
* libgccjit.map: New functions
(gcc_jit_context_get_target_builtin_function).
gcc/testsuite:
PR jit/108762
* jit.dg/all-non-failing-tests.h: New test test-target-builtins.c.
* jit.dg/test-target-builtins.c: New test.
Andrew Pinski [Wed, 20 Nov 2024 03:49:38 +0000 (19:49 -0800)]
aarch64: Fix aarch64 after moving to C23
This fixes a few aarch64 specific testcases after the move to default to GNU C23.
For the SME testcases, GNU C23 changes `()` to mean `(void)` instead
of a non-prototype declaration; the non-prototype declaration merging was confusing
some of the time, so the updated way is the expected way even for that.
For pic-*.c `-Wno-old-style-definition` was added not to warn about old style definitions.
For pr113573.c, I added `-std=gnu17` since I was not sure if `(...)` with C23 would invoke
the same issue.
Andrew Pinski [Wed, 20 Nov 2024 07:45:20 +0000 (23:45 -0800)]
rtl-reader: Disable reuse_rtx support for generator building
reuse_rtx is not documented, nor is the format to use it.
So it should not be supported for the .md files.
This also fixes the problem if an invalid index is supplied for reuse_rtx,
instead of ICEing, put out a real error message. Note since this code
still uses atoi, an invalid index can still be used in some cases but that is
recorded as part of PR 44574.
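To illustrate why atoi lets some invalid indices through, here is a sketch of a stricter parser based on strtol. The name parse_index is hypothetical and this is not what the reader code does; it only shows the kind of check atoi cannot express.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse TEXT as a non-negative decimal index.  Unlike atoi, reject
   empty input, trailing garbage and out-of-range values.  */
bool
parse_index (const char *text, int *out)
{
  char *end;
  errno = 0;
  long value = strtol (text, &end, 10);
  if (end == text || *end != '\0')
    return false;                    /* no digits, or trailing junk */
  if (errno == ERANGE || value < 0 || value > INT_MAX)
    return false;                    /* overflow or negative index */
  *out = (int) value;
  return true;
}
```

For the input "12abc", atoi silently returns 12, while parse_index reports failure.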
Note I did a grep of the sources to make sure that this was only used for
the read rtl in the GCC rather than while reading in .md files.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* read-md.h (class rtx_reader): Don't include m_reuse_rtx_by_id
when GENERATOR_FILE is defined.
* read-rtl.cc (rtx_reader::read_rtx_code): Disable reuse_rtx
support when GENERATOR_FILE is defined.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Edwin Lu [Tue, 19 Nov 2024 20:55:15 +0000 (12:55 -0800)]
RISC-V: testsuite: restrict big endian test to non vector
RISC-V vector currently does not support big endian so the postcommit
was getting the "sorry, not implemented" error on vector targets. Restrict
the testcase to non-vector targets.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr117595.c: Restrict to non vector targets.
Richard Biener [Wed, 20 Nov 2024 15:47:08 +0000 (16:47 +0100)]
tree-optimization/117709 - bogus offset for gather load
When diverting to VMAT_GATHER_SCATTER we fail to zero *poffset
which was previously set if a load was classified as
VMAT_CONTIGUOUS_REVERSE. The following refactors
get_group_load_store_type a bit to avoid this but this all needs
some serious TLC.
PR tree-optimization/117709
* tree-vect-stmts.cc (get_group_load_store_type): Only
set *poffset when we end up with VMAT_CONTIGUOUS_DOWN
or VMAT_CONTIGUOUS_REVERSE.
Richard Biener [Wed, 20 Nov 2024 12:32:48 +0000 (13:32 +0100)]
tree-optimization/117698 - SLP vectorization and alignment
When SLP vectorizing we fail to mark the general alignment check
as irrelevant when using VMAT_STRIDED_SLP (the implementation checks
for itself) and when VMAT_INVARIANT the override isn't effective.
This results in extra FAILs on sparc which the following fixes.
PR tree-optimization/117698
* tree-vect-stmts.cc (get_group_load_store_type): Properly
disregard alignment for VMAT_STRIDED_SLP and VMAT_INVARIANT.
(vectorizable_load): Adjust guard for dumping whether we
vectorize an unaligned access.
(vectorizable_store): Likewise.
Antoni Boucher [Thu, 15 Feb 2024 22:03:22 +0000 (17:03 -0500)]
libgccjit: Add option to allow special characters in function names
gcc/jit/ChangeLog:
* docs/topics/contexts.rst: Add documentation for new option.
* jit-recording.cc (recording::context::get_str_option): New
method.
* jit-recording.h (get_str_option): New method.
* libgccjit.cc (gcc_jit_context_new_function): Allow special
characters in function names.
* libgccjit.h (enum gcc_jit_str_option): New option.
OpenMP: common C/C++ testcases for dispatch + adjust_args
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/declare-variant-2.c: Adjust dg-error directives.
* c-c++-common/gomp/adjust-args-1.c: New test.
* c-c++-common/gomp/adjust-args-2.c: New test.
* c-c++-common/gomp/declare-variant-dup-match-clause.c: New test.
* c-c++-common/gomp/dispatch-1.c: New test.
* c-c++-common/gomp/dispatch-2.c: New test.
* c-c++-common/gomp/dispatch-3.c: New test.
* c-c++-common/gomp/dispatch-4.c: New test.
* c-c++-common/gomp/dispatch-5.c: New test.
* c-c++-common/gomp/dispatch-6.c: New test.
* c-c++-common/gomp/dispatch-7.c: New test.
* c-c++-common/gomp/dispatch-8.c: New test.
* c-c++-common/gomp/dispatch-9.c: New test.
* c-c++-common/gomp/dispatch-10.c: New test.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/dispatch-1.c: New test.
* testsuite/libgomp.c-c++-common/dispatch-2.c: New test.
OpenMP: C++ front-end support for dispatch + adjust_args
This patch adds C++ support for the `dispatch` construct and the `adjust_args`
clause. It relies on the c-family bits comprised in the corresponding C front
end patch for pragmas and attributes.
Additional C/C++ common testcases are provided in a subsequent patch in the
series.
gcc/cp/ChangeLog:
* decl.cc (omp_declare_variant_finalize_one): Set adjust_args
need_device_ptr attribute.
* parser.cc (cp_parser_direct_declarator): Update call to
cp_parser_late_return_type_opt.
(cp_parser_late_return_type_opt): Add 'tree parms' parameter. Update
call to cp_parser_late_parsing_omp_declare_simd.
(cp_parser_omp_clause_name): Handle nocontext and novariants clauses.
(cp_parser_omp_clause_novariants): New function.
(cp_parser_omp_clause_nocontext): Likewise.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NOVARIANTS and
PRAGMA_OMP_CLAUSE_NOCONTEXT.
(cp_parser_omp_dispatch_body): New function, inspired from
cp_parser_assignment_expression and cp_parser_postfix_expression.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(cp_parser_omp_dispatch): New function.
(cp_finish_omp_declare_variant): Add parameter. Handle adjust_args
clause.
(cp_parser_late_parsing_omp_declare_simd): Add parameter. Update calls
to cp_finish_omp_declare_variant and cp_finish_omp_declare_variant.
(cp_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
(cp_parser_pragma): Likewise.
* semantics.cc (finish_omp_clauses): Handle OMP_CLAUSE_NOCONTEXT and
OMP_CLAUSE_NOVARIANTS.
* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_NOCONTEXT and
OMP_CLAUSE_NOVARIANTS.
(tsubst_stmt): Handle OMP_DISPATCH.
(tsubst_expr): Handle IFN_GOMP_DISPATCH.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/adjust-args-1.C: New test.
* g++.dg/gomp/adjust-args-2.C: New test.
* g++.dg/gomp/adjust-args-3.C: New test.
* g++.dg/gomp/dispatch-1.C: New test.
* g++.dg/gomp/dispatch-2.C: New test.
* g++.dg/gomp/dispatch-3.C: New test.
* g++.dg/gomp/dispatch-4.C: New test.
* g++.dg/gomp/dispatch-5.C: New test.
* g++.dg/gomp/dispatch-6.C: New test.
* g++.dg/gomp/dispatch-7.C: New test.
OpenMP: C front-end support for dispatch + adjust_args
This patch adds support to the C front-end to parse the `dispatch` construct and
the `adjust_args` clause. It also includes some common C/C++ bits for pragmas
and attributes.
Additional common C/C++ testcases are in a later patch in the series.
* c-parser.cc (c_parser_omp_dispatch): New function.
(c_parser_omp_clause_name): Handle nocontext and novariants clauses.
(c_parser_omp_clause_novariants): New function.
(c_parser_omp_clause_nocontext): Likewise.
(c_parser_omp_all_clauses): Handle nocontext and novariants clauses.
(c_parser_omp_dispatch_body): New function adapted from
c_parser_expr_no_commas.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(c_parser_omp_dispatch): New function.
(c_finish_omp_declare_variant): Parse adjust_args.
(c_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
* c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/adjust-args-1.c: New test.
* gcc.dg/gomp/dispatch-1.c: New test.
* gcc.dg/gomp/dispatch-2.c: New test.
* gcc.dg/gomp/dispatch-3.c: New test.
* gcc.dg/gomp/dispatch-4.c: New test.
* gcc.dg/gomp/dispatch-5.c: New test.
OpenMP: middle-end support for dispatch + adjust_args
This patch adds middle-end support for the `dispatch` construct and the
`adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and
`gimplify_call_expr` respectively. For `adjust_args`, this mostly consists in
emitting a call to `omp_get_mapped_ptr` for the adequate device.
For dispatch, the following steps are performed:
* Handle the device clause, if any: set the default-device ICV at the top of the
dispatch region and restore its previous value at the end.
* Handle novariants and nocontext clauses, if any. Evaluate compile-time
constants and select a variant, if possible. Otherwise, emit code to handle all
possible cases at run time.
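As a rough illustration of what the `dispatch` construct looks like at the source level, here is a minimal sketch. The function names (`sum_base`, `sum_variant`, `sum_with_dispatch`) are hypothetical and not taken from the patch or its testcases. It assumes a compiler with OpenMP 5.1 `dispatch` support and `-fopenmp`; without OpenMP the pragmas are ignored and the base function runs, so both functions are written to return the same value.

```cpp
#include <cstddef>

// Hypothetical variant, eligible only for calls made inside a dispatch
// construct (see the match clause below).
int sum_variant(const int *p, std::size_t n);

#pragma omp declare variant(sum_variant) match(construct={dispatch})
int sum_base(const int *p, std::size_t n);

// Base function: plain forward summation.
int sum_base(const int *p, std::size_t n)
{
  int s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += p[i];
  return s;
}

// Variant: same result as the base, computed in reverse order.  A real
// variant would typically be tuned for a particular device or context.
int sum_variant(const int *p, std::size_t n)
{
  int s = 0;
  for (std::size_t i = n; i > 0; --i)
    s += p[i - 1];
  return s;
}

int sum_with_dispatch(const int *p, std::size_t n)
{
  int r = 0;
  // With OpenMP enabled, this call may be resolved to sum_variant.
  // Without OpenMP the pragma is ignored and sum_base is called.
#pragma omp dispatch
  r = sum_base(p, n);
  return r;
}
```

The `novariants` and `nocontext` clauses mentioned above would be added to the `dispatch` pragma; they are left out here to keep the sketch small.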
OpenMP: dispatch + adjust_args tree data structures and front-end interfaces
This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.
gcc/ChangeLog:
* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.
This patch adds support for FEAT_SME2p1. There are two sets of
new instructions: MOVAZ to read from ZA and zero the source data,
and new forms of ZERO. All of them require streaming mode.
MOVAZ can't reuse the existing UNSPEC_SME_READ* patterns because
of the write to ZA. I did wonder about trying to use a define_subst,
but it seemed a bit too awkward.
gcc/
* config/aarch64/aarch64-option-extensions.def (sme2p1): New extension.
* doc/invoke.texi: Document it.
* config/aarch64/aarch64.h (TARGET_STREAMING_SME2p1): New macro.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Conditionally define __ARM_FEATURE_SME2p1.
* config/aarch64/iterators.md (UNSPEC_SME_READZ, UNSPEC_SME_READZ_HOR)
(UNSPEC_SME_READZ_VER): New unspecs.
(optab, hv): Handle them.
(SME_READZ_HV): New int iterator.
* config/aarch64/aarch64-sme.md
(UNSPEC_SME_ZERO_SLICES): New unspec.
(@aarch64_sme_<SME_READZ_HV:optab><v_int_container><mode>)
(*aarch64_sme_<SME_READZ_HV:optab><v_int_container><mode>_plus)
(@aarch64_sme_<SME_READZ_HV:optab><VNx1TI_ONLY:mode><SVE_FULL:mode>)
(@aarch64_sme_<SME_READZ_HV:optab><SVE_FULLx24:mode><mode>)
(*aarch64_sme_<SME_READZ_HV:optab><SVE_FULLx24:mode><mode>_plus)
(@aarch64_sme_readz<mode>, *aarch64_sme_readz<mode>_plus)
(@aarch64_sme_zero_za_slices<mode>): New patterns.
(*aarch64_sme_zero_za_slices<mode>_plus): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.h
(inherent_za_slice): Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(inherent_za_slice_def, inherent_za_slice): New shape.
* config/aarch64/aarch64-sve-builtins-sme.h (svreadz_za)
(svreadz_hor_za, svreadz_ver_za): Declare.
* config/aarch64/aarch64-sve-builtins-sme.cc
(svread_za_slice_base): New class, split out from...
(svread_za_impl): ...here.
(svreadz_za_impl, svreadz_za_tile_impl): New type aliases.
(zero_slices_mode): New function.
(svzero_za_impl::expand): Handle the slice forms.
(svreadz_za, svreadz_hor_za, svreadz_ver_za): New functions.
* config/aarch64/aarch64-sve-builtins-sme.def: Add the SME2p1
instructions.
This patch adds support for the SME_F16F16 extension. The extension
adds two new instructions to convert from a single vector of f16s
to two vectors of f32s. It also adds f16 variants of existing SME
ZA instructions.
gcc/
* config/aarch64/aarch64-option-extensions.def
(sme-f16f16): New extension.
* doc/invoke.texi: Document it. Also document that sme-i16i64 and
sme-f64f64 enable SME.
* config/aarch64/aarch64.h (TARGET_STREAMING_SME_F16F16): New macro.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Conditionally define __ARM_FEATURE_SME_F16F16.
* config/aarch64/aarch64-sve-builtins-sve2.def (svcvt, svcvtl): Add
new SME_F16F16 intrinsics.
* config/aarch64/aarch64-sve-builtins-sme.def: Add SME_F16F16 forms
of existing intrinsics.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_h_float)
(TYPES_cvt_f32_f16, TYPES_za_h_float): New type macros.
* config/aarch64/aarch64-sve-builtins-base.cc
(svcvt_impl::expand): Add sext_optab as another possibility.
* config/aarch64/aarch64-sve-builtins-sve2.h (svcvtl): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svcvtl_impl): New class.
(svcvtl): New function.
* config/aarch64/iterators.md (VNx8SF_ONLY): New mode iterator.
(SME_ZA_SDFx24): Replace with...
(SME_ZA_HSDFx24): ...this.
(SME_MOP_SDF): Replace with...
(SME_MOP_HSDF): ...this.
(SME_BINARY_SLICE_SDF): Replace with...
(SME_BINARY_SLICE_HSDF): ...this.
* config/aarch64/aarch64-sve2.md (extendvnx8hfvnx8sf2)
(@aarch64_sve_cvtl<mode>): New patterns.
* config/aarch64/aarch64-sme.md
(@aarch64_sme_<SME_BINARY_SLICE_SDF:optab><mode>): Extend to...
(@aarch64_sme_<SME_BINARY_SLICE_HSDF:optab><mode>): ...this.
(*aarch64_sme_<SME_BINARY_SLICE_SDF:optab><mode>_plus): Extend to...
(*aarch64_sme_<SME_BINARY_SLICE_HSDF:optab><mode>_plus): ...this.
(@aarch64_sme_<SME_FP_TERNARY_SLICE:optab><mode><mode>): Extend to
HF modes.
(*aarch64_sme_<SME_FP_TERNARY_SLICE:optab><mode><mode>_plus)
(@aarch64_sme_single_<SME_FP_TERNARY_SLICE:optab><mode><mode>)
(*aarch64_sme_single_<SME_FP_TERNARY_SLICE:optab><mode><mode>_plus)
(@aarch64_sme_lane_<SME_FP_TERNARY_SLICE:optab><mode><mode>)
(*aarch64_sme_lane_<SME_FP_TERNARY_SLICE:optab><mode><mode>)
(@aarch64_sme_<SME_FP_MOP:optab><mode><mode>): Likewise.
gcc/testsuite/
* lib/target-supports.exp: Test the assembler for sve-f16f16 support.
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Add tests for
__ARM_FEATURE_SME_F16F16. Also extend the existing SME tests.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
(TEST_X2_WIDE): New macro
* gcc.target/aarch64/sme2/acle-asm/add_za16_f16_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/add_za16_f16_vg1x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/cvt_f32_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/cvtl_f32_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_f16_vg1x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_f16_vg1x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/mla_za16_f16_vg1x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/mla_za16_f16_vg1x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/mls_lane_za16_f16_vg1x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/mls_lane_za16_f16_vg1x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/mls_za16_f16_vg1x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/mls_za16_f16_vg1x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/mopa_za16_f16.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/mops_za16_f16.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/sub_za16_f16_vg1x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/sub_za16_f16_vg1x4.c: Likewise.
This patch adds support for the SVE_B16B16 extension, which provides
non-widening BF16 versions of existing instructions.
Mostly it's just a simple extension of iterators. The main
complications are:
(1) The new instructions have no immediate forms. This is easy to
handle for the cond_* patterns (the ones that have an explicit
else value) since those are already divided into register and
non-register versions. All we need to do is tighten the predicates.
However, the @aarch64_pred_<optab><mode> patterns handle the
immediates directly. Rather than complicate them further,
it seemed best to add a single @aarch64_pred_<optab><mode> for
all BF16 arithmetic.
(2) There is no BFSUBR, so the usual method of handling reversed
operands breaks down. The patch deals with this using some
new attributes that together disable the "BFSUBR" alternative.
(3) Similarly, there are no BFMAD or BFMSB instructions, so we need
to disable those forms in the BFMLA and BFMLS patterns.
The patch includes support for generic bf16 vectors too.
It would be possible to use these instructions for scalars, as with
the recent FLOGB patch, but that's left as future work.
gcc/
* config/aarch64/aarch64-option-extensions.def
(sve-b16b16): New extension.
* doc/invoke.texi: Document it.
* config/aarch64/aarch64.h (TARGET_SME_B16B16, TARGET_SVE2_OR_SME2)
(TARGET_SSVE_B16B16): New macros.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Conditionally define __ARM_FEATURE_SVE_B16B16
* config/aarch64/aarch64-sve-builtins-sve2.def: Add AARCH64_FL_SVE2
to the SVE2p1 requirements. Add SVE_B16B16 forms of existing
intrinsics.
* config/aarch64/aarch64-sve-builtins.cc (type_suffixes): Treat
bfloat as a floating-point type.
(TYPES_h_bfloat): New macro.
* config/aarch64/aarch64.md (is_bf16, is_rev, supports_bf16_rev)
(mode_enabled): New attributes.
(enabled): Test mode_enabled.
* config/aarch64/iterators.md (SVE_FULL_F_BF): New mode iterator.
(SVE_CLAMP_F): Likewise.
(SVE_Fx24): Add BF16 modes when TARGET_SSVE_B16B16.
(sve_lane_con): Handle BF16 modes.
(b): Handle SF and DF modes.
(is_bf16): New mode attribute.
(supports_bf16, supports_bf16_rev): New int attributes.
* config/aarch64/predicates.md
(aarch64_sve_float_maxmin_immediate): Reject BF16 modes.
* config/aarch64/aarch64-sve.md
(*post_ra_<sve_fp_op><mode>3): Add BF16 support, and likewise
for the associated define_split.
(<optab:SVE_COND_FP_BINARY_OPTAB><mode>): Add BF16 support.
(@cond_<optab:SVE_COND_FP_BINARY><mode>): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_2_relaxed): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_2_strict): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_3_relaxed): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_3_strict): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_any_relaxed): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_any_strict): Likewise.
(@aarch64_mul_lane_<mode>): Likewise.
(<optab:SVE_COND_FP_TERNARY><mode>): Likewise.
(@aarch64_pred_<optab:SVE_COND_FP_TERNARY><mode>): Likewise.
(@cond_<optab:SVE_COND_FP_TERNARY><mode>): Likewise.
(*cond_<optab:SVE_COND_FP_TERNARY><mode>_4_relaxed): Likewise.
(*cond_<optab:SVE_COND_FP_TERNARY><mode>_4_strict): Likewise.
(*cond_<optab:SVE_COND_FP_TERNARY><mode>_any_relaxed): Likewise.
(*cond_<optab:SVE_COND_FP_TERNARY><mode>_any_strict): Likewise.
(@aarch64_<optab:SVE_FP_TERNARY_LANE>_lane_<mode>): Likewise.
* config/aarch64/aarch64-sve2.md
(@aarch64_pred_<optab:SVE_COND_FP_BINARY><mode>): Define BF16 version.
(@aarch64_sve_fclamp<mode>): Add BF16 support.
(*aarch64_sve_fclamp<mode>_x): Likewise.
(*aarch64_sve_<maxmin_uns_op><SVE_Fx24:mode>): Likewise.
(*aarch64_sve_single_<maxmin_uns_op><SVE_Fx24:mode>): Likewise.
* config/aarch64/aarch64.cc (aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Return false for BF16 modes.
This patch just renames the iterators SME_READ and SME_WRITE to
SME_READ_HV and SME_WRITE_HV, to distinguish them from other forms
of ZA read and write.
aarch64: Refactor SVE predicated-to-unpredicated splits
There are separate patterns for predicated FADD, FSUB, and FMUL.
Previously they each had their own in-built split to convert the
instruction to unpredicated form where appropriate. However, it's
more convenient for later patches if we use a single separate split
instead.
gcc/
* config/aarch64/iterators.md (SVE_COND_FP): New code attribute.
* config/aarch64/aarch64-sve.md: Use a single define_split to
handle the conversion of predicated FADD, FSUB, and FMUL into
unpredicated forms.
Many of the SME ZA intrinsics have two type suffixes: one for ZA
and one for the vectors. The ZA suffix only conveys an element
size, while the vector suffix conveys both an element type and
an element size. Internally, the ZA suffix maps to an integer mode;
e.g. za32 maps to VNx4SI.
For SME2, it was relatively convenient to use the modes associated
with both suffixes directly. For example, the (non-widening) FMLA
intrinsics used SME_ZA_SDF_I to iterate over the possible ZA modes,
used SME_ZA_SDFx24 to iterate over the possible vector tuple modes,
and used a C++ condition to make sure that the element sizes agree.
However, for later patches it's more convenient to rely only on
the vector mode in cases where the ZA and vector element sizes
are the same. This means splitting the widening MOPA/S patterns
from the non-widening ones, but otherwise it's not a big change.
gcc/
* config/aarch64/iterators.md (SME_ZA_SDF_I): Delete.
(SME_MOP_HSDF): Replace with...
(SME_MOP_SDF): ...this.
* config/aarch64/aarch64-sme.md: Change the non-widening FMLA and
FMLS patterns so that both mode parameters are the same, rather than
using both SME_ZA_SDF_I and SME_ZA_SDFx24 and checking that their
element sizes are the same. Split the FMOPA and FMOPS patterns
into separate non-widening and widening forms, then update the
non-widening forms in a similar way to FMLA and FMLS.
* config/aarch64/aarch64-sve-builtins-functions.h
(sme_2mode_function_t::expand): If the two type suffixes have the same
element size, use the vector tuple mode for both mode parameters.
Mikael Morin [Wed, 20 Nov 2024 12:59:51 +0000 (13:59 +0100)]
fortran: Evaluate once BACK argument of MINLOC/MAXLOC with DIM [PR90608]
Evaluate the BACK argument of MINLOC/MAXLOC once before the
scalarization loops in the case where the DIM argument is present.
This is a follow-up to r15-1994-ga55d24b3cf7f4d07492bb8e6fcee557175b47ea3
which added knowledge of BACK to the scalarizer, to r15-2701-ga10436a8404ad2f0cc5aa4d6a0cc850abe5ef49e which removed it to
handle it out of scalarization instead, and to more immediate previous
patches that added inlining support for MINLOC/MAXLOC with DIM. The
inlining support for MINLOC/MAXLOC with DIM introduced nested loops, which
left the evaluation of BACK (removed from the scalarizer knowledge by the
aforementioned commit) wrapped in a loop, so possibly executed more than
once. This change adds BACK to the scalarization chain if MINLOC/MAXLOC
will use nested loops, so that it is evaluated by the scalarizer only once
before the outermost loop in that case.
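The effect of evaluating BACK once can be sketched outside Fortran. The following C++ sketch is only an analogy, not the generated code: `minloc_along_rows` and its parameters are hypothetical, and the `back` callable stands in for a BACK argument expression that could have side effects or be costly.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Reduce each row of a row-major rows x cols matrix to the 1-based position
// of its minimum, or 0 when the row has no elements (cols == 0).
std::vector<std::size_t>
minloc_along_rows(const std::vector<int> &a, std::size_t rows,
                  std::size_t cols, const std::function<bool()> &back)
{
  // Evaluate BACK exactly once, before the outermost loop.
  const bool take_last = back();

  std::vector<std::size_t> result(rows, 0);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      {
        const std::size_t best = result[r];
        const int v = a[r * cols + c];
        if (best == 0)
          result[r] = c + 1;          // First element seen in this row.
        else
          {
            const int m = a[r * cols + best - 1];
            // Strictly smaller always wins; ties win only with BACK.
            if (v < m || (take_last && v == m))
              result[r] = c + 1;
          }
      }
  return result;
}
```

Calling `back()` inside the inner loop would give the same positions here, but would run the argument expression once per element instead of once per call.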
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-intrinsic.cc
(walk_inline_intrinsic_minmaxloc): Add a scalar element for BACK as
first item of the chain if BACK is present and there will be nested
loops.
(gfc_conv_intrinsic_minmaxloc): Evaluate BACK using an inherited
scalarization chain if there is a nested loop.
gcc/testsuite/ChangeLog:
* gfortran.dg/maxloc_8.f90: New test.
* gfortran.dg/minloc_9.f90: New test.
Uros Bizjak [Wed, 20 Nov 2024 11:57:25 +0000 (12:57 +0100)]
i386: Remove workaround for Solaris ld 64-bit TLS IE limitation
As detailed in PR target/43309, the Solaris linker initially took the
64-bit x86 TLS IE code sequence literally, assuming that the spec only
allowed %rax as target register.
A workaround has been in place for more than a decade, but is no longer
necessary. The bug had already been fixed for Solaris 11.1, while trunk
requires Solaris 11.4.
Uros pointed this out and suggested the attached patch.
Bootstrapped without regressions on i386-pc-solaris2.11.
Pan Li [Wed, 20 Nov 2024 07:16:22 +0000 (15:16 +0800)]
RISC-V: Refine the rtl dump expand check for vector SAT_ADD
This patch first removes the unnecessary options from the vector
SAT_ADD testcases. The different optimization options, such as O2
and O3, are then passed to the test files for the rtl expand dump
check. Where the expected dump count differs between optimization
options, the target no-opts and/or any-opts selectors are used in
the dg-final check.
The test suites below pass with this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; will commit it
directly if there are no comments in the next 48H.
Pan Li [Wed, 20 Nov 2024 05:32:47 +0000 (13:32 +0800)]
RISC-V: Introduce riscv/rvv/autovec/sat folder to rvv.exp testsuite
After moving the vector SAT_ADD testcases into an isolated folder,
riscv/rvv/autovec/sat, we would like to add the folder as one of
the test items of the rvv.exp testsuite.
The test suites below pass with this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add the vector sat folder to
the rvv.exp testsuite.
Pan Li [Wed, 20 Nov 2024 05:22:40 +0000 (13:22 +0800)]
RISC-V: Rearrange the test files for vector SAT_ADD [NFC]
The test files of scalar SAT_TRUNC only have numbers as the suffix.
Rearrange the file name to -{form number}-{target-type}. For example,
test form 3 for uint32_t SAT_TRUNC will have -3-u32.c for asm check and
-run-3-u32.c for the run test.
Meanwhile, all related test files moved to riscv/rvv/autovec/sat/.
The test suites below pass with this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; will commit it
directly if there are no comments in the next 48H.
Richard Biener [Fri, 15 Nov 2024 10:56:14 +0000 (11:56 +0100)]
tree-optimization/117574 - bogus niter lt-to-ne
When trying to change an IV from IV0 < IV1 to IV0' != IV1' we apply
fancy adjustments to the may_be_zero condition we compute rather
than using the obvious IV0->base >= IV1->base expression (to be
able to use > instead of >=?). This doesn't seem to go well.
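To see why the guard matters, here is a minimal sketch assuming a unit step and a loop-invariant bound; the real number_of_iterations_lt_to_ne handles more general IVs. The function names are hypothetical. Once the exit test is rewritten from `<` to `!=`, the loop would no longer stop for `base0 >= base1`, so the `may_be_zero` condition has to cover exactly those cases.

```cpp
#include <cstdint>

// Iterations of: for (i = base0; i < base1; ++i)
std::uint64_t count_lt(std::int32_t base0, std::int32_t base1)
{
  std::uint64_t n = 0;
  for (std::int64_t i = base0; i < base1; ++i)
    ++n;
  return n;
}

// The same count with the exit test rewritten to '!='.  The rewrite is only
// valid when the loop runs at least once, so it is guarded by the obvious
// may_be_zero condition base0 >= base1.
std::uint64_t count_ne(std::int32_t base0, std::int32_t base1)
{
  const bool may_be_zero = base0 >= base1;
  if (may_be_zero)
    return 0;

  std::uint64_t n = 0;
  for (std::int64_t i = base0; i != base1; ++i)
    ++n;
  return n;
}
```

Without the guard, `count_ne(7, 3)` would run until the counter wrapped around instead of returning zero.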
PR tree-optimization/117574
* tree-ssa-loop-niter.cc (number_of_iterations_lt_to_ne):
Use the obvious may_be_zero condition.
Expand can implement NEG and ABS of scalar floating-point modes
by using logic ops to manipulate the sign bit. This patch extends
that approach to vectors, since it fits relatively easily into the
same structure.
The motivating use case was to inline bf16 NEG and ABS operations
for AArch64. The patch includes tests for that.
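The sign-bit approach can be sketched in portable C++ as follows. This illustrates the technique only, not the expander code: it assumes `float` is IEEE 754 binary32, and the helper names are hypothetical. The "vector" form is just the same mask applied to every lane.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mask selecting the sign bit of an IEEE 754 binary32 value (assumption).
constexpr std::uint32_t kSignBit = 0x80000000u;

// NEG: flip the sign bit with XOR.
float neg_via_bits(float x)
{
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits ^= kSignBit;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// ABS: clear the sign bit with AND.
float abs_via_bits(float x)
{
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= ~kSignBit;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Vector form: the same integer operation applied to each lane.
void neg_lanes(float *v, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    v[i] = neg_via_bits(v[i]);
}
```

Because only integer logic is involved, the same idea carries over to element types without native floating-point NEG or ABS instructions, which is the bf16 case the message mentions.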
get_absneg_bit_mode required a new opt_mode constructor, so that
opt_mode<T> can be constructed from opt_mode<U> if T is no less
general than U.
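The shape of such a converting constructor can be sketched with a stand-alone optional-like wrapper. Everything below is hypothetical (`opt`, `any_mode`, `scalar_mode`) and is not the machmode.h code; it only shows how a constructor can accept `opt<U>` when `U` converts to `T` and reject the opposite direction.

```cpp
#include <type_traits>

// Hypothetical stand-ins: every scalar_mode is also an any_mode.
struct any_mode { int id; };
struct scalar_mode : any_mode {};

template <typename T>
class opt
{
public:
  opt() : m_valid(false), m_value() {}
  opt(const T &v) : m_valid(true), m_value(v) {}

  // Converting constructor: participates only when U converts to T,
  // i.e. when T is no less general than U.
  template <typename U,
            typename = typename std::enable_if<
              std::is_convertible<U, T>::value>::type>
  opt(const opt<U> &other) : m_valid(other.exists()), m_value()
  {
    if (m_valid)
      m_value = other.require();
  }

  bool exists() const { return m_valid; }
  const T &require() const { return m_value; }

private:
  bool m_valid;
  T m_value;
};
```

With this constraint, `opt<any_mode>` can be built from `opt<scalar_mode>`, while the narrowing direction fails to compile.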
gcc/
* machmode.h (opt_mode::opt_mode): New overload.
* optabs-query.h (get_absneg_bit_mode): Declare.
* optabs-query.cc (get_absneg_bit_mode): New function, split
out from expand_absneg_bit.
(can_open_code_p): Use get_absneg_bit_mode.
* optabs.cc (expand_absneg_bit): Likewise. Take an outer and inner
mode, rather than just one. Handle vector modes.
(expand_unop, expand_abs_nojump): Update calls accordingly.
Handle vector modes.
This patch goes through the tree-vect-* code and mechanically replaces
all tests of optab_handler against CODE_FOR_nothing with calls to the
new helper functions.
Mikael Morin [Sat, 18 Nov 2023 19:54:20 +0000 (20:54 +0100)]
fortran: Check for empty MINLOC/MAXLOC ARRAY along DIM only
In the function generating inline code to implement MINLOC and MAXLOC, only
check for ARRAY size along DIM if DIM is present.
The check for ARRAY emptiness had been checking the size of the full array,
which is correct for MINLOC and MAXLOC without DIM. But if DIM is
present, the reduction is along DIM only so the check for emptiness
should consider that dimension only as well.
This sounds like a correctness issue, but fortunately the cases where it
makes a difference are cases where ARRAY is empty, so even if the value
calculated for MINLOC or MAXLOC is wrong, it's wrapped in a zero iteration
loop, and the wrong values are not actually used. In the end this just
avoids unnecessary calculations.
A previous version of this patch regressed on non-constant DIM with rank 1
ARRAY. The new testcase checks that that case is supported.
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Only get the size
along DIM instead of the full size if DIM is present.
MayShao-oc [Thu, 7 Nov 2024 02:57:02 +0000 (10:57 +0800)]
Add microarchitecture tunable for pass_align_tight_loops [PR117438]
Hi Hongtao:
Add m_CASCADELAKE and m_SKYLAKE_AVX512.
Place X86_TUNE_ALIGN_TIGHT_LOOPS in the appropriate section.
Bootstrapped X86_64.
Ok for trunk?
BR
Mayshao
gcc/ChangeLog:
PR target/117438
* config/i386/i386-features.cc (TARGET_ALIGN_TIGHT_LOOPS):
default true in all processors except for m_ZHAOXIN, m_CASCADELAKE, and
m_SKYLAKE_AVX512.
* config/i386/i386.h (TARGET_ALIGN_TIGHT_LOOPS): New Macro.
* config/i386/x86-tune.def (X86_TUNE_ALIGN_TIGHT_LOOPS):
New tune
testsuite: arm: Only check for absence of literal pools in no-literal-pool-m0.c
With the changes in r15-1579-g792f97b44ff, the constants have been
updated.
This patch drops the fragile check on the constants and instead only
checks that there is no literal pool generated.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pure-code/no-literal-pool-m0.c: Only check for
absence of literal pools.
Jonathan Wakely [Tue, 19 Nov 2024 23:59:00 +0000 (23:59 +0000)]
libstdc++: Use const_iterator in std::set::find<K> return type
François noticed that the "wrong" type is used in the return type for a
std::set member function template.
The iterator for our std::set is the same type as const_iterator,
so this doesn't actually matter. But it's clearer if the return type
matches the type used in the function body.
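A small sketch of the heterogeneous `find` in question, assuming a transparent comparator (`std::less<>`) and C++14 or later. The names `int_set`, `same_iterator_type`, and `contains_via_const_find` are illustrative only. The constant records whether the two iterator types coincide on the implementation in use; the message states that they do for libstdc++.

```cpp
#include <functional>
#include <set>
#include <type_traits>

// A set with a transparent comparator, so find<K> accepts non-int keys.
using int_set = std::set<int, std::less<>>;

// True when iterator and const_iterator are the same type on this
// implementation (stated above to be the case for libstdc++).
constexpr bool same_iterator_type =
  std::is_same<int_set::iterator, int_set::const_iterator>::value;

bool contains_via_const_find(const int_set &s, long key)
{
  // On a const set, the heterogeneous find returns a const_iterator.
  int_set::const_iterator it = s.find(key);
  return it != s.end();
}
```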
libstdc++-v3/ChangeLog:
* include/bits/stl_set.h (set::find): Use const_iterator in
return type, not iterator.
The __is_key_type specialization that matches a pair<key_type, T>
argument is intended for std::unordered_map, not for
std::unordered_set<std::pair<K,T>>.
This uses a pair<const Args&...> as the template argument for
__is_key_type, so that it won't match a set's key_type.
libstdc++-v3/ChangeLog:
PR libstdc++/117686
* include/bits/hashtable.h (_Hashtable::_M_emplace_uniq):
Adjust usage of __is_key_type to avoid false positive.
* testsuite/23_containers/unordered_set/insert/117686.cc:
New test.
Pan Li [Tue, 19 Nov 2024 07:27:39 +0000 (15:27 +0800)]
RISC-V: Refine the rtl expand check for strided ld/st
This patch removes the unnecessary option from the strided
load/store testcases. After fixing the option in rvv.exp,
both O2 and O3 are passed to the test files for the rtl expand
dump check, but O2 has the IFN 2 times while O3 has it 4 times with
-fvectorize specified.
Thus, add xfail O2 for the IFN 4 times check, as well as xfail O3 for
the 2 times check.
The test suites below pass with this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; will commit it
directly if there are no comments in the next 48H.
Pan Li [Tue, 19 Nov 2024 07:18:53 +0000 (15:18 +0800)]
RISC-V: Fix incorrect optimization options passing to strided ld/st test
The testcases of vector strided load/store are designed to pick up
different sorts of optimization options, but these options are
actually ignored according to the execution log in gcc.log. This patch
corrects that, and then you will see build options
similar to the ones below in gcc.log.
Jeff Law [Wed, 20 Nov 2024 02:24:41 +0000 (19:24 -0700)]
[RISC-V][PR target/117649] Fix branch on masked values splitter
Andreas reported that GCC mis-compiled GAS for RISC-V. Thankfully he also
reduced it to a nice little testcase.
So the whole point of the pattern in question is to "reduce" the constants by
right shifting away common unnecessary bits in RTL expressions like this:
When applicable, the reduced constants in operands 2/3 fit into a simm12 and
thus do not need multi-instruction synthesis. Note that we have to also shift
operand 1.
That shift should have been an arithmetic shift, but was incorrectly coded as a
logical shift.
Fixed with the obvious change on the right shift opcode.
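The difference between the two shifts is easy to see on a negative value. The sketch below is not the RISC-V splitter; the helper names are hypothetical and the 12-bit shift count is chosen only for illustration. An arithmetic shift keeps the sign and yields a small value that fits a signed 12-bit immediate, while a logical shift turns the same bit pattern into a large positive number.

```cpp
#include <cstdint>

// Logical right shift: zero-fills from the top, discarding the sign.
std::int64_t shift_logical(std::int64_t v, unsigned n)
{
  return static_cast<std::int64_t>(static_cast<std::uint64_t>(v) >> n);
}

// Arithmetic right shift: replicates the sign bit.  GCC implements '>>' on
// signed types this way; C++20 guarantees it.
std::int64_t shift_arithmetic(std::int64_t v, unsigned n)
{
  return v >> n;
}

// Range of a signed 12-bit immediate (simm12).
bool fits_simm12(std::int64_t v)
{
  return v >= -2048 && v <= 2047;
}
```

For `v = -4096` and a shift of 12, the arithmetic result is `-1`, which fits a simm12, whereas the logical result is `2^52 - 1`, which does not and no longer represents the original value.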
Expecting to push to the trunk once the pre-commit tester renders its verdict.
I've already tested this in my tester for rv32 and rv64.
PR target/117649
gcc/
* config/riscv/riscv.md (branch on masked/shifted operands): Use
arithmetic rather than logical shift for operand 1.
gcc/testsuite
* gcc.target/riscv/branch-1.c: Update expected output.
* gcc.target/riscv/pr117649.c: New test.
Joseph Myers [Wed, 20 Nov 2024 01:37:30 +0000 (01:37 +0000)]
c: Fix ICE for integer constexpr initializers of wrong type [PR115515]
Bug 115515 (plus its duplicate 117139) reports an ICE with constexpr
initializer for an integer type variable that is not of integer type.
Fix this by not calling int_fits_type_p unless the previous check for
an integer constant expression passes.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/115515
gcc/c/
* c-typeck.cc (check_constexpr_init): Do not call int_fits_type_p
for arguments that are not integer constant expressions.
gcc/testsuite/
* gcc.dg/c23-constexpr-10.c, gcc.dg/gnu23-constexpr-2.c: New
tests.
Pan Li [Sun, 17 Nov 2024 11:21:26 +0000 (19:21 +0800)]
RISC-V: Remove unnecessary option for all other scalar SAT_* testcase
After we create an isolated folder to hold all SAT scalar tests,
we have full control of which optimization options are passed to
the testcases. Thus, it is better to remove the unnecessary
workaround for the flto option, as well as the -O3 option for
each case. The riscv.exp will pass various optimization
options to each case.
The test suites below pass with this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; will commit it
directly if there are no comments in the next 48H.
Mikael Morin [Thu, 8 Aug 2024 10:23:16 +0000 (12:23 +0200)]
fortran: Inline non-character MINLOC/MAXLOC with DIM [PR90608]
Enable generation of inline MINLOC/MAXLOC code in the cases where DIM is a
constant, and either ARRAY is of REAL type or MASK is an array. Those cases
are the remaining bits to fully support inlining of non-CHARACTER
MINLOC/MAXLOC with constant DIM. They are treated together because they
generate similar code, the NANs for REAL types being handled a bit like a
second level of masking. These are the cases for which we generate two
loops.
This change affects the code generating the second loop, which was
previously reachable only when ARRAY had rank 1.
The main change in gfc_conv_intrinsic_minmaxloc is the replacement of the
locally initialized scalarization loop with the one provided and previously
initialized by the scalarizer. The same goes for the locally initialized MASK
scalarizer chain.
As this is enabling the code generating a second loop in a context of
reduction and nested loops, care is taken not to advance the parent
scalarization chain twice.
The scalarization chain element(s) for an array MASK are inserted in the
chain at a different place from that of a scalar MASK. This is done on
purpose to match the code consuming the chains which are in different places
for scalar and array MASK.
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Return TRUE
for MINLOC/MAXLOC with constant DIM and either REAL ARRAY or
non-scalar MASK.
(walk_inline_intrinsic_minmaxloc): Walk MASK and if it's an array
add the chain obtained before that of ARRAY.
(gfc_conv_intrinsic_minmaxloc): Use the nested loop if there is one.
To evaluate MASK (respectively ARRAY in the second loop), inherit
the scalarizer chain if in a nested loop, otherwise keep using the
chain obtained by walking MASK (respectively ARRAY). If there is a
nested loop, avoid advancing the parent scalarization chain a second
time in the second loop.
Joseph Myers [Tue, 19 Nov 2024 21:31:24 +0000 (21:31 +0000)]
c: Do not register nullptr_t built-in type [PR114869]
As reported in bug 114869, the C front end wrongly creates nullptr_t
as a built-in typedef; it should only be defined in <stddef.h>. While
the type node needs a name for debug info generation, it doesn't need
to be a valid identifier; use typeof (nullptr) instead, similar to how
the C++ front end uses decltype(nullptr) for this purpose.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/114869
gcc/c/
* c-decl.cc (c_init_decl_processing): Register nullptr_type_node
as typeof (nullptr) not nullptr_t.
gcc/testsuite/
* gcc.dg/c23-nullptr-5.c: Use typeof (nullptr) not nullptr_t.
* gcc.dg/c11-nullptr-2.c, gcc.dg/c11-nullptr-3.c,
gcc.dg/c23-nullptr-7.c: New tests.
Georg-Johann Lay [Tue, 19 Nov 2024 17:18:20 +0000 (18:18 +0100)]
AVR: target/54378 - Reconsider the default shift costs.
This patch calculates more accurate shift costs, but makes
the costs for larger offsets no more expensive than the costs
for an unrolled shift.
gcc/
PR target/54378
* config/avr/avr.cc (avr_default_shift_costs): New static function.
(avr_rtx_costs_1) [ASHIFT, LSHIFTRT, ASHIFTRT]: Use it
to determine the default shift costs for shifts with a
constant shift offset.
Mikael Morin [Tue, 19 Nov 2024 20:17:37 +0000 (21:17 +0100)]
fortran: Check MASK directly instead of its scalarization chain
Update the conditions used by the inline MINLOC/MAXLOC code generation
function to check directly the properties of MASK instead of the
variable holding its scalarization chain.
The inline implementation of MINLOC/MAXLOC in gfc_conv_intrinsic_minmaxloc
uses several conditions checking the presence of a scalarization chain for
MASK, which means that the argument is present and non-scalar. The next
patch will allow inlining MINLOC/MAXLOC with DIM and MASK, and in that
case the scalarization chain for MASK is initialized elsewhere, so the
variable usually holding it in the function is not used, and the conditions
won't work in that case.
This change updates the conditions to check directly the properties of
MASK so that they work even if the scalarization chain variable is not used.
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Use
conditionals based on the MASK expression rather than on its
scalarization chains.
Jakub Jelinek [Tue, 19 Nov 2024 19:36:00 +0000 (20:36 +0100)]
c-family: Fix ICE with __sync_*_and_* on _BitInt [PR117641]
Only __atomic_* builtins are meant to work on arbitrary _BitInt types
(if not supported in hw we emit a CAS loop which uses __atomic_load_*
in that case), the compatibility __sync_* builtins work only if there
is a corresponding normal integral type (for _BitInt on 32-bit ARM
we'll need to limit even that to no padding, because the padding bits
are well defined there and the hw or libatomic __sync_* APIs don't
guarantee that), IMHO people shouldn't mix very old APIs with very
new ones and I don't see a replacement for the __atomic_load_*.
For size > 16 that is how it already correctly behaves,
in the hunk shown in the patch it is immediately followed by
which returns -1 for the __atomic_* builtins (i.e. !orig_format),
which causes caller to use atomic_bitint_fetch_using_cas_loop,
and otherwise does diagnostic and return 0 (which causes caller
to punt). But for size == 16 if TImode isn't supported (i.e.
mostly 32-bit arches), we return (correctly) -1 if !orig_format,
so again force atomic_bitint_fetch_using_cas_loop on those arches
for e.g. _BitInt(115), but for orig_format the function returns
16 as if it could do 16 byte __sync_*_and_* (which it can't
because TImode isn't supported; for 16 byte it can only do
(perhaps using libatomic) normal compare and swap). So we need
to error and return 0, rather than return 16.
The following patch ensures that.
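As a rough model of the intended result for a fetch on BITINT_TYPE (a sketch only, not the code in c-common.cc; resolve_bitint_fetch_size and its parameters are hypothetical names, with size being the type size in bytes):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the return value described above.
   -1: caller should fall back to a CAS loop (__atomic_* only).
    0: an error was diagnosed, caller punts.
   otherwise: the size in bytes that can be handled directly.  */
int resolve_bitint_fetch_size(int size, bool orig_format,
                              bool timode_supported)
{
    assert(size > 0);
    /* Too wide for a normal integral type on this target.  */
    bool too_wide = size > 16 || (size == 16 && !timode_supported);
    if (!too_wide)
        return size;
    if (!orig_format)
        return -1;      /* __atomic_* builtin: use the CAS loop */
    return 0;           /* __sync_* builtin: diagnose and punt */
}
```

Before the patch, the size == 16 case without TImode returned 16 for orig_format; in this model it returns 0.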
2024-11-19 Jakub Jelinek <jakub@redhat.com>
PR c/117641
* c-common.cc (sync_resolve_size): For size == 16 fetch of
BITINT_TYPE if TImode isn't supported scalar mode diagnose
and return 0 if orig_format instead of returning 16.
Jakub Jelinek [Tue, 19 Nov 2024 19:34:36 +0000 (20:34 +0100)]
c: Fix up __builtin_stdc_rotate_{left,right} lowering [PR117456]
Apparently the middle-end/expansion can only handle {L,R}ROTATE_EXPR
on types with mode precision, or large/huge BITINT_TYPE.
So, the following patch uses the rotate exprs only in those cases
where it can be handled, and emits code with shifts/ior otherwise.
As types without mode precision including small/medium BITINT_TYPE
are unlikely to have power-of-two precision and TRUNC_MOD_EXPR is on many targets
quite expensive, I chose to expand e.g. __builtin_stdc_rotate_left (arg1,
arg2) as
((tem = arg1, count = arg2 % prec)
? ((tem << count) | (tem >> (prec - count))) : tem)
rather than
(((tem = arg1) << (count = arg2 % prec))
| (tem >> (-count % prec)))
(where the assignments are really save_exprs, so no UB), because
I think another TRUNC_MOD_EXPR would be more costly in most cases
when the shift count is non-constant (and when it is constant,
it folds to 2 shifts by constant and ior in either case).
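The chosen form can be tried out in plain C by modelling a prec-bit unsigned value in the low bits of a uint32_t. This is only a sketch under that assumption; rotl_prec is a hypothetical helper, not what the front end builds, and the explicit masking stands in for the truncation the real narrow type would do.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate the low PREC bits of X left by N, for 0 < PREC < 32.
   Hypothetical helper modelling a PREC-bit unsigned type.  */
uint32_t rotl_prec(uint32_t x, unsigned n, unsigned prec)
{
    assert(prec > 0 && prec < 32);
    uint32_t mask = (UINT32_C(1) << prec) - 1;
    uint32_t tem = x & mask;
    unsigned count = n % prec;      /* the single modulo operation */
    /* For count == 0 the right shift would be by prec, which is out
       of range for a prec-bit type, hence the conditional.  */
    return count
           ? ((tem << count) | (tem >> (prec - count))) & mask
           : tem;
}
```

For example, with prec 7, rotating 0x41 left by 1 moves the top bit to the bottom and yields 0x03, while a count of 7 leaves the value unchanged.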
2024-11-19 Jakub Jelinek <jakub@redhat.com>
PR c/117456
gcc/c/
* c-parser.cc (c_parser_postfix_expression): Use LROTATE_EXPR
or RROTATE_EXPR only if type_has_mode_precision_p or if arg1
has BITINT_TYPE with precision larger than MAX_FIXED_MODE_SIZE.
Otherwise build BIT_IOR_EXPR of LSHIFT_EXPR and RSHIFT_EXPR
and wrap it into a COND_EXPR depending on if arg2 is 0 or not.
* c-fold.cc (c_fully_fold_internal): Check for suppression of
-Wshift-count-overflow warning.
gcc/testsuite/
* gcc.dg/builtin-stdc-rotate-4.c: New test.