git.ipfire.org Git - thirdparty/gcc.git/log

Jason Merrill [Wed, 20 Nov 2024 09:43:30 +0000 (10:43 +0100)]

c++: modules and debug marker stmts

21_strings/basic_string/operations/contains/nonnull.cc was failing because
the module was built with debug markers and the testcase was built not
expecting debug markers, so we crashed in lower_stmt. Let's accommodate
this by discarding debug marker statements we don't want.

gcc/cp/ChangeLog:

* module.cc (trees_in::core_vals) [STATEMENT_LIST]: Skip
DEBUG_BEGIN_STMT if !MAY_HAVE_DEBUG_MARKER_STMTS.


Jason Merrill [Wed, 20 Nov 2024 12:51:10 +0000 (13:51 +0100)]

c++: modules and tsubst_friend_class

In 20_util/function_objects/mem_fn/constexpr.cc we start to instantiate
_Mem_fn_base's friend declaration of _Bind_check_arity before we've loaded
the namespace-scope declaration, so lookup_imported_hidden_friend doesn't
find it. But then we load the namespace-scope declaration in
lookup_template_class during substitution, and so when we get around to
pushing the result of substitution, they conflict. Fixed by calling
lazy_load_pendings in lookup_imported_hidden_friend.

gcc/cp/ChangeLog:

* name-lookup.cc (lookup_imported_hidden_friend): Call
lazy_load_pendings.


Georg-Johann Lay [Wed, 20 Nov 2024 11:25:18 +0000 (12:25 +0100)]

AVR: target/117726 - Better optimizations of ASHIFT:SI insns.

This patch improves the 4-byte ASHIFT insns.
1) It adds a "r,r,C15" alternative for improved long << 15.
2) It adds 3-operand alternatives (depending on options) and
   splits them after peephole2 / before avr-fuse-move into
   a 3-operand byte shift and a 2-operand residual bit shift.
For better control, it introduces the new option -msplit-bit-shift,
which is activated by default at -O2 and higher.  2) is also
performed with -Os, but not with -Oz.
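
As a rough illustration of item 2), a 32-bit left shift by a constant can be decomposed into a whole-byte shift followed by a residual shift of fewer than 8 bits. The C sketch below only models the arithmetic of that decomposition; the helper name `shl32_split` is hypothetical and not part of the patch, and it assumes a shift count below 32.

```c
#include <stdint.h>

/* Hypothetical model of the split: shift by whole bytes first
   (on AVR this amounts to moving registers), then by the
   remaining 0..7 bits.  Assumes n < 32.  */
uint32_t shl32_split(uint32_t x, unsigned n)
{
    unsigned byte_part = n & ~7u;   /* multiple of 8: 0, 8, 16 or 24 */
    unsigned bit_part  = n & 7u;    /* residual bit shift: 0..7 */
    uint32_t t = x << byte_part;    /* models the 3-operand byte shift */
    return t << bit_part;           /* models the 2-operand residual shift */
}
```

For example, a shift by 20 becomes a byte shift by 16 followed by a bit shift by 4.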

PR target/117726
gcc/
* config/avr/avr.opt (-msplit-bit-shift): Add new optimization option.
* common/config/avr/avr-common.cc (avr_option_optimization_table)
[OPT_LEVELS_2_PLUS]: Turn on -msplit-bit-shift.
* config/avr/avr.h (machine_function.n_avr_fuse_add_executed):
New bool component.
* config/avr/avr.md (attr "isa") <2op, 3op>: Add new values.
(attr "enabled"): Handle them.
(ashlsi3, *ashlsi3, *ashlsi3_const): Add "r,r,C15" alternative.
Add "r,0,C4l" and "r,r,C4l" alternatives (depending on 2op / 3op).
(define_split) [avr_split_bit_shift]: Add 2 new ashift:ALL4 splitters.
(define_peephole2) [ashift:ALL4]: Add (match_dup 3) so that the scratch
won't overlap with the output operand of the matched insn.
(*ashl<mode>3_const_split): Remove unused ashift:ALL4 splitter.
* config/avr/avr-passes.cc (emit_valid_insn)
(emit_valid_move_clobbercc): Move out of anonymous namespace.
(make_avr_pass_fuse_add) <gate>: Don't override.
<execute>: Set n_avr_fuse_add_executed according to
func->machine->n_avr_fuse_add_executed.
(pass_data avr_pass_data_split_after_peephole2): New object.
(avr_pass_split_after_peephole2): New rtl_opt_pass.
(avr_emit_shift): New static function.
(avr_shift_is_3op, avr_split_shift_p, avr_split_shift)
(make_avr_pass_split_after_peephole2): New functions.
* config/avr/avr-passes.def (avr_pass_split_after_peephole2):
Insert new pass after pass_peephole2.
* config/avr/avr-protos.h
(n_avr_fuse_add_executed, avr_shift_is_3op, avr_split_shift_p)
(avr_split_shift, avr_optimize_size_level)
(make_avr_pass_split_after_peephole2): New prototypes.
* config/avr/avr.cc (n_avr_fuse_add_executed): New global variable.
(avr_optimize_size_level): New function.
(avr_set_current_function): Set n_avr_fuse_add_executed
according to cfun->machine->n_avr_fuse_add_executed.
(ashlsi3_out) [case 15]: Output optimized code for this offset.
(avr_rtx_costs_1) [ASHIFT, SImode]: Adjust costs of offsets 15, 16.
* config/avr/constraints.md (C4a, C4l, C4r): New constraints.
* pass_manager.h (pass_manager): Adjust comments.


Georg-Johann Lay [Thu, 21 Nov 2024 16:52:26 +0000 (17:52 +0100)]

AVR: Fix a nit in avr-passes.cc::absint_t.dump().

gcc/
* config/avr/avr-passes.cc (absint_t::dump): Fix missing
newline in dump.


Jeff Law [Thu, 21 Nov 2024 15:24:10 +0000 (08:24 -0700)]

[RISC-V][PR target/116590] Avoid emitting multiple instructions from fmacc patterns

So much like my patch from last week, this removes alternatives that
create multiple instructions that we really should have never needed.

In this case it fixes one of two bugs in pr116590. In particular we
don't want vmvNr instructions for thead-vector. Those instructions were
emitted as part of those two instruction sequences.

I've tested this in my tester and assuming the pre-commit tester is
happy, I'll push it to the trunk.

PR target/116590
gcc
* config/riscv/vector.md (pred_mul_<optab><mode>_undef): Drop
unnecessary alternatives.
(pred_<madd_msub><mode>): Likewise.
(pred_<macc_msac><mode>): Likewise.
(pred_<madd_msub><mode>_scalar): Likewise.
(pred_<macc_msac><mode>_scalar): Likewise.
(pred_mul_neg_<optab><mode>_undef): Likewise.
(pred_<nmsub_nmadd><mode>): Likewise.
(pred_<nmsac_nmacc><mode>): Likewise.
(pred_<nmsub_nmadd><mode>_scalar): Likewise.
(pred_<nmsac_nmacc><mode>_scalar): Likewise.

gcc/testsuite
* gcc.target/riscv/pr116590.c: New test.


Pan Li [Mon, 11 Nov 2024 08:44:24 +0000 (16:44 +0800)]

Match: Refactor the unsigned SAT_ADD match pattern [NFC]

This patch refactors the unsigned SAT_ADD pattern by:
* Extract type check outside.
* Extract common sub pattern.
* Re-arrange the related match pattern forms together.
* Remove unnecessary helper pattern matches.
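
For context, unsigned saturating addition can be written in several equivalent ways in C, which is why a family of match patterns is needed at all. The sketch below shows one common branchless form; the function name `sat_add_u32` is hypothetical, and the log does not say which exact source forms match.pd handles.

```c
#include <stdint.h>

/* Unsigned saturating add: returns x + y, or UINT32_MAX if the
   addition would overflow.  */
uint32_t sat_add_u32(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;                 /* wraps modulo 2^32 */
    return sum | -(uint32_t)(sum < x);    /* all-ones mask if it wrapped */
}
```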

The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* match.pd: Refactor the unsigned SAT_ADD match patterns.

Signed-off-by: Pan Li <pan2.li@intel.com>


Tamar Christina [Thu, 21 Nov 2024 12:49:35 +0000 (12:49 +0000)]

middle-end: Pass along SLP node when costing vector loads/stores

Now that only SLP is supported, we pass the VMAT through the SLP node; however,
the majority of the costing calls inside vectorizable_load and
vectorizable_store do not pass the SLP node along.  Due to this the backend costing
never sees the VMAT for these cases anymore.

Additionally, the helper around record_stmt_cost would only pass the SLP node
along when both SLP and stmt_vinfo are passed.  However the SLP node doesn't contain
all the info available in the stmt_vinfo and we'd have to go through the
SLP_TREE_REPRESENTATIVE anyway.  As such I changed the function to just always
pass both along.  Unlike the VMAT changes, I don't believe there to be a
correctness issue here, but this minimizes the amount of churn in the backend
costing until vectorizer costing as a whole is revisited in GCC 16.

These changes re-enable the cost model on AArch64 and also correctly find the
VMATs on loads and stores fixing testcases such as sve_iters_low_2.c.

gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_get_data_access_cost): Pass NULL for SLP
node.
* tree-vect-stmts.cc (record_stmt_cost): Expose.
(vect_get_store_cost, vect_get_load_cost): Extend with SLP node.
(vectorizable_store, vectorizable_load): Pass SLP node to all costing.
* tree-vectorizer.h (record_stmt_cost): Always pass both SLP node and
stmt_vinfo to costing.
(vect_get_load_cost, vect_get_store_cost): Extend with SLP node.


Rainer Orth [Thu, 21 Nov 2024 12:41:19 +0000 (13:41 +0100)]

Use decl size in Solaris ASM_DECLARE_OBJECT_NAME [PR102296]

Solaris has modified versions of ASM_DECLARE_OBJECT_NAME on both i386
and sparc.  When

commit ce597aedd79e646c4a5517505088d380239cbfa5
Author: Ilya Enkovich <ilya.enkovich@intel.com>
Date:   Thu Aug 7 08:04:55 2014 +0000

    elfos.h (ASM_DECLARE_OBJECT_NAME): Use decl size instead of type size.

was applied, those were missed.  At the same time, the testcase was
restricted to Linux though there's nothing Linux-specific in there, so
the error remained undetected.

This patch fixes the definitions to match elfos.h and enables the test
on Solaris, too.

Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.

2024-11-19  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
PR target/102296
* gcc.target/i386/struct-size.c: Enable on *-*-solaris*.

gcc:
PR target/102296
* config/i386/sol2.h (ASM_DECLARE_OBJECT_NAME): Use decl size
instead of type size.
* config/sparc/sol2.h (ASM_DECLARE_OBJECT_NAME): Likewise.


Christoph Müllner [Tue, 12 Nov 2024 23:44:43 +0000 (00:44 +0100)]

forwprop: Try to blend two isomorphic VEC_PERM sequences

This extends forwprop by yet another VEC_PERM optimization:
It attempts to blend two isomorphic vector sequences by using the
redundancy in the lane utilization in these sequences.
This redundancy in lane utilization comes from the way specific
scalar statements end up vectorized: two VEC_PERMs on top, binary operations
on both of them, and a final VEC_PERM to create the result.
Here is an example of this sequence:

  v_in = {e0, e1, e2, e3}
  v_1 = VEC_PERM <v_in, v_in, {0, 2, 0, 2}>
  // v_1 = {e0, e2, e0, e2}
  v_2 = VEC_PERM <v_in, v_in, {1, 3, 1, 3}>
  // v_2 = {e1, e3, e1, e3}

  v_x = v_1 + v_2
  // v_x = {e0+e1, e2+e3, e0+e1, e2+e3}
  v_y = v_1 - v_2
  // v_y = {e0-e1, e2-e3, e0-e1, e2-e3}

  v_out = VEC_PERM <v_x, v_y, {0, 1, 6, 7}>
  // v_out = {e0+e1, e2+e3, e0-e1, e2-e3}

To remove the redundancy, lanes 2 and 3 can be freed, which allows
changing the last statement into:
  v_out' = VEC_PERM <v_x, v_y, {0, 1, 4, 5}>
  // v_out' = {e0+e1, e2+e3, e0-e1, e2-e3}
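
The equivalence of the two final selectors can be checked with a small scalar model. The C sketch below emulates VEC_PERM on 4-lane integer vectors and runs the example sequence with a caller-supplied final selector; the helper names `vec_perm4` and `satd_like_seq` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

#define NLANES 4

/* Model of VEC_PERM <a, b, sel> for 4-lane vectors: lane i of the
   result is element sel[i] of the concatenation of a and b.  */
void vec_perm4(const int *a, const int *b, const unsigned *sel, int *out)
{
    for (size_t i = 0; i < NLANES; i++)
        out[i] = sel[i] < NLANES ? a[sel[i]] : b[sel[i] - NLANES];
}

/* Runs the example sequence on v_in, using final_sel as the selector
   of the last VEC_PERM, and stores the result in v_out.  */
void satd_like_seq(const int *v_in, const unsigned *final_sel, int *v_out)
{
    static const unsigned even[NLANES] = {0, 2, 0, 2};
    static const unsigned odd[NLANES]  = {1, 3, 1, 3};
    int v_1[NLANES], v_2[NLANES], v_x[NLANES], v_y[NLANES];

    vec_perm4(v_in, v_in, even, v_1);   /* {e0, e2, e0, e2} */
    vec_perm4(v_in, v_in, odd, v_2);    /* {e1, e3, e1, e3} */
    for (size_t i = 0; i < NLANES; i++) {
        v_x[i] = v_1[i] + v_2[i];
        v_y[i] = v_1[i] - v_2[i];
    }
    vec_perm4(v_x, v_y, final_sel, v_out);
}
```

Calling `satd_like_seq` with the selector {0, 1, 6, 7} and with {0, 1, 4, 5} gives the same result, because lanes 2 and 3 of v_x and v_y merely repeat lanes 0 and 1.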

The cost of eliminating the redundancy in the lane utilization is that
lowering the VEC_PERM expression could get more expensive because of
tighter packing of the lanes.  Therefore this optimization is not done
on its own, but only if we identify two such sequences that can be
blended.

Once all candidate sequences have been identified, we try to blend them,
so that we can use the freed lanes for the second sequence.
On success we convert 2x (2x VEC_PERM + 2x BINOP + 1x VEC_PERM) to
2x VEC_PERM + 2x BINOP + 2x VEC_PERM, i.e. 4x VEC_PERM + 2x BINOP in total.

The implemented transformation reuses (rewrites) the statements
of the first sequence and the last VEC_PERM of the second sequence.
The remaining four statements of the second sequence are left untouched
and will be eliminated by DCE later.

This targets x264_pixel_satd_8x4, which calculates the sum of absolute
transformed differences (SATD) using Hadamard transformation.
We have seen 8% speedup on SPEC's x264 on a 5950X (x86-64) and 7%
speedup on an AArch64 machine.

Bootstrapped and reg-tested on x86-64 and AArch64 (all languages).

gcc/ChangeLog:

* tree-ssa-forwprop.cc (struct _vec_perm_simplify_seq): New data
structure to store analysis results of a vec perm simplify sequence.
(get_vect_selector_index_map): Helper to get an index map from the
provided vector permute selector.
(recognise_vec_perm_simplify_seq): Helper to recognise a
vec perm simplify sequence.
(narrow_vec_perm_simplify_seq): Helper to pack the lanes more
tightly.
(can_blend_vec_perm_simplify_seqs_p): Test if two vec perm
sequences can be blended.
(calc_perm_vec_perm_simplify_seqs): Helper to calculate the new
permutation indices.
(blend_vec_perm_simplify_seqs): Helper to blend two vec perm
simplify sequences.
(process_vec_perm_simplify_seq_list): Helper to process a list
of vec perm simplify sequences.
(append_vec_perm_simplify_seq_list): Helper to add a vec perm
simplify sequence to the list.
(pass_forwprop::execute): Integrate new functionality.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/satd-hadamard.c: New test.
* gcc.dg/tree-ssa/vector-10.c: New test.
* gcc.dg/tree-ssa/vector-8.c: New test.
* gcc.dg/tree-ssa/vector-9.c: New test.
* gcc.target/aarch64/sve/satd-hadamard.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>


H.J. Lu [Thu, 21 Nov 2024 11:08:03 +0000 (19:08 +0800)]

apx-ndd-tls-1[ab].c: Add -std=gnu17

Since GCC 15 defaults to -std=gnu23, add -std=gnu17 to apx-ndd-tls-1[ab].c
to avoid:

gcc.target/i386/apx-ndd-tls-1a.c: In function ‘k’:
gcc.target/i386/apx-ndd-tls-1a.c:29:7: error: too many arguments to function ‘l’
gcc.target/i386/apx-ndd-tls-1a.c:25:5: note: declared here

* gcc.target/i386/apx-ndd-tls-1a.c: -std=gnu17.
* gcc.target/i386/apx-ndd-tls-1b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>


Rainer Orth [Thu, 21 Nov 2024 10:46:36 +0000 (11:46 +0100)]

libgomp: testsuite: Fix libgomp.c/alloc-pinned-3.c etc. for C23 on non-Linux

Since the switch to a C23 default, three libgomp tests FAIL on Solaris:

FAIL: libgomp.c/alloc-pinned-3.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-3.c compilation failed to produce executable
FAIL: libgomp.c/alloc-pinned-4.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-4.c compilation failed to produce executable
FAIL: libgomp.c/alloc-pinned-6.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-6.c compilation failed to produce executable

Excess errors:
/vol/gcc/src/hg/master/local/libgomp/testsuite/libgomp.c/alloc-pinned-3.c:104:3: error: too many arguments to function 'set_pin_limit'

Fixed by adding the missing size argument to the stub functions.
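
The failure mode can be sketched as follows. In C23 an empty parameter list `()` means `(void)`, so a stub defined without parameters can no longer be called with an argument. The body below is a hypothetical stand-in rather than the actual test code: only the name `set_pin_limit` comes from the log above, and the `int` parameter type is an assumption.

```c
/* Pre-C23 style stub that now fails when called with an argument:

       void set_pin_limit () { }

   C23 treats "()" as "(void)", hence "too many arguments".  */

/* Fixed stub: accept and ignore the size argument.  The int
   parameter type is an assumption made for this sketch.  */
void set_pin_limit(int size)
{
    (void) size;   /* nothing to do in this stand-in */
}
```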

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.

2024-11-20 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

libgomp:
* testsuite/libgomp.c/alloc-pinned-3.c [!__linux__]
(set_pin_limit): Add size arg.
* testsuite/libgomp.c/alloc-pinned-4.c [!__linux__]
(set_pin_limit): Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c [!__linux__]
(set_pin_limit): Likewise.


Jakub Jelinek [Thu, 21 Nov 2024 09:17:03 +0000 (10:17 +0100)]

include: Add new post-DWARF 5 DW_LANG_* enumerators

DWARF changed the language code assignment to be on a web page, and
since DWARF 5 was published, 27 codes have already been assigned.
We have some of those already in the header, but most of them were missing,
including one added just yesterday (DW_LANG_C23).
Note, this is really post-DWARF 5 stuff rather than DWARF 6, because
DWARF 6 plans to switch from DW_AT_language to a DW_AT_language_{name,version}
pair, where we'll say DW_LNAME_C with version 202311 instead of this.

2024-11-21 Jakub Jelinek <jakub@redhat.com>

* dwarf2.h (enum dwarf_source_language): Add comment where
the post DWARF 5 additions start. Refresh list from
https://dwarfstd.org/languages.html.


Richard Biener [Thu, 21 Nov 2024 08:14:53 +0000 (09:14 +0100)]

tree-optimization/117720 - check alignment for VMAT_STRIDED_SLP

While vectorizable_store was already checking the alignment requirement
of the stores and falling back to elementwise accesses if it was not honored,
the vectorizable_load path wasn't doing this.  After the previous
change to disregard alignment checking for VMAT_STRIDED_SLP in
get_group_load_store_type, this now tripped on power.

PR tree-optimization/117720
* tree-vect-stmts.cc (vectorizable_load): For VMAT_STRIDED_SLP
verify the chosen load type is OK with regard to alignment.


Jakub Jelinek [Thu, 21 Nov 2024 08:40:37 +0000 (09:40 +0100)]

c-family, docs: Adjust descriptions/documentation for C23 publication

As C23 has already been published (https://www.iso.org/standard/82075.html),
we don't need to say that it is expected to be published etc.

Furthermore, standards.texi was still documenting that -std=gnu17
is the default.

2024-11-21  Jakub Jelinek  <jakub@redhat.com>

gcc/
* doc/invoke.texi (-std=c23): Adjust documentation for
publication of the ISO/IEC 9899:2024 standard.
* doc/standards.texi: Likewise.  Document -std=gnu17 and
-std=gnu23 options.  Mention that -std=gnu23 rather than
-std=gnu17 is now the default for C.
gcc/c-family/
* c.opt (std=c23, std=gnu23, std=iso9899:2024): Adjust description
for publication of the ISO/IEC 9899:2024 standard.


Jakub Jelinek [Thu, 21 Nov 2024 08:39:06 +0000 (09:39 +0100)]

phiopt: Improve spaceship_replacement for HONOR_NANS [PR117612]

The following patch optimizes spaceship followed by comparisons of the
spaceship value even for floating point spaceship when NaNs can appear.
operator<=> for this emits roughly
  signed char c;
  if (i == j) c = 0;
  else if (i < j) c = -1;
  else if (i > j) c = 1;
  else c = 2;
and I believe the
  /* The optimization may be unsafe due to NaNs.  */
comment just isn't true.
Sure, the i == j comparison doesn't raise exceptions on qNaNs, but if
one of the operands is qNaN, then i == j is false and i < j or i > j
is then executed and raises exceptions even on qNaNs.
And we can safely optimize say
c == -1 comparison after the above into i < j, that also raises
exceptions like before and handles NaNs the same way as the original.
The only unsafe transformation would be c == 0 or c != 0: turning it
into i == j or i != j wouldn't raise an exception, so I'm not doing that
optimization (but other parts of the compiler optimize the i < j comparison
away anyway).
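
The claim that c == -1 can be replaced by i < j even when NaNs are possible can be checked on values with a small C model. The sketch below reproduces the sequence quoted above as a function and compares it against the direct comparison; the names `spaceship` and `less_direct` are hypothetical. It only checks the resulting values, not the floating-point exception behaviour.

```c
#include <math.h>

/* Roughly what operator<=> emits for floating point, per the text above.  */
signed char spaceship(double i, double j)
{
    signed char c;
    if (i == j) c = 0;
    else if (i < j) c = -1;
    else if (i > j) c = 1;
    else c = 2;                 /* unordered: at least one operand is NaN */
    return c;
}

/* The replacement discussed above: c == -1 becomes a plain i < j.  */
int less_direct(double i, double j)
{
    return i < j;
}
```

For every pair of inputs, including pairs involving NaN, `spaceship(i, j) == -1` and `less_direct(i, j)` agree.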

Anyway, to match the HONOR_NANS case, we need to verify that the
second comparison has its true edge to the phi_bb (yielding -1 or 1 there);
it can't be the false edge, because when NaNs are honored the false
edge covers both the case where the inverted comparison is true and the
case where one of the operands is NaN.  Similarly, we need to ensure that the two
non-equality comparisons are opposites.  With -ffast-math we can in
some cases get one comparison x >= 5.0 and the other x > 5.0 and that is fine,
because NaN is UB; when NaNs are honored, the comparisons must be opposites to leave
the unordered case with value 2 as the last one remaining.
The patch also punts if HONOR_NANS and the phi has just 3 arguments instead
of 4.
When NaNs are honored, we also in some cases need to perform some comparison
and then invert its result (so that exceptions are properly thrown and we
get the correct result).

2024-11-21  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/94589
PR tree-optimization/117612
* tree-ssa-phiopt.cc (spaceship_replacement): Handle
HONOR_NANS (TREE_TYPE (lhs1)) case when possible.

* gcc.dg/pr94589-5.c: New test.
* gcc.dg/pr94589-6.c: New test.
* g++.dg/opt/pr94589-5.C: New test.
* g++.dg/opt/pr94589-6.C: New test.


Jakub Jelinek [Thu, 21 Nov 2024 08:38:01 +0000 (09:38 +0100)]

phiopt: Fix a pasto in spaceship_replacement [PR117612]

When working on the PR117612 fix, I've noticed a pasto in
tree-ssa-phiopt.cc (spaceship_replacement).
The code is
      if (absu_hwi (tree_to_shwi (arg2)) != 1)
        return false;
      if (e1->flags & EDGE_TRUE_VALUE)
        {
          if (tree_to_shwi (arg0) != 2
              || absu_hwi (tree_to_shwi (arg1)) != 1
              || wi::to_widest (arg1) == wi::to_widest (arg2))
            return false;
        }
      else if (tree_to_shwi (arg1) != 2
               || absu_hwi (tree_to_shwi (arg0)) != 1
               || wi::to_widest (arg0) == wi::to_widest (arg1))
        return false;
where arg{0,1,2,3} are PHI args and wants to ensure that if e1 is a
true edge, then arg0 is 2 and one of arg{1,2} is -1 and one is 1,
otherwise arg1 is 2 and one of arg{0,2} is -1 and one is 1.
But due to a pasto, the latter case doesn't verify that arg0
is different from arg2; they could both be -1 or both be 1 and we wouldn't
punt.  The wi::to_widest (arg0) == wi::to_widest (arg1) test
is always false when we've made sure in the earlier conditions that
arg1 is 2 and arg0 is -1 or 1, so never 2.
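
The effect of the pasto can be reproduced with a simplified model of the false-edge branch. In the C sketch below, plain long values stand in for the PHI argument trees, and both function names are hypothetical. The corrected comparison of arg0 against arg2 is inferred from the description above, not copied from the patch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified model of the branch taken when e1 is not a true edge
   (arg1 must be 2).  Returns true if the check would punt.  */
bool punts_with_pasto(long arg0, long arg1, long arg2)
{
    if (labs(arg2) != 1)
        return true;
    return arg1 != 2 || labs(arg0) != 1
           || arg0 == arg1;      /* pasto: can never be true at this point */
}

/* Same check with the comparison the description asks for.  */
bool punts_fixed(long arg0, long arg1, long arg2)
{
    if (labs(arg2) != 1)
        return true;
    return arg1 != 2 || labs(arg0) != 1
           || arg0 == arg2;      /* arg0 and arg2 must be -1 and 1 in some order */
}
```

With arg0 == arg2 == 1 and arg1 == 2, the first version does not punt while the second one does.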

2024-11-21  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/94589
PR tree-optimization/117612
* tree-ssa-phiopt.cc (spaceship_replacement): Fix up
a pasto in check when arg1 is 2.


Jakub Jelinek [Thu, 21 Nov 2024 08:34:28 +0000 (09:34 +0100)]

c: Add u{,l,ll,imax}abs builtins [PR117024]

The following patch adds u{,l,ll,imax}abs builtins, which just fold
to ABSU_EXPR, similarly to how {,l,ll,imax}abs builtins fold to
ABS_EXPR.
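
To illustrate the difference from plain abs, the C sketch below gives a portable reference for an unsigned absolute value. It assumes the builtins take a signed argument and return the corresponding unsigned type, as the BT_FN_ULONG_LONG style type names in the ChangeLog suggest; `ref_uabs` is a hypothetical name and is not the builtin itself.

```c
#include <limits.h>

/* Portable reference for an unsigned absolute value: |x| as an
   unsigned value, well defined even for INT_MIN, where abs (x)
   would overflow.  */
unsigned int ref_uabs(int x)
{
    unsigned int ux = (unsigned int) x;   /* conversion is modulo 2^N */
    return x < 0 ? 0u - ux : ux;          /* unsigned negation cannot overflow */
}
```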

2024-11-21 Jakub Jelinek <jakub@redhat.com>

PR c/117024
gcc/
* coretypes.h (enum function_class): Add function_c2y_misc
enumerator.
* builtin-types.def (BT_FN_UINTMAX_INTMAX, BT_FN_ULONG_LONG,
BT_FN_ULONGLONG_LONGLONG): New DEF_FUNCTION_TYPE_1s.
* builtins.def (DEF_C2Y_BUILTIN): Define.
(BUILT_IN_UABS, BUILT_IN_UIMAXABS, BUILT_IN_ULABS,
BUILT_IN_ULLABS): New builtins.
* builtins.cc (fold_builtin_abs): Handle also folding of u*abs
to ABSU_EXPR.
(fold_builtin_1): Handle BUILT_IN_U{,L,LL,IMAX}ABS.
gcc/lto/ChangeLog:
* lto-lang.cc (flag_isoc2y): New variable.
gcc/ada/ChangeLog:
* gcc-interface/utils.cc (flag_isoc2y): New variable.
gcc/testsuite/
* gcc.c-torture/execute/builtins/lib/abs.c (uintmax_t): New typedef.
(uabs, ulabs, ullabs, uimaxabs): New functions.
* gcc.c-torture/execute/builtins/uabs-1.c: New test.
* gcc.c-torture/execute/builtins/uabs-1.x: New file.
* gcc.c-torture/execute/builtins/uabs-1-lib.c: New file.
* gcc.c-torture/execute/builtins/uabs-2.c: New test.
* gcc.c-torture/execute/builtins/uabs-2.x: New file.
* gcc.c-torture/execute/builtins/uabs-2-lib.c: New file.
* gcc.c-torture/execute/builtins/uabs-3.c: New test.
* gcc.c-torture/execute/builtins/uabs-3.x: New test.
* gcc.c-torture/execute/builtins/uabs-3-lib.c: New test.


Kewen Lin [Thu, 21 Nov 2024 07:41:34 +0000 (07:41 +0000)]

rs6000: Adjust FLOAT128 signbit2 expander for P8 LE [PR114567]

As the associated test case shows, the assembly generated for signbit
is sub-optimal for a _Float128 argument from memory on P8 LE.
On P8 LE, the p8swap pass puts an explicit AND -16 on the memory,
which causes mode_dependent_address_p to consider it invalid
to change its mode, so combine fails to make use of the
existing pattern signbit<SIGNBIT:mode>2_dm_mem.  Considering
it's always more efficient to make use of an 8-byte load and
shift on P8 LE, this patch adjusts the current expander
to treat this case specially.
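
The idea behind the 8-byte load and shift can be modelled in portable C. The sketch below assumes a 128-bit IEEE value stored in little-endian byte order, so the sign is the top bit of the upper 8 bytes; `signbit_le128` is a hypothetical helper that only models the arithmetic, not the instruction sequence the expander emits.

```c
#include <stdint.h>

/* Hypothetical model: sign bit of a 128-bit IEEE value stored in
   little-endian memory, obtained from the most significant 8 bytes only.  */
int signbit_le128(const unsigned char bytes[16])
{
    uint64_t hi = 0;
    for (int i = 7; i >= 0; i--)          /* bytes 8..15 hold the high half */
        hi = (hi << 8) | bytes[8 + i];
    return (int) (hi >> 63);              /* the sign is the top bit */
}
```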

PR target/114567

gcc/ChangeLog:

* config/rs6000/rs6000.md (expander signbit<FLOAT128:mode>2): Adjust.
(*signbit<mode>2_dm_mem): Rename to ...
(signbit<mode>2_dm_mem): ... this.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr114567.c: New test.
